Difference between revisions of "Kelvin Architecture"

From Lawa

Latest revision as of 14:50, 19 July 2013

High-Level Overview

Hadoop Kelvin has two main parts: the Statistics Server and the Statistics Client.

The Statistics Server is a program that serves as a sink for all the traffic reports arriving from the cluster nodes. When a single Statistics Server is present, it typically runs on one of the cluster's master machines. If several servers are required, a subset of the slave machines can host them, or each machine can run its own server for the tasks that run on it. The server operates a set of user-configurable (via XML) data storers (write-only), data retrievers (read-only) and data manipulators (read-and-write). Measurement data is stored through the storers and manipulators, and queries about past measurement data are answered by the retrievers and manipulators. Currently, Hadoop Kelvin provides a log-based information store, which writes all traffic reports in plaintext form via a Log4J logger. All Hadoop Kelvin traffic uses HTTP.

Data Storers, Data Retrievers and Data Manipulators: Why Hadoop Kelvin is (potentially) more than a Logger

As briefly described above, the system incorporates the notions of a Data Storer, a Data Retriever and a Data Manipulator, collectively referred to as Data Handlers. The first two are each defined by a Java interface, which anyone seeking to extend Kelvin's functionality can implement; a Data Manipulator is simply an entity that implements both interfaces at once. Adding handlers does not require recompiling Hadoop: the classes only need to be packaged in a JAR file on the classpath and enabled in the XML configuration files. It does, however, require restarting the Statistics Server(s) so that the classes named in the XML configuration files are loaded.

The current Kelvin implementation supplies one Data Storer, LogStatisticStore, which logs all traffic reports to a Log4J log file. This is the simplest form of Data Storer and should be used mainly for debugging or research purposes (we used it for the latter). Because the log files tend to grow very large rather quickly, it is not suited to long-term, constant deployment in a production environment. The interfaces are easy to implement for additional storers and/or retrievers, the most obvious example being an SQL database.
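To make the handler model concrete, here is a minimal sketch in Java. Every name in it (DataStorer, DataRetriever, InMemoryManipulator, HandlerLoader) and every method signature is an assumption made for illustration; the actual Kelvin interfaces are not shown on this page and may look quite different. The reflective loader is likewise only a guess at how classes named in a configuration file could be picked up at server start-up, which would explain why a restart is needed but a recompilation of Hadoop is not.

```java
import java.util.ArrayList;
import java.util.List;

// All names below are hypothetical; they are not taken from the Kelvin source.

/** Write-only handler: accepts traffic reports for storage. */
interface DataStorer {
    void store(String report);
}

/** Read-only handler: answers questions about previously stored reports. */
interface DataRetriever {
    List<String> retrieve(String keyword);
}

/** A manipulator is simply a class that implements both interfaces at once. */
class InMemoryManipulator implements DataStorer, DataRetriever {
    private final List<String> reports = new ArrayList<>();

    @Override
    public synchronized void store(String report) {
        reports.add(report);
    }

    @Override
    public synchronized List<String> retrieve(String keyword) {
        List<String> matches = new ArrayList<>();
        for (String report : reports) {
            if (report.contains(keyword)) {
                matches.add(report);
            }
        }
        return matches;
    }
}

/** Assumed start-up step: instantiate a handler from a configured class name. */
class HandlerLoader {
    static Object load(String className) {
        try {
            // Requires the class to be on the classpath with a no-argument constructor.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load handler " + className, e);
        }
    }
}

class HandlerSketchDemo {
    public static void main(String[] args) {
        // In a real deployment the class name would come from the XML configuration.
        Object handler = HandlerLoader.load(InMemoryManipulator.class.getName());
        ((DataStorer) handler).store("node-7 sent 1024 bytes");
        ((DataStorer) handler).store("node-3 sent 2048 bytes");
        System.out.println(((DataRetriever) handler).retrieve("node-7"));
        // prints [node-7 sent 1024 bytes]
    }
}
```

An SQL-backed handler would follow the same shape, with the in-memory list replaced by inserts and selects against a database.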

Packets and Queries – Submitting Information to and Requesting Information from Kelvin

Communication with Kelvin is done via two serializable Java types. Objects inheriting from the StatisticQuery abstract class can be sent via the Statistics Client to the Statistics Server to obtain a response from a retriever (the specific response depends on the data retriever the query is addressed to). These queries provide access to the data stored within Kelvin. A specific type of query is created for each target retriever, and a retriever may have more than one query type. After the target retriever processes the query, the resulting data is sent back to the client.
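The targeted nature of queries can be sketched as follows. Only the class name StatisticQuery comes from the text above; its members, the BytesSentQuery subclass, the QueryRetriever interface, the "byteCount" retriever name and the QueryDispatcher are hypothetical, and the HTTP transport between client and server is left out entirely.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: only the name StatisticQuery appears in the Kelvin description.

/** Base type for queries; each query names the retriever that should answer it. */
abstract class StatisticQuery implements Serializable {
    private static final long serialVersionUID = 1L;

    abstract String targetRetriever();
}

/** One query type written for a hypothetical "byteCount" retriever. */
class BytesSentQuery extends StatisticQuery {
    private static final long serialVersionUID = 1L;
    final String node;

    BytesSentQuery(String node) {
        this.node = node;
    }

    @Override
    String targetRetriever() {
        return "byteCount";
    }
}

/** Read-only handler that turns a query into a serializable response. */
interface QueryRetriever {
    Serializable answer(StatisticQuery query);
}

class ByteCountRetriever implements QueryRetriever {
    private final Map<String, Long> bytesByNode = new HashMap<>();

    /** Test helper that seeds data; a real retriever would read from a store. */
    void record(String node, long bytes) {
        bytesByNode.merge(node, bytes, Long::sum);
    }

    @Override
    public Serializable answer(StatisticQuery query) {
        if (!(query instanceof BytesSentQuery)) {
            throw new IllegalArgumentException("Unsupported query type: " + query.getClass());
        }
        return bytesByNode.getOrDefault(((BytesSentQuery) query).node, 0L);
    }
}

/** Server-side routing: exactly one retriever handles each query. */
class QueryDispatcher {
    private final Map<String, QueryRetriever> retrievers = new HashMap<>();

    void register(String name, QueryRetriever retriever) {
        retrievers.put(name, retriever);
    }

    Serializable dispatch(StatisticQuery query) {
        QueryRetriever retriever = retrievers.get(query.targetRetriever());
        if (retriever == null) {
            throw new IllegalArgumentException("No retriever named " + query.targetRetriever());
        }
        return retriever.answer(query);
    }
}

class QuerySketchDemo {
    public static void main(String[] args) {
        ByteCountRetriever retriever = new ByteCountRetriever();
        retriever.record("node-7", 1024L);
        retriever.record("node-7", 512L);

        QueryDispatcher dispatcher = new QueryDispatcher();
        dispatcher.register("byteCount", retriever);
        System.out.println(dispatcher.dispatch(new BytesSentQuery("node-7"))); // prints 1536
    }
}
```

Because both the query and the response are Serializable, they could in principle be written to and read from an HTTP body with standard Java object streams, though the page does not say how Kelvin encodes them.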

Objects inheriting from the StatisticPacket abstract class can also be sent via the Statistics Client to the Statistics Server in order to store information. Unlike queries, which are addressed to one specific target retriever, a packet is stored in every storer that supports that packet type.
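The contrast with queries can be illustrated by a small fan-out sketch. Again, only the name StatisticPacket is taken from the text; the packet subclasses, the PacketStorer interface with its supports method, and the PacketFanOut class are assumptions introduced here, and the real mechanism by which a storer declares which packets it supports is not documented on this page.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only the name StatisticPacket appears in the Kelvin description.

/** Base type for data submitted to the server for storage. */
abstract class StatisticPacket implements Serializable {
    private static final long serialVersionUID = 1L;
}

class TrafficPacket extends StatisticPacket {
    private static final long serialVersionUID = 1L;
    final String node;
    final long bytes;

    TrafficPacket(String node, long bytes) {
        this.node = node;
        this.bytes = bytes;
    }
}

class HeartbeatPacket extends StatisticPacket {
    private static final long serialVersionUID = 1L;
    final String node;

    HeartbeatPacket(String node) {
        this.node = node;
    }
}

/** Write-only handler that states which packets it can store. */
interface PacketStorer {
    boolean supports(StatisticPacket packet);

    void store(StatisticPacket packet);
}

/** Simple storer that keeps packets of one accepted type (or its subtypes) in a list. */
class ListStorer implements PacketStorer {
    private final Class<? extends StatisticPacket> accepted;
    final List<StatisticPacket> stored = new ArrayList<>();

    ListStorer(Class<? extends StatisticPacket> accepted) {
        this.accepted = accepted;
    }

    @Override
    public boolean supports(StatisticPacket packet) {
        return accepted.isInstance(packet);
    }

    @Override
    public void store(StatisticPacket packet) {
        stored.add(packet);
    }
}

/** Server-side fan-out: every supporting storer receives the packet. */
class PacketFanOut {
    private final List<PacketStorer> storers = new ArrayList<>();

    void register(PacketStorer storer) {
        storers.add(storer);
    }

    /** Returns how many storers accepted the packet. */
    int submit(StatisticPacket packet) {
        int accepted = 0;
        for (PacketStorer storer : storers) {
            if (storer.supports(packet)) {
                storer.store(packet);
                accepted++;
            }
        }
        return accepted;
    }
}

class PacketSketchDemo {
    public static void main(String[] args) {
        PacketFanOut fanOut = new PacketFanOut();
        fanOut.register(new ListStorer(StatisticPacket.class)); // accepts every packet
        fanOut.register(new ListStorer(TrafficPacket.class));   // accepts traffic packets only

        System.out.println(fanOut.submit(new TrafficPacket("node-7", 1024L))); // prints 2
        System.out.println(fanOut.submit(new HeartbeatPacket("node-7")));      // prints 1
    }
}
```

The sender never names a destination here: which storers end up holding a packet is decided entirely on the server side, whereas a query carries the identity of its one target retriever.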