Kelvin Architecture

High-Level Overview

Hadoop Kelvin consists of two main parts: the Statistics Server and the Statistics Client.

The Statistics Server is a program which runs on a single machine in the cluster and serves as a sink for all the traffic reports arriving from the cluster nodes. When a single Statistics Server is used it typically runs on one of the master machines; alternatively, several such servers can run on a subset of the slave machines, or each machine can run its own server for the tasks that run on it. The server operates a set of user-configurable (via XML) data storers (which are write-only), data retrievers (which are read-only) and data manipulators (which provide read-and-write access); measurement data is written to the storers, and queries about past measurement data are answered by the retrievers. Currently, Hadoop Kelvin provides a log-based information store which records all traffic reports in plaintext form via a Log4J logger. All Hadoop Kelvin traffic uses HTTP.
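
To make this concrete, here is a minimal Java sketch of the fan-out a server of this kind performs: every incoming traffic report is handed to each configured storer, while a query is dispatched to a single retriever. The type and member names (DataStorer, DataRetriever, TrafficReport, StatisticsServerSketch) and the report fields are illustrative assumptions, not Kelvin's actual API.

    // Illustrative sketch only: the names below (DataStorer, DataRetriever,
    // TrafficReport, StatisticsServerSketch) are assumptions for this example
    // and are not taken from the Kelvin source.
    import java.util.ArrayList;
    import java.util.List;

    interface DataStorer {                 // write-only handler
        void store(TrafficReport report);
    }

    interface DataRetriever {              // read-only handler
        Object retrieve(Object query);
    }

    // A traffic report as a plain value object; the fields Kelvin actually
    // reports are not listed in this document, so these are placeholders.
    class TrafficReport {
        private final String node;
        private final long bytes;
        private final long timestamp;

        TrafficReport(String node, long bytes, long timestamp) {
            this.node = node;
            this.bytes = bytes;
            this.timestamp = timestamp;
        }
        String getNode()      { return node; }
        long   getBytes()     { return bytes; }
        long   getTimestamp() { return timestamp; }
    }

    class StatisticsServerSketch {
        // In Kelvin these handler lists are populated from the XML
        // configuration at start-up; here they are just plain lists.
        private final List<DataStorer> storers = new ArrayList<>();
        private final List<DataRetriever> retrievers = new ArrayList<>();

        // Each traffic report arriving over HTTP is fanned out to every
        // configured storer.
        void handleReport(TrafficReport report) {
            for (DataStorer storer : storers) {
                storer.store(report);
            }
        }

        // A query, by contrast, is answered by a single retriever chosen by
        // the caller (see the Packets and Queries section below).
        Object handleQuery(int retrieverIndex, Object query) {
            return retrievers.get(retrieverIndex).retrieve(query);
        }
    }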

Data Storers, Data Retrievers and Data Manipulators: Why Hadoop Kelvin is (potentially) more than a Logger

As briefly described above, the system incorporates the notions of a Data Storer, a Data Retriever and a Data Manipulator; we refer to them collectively as Data Handlers. The first two are Java interfaces which can be implemented by anyone seeking to extend the functionality of Kelvin, while a Data Manipulator is simply an entity implementing both interfaces at once. Adding such elements does not require recompiling Hadoop (the new classes only need to be placed in a JAR file on the classpath and enabled in the XML configuration files), but it does require a restart of the Statistics Server(s) so that the configured classes are loaded.

The current Kelvin implementation supplies one Data Storer, LogStatisticStore, which logs all traffic reports to a Log4J log file. This is the simplest form of a Data Storer and is mainly intended for debugging or research purposes (we used it for the latter). Its log files tend to grow very large rather quickly, so it is not suited for long-term, constant deployment in a production environment. The interfaces we specified make it easy to implement additional storers and/or retrievers, the most obvious example being one backed by an SQL database.
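
As an illustration of how such an extension might look, the sketch below implements the write-only side against an SQL database using plain JDBC. It is only a sketch under assumptions: it reuses the hypothetical DataStorer and TrafficReport types from the previous example, the table and column names are placeholders, and Kelvin's real interface signatures may differ.

    // Hypothetical SQL-backed storer; reuses the illustrative DataStorer and
    // TrafficReport types from the previous sketch. Table and column names are
    // assumptions, and the table is assumed to already exist.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    class SqlStatisticStore implements DataStorer {
        private final Connection connection;

        SqlStatisticStore(String jdbcUrl) throws SQLException {
            // jdbcUrl is any JDBC connection string for the chosen database.
            this.connection = DriverManager.getConnection(jdbcUrl);
        }

        @Override
        public void store(TrafficReport report) {
            String sql = "INSERT INTO traffic_reports (node, bytes, reported_at) VALUES (?, ?, ?)";
            try (PreparedStatement stmt = connection.prepareStatement(sql)) {
                stmt.setString(1, report.getNode());
                stmt.setLong(2, report.getBytes());
                stmt.setLong(3, report.getTimestamp());
                stmt.executeUpdate();
            } catch (SQLException e) {
                throw new RuntimeException("Failed to store traffic report", e);
            }
        }
    }

Per the paragraph above, enabling such a class would only require packaging it in a JAR on the classpath, listing it in the XML configuration files, and restarting the Statistics Server(s).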

Packets and Queries – Submitting and Requesting Information to/from Kelvin

Communication with Kelvin is done via two serializable Java types. Objects inheriting the StatisticQuery abstract class can be sent via the Statistics Client to the Statistics Server in order to obtain a response from a retriever (the specific response depends on the data retriever the query is addressed to). These queries provide access to the data stored within Kelvin. A specific query type is created for each target retriever (and more than one query type per retriever is possible). After the target retriever processes the query, the resulting data is sent back to the client.
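
The document names the StatisticQuery abstract class but not its members, so the following is only a rough sketch of what a concrete query type might look like; the class shape, fields and retriever name are assumptions.

    // Rough sketch only: StatisticQuery is named in the text, but its actual
    // members are not, so this shape is an assumption.
    import java.io.Serializable;

    abstract class StatisticQuerySketch implements Serializable {
        // Identifies which configured retriever should answer this query.
        abstract String targetRetriever();
    }

    // Example concrete query asking for all reports newer than a given time,
    // addressed to a hypothetical retriever registered as "logRetriever".
    class ReportsSinceQuery extends StatisticQuerySketch {
        final long sinceTimestamp;

        ReportsSinceQuery(long sinceTimestamp) {
            this.sinceTimestamp = sinceTimestamp;
        }

        @Override
        String targetRetriever() {
            return "logRetriever";
        }
    }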

Objects inheriting the StatisticPacket abstract class can also be sent via the Statistics Client to the Statistics Server in order to store information in all the storers which support that specific packet, in contrast to queries, which are addressed to a single target retriever.
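
For symmetry, a concrete packet might look like the sketch below. Only the StatisticPacket name comes from the text; the fields are placeholders. Unlike a query, such a packet is not addressed to a single handler: the server offers it to every configured storer that supports it.

    // Rough sketch only: StatisticPacket is named in the text, but its members
    // are not, so the fields below are placeholders.
    import java.io.Serializable;

    abstract class StatisticPacketSketch implements Serializable { }

    // Example concrete packet carrying one traffic measurement; the server
    // offers it to every configured storer that supports this packet type.
    class TrafficReportPacket extends StatisticPacketSketch {
        final String node;
        final long bytes;
        final long timestamp;

        TrafficReportPacket(String node, long bytes, long timestamp) {
            this.node = node;
            this.bytes = bytes;
            this.timestamp = timestamp;
        }
    }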