Hadoop Kelvin

From Lawa
Revision as of 13:44, 19 July 2013 by Direwolf007 (Talk | contribs)


Hadoop Kelvin is a network monitoring system designed for the Hadoop Map-Reduce framework. It monitors data traffic (not control traffic) between Hadoop nodes and is built so that the collected monitoring data can be stored and accessed in multiple ways; the current implementation provides log-based storage. It is designed to be easily extensible and flexible, and to have a minimal effect on the running time of Hadoop jobs.

Method: Hadoop Kelvin collects data about the following data transfers:

• HDFS reads (regardless of who is performing the read).

• HDFS writes (regardless of who is the origin of the data).

• Data transfers between Mappers and Reducers during a Map-Reduce job execution.


The data collected about each transfer includes:

• Source machine.

• Destination machine.

• Starting timestamp.

• Duration of transfer in milliseconds.

• Size of the transferred data, in bytes.
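
To make the list above concrete, the following is a minimal sketch of how one monitored transfer could be represented and written as a log line. It is not taken from the Hadoop Kelvin source: the class name TransferRecord, the Kind enum, the field names, the tab-separated layout and the choice of epoch milliseconds for the starting timestamp are all assumptions made for illustration.

```java
import java.util.Locale;

/**
 * Hypothetical record of one monitored transfer, holding the five values
 * listed above. All names and the log layout are illustrative assumptions,
 * not Hadoop Kelvin's actual classes or log format.
 */
final class TransferRecord {

    /** The three transfer types listed above; the enum itself is an assumption. */
    enum Kind { HDFS_READ, HDFS_WRITE, SHUFFLE }

    private final Kind kind;
    private final String source;        // source machine
    private final String destination;   // destination machine
    private final long startMillis;     // starting timestamp (assumed: epoch milliseconds)
    private final long durationMillis;  // duration of the transfer in milliseconds
    private final long sizeBytes;       // size of the transferred data in bytes

    TransferRecord(Kind kind, String source, String destination,
                   long startMillis, long durationMillis, long sizeBytes) {
        if (durationMillis < 0 || sizeBytes < 0) {
            throw new IllegalArgumentException("duration and size must be non-negative");
        }
        this.kind = kind;
        this.source = source;
        this.destination = destination;
        this.startMillis = startMillis;
        this.durationMillis = durationMillis;
        this.sizeBytes = sizeBytes;
    }

    /** Average throughput in bytes per second; 0 when the duration is 0 ms. */
    double bytesPerSecond() {
        return durationMillis == 0 ? 0.0 : sizeBytes * 1000.0 / durationMillis;
    }

    /** Formats the record as one tab-separated log line (assumed layout). */
    String toLogLine() {
        return String.format(Locale.ROOT, "%s\t%s\t%s\t%d\t%d\t%d",
                kind, source, destination, startMillis, durationMillis, sizeBytes);
    }
}
```

Under these assumptions, a 4000-byte HDFS read that took 2000 ms would report an average throughput of 2000 bytes per second. Keeping one record per line is one possible way to make a log-based store easy to parse later.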