Difference between revisions of "Hadoop Kelvin"

* [https://www.cs.huji.ac.il/wikis/MediaWiki/lawa/index.php/Hook_Points Hadoop HUJI: Measurement hook-points]
* [https://www.cs.huji.ac.il/wikis/MediaWiki/lawa/index.php/Scheduler_Hook_Points Hadoop HUJI: Scheduler hook-points]

Revision as of 13:43, 19 July 2013

Hadoop Kelvin

Hadoop Kelvin is a network monitoring system designed for the Hadoop Map-Reduce framework. It monitors data (not control) traffic between Hadoop nodes and provides a basis for multiple ways to store and access the collected monitoring data; the current implementation uses log-based storage. It is designed to be flexible and easily extensible, and to have a minimal effect on the running time of Hadoop jobs.

Method:

Hadoop Kelvin collects data about the following data transfers:
• HDFS reads (regardless of who is performing the read).
• HDFS writes (regardless of who is the origin of the data).
• Data transfers between Mappers and Reducers during a Map-Reduce job execution.

The data collected about each transfer includes:
• Source machine.
• Destination machine.
• Starting timestamp.
• Duration of transfer in milliseconds.
• Size of the transferred data, in bytes.
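
As an illustration only, the per-transfer fields listed above could be held in a small value class and written out as one log line per transfer. This is a minimal sketch, not Hadoop Kelvin's actual code: the class name `TransferRecord`, the `Kind` enum, the tab-separated line layout, and the choice of epoch milliseconds for the starting timestamp are all assumptions made for the example.

```java
/**
 * Illustrative sketch only. All names and the log-line layout are
 * hypothetical; they are not taken from Hadoop Kelvin itself.
 */
final class TransferRecord {
    /** The three kinds of monitored data transfer. */
    enum Kind { HDFS_READ, HDFS_WRITE, MAP_TO_REDUCE }

    final Kind kind;
    final String source;        // source machine
    final String destination;   // destination machine
    final long startMillis;     // starting timestamp (assumed: epoch milliseconds)
    final long durationMillis;  // duration of transfer in milliseconds
    final long sizeBytes;       // size of the transferred data, in bytes

    TransferRecord(Kind kind, String source, String destination,
                   long startMillis, long durationMillis, long sizeBytes) {
        if (durationMillis < 0 || sizeBytes < 0) {
            throw new IllegalArgumentException("duration and size must be non-negative");
        }
        this.kind = kind;
        this.source = source;
        this.destination = destination;
        this.startMillis = startMillis;
        this.durationMillis = durationMillis;
        this.sizeBytes = sizeBytes;
    }

    /** Serializes the record as one tab-separated line (an assumed format). */
    String toLogLine() {
        return String.join("\t", kind.name(), source, destination,
                Long.toString(startMillis),
                Long.toString(durationMillis),
                Long.toString(sizeBytes));
    }

    /** Parses a line produced by toLogLine(). */
    static TransferRecord fromLogLine(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length);
        }
        return new TransferRecord(Kind.valueOf(f[0]), f[1], f[2],
                Long.parseLong(f[3]), Long.parseLong(f[4]), Long.parseLong(f[5]));
    }

    /** Average throughput in bytes per second; 0 for a zero-length transfer. */
    double bytesPerSecond() {
        return durationMillis == 0 ? 0.0 : sizeBytes * 1000.0 / durationMillis;
    }
}
```

Because duration and size are both recorded, a derived figure such as average throughput can be computed from a single log line without any additional measurement.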