Hadoop HUJI: Measurement Hook-Points
Hadoop 0.21 Hook Points
1. Mapper Input:
org.apache.hadoop.hdfs.BlockReader.readChunk()
2. Reducer Input:
org.apache.hadoop.mapreduce.task.reduce.Fetcher.shuffleToMemory()
org.apache.hadoop.mapreduce.task.reduce.Fetcher.shuffleToDisk()
3. HDFS Writes:
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock()
Hadoop 0.20.2 (CDH3) Hook Points
1. Mapper Input:
hdfs.org.apache.hadoop.hdfs.DFSClient.BlockReader.readChunk()
The input stream is instrumented in the factory method public static BlockReader newBlockReader() (same as in Hadoop 0.21).
2. Reducer Input:
mapred.org.apache.hadoop.mapred.ReduceTask.ReduceCopier.MapOutputCopier.shuffleInMemory()
mapred.org.apache.hadoop.mapred.ReduceTask.ReduceCopier.MapOutputCopier.shuffleToDisk()
3. HDFS Writes:
hdfs.org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock()
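The mapper-input hook instruments the stream inside a factory method. A minimal sketch of that general idea follows, using only java.io. It is not the code used in this project and not part of Hadoop: CountingInputStream, instrument() and getBytesRead() are hypothetical names chosen for illustration. The wrapper counts the bytes actually returned to the caller, and a factory method hands out the wrapped stream in place of the raw one.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper (not a Hadoop class): counts bytes read through it.
class CountingInputStream extends FilterInputStream {
    private final AtomicLong bytesRead = new AtomicLong();

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {               // -1 signals end of stream, so do not count it
            bytesRead.incrementAndGet();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // FilterInputStream.read(byte[]) delegates here, so bulk reads are counted once.
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesRead.addAndGet(n);
        }
        return n;
    }

    // Total number of bytes returned to callers so far (skipped bytes are not counted).
    long getBytesRead() {
        return bytesRead.get();
    }

    // Hypothetical factory hook: the single place where a raw stream gets wrapped.
    static CountingInputStream instrument(InputStream raw) {
        return new CountingInputStream(raw);
    }
}
```

Wrapping in the factory means callers need no changes, since every stream they obtain is already instrumented. The same pattern would apply to an OutputStream at a write-side hook, under the same assumptions.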
The following diagram illustrates where these hook points sit in the execution flow of a job: