Building Hadoop

From Lawa
Revision as of 11:35, 12 November 2010 by Direwolf007 (Talk | contribs)


The following are instructions to build Hadoop 0.21.x from source. These have been developed on Ubuntu 10.04, but the basic flow should work for any *nix operating system (although the "how to get X" instructions may not be correct elsewhere).


First of all, you will need the build scripts located in this tar file:

https://www.cs.huji.ac.il/project/lawa/downloadables/hadoop-buildscript.tar

These are heavily modified versions of the "official" 0.21.x build scripts, which I took from one of the Hadoop forums (apparently the ones used to release 0.21.0; they do not quite work in their plain form).


Follow these instructions:


1) First of all, you need the Hadoop sources in three sub-directories of the same directory. This directory must also contain the build scripts from the tar file you downloaded. The names matter, so please name the Hadoop directories as follows:

hadoop-common

hadoop-hdfs

hadoop-mapreduce

These can be found at the following SVN repository locations (links are to the 0.21 release tag):

Common: http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.21.0/

HDFS: http://svn.apache.org/repos/asf/hadoop/hdfs/tags/release-0.21.0/

MapReduce: http://svn.apache.org/repos/asf/hadoop/mapreduce/tags/release-0.21.0/


For example, I have:

~/hadoop-workspace/hadoop-common

~/hadoop-workspace/hadoop-hdfs

~/hadoop-workspace/hadoop-mapreduce

With the build scripts located in ~/hadoop-workspace
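The checkouts can be scripted. This is a sketch assuming the example workspace layout above and the command-line svn client from step 3; the `WORKSPACE` path is an assumption, adjust it to taste:

```shell
# Check out the three Hadoop 0.21.0 source trees into one workspace directory.
WORKSPACE="$HOME/hadoop-workspace"   # example path
BASE_URL="http://svn.apache.org/repos/asf/hadoop"

mkdir -p "$WORKSPACE"
for project in common hdfs mapreduce; do
    svn checkout "$BASE_URL/$project/tags/release-0.21.0/" "$WORKSPACE/hadoop-$project"
done
```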


2) zlib must be installed. You can get it (in Ubuntu 10.04) by running:

sudo apt-get install zlib1g zlib1g-dev


3) You need the command-line svn client to be installed. You can get it (in Ubuntu 10.04) by running:

sudo apt-get install subversion


4) You need g++. You can get it (in Ubuntu 10.04) by running:

sudo apt-get install g++


5) You need Java 6 to build Hadoop. In Ubuntu 10.04 you can get it by running:

sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

sudo apt-get update

sudo apt-get install sun-java6-jdk sun-java6-jre


6) You need Forrest, download a release from here:

http://forrest.apache.org/mirrors.cgi#how

The location where it is extracted will be referenced from the script as FORREST_HOME.
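Unpacking a release and noting its location might look like the following sketch; the Forrest version and tarball name are assumptions (pick whatever release the mirror page offers), and the tools directory is just an example:

```shell
# Unpack a downloaded Forrest release; version/paths are illustrative.
FORREST_TARBALL="apache-forrest-0.8.tar.gz"   # example release name
mkdir -p "$HOME/tools"
tar -xzf "$FORREST_TARBALL" -C "$HOME/tools"
# This extracted location is what build-local-defs.sh will call FORREST_HOME.
export FORREST_HOME="$HOME/tools/apache-forrest-0.8"
```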


7) You need a Java 5 JDK for Forrest. You can get it by downloading it from the web in .bin form for Linux. chmod the .bin file so you can execute it, and run it. It will create a folder that you can then move anywhere you like. You will reference that folder from the build script as JAVA5_HOME.
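For example, the steps above might look like this; the .bin filename and the resulting folder name are assumptions based on a typical Sun JDK 5 update, so substitute whatever you actually downloaded:

```shell
# Unpack a self-extracting Java 5 JDK .bin (filenames are examples).
JDK5_BIN="jdk-1_5_0_22-linux-i586.bin"
chmod +x "$JDK5_BIN"
"./$JDK5_BIN"                      # unpacks into a jdk1.5.0_22 folder here
mkdir -p "$HOME/tools"
mv jdk1.5.0_22 "$HOME/tools/"      # move it wherever you like
# This is what build-local-defs.sh will call JAVA5_HOME.
export JAVA5_HOME="$HOME/tools/jdk1.5.0_22"
```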


8) You need Xerces-C 2.8.x (3.1.x WILL NOT WORK! Save yourself the time I already wasted!), download it from here:

http://xerces.apache.org/xerces-c/download.cgi

Build/Install it according to the instructions here:

http://xerces.apache.org/xerces-c/build-2.html

You can also download a binary distribution of Xerces-C. The root directory of the distribution will be referenced from the script as XERCES_C_HOME.
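A source build roughly follows the build-2 page linked above; this is only a sketch, and the source-tree path, the runConfigure flags, and the install prefix are assumptions — check that page for the flags matching your platform:

```shell
# Build Xerces-C 2.8 from source (paths and flags are illustrative).
export XERCESCROOT="$HOME/tools/xerces-c-src_2_8_0"
# This is what build-local-defs.sh will call XERCES_C_HOME.
export XERCES_C_HOME="$XERCESCROOT"
# Chained with && so nothing runs if an earlier step fails; -P sets the
# install prefix so "make install" does not need root.
cd "$XERCESCROOT/src/xercesc" \
  && ./runConfigure -plinux -cgcc -xg++ -P"$XERCESCROOT" \
  && make && make install
```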


9) Open the build-local-defs.sh script file. Set the variables within as appropriate for your machine. Please use full, rather than relative, paths. 64-bit build mode is supported, but since my Ubuntu system is a 32-bit machine, I cannot test it.
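As an illustration, a filled-in build-local-defs.sh might contain entries like the following. FORREST_HOME, JAVA5_HOME, and XERCES_C_HOME are the names referenced in the steps above; the paths are examples (note they are full paths, as the step requires), and your copy of the script may define additional variables:

```shell
# Example build-local-defs.sh values -- full paths, adjusted per machine.
export FORREST_HOME="$HOME/tools/apache-forrest-0.8"
export JAVA5_HOME="$HOME/tools/jdk1.5.0_22"
export XERCES_C_HOME="$HOME/tools/xerces-c-src_2_8_0"
```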


10) The Eclipse plugin in 0.21 MapReduce fails to compile, but it is not needed to set up a cluster (it does make running programs on the cluster easier, though). The current workaround is to delete the plugin's folder from the hadoop-mapreduce project prior to executing the build script. It is located at:

${MAPREDUCE_DIR}/src/contrib/eclipse-plugin

Simply delete this directory (or move it out of the project).
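In shell terms the workaround is a single move; this demonstrates it on a throwaway directory tree, and in practice you would point MAPREDUCE_DIR at your real hadoop-mapreduce checkout instead:

```shell
# Demonstration on a scratch tree; set MAPREDUCE_DIR to your real checkout.
MAPREDUCE_DIR="$(mktemp -d)/hadoop-mapreduce"
mkdir -p "$MAPREDUCE_DIR/src/contrib/eclipse-plugin"

# The workaround: move the plugin out of the project before building.
mv "$MAPREDUCE_DIR/src/contrib/eclipse-plugin" "$(dirname "$MAPREDUCE_DIR")/"
```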


11) There is a problem with the build.xml file for hadoop-mapreduce 0.21. If you specify a release version other than 0.21.0 in build-local-defs.sh, the build will fail, because due to an internal problem the init target of that build.xml specifically looks for 0.21.0. You are welcome to try to figure it out, but if you just want it to work:

+ Go to your hadoop-mapreduce directory and edit build.xml.

+ Find the "init" target (search for: target name="init").

+ In this target, find the tag:

<unzip src="${common.ivy.lib.dir}/hadoop-hdfs-${hadoop-hdfs.version}.jar.jar" dest="${build.dir}">

+ Finally, change ${hadoop-hdfs.version} to ${version}.
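If you prefer not to edit build.xml by hand, the same substitution can be made with a sed one-liner; this assumes GNU sed (for in-place editing with -i), and MAPREDUCE_DIR is an example path:

```shell
# Rewrite ${hadoop-hdfs.version} to ${version} in the mapreduce build.xml.
# GNU sed: -i edits the file in place; $ and { } are literal in this pattern.
MAPREDUCE_DIR="$HOME/hadoop-workspace/hadoop-mapreduce"   # example path
sed -i 's/${hadoop-hdfs\.version}/${version}/g' "$MAPREDUCE_DIR/build.xml"
```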


12) Execute the build-hadoop.sh script. The result will be a tar file containing the built Hadoop release, located in the same directory as the build-hadoop.sh script, as well as a directory named hadoop-<version> containing the same release.


Good Luck, and May the source be with you.....Always!