Building Hadoop
Revision as of 15:33, 26 October 2010

The following are instructions to build Hadoop 0.21.x from source. They were developed on Ubuntu 10.04, but the basic flow should work on any *nix operating system, though the "how to get X" instructions may not apply there.


First of all, you will need the handy build scripts located in this tar file:

https://www.cs.huji.ac.il/project/lawa/downloadables/hadoop-buildscript.tar

These are heavily modified versions of the "official" 0.21.x build scripts (the ones apparently used to release 0.21.0, and which do not work in their plain form).


Follow the instructions:

1) First of all, you need the Hadoop sources in three sub-directories of the same directory. This directory also needs to contain the build scripts from the build-script tar file you downloaded. The names matter, so please name the Hadoop projects in the following fashion:

hadoop-common
hadoop-hdfs
hadoop-mapreduce

For example, I have:

~/hadoop-workspace/hadoop-common
~/hadoop-workspace/hadoop-hdfs
~/hadoop-workspace/hadoop-mapreduce

with the build scripts located in ~/hadoop-workspace.
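The layout above can be sketched as a few shell commands; the workspace path is just an example, but the three sub-directory names must match exactly:

```shell
# Create the workspace layout described above. The workspace path is an
# example -- any directory works, but the sub-directory names matter.
WORKSPACE="$HOME/hadoop-workspace"
mkdir -p "$WORKSPACE/hadoop-common" "$WORKSPACE/hadoop-hdfs" "$WORKSPACE/hadoop-mapreduce"

# The Hadoop sources go into those three directories, and the build scripts
# from the downloaded tar file go into the workspace itself, e.g.:
# tar -xf hadoop-buildscript.tar -C "$WORKSPACE"
ls "$WORKSPACE"
```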

2) zlib must be installed. On Ubuntu 10.04 you can get it by running:

sudo apt-get install zlib1g zlib1g-dev

3) You need svn to be installed. On Ubuntu 10.04 you can get it by running:

sudo apt-get install subversion

4) You need g++. On Ubuntu 10.04 you can get it by running:

sudo apt-get install g++

5) You need Java 6 to build Hadoop. On Ubuntu 10.04 you can get it by running:

sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
sudo apt-get update
sudo apt-get install sun-java6-jdk sun-java6-jre

6) You need Apache Forrest; download a release from here:

http://forrest.apache.org/mirrors.cgi#how

The location where it is extracted will be referenced from the script as FORREST_HOME.

7) You need a Java 5 JDK for Forrest. You can get it by downloading the Linux .bin installer from the web. chmod the .bin file so it is executable, then run it. It will create a folder you can move anywhere you like. You will reference it from the build script as JAVA5_HOME.

8) You need Xerces-C 2.8.x (3.1.x WILL NOT WORK! Save yourself the time I already wasted!). Download it from here:

http://xerces.apache.org/xerces-c/download.cgi

Build/install it according to the instructions here:

http://xerces.apache.org/xerces-c/build-2.html

You can also download a binary distribution of Xerces-C. The root directory of the distribution will be referenced from the script as XERCES_C_HOME.

9) Open the build-local-defs.sh script file. Set the variables within as appropriate for your machine. Please use full, rather than relative, paths.
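As a purely illustrative sketch, a filled-in build-local-defs.sh might look like the following. Every path is a placeholder; only the JAVA5_HOME, FORREST_HOME, and XERCES_C_HOME names are taken from the steps above, and anything else is an assumption about the script's contents — check the downloaded script itself for the exact variables it expects:

```shell
# Hypothetical build-local-defs.sh contents -- all paths are placeholders;
# replace them with the full (not relative) paths on your machine.
export FORREST_HOME=/opt/apache-forrest-0.8    # step 6: where Forrest was extracted
export JAVA5_HOME=/opt/jdk1.5.0_22             # step 7: the self-extracted Java 5 JDK
export XERCES_C_HOME=/opt/xerces-c_2_8_0       # step 8: root of the Xerces-C 2.8.x tree
# The script also takes a release version (see step 11); its exact variable
# name is whatever the downloaded script defines.
```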

10) The Eclipse plugin in 0.21 MapReduce fails to compile, but it is not needed to set up a cluster (it does make running programs on it easier, though). The current work-around is to delete the folder from the hadoop-mapreduce project prior to executing the build script. It is located at ${MAPREDUCE_DIR}/src/contrib/eclipse-plugin; simply delete this directory (or move it out of the project).

11) There is a problem with the build.xml file for hadoop-mapreduce 0.21. If you specify a release version other than 0.21.0 in build-local-defs.sh, the build will fail because the init target of that build.xml specifically looks for 0.21.0. You are welcome to try to figure it out, but if you just want it to work:

+ Go to your hadoop-mapreduce directory and edit build.xml.
+ Find the "init" target (search for: target name="init").
+ In this target, find the tag:

<unzip src="${common.ivy.lib.dir}/hadoop-hdfs-${hadoop-hdfs.version}.jar.jar"
       dest="${build.dir}">

+ Finally, change ${hadoop-hdfs.version} to ${version}.

12) Execute the build-hadoop.sh script. The result is a tar file containing the built Hadoop release, located in the same directory as the build-hadoop.sh script, as well as a directory named hadoop-<version> containing the same release.
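The step-11 edit can also be applied with a one-line sed substitution instead of hand-editing. This is a sketch shown against a sample line rather than the real file; to apply it for real, run the commented sed -i form from your hadoop-mapreduce directory:

```shell
# sed script for the step-11 fix: replace ${hadoop-hdfs.version} with
# ${version} inside the unzip src attribute.
fix='s/hadoop-hdfs-${hadoop-hdfs\.version}/hadoop-hdfs-${version}/'

# Demonstrate against a sample of the offending line (single quotes keep
# the ${...} tokens literal for the shell):
echo '<unzip src="${common.ivy.lib.dir}/hadoop-hdfs-${hadoop-hdfs.version}.jar.jar"' \
  | sed "$fix"

# To apply in place (keeps a backup as build.xml.bak):
# sed -i.bak "$fix" build.xml
```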

Good Luck!