Building Hadoop

Latest revision as of 11:36, 12 November 2010

The following are instructions to build Hadoop 0.21.x from source. They were developed on Ubuntu 10.04, but the basic flow should work on any *nix operating system (although the "how to get X" instructions may not be correct in such a case).


First of all, you will need the build scripts located in this tar file:

https://www.cs.huji.ac.il/project/lawa/downloadables/hadoop-buildscript.tar

These are heavily modified versions of the "official" 0.21.x build scripts I took from one of the Hadoop forums (the ones which they apparently used to release 0.21.0, and which do not quite work in their plain form).


Follow these instructions:


1) First of all, you need the Hadoop sources in three sub-directories of the same directory. This directory also needs to contain the build scripts from the build script tar file you downloaded. The names matter, so please name the Hadoop directories as follows:

hadoop-common

hadoop-hdfs

hadoop-mapreduce

These can be found at the following SVN repository locations (links are to the 0.21 release tag):

Common: http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.21.0/

HDFS: http://svn.apache.org/repos/asf/hadoop/hdfs/tags/release-0.21.0/

MapReduce: http://svn.apache.org/repos/asf/hadoop/mapreduce/tags/release-0.21.0/


For example, I have:

~/hadoop-workspace/hadoop-common

~/hadoop-workspace/hadoop-hdfs

~/hadoop-workspace/hadoop-mapreduce

With the build scripts located in ~/hadoop-workspace
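
As a rough sketch of the layout above, the following prints one svn checkout command per project, using the release tag URLs listed earlier and the directory names the build scripts expect. The WORKSPACE location and the checkout_command helper are assumptions made for illustration; they are not part of the build scripts.

```shell
#!/bin/sh
# Assumed workspace location; change it to wherever the build scripts live.
WORKSPACE="${WORKSPACE:-$HOME/hadoop-workspace}"
BASE_URL="http://svn.apache.org/repos/asf/hadoop"

# Hypothetical helper: print the checkout command for one project
# (common, hdfs, or mapreduce) at the 0.21.0 release tag.
checkout_command() {
    echo "svn checkout $BASE_URL/$1/tags/release-0.21.0/ $WORKSPACE/hadoop-$1"
}

for project in common hdfs mapreduce; do
    checkout_command "$project"
done
```

Review the printed commands and, if they look right, run them by hand or pipe the output to sh.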


2) zlib must be installed. You can get it (in Ubuntu 10.04) by running:

sudo apt-get install zlib1g zlib1g-dev


3) You need the command-line svn client. You can get it (in Ubuntu 10.04) by running:

sudo apt-get install subversion


4) You need g++. You can get it (in Ubuntu 10.04) by running:

sudo apt-get install g++


5) You need Java 6 to build Hadoop. In Ubuntu 10.04 you can get it by running:

sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

sudo apt-get update

sudo apt-get install sun-java6-jdk sun-java6-jre
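
Before going further, it may help to confirm that the command-line tools from steps 3 to 5 are actually on the PATH. The sketch below is one way to do that; the missing_tools helper is hypothetical, and it only covers executables (zlib is a library, so it is not checked here).

```shell
#!/bin/sh
# Hypothetical helper: print the name of every tool that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools svn g++ javac)
if [ -n "$missing" ]; then
    echo "Still missing:" $missing
else
    echo "svn, g++ and javac were all found."
fi
```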


6) You need Forrest. Download a release from here:

http://forrest.apache.org/mirrors.cgi#how

The location where it is extracted will be referenced from the script as FORREST_HOME.


7) You need the Java 5 JDK for Forrest. You can get it by downloading it from the web in .bin form for Ubuntu. chmod the .bin file so that you can execute it, and run it. It will create a folder that you can then move anywhere you like. You will reference it from the build script as JAVA5_HOME.


8) You need Xerces-C 2.8.x (3.1.x WILL NOT WORK! Save yourself the time I already wasted!). Download it from here:

http://xerces.apache.org/xerces-c/download.cgi

Build/Install it according to the instructions here:

http://xerces.apache.org/xerces-c/build-2.html

You can also download a binary distribution of Xerces-C. The root directory of the distribution will be referenced from the script as XERCES_C_HOME.


9) Open the build-local-defs.sh script file. Set the variables within as appropriate for your machine. Please use full, rather than relative, paths. 64-bit build mode is supported, but since my Ubuntu system is a 32-bit machine, I cannot test it.
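
For illustration only, the three locations mentioned in steps 6 to 8 might look like the following. The variable names come from the steps above, but every path here is a made-up example, and the exact layout of build-local-defs.sh may differ. The is_absolute helper is also hypothetical; it is just a quick way to catch a relative path.

```shell
#!/bin/sh
# Example values only: replace them with the real locations on your machine.
JAVA5_HOME="/opt/java5-jdk"       # hypothetical path
FORREST_HOME="/opt/forrest"       # hypothetical path
XERCES_C_HOME="/opt/xerces-c"     # hypothetical path

# Hypothetical helper: succeed only if the argument starts with "/".
is_absolute() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

for dir in "$JAVA5_HOME" "$FORREST_HOME" "$XERCES_C_HOME"; do
    is_absolute "$dir" || echo "not a full path: $dir"
done
```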


10) The Eclipse-Plugin in 0.21 Map Reduce fails to compile, but it is not needed to set up a cluster (it makes running programs on it easier, though). The current work-around is to delete the folder from the Hadoop Mapreduce project prior to executing the build script. It is located in:

${MAPREDUCE_DIR}/src/contrib/eclipse-plugin

Simply delete this directory (or move it out of the project).
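
A minimal sketch of the "move it out of the project" variant, which keeps the plugin sources around in case you want them back. The disable_eclipse_plugin helper, the default MAPREDUCE_DIR value, and the backup location are all assumptions for this example.

```shell
#!/bin/sh
# Assumed locations; adjust them to your own workspace.
MAPREDUCE_DIR="${MAPREDUCE_DIR:-$HOME/hadoop-workspace/hadoop-mapreduce}"
BACKUP_DIR="${BACKUP_DIR:-$HOME/hadoop-workspace/disabled-contrib}"

# Hypothetical helper: move <mapreduce dir>/src/contrib/eclipse-plugin
# into <backup dir>, or report that there is nothing to move.
disable_eclipse_plugin() {
    plugin_dir="$1/src/contrib/eclipse-plugin"
    if [ -d "$plugin_dir" ]; then
        mkdir -p "$2" && mv "$plugin_dir" "$2/"
    else
        echo "nothing to do: $plugin_dir not found"
    fi
}

disable_eclipse_plugin "$MAPREDUCE_DIR" "$BACKUP_DIR"
```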


11) There is a problem with the build.xml file for hadoop-mapreduce 0.21. If you specify a release version other than 0.21.0 in build-local-defs.sh, the build will fail, because the init target of that build.xml file specifically looks for 0.21.0 due to an internal problem. You are welcome to try to figure it out, but if you just want it to work:

+ Go to your hadoop-mapreduce directory and edit build.xml.

+ Find the "init" target (search for: target name="init").

+ In this target, find the tag:

<unzip src="${common.ivy.lib.dir}/hadoop-hdfs-${hadoop-hdfs.version}.jar.jar" dest="${build.dir}">

+ Finally, change ${hadoop-hdfs.version} to ${version}.
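
If you would rather script the edit, the sketch below makes the same substitution with sed, touching only lines that contain an unzip tag and keeping the original file as build.xml.bak. The patch_hdfs_version helper and the default path are assumptions for this example, and it presumes the tag in your copy of build.xml matches the one quoted above; check the result by hand.

```shell
#!/bin/sh
# Assumed location of the file to patch; adjust it to your workspace.
BUILD_XML="${BUILD_XML:-$HOME/hadoop-workspace/hadoop-mapreduce/build.xml}"

# Hypothetical helper: on lines containing "<unzip ", replace
# hadoop-hdfs-${hadoop-hdfs.version} with hadoop-hdfs-${version}.
# A backup copy is written next to the file with a .bak suffix.
patch_hdfs_version() {
    sed -i.bak '/<unzip /s/hadoop-hdfs-\${hadoop-hdfs\.version}/hadoop-hdfs-${version}/' "$1"
}

if [ -f "$BUILD_XML" ]; then
    patch_hdfs_version "$BUILD_XML"
else
    echo "build.xml not found at $BUILD_XML"
fi
```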


12) Execute the build-hadoop.sh script. The result will be a tar file containing the built Hadoop release, located in the same directory as the build-hadoop.sh script, as well as a directory named hadoop-<version> containing the same release.


Good Luck, and May the source be with you.....Always!