Building Hadoop
Revision as of 14:35, 26 October 2010
The following are instructions to build Hadoop 0.21.x from source. These were developed on Ubuntu 10.04, but the basic flow should work on any *nix operating system (though the "how to get X" commands may differ).
First of all, you will need the handy build scripts located in this tar file:
https://www.cs.huji.ac.il/project/lawa/downloadables/hadoop-buildscript.tar
These are heavily modified versions of the "official" 0.21.x build scripts (the ones apparently used to release 0.21.0, which do not work in their plain form).
Follow the instructions:
1) First of all, you need the Hadoop sources in three sub-directories of a single directory. This directory must also contain the build scripts from the tar file you downloaded. The names matter, so name the Hadoop projects as follows:
hadoop-common
hadoop-hdfs
hadoop-mapreduce
For example, I have:
~/hadoop-workspace/hadoop-common
~/hadoop-workspace/hadoop-hdfs
~/hadoop-workspace/hadoop-mapreduce
With the build scripts located in ~/hadoop-workspace
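The layout above can be sketched as a short shell snippet (the workspace path is an example; adjust it to your machine):

```shell
# Create the expected workspace layout; WORKSPACE is an example path.
WORKSPACE="${WORKSPACE:-$HOME/hadoop-workspace}"
mkdir -p "$WORKSPACE/hadoop-common" "$WORKSPACE/hadoop-hdfs" "$WORKSPACE/hadoop-mapreduce"

# Sanity check: the build scripts expect exactly these three directory names.
for d in hadoop-common hadoop-hdfs hadoop-mapreduce; do
    [ -d "$WORKSPACE/$d" ] || { echo "missing $d" >&2; exit 1; }
done
echo "layout ok"
```

After unpacking, remember that the downloaded build scripts go in $WORKSPACE itself, next to the three project directories.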
2) zlib must be installed. You can get it (in Ubuntu 10.04) by running:
sudo apt-get install zlib1g zlib1g-dev
3) You need svn installed. You can get it (in Ubuntu 10.04) by running:
sudo apt-get install subversion
4) You need g++. You can get it (in Ubuntu 10.04) by running:
sudo apt-get install g++
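A quick way to confirm steps 2 through 4 took effect is a hedged check like the following (zlib is a library rather than a command, so it is checked via its header, assuming the dev package installs /usr/include/zlib.h):

```shell
# Report whether the build tools from steps 3-4 are visible on the PATH.
for tool in svn g++; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: MISSING"
    fi
done

# zlib is a library, not a command; its dev headers usually land here on Ubuntu.
[ -f /usr/include/zlib.h ] && echo "zlib headers: found" || echo "zlib headers: MISSING"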
5) You need Java 6 to build Hadoop. In Ubuntu 10.04 you can get it by running:
sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

sudo apt-get update

sudo apt-get install sun-java6-jdk sun-java6-jre
6) You need Apache Forrest; download a release from here:
http://forrest.apache.org/mirrors.cgi#how
The location where it is extracted will be referenced from the script as FORREST_HOME.
7) You need a Java 5 JDK for Forrest. You can get it by downloading it from the web in .bin form for Linux. chmod the .bin file so you can execute it, then run it. It will create a folder you can move anywhere you like. You will reference it from the build script as JAVA5_HOME.
8) You need Xerces-C 2.8.x (3.1.x WILL NOT WORK! Save yourself the time I already wasted!). Download it from here:
http://xerces.apache.org/xerces-c/download.cgi
Build/Install it according to the instructions here:
http://xerces.apache.org/xerces-c/build-2.html
You can also download a binary distribution of Xerces-C. The root directory of the distribution will be referenced from the script as XERCES_C_HOME.
9) Open the build-local-defs.sh script file. Set the variables within as appropriate for your machine. Please use full, rather than relative, paths.
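For illustration, the variables might end up looking something like this. This is a hypothetical sketch: only FORREST_HOME, JAVA5_HOME and XERCES_C_HOME are named in this guide, and the paths shown are made-up examples, not the script's actual contents.

```shell
# Hypothetical excerpt of build-local-defs.sh.
# Use full paths, not relative ones; these values are examples only.
export FORREST_HOME=/opt/apache-forrest-0.9
export JAVA5_HOME=/opt/jdk1.5.0_22
export XERCES_C_HOME=/opt/xerces-c_2_8_0
```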
10) The Eclipse plugin in the 0.21 MapReduce project fails to compile, but it is not needed to set up a cluster (it does make running programs on the cluster easier, though). The current work-around is to delete the plugin directory from the hadoop-mapreduce project before executing the build script. It is located at:
${MAPREDUCE_DIR}/src/contrib/eclipse-plugin. Simply delete this directory (or move it out of the project).
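As a sketch, step 10 looks like this (demonstrated on a throwaway directory so it is safe to try; point MAPREDUCE_DIR at your real hadoop-mapreduce checkout instead):

```shell
# Stand-in for the real project tree, so the commands are safe to run as-is.
MAPREDUCE_DIR="$(mktemp -d)/hadoop-mapreduce"
mkdir -p "$MAPREDUCE_DIR/src/contrib/eclipse-plugin"

# Move the plugin out of the project rather than deleting it outright.
mv "$MAPREDUCE_DIR/src/contrib/eclipse-plugin" "$MAPREDUCE_DIR/../eclipse-plugin.disabled"
[ ! -d "$MAPREDUCE_DIR/src/contrib/eclipse-plugin" ] && echo "plugin removed"
```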
11) There is a problem with the build.xml file for hadoop-mapreduce 0.21. If you specify a release version other than 0.21.0 in build-local-defs.sh, the build will fail because the init target of build.xml looks specifically for 0.21.0. You are welcome to try to figure it out, but if you just want it to work:
+ Go to your hadoop-mapreduce directory and edit build.xml.
+ Find the "init" target (search for: target name="init").
+ In this target, find the tag:
<unzip src="${common.ivy.lib.dir}/hadoop-hdfs-${hadoop-hdfs.version}.jar.jar"
       dest="${build.dir}">

Finally, change ${hadoop-hdfs.version} to ${version}.
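If you prefer not to edit build.xml by hand, the same substitution can be made with sed. The snippet below demonstrates it on a stand-in file; run the sed line against your real hadoop-mapreduce/build.xml (it keeps a .bak backup of the original):

```shell
# Create a stand-in file containing the problematic line from build.xml.
cat > /tmp/build-snippet.xml <<'EOF'
<unzip src="${common.ivy.lib.dir}/hadoop-hdfs-${hadoop-hdfs.version}.jar.jar"
       dest="${build.dir}">
EOF

# Replace ${hadoop-hdfs.version} with ${version}, keeping a .bak backup.
sed -i.bak 's/\${hadoop-hdfs\.version}/${version}/' /tmp/build-snippet.xml
```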
12) Execute the build-hadoop.sh script. The result is a tar file containing the built Hadoop release, located in the same directory as the build-hadoop.sh script, as well as a directory named hadoop-<version> containing the same release.
Good Luck!