INSTALL

To compile Hadoop MapReduce next (MRv2), do the following:

Step 1) Install dependencies for yarn

See http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce/hadoop-yarn/README
Make sure the protobuf library is in your library path, or set: export LD_LIBRARY_PATH=/usr/local/lib

Step 2) Checkout

svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk

Step 3) Build common

Go to the common directory and run your regular common build command.
Example: mvn clean install package -Pbintar -DskipTests

Step 4) Build HDFS

Go to the hdfs directory and run:
ant veryclean mvn-install -Dresolvers=internal

Step 5) Build yarn and mapreduce

Go to the mapreduce directory and run:
export MAVEN_OPTS=-Xmx512m
mvn clean install assembly:assembly -DskipTests
Copy in build.properties if appropriate; make sure eclipse.home is not set.
ant veryclean tar -Dresolvers=internal
You will see a tarball in:
ls target/hadoop-mapreduce-0.23.0-SNAPSHOT-all.tar.gz

Step 6) Untar the tarball into a clean, separate directory, say YARN_HOME.

Make sure you aren't picking up avro-1.3.2.jar; remove:
$HADOOP_COMMON_HOME/share/hadoop/common/lib/avro-1.3.2.jar
$YARN_HOME/lib/avro-1.3.2.jar
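The removal above can be scripted as a small sketch, assuming HADOOP_COMMON_HOME and YARN_HOME are already exported as described in Step 8:

```shell
# Remove the stale avro-1.3.2.jar copies; rm -f stays quiet if a jar
# is already absent.
rm -f "$HADOOP_COMMON_HOME/share/hadoop/common/lib/avro-1.3.2.jar"
rm -f "$YARN_HOME/lib/avro-1.3.2.jar"
```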

Step 7) Install hdfs/common and start hdfs

To run Hadoop MapReduce next applications:

Step 8) Export the following variables, pointing to where you have things installed (you probably also want to export them in hadoop-env.sh and yarn-env.sh):
export HADOOP_MAPRED_HOME=<mapred loc>
export HADOOP_COMMON_HOME=<common loc>
export HADOOP_HDFS_HOME=<hdfs loc>
export YARN_HOME=<directory where you untarred yarn>
export HADOOP_CONF_DIR=<conf loc>
export YARN_CONF_DIR=$HADOOP_CONF_DIR
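As a concrete sketch, with hypothetical install locations under /home/user/hadoop (substitute your own paths):

```shell
# Hypothetical layout; adjust every path to your actual checkout.
export HADOOP_MAPRED_HOME=/home/user/hadoop/mapreduce
export HADOOP_COMMON_HOME=/home/user/hadoop/common
export HADOOP_HDFS_HOME=/home/user/hadoop/hdfs
export YARN_HOME=/home/user/hadoop/yarn   # where the tarball was untarred
export HADOOP_CONF_DIR=/home/user/hadoop/conf
export YARN_CONF_DIR=$HADOOP_CONF_DIR
```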

Step 9) Set up the config. MapReduce applications now run in user land, so you need to configure the nodemanager with the following in your yarn-site.xml before you start it:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
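Put together, a minimal yarn-site.xml carrying just these two properties looks like this (the surrounding <configuration> element is the standard Hadoop config-file wrapper):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```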

Step 10) Modify mapred-site.xml to use the yarn framework:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
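Likewise, a minimal mapred-site.xml with just this setting would look like the sketch below. Note there must be no whitespace inside the <name> element, or the property will not match:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```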

Step 11) Create the following symlinks in $HADOOP_COMMON_HOME/share/hadoop/common/lib:
ln -s $YARN_HOME/modules/hadoop-mapreduce-client-app-0.23.0-SNAPSHOT.jar .
ln -s $YARN_HOME/modules/hadoop-yarn-api-0.23.0-SNAPSHOT.jar .
ln -s $YARN_HOME/modules/hadoop-mapreduce-client-common-0.23.0-SNAPSHOT.jar .
ln -s $YARN_HOME/modules/hadoop-yarn-common-0.23.0-SNAPSHOT.jar .
ln -s $YARN_HOME/modules/hadoop-mapreduce-client-core-0.23.0-SNAPSHOT.jar .
ln -s $YARN_HOME/modules/hadoop-yarn-server-common-0.23.0-SNAPSHOT.jar .
ln -s $YARN_HOME/modules/hadoop-mapreduce-client-jobclient-0.23.0-SNAPSHOT.jar .
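The seven links can also be created with a short loop; this sketch assumes you run it from inside $HADOOP_COMMON_HOME/share/hadoop/common/lib with YARN_HOME exported:

```shell
# Link each 0.23.0-SNAPSHOT module jar from $YARN_HOME/modules into the
# current directory (ln -s creates a dangling link if a jar is missing).
for m in hadoop-mapreduce-client-app hadoop-yarn-api \
         hadoop-mapreduce-client-common hadoop-yarn-common \
         hadoop-mapreduce-client-core hadoop-yarn-server-common \
         hadoop-mapreduce-client-jobclient; do
  ln -s "$YARN_HOME/modules/$m-0.23.0-SNAPSHOT.jar" .
done
```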

Step 12) cd $YARN_HOME

Step 13) bin/yarn-daemon.sh start resourcemanager

Step 14) bin/yarn-daemon.sh start nodemanager

Step 15) bin/yarn-daemon.sh start historyserver

Step 16) You are all set. An example of how to run a mapreduce job:
cd $HADOOP_MAPRED_HOME
ant examples -Dresolvers=internal
$HADOOP_COMMON_HOME/bin/hadoop jar \
  $HADOOP_MAPRED_HOME/build/hadoop-mapreduce-examples-0.23.0-SNAPSHOT.jar randomwriter \
  -Dmapreduce.job.user.name=$USER \
  -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory \
  -Dmapreduce.randomwriter.bytespermap=10000 \
  -Ddfs.blocksize=536870912 -Ddfs.block.size=536870912 \
  -libjars $YARN_HOME/modules/hadoop-mapreduce-client-jobclient-0.23.0-SNAPSHOT.jar output
The output on the command line should be similar to what you see in a JobTracker/TaskTracker setup (Hadoop 0.20/0.21).