SUBMARINE-64. Improve TonY runtime's document. Contributed by Keqiu Hu.

Zhankun Tang 6 years ago
Parent
Commit
24f218aef8
1 changed file with 102 additions and 3 deletions
+ 102 - 3
hadoop-submarine/hadoop-submarine-tony-runtime/src/site/markdown/QuickStart.md

@@ -19,6 +19,8 @@
 Must:
 
 - Apache Hadoop 2.7 or above.
+- TonY library 0.3.2 or above. You can download the latest TonY jar from
+https://github.com/linkedin/TonY/releases.
 
 Optional:
 
@@ -149,9 +151,106 @@ java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
  --worker_resources memory=3G,vcores=2 \
  --num_ps 2 \
  --ps_resources memory=3G,vcores=2 \
- --worker_launch_cmd "venv.zip/venv/bin/python --steps 1000 --data_dir /tmp/data --working_dir /tmp/mode" \
- --ps_launch_cmd "venv.zip/venv/bin/python --steps 1000 --data_dir /tmp/data --working_dir /tmp/mode" \
- --container_resources /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+ --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py --steps 1000 --data_dir /tmp/data --working_dir /tmp/mode" \
+ --ps_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py --steps 1000 --data_dir /tmp/data --working_dir /tmp/mode" \
+ --insecure \
+ --conf tony.containers.resources=PATH_TO_VENV_YOU_CREATED/venv.zip#archive,PATH_TO_MNIST_EXAMPLE/mnist_distributed.py,\
+PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar
+
+```
+You should then see the log links and status of the job's tasks on the command line:
+
+```
+2019-04-22 20:30:42,611 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi status: RUNNING
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 1 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi status: RUNNING
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: ps index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi status: RUNNING
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for ps 0 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for worker 0 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for worker 1 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi
+2019-04-22 20:30:44,625 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: ps index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi status: FINISHED
+2019-04-22 20:30:44,625 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi status: FINISHED
+2019-04-22 20:30:44,626 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 1 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi status: FINISHED
+
+```
+
+### With Docker
+
+```
+export CLASSPATH="$(hadoop classpath --glob):\
+./hadoop-submarine-core/target/hadoop-submarine-core-0.2.0-SNAPSHOT.jar:\
+./hadoop-submarine-yarnservice-runtime/target/hadoop-submarine-yarnservice-runtime-0.2.0-SNAPSHOT.jar:\
+./hadoop-submarine-tony-runtime/target/hadoop-submarine-tony-runtime-0.2.0-SNAPSHOT.jar:\
+/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar"
+
+java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
+ --docker_image hadoopsubmarine/tf-1.8.0-cpu:0.0.3 \
+ --input_path hdfs://pi-aw:9000/dataset/cifar-10-data \
+ --worker_resources memory=3G,vcores=2 \
+ --worker_launch_cmd "export CLASSPATH=\$(/hadoop-3.1.0/bin/hadoop classpath --glob) && cd /test/models/tutorials/image/cifar10_estimator && python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --train-steps=10000 --eval-batch-size=16 --train-batch-size=16 --variable-strategy=CPU --num-gpus=0 --sync" \
+ --env JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \
+ --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \
+ --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \
+ --env HADOOP_HOME=/hadoop-3.1.0 \
+ --env HADOOP_YARN_HOME=/hadoop-3.1.0 \
+ --env HADOOP_COMMON_HOME=/hadoop-3.1.0 \
+ --env HADOOP_HDFS_HOME=/hadoop-3.1.0 \
+ --env HADOOP_CONF_DIR=/hadoop-3.1.0/etc/hadoop \
+ --conf tony.containers.resources=/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+```
+
+
+### Launch PyTorch Application:
+
+#### Commandline
+
+### Without Docker
+
+You need:
+* A Python virtual environment with PyTorch 0.4.* installed.
+* A cluster with Hadoop 2.7 or above.
+
+### Building a Python virtual environment with PyTorch
+
+TonY requires a Python virtual environment zip with PyTorch and any needed Python libraries already installed.
+
+```
+wget https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz
+tar xf virtualenv-16.0.0.tar.gz
+
+python virtualenv-16.0.0/virtualenv.py venv
+. venv/bin/activate
+pip install torch==0.4.0
+zip -r venv.zip venv
+```
+
+### PyTorch version
+
+ - Version 0.4.0+
+
+
+### Installing Hadoop
+
+TonY only requires YARN, not HDFS. Please see the [open-source documentation](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) on how to set YARN up.
+
+### Get the training examples
+
+Get mnist_distributed.py from https://github.com/linkedin/TonY/tree/master/tony-examples/mnist-pytorch
+
+
+```
+export CLASSPATH="$(hadoop classpath --glob):\
+./hadoop-submarine-core/target/hadoop-submarine-core-0.2.0-SNAPSHOT.jar:\
+./hadoop-submarine-yarnservice-runtime/target/hadoop-submarine-yarnservice-runtime-0.2.0-SNAPSHOT.jar:\
+./hadoop-submarine-tony-runtime/target/hadoop-submarine-tony-runtime-0.2.0-SNAPSHOT.jar:\
+/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar"
+
+java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
+ --num_workers 2 \
+ --worker_resources memory=3G,vcores=2 \
+ --num_ps 2 \
+ --ps_resources memory=3G,vcores=2 \
+ --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
+ --ps_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
  --insecure \
  --conf tony.containers.resources=PATH_TO_VENV_YOU_CREATED/venv.zip#archive,PATH_TO_MNIST_EXAMPLE/mnist_distributed.py,\
 PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar
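
The `tony.containers.resources` value in the commands above is a comma-separated list that the shell must pass as one argument, so any whitespace around a comma or a line continuation splits it. A minimal sketch of one way to assemble the value first, assuming bash; the variable names `VENV_ZIP`, `MNIST_SCRIPT`, `TONY_JAR`, and `RESOURCES` are hypothetical, and the paths are the same placeholders the document uses:

```shell
#!/usr/bin/env bash
# Hypothetical variable names; the values are the document's placeholder
# paths and should be replaced with the real locations on your machine.
VENV_ZIP="PATH_TO_VENV_YOU_CREATED/venv.zip"
MNIST_SCRIPT="PATH_TO_MNIST_EXAMPLE/mnist_distributed.py"
TONY_JAR="PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar"

# Join the entries with commas and no whitespace so the whole list
# reaches --conf as a single argument. "#archive" follows venv.zip,
# exactly as in the commands above.
RESOURCES="${VENV_ZIP}#archive,${MNIST_SCRIPT},${TONY_JAR}"

# Print the value so it can be inspected before the job is submitted.
echo "tony.containers.resources=${RESOURCES}"
```

The result can then be passed to the `job run` command as `--conf "tony.containers.resources=${RESOURCES}"`; quoting the argument keeps it a single word.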