
Merge -r 951613:951614 from trunk to branch-0.21. Fixes: HADOOP-6780

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21@951616 13f79535-47bb-0310-9956-ffa450edef68
Thomas White, 15 years ago
parent
commit 5382255687
33 changed files with 0 additions and 5061 deletions
  1. CHANGES.txt  +0 -19
  2. src/contrib/cloud/README.txt  +0 -497
  3. src/contrib/cloud/build.xml  +0 -45
  4. src/contrib/cloud/lib/pyAntTasks-1.3-LICENSE.txt  +0 -202
  5. src/contrib/cloud/lib/pyAntTasks-1.3.jar  (binary)
  6. src/contrib/cloud/src/integration-test/create-ebs-snapshot.sh  +0 -52
  7. src/contrib/cloud/src/integration-test/ebs-storage-spec.json  +0 -30
  8. src/contrib/cloud/src/integration-test/persistent-cluster.sh  +0 -122
  9. src/contrib/cloud/src/integration-test/transient-cluster.sh  +0 -112
  10. src/contrib/cloud/src/py/hadoop-cloud  +0 -21
  11. src/contrib/cloud/src/py/hadoop-ec2  +0 -21
  12. src/contrib/cloud/src/py/hadoop/__init__.py  +0 -14
  13. src/contrib/cloud/src/py/hadoop/cloud/__init__.py  +0 -15
  14. src/contrib/cloud/src/py/hadoop/cloud/cli.py  +0 -438
  15. src/contrib/cloud/src/py/hadoop/cloud/cluster.py  +0 -187
  16. src/contrib/cloud/src/py/hadoop/cloud/data/boot-rackspace.sh  +0 -459
  17. src/contrib/cloud/src/py/hadoop/cloud/data/hadoop-ec2-init-remote.sh  +0 -548
  18. src/contrib/cloud/src/py/hadoop/cloud/data/hadoop-rackspace-init-remote.sh  +0 -22
  19. src/contrib/cloud/src/py/hadoop/cloud/data/zookeeper-ec2-init-remote.sh  +0 -112
  20. src/contrib/cloud/src/py/hadoop/cloud/providers/__init__.py  +0 -14
  21. src/contrib/cloud/src/py/hadoop/cloud/providers/dummy.py  +0 -61
  22. src/contrib/cloud/src/py/hadoop/cloud/providers/ec2.py  +0 -479
  23. src/contrib/cloud/src/py/hadoop/cloud/providers/rackspace.py  +0 -239
  24. src/contrib/cloud/src/py/hadoop/cloud/service.py  +0 -640
  25. src/contrib/cloud/src/py/hadoop/cloud/storage.py  +0 -173
  26. src/contrib/cloud/src/py/hadoop/cloud/util.py  +0 -84
  27. src/contrib/cloud/src/py/setup.py  +0 -30
  28. src/contrib/cloud/src/test/py/testcluster.py  +0 -37
  29. src/contrib/cloud/src/test/py/testrackspace.py  +0 -74
  30. src/contrib/cloud/src/test/py/teststorage.py  +0 -143
  31. src/contrib/cloud/src/test/py/testuserdata.py  +0 -44
  32. src/contrib/cloud/src/test/py/testutil.py  +0 -81
  33. src/contrib/cloud/tools/rackspace/remote-setup.sh  +0 -46

+ 0 - 19
CHANGES.txt

@@ -236,15 +236,10 @@ Release 0.21.0 - Unreleased
     and the init of the class is made to take a Configuration argument.
     (Jakob Homan via ddas)
 
-    HADOOP-6108. Add support for EBS storage on EC2. (tomwhite)
-
     Hadoop-6223. Add new file system interface AbstractFileSystem with
     implementation of some file systems that delegate to old FileSystem.
     (Sanjay Radia via suresh)
 
-    HADOOP-6392. Run namenode and jobtracker on separate EC2 instances.
-    (tomwhite)
-
     HADOOP-6433. Introduce asychronous deletion of files via a pool of
     threads. This can be used to delete files in the Distributed
     Cache. (Zheng Shao via dhruba)
@@ -252,13 +247,9 @@ Release 0.21.0 - Unreleased
     HADOOP-6415. Adds a common token interface for both job token and
     delegation token. (Kan Zhang via ddas)
 
-    HADOOP-6466. Add a ZooKeeper service to the cloud scripts. (tomwhite)
-
     HADOOP-6408. Add a /conf servlet to dump running configuration.
     (Todd Lipcon via tomwhite)
 
-    HADOOP-6464. Write a Rackspace cloud provider. (tomwhite)
-
     HADOOP-6520. Adds APIs to read/write Token and secret keys. Also
     adds the automatic loading of tokens into UserGroupInformation
     upon login. The tokens are read from a file specified in the
@@ -737,15 +728,8 @@ Release 0.21.0 - Unreleased
     HADOOP-6394. Add a helper class to simplify FileContext related tests and
     improve code reusability. (Jitendra Nath Pandey via suresh)
 
-    HADOOP-6426. Create ant build for running EC2 unit tests. (tomwhite)
-
     HADOOP-4656. Add a user to groups mapping service. (boryas, acmurthy)
 
-    HADOOP-6444. Support additional security group option in hadoop-ec2 script.
-    (Paul Egan via tomwhite)
-
-    HADOOP-6454. Create setup.py for EC2 cloud scripts. (tomwhite)
-
     HADOOP-6435. Make RPC.waitForProxy with timeout public. (Steve Loughran
     via tomwhite)
 
@@ -1500,9 +1484,6 @@ Release 0.21.0 - Unreleased
     HADOOP-6640. FileSystem.get() does RPC retries within a static
     synchronized block. (hairong)
 
-    HADOOP-6680. hadoop-cloud push command invokes proxy creation.
-    (Andrew Klochkov via tomwhite)
-
     HADOOP-6691. TestFileSystemCaching sometimes hangs. (hairong)
 
     HADOOP-6507. Hadoop Common Docs - delete 3 doc files that do not belong

+ 0 - 497
src/contrib/cloud/README.txt

@@ -1,497 +0,0 @@
-Hadoop Cloud Scripts
-====================
-
-These scripts allow you to run Hadoop on cloud providers. These instructions
-assume you are running on Amazon EC2, the differences for other providers are
-noted at the end of this document.
-
-Getting Started
-===============
-
-First, unpack the scripts on your system. For convenience, you may like to put
-the top-level directory on your path.
-
-You'll also need python (version 2.5 or newer) and the boto and simplejson
-libraries. After you download boto and simplejson, you can install each in turn
-by running the following in the directory where you unpacked the distribution:
-
-% sudo python setup.py install
-
-Alternatively, you might like to use the python-boto and python-simplejson RPM
-and Debian packages.
-
-You need to tell the scripts your AWS credentials. The simplest way to do this
-is to set the environment variables (but see
-http://code.google.com/p/boto/wiki/BotoConfig for other options):
-
-    * AWS_ACCESS_KEY_ID - Your AWS Access Key ID
-    * AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key
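
If you want to check that boto can actually see these credentials before
involving the scripts, a quick test from a Python prompt looks roughly like
this (a sketch, not part of the original scripts; it assumes boto is already
installed and that you are targeting EC2):

import boto

# connect_ec2() picks the credentials up from AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY (or from a boto config file, per the link above).
conn = boto.connect_ec2()
print conn.get_all_zones()   # should print the EC2 availability zones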
-
-To configure the scripts, create a directory called .hadoop-cloud (note the
-leading ".") in your home directory. In it, create a file called
-clusters.cfg with a section for each cluster you want to control. e.g.:
-
-[my-hadoop-cluster]
-image_id=ami-6159bf08
-instance_type=c1.medium
-key_name=tom
-availability_zone=us-east-1c
-private_key=PATH_TO_PRIVATE_KEY
-ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
-
-The image chosen here is one with a i386 Fedora OS. For a list of suitable AMIs
-see http://wiki.apache.org/hadoop/AmazonEC2.
-
-The architecture must be compatible with the instance type. For m1.small and
-c1.medium instances use the i386 AMIs, while for m1.large, m1.xlarge, and
-c1.xlarge instances use the x86_64 AMIs. One of the high CPU instances
-(c1.medium or c1.xlarge) is recommended.
-
-Then you can run the hadoop-ec2 script. It will display usage instructions when
-invoked without arguments.
-
-You can test that it can connect to AWS by typing:
-
-% hadoop-ec2 list
-
-LAUNCHING A CLUSTER
-===================
-
-To launch a cluster called "my-hadoop-cluster" with 10 worker (slave) nodes
-type:
-
-% hadoop-ec2 launch-cluster my-hadoop-cluster 10
-
-This will boot the master node and 10 worker nodes. The master node runs the
-namenode, secondary namenode, and jobtracker, and each worker node runs a
-datanode and a tasktracker. Equivalently the cluster could be launched as:
-
-% hadoop-ec2 launch-cluster my-hadoop-cluster 1 nn,snn,jt 10 dn,tt
-
-Note that using this notation you can launch a split namenode/jobtracker cluster:
-
-% hadoop-ec2 launch-cluster my-hadoop-cluster 1 nn,snn 1 jt 10 dn,tt
-
-When the nodes have started and the Hadoop cluster has come up, the console will
-display a message like
-
-  Browse the cluster at http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com/
-
-You can access Hadoop's web UI by visiting this URL. By default, port 80 is
-opened for access from your client machine. You may change the firewall settings
-(to allow access from a network, rather than just a single machine, for example)
-by using the Amazon EC2 command line tools, or by using a tool like ElasticFox.
-There is a security group for each node's role. The one for the namenode
-is <cluster-name>-nn, for example.
-
-For security reasons, traffic from the network your client is running on is
-proxied through the master node of the cluster using an SSH tunnel (a SOCKS
-proxy on port 6666). To set up the proxy run the following command:
-
-% hadoop-ec2 proxy my-hadoop-cluster
-
-Web browsers need to be configured to use this proxy too, so you can view pages
-served by worker nodes in the cluster. The most convenient way to do this is to
-use a proxy auto-config (PAC) file, such as this one:
-
-  http://apache-hadoop-ec2.s3.amazonaws.com/proxy.pac
-
-If you are using Firefox, then you may find
-FoxyProxy useful for managing PAC files. (If you use FoxyProxy, then you need to
-get it to use the proxy for DNS lookups. To do this, go to Tools -> FoxyProxy ->
-Options, and then under "Miscellaneous" in the bottom left, choose "Use SOCKS
-proxy for DNS lookups".)
-
-PERSISTENT CLUSTERS
-===================
-
-Hadoop clusters running on EC2 that use local EC2 storage (the default) will not
-retain data once the cluster has been terminated. It is possible to use EBS for
-persistent data, which allows a cluster to be shut down while it is not being
-used.
-
-Note: EBS support is a Beta feature.
-
-First create a new section called "my-ebs-cluster" in the
-.hadoop-cloud/clusters.cfg file.
-
-Now we need to create storage for the new cluster. Create a temporary EBS volume
-of size 100GiB, format it, and save it as a snapshot in S3. This way, we only
-have to do the formatting once.
-
-% hadoop-ec2 create-formatted-snapshot my-ebs-cluster 100
-
-We create storage for a single namenode and for two datanodes. The volumes to
-create are described in a JSON spec file, which references the snapshot we just
-created. Here are the contents of a JSON file called
-my-ebs-cluster-storage-spec.json:
-
-{
-  "nn": [
-    {
-      "device": "/dev/sdj",
-      "mount_point": "/ebs1",
-      "size_gb": "100",
-      "snapshot_id": "snap-268e704f"
-    },
-    {
-      "device": "/dev/sdk",
-      "mount_point": "/ebs2",
-      "size_gb": "100",
-      "snapshot_id": "snap-268e704f"
-    }
-  ],
-  "dn": [
-    {
-      "device": "/dev/sdj",
-      "mount_point": "/ebs1",
-      "size_gb": "100",
-      "snapshot_id": "snap-268e704f"
-    },
-    {
-      "device": "/dev/sdk",
-      "mount_point": "/ebs2",
-      "size_gb": "100",
-      "snapshot_id": "snap-268e704f"
-    }
-  ]
-}
-
-
-Each role (here "nn" and "dn") is the key to an array of volume
-specifications. In this example, each role has two devices ("/dev/sdj" and
-"/dev/sdk") with different mount points, each generated from an EBS snapshot.
-The snapshot is the formatted snapshot created earlier, so that the volumes we
-create are pre-formatted. The size of the volumes must match the size of the
-snapshot created earlier.
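
Since the spec file is plain JSON, you can sanity-check it with the simplejson
library the scripts already depend on. A rough sketch (not part of the original
scripts) that prints each volume to be created:

from __future__ import with_statement  # the scripts target Python 2.5
import simplejson

with open("my-ebs-cluster-storage-spec.json") as f:
    spec = simplejson.load(f)

for role, volumes in spec.items():
    for vol in volumes:
        print "%s: %s mounted at %s, %s GiB from %s" % (
            role, vol["device"], vol["mount_point"],
            vol["size_gb"], vol["snapshot_id"])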
-
-Let's create actual volumes using this file.
-
-% hadoop-ec2 create-storage my-ebs-cluster nn 1 \
-    my-ebs-cluster-storage-spec.json
-% hadoop-ec2 create-storage my-ebs-cluster dn 2 \
-    my-ebs-cluster-storage-spec.json
-
-Now let's start the cluster with 2 slave nodes:
-
-% hadoop-ec2 launch-cluster my-ebs-cluster 2
-
-Log in and run a job which creates some output.
-
-% hadoop-ec2 login my-ebs-cluster
-
-# hadoop fs -mkdir input
-# hadoop fs -put /etc/hadoop/conf/*.xml input
-# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output \
-    'dfs[a-z.]+'
-
-Look at the output:
-
-# hadoop fs -cat output/part-00000 | head
-
-Now let's shut down the cluster.
-
-% hadoop-ec2 terminate-cluster my-ebs-cluster
-
-A little while later we restart the cluster and log in.
-
-% hadoop-ec2 launch-cluster my-ebs-cluster 2
-% hadoop-ec2 login my-ebs-cluster
-
-The output from the job we ran before should still be there:
-
-# hadoop fs -cat output/part-00000 | head
-
-RUNNING JOBS
-============
-
-When you launched the cluster, a hadoop-site.xml file was created in the
-directory ~/.hadoop-cloud/<cluster-name>. You can use this to connect to the
-cluster by setting the HADOOP_CONF_DIR environment variable (it is also possible
-to set the configuration file to use by passing it as a -conf option to Hadoop
-Tools):
-
-% export HADOOP_CONF_DIR=~/.hadoop-cloud/my-hadoop-cluster
-
-Let's try browsing HDFS:
-
-% hadoop fs -ls /
-
-Running a job is straightforward:
-
-% hadoop fs -mkdir input # create an input directory
-% hadoop fs -put $HADOOP_HOME/LICENSE.txt input # copy a file there
-% hadoop jar $HADOOP_HOME/hadoop-*-examples.jar wordcount input output
-% hadoop fs -cat output/part-00000 | head
-
-Of course, these examples assume that you have installed Hadoop on your local
-machine. It is also possible to launch jobs from within the cluster. First log
-into the namenode:
-
-% hadoop-ec2 login my-hadoop-cluster
-
-Then run a job as before:
-
-# hadoop fs -mkdir input
-# hadoop fs -put /etc/hadoop/conf/*.xml input
-# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
-# hadoop fs -cat output/part-00000 | head
-
-TERMINATING A CLUSTER
-=====================
-
-When you've finished with your cluster you can stop it with the following
-command.
-
-NOTE: ALL DATA WILL BE LOST UNLESS YOU ARE USING EBS!
-
-% hadoop-ec2 terminate-cluster my-hadoop-cluster
-
-You can then delete the EC2 security groups with:
-
-% hadoop-ec2 delete-cluster my-hadoop-cluster
-
-AUTOMATIC CLUSTER SHUTDOWN
-==========================
-
-You may use the --auto-shutdown option to automatically terminate a cluster a
-given number of minutes after launch. This is useful for short-lived clusters
-where the jobs complete in a known amount of time.
-
-If you want to cancel the automatic shutdown, then run the following (the first
-command cancels the pending shutdown on the master, and the last runs the same
-cancellation on every slave):
-
-% hadoop-ec2 exec my-hadoop-cluster shutdown -c
-% hadoop-ec2 update-slaves-file my-hadoop-cluster
-% hadoop-ec2 exec my-hadoop-cluster /usr/lib/hadoop/bin/slaves.sh shutdown -c
-
-CONFIGURATION NOTES
-===================
-
-It is possible to specify options on the command line: these take precedence
-over any specified in the configuration file. For example:
-
-% hadoop-ec2 launch-cluster --image-id ami-2359bf4a --instance-type c1.xlarge \
-  my-hadoop-cluster 10
-
-This command launches a 10-node cluster using the specified image and instance
-type, overriding the equivalent settings (if any) that are in the
-"my-hadoop-cluster" section of the configuration file. Note that words in
-options are separated by hyphens (--instance-type) while words in the
-corresponding configuration parameter are separated by underscores
-(instance_type).
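
The underscore form is simply the raw option name from clusters.cfg, which the
scripts read with Python's standard ConfigParser module. A minimal sketch of
inspecting a section (illustrative only; the real merging of options and
configuration happens inside the scripts):

import os
import ConfigParser  # Python 2 name of the module

config = ConfigParser.ConfigParser()
config.read(os.path.join(os.environ["HOME"], ".hadoop-cloud", "clusters.cfg"))
# Option names come back with underscores, e.g. instance_type, image_id.
print dict(config.items("my-hadoop-cluster"))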
-
-The scripts install Hadoop RPMs or Debian packages (depending on the OS) at
-instance boot time.
-
-By default, Apache Hadoop 0.20.1 is installed. You can also run other versions
-of Apache Hadoop. For example the following uses version 0.18.3:
-
-% hadoop-ec2 launch-cluster --env HADOOP_VERSION=0.18.3 \
-  my-hadoop-cluster 10
-
-CUSTOMIZATION
-=============
-
-You can specify a list of packages to install on every instance at boot time
-using the --user-packages command-line option (or the user_packages
-configuration parameter). Packages should be space-separated. Note that package
-names should reflect the package manager being used to install them (yum or
-apt-get depending on the OS).
-
-Here's an example that installs RPMs for R and git:
-
-% hadoop-ec2 launch-cluster --user-packages 'R git-core' my-hadoop-cluster 10
-
-You have full control over the script that is run when each instance boots. The
-default script, hadoop-ec2-init-remote.sh, may be used as a starting point to
-add extra configuration or customization of the instance. Make a copy of the
-script in your home directory, or somewhere similar, and set the
---user-data-file command-line option (or the user_data_file configuration
-parameter) to point to the (modified) copy. hadoop-ec2 will replace "%ENV%" in
-your user data script with USER_PACKAGES, AUTO_SHUTDOWN, and EBS_MAPPINGS, as
-well as extra parameters supplied using the --env command-line flag.
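
The effect of the %ENV% substitution is roughly the following (an illustrative
sketch with made-up values and file names, not the scripts' actual code):

# Values taken from the command line and configuration file (examples only).
env = {"USER_PACKAGES": "R git-core", "AUTO_SHUTDOWN": "60", "EBS_MAPPINGS": ""}
env_string = " ".join("%s=%s" % (k, v) for k, v in env.items())

# my-init-script.sh stands in for your copy of hadoop-ec2-init-remote.sh.
template = open("my-init-script.sh").read()
user_data = template.replace("%ENV%", env_string)
# user_data is then gzip compressed and passed to the instance at launch time.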
-
-Another way of customizing the instance, which may be more appropriate for
-larger changes, is to create your own image.
-
-It's possible to use any image, as long as it i) runs (gzip compressed) user
-data on boot, and ii) has Java installed.
-
-OTHER SERVICES
-==============
-
-ZooKeeper
-=========
-
-You can run ZooKeeper by setting the "service" parameter to "zookeeper". For
-example:
-
-[my-zookeeper-cluster]
-service=zookeeper
-ami=ami-ed59bf84
-instance_type=m1.small
-key_name=tom
-availability_zone=us-east-1c
-public_key=PATH_TO_PUBLIC_KEY
-private_key=PATH_TO_PRIVATE_KEY
-
-Then to launch a three-node ZooKeeper ensemble, run:
-
-% ./hadoop-ec2 launch-cluster my-zookeeper-cluster 3 zk
-
-PROVIDER-SPECIFIC DETAILS
-=========================
-
-Rackspace
-=========
-
-Running on Rackspace is very similar to running on EC2, with a few minor
-differences noted here.
-
-Security Warning
-================
-
-Currently, Hadoop clusters on Rackspace are insecure since they don't run behind
-a firewall.
-
-Creating an image
-=================
-
-Rackspace doesn't support shared images, so you will need to build your own base
-image to get started. See "Instructions for creating an image" at the end of
-this document for details.
-
-Installation
-============
-
-To run on Rackspace you need to install libcloud by checking out the latest
-source from Apache:
-
-git clone git://git.apache.org/libcloud.git
-cd libcloud; python setup.py install
-
-Set up your Rackspace credentials by exporting the following environment
-variables:
-
-    * RACKSPACE_KEY - Your Rackspace user name
-    * RACKSPACE_SECRET - Your Rackspace API key
-    
-Configuration
-=============
-
-The cloud_provider parameter must be set to specify Rackspace as the provider.
-Here is a typical configuration:
-
-[my-rackspace-cluster]
-cloud_provider=rackspace
-image_id=200152
-instance_type=4
-public_key=/path/to/public/key/file
-private_key=/path/to/private/key/file
-ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
-
-It's a good idea to create a dedicated key using a command similar to:
-
-ssh-keygen -f id_rsa_rackspace -P ''
-
-Launching a cluster
-===================
-
-Use the "hadoop-cloud" command instead of "hadoop-ec2".
-
-After launching a cluster you need to manually add a hostname mapping for the
-master node to your client's /etc/hosts to get it to work. This is because DNS
-isn't set up for the cluster nodes so your client won't resolve their addresses.
-You can do this with
-
-hadoop-cloud list my-rackspace-cluster | grep 'nn,snn,jt' \
- | awk '{print $4 " " $3 }'  | sudo tee -a /etc/hosts
-
-Instructions for creating an image
-==================================
-
-First set your Rackspace credentials:
-
-export RACKSPACE_KEY=<Your Rackspace user name>
-export RACKSPACE_SECRET=<Your Rackspace API key>
-
-Now create an authentication token for the session, and retrieve the server
-management URL to perform operations against.
-
-# Final SED is to remove trailing ^M
-AUTH_TOKEN=`curl -D - -H X-Auth-User:$RACKSPACE_KEY \
-  -H X-Auth-Key:$RACKSPACE_SECRET https://auth.api.rackspacecloud.com/v1.0 \
-  | grep 'X-Auth-Token:' | awk '{print $2}' | sed 's/.$//'`
-SERVER_MANAGEMENT_URL=`curl -D - -H X-Auth-User:$RACKSPACE_KEY \
-  -H X-Auth-Key:$RACKSPACE_SECRET https://auth.api.rackspacecloud.com/v1.0 \
-  | grep 'X-Server-Management-Url:' | awk '{print $2}' | sed 's/.$//'`
-
-echo $AUTH_TOKEN
-echo $SERVER_MANAGEMENT_URL
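
If you prefer to do this from Python rather than with curl and sed, an
equivalent sketch (untested against the live API) is:

import os
import urllib2

request = urllib2.Request("https://auth.api.rackspacecloud.com/v1.0", headers={
    "X-Auth-User": os.environ["RACKSPACE_KEY"],
    "X-Auth-Key": os.environ["RACKSPACE_SECRET"],
})
response = urllib2.urlopen(request)
headers = response.info()
# The two headers correspond to AUTH_TOKEN and SERVER_MANAGEMENT_URL above.
print headers["X-Auth-Token"]
print headers["X-Server-Management-Url"]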
-
-You can get a list of images with the following
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/images
-
-Here's the same query, but with pretty-printed XML output:
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/images.xml | xmllint --format -
-
-There are similar queries for flavors and running instances:
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/flavors.xml | xmllint --format -
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/servers.xml | xmllint --format -
-
-The following command will create a new server. In this case it will create a
-2GB Ubuntu 8.10 instance, as determined by the imageId and flavorId attributes.
-The name of the instance is set to something meaningful too.
-
-curl -v -X POST -H X-Auth-Token:$AUTH_TOKEN -H 'Content-type: text/xml' -d @- $SERVER_MANAGEMENT_URL/servers << EOF
-<server xmlns="http://docs.rackspacecloud.com/servers/api/v1.0" name="apache-hadoop-ubuntu-8.10-base" imageId="11" flavorId="4">
-  <metadata/>
-</server>
-EOF
-
-Make a note of the new server's ID, public IP address and admin password as you
-will need these later.
-
-You can check the status of the server with
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/servers/$SERVER_ID.xml | xmllint --format -
-
-When it has started (status "ACTIVE"), copy the setup script over:
-
-scp tools/rackspace/remote-setup.sh root@$SERVER:remote-setup.sh
-
-Log in to the server and run the setup script (you will need to manually
-accept the Sun Java license):
-
-sh remote-setup.sh
-
-Once the script has completed, log out and create an image of the running
-instance (giving it a memorable name):
-
-curl -v -X POST -H X-Auth-Token:$AUTH_TOKEN -H 'Content-type: text/xml' -d @- $SERVER_MANAGEMENT_URL/images << EOF
-<image xmlns="http://docs.rackspacecloud.com/servers/api/v1.0" name="Apache Hadoop Ubuntu 8.10" serverId="$SERVER_ID" />
-EOF
-
-Keep a note of the image ID as this is what you will use to launch fresh
-instances from.
-
-You can check the status of the image with
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/images/$IMAGE_ID.xml | xmllint --format -
-
-When it's "ACTIVE" it is ready for use. It's important to realize that you have
-to keep the server from which you generated the image running for as long as the
-image is in use.
-
-However, if you want to clean up an old instance run:
-
-curl -X DELETE -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/servers/$SERVER_ID
-
-Similarly, you can delete old images:
-
-curl -X DELETE -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/images/$IMAGE_ID
-
-

+ 0 - 45
src/contrib/cloud/build.xml

@@ -1,45 +0,0 @@
-<?xml version="1.0"?>
-
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--->
-
-<project name="hadoop-cloud" default="test-py">
-  <property name="lib.dir" value="${basedir}/lib"/>
-  <path id="java.classpath">
-    <fileset dir="${lib.dir}">
-      <include name="**/*.jar" />
-    </fileset>
-  </path>
-  <path id="test.py.path">
-    <pathelement location="${basedir}/src/py"/>
-    <pathelement location="${basedir}/src/test/py"/>
-  </path>
-  <target name="test-py" description="Run python unit tests">
-    <taskdef name="py-test" classname="org.pyant.tasks.PythonTestTask">
-      <classpath refid="java.classpath" />
-    </taskdef>
-    <py-test python="python" pythonpathref="test.py.path" >
-      <fileset dir="${basedir}/src/test/py">
-        <include name="*.py"/>
-      </fileset>
-    </py-test>
-  </target>
-  <target name="compile"/>
-  <target name="package"/>
-  <target name="test" depends="test-py"/>
-  <target name="clean"/>
-</project>

+ 0 - 202
src/contrib/cloud/lib/pyAntTasks-1.3-LICENSE.txt

@@ -1,202 +0,0 @@
-
-                                 Apache License
-                           Version 2.0, January 2004
-                        http://www.apache.org/licenses/
-
-   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
-   1. Definitions.
-
-      "License" shall mean the terms and conditions for use, reproduction,
-      and distribution as defined by Sections 1 through 9 of this document.
-
-      "Licensor" shall mean the copyright owner or entity authorized by
-      the copyright owner that is granting the License.
-
-      "Legal Entity" shall mean the union of the acting entity and all
-      other entities that control, are controlled by, or are under common
-      control with that entity. For the purposes of this definition,
-      "control" means (i) the power, direct or indirect, to cause the
-      direction or management of such entity, whether by contract or
-      otherwise, or (ii) ownership of fifty percent (50%) or more of the
-      outstanding shares, or (iii) beneficial ownership of such entity.
-
-      "You" (or "Your") shall mean an individual or Legal Entity
-      exercising permissions granted by this License.
-
-      "Source" form shall mean the preferred form for making modifications,
-      including but not limited to software source code, documentation
-      source, and configuration files.
-
-      "Object" form shall mean any form resulting from mechanical
-      transformation or translation of a Source form, including but
-      not limited to compiled object code, generated documentation,
-      and conversions to other media types.
-
-      "Work" shall mean the work of authorship, whether in Source or
-      Object form, made available under the License, as indicated by a
-      copyright notice that is included in or attached to the work
-      (an example is provided in the Appendix below).
-
-      "Derivative Works" shall mean any work, whether in Source or Object
-      form, that is based on (or derived from) the Work and for which the
-      editorial revisions, annotations, elaborations, or other modifications
-      represent, as a whole, an original work of authorship. For the purposes
-      of this License, Derivative Works shall not include works that remain
-      separable from, or merely link (or bind by name) to the interfaces of,
-      the Work and Derivative Works thereof.
-
-      "Contribution" shall mean any work of authorship, including
-      the original version of the Work and any modifications or additions
-      to that Work or Derivative Works thereof, that is intentionally
-      submitted to Licensor for inclusion in the Work by the copyright owner
-      or by an individual or Legal Entity authorized to submit on behalf of
-      the copyright owner. For the purposes of this definition, "submitted"
-      means any form of electronic, verbal, or written communication sent
-      to the Licensor or its representatives, including but not limited to
-      communication on electronic mailing lists, source code control systems,
-      and issue tracking systems that are managed by, or on behalf of, the
-      Licensor for the purpose of discussing and improving the Work, but
-      excluding communication that is conspicuously marked or otherwise
-      designated in writing by the copyright owner as "Not a Contribution."
-
-      "Contributor" shall mean Licensor and any individual or Legal Entity
-      on behalf of whom a Contribution has been received by Licensor and
-      subsequently incorporated within the Work.
-
-   2. Grant of Copyright License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      copyright license to reproduce, prepare Derivative Works of,
-      publicly display, publicly perform, sublicense, and distribute the
-      Work and such Derivative Works in Source or Object form.
-
-   3. Grant of Patent License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      (except as stated in this section) patent license to make, have made,
-      use, offer to sell, sell, import, and otherwise transfer the Work,
-      where such license applies only to those patent claims licensable
-      by such Contributor that are necessarily infringed by their
-      Contribution(s) alone or by combination of their Contribution(s)
-      with the Work to which such Contribution(s) was submitted. If You
-      institute patent litigation against any entity (including a
-      cross-claim or counterclaim in a lawsuit) alleging that the Work
-      or a Contribution incorporated within the Work constitutes direct
-      or contributory patent infringement, then any patent licenses
-      granted to You under this License for that Work shall terminate
-      as of the date such litigation is filed.
-
-   4. Redistribution. You may reproduce and distribute copies of the
-      Work or Derivative Works thereof in any medium, with or without
-      modifications, and in Source or Object form, provided that You
-      meet the following conditions:
-
-      (a) You must give any other recipients of the Work or
-          Derivative Works a copy of this License; and
-
-      (b) You must cause any modified files to carry prominent notices
-          stating that You changed the files; and
-
-      (c) You must retain, in the Source form of any Derivative Works
-          that You distribute, all copyright, patent, trademark, and
-          attribution notices from the Source form of the Work,
-          excluding those notices that do not pertain to any part of
-          the Derivative Works; and
-
-      (d) If the Work includes a "NOTICE" text file as part of its
-          distribution, then any Derivative Works that You distribute must
-          include a readable copy of the attribution notices contained
-          within such NOTICE file, excluding those notices that do not
-          pertain to any part of the Derivative Works, in at least one
-          of the following places: within a NOTICE text file distributed
-          as part of the Derivative Works; within the Source form or
-          documentation, if provided along with the Derivative Works; or,
-          within a display generated by the Derivative Works, if and
-          wherever such third-party notices normally appear. The contents
-          of the NOTICE file are for informational purposes only and
-          do not modify the License. You may add Your own attribution
-          notices within Derivative Works that You distribute, alongside
-          or as an addendum to the NOTICE text from the Work, provided
-          that such additional attribution notices cannot be construed
-          as modifying the License.
-
-      You may add Your own copyright statement to Your modifications and
-      may provide additional or different license terms and conditions
-      for use, reproduction, or distribution of Your modifications, or
-      for any such Derivative Works as a whole, provided Your use,
-      reproduction, and distribution of the Work otherwise complies with
-      the conditions stated in this License.
-
-   5. Submission of Contributions. Unless You explicitly state otherwise,
-      any Contribution intentionally submitted for inclusion in the Work
-      by You to the Licensor shall be under the terms and conditions of
-      this License, without any additional terms or conditions.
-      Notwithstanding the above, nothing herein shall supersede or modify
-      the terms of any separate license agreement you may have executed
-      with Licensor regarding such Contributions.
-
-   6. Trademarks. This License does not grant permission to use the trade
-      names, trademarks, service marks, or product names of the Licensor,
-      except as required for reasonable and customary use in describing the
-      origin of the Work and reproducing the content of the NOTICE file.
-
-   7. Disclaimer of Warranty. Unless required by applicable law or
-      agreed to in writing, Licensor provides the Work (and each
-      Contributor provides its Contributions) on an "AS IS" BASIS,
-      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-      implied, including, without limitation, any warranties or conditions
-      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-      PARTICULAR PURPOSE. You are solely responsible for determining the
-      appropriateness of using or redistributing the Work and assume any
-      risks associated with Your exercise of permissions under this License.
-
-   8. Limitation of Liability. In no event and under no legal theory,
-      whether in tort (including negligence), contract, or otherwise,
-      unless required by applicable law (such as deliberate and grossly
-      negligent acts) or agreed to in writing, shall any Contributor be
-      liable to You for damages, including any direct, indirect, special,
-      incidental, or consequential damages of any character arising as a
-      result of this License or out of the use or inability to use the
-      Work (including but not limited to damages for loss of goodwill,
-      work stoppage, computer failure or malfunction, or any and all
-      other commercial damages or losses), even if such Contributor
-      has been advised of the possibility of such damages.
-
-   9. Accepting Warranty or Additional Liability. While redistributing
-      the Work or Derivative Works thereof, You may choose to offer,
-      and charge a fee for, acceptance of support, warranty, indemnity,
-      or other liability obligations and/or rights consistent with this
-      License. However, in accepting such obligations, You may act only
-      on Your own behalf and on Your sole responsibility, not on behalf
-      of any other Contributor, and only if You agree to indemnify,
-      defend, and hold each Contributor harmless for any liability
-      incurred by, or claims asserted against, such Contributor by reason
-      of your accepting any such warranty or additional liability.
-
-   END OF TERMS AND CONDITIONS
-
-   APPENDIX: How to apply the Apache License to your work.
-
-      To apply the Apache License to your work, attach the following
-      boilerplate notice, with the fields enclosed by brackets "[]"
-      replaced with your own identifying information. (Don't include
-      the brackets!)  The text should be enclosed in the appropriate
-      comment syntax for the file format. We also recommend that a
-      file or class name and description of purpose be included on the
-      same "printed page" as the copyright notice for easier
-      identification within third-party archives.
-
-   Copyright [yyyy] [name of copyright owner]
-
-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.

Binary
src/contrib/cloud/lib/pyAntTasks-1.3.jar


+ 0 - 52
src/contrib/cloud/src/integration-test/create-ebs-snapshot.sh

@@ -1,52 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# This script tests the "hadoop-ec2 create-formatted-snapshot" command.
-# The snapshot is deleted immediately afterwards.
-#
-# Example usage:
-# ./create-ebs-snapshot.sh
-#
-
-set -e
-set -x
-
-bin=`dirname "$0"`
-bin=`cd "$bin"; pwd`
-
-WORKSPACE=${WORKSPACE:-`pwd`}
-CONFIG_DIR=${CONFIG_DIR:-$WORKSPACE/.hadoop-cloud}
-CLUSTER=${CLUSTER:-hadoop-cloud-$USER-test-cluster}
-AVAILABILITY_ZONE=${AVAILABILITY_ZONE:-us-east-1c}
-KEY_NAME=${KEY_NAME:-$USER}
-HADOOP_CLOUD_HOME=${HADOOP_CLOUD_HOME:-$bin/../py}
-HADOOP_CLOUD_PROVIDER=${HADOOP_CLOUD_PROVIDER:-ec2}
-SSH_OPTIONS=${SSH_OPTIONS:-"-i ~/.$HADOOP_CLOUD_PROVIDER/id_rsa-$KEY_NAME \
-  -o StrictHostKeyChecking=no"}
-
-HADOOP_CLOUD_SCRIPT=$HADOOP_CLOUD_HOME/hadoop-$HADOOP_CLOUD_PROVIDER
-
-$HADOOP_CLOUD_SCRIPT create-formatted-snapshot --config-dir=$CONFIG_DIR \
-  --key-name=$KEY_NAME --availability-zone=$AVAILABILITY_ZONE \
-  --ssh-options="$SSH_OPTIONS" \
-  $CLUSTER 1 > out.tmp
-
-snapshot_id=`grep 'Created snapshot' out.tmp | awk '{print $3}'`
-
-ec2-delete-snapshot $snapshot_id
-
-rm -f out.tmp

+ 0 - 30
src/contrib/cloud/src/integration-test/ebs-storage-spec.json

@@ -1,30 +0,0 @@
-{
-  "nn": [
-    {
-      "device": "/dev/sdj",
-      "mount_point": "/ebs1",
-      "size_gb": "7",
-      "snapshot_id": "snap-fe44bb97"
-    },
-    {
-      "device": "/dev/sdk",
-      "mount_point": "/ebs2",
-      "size_gb": "7",
-      "snapshot_id": "snap-fe44bb97"
-    }
-  ],
-  "dn": [
-    {
-      "device": "/dev/sdj",
-      "mount_point": "/ebs1",
-      "size_gb": "7",
-      "snapshot_id": "snap-fe44bb97"
-    },
-    {
-      "device": "/dev/sdk",
-      "mount_point": "/ebs2",
-      "size_gb": "7",
-      "snapshot_id": "snap-fe44bb97"
-    }
-  ]
-}

+ 0 - 122
src/contrib/cloud/src/integration-test/persistent-cluster.sh

@@ -1,122 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# This script tests the Hadoop cloud scripts by running through a minimal
-# sequence of steps to start a persistent (EBS) cluster, run a job, then
-# shutdown the cluster.
-#
-# Example usage:
-# HADOOP_HOME=~/dev/hadoop-0.20.1/ ./persistent-cluster.sh
-#
-
-function wait_for_volume_detachment() {
-  set +e
-  set +x
-  while true; do
-    attached=`$HADOOP_CLOUD_SCRIPT list-storage --config-dir=$CONFIG_DIR \
-      $CLUSTER | awk '{print $6}' | grep 'attached'`
-    sleep 5
-    if [ -z "$attached" ]; then
-      break
-    fi
-  done
-  set -e
-  set -x
-}
-
-set -e
-set -x
-
-bin=`dirname "$0"`
-bin=`cd "$bin"; pwd`
-
-WORKSPACE=${WORKSPACE:-`pwd`}
-CONFIG_DIR=${CONFIG_DIR:-$WORKSPACE/.hadoop-cloud}
-CLUSTER=${CLUSTER:-hadoop-cloud-ebs-$USER-test-cluster}
-IMAGE_ID=${IMAGE_ID:-ami-6159bf08} # default to Fedora 32-bit AMI
-AVAILABILITY_ZONE=${AVAILABILITY_ZONE:-us-east-1c}
-KEY_NAME=${KEY_NAME:-$USER}
-AUTO_SHUTDOWN=${AUTO_SHUTDOWN:-15}
-LOCAL_HADOOP_VERSION=${LOCAL_HADOOP_VERSION:-0.20.1}
-HADOOP_HOME=${HADOOP_HOME:-$WORKSPACE/hadoop-$LOCAL_HADOOP_VERSION}
-HADOOP_CLOUD_HOME=${HADOOP_CLOUD_HOME:-$bin/../py}
-HADOOP_CLOUD_PROVIDER=${HADOOP_CLOUD_PROVIDER:-ec2}
-SSH_OPTIONS=${SSH_OPTIONS:-"-i ~/.$HADOOP_CLOUD_PROVIDER/id_rsa-$KEY_NAME \
-  -o StrictHostKeyChecking=no"}
-
-HADOOP_CLOUD_SCRIPT=$HADOOP_CLOUD_HOME/hadoop-$HADOOP_CLOUD_PROVIDER
-export HADOOP_CONF_DIR=$CONFIG_DIR/$CLUSTER
-
-# Install Hadoop locally
-if [ ! -d $HADOOP_HOME ]; then
-  wget http://archive.apache.org/dist/hadoop/core/hadoop-\
-$LOCAL_HADOOP_VERSION/hadoop-$LOCAL_HADOOP_VERSION.tar.gz
-  tar zxf hadoop-$LOCAL_HADOOP_VERSION.tar.gz -C $WORKSPACE
-  rm hadoop-$LOCAL_HADOOP_VERSION.tar.gz
-fi
-
-# Create storage
-$HADOOP_CLOUD_SCRIPT create-storage --config-dir=$CONFIG_DIR \
-  --availability-zone=$AVAILABILITY_ZONE $CLUSTER nn 1 \
-  $bin/ebs-storage-spec.json
-$HADOOP_CLOUD_SCRIPT create-storage --config-dir=$CONFIG_DIR \
-  --availability-zone=$AVAILABILITY_ZONE $CLUSTER dn 1 \
-  $bin/ebs-storage-spec.json
-
-# Launch a cluster
-$HADOOP_CLOUD_SCRIPT launch-cluster --config-dir=$CONFIG_DIR \
-  --image-id=$IMAGE_ID --key-name=$KEY_NAME --auto-shutdown=$AUTO_SHUTDOWN \
-  --availability-zone=$AVAILABILITY_ZONE $CLIENT_CIDRS $ENVS $CLUSTER 1
-
-# Run a proxy and save its pid in HADOOP_CLOUD_PROXY_PID
-eval `$HADOOP_CLOUD_SCRIPT proxy --config-dir=$CONFIG_DIR \
-  --ssh-options="$SSH_OPTIONS" $CLUSTER`
-
-# Run a job and check it works
-$HADOOP_HOME/bin/hadoop fs -mkdir input
-$HADOOP_HOME/bin/hadoop fs -put $HADOOP_HOME/LICENSE.txt input
-$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep \
-  input output Apache
-# following returns a non-zero exit code if no match
-$HADOOP_HOME/bin/hadoop fs -cat 'output/part-00000' | grep Apache
-
-# Shutdown the cluster
-kill $HADOOP_CLOUD_PROXY_PID
-$HADOOP_CLOUD_SCRIPT terminate-cluster --config-dir=$CONFIG_DIR --force $CLUSTER
-sleep 5 # wait for termination to take effect
-
-# Relaunch the cluster
-$HADOOP_CLOUD_SCRIPT launch-cluster --config-dir=$CONFIG_DIR \
-  --image-id=$IMAGE_ID --key-name=$KEY_NAME --auto-shutdown=$AUTO_SHUTDOWN \
-  --availability-zone=$AVAILABILITY_ZONE $CLIENT_CIDRS $ENVS $CLUSTER 1
-
-# Run a proxy and save its pid in HADOOP_CLOUD_PROXY_PID
-eval `$HADOOP_CLOUD_SCRIPT proxy --config-dir=$CONFIG_DIR \
-  --ssh-options="$SSH_OPTIONS" $CLUSTER`
-
-# Check output is still there
-$HADOOP_HOME/bin/hadoop fs -cat 'output/part-00000' | grep Apache
-
-# Shutdown the cluster
-kill $HADOOP_CLOUD_PROXY_PID
-$HADOOP_CLOUD_SCRIPT terminate-cluster --config-dir=$CONFIG_DIR --force $CLUSTER
-sleep 5 # wait for termination to take effect
-
-# Cleanup
-$HADOOP_CLOUD_SCRIPT delete-cluster --config-dir=$CONFIG_DIR $CLUSTER
-wait_for_volume_detachment
-$HADOOP_CLOUD_SCRIPT delete-storage --config-dir=$CONFIG_DIR --force $CLUSTER

+ 0 - 112
src/contrib/cloud/src/integration-test/transient-cluster.sh

@@ -1,112 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# This script tests the Hadoop cloud scripts by running through a minimal
-# sequence of steps to start a cluster, run a job, then shutdown the cluster.
-#
-# Example usage:
-# HADOOP_HOME=~/dev/hadoop-0.20.1/ ./transient-cluster.sh
-#
-
-set -e
-set -x
-
-bin=`dirname "$0"`
-bin=`cd "$bin"; pwd`
-
-WORKSPACE=${WORKSPACE:-`pwd`}
-CONFIG_DIR=${CONFIG_DIR:-$WORKSPACE/.hadoop-cloud}
-CLUSTER=${CLUSTER:-hadoop-cloud-$USER-test-cluster}
-IMAGE_ID=${IMAGE_ID:-ami-6159bf08} # default to Fedora 32-bit AMI
-INSTANCE_TYPE=${INSTANCE_TYPE:-m1.small}
-AVAILABILITY_ZONE=${AVAILABILITY_ZONE:-us-east-1c}
-KEY_NAME=${KEY_NAME:-$USER}
-AUTO_SHUTDOWN=${AUTO_SHUTDOWN:-15}
-LOCAL_HADOOP_VERSION=${LOCAL_HADOOP_VERSION:-0.20.1}
-HADOOP_HOME=${HADOOP_HOME:-$WORKSPACE/hadoop-$LOCAL_HADOOP_VERSION}
-HADOOP_CLOUD_HOME=${HADOOP_CLOUD_HOME:-$bin/../py}
-HADOOP_CLOUD_PROVIDER=${HADOOP_CLOUD_PROVIDER:-ec2}
-PUBLIC_KEY=${PUBLIC_KEY:-~/.$HADOOP_CLOUD_PROVIDER/id_rsa-$KEY_NAME.pub}
-PRIVATE_KEY=${PRIVATE_KEY:-~/.$HADOOP_CLOUD_PROVIDER/id_rsa-$KEY_NAME}
-SSH_OPTIONS=${SSH_OPTIONS:-"-i $PRIVATE_KEY -o StrictHostKeyChecking=no"}
-LAUNCH_ARGS=${LAUNCH_ARGS:-"1 nn,snn,jt 1 dn,tt"}
-
-HADOOP_CLOUD_SCRIPT=$HADOOP_CLOUD_HOME/hadoop-cloud
-export HADOOP_CONF_DIR=$CONFIG_DIR/$CLUSTER
-
-# Install Hadoop locally
-if [ ! -d $HADOOP_HOME ]; then
-  wget http://archive.apache.org/dist/hadoop/core/hadoop-\
-$LOCAL_HADOOP_VERSION/hadoop-$LOCAL_HADOOP_VERSION.tar.gz
-  tar zxf hadoop-$LOCAL_HADOOP_VERSION.tar.gz -C $WORKSPACE
-  rm hadoop-$LOCAL_HADOOP_VERSION.tar.gz
-fi
-
-# Launch a cluster
-if [ $HADOOP_CLOUD_PROVIDER == 'ec2' ]; then
-  $HADOOP_CLOUD_SCRIPT launch-cluster \
-    --config-dir=$CONFIG_DIR \
-    --image-id=$IMAGE_ID \
-    --instance-type=$INSTANCE_TYPE \
-    --key-name=$KEY_NAME \
-    --auto-shutdown=$AUTO_SHUTDOWN \
-    --availability-zone=$AVAILABILITY_ZONE \
-    $CLIENT_CIDRS $ENVS $CLUSTER $LAUNCH_ARGS
-else
-  $HADOOP_CLOUD_SCRIPT launch-cluster --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-    --config-dir=$CONFIG_DIR \
-    --image-id=$IMAGE_ID \
-    --instance-type=$INSTANCE_TYPE \
-    --public-key=$PUBLIC_KEY \
-    --private-key=$PRIVATE_KEY \
-    --auto-shutdown=$AUTO_SHUTDOWN \
-    $CLIENT_CIDRS $ENVS $CLUSTER $LAUNCH_ARGS
-fi
-  
-# List clusters
-$HADOOP_CLOUD_SCRIPT list --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR
-$HADOOP_CLOUD_SCRIPT list --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR $CLUSTER
-
-# Run a proxy and save its pid in HADOOP_CLOUD_PROXY_PID
-eval `$HADOOP_CLOUD_SCRIPT proxy --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR \
-  --ssh-options="$SSH_OPTIONS" $CLUSTER`
-  
-if [ $HADOOP_CLOUD_PROVIDER == 'rackspace' ]; then
-  # Need to update /etc/hosts (interactively)
-  $HADOOP_CLOUD_SCRIPT list --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-    --config-dir=$CONFIG_DIR $CLUSTER | grep 'nn,snn,jt' \
-    | awk '{print $4 " " $3 }'  | sudo tee -a /etc/hosts
-fi
-
-# Run a job and check it works
-$HADOOP_HOME/bin/hadoop fs -mkdir input
-$HADOOP_HOME/bin/hadoop fs -put $HADOOP_HOME/LICENSE.txt input
-$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep \
-  input output Apache
-# following returns a non-zero exit code if no match
-$HADOOP_HOME/bin/hadoop fs -cat 'output/part-00000' | grep Apache
-
-# Shutdown the cluster
-kill $HADOOP_CLOUD_PROXY_PID
-$HADOOP_CLOUD_SCRIPT terminate-cluster --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR --force $CLUSTER
-sleep 5 # wait for termination to take effect
-$HADOOP_CLOUD_SCRIPT delete-cluster --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR $CLUSTER

+ 0 - 21
src/contrib/cloud/src/py/hadoop-cloud

@@ -1,21 +0,0 @@
-#!/usr/bin/env python2.5
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from hadoop.cloud.cli import main
-
-if __name__ == "__main__":
-  main()

+ 0 - 21
src/contrib/cloud/src/py/hadoop-ec2

@@ -1,21 +0,0 @@
-#!/usr/bin/env python2.5
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from hadoop.cloud.cli import main
-
-if __name__ == "__main__":
-  main()

+ 0 - 14
src/contrib/cloud/src/py/hadoop/__init__.py

@@ -1,14 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.

+ 0 - 15
src/contrib/cloud/src/py/hadoop/cloud/__init__.py

@@ -1,15 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-VERSION="0.22.0"

+ 0 - 438
src/contrib/cloud/src/py/hadoop/cloud/cli.py

@@ -1,438 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import with_statement
-
-import ConfigParser
-from hadoop.cloud import VERSION
-from hadoop.cloud.cluster import get_cluster
-from hadoop.cloud.service import get_service
-from hadoop.cloud.service import InstanceTemplate
-from hadoop.cloud.service import NAMENODE
-from hadoop.cloud.service import SECONDARY_NAMENODE
-from hadoop.cloud.service import JOBTRACKER
-from hadoop.cloud.service import DATANODE
-from hadoop.cloud.service import TASKTRACKER
-from hadoop.cloud.util import merge_config_with_options
-from hadoop.cloud.util import xstr
-import logging
-from optparse import OptionParser
-from optparse import make_option
-import os
-import sys
-
-DEFAULT_SERVICE_NAME = 'hadoop'
-DEFAULT_CLOUD_PROVIDER = 'ec2'
-
-DEFAULT_CONFIG_DIR_NAME = '.hadoop-cloud'
-DEFAULT_CONFIG_DIR = os.path.join(os.environ['HOME'], DEFAULT_CONFIG_DIR_NAME)
-CONFIG_FILENAME = 'clusters.cfg'
-
-CONFIG_DIR_OPTION = \
-  make_option("--config-dir", metavar="CONFIG-DIR",
-    help="The configuration directory.")
-
-PROVIDER_OPTION = \
-  make_option("--cloud-provider", metavar="PROVIDER",
-    help="The cloud provider, e.g. 'ec2' for Amazon EC2.")
-
-BASIC_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-]
-
-LAUNCH_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("-a", "--ami", metavar="AMI",
-    help="The AMI ID of the image to launch. (Amazon EC2 only. Deprecated, use \
---image-id.)"),
-  make_option("-e", "--env", metavar="ENV", action="append",
-    help="An environment variable to pass to instances. \
-(May be specified multiple times.)"),
-  make_option("-f", "--user-data-file", metavar="URL",
-    help="The URL of the file containing user data to be made available to \
-instances."),
-  make_option("--image-id", metavar="ID",
-    help="The ID of the image to launch."),
-  make_option("-k", "--key-name", metavar="KEY-PAIR",
-    help="The key pair to use when launching instances. (Amazon EC2 only.)"),
-  make_option("-p", "--user-packages", metavar="PACKAGES",
-    help="A space-separated list of packages to install on instances on start \
-up."),
-  make_option("-t", "--instance-type", metavar="TYPE",
-    help="The type of instance to be launched. One of m1.small, m1.large, \
-m1.xlarge, c1.medium, or c1.xlarge."),
-  make_option("-z", "--availability-zone", metavar="ZONE",
-    help="The availability zone to run the instances in."),
-  make_option("--auto-shutdown", metavar="TIMEOUT_MINUTES",
-    help="The time in minutes after launch when an instance will be \
-automatically shut down."),
-  make_option("--client-cidr", metavar="CIDR", action="append",
-    help="The CIDR of the client, which is used to allow access through the \
-firewall to the master node. (May be specified multiple times.)"),
-  make_option("--security-group", metavar="SECURITY_GROUP", action="append",
-    default=[], help="Additional security groups within which the instances \
-should be run. (Amazon EC2 only.) (May be specified multiple times.)"),
-  make_option("--public-key", metavar="FILE",
-    help="The public key to authorize on launching instances. (Non-EC2 \
-providers only.)"),
-  make_option("--private-key", metavar="FILE",
-    help="The private key to use when connecting to instances. (Non-EC2 \
-providers only.)"),
-]
-
-SNAPSHOT_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("-k", "--key-name", metavar="KEY-PAIR",
-    help="The key pair to use when launching instances."),
-  make_option("-z", "--availability-zone", metavar="ZONE",
-    help="The availability zone to run the instances in."),
-  make_option("--ssh-options", metavar="SSH-OPTIONS",
-    help="SSH options to use."),
-]
-
-PLACEMENT_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("-z", "--availability-zone", metavar="ZONE",
-    help="The availability zone to run the instances in."),
-]
-
-FORCE_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("--force", action="store_true", default=False,
-  help="Do not ask for confirmation."),
-]
-
-SSH_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("--ssh-options", metavar="SSH-OPTIONS",
-    help="SSH options to use."),
-]
-
-def print_usage(script):
-  print """Usage: %(script)s COMMAND [OPTIONS]
-where COMMAND and [OPTIONS] may be one of:
-  list [CLUSTER]                      list all running Hadoop clusters
-                                        or instances in CLUSTER
-  launch-master CLUSTER               launch or find a master in CLUSTER
-  launch-slaves CLUSTER NUM_SLAVES    launch NUM_SLAVES slaves in CLUSTER
-  launch-cluster CLUSTER (NUM_SLAVES| launch a master and NUM_SLAVES slaves or
-    N ROLE [N ROLE ...])                N instances in ROLE in CLUSTER
-  create-formatted-snapshot CLUSTER   create an empty, formatted snapshot of
-    SIZE                                size SIZE GiB
-  list-storage CLUSTER                list storage volumes for CLUSTER
-  create-storage CLUSTER ROLE         create volumes for NUM_INSTANCES instances
-    NUM_INSTANCES SPEC_FILE             in ROLE for CLUSTER, using SPEC_FILE
-  attach-storage ROLE                 attach storage volumes for ROLE to CLUSTER
-  login CLUSTER                       log in to the master in CLUSTER over SSH
-  proxy CLUSTER                       start a SOCKS proxy on localhost into the
-                                        CLUSTER
-  push CLUSTER FILE                   scp FILE to the master in CLUSTER
-  exec CLUSTER CMD                    execute CMD on the master in CLUSTER
-  terminate-cluster CLUSTER           terminate all instances in CLUSTER
-  delete-cluster CLUSTER              delete the group information for CLUSTER
-  delete-storage CLUSTER              delete all storage volumes for CLUSTER
-  update-slaves-file CLUSTER          update the slaves file on the CLUSTER
-                                        master
-
-Use %(script)s COMMAND --help to see additional options for specific commands.
-""" % locals()
-
-def print_deprecation(script, replacement):
-  print "Deprecated. Use '%(script)s %(replacement)s'." % locals()
-
-def parse_options_and_config(command, option_list=[], extra_arguments=(),
-                             unbounded_args=False):
-  """
-  Parse the arguments to command using the given option list, and combine with
-  any configuration parameters.
-
-  If unbounded_args is true then there must be at least as many extra arguments
-  as specified by extra_arguments (the first argument is always CLUSTER).
-  Otherwise there must be exactly the same number of arguments as
-  extra_arguments.
-  """
-  expected_arguments = ["CLUSTER",]
-  expected_arguments.extend(extra_arguments)
-  (options_dict, args) = parse_options(command, option_list, expected_arguments,
-                                       unbounded_args)
-
-  config_dir = get_config_dir(options_dict)
-  config_files = [os.path.join(config_dir, CONFIG_FILENAME)]
-  if 'config_dir' not in options_dict:
-    # if config_dir not set, then also search in current directory
-    config_files.insert(0, CONFIG_FILENAME)
-
-  config = ConfigParser.ConfigParser()
-  read_files = config.read(config_files)
-  logging.debug("Read %d configuration files: %s", len(read_files),
-                ", ".join(read_files))
-  cluster_name = args[0]
-  opt = merge_config_with_options(cluster_name, config, options_dict)
-  logging.debug("Options: %s", str(opt))
-  service_name = get_service_name(opt)
-  cloud_provider = get_cloud_provider(opt)
-  cluster = get_cluster(cloud_provider)(cluster_name, config_dir)
-  service = get_service(service_name, cloud_provider)(cluster)
-  return (opt, args, service)
-
-def parse_options(command, option_list=[], expected_arguments=(),
-                  unbounded_args=False):
-  """
-  Parse the arguments to command using the given option list.
-
-  If unbounded_args is true then there must be at least as many arguments
-  as specified by expected_arguments. Otherwise there must be exactly the
-  same number of arguments as expected_arguments.
-  """
-
-  config_file_name = "%s/%s" % (DEFAULT_CONFIG_DIR_NAME, CONFIG_FILENAME)
-  usage = """%%prog %s [options] %s
-
-Options may also be specified in a configuration file called
-%s located in the user's home directory.
-Options specified on the command line take precedence over any in the
-configuration file.""" % (command, " ".join(expected_arguments),
-                          config_file_name)
-  parser = OptionParser(usage=usage, version="%%prog %s" % VERSION,
-                        option_list=option_list)
-  parser.disable_interspersed_args()
-  (options, args) = parser.parse_args(sys.argv[2:])
-  if unbounded_args:
-    if len(args) < len(expected_arguments):
-      parser.error("incorrect number of arguments")
-  elif len(args) != len(expected_arguments):
-    parser.error("incorrect number of arguments")
-  return (vars(options), args)
-
-def get_config_dir(options_dict):
-  config_dir = options_dict.get('config_dir')
-  if not config_dir:
-    config_dir = DEFAULT_CONFIG_DIR
-  return config_dir
-
-def get_service_name(options_dict):
-  service_name = options_dict.get("service", None)
-  if service_name is None:
-    service_name = DEFAULT_SERVICE_NAME
-  return service_name
-
-def get_cloud_provider(options_dict):
-  provider = options_dict.get("cloud_provider", None)
-  if provider is None:
-    provider = DEFAULT_CLOUD_PROVIDER
-  return provider
-
-def check_options_set(options, option_names):
-  for option_name in option_names:
-    if options.get(option_name) is None:
-      print "Option '%s' is missing. Aborting." % option_name
-      sys.exit(1)
-
-def check_launch_options_set(cluster, options):
-  if cluster.get_provider_code() == 'ec2':
-    if options.get('ami') is None and options.get('image_id') is None:
-      print "One of ami or image_id must be specified. Aborting."
-      sys.exit(1)
-    check_options_set(options, ['key_name'])
-  else:
-    check_options_set(options, ['image_id', 'public_key'])
-
-def get_image_id(cluster, options):
-  if cluster.get_provider_code() == 'ec2':
-    return options.get('image_id', options.get('ami'))
-  else:
-    return options.get('image_id')
-
-def main():
-  # Use HADOOP_CLOUD_LOGGING_LEVEL=DEBUG to enable debugging output.
-  logging.basicConfig(level=getattr(logging,
-                                    os.getenv("HADOOP_CLOUD_LOGGING_LEVEL",
-                                              "INFO")))
-
-  if len(sys.argv) < 2:
-    print_usage(sys.argv[0])
-    sys.exit(1)
-
-  command = sys.argv[1]
-
-  if command == 'list':
-    (opt, args) = parse_options(command, BASIC_OPTIONS, unbounded_args=True)
-    if len(args) == 0:
-      service_name = get_service_name(opt)
-      cloud_provider = get_cloud_provider(opt)
-      service = get_service(service_name, cloud_provider)(None)
-      service.list_all(cloud_provider)
-    else:
-      (opt, args, service) = parse_options_and_config(command, BASIC_OPTIONS)
-      service.list()
-
-  elif command == 'launch-master':
-    (opt, args, service) = parse_options_and_config(command, LAUNCH_OPTIONS)
-    check_launch_options_set(service.cluster, opt)
-    config_dir = get_config_dir(opt)
-    template = InstanceTemplate((NAMENODE, SECONDARY_NAMENODE, JOBTRACKER), 1,
-                         get_image_id(service.cluster, opt),
-                         opt.get('instance_type'), opt.get('key_name'),
-                         opt.get('public_key'), opt.get('private_key'),
-                         opt.get('user_data_file'),
-                         opt.get('availability_zone'), opt.get('user_packages'),
-                         opt.get('auto_shutdown'), opt.get('env'),
-                         opt.get('security_group'))
-    service.launch_master(template, config_dir, opt.get('client_cidr'))
-
-  elif command == 'launch-slaves':
-    (opt, args, service) = parse_options_and_config(command, LAUNCH_OPTIONS,
-                                                    ("NUM_SLAVES",))
-    number_of_slaves = int(args[1])
-    check_launch_options_set(service.cluster, opt)
-    template = InstanceTemplate((DATANODE, TASKTRACKER), number_of_slaves,
-                         get_image_id(service.cluster, opt),
-                         opt.get('instance_type'), opt.get('key_name'),
-                         opt.get('public_key'), opt.get('private_key'),
-                         opt.get('user_data_file'),
-                         opt.get('availability_zone'), opt.get('user_packages'),
-                         opt.get('auto_shutdown'), opt.get('env'),
-                         opt.get('security_group'))
-    service.launch_slaves(template)
-
-  elif command == 'launch-cluster':
-    (opt, args, service) = parse_options_and_config(command, LAUNCH_OPTIONS,
-                                                    ("NUM_SLAVES",),
-                                                    unbounded_args=True)
-    check_launch_options_set(service.cluster, opt)
-    config_dir = get_config_dir(opt)
-    instance_templates = []
-    if len(args) == 2:
-      number_of_slaves = int(args[1])
-      print_deprecation(sys.argv[0], 'launch-cluster %s 1 nn,snn,jt %s dn,tt' %
-                        (service.cluster.name, number_of_slaves))
-      instance_templates = [
-        InstanceTemplate((NAMENODE, SECONDARY_NAMENODE, JOBTRACKER), 1,
-                         get_image_id(service.cluster, opt),
-                         opt.get('instance_type'), opt.get('key_name'),
-                         opt.get('public_key'), opt.get('private_key'),
-                         opt.get('user_data_file'),
-                         opt.get('availability_zone'), opt.get('user_packages'),
-                         opt.get('auto_shutdown'), opt.get('env'),
-                         opt.get('security_group')),
-        InstanceTemplate((DATANODE, TASKTRACKER), number_of_slaves,
-                         get_image_id(service.cluster, opt),
-                         opt.get('instance_type'), opt.get('key_name'),
-                         opt.get('public_key'), opt.get('private_key'),
-                         opt.get('user_data_file'),
-                         opt.get('availability_zone'), opt.get('user_packages'),
-                         opt.get('auto_shutdown'), opt.get('env'),
-                         opt.get('security_group')),
-                         ]
-    elif len(args) > 2 and len(args) % 2 == 0:
-      print_usage(sys.argv[0])
-      sys.exit(1)
-    else:
-      for i in range(len(args) / 2):
-        number = int(args[2 * i + 1])
-        roles = args[2 * i + 2].split(",")
-        instance_templates.append(
-          InstanceTemplate(roles, number, get_image_id(service.cluster, opt),
-                           opt.get('instance_type'), opt.get('key_name'),
-                           opt.get('public_key'), opt.get('private_key'),
-                           opt.get('user_data_file'),
-                           opt.get('availability_zone'),
-                           opt.get('user_packages'),
-                           opt.get('auto_shutdown'), opt.get('env'),
-                           opt.get('security_group')))
-
-    service.launch_cluster(instance_templates, config_dir,
-                           opt.get('client_cidr'))
-
-  elif command == 'login':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS)
-    service.login(opt.get('ssh_options'))
-
-  elif command == 'proxy':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS)
-    service.proxy(opt.get('ssh_options'))
-
-  elif command == 'push':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS,
-                                                    ("FILE",))
-    service.push(opt.get('ssh_options'), args[1])
-
-  elif command == 'exec':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS,
-                                                    ("CMD",), True)
-    service.execute(opt.get('ssh_options'), args[1:])
-
-  elif command == 'terminate-cluster':
-    (opt, args, service) = parse_options_and_config(command, FORCE_OPTIONS)
-    service.terminate_cluster(opt["force"])
-
-  elif command == 'delete-cluster':
-    (opt, args, service) = parse_options_and_config(command, BASIC_OPTIONS)
-    service.delete_cluster()
-
-  elif command == 'create-formatted-snapshot':
-    (opt, args, service) = parse_options_and_config(command, SNAPSHOT_OPTIONS,
-                                                    ("SIZE",))
-    size = int(args[1])
-    check_options_set(opt, ['availability_zone', 'key_name'])
-    ami_ubuntu_intrepid_x86 = 'ami-ec48af85' # use a general AMI
-    service.create_formatted_snapshot(size,
-                                         opt.get('availability_zone'),
-                                         ami_ubuntu_intrepid_x86,
-                                         opt.get('key_name'),
-                                         xstr(opt.get('ssh_options')))
-
-  elif command == 'list-storage':
-    (opt, args, service) = parse_options_and_config(command, BASIC_OPTIONS)
-    service.list_storage()
-
-  elif command == 'create-storage':
-    (opt, args, service) = parse_options_and_config(command, PLACEMENT_OPTIONS,
-                                                    ("ROLE", "NUM_INSTANCES",
-                                                     "SPEC_FILE"))
-    role = args[1]
-    number_of_instances = int(args[2])
-    spec_file = args[3]
-    check_options_set(opt, ['availability_zone'])
-    service.create_storage(role, number_of_instances,
-                           opt.get('availability_zone'), spec_file)
-
-  elif command == 'attach-storage':
-    (opt, args, service) = parse_options_and_config(command, BASIC_OPTIONS,
-                                                    ("ROLE",))
-    service.attach_storage(args[1])
-
-  elif command == 'delete-storage':
-    (opt, args, service) = parse_options_and_config(command, FORCE_OPTIONS)
-    service.delete_storage(opt["force"])
-
-  elif command == 'update-slaves-file':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS)
-    check_options_set(opt, ['private_key'])
-    ssh_options = xstr(opt.get('ssh_options'))
-    config_dir = get_config_dir(opt)
-    service.update_slaves_file(config_dir, ssh_options, opt.get('private_key'))
-
-  else:
-    print "Unrecognized command '%s'" % command
-    print_usage(sys.argv[0])
-    sys.exit(1)
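
Note: the command handling above resolves every option by layering command-line flags over the section named after the cluster in clusters.cfg (parse_options_and_config together with merge_config_with_options). Below is a minimal, standalone sketch of that precedence; the section name, keys and values are illustrative rather than copied from a shipped configuration file, and the helper is a simplified stand-in for hadoop.cloud.util.merge_config_with_options, not the real implementation.

import ConfigParser
import StringIO

SAMPLE_CFG = """\
[my-hadoop-cluster]
image_id=ami-12345678
instance_type=m1.large
availability_zone=us-east-1c
"""

def merge_section_with_options(cluster_name, config, options_dict):
  # Start from the cluster's config section, then let any option that was
  # actually given on the command line win.
  merged = dict(config.items(cluster_name))
  for key, value in options_dict.items():
    if value is not None:
      merged[key] = value
  return merged

config = ConfigParser.ConfigParser()
config.readfp(StringIO.StringIO(SAMPLE_CFG))

# e.g. "hadoop-cloud launch-cluster my-hadoop-cluster 1 nn,snn,jt 5 dn,tt -t m1.xlarge"
options_from_command_line = {'instance_type': 'm1.xlarge', 'key_name': None}
opt = merge_section_with_options('my-hadoop-cluster', config,
                                 options_from_command_line)
print opt['instance_type']      # m1.xlarge -- the command-line flag wins
print opt['availability_zone']  # us-east-1c -- taken from the file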

+ 0 - 187
src/contrib/cloud/src/py/hadoop/cloud/cluster.py

@@ -1,187 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Classes for controlling a cluster of cloud instances.
-"""
-
-from __future__ import with_statement
-
-import gzip
-import StringIO
-import urllib
-
-from hadoop.cloud.storage import Storage
-
-CLUSTER_PROVIDER_MAP = {
-  "dummy": ('hadoop.cloud.providers.dummy', 'DummyCluster'),
-  "ec2": ('hadoop.cloud.providers.ec2', 'Ec2Cluster'),
-  "rackspace": ('hadoop.cloud.providers.rackspace', 'RackspaceCluster'),
-}
-
-def get_cluster(provider):
-  """
-  Retrieve the Cluster class for a provider.
-  """
-  mod_name, driver_name = CLUSTER_PROVIDER_MAP[provider]
-  _mod = __import__(mod_name, globals(), locals(), [driver_name])
-  return getattr(_mod, driver_name)
-
-class Cluster(object):
-  """
-  A cluster of server instances. A cluster has a unique name.
-  Instances are launched into the cluster to run in particular roles.
-  """
-
-  def __init__(self, name, config_dir):
-    self.name = name
-    self.config_dir = config_dir
-
-  def get_provider_code(self):
-    """
-    The code that uniquely identifies the cloud provider.
-    """
-    raise Exception("Unimplemented")
-
-  def authorize_role(self, role, from_port, to_port, cidr_ip):
-    """
-    Authorize access to machines in a given role from a given network.
-    """
-    pass
-
-  def get_instances_in_role(self, role, state_filter=None):
-    """
-    Get all the instances in a role, filtered by state.
-
-    @param role: the name of the role
-    @param state_filter: the state that the instance should be in
-    (e.g. "running"), or None for all states
-    """
-    raise Exception("Unimplemented")
-
-  def print_status(self, roles=None, state_filter="running"):
-    """
-    Print the status of instances in the given roles, filtered by state.
-    """
-    pass
-
-  def check_running(self, role, number):
-    """
-    Check that a certain number of instances in a role are running.
-    """
-    instances = self.get_instances_in_role(role, "running")
-    if len(instances) != number:
-      print "Expected %s instances in role %s, but found %s: %s" % \
-        (number, role, len(instances), instances)
-      return False
-    else:
-      return instances
-
-  def launch_instances(self, roles, number, image_id, size_id,
-                       instance_user_data, **kwargs):
-    """
-    Launch instances (having the given roles) in the cluster.
-    Returns a list of IDs for the instances started.
-    """
-    pass
-
-  def wait_for_instances(self, instance_ids, timeout=600):
-    """
-    Wait for instances to start.
-    Raise TimeoutException if the timeout is exceeded.
-    """
-    pass
-
-  def terminate(self):
-    """
-    Terminate all instances in the cluster.
-    """
-    pass
-
-  def delete(self):
-    """
-    Delete the cluster permanently. This operation is only permitted if no
-    instances are running.
-    """
-    pass
-
-  def get_storage(self):
-    """
-    Return the external storage for the cluster.
-    """
-    return Storage(self)
-
-class InstanceUserData(object):
-  """
-  The data passed to an instance on start up.
-  """
-
-  def __init__(self, filename, replacements={}):
-    self.filename = filename
-    self.replacements = replacements
-
-  def _read_file(self, filename):
-    """
-    Read the user data.
-    """
-    return urllib.urlopen(filename).read()
-
-  def read(self):
-    """
-    Read the user data, making replacements.
-    """
-    contents = self._read_file(self.filename)
-    for (match, replacement) in self.replacements.iteritems():
-      if replacement is None:
-        replacement = ''
-      contents = contents.replace(match, replacement)
-    return contents
-
-  def read_as_gzip_stream(self):
-    """
-    Read and compress the data.
-    """
-    output = StringIO.StringIO()
-    compressed = gzip.GzipFile(mode='wb', fileobj=output)
-    compressed.write(self.read())
-    compressed.close()
-    return output.getvalue()
-
-class Instance(object):
-  """
-  A server instance.
-  """
-  def __init__(self, id, public_ip, private_ip):
-    self.id = id
-    self.public_ip = public_ip
-    self.private_ip = private_ip
-
-class RoleSyntaxException(Exception):
-  """
-  Raised when a role name is invalid. Role names may consist of a sequence
-  of alphanumeric characters and underscores. Dashes are not permitted in role
-  names.
-  """
-  def __init__(self, message):
-    super(RoleSyntaxException, self).__init__()
-    self.message = message
-  def __str__(self):
-    return repr(self.message)
-
-class TimeoutException(Exception):
-  """
-  Raised when a timeout is exceeded.
-  """
-  pass
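
Note: CLUSTER_PROVIDER_MAP above is the extension point for new back ends; get_cluster() turns a provider name into a Cluster subclass via a dynamic import. A rough sketch of what a new provider would look like, assuming the hadoop.cloud package in this tree is importable; the "mycloud" name and MyCloudCluster class are hypothetical and only one method is fleshed out.

import logging

from hadoop.cloud.cluster import Cluster
from hadoop.cloud.cluster import Instance

logger = logging.getLogger(__name__)

class MyCloudCluster(Cluster):
  """A skeletal provider used only to illustrate the plugin pattern."""

  def get_provider_code(self):
    return "mycloud"

  def get_instances_in_role(self, role, state_filter=None):
    # A real provider would query its API here; this returns a fixed instance.
    logger.info("get_instances_in_role(%s, %s)", role, state_filter)
    return [Instance("i-00000000", "203.0.113.10", "10.0.0.10")]

# Hooking it up would amount to one more CLUSTER_PROVIDER_MAP entry, e.g.
#   "mycloud": ('hadoop.cloud.providers.mycloud', 'MyCloudCluster'),
# after which get_cluster("mycloud")("my-cluster", config_dir) resolves and
# instantiates the class above.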

+ 0 - 459
src/contrib/cloud/src/py/hadoop/cloud/data/boot-rackspace.sh

@@ -1,459 +0,0 @@
-#!/bin/bash -x
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-################################################################################
-# Script that is run on each instance on boot.
-################################################################################
-
-################################################################################
-# Initialize variables
-################################################################################
-SELF_HOST=`/sbin/ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}'`
-HADOOP_VERSION=${HADOOP_VERSION:-0.20.1}
-HADOOP_HOME=/usr/local/hadoop-$HADOOP_VERSION
-HADOOP_CONF_DIR=$HADOOP_HOME/conf
-for role in $(echo "$ROLES" | tr "," "\n"); do
-  case $role in
-  nn)
-    NN_HOST=$SELF_HOST
-    ;;
-  jt)
-    JT_HOST=$SELF_HOST
-    ;;
-  esac
-done
-
-function register_auto_shutdown() {
-  if [ ! -z "$AUTO_SHUTDOWN" ]; then
-    shutdown -h +$AUTO_SHUTDOWN >/dev/null &
-  fi
-}
-
-function update_repo() {
-  if which dpkg &> /dev/null; then
-    sudo apt-get update
-  elif which rpm &> /dev/null; then
-    yum update -y yum
-  fi
-}
-
-# Install a list of packages on debian or redhat as appropriate
-function install_packages() {
-  if which dpkg &> /dev/null; then
-    apt-get update
-    apt-get -y install $@
-  elif which rpm &> /dev/null; then
-    yum install -y $@
-  else
-    echo "No package manager found."
-  fi
-}
-
-# Install any user packages specified in the USER_PACKAGES environment variable
-function install_user_packages() {
-  if [ ! -z "$USER_PACKAGES" ]; then
-    install_packages $USER_PACKAGES
-  fi
-}
-
-function install_hadoop() {
-  useradd hadoop
-
-  hadoop_tar_url=http://s3.amazonaws.com/hadoop-releases/core/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
-  hadoop_tar_file=`basename $hadoop_tar_url`
-  hadoop_tar_md5_file=`basename $hadoop_tar_url.md5`
-
-  curl="curl --retry 3 --silent --show-error --fail"
-  for i in `seq 1 3`;
-  do
-    $curl -O $hadoop_tar_url
-    $curl -O $hadoop_tar_url.md5
-    if md5sum -c $hadoop_tar_md5_file; then
-      break;
-    else
-      rm -f $hadoop_tar_file $hadoop_tar_md5_file
-    fi
-  done
-
-  if [ ! -e $hadoop_tar_file ]; then
-    echo "Failed to download $hadoop_tar_url. Aborting."
-    exit 1
-  fi
-
-  tar zxf $hadoop_tar_file -C /usr/local
-  rm -f $hadoop_tar_file $hadoop_tar_md5_file
-
-  echo "export HADOOP_HOME=$HADOOP_HOME" >> ~root/.bashrc
-  echo 'export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH' >> ~root/.bashrc
-}
-
-function prep_disk() {
-  mount=$1
-  device=$2
-  automount=${3:-false}
-
-  echo "warning: ERASING CONTENTS OF $device"
-  mkfs.xfs -f $device
-  if [ ! -e $mount ]; then
-    mkdir $mount
-  fi
-  mount -o defaults,noatime $device $mount
-  if $automount ; then
-    echo "$device $mount xfs defaults,noatime 0 0" >> /etc/fstab
-  fi
-}
-
-function wait_for_mount {
-  mount=$1
-  device=$2
-
-  mkdir $mount
-
-  i=1
-  echo "Attempting to mount $device"
-  while true ; do
-    sleep 10
-    echo -n "$i "
-    i=$[$i+1]
-    mount -o defaults,noatime $device $mount || continue
-    echo " Mounted."
-    break;
-  done
-}
-
-function make_hadoop_dirs {
-  for mount in "$@"; do
-    if [ ! -e $mount/hadoop ]; then
-      mkdir -p $mount/hadoop
-      chown hadoop:hadoop $mount/hadoop
-    fi
-  done
-}
-
-# Configure Hadoop by setting up disks and site file
-function configure_hadoop() {
-
-  MOUNT=/data
-  FIRST_MOUNT=$MOUNT
-  DFS_NAME_DIR=$MOUNT/hadoop/hdfs/name
-  FS_CHECKPOINT_DIR=$MOUNT/hadoop/hdfs/secondary
-  DFS_DATA_DIR=$MOUNT/hadoop/hdfs/data
-  MAPRED_LOCAL_DIR=$MOUNT/hadoop/mapred/local
-  MAX_MAP_TASKS=2
-  MAX_REDUCE_TASKS=1
-  CHILD_OPTS=-Xmx550m
-  CHILD_ULIMIT=1126400
-  TMP_DIR=$MOUNT/tmp/hadoop-\${user.name}
-
-  mkdir -p $MOUNT/hadoop
-  chown hadoop:hadoop $MOUNT/hadoop
-  mkdir $MOUNT/tmp
-  chmod a+rwxt $MOUNT/tmp
-
-  mkdir /etc/hadoop
-  ln -s $HADOOP_CONF_DIR /etc/hadoop/conf
-
-  ##############################################################################
-  # Modify this section to customize your Hadoop cluster.
-  ##############################################################################
-  cat > $HADOOP_CONF_DIR/hadoop-site.xml <<EOF
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-<property>
-  <name>dfs.block.size</name>
-  <value>134217728</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.data.dir</name>
-  <value>$DFS_DATA_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.datanode.du.reserved</name>
-  <value>1073741824</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.datanode.handler.count</name>
-  <value>3</value>
-  <final>true</final>
-</property>
-<!--property>
-  <name>dfs.hosts</name>
-  <value>$HADOOP_CONF_DIR/dfs.hosts</value>
-  <final>true</final>
-</property-->
-<!--property>
-  <name>dfs.hosts.exclude</name>
-  <value>$HADOOP_CONF_DIR/dfs.hosts.exclude</value>
-  <final>true</final>
-</property-->
-<property>
-  <name>dfs.name.dir</name>
-  <value>$DFS_NAME_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.namenode.handler.count</name>
-  <value>5</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.permissions</name>
-  <value>true</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.replication</name>
-  <value>$DFS_REPLICATION</value>
-</property>
-<property>
-  <name>fs.checkpoint.dir</name>
-  <value>$FS_CHECKPOINT_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>fs.default.name</name>
-  <value>hdfs://$NN_HOST:8020/</value>
-</property>
-<property>
-  <name>fs.trash.interval</name>
-  <value>1440</value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.tmp.dir</name>
-  <value>/data/tmp/hadoop-\${user.name}</value>
-  <final>true</final>
-</property>
-<property>
-  <name>io.file.buffer.size</name>
-  <value>65536</value>
-</property>
-<property>
-  <name>mapred.child.java.opts</name>
-  <value>$CHILD_OPTS</value>
-</property>
-<property>
-  <name>mapred.child.ulimit</name>
-  <value>$CHILD_ULIMIT</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.job.tracker</name>
-  <value>$JT_HOST:8021</value>
-</property>
-<property>
-  <name>mapred.job.tracker.handler.count</name>
-  <value>5</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.local.dir</name>
-  <value>$MAPRED_LOCAL_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.map.tasks.speculative.execution</name>
-  <value>true</value>
-</property>
-<property>
-  <name>mapred.reduce.parallel.copies</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.reduce.tasks</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.reduce.tasks.speculative.execution</name>
-  <value>false</value>
-</property>
-<property>
-  <name>mapred.submit.replication</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.system.dir</name>
-  <value>/hadoop/system/mapred</value>
-</property>
-<property>
-  <name>mapred.tasktracker.map.tasks.maximum</name>
-  <value>$MAX_MAP_TASKS</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.tasktracker.reduce.tasks.maximum</name>
-  <value>$MAX_REDUCE_TASKS</value>
-  <final>true</final>
-</property>
-<property>
-  <name>tasktracker.http.threads</name>
-  <value>46</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.compress.map.output</name>
-  <value>true</value>
-</property>
-<property>
-  <name>mapred.output.compression.type</name>
-  <value>BLOCK</value>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.default</name>
-  <value>org.apache.hadoop.net.StandardSocketFactory</value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.ClientProtocol</name>
-  <value></value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.JobSubmissionProtocol</name>
-  <value></value>
-  <final>true</final>
-</property>
-<property>
-  <name>io.compression.codecs</name>
-  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
-</property>
-</configuration>
-EOF
-
-  # Keep PID files in a non-temporary directory
-  sed -i -e "s|# export HADOOP_PID_DIR=.*|export HADOOP_PID_DIR=/var/run/hadoop|" \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-  mkdir -p /var/run/hadoop
-  chown -R hadoop:hadoop /var/run/hadoop
-
-  # Set SSH options within the cluster
-  sed -i -e 's|# export HADOOP_SSH_OPTS=.*|export HADOOP_SSH_OPTS="-o StrictHostKeyChecking=no"|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-    
-  # Disable IPv6
-  sed -i -e 's|# export HADOOP_OPTS=.*|export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-
-  # Hadoop logs should be on the /data partition
-  sed -i -e 's|# export HADOOP_LOG_DIR=.*|export HADOOP_LOG_DIR=/var/log/hadoop/logs|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-  rm -rf /var/log/hadoop
-  mkdir /data/hadoop/logs
-  chown hadoop:hadoop /data/hadoop/logs
-  ln -s /data/hadoop/logs /var/log/hadoop
-  chown -R hadoop:hadoop /var/log/hadoop
-
-}
-
-# Sets up a small website on the cluster.
-function setup_web() {
-
-  if which dpkg &> /dev/null; then
-    apt-get -y install thttpd
-    WWW_BASE=/var/www
-  elif which rpm &> /dev/null; then
-    yum install -y thttpd
-    chkconfig --add thttpd
-    WWW_BASE=/var/www/thttpd/html
-  fi
-
-  cat > $WWW_BASE/index.html << END
-<html>
-<head>
-<title>Hadoop Cloud Cluster</title>
-</head>
-<body>
-<h1>Hadoop Cloud Cluster</h1>
-To browse the cluster you need to have a proxy configured.
-Start the proxy with <tt>hadoop-cloud proxy &lt;cluster_name&gt;</tt>,
-and point your browser to
-<a href="http://apache-hadoop-ec2.s3.amazonaws.com/proxy.pac">this Proxy
-Auto-Configuration (PAC)</a> file.  To manage multiple proxy configurations,
-you may wish to use
-<a href="https://addons.mozilla.org/en-US/firefox/addon/2464">FoxyProxy</a>.
-<ul>
-<li><a href="http://$NN_HOST:50070/">NameNode</a>
-<li><a href="http://$JT_HOST:50030/">JobTracker</a>
-</ul>
-</body>
-</html>
-END
-
-  service thttpd start
-
-}
-
-function start_namenode() {
-  if which dpkg &> /dev/null; then
-    AS_HADOOP="su -s /bin/bash - hadoop -c"
-  elif which rpm &> /dev/null; then
-    AS_HADOOP="/sbin/runuser -s /bin/bash - hadoop -c"
-  fi
-
-  # Format HDFS
-  [ ! -e $FIRST_MOUNT/hadoop/hdfs ] && $AS_HADOOP "$HADOOP_HOME/bin/hadoop namenode -format"
-
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop-daemon.sh start namenode"
-
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop dfsadmin -safemode wait"
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop fs -mkdir /user"
-  # The following is questionable, as it allows a user to delete another user's directory
-  # It's needed to allow users to create their own user directories
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop fs -chmod +w /user"
-
-}
-
-function start_daemon() {
-  if which dpkg &> /dev/null; then
-    AS_HADOOP="su -s /bin/bash - hadoop -c"
-  elif which rpm &> /dev/null; then
-    AS_HADOOP="/sbin/runuser -s /bin/bash - hadoop -c"
-  fi
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop-daemon.sh start $1"
-}
-
-update_repo
-register_auto_shutdown
-install_user_packages
-install_hadoop
-configure_hadoop
-
-for role in $(echo "$ROLES" | tr "," "\n"); do
-  case $role in
-  nn)
-    setup_web
-    start_namenode
-    ;;
-  snn)
-    start_daemon secondarynamenode
-    ;;
-  jt)
-    start_daemon jobtracker
-    ;;
-  dn)
-    start_daemon datanode
-    ;;
-  tt)
-    start_daemon tasktracker
-    ;;
-  esac
-done
-

+ 0 - 548
src/contrib/cloud/src/py/hadoop/cloud/data/hadoop-ec2-init-remote.sh

@@ -1,548 +0,0 @@
-#!/bin/bash -x
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-################################################################################
-# Script that is run on each EC2 instance on boot. It is passed in the EC2 user
-# data, so should not exceed 16K in size after gzip compression.
-#
-# This script is executed by /etc/init.d/ec2-run-user-data, and output is
-# logged to /var/log/messages.
-################################################################################
-
-################################################################################
-# Initialize variables
-################################################################################
-
-# Substitute environment variables passed by the client
-export %ENV%
-
-HADOOP_VERSION=${HADOOP_VERSION:-0.20.1}
-HADOOP_HOME=/usr/local/hadoop-$HADOOP_VERSION
-HADOOP_CONF_DIR=$HADOOP_HOME/conf
-SELF_HOST=`wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname`
-for role in $(echo "$ROLES" | tr "," "\n"); do
-  case $role in
-  nn)
-    NN_HOST=$SELF_HOST
-    ;;
-  jt)
-    JT_HOST=$SELF_HOST
-    ;;
-  esac
-done
-
-function register_auto_shutdown() {
-  if [ ! -z "$AUTO_SHUTDOWN" ]; then
-    shutdown -h +$AUTO_SHUTDOWN >/dev/null &
-  fi
-}
-
-# Install a list of packages on debian or redhat as appropriate
-function install_packages() {
-  if which dpkg &> /dev/null; then
-    apt-get update
-    apt-get -y install $@
-  elif which rpm &> /dev/null; then
-    yum install -y $@
-  else
-    echo "No package manager found."
-  fi
-}
-
-# Install any user packages specified in the USER_PACKAGES environment variable
-function install_user_packages() {
-  if [ ! -z "$USER_PACKAGES" ]; then
-    install_packages $USER_PACKAGES
-  fi
-}
-
-function install_hadoop() {
-  useradd hadoop
-
-  hadoop_tar_url=http://s3.amazonaws.com/hadoop-releases/core/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
-  hadoop_tar_file=`basename $hadoop_tar_url`
-  hadoop_tar_md5_file=`basename $hadoop_tar_url.md5`
-
-  curl="curl --retry 3 --silent --show-error --fail"
-  for i in `seq 1 3`;
-  do
-    $curl -O $hadoop_tar_url
-    $curl -O $hadoop_tar_url.md5
-    if md5sum -c $hadoop_tar_md5_file; then
-      break;
-    else
-      rm -f $hadoop_tar_file $hadoop_tar_md5_file
-    fi
-  done
-
-  if [ ! -e $hadoop_tar_file ]; then
-    echo "Failed to download $hadoop_tar_url. Aborting."
-    exit 1
-  fi
-
-  tar zxf $hadoop_tar_file -C /usr/local
-  rm -f $hadoop_tar_file $hadoop_tar_md5_file
-
-  echo "export HADOOP_HOME=$HADOOP_HOME" >> ~root/.bashrc
-  echo 'export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH' >> ~root/.bashrc
-}
-
-function prep_disk() {
-  mount=$1
-  device=$2
-  automount=${3:-false}
-
-  echo "warning: ERASING CONTENTS OF $device"
-  mkfs.xfs -f $device
-  if [ ! -e $mount ]; then
-    mkdir $mount
-  fi
-  mount -o defaults,noatime $device $mount
-  if $automount ; then
-    echo "$device $mount xfs defaults,noatime 0 0" >> /etc/fstab
-  fi
-}
-
-function wait_for_mount {
-  mount=$1
-  device=$2
-
-  mkdir $mount
-
-  i=1
-  echo "Attempting to mount $device"
-  while true ; do
-    sleep 10
-    echo -n "$i "
-    i=$[$i+1]
-    mount -o defaults,noatime $device $mount || continue
-    echo " Mounted."
-    break;
-  done
-}
-
-function make_hadoop_dirs {
-  for mount in "$@"; do
-    if [ ! -e $mount/hadoop ]; then
-      mkdir -p $mount/hadoop
-      chown hadoop:hadoop $mount/hadoop
-    fi
-  done
-}
-
-# Configure Hadoop by setting up disks and site file
-function configure_hadoop() {
-
-  install_packages xfsprogs # needed for XFS
-
-  INSTANCE_TYPE=`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`
-
-  if [ -n "$EBS_MAPPINGS" ]; then
-    # EBS_MAPPINGS is like "/ebs1,/dev/sdj;/ebs2,/dev/sdk"
-    DFS_NAME_DIR=''
-    FS_CHECKPOINT_DIR=''
-    DFS_DATA_DIR=''
-    for mapping in $(echo "$EBS_MAPPINGS" | tr ";" "\n"); do
-      # Split on the comma (see "Parameter Expansion" in the bash man page)
-      mount=${mapping%,*}
-      device=${mapping#*,}
-      wait_for_mount $mount $device
-      DFS_NAME_DIR=${DFS_NAME_DIR},"$mount/hadoop/hdfs/name"
-      FS_CHECKPOINT_DIR=${FS_CHECKPOINT_DIR},"$mount/hadoop/hdfs/secondary"
-      DFS_DATA_DIR=${DFS_DATA_DIR},"$mount/hadoop/hdfs/data"
-      FIRST_MOUNT=${FIRST_MOUNT-$mount}
-      make_hadoop_dirs $mount
-    done
-    # Remove leading commas
-    DFS_NAME_DIR=${DFS_NAME_DIR#?}
-    FS_CHECKPOINT_DIR=${FS_CHECKPOINT_DIR#?}
-    DFS_DATA_DIR=${DFS_DATA_DIR#?}
-
-    DFS_REPLICATION=3 # EBS is internally replicated, but we also use HDFS replication for safety
-  else
-    case $INSTANCE_TYPE in
-    m1.xlarge|c1.xlarge)
-      DFS_NAME_DIR=/mnt/hadoop/hdfs/name,/mnt2/hadoop/hdfs/name
-      FS_CHECKPOINT_DIR=/mnt/hadoop/hdfs/secondary,/mnt2/hadoop/hdfs/secondary
-      DFS_DATA_DIR=/mnt/hadoop/hdfs/data,/mnt2/hadoop/hdfs/data,/mnt3/hadoop/hdfs/data,/mnt4/hadoop/hdfs/data
-      ;;
-    m1.large)
-      DFS_NAME_DIR=/mnt/hadoop/hdfs/name,/mnt2/hadoop/hdfs/name
-      FS_CHECKPOINT_DIR=/mnt/hadoop/hdfs/secondary,/mnt2/hadoop/hdfs/secondary
-      DFS_DATA_DIR=/mnt/hadoop/hdfs/data,/mnt2/hadoop/hdfs/data
-      ;;
-    *)
-      # "m1.small" or "c1.medium"
-      DFS_NAME_DIR=/mnt/hadoop/hdfs/name
-      FS_CHECKPOINT_DIR=/mnt/hadoop/hdfs/secondary
-      DFS_DATA_DIR=/mnt/hadoop/hdfs/data
-      ;;
-    esac
-    FIRST_MOUNT=/mnt
-    DFS_REPLICATION=3
-  fi
-
-  case $INSTANCE_TYPE in
-  m1.xlarge|c1.xlarge)
-    prep_disk /mnt2 /dev/sdc true &
-    disk2_pid=$!
-    prep_disk /mnt3 /dev/sdd true &
-    disk3_pid=$!
-    prep_disk /mnt4 /dev/sde true &
-    disk4_pid=$!
-    wait $disk2_pid $disk3_pid $disk4_pid
-    MAPRED_LOCAL_DIR=/mnt/hadoop/mapred/local,/mnt2/hadoop/mapred/local,/mnt3/hadoop/mapred/local,/mnt4/hadoop/mapred/local
-    MAX_MAP_TASKS=8
-    MAX_REDUCE_TASKS=4
-    CHILD_OPTS=-Xmx680m
-    CHILD_ULIMIT=1392640
-    ;;
-  m1.large)
-    prep_disk /mnt2 /dev/sdc true
-    MAPRED_LOCAL_DIR=/mnt/hadoop/mapred/local,/mnt2/hadoop/mapred/local
-    MAX_MAP_TASKS=4
-    MAX_REDUCE_TASKS=2
-    CHILD_OPTS=-Xmx1024m
-    CHILD_ULIMIT=2097152
-    ;;
-  c1.medium)
-    MAPRED_LOCAL_DIR=/mnt/hadoop/mapred/local
-    MAX_MAP_TASKS=4
-    MAX_REDUCE_TASKS=2
-    CHILD_OPTS=-Xmx550m
-    CHILD_ULIMIT=1126400
-    ;;
-  *)
-    # "m1.small"
-    MAPRED_LOCAL_DIR=/mnt/hadoop/mapred/local
-    MAX_MAP_TASKS=2
-    MAX_REDUCE_TASKS=1
-    CHILD_OPTS=-Xmx550m
-    CHILD_ULIMIT=1126400
-    ;;
-  esac
-
-  make_hadoop_dirs `ls -d /mnt*`
-
-  # Create tmp directory
-  mkdir /mnt/tmp
-  chmod a+rwxt /mnt/tmp
-  
-  mkdir /etc/hadoop
-  ln -s $HADOOP_CONF_DIR /etc/hadoop/conf
-
-  ##############################################################################
-  # Modify this section to customize your Hadoop cluster.
-  ##############################################################################
-  cat > $HADOOP_CONF_DIR/hadoop-site.xml <<EOF
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-<property>
-  <name>dfs.block.size</name>
-  <value>134217728</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.data.dir</name>
-  <value>$DFS_DATA_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.datanode.du.reserved</name>
-  <value>1073741824</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.datanode.handler.count</name>
-  <value>3</value>
-  <final>true</final>
-</property>
-<!--property>
-  <name>dfs.hosts</name>
-  <value>$HADOOP_CONF_DIR/dfs.hosts</value>
-  <final>true</final>
-</property-->
-<!--property>
-  <name>dfs.hosts.exclude</name>
-  <value>$HADOOP_CONF_DIR/dfs.hosts.exclude</value>
-  <final>true</final>
-</property-->
-<property>
-  <name>dfs.name.dir</name>
-  <value>$DFS_NAME_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.namenode.handler.count</name>
-  <value>5</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.permissions</name>
-  <value>true</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.replication</name>
-  <value>$DFS_REPLICATION</value>
-</property>
-<property>
-  <name>fs.checkpoint.dir</name>
-  <value>$FS_CHECKPOINT_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>fs.default.name</name>
-  <value>hdfs://$NN_HOST:8020/</value>
-</property>
-<property>
-  <name>fs.trash.interval</name>
-  <value>1440</value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.tmp.dir</name>
-  <value>/mnt/tmp/hadoop-\${user.name}</value>
-  <final>true</final>
-</property>
-<property>
-  <name>io.file.buffer.size</name>
-  <value>65536</value>
-</property>
-<property>
-  <name>mapred.child.java.opts</name>
-  <value>$CHILD_OPTS</value>
-</property>
-<property>
-  <name>mapred.child.ulimit</name>
-  <value>$CHILD_ULIMIT</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.job.tracker</name>
-  <value>$JT_HOST:8021</value>
-</property>
-<property>
-  <name>mapred.job.tracker.handler.count</name>
-  <value>5</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.local.dir</name>
-  <value>$MAPRED_LOCAL_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.map.tasks.speculative.execution</name>
-  <value>true</value>
-</property>
-<property>
-  <name>mapred.reduce.parallel.copies</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.reduce.tasks</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.reduce.tasks.speculative.execution</name>
-  <value>false</value>
-</property>
-<property>
-  <name>mapred.submit.replication</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.system.dir</name>
-  <value>/hadoop/system/mapred</value>
-</property>
-<property>
-  <name>mapred.tasktracker.map.tasks.maximum</name>
-  <value>$MAX_MAP_TASKS</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.tasktracker.reduce.tasks.maximum</name>
-  <value>$MAX_REDUCE_TASKS</value>
-  <final>true</final>
-</property>
-<property>
-  <name>tasktracker.http.threads</name>
-  <value>46</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.compress.map.output</name>
-  <value>true</value>
-</property>
-<property>
-  <name>mapred.output.compression.type</name>
-  <value>BLOCK</value>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.default</name>
-  <value>org.apache.hadoop.net.StandardSocketFactory</value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.ClientProtocol</name>
-  <value></value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.JobSubmissionProtocol</name>
-  <value></value>
-  <final>true</final>
-</property>
-<property>
-  <name>io.compression.codecs</name>
-  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
-</property>
-<property>
-  <name>fs.s3.awsAccessKeyId</name>
-  <value>$AWS_ACCESS_KEY_ID</value>
-</property>
-<property>
-  <name>fs.s3.awsSecretAccessKey</name>
-  <value>$AWS_SECRET_ACCESS_KEY</value>
-</property>
-<property>
-  <name>fs.s3n.awsAccessKeyId</name>
-  <value>$AWS_ACCESS_KEY_ID</value>
-</property>
-<property>
-  <name>fs.s3n.awsSecretAccessKey</name>
-  <value>$AWS_SECRET_ACCESS_KEY</value>
-</property>
-</configuration>
-EOF
-
-  # Keep PID files in a non-temporary directory
-  sed -i -e "s|# export HADOOP_PID_DIR=.*|export HADOOP_PID_DIR=/var/run/hadoop|" \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-  mkdir -p /var/run/hadoop
-  chown -R hadoop:hadoop /var/run/hadoop
-
-  # Set SSH options within the cluster
-  sed -i -e 's|# export HADOOP_SSH_OPTS=.*|export HADOOP_SSH_OPTS="-o StrictHostKeyChecking=no"|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-
-  # Hadoop logs should be on the /mnt partition
-  sed -i -e 's|# export HADOOP_LOG_DIR=.*|export HADOOP_LOG_DIR=/var/log/hadoop/logs|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-  rm -rf /var/log/hadoop
-  mkdir /mnt/hadoop/logs
-  chown hadoop:hadoop /mnt/hadoop/logs
-  ln -s /mnt/hadoop/logs /var/log/hadoop
-  chown -R hadoop:hadoop /var/log/hadoop
-
-}
-
-# Sets up a small website on the cluster.
-function setup_web() {
-
-  if which dpkg &> /dev/null; then
-    apt-get -y install thttpd
-    WWW_BASE=/var/www
-  elif which rpm &> /dev/null; then
-    yum install -y thttpd
-    chkconfig --add thttpd
-    WWW_BASE=/var/www/thttpd/html
-  fi
-
-  cat > $WWW_BASE/index.html << END
-<html>
-<head>
-<title>Hadoop EC2 Cluster</title>
-</head>
-<body>
-<h1>Hadoop EC2 Cluster</h1>
-To browse the cluster you need to have a proxy configured.
-Start the proxy with <tt>hadoop-ec2 proxy &lt;cluster_name&gt;</tt>,
-and point your browser to
-<a href="http://apache-hadoop-ec2.s3.amazonaws.com/proxy.pac">this Proxy
-Auto-Configuration (PAC)</a> file.  To manage multiple proxy configurations,
-you may wish to use
-<a href="https://addons.mozilla.org/en-US/firefox/addon/2464">FoxyProxy</a>.
-<ul>
-<li><a href="http://$NN_HOST:50070/">NameNode</a>
-<li><a href="http://$JT_HOST:50030/">JobTracker</a>
-</ul>
-</body>
-</html>
-END
-
-  service thttpd start
-
-}
-
-function start_namenode() {
-  if which dpkg &> /dev/null; then
-    AS_HADOOP="su -s /bin/bash - hadoop -c"
-  elif which rpm &> /dev/null; then
-    AS_HADOOP="/sbin/runuser -s /bin/bash - hadoop -c"
-  fi
-
-  # Format HDFS
-  [ ! -e $FIRST_MOUNT/hadoop/hdfs ] && $AS_HADOOP "$HADOOP_HOME/bin/hadoop namenode -format"
-
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop-daemon.sh start namenode"
-
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop dfsadmin -safemode wait"
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop fs -mkdir /user"
-  # The following is questionable, as it allows a user to delete another user's directory
-  # It's needed to allow users to create their own user directories
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop fs -chmod +w /user"
-
-}
-
-function start_daemon() {
-  if which dpkg &> /dev/null; then
-    AS_HADOOP="su -s /bin/bash - hadoop -c"
-  elif which rpm &> /dev/null; then
-    AS_HADOOP="/sbin/runuser -s /bin/bash - hadoop -c"
-  fi
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop-daemon.sh start $1"
-}
-
-register_auto_shutdown
-install_user_packages
-install_hadoop
-configure_hadoop
-
-for role in $(echo "$ROLES" | tr "," "\n"); do
-  case $role in
-  nn)
-    setup_web
-    start_namenode
-    ;;
-  snn)
-    start_daemon secondarynamenode
-    ;;
-  jt)
-    start_daemon jobtracker
-    ;;
-  dn)
-    start_daemon datanode
-    ;;
-  tt)
-    start_daemon tasktracker
-    ;;
-  esac
-done
-
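
Note: the script above is not run by hand; the client fills in the %ENV% placeholder, gzips the result and passes it as EC2 user data, which is what the InstanceUserData class in cluster.py (earlier in this change) is for, and the header comment notes the 16K post-compression limit. A small sketch of that client-side step, assuming the hadoop.cloud package in this tree is on PYTHONPATH and that the script is read from the current directory; the environment string is illustrative.

from hadoop.cloud.cluster import InstanceUserData

# Illustrative environment for a combined nn,snn,jt instance.
env = "ROLES=nn,snn,jt HADOOP_VERSION=0.20.1 AUTO_SHUTDOWN="

user_data = InstanceUserData("hadoop-ec2-init-remote.sh",
                             replacements={"%ENV%": env})

payload = user_data.read_as_gzip_stream()  # substituted, then gzip-compressed
# Per the header comment above, the result should stay under 16K after
# compression, since that is the user-data limit the script was written against.
if len(payload) > 16 * 1024:
  print "User data too big: %d bytes after compression" % len(payload)
else:
  print "User data is %d bytes after compression" % len(payload)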

+ 0 - 22
src/contrib/cloud/src/py/hadoop/cloud/data/hadoop-rackspace-init-remote.sh

@@ -1,22 +0,0 @@
-#!/bin/bash -ex
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Run a script downloaded at boot time to avoid Rackspace's 10K limitation.
-
-wget -qO/usr/bin/runurl run.alestic.com/runurl
-chmod 755 /usr/bin/runurl
-%ENV% runurl http://hadoop-dev-test.s3.amazonaws.com/boot-rackspace.sh

+ 0 - 112
src/contrib/cloud/src/py/hadoop/cloud/data/zookeeper-ec2-init-remote.sh

@@ -1,112 +0,0 @@
-#!/bin/bash -x
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-################################################################################
-# Script that is run on each EC2 instance on boot. It is passed in the EC2 user
-# data, so should not exceed 16K in size after gzip compression.
-#
-# This script is executed by /etc/init.d/ec2-run-user-data, and output is
-# logged to /var/log/messages.
-################################################################################
-
-################################################################################
-# Initialize variables
-################################################################################
-
-# Substitute environment variables passed by the client
-export %ENV%
-
-ZK_VERSION=${ZK_VERSION:-3.2.2}
-ZOOKEEPER_HOME=/usr/local/zookeeper-$ZK_VERSION
-ZK_CONF_DIR=/etc/zookeeper/conf
-
-function register_auto_shutdown() {
-  if [ ! -z "$AUTO_SHUTDOWN" ]; then
-    shutdown -h +$AUTO_SHUTDOWN >/dev/null &
-  fi
-}
-
-# Install a list of packages on debian or redhat as appropriate
-function install_packages() {
-  if which dpkg &> /dev/null; then
-    apt-get update
-    apt-get -y install $@
-  elif which rpm &> /dev/null; then
-    yum install -y $@
-  else
-    echo "No package manager found."
-  fi
-}
-
-# Install any user packages specified in the USER_PACKAGES environment variable
-function install_user_packages() {
-  if [ ! -z "$USER_PACKAGES" ]; then
-    install_packages $USER_PACKAGES
-  fi
-}
-
-function install_zookeeper() {
-  zk_tar_url=http://www.apache.org/dist/hadoop/zookeeper/zookeeper-$ZK_VERSION/zookeeper-$ZK_VERSION.tar.gz
-  zk_tar_file=`basename $zk_tar_url`
-  zk_tar_md5_file=`basename $zk_tar_url.md5`
-
-  curl="curl --retry 3 --silent --show-error --fail"
-  for i in `seq 1 3`;
-  do
-    $curl -O $zk_tar_url
-    $curl -O $zk_tar_url.md5
-    if md5sum -c $zk_tar_md5_file; then
-      break;
-    else
-      rm -f $zk_tar_file $zk_tar_md5_file
-    fi
-  done
-
-  if [ ! -e $zk_tar_file ]; then
-    echo "Failed to download $zk_tar_url. Aborting."
-    exit 1
-  fi
-
-  tar zxf $zk_tar_file -C /usr/local
-  rm -f $zk_tar_file $zk_tar_md5_file
-
-  echo "export ZOOKEEPER_HOME=$ZOOKEEPER_HOME" >> ~root/.bashrc
-  echo 'export PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH' >> ~root/.bashrc
-}
-
-function configure_zookeeper() {
-  mkdir -p /mnt/zookeeper/logs
-  ln -s /mnt/zookeeper/logs /var/log/zookeeper
-  mkdir -p /var/log/zookeeper/txlog
-  mkdir -p $ZK_CONF_DIR
-  cp $ZOOKEEPER_HOME/conf/log4j.properties $ZK_CONF_DIR
-
-  sed -i -e "s|log4j.rootLogger=INFO, CONSOLE|log4j.rootLogger=INFO, ROLLINGFILE|" \
-         -e "s|log4j.appender.ROLLINGFILE.File=zookeeper.log|log4j.appender.ROLLINGFILE.File=/var/log/zookeeper/zookeeper.log|" \
-      $ZK_CONF_DIR/log4j.properties
-      
-  # Ensure ZooKeeper starts on boot
-  cat > /etc/rc.local <<EOF
-ZOOCFGDIR=$ZK_CONF_DIR $ZOOKEEPER_HOME/bin/zkServer.sh start > /dev/null 2>&1 &
-EOF
-
-}
-
-register_auto_shutdown
-install_user_packages
-install_zookeeper
-configure_zookeeper

+ 0 - 14
src/contrib/cloud/src/py/hadoop/cloud/providers/__init__.py

@@ -1,14 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.

+ 0 - 61
src/contrib/cloud/src/py/hadoop/cloud/providers/dummy.py

@@ -1,61 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-
-from hadoop.cloud.cluster import Cluster
-from hadoop.cloud.cluster import Instance
-
-logger = logging.getLogger(__name__)
-
-class DummyCluster(Cluster):
-
-  @staticmethod
-  def get_clusters_with_role(role, state="running"):
-    logger.info("get_clusters_with_role(%s, %s)", role, state)
-    return ["dummy-cluster"]
-
-  def __init__(self, name, config_dir):
-    super(DummyCluster, self).__init__(name, config_dir)
-    logger.info("__init__(%s, %s)", name, config_dir)
-
-  def get_provider_code(self):
-    return "dummy"
-
-  def authorize_role(self, role, from_port, to_port, cidr_ip):
-    logger.info("authorize_role(%s, %s, %s, %s)", role, from_port, to_port,
-                cidr_ip)
-
-  def get_instances_in_role(self, role, state_filter=None):
-    logger.info("get_instances_in_role(%s, %s)", role, state_filter)
-    return [Instance(1, '127.0.0.1', '127.0.0.1')]
-
-  def print_status(self, roles, state_filter="running"):
-    logger.info("print_status(%s, %s)", roles, state_filter)
-
-  def launch_instances(self, role, number, image_id, size_id,
-                       instance_user_data, **kwargs):
-    logger.info("launch_instances(%s, %s, %s, %s, %s, %s)", role, number,
-                image_id, size_id, instance_user_data, str(kwargs))
-    return [1]
-
-  def wait_for_instances(self, instance_ids, timeout=600):
-    logger.info("wait_for_instances(%s, %s)", instance_ids, timeout)
-
-  def terminate(self):
-    logger.info("terminate")
-
-  def delete(self):
-    logger.info("delete")

+ 0 - 479
src/contrib/cloud/src/py/hadoop/cloud/providers/ec2.py

@@ -1,479 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from boto.ec2.connection import EC2Connection
-from boto.exception import EC2ResponseError
-import logging
-from hadoop.cloud.cluster import Cluster
-from hadoop.cloud.cluster import Instance
-from hadoop.cloud.cluster import RoleSyntaxException
-from hadoop.cloud.cluster import TimeoutException
-from hadoop.cloud.storage import JsonVolumeManager
-from hadoop.cloud.storage import JsonVolumeSpecManager
-from hadoop.cloud.storage import MountableVolume
-from hadoop.cloud.storage import Storage
-from hadoop.cloud.util import xstr
-import os
-import re
-import subprocess
-import sys
-import time
-
-logger = logging.getLogger(__name__)
-
-def _run_command_on_instance(instance, ssh_options, command):
-  print "Running ssh %s root@%s '%s'" % \
-    (ssh_options, instance.public_dns_name, command)
-  retcode = subprocess.call("ssh %s root@%s '%s'" %
-                           (ssh_options, instance.public_dns_name, command),
-                           shell=True)
-  print "Command running on %s returned with value %s" % \
-    (instance.public_dns_name, retcode)
-
-def _wait_for_volume(ec2_connection, volume_id):
-  """
-  Waits until a volume becomes available.
-  """
-  while True:
-    volumes = ec2_connection.get_all_volumes([volume_id,])
-    if volumes[0].status == 'available':
-      break
-    sys.stdout.write(".")
-    sys.stdout.flush()
-    time.sleep(1)
-
-class Ec2Cluster(Cluster):
-  """
-  A cluster of EC2 instances. A cluster has a unique name.
-
-  Instances running in the cluster run in a security group named after the
-  cluster, and also in a group named for the instance's role,
-  e.g. <cluster-name>-foo for a "foo" instance.
-  """
-
-  @staticmethod
-  def get_clusters_with_role(role, state="running"):
-    all_instances = EC2Connection().get_all_instances()
-    clusters = []
-    for res in all_instances:
-      instance = res.instances[0]
-      for group in res.groups:
-        if group.id.endswith("-" + role) and instance.state == state:
-          clusters.append(re.sub("-%s$" % re.escape(role), "", group.id))
-    return clusters
-
-  def __init__(self, name, config_dir):
-    super(Ec2Cluster, self).__init__(name, config_dir)
-    self.ec2Connection = EC2Connection()
-
-  def get_provider_code(self):
-    return "ec2"
-
-  def _get_cluster_group_name(self):
-    return self.name
-
-  def _check_role_name(self, role):
-    if not re.match("^[a-zA-Z0-9_+]+$", role):
-      raise RoleSyntaxException("Invalid role name '%s'" % role)
-
-  def _group_name_for_role(self, role):
-    """
-    Return the security group name for an instance in a given role.
-    """
-    self._check_role_name(role)
-    return "%s-%s" % (self.name, role)
-
-  def _get_group_names(self, roles):
-    group_names = [self._get_cluster_group_name()]
-    for role in roles:
-      group_names.append(self._group_name_for_role(role))
-    return group_names
-
-  def _get_all_group_names(self):
-    security_groups = self.ec2Connection.get_all_security_groups()
-    security_group_names = \
-      [security_group.name for security_group in security_groups]
-    return security_group_names
-
-  def _get_all_group_names_for_cluster(self):
-    all_group_names = self._get_all_group_names()
-    r = []
-    if self.name not in all_group_names:
-      return r
-    for group in all_group_names:
-      if re.match("^%s(-[a-zA-Z0-9_+]+)?$" % self.name, group):
-        r.append(group)
-    return r
-
-  def _create_groups(self, role):
-    """
-    Create the security groups for a given role, including a group for the
-    cluster if it doesn't exist.
-    """
-    self._check_role_name(role)
-    security_group_names = self._get_all_group_names()
-
-    cluster_group_name = self._get_cluster_group_name()
-    if not cluster_group_name in security_group_names:
-      self.ec2Connection.create_security_group(cluster_group_name,
-                                               "Cluster (%s)" % (self.name))
-      self.ec2Connection.authorize_security_group(cluster_group_name,
-                                                  cluster_group_name)
-      # Allow SSH from anywhere
-      self.ec2Connection.authorize_security_group(cluster_group_name,
-                                                  ip_protocol="tcp",
-                                                  from_port=22, to_port=22,
-                                                  cidr_ip="0.0.0.0/0")
-
-    role_group_name = self._group_name_for_role(role)
-    if not role_group_name in security_group_names:
-      self.ec2Connection.create_security_group(role_group_name,
-        "Role %s (%s)" % (role, self.name))
-
-  def authorize_role(self, role, from_port, to_port, cidr_ip):
-    """
-    Authorize access to machines in a given role from a given network.
-    """
-    self._check_role_name(role)
-    role_group_name = self._group_name_for_role(role)
-    # Revoke first to avoid InvalidPermission.Duplicate error
-    self.ec2Connection.revoke_security_group(role_group_name,
-                                             ip_protocol="tcp",
-                                             from_port=from_port,
-                                             to_port=to_port, cidr_ip=cidr_ip)
-    self.ec2Connection.authorize_security_group(role_group_name,
-                                                ip_protocol="tcp",
-                                                from_port=from_port,
-                                                to_port=to_port,
-                                                cidr_ip=cidr_ip)
-
-  def _get_instances(self, group_name, state_filter=None):
-    """
-    Get all the instances in a group, filtered by state.
-
-    @param group_name: the name of the group
-    @param state_filter: the state that the instance should be in
-      (e.g. "running"), or None for all states
-    """
-    all_instances = self.ec2Connection.get_all_instances()
-    instances = []
-    for res in all_instances:
-      for group in res.groups:
-        if group.id == group_name:
-          for instance in res.instances:
-            if state_filter == None or instance.state == state_filter:
-              instances.append(instance)
-    return instances
-
-  def get_instances_in_role(self, role, state_filter=None):
-    """
-    Get all the instances in a role, filtered by state.
-
-    @param role: the name of the role
-    @param state_filter: the state that the instance should be in
-      (e.g. "running"), or None for all states
-    """
-    self._check_role_name(role)
-    instances = []
-    for instance in self._get_instances(self._group_name_for_role(role),
-                                        state_filter):
-      instances.append(Instance(instance.id, instance.dns_name,
-                                instance.private_dns_name))
-    return instances
-
-  def _print_instance(self, role, instance):
-    print "\t".join((role, instance.id,
-      instance.image_id,
-      instance.dns_name, instance.private_dns_name,
-      instance.state, xstr(instance.key_name), instance.instance_type,
-      str(instance.launch_time), instance.placement))
-
-  def print_status(self, roles=None, state_filter="running"):
-    """
-    Print the status of instances in the given roles, filtered by state.
-    """
-    if not roles:
-      for instance in self._get_instances(self._get_cluster_group_name(),
-                                          state_filter):
-        self._print_instance("", instance)
-    else:
-      for role in roles:
-        for instance in self._get_instances(self._group_name_for_role(role),
-                                            state_filter):
-          self._print_instance(role, instance)
-
-  def launch_instances(self, roles, number, image_id, size_id,
-                       instance_user_data, **kwargs):
-    for role in roles:
-      self._check_role_name(role)  
-      self._create_groups(role)
-      
-    user_data = instance_user_data.read_as_gzip_stream()
-    security_groups = self._get_group_names(roles) + kwargs.get('security_groups', [])
-
-    reservation = self.ec2Connection.run_instances(image_id, min_count=number,
-      max_count=number, key_name=kwargs.get('key_name', None),
-      security_groups=security_groups, user_data=user_data,
-      instance_type=size_id, placement=kwargs.get('placement', None))
-    return [instance.id for instance in reservation.instances]
-
-  def wait_for_instances(self, instance_ids, timeout=600):
-    start_time = time.time()
-    while True:
-      if (time.time() - start_time >= timeout):
-        raise TimeoutException()
-      try:
-        if self._all_started(self.ec2Connection.get_all_instances(instance_ids)):
-          break
-      # don't timeout for race condition where instance is not yet registered
-      except EC2ResponseError:
-        pass
-      sys.stdout.write(".")
-      sys.stdout.flush()
-      time.sleep(1)
-
-  def _all_started(self, reservations):
-    for res in reservations:
-      for instance in res.instances:
-        if instance.state != "running":
-          return False
-    return True
-
-  def terminate(self):
-    instances = self._get_instances(self._get_cluster_group_name(), "running")
-    if instances:
-      self.ec2Connection.terminate_instances([i.id for i in instances])
-
-  def delete(self):
-    """
-    Delete the security groups for each role in the cluster, and the group for
-    the cluster.
-    """
-    group_names = self._get_all_group_names_for_cluster()
-    for group in group_names:
-      self.ec2Connection.delete_security_group(group)
-
-  def get_storage(self):
-    """
-    Return the external storage for the cluster.
-    """
-    return Ec2Storage(self)
-
-
-class Ec2Storage(Storage):
-  """
-  Storage volumes for an EC2 cluster. The storage is associated with a named
-  cluster. Metadata for the storage volumes is kept in a JSON file on the client
-  machine (in a file called "ec2-storage-<cluster-name>.json" in the
-  configuration directory).
-  """
-
-  @staticmethod
-  def create_formatted_snapshot(cluster, size, availability_zone, image_id,
-                                key_name, ssh_options):
-    """
-    Creates a formatted snapshot of a given size. This saves having to format
-    volumes when they are first attached.
-    """
-    conn = cluster.ec2Connection
-    print "Starting instance"
-    reservation = conn.run_instances(image_id, key_name=key_name,
-                                     placement=availability_zone)
-    instance = reservation.instances[0]
-    try:
-      cluster.wait_for_instances([instance.id,])
-      print "Started instance %s" % instance.id
-    except TimeoutException:
-      print "Timeout"
-      return
-    print
-    print "Waiting 60 seconds before attaching storage"
-    time.sleep(60)
-    # Re-populate instance object since it has more details filled in
-    instance.update()
-
-    print "Creating volume of size %s in %s" % (size, availability_zone)
-    volume = conn.create_volume(size, availability_zone)
-    print "Created volume %s" % volume
-    print "Attaching volume to %s" % instance.id
-    volume.attach(instance.id, '/dev/sdj')
-
-    _run_command_on_instance(instance, ssh_options, """
-      while true ; do
-        echo 'Waiting for /dev/sdj...';
-        if [ -e /dev/sdj ]; then break; fi;
-        sleep 1;
-      done;
-      mkfs.ext3 -F -m 0.5 /dev/sdj
-    """)
-
-    print "Detaching volume"
-    conn.detach_volume(volume.id, instance.id)
-    print "Creating snapshot"
-    snapshot = volume.create_snapshot()
-    print "Created snapshot %s" % snapshot.id
-    _wait_for_volume(conn, volume.id)
-    print
-    print "Deleting volume"
-    volume.delete()
-    print "Deleted volume"
-    print "Stopping instance"
-    terminated = conn.terminate_instances([instance.id,])
-    print "Stopped instance %s" % terminated
-
-  def __init__(self, cluster):
-    super(Ec2Storage, self).__init__(cluster)
-    self.config_dir = cluster.config_dir
-
-  def _get_storage_filename(self):
-    return os.path.join(self.config_dir,
-                        "ec2-storage-%s.json" % (self.cluster.name))
-
-  def create(self, role, number_of_instances, availability_zone, spec_filename):
-    spec_file = open(spec_filename, 'r')
-    volume_spec_manager = JsonVolumeSpecManager(spec_file)
-    volume_manager = JsonVolumeManager(self._get_storage_filename())
-    for dummy in range(number_of_instances):
-      mountable_volumes = []
-      volume_specs = volume_spec_manager.volume_specs_for_role(role)
-      for spec in volume_specs:
-        logger.info("Creating volume of size %s in %s from snapshot %s" % \
-                    (spec.size, availability_zone, spec.snapshot_id))
-        volume = self.cluster.ec2Connection.create_volume(spec.size,
-                                                          availability_zone,
-                                                          spec.snapshot_id)
-        mountable_volumes.append(MountableVolume(volume.id, spec.mount_point,
-                                                 spec.device))
-      volume_manager.add_instance_storage_for_role(role, mountable_volumes)
-
-  def _get_mountable_volumes(self, role):
-    storage_filename = self._get_storage_filename()
-    volume_manager = JsonVolumeManager(storage_filename)
-    return volume_manager.get_instance_storage_for_role(role)
-
-  def get_mappings_string_for_role(self, role):
-    mappings = {}
-    mountable_volumes_list = self._get_mountable_volumes(role)
-    for mountable_volumes in mountable_volumes_list:
-      for mountable_volume in mountable_volumes:
-        mappings[mountable_volume.mount_point] = mountable_volume.device
-    return ";".join(["%s,%s" % (mount_point, device) for (mount_point, device)
-                     in mappings.items()])
-
-  def _has_storage(self, role):
-    return self._get_mountable_volumes(role)
-
-  def has_any_storage(self, roles):
-    for role in roles:
-      if self._has_storage(role):
-        return True
-    return False
-
-  def get_roles(self):
-    storage_filename = self._get_storage_filename()
-    volume_manager = JsonVolumeManager(storage_filename)
-    return volume_manager.get_roles()
-  
-  def _get_ec2_volumes_dict(self, mountable_volumes):
-    volume_ids = [mv.volume_id for mv in sum(mountable_volumes, [])]
-    volumes = self.cluster.ec2Connection.get_all_volumes(volume_ids)
-    volumes_dict = {}
-    for volume in volumes:
-      volumes_dict[volume.id] = volume
-    return volumes_dict
-
-  def _print_volume(self, role, volume):
-    print "\t".join((role, volume.id, str(volume.size),
-                     volume.snapshot_id, volume.availabilityZone,
-                     volume.status, str(volume.create_time),
-                     str(volume.attach_time)))
-
-  def print_status(self, roles=None):
-    if roles == None:
-      storage_filename = self._get_storage_filename()
-      volume_manager = JsonVolumeManager(storage_filename)
-      roles = volume_manager.get_roles()
-    for role in roles:
-      mountable_volumes_list = self._get_mountable_volumes(role)
-      ec2_volumes = self._get_ec2_volumes_dict(mountable_volumes_list)
-      for mountable_volumes in mountable_volumes_list:
-        for mountable_volume in mountable_volumes:
-          self._print_volume(role, ec2_volumes[mountable_volume.volume_id])
-
-  def _replace(self, string, replacements):
-    for (match, replacement) in replacements.iteritems():
-      string = string.replace(match, replacement)
-    return string
-
-  def attach(self, role, instances):
-    mountable_volumes_list = self._get_mountable_volumes(role)
-    if not mountable_volumes_list:
-      return
-    ec2_volumes = self._get_ec2_volumes_dict(mountable_volumes_list)
-
-    available_mountable_volumes_list = []
-
-    available_instances_dict = {}
-    for instance in instances:
-      available_instances_dict[instance.id] = instance
-
-    # Iterate over mountable_volumes and retain those that are not attached
-    # Also maintain a list of instances that have no attached storage
-    # Note that we do not fill in "holes" (instances that only have some of
-    # their storage attached)
-    for mountable_volumes in mountable_volumes_list:
-      available = True
-      for mountable_volume in mountable_volumes:
-        if ec2_volumes[mountable_volume.volume_id].status != 'available':
-          available = False
-          attach_data = ec2_volumes[mountable_volume.volume_id].attach_data
-          instance_id = attach_data.instance_id
-          if available_instances_dict.has_key(instance_id):
-            del available_instances_dict[instance_id]
-      if available:
-        available_mountable_volumes_list.append(mountable_volumes)
-
-    if len(available_instances_dict) != len(available_mountable_volumes_list):
-      logger.warning("Number of available instances (%s) and volumes (%s) \
-        do not match." \
-        % (len(available_instances_dict),
-           len(available_mountable_volumes_list)))
-
-    for (instance, mountable_volumes) in zip(available_instances_dict.values(),
-                                             available_mountable_volumes_list):
-      print "Attaching storage to %s" % instance.id
-      for mountable_volume in mountable_volumes:
-        volume = ec2_volumes[mountable_volume.volume_id]
-        print "Attaching %s to %s" % (volume.id, instance.id)
-        volume.attach(instance.id, mountable_volume.device)
-
-  def delete(self, roles=[]):
-    storage_filename = self._get_storage_filename()
-    volume_manager = JsonVolumeManager(storage_filename)
-    for role in roles:
-      mountable_volumes_list = volume_manager.get_instance_storage_for_role(role)
-      ec2_volumes = self._get_ec2_volumes_dict(mountable_volumes_list)
-      all_available = True
-      for volume in ec2_volumes.itervalues():
-        if volume.status != 'available':
-          all_available = False
-          logger.warning("Volume %s is not available.", volume)
-      if not all_available:
-        logger.warning("Some volumes are still in use for role %s.\
-          Aborting delete.", role)
-        return
-      for volume in ec2_volumes.itervalues():
-        volume.delete()
-      volume_manager.remove_instance_storage_for_role(role)
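
As a concrete illustration of the two conventions documented above (per-cluster and per-role security groups, and the client-side storage metadata file), a short sketch with a hypothetical cluster name and role list:

# Illustrative sketch of the naming rules implemented by Ec2Cluster/Ec2Storage above.
cluster_name = "mycluster"          # hypothetical
roles = ["nn", "dn"]                # hypothetical

# _get_group_names: the cluster-wide group plus one group per role.
security_groups = [cluster_name] + ["%s-%s" % (cluster_name, role) for role in roles]
print security_groups               # ['mycluster', 'mycluster-nn', 'mycluster-dn']

# _get_storage_filename: volume metadata kept next to the client configuration.
print "ec2-storage-%s.json" % cluster_name   # ec2-storage-mycluster.json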

+ 0 - 239
src/contrib/cloud/src/py/hadoop/cloud/providers/rackspace.py

@@ -1,239 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import with_statement
-
-import base64
-import os
-import subprocess
-import sys
-import time
-import uuid
-
-from hadoop.cloud.cluster import Cluster
-from hadoop.cloud.cluster import Instance
-from hadoop.cloud.cluster import TimeoutException
-from hadoop.cloud.service import HadoopService
-from hadoop.cloud.service import TASKTRACKER
-from libcloud.drivers.rackspace import RackspaceNodeDriver
-from libcloud.base import Node
-from libcloud.base import NodeImage
-
-RACKSPACE_KEY = os.environ['RACKSPACE_KEY']
-RACKSPACE_SECRET = os.environ['RACKSPACE_SECRET']
-
-STATE_MAP = { 'running': 'ACTIVE' }
-STATE_MAP_REVERSED = dict((v, k) for k, v in STATE_MAP.iteritems())
-
-USER_DATA_FILENAME = "/etc/init.d/rackspace-init.sh"
-
-class RackspaceCluster(Cluster):
-  """
-  A cluster of instances running on Rackspace Cloud Servers. A cluster has a
-  unique name, which is stored under the "cluster" metadata key of each server.
-
-  Every instance in the cluster has one or more roles, stored as a
-  comma-separated string under the "roles" metadata key. For example, an instance
-  with roles "foo" and "bar" has a "foo,bar" "roles" key.
-  
-  At boot time two files are injected into an instance's filesystem: the user
-  data file (which is used as a boot script), and the user's public key.
-  """
-  @staticmethod
-  def get_clusters_with_role(role, state="running", driver=None):
-    driver = driver or RackspaceNodeDriver(RACKSPACE_KEY, RACKSPACE_SECRET)
-    all_nodes = RackspaceCluster._list_nodes(driver)
-    clusters = set()
-    for node in all_nodes:
-      try:
-        if node.extra['metadata'].has_key('cluster') and \
-            role in node.extra['metadata']['roles'].split(','):
-          if node.state == STATE_MAP[state]:
-            clusters.add(node.extra['metadata']['cluster'])
-      except KeyError:
-        pass
-    return clusters
-  
-  @staticmethod
-  def _list_nodes(driver, retries=5):
-    attempts = 0
-    while True:
-      try:
-        return driver.list_nodes()
-      except IOError:
-        attempts = attempts + 1
-        if attempts > retries:
-          raise
-        time.sleep(5)
-
-  def __init__(self, name, config_dir, driver=None):
-    super(RackspaceCluster, self).__init__(name, config_dir)
-    self.driver = driver or RackspaceNodeDriver(RACKSPACE_KEY, RACKSPACE_SECRET)
-
-  def get_provider_code(self):
-    return "rackspace"
-  
-  def _get_nodes(self, state_filter=None):
-    all_nodes = RackspaceCluster._list_nodes(self.driver)
-    nodes = []
-    for node in all_nodes:
-      try:
-        if node.extra['metadata']['cluster'] == self.name:
-          if state_filter == None or node.state == STATE_MAP[state_filter]:
-            nodes.append(node)
-      except KeyError:
-        pass
-    return nodes
-
-  def _to_instance(self, node):
-    return Instance(node.id, node.public_ip[0], node.private_ip[0])
-  
-  def _get_nodes_in_role(self, role, state_filter=None):
-    all_nodes = RackspaceCluster._list_nodes(self.driver)
-    nodes = []
-    for node in all_nodes:
-      try:
-        if node.extra['metadata']['cluster'] == self.name and \
-          role in node.extra['metadata']['roles'].split(','):
-          if state_filter == None or node.state == STATE_MAP[state_filter]:
-            nodes.append(node)
-      except KeyError:
-        pass
-    return nodes
-  
-  def get_instances_in_role(self, role, state_filter=None):
-    """
-    Get all the instances in a role, filtered by state.
-
-    @param role: the name of the role
-    @param state_filter: the state that the instance should be in
-      (e.g. "running"), or None for all states
-    """
-    return [self._to_instance(node) for node in \
-            self._get_nodes_in_role(role, state_filter)]
-
-  def _print_node(self, node, out):
-    out.write("\t".join((node.extra['metadata']['roles'], node.id,
-      node.name,
-      self._ip_list_to_string(node.public_ip),
-      self._ip_list_to_string(node.private_ip),
-      STATE_MAP_REVERSED[node.state])))
-    out.write("\n")
-    
-  def _ip_list_to_string(self, ips):
-    if ips is None:
-      return ""
-    return ",".join(ips)
-
-  def print_status(self, roles=None, state_filter="running", out=sys.stdout):
-    if not roles:
-      for node in self._get_nodes(state_filter):
-        self._print_node(node, out)
-    else:
-      for role in roles:
-        for node in self._get_nodes_in_role(role, state_filter):
-          self._print_node(node, out)
-
-  def launch_instances(self, roles, number, image_id, size_id,
-                       instance_user_data, **kwargs):
-    metadata = {"cluster": self.name, "roles": ",".join(roles)}
-    node_ids = []
-    files = { USER_DATA_FILENAME: instance_user_data.read() }
-    if "public_key" in kwargs:
-      files["/root/.ssh/authorized_keys"] = open(kwargs["public_key"]).read()
-    for dummy in range(number):
-      node = self._launch_instance(roles, image_id, size_id, metadata, files)
-      node_ids.append(node.id)
-    return node_ids
-
-  def _launch_instance(self, roles, image_id, size_id, metadata, files):
-    instance_name = "%s-%s" % (self.name, uuid.uuid4().hex[-8:])
-    node = self.driver.create_node(instance_name, self._find_image(image_id),
-                                   self._find_size(size_id), metadata=metadata,
-                                   files=files)
-    return node
-
-  def _find_image(self, image_id):
-    return NodeImage(id=image_id, name=None, driver=None)
-
-  def _find_size(self, size_id):
-    matches = [i for i in self.driver.list_sizes() if i.id == str(size_id)]
-    if len(matches) != 1:
-      return None
-    return matches[0]
-
-  def wait_for_instances(self, instance_ids, timeout=600):
-    start_time = time.time()
-    while True:
-      if (time.time() - start_time >= timeout):
-        raise TimeoutException()
-      try:
-        if self._all_started(instance_ids):
-          break
-      except Exception:
-        pass
-      sys.stdout.write(".")
-      sys.stdout.flush()
-      time.sleep(1)
-
-  def _all_started(self, node_ids):
-    all_nodes = RackspaceCluster._list_nodes(self.driver)
-    node_id_to_node = {}
-    for node in all_nodes:
-      node_id_to_node[node.id] = node
-    for node_id in node_ids:
-      try:
-        if node_id_to_node[node_id].state != STATE_MAP["running"]:
-          return False
-      except KeyError:
-        return False
-    return True
-
-  def terminate(self):
-    nodes = self._get_nodes("running")
-    print nodes
-    for node in nodes:
-      self.driver.destroy_node(node)
-
-class RackspaceHadoopService(HadoopService):
-    
-  def _update_cluster_membership(self, public_key, private_key):
-    """
-    Creates a cluster-wide hosts file and copies it across the cluster.
-    This is a stopgap until DNS is configured on the cluster.
-    """
-    ssh_options = '-o StrictHostKeyChecking=no'
-
-    time.sleep(30) # wait for SSH daemon to start
-    nodes = self.cluster._get_nodes('running')
-    # create hosts file
-    hosts_file = 'hosts'
-    with open(hosts_file, 'w') as f:
-      f.write("127.0.0.1 localhost localhost.localdomain\n")
-      for node in nodes:
-        f.write(node.public_ip[0] + "\t" + node.name + "\n")
-    # copy to each node in the cluster
-    for node in nodes:
-      self._call('scp -i %s %s %s root@%s:/etc/hosts' \
-                 % (private_key, ssh_options, hosts_file, node.public_ip[0]))
-    os.remove(hosts_file)
-
-  def _call(self, command):
-    print command
-    try:
-      subprocess.call(command, shell=True)
-    except Exception, e:
-      print e
-  
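
The cluster/role metadata convention described in the RackspaceCluster docstring can be illustrated with plain dictionaries; the node data below is hypothetical, standing in for libcloud's Node.extra:

# Sketch of the "cluster" and comma-separated "roles" metadata keys used above.
nodes = [
    {"metadata": {"cluster": "mycluster", "roles": "nn,snn,jt"}},
    {"metadata": {"cluster": "mycluster", "roles": "dn,tt"}},
    {"metadata": {"cluster": "other",     "roles": "zk"}},
]

def in_role(node, cluster, role):
    metadata = node["metadata"]
    return metadata.get("cluster") == cluster and \
           role in metadata.get("roles", "").split(",")

print [n for n in nodes if in_role(n, "mycluster", "tt")]   # only the dn,tt node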

+ 0 - 640
src/contrib/cloud/src/py/hadoop/cloud/service.py

@@ -1,640 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Classes for running services on a cluster.
-"""
-
-from __future__ import with_statement
-
-from hadoop.cloud.cluster import get_cluster
-from hadoop.cloud.cluster import InstanceUserData
-from hadoop.cloud.cluster import TimeoutException
-from hadoop.cloud.providers.ec2 import Ec2Storage
-from hadoop.cloud.util import build_env_string
-from hadoop.cloud.util import url_get
-from hadoop.cloud.util import xstr
-import logging
-import os
-import re
-import socket
-import subprocess
-import sys
-import time
-
-logger = logging.getLogger(__name__)
-
-MASTER = "master"  # Deprecated.
-
-NAMENODE = "nn"
-SECONDARY_NAMENODE = "snn"
-JOBTRACKER = "jt"
-DATANODE = "dn"
-TASKTRACKER = "tt"
-
-class InstanceTemplate(object):
-  """
-  A template for creating server instances in a cluster.
-  """
-  def __init__(self, roles, number, image_id, size_id,
-                     key_name, public_key, private_key,
-                     user_data_file_template=None, placement=None,
-                     user_packages=None, auto_shutdown=None, env_strings=[],
-                     security_groups=[]):
-    self.roles = roles
-    self.number = number
-    self.image_id = image_id
-    self.size_id = size_id
-    self.key_name = key_name
-    self.public_key = public_key
-    self.private_key = private_key
-    self.user_data_file_template = user_data_file_template
-    self.placement = placement
-    self.user_packages = user_packages
-    self.auto_shutdown = auto_shutdown
-    self.env_strings = env_strings
-    self.security_groups = security_groups
-
-  def add_env_strings(self, env_strings):
-    new_env_strings = list(self.env_strings or [])
-    new_env_strings.extend(env_strings)
-    self.env_strings = new_env_strings
-
-
-class Service(object):
-  """
-  A general service that runs on a cluster.
-  """
-  
-  def __init__(self, cluster):
-    self.cluster = cluster
-    
-  def get_service_code(self):
-    """
-    The code that uniquely identifies the service.
-    """
-    raise Exception("Unimplemented")
-    
-  def list_all(self, provider):
-    """
-    Find and print all clusters running this type of service.
-    """
-    raise Exception("Unimplemented")
-
-  def list(self):
-    """
-    Find and print all the instances running in this cluster.
-    """
-    raise Exception("Unimplemented")
-  
-  def launch_master(self, instance_template, config_dir, client_cidr):
-    """
-    Launch a "master" instance.
-    """
-    raise Exception("Unimplemented")
-  
-  def launch_slaves(self, instance_template):
-    """
-    Launch "slave" instance.
-    """
-    raise Exception("Unimplemented")
-  
-  def launch_cluster(self, instance_templates, config_dir, client_cidr):
-    """
-    Launch a cluster of instances.
-    """
-    raise Exception("Unimplemented")
-  
-  def terminate_cluster(self,  force=False):
-    self.cluster.print_status()
-    if not force and not self._prompt("Terminate all instances?"):
-      print "Not terminating cluster."
-    else:
-      print "Terminating cluster"
-      self.cluster.terminate()
-      
-  def delete_cluster(self):
-    self.cluster.delete()
-    
-  def create_formatted_snapshot(self, size, availability_zone,
-                                image_id, key_name, ssh_options):
-    Ec2Storage.create_formatted_snapshot(self.cluster, size,
-                                         availability_zone,
-                                         image_id,
-                                         key_name,
-                                         ssh_options)
-
-  def list_storage(self):
-    storage = self.cluster.get_storage()
-    storage.print_status()
-
-  def create_storage(self, role, number_of_instances,
-                     availability_zone, spec_file):
-    storage = self.cluster.get_storage()
-    storage.create(role, number_of_instances, availability_zone, spec_file)
-    storage.print_status()
-    
-  def attach_storage(self, role):
-    storage = self.cluster.get_storage()
-    storage.attach(role, self.cluster.get_instances_in_role(role, 'running'))
-    storage.print_status()
-    
-  def delete_storage(self, force=False):
-    storage = self.cluster.get_storage()
-    storage.print_status()
-    if not force and not self._prompt("Delete all storage volumes? THIS WILL \
-      PERMANENTLY DELETE ALL DATA"):
-      print "Not deleting storage volumes."
-    else:
-      print "Deleting storage"
-      for role in storage.get_roles():
-        storage.delete(role)
-  
-  def login(self, ssh_options):
-    raise Exception("Unimplemented")
-    
-  def proxy(self, ssh_options):
-    raise Exception("Unimplemented")
-    
-  def push(self, ssh_options, file):
-    raise Exception("Unimplemented")
-    
-  def execute(self, ssh_options, args):
-    raise Exception("Unimplemented")
-  
-  def update_slaves_file(self, config_dir, ssh_options, private_key):
-    raise Exception("Unimplemented")
-  
-  def _prompt(self, prompt):
-    """ Returns true if user responds "yes" to prompt. """
-    return raw_input("%s [yes or no]: " % prompt).lower() == "yes"
-
-  def _call(self, command):
-    print command
-    try:
-      subprocess.call(command, shell=True)
-    except Exception, e:
-      print e
-
-  def _get_default_user_data_file_template(self):
-    data_path = os.path.join(os.path.dirname(__file__), 'data')
-    return os.path.join(data_path, '%s-%s-init-remote.sh' %
-                 (self.get_service_code(), self.cluster.get_provider_code()))
-
-  def _launch_instances(self, instance_template):
-    it = instance_template
-    user_data_file_template = it.user_data_file_template
-    if it.user_data_file_template == None:
-      user_data_file_template = self._get_default_user_data_file_template()
-    ebs_mappings = ''
-    storage = self.cluster.get_storage()
-    for role in it.roles:
-      if storage.has_any_storage((role,)):
-        ebs_mappings = storage.get_mappings_string_for_role(role)
-    replacements = { "%ENV%": build_env_string(it.env_strings, {
-      "ROLES": ",".join(it.roles),
-      "USER_PACKAGES": it.user_packages,
-      "AUTO_SHUTDOWN": it.auto_shutdown,
-      "EBS_MAPPINGS": ebs_mappings,
-    }) }
-    instance_user_data = InstanceUserData(user_data_file_template, replacements)
-    instance_ids = self.cluster.launch_instances(it.roles, it.number, it.image_id,
-                                            it.size_id,
-                                            instance_user_data,
-                                            key_name=it.key_name,
-                                            public_key=it.public_key,
-                                            placement=it.placement)
-    print "Waiting for %s instances in role %s to start" % \
-      (it.number, ",".join(it.roles))
-    try:
-      self.cluster.wait_for_instances(instance_ids)
-      print "%s instances started" % ",".join(it.roles)
-    except TimeoutException:
-      print "Timeout while waiting for %s instance to start." % ",".join(it.roles)
-      return
-    print
-    self.cluster.print_status(it.roles[0])
-    return self.cluster.get_instances_in_role(it.roles[0], "running")
-
-  
-class HadoopService(Service):
-  """
-  A HDFS and MapReduce service.
-  """
-  
-  def __init__(self, cluster):
-    super(HadoopService, self).__init__(cluster)
-    
-  def get_service_code(self):
-    return "hadoop"
-    
-  def list_all(self, provider):
-    """
-    Find and print clusters that have a running namenode instance.
-    """
-    legacy_clusters = get_cluster(provider).get_clusters_with_role(MASTER)
-    clusters = list(get_cluster(provider).get_clusters_with_role(NAMENODE))
-    clusters.extend(legacy_clusters)
-    if not clusters:
-      print "No running clusters"
-    else:
-      for cluster in clusters:
-        print cluster
-    
-  def list(self):
-    self.cluster.print_status()
-
-  def launch_master(self, instance_template, config_dir, client_cidr):
-    if self.cluster.check_running(NAMENODE, 0) == False:
-      return  # don't proceed if another master is running
-    self.launch_cluster((instance_template,), config_dir, client_cidr)
-  
-  def launch_slaves(self, instance_template):
-    instances = self.cluster.check_running(NAMENODE, 1)
-    if not instances:
-      return
-    master = instances[0]
-    for role in (NAMENODE, SECONDARY_NAMENODE, JOBTRACKER): 
-      singleton_host_env = "%s_HOST=%s" % \
-              (self._sanitize_role_name(role), master.public_ip)
-      instance_template.add_env_strings((singleton_host_env,))
-    self._launch_instances(instance_template)              
-    self._attach_storage(instance_template.roles)
-    self._print_master_url()
-      
-  def launch_cluster(self, instance_templates, config_dir, client_cidr):
-    number_of_tasktrackers = 0
-    roles = []
-    for it in instance_templates:
-      roles.extend(it.roles)
-      if TASKTRACKER in it.roles:
-        number_of_tasktrackers += it.number
-    self._launch_cluster_instances(instance_templates)
-    self._create_client_hadoop_site_file(config_dir)
-    self._authorize_client_ports(client_cidr)
-    self._attach_storage(roles)
-    self._update_cluster_membership(instance_templates[0].public_key,
-                                    instance_templates[0].private_key)
-    try:
-      self._wait_for_hadoop(number_of_tasktrackers)
-    except TimeoutException:
-      print "Timeout while waiting for Hadoop to start. Please check logs on" +\
-        " cluster."
-    self._print_master_url()
-    
-  def login(self, ssh_options):
-    master = self._get_master()
-    if not master:
-      sys.exit(1)
-    subprocess.call('ssh %s root@%s' % \
-                    (xstr(ssh_options), master.public_ip),
-                    shell=True)
-    
-  def proxy(self, ssh_options):
-    master = self._get_master()
-    if not master:
-      sys.exit(1)
-    options = '-o "ConnectTimeout 10" -o "ServerAliveInterval 60" ' \
-              '-N -D 6666'
-    process = subprocess.Popen('ssh %s %s root@%s' %
-      (xstr(ssh_options), options, master.public_ip),
-      stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
-      shell=True)
-    print """export HADOOP_CLOUD_PROXY_PID=%s;
-echo Proxy pid %s;""" % (process.pid, process.pid)
-    
-  def push(self, ssh_options, file):
-    master = self._get_master()
-    if not master:
-      sys.exit(1)
-    subprocess.call('scp %s -r %s root@%s:' % (xstr(ssh_options),
-                                               file, master.public_ip),
-                                               shell=True)
-    
-  def execute(self, ssh_options, args):
-    master = self._get_master()
-    if not master:
-      sys.exit(1)
-    subprocess.call("ssh %s root@%s '%s'" % (xstr(ssh_options),
-                                             master.public_ip,
-                                             " ".join(args)), shell=True)
-  
-  def update_slaves_file(self, config_dir, ssh_options, private_key):
-    instances = self.cluster.check_running(NAMENODE, 1)
-    if not instances:
-      sys.exit(1)
-    master = instances[0]
-    slaves = self.cluster.get_instances_in_role(DATANODE, "running")
-    cluster_dir = os.path.join(config_dir, self.cluster.name)
-    slaves_file = os.path.join(cluster_dir, 'slaves')
-    with open(slaves_file, 'w') as f:
-      for slave in slaves:
-        f.write(slave.public_ip + "\n")
-    subprocess.call('scp %s -r %s root@%s:/etc/hadoop/conf' % \
-                    (ssh_options, slaves_file, master.public_ip), shell=True)
-    # Copy private key
-    subprocess.call('scp %s -r %s root@%s:/root/.ssh/id_rsa' % \
-                    (ssh_options, private_key, master.public_ip), shell=True)
-    for slave in slaves:
-      subprocess.call('scp %s -r %s root@%s:/root/.ssh/id_rsa' % \
-                      (ssh_options, private_key, slave.public_ip), shell=True)
-        
-  def _get_master(self):
-    # For split namenode/jobtracker, designate the namenode as the master
-    return self._get_namenode()
-
-  def _get_namenode(self):
-    instances = self.cluster.get_instances_in_role(NAMENODE, "running")
-    if not instances:
-      return None
-    return instances[0]
-
-  def _get_jobtracker(self):
-    instances = self.cluster.get_instances_in_role(JOBTRACKER, "running")
-    if not instances:
-      return None
-    return instances[0]
-
-  def _launch_cluster_instances(self, instance_templates):
-    singleton_hosts = []
-    for instance_template in instance_templates:
-      instance_template.add_env_strings(singleton_hosts)
-      instances = self._launch_instances(instance_template)
-      if instance_template.number == 1:
-        if len(instances) != 1:
-          logger.error("Expected a single '%s' instance, but found %s.",
-                       "".join(instance_template.roles), len(instances))
-          return
-        else:
-          for role in instance_template.roles:
-            singleton_host_env = "%s_HOST=%s" % \
-              (self._sanitize_role_name(role),
-               instances[0].public_ip)
-            singleton_hosts.append(singleton_host_env)
-
-  def _sanitize_role_name(self, role):
-    """Replace characters in role name with ones allowed in bash variable names"""
-    return role.replace('+', '_').upper()
-
-  def _authorize_client_ports(self, client_cidrs=[]):
-    if not client_cidrs:
-      logger.debug("No client CIDRs specified, using local address.")
-      client_ip = url_get('http://checkip.amazonaws.com/').strip()
-      client_cidrs = ("%s/32" % client_ip,)
-    logger.debug("Client CIDRs: %s", client_cidrs)
-    namenode = self._get_namenode()
-    jobtracker = self._get_jobtracker()
-    for client_cidr in client_cidrs:
-      # Allow access to port 80 on namenode from client
-      self.cluster.authorize_role(NAMENODE, 80, 80, client_cidr)
-      # Allow access to jobtracker UI on master from client
-      # (so we can see when the cluster is ready)
-      self.cluster.authorize_role(JOBTRACKER, 50030, 50030, client_cidr)
-    # Allow access to namenode and jobtracker via public address from each other
-    namenode_ip = socket.gethostbyname(namenode.public_ip)
-    jobtracker_ip = socket.gethostbyname(jobtracker.public_ip)
-    self.cluster.authorize_role(NAMENODE, 8020, 8020, "%s/32" % namenode_ip)
-    self.cluster.authorize_role(NAMENODE, 8020, 8020, "%s/32" % jobtracker_ip)
-    self.cluster.authorize_role(JOBTRACKER, 8021, 8021, "%s/32" % namenode_ip)
-    self.cluster.authorize_role(JOBTRACKER, 8021, 8021,
-                                "%s/32" % jobtracker_ip)
-  
-  def _create_client_hadoop_site_file(self, config_dir):
-    namenode = self._get_namenode()
-    jobtracker = self._get_jobtracker()
-    cluster_dir = os.path.join(config_dir, self.cluster.name)
-    aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID') or ''
-    aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY') or ''
-    if not os.path.exists(cluster_dir):
-      os.makedirs(cluster_dir)
-    with open(os.path.join(cluster_dir, 'hadoop-site.xml'), 'w') as f:
-      f.write("""<?xml version="1.0"?>
-  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-  <!-- Put site-specific property overrides in this file. -->
-  <configuration>
-  <property>
-    <name>hadoop.job.ugi</name>
-    <value>root,root</value>
-  </property>
-  <property>
-    <name>fs.default.name</name>
-    <value>hdfs://%(namenode)s:8020/</value>
-  </property>
-  <property>
-    <name>mapred.job.tracker</name>
-    <value>%(jobtracker)s:8021</value>
-  </property>
-  <property>
-    <name>hadoop.socks.server</name>
-    <value>localhost:6666</value>
-  </property>
-  <property>
-    <name>hadoop.rpc.socket.factory.class.default</name>
-    <value>org.apache.hadoop.net.SocksSocketFactory</value>
-  </property>
-  <property>
-    <name>fs.s3.awsAccessKeyId</name>
-    <value>%(aws_access_key_id)s</value>
-  </property>
-  <property>
-    <name>fs.s3.awsSecretAccessKey</name>
-    <value>%(aws_secret_access_key)s</value>
-  </property>
-  <property>
-    <name>fs.s3n.awsAccessKeyId</name>
-    <value>%(aws_access_key_id)s</value>
-  </property>
-  <property>
-    <name>fs.s3n.awsSecretAccessKey</name>
-    <value>%(aws_secret_access_key)s</value>
-  </property>
-  </configuration>
-  """ % {'namenode': namenode.public_ip,
-    'jobtracker': jobtracker.public_ip,
-    'aws_access_key_id': aws_access_key_id,
-    'aws_secret_access_key': aws_secret_access_key})        
-
-  def _wait_for_hadoop(self, number, timeout=600):
-    start_time = time.time()
-    jobtracker = self._get_jobtracker()
-    if not jobtracker:
-      return
-    print "Waiting for jobtracker to start"
-    previous_running = 0
-    while True:
-      if (time.time() - start_time >= timeout):
-        raise TimeoutException()
-      try:
-        actual_running = self._number_of_tasktrackers(jobtracker.public_ip, 1)
-        break
-      except IOError:
-        pass
-      sys.stdout.write(".")
-      sys.stdout.flush()
-      time.sleep(1)
-    print
-    if number > 0:
-      print "Waiting for %d tasktrackers to start" % number
-      while actual_running < number:
-        if (time.time() - start_time >= timeout):
-          raise TimeoutException()
-        try:
-          actual_running = self._number_of_tasktrackers(jobtracker.public_ip, 5, 2)
-          if actual_running != previous_running:
-            sys.stdout.write("%d" % actual_running)
-          sys.stdout.write(".")
-          sys.stdout.flush()
-          time.sleep(1)
-          previous_running = actual_running
-        except IOError:
-          pass
-      print
-
-  # The optional ?type=active is a difference between Hadoop 0.18 and 0.20
-  _NUMBER_OF_TASK_TRACKERS = re.compile(
-    r'<a href="machines.jsp(?:\?type=active)?">(\d+)</a>')
-  
-  def _number_of_tasktrackers(self, jt_hostname, timeout, retries=0):
-    jt_page = url_get("http://%s:50030/jobtracker.jsp" % jt_hostname, timeout,
-                      retries)
-    m = self._NUMBER_OF_TASK_TRACKERS.search(jt_page)
-    if m:
-      return int(m.group(1))
-    return 0
-
-  def _print_master_url(self):
-    webserver = self._get_jobtracker()
-    if not webserver:
-      return
-    print "Browse the cluster at http://%s/" % webserver.public_ip
-
-  def _attach_storage(self, roles):
-    storage = self.cluster.get_storage()
-    if storage.has_any_storage(roles):
-      print "Waiting 10 seconds before attaching storage"
-      time.sleep(10)
-      for role in roles:
-        storage.attach(role, self.cluster.get_instances_in_role(role, 'running'))
-      storage.print_status(roles)
-      
-  def _update_cluster_membership(self, public_key, private_key):
-    pass
-
-
-class ZooKeeperService(Service):
-  """
-  A ZooKeeper service.
-  """
-
-  ZOOKEEPER_ROLE = "zk"
-
-  def __init__(self, cluster):
-    super(ZooKeeperService, self).__init__(cluster)
-    
-  def get_service_code(self):
-    return "zookeeper"
-
-  def launch_cluster(self, instance_templates, config_dir, client_cidr):
-    self._launch_cluster_instances(instance_templates)
-    self._authorize_client_ports(client_cidr)
-    self._update_cluster_membership(instance_templates[0].public_key)
-    
-  def _launch_cluster_instances(self, instance_templates):
-    for instance_template in instance_templates:
-      instances = self._launch_instances(instance_template)
-
-  def _authorize_client_ports(self, client_cidrs=[]):
-    if not client_cidrs:
-      logger.debug("No client CIDRs specified, using local address.")
-      client_ip = url_get('http://checkip.amazonaws.com/').strip()
-      client_cidrs = ("%s/32" % client_ip,)
-    logger.debug("Client CIDRs: %s", client_cidrs)
-    for client_cidr in client_cidrs:
-      self.cluster.authorize_role(self.ZOOKEEPER_ROLE, 2181, 2181, client_cidr)
-  
-  def _update_cluster_membership(self, public_key):
-    time.sleep(30) # wait for SSH daemon to start
-    
-    ssh_options = '-o StrictHostKeyChecking=no'
-    private_key = public_key[:-4] # TODO: pass in private key explicitly
-
-    instances = self.cluster.get_instances_in_role(self.ZOOKEEPER_ROLE,
-                                                   'running')
-    config_file = 'zoo.cfg'
-    with open(config_file, 'w') as f:
-      f.write("""# The number of milliseconds of each tick
-tickTime=2000
-# The number of ticks that the initial
-# synchronization phase can take
-initLimit=10
-# The number of ticks that can pass between
-# sending a request and getting an acknowledgement
-syncLimit=5
-# The directory where the snapshot is stored.
-dataDir=/var/log/zookeeper/txlog
-# The port at which the clients will connect
-clientPort=2181
-# The servers in the ensemble
-""")
-      counter = 1
-      for i in instances:
-        f.write("server.%s=%s:2888:3888\n" % (counter, i.private_ip))
-        counter += 1
-    # copy to each node in the cluster
-    myid_file = 'myid'
-    counter = 1
-    for i in instances:
-      self._call('scp -i %s %s %s root@%s:/etc/zookeeper/conf/zoo.cfg' \
-                 % (private_key, ssh_options, config_file, i.public_ip))
-      with open(myid_file, 'w') as f:
-        f.write(str(counter) + "\n")
-      self._call('scp -i %s %s %s root@%s:/var/log/zookeeper/txlog/myid' \
-                 % (private_key, ssh_options, myid_file, i.public_ip))
-      counter += 1
-    os.remove(config_file)
-    os.remove(myid_file)
-
-    # start the zookeeper servers
-    for i in instances:
-      self._call('ssh -i %s %s root@%s nohup /etc/rc.local &' \
-                 % (private_key, ssh_options, i.public_ip))
-      
-    hosts_string = ",".join(["%s:2181" % i.public_ip for i in instances]) 
-    print "ZooKeeper cluster: %s" % hosts_string
-
-SERVICE_PROVIDER_MAP = {
-  "hadoop": {
-     "rackspace": ('hadoop.cloud.providers.rackspace', 'RackspaceHadoopService')
-  },
-  "zookeeper": {
-    # "provider_code": ('hadoop.cloud.providers.provider_code', 'ProviderZooKeeperService')
-  },
-}
-
-DEFAULT_SERVICE_PROVIDER_MAP = {
-  "hadoop": HadoopService,
-  "zookeeper": ZooKeeperService
-}
-
-def get_service(service, provider):
-  """
-  Retrieve the Service class for a service and provider.
-  """
-  try:
-    mod_name, service_classname = SERVICE_PROVIDER_MAP[service][provider]
-    _mod = __import__(mod_name, globals(), locals(), [service_classname])
-    return getattr(_mod, service_classname)
-  except KeyError:
-    return DEFAULT_SERVICE_PROVIDER_MAP[service]
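
The "<ROLE>_HOST=<address>" strings that launch_slaves and _launch_cluster_instances feed to add_env_strings follow the _sanitize_role_name rule above; a small sketch, with a hypothetical master address:

# Sketch of the singleton-host environment strings built by HadoopService above.
def sanitize_role_name(role):
    # same rule as HadoopService._sanitize_role_name
    return role.replace('+', '_').upper()

master_address = "203.0.113.10"     # hypothetical public address
for role in ("nn", "snn", "jt"):
    print "%s_HOST=%s" % (sanitize_role_name(role), master_address)
# NN_HOST=203.0.113.10
# SNN_HOST=203.0.113.10
# JT_HOST=203.0.113.10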

+ 0 - 173
src/contrib/cloud/src/py/hadoop/cloud/storage.py

@@ -1,173 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Classes for controlling external cluster storage.
-"""
-
-import logging
-import simplejson as json
-
-logger = logging.getLogger(__name__)
-
-class VolumeSpec(object):
-  """
-  The specification for a storage volume, encapsulating all the information
-  needed to create a volume and ultimately mount it on an instance.
-  """
-  def __init__(self, size, mount_point, device, snapshot_id):
-    self.size = size
-    self.mount_point = mount_point
-    self.device = device
-    self.snapshot_id = snapshot_id
-
-
-class JsonVolumeSpecManager(object):
-  """
-  A container for VolumeSpecs. This object can read VolumeSpecs specified in
-  JSON.
-  """
-  def __init__(self, spec_file):
-    self.spec = json.load(spec_file)
-
-  def volume_specs_for_role(self, role):
-    return [VolumeSpec(d["size_gb"], d["mount_point"], d["device"],
-                       d["snapshot_id"]) for d in self.spec[role]]
-
-  def get_mappings_string_for_role(self, role):
-    """
-    Returns a short string of the form
-    "mount_point1,device1;mount_point2,device2;..."
-    which is useful for passing as an environment variable.
-    """
-    return ";".join(["%s,%s" % (d["mount_point"], d["device"])
-                     for d in self.spec[role]])
-
-
-class MountableVolume(object):
-  """
-  A storage volume that has been created. It may or may not have been attached
-  or mounted to an instance.
-  """
-  def __init__(self, volume_id, mount_point, device):
-    self.volume_id = volume_id
-    self.mount_point = mount_point
-    self.device = device
-
-
-class JsonVolumeManager(object):
-
-  def __init__(self, filename):
-    self.filename = filename
-
-  def _load(self):
-    try:
-      return json.load(open(self.filename, "r"))
-    except IOError:
-      logger.debug("File %s does not exist.", self.filename)
-      return {}
-
-  def _store(self, obj):
-    return json.dump(obj, open(self.filename, "w"), sort_keys=True, indent=2)
-  
-  def get_roles(self):
-    json_dict = self._load()
-    return json_dict.keys()
-
-  def add_instance_storage_for_role(self, role, mountable_volumes):
-    json_dict = self._load()
-    mv_dicts = [mv.__dict__ for mv in mountable_volumes]
-    json_dict.setdefault(role, []).append(mv_dicts)
-    self._store(json_dict)
-
-  def remove_instance_storage_for_role(self, role):
-    json_dict = self._load()
-    del json_dict[role]
-    self._store(json_dict)
-
-  def get_instance_storage_for_role(self, role):
-    """
-    Returns a list of lists of MountableVolume objects. Each nested list is
-    the storage for one instance.
-    """
-    try:
-      json_dict = self._load()
-      instance_storage = []
-      for instance in json_dict[role]:
-        vols = []
-        for vol in instance:
-          vols.append(MountableVolume(vol["volume_id"], vol["mount_point"],
-                                      vol["device"]))
-        instance_storage.append(vols)
-      return instance_storage
-    except KeyError:
-      return []
-
-class Storage(object):
-  """
-  Storage volumes for a cluster. The storage is associated with a named
-  cluster. Many clusters just have local storage, in which case this is
-  not used.
-  """
-
-  def __init__(self, cluster):
-    self.cluster = cluster
-
-  def create(self, role, number_of_instances, availability_zone, spec_filename):
-    """
-    Create new storage volumes for instances with the given role, according to
-    the mapping defined in the spec file.
-    """
-    pass
-
-  def get_mappings_string_for_role(self, role):
-    """
-    Returns a short string of the form
-    "mount_point1,device1;mount_point2,device2;..."
-    which is useful for passing as an environment variable.
-    """
-    raise Exception("Unimplemented")
-
-  def has_any_storage(self, roles):
-    """
-    Return True if any of the given roles has associated storage
-    """
-    return False
-
-  def get_roles(self):
-    """
-    Return a list of roles that have storage defined.
-    """
-    return []
-
-  def print_status(self, roles=None):
-    """
-    Print the status of storage volumes for the given roles.
-    """
-    pass
-
-  def attach(self, role, instances):
-    """
-    Attach volumes for a role to instances. Some volumes may already be
-    attached, in which case they are ignored, and we take care not to attach
-    multiple volumes to an instance.
-    """
-    pass
-
-  def delete(self, roles=[]):
-    """
-    Permanently delete all the storage for the given roles.
-    """
-    pass
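
A minimal example of the JSON volume specification that JsonVolumeSpecManager reads, using the keys expected by volume_specs_for_role above; the size, mount point, device and snapshot ID are hypothetical, and the import assumes the module as it stood before this removal:

# Sketch only: feeds a hypothetical spec to JsonVolumeSpecManager.
from StringIO import StringIO
from hadoop.cloud.storage import JsonVolumeSpecManager

spec_json = """{
  "dn": [
    {"size_gb": "100", "mount_point": "/ebs1", "device": "/dev/sdj",
     "snapshot_id": "snap-00000000"}
  ]
}"""

manager = JsonVolumeSpecManager(StringIO(spec_json))
spec = manager.volume_specs_for_role("dn")[0]
print spec.size, spec.mount_point, spec.device, spec.snapshot_id
print manager.get_mappings_string_for_role("dn")   # /ebs1,/dev/sdj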

+ 0 - 84
src/contrib/cloud/src/py/hadoop/cloud/util.py

@@ -1,84 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Utility functions.
-"""
-
-import ConfigParser
-import socket
-import urllib2
-
-def bash_quote(text):
-  """Quotes a string for bash, by using single quotes."""
-  if text == None:
-    return ""
-  return "'%s'" % text.replace("'", "'\\''")
-
-def bash_quote_env(env):
-  """Quotes the value in an environment variable assignment."""
-  if env.find("=") == -1:
-    return env
-  (var, value) = env.split("=")
-  return "%s=%s" % (var, bash_quote(value))
-
-def build_env_string(env_strings=[], pairs={}):
-  """Build a bash environment variable assignment"""
-  env = ''
-  if env_strings:
-    for env_string in env_strings:
-      env += "%s " % bash_quote_env(env_string)
-  if pairs:
-    for key, val in pairs.items():
-      env += "%s=%s " % (key, bash_quote(val))
-  return env[:-1]
-
-def merge_config_with_options(section_name, config, options):
-  """
-  Merge configuration options with a dictionary of options.
-  Keys in the options dictionary take precedence.
-  """
-  res = {}
-  try:
-    for (key, value) in config.items(section_name):
-      if value.find("\n") != -1:
-        res[key] = value.split("\n")
-      else:
-        res[key] = value
-  except ConfigParser.NoSectionError:
-    pass
-  for key in options:
-    if options[key] != None:
-      res[key] = options[key]
-  return res
-
-def url_get(url, timeout=10, retries=0):
-  """
-  Retrieve content from the given URL.
-  """
-  # in Python 2.6 we can pass timeout to urllib2.urlopen
-  socket.setdefaulttimeout(timeout)
-  attempts = 0
-  while True:
-    try:
-      return urllib2.urlopen(url).read()
-    except urllib2.URLError:
-      attempts = attempts + 1
-      if attempts > retries:
-        raise
-
-def xstr(string):
-  """Sane string conversion: return an empty string if string is None."""
-  return '' if string is None else str(string)
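
A short sketch of how the helpers in the removed util.py compose; the section name, keys and values are made up for illustration.

import ConfigParser

from hadoop.cloud.util import build_env_string, merge_config_with_options, xstr

# build_env_string merges raw "VAR=value" strings with key/value pairs and
# single-quotes the values for bash.
print build_env_string(env_strings=["SIZE=100"], pairs={"ZONE": "us-east-1a"})
# -> SIZE='100' ZONE='us-east-1a'

# merge_config_with_options overlays command-line options on a config section;
# options whose value is None do not override the config file.
config = ConfigParser.ConfigParser()
config.add_section("my-cluster")
config.set("my-cluster", "image_id", "ami-00000000")
options = {"image_id": None, "key_name": "my-key"}
merged = merge_config_with_options("my-cluster", config, options)
# merged == {"image_id": "ami-00000000", "key_name": "my-key"}

# xstr turns None into an empty string instead of the literal "None".
print "[%s]" % xstr(None)   # -> []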

+ 0 - 30
src/contrib/cloud/src/py/setup.py

@@ -1,30 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from distutils.core import setup
-
-version = __import__('hadoop.cloud').cloud.VERSION
-
-setup(name='hadoop-cloud',
-      version=version,
-      description='Scripts for running Hadoop on cloud providers',
-      license = 'Apache License (2.0)',
-      url = 'http://hadoop.apache.org/common/',
-      packages=['hadoop', 'hadoop.cloud','hadoop.cloud.providers'],
-      package_data={'hadoop.cloud': ['data/*.sh']},
-      scripts=['hadoop-ec2'],
-      author = 'Apache Hadoop Contributors',
-      author_email = 'common-dev@hadoop.apache.org',
-)

+ 0 - 37
src/contrib/cloud/src/test/py/testcluster.py

@@ -1,37 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import unittest
-
-from hadoop.cloud.cluster import RoleSyntaxException
-from hadoop.cloud.providers.ec2 import Ec2Cluster
-
-class TestCluster(unittest.TestCase):
-
-  def test_group_name_for_role(self):
-    cluster = Ec2Cluster("test-cluster", None)
-    self.assertEqual("test-cluster-foo", cluster._group_name_for_role("foo"))
-
-  def test_check_role_name_valid(self):
-    cluster = Ec2Cluster("test-cluster", None)
-    cluster._check_role_name(
-      "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_+")
-
-  def test_check_role_name_dash_is_invalid(self):
-    cluster = Ec2Cluster("test-cluster", None)
-    self.assertRaises(RoleSyntaxException, cluster._check_role_name, "a-b")
-
-if __name__ == '__main__':
-  unittest.main()

+ 0 - 74
src/contrib/cloud/src/test/py/testrackspace.py

@@ -1,74 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import StringIO
-import unittest
-
-from hadoop.cloud.providers.rackspace import RackspaceCluster
-
-class TestCluster(unittest.TestCase):
-
-  class DriverStub(object):
-    def list_nodes(self):
-      class NodeStub(object):
-        def __init__(self, name, metadata):
-          self.id = name
-          self.name = name
-          self.state = 'ACTIVE'
-          self.public_ip = ['100.0.0.1']
-          self.private_ip = ['10.0.0.1']
-          self.extra = { 'metadata': metadata }
-      return [NodeStub('random_instance', {}),
-              NodeStub('cluster1-nj-000', {'cluster': 'cluster1', 'roles': 'nn,jt'}),
-              NodeStub('cluster1-dt-000', {'cluster': 'cluster1', 'roles': 'dn,tt'}),
-              NodeStub('cluster1-dt-001', {'cluster': 'cluster1', 'roles': 'dn,tt'}),
-              NodeStub('cluster2-dt-000', {'cluster': 'cluster2', 'roles': 'dn,tt'}),
-              NodeStub('cluster3-nj-000', {'cluster': 'cluster3', 'roles': 'nn,jt'})]
-
-  def test_get_clusters_with_role(self):
-    self.assertEqual(set(['cluster1', 'cluster2']),
-      RackspaceCluster.get_clusters_with_role('dn', 'running',
-                                           TestCluster.DriverStub()))
-    
-  def test_get_instances_in_role(self):
-    cluster = RackspaceCluster('cluster1', None, TestCluster.DriverStub())
-    
-    instances = cluster.get_instances_in_role('nn')
-    self.assertEquals(1, len(instances))
-    self.assertEquals('cluster1-nj-000', instances[0].id)
-
-    instances = cluster.get_instances_in_role('tt')
-    self.assertEquals(2, len(instances))
-    self.assertEquals(set(['cluster1-dt-000', 'cluster1-dt-001']),
-                      set([i.id for i in instances]))
-    
-  def test_print_status(self):
-    cluster = RackspaceCluster('cluster1', None, TestCluster.DriverStub())
-    
-    out = StringIO.StringIO()
-    cluster.print_status(None, "running", out)
-    self.assertEquals("""nn,jt cluster1-nj-000 cluster1-nj-000 100.0.0.1 10.0.0.1 running
-dn,tt cluster1-dt-000 cluster1-dt-000 100.0.0.1 10.0.0.1 running
-dn,tt cluster1-dt-001 cluster1-dt-001 100.0.0.1 10.0.0.1 running
-""", out.getvalue().replace("\t", " "))
-
-    out = StringIO.StringIO()
-    cluster.print_status(["dn"], "running", out)
-    self.assertEquals("""dn,tt cluster1-dt-000 cluster1-dt-000 100.0.0.1 10.0.0.1 running
-dn,tt cluster1-dt-001 cluster1-dt-001 100.0.0.1 10.0.0.1 running
-""", out.getvalue().replace("\t", " "))
-
-if __name__ == '__main__':
-  unittest.main()

+ 0 - 143
src/contrib/cloud/src/test/py/teststorage.py

@@ -1,143 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-import unittest
-
-import simplejson as json
-from StringIO import StringIO
-
-from hadoop.cloud.storage import MountableVolume
-from hadoop.cloud.storage import JsonVolumeManager
-from hadoop.cloud.storage import JsonVolumeSpecManager
-
-spec = {
- "master": ({"size_gb":"8", "mount_point":"/", "device":"/dev/sdj",
-             "snapshot_id": "snap_1"},
-            ),
- "slave": ({"size_gb":"8", "mount_point":"/", "device":"/dev/sdj",
-            "snapshot_id": "snap_2"},
-           {"size_gb":"10", "mount_point":"/data1", "device":"/dev/sdk",
-            "snapshot_id": "snap_3"},
-           )
- }
-
-class TestJsonVolumeSpecManager(unittest.TestCase):
-
-  def test_volume_specs_for_role(self):
-
-    input = StringIO(json.dumps(spec))
-
-    volume_spec_manager = JsonVolumeSpecManager(input)
-
-    master_specs = volume_spec_manager.volume_specs_for_role("master")
-    self.assertEqual(1, len(master_specs))
-    self.assertEqual("/", master_specs[0].mount_point)
-    self.assertEqual("8", master_specs[0].size)
-    self.assertEqual("/dev/sdj", master_specs[0].device)
-    self.assertEqual("snap_1", master_specs[0].snapshot_id)
-
-    slave_specs = volume_spec_manager.volume_specs_for_role("slave")
-    self.assertEqual(2, len(slave_specs))
-    self.assertEqual("snap_2", slave_specs[0].snapshot_id)
-    self.assertEqual("snap_3", slave_specs[1].snapshot_id)
-
-    self.assertRaises(KeyError, volume_spec_manager.volume_specs_for_role,
-                      "no-such-role")
-
-  def test_get_mappings_string_for_role(self):
-
-    input = StringIO(json.dumps(spec))
-
-    volume_spec_manager = JsonVolumeSpecManager(input)
-
-    master_mappings = volume_spec_manager.get_mappings_string_for_role("master")
-    self.assertEqual("/,/dev/sdj", master_mappings)
-
-    slave_mappings = volume_spec_manager.get_mappings_string_for_role("slave")
-    self.assertEqual("/,/dev/sdj;/data1,/dev/sdk", slave_mappings)
-
-    self.assertRaises(KeyError,
-                      volume_spec_manager.get_mappings_string_for_role,
-                      "no-such-role")
-
-class TestJsonVolumeManager(unittest.TestCase):
-
-  def tearDown(self):
-    try:
-      os.remove("volumemanagertest.json")
-    except OSError:
-      pass
-    
-  def test_add_instance_storage_for_role(self):
-    volume_manager = JsonVolumeManager("volumemanagertest.json")
-    self.assertEqual(0,
-      len(volume_manager.get_instance_storage_for_role("master")))
-    self.assertEqual(0, len(volume_manager.get_roles()))
-
-    volume_manager.add_instance_storage_for_role("master",
-                                                 [MountableVolume("vol_1", "/",
-                                                                  "/dev/sdj")])
-    master_storage = volume_manager.get_instance_storage_for_role("master")
-    self.assertEqual(1, len(master_storage))
-    master_storage_instance0 = master_storage[0]
-    self.assertEqual(1, len(master_storage_instance0))
-    master_storage_instance0_vol0 = master_storage_instance0[0]
-    self.assertEqual("vol_1", master_storage_instance0_vol0.volume_id)
-    self.assertEqual("/", master_storage_instance0_vol0.mount_point)
-    self.assertEqual("/dev/sdj", master_storage_instance0_vol0.device)
-
-    volume_manager.add_instance_storage_for_role("slave",
-                                                 [MountableVolume("vol_2", "/",
-                                                                  "/dev/sdj")])
-    self.assertEqual(1,
-      len(volume_manager.get_instance_storage_for_role("master")))
-    slave_storage = volume_manager.get_instance_storage_for_role("slave")
-    self.assertEqual(1, len(slave_storage))
-    slave_storage_instance0 = slave_storage[0]
-    self.assertEqual(1, len(slave_storage_instance0))
-    slave_storage_instance0_vol0 = slave_storage_instance0[0]
-    self.assertEqual("vol_2", slave_storage_instance0_vol0.volume_id)
-    self.assertEqual("/", slave_storage_instance0_vol0.mount_point)
-    self.assertEqual("/dev/sdj", slave_storage_instance0_vol0.device)
-
-    volume_manager.add_instance_storage_for_role("slave",
-      [MountableVolume("vol_3", "/", "/dev/sdj"),
-       MountableVolume("vol_4", "/data1", "/dev/sdk")])
-    self.assertEqual(1,
-      len(volume_manager.get_instance_storage_for_role("master")))
-    slave_storage = volume_manager.get_instance_storage_for_role("slave")
-    self.assertEqual(2, len(slave_storage))
-    slave_storage_instance0 = slave_storage[0]
-    slave_storage_instance1 = slave_storage[1]
-    self.assertEqual(1, len(slave_storage_instance0))
-    self.assertEqual(2, len(slave_storage_instance1))
-    slave_storage_instance1_vol0 = slave_storage_instance1[0]
-    slave_storage_instance1_vol1 = slave_storage_instance1[1]
-    self.assertEqual("vol_3", slave_storage_instance1_vol0.volume_id)
-    self.assertEqual("/", slave_storage_instance1_vol0.mount_point)
-    self.assertEqual("/dev/sdj", slave_storage_instance1_vol0.device)
-    self.assertEqual("vol_4", slave_storage_instance1_vol1.volume_id)
-    self.assertEqual("/data1", slave_storage_instance1_vol1.mount_point)
-    self.assertEqual("/dev/sdk", slave_storage_instance1_vol1.device)
-    
-    roles = volume_manager.get_roles()
-    self.assertEqual(2, len(roles))
-    self.assertTrue("slave" in roles)
-    self.assertTrue("master" in roles)
-
-
-if __name__ == '__main__':
-  unittest.main()
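
The spec dict at the top of this test mirrors the JSON volume spec file that JsonVolumeSpecManager reads. A minimal sketch of the same round trip outside the test harness, using only calls exercised above (snapshot id, device and size are illustrative):

from StringIO import StringIO

import simplejson as json

from hadoop.cloud.storage import JsonVolumeSpecManager

# One 10 GB volume per "slave" instance, restored from a snapshot.
spec = {"slave": ({"size_gb": "10", "mount_point": "/data1",
                   "device": "/dev/sdk", "snapshot_id": "snap-00000000"},)}

spec_manager = JsonVolumeSpecManager(StringIO(json.dumps(spec)))
print spec_manager.volume_specs_for_role("slave")[0].snapshot_id  # snap-00000000
print spec_manager.get_mappings_string_for_role("slave")          # /data1,/dev/sdk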

+ 0 - 44
src/contrib/cloud/src/test/py/testuserdata.py

@@ -1,44 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import tempfile
-import unittest
-
-from hadoop.cloud.cluster import InstanceUserData
-
-class TestInstanceUserData(unittest.TestCase):
-
-  def test_replacement(self):
-    file = tempfile.NamedTemporaryFile()
-    file.write("Contents go here")
-    file.flush()
-    self.assertEqual("Contents go here",
-                     InstanceUserData(file.name, {}).read())
-    self.assertEqual("Contents were here",
-                     InstanceUserData(file.name, { "go": "were"}).read())
-    self.assertEqual("Contents  here",
-                     InstanceUserData(file.name, { "go": None}).read())
-    file.close()
-
-  def test_read_file_url(self):
-    file = tempfile.NamedTemporaryFile()
-    file.write("Contents go here")
-    file.flush()
-    self.assertEqual("Contents go here",
-                     InstanceUserData("file://%s" % file.name, {}).read())
-    file.close()
-
-if __name__ == '__main__':
-  unittest.main()

+ 0 - 81
src/contrib/cloud/src/test/py/testutil.py

@@ -1,81 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import ConfigParser
-import StringIO
-import unittest
-
-from hadoop.cloud.util import bash_quote
-from hadoop.cloud.util import bash_quote_env
-from hadoop.cloud.util import build_env_string
-from hadoop.cloud.util import merge_config_with_options
-from hadoop.cloud.util import xstr
-
-class TestUtilFunctions(unittest.TestCase):
-
-  def test_bash_quote(self):
-    self.assertEqual("", bash_quote(None))
-    self.assertEqual("''", bash_quote(""))
-    self.assertEqual("'a'", bash_quote("a"))
-    self.assertEqual("'a b'", bash_quote("a b"))
-    self.assertEqual("'a\b'", bash_quote("a\b"))
-    self.assertEqual("'a '\\'' b'", bash_quote("a ' b"))
-
-  def test_bash_quote_env(self):
-    self.assertEqual("", bash_quote_env(""))
-    self.assertEqual("a", bash_quote_env("a"))
-    self.assertEqual("a='b'", bash_quote_env("a=b"))
-    self.assertEqual("a='b c'", bash_quote_env("a=b c"))
-    self.assertEqual("a='b\c'", bash_quote_env("a=b\c"))
-    self.assertEqual("a='b '\\'' c'", bash_quote_env("a=b ' c"))
-
-  def test_build_env_string(self):
-    self.assertEqual("", build_env_string())
-    self.assertEqual("a='b' c='d'",
-                     build_env_string(env_strings=["a=b", "c=d"]))
-    self.assertEqual("a='b' c='d'",
-                     build_env_string(pairs={"a": "b", "c": "d"}))
-
-  def test_merge_config_with_options(self):
-    options = { "a": "b" }
-    config = ConfigParser.ConfigParser()
-    self.assertEqual({ "a": "b" },
-                     merge_config_with_options("section", config, options))
-    config.add_section("section")
-    self.assertEqual({ "a": "b" },
-                     merge_config_with_options("section", config, options))
-    config.set("section", "a", "z")
-    config.set("section", "c", "d")
-    self.assertEqual({ "a": "z", "c": "d" },
-                     merge_config_with_options("section", config, {}))
-    self.assertEqual({ "a": "b", "c": "d" },
-                     merge_config_with_options("section", config, options))
-
-  def test_merge_config_with_options_list(self):
-    config = ConfigParser.ConfigParser()
-    config.readfp(StringIO.StringIO("""[section]
-env1=a=b
- c=d
-env2=e=f
- g=h"""))
-    self.assertEqual({ "env1": ["a=b", "c=d"], "env2": ["e=f", "g=h"] },
-                     merge_config_with_options("section", config, {}))
-
-  def test_xstr(self):
-    self.assertEqual("", xstr(None))
-    self.assertEqual("a", xstr("a"))
-
-if __name__ == '__main__':
-  unittest.main()

+ 0 - 46
src/contrib/cloud/tools/rackspace/remote-setup.sh

@@ -1,46 +0,0 @@
-#!/bin/bash -x
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-#
-# Given an Ubuntu base system install, install the base packages we need.
-#
-
-# We require multiverse to be enabled.
-cat >> /etc/apt/sources.list << EOF
-deb http://us.archive.ubuntu.com/ubuntu/ intrepid multiverse
-deb-src http://us.archive.ubuntu.com/ubuntu/ intrepid multiverse
-deb http://us.archive.ubuntu.com/ubuntu/ intrepid-updates multiverse
-deb-src http://us.archive.ubuntu.com/ubuntu/ intrepid-updates multiverse
-EOF
-
-apt-get update
-
-# Install Java
-apt-get -y install sun-java6-jdk
-echo "export JAVA_HOME=/usr/lib/jvm/java-6-sun" >> /etc/profile
-export JAVA_HOME=/usr/lib/jvm/java-6-sun
-java -version
-
-# Install general packages
-apt-get -y install vim curl screen ssh rsync unzip openssh-server
-apt-get -y install policykit # http://www.bergek.com/2008/11/24/ubuntu-810-libpolkit-error/
-
-# Create root's .ssh directory if it doesn't exist
-mkdir -p /root/.ssh
-
-# Run any rackspace init script injected at boot time
-echo '[ -f /etc/init.d/rackspace-init.sh ] && /bin/sh /etc/init.d/rackspace-init.sh; exit 0' > /etc/rc.local