
Merge -r 951613:951614 from trunk to branch-0.21. Fixes: HADOOP-6780

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21@951616 13f79535-47bb-0310-9956-ffa450edef68
Thomas White, 15 years ago
parent
commit 5382255687
33 changed files with 0 additions and 5061 deletions
  1. CHANGES.txt  +0 -19
  2. src/contrib/cloud/README.txt  +0 -497
  3. src/contrib/cloud/build.xml  +0 -45
  4. src/contrib/cloud/lib/pyAntTasks-1.3-LICENSE.txt  +0 -202
  5. src/contrib/cloud/lib/pyAntTasks-1.3.jar  (binary)
  6. src/contrib/cloud/src/integration-test/create-ebs-snapshot.sh  +0 -52
  7. src/contrib/cloud/src/integration-test/ebs-storage-spec.json  +0 -30
  8. src/contrib/cloud/src/integration-test/persistent-cluster.sh  +0 -122
  9. src/contrib/cloud/src/integration-test/transient-cluster.sh  +0 -112
  10. src/contrib/cloud/src/py/hadoop-cloud  +0 -21
  11. src/contrib/cloud/src/py/hadoop-ec2  +0 -21
  12. src/contrib/cloud/src/py/hadoop/__init__.py  +0 -14
  13. src/contrib/cloud/src/py/hadoop/cloud/__init__.py  +0 -15
  14. src/contrib/cloud/src/py/hadoop/cloud/cli.py  +0 -438
  15. src/contrib/cloud/src/py/hadoop/cloud/cluster.py  +0 -187
  16. src/contrib/cloud/src/py/hadoop/cloud/data/boot-rackspace.sh  +0 -459
  17. src/contrib/cloud/src/py/hadoop/cloud/data/hadoop-ec2-init-remote.sh  +0 -548
  18. src/contrib/cloud/src/py/hadoop/cloud/data/hadoop-rackspace-init-remote.sh  +0 -22
  19. src/contrib/cloud/src/py/hadoop/cloud/data/zookeeper-ec2-init-remote.sh  +0 -112
  20. src/contrib/cloud/src/py/hadoop/cloud/providers/__init__.py  +0 -14
  21. src/contrib/cloud/src/py/hadoop/cloud/providers/dummy.py  +0 -61
  22. src/contrib/cloud/src/py/hadoop/cloud/providers/ec2.py  +0 -479
  23. src/contrib/cloud/src/py/hadoop/cloud/providers/rackspace.py  +0 -239
  24. src/contrib/cloud/src/py/hadoop/cloud/service.py  +0 -640
  25. src/contrib/cloud/src/py/hadoop/cloud/storage.py  +0 -173
  26. src/contrib/cloud/src/py/hadoop/cloud/util.py  +0 -84
  27. src/contrib/cloud/src/py/setup.py  +0 -30
  28. src/contrib/cloud/src/test/py/testcluster.py  +0 -37
  29. src/contrib/cloud/src/test/py/testrackspace.py  +0 -74
  30. src/contrib/cloud/src/test/py/teststorage.py  +0 -143
  31. src/contrib/cloud/src/test/py/testuserdata.py  +0 -44
  32. src/contrib/cloud/src/test/py/testutil.py  +0 -81
  33. src/contrib/cloud/tools/rackspace/remote-setup.sh  +0 -46

+ 0 - 19
CHANGES.txt

@@ -236,15 +236,10 @@ Release 0.21.0 - Unreleased
     and the init of the class is made to take a Configuration argument.
     (Jakob Homan via ddas)
 
-    HADOOP-6108. Add support for EBS storage on EC2. (tomwhite)
-
     Hadoop-6223. Add new file system interface AbstractFileSystem with
     implementation of some file systems that delegate to old FileSystem.
     (Sanjay Radia via suresh)
 
-    HADOOP-6392. Run namenode and jobtracker on separate EC2 instances.
-    (tomwhite)
-
     HADOOP-6433. Introduce asychronous deletion of files via a pool of
     threads. This can be used to delete files in the Distributed
     Cache. (Zheng Shao via dhruba)
@@ -252,13 +247,9 @@ Release 0.21.0 - Unreleased
     HADOOP-6415. Adds a common token interface for both job token and
     delegation token. (Kan Zhang via ddas)
 
-    HADOOP-6466. Add a ZooKeeper service to the cloud scripts. (tomwhite)
-
     HADOOP-6408. Add a /conf servlet to dump running configuration.
     (Todd Lipcon via tomwhite)
 
-    HADOOP-6464. Write a Rackspace cloud provider. (tomwhite)
-
     HADOOP-6520. Adds APIs to read/write Token and secret keys. Also
     adds the automatic loading of tokens into UserGroupInformation
     upon login. The tokens are read from a file specified in the
@@ -737,15 +728,8 @@ Release 0.21.0 - Unreleased
     HADOOP-6394. Add a helper class to simplify FileContext related tests and
     improve code reusability. (Jitendra Nath Pandey via suresh)
 
-    HADOOP-6426. Create ant build for running EC2 unit tests. (tomwhite)
-
     HADOOP-4656. Add a user to groups mapping service. (boryas, acmurthy)
 
-    HADOOP-6444. Support additional security group option in hadoop-ec2 script.
-    (Paul Egan via tomwhite)
-
-    HADOOP-6454. Create setup.py for EC2 cloud scripts. (tomwhite)
-
     HADOOP-6435. Make RPC.waitForProxy with timeout public. (Steve Loughran
     via tomwhite)
 
@@ -1500,9 +1484,6 @@ Release 0.21.0 - Unreleased
     HADOOP-6640. FileSystem.get() does RPC retries within a static
     synchronized block. (hairong)
 
-    HADOOP-6680. hadoop-cloud push command invokes proxy creation.
-    (Andrew Klochkov via tomwhite)
-
     HADOOP-6691. TestFileSystemCaching sometimes hangs. (hairong)
 
     HADOOP-6507. Hadoop Common Docs - delete 3 doc files that do not belong

+ 0 - 497
src/contrib/cloud/README.txt

@@ -1,497 +0,0 @@
-Hadoop Cloud Scripts
-====================
-
-These scripts allow you to run Hadoop on cloud providers. These instructions
-assume you are running on Amazon EC2, the differences for other providers are
-noted at the end of this document.
-
-Getting Started
-===============
-
-First, unpack the scripts on your system. For convenience, you may like to put
-the top-level directory on your path.
-
-You'll also need python (version 2.5 or newer) and the boto and simplejson
-libraries. After you download boto and simplejson, you can install each in turn
-by running the following in the directory where you unpacked the distribution:
-
-% sudo python setup.py install
-
-Alternatively, you might like to use the python-boto and python-simplejson RPM
-and Debian packages.
-
-You need to tell the scripts your AWS credentials. The simplest way to do this
-is to set the environment variables (but see
-http://code.google.com/p/boto/wiki/BotoConfig for other options):
-
-    * AWS_ACCESS_KEY_ID - Your AWS Access Key ID
-    * AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key
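
If you want to check that boto can actually see these credentials before
involving the scripts, a quick test from a Python prompt looks roughly like
this (a sketch, not part of the original scripts; it assumes boto is already
installed and that you are targeting EC2):

import boto

# connect_ec2() picks the credentials up from AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY (or from a boto config file, per the link above).
conn = boto.connect_ec2()
print conn.get_all_zones()   # should print the EC2 availability zones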
-
-To configure the scripts, create a directory called .hadoop-cloud (note the
-leading ".") in your home directory. In it, create a file called
-clusters.cfg with a section for each cluster you want to control. e.g.:
-
-[my-hadoop-cluster]
-image_id=ami-6159bf08
-instance_type=c1.medium
-key_name=tom
-availability_zone=us-east-1c
-private_key=PATH_TO_PRIVATE_KEY
-ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
-
-The image chosen here is one with a i386 Fedora OS. For a list of suitable AMIs
-see http://wiki.apache.org/hadoop/AmazonEC2.
-
-The architecture must be compatible with the instance type. For m1.small and
-c1.medium instances use the i386 AMIs, while for m1.large, m1.xlarge, and
-c1.xlarge instances use the x86_64 AMIs. One of the high CPU instances
-(c1.medium or c1.xlarge) is recommended.
-
-Then you can run the hadoop-ec2 script. It will display usage instructions when
-invoked without arguments.
-
-You can test that it can connect to AWS by typing:
-
-% hadoop-ec2 list
-
-LAUNCHING A CLUSTER
-===================
-
-To launch a cluster called "my-hadoop-cluster" with 10 worker (slave) nodes
-type:
-
-% hadoop-ec2 launch-cluster my-hadoop-cluster 10
-
-This will boot the master node and 10 worker nodes. The master node runs the
-namenode, secondary namenode, and jobtracker, and each worker node runs a
-datanode and a tasktracker. Equivalently the cluster could be launched as:
-
-% hadoop-ec2 launch-cluster my-hadoop-cluster 1 nn,snn,jt 10 dn,tt
-
-Note that using this notation you can launch a split namenode/jobtracker cluster:
-
-% hadoop-ec2 launch-cluster my-hadoop-cluster 1 nn,snn 1 jt 10 dn,tt
-
-When the nodes have started and the Hadoop cluster has come up, the console will
-display a message like
-
-  Browse the cluster at http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com/
-
-You can access Hadoop's web UI by visiting this URL. By default, port 80 is
-opened for access from your client machine. You may change the firewall settings
-(to allow access from a network, rather than just a single machine, for example)
-by using the Amazon EC2 command line tools, or by using a tool like ElasticFox.
-There is a security group for each node's role. The one for the namenode
-is <cluster-name>-nn, for example.
-
-For security reasons, traffic from the network your client is running on is
-proxied through the master node of the cluster using an SSH tunnel (a SOCKS
-proxy on port 6666). To set up the proxy run the following command:
-
-% hadoop-ec2 proxy my-hadoop-cluster
-
-Web browsers need to be configured to use this proxy too, so you can view pages
-served by worker nodes in the cluster. The most convenient way to do this is to
-use a proxy auto-config (PAC) file, such as this one:
-
-  http://apache-hadoop-ec2.s3.amazonaws.com/proxy.pac
-
-If you are using Firefox, then you may find
-FoxyProxy useful for managing PAC files. (If you use FoxyProxy, then you need to
-get it to use the proxy for DNS lookups. To do this, go to Tools -> FoxyProxy ->
-Options, and then under "Miscellaneous" in the bottom left, choose "Use SOCKS
-proxy for DNS lookups".)
-
-PERSISTENT CLUSTERS
-===================
-
-Hadoop clusters running on EC2 that use local EC2 storage (the default) will not
-retain data once the cluster has been terminated. It is possible to use EBS for
-persistent data, which allows a cluster to be shut down while it is not being
-used.
-
-Note: EBS support is a Beta feature.
-
-First create a new section called "my-ebs-cluster" in the
-.hadoop-cloud/clusters.cfg file.
-
-Now we need to create storage for the new cluster. Create a temporary EBS volume
-of size 100GiB, format it, and save it as a snapshot in S3. This way, we only
-have to do the formatting once.
-
-% hadoop-ec2 create-formatted-snapshot my-ebs-cluster 100
-
-We create storage for a single namenode and for two datanodes. The volumes to
-create are described in a JSON spec file, which references the snapshot we just
-created. Here are the contents of a JSON file called
-my-ebs-cluster-storage-spec.json:
-
-{
-  "nn": [
-    {
-      "device": "/dev/sdj",
-      "mount_point": "/ebs1",
-      "size_gb": "100",
-      "snapshot_id": "snap-268e704f"
-    },
-    {
-      "device": "/dev/sdk",
-      "mount_point": "/ebs2",
-      "size_gb": "100",
-      "snapshot_id": "snap-268e704f"
-    }
-  ],
-  "dn": [
-    {
-      "device": "/dev/sdj",
-      "mount_point": "/ebs1",
-      "size_gb": "100",
-      "snapshot_id": "snap-268e704f"
-    },
-    {
-      "device": "/dev/sdk",
-      "mount_point": "/ebs2",
-      "size_gb": "100",
-      "snapshot_id": "snap-268e704f"
-    }
-  ]
-}
-
-
-Each role (here "nn" and "dn") is the key to an array of volume
-specifications. In this example, each role has two devices ("/dev/sdj" and
-"/dev/sdk") with different mount points, each generated from an EBS snapshot.
-The snapshot is the formatted snapshot created earlier, so that the volumes we
-create are pre-formatted. The size of the volumes must match the size of the
-snapshot created earlier.
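
Since the spec file is plain JSON, you can sanity-check it with the simplejson
library the scripts already depend on. A rough sketch (not part of the original
scripts) that prints each volume to be created:

from __future__ import with_statement  # the scripts target Python 2.5
import simplejson

with open("my-ebs-cluster-storage-spec.json") as f:
    spec = simplejson.load(f)

for role, volumes in spec.items():
    for vol in volumes:
        print "%s: %s mounted at %s, %s GiB from %s" % (
            role, vol["device"], vol["mount_point"],
            vol["size_gb"], vol["snapshot_id"])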
-
-Let's create actual volumes using this file.
-
-% hadoop-ec2 create-storage my-ebs-cluster nn 1 \
-    my-ebs-cluster-storage-spec.json
-% hadoop-ec2 create-storage my-ebs-cluster dn 2 \
-    my-ebs-cluster-storage-spec.json
-
-Now let's start the cluster with 2 slave nodes:
-
-% hadoop-ec2 launch-cluster my-ebs-cluster 2
-
-Log in and run a job which creates some output.
-
-% hadoop-ec2 login my-ebs-cluster
-
-# hadoop fs -mkdir input
-# hadoop fs -put /etc/hadoop/conf/*.xml input
-# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output \
-    'dfs[a-z.]+'
-
-Look at the output:
-
-# hadoop fs -cat output/part-00000 | head
-
-Now let's shut down the cluster.
-
-% hadoop-ec2 terminate-cluster my-ebs-cluster
-
-A little while later we restart the cluster and log in.
-
-% hadoop-ec2 launch-cluster my-ebs-cluster 2
-% hadoop-ec2 login my-ebs-cluster
-
-The output from the job we ran before should still be there:
-
-# hadoop fs -cat output/part-00000 | head
-
-RUNNING JOBS
-============
-
-When you launched the cluster, a hadoop-site.xml file was created in the
-directory ~/.hadoop-cloud/<cluster-name>. You can use this to connect to the
-cluster by setting the HADOOP_CONF_DIR environment variable (it is also possible
-to set the configuration file to use by passing it as a -conf option to Hadoop
-Tools):
-
-% export HADOOP_CONF_DIR=~/.hadoop-cloud/my-hadoop-cluster
-
-Let's try browsing HDFS:
-
-% hadoop fs -ls /
-
-Running a job is straightforward:
-
-% hadoop fs -mkdir input # create an input directory
-% hadoop fs -put $HADOOP_HOME/LICENSE.txt input # copy a file there
-% hadoop jar $HADOOP_HOME/hadoop-*-examples.jar wordcount input output
-% hadoop fs -cat output/part-00000 | head
-
-Of course, these examples assume that you have installed Hadoop on your local
-machine. It is also possible to launch jobs from within the cluster. First log
-into the namenode:
-
-% hadoop-ec2 login my-hadoop-cluster
-
-Then run a job as before:
-
-# hadoop fs -mkdir input
-# hadoop fs -put /etc/hadoop/conf/*.xml input
-# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
-# hadoop fs -cat output/part-00000 | head
-
-TERMINATING A CLUSTER
-=====================
-
-When you've finished with your cluster you can stop it with the following
-command.
-
-NOTE: ALL DATA WILL BE LOST UNLESS YOU ARE USING EBS!
-
-% hadoop-ec2 terminate-cluster my-hadoop-cluster
-
-You can then delete the EC2 security groups with:
-
-% hadoop-ec2 delete-cluster my-hadoop-cluster
-
-AUTOMATIC CLUSTER SHUTDOWN
-==========================
-
-You may use the --auto-shutdown option to automatically terminate a cluster a
-given number of minutes after launch. This is useful for short-lived clusters
-where the jobs complete in a known amount of time.
-
-If you want to cancel the automatic shutdown, then run the following (the first
-command cancels the pending shutdown on the master, and the last runs the same
-cancellation on every slave):
-
-% hadoop-ec2 exec my-hadoop-cluster shutdown -c
-% hadoop-ec2 update-slaves-file my-hadoop-cluster
-% hadoop-ec2 exec my-hadoop-cluster /usr/lib/hadoop/bin/slaves.sh shutdown -c
-
-CONFIGURATION NOTES
-===================
-
-It is possible to specify options on the command line: these take precedence
-over any specified in the configuration file. For example:
-
-% hadoop-ec2 launch-cluster --image-id ami-2359bf4a --instance-type c1.xlarge \
-  my-hadoop-cluster 10
-
-This command launches a 10-node cluster using the specified image and instance
-type, overriding the equivalent settings (if any) that are in the
-"my-hadoop-cluster" section of the configuration file. Note that words in
-options are separated by hyphens (--instance-type) while words in the
-corresponding configuration parameter are separated by underscores
-(instance_type).
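
The underscore form is simply the raw option name from clusters.cfg, which the
scripts read with Python's standard ConfigParser module. A minimal sketch of
inspecting a section (illustrative only; the real merging of options and
configuration happens inside the scripts):

import os
import ConfigParser  # Python 2 name of the module

config = ConfigParser.ConfigParser()
config.read(os.path.join(os.environ["HOME"], ".hadoop-cloud", "clusters.cfg"))
# Option names come back with underscores, e.g. instance_type, image_id.
print dict(config.items("my-hadoop-cluster"))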
-
-The scripts install Hadoop RPMs or Debian packages (depending on the OS) at
-instance boot time.
-
-By default, Apache Hadoop 0.20.1 is installed. You can also run other versions
-of Apache Hadoop. For example the following uses version 0.18.3:
-
-% hadoop-ec2 launch-cluster --env HADOOP_VERSION=0.18.3 \
-  my-hadoop-cluster 10
-
-CUSTOMIZATION
-=============
-
-You can specify a list of packages to install on every instance at boot time
-using the --user-packages command-line option (or the user_packages
-configuration parameter). Packages should be space-separated. Note that package
-names should reflect the package manager being used to install them (yum or
-apt-get depending on the OS).
-
-Here's an example that installs RPMs for R and git:
-
-% hadoop-ec2 launch-cluster --user-packages 'R git-core' my-hadoop-cluster 10
-
-You have full control over the script that is run when each instance boots. The
-default script, hadoop-ec2-init-remote.sh, may be used as a starting point to
-add extra configuration or customization of the instance. Make a copy of the
-script in your home directory, or somewhere similar, and set the
---user-data-file command-line option (or the user_data_file configuration
-parameter) to point to the (modified) copy. hadoop-ec2 will replace "%ENV%" in
-your user data script with USER_PACKAGES, AUTO_SHUTDOWN, and EBS_MAPPINGS, as
-well as extra parameters supplied using the --env command-line flag.
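
The effect of the %ENV% substitution is roughly the following (an illustrative
sketch with made-up values and file names, not the scripts' actual code):

# Values taken from the command line and configuration file (examples only).
env = {"USER_PACKAGES": "R git-core", "AUTO_SHUTDOWN": "60", "EBS_MAPPINGS": ""}
env_string = " ".join("%s=%s" % (k, v) for k, v in env.items())

# my-init-script.sh stands in for your copy of hadoop-ec2-init-remote.sh.
template = open("my-init-script.sh").read()
user_data = template.replace("%ENV%", env_string)
# user_data is then gzip compressed and passed to the instance at launch time.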
-
-Another way of customizing the instance, which may be more appropriate for
-larger changes, is to create your own image.
-
-It's possible to use any image, as long as it i) runs (gzip compressed) user
-data on boot, and ii) has Java installed.
-
-OTHER SERVICES
-==============
-
-ZooKeeper
-=========
-
-You can run ZooKeeper by setting the "service" parameter to "zookeeper". For
-example:
-
-[my-zookeeper-cluster]
-service=zookeeper
-ami=ami-ed59bf84
-instance_type=m1.small
-key_name=tom
-availability_zone=us-east-1c
-public_key=PATH_TO_PUBLIC_KEY
-private_key=PATH_TO_PRIVATE_KEY
-
-Then to launch a three-node ZooKeeper ensemble, run:
-
-% ./hadoop-ec2 launch-cluster my-zookeeper-cluster 3 zk
-
-PROVIDER-SPECIFIC DETAILS
-=========================
-
-Rackspace
-=========
-
-Running on Rackspace is very similar to running on EC2, with a few minor
-differences noted here.
-
-Security Warning
-================
-
-Currently, Hadoop clusters on Rackspace are insecure since they don't run behind
-a firewall.
-
-Creating an image
-=================
-
-Rackspace doesn't support shared images, so you will need to build your own base
-image to get started. See "Instructions for creating an image" at the end of
-this document for details.
-
-Installation
-============
-
-To run on Rackspace you need to install libcloud by checking out the latest
-source from Apache:
-
-git clone git://git.apache.org/libcloud.git
-cd libcloud; python setup.py install
-
-Set up your Rackspace credentials by exporting the following environment
-variables:
-
-    * RACKSPACE_KEY - Your Rackspace user name
-    * RACKSPACE_SECRET - Your Rackspace API key
-    
-Configuration
-=============
-
-The cloud_provider parameter must be set to specify Rackspace as the provider.
-Here is a typical configuration:
-
-[my-rackspace-cluster]
-cloud_provider=rackspace
-image_id=200152
-instance_type=4
-public_key=/path/to/public/key/file
-private_key=/path/to/private/key/file
-ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
-
-It's a good idea to create a dedicated key using a command similar to:
-
-ssh-keygen -f id_rsa_rackspace -P ''
-
-Launching a cluster
-===================
-
-Use the "hadoop-cloud" command instead of "hadoop-ec2".
-
-After launching a cluster you need to manually add a hostname mapping for the
-master node to your client's /etc/hosts to get it to work. This is because DNS
-isn't set up for the cluster nodes so your client won't resolve their addresses.
-You can do this with
-
-hadoop-cloud list my-rackspace-cluster | grep 'nn,snn,jt' \
- | awk '{print $4 " " $3 }'  | sudo tee -a /etc/hosts
-
-Instructions for creating an image
-==================================
-
-First set your Rackspace credentials:
-
-export RACKSPACE_KEY=<Your Rackspace user name>
-export RACKSPACE_SECRET=<Your Rackspace API key>
-
-Now create an authentication token for the session, and retrieve the server
-management URL to perform operations against.
-
-# Final SED is to remove trailing ^M
-AUTH_TOKEN=`curl -D - -H X-Auth-User:$RACKSPACE_KEY \
-  -H X-Auth-Key:$RACKSPACE_SECRET https://auth.api.rackspacecloud.com/v1.0 \
-  | grep 'X-Auth-Token:' | awk '{print $2}' | sed 's/.$//'`
-SERVER_MANAGEMENT_URL=`curl -D - -H X-Auth-User:$RACKSPACE_KEY \
-  -H X-Auth-Key:$RACKSPACE_SECRET https://auth.api.rackspacecloud.com/v1.0 \
-  | grep 'X-Server-Management-Url:' | awk '{print $2}' | sed 's/.$//'`
-
-echo $AUTH_TOKEN
-echo $SERVER_MANAGEMENT_URL
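
If you prefer to do this from Python rather than with curl and sed, an
equivalent sketch (untested against the live API) is:

import os
import urllib2

request = urllib2.Request("https://auth.api.rackspacecloud.com/v1.0", headers={
    "X-Auth-User": os.environ["RACKSPACE_KEY"],
    "X-Auth-Key": os.environ["RACKSPACE_SECRET"],
})
response = urllib2.urlopen(request)
headers = response.info()
# The two headers correspond to AUTH_TOKEN and SERVER_MANAGEMENT_URL above.
print headers["X-Auth-Token"]
print headers["X-Server-Management-Url"]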
-
-You can get a list of images with the following
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/images
-
-Here's the same query, but with pretty-printed XML output:
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/images.xml | xmllint --format -
-
-There are similar queries for flavors and running instances:
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/flavors.xml | xmllint --format -
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/servers.xml | xmllint --format -
-
-The following command will create a new server. In this case it will create a
-2GB Ubuntu 8.10 instance, as determined by the imageId and flavorId attributes.
-The name of the instance is set to something meaningful too.
-
-curl -v -X POST -H X-Auth-Token:$AUTH_TOKEN -H 'Content-type: text/xml' -d @- $SERVER_MANAGEMENT_URL/servers << EOF
-<server xmlns="http://docs.rackspacecloud.com/servers/api/v1.0" name="apache-hadoop-ubuntu-8.10-base" imageId="11" flavorId="4">
-  <metadata/>
-</server>
-EOF
-
-Make a note of the new server's ID, public IP address and admin password as you
-will need these later.
-
-You can check the status of the server with
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/servers/$SERVER_ID.xml | xmllint --format -
-
-When it has started (status "ACTIVE"), copy the setup script over:
-
-scp tools/rackspace/remote-setup.sh root@$SERVER:remote-setup.sh
-
-Log in to the server and run the setup script (you will need to manually
-accept the Sun Java license):
-
-sh remote-setup.sh
-
-Once the script has completed, log out and create an image of the running
-instance (giving it a memorable name):
-
-curl -v -X POST -H X-Auth-Token:$AUTH_TOKEN -H 'Content-type: text/xml' -d @- $SERVER_MANAGEMENT_URL/images << EOF
-<image xmlns="http://docs.rackspacecloud.com/servers/api/v1.0" name="Apache Hadoop Ubuntu 8.10" serverId="$SERVER_ID" />
-EOF
-
-Keep a note of the image ID as this is what you will use to launch fresh
-instances from.
-
-You can check the status of the image with
-
-curl -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/images/$IMAGE_ID.xml | xmllint --format -
-
-When it's "ACTIVE" it is ready for use. It's important to realize that you have
-to keep the server from which you generated the image running for as long as the
-image is in use.
-
-However, if you want to clean up an old instance run:
-
-curl -X DELETE -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/servers/$SERVER_ID
-
-Similarly, you can delete old images:
-
-curl -X DELETE -H X-Auth-Token:$AUTH_TOKEN $SERVER_MANAGEMENT_URL/images/$IMAGE_ID
-
-

+ 0 - 45
src/contrib/cloud/build.xml

@@ -1,45 +0,0 @@
-<?xml version="1.0"?>
-
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--->
-
-<project name="hadoop-cloud" default="test-py">
-  <property name="lib.dir" value="${basedir}/lib"/>
-  <path id="java.classpath">
-    <fileset dir="${lib.dir}">
-      <include name="**/*.jar" />
-    </fileset>
-  </path>
-  <path id="test.py.path">
-    <pathelement location="${basedir}/src/py"/>
-    <pathelement location="${basedir}/src/test/py"/>
-  </path>
-  <target name="test-py" description="Run python unit tests">
-    <taskdef name="py-test" classname="org.pyant.tasks.PythonTestTask">
-      <classpath refid="java.classpath" />
-    </taskdef>
-    <py-test python="python" pythonpathref="test.py.path" >
-      <fileset dir="${basedir}/src/test/py">
-        <include name="*.py"/>
-      </fileset>
-    </py-test>
-  </target>
-  <target name="compile"/>
-  <target name="package"/>
-  <target name="test" depends="test-py"/>
-  <target name="clean"/>
-</project>

+ 0 - 202
src/contrib/cloud/lib/pyAntTasks-1.3-LICENSE.txt

@@ -1,202 +0,0 @@
-
-                                 Apache License
-                           Version 2.0, January 2004
-                        http://www.apache.org/licenses/
-
-   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
-   1. Definitions.
-
-      "License" shall mean the terms and conditions for use, reproduction,
-      and distribution as defined by Sections 1 through 9 of this document.
-
-      "Licensor" shall mean the copyright owner or entity authorized by
-      the copyright owner that is granting the License.
-
-      "Legal Entity" shall mean the union of the acting entity and all
-      other entities that control, are controlled by, or are under common
-      control with that entity. For the purposes of this definition,
-      "control" means (i) the power, direct or indirect, to cause the
-      direction or management of such entity, whether by contract or
-      otherwise, or (ii) ownership of fifty percent (50%) or more of the
-      outstanding shares, or (iii) beneficial ownership of such entity.
-
-      "You" (or "Your") shall mean an individual or Legal Entity
-      exercising permissions granted by this License.
-
-      "Source" form shall mean the preferred form for making modifications,
-      including but not limited to software source code, documentation
-      source, and configuration files.
-
-      "Object" form shall mean any form resulting from mechanical
-      transformation or translation of a Source form, including but
-      not limited to compiled object code, generated documentation,
-      and conversions to other media types.
-
-      "Work" shall mean the work of authorship, whether in Source or
-      Object form, made available under the License, as indicated by a
-      copyright notice that is included in or attached to the work
-      (an example is provided in the Appendix below).
-
-      "Derivative Works" shall mean any work, whether in Source or Object
-      form, that is based on (or derived from) the Work and for which the
-      editorial revisions, annotations, elaborations, or other modifications
-      represent, as a whole, an original work of authorship. For the purposes
-      of this License, Derivative Works shall not include works that remain
-      separable from, or merely link (or bind by name) to the interfaces of,
-      the Work and Derivative Works thereof.
-
-      "Contribution" shall mean any work of authorship, including
-      the original version of the Work and any modifications or additions
-      to that Work or Derivative Works thereof, that is intentionally
-      submitted to Licensor for inclusion in the Work by the copyright owner
-      or by an individual or Legal Entity authorized to submit on behalf of
-      the copyright owner. For the purposes of this definition, "submitted"
-      means any form of electronic, verbal, or written communication sent
-      to the Licensor or its representatives, including but not limited to
-      communication on electronic mailing lists, source code control systems,
-      and issue tracking systems that are managed by, or on behalf of, the
-      Licensor for the purpose of discussing and improving the Work, but
-      excluding communication that is conspicuously marked or otherwise
-      designated in writing by the copyright owner as "Not a Contribution."
-
-      "Contributor" shall mean Licensor and any individual or Legal Entity
-      on behalf of whom a Contribution has been received by Licensor and
-      subsequently incorporated within the Work.
-
-   2. Grant of Copyright License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      copyright license to reproduce, prepare Derivative Works of,
-      publicly display, publicly perform, sublicense, and distribute the
-      Work and such Derivative Works in Source or Object form.
-
-   3. Grant of Patent License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      (except as stated in this section) patent license to make, have made,
-      use, offer to sell, sell, import, and otherwise transfer the Work,
-      where such license applies only to those patent claims licensable
-      by such Contributor that are necessarily infringed by their
-      Contribution(s) alone or by combination of their Contribution(s)
-      with the Work to which such Contribution(s) was submitted. If You
-      institute patent litigation against any entity (including a
-      cross-claim or counterclaim in a lawsuit) alleging that the Work
-      or a Contribution incorporated within the Work constitutes direct
-      or contributory patent infringement, then any patent licenses
-      granted to You under this License for that Work shall terminate
-      as of the date such litigation is filed.
-
-   4. Redistribution. You may reproduce and distribute copies of the
-      Work or Derivative Works thereof in any medium, with or without
-      modifications, and in Source or Object form, provided that You
-      meet the following conditions:
-
-      (a) You must give any other recipients of the Work or
-          Derivative Works a copy of this License; and
-
-      (b) You must cause any modified files to carry prominent notices
-          stating that You changed the files; and
-
-      (c) You must retain, in the Source form of any Derivative Works
-          that You distribute, all copyright, patent, trademark, and
-          attribution notices from the Source form of the Work,
-          excluding those notices that do not pertain to any part of
-          the Derivative Works; and
-
-      (d) If the Work includes a "NOTICE" text file as part of its
-          distribution, then any Derivative Works that You distribute must
-          include a readable copy of the attribution notices contained
-          within such NOTICE file, excluding those notices that do not
-          pertain to any part of the Derivative Works, in at least one
-          of the following places: within a NOTICE text file distributed
-          as part of the Derivative Works; within the Source form or
-          documentation, if provided along with the Derivative Works; or,
-          within a display generated by the Derivative Works, if and
-          wherever such third-party notices normally appear. The contents
-          of the NOTICE file are for informational purposes only and
-          do not modify the License. You may add Your own attribution
-          notices within Derivative Works that You distribute, alongside
-          or as an addendum to the NOTICE text from the Work, provided
-          that such additional attribution notices cannot be construed
-          as modifying the License.
-
-      You may add Your own copyright statement to Your modifications and
-      may provide additional or different license terms and conditions
-      for use, reproduction, or distribution of Your modifications, or
-      for any such Derivative Works as a whole, provided Your use,
-      reproduction, and distribution of the Work otherwise complies with
-      the conditions stated in this License.
-
-   5. Submission of Contributions. Unless You explicitly state otherwise,
-      any Contribution intentionally submitted for inclusion in the Work
-      by You to the Licensor shall be under the terms and conditions of
-      this License, without any additional terms or conditions.
-      Notwithstanding the above, nothing herein shall supersede or modify
-      the terms of any separate license agreement you may have executed
-      with Licensor regarding such Contributions.
-
-   6. Trademarks. This License does not grant permission to use the trade
-      names, trademarks, service marks, or product names of the Licensor,
-      except as required for reasonable and customary use in describing the
-      origin of the Work and reproducing the content of the NOTICE file.
-
-   7. Disclaimer of Warranty. Unless required by applicable law or
-      agreed to in writing, Licensor provides the Work (and each
-      Contributor provides its Contributions) on an "AS IS" BASIS,
-      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-      implied, including, without limitation, any warranties or conditions
-      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-      PARTICULAR PURPOSE. You are solely responsible for determining the
-      appropriateness of using or redistributing the Work and assume any
-      risks associated with Your exercise of permissions under this License.
-
-   8. Limitation of Liability. In no event and under no legal theory,
-      whether in tort (including negligence), contract, or otherwise,
-      unless required by applicable law (such as deliberate and grossly
-      negligent acts) or agreed to in writing, shall any Contributor be
-      liable to You for damages, including any direct, indirect, special,
-      incidental, or consequential damages of any character arising as a
-      result of this License or out of the use or inability to use the
-      Work (including but not limited to damages for loss of goodwill,
-      work stoppage, computer failure or malfunction, or any and all
-      other commercial damages or losses), even if such Contributor
-      has been advised of the possibility of such damages.
-
-   9. Accepting Warranty or Additional Liability. While redistributing
-      the Work or Derivative Works thereof, You may choose to offer,
-      and charge a fee for, acceptance of support, warranty, indemnity,
-      or other liability obligations and/or rights consistent with this
-      License. However, in accepting such obligations, You may act only
-      on Your own behalf and on Your sole responsibility, not on behalf
-      of any other Contributor, and only if You agree to indemnify,
-      defend, and hold each Contributor harmless for any liability
-      incurred by, or claims asserted against, such Contributor by reason
-      of your accepting any such warranty or additional liability.
-
-   END OF TERMS AND CONDITIONS
-
-   APPENDIX: How to apply the Apache License to your work.
-
-      To apply the Apache License to your work, attach the following
-      boilerplate notice, with the fields enclosed by brackets "[]"
-      replaced with your own identifying information. (Don't include
-      the brackets!)  The text should be enclosed in the appropriate
-      comment syntax for the file format. We also recommend that a
-      file or class name and description of purpose be included on the
-      same "printed page" as the copyright notice for easier
-      identification within third-party archives.
-
-   Copyright [yyyy] [name of copyright owner]
-
-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.

Binary
src/contrib/cloud/lib/pyAntTasks-1.3.jar


+ 0 - 52
src/contrib/cloud/src/integration-test/create-ebs-snapshot.sh

@@ -1,52 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# This script tests the "hadoop-ec2 create-formatted-snapshot" command.
-# The snapshot is deleted immediately afterwards.
-#
-# Example usage:
-# ./create-ebs-snapshot.sh
-#
-
-set -e
-set -x
-
-bin=`dirname "$0"`
-bin=`cd "$bin"; pwd`
-
-WORKSPACE=${WORKSPACE:-`pwd`}
-CONFIG_DIR=${CONFIG_DIR:-$WORKSPACE/.hadoop-cloud}
-CLUSTER=${CLUSTER:-hadoop-cloud-$USER-test-cluster}
-AVAILABILITY_ZONE=${AVAILABILITY_ZONE:-us-east-1c}
-KEY_NAME=${KEY_NAME:-$USER}
-HADOOP_CLOUD_HOME=${HADOOP_CLOUD_HOME:-$bin/../py}
-HADOOP_CLOUD_PROVIDER=${HADOOP_CLOUD_PROVIDER:-ec2}
-SSH_OPTIONS=${SSH_OPTIONS:-"-i ~/.$HADOOP_CLOUD_PROVIDER/id_rsa-$KEY_NAME \
-  -o StrictHostKeyChecking=no"}
-
-HADOOP_CLOUD_SCRIPT=$HADOOP_CLOUD_HOME/hadoop-$HADOOP_CLOUD_PROVIDER
-
-$HADOOP_CLOUD_SCRIPT create-formatted-snapshot --config-dir=$CONFIG_DIR \
-  --key-name=$KEY_NAME --availability-zone=$AVAILABILITY_ZONE \
-  --ssh-options="$SSH_OPTIONS" \
-  $CLUSTER 1 > out.tmp
-
-snapshot_id=`grep 'Created snapshot' out.tmp | awk '{print $3}'`
-
-ec2-delete-snapshot $snapshot_id
-
-rm -f out.tmp

+ 0 - 30
src/contrib/cloud/src/integration-test/ebs-storage-spec.json

@@ -1,30 +0,0 @@
-{
-  "nn": [
-    {
-      "device": "/dev/sdj",
-      "mount_point": "/ebs1",
-      "size_gb": "7",
-      "snapshot_id": "snap-fe44bb97"
-    },
-    {
-      "device": "/dev/sdk",
-      "mount_point": "/ebs2",
-      "size_gb": "7",
-      "snapshot_id": "snap-fe44bb97"
-    }
-  ],
-  "dn": [
-    {
-      "device": "/dev/sdj",
-      "mount_point": "/ebs1",
-      "size_gb": "7",
-      "snapshot_id": "snap-fe44bb97"
-    },
-    {
-      "device": "/dev/sdk",
-      "mount_point": "/ebs2",
-      "size_gb": "7",
-      "snapshot_id": "snap-fe44bb97"
-    }
-  ]
-}

+ 0 - 122
src/contrib/cloud/src/integration-test/persistent-cluster.sh

@@ -1,122 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# This script tests the Hadoop cloud scripts by running through a minimal
-# sequence of steps to start a persistent (EBS) cluster, run a job, then
-# shutdown the cluster.
-#
-# Example usage:
-# HADOOP_HOME=~/dev/hadoop-0.20.1/ ./persistent-cluster.sh
-#
-
-function wait_for_volume_detachment() {
-  set +e
-  set +x
-  while true; do
-    attached=`$HADOOP_CLOUD_SCRIPT list-storage --config-dir=$CONFIG_DIR \
-      $CLUSTER | awk '{print $6}' | grep 'attached'`
-    sleep 5
-    if [ -z "$attached" ]; then
-      break
-    fi
-  done
-  set -e
-  set -x
-}
-
-set -e
-set -x
-
-bin=`dirname "$0"`
-bin=`cd "$bin"; pwd`
-
-WORKSPACE=${WORKSPACE:-`pwd`}
-CONFIG_DIR=${CONFIG_DIR:-$WORKSPACE/.hadoop-cloud}
-CLUSTER=${CLUSTER:-hadoop-cloud-ebs-$USER-test-cluster}
-IMAGE_ID=${IMAGE_ID:-ami-6159bf08} # default to Fedora 32-bit AMI
-AVAILABILITY_ZONE=${AVAILABILITY_ZONE:-us-east-1c}
-KEY_NAME=${KEY_NAME:-$USER}
-AUTO_SHUTDOWN=${AUTO_SHUTDOWN:-15}
-LOCAL_HADOOP_VERSION=${LOCAL_HADOOP_VERSION:-0.20.1}
-HADOOP_HOME=${HADOOP_HOME:-$WORKSPACE/hadoop-$LOCAL_HADOOP_VERSION}
-HADOOP_CLOUD_HOME=${HADOOP_CLOUD_HOME:-$bin/../py}
-HADOOP_CLOUD_PROVIDER=${HADOOP_CLOUD_PROVIDER:-ec2}
-SSH_OPTIONS=${SSH_OPTIONS:-"-i ~/.$HADOOP_CLOUD_PROVIDER/id_rsa-$KEY_NAME \
-  -o StrictHostKeyChecking=no"}
-
-HADOOP_CLOUD_SCRIPT=$HADOOP_CLOUD_HOME/hadoop-$HADOOP_CLOUD_PROVIDER
-export HADOOP_CONF_DIR=$CONFIG_DIR/$CLUSTER
-
-# Install Hadoop locally
-if [ ! -d $HADOOP_HOME ]; then
-  wget http://archive.apache.org/dist/hadoop/core/hadoop-\
-$LOCAL_HADOOP_VERSION/hadoop-$LOCAL_HADOOP_VERSION.tar.gz
-  tar zxf hadoop-$LOCAL_HADOOP_VERSION.tar.gz -C $WORKSPACE
-  rm hadoop-$LOCAL_HADOOP_VERSION.tar.gz
-fi
-
-# Create storage
-$HADOOP_CLOUD_SCRIPT create-storage --config-dir=$CONFIG_DIR \
-  --availability-zone=$AVAILABILITY_ZONE $CLUSTER nn 1 \
-  $bin/ebs-storage-spec.json
-$HADOOP_CLOUD_SCRIPT create-storage --config-dir=$CONFIG_DIR \
-  --availability-zone=$AVAILABILITY_ZONE $CLUSTER dn 1 \
-  $bin/ebs-storage-spec.json
-
-# Launch a cluster
-$HADOOP_CLOUD_SCRIPT launch-cluster --config-dir=$CONFIG_DIR \
-  --image-id=$IMAGE_ID --key-name=$KEY_NAME --auto-shutdown=$AUTO_SHUTDOWN \
-  --availability-zone=$AVAILABILITY_ZONE $CLIENT_CIDRS $ENVS $CLUSTER 1
-
-# Run a proxy and save its pid in HADOOP_CLOUD_PROXY_PID
-eval `$HADOOP_CLOUD_SCRIPT proxy --config-dir=$CONFIG_DIR \
-  --ssh-options="$SSH_OPTIONS" $CLUSTER`
-
-# Run a job and check it works
-$HADOOP_HOME/bin/hadoop fs -mkdir input
-$HADOOP_HOME/bin/hadoop fs -put $HADOOP_HOME/LICENSE.txt input
-$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep \
-  input output Apache
-# following returns a non-zero exit code if no match
-$HADOOP_HOME/bin/hadoop fs -cat 'output/part-00000' | grep Apache
-
-# Shutdown the cluster
-kill $HADOOP_CLOUD_PROXY_PID
-$HADOOP_CLOUD_SCRIPT terminate-cluster --config-dir=$CONFIG_DIR --force $CLUSTER
-sleep 5 # wait for termination to take effect
-
-# Relaunch the cluster
-$HADOOP_CLOUD_SCRIPT launch-cluster --config-dir=$CONFIG_DIR \
-  --image-id=$IMAGE_ID --key-name=$KEY_NAME --auto-shutdown=$AUTO_SHUTDOWN \
-  --availability-zone=$AVAILABILITY_ZONE $CLIENT_CIDRS $ENVS $CLUSTER 1
-
-# Run a proxy and save its pid in HADOOP_CLOUD_PROXY_PID
-eval `$HADOOP_CLOUD_SCRIPT proxy --config-dir=$CONFIG_DIR \
-  --ssh-options="$SSH_OPTIONS" $CLUSTER`
-
-# Check output is still there
-$HADOOP_HOME/bin/hadoop fs -cat 'output/part-00000' | grep Apache
-
-# Shutdown the cluster
-kill $HADOOP_CLOUD_PROXY_PID
-$HADOOP_CLOUD_SCRIPT terminate-cluster --config-dir=$CONFIG_DIR --force $CLUSTER
-sleep 5 # wait for termination to take effect
-
-# Cleanup
-$HADOOP_CLOUD_SCRIPT delete-cluster --config-dir=$CONFIG_DIR $CLUSTER
-wait_for_volume_detachment
-$HADOOP_CLOUD_SCRIPT delete-storage --config-dir=$CONFIG_DIR --force $CLUSTER

+ 0 - 112
src/contrib/cloud/src/integration-test/transient-cluster.sh

@@ -1,112 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# This script tests the Hadoop cloud scripts by running through a minimal
-# sequence of steps to start a cluster, run a job, then shutdown the cluster.
-#
-# Example usage:
-# HADOOP_HOME=~/dev/hadoop-0.20.1/ ./transient-cluster.sh
-#
-
-set -e
-set -x
-
-bin=`dirname "$0"`
-bin=`cd "$bin"; pwd`
-
-WORKSPACE=${WORKSPACE:-`pwd`}
-CONFIG_DIR=${CONFIG_DIR:-$WORKSPACE/.hadoop-cloud}
-CLUSTER=${CLUSTER:-hadoop-cloud-$USER-test-cluster}
-IMAGE_ID=${IMAGE_ID:-ami-6159bf08} # default to Fedora 32-bit AMI
-INSTANCE_TYPE=${INSTANCE_TYPE:-m1.small}
-AVAILABILITY_ZONE=${AVAILABILITY_ZONE:-us-east-1c}
-KEY_NAME=${KEY_NAME:-$USER}
-AUTO_SHUTDOWN=${AUTO_SHUTDOWN:-15}
-LOCAL_HADOOP_VERSION=${LOCAL_HADOOP_VERSION:-0.20.1}
-HADOOP_HOME=${HADOOP_HOME:-$WORKSPACE/hadoop-$LOCAL_HADOOP_VERSION}
-HADOOP_CLOUD_HOME=${HADOOP_CLOUD_HOME:-$bin/../py}
-HADOOP_CLOUD_PROVIDER=${HADOOP_CLOUD_PROVIDER:-ec2}
-PUBLIC_KEY=${PUBLIC_KEY:-~/.$HADOOP_CLOUD_PROVIDER/id_rsa-$KEY_NAME.pub}
-PRIVATE_KEY=${PRIVATE_KEY:-~/.$HADOOP_CLOUD_PROVIDER/id_rsa-$KEY_NAME}
-SSH_OPTIONS=${SSH_OPTIONS:-"-i $PRIVATE_KEY -o StrictHostKeyChecking=no"}
-LAUNCH_ARGS=${LAUNCH_ARGS:-"1 nn,snn,jt 1 dn,tt"}
-
-HADOOP_CLOUD_SCRIPT=$HADOOP_CLOUD_HOME/hadoop-cloud
-export HADOOP_CONF_DIR=$CONFIG_DIR/$CLUSTER
-
-# Install Hadoop locally
-if [ ! -d $HADOOP_HOME ]; then
-  wget http://archive.apache.org/dist/hadoop/core/hadoop-\
-$LOCAL_HADOOP_VERSION/hadoop-$LOCAL_HADOOP_VERSION.tar.gz
-  tar zxf hadoop-$LOCAL_HADOOP_VERSION.tar.gz -C $WORKSPACE
-  rm hadoop-$LOCAL_HADOOP_VERSION.tar.gz
-fi
-
-# Launch a cluster
-if [ $HADOOP_CLOUD_PROVIDER == 'ec2' ]; then
-  $HADOOP_CLOUD_SCRIPT launch-cluster \
-    --config-dir=$CONFIG_DIR \
-    --image-id=$IMAGE_ID \
-    --instance-type=$INSTANCE_TYPE \
-    --key-name=$KEY_NAME \
-    --auto-shutdown=$AUTO_SHUTDOWN \
-    --availability-zone=$AVAILABILITY_ZONE \
-    $CLIENT_CIDRS $ENVS $CLUSTER $LAUNCH_ARGS
-else
-  $HADOOP_CLOUD_SCRIPT launch-cluster --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-    --config-dir=$CONFIG_DIR \
-    --image-id=$IMAGE_ID \
-    --instance-type=$INSTANCE_TYPE \
-    --public-key=$PUBLIC_KEY \
-    --private-key=$PRIVATE_KEY \
-    --auto-shutdown=$AUTO_SHUTDOWN \
-    $CLIENT_CIDRS $ENVS $CLUSTER $LAUNCH_ARGS
-fi
-  
-# List clusters
-$HADOOP_CLOUD_SCRIPT list --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR
-$HADOOP_CLOUD_SCRIPT list --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR $CLUSTER
-
-# Run a proxy and save its pid in HADOOP_CLOUD_PROXY_PID
-eval `$HADOOP_CLOUD_SCRIPT proxy --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR \
-  --ssh-options="$SSH_OPTIONS" $CLUSTER`
-  
-if [ $HADOOP_CLOUD_PROVIDER == 'rackspace' ]; then
-  # Need to update /etc/hosts (interactively)
-  $HADOOP_CLOUD_SCRIPT list --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-    --config-dir=$CONFIG_DIR $CLUSTER | grep 'nn,snn,jt' \
-    | awk '{print $4 " " $3 }'  | sudo tee -a /etc/hosts
-fi
-
-# Run a job and check it works
-$HADOOP_HOME/bin/hadoop fs -mkdir input
-$HADOOP_HOME/bin/hadoop fs -put $HADOOP_HOME/LICENSE.txt input
-$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep \
-  input output Apache
-# following returns a non-zero exit code if no match
-$HADOOP_HOME/bin/hadoop fs -cat 'output/part-00000' | grep Apache
-
-# Shutdown the cluster
-kill $HADOOP_CLOUD_PROXY_PID
-$HADOOP_CLOUD_SCRIPT terminate-cluster --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR --force $CLUSTER
-sleep 5 # wait for termination to take effect
-$HADOOP_CLOUD_SCRIPT delete-cluster --cloud-provider=$HADOOP_CLOUD_PROVIDER \
-  --config-dir=$CONFIG_DIR $CLUSTER

+ 0 - 21
src/contrib/cloud/src/py/hadoop-cloud

@@ -1,21 +0,0 @@
-#!/usr/bin/env python2.5
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from hadoop.cloud.cli import main
-
-if __name__ == "__main__":
-  main()

+ 0 - 21
src/contrib/cloud/src/py/hadoop-ec2

@@ -1,21 +0,0 @@
-#!/usr/bin/env python2.5
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from hadoop.cloud.cli import main
-
-if __name__ == "__main__":
-  main()

+ 0 - 14
src/contrib/cloud/src/py/hadoop/__init__.py

@@ -1,14 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.

+ 0 - 15
src/contrib/cloud/src/py/hadoop/cloud/__init__.py

@@ -1,15 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-VERSION="0.22.0"

+ 0 - 438
src/contrib/cloud/src/py/hadoop/cloud/cli.py

@@ -1,438 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import with_statement
-
-import ConfigParser
-from hadoop.cloud import VERSION
-from hadoop.cloud.cluster import get_cluster
-from hadoop.cloud.service import get_service
-from hadoop.cloud.service import InstanceTemplate
-from hadoop.cloud.service import NAMENODE
-from hadoop.cloud.service import SECONDARY_NAMENODE
-from hadoop.cloud.service import JOBTRACKER
-from hadoop.cloud.service import DATANODE
-from hadoop.cloud.service import TASKTRACKER
-from hadoop.cloud.util import merge_config_with_options
-from hadoop.cloud.util import xstr
-import logging
-from optparse import OptionParser
-from optparse import make_option
-import os
-import sys
-
-DEFAULT_SERVICE_NAME = 'hadoop'
-DEFAULT_CLOUD_PROVIDER = 'ec2'
-
-DEFAULT_CONFIG_DIR_NAME = '.hadoop-cloud'
-DEFAULT_CONFIG_DIR = os.path.join(os.environ['HOME'], DEFAULT_CONFIG_DIR_NAME)
-CONFIG_FILENAME = 'clusters.cfg'
-
-CONFIG_DIR_OPTION = \
-  make_option("--config-dir", metavar="CONFIG-DIR",
-    help="The configuration directory.")
-
-PROVIDER_OPTION = \
-  make_option("--cloud-provider", metavar="PROVIDER",
-    help="The cloud provider, e.g. 'ec2' for Amazon EC2.")
-
-BASIC_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-]
-
-LAUNCH_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("-a", "--ami", metavar="AMI",
-    help="The AMI ID of the image to launch. (Amazon EC2 only. Deprecated, use \
---image-id.)"),
-  make_option("-e", "--env", metavar="ENV", action="append",
-    help="An environment variable to pass to instances. \
-(May be specified multiple times.)"),
-  make_option("-f", "--user-data-file", metavar="URL",
-    help="The URL of the file containing user data to be made available to \
-instances."),
-  make_option("--image-id", metavar="ID",
-    help="The ID of the image to launch."),
-  make_option("-k", "--key-name", metavar="KEY-PAIR",
-    help="The key pair to use when launching instances. (Amazon EC2 only.)"),
-  make_option("-p", "--user-packages", metavar="PACKAGES",
-    help="A space-separated list of packages to install on instances on start \
-up."),
-  make_option("-t", "--instance-type", metavar="TYPE",
-    help="The type of instance to be launched. One of m1.small, m1.large, \
-m1.xlarge, c1.medium, or c1.xlarge."),
-  make_option("-z", "--availability-zone", metavar="ZONE",
-    help="The availability zone to run the instances in."),
-  make_option("--auto-shutdown", metavar="TIMEOUT_MINUTES",
-    help="The time in minutes after launch when an instance will be \
-automatically shut down."),
-  make_option("--client-cidr", metavar="CIDR", action="append",
-    help="The CIDR of the client, which is used to allow access through the \
-firewall to the master node. (May be specified multiple times.)"),
-  make_option("--security-group", metavar="SECURITY_GROUP", action="append",
-    default=[], help="Additional security groups within which the instances \
-should be run. (Amazon EC2 only.) (May be specified multiple times.)"),
-  make_option("--public-key", metavar="FILE",
-    help="The public key to authorize on launching instances. (Non-EC2 \
-providers only.)"),
-  make_option("--private-key", metavar="FILE",
-    help="The private key to use when connecting to instances. (Non-EC2 \
-providers only.)"),
-]
-
-SNAPSHOT_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("-k", "--key-name", metavar="KEY-PAIR",
-    help="The key pair to use when launching instances."),
-  make_option("-z", "--availability-zone", metavar="ZONE",
-    help="The availability zone to run the instances in."),
-  make_option("--ssh-options", metavar="SSH-OPTIONS",
-    help="SSH options to use."),
-]
-
-PLACEMENT_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("-z", "--availability-zone", metavar="ZONE",
-    help="The availability zone to run the instances in."),
-]
-
-FORCE_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("--force", action="store_true", default=False,
-  help="Do not ask for confirmation."),
-]
-
-SSH_OPTIONS = [
-  CONFIG_DIR_OPTION,
-  PROVIDER_OPTION,
-  make_option("--ssh-options", metavar="SSH-OPTIONS",
-    help="SSH options to use."),
-]
-
-def print_usage(script):
-  print """Usage: %(script)s COMMAND [OPTIONS]
-where COMMAND and [OPTIONS] may be one of:
-  list [CLUSTER]                      list all running Hadoop clusters
-                                        or instances in CLUSTER
-  launch-master CLUSTER               launch or find a master in CLUSTER
-  launch-slaves CLUSTER NUM_SLAVES    launch NUM_SLAVES slaves in CLUSTER
-  launch-cluster CLUSTER (NUM_SLAVES| launch a master and NUM_SLAVES slaves or
-    N ROLE [N ROLE ...])                N instances in ROLE in CLUSTER
-  create-formatted-snapshot CLUSTER   create an empty, formatted snapshot of
-    SIZE                                size SIZE GiB
-  list-storage CLUSTER                list storage volumes for CLUSTER
-  create-storage CLUSTER ROLE         create volumes for NUM_INSTANCES instances
-    NUM_INSTANCES SPEC_FILE             in ROLE for CLUSTER, using SPEC_FILE
-  attach-storage ROLE                 attach storage volumes for ROLE to CLUSTER
-  login CLUSTER                       log in to the master in CLUSTER over SSH
-  proxy CLUSTER                       start a SOCKS proxy on localhost into the
-                                        CLUSTER
-  push CLUSTER FILE                   scp FILE to the master in CLUSTER
-  exec CLUSTER CMD                    execute CMD on the master in CLUSTER
-  terminate-cluster CLUSTER           terminate all instances in CLUSTER
-  delete-cluster CLUSTER              delete the group information for CLUSTER
-  delete-storage CLUSTER              delete all storage volumes for CLUSTER
-  update-slaves-file CLUSTER          update the slaves file on the CLUSTER
-                                        master
-
-Use %(script)s COMMAND --help to see additional options for specific commands.
-""" % locals()
-
-def print_deprecation(script, replacement):
-  print "Deprecated. Use '%(script)s %(replacement)s'." % locals()
-
-def parse_options_and_config(command, option_list=[], extra_arguments=(),
-                             unbounded_args=False):
-  """
-  Parse the arguments to command using the given option list, and combine with
-  any configuration parameters.
-
-  If unbounded_args is true then there must be at least as many extra arguments
-  as specified by extra_arguments (the first argument is always CLUSTER).
-  Otherwise there must be exactly the same number of arguments as
-  extra_arguments.
-  """
-  expected_arguments = ["CLUSTER",]
-  expected_arguments.extend(extra_arguments)
-  (options_dict, args) = parse_options(command, option_list, expected_arguments,
-                                       unbounded_args)
-
-  config_dir = get_config_dir(options_dict)
-  config_files = [os.path.join(config_dir, CONFIG_FILENAME)]
-  if 'config_dir' not in options_dict:
-    # if config_dir not set, then also search in current directory
-    config_files.insert(0, CONFIG_FILENAME)
-
-  config = ConfigParser.ConfigParser()
-  read_files = config.read(config_files)
-  logging.debug("Read %d configuration files: %s", len(read_files),
-                ", ".join(read_files))
-  cluster_name = args[0]
-  opt = merge_config_with_options(cluster_name, config, options_dict)
-  logging.debug("Options: %s", str(opt))
-  service_name = get_service_name(opt)
-  cloud_provider = get_cloud_provider(opt)
-  cluster = get_cluster(cloud_provider)(cluster_name, config_dir)
-  service = get_service(service_name, cloud_provider)(cluster)
-  return (opt, args, service)
-
-def parse_options(command, option_list=[], expected_arguments=(),
-                  unbounded_args=False):
-  """
-  Parse the arguments to command using the given option list.
-
-  If unbounded_args is true then there must be at least as many arguments
-  as specified by expected_arguments. Otherwise there must be exactly the
-  same number of arguments as expected_arguments.
-  """
-
-  config_file_name = "%s/%s" % (DEFAULT_CONFIG_DIR_NAME, CONFIG_FILENAME)
-  usage = """%%prog %s [options] %s
-
-Options may also be specified in a configuration file called
-%s located in the user's home directory.
-Options specified on the command line take precedence over any in the
-configuration file.""" % (command, " ".join(expected_arguments),
-                          config_file_name)
-  parser = OptionParser(usage=usage, version="%%prog %s" % VERSION,
-                        option_list=option_list)
-  parser.disable_interspersed_args()
-  (options, args) = parser.parse_args(sys.argv[2:])
-  if unbounded_args:
-    if len(args) < len(expected_arguments):
-      parser.error("incorrect number of arguments")
-  elif len(args) != len(expected_arguments):
-    parser.error("incorrect number of arguments")
-  return (vars(options), args)
-
-def get_config_dir(options_dict):
-  config_dir = options_dict.get('config_dir')
-  if not config_dir:
-    config_dir = DEFAULT_CONFIG_DIR
-  return config_dir
-
-def get_service_name(options_dict):
-  service_name = options_dict.get("service", None)
-  if service_name is None:
-    service_name = DEFAULT_SERVICE_NAME
-  return service_name
-
-def get_cloud_provider(options_dict):
-  provider = options_dict.get("cloud_provider", None)
-  if provider is None:
-    provider = DEFAULT_CLOUD_PROVIDER
-  return provider
-
-def check_options_set(options, option_names):
-  for option_name in option_names:
-    if options.get(option_name) is None:
-      print "Option '%s' is missing. Aborting." % option_name
-      sys.exit(1)
-
-def check_launch_options_set(cluster, options):
-  if cluster.get_provider_code() == 'ec2':
-    if options.get('ami') is None and options.get('image_id') is None:
-      print "One of ami or image_id must be specified. Aborting."
-      sys.exit(1)
-    check_options_set(options, ['key_name'])
-  else:
-    check_options_set(options, ['image_id', 'public_key'])
-
-def get_image_id(cluster, options):
-  if cluster.get_provider_code() == 'ec2':
-    return options.get('image_id', options.get('ami'))
-  else:
-    return options.get('image_id')
-
-def main():
-  # Use HADOOP_CLOUD_LOGGING_LEVEL=DEBUG to enable debugging output.
-  logging.basicConfig(level=getattr(logging,
-                                    os.getenv("HADOOP_CLOUD_LOGGING_LEVEL",
-                                              "INFO")))
-
-  if len(sys.argv) < 2:
-    print_usage(sys.argv[0])
-    sys.exit(1)
-
-  command = sys.argv[1]
-
-  if command == 'list':
-    (opt, args) = parse_options(command, BASIC_OPTIONS, unbounded_args=True)
-    if len(args) == 0:
-      service_name = get_service_name(opt)
-      cloud_provider = get_cloud_provider(opt)
-      service = get_service(service_name, cloud_provider)(None)
-      service.list_all(cloud_provider)
-    else:
-      (opt, args, service) = parse_options_and_config(command, BASIC_OPTIONS)
-      service.list()
-
-  elif command == 'launch-master':
-    (opt, args, service) = parse_options_and_config(command, LAUNCH_OPTIONS)
-    check_launch_options_set(service.cluster, opt)
-    config_dir = get_config_dir(opt)
-    template = InstanceTemplate((NAMENODE, SECONDARY_NAMENODE, JOBTRACKER), 1,
-                         get_image_id(service.cluster, opt),
-                         opt.get('instance_type'), opt.get('key_name'),
-                         opt.get('public_key'), opt.get('private_key'),
-                         opt.get('user_data_file'),
-                         opt.get('availability_zone'), opt.get('user_packages'),
-                         opt.get('auto_shutdown'), opt.get('env'),
-                         opt.get('security_group'))
-    service.launch_master(template, config_dir, opt.get('client_cidr'))
-
-  elif command == 'launch-slaves':
-    (opt, args, service) = parse_options_and_config(command, LAUNCH_OPTIONS,
-                                                    ("NUM_SLAVES",))
-    number_of_slaves = int(args[1])
-    check_launch_options_set(service.cluster, opt)
-    template = InstanceTemplate((DATANODE, TASKTRACKER), number_of_slaves,
-                         get_image_id(service.cluster, opt),
-                         opt.get('instance_type'), opt.get('key_name'),
-                         opt.get('public_key'), opt.get('private_key'),
-                         opt.get('user_data_file'),
-                         opt.get('availability_zone'), opt.get('user_packages'),
-                         opt.get('auto_shutdown'), opt.get('env'),
-                         opt.get('security_group'))
-    service.launch_slaves(template)
-
-  elif command == 'launch-cluster':
-    (opt, args, service) = parse_options_and_config(command, LAUNCH_OPTIONS,
-                                                    ("NUM_SLAVES",),
-                                                    unbounded_args=True)
-    check_launch_options_set(service.cluster, opt)
-    config_dir = get_config_dir(opt)
-    instance_templates = []
-    if len(args) == 2:
-      number_of_slaves = int(args[1])
-      print_deprecation(sys.argv[0], 'launch-cluster %s 1 nn,snn,jt %s dn,tt' %
-                        (service.cluster.name, number_of_slaves))
-      instance_templates = [
-        InstanceTemplate((NAMENODE, SECONDARY_NAMENODE, JOBTRACKER), 1,
-                         get_image_id(service.cluster, opt),
-                         opt.get('instance_type'), opt.get('key_name'),
-                         opt.get('public_key'), opt.get('private_key'),
-                         opt.get('user_data_file'),
-                         opt.get('availability_zone'), opt.get('user_packages'),
-                         opt.get('auto_shutdown'), opt.get('env'),
-                         opt.get('security_group')),
-        InstanceTemplate((DATANODE, TASKTRACKER), number_of_slaves,
-                         get_image_id(service.cluster, opt),
-                         opt.get('instance_type'), opt.get('key_name'),
-                         opt.get('public_key'), opt.get('private_key'),
-                         opt.get('user_data_file'),
-                         opt.get('availability_zone'), opt.get('user_packages'),
-                         opt.get('auto_shutdown'), opt.get('env'),
-                         opt.get('security_group')),
-                         ]
-    elif len(args) > 2 and len(args) % 2 == 0:
-      print_usage(sys.argv[0])
-      sys.exit(1)
-    else:
-      for i in range(len(args) / 2):
-        number = int(args[2 * i + 1])
-        roles = args[2 * i + 2].split(",")
-        instance_templates.append(
-          InstanceTemplate(roles, number, get_image_id(service.cluster, opt),
-                           opt.get('instance_type'), opt.get('key_name'),
-                           opt.get('public_key'), opt.get('private_key'),
-                           opt.get('user_data_file'),
-                           opt.get('availability_zone'),
-                           opt.get('user_packages'),
-                           opt.get('auto_shutdown'), opt.get('env'),
-                           opt.get('security_group')))
-
-    service.launch_cluster(instance_templates, config_dir,
-                           opt.get('client_cidr'))
-
-  elif command == 'login':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS)
-    service.login(opt.get('ssh_options'))
-
-  elif command == 'proxy':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS)
-    service.proxy(opt.get('ssh_options'))
-
-  elif command == 'push':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS,
-                                                    ("FILE",))
-    service.push(opt.get('ssh_options'), args[1])
-
-  elif command == 'exec':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS,
-                                                    ("CMD",), True)
-    service.execute(opt.get('ssh_options'), args[1:])
-
-  elif command == 'terminate-cluster':
-    (opt, args, service) = parse_options_and_config(command, FORCE_OPTIONS)
-    service.terminate_cluster(opt["force"])
-
-  elif command == 'delete-cluster':
-    (opt, args, service) = parse_options_and_config(command, BASIC_OPTIONS)
-    service.delete_cluster()
-
-  elif command == 'create-formatted-snapshot':
-    (opt, args, service) = parse_options_and_config(command, SNAPSHOT_OPTIONS,
-                                                    ("SIZE",))
-    size = int(args[1])
-    check_options_set(opt, ['availability_zone', 'key_name'])
-    ami_ubuntu_intrepid_x86 = 'ami-ec48af85' # use a general AMI
-    service.create_formatted_snapshot(size,
-                                         opt.get('availability_zone'),
-                                         ami_ubuntu_intrepid_x86,
-                                         opt.get('key_name'),
-                                         xstr(opt.get('ssh_options')))
-
-  elif command == 'list-storage':
-    (opt, args, service) = parse_options_and_config(command, BASIC_OPTIONS)
-    service.list_storage()
-
-  elif command == 'create-storage':
-    (opt, args, service) = parse_options_and_config(command, PLACEMENT_OPTIONS,
-                                                    ("ROLE", "NUM_INSTANCES",
-                                                     "SPEC_FILE"))
-    role = args[1]
-    number_of_instances = int(args[2])
-    spec_file = args[3]
-    check_options_set(opt, ['availability_zone'])
-    service.create_storage(role, number_of_instances,
-                           opt.get('availability_zone'), spec_file)
-
-  elif command == 'attach-storage':
-    (opt, args, service) = parse_options_and_config(command, BASIC_OPTIONS,
-                                                    ("ROLE",))
-    service.attach_storage(args[1])
-
-  elif command == 'delete-storage':
-    (opt, args, service) = parse_options_and_config(command, FORCE_OPTIONS)
-    service.delete_storage(opt["force"])
-
-  elif command == 'update-slaves-file':
-    (opt, args, service) = parse_options_and_config(command, SSH_OPTIONS)
-    check_options_set(opt, ['private_key'])
-    ssh_options = xstr(opt.get('ssh_options'))
-    config_dir = get_config_dir(opt)
-    service.update_slaves_file(config_dir, ssh_options, opt.get('private_key'))
-
-  else:
-    print "Unrecognized command '%s'" % command
-    print_usage(sys.argv[0])
-    sys.exit(1)
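
Note: the command handling above resolves every option by layering command-line flags over the section named after the cluster in clusters.cfg (parse_options_and_config together with merge_config_with_options). Below is a minimal, standalone sketch of that precedence; the section name, keys and values are illustrative rather than copied from a shipped configuration file, and the helper is a simplified stand-in for hadoop.cloud.util.merge_config_with_options, not the real implementation.

import ConfigParser
import StringIO

SAMPLE_CFG = """\
[my-hadoop-cluster]
image_id=ami-12345678
instance_type=m1.large
availability_zone=us-east-1c
"""

def merge_section_with_options(cluster_name, config, options_dict):
  # Start from the cluster's config section, then let any option that was
  # actually given on the command line win.
  merged = dict(config.items(cluster_name))
  for key, value in options_dict.items():
    if value is not None:
      merged[key] = value
  return merged

config = ConfigParser.ConfigParser()
config.readfp(StringIO.StringIO(SAMPLE_CFG))

# e.g. "hadoop-cloud launch-cluster my-hadoop-cluster 1 nn,snn,jt 5 dn,tt -t m1.xlarge"
options_from_command_line = {'instance_type': 'm1.xlarge', 'key_name': None}
opt = merge_section_with_options('my-hadoop-cluster', config,
                                 options_from_command_line)
print opt['instance_type']      # m1.xlarge -- the command-line flag wins
print opt['availability_zone']  # us-east-1c -- taken from the file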

+ 0 - 187
src/contrib/cloud/src/py/hadoop/cloud/cluster.py

@@ -1,187 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Classes for controlling a cluster of cloud instances.
-"""
-
-from __future__ import with_statement
-
-import gzip
-import StringIO
-import urllib
-
-from hadoop.cloud.storage import Storage
-
-CLUSTER_PROVIDER_MAP = {
-  "dummy": ('hadoop.cloud.providers.dummy', 'DummyCluster'),
-  "ec2": ('hadoop.cloud.providers.ec2', 'Ec2Cluster'),
-  "rackspace": ('hadoop.cloud.providers.rackspace', 'RackspaceCluster'),
-}
-
-def get_cluster(provider):
-  """
-  Retrieve the Cluster class for a provider.
-  """
-  mod_name, driver_name = CLUSTER_PROVIDER_MAP[provider]
-  _mod = __import__(mod_name, globals(), locals(), [driver_name])
-  return getattr(_mod, driver_name)
-
-class Cluster(object):
-  """
-  A cluster of server instances. A cluster has a unique name.
-  Instances are launched into the cluster to run in particular roles.
-  """
-
-  def __init__(self, name, config_dir):
-    self.name = name
-    self.config_dir = config_dir
-
-  def get_provider_code(self):
-    """
-    The code that uniquely identifies the cloud provider.
-    """
-    raise Exception("Unimplemented")
-
-  def authorize_role(self, role, from_port, to_port, cidr_ip):
-    """
-    Authorize access to machines in a given role from a given network.
-    """
-    pass
-
-  def get_instances_in_role(self, role, state_filter=None):
-    """
-    Get all the instances in a role, filtered by state.
-
-    @param role: the name of the role
-    @param state_filter: the state that the instance should be in
-    (e.g. "running"), or None for all states
-    """
-    raise Exception("Unimplemented")
-
-  def print_status(self, roles=None, state_filter="running"):
-    """
-    Print the status of instances in the given roles, filtered by state.
-    """
-    pass
-
-  def check_running(self, role, number):
-    """
-    Check that a certain number of instances in a role are running.
-    """
-    instances = self.get_instances_in_role(role, "running")
-    if len(instances) != number:
-      print "Expected %s instances in role %s, but found %s: %s" % \
-        (number, role, len(instances), instances)
-      return False
-    else:
-      return instances
-
-  def launch_instances(self, roles, number, image_id, size_id,
-                       instance_user_data, **kwargs):
-    """
-    Launch instances (having the given roles) in the cluster.
-    Returns a list of IDs for the instances started.
-    """
-    pass
-
-  def wait_for_instances(self, instance_ids, timeout=600):
-    """
-    Wait for instances to start.
-    Raise TimeoutException if the timeout is exceeded.
-    """
-    pass
-
-  def terminate(self):
-    """
-    Terminate all instances in the cluster.
-    """
-    pass
-
-  def delete(self):
-    """
-    Delete the cluster permanently. This operation is only permitted if no
-    instances are running.
-    """
-    pass
-
-  def get_storage(self):
-    """
-    Return the external storage for the cluster.
-    """
-    return Storage(self)
-
-class InstanceUserData(object):
-  """
-  The data passed to an instance on start up.
-  """
-
-  def __init__(self, filename, replacements={}):
-    self.filename = filename
-    self.replacements = replacements
-
-  def _read_file(self, filename):
-    """
-    Read the user data.
-    """
-    return urllib.urlopen(filename).read()
-
-  def read(self):
-    """
-    Read the user data, making replacements.
-    """
-    contents = self._read_file(self.filename)
-    for (match, replacement) in self.replacements.iteritems():
-      if replacement is None:
-        replacement = ''
-      contents = contents.replace(match, replacement)
-    return contents
-
-  def read_as_gzip_stream(self):
-    """
-    Read and compress the data.
-    """
-    output = StringIO.StringIO()
-    compressed = gzip.GzipFile(mode='wb', fileobj=output)
-    compressed.write(self.read())
-    compressed.close()
-    return output.getvalue()
-
-class Instance(object):
-  """
-  A server instance.
-  """
-  def __init__(self, id, public_ip, private_ip):
-    self.id = id
-    self.public_ip = public_ip
-    self.private_ip = private_ip
-
-class RoleSyntaxException(Exception):
-  """
-  Raised when a role name is invalid. Role names may consist of a sequence
-  of alphanumeric characters and underscores. Dashes are not permitted in role
-  names.
-  """
-  def __init__(self, message):
-    super(RoleSyntaxException, self).__init__()
-    self.message = message
-  def __str__(self):
-    return repr(self.message)
-
-class TimeoutException(Exception):
-  """
-  Raised when a timeout is exceeded.
-  """
-  pass
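
Note: CLUSTER_PROVIDER_MAP above is the extension point for new back ends; get_cluster() turns a provider name into a Cluster subclass via a dynamic import. A rough sketch of what a new provider would look like, assuming the hadoop.cloud package in this tree is importable; the "mycloud" name and MyCloudCluster class are hypothetical and only one method is fleshed out.

import logging

from hadoop.cloud.cluster import Cluster
from hadoop.cloud.cluster import Instance

logger = logging.getLogger(__name__)

class MyCloudCluster(Cluster):
  """A skeletal provider used only to illustrate the plugin pattern."""

  def get_provider_code(self):
    return "mycloud"

  def get_instances_in_role(self, role, state_filter=None):
    # A real provider would query its API here; this returns a fixed instance.
    logger.info("get_instances_in_role(%s, %s)", role, state_filter)
    return [Instance("i-00000000", "203.0.113.10", "10.0.0.10")]

# Hooking it up would amount to one more CLUSTER_PROVIDER_MAP entry, e.g.
#   "mycloud": ('hadoop.cloud.providers.mycloud', 'MyCloudCluster'),
# after which get_cluster("mycloud")("my-cluster", config_dir) resolves and
# instantiates the class above.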

+ 0 - 459
src/contrib/cloud/src/py/hadoop/cloud/data/boot-rackspace.sh

@@ -1,459 +0,0 @@
-#!/bin/bash -x
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-################################################################################
-# Script that is run on each instance on boot.
-################################################################################
-
-################################################################################
-# Initialize variables
-################################################################################
-SELF_HOST=`/sbin/ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}'`
-HADOOP_VERSION=${HADOOP_VERSION:-0.20.1}
-HADOOP_HOME=/usr/local/hadoop-$HADOOP_VERSION
-HADOOP_CONF_DIR=$HADOOP_HOME/conf
-for role in $(echo "$ROLES" | tr "," "\n"); do
-  case $role in
-  nn)
-    NN_HOST=$SELF_HOST
-    ;;
-  jt)
-    JT_HOST=$SELF_HOST
-    ;;
-  esac
-done
-
-function register_auto_shutdown() {
-  if [ ! -z "$AUTO_SHUTDOWN" ]; then
-    shutdown -h +$AUTO_SHUTDOWN >/dev/null &
-  fi
-}
-
-function update_repo() {
-  if which dpkg &> /dev/null; then
-    sudo apt-get update
-  elif which rpm &> /dev/null; then
-    yum update -y yum
-  fi
-}
-
-# Install a list of packages on debian or redhat as appropriate
-function install_packages() {
-  if which dpkg &> /dev/null; then
-    apt-get update
-    apt-get -y install $@
-  elif which rpm &> /dev/null; then
-    yum install -y $@
-  else
-    echo "No package manager found."
-  fi
-}
-
-# Install any user packages specified in the USER_PACKAGES environment variable
-function install_user_packages() {
-  if [ ! -z "$USER_PACKAGES" ]; then
-    install_packages $USER_PACKAGES
-  fi
-}
-
-function install_hadoop() {
-  useradd hadoop
-
-  hadoop_tar_url=http://s3.amazonaws.com/hadoop-releases/core/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
-  hadoop_tar_file=`basename $hadoop_tar_url`
-  hadoop_tar_md5_file=`basename $hadoop_tar_url.md5`
-
-  curl="curl --retry 3 --silent --show-error --fail"
-  for i in `seq 1 3`;
-  do
-    $curl -O $hadoop_tar_url
-    $curl -O $hadoop_tar_url.md5
-    if md5sum -c $hadoop_tar_md5_file; then
-      break;
-    else
-      rm -f $hadoop_tar_file $hadoop_tar_md5_file
-    fi
-  done
-
-  if [ ! -e $hadoop_tar_file ]; then
-    echo "Failed to download $hadoop_tar_url. Aborting."
-    exit 1
-  fi
-
-  tar zxf $hadoop_tar_file -C /usr/local
-  rm -f $hadoop_tar_file $hadoop_tar_md5_file
-
-  echo "export HADOOP_HOME=$HADOOP_HOME" >> ~root/.bashrc
-  echo 'export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH' >> ~root/.bashrc
-}
-
-function prep_disk() {
-  mount=$1
-  device=$2
-  automount=${3:-false}
-
-  echo "warning: ERASING CONTENTS OF $device"
-  mkfs.xfs -f $device
-  if [ ! -e $mount ]; then
-    mkdir $mount
-  fi
-  mount -o defaults,noatime $device $mount
-  if $automount ; then
-    echo "$device $mount xfs defaults,noatime 0 0" >> /etc/fstab
-  fi
-}
-
-function wait_for_mount {
-  mount=$1
-  device=$2
-
-  mkdir $mount
-
-  i=1
-  echo "Attempting to mount $device"
-  while true ; do
-    sleep 10
-    echo -n "$i "
-    i=$[$i+1]
-    mount -o defaults,noatime $device $mount || continue
-    echo " Mounted."
-    break;
-  done
-}
-
-function make_hadoop_dirs {
-  for mount in "$@"; do
-    if [ ! -e $mount/hadoop ]; then
-      mkdir -p $mount/hadoop
-      chown hadoop:hadoop $mount/hadoop
-    fi
-  done
-}
-
-# Configure Hadoop by setting up disks and site file
-function configure_hadoop() {
-
-  MOUNT=/data
-  FIRST_MOUNT=$MOUNT
-  DFS_NAME_DIR=$MOUNT/hadoop/hdfs/name
-  FS_CHECKPOINT_DIR=$MOUNT/hadoop/hdfs/secondary
-  DFS_DATA_DIR=$MOUNT/hadoop/hdfs/data
-  MAPRED_LOCAL_DIR=$MOUNT/hadoop/mapred/local
-  MAX_MAP_TASKS=2
-  MAX_REDUCE_TASKS=1
-  CHILD_OPTS=-Xmx550m
-  CHILD_ULIMIT=1126400
-  TMP_DIR=$MOUNT/tmp/hadoop-\${user.name}
-
-  mkdir -p $MOUNT/hadoop
-  chown hadoop:hadoop $MOUNT/hadoop
-  mkdir $MOUNT/tmp
-  chmod a+rwxt $MOUNT/tmp
-
-  mkdir /etc/hadoop
-  ln -s $HADOOP_CONF_DIR /etc/hadoop/conf
-
-  ##############################################################################
-  # Modify this section to customize your Hadoop cluster.
-  ##############################################################################
-  cat > $HADOOP_CONF_DIR/hadoop-site.xml <<EOF
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-<property>
-  <name>dfs.block.size</name>
-  <value>134217728</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.data.dir</name>
-  <value>$DFS_DATA_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.datanode.du.reserved</name>
-  <value>1073741824</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.datanode.handler.count</name>
-  <value>3</value>
-  <final>true</final>
-</property>
-<!--property>
-  <name>dfs.hosts</name>
-  <value>$HADOOP_CONF_DIR/dfs.hosts</value>
-  <final>true</final>
-</property-->
-<!--property>
-  <name>dfs.hosts.exclude</name>
-  <value>$HADOOP_CONF_DIR/dfs.hosts.exclude</value>
-  <final>true</final>
-</property-->
-<property>
-  <name>dfs.name.dir</name>
-  <value>$DFS_NAME_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.namenode.handler.count</name>
-  <value>5</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.permissions</name>
-  <value>true</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.replication</name>
-  <value>$DFS_REPLICATION</value>
-</property>
-<property>
-  <name>fs.checkpoint.dir</name>
-  <value>$FS_CHECKPOINT_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>fs.default.name</name>
-  <value>hdfs://$NN_HOST:8020/</value>
-</property>
-<property>
-  <name>fs.trash.interval</name>
-  <value>1440</value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.tmp.dir</name>
-  <value>/data/tmp/hadoop-\${user.name}</value>
-  <final>true</final>
-</property>
-<property>
-  <name>io.file.buffer.size</name>
-  <value>65536</value>
-</property>
-<property>
-  <name>mapred.child.java.opts</name>
-  <value>$CHILD_OPTS</value>
-</property>
-<property>
-  <name>mapred.child.ulimit</name>
-  <value>$CHILD_ULIMIT</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.job.tracker</name>
-  <value>$JT_HOST:8021</value>
-</property>
-<property>
-  <name>mapred.job.tracker.handler.count</name>
-  <value>5</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.local.dir</name>
-  <value>$MAPRED_LOCAL_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.map.tasks.speculative.execution</name>
-  <value>true</value>
-</property>
-<property>
-  <name>mapred.reduce.parallel.copies</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.reduce.tasks</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.reduce.tasks.speculative.execution</name>
-  <value>false</value>
-</property>
-<property>
-  <name>mapred.submit.replication</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.system.dir</name>
-  <value>/hadoop/system/mapred</value>
-</property>
-<property>
-  <name>mapred.tasktracker.map.tasks.maximum</name>
-  <value>$MAX_MAP_TASKS</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.tasktracker.reduce.tasks.maximum</name>
-  <value>$MAX_REDUCE_TASKS</value>
-  <final>true</final>
-</property>
-<property>
-  <name>tasktracker.http.threads</name>
-  <value>46</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.compress.map.output</name>
-  <value>true</value>
-</property>
-<property>
-  <name>mapred.output.compression.type</name>
-  <value>BLOCK</value>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.default</name>
-  <value>org.apache.hadoop.net.StandardSocketFactory</value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.ClientProtocol</name>
-  <value></value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.JobSubmissionProtocol</name>
-  <value></value>
-  <final>true</final>
-</property>
-<property>
-  <name>io.compression.codecs</name>
-  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
-</property>
-</configuration>
-EOF
-
-  # Keep PID files in a non-temporary directory
-  sed -i -e "s|# export HADOOP_PID_DIR=.*|export HADOOP_PID_DIR=/var/run/hadoop|" \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-  mkdir -p /var/run/hadoop
-  chown -R hadoop:hadoop /var/run/hadoop
-
-  # Set SSH options within the cluster
-  sed -i -e 's|# export HADOOP_SSH_OPTS=.*|export HADOOP_SSH_OPTS="-o StrictHostKeyChecking=no"|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-    
-  # Disable IPv6
-  sed -i -e 's|# export HADOOP_OPTS=.*|export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-
-  # Hadoop logs should be on the /data partition
-  sed -i -e 's|# export HADOOP_LOG_DIR=.*|export HADOOP_LOG_DIR=/var/log/hadoop/logs|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-  rm -rf /var/log/hadoop
-  mkdir /data/hadoop/logs
-  chown hadoop:hadoop /data/hadoop/logs
-  ln -s /data/hadoop/logs /var/log/hadoop
-  chown -R hadoop:hadoop /var/log/hadoop
-
-}
-
-# Sets up a small website on the cluster.
-function setup_web() {
-
-  if which dpkg &> /dev/null; then
-    apt-get -y install thttpd
-    WWW_BASE=/var/www
-  elif which rpm &> /dev/null; then
-    yum install -y thttpd
-    chkconfig --add thttpd
-    WWW_BASE=/var/www/thttpd/html
-  fi
-
-  cat > $WWW_BASE/index.html << END
-<html>
-<head>
-<title>Hadoop Cloud Cluster</title>
-</head>
-<body>
-<h1>Hadoop Cloud Cluster</h1>
-To browse the cluster you need to have a proxy configured.
-Start the proxy with <tt>hadoop-cloud proxy &lt;cluster_name&gt;</tt>,
-and point your browser to
-<a href="http://apache-hadoop-ec2.s3.amazonaws.com/proxy.pac">this Proxy
-Auto-Configuration (PAC)</a> file.  To manage multiple proxy configurations,
-you may wish to use
-<a href="https://addons.mozilla.org/en-US/firefox/addon/2464">FoxyProxy</a>.
-<ul>
-<li><a href="http://$NN_HOST:50070/">NameNode</a>
-<li><a href="http://$JT_HOST:50030/">JobTracker</a>
-</ul>
-</body>
-</html>
-END
-
-  service thttpd start
-
-}
-
-function start_namenode() {
-  if which dpkg &> /dev/null; then
-    AS_HADOOP="su -s /bin/bash - hadoop -c"
-  elif which rpm &> /dev/null; then
-    AS_HADOOP="/sbin/runuser -s /bin/bash - hadoop -c"
-  fi
-
-  # Format HDFS
-  [ ! -e $FIRST_MOUNT/hadoop/hdfs ] && $AS_HADOOP "$HADOOP_HOME/bin/hadoop namenode -format"
-
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop-daemon.sh start namenode"
-
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop dfsadmin -safemode wait"
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop fs -mkdir /user"
-  # The following is questionable, as it allows a user to delete another user's directory
-  # It's needed to allow users to create their own user directories
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop fs -chmod +w /user"
-
-}
-
-function start_daemon() {
-  if which dpkg &> /dev/null; then
-    AS_HADOOP="su -s /bin/bash - hadoop -c"
-  elif which rpm &> /dev/null; then
-    AS_HADOOP="/sbin/runuser -s /bin/bash - hadoop -c"
-  fi
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop-daemon.sh start $1"
-}
-
-update_repo
-register_auto_shutdown
-install_user_packages
-install_hadoop
-configure_hadoop
-
-for role in $(echo "$ROLES" | tr "," "\n"); do
-  case $role in
-  nn)
-    setup_web
-    start_namenode
-    ;;
-  snn)
-    start_daemon secondarynamenode
-    ;;
-  jt)
-    start_daemon jobtracker
-    ;;
-  dn)
-    start_daemon datanode
-    ;;
-  tt)
-    start_daemon tasktracker
-    ;;
-  esac
-done
-

+ 0 - 548
src/contrib/cloud/src/py/hadoop/cloud/data/hadoop-ec2-init-remote.sh

@@ -1,548 +0,0 @@
-#!/bin/bash -x
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-################################################################################
-# Script that is run on each EC2 instance on boot. It is passed in the EC2 user
-# data, so should not exceed 16K in size after gzip compression.
-#
-# This script is executed by /etc/init.d/ec2-run-user-data, and output is
-# logged to /var/log/messages.
-################################################################################
-
-################################################################################
-# Initialize variables
-################################################################################
-
-# Substitute environment variables passed by the client
-export %ENV%
-
-HADOOP_VERSION=${HADOOP_VERSION:-0.20.1}
-HADOOP_HOME=/usr/local/hadoop-$HADOOP_VERSION
-HADOOP_CONF_DIR=$HADOOP_HOME/conf
-SELF_HOST=`wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname`
-for role in $(echo "$ROLES" | tr "," "\n"); do
-  case $role in
-  nn)
-    NN_HOST=$SELF_HOST
-    ;;
-  jt)
-    JT_HOST=$SELF_HOST
-    ;;
-  esac
-done
-
-function register_auto_shutdown() {
-  if [ ! -z "$AUTO_SHUTDOWN" ]; then
-    shutdown -h +$AUTO_SHUTDOWN >/dev/null &
-  fi
-}
-
-# Install a list of packages on debian or redhat as appropriate
-function install_packages() {
-  if which dpkg &> /dev/null; then
-    apt-get update
-    apt-get -y install $@
-  elif which rpm &> /dev/null; then
-    yum install -y $@
-  else
-    echo "No package manager found."
-  fi
-}
-
-# Install any user packages specified in the USER_PACKAGES environment variable
-function install_user_packages() {
-  if [ ! -z "$USER_PACKAGES" ]; then
-    install_packages $USER_PACKAGES
-  fi
-}
-
-function install_hadoop() {
-  useradd hadoop
-
-  hadoop_tar_url=http://s3.amazonaws.com/hadoop-releases/core/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
-  hadoop_tar_file=`basename $hadoop_tar_url`
-  hadoop_tar_md5_file=`basename $hadoop_tar_url.md5`
-
-  curl="curl --retry 3 --silent --show-error --fail"
-  for i in `seq 1 3`;
-  do
-    $curl -O $hadoop_tar_url
-    $curl -O $hadoop_tar_url.md5
-    if md5sum -c $hadoop_tar_md5_file; then
-      break;
-    else
-      rm -f $hadoop_tar_file $hadoop_tar_md5_file
-    fi
-  done
-
-  if [ ! -e $hadoop_tar_file ]; then
-    echo "Failed to download $hadoop_tar_url. Aborting."
-    exit 1
-  fi
-
-  tar zxf $hadoop_tar_file -C /usr/local
-  rm -f $hadoop_tar_file $hadoop_tar_md5_file
-
-  echo "export HADOOP_HOME=$HADOOP_HOME" >> ~root/.bashrc
-  echo 'export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH' >> ~root/.bashrc
-}
-
-function prep_disk() {
-  mount=$1
-  device=$2
-  automount=${3:-false}
-
-  echo "warning: ERASING CONTENTS OF $device"
-  mkfs.xfs -f $device
-  if [ ! -e $mount ]; then
-    mkdir $mount
-  fi
-  mount -o defaults,noatime $device $mount
-  if $automount ; then
-    echo "$device $mount xfs defaults,noatime 0 0" >> /etc/fstab
-  fi
-}
-
-function wait_for_mount {
-  mount=$1
-  device=$2
-
-  mkdir $mount
-
-  i=1
-  echo "Attempting to mount $device"
-  while true ; do
-    sleep 10
-    echo -n "$i "
-    i=$[$i+1]
-    mount -o defaults,noatime $device $mount || continue
-    echo " Mounted."
-    break;
-  done
-}
-
-function make_hadoop_dirs {
-  for mount in "$@"; do
-    if [ ! -e $mount/hadoop ]; then
-      mkdir -p $mount/hadoop
-      chown hadoop:hadoop $mount/hadoop
-    fi
-  done
-}
-
-# Configure Hadoop by setting up disks and site file
-function configure_hadoop() {
-
-  install_packages xfsprogs # needed for XFS
-
-  INSTANCE_TYPE=`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`
-
-  if [ -n "$EBS_MAPPINGS" ]; then
-    # EBS_MAPPINGS is like "/ebs1,/dev/sdj;/ebs2,/dev/sdk"
-    DFS_NAME_DIR=''
-    FS_CHECKPOINT_DIR=''
-    DFS_DATA_DIR=''
-    for mapping in $(echo "$EBS_MAPPINGS" | tr ";" "\n"); do
-      # Split on the comma (see "Parameter Expansion" in the bash man page)
-      mount=${mapping%,*}
-      device=${mapping#*,}
-      wait_for_mount $mount $device
-      DFS_NAME_DIR=${DFS_NAME_DIR},"$mount/hadoop/hdfs/name"
-      FS_CHECKPOINT_DIR=${FS_CHECKPOINT_DIR},"$mount/hadoop/hdfs/secondary"
-      DFS_DATA_DIR=${DFS_DATA_DIR},"$mount/hadoop/hdfs/data"
-      FIRST_MOUNT=${FIRST_MOUNT-$mount}
-      make_hadoop_dirs $mount
-    done
-    # Remove leading commas
-    DFS_NAME_DIR=${DFS_NAME_DIR#?}
-    FS_CHECKPOINT_DIR=${FS_CHECKPOINT_DIR#?}
-    DFS_DATA_DIR=${DFS_DATA_DIR#?}
-
-    DFS_REPLICATION=3 # EBS is internally replicated, but we also use HDFS replication for safety
-  else
-    case $INSTANCE_TYPE in
-    m1.xlarge|c1.xlarge)
-      DFS_NAME_DIR=/mnt/hadoop/hdfs/name,/mnt2/hadoop/hdfs/name
-      FS_CHECKPOINT_DIR=/mnt/hadoop/hdfs/secondary,/mnt2/hadoop/hdfs/secondary
-      DFS_DATA_DIR=/mnt/hadoop/hdfs/data,/mnt2/hadoop/hdfs/data,/mnt3/hadoop/hdfs/data,/mnt4/hadoop/hdfs/data
-      ;;
-    m1.large)
-      DFS_NAME_DIR=/mnt/hadoop/hdfs/name,/mnt2/hadoop/hdfs/name
-      FS_CHECKPOINT_DIR=/mnt/hadoop/hdfs/secondary,/mnt2/hadoop/hdfs/secondary
-      DFS_DATA_DIR=/mnt/hadoop/hdfs/data,/mnt2/hadoop/hdfs/data
-      ;;
-    *)
-      # "m1.small" or "c1.medium"
-      DFS_NAME_DIR=/mnt/hadoop/hdfs/name
-      FS_CHECKPOINT_DIR=/mnt/hadoop/hdfs/secondary
-      DFS_DATA_DIR=/mnt/hadoop/hdfs/data
-      ;;
-    esac
-    FIRST_MOUNT=/mnt
-    DFS_REPLICATION=3
-  fi
-
-  case $INSTANCE_TYPE in
-  m1.xlarge|c1.xlarge)
-    prep_disk /mnt2 /dev/sdc true &
-    disk2_pid=$!
-    prep_disk /mnt3 /dev/sdd true &
-    disk3_pid=$!
-    prep_disk /mnt4 /dev/sde true &
-    disk4_pid=$!
-    wait $disk2_pid $disk3_pid $disk4_pid
-    MAPRED_LOCAL_DIR=/mnt/hadoop/mapred/local,/mnt2/hadoop/mapred/local,/mnt3/hadoop/mapred/local,/mnt4/hadoop/mapred/local
-    MAX_MAP_TASKS=8
-    MAX_REDUCE_TASKS=4
-    CHILD_OPTS=-Xmx680m
-    CHILD_ULIMIT=1392640
-    ;;
-  m1.large)
-    prep_disk /mnt2 /dev/sdc true
-    MAPRED_LOCAL_DIR=/mnt/hadoop/mapred/local,/mnt2/hadoop/mapred/local
-    MAX_MAP_TASKS=4
-    MAX_REDUCE_TASKS=2
-    CHILD_OPTS=-Xmx1024m
-    CHILD_ULIMIT=2097152
-    ;;
-  c1.medium)
-    MAPRED_LOCAL_DIR=/mnt/hadoop/mapred/local
-    MAX_MAP_TASKS=4
-    MAX_REDUCE_TASKS=2
-    CHILD_OPTS=-Xmx550m
-    CHILD_ULIMIT=1126400
-    ;;
-  *)
-    # "m1.small"
-    MAPRED_LOCAL_DIR=/mnt/hadoop/mapred/local
-    MAX_MAP_TASKS=2
-    MAX_REDUCE_TASKS=1
-    CHILD_OPTS=-Xmx550m
-    CHILD_ULIMIT=1126400
-    ;;
-  esac
-
-  make_hadoop_dirs `ls -d /mnt*`
-
-  # Create tmp directory
-  mkdir /mnt/tmp
-  chmod a+rwxt /mnt/tmp
-  
-  mkdir /etc/hadoop
-  ln -s $HADOOP_CONF_DIR /etc/hadoop/conf
-
-  ##############################################################################
-  # Modify this section to customize your Hadoop cluster.
-  ##############################################################################
-  cat > $HADOOP_CONF_DIR/hadoop-site.xml <<EOF
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-<property>
-  <name>dfs.block.size</name>
-  <value>134217728</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.data.dir</name>
-  <value>$DFS_DATA_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.datanode.du.reserved</name>
-  <value>1073741824</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.datanode.handler.count</name>
-  <value>3</value>
-  <final>true</final>
-</property>
-<!--property>
-  <name>dfs.hosts</name>
-  <value>$HADOOP_CONF_DIR/dfs.hosts</value>
-  <final>true</final>
-</property-->
-<!--property>
-  <name>dfs.hosts.exclude</name>
-  <value>$HADOOP_CONF_DIR/dfs.hosts.exclude</value>
-  <final>true</final>
-</property-->
-<property>
-  <name>dfs.name.dir</name>
-  <value>$DFS_NAME_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.namenode.handler.count</name>
-  <value>5</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.permissions</name>
-  <value>true</value>
-  <final>true</final>
-</property>
-<property>
-  <name>dfs.replication</name>
-  <value>$DFS_REPLICATION</value>
-</property>
-<property>
-  <name>fs.checkpoint.dir</name>
-  <value>$FS_CHECKPOINT_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>fs.default.name</name>
-  <value>hdfs://$NN_HOST:8020/</value>
-</property>
-<property>
-  <name>fs.trash.interval</name>
-  <value>1440</value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.tmp.dir</name>
-  <value>/mnt/tmp/hadoop-\${user.name}</value>
-  <final>true</final>
-</property>
-<property>
-  <name>io.file.buffer.size</name>
-  <value>65536</value>
-</property>
-<property>
-  <name>mapred.child.java.opts</name>
-  <value>$CHILD_OPTS</value>
-</property>
-<property>
-  <name>mapred.child.ulimit</name>
-  <value>$CHILD_ULIMIT</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.job.tracker</name>
-  <value>$JT_HOST:8021</value>
-</property>
-<property>
-  <name>mapred.job.tracker.handler.count</name>
-  <value>5</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.local.dir</name>
-  <value>$MAPRED_LOCAL_DIR</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.map.tasks.speculative.execution</name>
-  <value>true</value>
-</property>
-<property>
-  <name>mapred.reduce.parallel.copies</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.reduce.tasks</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.reduce.tasks.speculative.execution</name>
-  <value>false</value>
-</property>
-<property>
-  <name>mapred.submit.replication</name>
-  <value>10</value>
-</property>
-<property>
-  <name>mapred.system.dir</name>
-  <value>/hadoop/system/mapred</value>
-</property>
-<property>
-  <name>mapred.tasktracker.map.tasks.maximum</name>
-  <value>$MAX_MAP_TASKS</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.tasktracker.reduce.tasks.maximum</name>
-  <value>$MAX_REDUCE_TASKS</value>
-  <final>true</final>
-</property>
-<property>
-  <name>tasktracker.http.threads</name>
-  <value>46</value>
-  <final>true</final>
-</property>
-<property>
-  <name>mapred.compress.map.output</name>
-  <value>true</value>
-</property>
-<property>
-  <name>mapred.output.compression.type</name>
-  <value>BLOCK</value>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.default</name>
-  <value>org.apache.hadoop.net.StandardSocketFactory</value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.ClientProtocol</name>
-  <value></value>
-  <final>true</final>
-</property>
-<property>
-  <name>hadoop.rpc.socket.factory.class.JobSubmissionProtocol</name>
-  <value></value>
-  <final>true</final>
-</property>
-<property>
-  <name>io.compression.codecs</name>
-  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
-</property>
-<property>
-  <name>fs.s3.awsAccessKeyId</name>
-  <value>$AWS_ACCESS_KEY_ID</value>
-</property>
-<property>
-  <name>fs.s3.awsSecretAccessKey</name>
-  <value>$AWS_SECRET_ACCESS_KEY</value>
-</property>
-<property>
-  <name>fs.s3n.awsAccessKeyId</name>
-  <value>$AWS_ACCESS_KEY_ID</value>
-</property>
-<property>
-  <name>fs.s3n.awsSecretAccessKey</name>
-  <value>$AWS_SECRET_ACCESS_KEY</value>
-</property>
-</configuration>
-EOF
-
-  # Keep PID files in a non-temporary directory
-  sed -i -e "s|# export HADOOP_PID_DIR=.*|export HADOOP_PID_DIR=/var/run/hadoop|" \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-  mkdir -p /var/run/hadoop
-  chown -R hadoop:hadoop /var/run/hadoop
-
-  # Set SSH options within the cluster
-  sed -i -e 's|# export HADOOP_SSH_OPTS=.*|export HADOOP_SSH_OPTS="-o StrictHostKeyChecking=no"|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-
-  # Hadoop logs should be on the /mnt partition
-  sed -i -e 's|# export HADOOP_LOG_DIR=.*|export HADOOP_LOG_DIR=/var/log/hadoop/logs|' \
-    $HADOOP_CONF_DIR/hadoop-env.sh
-  rm -rf /var/log/hadoop
-  mkdir /mnt/hadoop/logs
-  chown hadoop:hadoop /mnt/hadoop/logs
-  ln -s /mnt/hadoop/logs /var/log/hadoop
-  chown -R hadoop:hadoop /var/log/hadoop
-
-}
-
-# Sets up a small website on the cluster.
-function setup_web() {
-
-  if which dpkg &> /dev/null; then
-    apt-get -y install thttpd
-    WWW_BASE=/var/www
-  elif which rpm &> /dev/null; then
-    yum install -y thttpd
-    chkconfig --add thttpd
-    WWW_BASE=/var/www/thttpd/html
-  fi
-
-  cat > $WWW_BASE/index.html << END
-<html>
-<head>
-<title>Hadoop EC2 Cluster</title>
-</head>
-<body>
-<h1>Hadoop EC2 Cluster</h1>
-To browse the cluster you need to have a proxy configured.
-Start the proxy with <tt>hadoop-ec2 proxy &lt;cluster_name&gt;</tt>,
-and point your browser to
-<a href="http://apache-hadoop-ec2.s3.amazonaws.com/proxy.pac">this Proxy
-Auto-Configuration (PAC)</a> file.  To manage multiple proxy configurations,
-you may wish to use
-<a href="https://addons.mozilla.org/en-US/firefox/addon/2464">FoxyProxy</a>.
-<ul>
-<li><a href="http://$NN_HOST:50070/">NameNode</a>
-<li><a href="http://$JT_HOST:50030/">JobTracker</a>
-</ul>
-</body>
-</html>
-END
-
-  service thttpd start
-
-}
-
-function start_namenode() {
-  if which dpkg &> /dev/null; then
-    AS_HADOOP="su -s /bin/bash - hadoop -c"
-  elif which rpm &> /dev/null; then
-    AS_HADOOP="/sbin/runuser -s /bin/bash - hadoop -c"
-  fi
-
-  # Format HDFS
-  [ ! -e $FIRST_MOUNT/hadoop/hdfs ] && $AS_HADOOP "$HADOOP_HOME/bin/hadoop namenode -format"
-
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop-daemon.sh start namenode"
-
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop dfsadmin -safemode wait"
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop fs -mkdir /user"
-  # The following is questionable, as it allows a user to delete another user's directory
-  # It's needed to allow users to create their own user directories
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop fs -chmod +w /user"
-
-}
-
-function start_daemon() {
-  if which dpkg &> /dev/null; then
-    AS_HADOOP="su -s /bin/bash - hadoop -c"
-  elif which rpm &> /dev/null; then
-    AS_HADOOP="/sbin/runuser -s /bin/bash - hadoop -c"
-  fi
-  $AS_HADOOP "$HADOOP_HOME/bin/hadoop-daemon.sh start $1"
-}
-
-register_auto_shutdown
-install_user_packages
-install_hadoop
-configure_hadoop
-
-for role in $(echo "$ROLES" | tr "," "\n"); do
-  case $role in
-  nn)
-    setup_web
-    start_namenode
-    ;;
-  snn)
-    start_daemon secondarynamenode
-    ;;
-  jt)
-    start_daemon jobtracker
-    ;;
-  dn)
-    start_daemon datanode
-    ;;
-  tt)
-    start_daemon tasktracker
-    ;;
-  esac
-done
-
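
Note: the script above is not run by hand; the client fills in the %ENV% placeholder, gzips the result and passes it as EC2 user data, which is what the InstanceUserData class in cluster.py (earlier in this change) is for, and the header comment notes the 16K post-compression limit. A small sketch of that client-side step, assuming the hadoop.cloud package in this tree is on PYTHONPATH and that the script is read from the current directory; the environment string is illustrative.

from hadoop.cloud.cluster import InstanceUserData

# Illustrative environment for a combined nn,snn,jt instance.
env = "ROLES=nn,snn,jt HADOOP_VERSION=0.20.1 AUTO_SHUTDOWN="

user_data = InstanceUserData("hadoop-ec2-init-remote.sh",
                             replacements={"%ENV%": env})

payload = user_data.read_as_gzip_stream()  # substituted, then gzip-compressed
# Per the header comment above, the result should stay under 16K after
# compression, since that is the user-data limit the script was written against.
if len(payload) > 16 * 1024:
  print "User data too big: %d bytes after compression" % len(payload)
else:
  print "User data is %d bytes after compression" % len(payload)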

+ 0 - 22
src/contrib/cloud/src/py/hadoop/cloud/data/hadoop-rackspace-init-remote.sh

@@ -1,22 +0,0 @@
-#!/bin/bash -ex
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Run a script downloaded at boot time to avoid Rackspace's 10K limitation.
-
-wget -qO/usr/bin/runurl run.alestic.com/runurl
-chmod 755 /usr/bin/runurl
-%ENV% runurl http://hadoop-dev-test.s3.amazonaws.com/boot-rackspace.sh

+ 0 - 112
src/contrib/cloud/src/py/hadoop/cloud/data/zookeeper-ec2-init-remote.sh

@@ -1,112 +0,0 @@
-#!/bin/bash -x
-
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-################################################################################
-# Script that is run on each EC2 instance on boot. It is passed in the EC2 user
-# data, so should not exceed 16K in size after gzip compression.
-#
-# This script is executed by /etc/init.d/ec2-run-user-data, and output is
-# logged to /var/log/messages.
-################################################################################
-
-################################################################################
-# Initialize variables
-################################################################################
-
-# Substitute environment variables passed by the client
-export %ENV%
-
-ZK_VERSION=${ZK_VERSION:-3.2.2}
-ZOOKEEPER_HOME=/usr/local/zookeeper-$ZK_VERSION
-ZK_CONF_DIR=/etc/zookeeper/conf
-
-function register_auto_shutdown() {
-  if [ ! -z "$AUTO_SHUTDOWN" ]; then
-    shutdown -h +$AUTO_SHUTDOWN >/dev/null &
-  fi
-}
-
-# Install a list of packages on debian or redhat as appropriate
-function install_packages() {
-  if which dpkg &> /dev/null; then
-    apt-get update
-    apt-get -y install $@
-  elif which rpm &> /dev/null; then
-    yum install -y $@
-  else
-    echo "No package manager found."
-  fi
-}
-
-# Install any user packages specified in the USER_PACKAGES environment variable
-function install_user_packages() {
-  if [ ! -z "$USER_PACKAGES" ]; then
-    install_packages $USER_PACKAGES
-  fi
-}
-
-function install_zookeeper() {
-  zk_tar_url=http://www.apache.org/dist/hadoop/zookeeper/zookeeper-$ZK_VERSION/zookeeper-$ZK_VERSION.tar.gz
-  zk_tar_file=`basename $zk_tar_url`
-  zk_tar_md5_file=`basename $zk_tar_url.md5`
-
-  curl="curl --retry 3 --silent --show-error --fail"
-  for i in `seq 1 3`;
-  do
-    $curl -O $zk_tar_url
-    $curl -O $zk_tar_url.md5
-    if md5sum -c $zk_tar_md5_file; then
-      break;
-    else
-      rm -f $zk_tar_file $zk_tar_md5_file
-    fi
-  done
-
-  if [ ! -e $zk_tar_file ]; then
-    echo "Failed to download $zk_tar_url. Aborting."
-    exit 1
-  fi
-
-  tar zxf $zk_tar_file -C /usr/local
-  rm -f $zk_tar_file $zk_tar_md5_file
-
-  echo "export ZOOKEEPER_HOME=$ZOOKEEPER_HOME" >> ~root/.bashrc
-  echo 'export PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH' >> ~root/.bashrc
-}
-
-function configure_zookeeper() {
-  mkdir -p /mnt/zookeeper/logs
-  ln -s /mnt/zookeeper/logs /var/log/zookeeper
-  mkdir -p /var/log/zookeeper/txlog
-  mkdir -p $ZK_CONF_DIR
-  cp $ZOOKEEPER_HOME/conf/log4j.properties $ZK_CONF_DIR
-
-  sed -i -e "s|log4j.rootLogger=INFO, CONSOLE|log4j.rootLogger=INFO, ROLLINGFILE|" \
-         -e "s|log4j.appender.ROLLINGFILE.File=zookeeper.log|log4j.appender.ROLLINGFILE.File=/var/log/zookeeper/zookeeper.log|" \
-      $ZK_CONF_DIR/log4j.properties
-      
-  # Ensure ZooKeeper starts on boot
-  cat > /etc/rc.local <<EOF
-ZOOCFGDIR=$ZK_CONF_DIR $ZOOKEEPER_HOME/bin/zkServer.sh start > /dev/null 2>&1 &
-EOF
-
-}
-
-register_auto_shutdown
-install_user_packages
-install_zookeeper
-configure_zookeeper

+ 0 - 14
src/contrib/cloud/src/py/hadoop/cloud/providers/__init__.py

@@ -1,14 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.

+ 0 - 61
src/contrib/cloud/src/py/hadoop/cloud/providers/dummy.py

@@ -1,61 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-
-from hadoop.cloud.cluster import Cluster
-from hadoop.cloud.cluster import Instance
-
-logger = logging.getLogger(__name__)
-
-class DummyCluster(Cluster):
-
-  @staticmethod
-  def get_clusters_with_role(role, state="running"):
-    logger.info("get_clusters_with_role(%s, %s)", role, state)
-    return ["dummy-cluster"]
-
-  def __init__(self, name, config_dir):
-    super(DummyCluster, self).__init__(name, config_dir)
-    logger.info("__init__(%s, %s)", name, config_dir)
-
-  def get_provider_code(self):
-    return "dummy"
-
-  def authorize_role(self, role, from_port, to_port, cidr_ip):
-    logger.info("authorize_role(%s, %s, %s, %s)", role, from_port, to_port,
-                cidr_ip)
-
-  def get_instances_in_role(self, role, state_filter=None):
-    logger.info("get_instances_in_role(%s, %s)", role, state_filter)
-    return [Instance(1, '127.0.0.1', '127.0.0.1')]
-
-  def print_status(self, roles, state_filter="running"):
-    logger.info("print_status(%s, %s)", roles, state_filter)
-
-  def launch_instances(self, role, number, image_id, size_id,
-                       instance_user_data, **kwargs):
-    logger.info("launch_instances(%s, %s, %s, %s, %s, %s)", role, number,
-                image_id, size_id, instance_user_data, str(kwargs))
-    return [1]
-
-  def wait_for_instances(self, instance_ids, timeout=600):
-    logger.info("wait_for_instances(%s, %s)", instance_ids, timeout)
-
-  def terminate(self):
-    logger.info("terminate")
-
-  def delete(self):
-    logger.info("delete")

+ 0 - 479
src/contrib/cloud/src/py/hadoop/cloud/providers/ec2.py

@@ -1,479 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from boto.ec2.connection import EC2Connection
-from boto.exception import EC2ResponseError
-import logging
-from hadoop.cloud.cluster import Cluster
-from hadoop.cloud.cluster import Instance
-from hadoop.cloud.cluster import RoleSyntaxException
-from hadoop.cloud.cluster import TimeoutException
-from hadoop.cloud.storage import JsonVolumeManager
-from hadoop.cloud.storage import JsonVolumeSpecManager
-from hadoop.cloud.storage import MountableVolume
-from hadoop.cloud.storage import Storage
-from hadoop.cloud.util import xstr
-import os
-import re
-import subprocess
-import sys
-import time
-
-logger = logging.getLogger(__name__)
-
-def _run_command_on_instance(instance, ssh_options, command):
-  print "Running ssh %s root@%s '%s'" % \
-    (ssh_options, instance.public_dns_name, command)
-  retcode = subprocess.call("ssh %s root@%s '%s'" %
-                           (ssh_options, instance.public_dns_name, command),
-                           shell=True)
-  print "Command running on %s returned with value %s" % \
-    (instance.public_dns_name, retcode)
-
-def _wait_for_volume(ec2_connection, volume_id):
-  """
-  Waits until a volume becomes available.
-  """
-  while True:
-    volumes = ec2_connection.get_all_volumes([volume_id,])
-    if volumes[0].status == 'available':
-      break
-    sys.stdout.write(".")
-    sys.stdout.flush()
-    time.sleep(1)
-
-class Ec2Cluster(Cluster):
-  """
-  A cluster of EC2 instances. A cluster has a unique name.
-
-  Instances running in the cluster run in a security group named after the
-  cluster, and also in a group named for the instance's role,
-  e.g. <cluster-name>-foo for a "foo" instance.
-  """
-
-  @staticmethod
-  def get_clusters_with_role(role, state="running"):
-    all_instances = EC2Connection().get_all_instances()
-    clusters = []
-    for res in all_instances:
-      instance = res.instances[0]
-      for group in res.groups:
-        if group.id.endswith("-" + role) and instance.state == state:
-          clusters.append(re.sub("-%s$" % re.escape(role), "", group.id))
-    return clusters
-
-  def __init__(self, name, config_dir):
-    super(Ec2Cluster, self).__init__(name, config_dir)
-    self.ec2Connection = EC2Connection()
-
-  def get_provider_code(self):
-    return "ec2"
-
-  def _get_cluster_group_name(self):
-    return self.name
-
-  def _check_role_name(self, role):
-    if not re.match("^[a-zA-Z0-9_+]+$", role):
-      raise RoleSyntaxException("Invalid role name '%s'" % role)
-
-  def _group_name_for_role(self, role):
-    """
-    Return the security group name for an instance in a given role.
-    """
-    self._check_role_name(role)
-    return "%s-%s" % (self.name, role)
-
-  def _get_group_names(self, roles):
-    group_names = [self._get_cluster_group_name()]
-    for role in roles:
-      group_names.append(self._group_name_for_role(role))
-    return group_names
-
-  def _get_all_group_names(self):
-    security_groups = self.ec2Connection.get_all_security_groups()
-    security_group_names = \
-      [security_group.name for security_group in security_groups]
-    return security_group_names
-
-  def _get_all_group_names_for_cluster(self):
-    all_group_names = self._get_all_group_names()
-    r = []
-    if self.name not in all_group_names:
-      return r
-    for group in all_group_names:
-      if re.match("^%s(-[a-zA-Z0-9_+]+)?$" % self.name, group):
-        r.append(group)
-    return r
-
-  def _create_groups(self, role):
-    """
-    Create the security groups for a given role, including a group for the
-    cluster if it doesn't exist.
-    """
-    self._check_role_name(role)
-    security_group_names = self._get_all_group_names()
-
-    cluster_group_name = self._get_cluster_group_name()
-    if not cluster_group_name in security_group_names:
-      self.ec2Connection.create_security_group(cluster_group_name,
-                                               "Cluster (%s)" % (self.name))
-      self.ec2Connection.authorize_security_group(cluster_group_name,
-                                                  cluster_group_name)
-      # Allow SSH from anywhere
-      self.ec2Connection.authorize_security_group(cluster_group_name,
-                                                  ip_protocol="tcp",
-                                                  from_port=22, to_port=22,
-                                                  cidr_ip="0.0.0.0/0")
-
-    role_group_name = self._group_name_for_role(role)
-    if not role_group_name in security_group_names:
-      self.ec2Connection.create_security_group(role_group_name,
-        "Role %s (%s)" % (role, self.name))
-
-  def authorize_role(self, role, from_port, to_port, cidr_ip):
-    """
-    Authorize access to machines in a given role from a given network.
-    """
-    self._check_role_name(role)
-    role_group_name = self._group_name_for_role(role)
-    # Revoke first to avoid InvalidPermission.Duplicate error
-    self.ec2Connection.revoke_security_group(role_group_name,
-                                             ip_protocol="tcp",
-                                             from_port=from_port,
-                                             to_port=to_port, cidr_ip=cidr_ip)
-    self.ec2Connection.authorize_security_group(role_group_name,
-                                                ip_protocol="tcp",
-                                                from_port=from_port,
-                                                to_port=to_port,
-                                                cidr_ip=cidr_ip)
-
-  def _get_instances(self, group_name, state_filter=None):
-    """
-    Get all the instances in a group, filtered by state.
-
-    @param group_name: the name of the group
-    @param state_filter: the state that the instance should be in
-      (e.g. "running"), or None for all states
-    """
-    all_instances = self.ec2Connection.get_all_instances()
-    instances = []
-    for res in all_instances:
-      for group in res.groups:
-        if group.id == group_name:
-          for instance in res.instances:
-            if state_filter == None or instance.state == state_filter:
-              instances.append(instance)
-    return instances
-
-  def get_instances_in_role(self, role, state_filter=None):
-    """
-    Get all the instances in a role, filtered by state.
-
-    @param role: the name of the role
-    @param state_filter: the state that the instance should be in
-      (e.g. "running"), or None for all states
-    """
-    self._check_role_name(role)
-    instances = []
-    for instance in self._get_instances(self._group_name_for_role(role),
-                                        state_filter):
-      instances.append(Instance(instance.id, instance.dns_name,
-                                instance.private_dns_name))
-    return instances
-
-  def _print_instance(self, role, instance):
-    print "\t".join((role, instance.id,
-      instance.image_id,
-      instance.dns_name, instance.private_dns_name,
-      instance.state, xstr(instance.key_name), instance.instance_type,
-      str(instance.launch_time), instance.placement))
-
-  def print_status(self, roles=None, state_filter="running"):
-    """
-    Print the status of instances in the given roles, filtered by state.
-    """
-    if not roles:
-      for instance in self._get_instances(self._get_cluster_group_name(),
-                                          state_filter):
-        self._print_instance("", instance)
-    else:
-      for role in roles:
-        for instance in self._get_instances(self._group_name_for_role(role),
-                                            state_filter):
-          self._print_instance(role, instance)
-
-  def launch_instances(self, roles, number, image_id, size_id,
-                       instance_user_data, **kwargs):
-    for role in roles:
-      self._check_role_name(role)  
-      self._create_groups(role)
-      
-    user_data = instance_user_data.read_as_gzip_stream()
-    security_groups = self._get_group_names(roles) + kwargs.get('security_groups', [])
-
-    reservation = self.ec2Connection.run_instances(image_id, min_count=number,
-      max_count=number, key_name=kwargs.get('key_name', None),
-      security_groups=security_groups, user_data=user_data,
-      instance_type=size_id, placement=kwargs.get('placement', None))
-    return [instance.id for instance in reservation.instances]
-
-  def wait_for_instances(self, instance_ids, timeout=600):
-    start_time = time.time()
-    while True:
-      if (time.time() - start_time >= timeout):
-        raise TimeoutException()
-      try:
-        if self._all_started(self.ec2Connection.get_all_instances(instance_ids)):
-          break
-      # don't timeout for race condition where instance is not yet registered
-      except EC2ResponseError:
-        pass
-      sys.stdout.write(".")
-      sys.stdout.flush()
-      time.sleep(1)
-
-  def _all_started(self, reservations):
-    for res in reservations:
-      for instance in res.instances:
-        if instance.state != "running":
-          return False
-    return True
-
-  def terminate(self):
-    instances = self._get_instances(self._get_cluster_group_name(), "running")
-    if instances:
-      self.ec2Connection.terminate_instances([i.id for i in instances])
-
-  def delete(self):
-    """
-    Delete the security groups for each role in the cluster, and the group for
-    the cluster.
-    """
-    group_names = self._get_all_group_names_for_cluster()
-    for group in group_names:
-      self.ec2Connection.delete_security_group(group)
-
-  def get_storage(self):
-    """
-    Return the external storage for the cluster.
-    """
-    return Ec2Storage(self)
-
-
-class Ec2Storage(Storage):
-  """
-  Storage volumes for an EC2 cluster. The storage is associated with a named
-  cluster. Metadata for the storage volumes is kept in a JSON file on the client
-  machine (in a file called "ec2-storage-<cluster-name>.json" in the
-  configuration directory).
-  """
-
-  @staticmethod
-  def create_formatted_snapshot(cluster, size, availability_zone, image_id,
-                                key_name, ssh_options):
-    """
-    Creates a formatted snapshot of a given size. This saves having to format
-    volumes when they are first attached.
-    """
-    conn = cluster.ec2Connection
-    print "Starting instance"
-    reservation = conn.run_instances(image_id, key_name=key_name,
-                                     placement=availability_zone)
-    instance = reservation.instances[0]
-    try:
-      cluster.wait_for_instances([instance.id,])
-      print "Started instance %s" % instance.id
-    except TimeoutException:
-      print "Timeout"
-      return
-    print
-    print "Waiting 60 seconds before attaching storage"
-    time.sleep(60)
-    # Re-populate instance object since it has more details filled in
-    instance.update()
-
-    print "Creating volume of size %s in %s" % (size, availability_zone)
-    volume = conn.create_volume(size, availability_zone)
-    print "Created volume %s" % volume
-    print "Attaching volume to %s" % instance.id
-    volume.attach(instance.id, '/dev/sdj')
-
-    _run_command_on_instance(instance, ssh_options, """
-      while true ; do
-        echo 'Waiting for /dev/sdj...';
-        if [ -e /dev/sdj ]; then break; fi;
-        sleep 1;
-      done;
-      mkfs.ext3 -F -m 0.5 /dev/sdj
-    """)
-
-    print "Detaching volume"
-    conn.detach_volume(volume.id, instance.id)
-    print "Creating snapshot"
-    snapshot = volume.create_snapshot()
-    print "Created snapshot %s" % snapshot.id
-    _wait_for_volume(conn, volume.id)
-    print
-    print "Deleting volume"
-    volume.delete()
-    print "Deleted volume"
-    print "Stopping instance"
-    terminated = conn.terminate_instances([instance.id,])
-    print "Stopped instance %s" % terminated
-
-  def __init__(self, cluster):
-    super(Ec2Storage, self).__init__(cluster)
-    self.config_dir = cluster.config_dir
-
-  def _get_storage_filename(self):
-    return os.path.join(self.config_dir,
-                        "ec2-storage-%s.json" % (self.cluster.name))
-
-  def create(self, role, number_of_instances, availability_zone, spec_filename):
-    spec_file = open(spec_filename, 'r')
-    volume_spec_manager = JsonVolumeSpecManager(spec_file)
-    volume_manager = JsonVolumeManager(self._get_storage_filename())
-    for dummy in range(number_of_instances):
-      mountable_volumes = []
-      volume_specs = volume_spec_manager.volume_specs_for_role(role)
-      for spec in volume_specs:
-        logger.info("Creating volume of size %s in %s from snapshot %s" % \
-                    (spec.size, availability_zone, spec.snapshot_id))
-        volume = self.cluster.ec2Connection.create_volume(spec.size,
-                                                          availability_zone,
-                                                          spec.snapshot_id)
-        mountable_volumes.append(MountableVolume(volume.id, spec.mount_point,
-                                                 spec.device))
-      volume_manager.add_instance_storage_for_role(role, mountable_volumes)
-
-  def _get_mountable_volumes(self, role):
-    storage_filename = self._get_storage_filename()
-    volume_manager = JsonVolumeManager(storage_filename)
-    return volume_manager.get_instance_storage_for_role(role)
-
-  def get_mappings_string_for_role(self, role):
-    mappings = {}
-    mountable_volumes_list = self._get_mountable_volumes(role)
-    for mountable_volumes in mountable_volumes_list:
-      for mountable_volume in mountable_volumes:
-        mappings[mountable_volume.mount_point] = mountable_volume.device
-    return ";".join(["%s,%s" % (mount_point, device) for (mount_point, device)
-                     in mappings.items()])
-
-  def _has_storage(self, role):
-    return self._get_mountable_volumes(role)
-
-  def has_any_storage(self, roles):
-    for role in roles:
-      if self._has_storage(role):
-        return True
-    return False
-
-  def get_roles(self):
-    storage_filename = self._get_storage_filename()
-    volume_manager = JsonVolumeManager(storage_filename)
-    return volume_manager.get_roles()
-  
-  def _get_ec2_volumes_dict(self, mountable_volumes):
-    volume_ids = [mv.volume_id for mv in sum(mountable_volumes, [])]
-    volumes = self.cluster.ec2Connection.get_all_volumes(volume_ids)
-    volumes_dict = {}
-    for volume in volumes:
-      volumes_dict[volume.id] = volume
-    return volumes_dict
-
-  def _print_volume(self, role, volume):
-    print "\t".join((role, volume.id, str(volume.size),
-                     volume.snapshot_id, volume.availabilityZone,
-                     volume.status, str(volume.create_time),
-                     str(volume.attach_time)))
-
-  def print_status(self, roles=None):
-    if roles == None:
-      storage_filename = self._get_storage_filename()
-      volume_manager = JsonVolumeManager(storage_filename)
-      roles = volume_manager.get_roles()
-    for role in roles:
-      mountable_volumes_list = self._get_mountable_volumes(role)
-      ec2_volumes = self._get_ec2_volumes_dict(mountable_volumes_list)
-      for mountable_volumes in mountable_volumes_list:
-        for mountable_volume in mountable_volumes:
-          self._print_volume(role, ec2_volumes[mountable_volume.volume_id])
-
-  def _replace(self, string, replacements):
-    for (match, replacement) in replacements.iteritems():
-      string = string.replace(match, replacement)
-    return string
-
-  def attach(self, role, instances):
-    mountable_volumes_list = self._get_mountable_volumes(role)
-    if not mountable_volumes_list:
-      return
-    ec2_volumes = self._get_ec2_volumes_dict(mountable_volumes_list)
-
-    available_mountable_volumes_list = []
-
-    available_instances_dict = {}
-    for instance in instances:
-      available_instances_dict[instance.id] = instance
-
-    # Iterate over mountable_volumes and retain those that are not attached
-    # Also maintain a list of instances that have no attached storage
-    # Note that we do not fill in "holes" (instances that only have some of
-    # their storage attached)
-    for mountable_volumes in mountable_volumes_list:
-      available = True
-      for mountable_volume in mountable_volumes:
-        if ec2_volumes[mountable_volume.volume_id].status != 'available':
-          available = False
-          attach_data = ec2_volumes[mountable_volume.volume_id].attach_data
-          instance_id = attach_data.instance_id
-          if available_instances_dict.has_key(instance_id):
-            del available_instances_dict[instance_id]
-      if available:
-        available_mountable_volumes_list.append(mountable_volumes)
-
-    if len(available_instances_dict) != len(available_mountable_volumes_list):
-      logger.warning("Number of available instances (%s) and volumes (%s) \
-        do not match." \
-        % (len(available_instances_dict),
-           len(available_mountable_volumes_list)))
-
-    for (instance, mountable_volumes) in zip(available_instances_dict.values(),
-                                             available_mountable_volumes_list):
-      print "Attaching storage to %s" % instance.id
-      for mountable_volume in mountable_volumes:
-        volume = ec2_volumes[mountable_volume.volume_id]
-        print "Attaching %s to %s" % (volume.id, instance.id)
-        volume.attach(instance.id, mountable_volume.device)
-
-  def delete(self, roles=[]):
-    storage_filename = self._get_storage_filename()
-    volume_manager = JsonVolumeManager(storage_filename)
-    for role in roles:
-      mountable_volumes_list = volume_manager.get_instance_storage_for_role(role)
-      ec2_volumes = self._get_ec2_volumes_dict(mountable_volumes_list)
-      all_available = True
-      for volume in ec2_volumes.itervalues():
-        if volume.status != 'available':
-          all_available = False
-          logger.warning("Volume %s is not available.", volume)
-      if not all_available:
-        logger.warning("Some volumes are still in use for role %s.\
-          Aborting delete.", role)
-        return
-      for volume in ec2_volumes.itervalues():
-        volume.delete()
-      volume_manager.remove_instance_storage_for_role(role)
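
As a concrete illustration of the two conventions documented above (per-cluster and per-role security groups, and the client-side storage metadata file), a short sketch with a hypothetical cluster name and role list:

# Illustrative sketch of the naming rules implemented by Ec2Cluster/Ec2Storage above.
cluster_name = "mycluster"          # hypothetical
roles = ["nn", "dn"]                # hypothetical

# _get_group_names: the cluster-wide group plus one group per role.
security_groups = [cluster_name] + ["%s-%s" % (cluster_name, role) for role in roles]
print security_groups               # ['mycluster', 'mycluster-nn', 'mycluster-dn']

# _get_storage_filename: volume metadata kept next to the client configuration.
print "ec2-storage-%s.json" % cluster_name   # ec2-storage-mycluster.json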

+ 0 - 239
src/contrib/cloud/src/py/hadoop/cloud/providers/rackspace.py

@@ -1,239 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import with_statement
-
-import base64
-import os
-import subprocess
-import sys
-import time
-import uuid
-
-from hadoop.cloud.cluster import Cluster
-from hadoop.cloud.cluster import Instance
-from hadoop.cloud.cluster import TimeoutException
-from hadoop.cloud.service import HadoopService
-from hadoop.cloud.service import TASKTRACKER
-from libcloud.drivers.rackspace import RackspaceNodeDriver
-from libcloud.base import Node
-from libcloud.base import NodeImage
-
-RACKSPACE_KEY = os.environ['RACKSPACE_KEY']
-RACKSPACE_SECRET = os.environ['RACKSPACE_SECRET']
-
-STATE_MAP = { 'running': 'ACTIVE' }
-STATE_MAP_REVERSED = dict((v, k) for k, v in STATE_MAP.iteritems())
-
-USER_DATA_FILENAME = "/etc/init.d/rackspace-init.sh"
-
-class RackspaceCluster(Cluster):
-  """
-  A cluster of instances running on Rackspace Cloud Servers. A cluster has a
-  unique name, which is stored under the "cluster" metadata key of each server.
-
-  Every instance in the cluster has one or more roles, stored as a
-  comma-separated string under the "roles" metadata key. For example, an instance
-  with roles "foo" and "bar" has a "foo,bar" "roles" key.
-  
-  At boot time two files are injected into an instance's filesystem: the user
-  data file (which is used as a boot script), and the user's public key.
-  """
-  @staticmethod
-  def get_clusters_with_role(role, state="running", driver=None):
-    driver = driver or RackspaceNodeDriver(RACKSPACE_KEY, RACKSPACE_SECRET)
-    all_nodes = RackspaceCluster._list_nodes(driver)
-    clusters = set()
-    for node in all_nodes:
-      try:
-        if node.extra['metadata'].has_key('cluster') and \
-            role in node.extra['metadata']['roles'].split(','):
-          if node.state == STATE_MAP[state]:
-            clusters.add(node.extra['metadata']['cluster'])
-      except KeyError:
-        pass
-    return clusters
-  
-  @staticmethod
-  def _list_nodes(driver, retries=5):
-    attempts = 0
-    while True:
-      try:
-        return driver.list_nodes()
-      except IOError:
-        attempts = attempts + 1
-        if attempts > retries:
-          raise
-        time.sleep(5)
-
-  def __init__(self, name, config_dir, driver=None):
-    super(RackspaceCluster, self).__init__(name, config_dir)
-    self.driver = driver or RackspaceNodeDriver(RACKSPACE_KEY, RACKSPACE_SECRET)
-
-  def get_provider_code(self):
-    return "rackspace"
-  
-  def _get_nodes(self, state_filter=None):
-    all_nodes = RackspaceCluster._list_nodes(self.driver)
-    nodes = []
-    for node in all_nodes:
-      try:
-        if node.extra['metadata']['cluster'] == self.name:
-          if state_filter == None or node.state == STATE_MAP[state_filter]:
-            nodes.append(node)
-      except KeyError:
-        pass
-    return nodes
-
-  def _to_instance(self, node):
-    return Instance(node.id, node.public_ip[0], node.private_ip[0])
-  
-  def _get_nodes_in_role(self, role, state_filter=None):
-    all_nodes = RackspaceCluster._list_nodes(self.driver)
-    nodes = []
-    for node in all_nodes:
-      try:
-        if node.extra['metadata']['cluster'] == self.name and \
-          role in node.extra['metadata']['roles'].split(','):
-          if state_filter == None or node.state == STATE_MAP[state_filter]:
-            nodes.append(node)
-      except KeyError:
-        pass
-    return nodes
-  
-  def get_instances_in_role(self, role, state_filter=None):
-    """
-    Get all the instances in a role, filtered by state.
-
-    @param role: the name of the role
-    @param state_filter: the state that the instance should be in
-      (e.g. "running"), or None for all states
-    """
-    return [self._to_instance(node) for node in \
-            self._get_nodes_in_role(role, state_filter)]
-
-  def _print_node(self, node, out):
-    out.write("\t".join((node.extra['metadata']['roles'], node.id,
-      node.name,
-      self._ip_list_to_string(node.public_ip),
-      self._ip_list_to_string(node.private_ip),
-      STATE_MAP_REVERSED[node.state])))
-    out.write("\n")
-    
-  def _ip_list_to_string(self, ips):
-    if ips is None:
-      return ""
-    return ",".join(ips)
-
-  def print_status(self, roles=None, state_filter="running", out=sys.stdout):
-    if not roles:
-      for node in self._get_nodes(state_filter):
-        self._print_node(node, out)
-    else:
-      for role in roles:
-        for node in self._get_nodes_in_role(role, state_filter):
-          self._print_node(node, out)
-
-  def launch_instances(self, roles, number, image_id, size_id,
-                       instance_user_data, **kwargs):
-    metadata = {"cluster": self.name, "roles": ",".join(roles)}
-    node_ids = []
-    files = { USER_DATA_FILENAME: instance_user_data.read() }
-    if "public_key" in kwargs:
-      files["/root/.ssh/authorized_keys"] = open(kwargs["public_key"]).read()
-    for dummy in range(number):
-      node = self._launch_instance(roles, image_id, size_id, metadata, files)
-      node_ids.append(node.id)
-    return node_ids
-
-  def _launch_instance(self, roles, image_id, size_id, metadata, files):
-    instance_name = "%s-%s" % (self.name, uuid.uuid4().hex[-8:])
-    node = self.driver.create_node(instance_name, self._find_image(image_id),
-                                   self._find_size(size_id), metadata=metadata,
-                                   files=files)
-    return node
-
-  def _find_image(self, image_id):
-    return NodeImage(id=image_id, name=None, driver=None)
-
-  def _find_size(self, size_id):
-    matches = [i for i in self.driver.list_sizes() if i.id == str(size_id)]
-    if len(matches) != 1:
-      return None
-    return matches[0]
-
-  def wait_for_instances(self, instance_ids, timeout=600):
-    start_time = time.time()
-    while True:
-      if (time.time() - start_time >= timeout):
-        raise TimeoutException()
-      try:
-        if self._all_started(instance_ids):
-          break
-      except Exception:
-        pass
-      sys.stdout.write(".")
-      sys.stdout.flush()
-      time.sleep(1)
-
-  def _all_started(self, node_ids):
-    all_nodes = RackspaceCluster._list_nodes(self.driver)
-    node_id_to_node = {}
-    for node in all_nodes:
-      node_id_to_node[node.id] = node
-    for node_id in node_ids:
-      try:
-        if node_id_to_node[node_id].state != STATE_MAP["running"]:
-          return False
-      except KeyError:
-        return False
-    return True
-
-  def terminate(self):
-    nodes = self._get_nodes("running")
-    print nodes
-    for node in nodes:
-      self.driver.destroy_node(node)
-
-class RackspaceHadoopService(HadoopService):
-    
-  def _update_cluster_membership(self, public_key, private_key):
-    """
-    Creates a cluster-wide hosts file and copies it across the cluster.
-    This is a stopgap until DNS is configured on the cluster.
-    """
-    ssh_options = '-o StrictHostKeyChecking=no'
-
-    time.sleep(30) # wait for SSH daemon to start
-    nodes = self.cluster._get_nodes('running')
-    # create hosts file
-    hosts_file = 'hosts'
-    with open(hosts_file, 'w') as f:
-      f.write("127.0.0.1 localhost localhost.localdomain\n")
-      for node in nodes:
-        f.write(node.public_ip[0] + "\t" + node.name + "\n")
-    # copy to each node in the cluster
-    for node in nodes:
-      self._call('scp -i %s %s %s root@%s:/etc/hosts' \
-                 % (private_key, ssh_options, hosts_file, node.public_ip[0]))
-    os.remove(hosts_file)
-
-  def _call(self, command):
-    print command
-    try:
-      subprocess.call(command, shell=True)
-    except Exception, e:
-      print e
-  
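
The cluster/role metadata convention described in the RackspaceCluster docstring can be illustrated with plain dictionaries; the node data below is hypothetical, standing in for libcloud's Node.extra:

# Sketch of the "cluster" and comma-separated "roles" metadata keys used above.
nodes = [
    {"metadata": {"cluster": "mycluster", "roles": "nn,snn,jt"}},
    {"metadata": {"cluster": "mycluster", "roles": "dn,tt"}},
    {"metadata": {"cluster": "other",     "roles": "zk"}},
]

def in_role(node, cluster, role):
    metadata = node["metadata"]
    return metadata.get("cluster") == cluster and \
           role in metadata.get("roles", "").split(",")

print [n for n in nodes if in_role(n, "mycluster", "tt")]   # only the dn,tt node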

+ 0 - 640
src/contrib/cloud/src/py/hadoop/cloud/service.py

@@ -1,640 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Classes for running services on a cluster.
-"""
-
-from __future__ import with_statement
-
-from hadoop.cloud.cluster import get_cluster
-from hadoop.cloud.cluster import InstanceUserData
-from hadoop.cloud.cluster import TimeoutException
-from hadoop.cloud.providers.ec2 import Ec2Storage
-from hadoop.cloud.util import build_env_string
-from hadoop.cloud.util import url_get
-from hadoop.cloud.util import xstr
-import logging
-import os
-import re
-import socket
-import subprocess
-import sys
-import time
-
-logger = logging.getLogger(__name__)
-
-MASTER = "master"  # Deprecated.
-
-NAMENODE = "nn"
-SECONDARY_NAMENODE = "snn"
-JOBTRACKER = "jt"
-DATANODE = "dn"
-TASKTRACKER = "tt"
-
-class InstanceTemplate(object):
-  """
-  A template for creating server instances in a cluster.
-  """
-  def __init__(self, roles, number, image_id, size_id,
-                     key_name, public_key, private_key,
-                     user_data_file_template=None, placement=None,
-                     user_packages=None, auto_shutdown=None, env_strings=[],
-                     security_groups=[]):
-    self.roles = roles
-    self.number = number
-    self.image_id = image_id
-    self.size_id = size_id
-    self.key_name = key_name
-    self.public_key = public_key
-    self.private_key = private_key
-    self.user_data_file_template = user_data_file_template
-    self.placement = placement
-    self.user_packages = user_packages
-    self.auto_shutdown = auto_shutdown
-    self.env_strings = env_strings
-    self.security_groups = security_groups
-
-  def add_env_strings(self, env_strings):
-    new_env_strings = list(self.env_strings or [])
-    new_env_strings.extend(env_strings)
-    self.env_strings = new_env_strings
-
-
-class Service(object):
-  """
-  A general service that runs on a cluster.
-  """
-  
-  def __init__(self, cluster):
-    self.cluster = cluster
-    
-  def get_service_code(self):
-    """
-    The code that uniquely identifies the service.
-    """
-    raise Exception("Unimplemented")
-    
-  def list_all(self, provider):
-    """
-    Find and print all clusters running this type of service.
-    """
-    raise Exception("Unimplemented")
-
-  def list(self):
-    """
-    Find and print all the instances running in this cluster.
-    """
-    raise Exception("Unimplemented")
-  
-  def launch_master(self, instance_template, config_dir, client_cidr):
-    """
-    Launch a "master" instance.
-    """
-    raise Exception("Unimplemented")
-  
-  def launch_slaves(self, instance_template):
-    """
-    Launch "slave" instance.
-    """
-    raise Exception("Unimplemented")
-  
-  def launch_cluster(self, instance_templates, config_dir, client_cidr):
-    """
-    Launch a cluster of instances.
-    """
-    raise Exception("Unimplemented")
-  
-  def terminate_cluster(self,  force=False):
-    self.cluster.print_status()
-    if not force and not self._prompt("Terminate all instances?"):
-      print "Not terminating cluster."
-    else:
-      print "Terminating cluster"
-      self.cluster.terminate()
-      
-  def delete_cluster(self):
-    self.cluster.delete()
-    
-  def create_formatted_snapshot(self, size, availability_zone,
-                                image_id, key_name, ssh_options):
-    Ec2Storage.create_formatted_snapshot(self.cluster, size,
-                                         availability_zone,
-                                         image_id,
-                                         key_name,
-                                         ssh_options)
-
-  def list_storage(self):
-    storage = self.cluster.get_storage()
-    storage.print_status()
-
-  def create_storage(self, role, number_of_instances,
-                     availability_zone, spec_file):
-    storage = self.cluster.get_storage()
-    storage.create(role, number_of_instances, availability_zone, spec_file)
-    storage.print_status()
-    
-  def attach_storage(self, role):
-    storage = self.cluster.get_storage()
-    storage.attach(role, self.cluster.get_instances_in_role(role, 'running'))
-    storage.print_status()
-    
-  def delete_storage(self, force=False):
-    storage = self.cluster.get_storage()
-    storage.print_status()
-    if not force and not self._prompt("Delete all storage volumes? THIS WILL \
-      PERMANENTLY DELETE ALL DATA"):
-      print "Not deleting storage volumes."
-    else:
-      print "Deleting storage"
-      for role in storage.get_roles():
-        storage.delete(role)
-  
-  def login(self, ssh_options):
-    raise Exception("Unimplemented")
-    
-  def proxy(self, ssh_options):
-    raise Exception("Unimplemented")
-    
-  def push(self, ssh_options, file):
-    raise Exception("Unimplemented")
-    
-  def execute(self, ssh_options, args):
-    raise Exception("Unimplemented")
-  
-  def update_slaves_file(self, config_dir, ssh_options, private_key):
-    raise Exception("Unimplemented")
-  
-  def _prompt(self, prompt):
-    """ Returns true if user responds "yes" to prompt. """
-    return raw_input("%s [yes or no]: " % prompt).lower() == "yes"
-
-  def _call(self, command):
-    print command
-    try:
-      subprocess.call(command, shell=True)
-    except Exception, e:
-      print e
-
-  def _get_default_user_data_file_template(self):
-    data_path = os.path.join(os.path.dirname(__file__), 'data')
-    return os.path.join(data_path, '%s-%s-init-remote.sh' %
-                 (self.get_service_code(), self.cluster.get_provider_code()))
-
-  def _launch_instances(self, instance_template):
-    it = instance_template
-    user_data_file_template = it.user_data_file_template
-    if it.user_data_file_template == None:
-      user_data_file_template = self._get_default_user_data_file_template()
-    ebs_mappings = ''
-    storage = self.cluster.get_storage()
-    for role in it.roles:
-      if storage.has_any_storage((role,)):
-        ebs_mappings = storage.get_mappings_string_for_role(role)
-    replacements = { "%ENV%": build_env_string(it.env_strings, {
-      "ROLES": ",".join(it.roles),
-      "USER_PACKAGES": it.user_packages,
-      "AUTO_SHUTDOWN": it.auto_shutdown,
-      "EBS_MAPPINGS": ebs_mappings,
-    }) }
-    instance_user_data = InstanceUserData(user_data_file_template, replacements)
-    instance_ids = self.cluster.launch_instances(it.roles, it.number, it.image_id,
-                                            it.size_id,
-                                            instance_user_data,
-                                            key_name=it.key_name,
-                                            public_key=it.public_key,
-                                            placement=it.placement)
-    print "Waiting for %s instances in role %s to start" % \
-      (it.number, ",".join(it.roles))
-    try:
-      self.cluster.wait_for_instances(instance_ids)
-      print "%s instances started" % ",".join(it.roles)
-    except TimeoutException:
-      print "Timeout while waiting for %s instance to start." % ",".join(it.roles)
-      return
-    print
-    self.cluster.print_status(it.roles[0])
-    return self.cluster.get_instances_in_role(it.roles[0], "running")
-
-  
-class HadoopService(Service):
-  """
-  A HDFS and MapReduce service.
-  """
-  
-  def __init__(self, cluster):
-    super(HadoopService, self).__init__(cluster)
-    
-  def get_service_code(self):
-    return "hadoop"
-    
-  def list_all(self, provider):
-    """
-    Find and print clusters that have a running namenode instance.
-    """
-    legacy_clusters = get_cluster(provider).get_clusters_with_role(MASTER)
-    clusters = list(get_cluster(provider).get_clusters_with_role(NAMENODE))
-    clusters.extend(legacy_clusters)
-    if not clusters:
-      print "No running clusters"
-    else:
-      for cluster in clusters:
-        print cluster
-    
-  def list(self):
-    self.cluster.print_status()
-
-  def launch_master(self, instance_template, config_dir, client_cidr):
-    if self.cluster.check_running(NAMENODE, 0) == False:
-      return  # don't proceed if another master is running
-    self.launch_cluster((instance_template,), config_dir, client_cidr)
-  
-  def launch_slaves(self, instance_template):
-    instances = self.cluster.check_running(NAMENODE, 1)
-    if not instances:
-      return
-    master = instances[0]
-    for role in (NAMENODE, SECONDARY_NAMENODE, JOBTRACKER): 
-      singleton_host_env = "%s_HOST=%s" % \
-              (self._sanitize_role_name(role), master.public_ip)
-      instance_template.add_env_strings((singleton_host_env,))
-    self._launch_instances(instance_template)              
-    self._attach_storage(instance_template.roles)
-    self._print_master_url()
-      
-  def launch_cluster(self, instance_templates, config_dir, client_cidr):
-    number_of_tasktrackers = 0
-    roles = []
-    for it in instance_templates:
-      roles.extend(it.roles)
-      if TASKTRACKER in it.roles:
-        number_of_tasktrackers += it.number
-    self._launch_cluster_instances(instance_templates)
-    self._create_client_hadoop_site_file(config_dir)
-    self._authorize_client_ports(client_cidr)
-    self._attach_storage(roles)
-    self._update_cluster_membership(instance_templates[0].public_key,
-                                    instance_templates[0].private_key)
-    try:
-      self._wait_for_hadoop(number_of_tasktrackers)
-    except TimeoutException:
-      print "Timeout while waiting for Hadoop to start. Please check logs on" +\
-        " cluster."
-    self._print_master_url()
-    
-  def login(self, ssh_options):
-    master = self._get_master()
-    if not master:
-      sys.exit(1)
-    subprocess.call('ssh %s root@%s' % \
-                    (xstr(ssh_options), master.public_ip),
-                    shell=True)
-    
-  def proxy(self, ssh_options):
-    master = self._get_master()
-    if not master:
-      sys.exit(1)
-    options = '-o "ConnectTimeout 10" -o "ServerAliveInterval 60" ' \
-              '-N -D 6666'
-    process = subprocess.Popen('ssh %s %s root@%s' %
-      (xstr(ssh_options), options, master.public_ip),
-      stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
-      shell=True)
-    print """export HADOOP_CLOUD_PROXY_PID=%s;
-echo Proxy pid %s;""" % (process.pid, process.pid)
-    
-  def push(self, ssh_options, file):
-    master = self._get_master()
-    if not master:
-      sys.exit(1)
-    subprocess.call('scp %s -r %s root@%s:' % (xstr(ssh_options),
-                                               file, master.public_ip),
-                                               shell=True)
-    
-  def execute(self, ssh_options, args):
-    master = self._get_master()
-    if not master:
-      sys.exit(1)
-    subprocess.call("ssh %s root@%s '%s'" % (xstr(ssh_options),
-                                             master.public_ip,
-                                             " ".join(args)), shell=True)
-  
-  def update_slaves_file(self, config_dir, ssh_options, private_key):
-    instances = self.cluster.check_running(NAMENODE, 1)
-    if not instances:
-      sys.exit(1)
-    master = instances[0]
-    slaves = self.cluster.get_instances_in_role(DATANODE, "running")
-    cluster_dir = os.path.join(config_dir, self.cluster.name)
-    slaves_file = os.path.join(cluster_dir, 'slaves')
-    with open(slaves_file, 'w') as f:
-      for slave in slaves:
-        f.write(slave.public_ip + "\n")
-    subprocess.call('scp %s -r %s root@%s:/etc/hadoop/conf' % \
-                    (ssh_options, slaves_file, master.public_ip), shell=True)
-    # Copy private key
-    subprocess.call('scp %s -r %s root@%s:/root/.ssh/id_rsa' % \
-                    (ssh_options, private_key, master.public_ip), shell=True)
-    for slave in slaves:
-      subprocess.call('scp %s -r %s root@%s:/root/.ssh/id_rsa' % \
-                      (ssh_options, private_key, slave.public_ip), shell=True)
-        
-  def _get_master(self):
-    # For split namenode/jobtracker, designate the namenode as the master
-    return self._get_namenode()
-
-  def _get_namenode(self):
-    instances = self.cluster.get_instances_in_role(NAMENODE, "running")
-    if not instances:
-      return None
-    return instances[0]
-
-  def _get_jobtracker(self):
-    instances = self.cluster.get_instances_in_role(JOBTRACKER, "running")
-    if not instances:
-      return None
-    return instances[0]
-
-  def _launch_cluster_instances(self, instance_templates):
-    singleton_hosts = []
-    for instance_template in instance_templates:
-      instance_template.add_env_strings(singleton_hosts)
-      instances = self._launch_instances(instance_template)
-      if instance_template.number == 1:
-        if len(instances) != 1:
-          logger.error("Expected a single '%s' instance, but found %s.",
-                       "".join(instance_template.roles), len(instances))
-          return
-        else:
-          for role in instance_template.roles:
-            singleton_host_env = "%s_HOST=%s" % \
-              (self._sanitize_role_name(role),
-               instances[0].public_ip)
-            singleton_hosts.append(singleton_host_env)
-
-  def _sanitize_role_name(self, role):
-    """Replace characters in role name with ones allowed in bash variable names"""
-    return role.replace('+', '_').upper()
-
-  def _authorize_client_ports(self, client_cidrs=[]):
-    if not client_cidrs:
-      logger.debug("No client CIDRs specified, using local address.")
-      client_ip = url_get('http://checkip.amazonaws.com/').strip()
-      client_cidrs = ("%s/32" % client_ip,)
-    logger.debug("Client CIDRs: %s", client_cidrs)
-    namenode = self._get_namenode()
-    jobtracker = self._get_jobtracker()
-    for client_cidr in client_cidrs:
-      # Allow access to port 80 on namenode from client
-      self.cluster.authorize_role(NAMENODE, 80, 80, client_cidr)
-      # Allow access to jobtracker UI on master from client
-      # (so we can see when the cluster is ready)
-      self.cluster.authorize_role(JOBTRACKER, 50030, 50030, client_cidr)
-    # Allow access to namenode and jobtracker via public address from each other
-    namenode_ip = socket.gethostbyname(namenode.public_ip)
-    jobtracker_ip = socket.gethostbyname(jobtracker.public_ip)
-    self.cluster.authorize_role(NAMENODE, 8020, 8020, "%s/32" % namenode_ip)
-    self.cluster.authorize_role(NAMENODE, 8020, 8020, "%s/32" % jobtracker_ip)
-    self.cluster.authorize_role(JOBTRACKER, 8021, 8021, "%s/32" % namenode_ip)
-    self.cluster.authorize_role(JOBTRACKER, 8021, 8021,
-                                "%s/32" % jobtracker_ip)
-  
-  def _create_client_hadoop_site_file(self, config_dir):
-    namenode = self._get_namenode()
-    jobtracker = self._get_jobtracker()
-    cluster_dir = os.path.join(config_dir, self.cluster.name)
-    aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID') or ''
-    aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY') or ''
-    if not os.path.exists(cluster_dir):
-      os.makedirs(cluster_dir)
-    with open(os.path.join(cluster_dir, 'hadoop-site.xml'), 'w') as f:
-      f.write("""<?xml version="1.0"?>
-  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-  <!-- Put site-specific property overrides in this file. -->
-  <configuration>
-  <property>
-    <name>hadoop.job.ugi</name>
-    <value>root,root</value>
-  </property>
-  <property>
-    <name>fs.default.name</name>
-    <value>hdfs://%(namenode)s:8020/</value>
-  </property>
-  <property>
-    <name>mapred.job.tracker</name>
-    <value>%(jobtracker)s:8021</value>
-  </property>
-  <property>
-    <name>hadoop.socks.server</name>
-    <value>localhost:6666</value>
-  </property>
-  <property>
-    <name>hadoop.rpc.socket.factory.class.default</name>
-    <value>org.apache.hadoop.net.SocksSocketFactory</value>
-  </property>
-  <property>
-    <name>fs.s3.awsAccessKeyId</name>
-    <value>%(aws_access_key_id)s</value>
-  </property>
-  <property>
-    <name>fs.s3.awsSecretAccessKey</name>
-    <value>%(aws_secret_access_key)s</value>
-  </property>
-  <property>
-    <name>fs.s3n.awsAccessKeyId</name>
-    <value>%(aws_access_key_id)s</value>
-  </property>
-  <property>
-    <name>fs.s3n.awsSecretAccessKey</name>
-    <value>%(aws_secret_access_key)s</value>
-  </property>
-  </configuration>
-  """ % {'namenode': namenode.public_ip,
-    'jobtracker': jobtracker.public_ip,
-    'aws_access_key_id': aws_access_key_id,
-    'aws_secret_access_key': aws_secret_access_key})        
-
-  def _wait_for_hadoop(self, number, timeout=600):
-    start_time = time.time()
-    jobtracker = self._get_jobtracker()
-    if not jobtracker:
-      return
-    print "Waiting for jobtracker to start"
-    previous_running = 0
-    while True:
-      if (time.time() - start_time >= timeout):
-        raise TimeoutException()
-      try:
-        actual_running = self._number_of_tasktrackers(jobtracker.public_ip, 1)
-        break
-      except IOError:
-        pass
-      sys.stdout.write(".")
-      sys.stdout.flush()
-      time.sleep(1)
-    print
-    if number > 0:
-      print "Waiting for %d tasktrackers to start" % number
-      while actual_running < number:
-        if (time.time() - start_time >= timeout):
-          raise TimeoutException()
-        try:
-          actual_running = self._number_of_tasktrackers(jobtracker.public_ip, 5, 2)
-          if actual_running != previous_running:
-            sys.stdout.write("%d" % actual_running)
-          sys.stdout.write(".")
-          sys.stdout.flush()
-          time.sleep(1)
-          previous_running = actual_running
-        except IOError:
-          pass
-      print
-
-  # The optional ?type=active is a difference between Hadoop 0.18 and 0.20
-  _NUMBER_OF_TASK_TRACKERS = re.compile(
-    r'<a href="machines.jsp(?:\?type=active)?">(\d+)</a>')
-  
-  def _number_of_tasktrackers(self, jt_hostname, timeout, retries=0):
-    jt_page = url_get("http://%s:50030/jobtracker.jsp" % jt_hostname, timeout,
-                      retries)
-    m = self._NUMBER_OF_TASK_TRACKERS.search(jt_page)
-    if m:
-      return int(m.group(1))
-    return 0
-
-  def _print_master_url(self):
-    webserver = self._get_jobtracker()
-    if not webserver:
-      return
-    print "Browse the cluster at http://%s/" % webserver.public_ip
-
-  def _attach_storage(self, roles):
-    storage = self.cluster.get_storage()
-    if storage.has_any_storage(roles):
-      print "Waiting 10 seconds before attaching storage"
-      time.sleep(10)
-      for role in roles:
-        storage.attach(role, self.cluster.get_instances_in_role(role, 'running'))
-      storage.print_status(roles)
-      
-  def _update_cluster_membership(self, public_key, private_key):
-    pass
-
-
-class ZooKeeperService(Service):
-  """
-  A ZooKeeper service.
-  """
-
-  ZOOKEEPER_ROLE = "zk"
-
-  def __init__(self, cluster):
-    super(ZooKeeperService, self).__init__(cluster)
-    
-  def get_service_code(self):
-    return "zookeeper"
-
-  def launch_cluster(self, instance_templates, config_dir, client_cidr):
-    self._launch_cluster_instances(instance_templates)
-    self._authorize_client_ports(client_cidr)
-    self._update_cluster_membership(instance_templates[0].public_key)
-    
-  def _launch_cluster_instances(self, instance_templates):
-    for instance_template in instance_templates:
-      instances = self._launch_instances(instance_template)
-
-  def _authorize_client_ports(self, client_cidrs=[]):
-    if not client_cidrs:
-      logger.debug("No client CIDRs specified, using local address.")
-      client_ip = url_get('http://checkip.amazonaws.com/').strip()
-      client_cidrs = ("%s/32" % client_ip,)
-    logger.debug("Client CIDRs: %s", client_cidrs)
-    for client_cidr in client_cidrs:
-      self.cluster.authorize_role(self.ZOOKEEPER_ROLE, 2181, 2181, client_cidr)
-  
-  def _update_cluster_membership(self, public_key):
-    time.sleep(30) # wait for SSH daemon to start
-    
-    ssh_options = '-o StrictHostKeyChecking=no'
-    private_key = public_key[:-4] # TODO: pass in private key explicitly
-
-    instances = self.cluster.get_instances_in_role(self.ZOOKEEPER_ROLE,
-                                                   'running')
-    config_file = 'zoo.cfg'
-    with open(config_file, 'w') as f:
-      f.write("""# The number of milliseconds of each tick
-tickTime=2000
-# The number of ticks that the initial
-# synchronization phase can take
-initLimit=10
-# The number of ticks that can pass between
-# sending a request and getting an acknowledgement
-syncLimit=5
-# The directory where the snapshot is stored.
-dataDir=/var/log/zookeeper/txlog
-# The port at which the clients will connect
-clientPort=2181
-# The servers in the ensemble
-""")
-      counter = 1
-      for i in instances:
-        f.write("server.%s=%s:2888:3888\n" % (counter, i.private_ip))
-        counter += 1
-    # copy to each node in the cluster
-    myid_file = 'myid'
-    counter = 1
-    for i in instances:
-      self._call('scp -i %s %s %s root@%s:/etc/zookeeper/conf/zoo.cfg' \
-                 % (private_key, ssh_options, config_file, i.public_ip))
-      with open(myid_file, 'w') as f:
-        f.write(str(counter) + "\n")
-      self._call('scp -i %s %s %s root@%s:/var/log/zookeeper/txlog/myid' \
-                 % (private_key, ssh_options, myid_file, i.public_ip))
-      counter += 1
-    os.remove(config_file)
-    os.remove(myid_file)
-
-    # start the zookeeper servers
-    for i in instances:
-      self._call('ssh -i %s %s root@%s nohup /etc/rc.local &' \
-                 % (private_key, ssh_options, i.public_ip))
-      
-    hosts_string = ",".join(["%s:2181" % i.public_ip for i in instances]) 
-    print "ZooKeeper cluster: %s" % hosts_string
-
-SERVICE_PROVIDER_MAP = {
-  "hadoop": {
-     "rackspace": ('hadoop.cloud.providers.rackspace', 'RackspaceHadoopService')
-  },
-  "zookeeper": {
-    # "provider_code": ('hadoop.cloud.providers.provider_code', 'ProviderZooKeeperService')
-  },
-}
-
-DEFAULT_SERVICE_PROVIDER_MAP = {
-  "hadoop": HadoopService,
-  "zookeeper": ZooKeeperService
-}
-
-def get_service(service, provider):
-  """
-  Retrieve the Service class for a service and provider.
-  """
-  try:
-    mod_name, service_classname = SERVICE_PROVIDER_MAP[service][provider]
-    _mod = __import__(mod_name, globals(), locals(), [service_classname])
-    return getattr(_mod, service_classname)
-  except KeyError:
-    return DEFAULT_SERVICE_PROVIDER_MAP[service]
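
The "<ROLE>_HOST=<address>" strings that launch_slaves and _launch_cluster_instances feed to add_env_strings follow the _sanitize_role_name rule above; a small sketch, with a hypothetical master address:

# Sketch of the singleton-host environment strings built by HadoopService above.
def sanitize_role_name(role):
    # same rule as HadoopService._sanitize_role_name
    return role.replace('+', '_').upper()

master_address = "203.0.113.10"     # hypothetical public address
for role in ("nn", "snn", "jt"):
    print "%s_HOST=%s" % (sanitize_role_name(role), master_address)
# NN_HOST=203.0.113.10
# SNN_HOST=203.0.113.10
# JT_HOST=203.0.113.10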

+ 0 - 173
src/contrib/cloud/src/py/hadoop/cloud/storage.py

@@ -1,173 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Classes for controlling external cluster storage.
-"""
-
-import logging
-import simplejson as json
-
-logger = logging.getLogger(__name__)
-
-class VolumeSpec(object):
-  """
-  The specification for a storage volume, encapsulating all the information
-  needed to create a volume and ultimately mount it on an instance.
-  """
-  def __init__(self, size, mount_point, device, snapshot_id):
-    self.size = size
-    self.mount_point = mount_point
-    self.device = device
-    self.snapshot_id = snapshot_id
-
-
-class JsonVolumeSpecManager(object):
-  """
-  A container for VolumeSpecs. This object can read VolumeSpecs specified in
-  JSON.
-  """
-  def __init__(self, spec_file):
-    self.spec = json.load(spec_file)
-
-  def volume_specs_for_role(self, role):
-    return [VolumeSpec(d["size_gb"], d["mount_point"], d["device"],
-                       d["snapshot_id"]) for d in self.spec[role]]
-
-  def get_mappings_string_for_role(self, role):
-    """
-    Returns a short string of the form
-    "mount_point1,device1;mount_point2,device2;..."
-    which is useful for passing as an environment variable.
-    """
-    return ";".join(["%s,%s" % (d["mount_point"], d["device"])
-                     for d in self.spec[role]])
-
-
-class MountableVolume(object):
-  """
-  A storage volume that has been created. It may or may not have been attached
-  or mounted to an instance.
-  """
-  def __init__(self, volume_id, mount_point, device):
-    self.volume_id = volume_id
-    self.mount_point = mount_point
-    self.device = device
-
-
-class JsonVolumeManager(object):
-
-  def __init__(self, filename):
-    self.filename = filename
-
-  def _load(self):
-    try:
-      return json.load(open(self.filename, "r"))
-    except IOError:
-      logger.debug("File %s does not exist.", self.filename)
-      return {}
-
-  def _store(self, obj):
-    return json.dump(obj, open(self.filename, "w"), sort_keys=True, indent=2)
-  
-  def get_roles(self):
-    json_dict = self._load()
-    return json_dict.keys()
-
-  def add_instance_storage_for_role(self, role, mountable_volumes):
-    json_dict = self._load()
-    mv_dicts = [mv.__dict__ for mv in mountable_volumes]
-    json_dict.setdefault(role, []).append(mv_dicts)
-    self._store(json_dict)
-
-  def remove_instance_storage_for_role(self, role):
-    json_dict = self._load()
-    del json_dict[role]
-    self._store(json_dict)
-
-  def get_instance_storage_for_role(self, role):
-    """
-    Returns a list of lists of MountableVolume objects. Each nested list is
-    the storage for one instance.
-    """
-    try:
-      json_dict = self._load()
-      instance_storage = []
-      for instance in json_dict[role]:
-        vols = []
-        for vol in instance:
-          vols.append(MountableVolume(vol["volume_id"], vol["mount_point"],
-                                      vol["device"]))
-        instance_storage.append(vols)
-      return instance_storage
-    except KeyError:
-      return []
-
-class Storage(object):
-  """
-  Storage volumes for a cluster. The storage is associated with a named
-  cluster. Many clusters just have local storage, in which case this is
-  not used.
-  """
-
-  def __init__(self, cluster):
-    self.cluster = cluster
-
-  def create(self, role, number_of_instances, availability_zone, spec_filename):
-    """
-    Create new storage volumes for instances with the given role, according to
-    the mapping defined in the spec file.
-    """
-    pass
-
-  def get_mappings_string_for_role(self, role):
-    """
-    Returns a short string of the form
-    "mount_point1,device1;mount_point2,device2;..."
-    which is useful for passing as an environment variable.
-    """
-    raise Exception("Unimplemented")
-
-  def has_any_storage(self, roles):
-    """
-    Return True if any of the given roles has associated storage
-    """
-    return False
-
-  def get_roles(self):
-    """
-    Return a list of roles that have storage defined.
-    """
-    return []
-
-  def print_status(self, roles=None):
-    """
-    Print the status of storage volumes for the given roles.
-    """
-    pass
-
-  def attach(self, role, instances):
-    """
-    Attach volumes for a role to instances. Some volumes may already be
-    attached, in which case they are ignored, and we take care not to attach
-    multiple volumes to an instance.
-    """
-    pass
-
-  def delete(self, roles=[]):
-    """
-    Permanently delete all the storage for the given roles.
-    """
-    pass
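
A minimal example of the JSON volume specification that JsonVolumeSpecManager reads, using the keys expected by volume_specs_for_role above; the size, mount point, device and snapshot ID are hypothetical, and the import assumes the module as it stood before this removal:

# Sketch only: feeds a hypothetical spec to JsonVolumeSpecManager.
from StringIO import StringIO
from hadoop.cloud.storage import JsonVolumeSpecManager

spec_json = """{
  "dn": [
    {"size_gb": "100", "mount_point": "/ebs1", "device": "/dev/sdj",
     "snapshot_id": "snap-00000000"}
  ]
}"""

manager = JsonVolumeSpecManager(StringIO(spec_json))
spec = manager.volume_specs_for_role("dn")[0]
print spec.size, spec.mount_point, spec.device, spec.snapshot_id
print manager.get_mappings_string_for_role("dn")   # /ebs1,/dev/sdj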

+ 0 - 84
src/contrib/cloud/src/py/hadoop/cloud/util.py

@@ -1,84 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Utility functions.
-"""
-
-import ConfigParser
-import socket
-import urllib2
-
-def bash_quote(text):
-  """Quotes a string for bash, by using single quotes."""
-  if text == None:
-    return ""
-  return "'%s'" % text.replace("'", "'\\''")
-
-def bash_quote_env(env):
-  """Quotes the value in an environment variable assignment."""
-  if env.find("=") == -1:
-    return env
-  (var, value) = env.split("=")
-  return "%s=%s" % (var, bash_quote(value))
-
-def build_env_string(env_strings=[], pairs={}):
-  """Build a bash environment variable assignment"""
-  env = ''
-  if env_strings:
-    for env_string in env_strings:
-      env += "%s " % bash_quote_env(env_string)
-  if pairs:
-    for key, val in pairs.items():
-      env += "%s=%s " % (key, bash_quote(val))
-  return env[:-1]
-
-def merge_config_with_options(section_name, config, options):
-  """
-  Merge configuration options with a dictionary of options.
-  Keys in the options dictionary take precedence.
-  """
-  res = {}
-  try:
-    for (key, value) in config.items(section_name):
-      if value.find("\n") != -1:
-        res[key] = value.split("\n")
-      else:
-        res[key] = value
-  except ConfigParser.NoSectionError:
-    pass
-  for key in options:
-    if options[key] != None:
-      res[key] = options[key]
-  return res
-
-def url_get(url, timeout=10, retries=0):
-  """
-  Retrieve content from the given URL.
-  """
-  # in Python 2.6 we can pass timeout to urllib2.urlopen
-  socket.setdefaulttimeout(timeout)
-  attempts = 0
-  while True:
-    try:
-      return urllib2.urlopen(url).read()
-    except urllib2.URLError:
-      attempts = attempts + 1
-      if attempts > retries:
-        raise
-
-def xstr(string):
-  """Sane string conversion: return an empty string if string is None."""
-  return '' if string is None else str(string)
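
A short sketch of how the helpers in the removed util.py compose; the section name, keys and values are made up for illustration.

import ConfigParser

from hadoop.cloud.util import build_env_string, merge_config_with_options, xstr

# build_env_string merges raw "VAR=value" strings with key/value pairs and
# single-quotes the values for bash.
print build_env_string(env_strings=["SIZE=100"], pairs={"ZONE": "us-east-1a"})
# -> SIZE='100' ZONE='us-east-1a'

# merge_config_with_options overlays command-line options on a config section;
# options whose value is None do not override the config file.
config = ConfigParser.ConfigParser()
config.add_section("my-cluster")
config.set("my-cluster", "image_id", "ami-00000000")
options = {"image_id": None, "key_name": "my-key"}
merged = merge_config_with_options("my-cluster", config, options)
# merged == {"image_id": "ami-00000000", "key_name": "my-key"}

# xstr turns None into an empty string instead of the literal "None".
print "[%s]" % xstr(None)   # -> []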

+ 0 - 30
src/contrib/cloud/src/py/setup.py

@@ -1,30 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from distutils.core import setup
-
-version = __import__('hadoop.cloud').cloud.VERSION
-
-setup(name='hadoop-cloud',
-      version=version,
-      description='Scripts for running Hadoop on cloud providers',
-      license = 'Apache License (2.0)',
-      url = 'http://hadoop.apache.org/common/',
-      packages=['hadoop', 'hadoop.cloud','hadoop.cloud.providers'],
-      package_data={'hadoop.cloud': ['data/*.sh']},
-      scripts=['hadoop-ec2'],
-      author = 'Apache Hadoop Contributors',
-      author_email = 'common-dev@hadoop.apache.org',
-)

+ 0 - 37
src/contrib/cloud/src/test/py/testcluster.py

@@ -1,37 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import unittest
-
-from hadoop.cloud.cluster import RoleSyntaxException
-from hadoop.cloud.providers.ec2 import Ec2Cluster
-
-class TestCluster(unittest.TestCase):
-
-  def test_group_name_for_role(self):
-    cluster = Ec2Cluster("test-cluster", None)
-    self.assertEqual("test-cluster-foo", cluster._group_name_for_role("foo"))
-
-  def test_check_role_name_valid(self):
-    cluster = Ec2Cluster("test-cluster", None)
-    cluster._check_role_name(
-      "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_+")
-
-  def test_check_role_name_dash_is_invalid(self):
-    cluster = Ec2Cluster("test-cluster", None)
-    self.assertRaises(RoleSyntaxException, cluster._check_role_name, "a-b")
-
-if __name__ == '__main__':
-  unittest.main()

+ 0 - 74
src/contrib/cloud/src/test/py/testrackspace.py

@@ -1,74 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import StringIO
-import unittest
-
-from hadoop.cloud.providers.rackspace import RackspaceCluster
-
-class TestCluster(unittest.TestCase):
-
-  class DriverStub(object):
-    def list_nodes(self):
-      class NodeStub(object):
-        def __init__(self, name, metadata):
-          self.id = name
-          self.name = name
-          self.state = 'ACTIVE'
-          self.public_ip = ['100.0.0.1']
-          self.private_ip = ['10.0.0.1']
-          self.extra = { 'metadata': metadata }
-      return [NodeStub('random_instance', {}),
-              NodeStub('cluster1-nj-000', {'cluster': 'cluster1', 'roles': 'nn,jt'}),
-              NodeStub('cluster1-dt-000', {'cluster': 'cluster1', 'roles': 'dn,tt'}),
-              NodeStub('cluster1-dt-001', {'cluster': 'cluster1', 'roles': 'dn,tt'}),
-              NodeStub('cluster2-dt-000', {'cluster': 'cluster2', 'roles': 'dn,tt'}),
-              NodeStub('cluster3-nj-000', {'cluster': 'cluster3', 'roles': 'nn,jt'})]
-
-  def test_get_clusters_with_role(self):
-    self.assertEqual(set(['cluster1', 'cluster2']),
-      RackspaceCluster.get_clusters_with_role('dn', 'running',
-                                           TestCluster.DriverStub()))
-    
-  def test_get_instances_in_role(self):
-    cluster = RackspaceCluster('cluster1', None, TestCluster.DriverStub())
-    
-    instances = cluster.get_instances_in_role('nn')
-    self.assertEquals(1, len(instances))
-    self.assertEquals('cluster1-nj-000', instances[0].id)
-
-    instances = cluster.get_instances_in_role('tt')
-    self.assertEquals(2, len(instances))
-    self.assertEquals(set(['cluster1-dt-000', 'cluster1-dt-001']),
-                      set([i.id for i in instances]))
-    
-  def test_print_status(self):
-    cluster = RackspaceCluster('cluster1', None, TestCluster.DriverStub())
-    
-    out = StringIO.StringIO()
-    cluster.print_status(None, "running", out)
-    self.assertEquals("""nn,jt cluster1-nj-000 cluster1-nj-000 100.0.0.1 10.0.0.1 running
-dn,tt cluster1-dt-000 cluster1-dt-000 100.0.0.1 10.0.0.1 running
-dn,tt cluster1-dt-001 cluster1-dt-001 100.0.0.1 10.0.0.1 running
-""", out.getvalue().replace("\t", " "))
-
-    out = StringIO.StringIO()
-    cluster.print_status(["dn"], "running", out)
-    self.assertEquals("""dn,tt cluster1-dt-000 cluster1-dt-000 100.0.0.1 10.0.0.1 running
-dn,tt cluster1-dt-001 cluster1-dt-001 100.0.0.1 10.0.0.1 running
-""", out.getvalue().replace("\t", " "))
-
-if __name__ == '__main__':
-  unittest.main()

+ 0 - 143
src/contrib/cloud/src/test/py/teststorage.py

@@ -1,143 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-import unittest
-
-import simplejson as json
-from StringIO import StringIO
-
-from hadoop.cloud.storage import MountableVolume
-from hadoop.cloud.storage import JsonVolumeManager
-from hadoop.cloud.storage import JsonVolumeSpecManager
-
-spec = {
- "master": ({"size_gb":"8", "mount_point":"/", "device":"/dev/sdj",
-             "snapshot_id": "snap_1"},
-            ),
- "slave": ({"size_gb":"8", "mount_point":"/", "device":"/dev/sdj",
-            "snapshot_id": "snap_2"},
-           {"size_gb":"10", "mount_point":"/data1", "device":"/dev/sdk",
-            "snapshot_id": "snap_3"},
-           )
- }
-
-class TestJsonVolumeSpecManager(unittest.TestCase):
-
-  def test_volume_specs_for_role(self):
-
-    input = StringIO(json.dumps(spec))
-
-    volume_spec_manager = JsonVolumeSpecManager(input)
-
-    master_specs = volume_spec_manager.volume_specs_for_role("master")
-    self.assertEqual(1, len(master_specs))
-    self.assertEqual("/", master_specs[0].mount_point)
-    self.assertEqual("8", master_specs[0].size)
-    self.assertEqual("/dev/sdj", master_specs[0].device)
-    self.assertEqual("snap_1", master_specs[0].snapshot_id)
-
-    slave_specs = volume_spec_manager.volume_specs_for_role("slave")
-    self.assertEqual(2, len(slave_specs))
-    self.assertEqual("snap_2", slave_specs[0].snapshot_id)
-    self.assertEqual("snap_3", slave_specs[1].snapshot_id)
-
-    self.assertRaises(KeyError, volume_spec_manager.volume_specs_for_role,
-                      "no-such-role")
-
-  def test_get_mappings_string_for_role(self):
-
-    input = StringIO(json.dumps(spec))
-
-    volume_spec_manager = JsonVolumeSpecManager(input)
-
-    master_mappings = volume_spec_manager.get_mappings_string_for_role("master")
-    self.assertEqual("/,/dev/sdj", master_mappings)
-
-    slave_mappings = volume_spec_manager.get_mappings_string_for_role("slave")
-    self.assertEqual("/,/dev/sdj;/data1,/dev/sdk", slave_mappings)
-
-    self.assertRaises(KeyError,
-                      volume_spec_manager.get_mappings_string_for_role,
-                      "no-such-role")
-
-class TestJsonVolumeManager(unittest.TestCase):
-
-  def tearDown(self):
-    try:
-      os.remove("volumemanagertest.json")
-    except OSError:
-      pass
-    
-  def test_add_instance_storage_for_role(self):
-    volume_manager = JsonVolumeManager("volumemanagertest.json")
-    self.assertEqual(0,
-      len(volume_manager.get_instance_storage_for_role("master")))
-    self.assertEqual(0, len(volume_manager.get_roles()))
-
-    volume_manager.add_instance_storage_for_role("master",
-                                                 [MountableVolume("vol_1", "/",
-                                                                  "/dev/sdj")])
-    master_storage = volume_manager.get_instance_storage_for_role("master")
-    self.assertEqual(1, len(master_storage))
-    master_storage_instance0 = master_storage[0]
-    self.assertEqual(1, len(master_storage_instance0))
-    master_storage_instance0_vol0 = master_storage_instance0[0]
-    self.assertEqual("vol_1", master_storage_instance0_vol0.volume_id)
-    self.assertEqual("/", master_storage_instance0_vol0.mount_point)
-    self.assertEqual("/dev/sdj", master_storage_instance0_vol0.device)
-
-    volume_manager.add_instance_storage_for_role("slave",
-                                                 [MountableVolume("vol_2", "/",
-                                                                  "/dev/sdj")])
-    self.assertEqual(1,
-      len(volume_manager.get_instance_storage_for_role("master")))
-    slave_storage = volume_manager.get_instance_storage_for_role("slave")
-    self.assertEqual(1, len(slave_storage))
-    slave_storage_instance0 = slave_storage[0]
-    self.assertEqual(1, len(slave_storage_instance0))
-    slave_storage_instance0_vol0 = slave_storage_instance0[0]
-    self.assertEqual("vol_2", slave_storage_instance0_vol0.volume_id)
-    self.assertEqual("/", slave_storage_instance0_vol0.mount_point)
-    self.assertEqual("/dev/sdj", slave_storage_instance0_vol0.device)
-
-    volume_manager.add_instance_storage_for_role("slave",
-      [MountableVolume("vol_3", "/", "/dev/sdj"),
-       MountableVolume("vol_4", "/data1", "/dev/sdk")])
-    self.assertEqual(1,
-      len(volume_manager.get_instance_storage_for_role("master")))
-    slave_storage = volume_manager.get_instance_storage_for_role("slave")
-    self.assertEqual(2, len(slave_storage))
-    slave_storage_instance0 = slave_storage[0]
-    slave_storage_instance1 = slave_storage[1]
-    self.assertEqual(1, len(slave_storage_instance0))
-    self.assertEqual(2, len(slave_storage_instance1))
-    slave_storage_instance1_vol0 = slave_storage_instance1[0]
-    slave_storage_instance1_vol1 = slave_storage_instance1[1]
-    self.assertEqual("vol_3", slave_storage_instance1_vol0.volume_id)
-    self.assertEqual("/", slave_storage_instance1_vol0.mount_point)
-    self.assertEqual("/dev/sdj", slave_storage_instance1_vol0.device)
-    self.assertEqual("vol_4", slave_storage_instance1_vol1.volume_id)
-    self.assertEqual("/data1", slave_storage_instance1_vol1.mount_point)
-    self.assertEqual("/dev/sdk", slave_storage_instance1_vol1.device)
-    
-    roles = volume_manager.get_roles()
-    self.assertEqual(2, len(roles))
-    self.assertTrue("slave" in roles)
-    self.assertTrue("master" in roles)
-
-
-if __name__ == '__main__':
-  unittest.main()
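
The spec dict at the top of this test mirrors the JSON volume spec file that JsonVolumeSpecManager reads. A minimal sketch of the same round trip outside the test harness, using only calls exercised above (snapshot id, device and size are illustrative):

from StringIO import StringIO

import simplejson as json

from hadoop.cloud.storage import JsonVolumeSpecManager

# One 10 GB volume per "slave" instance, restored from a snapshot.
spec = {"slave": ({"size_gb": "10", "mount_point": "/data1",
                   "device": "/dev/sdk", "snapshot_id": "snap-00000000"},)}

spec_manager = JsonVolumeSpecManager(StringIO(json.dumps(spec)))
print spec_manager.volume_specs_for_role("slave")[0].snapshot_id  # snap-00000000
print spec_manager.get_mappings_string_for_role("slave")          # /data1,/dev/sdk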

+ 0 - 44
src/contrib/cloud/src/test/py/testuserdata.py

@@ -1,44 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import tempfile
-import unittest
-
-from hadoop.cloud.cluster import InstanceUserData
-
-class TestInstanceUserData(unittest.TestCase):
-
-  def test_replacement(self):
-    file = tempfile.NamedTemporaryFile()
-    file.write("Contents go here")
-    file.flush()
-    self.assertEqual("Contents go here",
-                     InstanceUserData(file.name, {}).read())
-    self.assertEqual("Contents were here",
-                     InstanceUserData(file.name, { "go": "were"}).read())
-    self.assertEqual("Contents  here",
-                     InstanceUserData(file.name, { "go": None}).read())
-    file.close()
-
-  def test_read_file_url(self):
-    file = tempfile.NamedTemporaryFile()
-    file.write("Contents go here")
-    file.flush()
-    self.assertEqual("Contents go here",
-                     InstanceUserData("file://%s" % file.name, {}).read())
-    file.close()
-
-if __name__ == '__main__':
-  unittest.main()

+ 0 - 81
src/contrib/cloud/src/test/py/testutil.py

@@ -1,81 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import ConfigParser
-import StringIO
-import unittest
-
-from hadoop.cloud.util import bash_quote
-from hadoop.cloud.util import bash_quote_env
-from hadoop.cloud.util import build_env_string
-from hadoop.cloud.util import merge_config_with_options
-from hadoop.cloud.util import xstr
-
-class TestUtilFunctions(unittest.TestCase):
-
-  def test_bash_quote(self):
-    self.assertEqual("", bash_quote(None))
-    self.assertEqual("''", bash_quote(""))
-    self.assertEqual("'a'", bash_quote("a"))
-    self.assertEqual("'a b'", bash_quote("a b"))
-    self.assertEqual("'a\b'", bash_quote("a\b"))
-    self.assertEqual("'a '\\'' b'", bash_quote("a ' b"))
-
-  def test_bash_quote_env(self):
-    self.assertEqual("", bash_quote_env(""))
-    self.assertEqual("a", bash_quote_env("a"))
-    self.assertEqual("a='b'", bash_quote_env("a=b"))
-    self.assertEqual("a='b c'", bash_quote_env("a=b c"))
-    self.assertEqual("a='b\c'", bash_quote_env("a=b\c"))
-    self.assertEqual("a='b '\\'' c'", bash_quote_env("a=b ' c"))
-
-  def test_build_env_string(self):
-    self.assertEqual("", build_env_string())
-    self.assertEqual("a='b' c='d'",
-                     build_env_string(env_strings=["a=b", "c=d"]))
-    self.assertEqual("a='b' c='d'",
-                     build_env_string(pairs={"a": "b", "c": "d"}))
-
-  def test_merge_config_with_options(self):
-    options = { "a": "b" }
-    config = ConfigParser.ConfigParser()
-    self.assertEqual({ "a": "b" },
-                     merge_config_with_options("section", config, options))
-    config.add_section("section")
-    self.assertEqual({ "a": "b" },
-                     merge_config_with_options("section", config, options))
-    config.set("section", "a", "z")
-    config.set("section", "c", "d")
-    self.assertEqual({ "a": "z", "c": "d" },
-                     merge_config_with_options("section", config, {}))
-    self.assertEqual({ "a": "b", "c": "d" },
-                     merge_config_with_options("section", config, options))
-
-  def test_merge_config_with_options_list(self):
-    config = ConfigParser.ConfigParser()
-    config.readfp(StringIO.StringIO("""[section]
-env1=a=b
- c=d
-env2=e=f
- g=h"""))
-    self.assertEqual({ "env1": ["a=b", "c=d"], "env2": ["e=f", "g=h"] },
-                     merge_config_with_options("section", config, {}))
-
-  def test_xstr(self):
-    self.assertEqual("", xstr(None))
-    self.assertEqual("a", xstr("a"))
-
-if __name__ == '__main__':
-  unittest.main()

+ 0 - 46
src/contrib/cloud/tools/rackspace/remote-setup.sh

@@ -1,46 +0,0 @@
-#!/bin/bash -x
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-#
-# Given an Ubuntu base system install, install the base packages we need.
-#
-
-# We require multiverse to be enabled.
-cat >> /etc/apt/sources.list << EOF
-deb http://us.archive.ubuntu.com/ubuntu/ intrepid multiverse
-deb-src http://us.archive.ubuntu.com/ubuntu/ intrepid multiverse
-deb http://us.archive.ubuntu.com/ubuntu/ intrepid-updates multiverse
-deb-src http://us.archive.ubuntu.com/ubuntu/ intrepid-updates multiverse
-EOF
-
-apt-get update
-
-# Install Java
-apt-get -y install sun-java6-jdk
-echo "export JAVA_HOME=/usr/lib/jvm/java-6-sun" >> /etc/profile
-export JAVA_HOME=/usr/lib/jvm/java-6-sun
-java -version
-
-# Install general packages
-apt-get -y install vim curl screen ssh rsync unzip openssh-server
-apt-get -y install policykit # http://www.bergek.com/2008/11/24/ubuntu-810-libpolkit-error/
-
-# Create root's .ssh directory if it doesn't exist
-mkdir -p /root/.ssh
-
-# Run any rackspace init script injected at boot time
-echo '[ -f /etc/init.d/rackspace-init.sh ] && /bin/sh /etc/init.d/rackspace-init.sh; exit 0' > /etc/rc.local