16 years ago · 0bd9508ddb
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -13,8 +13,6 @@ Trunk (unreleased changes)
 
															     HDFS-461. Tool to analyze file size distribution in HDFS. (shv)
														
 
															-    HDFS-446. Improvements to Offline Image Viewer. (Jakob Homan via shv)
														
 
															-
														
 
															   IMPROVEMENTS
														
 
															     HDFS-381. Remove blocks from DataNode maps when corresponding file
														
@@ -70,6 +68,11 @@ Trunk (unreleased changes)
 
															     HDFS-504. Update the modification time of a file when the file 
														
 
															     is closed. (Chun Zhang via dhruba)
														
 
															+    HDFS-446. Improvements to Offline Image Viewer. (Jakob Homan via shv)
														
 
															+
														
 
															+    HDFS-498. Add development guide and documentation for the fault injection
														
 
															+    framework.  (Konstantin Boudnik via szetszwo)
														
 
															+
														
 
															   BUG FIXES
														
 
															     HDFS-76. Better error message to users when commands fail because of 
														
 
															     lack of quota. Allow quota to be set even if the limit is lower than
														
--- a/src/docs/src/documentation/content/xdocs/faultinject_framework.xml
+++ b/src/docs/src/documentation/content/xdocs/faultinject_framework.xml
@@ -0,0 +1,390 @@
 
															+<?xml version="1.0"?>
														
 
															+<!--
														
 
															+   Licensed to the Apache Software Foundation (ASF) under one or more
														
 
															+   contributor license agreements.  See the NOTICE file distributed with
														
 
															+   this work for additional information regarding copyright ownership.
														
 
															+   The ASF licenses this file to You under the Apache License, Version 2.0
														
 
															+   (the "License"); you may not use this file except in compliance with
														
 
															+   the License.  You may obtain a copy of the License at
														
 
															+
														
 
															+       http://www.apache.org/licenses/LICENSE-2.0
														
 
															+
														
 
															+   Unless required by applicable law or agreed to in writing, software
														
 
															+   distributed under the License is distributed on an "AS IS" BASIS,
														
 
															+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
														
 
															+   See the License for the specific language governing permissions and
														
 
															+   limitations under the License.
														
 
															+-->
														
 
															+
														
 
															+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
														
 
															+
														
 
															+
														
 
															+<document>
														
 
															+  <header>
														
 
															+    <title>Fault Injection Framework and Development Guide</title>
														
 
															+  </header>
														
 
															+
														
 
															+  <body>
														
 
															+    <section>
														
 
															+      <title>Introduction</title>
														
 
															+      <p>This document is a brief guide to Hadoop's Fault Injection (FI)
														
 
															+        Framework and a developer's guide for those who will be developing
														
 
															+        their own faults (aspects).
														
 
															+      </p>
														
 
															+      <p>The idea of Fault Injection (FI) is fairly simple: it is the
														
 
															+        infusion of errors and exceptions into an application's logic to
														
 
															+        achieve higher coverage and fault tolerance of the system.
														
 
															+        Different implementations of this idea are available today.
														
 
															+        Hadoop's FI framework is built on top of the Aspect-Oriented
														
 
															+        Programming (AOP) paradigm as implemented by the AspectJ toolkit.
														
 
															+      </p>
														
 
															+    </section>
														
 
															+    <section>
														
 
															+      <title>Assumptions</title>
														
 
															+      <p>The current implementation of the framework assumes that the faults it
														
 
															+        emulates are non-deterministic in nature, i.e. the moment
														
 
															+        a fault occurs isn't known in advance and is decided by a
														
 
															+        coin flip.
														
 
															+      </p>
														
 
															+    </section>
														
 
															+    <section>
														
 
															+      <title>Architecture of the Fault Injection Framework</title>
														
 
															+      <figure src="images/FI-framework.gif" alt="Components layout" />
														
 
															+      <section>
														
 
															+        <title>Configuration management</title>
														
 
															+        <p>This piece of the framework allows you to
														
 
															+          set expectations for faults to happen. The settings can be applied
														
 
															+          either statically (in advance) or at runtime. There are two ways to
														
 
															+          configure the desired level of faults in the framework:
														
 
															+        </p>
														
 
															+        <ul>
														
 
															+          <li>
														
 
															+            editing the
														
 
															+            <code>src/aop/fi-site.xml</code>
														
 
															+            configuration file. This file is similar to Hadoop's other config
														
 
															+            files
														
 
															+          </li>
														
 
															+          <li>
														
 
															+            setting JVM system properties through VM startup parameters or in the
														
 
															+            <code>build.properties</code>
														
 
															+            file
														
 
															+          </li>
														
 
															+        </ul>
														
 
															+      </section>
														
 
															+      <section>
														
 
															+        <title>Probability model</title>
														
 
															+        <p>This is fundamentally a coin flipper. The methods of this class
														
 
															+          draw a random number between 0.0
														
 
															+          and 1.0 and then check whether that number falls in the
														
 
															+          range between
														
 
															+          0.0 and the configured level for the fault in question. If that
														
 
															+          condition
														
 
															+          is true then the fault will occur.
														
 
															+        </p>
														
 
															+        <p>Thus, to guarantee that a fault occurs, set the
														
 
															+          appropriate level to 1.0.
														
 
															+          To completely prevent a fault from occurring, its probability level
														
 
															+          has to be set to 0.0.
														
 
															+        </p>
														
 
															+        <p><strong>Nota bene</strong>: the default probability level is set to 0
														
 
															+          (zero) unless the level is changed explicitly through the
														
 
															+          configuration file or at runtime. The name of the default
														
 
															+          level's configuration parameter is
														
 
															+          <code>fi.*</code>
														
 
															+        </p>
														
 
															+      </section>
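The coin-flip check described in the Probability model section can be sketched in plain Java as follows. This is a minimal illustration, not the framework's `org.apache.hadoop.fi.ProbabilityModel`: the class name `CoinFlipModel`, its methods, and the handling of malformed values are assumptions made for the example. Only the `fi.` prefix, the `fi.*` default key, and the 0.0 to 1.0 range come from the documentation itself.

```java
import java.util.Random;

/**
 * Hypothetical stand-in for a coin-flip probability model.
 * NOT the real org.apache.hadoop.fi.ProbabilityModel; all names here are assumptions.
 */
final class CoinFlipModel {
  static final String PREFIX = "fi.";       // prefix used by fault level properties
  static final String DEFAULT_KEY = "fi.*"; // key holding the default level
  private static final Random RANDOM = new Random();

  /** Returns the configured level for a fault, clamped to [0.0, 1.0]. */
  static double level(String faultName) {
    // Fall back to the default key, and to 0.0 when nothing is configured.
    String fallback = System.getProperty(DEFAULT_KEY, "0.0");
    String raw = System.getProperty(PREFIX + faultName, fallback);
    double value;
    try {
      value = Double.parseDouble(raw);
    } catch (NumberFormatException e) {
      value = 0.0; // assumption: an unparsable setting disables the fault
    }
    return Math.max(0.0, Math.min(1.0, value));
  }

  /** Flips the coin: true means the fault should be injected now. */
  static boolean shouldInject(String faultName) {
    // nextDouble() lies in [0.0, 1.0), so level 1.0 always fires and 0.0 never does.
    return RANDOM.nextDouble() < level(faultName);
  }
}
```

With this sketch, starting the JVM with `-Dfi.hdfs.datanode.BlockReceiver=1.0` would make `CoinFlipModel.shouldInject("hdfs.datanode.BlockReceiver")` return true on every call, while leaving the property unset falls back to `fi.*` and then to 0.0.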
														
 
															+      <section>
														
 
															+        <title>Fault injection mechanism: AOP and AspectJ</title>
														
 
															+        <p>The foundation of Hadoop's fault injection framework is the
														
 
															+          cross-cutting concept implemented by AspectJ. The following basic
														
 
															+          terms are important to remember:
														
 
															+        </p>
														
 
															+        <ul>
														
 
															+          <li>
														
 
															+            <strong>A cross-cutting concept</strong>
														
 
															+            (aspect) is behavior, and often data, that is used across the scope
														
 
															+            of a piece of software
														
 
															+          </li>
														
 
															+          <li>In AOP, the
														
 
															+            <strong>aspects</strong>
														
 
															+            provide a mechanism by which a cross-cutting concern can be
														
 
															+            specified in a modular way
														
 
															+          </li>
														
 
															+          <li>
														
 
															+            <strong>Advice</strong>
														
 
															+            is the
														
 
															+            code that is executed when an aspect is invoked
														
 
															+          </li>
														
 
															+          <li>
														
 
															+            <strong>Join point</strong>
														
 
															+            (or pointcut) is a specific
														
 
															+            point within the application that may or may not invoke some advice
														
 
															+          </li>
														
 
															+        </ul>
														
 
															+      </section>
														
 
															+      <section>
														
 
															+        <title>Existing join points</title>
														
 
															+        <p>
														
 
															+          The following readily available join points are provided by AspectJ:
														
 
															+        </p>
														
 
															+        <ul>
														
 
															+          <li>Join when a method is called
														
 
															+          </li>
														
 
															+          <li>Join during a method's execution
														
 
															+          </li>
														
 
															+          <li>Join when a constructor is invoked
														
 
															+          </li>
														
 
															+          <li>Join during a constructor's execution
														
 
															+          </li>
														
 
															+          <li>Join during aspect advice execution
														
 
															+          </li>
														
 
															+          <li>Join before an object is initialized
														
 
															+          </li>
														
 
															+          <li>Join during object initialization
														
 
															+          </li>
														
 
															+          <li>Join during static initializer execution
														
 
															+          </li>
														
 
															+          <li>Join when a class's field is referenced
														
 
															+          </li>
														
 
															+          <li>Join when a class's field is assigned
														
 
															+          </li>
														
 
															+          <li>Join when a handler is executed
														
 
															+          </li>
														
 
															+        </ul>
														
 
															+      </section>
														
 
															+    </section>
														
 
															+    <section>
														
 
															+      <title>Aspects examples</title>
														
 
															+      <source>
														
 
															+package org.apache.hadoop.hdfs.server.datanode;
														
 
															+
														
 
															+import org.apache.commons.logging.Log;
														
 
															+import org.apache.commons.logging.LogFactory;
														
 
															+import org.apache.hadoop.fi.ProbabilityModel;
														
 
															+import org.apache.hadoop.hdfs.server.datanode.DataNode;
														
 
															+import org.apache.hadoop.util.DiskChecker.*;
														
 
															+
														
 
															+import java.io.IOException;
														
 
															+import java.io.OutputStream;
														
 
															+import java.io.DataOutputStream;
														
 
															+
														
 
															+/**
														
 
															+* This aspect takes care of faults injected into datanode.BlockReceiver
														
 
															+* class
														
 
															+*/
														
 
															+public aspect BlockReceiverAspects {
														
 
															+  public static final Log LOG = LogFactory.getLog(BlockReceiverAspects.class);
														
 
															+
														
 
															+  public static final String BLOCK_RECEIVER_FAULT="hdfs.datanode.BlockReceiver";
														
 
															+    pointcut callReceivePacket() : call (* OutputStream.write(..))
														
 
															+      &amp;&amp; withincode (* BlockReceiver.receivePacket(..))
														
 
															+    // to further limit the application of this aspect a very narrow 'target' can be used as follows
														
 
															+    // &amp;&amp; target(DataOutputStream)
														
 
															+      &amp;&amp; !within(BlockReceiverAspects +);
														
 
															+
														
 
															+  before () throws IOException : callReceivePacket () {
														
 
															+    if (ProbabilityModel.injectCriteria(BLOCK_RECEIVER_FAULT)) {
														
 
															+      LOG.info("Before the injection point");
														
 
															+      Thread.dumpStack();
														
 
															+      throw new DiskOutOfSpaceException ("FI: injected fault point at " +
														
 
															+      thisJoinPoint.getStaticPart( ).getSourceLocation());
														
 
															+    }
														
 
															+  }
														
 
															+}
														
 
															+      </source>
														
 
															+      <p>
														
 
															+        The aspect has two main parts: the join point
														
 
															+        <code>pointcut callReceivePacket()</code>
														
 
															+        which serves as an identification mark of a specific point (in control
														
 
															+        and/or data flow) in the life of an application. A call to the advice -
														
 
															+        <code>before () throws IOException : callReceivePacket()</code>
														
 
															+        - will be
														
 
															+        <a href="#Putting+it+all+together">injected</a>
														
 
															+        before that specific spot of the application's code.
														
 
															+      </p>
														
 
															+
														
 
															+      <p>The pointcut identifies an invocation of the
														
 
															+        <code>java.io.OutputStream write()</code>
														
 
															+        method
														
 
															+        with any number of parameters and any return type. This invocation must
														
 
															+        take place within the body of method
														
 
															+        <code>receivePacket()</code>
														
 
															+        from class <code>BlockReceiver</code>.
														
 
															+        The method can have any parameters and any return type. Possible
														
 
															+        invocations of
														
 
															+        <code>write()</code>
														
 
															+        method happening anywhere within the aspect
														
 
															+        <code>BlockReceiverAspects</code>
														
 
															+        or its subtypes will be ignored.
														
 
															+      </p>
														
 
															+      <p><strong>Note 1</strong>: This short example doesn't illustrate
														
 
															+        the fact that you can have more than a single injection point per
														
 
															+        class. In such a case the names of the faults have to be different
														
 
															+        if a developer wants to trigger them separately.
														
 
															+      </p>
														
 
															+      <p><strong>Note 2</strong>: After the
														
 
															+        <a href="#Putting+it+all+together">injection step</a>
														
 
															+        you can verify that the faults were properly injected by
														
 
															+        searching for
														
 
															+        <code>ajc</code>
														
 
															+        keywords in a disassembled class file.
														
 
															+      </p>
														
 
															+
														
 
															+    </section>
														
 
															+    
														
 
															+    <section>
														
 
															+      <title>Fault naming convention &amp; namespaces</title>
														
 
															+      <p>For the sake of a unified naming
														
 
															+      convention, the following two types of names are recommended when
														
 
															+      developing new aspects:</p>
														
 
															+      <ul>
														
 
															+        <li>Activity-specific notation (used
														
 
															+          when the particular location where a fault occurs
														
 
															+          doesn't matter). In this case the name of the fault is rather abstract:
														
 
															+          <code>fi.hdfs.DiskError</code>
														
 
															+        </li>
														
 
															+        <li>Location-specific notation.
														
 
															+          Here, the fault's name is mnemonic as in:
														
 
															+          <code>fi.hdfs.datanode.BlockReceiver[optional location details]</code>
														
 
															+        </li>
														
 
															+      </ul>
														
 
															+    </section>
														
 
															+
														
 
															+    <section>
														
 
															+      <title>Development tools</title>
														
 
															+      <ul>
														
 
															+        <li>Eclipse
														
 
															+          <a href="http://www.eclipse.org/ajdt/">AspectJ
														
 
															+            Development Toolkit
														
 
															+          </a>
														
 
															+          might help you in the aspects' development
														
 
															+          process.
														
 
															+        </li>
														
 
															+        <li>IntelliJ IDEA provides AspectJ weaver and Spring-AOP plugins
														
 
															+        </li>
														
 
															+      </ul>
														
 
															+    </section>
														
 
															+
														
 
															+    <section>
														
 
															+      <title>Putting it all together</title>
														
 
															+      <p>Faults (or aspects) have to be injected (or woven) together before
														
 
															+        they can be used. Here are step-by-step instructions on how this can be
														
 
															+        done.</p>
														
 
															+      <p>Weaving aspects in place:</p>
														
 
															+      <source>
														
 
															+% ant injectfaults
														
 
															+      </source>
														
 
															+      <p>If you
														
 
															+        misidentified the join point of your aspect then you'll see a
														
 
															+        warning similar to the one below when the 'injectfaults' target is
														
 
															+        completed:</p>
														
 
															+        <source>
														
 
															+[iajc] warning at
														
 
															+src/test/aop/org/apache/hadoop/hdfs/server/datanode/ \
														
 
															+          BlockReceiverAspects.aj:44::0
														
 
															+advice defined in org.apache.hadoop.hdfs.server.datanode.BlockReceiverAspects
														
 
															+has not been applied [Xlint:adviceDidNotMatch]
														
 
															+        </source>
														
 
															+      <p>It isn't an error, so the build will still report success.
														
 
															+
														
 
															+        To prepare a dev.jar file with all your faults woven in
														
 
															+      place, run (HDFS-475 pending):</p>
														
 
															+        <source>
														
 
															+% ant jar-fault-inject
														
 
															+        </source>
														
 
															+
														
 
															+      <p>Test jars can be created by</p>
														
 
															+        <source>
														
 
															+% ant jar-test-fault-inject
														
 
															+        </source>
														
 
															+
														
 
															+      <p>To run HDFS tests with faults injected:</p>
														
 
															+        <source>
														
 
															+% ant run-test-hdfs-fault-inject
														
 
															+        </source>
														
 
															+      <section>
														
 
															+        <title>How to use fault injection framework</title>
														
 
															+        <p>Faults can be triggered in the following two ways:
														
 
															+        </p>
														
 
															+        <ul>
														
 
															+          <li>At runtime:
														
 
															+            <source>
														
 
															+% ant run-test-hdfs -Dfi.hdfs.datanode.BlockReceiver=0.12
														
 
															+            </source>
														
 
															+            To set a certain level, e.g. 25%, for all injected faults, run
														
 
															+            <br/>
														
 
															+            <source>
														
 
															+% ant run-test-hdfs-fault-inject -Dfi.*=0.25
														
 
															+            </source>
														
 
															+          </li>
														
 
															+          <li>or from a program as follows:
														
 
															+          </li>
														
 
															+        </ul>
														
 
															+        <source>
														
 
															+package org.apache.hadoop.fs;
														
 
															+
														
 
															+import org.junit.Test;
														
 
															+import org.junit.Before;
														
 
															+import org.junit.After;
														
 
															+import junit.framework.TestCase;
														
 
															+
														
 
															+public class DemoFiTest extends TestCase {
														
 
															+  public static final String BLOCK_RECEIVER_FAULT="hdfs.datanode.BlockReceiver";
														
 
															+  @Override
														
 
															+  @Before
														
 
															+  public void setUp(){
														
 
															+    //Setting up the test's environment as required
														
 
															+  }
														
 
															+
														
 
															+  @Test
														
 
															+  public void testFI() {
														
 
															+    // It triggers the fault, assuming that there's one called 'hdfs.datanode.BlockReceiver'
														
 
															+    System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.12");
														
 
															+    //
														
 
															+    // The main logic of your tests goes here
														
 
															+    //
														
 
															+    // Now set the level back to 0 (zero) to prevent this fault from happening again
														
 
															+    System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.0");
														
 
															+    // or delete its trigger completely
														
 
															+    System.getProperties().remove("fi." + BLOCK_RECEIVER_FAULT);
														
 
															+  }
														
 
															+
														
 
															+  @Override
														
 
															+  @After
														
 
															+  public void tearDown() {
														
 
															+    //Cleaning up the test environment
														
 
															+  }
														
 
															+}
														
 
															+        </source>
														
 
															+        <p>
														
 
															+          As you can see above, these two methods do the same thing. They are
														
 
															+          setting the probability level of
														
 
															+          <code>hdfs.datanode.BlockReceiver</code>
														
 
															+          at 12%.
														
 
															+          The difference, however, is that the program provides more
														
 
															+          flexibility and allows you to turn a fault off when a test doesn't need
														
 
															+          it anymore.
														
 
															+        </p>
														
 
															+      </section>
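Restoring the fault level by hand, as the test above does, is easy to forget when a test fails midway. One way to make the cleanup automatic is a small helper built on try/finally. The sketch below is an assumption for illustration: `FaultScope` and `withFault` are hypothetical names, not part of the framework; only the `fi.` property prefix is taken from the documentation.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of Hadoop) that scopes a fault level to one block of code. */
final class FaultScope {
  /**
   * Sets the system property "fi." + faultName to the given level, runs the body,
   * and restores the previous value even if the body throws.
   */
  static <T> T withFault(String faultName, double level, Supplier<T> body) {
    String key = "fi." + faultName;
    String previous = System.getProperty(key); // null when the fault was not configured
    System.setProperty(key, Double.toString(level));
    try {
      return body.get();
    } finally {
      if (previous == null) {
        System.clearProperty(key); // delete the trigger completely
      } else {
        System.setProperty(key, previous); // put the old level back
      }
    }
  }
}
```

A test could then wrap only the statements that need the fault, for example `FaultScope.withFault("hdfs.datanode.BlockReceiver", 0.12, () -> runScenario())`, where `runScenario` stands for whatever test logic returns a result.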
														
 
															+    </section>
														
 
															+
														
 
															+    <section>
														
 
															+      <title>Additional information and contacts</title>
														
 
															+      <p>These two sources of information seem to be particularly
														
 
															+        interesting and worth further reading:
														
 
															+      </p>
														
 
															+      <ul>
														
 
															+        <li>
														
 
															+          <a href="http://www.eclipse.org/aspectj/doc/next/devguide/">
														
 
															+            http://www.eclipse.org/aspectj/doc/next/devguide/
														
 
															+          </a>
														
 
															+        </li>
														
 
															+        <li>AspectJ Cookbook (ISBN-13: 978-0-596-00654-9)
														
 
															+        </li>
														
 
															+      </ul>
														
 
															+      <p>Should you have any further comments or questions for the author,
														
 
															+        check
														
 
															+        <a href="http://issues.apache.org/jira/browse/HDFS-435">HDFS-435</a>
														
 
															+      </p>
														
 
															+    </section>
														
 
															+  </body>
														
 
															+</document>
														
--- a/src/docs/src/documentation/content/xdocs/site.xml
+++ b/src/docs/src/documentation/content/xdocs/site.xml
@@ -60,6 +60,9 @@ See http://forrest.apache.org/docs/linking.html for more info.
 
															 		<hdfs_SLG        			label="Synthetic Load Generator Guide"  href="SLG_user_guide.html" />
														
 
															 		<hdfs_imageviewer						label="Offline Image Viewer Guide"	href="hdfs_imageviewer.html" />
														
 
															 		<hdfs_libhdfs   				label="C API libhdfs"         						href="libhdfs.html" /> 
														
 
															+                <docs label="Testing">
														
 
															+                    <faultinject_framework              label="Fault Injection"                                                     href="faultinject_framework.html" />
														
 
															+                </docs>
														
 
															    </docs> 
														
 
															    <docs label="HOD">
														
--- a/src/docs/src/documentation/resources/images/FI-framework.gif
+++ b/src/docs/src/documentation/resources/images/FI-framework.gif
--- a/src/docs/src/documentation/resources/images/FI-framework.odg
+++ b/src/docs/src/documentation/resources/images/FI-framework.odg