
HDFS-498. Add development guide and documentation for the fault injection framework. Contributed by Konstantin Boudnik

git-svn-id: https://svn.apache.org/repos/asf/hadoop/hdfs/trunk@800910 13f79535-47bb-0310-9956-ffa450edef68
Tsz-wo Sze 16 years ago
parent
commit
0bd9508ddb

+ 5 - 2
CHANGES.txt

@@ -13,8 +13,6 @@ Trunk (unreleased changes)
 
 
     HDFS-461. Tool to analyze file size distribution in HDFS. (shv)
 
 
-    HDFS-446. Improvements to Offline Image Viewer. (Jakob Homan via shv)
-
   IMPROVEMENTS
 
 
     HDFS-381. Remove blocks from DataNode maps when corresponding file
@@ -70,6 +68,11 @@ Trunk (unreleased changes)
     HDFS-504. Update the modification time of a file when the file 
     is closed. (Chun Zhang via dhruba)
 
 
+    HDFS-446. Improvements to Offline Image Viewer. (Jakob Homan via shv)
+
+    HDFS-498. Add development guide and documentation for the fault injection
+    framework.  (Konstantin Boudnik via szetszwo)
+
   BUG FIXES
     HDFS-76. Better error message to users when commands fail because of 
     lack of quota. Allow quota to be set even if the limit is lower than

+ 390 - 0
src/docs/src/documentation/content/xdocs/faultinject_framework.xml

@@ -0,0 +1,390 @@
+<?xml version="1.0"?>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+
+<document>
+  <header>
+    <title>Fault Injection Framework and Development Guide</title>
+  </header>
+
+  <body>
+    <section>
+      <title>Introduction</title>
+      <p>This document is a brief guide to Hadoop's Fault Injection (FI)
+        Framework and a developer's guide for those who will be developing
+        their own faults (aspects).
+      </p>
+      <p>The idea of Fault Injection (FI) is fairly simple: it is an
+        infusion of errors and exceptions into an application's logic to
+        achieve higher coverage and fault tolerance of the system.
+        Different implementations of this idea are available today.
+        Hadoop's FI framework is built on top of Aspect-Oriented Programming
+        (AOP) as implemented by the AspectJ toolkit.
+      </p>
+    </section>
+    <section>
+      <title>Assumptions</title>
+      <p>The current implementation of the framework assumes that the faults it
+        will be emulating are non-deterministic in nature, i.e. the moment
+        at which a fault happens isn't known in advance and is decided by a
+        coin flip.
+      </p>
+    </section>
+    <section>
+      <title>Architecture of the Fault Injection Framework</title>
+      <figure src="images/FI-framework.gif" alt="Components layout" />
+      <section>
+        <title>Configuration management</title>
+        <p>This piece of the framework allows you to
+          set expectations for faults to happen. The settings can be applied
+          either statically (in advance) or at runtime. There are two ways to
+          configure the desired level of faults in the framework:
+        </p>
+        <ul>
+          <li>
+            editing the
+            <code>src/aop/fi-site.xml</code>
+            configuration file. This file is similar to other Hadoop
+            configuration files
+          </li>
+          <li>
+            setting JVM system properties through VM startup parameters or in
+            the
+            <code>build.properties</code>
+            file
+          </li>
+        </ul>
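+        <p>As a rough illustration of the first option, a
+          <code>fi-site.xml</code>
+          entry might look like the sketch below. It assumes the file uses the
+          same
+          <code>property</code>
+          layout as other Hadoop configuration files. The fault name is the
+          one used in the examples later in this guide, and the values are
+          illustrative only.
+        </p>
+        <source>
+&lt;configuration&gt;
+  &lt;!-- assumed layout: default level applied to every fault --&gt;
+  &lt;property&gt;
+    &lt;name&gt;fi.*&lt;/name&gt;
+    &lt;value&gt;0.00&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;!-- illustrative override for a single fault --&gt;
+  &lt;property&gt;
+    &lt;name&gt;fi.hdfs.datanode.BlockReceiver&lt;/name&gt;
+    &lt;value&gt;0.12&lt;/value&gt;
+  &lt;/property&gt;
+&lt;/configuration&gt;
+        </source>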
+      </section>
+      <section>
+        <title>Probability model</title>
+        <p>This is fundamentally a coin flipper. The methods of this class
+          get a random number between 0.0 and 1.0 and then check whether that
+          number falls in the range between 0.0 and the configured level for
+          the fault in question. If it does, the fault occurs.
+        </p>
+        <p>Thus, to guarantee that a fault happens, set its level to 1.0.
+          To completely prevent a fault from happening, set its probability
+          level to 0.0.
+        </p>
+        <p><strong>Nota bene</strong>: the default probability level is 0
+          (zero) unless the level is changed explicitly through the
+          configuration file or at runtime. The name of the default
+          level's configuration parameter is
+          <code>fi.*</code>
+        </p>
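+        <p>The coin flip described above can be sketched roughly as follows.
+          This is an illustration only, not the framework's
+          <code>ProbabilityModel</code>
+          class: the class and method names are hypothetical, and the fallback
+          from a fault's own property to
+          <code>fi.*</code>
+          is an assumption.
+        </p>
+        <source>
+import java.util.Random;
+
+/** Hypothetical sketch of a coin-flip check; all names are illustrative. */
+public final class CoinFlipSketch {
+  private static final Random RANDOM = new Random();
+
+  /** Reads fi.NAME; assumed to fall back to fi.* and then to 0.0. */
+  static float level(String faultName) {
+    String fallback = System.getProperty("fi.*", "0.0");
+    return Float.parseFloat(System.getProperty("fi." + faultName, fallback));
+  }
+
+  /** True when a random number in [0.0, 1.0) is below the level. */
+  static boolean shouldInject(String faultName) {
+    return level(faultName) > RANDOM.nextFloat();
+  }
+}
+        </source>
+        <p>With a level of 1.0 the check always passes, and with 0.0 it never
+          does, which matches the two boundary cases above.
+        </p>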
+      </section>
+      <section>
+        <title>Fault injection mechanism: AOP and AspectJ</title>
+        <p>At the foundation of Hadoop's fault injection framework lies the
+          cross-cutting concept implemented by AspectJ. The following basic
+          terms are important to remember:
+        </p>
+        <ul>
+          <li>
+            <strong>A cross-cutting concept</strong>
+            (aspect) is behavior, and often data, that is used across the scope
+            of a piece of software
+          </li>
+          <li>In AOP, the
+            <strong>aspects</strong>
+            provide a mechanism by which a cross-cutting concern can be
+            specified in a modular way
+          </li>
+          <li>
+            <strong>Advice</strong>
+            is the
+            code that is executed when an aspect is invoked
+          </li>
+          <li>
+            <strong>Join point</strong>
+            (or pointcut) is a specific
+            point within the application that may or may not invoke some advice
+          </li>
+        </ul>
+      </section>
+      <section>
+        <title>Existing join points</title>
+        <p>
+          The following readily available join points are provided by AspectJ:
+        </p>
+        <ul>
+          <li>Join when a method is called
+          </li>
+          <li>Join during a method's execution
+          </li>
+          <li>Join when a constructor is invoked
+          </li>
+          <li>Join during a constructor's execution
+          </li>
+          <li>Join during aspect advice execution
+          </li>
+          <li>Join before an object is initialized
+          </li>
+          <li>Join during object initialization
+          </li>
+          <li>Join during static initializer execution
+          </li>
+          <li>Join when a class's field is referenced
+          </li>
+          <li>Join when a class's field is assigned
+          </li>
+          <li>Join when a handler is executed
+          </li>
+        </ul>
+      </section>
+    </section>
+    <section>
+      <title>Aspect examples</title>
+      <source>
+package org.apache.hadoop.hdfs.server.datanode;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fi.ProbabilityModel;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.util.DiskChecker.*;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.io.DataOutputStream;
+
+/**
+* This aspect takes care of faults injected into the datanode.BlockReceiver
+* class
+*/
+public aspect BlockReceiverAspects {
+  public static final Log LOG = LogFactory.getLog(BlockReceiverAspects.class);
+
+  public static final String BLOCK_RECEIVER_FAULT="hdfs.datanode.BlockReceiver";
+    pointcut callReceivePacket() : call (* OutputStream.write(..))
+      &amp;&amp; withincode (* BlockReceiver.receivePacket(..))
+    // to further limit the application of this aspect a very narrow 'target' can be used as follows
+    // &amp;&amp; target(DataOutputStream)
+      &amp;&amp; !within(BlockReceiverAspects +);
+
+  before () throws IOException : callReceivePacket () {
+    if (ProbabilityModel.injectCriteria(BLOCK_RECEIVER_FAULT)) {
+      LOG.info("Before the injection point");
+      Thread.dumpStack();
+      throw new DiskOutOfSpaceException ("FI: injected fault point at " +
+      thisJoinPoint.getStaticPart( ).getSourceLocation());
+    }
+  }
+}
+      </source>
+      <p>
+        The aspect has two main parts. The join point,
+        <code>pointcut callReceivePacket()</code>,
+        serves as an identification mark of a specific point (in control
+        and/or data flow) in the life of an application. A call to the advice,
+        <code>before () throws IOException : callReceivePacket()</code>,
+        will be
+        <a href="#Putting+it+all+together">injected</a>
+        before that specific spot of the application's code.
+      </p>
+
+      <p>The pointcut identifies an invocation of the
+        <code>java.io.OutputStream write()</code>
+        method
+        with any number of parameters and any return type. This invocation
+        should take place within the body of the method
+        <code>receivePacket()</code>
+        from class <code>BlockReceiver</code>.
+        The method can have any parameters and any return type. Possible
+        invocations of the
+        <code>write()</code>
+        method happening anywhere within the aspect
+        <code>BlockReceiverAspects</code>
+        or its heirs will be ignored.
+      </p>
+      <p><strong>Note 1</strong>: This short example doesn't illustrate
+        the fact that you can have more than a single injection point per
+        class. In such a case the names of the faults have to be different
+        if a developer wants to trigger them separately.
+      </p>
+      <p><strong>Note 2</strong>: After the
+        <a href="#Putting+it+all+together">injection step</a>
+        you can verify that the faults were properly injected by
+        searching for
+        <code>ajc</code>
+        keywords in a disassembled class file.
+      </p>
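+      <p>For instance, such a check might look like the command below. The
+        <code>build/classes</code>
+        path is an assumption about where the woven classes end up; adjust it
+        to your build layout.
+      </p>
+      <source>
+% javap -c -private -classpath build/classes \
+    org.apache.hadoop.hdfs.server.datanode.BlockReceiver | grep ajc
+      </source>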
+
+    </section>
+    
+    <section>
+      <title>Fault naming convention &amp; namespaces</title>
+      <p>For the sake of a unified naming
+      convention, the following two types of names are recommended for
+      new aspect development:</p>
+      <ul>
+        <li>Activity specific notation (used
+          when we don't care about the particular location where a fault
+          happens). In this case the name of the fault is rather abstract:
+          <code>fi.hdfs.DiskError</code>
+        </li>
+        <li>Location specific notation.
+          Here, the fault's name is mnemonic as in:
+          <code>fi.hdfs.datanode.BlockReceiver[optional location details]</code>
+        </li>
+      </ul>
+    </section>
+
+    <section>
+      <title>Development tools</title>
+      <ul>
+        <li>Eclipse
+          <a href="http://www.eclipse.org/ajdt/">AspectJ
+            Development Toolkit
+          </a>
+          might help you in the aspects' development
+          process.
+        </li>
+        <li>IntelliJ IDEA provides AspectJ weaver and Spring-AOP plugins
+        </li>
+      </ul>
+    </section>
+
+    <section>
+      <title>Putting it all together</title>
+      <p>Faults (or aspects) have to be injected (or woven) together before
+        they can be used. The following step-by-step instructions show how
+        this can be done.</p>
+      <p>Weaving aspects in place:</p>
+      <source>
+% ant injectfaults
+      </source>
+      <p>If you
+        misidentified the join point of your aspect, you'll see a
+        warning similar to the one below when the 'injectfaults' target
+        completes:</p>
+        <source>
+[iajc] warning at
+src/test/aop/org/apache/hadoop/hdfs/server/datanode/ \
+          BlockReceiverAspects.aj:44::0
+advice defined in org.apache.hadoop.hdfs.server.datanode.BlockReceiverAspects
+has not been applied [Xlint:adviceDidNotMatch]
+        </source>
+      <p>It isn't an error, so the build still reports success.</p>
+      <p>To prepare a dev.jar file with all your faults woven in
+      place (HDFS-475 pending), run</p>
+        <source>
+% ant jar-fault-inject
+        </source>
+
+      <p>Test jars can be created by</p>
+        <source>
+% ant jar-test-fault-inject
+        </source>
+
+      <p>To run HDFS tests with faults injected:</p>
+        <source>
+% ant run-test-hdfs-fault-inject
+        </source>
+      <section>
+        <title>How to use fault injection framework</title>
+        <p>Faults can be triggered in the following two ways:
+        </p>
+        <ul>
+          <li>At runtime:
+            <source>
+% ant run-test-hdfs -Dfi.hdfs.datanode.BlockReceiver=0.12
+            </source>
+            To set a certain level, e.g. 25%, for all injected faults, run
+            <br/>
+            <source>
+% ant run-test-hdfs-fault-inject -Dfi.*=0.25
+            </source>
+          </li>
+          <li>or from a program as follows:
+          </li>
+        </ul>
+        <source>
+package org.apache.hadoop.fs;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+import junit.framework.TestCase;
+
+public class DemoFiTest extends TestCase {
+  public static final String BLOCK_RECEIVER_FAULT="hdfs.datanode.BlockReceiver";
+  @Override
+  @Before
+  public void setUp(){
+    //Setting up the test's environment as required
+  }
+
+  @Test
+  public void testFI() {
+    // It triggers the fault, assuming that there's one called 'hdfs.datanode.BlockReceiver'
+    System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.12");
+    //
+    // The main logic of your tests goes here
+    //
+    // Now set the level back to 0 (zero) to prevent this fault from happening again
+    System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.0");
+    // or delete its trigger completely
+    System.getProperties().remove("fi." + BLOCK_RECEIVER_FAULT);
+  }
+
+  @Override
+  @After
+  public void tearDown() {
+    //Cleaning up the test environment
+  }
+}
+        </source>
+        <p>
+          As you can see above, these two methods do the same thing: they
+          set the probability level of
+          <code>hdfs.datanode.BlockReceiver</code>
+          to 12%.
+          The difference, however, is that the program provides more
+          flexibility and allows you to turn a fault off when a test doesn't
+          need it anymore.
+        </p>
+      </section>
+    </section>
+
+    <section>
+      <title>Additional information and contacts</title>
+      <p>These two sources of information are particularly
+        interesting and worth further reading:
+      </p>
+      <ul>
+        <li>
+          <a href="http://www.eclipse.org/aspectj/doc/next/devguide/">
+            http://www.eclipse.org/aspectj/doc/next/devguide/
+          </a>
+        </li>
+        <li>AspectJ Cookbook (ISBN-13: 978-0-596-00654-9)
+        </li>
+      </ul>
+      <p>Should you have any further comments or questions for the author,
+        check
+        <a href="http://issues.apache.org/jira/browse/HDFS-435">HDFS-435</a>
+      </p>
+    </section>
+  </body>
+</document>

+ 3 - 0
src/docs/src/documentation/content/xdocs/site.xml

@@ -60,6 +60,9 @@ See http://forrest.apache.org/docs/linking.html for more info.
 		<hdfs_SLG        			label="Synthetic Load Generator Guide"  href="SLG_user_guide.html" />
 		<hdfs_imageviewer						label="Offline Image Viewer Guide"	href="hdfs_imageviewer.html" />
 		<hdfs_libhdfs   				label="C API libhdfs"         						href="libhdfs.html" /> 
+                <docs label="Testing">
+                    <faultinject_framework              label="Fault Injection"                                                     href="faultinject_framework.html" />
+                </docs>
    </docs> 
    
    
    <docs label="HOD">

BIN
src/docs/src/documentation/resources/images/FI-framework.gif


BIN
src/docs/src/documentation/resources/images/FI-framework.odg