#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
Fuse-DFS
Fuse-DFS allows HDFS to be mounted as a local file system.
It currently supports reads, writes, and directory operations (e.g., cp, ls, more, cat, find, less, rm, mkdir, mv, rmdir, touch, chmod, chown and permissions). Random access writing is not supported.
Contributing
Adding functionality to fuse-dfs is fairly straightforward, as fuse makes things relatively simple. Some tasks also require augmenting libhdfs to expose more hdfs functionality to C. See [https://issues.apache.org/jira/issues/?jql=text%20~%20%22fuse-dfs%22 fuse-dfs JIRAs].
Requirements
* Hadoop with compiled libhdfs.so
* Linux kernel > 2.6.9 with fuse (the default), or Fuse 2.7.x or 2.8.x installed. See: [http://fuse.sourceforge.net/]
* The fuse kernel module loaded (run `modprobe fuse` to load it)
* fuse_dfs executable (see below)
* fuse_dfs_wrapper.sh installed in /bin or other appropriate location (see below)
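The kernel-module requirement above can be checked before attempting a mount. The following is a minimal sketch, assuming a Linux system that lists registered filesystems in /proc/filesystems; the function name `fuse_available` is hypothetical and not part of fuse-dfs.

```shell
# Succeeds if "fuse" appears as a whole word in the given file
# (default: /proc/filesystems). fuse_available is a hypothetical helper,
# not something shipped with fuse-dfs.
fuse_available() {
  grep -qw fuse "${1:-/proc/filesystems}" 2>/dev/null
}

if fuse_available; then
  echo "fuse is available"
else
  echo "fuse not found; try: modprobe fuse"
fi
```

The whole-word match keeps entries such as fuseblk or fusectl from being mistaken for the fuse filesystem itself.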
BUILDING
The fuse-dfs executable can be built with Maven by setting the `require.fuse` option to true. For example, in HADOOP_HOME:
`mvn package -Pnative -Drequire.fuse=true -DskipTests -Dmaven.javadoc.skip=true`
The executable `fuse_dfs` will be located at HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/
Common build problems include not finding libjvm.so in JAVA_HOME/jre/lib/OS_ARCH/server or not finding fuse in FUSE_HOME or /usr/local.
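When the build cannot find libjvm.so, it can help to see where the library actually lives under JAVA_HOME, since the directory layout differs between JDK installations. A minimal sketch follows; `locate_libjvm` is a hypothetical helper name, and the fallback path /nonexistent is only a placeholder used when JAVA_HOME is unset.

```shell
# Print the first libjvm.so found under the given directory
# (default: $JAVA_HOME). locate_libjvm is a hypothetical helper.
# -H makes find follow the starting directory if it is a symlink.
locate_libjvm() {
  find -H "${1:-${JAVA_HOME:-/nonexistent}}" -name libjvm.so 2>/dev/null | head -n 1
}

jvm_lib=$(locate_libjvm)
if [ -n "$jvm_lib" ]; then
  echo "libjvm.so found at $jvm_lib"
else
  echo "libjvm.so not found; check JAVA_HOME"
fi
```

If the reported directory is not the one the build expects, adjusting JAVA_HOME is usually the first thing to try.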
CONFIGURING
fuse_dfs_wrapper.sh may not work out of the box. To use it, check all the paths in fuse_dfs_wrapper.sh and either correct them or set them in your environment before running. (Note: for automount and for mounting as root, you probably cannot control the environment, so it is best to set them in the wrapper.)
INSTALLING
1. `mkdir /export/hdfs` (or wherever you want to mount it)
2. `fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /export/hdfs -odebug` and, from another terminal, try `ls /export/hdfs`
If step 2 works, try again without debug mode, i.e., without -odebug.
(Note: common problems are that libhdfs.so, libjvm.so, or libfuse.so is not on your LD_LIBRARY_PATH, or that your CLASSPATH does not contain hadoop and the other required jars.)
Also note that fuse-dfs writes error/warn messages to the syslog, typically in /var/log/messages.
You can use fuse-dfs to mount multiple hdfs instances by changing the server/port name and the mount point directory above.
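The library problem mentioned in the note above can be diagnosed before mounting. The sketch below only inspects the directories listed in LD_LIBRARY_PATH (it does not consult the ldconfig cache); `find_lib` is a hypothetical helper name, not part of fuse-dfs.

```shell
# Print the first file matching <name>* in any LD_LIBRARY_PATH directory,
# or nothing if there is no match. find_lib is a hypothetical helper.
find_lib() {
  printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | while IFS= read -r dir; do
    [ -n "$dir" ] || continue
    for candidate in "$dir"/"$1"*; do
      if [ -e "$candidate" ]; then
        printf '%s\n' "$candidate"
        break 2
      fi
    done
  done
}

# Report on the three libraries fuse-dfs needs at run time.
for lib in libhdfs.so libjvm.so libfuse.so; do
  if [ -n "$(find_lib "$lib")" ]; then
    echo "$lib: found"
  else
    echo "$lib: not on LD_LIBRARY_PATH"
  fi
done
```

A library reported as missing here may still be resolvable through the system linker configuration, so treat the output as a hint rather than a verdict.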
DEPLOYING
In a root shell do the following:
1. Add the following to /etc/fstab:
fuse_dfs#dfs://hadoop_server.foo.com:9000 /export/hdfs fuse -oallow_other,rw,-ousetrash,-oinitchecks 0 0
2. Mount using: `mount /export/hdfs`. Expect problems with fuse_dfs not being found; you will probably need to add it to /sbin. You may then hit problems finding the three libraries mentioned above (libhdfs.so, libjvm.so, libfuse.so); add these using ldconfig.
Fuse DFS takes the following mount options (i.e., on the command line or in the comma separated list of options in /etc/fstab):
-oserver=%s (optional place to specify the server but in fstab use the format above)
-oport=%d (optional port see comment on server option)
-oentry_timeout=%d (how long directory entries are cached by fuse in seconds - see fuse docs)
-oattribute_timeout=%d (how long attributes are cached by fuse in seconds - see fuse docs)
-oprotected=%s (a colon separated list of directories that fuse-dfs should not allow to be deleted or moved - e.g., /user:/tmp)
-oprivate (not often used but means only the person who does the mount can use the filesystem - aka the opposite of allow_other in fuse speak)
-ordbuffer=%d (in KBs how large a buffer should fuse-dfs use when doing hdfs reads)
ro
rw
-ousetrash (should fuse dfs throw things in /Trash when deleting them)
-onotrash (opposite of usetrash)
-odebug (do not daemonize - aka -d in fuse speak)
-obig_writes (use fuse big_writes option so as to allow better performance of writes on kernels >= 2.6.26)
-oinitchecks (have fuse-dfs try to connect to hdfs to ensure all is ok upon startup; recommended to have this on)
-omax_background=%d (maximum number of pending "background" requests - see fuse docs)
The defaults are:
entry,attribute_timeouts = 60 seconds
rdbuffer = 10 MB
protected = null
debug = 0
notrash
private = 0
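The options above are passed to the wrapper as separate -o arguments on the command line. As a rough illustration, the sketch below assembles such a command line and prints it without running it; `build_mount_cmd` is a hypothetical helper, and the host, port, and mount point are the example values used earlier in this README. The rdbuffer value of 10240 assumes the 10 MB default expressed in KB.

```shell
# Print a fuse_dfs_wrapper.sh command line for the given server, port,
# mount point, and any number of option names (without the -o prefix).
# build_mount_cmd is a hypothetical helper; it only prints the command.
build_mount_cmd() {
  server="$1"
  port="$2"
  mountpoint="$3"
  shift 3
  cmd="fuse_dfs_wrapper.sh dfs://$server:$port $mountpoint"
  for opt in "$@"; do
    cmd="$cmd -o$opt"
  done
  printf '%s\n' "$cmd"
}

build_mount_cmd hadoop_server1.foo.com 9000 /export/hdfs usetrash big_writes rdbuffer=10240
# prints: fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /export/hdfs -ousetrash -obig_writes -ordbuffer=10240
```

Printing the command first makes it easy to review the options before running the mount by hand.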
EXPORTING
Add the following to /etc/exports:
/export/hdfs *.foo.com(no_root_squash,rw,fsid=1,sync)
NOTE - you cannot export this with a FUSE module built into the kernel (e.g., kernel 2.6.17). For info on this, refer to the FUSE wiki.
RECOMMENDATIONS
1. From /bin, `ln -s HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/fuse_dfs* .`
2. Always start with debug on so you can see if you are missing a classpath or something like that.
3. Use -obig_writes
4. Use -oinitchecks
KNOWN ISSUES
1. If you alias `ls` to `ls --color=auto` and list a directory with thousands of files, expect it to be slow; with tens of thousands, expect it to be very slow. This is because `--color=auto` causes ls to stat every file in the directory. Since fuse-dfs does not cache attribute entries when doing a readdir, this is very slow. See [https://issues.apache.org/jira/browse/HADOOP-3797 HADOOP-3797]
2. Writes are approximately 33% slower than the DFSClient. TBD how to optimize this. See [https://issues.apache.org/jira/browse/HADOOP-3805 HADOOP-3805]. Try using -obig_writes on a kernel >= 2.6.26; it should perform much better, since bigger writes imply less context switching.
3. Reads are ~20-30% slower even with the read buffering.
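For the first issue, one possible workaround is to bypass the `ls` alias when listing large directories on the mount, so that the `--color=auto` flag from the alias is not applied. A minimal sketch, where `plain_ls` is a hypothetical function name:

```shell
# Run ls without any alias or shell function named ls being applied.
# plain_ls is a hypothetical helper name.
plain_ls() {
  command ls "$@"
}
```

For example, `plain_ls /export/hdfs` lists the mount point used earlier in this README without the options added by the alias.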