
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
  ---
  C API libhdfs
  ---
  ---
  ${maven.build.timestamp}

C API libhdfs

%{toc|section=1|fromDepth=0}
* Overview

  libhdfs is a JNI-based C API for Hadoop's Distributed File System
  (HDFS). It provides C APIs to a subset of the HDFS APIs to manipulate
  HDFS files and the filesystem. libhdfs is part of the Hadoop
  distribution and comes pre-compiled in
  <<<${HADOOP_PREFIX}/libhdfs/libhdfs.so>>>.
* The APIs

  The libhdfs APIs are a subset of the {{{hadoop fs APIs}}}.

  The header file for libhdfs describes each API in detail and is
  available in <<<${HADOOP_PREFIX}/src/c++/libhdfs/hdfs.h>>>.
* A Sample Program

----
\#include <fcntl.h>   /* for O_WRONLY, O_CREAT */
\#include <stdio.h>
\#include <stdlib.h>
\#include <string.h>
\#include "hdfs.h"

int main(int argc, char **argv) {

    /* Connect to the default filesystem named in the configuration. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Failed to connect to HDFS!\n");
        exit(-1);
    }
    const char* writePath = "/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
    if (!writeFile) {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    const char* buffer = "Hello, World!";
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
    if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    hdfsDisconnect(fs);
    return 0;
}
----
* How To Link With The Library

  See the Makefile for <<<hdfs_test.c>>> in the libhdfs source directory
  (<<<${HADOOP_PREFIX}/src/c++/libhdfs/Makefile>>>) or use a command like:

  <<<gcc above_sample.c -I${HADOOP_PREFIX}/src/c++/libhdfs -L${HADOOP_PREFIX}/libhdfs -lhdfs -o above_sample>>>
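
  In addition to compile-time flags, a program linked against libhdfs must be
  able to find <<<libhdfs.so>>> at run time. A minimal sketch of the run-time
  environment setup, assuming the install layout above (the exact paths are
  assumptions and vary by Hadoop version and platform):

----
# Assumed install root -- adjust for your environment.
HADOOP_PREFIX=${HADOOP_PREFIX:-/opt/hadoop}

# The dynamic linker must be able to locate libhdfs.so when the
# program starts, so prepend its directory to LD_LIBRARY_PATH.
export LD_LIBRARY_PATH=$HADOOP_PREFIX/libhdfs:$LD_LIBRARY_PATH

# Compile as shown above, then run:
#   gcc above_sample.c -I$HADOOP_PREFIX/src/c++/libhdfs \
#       -L$HADOOP_PREFIX/libhdfs -lhdfs -o above_sample
#   ./above_sample
----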
* Common Problems

  The most common problem is that <<<CLASSPATH>>> is not set properly when
  calling a program that uses libhdfs. Make sure you set it to all the
  Hadoop jars needed to run Hadoop itself. Currently, there is no way to
  generate the classpath programmatically, but a good bet is to include
  all the jar files in <<<${HADOOP_PREFIX}>>> and <<<${HADOOP_PREFIX}/lib>>> as well
  as the right configuration directory containing <<<hdfs-site.xml>>>.
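
  The advice above can be sketched as a small shell fragment that builds such
  a classpath by enumerating the jars. The install root and the location of
  the configuration directory are assumptions here; adjust both for your
  environment:

----
# Assumed install root and conf directory -- adjust for your environment.
HADOOP_PREFIX=${HADOOP_PREFIX:-/opt/hadoop}

# Start with the configuration directory (containing hdfs-site.xml),
# then append every jar under the install root and its lib directory.
CLASSPATH=$HADOOP_PREFIX/conf
for jar in "$HADOOP_PREFIX"/*.jar "$HADOOP_PREFIX"/lib/*.jar; do
    [ -e "$jar" ] && CLASSPATH=$CLASSPATH:$jar
done
export CLASSPATH
----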
* Thread Safe

  libhdfs is thread safe.
* Concurrency and Hadoop FS "handles"

  The Hadoop FS implementation includes an FS handle cache which
  caches based on the URI of the namenode along with the user
  connecting. So, all calls to <<<hdfsConnect>>> will return the same
  handle, but calls to <<<hdfsConnectAsUser>>> with different users will
  return different handles. But, since HDFS client handles are
  completely thread safe, this has no bearing on concurrency.
* Concurrency and libhdfs/JNI

  The libhdfs calls to JNI should always be creating thread local
  storage, so (in theory), libhdfs should be as thread safe as the
  underlying calls to the Hadoop FS.