<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://maven.apache.org/XDOC/2.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
  <head>
    <title>DistCp</title>
  </head>
  <body>
    <section name="Overview">
      <p>
        DistCp (distributed copy) is a tool used for large inter/intra-cluster
        copying. It uses MapReduce to effect its distribution, error
        handling and recovery, and reporting. It expands a list of files and
        directories into input to map tasks, each of which will copy a partition
        of the files specified in the source list.
      </p>
      <p>
        The legacy implementation of DistCp has its share of quirks and
        drawbacks in its usage, extensibility, and performance. The DistCp
        refactor was intended to fix these shortcomings and to allow DistCp
        to be used and extended programmatically. New paradigms have been
        introduced to improve runtime and setup performance, while retaining
        the legacy behaviour as the default.
      </p>
      <p>
        This document describes the design of the new DistCp, its new
        features, their optimal use, and any deviations from the legacy
        implementation.
      </p>
    </section>
  </body>
</document>