1234567891011121314151617181920212223242526272829303132 |
- <?xml version="1.0" encoding="UTF-8"?>
- <document xmlns="http://maven.apache.org/XDOC/2.0"
- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
- <head>
- <title>DistCp</title>
- </head>
- <body>
- <section name="Overview">
- <p>
- DistCp (distributed copy) is a tool used for large inter/intra-cluster
- copying. It uses Map/Reduce to effect its distribution, error
- handling and recovery, and reporting. It expands a list of files and
- directories into input to map tasks, each of which will copy a partition
- of the files specified in the source list.
- </p>
- <p>
- The erstwhile implementation of DistCp has its share of quirks and
- drawbacks, both in its usage, as well as its extensibility and
- performance. The purpose of the DistCp refactor was to fix these shortcomings,
- enabling it to be used and extended programmatically. New paradigms have
- been introduced to improve runtime and setup performance, while simultaneously
- retaining the legacy behaviour as default.
- </p>
- <p>
- This document aims to describe the design of the new DistCp, its spanking
- new features, their optimal use, and any deviance from the legacy
- implementation.
- </p>
- </section>
- </body>
- </document>
|