*************************************************
*** Input Files (Note: tab-separated columns) ***
*************************************************
[:~]$ cat datajoin/input/A
A.a11 A.a12
A.a21 A.a22
B.a21 A.a32
A.a31 A.a32
B.a31 A.a32
[:~]$ cat datajoin/input/B
A.a11 B.a12
A.a11 B.a13
B.a11 B.a12
B.a21 B.a22
A.a31 B.a32
B.a31 B.a32
*****************************
*** Invoke SampleDataJoin ***
*****************************
[:~]$ $HADOOP_PREFIX/bin/hadoop jar hadoop-datajoin-examples.jar \
    org.apache.hadoop.contrib.utils.join.DataJoinJob \
    datajoin/input datajoin/output Text 1 \
    org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper \
    org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer \
    org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
Using TextInputFormat: Text
Using TextOutputFormat: Text
07/06/01 19:58:23 INFO mapred.FileInputFormat: Total input paths to process : 2
Job job_kkzk08 is submitted
Job job_kkzk08 is still running.
07/06/01 19:58:24 INFO mapred.LocalJobRunner: collectedCount 5
totalCount 5
07/06/01 19:58:24 INFO mapred.LocalJobRunner: collectedCount 6
totalCount 6
07/06/01 19:58:24 INFO datajoin.job: key: A.a11 this.largestNumOfValues: 3
07/06/01 19:58:24 INFO mapred.LocalJobRunner: actuallyCollectedCount 5
collectedCount 7
groupCount 6
> reduce
*******************
*** Output File ***
*******************
[:~]$ cat datajoin/output/part-00000
A.a11 A.a12 B.a12
A.a11 A.a12 B.a13
A.a31 A.a32 B.a32
B.a21 A.a32 B.a22
B.a31 A.a32 B.a32
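**********************************
*** Join Logic Sketch (Python) ***
**********************************
The output file above is consistent with an inner join on the first column: records are grouped by key, and each key that appears in both A and B yields every pairing of its A value with its B values. The following is a minimal plain-Python sketch of that logic, not the Hadoop implementation. The function name inner_join and the in-memory lists are assumptions made for illustration; the data is copied from the input files shown earlier.

```python
from collections import defaultdict
from itertools import product

# Records copied from datajoin/input/A and datajoin/input/B (key, value).
A = [("A.a11", "A.a12"), ("A.a21", "A.a22"), ("B.a21", "A.a32"),
     ("A.a31", "A.a32"), ("B.a31", "A.a32")]
B = [("A.a11", "B.a12"), ("A.a11", "B.a13"), ("B.a11", "B.a12"),
     ("B.a21", "B.a22"), ("A.a31", "B.a32"), ("B.a31", "B.a32")]


def inner_join(left, right):
    """Hypothetical helper: group both inputs by key, then emit the
    cross product of left and right values for each key."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)

    joined = []
    for key in sorted(groups):
        left_values, right_values = groups[key]
        # A key found in only one input gives an empty product,
        # so it contributes no output lines.
        for lv, rv in product(left_values, right_values):
            joined.append("\t".join((key, lv, rv)))
    return joined, groups


joined, groups = inner_join(A, B)
print("\n".join(joined))
```

Run on the sample data, this prints the same five records as part-00000. It also finds six distinct keys, with A.a11 holding the largest group (three values), which appears to match the groupCount 6 and largestNumOfValues: 3 lines in the job log.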