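top wrote:
As jazz said:
The DistCp Guide says that each TaskTracker performs the copy from NN_A to NN_B.
As for how NN_B learns that files are arriving from NN_A via distcp.... that is unclear.
In the DistCp.java source code, I did not see any code that interacts with the NN.
Puzzling..... XDD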
I ran an experiment. The network setup is a little complicated.
Attachment:
hadoop distcp.png [ 27.53 KiB ]
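Judging from the network connections,
the machine that runs distcp requests the data from DataNode A and then writes it to DataNode B.
The order I observed was:
NN-A in the diagram first shows a connection from 192.168.125.254,
then NN-B shows a connection from 192.168.125.254,
then DN-A in the diagram shows
tcp6 0 0 192.168.125.4:50010 192.168.125.254:53613 ESTABLISHED 9634/java
and finally DN-B in the diagram shows
tcp6 0 0 192.168.125.6:50010 192.168.125.254:49606 ESTABLISHED 9632/java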
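The reading of those two netstat lines can be sketched in a few lines of Python. This is only an illustration: it assumes the column layout of `netstat -tnp` as quoted above (protocol, queues, local address, foreign address, state, PID/program) and that 50010 is the DataNode data-transfer port in this setup. The names `Conn` and `parse_netstat_line` are made up for the example.

```python
from typing import NamedTuple, Optional


class Conn(NamedTuple):
    local_ip: str
    local_port: int
    peer_ip: str
    peer_port: int
    state: str


def parse_netstat_line(line: str) -> Optional[Conn]:
    """Parse one `netstat -tnp`-style line; return None if it does not fit."""
    fields = line.split()
    # Assumed columns: proto, recv-q, send-q, local, foreign, state, pid/program
    if len(fields) < 6 or not fields[0].startswith("tcp"):
        return None
    # rpartition splits on the last ':' so IPv6-style addresses keep their colons
    local_ip, _, local_port = fields[3].rpartition(":")
    peer_ip, _, peer_port = fields[4].rpartition(":")
    if not (local_port.isdigit() and peer_port.isdigit()):
        return None
    return Conn(local_ip, int(local_port), peer_ip, int(peer_port), fields[5])


# The two lines observed on DN-A and DN-B
lines = [
    "tcp6 0 0 192.168.125.4:50010 192.168.125.254:53613 ESTABLISHED 9634/java",
    "tcp6 0 0 192.168.125.6:50010 192.168.125.254:49606 ESTABLISHED 9632/java",
]
conns = [c for c in map(parse_netstat_line, lines) if c is not None]

# Which remote hosts hold an established connection to the DataNode port?
peers = {
    c.peer_ip
    for c in conns
    if c.local_port == 50010 and c.state == "ESTABLISHED"
}
print(peers)
```

Both DataNodes report the same peer, 192.168.125.254, and neither reports the other DataNode, which is what the conclusion above is based on.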
Code:
jazz@Wdebian:~$ hadoop fs -ls hdfs://192.168.125.3:9000/user/jazz
Found 2 items
-rw-r--r-- 3 jazz supergroup 209715200 2013-10-01 14:01 /user/jazz/200mb.img
drwxr-xr-x - jazz supergroup 0 2013-10-01 13:31 /user/jazz/input
jazz@Wdebian:~$ hadoop fs -ls hdfs://192.168.125.5:9000/user/jazz
Found 1 items
drwxr-xr-x - jazz supergroup 0 2013-10-01 14:42 /user/jazz/tmp
jazz@Wdebian:~$ hadoop distcp -p -update "hdfs://192.168.125.3:9000/user/jazz" "hdfs://192.168.125.5:9000/user/jazz"
13/10/01 14:43:45 INFO tools.DistCp: srcPaths=[hdfs://192.168.125.3:9000/user/jazz]
13/10/01 14:43:45 INFO tools.DistCp: destPath=hdfs://192.168.125.5:9000/user/jazz
13/10/01 14:43:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/10/01 14:43:46 INFO tools.DistCp: sourcePathsCount=19
13/10/01 14:43:46 INFO tools.DistCp: filesToCopyCount=17
13/10/01 14:43:46 INFO tools.DistCp: bytesToCopyCount=200.0m
13/10/01 14:43:46 INFO mapred.JobClient: Running job: job_local_0001
13/10/01 14:43:46 INFO util.ProcessTree: setsid exited with exit code 0
13/10/01 14:43:46 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@79038de7
13/10/01 14:43:46 INFO mapred.MapTask: numReduceTasks: 0
13/10/01 14:43:47 INFO mapred.JobClient: map 0% reduce 0%
13/10/01 14:43:52 INFO mapred.LocalJobRunner: 13.06 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 26.1m / 200.0m ]
13/10/01 14:43:55 INFO mapred.LocalJobRunner: 19.25 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 38.5m / 200.0m ]
13/10/01 14:43:58 INFO mapred.LocalJobRunner: 26.94 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 53.9m / 200.0m ]
13/10/01 14:44:01 INFO mapred.LocalJobRunner: 34.88 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 69.8m / 200.0m ]
13/10/01 14:44:04 INFO mapred.LocalJobRunner: 42.75 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 85.5m / 200.0m ]
13/10/01 14:44:07 INFO mapred.LocalJobRunner: 50.94 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 101.9m / 200.0m ]
13/10/01 14:44:10 INFO mapred.LocalJobRunner: 57.81 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 115.6m / 200.0m ]
13/10/01 14:44:13 INFO mapred.LocalJobRunner: 65.63 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 131.2m / 200.0m ]
13/10/01 14:44:16 INFO mapred.LocalJobRunner: 73.63 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 147.2m / 200.0m ]
13/10/01 14:44:19 INFO mapred.LocalJobRunner: 81.88 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 163.8m / 200.0m ]
13/10/01 14:44:22 INFO mapred.LocalJobRunner: 89.13 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 178.2m / 200.0m ]
13/10/01 14:44:25 INFO mapred.LocalJobRunner: 97.38 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 194.8m / 200.0m ]
13/10/01 14:44:28 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/10/01 14:44:28 INFO mapred.LocalJobRunner: 97.38 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 194.8m / 200.0m ]
13/10/01 14:44:28 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now
13/10/01 14:44:28 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to hdfs://192.168.125.5:9000/user/jazz/_distcp_logs_ubgwmh
13/10/01 14:44:28 INFO mapred.LocalJobRunner: Copied: 17 Skipped: 0 Failed: 0
13/10/01 14:44:28 INFO mapred.LocalJobRunner: Copied: 17 Skipped: 0 Failed: 0
13/10/01 14:44:28 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/10/01 14:44:29 INFO mapred.JobClient: map 100% reduce 0%
13/10/01 14:44:29 INFO mapred.JobClient: Job complete: job_local_0001
13/10/01 14:44:29 INFO mapred.JobClient: Counters: 18
13/10/01 14:44:29 INFO mapred.JobClient: File Input Format Counters
13/10/01 14:44:29 INFO mapred.JobClient: Bytes Read=2830
13/10/01 14:44:29 INFO mapred.JobClient: File Output Format Counters
13/10/01 14:44:29 INFO mapred.JobClient: Bytes Written=0
13/10/01 14:44:29 INFO mapred.JobClient: distcp
13/10/01 14:44:29 INFO mapred.JobClient: Files copied=17
13/10/01 14:44:29 INFO mapred.JobClient: Bytes expected=209742092
13/10/01 14:44:29 INFO mapred.JobClient: Bytes copied=209742092
13/10/01 14:44:29 INFO mapred.JobClient: FileSystemCounters
13/10/01 14:44:29 INFO mapred.JobClient: FILE_BYTES_READ=297040
13/10/01 14:44:29 INFO mapred.JobClient: HDFS_BYTES_READ=209742092
13/10/01 14:44:29 INFO mapred.JobClient: FILE_BYTES_WRITTEN=339205
13/10/01 14:44:29 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=209742092
13/10/01 14:44:29 INFO mapred.JobClient: Map-Reduce Framework
13/10/01 14:44:29 INFO mapred.JobClient: Map input records=18
13/10/01 14:44:29 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/10/01 14:44:29 INFO mapred.JobClient: Spilled Records=0
13/10/01 14:44:29 INFO mapred.JobClient: Total committed heap usage (bytes)=113770496
13/10/01 14:44:29 INFO mapred.JobClient: CPU time spent (ms)=0
13/10/01 14:44:29 INFO mapred.JobClient: Map input bytes=2698
13/10/01 14:44:29 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/10/01 14:44:29 INFO mapred.JobClient: SPLIT_RAW_BYTES=143
13/10/01 14:44:29 INFO mapred.JobClient: Map output records=0
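One detail in the log above may matter for interpreting the experiment: the job ID is job_local_0001 and the progress lines come from mapred.LocalJobRunner. In Hadoop 1.x that normally means the map task ran inside the client JVM rather than on a TaskTracker, which would be consistent with all connections coming from a single address. The check below is a rough sketch of that reading, not a definitive test; the function name `job_runner_hint` is hypothetical, and the `job_local_` prefix rule is an assumption about this Hadoop version.

```python
import re


def job_runner_hint(log_text: str) -> str:
    """Guess from JobClient output whether a job ran locally or on a cluster.

    Assumption: a job ID starting with 'job_local_' or any line logged by
    mapred.LocalJobRunner indicates local mode.
    """
    match = re.search(r"Running job: (job_\S+)", log_text)
    job_id = match.group(1) if match else ""
    uses_local_runner = "mapred.LocalJobRunner" in log_text
    if job_id.startswith("job_local_") or uses_local_runner:
        return "local"
    return "cluster" if job_id else "unknown"


# Two lines taken from the log above
sample = """\
13/10/01 14:43:46 INFO mapred.JobClient: Running job: job_local_0001
13/10/01 14:43:52 INFO mapred.LocalJobRunner: 13.06 hdfs://192.168.125.5:9000/user/jazz/200mb.img [ 26.1m / 200.0m ]
"""
print(job_runner_hint(sample))
```

If that reading is right, repeating the test with the job submitted to a JobTracker might show the DataNode connections coming from the TaskTracker hosts instead, as the DistCp Guide describes.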
- Jazz