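Thank you very much for your reply!
I added context.getCounter("MY_GROUP","MY_COUNTER").increment(1); to the code,
and the MY_COUNTER count does show up in the output,
but the result is the same as before: nodes still disconnect.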
About memory:
I changed hadoop-env.sh in the conf directory to:
export HADOOP_HEAPSIZE=1024
If I set it any higher, the log shows:
Code:
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
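The "Could not reserve enough space for object heap" message comes from the JVM itself at startup, before any Hadoop code runs. A common cause, which I am assuming here rather than confirming, is a 32-bit JVM on Windows, where the usable contiguous heap is often well below 2 GB. Note also that HADOOP_HEAPSIZE sizes the Hadoop daemons (JobTracker, TaskTracker and so on), while the map/reduce child JVMs take their heap from mapred.child.java.opts. The hypothetical `HeapCheck` class below (not part of Hadoop) prints the JVM's bitness and the heap it actually received, and converts -Xmx strings to MB so the two settings can be compared.

```java
import java.util.Locale;

// Hypothetical helper (not part of Hadoop): reports what the local JVM can use
// and converts -Xmx option strings to megabytes for comparison.
public class HeapCheck {

    // Converts an -Xmx option such as "-Xmx1000m" or "-Xmx1g" to megabytes.
    static long parseXmxMb(String opt) {
        if (!opt.startsWith("-Xmx")) {
            throw new IllegalArgumentException("not an -Xmx option: " + opt);
        }
        String v = opt.substring(4).toLowerCase(Locale.ROOT);
        char unit = v.charAt(v.length() - 1);
        if (Character.isDigit(unit)) {
            // No suffix means the value is in bytes.
            return Long.parseLong(v) / (1024 * 1024);
        }
        long n = Long.parseLong(v.substring(0, v.length() - 1));
        switch (unit) {
            case 'k': return n / 1024;
            case 'm': return n;
            case 'g': return n * 1024;
            default: throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    // "32" or "64" on HotSpot-based JVMs; the property may be absent on others.
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    // Upper bound the running JVM will try to use for its heap, in MB.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("os.arch         = " + System.getProperty("os.arch"));
        System.out.println("data model      = " + dataModel() + "-bit");
        System.out.println("max heap (MB)   = " + maxHeapMb());
        System.out.println("-Xmx1000m in MB = " + parseXmxMb("-Xmx1000m"));
    }
}
```

Running it on each node with the same flag, e.g. `java -Xmx1024m HeapCheck`, shows whether the JVM itself can start with that heap; if it fails with the same message, the limit is on the JVM/OS side rather than in Hadoop. The reported `maxMemory()` may be slightly below the -Xmx value.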
The program itself also sets conf.set("mapred.child.java.opts","-Xmx1000m");
Do I also need to add the following to mapred-site.xml on every node?
Code:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
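If the cause were insufficient memory for the map tasks, shouldn't there be an error such as "Error: Java heap space" that aborts the task, or at least an entry in the logs?
As for /etc/hosts, I have added:
Code: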
10.5.32.173 7900-PUB15
10.5.32.174 7900-PUB16
10.5.32.172 7900-PUB14
127.0.0.1 localhost
The names match the output of the hostname command under Cygwin.
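Java running on Windows resolves host names through the operating system rather than through Cygwin, so it may be worth checking from inside a JVM what each node name actually resolves to. The sketch below (hypothetical class `ResolveCheck`, with the node names taken from the /etc/hosts entries above as defaults) prints the address for each name and flags loopback results.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not part of Hadoop): shows what each node name resolves to
// from inside a JVM, i.e. what the Hadoop daemons would see.
public class ResolveCheck {

    // True if the given IP literal is a loopback address (127.x.x.x or ::1).
    // For a literal address, getByName only checks the format; no lookup is done.
    static boolean isLoopback(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        // Default to the node names from the post's /etc/hosts entries.
        String[] hosts = args.length > 0 ? args
                : new String[] {"7900-PUB14", "7900-PUB15", "7900-PUB16"};
        for (String host : hosts) {
            try {
                String ip = InetAddress.getByName(host).getHostAddress();
                System.out.println(host + " -> " + ip + (isLoopback(ip) ? "  (loopback!)" : ""));
            } catch (UnknownHostException e) {
                System.out.println(host + " -> cannot resolve");
            }
        }
        try {
            InetAddress self = InetAddress.getLocalHost();
            System.out.println("this machine: " + self.getHostName() + " -> " + self.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println("this machine: local host name cannot be resolved");
        }
    }
}
```

Running it on every node should print the same 10.5.32.x address for each name; a name that resolves differently on one machine would be worth fixing first.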
One job processes 100 txt files of 20 to 60 MB each, and I run the job 100 times in a row.
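Assuming the default 64 MB block size and default split-size settings (so the split size equals the block size), every input file here is smaller than one block, so each file should become exactly one input split and one map task: roughly 100 map tasks per job and 10,000 over the 100 runs. The sketch below reproduces the split rule of FileInputFormat as I understand it, including the 10% slop that lets the last split be up to 1.1 times the split size. `SplitEstimate` is a hypothetical name, and empty files are not handled.

```java
// Hypothetical helper (not part of Hadoop): estimates input splits per file
// using FileInputFormat's rule, assuming splitSize == block size.
public class SplitEstimate {

    // 10% slop: the final chunk may be up to 1.1x the split size.
    static final double SPLIT_SLOP = 1.1;

    // Number of splits for one non-empty file of fileBytes bytes.
    static int splitsForFile(long fileBytes, long splitSize) {
        int splits = 0;
        long remaining = fileBytes;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed dfs.block.size (Hadoop 1.x default)
        int files = 100;
        int jobs = 100;
        int perFile = splitsForFile(60 * mb, blockSize); // largest file in the post
        System.out.println("20 MB file -> " + splitsForFile(20 * mb, blockSize) + " split(s)");
        System.out.println("60 MB file -> " + perFile + " split(s)");
        System.out.println("map tasks per job ~ " + files * perFile);
        System.out.println("map tasks over " + jobs + " jobs ~ " + (long) files * jobs * perFile);
    }
}
```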
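The node disconnection does not happen in every job. For example:
The first two jobs ran smoothly without any problems, with each task finishing in about 10 seconds.
In the third job, task attempt_201304091538_0003_m_000017_0 caused one node to disconnect:
Code: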
2013-04-09 15:43:05,884 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201304091538_0003_m_000017_0' to tip task_201304091538_0003_m_000017, for tracker 'tracker_7900-PUB14:127.0.0.1/127.0.0.1:58424'
2013-04-09 15:45:20,987 INFO org.apache.hadoop.mapred.JobTracker: attempt_201304091538_0003_m_000017_0 is 135102 ms debug.
2013-04-09 15:48:40,976 INFO org.apache.hadoop.mapred.JobTracker: attempt_201304091538_0003_m_000017_0 is 335091 ms debug.
2013-04-09 15:52:00,965 INFO org.apache.hadoop.mapred.JobTracker: attempt_201304091538_0003_m_000017_0 is 535080 ms debug.
2013-04-09 15:55:20,955 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201304091538_0003_m_000017_0' from 'tracker_7900-PUB14:127.0.0.1/127.0.0.1:58424'
2013-04-09 15:55:20,956 INFO org.apache.hadoop.mapred.JobTracker: attempt_201304091538_0003_m_000017_0 is 735069 ms debug.
2013-04-09 15:55:20,956 INFO org.apache.hadoop.mapred.JobTracker: Launching task attempt_201304091538_0003_m_000017_0 timed out.
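In the Hadoop 1.x JobTracker, as far as I can tell from its source, the "... is N ms debug." lines come from a periodic sweep over task attempts that have been assigned to a TaskTracker but not yet reported back by it. The sweep runs every mapred.tasktracker.expiry.interval / 3, and an attempt is declared "timed out" once its age exceeds that interval, which defaults to 600,000 ms (10 minutes). The log above fits this: consecutive ages differ by 199,989 ms (about 600,000 / 3), and the attempt is dropped at 735,069 ms, far beyond the roughly 10 seconds a normal task takes. The hypothetical `LaunchExpiryCheck` class below parses those lines and applies the assumed default interval.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): reads the "is N ms debug." lines
// and compares each age with an assumed expiry interval.
public class LaunchExpiryCheck {

    // Assumed default of mapred.tasktracker.expiry.interval in Hadoop 1.x (10 min).
    static final long EXPIRY_MS = 10 * 60 * 1000L;

    // Matches e.g. "attempt_201304091538_0003_m_000017_0 is 735069 ms debug."
    private static final Pattern AGE = Pattern.compile("(attempt_\\S+) is (\\d+) ms debug\\.");

    // Age in ms reported on the line, or -1 if the line does not match.
    static long ageMs(String logLine) {
        Matcher m = AGE.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1;
    }

    static boolean expired(long ageMs) {
        return ageMs > EXPIRY_MS;
    }

    public static void main(String[] args) {
        // Message parts of the job 0003 lines (timestamp and logger prefix trimmed).
        String[] lines = {
            "attempt_201304091538_0003_m_000017_0 is 135102 ms debug.",
            "attempt_201304091538_0003_m_000017_0 is 335091 ms debug.",
            "attempt_201304091538_0003_m_000017_0 is 535080 ms debug.",
            "attempt_201304091538_0003_m_000017_0 is 735069 ms debug.",
        };
        long previous = -1;
        for (String line : lines) {
            long age = ageMs(line);
            String step = previous < 0 ? "" : "  (+" + (age - previous) + " ms)";
            System.out.println(age + " ms" + step + (expired(age) ? "  -> past expiry" : ""));
            previous = age;
        }
    }
}
```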
After that, the remaining nodes kept running jobs smoothly until the ninth job:
Code:
2013-04-09 16:03:47,022 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201304091538_0009_m_000046_0' to tip task_201304091538_0009_m_000046, for tracker 'tracker_7900-PUB16:127.0.0.1/127.0.0.1:62247'
2013-04-09 16:05:20,963 INFO org.apache.hadoop.mapred.JobTracker: attempt_201304091538_0009_m_000046_0 is 93940 ms debug.
2013-04-09 16:08:40,926 INFO org.apache.hadoop.mapred.JobTracker: attempt_201304091538_0009_m_000046_0 is 293903 ms debug.
2013-04-09 16:12:00,960 INFO org.apache.hadoop.mapred.JobTracker: attempt_201304091538_0009_m_000046_0 is 493937 ms debug.
2013-04-09 16:15:20,965 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201304091538_0009_m_000046_0: Lost task tracker: tracker_7900-PUB16:127.0.0.1/127.0.0.1:62247
2013-04-09 16:15:20,965 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201304091538_0009_m_000046_0' from 'tracker_7900-PUB16:127.0.0.1/127.0.0.1:62247'
2013-04-09 16:15:20,965 INFO org.apache.hadoop.mapred.JobTracker: attempt_201304091538_0009_m_000046_0 is 693942 ms debug.
2013-04-09 16:15:20,965 INFO org.apache.hadoop.mapred.JobTracker: Launching task attempt_201304091538_0009_m_000046_0 timed out.
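The timestamps tell the same story as the reported ages: from "Adding task" to "timed out" is about 12 minutes in job 0003 and about 11.5 minutes in job 0009. In job 0009 the "Lost task tracker" line at the same moment suggests that the whole TaskTracker stopped heartbeating, rather than the map code failing. The sketch below (hypothetical class `ElapsedCheck`) computes the gaps from the quoted timestamps; the few-millisecond differences from the logged ages only reflect when each value was sampled.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Hadoop): time between two JobTracker log lines.
public class ElapsedCheck {

    // Timestamp format at the start of each JobTracker log line.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds between two log timestamps.
    static long elapsedMs(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, TS), LocalDateTime.parse(to, TS)).toMillis();
    }

    public static void main(String[] args) {
        // "Adding task" -> "timed out" for the two attempts quoted in the post.
        long job3 = elapsedMs("2013-04-09 15:43:05,884", "2013-04-09 15:55:20,956");
        long job9 = elapsedMs("2013-04-09 16:03:47,022", "2013-04-09 16:15:20,965");
        System.out.println("job 0003 m_000017: " + job3 + " ms (logged age 735069 ms)");
        System.out.println("job 0009 m_000046: " + job9 + " ms (logged age 693942 ms)");
    }
}
```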
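where another node disconnected. This repeats until every node has disconnected, yet the job itself never aborts.
I can't figure out why, with the same input, some jobs run fine while a task in other jobs runs into this problem.
For now I'm thinking of trying it on Linux, or in a Linux virtual machine,
in the hope that this solves the disconnection problem.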