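Several of the ERRORs are related to receiving a SIGTERM signal.
SIGTERM generally means the process was told to terminate by something outside it, for example the operating system.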
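If the test environment is still up, please post the output of dmesg. My initial suspicion is insufficient memory: the Linux kernel decided the processes were using too much memory and forcibly killed them.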
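The dmesg check can be scripted. Below is a minimal Python sketch. It assumes the kernel reports OOM kills as `Killed process <pid> (<name>) ...`, the form used by kernels of the 4.4 era. `find_oom_kills` and `read_dmesg` are hypothetical helper names. One caveat: the kernel's OOM killer itself sends SIGKILL, which a JVM cannot catch or log. So if dmesg shows no such lines, the SIGTERM in these logs probably came from somewhere else.

```python
import re
import subprocess

# Assumed kernel OOM-killer report format, e.g.
# "Killed process 1234 (java) total-vm:..., anon-rss:..., file-rss:..."
OOM_KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(dmesg_text):
    """Return (pid, command) pairs for every process the OOM killer reports killing."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILLED.finditer(dmesg_text)]


def read_dmesg():
    """Run dmesg and return its output (may need root if kernel.dmesg_restrict=1)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

On the affected machine, `find_oom_kills(read_dmesg())` returns the killed PIDs and process names. Entries for `java` would support the memory hypothesis.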
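Issue: memory size on the ARM platform. The default HEAPSIZE is 1 GB. If it is not tuned, running the whole Hadoop stack plus a YARN job will demand 4 to 8 GB of memory. Unless you are on an ARM server, I doubt your machine has that much memory.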
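The 4 to 8 GB figure comes from simple addition. Each Hadoop daemon is its own JVM that can grow to the configured heap, and YARN containers come on top of that. Below is a rough Python sketch of that arithmetic. It assumes 1 GB per daemon for the five daemons that appear in the attached logs. The container sizes are hypothetical inputs, not Hadoop defaults. It also counts only maximum heap; real JVMs use extra non-heap memory.

```python
# Daemons that appear in the attached logs; each runs in a separate JVM.
DAEMONS = ["NameNode", "SecondaryNameNode", "DataNode", "ResourceManager", "NodeManager"]


def estimate_peak_heap_mb(daemon_heap_mb, container_mbs):
    """Rough upper bound: every daemon's max heap plus every concurrently running container."""
    return len(DAEMONS) * daemon_heap_mb + sum(container_mbs)


def fits_in_ram(daemon_heap_mb, container_mbs, physical_ram_mb):
    """True if the estimated heap demand fits in the machine's physical RAM."""
    return estimate_peak_heap_mb(daemon_heap_mb, container_mbs) <= physical_ram_mb


# Hypothetical scenario: 1 GB per daemon plus two 1 GB YARN containers.
peak = estimate_peak_heap_mb(1024, [1024, 1024])
print(f"{peak} MB (~{peak / 1024:.1f} GB)")  # 7168 MB (~7.0 GB)
```

The daemons alone already account for 5 GB in this sketch. That is why lowering the daemon heap (for example `HADOOP_HEAPSIZE` in `hadoop-env.sh`) matters on a small board.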
- Jazz
Code:
hadoop-hduser-datanode-arm1604.log:2016-09-23 16:00:24,854 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
hadoop-hduser-namenode-arm1604.log:2016-09-23 16:00:24,853 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
hadoop-hduser-secondarynamenode-arm1604.log:2016-09-23 16:00:24,853 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: RECEIVED SIGNAL 15: SIGTERM
yarn-hduser-nodemanager-arm1604.log:2016-09-23 16:00:24,853 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
yarn-hduser-nodemanager-arm1604.log:2016-09-23 16:00:25,002 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
yarn-hduser-resourcemanager-arm1604.log:2016-09-23 16:00:24,853 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
yarn-hduser-resourcemanager-arm1604.log:2016-09-23 16:00:24,873 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
yarn-hduser-resourcemanager-arm1604.log:2016-09-23 16:00:25,004 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
yarn-hduser-resourcemanager-arm1604.log:2016-09-23 16:00:25,042 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Returning, interrupted : java.lang.InterruptedException
yarn-hduser-resourcemanager-arm1604.log:2016-09-23 16:00:25,046 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
shiyeh wrote:
Hi all,
I am running hadoop-2.7.1 on an ARM (Cavium) platform.
OS: ubuntu 16.04
kernel: 4.4.0-generic
java ver: 1.8.0_101
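While running TestDFSIO, the services are shut down and exit before the test finishes.
For example: $ hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
Looking at the logs, the problem seems to be caused by Java?
All the logs are attached.
Thanks~~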