Taiwan Hadoop Forum

Taiwan Hadoop technical discussion board

All times shown are UTC + 8 hours

[ 6 posts ]
Subject: Hadoop 0.20.1 cluster installation failure (solved, thanks to jazz's guidance)
Posted: 2010-02-26, 22:50
Joined: 2009-11-25, 15:24 | Posts: 4

Hello, senior members:

I've recently been learning how to install Hadoop.

Everything runs fine on a single machine, but as soon as I try a multi-machine cluster it keeps failing.

I basically followed the tutorial at http://trac.nchc.org.tw/cloud/wiki/Hadoop_Lab7

but I always end up with "Datanodes available: 0 (0 total, 0 dead)".

The machines run Ubuntu 9.10 and Kubuntu 9.10, both with Hadoop 0.20.1.

I've excerpted part of the output from my runs below.

I'd be grateful if someone with experience could point me in the right direction. Please explain in a bit of detail; I'm a beginner and there is still a lot I don't understand. Thanks in advance.
=====================================================
marvin@cluster02:/opt/hadoop$ bin/hadoop namenode -format
10/02/26 22:18:02 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cluster02/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.1
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/ ... 0.20.1-rc1 -r 810220; compiled by 'oom' on Tue Sep 1 20:55:56 UTC 2009
************************************************************/
10/02/26 22:18:02 INFO namenode.FSNamesystem: fsOwner=marvin,marvin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
10/02/26 22:18:02 INFO namenode.FSNamesystem: supergroup=supergroup
10/02/26 22:18:02 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/02/26 22:18:02 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/02/26 22:18:02 INFO common.Storage: Storage directory /tmp/hadoop/hadoop-marvin/dfs/name has been successfully formatted.
10/02/26 22:18:02 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cluster02/127.0.0.1
************************************************************/
marvin@cluster02:/opt/hadoop$ bin/start-all.sh
starting namenode, logging to /tmp/hadoop/logs/hadoop-marvin-namenode-cluster02.out
cluster01: starting datanode, logging to /tmp/hadoop/logs/hadoop-marvin-datanode-cluster01.out
cluster02: starting secondarynamenode, logging to /tmp/hadoop/logs/hadoop-marvin-secondarynamenode-cluster02.out
starting jobtracker, logging to /tmp/hadoop/logs/hadoop-marvin-jobtracker-cluster02.out
cluster01: starting tasktracker, logging to /tmp/hadoop/logs/hadoop-marvin-tasktracker-cluster01.out

marvin@cluster02:/opt/hadoop$ jps
3971 SecondaryNameNode
3802 NameNode
4045 JobTracker
4157 Jps

marvin@cluster02:/opt/hadoop$ bin/hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)


Last edited by Marvin on 2010-03-01, 20:19; edited 1 time in total.

 
Subject: Re: Hadoop 0.20.1 cluster installation failure
Posted: 2010-02-27, 01:13
Joined: 2009-11-09, 19:52 | Posts: 2897

Marvin wrote:
marvin@cluster02:/opt/hadoop$ bin/hadoop namenode -format
marvin@cluster02:/opt/hadoop$ bin/start-all.sh
starting namenode, logging to /tmp/hadoop/logs/hadoop-marvin-namenode-cluster02.out
cluster01: starting datanode, logging to /tmp/hadoop/logs/hadoop-marvin-datanode-cluster01.out
cluster02: starting secondarynamenode, logging to /tmp/hadoop/logs/hadoop-marvin-secondarynamenode-cluster02.out
starting jobtracker, logging to /tmp/hadoop/logs/hadoop-marvin-jobtracker-cluster02.out
cluster01: starting tasktracker, logging to /tmp/hadoop/logs/hadoop-marvin-tasktracker-cluster01.out


From the output above, you are starting the NameNode, Secondary NameNode, and JobTracker on cluster02, while cluster01 runs the DataNode and TaskTracker.

Marvin wrote:
marvin@cluster02:/opt/hadoop$ jps
3971 SecondaryNameNode
3802 NameNode
4045 JobTracker
4157 Jps


Could you run jps on cluster01 as well, to check whether the DataNode and TaskTracker are running there?

Also, http://trac.nchc.org.tw/cloud/wiki/Hadoop_Lab7 currently documents the 0.18.3 setup; in 0.20.1 the two configuration files of 0.18.3 were split into three.

* Edit /opt/hadoop/conf/core-site.xml

Code:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://cluster02:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>


* Edit hdfs-site.xml

Code:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>


* Edit mapred-site.xml

Code:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>cluster02:9001</value>
  </property>
</configuration>


Please provide:

1. the jps output on cluster01 after start-all.sh

marvin@cluster01:/opt/hadoop$ jps

2. cluster01's /tmp/hadoop/logs/hadoop-marvin-datanode-cluster01.log

That will make it much easier to debug.

Note that if core-site.xml and mapred-site.xml point at cluster01, you must run start-all.sh on cluster01, and conf/slaves should then list cluster02; otherwise the configuration is exactly backwards. This is a mistake students run into frequently in our hands-on courses.
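As an illustration (a sketch using the hostnames from this thread, for the original layout where cluster02 is the master), conf/masters and conf/slaves on the master would then read:

```
# /opt/hadoop/conf/masters
cluster02

# /opt/hadoop/conf/slaves
cluster01
```

Note that in 0.20.x, conf/masters actually controls where the SecondaryNameNode starts; the NameNode and JobTracker run on whichever host you invoke start-all.sh from.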


 
Subject: Re: Hadoop 0.20.1 cluster installation failure
Posted: 2010-02-28, 10:53
Joined: 2009-11-25, 15:24 | Posts: 4
Thank you very much for the guidance, jazz.

I redid the setup following your instructions, changing the layout as follows:
cluster01 as namenode, datanode, jobtracker, and tasktracker;
cluster02 as datanode and tasktracker.

conf/core-site.xml, hdfs-site.xml, and mapred-site.xml were modified as you described, with cluster01 as the master host.

I changed conf/masters to cluster01, and conf/slaves to:
Code:
cluster01
cluster02

After finishing the configuration I ran start-all.sh on cluster01; the results are as follows:
Code:
marvin@cluster01:/opt/hadoop$ jps
10735 DataNode
10953 JobTracker
22028 Jps
10590 NameNode
11086 TaskTracker
10879 SecondaryNameNode

Code:
marvin@cluster01:/opt/hadoop$ bin/hadoop dfsadmin -report
Configured Capacity: 240433397760 (223.92 GB)
Present Capacity: 224568414208 (209.15 GB)
DFS Remaining: 224568377344 (209.15 GB)
DFS Used: 36864 (36 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 240433397760 (223.92 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 15864983552 (14.78 GB)
DFS Remaining: 224568377344(209.15 GB)
DFS Used%: 0%
DFS Remaining%: 93.4%
Last contact: Sun Feb 28 10:42:46 CST 2010

And on cluster02:
Code:
marvin@cluster02:~$ jps
9497 TaskTracker
10243 Jps
9378 DataNode

The contents of hadoop-marvin-datanode-cluster02.log are as follows:
Code:
2010-02-28 00:00:00,674 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 8 time(s).
2010-02-28 00:00:01,675 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 9 time(s).
2010-02-28 00:00:01,676 INFO org.apache.hadoop.ipc.RPC: Server at cluster01/120.107.172.112:9000 not available yet, Zzzzz...
2010-02-28 00:00:03,677 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 0 time(s).
2010-02-28 00:00:04,677 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 1 time(s).
2010-02-28 00:00:05,677 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 2 time(s).
2010-02-28 00:00:06,678 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 3 time(s).
2010-02-28 00:00:07,678 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 4 time(s).
2010-02-28 00:00:08,679 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 5 time(s).
2010-02-28 00:00:09,679 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 6 time(s).
2010-02-28 00:00:10,680 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 7 time(s).
2010-02-28 00:00:11,680 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 8 time(s).
2010-02-28 00:00:12,681 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 9 time(s).
2010-02-28 00:00:12,681 INFO org.apache.hadoop.ipc.RPC: Server at cluster01/120.107.172.112:9000 not available yet, Zzzzz...
2010-02-28 00:00:14,682 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 0 time(s).
2010-02-28 00:00:15,682 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 1 time(s).
2010-02-28 00:00:16,683 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 2 time(s).
2010-02-28 00:00:17,683 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 3 time(s).
2010-02-28 00:00:18,684 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 4 time(s).
2010-02-28 00:00:19,684 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 5 time(s).
2010-02-28 00:00:20,684 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 6 time(s).
2010-02-28 00:00:21,685 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 7 time(s).
2010-02-28 00:00:22,685 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 8 time(s).
2010-02-28 00:00:23,686 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 9 time(s).
2010-02-28 00:00:23,686 INFO org.apache.hadoop.ipc.RPC: Server at cluster01/120.107.172.112:9000 not available yet, Zzzzz...
2010-02-28 00:00:25,687 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 0 time(s).
2010-02-28 00:00:26,687 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 1 time(s).
2010-02-28 00:00:27,688 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 2 time(s).
2010-02-28 00:00:28,688 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 3 time(s).
2010-02-28 00:00:29,689 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 4 time(s).
2010-02-28 00:00:30,689 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 5 time(s).
2010-02-28 00:00:31,690 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 6 time(s).
2010-02-28 00:00:32,690 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 7 time(s).
2010-02-28 00:00:33,691 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 8 time(s).
2010-02-28 00:00:34,691 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 9 time(s).
2010-02-28 00:00:34,692 INFO org.apache.hadoop.ipc.RPC: Server at cluster01/120.107.172.112:9000 not available yet, Zzzzz...

(the same output repeats from here on)


Could you please take another look at where the problem is and how to fix it?
Thank you.


 
Subject: Re: Hadoop 0.20.1 cluster installation failure
Posted: 2010-03-01, 15:49
Joined: 2009-11-09, 19:52 | Posts: 2897

From the contents of hadoop-marvin-datanode-cluster02.log,
Code:
2010-02-28 00:00:00,674 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 8 time(s).
2010-02-28 00:00:01,675 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cluster01/120.107.172.112:9000. Already tried 9 time(s).
2010-02-28 00:00:01,676 INFO org.apache.hadoop.ipc.RPC: Server at cluster01/120.107.172.112:9000 not available yet, Zzzzz...


There are two reasonable suspects:

1. A firewall on the network is blocking traffic. Some Linux distributions enable a firewall by default, which would stop cluster02 from reaching 120.107.172.112:9000.
2. The IP address cluster01 is binding to differs from the one cluster02 is trying to connect to.
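A quick way to test the firewall suspicion from cluster02 is a plain TCP connect to the NameNode port. Here is a small sketch I'd use (plain Python, nothing Hadoop-specific; the host and port are the ones from this thread):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False

# From cluster02, check the NameNode address used in this thread:
#   port_open("120.107.172.112", 9000)
```

If this returns False from cluster02 while the NameNode is up on cluster01, a firewall or a wrong binding address is the likely culprit.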

On cluster01, please run
Code:
$ netstat -nap | grep 9000


You should see a line similar to
Code:
tcp6       0      0 120.107.172.112:9000    :::*                    LISTEN      10590/java


If instead you see
Code:
tcp6       0      0 127.0.0.1:9000    :::*                    LISTEN      10590/java


it means cluster01's conf/core-site.xml was mistakenly set to localhost:
Code:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>


The correct setup is for conf/core-site.xml on both cluster01 and cluster02 to be

Code:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://cluster01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>


and /etc/hosts on both cluster01 and cluster02 must contain

Code:
120.107.172.112 cluster01
120.107.172.XXX cluster02
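Once the files on the master are right, one way to keep both nodes identical is simply to copy the edited configuration over (an illustrative command using the paths and user from this thread; run on cluster01):

```
scp /opt/hadoop/conf/core-site.xml /opt/hadoop/conf/hdfs-site.xml \
    /opt/hadoop/conf/mapred-site.xml marvin@cluster02:/opt/hadoop/conf/
```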


Also, in the DataNode report I see from cluster01, the IP address doesn't look right:

Code:
marvin@cluster01:/opt/hadoop$ bin/hadoop dfsadmin -report

... (snip) ...

Name: 127.0.0.1:50010


In theory it should be

Code:
Name: 120.107.172.112:50010


 
Subject: Re: Hadoop 0.20.1 cluster installation failure
Posted: 2010-03-01, 19:53
Joined: 2009-11-25, 15:24 | Posts: 4
After following jazz's instructions,

I found the hosts file really was misconfigured :oops:

Right after fixing it, though, running bin/hadoop dfsadmin -report found nothing, which made me think for a moment that I had failed again.

But when I ran it again a little later, it reported success :D
Code:
marvin@cluster01:/opt/hadoop$ bin/hadoop dfsadmin -report                                       
Configured Capacity: 480509091840 (447.51 GB)                                                   
Present Capacity: 449187934208 (418.34 GB)                                                     
DFS Remaining: 449187872768 (418.34 GB)                                                         
DFS Used: 61440 (60 KB)                                                                         
DFS Used%: 0%                                                                                   
Under replicated blocks: 0                                                                     
Blocks with corrupt replicas: 0                                                                 
Missing blocks: 0                                                                               

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)         

Name: 120.107.172.113:50010
Decommission Status : Normal
Configured Capacity: 240075694080 (223.59 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 15473942528 (14.41 GB)
DFS Remaining: 224601726976(209.18 GB)
DFS Used%: 0%
DFS Remaining%: 93.55%
Last contact: Mon Mar 01 19:46:37 CST 2010


Name: 120.107.172.112:50010
Decommission Status : Normal
Configured Capacity: 240433397760 (223.92 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 15847215104 (14.76 GB)
DFS Remaining: 224586145792(209.16 GB)
DFS Used%: 0%
DFS Remaining%: 93.41%
Last contact: Mon Mar 01 19:46:35 CST 2010


Thank you so much for the teaching, jazz, and for not giving up on this newbie :D

Now I can finally try running the example programs 8-)


 
Subject: Re: Hadoop 0.20.1 cluster installation failure
Posted: 2010-03-02, 20:10
Joined: 2009-11-09, 19:52 | Posts: 2897

Marvin wrote:
Right after fixing it, though, running bin/hadoop dfsadmin -report found nothing, which made me think for a moment that I had failed again.
But when I ran it again a little later, it reported success :D


We've found that 0.20.1 takes an extra ~30 seconds when starting HDFS: if you run stop-all.sh and then start-all.sh, the http://namenode:50070 page will show that HDFS has entered Safe Mode, and it takes about 30 seconds before Safe Mode is lifted.
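If you'd rather not watch the web page, dfsadmin also has a safemode subcommand you can use to check or wait out this window (a sketch, run from the Hadoop home directory):

```
bin/hadoop dfsadmin -safemode get    # report whether safe mode is on
bin/hadoop dfsadmin -safemode wait   # block until safe mode is exited
```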

So be a little patient... and also get into the habit of reading the logs.

Run less /tmp/*.log and press capital F: less enters a waiting (follow) mode, and any new output is shown immediately on the last line.

Code:
Waiting for data... (interrupt to abort)


Press CTRL+C to leave follow mode and return to normal less.

- Jazz


 