Hadoop version: 2.2.0, 2-node cluster: hadoop01 (NN, DN), hadoop02 (DN)
I'm using Mahout on Hadoop to run the FP-Growth (fpg) algorithm against a 1.2 GB dataset. Watching how the MapReduce tasks are assigned across the nodes, I noticed a problem: in the first step, the counting algorithm, all of the map tasks end up on hadoop01, and in the second step, the FP-Growth algorithm, all of the map tasks end up on hadoop02. Why aren't the tasks being spread across both machines?
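For anyone trying to reproduce this, a typical Mahout fpg invocation in MapReduce mode looks roughly like the one below; the input/output paths, minimum support and pattern count are placeholders rather than my exact values, and the flags may differ between Mahout versions:
Code:
mahout fpg -i /user/hadoop/input/transactions.dat \
           -o /user/hadoop/output/fpg \
           -method mapreduce \
           -s 2 \
           -k 50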
I also set up a 3-node cluster: hadoop01 (NN, DN), hadoop02 (DN), hadoop03 (DN).
There, the first step (the counting algorithm) has its map tasks spread across hadoop01 and hadoop02,
while the second step (FP-Growth) is assigned to hadoop02 and hadoop03.
Is the distribution described above normal? If not, how should I tune things so that the tasks are spread evenly across the nodes? I hope someone can clear this up for me!
Below is the basic configuration from my mapred-site.xml and yarn-site.xml:
mapred-site.xml:
Code:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>Run MapReduce on YARN so jobs are submitted to the ResourceManager</description>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276M</value>
</property>
yarn-site.xml:
Code:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hadoop01:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hadoop01:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>hadoop01:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>hadoop01:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>hadoop01:8088</value>
</property>
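One thing that may be relevant: I have not set any per-node resource limits in yarn-site.xml, so as far as I know the NodeManager defaults apply (8192 MB of memory per node, if I remember correctly), and that is what decides how many 2048 MB map containers YARN can place on each machine at once. A minimal sketch of the properties I believe control this, with example values I have not tested:
Code:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- example value: total memory this NodeManager offers to YARN containers -->
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <!-- example value: smallest container the ResourceManager will allocate -->
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <!-- example value: largest container the ResourceManager will allocate -->
  <value>4096</value>
</property>
If these settings are not what governs the placement, I'd appreciate pointers on how the scheduler decides which node gets each container.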