By the design of MapReduce, once the Mapper finishes, its output goes through a sort-by-key and shuffle phase before it reaches the Reducer.
map(K1, V1) -> (K2, V2) -> sort by key and shuffle -> reduce(K2, list(V2)) -> (K3, V3)
(the,1), (fox,1), (faster,1), (than,1), (the,1), (dog,1)
-> sort by key and shuffle ->
(dog,{1}), (faster,{1}), (fox,{1}), (than,{1}), (the,{1,1})
----------------
("Square","Red"), ("Circle","Yellow"),("Square","Yellow"),("Trangle","Red"),("square","Green")
-> sort by key and shuffle ->
("Circle",{"Yellow"}),("Square",{"Red","Yellow"}),("Trangle",{"Red"}),("square",{"Green"})
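If you would rather check the grouping yourself than trust my hand-written lists, here is a minimal sketch in plain Java (no Hadoop needed). It only imitates the sort-and-shuffle step: the class name ShuffleSketch and the shuffle() helper are made up for illustration, and a TreeMap with ordinary String ordering stands in for Hadoop's byte-wise Text comparison, which is an assumption that holds for ASCII keys like these. Keys are case-sensitive, so "Square" and "square" stay separate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ShuffleSketch {
    // Hypothetical helper: groups (key, value) pairs by key and keeps the
    // keys sorted, imitating what the framework hands to reduce().
    // Each entry of the result corresponds to one reduce(key, values) call.
    static SortedMap<String, List<String>> shuffle(String[][] pairs) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : pairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Map output from question 1.
        String[][] words = {
            {"the", "1"}, {"fox", "1"}, {"faster", "1"},
            {"than", "1"}, {"the", "1"}, {"dog", "1"}
        };
        // Map output from question 2 (spelling kept as in the question).
        String[][] shapes = {
            {"Square", "Red"}, {"Circle", "Yellow"}, {"Square", "Yellow"},
            {"Trangle", "Red"}, {"square", "Green"}
        };
        for (String[][] mapOutput : new String[][][] {words, shapes}) {
            SortedMap<String, List<String>> grouped = shuffle(mapOutput);
            grouped.forEach((k, v) ->
                System.out.println("reduce(" + k + ", " + v + ")"));
            System.out.println("distinct keys / reduce() calls: " + grouped.size());
        }
    }
}
```

Each printed reduce(...) line corresponds to one invocation of the reduce method, so counting those lines answers both questions.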
I think the answer is clear enough, so I won't spell it out~
- Jazz
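jerryc9855 wrote:
Hello,
I have been preparing for the CCD-410 exam and have two questions. I did some research online, but the answers I found disagree: some people say a given key is passed to the Reducer only once, while others say every pair is passed to reduce. Personally I think they are all passed, but I am still not sure. Could you please take a look? Many thanks!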
1. You have the following key-value pairs as output from your Map task:
(the,1)
(fox,1)
(faster,1)
(than,1)
(the,1)
(dog,1)
How many keys will be passed to the Reducer's reduce method?
2. You have written a Mapper which makes the following calls to OutputCollector.collect():
output.collect(new Text("Square"), new Text("Red"));
output.collect(new Text("Circle"), new Text("Yellow"));
output.collect(new Text("Square"), new Text("Yellow"));
output.collect(new Text("Trangle"), new Text("Red"));
output.collect(new Text("square"), new Text("Green"));
How many times is the reduce method going to be called?