Taiwan Hadoop Forum
http://forum.hadoop.tw/

Running K-means with Mahout
http://forum.hadoop.tw/viewtopic.php?f=7&t=38317
Page 1 of 1

Author:  seasky [ 2016-04-20, 12:16 ]
Post subject:  Running K-means with Mahout

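Hello everyone,

When running Mahout k-means, is it possible to turn off the final "dump out clusters" step?
My data sets have between 25M and 125M points. I can dump 10,000 points, but the dump fails on the large data sets. Is there a way to disable it?
Thanks.

Below are the last lines of output, which show an OutOfMemoryError.
Code: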
INFO kmeans.Job: Dumping out clusters from clusters: Mahout25M/clusters-*-final and clusteredPoints: Mahout25M/clusteredPoints
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
   at org.apache.mahout.math.map.OpenIntDoubleHashMap.setUp(OpenIntDoubleHashMap.java:553)
   at org.apache.mahout.math.map.OpenIntDoubleHashMap.<init>(OpenIntDoubleHashMap.java:89)
   at org.apache.mahout.math.map.OpenIntDoubleHashMap.<init>(OpenIntDoubleHashMap.java:75)
   at org.apache.mahout.math.RandomAccessSparseVector.<init>(RandomAccessSparseVector.java:47)
   at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:109)
   at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:89)
   at org.apache.mahout.clustering.classify.WeightedVectorWritable.readFields(WeightedVectorWritable.java:56)
   at org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable.readFields(WeightedPropertyVectorWritable.java:56)
   at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2181)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2309)
   at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:101)
   at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:40)
   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
   at com.google.common.collect.Iterators$5.hasNext(Iterators.java:543)
   at com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:43)
   at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:311)
   at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:262)
   at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:92)
   at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:141)
   at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:95)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:54)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
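
A rough back-of-the-envelope check can show why 10,000 points dump fine while 25M do not. The sketch below assumes the dump step holds every clustered point in memory at once (the trace points that way, since the error is raised inside ClusterDumper.readPoints, but the thread does not confirm it) and assumes 60 dimensions per point, a figure that is not given in the post. `DumpHeapEstimate` and `rawBytes` are hypothetical names used only for this illustration.

```java
class DumpHeapEstimate {

    /**
     * Lower bound in bytes for holding all points at once:
     * one 8-byte double per dimension per point, ignoring all object overhead.
     */
    static long rawBytes(long points, int dims) {
        return Math.multiplyExact(Math.multiplyExact(points, (long) dims), 8L);
    }

    public static void main(String[] args) {
        int dims = 60; // ASSUMPTION: the dimensionality is not stated in the thread
        long[] sizes = {10_000L, 25_000_000L, 125_000_000L}; // point counts from the post
        for (long n : sizes) {
            double megabytes = rawBytes(n, dims) / 1e6;
            System.out.println(n + " points -> at least " + megabytes + " MB of raw doubles");
        }
    }
}
```

Under those assumptions, 25M points already need about 12 GB for the raw values alone, before any per-vector overhead, so a default-sized heap would not be expected to hold them.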

Author:  jazz [ 2016-04-21, 23:09 ]
Post subject:  Re: Running K-means with Mahout

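seasky wrote:
Hello everyone,
When running Mahout k-means, is it possible to turn off the final "dump out clusters" step?
My data sets have between 25M and 125M points. I can dump 10,000 points, but the dump fails on the large data sets. Is there a way to disable it?
Thanks.
Below are the last lines of output, which show an OutOfMemoryError.
Code: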
INFO kmeans.Job: Dumping out clusters from clusters: Mahout25M/clusters-*-final and clusteredPoints: Mahout25M/clusteredPoints
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space


Are you running on a single machine? The error message says the JVM ran out of heap memory.

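As for turning off the "dump out clusters" step:
I have no experience with that, so all I can suggest is checking a Mahout book for a parameter that controls it.
Failing that, you will have to dig into the source code.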

If you are running on multiple machines, you could increase the number of reducers or raise the HEAPSIZE parameter.

- Jazz
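
For anyone who does go to the source: the trace shows the error in the "main" thread, inside ClusterDumper.readPoints, called from the example class org.apache.mahout.clustering.syntheticcontrol.kmeans.Job. If that method keeps every point it reads in a per-cluster collection (an assumption here, not something this thread confirms), memory grows with the number of points, whereas keeping only a capped sample per cluster bounds it. The following is a minimal JDK-only sketch of that idea; `BoundedPointSample` and its methods are hypothetical names and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BoundedPointSample {

    private final int maxPerCluster;
    // Points actually retained, at most maxPerCluster per cluster id.
    private final Map<Integer, List<double[]>> kept = new TreeMap<>();
    // Number of points seen per cluster id, retained or not.
    private final Map<Integer, Long> seen = new TreeMap<>();

    public BoundedPointSample(int maxPerCluster) {
        this.maxPerCluster = maxPerCluster;
    }

    /** Counts the point, and stores it only while the cluster's bucket is below the cap. */
    public void add(int clusterId, double[] vector) {
        seen.merge(clusterId, 1L, Long::sum);
        List<double[]> bucket = kept.computeIfAbsent(clusterId, k -> new ArrayList<>());
        if (bucket.size() < maxPerCluster) {
            bucket.add(vector);
        }
    }

    public int keptCount() {
        return kept.values().stream().mapToInt(List::size).sum();
    }

    public long seenCount() {
        return seen.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // Stand-in for streaming the clustered points: 1,000,000 synthetic points in 6 clusters.
        BoundedPointSample sample = new BoundedPointSample(100);
        for (int i = 0; i < 1_000_000; i++) {
            sample.add(i % 6, new double[] {i, i + 1.0});
        }
        System.out.println("seen=" + sample.seenCount() + " kept=" + sample.keptCount());
    }
}
```

With a cap of 100 per cluster, only 600 vectors stay reachable however many points stream past, so the retained memory depends on the number of clusters rather than on the size of the input.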

All times are UTC + 8 hours