Taiwan Hadoop Forum http://forum.hadoop.tw/
Running Kmeans with Mahout http://forum.hadoop.tw/viewtopic.php?f=7&t=38317
Author: seasky [ 2016-04-20, 12:16 ]
Subject: Running Kmeans with Mahout
Hi all,

When running Mahout k-means, is it possible to turn off the final "dump out clusters" step? My data sets have 25M to 125M points. I can dump 10,000 points, but dumping the large data sets fails. Is there a way to disable the dump-out step? Thanks.

Below is the last part of the output, showing an OutOfMemoryError:

Code:
INFO kmeans.Job: Dumping out clusters from clusters: Mahout25M/clusters-*-final and clusteredPoints: Mahout25M/clusteredPoints
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.mahout.math.map.OpenIntDoubleHashMap.setUp(OpenIntDoubleHashMap.java:553)
    at org.apache.mahout.math.map.OpenIntDoubleHashMap.<init>(OpenIntDoubleHashMap.java:89)
    at org.apache.mahout.math.map.OpenIntDoubleHashMap.<init>(OpenIntDoubleHashMap.java:75)
    at org.apache.mahout.math.RandomAccessSparseVector.<init>(RandomAccessSparseVector.java:47)
    at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:109)
    at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:89)
    at org.apache.mahout.clustering.classify.WeightedVectorWritable.readFields(WeightedVectorWritable.java:56)
    at org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable.readFields(WeightedPropertyVectorWritable.java:56)
    at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2181)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2309)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:101)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:40)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.google.common.collect.Iterators$5.hasNext(Iterators.java:543)
    at com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:43)
    at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:311)
    at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:262)
    at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:92)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:141)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:95)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
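For reference, the dump is not part of the k-means computation itself: the synthetic-control example Job constructs a ClusterDumper at the end of its run() (Job.java:141 in the trace above), so skipping the dump means not going through the example Job. A minimal sketch of calling the driver directly, assuming the Mahout 0.9 KMeansDriver/RandomSeedGenerator signatures, hypothetical HDFS paths and parameter values, and input that has already been converted to SequenceFile vectors (the example Job does that conversion with InputDriver):

Code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.clustering.kmeans.RandomSeedGenerator;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

public class KMeansWithoutDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Hypothetical paths; "data" must already hold SequenceFile<Writable, VectorWritable> vectors.
    Path input    = new Path("Mahout25M/data");
    Path seedsDir = new Path("Mahout25M/random-seeds");
    Path output   = new Path("Mahout25M/output");

    // Pick k random points as the initial cluster centers (k = 80 is a placeholder).
    Path seeds = RandomSeedGenerator.buildRandom(conf, input, seedsDir, 80,
        new EuclideanDistanceMeasure());

    // Same call the example Job makes, minus the ClusterDumper afterwards:
    // convergenceDelta 0.5, at most 10 iterations, runClustering = true writes
    // clusteredPoints, classification threshold 0.0, runSequential = false (MapReduce).
    KMeansDriver.run(conf, input, seeds, output, 0.5, 10, true, 0.0, false);
  }
}

The clusters-*-final and clusteredPoints directories are still written to HDFS; only the client-side text dump that ran out of memory is skipped.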
Author: jazz [ 2016-04-21, 23:09 ]
Subject: Re: Running Kmeans with Mahout
seasky wrote:
Hi all, when running Mahout k-means, is it possible to turn off the final "dump out clusters" step? My data sets have 25M to 125M points. I can dump 10,000 points, but dumping the large data sets fails. Is there a way to disable the dump-out step? Thanks. Below is the last part of the output, showing an OutOfMemoryError:
Code:
INFO kmeans.Job: Dumping out clusters from clusters: Mahout25M/clusters-*-final and clusteredPoints: Mahout25M/clusteredPoints
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Are you running this on a single machine? The error message says you ran out of memory. As for turning off the dump-out-clusters step, I have no experience with that; you could check whether a Mahout book mentions a parameter that controls it, otherwise you would have to dig into the source code. If you are running on multiple machines, you can increase the number of reducers or raise the HEAPSIZE parameter.

- Jazz
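To make the two knobs above concrete, here is a minimal sketch. The property names are standard Hadoop 2.x keys rather than anything Mahout-specific, and whether every job Mahout launches honors them is an assumption worth verifying:

Code:
import org.apache.hadoop.conf.Configuration;

public class TunedConf {
  // Sketch only: standard Hadoop 2.x settings for reducer count and task heap,
  // e.g. to pass into KMeansDriver.run(conf, ...) or to set in mapred-site.xml.
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.setInt("mapreduce.job.reduces", 8);              // more reduce tasks
    conf.set("mapreduce.map.java.opts", "-Xmx2048m");     // larger map task heap
    conf.set("mapreduce.reduce.java.opts", "-Xmx2048m");  // larger reduce task heap
    return conf;
  }
}

Note that the OutOfMemoryError in the trace is thrown in the client JVM while ClusterDumper reads clusteredPoints back, so for that final step it is the client-side heap that has to grow (for example via MAHOUT_HEAPSIZE for the mahout script, or HADOOP_CLIENT_OPTS when launching with hadoop jar), not the task heaps.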
Page 1 of 1 | All times are UTC + 8 hours
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/