Taiwan Hadoop Forum

Taiwan Hadoop technical discussion board

All times are UTC + 8 hours




[ 5 posts ]
 Post subject: A question about DistributedCache
Posted: 2014-02-07, 11:56
Offline

Joined: 2013-10-20, 16:48
Posts: 11
According to the book, DistributedCache makes it easy to put a file on HDFS and have every node retrieve it, but I have now hit a problem. I suspect the paths used to "add" and to "retrieve" the file are wrong, which leads to a NullPointerException. Could someone take a look at how the "add" and "retrieve" code should be changed? Thanks. (PS: the MP algorithm is not important; no need to look at what it computes.)

Hadoop 0.20.2, pseudo-distributed environment.

The file placed in the cache has the following format:
1:1,1
2:1,2
3:1,3
.....

The in, out, and DistributedCache file paths are all of the form hdfs://localhost:9000/user/root/in/1/Matrix.

The code:
package hdp_t1;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Hashtable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Matrix {

    public static class MatrixMapper extends Mapper<LongWritable, Text, IntWritable, Text> {

        // Input lines look like "1:1,1" -- a row index, then the row values.
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] lines = value.toString().split(":", 2);
            context.write(new IntWritable(Integer.parseInt(lines[0])), new Text(lines[1]));
        }
    }

    public static class MatrixReducer extends Reducer<IntWritable, Text, Text, IntWritable> {

        // Rows of the cached matrix, keyed by row index.
        private Hashtable<IntWritable, String> localMatrix = new Hashtable<IntWritable, String>();

        public void setup(Context context) {
            try {
                Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
                if (cacheFiles == null || cacheFiles.length == 0) {
                    // Nothing was distributed -- fail loudly here instead of
                    // with a NullPointerException later in reduce().
                    throw new IOException("No files found in the DistributedCache");
                }
                BufferedReader br = new BufferedReader(new FileReader(cacheFiles[0].toString()));
                try {
                    String line;
                    while ((line = br.readLine()) != null) {
                        String[] tokens = line.split(":", 2);
                        localMatrix.put(new IntWritable(Integer.parseInt(tokens[0])), tokens[1]);
                    }
                } finally {
                    br.close();
                }
            } catch (IOException e) {
                throw new RuntimeException("Failed to read DistributedCache file", e);
            }
        }

        public void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String line2 = localMatrix.get(key);
            if (line2 == null) {
                // No matching row in the cached matrix -- skip this key
                // instead of dereferencing null below.
                return;
            }
            String[] lines2 = line2.split(",");
            int sum = 0;
            for (Text val : values) {
                String[] lines1 = val.toString().split(",");
                for (int i = 0; i < lines1.length; i++) {
                    sum += Integer.parseInt(lines1[i]) * Integer.parseInt(lines2[i]);
                }
            }
            context.write(new Text(key.toString()), new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        if (args.length != 3) {
            System.err.println("Usage: Matrix <cache file> <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Matrix");
        job.setJarByClass(Matrix.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        // Register the cache file on the Job's own Configuration (via
        // job.getConfiguration()) so the setting is not lost when the Job
        // copies the original conf.
        DistributedCache.addCacheFile(new Path(args[0]).toUri(), job.getConfiguration());

        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
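The NullPointerException almost certainly comes from `localMatrix.get(key)` returning null in `reduce()` when `setup()` found no cache files. The row-lookup and dot-product logic can be checked in plain Java, independent of Hadoop. This is a minimal sketch; `dotProduct` and the sample rows below are illustrative helpers, not part of the original code:

```java
import java.util.Hashtable;

public class RowDotProduct {

    // Mirrors the reducer: look up the cached row for this key and, if it
    // is missing, return 0 instead of dereferencing null.
    static int dotProduct(Hashtable<Integer, String> localMatrix, int key, String row) {
        String cached = localMatrix.get(key);
        if (cached == null) {
            return 0; // guard against the NullPointerException
        }
        String[] a = row.split(",");
        String[] b = cached.split(",");
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Integer.parseInt(a[i]) * Integer.parseInt(b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Rows in the "key:v1,v2" file format, already split on ':'.
        Hashtable<Integer, String> localMatrix = new Hashtable<Integer, String>();
        localMatrix.put(2, "1,2");
        System.out.println(dotProduct(localMatrix, 2, "1,2")); // 1*1 + 2*2 = 5
        System.out.println(dotProduct(localMatrix, 9, "1,2")); // missing row -> 0
    }
}
```

If a missing row should be treated as an error rather than silently skipped, the guard can throw instead of returning 0.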


 Post subject: Re: A question about DistributedCache
Posted: 2014-02-08, 17:30
Offline

Joined: 2009-11-09, 19:52
Posts: 2897
May I ask how you ran it? I used the same code and had no problem running it.

- Jazz

Code:
jazz@vmm:~/my_code$ cat in
1:1,1
2:1,2
3:1,3

jazz@vmm:~/my_code$ hadoop fs -put in in

jazz@vmm:~/my_code$ hadoop fs -ls
Found 2 items
-rw-r--r--   1 jazz supergroup         18 2014-02-08 17:24 /user/jazz/in
drwxr-xr-x   - jazz supergroup          0 2014-02-08 17:24 /user/jazz/tmp

jazz@vmm:~/my_code$ jar vtf WordCount.jar
     0 Sat Feb 08 16:48:08 CST 2014 META-INF/
   106 Sat Feb 08 16:48:06 CST 2014 META-INF/MANIFEST.MF
  2239 Sat Feb 08 16:48:04 CST 2014 Matrix$MatrixMapper.class
  4503 Sat Feb 08 16:48:04 CST 2014 Matrix$MatrixReducer.class
  2089 Sat Feb 08 16:48:04 CST 2014 Matrix.class

jazz@vmm:~/my_code$ hadoop jar WordCount.jar Matrix /user/jazz/in in out
14/02/08 17:24:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/08 17:24:53 INFO input.FileInputFormat: Total input paths to process : 1
14/02/08 17:24:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/08 17:24:53 WARN snappy.LoadSnappy: Snappy native library not loaded
14/02/08 17:24:54 INFO mapred.JobClient: Running job: job_201402081644_0001
14/02/08 17:24:55 INFO mapred.JobClient:  map 0% reduce 0%
14/02/08 17:25:09 INFO mapred.JobClient:  map 100% reduce 0%
14/02/08 17:25:21 INFO mapred.JobClient:  map 100% reduce 100%
14/02/08 17:25:26 INFO mapred.JobClient: Job complete: job_201402081644_0001
14/02/08 17:25:26 INFO mapred.JobClient: Counters: 29
14/02/08 17:25:26 INFO mapred.JobClient:   Job Counters
14/02/08 17:25:26 INFO mapred.JobClient:     Launched reduce tasks=1
14/02/08 17:25:26 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=11813
14/02/08 17:25:26 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/08 17:25:26 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/08 17:25:26 INFO mapred.JobClient:     Launched map tasks=1
14/02/08 17:25:26 INFO mapred.JobClient:     Data-local map tasks=1
14/02/08 17:25:26 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10004
14/02/08 17:25:26 INFO mapred.JobClient:   File Output Format Counters
14/02/08 17:25:26 INFO mapred.JobClient:     Bytes Written=13
14/02/08 17:25:26 INFO mapred.JobClient:   FileSystemCounters
14/02/08 17:25:26 INFO mapred.JobClient:     FILE_BYTES_READ=36
14/02/08 17:25:26 INFO mapred.JobClient:     HDFS_BYTES_READ=117
14/02/08 17:25:26 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44527
14/02/08 17:25:26 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=13
14/02/08 17:25:26 INFO mapred.JobClient:   File Input Format Counters
14/02/08 17:25:26 INFO mapred.JobClient:     Bytes Read=18
14/02/08 17:25:26 INFO mapred.JobClient:   Map-Reduce Framework
14/02/08 17:25:26 INFO mapred.JobClient:     Map output materialized bytes=36
14/02/08 17:25:26 INFO mapred.JobClient:     Map input records=3
14/02/08 17:25:26 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/02/08 17:25:26 INFO mapred.JobClient:     Spilled Records=6
14/02/08 17:25:26 INFO mapred.JobClient:     Map output bytes=24
14/02/08 17:25:26 INFO mapred.JobClient:     CPU time spent (ms)=2820
14/02/08 17:25:26 INFO mapred.JobClient:     Total committed heap usage (bytes)=401997824
14/02/08 17:25:26 INFO mapred.JobClient:     Combine input records=0
14/02/08 17:25:26 INFO mapred.JobClient:     SPLIT_RAW_BYTES=99
14/02/08 17:25:26 INFO mapred.JobClient:     Reduce input records=3
14/02/08 17:25:26 INFO mapred.JobClient:     Reduce input groups=3
14/02/08 17:25:26 INFO mapred.JobClient:     Combine output records=0
14/02/08 17:25:26 INFO mapred.JobClient:     Physical memory (bytes) snapshot=354582528
14/02/08 17:25:26 INFO mapred.JobClient:     Reduce output records=3
14/02/08 17:25:26 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1147809792
14/02/08 17:25:26 INFO mapred.JobClient:     Map output records=3


 Post subject: Re: A question about DistributedCache
Posted: 2014-02-09, 14:39
Offline

Joined: 2013-10-20, 16:48
Posts: 11
jazz wrote:
May I ask how you ran it? I used the same code and had no problem running it.

- Jazz

代碼:
jazz@vmm:~/my_code$ cat in
1:1,1
2:1,2
3:1,3

jazz@vmm:~/my_code$ hadoop fs -put in in

jazz@vmm:~/my_code$ hadoop fs -ls
Found 2 items
-rw-r--r--   1 jazz supergroup         18 2014-02-08 17:24 /user/jazz/in
drwxr-xr-x   - jazz supergroup          0 2014-02-08 17:24 /user/jazz/tmp

jazz@vmm:~/my_code$ jar vtf WordCount.jar
     0 Sat Feb 08 16:48:08 CST 2014 META-INF/
   106 Sat Feb 08 16:48:06 CST 2014 META-INF/MANIFEST.MF
  2239 Sat Feb 08 16:48:04 CST 2014 Matrix$MatrixMapper.class
  4503 Sat Feb 08 16:48:04 CST 2014 Matrix$MatrixReducer.class
  2089 Sat Feb 08 16:48:04 CST 2014 Matrix.class

jazz@vmm:~/my_code$ hadoop jar WordCount.jar Matrix /user/jazz/in in out
14/02/08 17:24:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/08 17:24:53 INFO input.FileInputFormat: Total input paths to process : 1
14/02/08 17:24:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/08 17:24:53 WARN snappy.LoadSnappy: Snappy native library not loaded
14/02/08 17:24:54 INFO mapred.JobClient: Running job: job_201402081644_0001
14/02/08 17:24:55 INFO mapred.JobClient:  map 0% reduce 0%
14/02/08 17:25:09 INFO mapred.JobClient:  map 100% reduce 0%
14/02/08 17:25:21 INFO mapred.JobClient:  map 100% reduce 100%
14/02/08 17:25:26 INFO mapred.JobClient: Job complete: job_201402081644_0001
14/02/08 17:25:26 INFO mapred.JobClient: Counters: 29
14/02/08 17:25:26 INFO mapred.JobClient:   Job Counters
14/02/08 17:25:26 INFO mapred.JobClient:     Launched reduce tasks=1
14/02/08 17:25:26 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=11813
14/02/08 17:25:26 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/08 17:25:26 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/08 17:25:26 INFO mapred.JobClient:     Launched map tasks=1
14/02/08 17:25:26 INFO mapred.JobClient:     Data-local map tasks=1
14/02/08 17:25:26 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10004
14/02/08 17:25:26 INFO mapred.JobClient:   File Output Format Counters
14/02/08 17:25:26 INFO mapred.JobClient:     Bytes Written=13
14/02/08 17:25:26 INFO mapred.JobClient:   FileSystemCounters
14/02/08 17:25:26 INFO mapred.JobClient:     FILE_BYTES_READ=36
14/02/08 17:25:26 INFO mapred.JobClient:     HDFS_BYTES_READ=117
14/02/08 17:25:26 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44527
14/02/08 17:25:26 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=13
14/02/08 17:25:26 INFO mapred.JobClient:   File Input Format Counters
14/02/08 17:25:26 INFO mapred.JobClient:     Bytes Read=18
14/02/08 17:25:26 INFO mapred.JobClient:   Map-Reduce Framework
14/02/08 17:25:26 INFO mapred.JobClient:     Map output materialized bytes=36
14/02/08 17:25:26 INFO mapred.JobClient:     Map input records=3
14/02/08 17:25:26 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/02/08 17:25:26 INFO mapred.JobClient:     Spilled Records=6
14/02/08 17:25:26 INFO mapred.JobClient:     Map output bytes=24
14/02/08 17:25:26 INFO mapred.JobClient:     CPU time spent (ms)=2820
14/02/08 17:25:26 INFO mapred.JobClient:     Total committed heap usage (bytes)=401997824
14/02/08 17:25:26 INFO mapred.JobClient:     Combine input records=0
14/02/08 17:25:26 INFO mapred.JobClient:     SPLIT_RAW_BYTES=99
14/02/08 17:25:26 INFO mapred.JobClient:     Reduce input records=3
14/02/08 17:25:26 INFO mapred.JobClient:     Reduce input groups=3
14/02/08 17:25:26 INFO mapred.JobClient:     Combine output records=0
14/02/08 17:25:26 INFO mapred.JobClient:     Physical memory (bytes) snapshot=354582528
14/02/08 17:25:26 INFO mapred.JobClient:     Reduce output records=3
14/02/08 17:25:26 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1147809792
14/02/08 17:25:26 INFO mapred.JobClient:     Map output records=3


Hi, when I package it as a jar and run it, there is no problem either. Previously I was running it from Eclipse (with the MapReduce plugin installed) against the pseudo-distributed environment, and that is where the NullPointerException appeared.
The three args passed to main are:
hdfs://forhadoop:9000/user/root/in/1/Matrix
hdfs://forhadoop:9000/user/root/in/1
hdfs://forhadoop:9000/user/root/out/1
I am on Ubuntu 12.04; /etc/networks and /etc/hosts both have an entry for forhadoop.


 Post subject: Re: A question about DistributedCache
Posted: 2014-02-09, 14:52
Offline

Joined: 2013-10-20, 16:48
Posts: 11
If I add the following settings to conf:
Code:
        Configuration conf = new Configuration();

        conf.set("mapred.job.tracker", "hdfs://forhadoop:9001");
        conf.set("fs.default.name", "hdfs://forhadoop:9000");


then after packaging the jar, ./hadoop jar runs successfully and I can see the results. But running from Eclipse still fails, and now the error has changed!
I guess it can read the file now, but it cannot find the mapper?
The error is:
Code:
14/02/09 14:34:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/09 14:34:45 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/02/09 14:34:45 INFO input.FileInputFormat: Total input paths to process : 1
14/02/09 14:34:45 INFO mapred.JobClient: Running job: job_201402082237_0013
14/02/09 14:34:46 INFO mapred.JobClient:  map 0% reduce 0%
14/02/09 14:34:56 INFO mapred.JobClient: Task Id : attempt_201402082237_0013_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: hdp_t1.Matrix_old$MatrixMapper
   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
   at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: hdp_t1.Matrix_old$MatrixMapper
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
   ... 4 more

14/02/09 14:35:02 INFO mapred.JobClient: Task Id : attempt_201402082237_0013_m_000000_1, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: hdp_t1.Matrix_old$MatrixMapper
   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
   at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: hdp_t1.Matrix_old$MatrixMapper
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
   ... 4 more

14/02/09 14:35:08 INFO mapred.JobClient: Task Id : attempt_201402082237_0013_m_000000_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: hdp_t1.Matrix_old$MatrixMapper
   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
   at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: hdp_t1.Matrix_old$MatrixMapper
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
   ... 4 more

14/02/09 14:35:17 INFO mapred.JobClient: Job complete: job_201402082237_0013
14/02/09 14:35:17 INFO mapred.JobClient: Counters: 3
14/02/09 14:35:17 INFO mapred.JobClient:   Job Counters
14/02/09 14:35:17 INFO mapred.JobClient:     Launched map tasks=4
14/02/09 14:35:17 INFO mapred.JobClient:     Data-local map tasks=4
14/02/09 14:35:17 INFO mapred.JobClient:     Failed map tasks=1


Why is that?


 Post subject: Re: A question about DistributedCache
Posted: 2014-02-18, 12:44
Offline

Joined: 2009-11-09, 19:52
Posts: 2897
java.lang.ClassNotFoundException: hdp_t1.Matrix_old$MatrixMapper

This usually means a CLASSPATH problem. In an Eclipse project, you need to check whether your CLASSPATH includes the required class and jar files.

- Jazz
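The "No job jar file set" warning in the log above also suggests that the Eclipse submission never ships a jar to the TaskTrackers, which would explain the ClassNotFoundException. One commonly used workaround is to export the project to a jar first and point the job at it via the mapred.jar property (the property JobConf#setJar writes). This is only a sketch; the jar path below is a hypothetical example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitFromEclipse {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "forhadoop:9001");
        conf.set("fs.default.name", "hdfs://forhadoop:9000");
        // Ship an already-exported jar with the job so the TaskTrackers can
        // load hdp_t1.Matrix$MatrixMapper (the path is a hypothetical example).
        conf.set("mapred.jar", "/home/user/workspace/Matrix.jar");
        Job job = new Job(conf, "Matrix");
        // ... set mapper/reducer/input/output as in the original code ...
    }
}
```

With setJarByClass alone, the jar is only found when the class was itself loaded from a jar, which is not the case when classes come from Eclipse's bin/ directory.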


回頂端
 個人資料 E-mail  
 