Taiwan Hadoop Forum http://forum.hadoop.tw/
Help: error when appending an HDFS file to an existing HBase table in map() http://forum.hadoop.tw/viewtopic.php?f=7&t=38315
Page 1 of 1
Posted by: yantaiGao [ 2016-04-16, 11:10 ]
Post subject: Help: error when appending an HDFS file to an existing HBase table in map()
In map() I need to append an HDFS file to an HBase table that already contains data; the goal is to update the data in HBase based on the data in HDFS. When the HBase table is empty everything works, but when the table already holds data and I try to update it, the job fails right at startup with: Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
Posted by: jazz [ 2016-04-17, 16:08 ]
Post subject: Re: Help: error when appending an HDFS file to an existing HBase table in map()
yantaiGao wrote: In map() I need to append an HDFS file to an HBase table that already contains data; the goal is to update the data in HBase based on the data in HDFS. When the HBase table is empty everything works, but when the table already holds data and I try to update it, the job fails right at startup with: Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.

The error message only says that no output directory was set, so the cause is hard to guess from that alone. Also, if you want map() to append an HDFS file into an HBase table, is that file in HFile format? Going through the HBase API feels like the more standard approach.

- Jazz
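For reference, a minimal sketch of the plain HBase client API approach Jazz mentions, assuming the HBase 1.x client and the table/column names that appear in the code later in this thread; the row key and the appended value are hypothetical placeholders, not part of the original post.

Code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseAppendSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "master,salve1");        // values used later in the thread
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("reachableTable"))) {

            byte[] row  = Bytes.toBytes("someRowKey");              // hypothetical row key
            byte[] fam  = Bytes.toBytes("idFam");
            byte[] qual = Bytes.toBytes("idsList");

            // Read the current cell, append the new id list, and write the merged value back.
            Result existing = table.get(new Get(row));
            byte[] old = existing.getValue(fam, qual);
            String merged = (old == null) ? "newIdList"
                                          : Bytes.toString(old) + "\t" + "newIdList";

            Put p = new Put(row);
            p.addColumn(fam, qual, Bytes.toBytes(merged));
            table.put(p);
        }
    }
}

Reading the old value, appending, and rewriting the whole cell mirrors the logic of the mapper posted below; HBase also offers a server-side Append operation if the stored value only ever grows.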
Posted by: yantaiGao [ 2016-04-21, 15:52 ]
Post subject: Re: Help: error when appending an HDFS file to an existing HBase table in map()
Why does

Code:
job.setOutputFormatClass(TableOutputFormat.class);

fail to compile with the following error?

Code:
The method setOutputFormatClass(Class<? extends OutputFormat>) in the type Job is not applicable for the arguments (Class<TableOutputFormat>)

If I comment that line out and run the job, I get: Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.

Code (imports omitted):
public class HdfsAppend2HbaseUtil extends Configured implements Tool {

    public static class HdfsAdd2HbaseMapper extends Mapper<Text, Text, ImmutableBytesWritable, Put> {
        public void map(Text ikey, Text ivalue, Context context) throws IOException, InterruptedException {
            // Append the new id list to the one already stored in HBase
            String oldIdList = HBaseHelper.getValueByKey(ikey.toString());
            StringBuffer sb = new StringBuffer(oldIdList);
            String newIdList = ivalue.toString();
            sb.append("\t" + newIdList);
            Put p = new Put(ikey.toString().getBytes());
            p.addColumn("idFam".getBytes(), "idsList".getBytes(), sb.toString().getBytes());
            context.write(new ImmutableBytesWritable(), p);
        }
    }

    public int run(String[] paths) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "master,salve1");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        Job job = Job.getInstance(conf, "AppendToHbase");
        job.setJarByClass(cn.edu.hadoop.util.HdfsAppend2HbaseUtil.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setMapperClass(HdfsAdd2HbaseMapper.class);
        job.setNumReduceTasks(0);
        // job.setOutputFormatClass(TableOutputFormat.class);
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "reachableTable");
        FileInputFormat.setInputPaths(job, new Path(paths[0]));
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Append Start: ");
        long time1 = System.currentTimeMillis();
        long time2;
        String[] pathsStr = {Const.TwoDegreeReachableOutputPathDetail};
        int exitCode = ToolRunner.run(new HdfsAppend2HbaseUtil(), pathsStr);
        time2 = System.currentTimeMillis();
        System.out.println("Append Cost " + "\t" + (time2 - time1) / 1000 + " s");
        System.exit(exitCode);
    }
}

jazz wrote: The error message only says that no output directory was set, so the cause is hard to guess from that alone. Also, if you want map() to append an HDFS file into an HBase table, is that file in HFile format? Going through the HBase API feels like the more standard approach.
Posted by: jazz [ 2016-04-21, 22:39 ]
Post subject: Re: Help: error when appending an HDFS file to an existing HBase table in map()
Code:
public static class HdfsAdd2HbaseMapper extends Mapper<Text, Text, ImmutableBytesWritable, Put> {
    public void map(Text ikey, Text ivalue, Context context) throws IOException, InterruptedException {

Based on this signature, you are using org.apache.hadoop.mapreduce.Mapper, i.e. the new MapReduce API.

Code:
// job.setOutputFormatClass(TableOutputFormat.class);

yantaiGao wrote: Why does job.setOutputFormatClass(TableOutputFormat.class); fail to compile with the following error?

Code:
The method setOutputFormatClass(Class<? extends OutputFormat>) in the type Job is not applicable for the arguments (Class<TableOutputFormat>)

The likely cause of this error is that you imported org.apache.hadoop.hbase.mapred.TableOutputFormat rather than org.apache.hadoop.hbase.mapreduce.TableOutputFormat.

The former extends org.apache.hadoop.mapred.FileOutputFormat<ImmutableBytesWritable,Put>
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableOutputFormat.html

The latter extends org.apache.hadoop.mapreduce.OutputFormat<KEY,Mutation>
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html

The <? extends OutputFormat> in the error message refers to the latter, so most likely what you imported is the former.

- Jazz
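For comparison, here is a minimal driver sketch using the new-API org.apache.hadoop.hbase.mapreduce.TableOutputFormat; the class name AppendDriverSketch is illustrative, while the table name, ZooKeeper settings, and the mapper are the ones posted above. Because this OutputFormat writes straight to HBase instead of to files, no FileOutputFormat output directory has to be set, which should also make the InvalidJobConfException go away.

Code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;   // new API: hbase.mapreduce, not hbase.mapred
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class AppendDriverSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "master,salve1");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // Tell TableOutputFormat which table to write the Puts into.
        conf.set(TableOutputFormat.OUTPUT_TABLE, "reachableTable");

        Job job = Job.getInstance(conf, "AppendToHbase");
        job.setJarByClass(AppendDriverSketch.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setMapperClass(HdfsAppend2HbaseUtil.HdfsAdd2HbaseMapper.class);  // the mapper posted above
        job.setNumReduceTasks(0);                                            // map-only job

        // With the new-API TableOutputFormat no output directory is required.
        job.setOutputFormatClass(TableOutputFormat.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

TableMapReduceUtil.initTableReducerJob("reachableTable", null, job) from org.apache.hadoop.hbase.mapreduce can set up the same output wiring and additionally ships the HBase jars with the job.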
Posted by: yantaiGao [ 2016-05-04, 10:56 ]
Post subject: Re: Help: error when appending an HDFS file to an existing HBase table in map()
Thank you very much! That import was indeed the cause of the bug.

May I ask one more question: inside the mapper I want to use the contents of a text file. Should that file be placed on HDFS, and what is the right way to open and read it?

Thanks again!

jazz wrote: The likely cause of this error is that you imported org.apache.hadoop.hbase.mapred.TableOutputFormat rather than org.apache.hadoop.hbase.mapreduce.TableOutputFormat.
Posted by: yantaiGao [ 2016-05-04, 21:04 ]
Post subject: Re: Help: error when appending an HDFS file to an existing HBase table in map()
Regarding the question in my previous reply, I used the following code:

Code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReadFileMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Text newkeyText = new Text();
    private Text newvalText = new Text();

    public void map(LongWritable ikey, Text ivalue, Context context) throws IOException, InterruptedException {
        String[] lineStrArr = ivalue.toString().split(",", -1);
        newkeyText.set(lineStrArr[2]);

        // Create a FileSystem object
        Configuration conf = context.getConfiguration();
        FileSystem fs = FileSystem.get(conf);

        // getLocalCacheFiles() has been deprecated
        Path[] paths = context.getLocalCacheFiles();
        System.out.println(paths[0].toString());

        FSDataInputStream in = fs.open(paths[0]);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String lineString = "";
        while (lineString == br.readLine()) {
            String[] strArray = lineString.split("\t");
            if (strArray.length >= 0 && strArray != null) {
                newvalText.set(strArray[0]);
                context.write(newkeyText, newvalText);
            }
        }
    }
}

But it fails with the error below. Any advice would be greatly appreciated!

Code:
2016-05-04 20:53:17,300 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: File does not exist: /opt/hadoop_tmp/nm-local-dir/usercache/root/appcache/application_1462352222877_0003/container_1462352222877_0003_01_000005/pattern.txt
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1167)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1155)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1145)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:268)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:235)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:228)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1318)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:293)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:289)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:289)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
    at cn.edu.hadoop.test.ReadFileMapper.map(ReadFileMapper.java:34)
    at cn.edu.hadoop.test.ReadFileMapper.map(ReadFileMapper.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /opt/hadoop_tmp/nm-local-dir/usercache/root/appcache/application_1462352222877_0003/container_1462352222877_0003_01_000005/pattern.txt
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)

jazz wrote: The likely cause of this error is that you imported org.apache.hadoop.hbase.mapred.TableOutputFormat rather than org.apache.hadoop.hbase.mapreduce.TableOutputFormat.
Posted by: jazz [ 2016-05-04, 22:07 ]
Post subject: Re: Help: error when appending an HDFS file to an existing HBase table in map()
yantaiGao wrote: Regarding the question in my previous reply, I used the following code:

Code:
// Create a FileSystem object
Configuration conf = context.getConfiguration();
FileSystem fs = FileSystem.get(conf);
// getLocalCacheFiles() has been deprecated
Path[] paths = context.getLocalCacheFiles();
System.out.println(paths[0].toString());
FSDataInputStream in = fs.open(paths[0]);

But it fails with the error below. Any advice would be greatly appreciated!

Code:
2016-05-04 20:53:17,300 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: File does not exist: /opt/hadoop_tmp/nm-local-dir/usercache/root/appcache/application_1462352222877_0003/container_1462352222877_0003_01_000005/pattern.txt
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /opt/hadoop_tmp/nm-local-dir/usercache/root/appcache/application_1462352222877_0003/container_1462352222877_0003_01_000005/pattern.txt
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)

It looks like the failure happens in fs.open(), because pattern.txt cannot be found. From the getLocalCacheFiles() call, my first guess is that you are trying to use DistributedCache to ship pattern.txt to every mapper task. Since main() was not posted, I cannot tell whether pattern.txt was added with DistributedCache or attached to the job through the GenericOptions -files option. I suggest searching for DistributedCache sample programs.

- Jazz
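For reference, a minimal sketch of the DistributedCache pattern Jazz describes, using the new (Hadoop 2.x) Job/Mapper API. The HDFS path /user/root/pattern.txt, the class name, and the attachPatternFile() helper are illustrative assumptions, not taken from the thread. The file is attached once in the driver and read once per task in setup(), from the local copy with plain java.io, rather than by opening the local path through the HDFS FileSystem.

Code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class PatternFileMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final List<String> patterns = new ArrayList<String>();

    // Call this from the driver before submitting the job; the "#pattern.txt"
    // fragment creates a symlink with that name in each task's working directory.
    public static void attachPatternFile(Job job) throws Exception {
        job.addCacheFile(new URI("/user/root/pattern.txt#pattern.txt"));   // hypothetical HDFS path
    }

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The cached copy is a local file, so read it with plain java.io rather than
        // through the HDFS FileSystem (opening the local path via HDFS is what
        // produced the FileNotFoundException above).
        Path[] localFiles = context.getLocalCacheFiles();   // deprecated, but still works on Hadoop 2.x
        BufferedReader br = new BufferedReader(new FileReader(localFiles[0].toString()));
        String line;
        while ((line = br.readLine()) != null) {            // assignment, not "==" comparison
            if (!line.isEmpty()) {
                patterns.add(line.split("\t")[0]);
            }
        }
        br.close();
    }

    @Override
    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        // Mirrors the split logic posted above: the 3rd comma-separated field is the key.
        String[] lineStrArr = ivalue.toString().split(",", -1);
        for (String p : patterns) {
            context.write(new Text(lineStrArr[2]), new Text(p));
        }
    }
}

Reading the pattern file in setup() also avoids re-opening it for every input record, which the map()-based version above would have done.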
Page 1 of 1 | All times are UTC + 8 hours
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/ |