ERROR TableInputFormat: java.lang.NullPointerException at org.apache.hadoop.hbase.TableName.valueOf
I am trying to read data from HBase using Spark. The versions I am using are Spark 1.3.1 and HBase 1.1.1. I am getting the following error:
ERROR TableInputFormat: java.lang.NullPointerException
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:417)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:91)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
at org.apache.spark.rdd.RDD$$anonfun$dependencies.apply(RDD.scala:206)
at org.apache.spark.rdd.RDD$$anonfun$dependencies.apply(RDD.scala:204)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.dependencies(RDD.scala:204)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scal
The code is as follows:
public static void main(String[] args)
{
    String TABLE_NAME = "Hello";
    HTable table = null;

    SparkConf sparkConf = new SparkConf();
    sparkConf.setAppName("Data Reader").setMaster("local[1]");
    sparkConf.set("spark.executor.extraClassPath", "$(hbase classpath)");

    JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
    Configuration hbConf = HBaseConfiguration.create();
    hbConf.set("zookeeper.znode.parent", "/hbase-unsecure");

    try {
        table = new HTable(hbConf, Bytes.toBytes(TABLE_NAME));
    } catch (IOException e) {
        e.printStackTrace();
    }

    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sparkContext
            .newAPIHadoopRDD(
                    hbConf,
                    TableInputFormat.class,
                    org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
                    org.apache.hadoop.hbase.client.Result.class);
    hBaseRDD.coalesce(1, true);
    System.out.println("Count " + hBaseRDD.count());
    //.saveAsTextFile("hBaseRDD");

    try {
        table.close();
        sparkContext.close();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
I am unable to solve the problem. I am using the Hortonworks Sandbox for this.
You wrote:

try {
    table = new HTable(hbConf, Bytes.toBytes(TABLE_NAME));
} catch (IOException e) {
    e.printStackTrace();
}
If you are using the 1.1.1 API:
In the devapidocs I can only see two constructors:
protected HTable(ClusterConnection conn, BufferedMutatorParams params)
    For internal testing.
protected HTable(TableName tableName, ClusterConnection connection, TableConfiguration tableConfig, RpcRetryingCallerFactory rpcCallerFactory, RpcControllerFactory rpcControllerFactory, ExecutorService pool)
    Creates an object to access a HBase table.
For the first one, the constructor of params is: BufferedMutatorParams(TableName tableName)
And TableName has no public constructor, so you obtain one via TableName.valueOf.
So you have to initialize your HTable like this:
table = new HTable(hbConf, new BufferedMutatorParams(TableName.valueOf(TABLE_NAME)));
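As a side note, both of those 1.1.1 constructors are protected, so they are not directly reachable from application code; the public route in the 1.x client is ConnectionFactory. Below is a minimal, hypothetical sketch (the class name OpenTableSketch is made up for illustration; it reuses hbConf and the "Hello" table from the question and assumes an HBase 1.1.x client on the classpath):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class OpenTableSketch {
    public static void main(String[] args) throws IOException {
        Configuration hbConf = HBaseConfiguration.create();
        hbConf.set("zookeeper.znode.parent", "/hbase-unsecure");

        // ConnectionFactory is the public entry point of the 1.x client API;
        // TableName.valueOf replaces the old String/byte[] table-name arguments.
        try (Connection connection = ConnectionFactory.createConnection(hbConf);
             Table table = connection.getTable(TableName.valueOf("Hello"))) {
            System.out.println("Opened table " + table.getName());
        }
    }
}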
If you are using the 0.94 API:
The constructors of HTable are:
HTable(byte[] tableName, HConnection connection)
    Creates an object to access a HBase table.
HTable(byte[] tableName, HConnection connection, ExecutorService pool)
    Creates an object to access a HBase table.
HTable(org.apache.hadoop.conf.Configuration conf, byte[] tableName)
    Creates an object to access a HBase table.
HTable(org.apache.hadoop.conf.Configuration conf, byte[] tableName, ExecutorService pool)
    Creates an object to access a HBase table.
HTable(org.apache.hadoop.conf.Configuration conf, String tableName)
    Creates an object to access a HBase table.
So, looking at the last one, you only need to pass the table name as a String instead of a byte[]:
table = new HTable(hbConf, TABLE_NAME);
That should be fine.
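One more thing worth checking, whichever client version you are on: the stack trace shows the NullPointerException coming from TableInputFormat.setConf, which builds its own HTable from the table name it reads out of the Hadoop Configuration, not from the HTable you construct yourself. If the input-table key is missing from hbConf, TableName.valueOf receives null and fails with exactly this trace. Here is a minimal sketch of that part of the job setup, reusing the question's hbConf, TABLE_NAME and sparkContext (TableInputFormat.INPUT_TABLE is the constant for "hbase.mapreduce.inputtable"):

// TableInputFormat.setConf calls TableName.valueOf(conf.get(INPUT_TABLE)),
// so the table name must be in the Configuration before the RDD is created.
hbConf.set(TableInputFormat.INPUT_TABLE, TABLE_NAME);

JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sparkContext
        .newAPIHadoopRDD(
                hbConf,
                TableInputFormat.class,
                ImmutableBytesWritable.class,
                Result.class);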