Spark-HBASE Error java.lang.IllegalStateException: unread block data
I am trying to fetch records from an HBase table through a Java Spark program invoked via a Jersey REST API, and I am getting the error mentioned below. However, when I access the HBase table by submitting the Spark jar directly, the code executes without any error.
I have 2 worker nodes for HBase and 2 worker nodes for Spark, maintained by the same master.
WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.16.140): java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
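For context, here is a minimal sketch (my reconstruction, not the asker's actual code; the app and table names are placeholders) of the usual Java read path that fails this way when the HBase jars are missing from the executor classpath:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseRead {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hbase-read"));

        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table"); // placeholder table name

        // Each task deserializes its closure, including the Hadoop/HBase classes it
        // references; if the HBase jars are absent on the executors, deserialization
        // fails with the "unread block data" IllegalStateException seen above.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
                hbaseConf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);

        System.out.println("rows: " + rows.count());
        sc.stop();
    }
}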
OK, I think I know your problem, because I just went through it myself.
The most likely cause is some missing HBase jars: at runtime Spark needs the HBase jars to read the data, and if they are not present on the executor classpath it throws this kind of exception. What to do? It is simple:
Before submitting the job, add the --jars parameter with the following jars:
--jars
/ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,
/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/guava-12.0.1.jar,
/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/htrace-core-2.04.jar
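Put together, the full submit command would look something like this (the master URL, main class, and application jar are placeholders; note there must be no spaces inside the comma-separated --jars list):
spark-submit --master spark://master:7077 --class com.example.HBaseRead \
  --jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/guava-12.0.1.jar,/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/htrace-core-2.04.jar \
  my-app.jar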
If that works for you, enjoy!
I ran into the same problem when submitting a Spark job implemented with the Java API on CDH 5.4.0. Here are my solutions:
Solution 1: Use spark-submit:
--jars zookeeper-3.4.5-cdh5.4.0.jar,
hbase-client-1.0.0-cdh5.4.0.jar,
hbase-common-1.0.0-cdh5.4.0.jar,
hbase-server-1.0.0-cdh5.4.0.jar,
hbase-protocol-1.0.0-cdh5.4.0.jar,
htrace-core-3.1.0-incubating.jar,
// custom jars which are needed in the spark executors
Solution 2: Use SparkConf in your code:
SparkConf conf = new SparkConf();
conf.setJars(new String[]{
        "zookeeper-3.4.5-cdh5.4.0.jar",
        "hbase-client-1.0.0-cdh5.4.0.jar",
        "hbase-common-1.0.0-cdh5.4.0.jar",
        "hbase-server-1.0.0-cdh5.4.0.jar",
        "hbase-protocol-1.0.0-cdh5.4.0.jar",
        "htrace-core-3.1.0-incubating.jar"
        // plus any custom jars needed in the Spark executors
});
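Note that the paths passed to setJars are resolved on the driver, which then serves local jars to the executors, so bare file names like the above only work if the jars sit in the driver's working directory; absolute paths are safer.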
Summary
The problem is caused by jars missing from the Spark project's classpath. You need to add these jars to your project classpath and, in addition, use one of the two solutions above to distribute them to your Spark cluster.
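As a further option (my own addition, not from the answers above): if every node already has the HBase jars installed at the same location, you can skip shipping them and put them straight on the classpath instead; the /opt/hbase/lib path below is illustrative.
spark-submit --conf spark.driver.extraClassPath='/opt/hbase/lib/*' \
  --conf spark.executor.extraClassPath='/opt/hbase/lib/*' \
  my-app.jar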
CDP/CDH:
Step 1: Copy the hbase-site.xml file into the /etc/spark/conf/ directory.
cp /opt/cloudera/parcels/CDH/lib/hbase/conf/hbase-site.xml /etc/spark/conf/
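hbase-site.xml carries the HBase client settings, most importantly the ZooKeeper quorum used to locate the HBase cluster; placing it in /etc/spark/conf/ makes it visible to the driver, and the --files flag in the command below ships the same file to the executors.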
Step 2: Add the following libraries to spark-submit/spark-shell.
/opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar
/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar
/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar
Spark-shell:
sudo -u hive spark-shell --master yarn --jars /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar,/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar --files /etc/spark/conf/hbase-site.xml