Spark running on Cassandra fails due to ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition (details inside)
I am using Spark 2.0.0 (local standalone) and spark-cassandra-connector 2.0.0-M1 with Scala 2.11. I am working on a project in my IDE, and every time I run a Spark command I get:
ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
My build.sbt file:
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-M1"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
So essentially, this is the error message:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 13, 192.168.0.12): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
The thing is, if I run the Spark shell with the spark-cassandra-connector:
$ ./spark-shell --jars /home/Applications/spark-2.0.0-bin-hadoop2.7/spark-cassandra-connector-assembly-2.0.0-M1-22-gab4eda2.jar
I can use Spark with Cassandra without any error messages.
Any clue on how to resolve this strange incompatibility?
EDIT:
Interestingly enough, from the worker node's point of view, when I run a program the connector gives
`java.io.InvalidClassException: com.datastax.spark.connector.rdd.CassandraTableScanRDD; local class incompatible: stream classdesc serialVersionUID = 1517205208424539072, local class serialVersionUID = 6631934706192455668`
which is what ultimately produces the ClassNotFound (it cannot bind because of the conflict). But the project only ever used Spark and connector 2.0 with Scala 2.11, so there is no version incompatibility anywhere.
In Spark, just because you build against a library does not mean it will be included on the runtime classpath. If you add
--jars /home/Applications/spark-2.0.0-bin-hadoop2.7/spark-cassandra-connector-assembly-2.0.0-M1-22-gab4eda2.jar
to the spark-submit of your application, it will include all of those necessary libraries at runtime and on all the remote JVMs.
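For instance, a minimal spark-submit invocation could look like the following (the main class and application jar names here are just placeholders, substitute your own):
$ ./spark-submit \
    --class com.example.MyCassandraApp \
    --master spark://127.0.0.1:7077 \
    --jars /home/Applications/spark-2.0.0-bin-hadoop2.7/spark-cassandra-connector-assembly-2.0.0-M1-22-gab4eda2.jar \
    target/scala-2.11/my-app_2.11-1.0.jar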
Basically what you are seeing is that in the first example none of the connector libraries are on the runtime classpath, while in the spark-shell example they are.
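If you prefer to keep launching from the IDE, a rough sketch of the same idea is to point Spark at the assembly jar programmatically (the app name, master URL and Cassandra host below are placeholders for your own values):
import org.apache.spark.{SparkConf, SparkContext}

// Ship the connector assembly to the executors so the worker JVMs can
// deserialize classes like CassandraPartition at runtime.
val conf = new SparkConf()
  .setAppName("cassandra-test")                        // placeholder app name
  .setMaster("spark://127.0.0.1:7077")                 // placeholder master URL
  .setJars(Seq("/home/Applications/spark-2.0.0-bin-hadoop2.7/spark-cassandra-connector-assembly-2.0.0-M1-22-gab4eda2.jar"))
  .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder Cassandra host

val sc = new SparkContext(conf)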