Query Cassandra from Spark using CassandraSQLContext
I am trying to query Cassandra from Spark using CassandraSQLContext, but I get a strange missing-dependency error. My Spark application looks like this:
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

val spark: SparkSession = SparkSession.builder().appName(appName).getOrCreate()
// CassandraSQLContext comes from the spark-cassandra-connector
val cassandraSQLContext = new org.apache.spark.sql.cassandra.CassandraSQLContext(spark.sparkContext)

val path = args(0)
cassandraSQLContext.setKeyspace(args(1))
// Run the SQL passed on the command line and write the result as CSV
val dataFrame: DataFrame = cassandraSQLContext.sql(args(2))
dataFrame.write.mode(SaveMode.Overwrite).option("header", "true").csv(path)
I get a missing Spark Scala class error:
User class threw exception: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/analysis/Catalog
at com.test.batch.utils.CSVFromCassandraSQLQuery$.<init>(CSVFromCassandraSQLQuery.scala:19)
at com.test.batch.utils.CSVFromCassandraSQLQuery$.<clinit>(CSVFromCassandraSQLQuery.scala)
at com.test.batch.utils.CSVFromCassandraSQLQuery.main(CSVFromCassandraSQLQuery.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:721)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.analysis.Catalog
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 8 more
I also tried explicitly adding the spark-catalyst jar path to the spark-submit command, but I still hit the same problem (whether running locally or on a YARN cluster)...
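For reference, explicitly passing an extra jar to spark-submit looks roughly like this (the jar path, application jar name, and arguments below are placeholders):

spark-submit \
  --class com.test.batch.utils.CSVFromCassandraSQLQuery \
  --master yarn \
  --jars /path/to/spark-catalyst_2.11-2.3.1.jar \
  my-application.jar <output-path> <keyspace> <sql-query>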
Here is my project's build.sbt:
scalaVersion := "2.11.11"
val sparkVersion = "2.3.1"
libraryDependencies ++= Seq(
"log4j" % "log4j" % "1.2.17",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.datastax.spark" %% "spark-cassandra-connector" % "2.3.1",
"org.scala-lang.modules" %% "scala-java8-compat" % "0.9.0",
"com.twitter" % "jsr166e" % "1.1.0"
)
Any ideas on what I'm missing?
This issue occurs because the org.apache.spark.sql.catalyst.analysis.Catalog class is no longer part of spark-catalyst_2.11:2.x.x. Please refer to the source code of spark-catalyst_2.11:2.0.0:
https://jar-download.com/artifacts/org.apache.spark/spark-catalyst_2.11/2.0.0/source-code
The org.apache.spark.sql.catalyst.analysis.Catalog class is only available up to spark-catalyst_2.11:1.6.3:
https://jar-download.com/artifacts/org.apache.spark/spark-catalyst_2.11/1.6.3/source-code
I would also ask you not to use CassandraSQLContext, as it has been deprecated. Please check https://datastax-oss.atlassian.net/browse/SPARKC-399.
Please check this SO post on using a Cassandra context in Spark 2.0:
how to use Cassandra Context in spark 2.0
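As a rough sketch of the Spark 2.x approach (the connection host and table name below are placeholders, and the temp-view pattern assumes the SQL passed in args(2) references that view):

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

val spark: SparkSession = SparkSession.builder()
  .appName(appName)
  .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
  .getOrCreate()

// Load a Cassandra table through the connector's DataFrame source
val table: DataFrame = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> args(1), "table" -> "my_table")) // "my_table" is a placeholder
  .load()

// Register a temp view so the SQL in args(2) can query it by name
table.createOrReplaceTempView("my_table")
spark.sql(args(2))
  .write.mode(SaveMode.Overwrite).option("header", "true").csv(args(0))

The org.apache.spark.sql.cassandra data source is provided by spark-cassandra-connector 2.x, which is already declared in the build.sbt above, so no CassandraSQLContext (and no spark-catalyst dependency juggling) is needed.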