How to avoid a dependency conflict when loading spark-streaming and kafka?

I am trying to get a sample Kafka + Spark Streaming job working, but I am running into a problem at runtime.

This is the exception:

[error] Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.8

This is the build.sbt:

name := "SparkJobs"

version := "1.0"

scalaVersion := "2.11.6"

val sparkVersion = "2.4.1"

val flinkVersion = "1.7.2"

resolvers ++= Seq(
"Typesafe Releases" at "http://repo.typesafe.com/typesafe/releases/",
"apache snapshots" at "http://repository.apache.org/snapshots/",
"confluent.io" at "http://packages.confluent.io/maven/",
"Maven central" at "http://repo1.maven.org/maven2/"
)

libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion

// ,"org.apache.flink" %% "flink-connector-kafka-0.10" % flinkVersion
, "org.apache.kafka" %% "kafka-streams-scala" % "2.2.0"
// , "io.confluent" % "kafka-streams-avro-serde" % "5.2.1"
)

//excludeDependencies ++= Seq(
//  // commons-logging is replaced by jcl-over-slf4j
//  ExclusionRule("jackson-module-scala", "jackson-module-scala")
//)

Here is the code.

Running sbt dependencyTree I can see that spark-core_2.11-2.4.1.jar brings in jackson-databind-2.6.7.1, and the tree tells me it is evicted by version 2.9.8, which shows there is a conflict between the two libraries. But spark-core_2.11-2.4.1.jar is not the only one involved: kafka-streams-scala_2.11:2.2.0 uses jackson-databind 2.9.8, so I don't know which library has to evict jackson-databind 2.9.8. spark-core, kafka-streams-scala, or both?

How can I avoid the jackson library version 2.9.8 so that this job can run?

I assume I need the jackson-databind 2.6.7 version ...
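
For reference, a sketch of the two usual sbt-side fixes (this is my assumption, not a verified setup: that the conflict is only over Jackson and that the 2.6.7.x line shipped with Spark 2.4.1 should win):

// Option 1: keep kafka-streams-scala but strip the Jackson artifacts it pulls in
libraryDependencies += ("org.apache.kafka" %% "kafka-streams-scala" % "2.2.0")
  .excludeAll(
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.module")
  )

// Option 2: pin a single Jackson version for the whole build (sbt 1.x syntax)
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.6.7",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7.1",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.7.1"
)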

Update after the suggestions. Still not working.

I have removed the kafka-streams-scala dependency, which was trying to use Jackson 2.9.8, and I am now using this build.sbt:

name := "SparkJobs"

version := "1.0"

scalaVersion := "2.11.6"

val sparkVersion = "2.4.1"

val flinkVersion = "1.7.2"

val kafkaStreamScala = "2.2.0"

resolvers ++= Seq(
"Typesafe Releases" at "http://repo.typesafe.com/typesafe/releases/",
"apache snapshots" at "http://repository.apache.org/snapshots/",
"confluent.io" at "http://packages.confluent.io/maven/",
"Maven central" at "http://repo1.maven.org/maven2/"
)


libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion ,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion

)

But now I get a new exception.

Update 2

Got it. Now I understand the second exception: I had forgotten awaitTermination().
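
A minimal sketch of that fix, assuming a DStream job driven by a StreamingContext (the app name, master, and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("SparkJobs").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))

// ... build the Kafka DStream and its transformations here ...

ssc.start()
ssc.awaitTermination()  // without this call the driver returns immediately and nothing is processed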

Kafka Streams includes Jackson 2.9.8

But it is not needed when using Spark Streaming's Kafka integration, so you really should remove it.

Likewise, kafka-streams-avro-serde is not what you want to use with Spark; you may find AbsaOSS/ABRiS useful instead.
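
For completeness, a sketch of a consumer that relies only on spark-streaming-kafka-0-10 and needs no Kafka Streams dependency at all; the broker address, group id, and topic name are placeholders:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToConsole {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToConsole").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",           // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-jobs",                         // placeholder group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream from Kafka, handled entirely by the Spark integration
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("my-topic"), kafkaParams)  // placeholder topic
    )

    stream.map(record => (record.key, record.value)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}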