Apache Zeppelin cannot deserialize dataset: "NoSuchMethodError"
I'm trying to use Apache Zeppelin (0.7.2, net install, running locally on a Mac) to explore data loaded from an S3 bucket. The data seems to load fine, since the command:
val p = spark.read.textFile("s3a://sparkcookbook/person")
gives the result:
p: org.apache.spark.sql.Dataset[String] = [value: string]
However, when I try to call methods on the object p, I get an error. For example:
p.take(1)
results in:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute.apply(Dataset.scala:2371)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute(Dataset.scala:2370)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
at org.apache.spark.sql.Dataset$$anonfun$head.apply(Dataset.scala:2113)
at org.apache.spark.sql.Dataset$$anonfun$head.apply(Dataset.scala:2112)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2795)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)
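My conf/zeppelin-env.sh is the same as the default, except that I have defined the Amazon access key and secret key environment variables there. In the Spark interpreter of the Zeppelin notebook, I have added the following artifacts: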
org.apache.hadoop:hadoop-aws:2.7.3
com.amazonaws:aws-java-sdk:1.7.9
com.fasterxml.jackson.core:jackson-core:2.9.0
com.fasterxml.jackson.core:jackson-databind:2.9.0
com.fasterxml.jackson.core:jackson-annotations:2.9.0
(I think only the first two are necessary.) The two commands above work fine in the Spark shell; they only fail in the Zeppelin notebook.
It looks like there is a problem with one of the Jackson libraries. Maybe I'm using the wrong artifacts above for the Zeppelin interpreter?
UPDATE: Following the advice in the suggested answer below, I removed the Jackson jars that came with Zeppelin and replaced them with the following:
jackson-annotations-2.6.0.jar
jackson-core-2.6.7.jar
jackson-databind-2.6.7.jar
I also replaced the artifacts with these versions, so my artifacts are now:
org.apache.hadoop:hadoop-aws:2.7.3
com.amazonaws:aws-java-sdk:1.7.9
com.fasterxml.jackson.core:jackson-core:2.6.7
com.fasterxml.jackson.core:jackson-databind:2.6.7
com.fasterxml.jackson.core:jackson-annotations:2.6.0
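However, the error I get from running the commands above is the same.
UPDATE 2: As suggested, I removed the Jackson libraries from the list of artifacts, since they are now already in the jars/ folder. The only added artifacts are now the aws artifacts above. I then cleaned the classpath by entering the following in the notebook (as per the instructions):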
%spark.dep
z.reset()
I now get a different error:
val p = spark.read.textFile("s3a://sparkcookbook/person")
p.take(1)
p: org.apache.spark.sql.Dataset[String] = [value: string]
java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute.apply(Dataset.scala:2371)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
UPDATE 3: As suggested in a comment on the suggested answer below, I cleaned the classpath by removing all files in the local repository:
rm -rf local-repo/*
Then I restarted the Zeppelin server. To check the classpath, I executed the following in the notebook:
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
This gave the following output (I include only the Jackson libraries here, otherwise the output is too long to paste):
...
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-annotations-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-annotations-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-databind-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-databind-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-jaxrs-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-xc-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-annotations-2.6.0.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-core-2.6.7.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-databind-2.6.7.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-annotations-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-core-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-databind-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-annotations-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-core-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-databind-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-jaxrs-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-paranamer-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-scala_2.11-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-xc-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/json4s-jackson_2.11-3.2.11.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/parquet-jackson-1.8.1.jar
...
It seems that multiple versions are being pulled from the repository. Should I exclude the older versions? If so, how would I do that?
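To make the duplicates in a listing like the one above easier to spot, here is a minimal sketch that groups jar paths by artifact name and reports every artifact that appears in more than one version. The object name `JarVersionConflicts` and its methods are hypothetical helpers, not part of Zeppelin or Spark, and the file-name pattern is an assumption that only covers plain `name-1.2.3.jar` names (it skips versions with a suffix such as `-SNAPSHOT`).

```scala
object JarVersionConflicts {
  // Assumed naming scheme: "<artifact>-<version>.jar", where the version starts with a digit.
  private val JarName = """(.+?)-(\d[\w.]*)\.jar""".r

  // Split one classpath entry into (artifact, version); None if the name does not fit the pattern.
  def parse(path: String): Option[(String, String)] = {
    val fileName = path.substring(path.lastIndexOf('/') + 1)
    fileName match {
      case JarName(artifact, version) => Some((artifact, version))
      case _                          => None
    }
  }

  // Keep only the artifacts that occur with two or more distinct versions.
  def conflicts(paths: Seq[String]): Map[String, Set[String]] =
    paths.flatMap(parse)
      .groupBy(_._1)
      .map { case (artifact, pairs) => artifact -> pairs.map(_._2).toSet }
      .filter { case (_, versions) => versions.size > 1 }

  def main(args: Array[String]): Unit = {
    // A few entries copied from the classpath output above.
    val sample = Seq(
      "file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-databind-2.1.1.jar",
      "file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-databind-2.6.7.jar",
      "file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-databind-2.6.5.jar",
      "file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-scala_2.11-2.6.5.jar"
    )
    conflicts(sample).foreach { case (artifact, versions) =>
      println(s"$artifact: ${versions.toSeq.sorted.mkString(", ")}")
    }
  }
}
```

In a notebook paragraph, the same check could be run against the real classpath by passing the strings from `getURLs.map(_.toString)` to `JarVersionConflicts.conflicts`.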
The Jackson version you are using is probably too new. Even Spark 2.3 is still on 2.6.7. Downgrade, and make sure all of your Jackson JARs are consistent.
Use these jar versions:
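One way to check whether the Jackson JARs are consistent is to ask the JVM which JAR each Jackson class is actually loaded from. The following is only a sketch: `ClassOrigin` and `jarOf` are hypothetical names, and it assumes the thread context class loader is the one the interpreter uses to resolve these classes, which may not hold in every Zeppelin setup.

```scala
object ClassOrigin {
  // Return the location (usually a jar URL) a class is loaded from, if it can be determined.
  def jarOf(className: String): Option[String] =
    try {
      // initialize = false: only locate the class, do not run its static initializer.
      val cls = Class.forName(className, false, Thread.currentThread.getContextClassLoader)
      for {
        domain   <- Option(cls.getProtectionDomain)
        source   <- Option(domain.getCodeSource) // may be null, e.g. for JDK classes
        location <- Option(source.getLocation)
      } yield location.toString
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }

  def main(args: Array[String]): Unit = {
    // Class names taken from the Jackson modules that appear in the stack trace and jar list.
    Seq(
      "com.fasterxml.jackson.core.JsonFactory",
      "com.fasterxml.jackson.databind.ObjectMapper",
      "com.fasterxml.jackson.module.scala.DefaultScalaModule"
    ).foreach { name =>
      println(s"$name -> ${jarOf(name).getOrElse("not found, or no code source")}")
    }
  }
}
```

If the printed locations point at different Jackson versions (for example, jackson-databind from one directory and jackson-module-scala from another with a different version number), that mismatch is a likely source of a `NoSuchMethodError` like the one in the question.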
aws-java-sdk-1.7.4.jar
hadoop-aws-2.6.0.jar
as in this script: https://github.com/2dmitrypavlov/sparkDocker/blob/master/zeppelin.sh
Do not use packages. Instead, download the jars and put them in a path, say /root/jars/, and then edit your zeppelin-env.sh.
To do that, run this command from the zeppelin/conf directory:
echo 'export SPARK_SUBMIT_OPTIONS="--jars /root/jars/mysql-connector-java-5.1.39.jar,/root/jars/aws-java-sdk-1.7.4.jar,/root/jars/hadoop-aws-2.6.0.jar"'>>zeppelin-env.sh
After that, restart Zeppelin.
The code at the link above is pasted below (in case the link goes stale):
#!/bin/bash
# Download jars
cd /root/jars
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.39/mysql-connector-java-5.1.39.jar
cd /usr/share/
wget http://archive.apache.org/dist/zeppelin/zeppelin-0.7.1/zeppelin-0.7.1-bin-all.tgz
tar -zxvf zeppelin-0.7.1-bin-all.tgz
cd zeppelin-0.7.1-bin-all/conf
cp zeppelin-env.sh.template zeppelin-env.sh
echo 'export MASTER=spark://'$MASTERZ':7077'>>zeppelin-env.sh
echo 'export SPARK_SUBMIT_OPTIONS="--jars /root/jars/mysql-connector-java-5.1.39.jar,/root/jars/aws-java-sdk-1.7.4.jar,/root/jars/hadoop-aws-2.6.0.jar"'>>zeppelin-env.sh
echo 'export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.VFSNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo"'>>zeppelin-env.sh
echo 'export ZEPPELINHUB_API_ADDRESS="https://www.zeppelinhub.com"'>>zeppelin-env.sh
echo 'export ZEPPELIN_PORT=9999'>>zeppelin-env.sh
echo 'export SPARK_HOME=/usr/share/spark'>>zeppelin-env.sh
cd ../bin/
./zeppelin.sh