Elasticsearch support for Spark 2.4.2 with Scala 2.12
I can't find any ES 6.7.1 jar that supports Spark 2.4.2 with Scala 2.12. In the Maven repo, the jars are only available for Scala 2.11 and 2.10:
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>6.7.1</version>
</dependency>
For my application we use Spark 2.4.2, which only supports Scala 2.12. Below is the error I get when I try to run with the "elasticsearch-spark-20_2.11" jar:
StreamingExecutionRelation KafkaV2[Subscribe[test_topic]], [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13]
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:302)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon.run(StreamExecution.scala:193)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at org.elasticsearch.spark.sql.DataFrameValueWriter.writeStruct(DataFrameValueWriter.scala:78)
at org.elasticsearch.spark.sql.DataFrameValueWriter.write(DataFrameValueWriter.scala:70)
at org.elasticsearch.spark.sql.DataFrameValueWriter.write(DataFrameValueWriter.scala:53)
at org.elasticsearch.hadoop.serialization.builder.ContentBuilder.value(ContentBuilder.java:53)
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.doWriteObject(TemplatedBulk.java:71)
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:58)
at org.elasticsearch.hadoop.serialization.bulk.BulkEntryWriter.writeBulkEntry(BulkEntryWriter.java:68)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:170)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:74)
at org.elasticsearch.spark.sql.streaming.EsStreamQueryWriter.run(EsStreamQueryWriter.scala:41)
at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink$$anonfun$addBatch$$anonfun.apply(EsSparkSqlStreamingSink.scala:52)
at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink$$anonfun$addBatch$$anonfun.apply(EsSparkSqlStreamingSink.scala:51)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run(Executor.scala:411)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
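For reference, this is a minimal sketch (not my exact code) of the kind of Kafka-to-Elasticsearch structured streaming write that hits the stack trace above; the broker address, checkpoint path and index name are placeholders:

import org.apache.spark.sql.SparkSession

object KafkaToEsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-es")
      .config("es.nodes", "localhost")   // placeholder Elasticsearch host
      .config("es.port", "9200")
      .getOrCreate()

    // Read the Kafka topic named in the stack trace as a streaming DataFrame.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test_topic")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // The "es" sink is provided by elasticsearch-spark-20_2.11; on a Scala 2.12
    // Spark build this is where the NoSuchMethodError surfaces at runtime.
    val query = stream.writeStream
      .format("es")
      .option("checkpointLocation", "/tmp/es-checkpoint")
      .start("test_index/_doc")

    query.awaitTermination()
  }
}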
Sorry for the delay.
We can't use the Elasticsearch Spark / Elasticsearch Hadoop libraries with Scala 2.12 yet. There is an open pull request (https://github.com/elastic/elasticsearch-hadoop/pull/1308) that is waiting for the tests to pass.
You will either have to downgrade your project to Scala 2.11, or wait until the library is released for Scala 2.12.
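If you go the downgrade route, the build ends up looking roughly like this (an illustrative sbt sketch; the same idea applies to Maven by pinning the _2.11 artifacts, and the exact versions are examples only):

// Pin the project to Scala 2.11 and use Spark 2.4.x artifacts built for 2.11,
// so they match the _2.11 elasticsearch-spark jar.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-sql"                   % "2.4.3",  // resolves to spark-sql_2.11
  "org.apache.spark"  %% "spark-sql-kafka-0-10"        % "2.4.3",  // Kafka source seen in the stack trace
  "org.elasticsearch" %  "elasticsearch-spark-20_2.11" % "6.7.1"
)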
That project's last Maven release was in November 2016, and it still does not support Scala 2.12.
This is unfortunate, because Spark 3 requires Scala 2.12, so if your project depends on elasticsearch-spark you won't be able to upgrade to Spark 3.
elasticsearch-hadoop claims to support Spark and appears to be actively maintained, so it is probably your best option.
As discussed in this post, keep an eye on your dependencies in the Spark / Scala world, because they are frequently abandoned, leaving users stranded.