Running a Golang Apache Beam pipeline on Spark

I created a simple Golang Apache Beam pipeline that works fine with the DirectRunner. I tried to deploy it on a Spark cluster with the following command:

./bin/spark-submit --master=spark://vm:7077 main.go --runner=SparkRunner --job_endpoint=localhost:8099 --artifact_endpoint=localhost:8098 --environment_type=LOOPBACK --output=/tmp/output

Before submitting the application, I started the job endpoint with this command:

./gradlew :runners:spark:job-server:runShadow -PsparkMasterUrl=spark://localhost:7077

The job fails on Spark with this error:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'null'. Please specify one with --class.

It seems I need to specify the --class argument, but I don't understand what the error means. Can anyone help?

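spark-submit is a Spark utility that accepts Java JARs or Python scripts. It doesn't know how to run Go programs, so it treats main.go as a JAR, fails to find a main class in it, and asks for --class.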

I updated the Beam Go quickstart guide with instructions for the Spark runner. Let me know if that works for you.
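As a rough sketch of what that looks like: with the job server from the question already running, you run the Go program directly and point it at the job server instead of going through spark-submit. This assumes the job server is listening on localhost:8099 and that main.go defines the --output flag, as in the question. The --environment_type=LOOPBACK setting is carried over from the question and may not suit a remote cluster, so treat this as a starting point rather than a verified command.

```shell
# Assumed values taken from the question; adjust to your setup.
JOB_ENDPOINT="localhost:8099"   # Beam Spark job server started via gradlew
OUTPUT="/tmp/output"            # output path expected by the pipeline in main.go

# Run the Go program itself. The Go SDK submits the pipeline to the job
# server, which hands it to Spark. spark-submit is not involved.
go run main.go \
  --runner=spark \
  --endpoint="${JOB_ENDPOINT}" \
  --environment_type=LOOPBACK \
  --output="${OUTPUT}"
```

Note that the Go SDK uses --runner=spark and --endpoint here, not the --runner=SparkRunner and --job_endpoint names used by the other SDKs.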

go

apache-spark

apache-beam