Apache Spark: akka version error when building a jar with all dependencies
I have built a jar file from my Spark application with Maven (mvn clean compile assembly:single) and the following pom file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>mgm.tp.bigdata</groupId>
<artifactId>ma-spark</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>ma-spark</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.1.0-cdh5.2.5</version>
</dependency>
<dependency>
<groupId>mgm.tp.bigdata</groupId>
<artifactId>ma-commons</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>mgm.tp.bigdata.ma_spark.SparkMain</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
</plugins>
</build>
</project>
If I run my application from the terminal with java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar, I get the following error message:
VirtualBox:~/Schreibtisch$ java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar
2015-Jun-02 12:53:36,348 [main] org.apache.spark.util.Utils
WARN - Your hostname, proewer-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
2015-Jun-02 12:53:36,350 [main] org.apache.spark.util.Utils
WARN - Set SPARK_LOCAL_IP if you need to bind to another address
2015-Jun-02 12:53:36,401 [main] org.apache.spark.SecurityManager
INFO - Changing view acls to: proewer
2015-Jun-02 12:53:36,402 [main] org.apache.spark.SecurityManager
INFO - Changing modify acls to: proewer
2015-Jun-02 12:53:36,403 [main] org.apache.spark.SecurityManager
INFO - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(proewer); users with modify permissions: Set(proewer)
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort.apply$mcVI$sp(Utils.scala:1454)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1450)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:203)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
at mgm.tp.bigdata.ma_spark.SparkMain.main(SparkMain.java:38)
What am I doing wrong?
Best regards,
Paul
This is what you are doing wrong:
i run my app with java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Once you have built your application, you should launch it with the spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and supports the different cluster managers and deploy modes that Spark supports:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
I strongly advise you to read the official documentation on Submitting Applications.
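For this specific application, the call would look roughly like this (a sketch: the local[*] master and the target/ path are assumptions for local testing, not taken from the question):
./bin/spark-submit \
--class mgm.tp.bigdata.ma_spark.SparkMain \
--master local[*] \
target/ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar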
This is most likely because the akka conf file inside the akka jar gets overwritten or left out when the fat jar is packaged.
You can try another plugin called maven-shade-plugin. In the pom.xml you need to specify how conflicts between resources with the same name should be resolved. Here is an example:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<minimizeJar>false</minimizeJar>
<createDependencyReducedPom>false</createDependencyReducedPom>
<artifactSet>
<includes>
<!-- Include here the dependencies you want to be packed in your fat jar -->
<include>my.package.etc....:*</include>
</includes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
Note the <transformers> section, which instructs the shade plugin to append the content of these resources rather than replace them.
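As a quick sanity check (my own sketch, not part of this answer), you can load the merged configuration from the rebuilt fat jar and confirm that the akka.version key is now resolvable; the class name below is purely illustrative:
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class CheckAkkaConfig {
    public static void main(String[] args) {
        // ConfigFactory.load() reads reference.conf from the classpath,
        // i.e. the merged file packed into the fat jar.
        Config config = ConfigFactory.load();
        // Throws ConfigException$Missing if the key is still absent.
        System.out.println("akka.version = " + config.getString("akka.version"));
    }
}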
This worked for me:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>1.5</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>allinone</shadedClassifierName>
<artifactSet>
<includes>
<include>*:*</include>
</includes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/spring.handlers</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/spring.schemas</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<manifestEntries>
<Main-Class>com.echoed.chamber.Main</Main-Class>
</manifestEntries>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
The ConfigException$Missing error indicates that the akka configuration file, i.e. the reference.conf file, is not bundled into the application jar. The likely reason is that when multiple files with the same name are present in different dependency jars, the default strategy checks whether they are all identical; if they are not, it ignores the file.
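To verify this (a suggested diagnostic, not part of the original answer), you can check whether a merged reference.conf actually ended up in the fat jar and whether it contains the akka settings:
jar xf ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar reference.conf
grep -n "akka" reference.conf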
I ran into the same problem, and this is how I solved it:
Generate a merged reference.conf using the AppendingTransformer: by a merged reference.conf file I mean that the reference.conf resources contained in all the dependency modules, such as akka-core, akka-http, akka-remoting, etc., are appended together by the AppendingTransformer. We add the AppendingTransformer to the pom file as follows:
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
mvn clean install
This will now generate the fat jar with a merged reference.conf file.
Still the same error: spark-submit <main-class> <app.jar> still gave the same error when I deployed my spark-app in EMR.
Reason: Since HDFS is the configured file system, Spark jobs on an EMR cluster read from HDFS by default. So the file you want to use must already exist in HDFS. I added the reference.conf file to HDFS as follows:
1. Extract the reference.conf file from app.jar into the /tmp folder:
`cd /tmp`
`jar xvf path_to_application.jar reference.conf`
2. Copy the extracted reference.conf from the local path (in this case /tmp) to an HDFS path (e.g. /user/hadoop):
`hdfs dfs -put /tmp/reference.conf /user/hadoop`
3. Load the config as follows:
`val parsedConfig = ConfigFactory.parseFile(new File("/user/hadoop/reference.conf"))`
`val config = ConfigFactory.load(parsedConfig)`
Alternative:
- Extract the reference.conf file from app.jar and copy it to every node of the EMR cluster, at the same path for both the driver and the executors.
ConfigFactory.parseFile(new File("/tmp/reference.conf"))
reference.conf will now be read from the local file system. Hope this helps and saves you some debugging time!