Apache Spark: Akka version error when building a jar with all dependencies

I have built a jar file from my Spark application with Maven (mvn clean compile assembly:single) and the following pom file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>mgm.tp.bigdata</groupId>
  <artifactId>ma-spark</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>ma-spark</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.1.0-cdh5.2.5</version>
    </dependency>
    <dependency>
        <groupId>mgm.tp.bigdata</groupId>
        <artifactId>ma-commons</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </dependency>
  </dependencies>

  <build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>mgm.tp.bigdata.ma_spark.SparkMain</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
  </plugins>
</build>
</project>

If I run my application from the terminal with java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar, I get the following error message:

VirtualBox:~/Schreibtisch$ java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar
2015-Jun-02 12:53:36,348 [main] org.apache.spark.util.Utils
 WARN  - Your hostname, proewer-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
2015-Jun-02 12:53:36,350 [main] org.apache.spark.util.Utils
 WARN  - Set SPARK_LOCAL_IP if you need to bind to another address
2015-Jun-02 12:53:36,401 [main] org.apache.spark.SecurityManager
 INFO  - Changing view acls to: proewer
2015-Jun-02 12:53:36,402 [main] org.apache.spark.SecurityManager
 INFO  - Changing modify acls to: proewer
2015-Jun-02 12:53:36,403 [main] org.apache.spark.SecurityManager
 INFO  - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(proewer); users with modify permissions: Set(proewer)
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
    at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
    at org.apache.spark.util.AkkaUtils$$anonfun.apply(AkkaUtils.scala:54)
    at org.apache.spark.util.AkkaUtils$$anonfun.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort.apply$mcVI$sp(Utils.scala:1454)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1450)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:203)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
    at mgm.tp.bigdata.ma_spark.SparkMain.main(SparkMain.java:38)

What am I doing wrong?

Best regards, Paul

Here is what you are doing wrong:

I run my app with java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Once you have built your application, you should launch it with the spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and it supports the different cluster managers and deploy modes that Spark supports:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

I strongly recommend that you read the official documentation on Submitting Applications.
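For example, with the jar and main class from the question, a local test run might look like this (the master URL local[2] is only an assumption for a local sanity check; point it at your actual cluster instead):

./bin/spark-submit \
  --class mgm.tp.bigdata.ma_spark.SparkMain \
  --master local[2] \
  ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar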

This is most likely because the akka conf file inside the akka jar got overwritten or left out when the fat jar was packaged.

You can try another plugin called maven-shade-plugin. In the pom.xml you need to specify how conflicts between resources with the same name should be resolved. Below is an example:

             <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <minimizeJar>false</minimizeJar>
                            <createDependencyReducedPom>false</createDependencyReducedPom>
                            <artifactSet>
                                <includes>
                                    <!-- Include here the dependencies you want to be packed in your fat jar -->
                                    <include>my.package.etc....:*</include>
                                </includes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

Note the <transformers> section, which instructs the shade plugin to append the contents of these resources rather than replacing them.
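One way to sanity-check the result: after running the shaded build, confirm that a merged reference.conf actually ended up inside the jar (the target/ path and jar name below are assumptions based on the question's artifactId and version; they depend on your shade configuration):

# confirm the merged reference.conf is inside the shaded jar
jar tf target/ma-spark-0.0.1-SNAPSHOT.jar | grep reference.conf
# print the first lines of the merged file to inspect it
unzip -p target/ma-spark-0.0.1-SNAPSHOT.jar reference.conf | head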

This worked for me.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>1.5</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>allinone</shadedClassifierName>
        <artifactSet>
          <includes>
            <include>*:*</include>
          </includes>
        </artifactSet>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/spring.handlers</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/spring.schemas</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <manifestEntries>
              <Main-Class>com.echoed.chamber.Main</Main-Class>
            </manifestEntries>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

The ConfigException$Missing error indicates that the akka configuration file, i.e. the reference.conf file, is not bundled into the application jar. The likely reason is that when multiple files with the same name are present in different dependency jars, the default strategy checks whether they are all identical; if they are not, it ignores the file.

I ran into the same problem, and this is how I solved it:

Generate a merged reference.conf using the AppendingTransformer: by a merged reference.conf file I mean that all dependency modules such as akka-core, akka-http, akka-remoting, etc. that contain a resource named reference.conf are appended together by the AppendingTransformer. We add the AppendingTransformer to the pom file as follows:

 <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
     <resource>reference.conf</resource>
 </transformer>

mvn clean install will now generate the fat jar with the merged reference.conf file.

Still the same error: when I deployed my spark-app on EMR with spark-submit <main-class> <app.jar>, I still got the same error.

Reason: since HDFS is the configured file system, Spark jobs on an EMR cluster read from HDFS by default. Therefore, the file you want to use must already exist in HDFS. I added the reference.conf file to HDFS as follows:

1. Extract the reference.conf file from app.jar into the /tmp folder:
    `cd /tmp`
    `jar xvf path_to_application.jar reference.conf`
2. Copy the extracted reference.conf from the local path (in this case /tmp) to an HDFS path (e.g. /user/hadoop):
    `hdfs dfs -put /tmp/reference.conf /user/hadoop`
3. Load the config as follows:
    `val parsedConfig = ConfigFactory.parseFile(new File("/user/hadoop/reference.conf"))`
    `val config = ConfigFactory.load(parsedConfig)`

Alternative:

  • Extract the reference.conf file from the app.jar file and copy it to the same path on all nodes of the EMR cluster, for both the driver and the executors.
  • ConfigFactory.parseFile(new File("/tmp/reference.conf")) will then read reference.conf from the local file system; see the sketch below. Hope this helps and saves you some debugging time!
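A minimal Scala sketch of that alternative, assuming reference.conf has been copied to /tmp on every node:

import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

// parse the reference.conf that was copied onto the local file system of each node
val parsedConfig: Config = ConfigFactory.parseFile(new File("/tmp/reference.conf"))
// merge it with whatever configuration is already on the classpath and resolve substitutions
val config: Config = ConfigFactory.load(parsedConfig)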