Libraries conflict after upgrading the google-cloud library in my Java software running on Dataproc

I ran into a problem after upgrading the google-cloud library from version 0.8.0 to 0.32.0-alpha in my Java software running on Google Dataproc.

Here are my Maven dependencies:

<dependencies>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud</artifactId>
        <version>0.32.0-alpha</version>
    </dependency>

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.2</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.19</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.19</version>
    </dependency>

    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.12</version>
    </dependency>

    <dependency>
        <groupId>args4j</groupId>
        <artifactId>args4j</artifactId>
        <version>2.33</version>
    </dependency>

    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1</version>
    </dependency>

    <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-all</artifactId>
        <version>2.0.2-beta</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>javax.mail</groupId>
        <artifactId>mail</artifactId>
        <version>1.4</version>
    </dependency>

    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.39</version>
    </dependency>

    <dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.6</version>
    </dependency>
</dependencies>

This is the error I see in the Dataproc job output:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
    at com.google.api.gax.retrying.BasicRetryingFuture.<init>(BasicRetryingFuture.java:77)
    at com.google.api.gax.retrying.DirectRetryingExecutor.createFuture(DirectRetryingExecutor.java:73)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:73)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
    at com.google.cloud.bigquery.BigQueryImpl.getTable(BigQueryImpl.java:375)
    at com.google.cloud.bigquery.BigQueryImpl.getTable(BigQueryImpl.java:366)
    at com.finscience.job.link.LinkAnalyzerProcess.process(LinkAnalyzerProcess.java:210)
    at com.finscience.job.link.LinkAnalyzerJob.main(LinkAnalyzerJob.java:140)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The problem seems to be related to the google.api.gax library.

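After some searching, I found that some people had solved this by excluding Guava from their Maven dependencies, so I changed my google-cloud dependency like this: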

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud</artifactId>
    <version>0.32.0-alpha</version>
    <exclusions>
        <exclusion>
            <artifactId>com.google</artifactId>
            <groupId>guava</groupId>
        </exclusion>
    </exclusions>
</dependency>

Unfortunately, this did not solve my problem.

The GitHub page of the google-cloud library (https://github.com/googlecloudplatform/google-cloud-java) says:

The easiest way to solve version conflicts is to use google-cloud's BOM

So I added the following dependency to my pom:

<dependency>
   <groupId>com.google.cloud</groupId>
   <artifactId>google-cloud-bom</artifactId>
   <version>0.41.0-alpha</version>
   <type>pom</type>
   <scope>import</scope>
</dependency>

But even then, the problem was not solved.

Can anyone help me?

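You are most likely running into a conflict with the Spark and Hadoop jars on the Dataproc cluster. See the documentation on how to repackage your jar to handle Hadoop/Spark conflicts.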
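
As a rough illustration of what such repackaging can look like: MoreExecutors.directExecutor() was only added in Guava 18.0, so the NoSuchMethodError above suggests that an older Guava from the cluster classpath is being loaded ahead of the one your application needs. One common approach is to build an uber jar with the maven-shade-plugin and relocate the Guava packages. The snippet below is a minimal sketch, not a verified fix for this exact project: the plugin version and the "repackaged" prefix are assumptions chosen for illustration, and other conflicting packages may need their own relocation entries.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <!-- Assumed version; use whichever release fits your build. -->
            <version>3.1.0</version>
            <executions>
                <execution>
                    <!-- Build the shaded (uber) jar during "mvn package". -->
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <relocations>
                            <!-- Move the bundled Guava classes to a private package
                                 so they cannot clash with the cluster's copy.
                                 The "repackaged" prefix is an arbitrary choice. -->
                            <relocation>
                                <pattern>com.google.common</pattern>
                                <shadedPattern>repackaged.com.google.common</shadedPattern>
                            </relocation>
                        </relocations>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

With this in place, the jar produced by mvn package carries its own relocated copy of Guava, and the references inside the bundled google-cloud classes are rewritten to point at it. Submit that shaded jar to Dataproc instead of the plain one.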