pyspark with Zeppelin on AWS EMR gives NoClassDefFoundError
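I am running Zeppelin on EMR and using pyspark to process some log files.
I am getting this "java.lang.NoClassDefFoundError: com/amazonaws/services/s3/AmazonS3" error.
I'm not sure how to fix it. I have looked at various resources. Any help is appreciated.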
---Error log---
Py4JJavaError: An error occurred while calling o188.partitions.
: java.lang.NoClassDefFoundError: com/amazonaws/services/s3/AmazonS3
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2644)
    at org.apache.hadoop.fs.FileSystem.access0(FileSystem.java:90)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2678)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2660)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:374)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:200)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:279)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:65)
    at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:47)
    at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.s3.AmazonS3
    at java.net.URLClassLoader.run(URLClassLoader.java:366)
    at java.net.URLClassLoader.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 32 more
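Sorry for the inconvenience! This is caused by a change introduced in emr-4.2.0 that unintentionally removed the AWS Java SDK libraries from Zeppelin's effective classpath. A fix has been pushed to most regions over the past few days and will reach all remaining regions by the end of this week, so this should now work again on emr-4.2.0.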
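To check whether a given cluster still has this classpath problem, you can ask the driver JVM directly whether it can load the missing class. This is a minimal sketch, not an official AWS or Zeppelin tool. It assumes you run it in a Zeppelin `%pyspark` paragraph where `sc` is the injected SparkContext. It also relies on `sc._jvm`, which is PySpark's internal (underscore-prefixed, not public) py4j view of the driver JVM. The helper name `class_on_driver_classpath` is hypothetical.

```python
# Sketch for a Zeppelin %pyspark paragraph. Assumes `sc` is the SparkContext
# Zeppelin provides; `sc._jvm` is PySpark's internal py4j view of the driver JVM.
# `class_on_driver_classpath` is a hypothetical helper name.

AWS_S3_CLIENT_CLASS = "com.amazonaws.services.s3.AmazonS3"


def class_on_driver_classpath(spark_context, class_name=AWS_S3_CLIENT_CLASS):
    """Return True if the driver's system class loader can load class_name."""
    jvm = spark_context._jvm
    # The system class loader is the AppClassLoader that fails in the trace.
    loader = jvm.java.lang.ClassLoader.getSystemClassLoader()
    try:
        # loadClass throws ClassNotFoundException when the class is missing;
        # py4j surfaces that in Python as Py4JJavaError.
        loader.loadClass(class_name)
        return True
    except Exception:
        # Deliberately broad (Py4JJavaError is an Exception subclass) so the
        # sketch does not need to import py4j itself.
        return False
```

On an affected cluster, running `print(class_on_driver_classpath(sc))` would be expected to print `False`. Once the fix has reached your region, it should print `True`. This only inspects the driver's system class loader. It says nothing about the executors' classpath.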