Why does Spark on YARN in cluster mode fail with "Exception in thread "Driver" java.lang.NullPointerException"?
I'm using emr-5.4.0 with Spark 2.1.0. I understand what a NullPointerException is; this question is about why one was thrown in this particular case. I cannot really figure out why I got a NullPointerException in the driver thread. This odd job of mine keeps failing with this error:
18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a separate Thread
18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context initialization...
Exception in thread "Driver" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:637)
18/03/29 20:07:52 ERROR ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: SparkContext is null but app is still running!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:415)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main.apply$mcV$sp(ApplicationMaster.scala:766)
at org.apache.spark.deploy.SparkHadoopUtil$$anon.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:764)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/03/29 20:07:52 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.IllegalStateException: SparkContext is null but app is still running!)
18/03/29 20:07:52 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: java.lang.IllegalStateException: SparkContext is null but app is still running!)
18/03/29 20:07:52 INFO ApplicationMaster: Deleting staging directory hdfs://<ip-address>.ec2.internal:8020/user/hadoop/.sparkStaging/application_1522348295743_0010
18/03/29 20:07:52 INFO ShutdownHookManager: Shutdown hook called
End of LogType:stderr
I submit the job like this:
spark-submit --deploy-mode cluster --master yarn --num-executors 40 --executor-cores 16 --executor-memory 100g --driver-cores 8 --driver-memory 100g --class <package.class_name> --jars <s3://s3_path/some_lib.jar> <s3://s3_path/class.jar>
My class looks like this:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

class MyClass {

  def main(args: Array[String]): Unit = {
    val c = new MyClass()
    c.process()
  }

  def process(): Unit = {
    val sparkConf = new SparkConf().setAppName("my-test")
    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
    import sparkSession.implicits._
    ....
  }

  ...
}
Changing class MyClass to object MyClass does the trick. While we're at it, I'd also change class MyClass to object MyClass extends App and remove def main(args: Array[String]): Unit (as it is given by extends App).
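For reference, here is a minimal sketch of what the corrected entry point could look like in either variant (the second object is renamed only so both variants can be shown side by side; the job body is elided in the original, so a placeholder comment stands in for it):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Variant 1: a plain object with an explicit main method. The Scala
// compiler emits a static main forwarder for an object, which is what
// the ApplicationMaster needs to invoke the user class reflectively.
object MyClass {

  def main(args: Array[String]): Unit = {
    process()
  }

  def process(): Unit = {
    val sparkConf = new SparkConf().setAppName("my-test")
    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
    import sparkSession.implicits._
    // ... the actual job logic goes here ...
  }
}

// Variant 2: extends App, so no explicit main method is needed.
object MyClassApp extends App {
  val sparkConf = new SparkConf().setAppName("my-test")
  val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
  import sparkSession.implicits._
  // ... the actual job logic goes here ...
}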
I reported an improvement for Spark 2.3.0 - [SPARK-23830] Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object - to have this failure reported nicely to the end user.
Digging deeper into how Spark on YARN works, the following message is printed out when the ApplicationMaster of a Spark application starts the driver (which happens when you use --deploy-mode cluster --master yarn with spark-submit):

ApplicationMaster: Starting the user application in a separate Thread

Right after that INFO message you should see another one:

ApplicationMaster: Waiting for spark context initialization...

That is part of the driver initialization when the ApplicationMaster runs.
The reason for the exception Exception in thread "Driver" java.lang.NullPointerException is the following code:

val mainMethod = userClassLoader.loadClass(args.userClass)
  .getMethod("main", classOf[Array[String]])

My understanding is that because MyClass is a class, not an object, there is no static main forwarder, so getMethod resolves main as an instance method, and the following line, which invokes it reflectively with a null receiver, triggers the NullPointerException:

mainMethod.invoke(null, userArgs.toArray)
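You can reproduce this JVM behavior in isolation. The following standalone sketch (Greeter and NpeDemo are made-up names for illustration) shows that reflectively invoking an instance method with a null receiver throws exactly this NullPointerException:

import java.lang.reflect.Method

// A Scala class (not an object): main compiles to an instance method,
// so no static forwarder is generated for it.
class Greeter {
  def main(args: Array[String]): Unit = println("hello")
}

object NpeDemo {
  def main(args: Array[String]): Unit = {
    val m: Method = classOf[Greeter].getMethod("main", classOf[Array[String]])
    val noArgs: AnyRef = Array.empty[String]
    // Passing null as the receiver of an instance method makes
    // Method.invoke throw java.lang.NullPointerException, mirroring
    // what the ApplicationMaster hits with a class-based Spark app.
    try m.invoke(null, noArgs)
    catch { case e: NullPointerException => println(s"caught: $e") }
  }
}

With object MyClass, by contrast, the compiler-generated static main forwarder lets invoke(null, ...) succeed.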
The thread is indeed called Driver (as in Exception in thread "Driver" java.lang.NullPointerException), as set in these lines:

userThread.setContextClassLoader(userClassLoader)
userThread.setName("Driver")
userThread.start()
The line numbers differ because I used Spark 2.3.0 to reference the lines, while you are on emr-5.4.0 with Spark 2.1.0.