file does not exist - spark submit


I am trying to launch a Spark application with this command:

time spark-submit --master "local[4]" optimize-spark.py

But I am getting these errors:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/27 15:43:32 INFO SparkContext: Running Spark version 1.6.0
16/01/27 15:43:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/27 15:43:32 INFO SecurityManager: Changing view acls to: DamianFox
16/01/27 15:43:32 INFO SecurityManager: Changing modify acls to: DamianFox
16/01/27 15:43:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(DamianFox); users with modify permissions: Set(DamianFox)
16/01/27 15:43:33 INFO Utils: Successfully started service 'sparkDriver' on port 51613.
16/01/27 15:43:33 INFO Slf4jLogger: Slf4jLogger started
16/01/27 15:43:33 INFO Remoting: Starting remoting
16/01/27 15:43:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.102:51614]
16/01/27 15:43:33 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 51614.
16/01/27 15:43:33 INFO SparkEnv: Registering MapOutputTracker
16/01/27 15:43:33 INFO SparkEnv: Registering BlockManagerMaster
16/01/27 15:43:33 INFO DiskBlockManager: Created local directory at /private/var/folders/8m/h5qcvjrn1bs6pv0c0_nyqrlm0000gn/T/blockmgr-defb91b0-50f9-45a7-8e92-6d15041c01bc
16/01/27 15:43:33 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/01/27 15:43:33 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/27 15:43:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/27 15:43:33 INFO SparkUI: Started SparkUI at http://192.168.0.102:4040
16/01/27 15:43:33 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/Project/MinimumFunction/optimize-spark.py does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
    at org.apache.spark.SparkContext$$anonfun.apply(SparkContext.scala:491)
    at org.apache.spark.SparkContext$$anonfun.apply(SparkContext.scala:491)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
16/01/27 15:43:34 INFO SparkUI: Stopped Spark web UI at http://192.168.0.102:4040
16/01/27 15:43:34 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/27 15:43:34 INFO MemoryStore: MemoryStore cleared
16/01/27 15:43:34 INFO BlockManager: BlockManager stopped
16/01/27 15:43:34 INFO BlockManagerMaster: BlockManagerMaster stopped
16/01/27 15:43:34 WARN MetricsSystem: Stopping a MetricsSystem that is not running
16/01/27 15:43:34 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/27 15:43:34 INFO SparkContext: Successfully stopped SparkContext
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
ERROR - failed to write data to stream: <open file '<stdout>', mode 'w' at 0x10bb6e150>

16/01/27 15:43:34 INFO ShutdownHookManager: Shutdown hook called
16/01/27 15:43:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/8m/h5qcvjrn1bs6pv0c0_nyqrlm0000gn/T/spark-c00170ca-0e05-4ece-a962-f9303bce4f9f
spark-submit --master "local[4]" optimize-spark.py  6.12s user 0.52s system 187% cpu 3.539 total

How can I fix this? Is it a problem with some variable? I have been searching for quite a while, but I can't find a solution. Thanks!

Apologies for the confusion. --py-files is used to provide additional dependent Python files needed by the program, so that they can be placed on the PYTHONPATH. I tried again with the following command, which works for me on Windows with Spark 1.6:

bin\spark-submit --master "local[4]" testingpyfiles.py

Here, testingpyfiles.py is a simple Python file that prints some data to the console, and it is stored in the same directory from which I run the above command. This is the code of testingpyfiles.py:

from pyspark import SparkContext, SparkConf

# create the Spark configuration and context
conf = SparkConf().setAppName("Python App")
sc = SparkContext(conf=conf)

# distribute a small list as an RDD and print it
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)
print("Now it will print the data")
print(distData)

In your case, it looks like the path is incorrect, or there may be a problem with the permissions of the file being executed. Also make sure that optimize-spark.py is in the same directory from which you run spark-submit.
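
A quick way to rule out a path problem is to check, from the directory where you run spark-submit, that the file really exists and is readable. This is only a minimal sketch (plain Python, nothing Spark-specific; the file name is taken from your command):

    import os

    # run this from the directory in which you invoke spark-submit
    print(os.path.abspath("optimize-spark.py"))      # the absolute path that will be resolved
    print(os.path.exists("optimize-spark.py"))       # should print True
    print(os.access("optimize-spark.py", os.R_OK))   # should print True if the file is readable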

I moved the project folder to the Desktop folder and now it works.
It probably didn't work before because I had put the project in a folder whose name contains a space, so the command most likely could not find the file.
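
If you do want to keep the project in a folder whose name contains a space, quoting the path when calling spark-submit should also work; for example, with a hypothetical path such as /Users/DamianFox/My Project/:

    time spark-submit --master "local[4]" "/Users/DamianFox/My Project/optimize-spark.py"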

You can fix this in two ways:

  1. You can pass the file as an argument to --py-files, like this:

    spark-submit --master "local[4]" --py-files="<filepath>/optimize-spark.py" optimize-spark.py
    

where filepath is a path on the local file system.

  2. You can dump the optimize-spark.py file to HDFS and add it from your code:

    sc.addFile("hdfs:<filepath_on_hdfs>/optimize-spark.py")
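
For completeness, here is a minimal sketch of option 2, assuming the file has already been uploaded to HDFS (the hdfs:///tmp/... path below is only a placeholder): sc.addFile distributes the file to every node, and SparkFiles.get resolves the local copy.

    from pyspark import SparkConf, SparkContext, SparkFiles

    conf = SparkConf().setAppName("addFile example")
    sc = SparkContext(conf=conf)

    # distribute a file that already lives on HDFS (placeholder path)
    sc.addFile("hdfs:///tmp/optimize-spark.py")

    # resolve the local copy of the distributed file on this node
    print(SparkFiles.get("optimize-spark.py"))

    sc.stop()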