Elephas not loaded in PySpark: No module named elephas.spark_model
I am trying to distribute Keras training on a cluster and am using Elephas for this. However, when running the basic example from the Elephas documentation (https://github.com/maxpumperla/elephas):
```python
from elephas.utils.rdd_utils import to_simple_rdd
from elephas.spark_model import SparkModel
from elephas import optimizers as elephas_optimizers

rdd = to_simple_rdd(sc, x_train, y_train)

sgd = elephas_optimizers.SGD()
spark_model = SparkModel(sc, model, optimizer=sgd, frequency='epoch', mode='asynchronous', num_workers=2)
spark_model.train(rdd, nb_epoch=epochs, batch_size=batch_size, verbose=1, validation_split=0.1)
```
I get the following error:
ImportError: No module named elephas.spark_model
```
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5.0 (TID 58, xxxx, executor 8): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/xx/xx/hadoop/yarn/local/usercache/xx/appcache/application_151xxx857247_19188/container_1512xxx247_19188_01_000009/pyspark.zip/pyspark/worker.py", line 163, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/xx/xx/hadoop/yarn/local/usercache/xx/appcache/application_151xxx857247_19188/container_1512xxx247_19188_01_000009/pyspark.zip/pyspark/worker.py", line 54, in read_command
command = serializer._read_with_length(file)
File /yarn/local/usercache/xx/appcache/application_151xxx857247_19188/container_1512xxx247_19188_01_000009/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
return self.loads(obj)
File "/yarn//local/usercache/xx/appcache/application_151xxx857247_19188/container_1512xxx247_19188_01_000009/pyspark.zip/pyspark/serializers.py", line 454, in loads
return pickle.loads(obj)
ImportError: No module named elephas.spark_model
at org.apache.spark.api.python.PythonRunner$$anon.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```
Furthermore, the model is actually created; I can call `print(spark_model)` and get `<elephas.spark_model.SparkModel object at 0x7efce0abfcd0>`. The error occurs during `spark_model.train`.
I installed elephas with `pip2 install git+https://github.com/maxpumperla/elephas`, which may be related.
I use PySpark 2.1.1, Keras 2.1.4 and Python 2.7.
I have tried running it with spark-submit:

```shell
PYSPARK_DRIVER_PYTHON=`which python` spark-submit --driver-memory 1G filename.py
```

and also directly in a Jupyter Notebook. Both lead to the same problem.
Can anyone give me a pointer? Is this an elephas issue or a PySpark problem?
Edit: I also uploaded a zip of the virtual environment and call it in the script:
```shell
virtualenv spark_venv --relocatable
cd spark_venv
zip -qr ../spark_venv.zip *

PYSPARK_DRIVER_PYTHON=`which python` spark-submit --driver-memory 1G --py-files spark_venv.zip filename.py
```
Then, in the script, I do:

```python
sc.addPyFile("spark_venv.zip")
```
After this, keras imports without any problems, but I still get the elephas error above.
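The mechanism behind `--py-files` and `sc.addPyFile` is simply that the zip is put on each worker's `sys.path`, and Python's zip importer can only load pure-Python packages from it; compiled extension modules inside a zipped virtualenv cannot be loaded this way. A minimal, self-contained sketch of that mechanism (package and value names are made up for illustration):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip containing a pure-Python package, mimicking what
# --py-files / sc.addPyFile ship to the executors.
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mypkg/__init__.py", "")
    zf.writestr("mypkg/core.py", "VALUE = 42\n")

# Python imports pure-Python code straight from a zip on sys.path;
# compiled extensions (e.g. parts of a full virtualenv) cannot be
# loaded this way, which is why zipping a whole venv can still fail.
sys.path.insert(0, zip_path)
from mypkg.core import VALUE
print(VALUE)  # 42
```

This is why shipping a whole virtualenv as `--py-files` often appears to work for some imports but not others.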
You should add the elephas library as an argument to your spark-submit command.
Quoting the official guide:
For Python, you can use the `--py-files` argument of `spark-submit` to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files we recommend packaging them into a .zip or .egg.
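One way to apply this (a sketch; the `deps/` folder name and `filename.py` script are placeholders, and this only covers pure-Python dependencies — compiled packages such as numpy must already be installed on the executors):

```shell
# Sketch: install elephas and its pure-Python dependencies into a
# local folder, zip the folder's contents, and ship the zip so the
# executors can import elephas from it.
pip install elephas -t deps
cd deps && zip -qr ../deps.zip . && cd ..
spark-submit --py-files deps.zip filename.py
```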
I found a solution for how to correctly load a virtual environment onto the master and all slave workers:
```shell
virtualenv venv --relocatable
cd venv
zip -qr ../venv.zip *

PYSPARK_PYTHON=./SP/bin/python spark-submit --master yarn --deploy-mode cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./SP/bin/python --driver-memory 4G --archives venv.zip#SP filename.py
```
More details in the GitHub issue: https://github.com/maxpumperla/elephas/issues/80#issuecomment-371073492