Is there a way to import and run functions from saved .py files in a Jupyter notebook running on a Google Cloud Platform Dataproc cluster?

Question

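When running a Jupyter notebook natively, it is simple to import functions and utilities from a saved .py script.

When I work in a Jupyter notebook running on a Google Cloud Platform Dataproc cluster and try the same thing (after uploading the .py script to my Dataproc Jupyter notebook, so it is in the cloud***), I am unable to import the function into the (Dataproc) notebook.

Does anyone know how I can do this? Is it just a matter of figuring out the correct but non-obvious path? (I am trying to import a .py file from the same folder as the Jupyter notebook, so if this were running natively it would not need a path, but perhaps it is different with Dataproc?)

*** I am not making the mistake of trying to import a desktop/native .py script into the GC Dataproc notebook.

Any help or leads would be much appreciated!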

Answer 1

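Unfortunately, this is not supported. As a workaround, you can download the .py file and then import it. Details can be found in the answer to a similar question: Dataproc import python module stored in google cloud storage (gcs) bucket.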
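The download-then-import workaround could look roughly like the sketch below. It assumes the script lives in a GCS bucket and that the `google-cloud-storage` package is available in the notebook environment. The function names `download_from_gcs` and `import_from_file`, as well as the bucket and object names in the usage comment, are made up for illustration and do not come from the linked answer.

```python
import importlib.util
import sys
from pathlib import Path


def download_from_gcs(bucket_name, blob_path, dest_dir="."):
    """Copy one object from a GCS bucket to local disk and return its path."""
    # Imported lazily so the import helper below works without this package.
    from google.cloud import storage

    dest = Path(dest_dir) / Path(blob_path).name
    blob = storage.Client().bucket(bucket_name).blob(blob_path)
    blob.download_to_filename(str(dest))
    return dest


def import_from_file(py_path):
    """Load a local .py file as a module named after the file."""
    py_path = Path(py_path)
    spec = importlib.util.spec_from_file_location(py_path.stem, py_path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[py_path.stem] = module  # register so later imports find it
    spec.loader.exec_module(module)
    return module


# Hypothetical usage; the bucket and object names are placeholders:
# local_copy = download_from_gcs("your-bucket", "path/to/my_script.py")
# my_script = import_from_file(local_copy)
```

Loading the file by explicit path sidesteps the question of which directory the notebook kernel treats as its working directory.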

Answer 2

If you are using the PySpark kernel, you can add dependencies to the SparkContext.

spark.sparkContext.addPyFile(f'gs://{your_bucket}/{path_to_file}/dependencies.zip')

Your dependencies.zip should contain a folder holding all of your .py scripts and an __init__.py:

dependencies/
├── __init__.py
└── my_script.py

You can then import all dependencies with

import dependencies

or import a single dependency with

from dependencies.my_script import my_class

PS: Changes to dependencies.zip are not reflected in your imports; you must restart the PySpark kernel to use the updated scripts.
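For reference, an archive with the layout shown above can be built with the standard library. This is only a sketch: `build_dependencies_zip` is a hypothetical helper name, and it assumes the package folder already exists on local disk. Uploading the resulting file to the bucket is a separate step not shown here.

```python
import sys
import zipfile
from pathlib import Path


def build_dependencies_zip(package_dir, zip_path):
    """Zip a package folder so the folder itself sits at the archive root."""
    package_dir = Path(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for py_file in sorted(package_dir.rglob("*.py")):
            # Store entries as "dependencies/my_script.py", not absolute paths.
            arcname = py_file.relative_to(package_dir.parent).as_posix()
            zf.write(py_file, arcname)
    return Path(zip_path)


# Hypothetical local check before uploading: Python can import packages
# directly from a zip archive that is on sys.path.
# archive = build_dependencies_zip("dependencies", "dependencies.zip")
# sys.path.insert(0, str(archive))
# import dependencies
```

Keeping the folder name at the root of the archive is what makes `import dependencies` resolve once the zip is on the Python path.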

python

cluster-computing

google-cloud-platform

google-cloud-dataproc

jupyter-notebook
