How to develop a Python library in Databricks without packaging and installing it after every single change?
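For simplicity, suppose I have 2 Python scripts: one is the main script and one is the library. My question is: how can I test my library from main without building and installing the library every time?
A single file is easy to handle, as answered here. But what if I have nested libraries?
The idea is to develop in Databricks the same way one would in JupyterLab.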
There are two approaches:
1. Use %run (doc) to include the "library" notebook in the "main" notebook. After each change to the library notebook you need to re-execute that %run cell. A full example of this approach can be found in this file.
2. Use the new Databricks Repos functionality called arbitrary files. In this case your library code should live in Python files, together with the corresponding __init__.py (right now you can't use notebooks for this), and you then include it as a "normal" Python package using the import command. To automatically reload changes from the package you need to use special magic commands, as shown in another example:
%load_ext autoreload
%autoreload 2
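To make the second approach more concrete, here is a minimal sketch of a nested package and of the reload step that autoreload performs for you automatically. The names mylib, utils, text and shout are hypothetical, and the package is written to a temporary directory only so the snippet runs anywhere; in a Databricks Repo the files would simply be checked in next to the notebook. The explicit importlib.reload call is a stand-in for what %autoreload 2 does before each cell runs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip .pyc caching so a source file rewritten within the same second
# is never shadowed by stale bytecode.
sys.dont_write_bytecode = True

# Hypothetical nested layout (assumption, not from the original answer):
#   mylib/__init__.py
#   mylib/utils/__init__.py
#   mylib/utils/text.py
root = Path(tempfile.mkdtemp())
pkg = root / "mylib" / "utils"
pkg.mkdir(parents=True)
(root / "mylib" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")
(pkg / "text.py").write_text("def shout(s):\n    return s.upper()\n")

# Make the package importable, as the repo root would be in a notebook.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from mylib.utils import text

first = text.shout("hello")

# Simulate editing the library file during development.
(pkg / "text.py").write_text("def shout(s):\n    return s.upper() + '!'\n")

# Manual equivalent of what autoreload does: re-import the changed module.
importlib.reload(text)
second = text.shout("hello")

print(first, second)
```

With the autoreload magics enabled in the notebook, the explicit reload call is not needed: editing text.py and re-running the cell that calls text.shout should pick up the change.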
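The second approach has more advantages, because it lets you take the code and, for example, build a library from it, or apply additional code checks, which isn't possible with notebooks out of the box.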
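As one illustration of such a code check, library code kept in plain Python files can be covered by ordinary unit tests. The sketch below is only an assumption about how that might look: shout is a hypothetical function standing in for something that would live in a package file, and it is defined inline so the snippet is self-contained.

```python
import unittest


# Stand-in for a function that would normally be imported from a package
# file in the repo (hypothetical name, defined inline for this sketch).
def shout(s: str) -> str:
    return s.upper() + "!"


class ShoutTest(unittest.TestCase):
    def test_uppercases_and_appends_bang(self):
        self.assertEqual(shout("hello"), "HELLO!")

    def test_empty_string(self):
        self.assertEqual(shout(""), "!")


# Run the suite programmatically so it also works inside a notebook cell,
# where unittest.main() would by default read the process argv and exit.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShoutTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same test class can be run unchanged from a CI/CD job with a standard test runner, which is harder to arrange when the code only exists inside notebooks.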
P.S. My repository shows a full example of how to use Databricks Repos and run code tests in notebooks from a CI/CD pipeline.