How to develop a Python library in Databricks without packaging and installing it after every single change?

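For simplicity, assume I have 2 Python scripts: one is the main script and one is the library. My question is how I can test my library from main without building and installing the library every time.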

For a single file this is easy to do, as answered here. But what if I have nested libraries?

The idea is to develop in Databricks the same way as in JupyterLab.

There are two approaches:

  1. Use %run (doc) to include the "library" notebook in the "main" notebook. After each change to the library you need to re-execute that %run cell. A full example of this approach can be found in this file.

  2. Use the new Databricks Repos feature called arbitrary files. In this case your library code should live in Python files, together with the corresponding __init__.py (right now you can't use notebooks for this), and you then include it as a "normal" Python package with the import command. To automatically reload changes from the package you need to use special magic commands, as shown in another example:

%load_ext autoreload
%autoreload 2
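
For the nested-library case from the question, here is a minimal sketch of what such a package could look like and how a change gets picked up. The package name my_lib, the module utils/helpers.py and the function greet are all hypothetical, and a temporary directory stands in for the repo root. Since the %autoreload magic only works inside IPython/notebooks, this standalone sketch uses importlib.reload as a manual substitute; it is not what autoreload does internally, just the closest plain-Python equivalent.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing .pyc files so this quick demo never picks up stale bytecode.
sys.dont_write_bytecode = True

# Hypothetical layout (a temp dir stands in for the repo root):
#   <repo_root>/my_lib/__init__.py
#   <repo_root>/my_lib/utils/__init__.py
#   <repo_root>/my_lib/utils/helpers.py
repo_root = Path(tempfile.mkdtemp())
nested = repo_root / "my_lib" / "utils"
nested.mkdir(parents=True)
(repo_root / "my_lib" / "__init__.py").write_text("")
(nested / "__init__.py").write_text("")
(nested / "helpers.py").write_text("def greet():\n    return 'v1'\n")

# Make the package importable; needed here because the temp dir
# is not on sys.path by default.
sys.path.insert(0, str(repo_root))
importlib.invalidate_caches()

from my_lib.utils import helpers

first = helpers.greet()

# Simulate editing the library file, then reload it manually.
(nested / "helpers.py").write_text("def greet():\n    return 'version 2'\n")
importlib.reload(helpers)
second = helpers.greet()

print(first, second)
```

In a notebook with the two magic commands above enabled, the explicit importlib.reload call should not be necessary: re-running the cell that calls helpers.greet() is expected to pick up the edited file.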

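The second approach has more advantages, because it lets you take the code and, for example, build a library from it, or apply more code checks, which isn't possible with notebooks out of the box.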

P.S. My repository shows a full example of how to use Databricks Repos and run code tests in notebooks from a CI/CD pipeline.
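
To give a rough idea of what running tests from a notebook cell can look like, here is a minimal sketch using the standard unittest module. It is not taken from the repository mentioned above; the function add_one and the test class are hypothetical placeholders for code you would normally import from your package.

```python
import unittest


def add_one(x):
    # Hypothetical library function; in practice this would be
    # imported from the package in the repo.
    return x + 1


class AddOneTest(unittest.TestCase):
    def test_add_one(self):
        self.assertEqual(add_one(1), 2)


# Build and run the suite programmatically instead of unittest.main(),
# which would try to parse the notebook kernel's command-line arguments.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddOneTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A CI/CD job could then check result.wasSuccessful() and fail the run when it is False.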