How to develop a Python library in DataBricks without packaging and installing after every single change?

Question

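For simplicity, let's say I have 2 Python scripts: one is the main script and one is the library. My question is how I can test my library from main without building and installing the library every time.

For a single file this can be done easily, as answered here. But what if I have a nested library?

The idea is to develop in DataBricks the same way as in JupyterLab.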

Answer 1

There are two approaches:

Use %run (doc) to include the "library" notebook in the "main" notebook. You need to re-execute that %run cell after each change to the library. A full example of this approach can be found in this file.
Use the new functionality of Databricks Repos called arbitrary files. In this case, your library code should live in Python files, together with the corresponding __init__.py (right now you can't use notebooks), and you then include it as a "normal" Python package using the import command. To automatically reload changes from the package, you need to use special magic commands, as shown in another example:

%load_ext autoreload
%autoreload 2

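The second approach has more advantages, because it lets you take the code and, for example, build a library from it or apply more code checks, which isn't possible with notebooks out of the box.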
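
For the nested-library case from the question, here is a minimal plain-Python sketch of what the second approach relies on. It builds a throwaway nested package, imports it, edits the source, and reloads the module by hand with importlib.reload, which is roughly what %autoreload 2 automates before each cell runs. The names mylib, utils, text and greet are hypothetical, and the temporary directory only stands in for a repo folder that is importable from the notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip .pyc files so the quick edit below cannot be shadowed by a cached copy.
sys.dont_write_bytecode = True

# Hypothetical nested layout:
#   mylib/__init__.py
#   mylib/utils/__init__.py
#   mylib/utils/text.py
root = Path(tempfile.mkdtemp())
pkg = root / "mylib" / "utils"
pkg.mkdir(parents=True)
(root / "mylib" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")
(pkg / "text.py").write_text("def greet():\n    return 'v1'\n")

# Make the package importable, as if it sat next to the notebook.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from mylib.utils import text

first = text.greet()

# Simulate editing the library source while the session is still running.
(pkg / "text.py").write_text("def greet():\n    return 'version 2'\n")
importlib.invalidate_caches()

# Manual reload; with %autoreload 2 this step would not be needed in a notebook.
importlib.reload(text)
second = text.greet()

print(first, second)
```

Note that a manual reload only refreshes the module you pass to it, so a change deep inside a nested package means reloading that specific submodule. That is the main reason the autoreload magic is more convenient during development.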

P.S. My repository shows a full example of how to use Databricks Repos and run tests of the code in notebooks from a CI/CD pipeline.
