I/O operations with the Azure Databricks REST Jobs API

Question

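I want to use the REST Jobs API to run an Azure Databricks notebook as follows:

1. Pass a set of key:value parameters to the notebook's PySpark context
2. Run the notebook logic based on those parameters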

For point 1, I am using the following (as suggested by the documentation):

curl -n -X POST -H 'Content-Type: application/json' -d '{"name": "endpoint job", "existing_cluster_id": "xxx", "notebook_task": {"notebook_path": "path"}, "base_parameters": {"input_multiple_polygons": "input_multiple_polygons", "input_date_start": "input_date_start", "input_date_end": "input_date_end" }}' https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit

For point 2, I have tried the following approaches without success:

2.1. Approach 1: input = spark.conf.get("base_parameters", "default")

2.2. Approach 2: input = spark.sparkContext.getConf().getAll()

2.3. Approach 3:

a = dbutils.widgets.getArgument("input_multiple_polygons", "default")

b = dbutils.widgets.getArgument("input_date_start", "default")

c = dbutils.widgets.getArgument("input_date_end", "default")

input = [a,b,c]

2.4. Approach 4 (following the official documentation):

a = dbutils.widgets.get("input_multiple_polygons")

b = dbutils.widgets.get("input_date_start")

c = dbutils.widgets.get("input_date_end")

input = [a,b,c]

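The REST Jobs endpoint works fine and the run completes successfully, but none of the four approaches above seems to get the parameters into the PySpark context.

I am sure I am doing something wrong either in the curl call or in the way I retrieve the arguments, but I cannot pin down the problem. Can anyone point out where it goes wrong?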

Answer 1

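It looks like you are not including base_parameters as an element inside notebook_task; in your request it sits at the top level of the body instead. Could you try the following? I am assuming you pass the correct values for base_parameters in your real call, since the example you shared shows each parameter value identical to its parameter name.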

curl -n -X POST -H 'Content-Type: application/json' -d '{"name": "endpoint job", "existing_cluster_id": "xxx", "notebook_task": {"notebook_path": "path", "base_parameters": {"input_multiple_polygons": "input_multiple_polygons", "input_date_start": "input_date_start", "input_date_end": "input_date_end" }}}' https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit
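If you build the request body in code rather than by hand, the nesting is easier to get right. Here is a minimal Python sketch of the same payload. The helper name build_submit_payload is made up for illustration, the field names simply mirror the curl call above, and converting the values to strings is my assumption that notebook parameters are passed as strings.

```python
import json


def build_submit_payload(notebook_path, cluster_id, params):
    """Build a runs/submit body with base_parameters nested in notebook_task.

    Hypothetical helper for illustration; field names mirror the curl call.
    """
    return {
        "name": "endpoint job",
        "existing_cluster_id": cluster_id,
        "notebook_task": {
            "notebook_path": notebook_path,
            # base_parameters belongs here, not at the top level of the body.
            # Assumption: values are sent as strings.
            "base_parameters": {key: str(value) for key, value in params.items()},
        },
    }


payload = build_submit_payload(
    "path",
    "xxx",
    {
        "input_multiple_polygons": "input_multiple_polygons",
        "input_date_start": "input_date_start",
        "input_date_end": "input_date_end",
    },
)

# Serialize to the JSON string that would go in the POST body (-d in curl).
body = json.dumps(payload)
print(body)
```

With the parameters nested this way, the dbutils.widgets.get calls from your approach 4 should be able to read them by name inside the notebook.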

An easy way to see what the payload should look like is to define the job in the UI and then inspect the JSON response from api/2.0/jobs/get?job_id=<jobId>.
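Once you have that JSON, a few lines of Python can show where the parameters live. This is only a sketch: it assumes the jobs/get response nests the job definition under a settings key, and the sample response below is a hand-written, hypothetical example, not real output from a workspace.

```python
def notebook_parameters(job_response):
    """Return base_parameters from a jobs/get style response, or {} if absent."""
    settings = job_response.get("settings", {})
    task = settings.get("notebook_task", {})
    return task.get("base_parameters", {})


# Hypothetical response for a job defined in the UI (assumed shape).
sample_response = {
    "job_id": 1,
    "settings": {
        "name": "endpoint job",
        "existing_cluster_id": "xxx",
        "notebook_task": {
            "notebook_path": "path",
            "base_parameters": {"input_date_start": "input_date_start"},
        },
    },
}

# Same job, but with base_parameters misplaced next to notebook_task.
misplaced_response = {
    "job_id": 2,
    "settings": {
        "notebook_task": {"notebook_path": "path"},
        "base_parameters": {"input_date_start": "input_date_start"},
    },
}

params = notebook_parameters(sample_response)
misplaced = notebook_parameters(misplaced_response)
print(params)
print(misplaced)
```

The second call comes back empty because the parameters are outside notebook_task, which is the same mistake as in the original curl call.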

databricks

azure-databricks