Azure Databricks: can't connect to Azure Data Lake Storage Gen2

I have storage account kagsa1 with container cont1 in it, and I need it to be accessible (mounted) from Databricks.

If I use the storage account key from Key Vault, it works fine:

configs = {
    "fs.azure.account.key.kagsa1.blob.core.windows.net":
        dbutils.secrets.get(scope = "kv-db1", key = "storage-account-access-key")
}

dbutils.fs.mount(
  source = "wasbs://cont1@kagsa1.blob.core.windows.net",
  mount_point = "/mnt/cont1",
  extra_configs = configs)

dbutils.fs.ls("/mnt/cont1")

..but if I try to connect using Azure Active Directory credentials:

configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")

..it fails:

ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: GET https://kagsa1.dfs.core.windows.net/cont1?resource=filesystem&maxResults=5000&timeout=90&recursive=false
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=AuthorizationPermissionMismatch
ErrorMessage=This request is not authorized to perform this operation using this permission.

The Databricks workspace tier is Premium,
the cluster has the Azure Data Lake Storage credential passthrough option enabled,
the storage account has the hierarchical namespace option enabled,
and the file system was initialized with:
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

and I have full access to the container in the storage account.
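As a side check, the cluster-side passthrough settings can be confirmed from the notebook itself; this is a hedged sketch, assuming spark.databricks.passthrough.enabled is the flag Databricks sets on passthrough-enabled clusters:

# Both of these should be set on a cluster with credential passthrough enabled.
print(spark.conf.get("spark.databricks.passthrough.enabled", "false"))
print(spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName", "<not set>"))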

What am I doing wrong?

Note: When you perform the steps in the Assign the application to a role section, make sure to assign the Storage Blob Data Contributor role to the service principal.

As part of the repro, I granted the service principal the Owner role and tried to run dbutils.fs.ls("/mnt/azure/"), which returned the same error message as above. That is expected: Owner is a control-plane role and does not by itself grant data-plane access to the blob data.
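To confirm that this is a data-plane permission issue independent of Databricks, a minimal sketch using the azure-identity and azure-storage-file-datalake packages (the credential placeholders below are hypothetical) reproduces the same 403 until a Storage Blob Data role is granted:

from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Hypothetical service principal credentials -- replace with your own values.
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<app-client-id>",
    client_secret="<client-secret>",
)

# Same storage account and container as in the question.
service = DataLakeServiceClient(
    account_url="https://kagsa1.dfs.core.windows.net",
    credential=credential,
)

# Without a Storage Blob Data role this raises an HttpResponseError with
# AuthorizationPermissionMismatch (403); with the role it lists the paths.
fs = service.get_file_system_client("cont1")
for path in fs.get_paths():
    print(path.name)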

Now assign the Storage Blob Data Contributor role to the service principal.

Finally, after assigning the Storage Blob Data Contributor role to the service principal, I was able to get the output without any error message.
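Once the required Storage Blob Data role is in place for the identity in question, the passthrough configs from the question should also work for a mount rather than only a direct listing; a minimal sketch following the Databricks credential passthrough docs (the mount point name /mnt/cont1-aad is just an example) is:

configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# Reads and writes under the mount then run as the signed-in AAD user.
dbutils.fs.mount(
    source = "abfss://cont1@kagsa1.dfs.core.windows.net/",
    mount_point = "/mnt/cont1-aad",
    extra_configs = configs)

dbutils.fs.ls("/mnt/cont1-aad")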

For more details, refer to "Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark".

Reference: Azure Databricks - ADLS Gen2 throws 403 error message.