Unable to mount Azure Data Lake Storage Gen 2 with Azure Databricks
I am trying to mount an Azure Data Lake Storage Gen2 account using a service principal and OAuth 2.0, as explained here:
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<key-name-for-service-credential>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)
The service principal I am using has the Storage Blob Data Contributor role at the storage account level, and also has rwx access at the container level.
Still, I get this error:
ExecutionError: An error occurred while calling o242.mount.
: HEAD https://<storage-account-name>.dfs.core.windows.net/<file-system-name>?resource=filesystem&timeout=90
StatusCode=403
StatusDescription=This request is not authorized to perform this operation.
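For what it's worth, the same OAuth settings can also be applied per-session for direct access instead of a mount; a minimal sketch, reusing the configs dict and placeholders from above:

# Push the same OAuth configuration onto the Spark session, then read the
# filesystem through the ABFS driver directly rather than via a mount point.
for key, value in configs.items():
    spark.conf.set(key, value)

dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/")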
I even tried to access it directly using the storage account access key, as shown here, but with no success:
# with account-key auth, the secret referenced here must hold the
# storage account access key itself
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<key-name-for-service-credential>"),
)

dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")
The thing is, I can interact with this storage account via the Azure CLI without any problem:
az login --service-principal --username <application-id> --tenant <directory-id>
az storage container list --account-name <storage-account-name> --auth-mode login
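For completeness, the same check can be done from plain Python with the Azure SDK; a sketch, assuming the azure-identity and azure-storage-file-datalake packages are installed:

from getpass import getpass

from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Authenticate as the same service principal used for the CLI login above.
credential = ClientSecretCredential(
    tenant_id="<directory-id>",
    client_id="<application-id>",
    client_secret=getpass(),
)
service = DataLakeServiceClient(
    account_url="https://<storage-account-name>.dfs.core.windows.net",
    credential=credential,
)

# Listing the file systems exercises the same data-plane permission
# as `az storage container list --auth-mode login`.
for fs in service.list_file_systems():
    print(fs.name)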
Moreover, the REST API works fine from my machine, yet I get an AuthorizationFailure on the cluster:
from getpass import getpass

import requests
from msal import ConfidentialClientApplication

client_id = "<application-id>"
client_password = getpass()
authority = "https://login.microsoftonline.com/<directory-id>"
scope = ["https://storage.azure.com/.default"]

app = ConfidentialClientApplication(
    client_id, authority=authority, client_credential=client_password
)
tokens = app.acquire_token_for_client(scopes=scope)

headers = {
    "Authorization": "Bearer " + tokens["access_token"],
    "x-ms-version": "2019-07-07",  # THIS IS REQUIRED OTHERWISE I GET A 400 RESPONSE
}

endpoint = (
    "https://<account-name>.dfs.core.windows.net/<filesystem>//?action=getAccessControl"
)
response = requests.head(endpoint, headers=headers)
print(response.headers)
The firewall is set to only allow trusted Microsoft services to access the storage account.
Have I fallen into a black hole, or has anyone run into this same problem with Databricks? Could it be caused by the ABFS driver?
It was indeed the firewall settings. Thanks Axel R!
I was misled because I also have an ADLS Gen 1 account with the same firewall settings and no problem there.
But the devil is in the details: the Gen 1 firewall exception allows all Azure services to access the resource, whereas Gen 2 only allows trusted Azure services.
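If you want to check this on your own accounts, here is a sketch that dumps an account's firewall rules through the Azure management REST API, reusing the MSAL pattern from the question. <subscription-id> and <resource-group> are placeholders, and the service principal needs a management-plane role (e.g. Reader) on the account:

import requests
from msal import ConfidentialClientApplication

app = ConfidentialClientApplication(
    "<application-id>",
    authority="https://login.microsoftonline.com/<directory-id>",
    client_credential="<client-secret>",
)
# Management-plane token; note the scope differs from the storage scope above.
token = app.acquire_token_for_client(scopes=["https://management.azure.com/.default"])

endpoint = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.Storage"
    "/storageAccounts/<storage-account-name>?api-version=2019-06-01"
)
response = requests.get(
    endpoint,
    headers={"Authorization": "Bearer " + token["access_token"]},
)
# networkAcls shows defaultAction, bypass ("AzureServices" is the
# "trusted Microsoft services" exception), and any IP/VNet rules.
print(response.json()["properties"]["networkAcls"])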
Hope this helps.