Azure Function Python write to Azure DataLake Gen2
I want to use an Azure Function with Python to write a file to my Azure DataLake Gen2.
Unfortunately, I get the following authentication error:
Exception: ClientAuthenticationError: (InvalidAuthenticationInfo) Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
'WWW-Authenticate': 'REDACTED'
Both my account and the Function app should have the necessary roles assigned for accessing my DataLake.
This is my function:
import datetime
import logging

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    credential = DefaultAzureCredential()
    service_client = DataLakeServiceClient(account_url="https://<datalake_name>.dfs.core.windows.net", credential=credential)

    file_system_client = service_client.get_file_system_client(file_system="temp")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")

    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))

    logging.info('Python timer trigger function ran at %s', utc_timestamp)
What am I missing?
Thanks and best regards,
Peter
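The problem seems to come from DefaultAzureCredential.
The identity DefaultAzureCredential uses depends on the environment. When an access token is needed, it requests one using each of these identities in turn, stopping as soon as one provides a token: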
1. A service principal configured by environment variables.
2. An Azure managed identity.
3. On Windows only: a user who has signed in with a Microsoft application, such as Visual Studio.
4. The user currently signed in to Visual Studio Code.
5. The identity currently logged in to the Azure CLI.
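Because the chain stops at the first identity that returns a token, the identity that actually calls the storage account may not be the one you granted roles to (for example, a service principal from environment variables wins over your Azure CLI login). The sketch below is a simplified, pure-Python model of that first-success-wins behaviour; CredentialUnavailableError, first_available_token and unavailable are hypothetical names for illustration, not part of azure-identity.

```python
from typing import Callable, List, Sequence, Tuple


class CredentialUnavailableError(Exception):
    """Hypothetical error a credential source raises when it cannot supply a token."""


# Each source is (name, zero-argument function returning a token string).
TokenSource = Tuple[str, Callable[[], str]]


def first_available_token(sources: Sequence[TokenSource]) -> Tuple[str, str]:
    """Try each source in order and return (source_name, token) from the first that succeeds.

    Simplified model of a credential chain: later sources are never consulted
    once an earlier one returns a token.
    """
    errors: List[str] = []
    for name, get_token in sources:
        try:
            return name, get_token()
        except CredentialUnavailableError as exc:
            # Remember why this source was skipped, then fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise CredentialUnavailableError("no source produced a token (" + "; ".join(errors) + ")")


def unavailable(reason: str) -> Callable[[], str]:
    """Build a source that always fails, to simulate a missing identity."""
    def _get() -> str:
        raise CredentialUnavailableError(reason)
    return _get


chain = [
    ("environment", lambda: "token-for-service-principal"),
    ("managed_identity", unavailable("not running in Azure")),
    ("azure_cli", lambda: "token-for-cli-user"),
]
print(first_available_token(chain))  # ('environment', 'token-for-service-principal')
```

With the real library you can narrow the chain instead: DefaultAzureCredential accepts exclude_* keyword arguments (for example exclude_environment_credential=True), and azure.identity also provides ManagedIdentityCredential if the Function app should authenticate only with its managed identity.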
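In fact, you don't need the default credential at all to create the Data Lake service client. You can connect directly with a connection string instead: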
import logging
import datetime

from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    service_client = DataLakeServiceClient.from_connection_string(connect_str)
    file_system_client = service_client.get_file_system_client(file_system="test")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")

    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))

    return func.HttpResponse(
        "Test.",
        status_code=200
    )
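Hard-coding the account key in source code exposes it to anyone who can read the code. Azure Functions exposes application settings as environment variables, so one option is to store the connection string there. The sketch below assumes a hypothetical setting named DATALAKE_CONNECTION_STRING; the helper names are also illustrative, and the validation only checks that AccountName and AccountKey are present.

```python
import os
from typing import Dict

# Hypothetical app-setting name; use whatever name you configure on the Function app.
SETTING_NAME = "DATALAKE_CONNECTION_STRING"


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key1=Value1;Key2=Value2;...' into a dict."""
    parts: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: a base64 AccountKey can end in '=' padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key] = value
    return parts


def load_connection_string(setting: str = SETTING_NAME) -> str:
    """Read the connection string from an app setting instead of hard-coding it."""
    conn_str = os.environ.get(setting)
    if not conn_str:
        raise RuntimeError(f"app setting {setting!r} is not set")
    missing = {"AccountName", "AccountKey"} - set(parse_connection_string(conn_str))
    if missing:
        raise ValueError(f"connection string is missing: {sorted(missing)}")
    return conn_str
```

Inside the function, the client would then be created with DataLakeServiceClient.from_connection_string(load_connection_string()).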
Also, to make sure the write succeeds, check whether your Data Lake has any access restrictions configured.
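The function suggested by Bowman Zhu has a bug. According to the Azure documentation, the parameter "length" expects the length in bytes, but the suggested function passes the length in characters. Some characters consist of more than one byte; in that case the function does not write all bytes of file_contents to the file, which results in data loss!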
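The difference is easy to see with any non-ASCII text. This small sketch encodes explicitly as UTF-8 (the default of str.encode()) and models what happens if only the first "length" bytes are written when "length" is a character count; the sample string is arbitrary.

```python
file_contents = "höhe"                  # 4 characters
data = file_contents.encode("utf-8")    # "ö" takes 2 bytes in UTF-8

char_length = len(file_contents)        # what the original code passes as `length`
byte_length = len(data)                 # the length in bytes

# Writing only `char_length` bytes cuts off the tail of the encoded data.
written = data[:char_length]

print(char_length, byte_length)         # 4 5
print(written.decode("utf-8"))          # höh
```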
Therefore, the lines
file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
file_client.flush_data(len(file_contents))
must be changed to:
length = len(file_contents.encode())
file_client.append_data(data=file_contents, offset=0, length=length)
file_client.flush_data(offset=length)
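The same byte-based bookkeeping applies when a file is written in several appends: each append starts at the number of bytes already written, and the final flush position is the total byte count. The sketch below uses a hypothetical in-memory FakeFileClient whose append_data/flush_data keyword names mirror the calls above; it does not talk to Azure and does not reproduce the service's full validation rules (for instance, it only accepts sequential appends).

```python
from typing import Iterable


class FakeFileClient:
    """Hypothetical in-memory stand-in for a Data Lake file client (illustration only)."""

    def __init__(self) -> None:
        self._appended = bytearray()
        self.committed = b""

    def append_data(self, data: bytes, offset: int, length: int) -> None:
        # Simplification: this fake accepts only sequential appends.
        if offset != len(self._appended):
            raise ValueError(f"offset {offset} does not match position {len(self._appended)}")
        self._appended += data[:length]

    def flush_data(self, offset: int) -> None:
        # Commit the first `offset` bytes that were appended.
        self.committed = bytes(self._appended[:offset])


def write_text_chunks(file_client: FakeFileClient, chunks: Iterable[str]) -> int:
    """Append each text chunk as UTF-8 bytes, tracking offsets in bytes, then flush."""
    offset = 0
    for text in chunks:
        data = text.encode("utf-8")  # encode once so data and length agree
        file_client.append_data(data=data, offset=offset, length=len(data))
        offset += len(data)          # advance by bytes, not characters
    file_client.flush_data(offset=offset)
    return offset


client = FakeFileClient()
total = write_text_chunks(client, ["höhe", " und ", "weiß"])
print(total, client.committed.decode("utf-8"))  # 15 höhe und weiß
```

Passing the encoded bytes rather than the str keeps data and length consistent by construction, so the two can no longer drift apart.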