Write pandas dataframe to Azure Blob - Python sdk
I am trying to upload a dataframe to a blob as a CSV file.
Here is my code:
from azure.storage.blob import BlobClient
sas_url = "https://XXX.blob.core.windows.net/YYYY?sp=r&st=2021-04-26T16:21:37Z&se=2021-04-27T00:21:37Z&spr=" \
"https&sv=2020-02-10&sr=c&sig=lJxx45wdBT%2F5ZJQwPxxxxxxxxx0%3D"
blob_client = BlobClient.from_blob_url(sas_url)
print (blob_client)
blob_client.upload_blob(data=df1.to_csv(index=False))
The error is:
Traceback (most recent call last):
File "C:\xxx\xxx\PycharmProjects\DIF\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3437, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-40ff66c54682>", line 1, in <module>
runfile('C:/xxx/xxx/PycharmProjects/DIF/venv/Scripts/SF_ADLS.py', wdir='C:/xxx/xxx/PycharmProjects/DIF/venv/Scripts')
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.4\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.4\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/xxx/xxx/PycharmProjects/DIF/venv/Scripts/SF_ADLS.py", line 99, in <module>
blob_client = BlobClient.from_blob_url(sas_url)
File "C:\Users\User\PycharmProjects\DIF\venv\lib\site-packages\azure\storage\blob\_blob_client.py", line 246, in from_blob_url
container_name, blob_name = unquote(path_blob[-2]), unquote(path_blob[-1])
IndexError: list index out of range
Second approach:
Generating the SAS token in Python code:
from datetime import datetime, timedelta
from azure.storage.blob import BlobServiceClient, generate_account_sas, ResourceTypes, AccountSasPermissions
import pandas as pd
df1 = pd.read_csv(r'C:\ccc\ccc\AppData\Roaming\JetBrains\PyCharmCE2020.1\scratches\sf_metadata.csv')
sas_token = generate_account_sas(
account_name="acct",
account_key="so1uwLUIrFluxxxxxx38MGpL5XKU/yFNIkiyyyyitQPrWQ==",
resource_types=ResourceTypes(service=True),
permission=AccountSasPermissions(read=True,write=True,delete=True,add=True,create=True,update=True),
expiry=datetime.utcnow() + timedelta(hours=1)
)
blob_service_client = BlobServiceClient(account_url="https://acct.blob.core.windows.net", credential=sas_token)
print (sas_token)
blob_client = blob_service_client.get_blob_client('testfs1', 'one', snapshot=None)
blob_client.upload_blob(data=df1.to_csv(index=False))
The error I get:
azure.core.exceptions.HttpResponseError: This request is not authorized to perform this operation using this resource type.
RequestId:03e71e74-601e-0022-2f25-3be77a000000
Time:2021-04-27T05:24:51.5741680Z
ErrorCode:AuthorizationResourceTypeMismatch
Error:None
Could you tell me what changes my code needs?
Thanks.
According to the format defined in the official documentation, your sas_url is incorrect because it is missing the blob-name:
https://<account-name>.blob.core.windows.net/<container-name>/<blob-name>?<sas-token>
You can refer to this example.
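To illustrate the URL format above, here is a minimal sketch that inserts a blob name into a container-level SAS URL before handing it to BlobClient.from_blob_url. The helper names blob_url_from_container_sas and upload_csv are made up for this example, and overwrite=True is my own choice, not something from the question. Note also that the token in the question carries sp=r (read only), so an upload would additionally need a token with write/create permission.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def blob_url_from_container_sas(container_sas_url: str, blob_name: str) -> str:
    """Return <account>/<container>/<blob-name>?<sas-token> from a container SAS URL.

    Hypothetical helper for illustration; it only rewrites the URL path.
    """
    parts = urlsplit(container_sas_url)
    # Append the (URL-encoded) blob name after the container name.
    path = parts.path.rstrip("/") + "/" + quote(blob_name)
    # The query string (the SAS token) is carried over unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def upload_csv(df, container_sas_url: str, blob_name: str) -> None:
    """Upload a pandas DataFrame as CSV. Needs a SAS token with write/create permission."""
    # Imported here so the URL helper can be used without the Azure SDK installed.
    from azure.storage.blob import BlobClient

    blob_client = BlobClient.from_blob_url(
        blob_url_from_container_sas(container_sas_url, blob_name)
    )
    # overwrite=True is an assumption: replace the blob if it already exists.
    blob_client.upload_blob(df.to_csv(index=False), overwrite=True)
```

With the question's variables, the call would look like upload_csv(df1, sas_url, "sf_metadata.csv"), where the blob name is only a placeholder.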
It is best to generate the SAS token here:
If you generate the SAS token here instead, you may get an authorization failure error:
====================== Update ======================
Please change
resource_types=ResourceTypes(service=True)
to
resource_types=ResourceTypes(object=True)
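Put together, the second approach with that change might look like the sketch below. The function names sas_expiry and upload_dataframe_with_account_sas are hypothetical, the reduced permission set and overwrite=True are my own assumptions, and the account name, key, container and blob name are whatever you pass in. Blob uploads are object-level operations, which is why the account SAS needs ResourceTypes(object=True).

```python
from datetime import datetime, timedelta, timezone


def sas_expiry(hours: int = 1) -> datetime:
    """Return a timezone-aware UTC expiry time `hours` from now (hypothetical helper)."""
    return datetime.now(timezone.utc) + timedelta(hours=hours)


def upload_dataframe_with_account_sas(df, account_name, account_key, container, blob_name):
    """Generate an account SAS scoped to objects (blobs) and upload df as CSV."""
    # Imported here so this module loads even where the Azure SDK is not installed.
    from azure.storage.blob import (
        AccountSasPermissions,
        BlobServiceClient,
        ResourceTypes,
        generate_account_sas,
    )

    sas_token = generate_account_sas(
        account_name=account_name,
        account_key=account_key,
        # object=True covers blob-level operations such as upload_blob.
        resource_types=ResourceTypes(object=True),
        # Assumption: write and create are enough for an upload.
        permission=AccountSasPermissions(read=True, write=True, create=True),
        expiry=sas_expiry(1),
    )
    blob_service_client = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=sas_token,
    )
    blob_client = blob_service_client.get_blob_client(container, blob_name)
    # overwrite=True is an assumption: replace the blob if it already exists.
    blob_client.upload_blob(df.to_csv(index=False), overwrite=True)
```

Using the names from the question, this would be called as upload_dataframe_with_account_sas(df1, "acct", account_key, "testfs1", "one"), with account_key read from configuration rather than hard-coded.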