How to consume a blob file?

Question

I am uploading xlsx files to Google Cloud Storage. When I later want to reuse them, what I get back is a blob object.

From there I don't know how to get at the actual xlsx file.

from google.cloud import storage

from openpyxl import load_workbook

client = storage.Client()
new_bucket = client.get_bucket('bucket.appspot.com')

#get blob object:
o = new_bucket.get_blob('old_version.xlsx')

# <Blob: blobstorage.appspot.com, old_version.xlsx, 16372393787851916>

#download the object

bytes_version = o.download_as_bytes()

#load it to openpyxl library
wb = load_workbook(filename = bytes_version ,data_only=True)

InvalidFileException: openpyxl does not support b'.xmlpk\x05\x06\x00\x00\x00\x00:\x00:\x00n\x10\x00\x00\xa6\x06\x01\x00\x00\x00' file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm

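The end goal is to download the files as objects and read them with the openpyxl library. This works with the original files, but once they are stored in the cloud I can't find a way to get my xlsx files back.

Thanks for the help!

Edit: added the current code.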

Answer 1

Your code reads the Cloud Storage blob into memory:

bytes_version = o.download_as_bytes()

It then tries to load the workbook from memory:

wb = load_workbook(filename = bytes_version ,data_only=True)

However, load_workbook() expects a file name or a file-like object. Passing the file contents as a byte string is not supported.

openpyxl.reader.excel.load_workbook(filename, read_only=False, keep_vba=False, data_only=False, keep_links=True)

Parameters:
filename (string or a file-like object open in binary mode c.f., zipfile.ZipFile) – the path to open or a file-like object

Documentation
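To illustrate what "a file-like object open in binary mode" means, here is a minimal sketch that uses only the standard library. An .xlsx file is a ZIP archive, so zipfile.ZipFile stands in for openpyxl here. The archive contents, the member name sheet.xml, and both function names are made up for illustration.

```python
import io
import zipfile


def make_zip_bytes():
    """Build a tiny ZIP archive in memory (a stand-in for downloaded .xlsx bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("sheet.xml", "<sheet/>")  # hypothetical member name
    return buffer.getvalue()


def list_members(data):
    """Wrap raw bytes in BytesIO so they behave like a file opened in binary mode."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


raw = make_zip_bytes()
print(list_members(raw))
```

The raw bytes themselves have no read() or seek() methods. io.BytesIO supplies them, which is what readers that accept file-like objects rely on.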

Solution:

First save the Cloud Storage blob to a file on local disk, then pass that file name when calling load_workbook():

o.download_to_filename('/path/to/file')
wb = load_workbook(filename = '/path/to/file' ,data_only=True)

Note: replace /path/to/file with a real path on your system that ends in the .xlsx file extension.
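If you would rather not manage the path yourself, the download-to-disk approach can be sketched with a temporary directory. This assumes blob is a google.cloud.storage Blob such as the object returned by get_blob() in the question. The function name and the file name workbook.xlsx are hypothetical.

```python
import os
import tempfile

from openpyxl import load_workbook


def load_workbook_via_temp_file(blob, data_only=True):
    """Download a blob to a temporary .xlsx file and open it with openpyxl.

    `blob` is assumed to be a google.cloud.storage Blob; only its
    download_to_filename() method is used here.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # The .xlsx extension matters: openpyxl checks it for file paths.
        path = os.path.join(tmp_dir, "workbook.xlsx")
        blob.download_to_filename(path)
        # In the default (non read-only) mode the workbook is read fully
        # into memory, so the file can be removed once this returns.
        return load_workbook(filename=path, data_only=data_only)
```

With the objects from the question, the call would be wb = load_workbook_via_temp_file(o).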

Answer 2

It should be as simple as this (assuming Python 3):

import io  # Python3
wb = load_workbook(io.BytesIO(bytes_version))
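A slightly fuller sketch of the same idea, with the import spelled out and a small helper to check that the workbook really loaded. The names load_workbook_from_bytes and first_row_values are hypothetical helpers, not part of openpyxl.

```python
import io

from openpyxl import load_workbook


def load_workbook_from_bytes(data, data_only=True):
    """Open an .xlsx workbook held in memory as bytes,
    e.g. the result of blob.download_as_bytes()."""
    # BytesIO turns the bytes into a binary file-like object,
    # which load_workbook() accepts in place of a file name.
    return load_workbook(filename=io.BytesIO(data), data_only=data_only)


def first_row_values(workbook):
    """Return the values in the first row of the active sheet."""
    sheet = workbook.active
    rows = sheet.iter_rows(min_row=1, max_row=1, values_only=True)
    return list(next(rows))
```

With the variable from the question this would be wb = load_workbook_from_bytes(bytes_version), and nothing is written to disk.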

