Read a .pptx file from S3

Question

I am trying to open a .pptx from Amazon S3 and read it with the python-pptx library. This is the code:

from pptx import Presentation
import boto3
s3 = boto3.resource('s3')

obj=s3.Object('bucket','key')
body = obj.get()['Body']
prs=Presentation((body))

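It gives "AttributeError: 'StreamingBody' object has no attribute 'seek'". Shouldn't this work? How can I fix it? I also tried calling read() on body first. Is there a solution that avoids actually downloading the file?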

Answer 1

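To load a file from S3, you should download it (or use a streaming strategy) and wrap the data in io.BytesIO, which pptx.Presentation can handle. Presentation needs a seekable file-like object because a .pptx is a zip archive, and the StreamingBody returned by boto3 cannot seek. Passing the raw bytes from read() does not work either, since bytes is not a file-like object.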

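import io
import boto3

from pptx import Presentation

s3 = boto3.client('s3')
# Fetch the object; 'Body' is a non-seekable StreamingBody
s3_response_object = s3.get_object(Bucket='bucket', Key='file.pptx')
# Read the whole object into memory as bytes
object_content = s3_response_object['Body'].read()

# Wrap the bytes in a seekable in-memory buffer for python-pptx
prs = Presentation(io.BytesIO(object_content))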

Reference:

Just like what we do with variables, data can be kept as bytes in an in-memory buffer when we use the io module’s Byte IO operations. journaldev
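As a variation on the same idea, here is a minimal sketch that copies the S3 body into the buffer in chunks instead of with a single read() call. The helper names to_seekable and load_presentation_from_s3 and the chunk_size parameter are made up for illustration and are not part of boto3 or python-pptx. The whole file still ends up in memory, but nothing is written to disk.

```python
import io


def to_seekable(body, chunk_size=1024 * 1024):
    """Copy a read-only stream (for example a boto3 StreamingBody)
    into a seekable in-memory buffer. Hypothetical helper."""
    buffer = io.BytesIO()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the consumer starts at the first byte
    return buffer


def load_presentation_from_s3(bucket, key):
    """Fetch an object from S3 and open it with python-pptx.
    Hypothetical helper; imports are local so to_seekable can be
    used without boto3 or python-pptx installed."""
    import boto3
    from pptx import Presentation

    s3 = boto3.client('s3')
    body = s3.get_object(Bucket=bucket, Key=key)['Body']
    return Presentation(to_seekable(body))
```

Usage would look like prs = load_presentation_from_s3('bucket', 'file.pptx'), with the same placeholder bucket and key as above.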

Tags: python, amazon-s3, boto3, python-pptx