OverflowError when reading from S3 - signed integer is greater than maximum
I am reading a large file (>5 GB) from S3 into a Lambda with the following code:
import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    response = s3.get_object(
        Bucket="my-bucket",
        Key="my-key"
    )
    text_bytes = response['Body'].read()
    ...
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
But I get the following error:
"errorMessage": "signed integer is greater than maximum"
"errorType": "OverflowError"
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 13, in lambda_handler\n text_bytes = response['Body'].read()\n"
" File \"/var/runtime/botocore/response.py\", line 77, in read\n chunk = self._raw_stream.read(amt)\n"
" File \"/var/runtime/urllib3/response.py\", line 515, in read\n data = self._fp.read() if not fp_closed else b\"\"\n"
" File \"/var/lang/lib/python3.8/http/client.py\", line 472, in read\n s = self._safe_read(self.length)\n"
" File \"/var/lang/lib/python3.8/http/client.py\", line 613, in _safe_read\n data = self.fp.read(amt)\n"
" File \"/var/lang/lib/python3.8/socket.py\", line 669, in readinto\n return self._sock.recv_into(b)\n"
" File \"/var/lang/lib/python3.8/ssl.py\", line 1241, in recv_into\n return self.read(nbytes, buffer)\n"
" File \"/var/lang/lib/python3.8/ssl.py\", line 1099, in read\n return self._sslobj.read(len, buffer)\n"
]
I am using Python 3.8, and I found this Python 3.8/3.9 issue which may be the cause: https://bugs.python.org/issue42853
Is there any workaround?
As described in the bug you linked, the core problem is a Python 3.8 bug when reading more than 1 GiB at a time. You can read the file in chunks instead, using a variant of the workaround suggested in the bug report:
import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    response = s3.get_object(
        Bucket="-example-bucket-",
        Key="path/to/key.dat"
    )
    # Pre-allocate a buffer for the whole object, then fill it in
    # 64 MiB chunks to stay well under the 1 GiB per-read limit.
    buf = bytearray(response['ContentLength'])
    view = memoryview(buf)
    pos = 0
    while True:
        chunk = response['Body'].read(67108864)  # 64 MiB
        if len(chunk) == 0:
            break
        view[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
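The chunked-read loop can also be factored into a helper that works on any file-like object, which makes it easy to test locally without S3. This is a sketch; the function name and default chunk size are illustrative, and with S3 you would pass response['Body'] and response['ContentLength'] instead of the in-memory stream shown here:

```python
import io

def read_all_chunked(stream, total_length, chunk_size=64 * 1024 * 1024):
    """Read up to total_length bytes from stream in chunks of at most
    chunk_size bytes, avoiding any single read larger than 1 GiB."""
    buf = bytearray(total_length)
    view = memoryview(buf)
    pos = 0
    while pos < total_length:
        chunk = stream.read(min(chunk_size, total_length - pos))
        if not chunk:
            break  # stream ended early
        view[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    return bytes(buf[:pos])

# Usage with an in-memory stream standing in for response['Body']:
data = read_all_chunked(io.BytesIO(b"x" * 1000), 1000, chunk_size=256)
```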
At best, though, you will spend a minute or more of every Lambda run just reading from S3. It would be better to store the file in EFS and read it from there inside the Lambda, or use another solution such as ECS, so you avoid reading from a remote data source on every invocation.