Is there a way to efficiently fetch all the results from a medium-sized DynamoDB table?
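I'm using boto3 with Python, but I believe the problem and the logic should be the same in any language.
I know that table.scan() should in theory return all records, but in practice each scan() call returns at most 1 MB of data. The recommended approach is a while loop driven by LastEvaluatedKey, but that doesn't give me all the results either (15200 instead of 16000). Here is the code: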
dynamodb = boto3.resource('dynamodb', region_name='eu-west-2')
table = dynamodb.Table(dBTable)
response = table.scan()
print("item_count:", table.item_count)
print("response1:", response["Count"])
items=[]
while 'LastEvaluatedKey' in response and response['LastEvaluatedKey'] != "":
    print("response:", response["Count"])
    items+=response["Items"]
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
How can I reliably fetch all the records?
It's recommended to create a while loop based on LastEvaluatedKey, but that also doesn't give me all the results (15200 instead of 16000).
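Are you sure? My guess is that something else is going on. I use boto3 and LastEvaluatedKey in a loop in production code that runs every day, and I have never seen a case where not all rows were returned. That's not to say it's impossible, but I would first make sure your code is correct.
Edit: this code works: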
import boto3
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('DeadLetterQueue')
response = table.scan()
print("item_count:", table.item_count)
items = response["Items"]
while 'LastEvaluatedKey' in response and response['LastEvaluatedKey'] != "":
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response["Items"])
print(len(items))
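The same loop can be packaged as a reusable generator. This is a minimal sketch, not part of boto3: `scan_all` and `FakeTable` are hypothetical names. `scan_all` assumes any object whose `scan(**kwargs)` returns a dict with an `"Items"` list and, when more pages remain, a `"LastEvaluatedKey"`, which is the response shape of boto3's `Table.scan`. Extra keyword arguments (for example a `FilterExpression`) are passed through to every page request. `FakeTable` only stands in for a real table so the example runs without AWS.

```python
from typing import Any, Dict, Iterator, List, Optional


def scan_all(table: Any, **scan_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated scan, including the last page.

    Hypothetical helper: `table` is assumed to be anything with a
    boto3-style `scan(**kwargs)` method.
    """
    kwargs = dict(scan_kwargs)
    while True:
        response = table.scan(**kwargs)
        # Consume this page's items *before* deciding whether to stop,
        # so the final page (which has no LastEvaluatedKey) is not lost.
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a DynamoDB table, serving fixed-size pages."""

    def __init__(self, n_items: int, page_size: int) -> None:
        self.data = [{"id": i} for i in range(n_items)]
        self.page_size = page_size

    def scan(self, ExclusiveStartKey: Optional[Dict[str, int]] = None, **_: Any) -> Dict[str, Any]:
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
        page = self.data[start:start + self.page_size]
        response: Dict[str, Any] = {"Items": page, "Count": len(page)}
        # Only report a LastEvaluatedKey while more items remain.
        if start + self.page_size < len(self.data):
            response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
        return response


items: List[Dict[str, Any]] = list(scan_all(FakeTable(n_items=16000, page_size=800)))
print(len(items))  # 16000
```

With a real table you would call `scan_all(dynamodb.Table(...))` instead. Also note that `table.item_count` comes from DescribeTable, and DynamoDB only refreshes that value approximately every six hours. A mismatch against it is therefore not, on its own, proof that a scan missed items.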
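The problem you're running into has nothing to do with the DynamoDB scan operation; it's in your code. The items from the last scan call are never appended to the items list: the loop exits as soon as a response has no LastEvaluatedKey, before that final page is added.
Here is your code, slightly modified: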
dynamodb = boto3.resource('dynamodb', region_name='eu-west-2')
table = dynamodb.Table(dBTable)
response = table.scan()
print("item_count:", table.item_count)
print("response1:", response["Count"])
items=response["Items"]  # Changed HERE.
while 'LastEvaluatedKey' in response and response['LastEvaluatedKey'] != "":
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    print("response:", response["Count"])
    items+=response["Items"]  # Shifted this line HERE
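To see why the original loop comes up short by exactly one page, you can run both loop shapes side by side against a fake paginated scan. This is a hypothetical simulation, not a real DynamoDB call. `make_fake_scan` is an invented helper, and its keys are plain integer offsets instead of DynamoDB's key dicts. The page size of 800 items is an assumption chosen so that 20 pages add up to the question's 16000 items. Under that assumption the question's loop returns 15200 items. The final page carries no `LastEvaluatedKey`, so the loop ends before that page is appended.

```python
from typing import Any, Callable, Dict, List, Optional

ScanFn = Callable[..., Dict[str, Any]]


def make_fake_scan(n_items: int, page_size: int) -> ScanFn:
    """Hypothetical stand-in for table.scan with fixed-size pages.

    Simplification: LastEvaluatedKey is an int offset here, not a key dict.
    """
    data = list(range(n_items))

    def scan(ExclusiveStartKey: Optional[int] = None) -> Dict[str, Any]:
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey
        end = start + page_size
        response: Dict[str, Any] = {"Items": data[start:end]}
        if end < n_items:
            response["LastEvaluatedKey"] = end  # where the next page starts
        return response

    return scan


def question_loop(scan: ScanFn) -> List[int]:
    # Original shape: append the current page *before* fetching the next one.
    response = scan()
    items: List[int] = []
    while "LastEvaluatedKey" in response:
        items += response["Items"]
        response = scan(ExclusiveStartKey=response["LastEvaluatedKey"])
    return items  # the last page is never appended


def fixed_loop(scan: ScanFn) -> List[int]:
    # Corrected shape: seed with the first page, append after each fetch.
    response = scan()
    items = list(response["Items"])
    while "LastEvaluatedKey" in response:
        response = scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items += response["Items"]
    return items


scan = make_fake_scan(n_items=16000, page_size=800)
print(len(question_loop(scan)), len(fixed_loop(scan)))  # 15200 16000
```

The same reasoning explains an edge case. If the whole table fits in one page, the original loop returns nothing at all, because the very first response already lacks a `LastEvaluatedKey`.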