Is there a way to efficiently fetch all the results from a medium-sized DynamoDB table?

I'm using boto3 with Python, but I believe the problem and the logic should be the same across all languages.

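I know that table.scan() should in theory return all records, but in practice the result of a single scan() call is limited to 1 MB. The recommendation is to create a while loop based on LastEvaluatedKey, but that also doesn't give me all the results (15200 instead of 16000). Here is the code: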

dynamodb = boto3.resource('dynamodb', region_name='eu-west-2')
table = dynamodb.Table(dBTable)
response = table.scan()
print("item_count:", table.item_count)
print("response1:", response["Count"])

items=[]
while 'LastEvaluatedKey' in response and response['LastEvaluatedKey'] != "":  
    print("response:", response["Count"])
    items+=response["Items"]
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])

How can I reliably fetch all the records?

It's recommended to create a while loop based on LastEvaluatedKey, but that also doesn't give me all the results (15200 instead of 16000).

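Are you sure? My guess is that something else is going on. I use boto3 with a LastEvaluatedKey loop in production code that runs daily, and I have never seen a case where not all rows were returned. That's not to say it's impossible, but I would first make sure your code is correct.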

Edit: this code works:

import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('DeadLetterQueue')
response = table.scan()
# item_count is only an approximation; DynamoDB refreshes it roughly every six hours.
print("item_count:", table.item_count)

# Start with the first page, then keep fetching until no LastEvaluatedKey is returned.
items = response["Items"]
while 'LastEvaluatedKey' in response and response['LastEvaluatedKey'] != "":
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response["Items"])

print(len(items))

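The problem you're seeing has nothing to do with the DynamoDB scan operation. It's in your code: the results of the last scan call are never appended to the items array.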
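To make the dropped page visible without touching AWS, here is a small simulation. FakeTable is a hypothetical stand-in, not part of boto3, and it uses a plain integer as the LastEvaluatedKey, whereas real DynamoDB returns a dict of key attributes. The two loops mirror the append-then-fetch order from the question and the fetch-then-append order that fixes it.

```python
class FakeTable:
    """Hypothetical stand-in for a boto3 Table: serves fixed pages like a paginated scan."""

    def __init__(self, pages):
        self._pages = pages

    def scan(self, ExclusiveStartKey=None):
        # Simplification: the "key" is just the index of the next page to serve.
        index = 0 if ExclusiveStartKey is None else ExclusiveStartKey
        page = self._pages[index]
        response = {"Items": page, "Count": len(page)}
        if index + 1 < len(self._pages):
            response["LastEvaluatedKey"] = index + 1
        return response


def collect_like_question(table):
    """Append, then fetch (as in the question): the final page is never appended."""
    response = table.scan()
    items = []
    while "LastEvaluatedKey" in response:
        items += response["Items"]
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
    return items


def collect_fixed(table):
    """Fetch, then append: every page, including the last one, ends up in items."""
    response = table.scan()
    items = list(response["Items"])
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items += response["Items"]
    return items


table = FakeTable([[1, 2, 3], [4, 5, 6], [7, 8]])
print(len(collect_like_question(table)))  # 6: the last page (2 items) is missing
print(len(collect_fixed(table)))          # 8
```

The missing items are exactly the contents of the final page, which is consistent with getting 15200 rows out of 16000.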

Here is your code, slightly modified:

import boto3

dynamodb = boto3.resource('dynamodb', region_name='eu-west-2')
table = dynamodb.Table(dBTable)  # dBTable: your table name, as in the question
response = table.scan()
print("item_count:", table.item_count)
print("response1:", response["Count"])

items = response["Items"]  # Changed HERE: start with the first page.
while 'LastEvaluatedKey' in response and response['LastEvaluatedKey'] != "":
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    print("response:", response["Count"])
    items += response["Items"]  # Shifted this line HERE: append after each fetch.
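If you need this in more than one place, the same loop can be wrapped in a generator so the ordering mistake can't creep back in. This is only a sketch: scan_all is a hypothetical helper name, not a boto3 function, and it assumes the object passed in exposes a scan() method that accepts ExclusiveStartKey and returns a dict with "Items" and, when more pages remain, "LastEvaluatedKey", as the boto3 Table resource does.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a paginated scan by following LastEvaluatedKey.

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    Extra keyword arguments are passed through to each scan() call.
    """
    kwargs = dict(scan_kwargs)  # copy so the caller's dict is not mutated
    while True:
        response = table.scan(**kwargs)
        # Yield the current page before deciding whether to continue,
        # so the final page is never skipped.
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break
        kwargs["ExclusiveStartKey"] = last_key
```

With the table from the code above, usage would look like items = list(scan_all(table)). Because it is a generator, you can also iterate over the rows one at a time instead of holding all of them in memory.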