DynamoDB Table.scan with and without pagination

Question

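I am trying to understand the difference between the following two code snippets. One uses pages to fetch the scan results, and the other does not. If the table contains a very large number of items, is the second approach still viable? The AWS documentation says scan results are limited to 1 MB. How does that affect version 2? Will it only return the first 1 MB of results, or will it keep making database calls after each page?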

Note that I am using the table.scan API, which is different from the DynamoDBClient.scan API. For API details, see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/document/Table.html.

Version 1 (using pages):

    ItemCollection<ScanOutcome> items = table.scan(spec);
    items.pages().forEach(page -> {
        for (Item item : page) {
            response.add(item);
        }
    });

Version 2 (iterating over the items without pages):

    ItemCollection<ScanOutcome> items = table.scan(spec);
    for (Item item : items) {
        response.add(item);
    }

Answer 1

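I ran an experiment: I created 1000 records, each 5 KB in size, and then scanned the table using version 2. I still got all 1000 records, even though their total size (about 5 MB) is clearly larger than 1 MB. Both versions scanned the entire table, so there appears to be no difference. It seems that ItemCollection handles pagination for you, and there is no need to use pages unless you want to control the network calls and the page size.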
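To make the experiment concrete without an AWS account, here is a hypothetical, stdlib-only Java model. It is not the AWS SDK: the class `ScanPaginationModel`, its `Page` type and its methods are made up for illustration. It assumes uniform 5 KB items and a simplified 1 MB (1024 KB) cap per simulated request, and it counts those requests. Under these assumptions, page-by-page iteration (version 1 style) and flat iteration (version 2 style) see the same items and issue the same number of requests, which is consistent with what the answer observed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical in-memory model of a paginated scan; it is NOT the AWS SDK.
// Keys are 0..totalItems-1, every item is ITEM_SIZE_KB, and one simulated
// request returns at most PAGE_LIMIT_KB worth of items (the 1 MB cap).
public class ScanPaginationModel {
    static final int ITEM_SIZE_KB = 5;
    static final int PAGE_LIMIT_KB = 1024;

    static final class Page {
        final List<Integer> items;
        final Integer nextStartKey; // null means no more data
        Page(List<Integer> items, Integer nextStartKey) {
            this.items = items;
            this.nextStartKey = nextStartKey;
        }
    }

    final int totalItems;
    int networkCalls = 0; // number of simulated requests issued

    ScanPaginationModel(int totalItems) {
        this.totalItems = totalItems;
    }

    // One simulated request: collect items until the next one would exceed the cap.
    Page scanOnce(int startKey) {
        networkCalls++;
        List<Integer> items = new ArrayList<>();
        int usedKb = 0;
        int key = startKey;
        while (key < totalItems && usedKb + ITEM_SIZE_KB <= PAGE_LIMIT_KB) {
            items.add(key++);
            usedKb += ITEM_SIZE_KB;
        }
        return new Page(items, key < totalItems ? Integer.valueOf(key) : null);
    }

    // Version 1 style: hand out whole pages, requesting each one only when asked for.
    Iterable<List<Integer>> pages() {
        return () -> new Iterator<List<Integer>>() {
            Integer nextKey = 0;

            public boolean hasNext() { return nextKey != null; }

            public List<Integer> next() {
                if (nextKey == null) throw new NoSuchElementException();
                Page p = scanOnce(nextKey);
                nextKey = p.nextStartKey;
                return p.items;
            }
        };
    }

    // Version 2 style: a flat iterator that moves on to the next page by itself.
    Iterable<Integer> items() {
        return () -> new Iterator<Integer>() {
            final Iterator<List<Integer>> pageIt = pages().iterator();
            Iterator<Integer> current = Collections.emptyIterator();

            public boolean hasNext() {
                while (!current.hasNext() && pageIt.hasNext()) {
                    current = pageIt.next().iterator(); // triggers a simulated request
                }
                return current.hasNext();
            }

            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        ScanPaginationModel v1 = new ScanPaginationModel(1000);
        int count1 = 0;
        for (List<Integer> page : v1.pages()) count1 += page.size();

        ScanPaginationModel v2 = new ScanPaginationModel(1000);
        int count2 = 0;
        for (Integer item : v2.items()) count2++;

        System.out.println("pages: " + count1 + " items, " + v1.networkCalls + " calls");
        System.out.println("flat:  " + count2 + " items, " + v2.networkCalls + " calls");
    }
}
```

In this model 204 items of 5 KB fit under the 1024 KB cap, so both styles report 1000 items fetched in 5 simulated calls. Real DynamoDB page sizes depend on actual item sizes and on how much data each request reads, so the exact call count will differ.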

Answer 2

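Tofig is correct: there is no difference between the two approaches. The statement that scan results are limited to 1 MB applies to each request made through the low-level API, not to the Document API, which fetches the following pages for you as you iterate.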
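With the low-level API, the caller has to do the paging that ItemCollection otherwise hides: pass each response's LastEvaluatedKey back as the ExclusiveStartKey of the next request until no key is returned. The sketch below models that loop in plain Java. `LowLevelScanLoop`, `lowLevelScan` and `ScanResponse` are hypothetical stand-ins, not SDK classes, and the 1 MB cap is simplified to "stop before exceeding 1024 KB of 5 KB items".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a low-level Scan call; it is NOT the AWS SDK.
// Each response is capped (1024 KB of 5 KB items), and a non-null
// lastEvaluatedKey means more data remains.
public class LowLevelScanLoop {
    static final int TOTAL_ITEMS = 1000;
    static final int ITEM_SIZE_KB = 5;
    static final int PAGE_LIMIT_KB = 1024;

    static final class ScanResponse {
        final List<Integer> items;
        final Integer lastEvaluatedKey;
        ScanResponse(List<Integer> items, Integer lastEvaluatedKey) {
            this.items = items;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    // One request: a null exclusiveStartKey starts at the beginning,
    // otherwise the scan resumes just after that key.
    static ScanResponse lowLevelScan(Integer exclusiveStartKey) {
        int key = exclusiveStartKey == null ? 0 : exclusiveStartKey + 1;
        List<Integer> items = new ArrayList<>();
        int usedKb = 0;
        while (key < TOTAL_ITEMS && usedKb + ITEM_SIZE_KB <= PAGE_LIMIT_KB) {
            items.add(key++);
            usedKb += ITEM_SIZE_KB;
        }
        Integer last = key < TOTAL_ITEMS ? Integer.valueOf(key - 1) : null;
        return new ScanResponse(items, last);
    }

    // The loop a low-level caller writes: feed lastEvaluatedKey back in
    // as exclusiveStartKey until it comes back null.
    static List<Integer> scanAll() {
        List<Integer> all = new ArrayList<>();
        Integer startKey = null;
        do {
            ScanResponse resp = lowLevelScan(startKey);
            all.addAll(resp.items);
            startKey = resp.lastEvaluatedKey;
        } while (startKey != null);
        return all;
    }

    public static void main(String[] args) {
        System.out.println("single call: " + lowLevelScan(null).items.size() + " items");
        System.out.println("full loop:   " + scanAll().size() + " items");
    }
}
```

In this model a single call returns only 204 of the 1000 items, while the loop collects all 1000. That difference is what the 1 MB note in the documentation is about.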
From the ItemCollection documentation:

A collection of Item's. An ItemCollection object maintains a cursor pointing to its current pages of data. Initially the cursor is positioned before the first page. The next method moves the cursor to the next row, and because it returns false when there are no more rows in the ItemCollection object, it can be used in a while loop to iterate through the collection. Network calls can be triggered when the collection is iterated across page boundaries.
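The last sentence of that quote is the key point: the cursor fetches a new page only when iteration moves past the end of the current one. Below is a hypothetical stdlib model of such a cursor. It is not the SDK's implementation. `LazyCursorModel` and `take` are made-up names, and `PAGE_SIZE = 204` only approximates 1 MB of 5 KB items. The model shows that reading a few items costs one request and that each page boundary crossed costs one more.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical model of a cursor over paged results; it is NOT the SDK's ItemCollection.
// Each page holds up to PAGE_SIZE keys, and fetchPage() counts simulated requests.
public class LazyCursorModel implements Iterable<Integer> {
    static final int PAGE_SIZE = 204; // roughly 1 MB of 5 KB items

    final int totalItems;
    int networkCalls = 0;

    LazyCursorModel(int totalItems) {
        this.totalItems = totalItems;
    }

    // Simulated request for page number pageNo.
    int[] fetchPage(int pageNo) {
        networkCalls++;
        int from = pageNo * PAGE_SIZE;
        int to = Math.min(from + PAGE_SIZE, totalItems);
        int[] page = new int[Math.max(0, to - from)];
        for (int i = 0; i < page.length; i++) page[i] = from + i;
        return page;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            int[] page = new int[0];  // cursor starts before the first page
            int pos = 0;
            int nextPageNo = 0;
            boolean morePages = true; // plays the role of a non-null LastEvaluatedKey

            @Override
            public boolean hasNext() {
                // A request happens only when the current page is used up.
                while (pos == page.length && morePages) {
                    page = fetchPage(nextPageNo++);
                    pos = 0;
                    morePages = nextPageNo * PAGE_SIZE < totalItems;
                }
                return pos < page.length;
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return page[pos++];
            }
        };
    }

    // Reads at most n items from the iterator and returns how many were read.
    static int take(Iterator<Integer> it, int n) {
        int read = 0;
        while (read < n && it.hasNext()) {
            it.next();
            read++;
        }
        return read;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 204, 205, 1000}) {
            LazyCursorModel cursor = new LazyCursorModel(1000);
            take(cursor.iterator(), n);
            System.out.println("read " + n + " items -> " + cursor.networkCalls + " call(s)");
        }
    }
}
```

Running `main` prints 1, 1, 2 and 5 calls for 10, 204, 205 and 1000 items. Because pages are requested lazily, stopping early means the remaining pages are never fetched. This is also why version 2 does not load the whole table into memory up front.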
