Boto3 DynamoDb Query with Select Count without pagination

Question

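This is more of a conceptual clarification. I can already get the actual count with Boto3 by repeating the query with the LastEvaluatedKey of the previous response.

I want to count the items in DynamoDB that match a specific condition. I am using "select = count", which according to the docs [1] should return just the count of matching items, and my assumption was that the response would not be paginated.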

COUNT - Returns the number of matching items, rather than the matching items themselves.

When I try it with the aws-cli, my assumption seems to be correct (just like the rest of the API examples in the docs [1]):

    aws dynamodb query \
    --table-name 'my-table' \
    --index-name 'classification-date-index' \
    --key-condition-expression 'classification = :col AND #dt BETWEEN :start AND :end' \
    --expression-attribute-values '{":col" : {"S":"INTERNAL"}, ":start" : {"S": "2020-04-10"}, ":end" : {"S": "2020-04-25"}}' \
    --expression-attribute-names '{"#dt" : "date"}' \
    --select 'COUNT'
    {
        "Count": 18817,
        "ScannedCount": 18817,
        "ConsumedCapacity": null
    }

But when I try the same query with Python 3 and Boto3, the response is paginated, and I have to repeat the query until LastEvaluatedKey is empty.

    In [22]: table.query(IndexName='classification-date-index', Select='COUNT', KeyConditionExpression= Key('classification').eq('INTERNAL') & Key('date').between('2020-04-10', '2020-04-25'))

    Out[22]:
    {'Count': 5667,
     'ScannedCount': 5667,
     'LastEvaluatedKey': {'classification': 'INTERNAL',
      'date': '2020-04-14',
      's3Path': '<redacted>'},
     'ResponseMetadata': {'RequestId': 'TH3ILO0P47QB7GAU9M3M98BKJVVV4KQNSO5AEMVJF66Q9ASUAAJG',
      'HTTPStatusCode': 200,
      'HTTPHeaders': {'server': 'Server',
       'date': 'Sat, 25 Apr 2020 13:32:36 GMT',
       'content-type': 'application/x-amz-json-1.0',
       'content-length': '230',
       'connection': 'keep-alive',
       'x-amzn-requestid': 'TH3ILO0P47QB7GAU9M3M98BKJVVV4KQNSO5AEMVJF66Q9ASUAAJG',
       'x-amz-crc32': '133035383'},
      'RetryAttempts': 0}}
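
For reference, the repeated query can be sketched roughly as below. This is only a minimal sketch: the helper name count_matching_items is made up for illustration, and `table` is assumed to be a boto3 DynamoDB Table resource (anything with a compatible query() method would do). It re-issues the query with ExclusiveStartKey set to the previous LastEvaluatedKey and adds up the per-page Count values.

```python
def count_matching_items(table, **query_kwargs):
    """Sum 'Count' over every page of a Select='COUNT' query.

    `table` is assumed to be a boto3 DynamoDB Table resource.
    The function name is hypothetical, not part of boto3.
    """
    total = 0
    kwargs = dict(query_kwargs, Select="COUNT")
    while True:
        response = table.query(**kwargs)
        total += response["Count"]
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means this was the final page.
            return total
        # Resume the next page where the previous one stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Assumed usage, with the table and index names from the question:
#   import boto3
#   from boto3.dynamodb.conditions import Key
#   table = boto3.resource("dynamodb").Table("my-table")
#   count_matching_items(
#       table,
#       IndexName="classification-date-index",
#       KeyConditionExpression=Key("classification").eq("INTERNAL")
#       & Key("date").between("2020-04-10", "2020-04-25"),
#   )
```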

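I expected the Boto3 SDK to behave the same way as the AWS CLI, since the response appears to be smaller than 1 MB. The documentation is slightly contradictory...

The "Paginating Table Query Results" [2] page says: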

DynamoDB paginates the results from Query operations. With pagination, the Query results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on. A single Query only returns a result set that fits within the 1 MB size limit.

Whereas the "Query" [1] page says:

A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.

[1] https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html

[2] https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.Pagination.html

Answer 1

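I just ran into this issue myself. The AWS CLI automatically sums up the pages returned by a DynamoDB query, which is why it reports a single total Count. To stop it from doing that, add the --no-paginate option, listed on this page, to the command.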
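
If you want roughly the same aggregation on the Boto3 side, one option is the paginator of the low-level client. This is a sketch under assumptions rather than a definitive implementation: the helper name count_with_paginator is made up, and `client` is assumed to be a boto3 DynamoDB client, whose get_paginator("query") follows LastEvaluatedKey for you.

```python
def count_with_paginator(client, **query_kwargs):
    """Add up 'Count' across all pages of a Select='COUNT' query.

    `client` is assumed to be boto3.client("dynamodb").
    The function name is hypothetical, not part of boto3.
    """
    paginator = client.get_paginator("query")
    pages = paginator.paginate(Select="COUNT", **query_kwargs)
    # Each page carries only its own Count, so sum them.
    return sum(page["Count"] for page in pages)


# Assumed usage, mirroring the CLI command from the question
# (the low-level client takes typed attribute values, like the CLI):
#   import boto3
#   client = boto3.client("dynamodb")
#   count_with_paginator(
#       client,
#       TableName="my-table",
#       IndexName="classification-date-index",
#       KeyConditionExpression="classification = :col AND #dt BETWEEN :start AND :end",
#       ExpressionAttributeNames={"#dt": "date"},
#       ExpressionAttributeValues={
#           ":col": {"S": "INTERNAL"},
#           ":start": {"S": "2020-04-10"},
#           ":end": {"S": "2020-04-25"},
#       },
#   )
```

Either way there is still one request per page under the hood; the paginator only hides the loop.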
