What is the best way to get the average number of documents per bucket in Elasticsearch?
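Suppose we are a hat manufacturer and have an Elasticsearch index in which each document corresponds to the sale of one hat. Part of each sales record is the name of the store that sold the hat. I want to find out how many hats each store sold, and the average number of hats sold across all stores. The best approach I have come up with is this search: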
GET hat_sales/_search
{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "stores": {
      "terms": {
        "field": "storename",
        "size": 65536
      },
      "aggs": {
        "sales_count": {
          "cardinality": {
            "field": "_id"
          }
        }
      }
    },
    "average_sales_count": {
      "avg_bucket": {
        "buckets_path": "stores>sales_count"
      }
    }
  }
}
(Aside: I set the size to 65536 because that is the default maximum number of buckets.)
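The problem with this query is that the sales_count aggregation performs a redundant computation: each stores bucket already has a doc_count property. But how can I access this doc_count in a buckets path?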
I think this is what you want:
PUT hat_sales
{
  "mappings": {
    "properties": {
      "storename": {
        "type": "keyword"
      }
    }
  }
}
POST hat_sales/_bulk?refresh=true
{"index": {}}
{"storename": "foo"}
{"index": {}}
{"storename": "foo"}
{"index": {}}
{"storename": "bar"}
{"index": {}}
{"storename": "baz"}
{"index": {}}
{"storename": "baz"}
{"index": {}}
{"storename": "baz"}
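Before running the search, it can help to know what the terms aggregation should return for these six documents. The following is a plain-Python sketch, not an Elasticsearch call; it assumes each document carries exactly one storename value, so counting documents per distinct name gives the same numbers the terms aggregation reports as each bucket's doc_count.

```python
from collections import Counter

# The six storename values indexed by the _bulk request above.
storenames = ["foo", "foo", "bar", "baz", "baz", "baz"]

# Count documents per distinct store name. most_common() sorts by count
# descending, which matches the terms aggregation's default bucket order.
buckets = Counter(storenames).most_common()

print(buckets)  # [('baz', 3), ('foo', 2), ('bar', 1)]
```

The average over these buckets is (3 + 2 + 1) / 3 = 2.0, which is what the avg_bucket aggregation below should report.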
GET hat_sales/_search
{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "stores": {
      "terms": {
        "field": "storename",
        "size": 65536
      }
    },
    "average_sales_count": {
      "avg_bucket": {
        "buckets_path": "stores>_count"
      }
    }
  }
}
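The key is the buckets path: _count is a special path that refers to each bucket's document count (its doc_count), so the path you want is stores>_count. No extra sub-aggregation is needed.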
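To make the path semantics concrete, here is a small Python sketch that mimics how a two-level path such as stores>_count is resolved against a response. The part before > names the multi-bucket aggregation, and _count selects each bucket's doc_count instead of a sub-aggregation's value. resolve_buckets_path and avg_bucket are hypothetical helpers written only for illustration. Elasticsearch does this server-side, and its real buckets_path syntax supports more than this sketch (deeper nesting, multi-value metrics, gap policies).

```python
def resolve_buckets_path(aggregations, path):
    """Hypothetical helper: resolve a simple 'agg>metric' path against a
    response's "aggregations" object, returning one value per bucket.
    The special metric name '_count' maps to each bucket's doc_count."""
    agg_name, metric = path.split(">")  # only two-level paths are handled
    values = []
    for bucket in aggregations[agg_name]["buckets"]:
        if metric == "_count":
            values.append(bucket["doc_count"])
        else:
            # A single-value sub-aggregation (e.g. cardinality) stores its result under "value".
            values.append(bucket[metric]["value"])
    return values


def avg_bucket(values):
    # Mean of the per-bucket values, as the avg_bucket pipeline aggregation computes.
    return sum(values) / len(values)


# Bucket data shaped like the "stores" part of the response below.
aggregations = {
    "stores": {
        "buckets": [
            {"key": "baz", "doc_count": 3},
            {"key": "foo", "doc_count": 2},
            {"key": "bar", "doc_count": 1},
        ]
    }
}

counts = resolve_buckets_path(aggregations, "stores>_count")
print(counts, avg_bucket(counts))  # [3, 2, 1] 2.0
```

Passing stores>sales_count instead would read each bucket's cardinality result, which is why the original query also worked, only with an extra per-bucket computation.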
The result:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "stores" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "baz",
          "doc_count" : 3
        },
        {
          "key" : "foo",
          "doc_count" : 2
        },
        {
          "key" : "bar",
          "doc_count" : 1
        }
      ]
    },
    "average_sales_count" : {
      "value" : 2.0
    }
  }
}