嵌套聚合 ElasticSearch 的问题:在最大值之后求和
Issue with nested aggregations ElasticSearch : doing a sum after a max
我知道指标聚合无法进行子聚合,而且 Elasticsearch 支持使用存储桶进行子聚合。但我对如何做到这一点有点迷茫。
我想在嵌套聚合之后和按最大时间戳聚合之后进行求和。
类似于下面的代码,给我这个错误:"Aggregator [max_date_aggs] of type [max] cannot accept sub-aggregations" 这是正常的。有什么办法让它起作用吗?
{
"aggs": {
"sender_comp_aggs": {
"terms": {
"field": "senderComponent"
},
"aggs": {
"activity_mnemo_aggs": {
"terms": {
"field": "activityMnemo"
},
"aggs": {
"activity_instance_id_aggs": {
"terms": {
"field": "activityInstanceId"
},
"aggs": {
"business_date_aggs": {
"terms": {
"field": "correlationIdSet.businessDate"
},
"aggs": {
"context_set_id_closing_aggs": {
"terms": {
"field": "contextSetId.closing"
},
"aggs": {
"max_date_aggs": {
"max": {
"field": "timestamp"
},
"aggs" : {
"sum_done": {
"sum": {
"field": "itemNumberDone"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
谢谢
我不是 100% 确定您想要实现的目标,如果您也能分享映射会有所帮助。
桶聚合是关于定义 buckets/groups。正如您在示例中所做的那样,您可以 wrap/nest 桶聚合以进一步将您的桶分解为子桶等。
默认情况下,Elasticsearch 始终计算计数指标,但您也可以指定其他指标进行计算。指标是针对每个桶/针对一个桶(而不是针对另一个指标)计算的,这就是为什么您不能将指标聚合嵌套在指标聚合下的原因,这根本没有意义。
根据您的数据看起来如何,您可能需要做的唯一更改是将 sum_done
聚合从 aggs
子句中移出,与您的 max_date_aggs
-聚合。
代码段
"aggs": {
"max_date_aggs": { "max": {"field": "timestamp"} },
"sum_done": { "sum": { "field": "itemNumberDone"} }
}
在您细化了您的问题并提供了我设法想出一个只需要一个请求的解决方案之后。如前所述,sum
-metric 聚合需要对 bucket 而不是 metric 进行操作。解决方案非常简单:无需计算 max
-日期,只需将此聚合重新制定为 terms
-聚合,按降序时间戳排序,只要求一个桶。
解决方案
GET gos_element/_search
{
"size": 0,
"aggs": {
"sender_comp_aggs": {
"terms": {"field": "senderComponent.keyword"},
"aggs": {
"activity_mnemo_aggs": {
"terms": {"field": "activityMnemo.keyword"},
"aggs": {
"activity_instance_id_aggs": {
"terms": {"field": "activityInstanceId.keyword"},
"aggs": {
"business_date_aggs": {
"terms": {"field": "correlationIdSet.businessDate"},
"aggs": {
"context_set_id_closing_aggs": {
"terms": {"field": "contextSetId.closing.keyword"},
"aggs": {
"max_date_bucket_aggs": {
"terms": {
"field": "timestamp",
"size": 1,
"order": {"_key": "desc"}
},
"aggs": {
"sum_done": {
"sum": {"field": "itemNumberDone"}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
由于我依赖默认的 Elasticsearch 映射,因此我不得不参考字段的 .keyword
版本。如果您的字段直接映射到 keyword
类型的字段,则不需要这样做。
您可以使用以下 2 个命令对您提供的文档进行索引后立即测试上述请求:
PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz
{
"senderComponent": "PS",
"timestamp": "2020-01-28T02:31:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "Progress",
"activityStatusNumber": 300,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 9,
"itemNumberInError": 0,
"itemNumberNotStarted": 1,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz8z
{
"senderComponent": "PS",
"timestamp": "2020-01-28T03:01:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "End",
"activityStatusNumber": 200,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 10,
"itemNumberInError": 0,
"itemNumberNotStarted": 0,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"errorMessages": "",
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
结果您得到以下响应(预期值为 10):
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"sender_comp_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PS",
"doc_count" : 2,
"activity_mnemo_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PScommand",
"doc_count" : 2,
"activity_instance_id_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "123466",
"doc_count" : 2,
"business_date_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1580083200000,
"key_as_string" : "2020-01-27T00:00:00.000Z",
"doc_count" : 2,
"context_set_id_closing_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PARIS",
"doc_count" : 2,
"max_date_bucket_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : 1580180460000,
"key_as_string" : "2020-01-28T03:01:00.000Z",
"doc_count" : 1,
"sum_done" : {
"value" : 10.0
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
}
}
这里有两个文档:
{
"_type": "gos_element",
"_id": "AW_yu3dIa2R_HwqpSz-o",
"_score": 5.785128,
"_source": {
"senderComponent": "PS",
"timestamp": "2020-01-28T02:31:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "Progress",
"activityStatusNumber": 300,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 9,
"itemNumberInError": 0,
"itemNumberNotStarted": 1,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
},
{
"_type": "gos_element",
"_id": "AW_yu3dIa2R_HwqpSz8z",
"_score": 4.8696175,
"_source": {
"senderComponent": "PS",
"timestamp": "2020-01-28T03:01:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "End",
"activityStatusNumber": 200,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 10,
"itemNumberInError": 0,
"itemNumberNotStarted": 0,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"errorMessages": "",
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
}
]
}
我想聚合几个术语(senderComponent、activityMnemo、activityInstanceId、correlationIdSet.businessDate 和 contextSetId.closing),并聚合每个聚合的最大时间戳。完成后,我想对 itemNumberDone 求和。
如果我们只获取这两个文档并进行聚合,我想得到 10 itemNumberDone。
是否可以只使用一个查询并使用存储桶?
我知道指标聚合无法进行子聚合,而且 Elasticsearch 支持使用存储桶进行子聚合。但我对如何做到这一点有点迷茫。
我想在嵌套聚合之后和按最大时间戳聚合之后进行求和。
类似于下面的代码,给我这个错误:"Aggregator [max_date_aggs] of type [max] cannot accept sub-aggregations" 这是正常的。有什么办法让它起作用吗?
{
"aggs": {
"sender_comp_aggs": {
"terms": {
"field": "senderComponent"
},
"aggs": {
"activity_mnemo_aggs": {
"terms": {
"field": "activityMnemo"
},
"aggs": {
"activity_instance_id_aggs": {
"terms": {
"field": "activityInstanceId"
},
"aggs": {
"business_date_aggs": {
"terms": {
"field": "correlationIdSet.businessDate"
},
"aggs": {
"context_set_id_closing_aggs": {
"terms": {
"field": "contextSetId.closing"
},
"aggs": {
"max_date_aggs": {
"max": {
"field": "timestamp"
},
"aggs" : {
"sum_done": {
"sum": {
"field": "itemNumberDone"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
谢谢
我不是 100% 确定您想要实现的目标,如果您也能分享映射会有所帮助。
桶聚合是关于定义 buckets/groups。正如您在示例中所做的那样,您可以 wrap/nest 桶聚合以进一步将您的桶分解为子桶等。
默认情况下,Elasticsearch 始终计算计数指标,但您也可以指定其他指标进行计算。指标是针对每个桶/针对一个桶(而不是针对另一个指标)计算的,这就是为什么您不能将指标聚合嵌套在指标聚合下的原因,这根本没有意义。
根据您的数据看起来如何,您可能需要做的唯一更改是将 sum_done
聚合从 aggs
子句中移出,与您的 max_date_aggs
-聚合。
代码段
"aggs": {
"max_date_aggs": { "max": {"field": "timestamp"} },
"sum_done": { "sum": { "field": "itemNumberDone"} }
}
在您细化了您的问题并提供了我设法想出一个只需要一个请求的解决方案之后。如前所述,sum
-metric 聚合需要对 bucket 而不是 metric 进行操作。解决方案非常简单:无需计算 max
-日期,只需将此聚合重新制定为 terms
-聚合,按降序时间戳排序,只要求一个桶。
解决方案
GET gos_element/_search
{
"size": 0,
"aggs": {
"sender_comp_aggs": {
"terms": {"field": "senderComponent.keyword"},
"aggs": {
"activity_mnemo_aggs": {
"terms": {"field": "activityMnemo.keyword"},
"aggs": {
"activity_instance_id_aggs": {
"terms": {"field": "activityInstanceId.keyword"},
"aggs": {
"business_date_aggs": {
"terms": {"field": "correlationIdSet.businessDate"},
"aggs": {
"context_set_id_closing_aggs": {
"terms": {"field": "contextSetId.closing.keyword"},
"aggs": {
"max_date_bucket_aggs": {
"terms": {
"field": "timestamp",
"size": 1,
"order": {"_key": "desc"}
},
"aggs": {
"sum_done": {
"sum": {"field": "itemNumberDone"}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
由于我依赖默认的 Elasticsearch 映射,因此我不得不参考字段的 .keyword
版本。如果您的字段直接映射到 keyword
类型的字段,则不需要这样做。
您可以使用以下 2 个命令对您提供的文档进行索引后立即测试上述请求:
PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz
{
"senderComponent": "PS",
"timestamp": "2020-01-28T02:31:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "Progress",
"activityStatusNumber": 300,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 9,
"itemNumberInError": 0,
"itemNumberNotStarted": 1,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz8z
{
"senderComponent": "PS",
"timestamp": "2020-01-28T03:01:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "End",
"activityStatusNumber": 200,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 10,
"itemNumberInError": 0,
"itemNumberNotStarted": 0,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"errorMessages": "",
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
结果您得到以下响应(预期值为 10):
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"sender_comp_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PS",
"doc_count" : 2,
"activity_mnemo_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PScommand",
"doc_count" : 2,
"activity_instance_id_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "123466",
"doc_count" : 2,
"business_date_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1580083200000,
"key_as_string" : "2020-01-27T00:00:00.000Z",
"doc_count" : 2,
"context_set_id_closing_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PARIS",
"doc_count" : 2,
"max_date_bucket_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : 1580180460000,
"key_as_string" : "2020-01-28T03:01:00.000Z",
"doc_count" : 1,
"sum_done" : {
"value" : 10.0
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
}
}
这里有两个文档:
{
"_type": "gos_element",
"_id": "AW_yu3dIa2R_HwqpSz-o",
"_score": 5.785128,
"_source": {
"senderComponent": "PS",
"timestamp": "2020-01-28T02:31:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "Progress",
"activityStatusNumber": 300,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 9,
"itemNumberInError": 0,
"itemNumberNotStarted": 1,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
},
{
"_type": "gos_element",
"_id": "AW_yu3dIa2R_HwqpSz8z",
"_score": 4.8696175,
"_source": {
"senderComponent": "PS",
"timestamp": "2020-01-28T03:01:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "End",
"activityStatusNumber": 200,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 10,
"itemNumberInError": 0,
"itemNumberNotStarted": 0,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"errorMessages": "",
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
}
]
}
我想聚合几个术语(senderComponent、activityMnemo、activityInstanceId、correlationIdSet.businessDate 和 contextSetId.closing),并聚合每个聚合的最大时间戳。完成后,我想对 itemNumberDone 求和。
如果我们只获取这两个文档并进行聚合,我想得到 10 itemNumberDone。
是否可以只使用一个查询并使用存储桶?