How to perform a complex query on aggregated fields in Elasticsearch
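I am trying to figure out how to perform a complex query in Elasticsearch. Suppose I have the following table data, which I get from the following query: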
{
"aggs": {
"3": {
"terms": {
"field": "ColumnA",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"4": {
"terms": {
"field": "ColumnB",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"5": {
"terms": {
"field": "ColumnC",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"sum_of_views": {
"sum": {
"field": "views"
}
},
"sum_of_costs": {
"sum": {
"field": "cost"
}
},
"sum_of_clicks": {
"sum": {
"field": "clicks"
}
},
"sum_of_earned": {
"sum": {
"field": "earned"
}
},
"sum_of_adv_earned": {
"sum": {
"field": "adv_earned"
}
}
}
}
}
}
}
}
},
"size": 0,
"_source": {
"excludes": []
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
{
"field": "hour",
"format": "date_time"
}
],
"query": {
"bool": {
"must": [],
"filter": [
{
"match_all": {}
},
{
"range": {
"hour": {
"format": "strict_date_optional_time",
"gte": "2019-08-08T06:29:34.723Z",
"lte": "2020-08-08T06:29:34.724Z"
}
}
}
],
"should": [],
"must_not": []
}
}
}
Now, for example, suppose I want to fetch only the records that satisfy the following condition:
(sum_of_clicks / sum_of_views) * (sum_of_earned2 / sum_of_earned1) < 0.5
What query should I use?
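To make the arithmetic of the condition concrete, here is a minimal Python sketch that evaluates it for one bucket's sums. The function name `passes_condition` and the sample numbers are hypothetical, and treating a zero denominator as "does not match" is an assumption rather than documented Elasticsearch behaviour.

```python
def passes_condition(clicks: float, views: float, earned1: float, earned2: float) -> bool:
    """Return True when (clicks / views) * (earned2 / earned1) < 0.5."""
    # Assumption: a bucket with a zero denominator is treated as not matching.
    if views == 0 or earned1 == 0:
        return False
    return (clicks / views) * (earned2 / earned1) < 0.5


# Hypothetical sums for two buckets (not taken from the question's data).
print(passes_condition(clicks=20, views=100, earned1=10, earned2=5))   # 0.2 * 0.5 = 0.1
print(passes_condition(clicks=80, views=100, earned1=10, earned2=10))  # 0.8 * 1.0 = 0.8
```

The first bucket would be kept and the second would be dropped.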
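I think the following should help. My understanding is that you want to first group by ColumnA, ColumnB and ColumnC, compute the sums of the clicks, views, earned1 and earned2 fields, and then apply the custom condition you are looking for.
I was able to come up with the following query using the Bucket Selector Aggregation: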
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"3": {
"terms": {
"field": "ColumnA",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"4": {
"terms": {
"field": "ColumnB",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"5": {
"terms": {
"field": "ColumnC",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"sum_views": {
"sum": {
"field": "views"
}
},
"sum_clicks": {
"sum": {
"field": "clicks"
}
},
"sum_earned1": {
"sum": {
"field": "earned1"
}
},
"sum_earned2": {
"sum": {
"field": "earned2"
}
},
"custom_sum_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"sum_of_views": "sum_views",
"sum_of_clicks": "sum_clicks",
"sum_of_earned1": "sum_earned1",
"sum_of_earned2": "sum_earned2"
},
"script": "(params.sum_of_clicks/params.sum_of_views) * (params.sum_of_earned2/params.sum_of_earned1) < 0.5"
}
}
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"valid_docs_count": "5._bucket_count"
},
"script": {
"source": "params.valid_docs_count >= 1"
}
}
}
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"valid_docs_count": "4._bucket_count"
},
"script": {
"source": "params.valid_docs_count >= 1"
}
}
}
}
}
}
}
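Note that to get the exact result you are looking for, I had to add extra bucket filters that check the bucket counts of aggregations 4 and 5 (the two min_bucket_selector entries).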
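The aggregations I used are:
- A bucket selector (custom_sum_bucket_filter) that evaluates the condition you mentioned.
- A second bucket selector so that buckets left with no entries in aggregation 5 are not shown.
- A third bucket selector so that buckets left with no entries in aggregation 4 are not shown.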
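The effect of the three selectors can be imitated locally, which may help when reasoning about why the two bucket-count filters are needed. The sketch below is a plain Python simulation, not Elasticsearch code: it applies the condition as written in the question to the innermost buckets and then drops parents that end up empty. The helper names (`leaf`, `keep_leaf`, `prune`) and the sample data are hypothetical, and zero denominators are assumed never to match.

```python
from typing import Any, Dict, List

Bucket = Dict[str, Any]


def leaf(key: str, views: float, clicks: float, earned1: float, earned2: float) -> Bucket:
    """Build a level-5 bucket shaped like the sums in the search response."""
    return {
        "key": key,
        "sum_views": {"value": views},
        "sum_clicks": {"value": clicks},
        "sum_earned1": {"value": earned1},
        "sum_earned2": {"value": earned2},
    }


def keep_leaf(bucket: Bucket) -> bool:
    """Stand-in for custom_sum_bucket_filter."""
    views = bucket["sum_views"]["value"]
    earned1 = bucket["sum_earned1"]["value"]
    if views == 0 or earned1 == 0:  # assumption: zero denominators never match
        return False
    clicks = bucket["sum_clicks"]["value"]
    earned2 = bucket["sum_earned2"]["value"]
    return (clicks / views) * (earned2 / earned1) < 0.5


def prune(buckets_a: List[Bucket]) -> List[Bucket]:
    """Filter level-5 buckets, then drop level-4 and level-3 buckets left empty."""
    result = []
    for a in buckets_a:
        kept_b = []
        for b in a["4"]["buckets"]:
            kept_c = [c for c in b["5"]["buckets"] if keep_leaf(c)]
            if kept_c:  # plays the role of "5._bucket_count >= 1"
                kept_b.append({"key": b["key"], "5": {"buckets": kept_c}})
        if kept_b:  # plays the role of "4._bucket_count >= 1"
            result.append({"key": a["key"], "4": {"buckets": kept_b}})
    return result


sample = [
    {"key": "a1", "4": {"buckets": [
        {"key": "b1", "5": {"buckets": [leaf("c1", 100, 20, 10, 5)]}},
        {"key": "b2", "5": {"buckets": [leaf("c2", 100, 80, 10, 10)]}},
    ]}},
    {"key": "a2", "4": {"buckets": [
        {"key": "b3", "5": {"buckets": [leaf("c3", 100, 90, 10, 10)]}},
    ]}},
]
pruned = prune(sample)
print([a["key"] for a in pruned])  # ['a1']
```

Without the two `if` checks, `a2`, `b2` and `b3` would still appear in the output with empty bucket lists, which is the situation the extra selectors are meant to avoid.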
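To see why I added the extra empty-bucket filters, remove them and compare the results you get.
Note that, for simplicity, I have left out the query part and the cost field. Feel free to add them back and test.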
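If it helps, the time-range filter from the question can be put back around the aggregation body programmatically. The following is a rough Python sketch that only manipulates a plain dict; the helper name `with_time_range` is hypothetical, and the `body` shown is a placeholder that should be replaced with the full aggregation tree.

```python
import copy
import json


def with_time_range(body: dict, gte: str, lte: str) -> dict:
    """Return a copy of `body` restricted to documents whose `hour` lies in [gte, lte]."""
    new_body = copy.deepcopy(body)
    new_body["query"] = {
        "bool": {
            "filter": [
                {
                    "range": {
                        "hour": {
                            "format": "strict_date_optional_time",
                            "gte": gte,
                            "lte": lte,
                        }
                    }
                }
            ]
        }
    }
    return new_body


# Placeholder body: substitute the full "aggs" tree from the answer here.
body = {"size": 0, "aggs": {}}
request = with_time_range(body, "2019-08-08T06:29:34.723Z", "2020-08-08T06:29:34.724Z")
print(json.dumps(request, indent=2))
```

The printed JSON can then be sent as the body of the `_search` request; the bounds are the ones used in the question.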