Elasticsearch sub-aggregation with a condition
My database table columns are:
ID | Biz Name | License # | Violations | ...
I need to find the businesses that have more than 5 violations.
I have the following:
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
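It seems to work.
Now I need to find the businesses that have 5 or more violations (as above) and also have 3 or more license #s.
I'm not sure how to aggregate this further.
Thanks!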
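Let's assume that your License # field is defined just like Biz Name and has a .keyword mapping.
Now, the statement:
find the businesses that have ... 3 or more license #s
can be rephrased as:
aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater than or equal to 3.
With that said, you can use the cardinality aggregation to count the distinct license IDs.
Second, the mechanism for "aggregating under a condition" is the handy bucket_selector aggregation, which runs a script to decide whether the bucket currently being iterated is retained in the final aggregation result.
Putting the two together: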
POST your-index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        },
        "unique_license_ids": {
          "cardinality": {
            "field": "License #.keyword"
          }
        },
        "must_have_min_3_License #s": {
          "bucket_selector": {
            "buckets_path": {
              "unique_license_ids": "unique_license_ids"
            },
            "script": "params.unique_license_ids >= 3"
          }
        }
      }
    }
  }
}
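If it helps to see the filtering logic outside of Elasticsearch, here is a rough pure-Python sketch of what the three pieces compute: the terms aggregation with min_doc_count, the cardinality sub-aggregation, and the bucket_selector script. The sample documents, the make_docs helper, and the select_businesses function are all made up for illustration, and the documents are assumed to have already matched the query. One difference to keep in mind: the real cardinality aggregation is approximate, while this sketch counts exactly.

```python
from collections import defaultdict


def make_docs(name, licenses):
    """Hypothetical helper: one violation document per license entry."""
    return [{"Biz Name": name, "License #": lic} for lic in licenses]


# Made-up documents, assumed to have already matched the query.
docs = (
    make_docs("Acme Deli", ["A1", "A2", "A3", "A1", "A2"])            # 5 docs, 3 licenses
    + make_docs("Bob's Pizza", ["B1", "B2", "B1", "B1", "B2", "B1"])  # 6 docs, 2 licenses
    + make_docs("Cafe Uno", ["C1", "C2", "C3"])                       # 3 docs, 3 licenses
)


def select_businesses(docs, min_violations=5, min_licenses=3):
    """Emulate terms(min_doc_count) + cardinality + bucket_selector."""
    # terms aggregation: one bucket per business name
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["Biz Name"]].append(doc)

    selected = {}
    for name, hits in buckets.items():
        # min_doc_count: drop buckets with too few matching documents
        if len(hits) < min_violations:
            continue
        # cardinality: number of distinct license IDs in the bucket
        unique_license_ids = len({d["License #"] for d in hits})
        # bucket_selector: keep the bucket only if the script is true
        if unique_license_ids >= min_licenses:
            selected[name] = {
                "doc_count": len(hits),
                "unique_license_ids": unique_license_ids,
            }
    return selected


print(select_businesses(docs))
# {'Acme Deli': {'doc_count': 5, 'unique_license_ids': 3}}
```

Bob's Pizza has enough violations but only two distinct licenses, and Cafe Uno has three licenses but too few violations, so only Acme Deli survives both conditions.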
That's all!