Elasticsearch aggregations with filter match
I have a document that contains a collection of nested documents:
{
  "_source": {
    ...
    "groups": [
      {
        "group_id": 100,
        "parent_group_id": 1,
        "title": "Wheel",
        "parent_group_title": "Parts"
      },
      {
        "group_id": 200,
        "parent_group_id": 2,
        "title": "Seat",
        "parent_group_title": "Parts"
      }
    ]
    ...
  }
}
The mapping looks like this:
{
  ...,
  "groups": {
    "type": "nested",
    "properties": {
      "group_id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "my_custom_analyzer",
        "term_vector": "with_positions_offsets",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "parent_group_id": {
        "type": "long"
      },
      "parent_group_title": {
        "type": "text",
        "analyzer": "my_custom_analyzer",
        "term_vector": "with_positions_offsets",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  },
  ...
}
What I am trying to run is the following aggregation:
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "groups",
            "query": {
              "match": {
                "groups.title": {
                  "query": "whe"
                }
              }
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "groups",
                "query": {
                  "match": {
                    "groups.title": {
                      "query": "whe"
                    }
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "groups": {
          "nested": {
            "path": "groups"
          },
          "aggs": {
            "titles": {
              "terms": {
                "field": "groups.title.keyword",
                "size": 5
              },
              "aggs": {
                "parents": {
                  "terms": {
                    "field": "groups.parent_group_title.keyword",
                    "size": 3
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
With this query I get a result like the following:
"aggregations" : {
  "filtered" : {
    "doc_count" : ...,
    "groups" : {
      "doc_count" : ...,
      "titles" : {
        "doc_count_error_upper_bound" : ...,
        "sum_other_doc_count" : ...,
        "buckets" : [
          {
            "key" : "Seat",
            "doc_count" : 10,
            "parents" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 10,
              "buckets" : [
                {
                  "key" : "Parts",
                  "doc_count" : 6
                },
                {
                  "key" : "Other",
                  "doc_count" : 4
                }
              ]
            }
          },
          {
            "key" : "Wheel",
            "doc_count" : 3,
            "parents" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 3,
              "buckets" : [
                {
                  "key" : "Parts",
                  "doc_count" : 2
                },
                {
                  "key" : "Other",
                  "doc_count" : 1
                }
              ]
            }
          }
        ]
      }
    }
  }
}
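What I want, however, is for only the bucket with key Wheel to appear in the result (or any other bucket whose title matches the search string whe).
I hope the question is clear. What am I doing wrong? Any suggestions on changing the data structure or the query?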
UPD:
Adding my_custom_analyzer for reference:
{
  "my_custom_analyzer": {
    "type": "custom",
    "tokenizer": "ngram",
    "filter": [
      "lowercase",
      "asciifolding"
    ],
    "char_filter": [
      "html_strip"
    ],
    "min_gram": 2,
    "max_gram": 15,
    "token_chars": [
      "letter",
      "digit"
    ]
  }
}
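You probably want to filter on groups.title inside the nested aggregation, before the terms aggregation on the titles. The top-level nested query only selects parent documents; the nested aggregation then walks all of their group objects, including the ones that did not match. This means you don't need the top-level query at all, nor the filtered aggregation level.
I didn't have your my_custom_analyzer, so I used a basic match, but you get the gist: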
GET groups/_search
{
  "size": 0,
  "aggs": {
    "groups": {
      "nested": {
        "path": "groups"
      },
      "aggs": {
        "titles": {
          "filter": {
            "match": {
              "groups.title": {
                "query": "wheel"
              }
            }
          },
          "aggs": {
            "group_title_terms": {
              "terms": {
                "field": "groups.title.keyword",
                "size": 5
              },
              "aggs": {
                "parents": {
                  "terms": {
                    "field": "groups.parent_group_title.keyword",
                    "size": 3
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
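If the search text comes from user input, the request body above can be built programmatically. The following is a minimal Python sketch, not a definitive client implementation: the helper names build_title_agg_body and bucket_keys are hypothetical, only the field names come from the question's mapping, and the sample response is made up purely to show how a nested, then filter, then terms aggregation is laid out in the result.

```python
import json


def build_title_agg_body(search_text, title_size=5, parent_size=3):
    """Build the nested -> filter -> terms aggregation body shown above.

    Hypothetical helper: only the field names come from the question's mapping.
    """
    return {
        "size": 0,
        "aggs": {
            "groups": {
                "nested": {"path": "groups"},
                "aggs": {
                    "titles": {
                        # Filter the nested group objects themselves,
                        # not the parent documents.
                        "filter": {"match": {"groups.title": {"query": search_text}}},
                        "aggs": {
                            "group_title_terms": {
                                "terms": {
                                    "field": "groups.title.keyword",
                                    "size": title_size,
                                },
                                "aggs": {
                                    "parents": {
                                        "terms": {
                                            "field": "groups.parent_group_title.keyword",
                                            "size": parent_size,
                                        }
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


def bucket_keys(response):
    """Return the title keys from a response to the request above."""
    titles = response["aggregations"]["groups"]["titles"]
    return [bucket["key"] for bucket in titles["group_title_terms"]["buckets"]]


body = build_title_agg_body("wheel")
print(json.dumps(body, indent=2))

# Made-up response fragment, only to illustrate the nesting of the result.
sample_response = {
    "aggregations": {
        "groups": {
            "doc_count": 13,
            "titles": {
                "doc_count": 3,
                "group_title_terms": {"buckets": [{"key": "Wheel", "doc_count": 3}]},
            },
        }
    }
}
print(bucket_keys(sample_response))
```

The printed body can be sent as the JSON payload of the _search request with whatever HTTP or Elasticsearch client you already use.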
Update:
There is a problem with your analyzer. Let's use _analyze to see how whe gets tokenized:
GET groups/_analyze
{
  "text": "whe",
  "analyzer": "my_custom_analyzer"
}
which yields
{
  "tokens" : [
    {
      "token" : "w",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "wh",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "h",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "he",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "e",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "word",
      "position" : 4
    }
  ]
}
I suspect that Seat matches because of the single-character token e.
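To see why a one-character token is enough, here is a rough Python model of the tokenization. It is a sketch under stated assumptions, not Elasticsearch's actual implementation: ngram_tokens is a hypothetical helper, it assumes the built-in ngram tokenizer is running with its defaults of min_gram 1 and max_gram 2 (which is consistent with the _analyze output, and suggests the min_gram, max_gram and token_chars placed on the analyzer are not reaching the tokenizer), and it ignores html_strip and asciifolding.

```python
def ngram_tokens(text, min_gram=1, max_gram=2):
    """Approximate the `ngram` tokenizer followed by the `lowercase` filter.

    Simplified model: every character is kept (no token_chars) and grams
    are emitted per start position, shortest first.
    """
    text = text.lower()
    tokens = []
    for start in range(len(text)):
        for length in range(min_gram, max_gram + 1):
            if start + length <= len(text):
                tokens.append(text[start:start + length])
    return tokens


query_tokens = ngram_tokens("whe")
print(query_tokens)  # ['w', 'wh', 'h', 'he', 'e']

# A match query uses OR between tokens by default, so a single shared
# token is enough for a title to match.
for title in ("Wheel", "Seat"):
    shared = sorted(set(query_tokens) & set(ngram_tokens(title)))
    print(title, shared)
# Wheel ['e', 'h', 'he', 'w', 'wh']
# Seat ['e']
```

Under this model Seat shares only the token e with the query, which is still enough for the match query to hit.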
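My suggestion is to use the edge_ngram tokenizer instead of ngram, and to define min_gram, max_gram and token_chars on the tokenizer itself, as follows: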
PUT groups
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "char_filter": [
            "html_strip"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "groups": {
        "type": "nested",
        "properties": {
          "group_id": {
            "type": "long"
          },
          "title": {
            "type": "text",
            "analyzer": "my_custom_analyzer",
            "term_vector": "with_positions_offsets",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "parent_group_id": {
            "type": "long"
          },
          "parent_group_title": {
            "type": "text",
            "analyzer": "my_custom_analyzer",
            "term_vector": "with_positions_offsets",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
Apply the mapping, reindex, and you're good to go!
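As a sanity check on the new settings, the same kind of rough Python model can be applied to edge_ngram. Again this is only a sketch: edge_ngram_tokens and matches are hypothetical helpers, the model handles ASCII letters and digits only, and it ignores html_strip and asciifolding.

```python
import re


def edge_ngram_tokens(text, min_gram=2, max_gram=10):
    """Approximate an `edge_ngram` tokenizer with token_chars [letter, digit]
    followed by the `lowercase` filter (simplified model, ASCII only)."""
    tokens = []
    # Split on anything that is not a letter or digit,
    # then emit the prefixes of each word.
    for word in re.findall(r"[A-Za-z0-9]+", text.lower()):
        for length in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:length])
    return tokens


def matches(query, title):
    """A `match` query with the default OR operator hits if any token is shared."""
    return bool(set(edge_ngram_tokens(query)) & set(edge_ngram_tokens(title)))


print(edge_ngram_tokens("whe"))    # ['wh', 'whe']
print(edge_ngram_tokens("Wheel"))  # ['wh', 'whe', 'whee', 'wheel']
print(matches("whe", "Wheel"))     # True
print(matches("whe", "Seat"))      # False
```

Note that the query text is analyzed with the same analyzer, so whe also produces the token wh and would match any title starting with those two letters. If that is too loose, the search_analyzer mapping parameter lets a field use a different analyzer at search time than at index time.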