Latest result for a unique combination of fields using Elasticsearch
I have documents in the following format:
{name: 'A', website: 'example.com', date: 1, + other fields}
{name: 'A', website: 'example.com', date: 2, + other fields}
{name: 'B', website: 'example.com', date: 2, + other fields}
{name: 'A', website: 'something.com', date: 1, + other fields}
{name: 'A', website: 'something.com', date: 2, + other fields}
{name: 'C', website: 'something.com', date: 1, + other fields}
{name: 'C', website: 'something.com', date: 2, + other fields}
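I want to run a multi_match query on the name and website fields while returning only the latest result for each combination. My query looks like this: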
query: {
  bool: {
    ...optional filters...,
    must: {
      multi_match: {
        query: input,
        type: "most_fields",
        fields: ["name^3", ..., "website"],
      },
    },
  },
},
The output I want looks like this, sorted by _score:
{name: 'A', website: 'example.com', date: 2, + other fields}
{name: 'B', website: 'example.com', date: 2, + other fields}
{name: 'A', website: 'something.com', date: 2, + other fields}
{name: 'C', website: 'something.com', date: 2, + other fields}
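To pin down the intended semantics independently of Elasticsearch, here is a minimal Python sketch of the desired result: one document per (name, website) pair, the one with the highest date, with groups ranked by their best score. The function name latest_per_group, the score key, and the score values in SAMPLE_HITS are assumptions made up for illustration; they are not real Elasticsearch scores.

```python
from typing import Any, Dict, List, Tuple

Hit = Dict[str, Any]

# Hypothetical hits: the "score" values are invented for illustration only.
SAMPLE_HITS: List[Hit] = [
    {"name": "A", "website": "example.com", "date": 1, "score": 3.0},
    {"name": "A", "website": "example.com", "date": 2, "score": 3.0},
    {"name": "B", "website": "example.com", "date": 2, "score": 1.0},
    {"name": "A", "website": "something.com", "date": 1, "score": 2.0},
    {"name": "A", "website": "something.com", "date": 2, "score": 2.0},
]


def latest_per_group(hits: List[Hit]) -> List[Hit]:
    """Keep the newest hit per (name, website), ranked by the group's best score."""
    groups: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for hit in hits:
        key = (hit["name"], hit["website"])
        entry = groups.setdefault(key, {"latest": hit, "max_score": hit["score"]})
        # Track the newest document in the group.
        if hit["date"] > entry["latest"]["date"]:
            entry["latest"] = hit
        # Track the best score seen in the group (assumed ranking criterion).
        entry["max_score"] = max(entry["max_score"], hit["score"])
    ranked = sorted(groups.values(), key=lambda e: e["max_score"], reverse=True)
    return [entry["latest"] for entry in ranked]
```

Calling latest_per_group(SAMPLE_HITS) yields the date 2 document of each pair, ordered by the invented scores. Ranking a group by its maximum score is one possible reading of "sorted by _score"; ranking by the score of the latest document would be another.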
I understand that I need an aggregation with top_hits to get the latest result, for example:
top_hits: {
  size: 1,
  sort: [{ date: "desc" }],
},
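However, when I aggregate by website and then by name, I lose the _score ordering, which is important for my query. I have tried a composite aggregation, but it cannot be sorted by the score of the resulting documents.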
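I am considering manually adding an extra field that concatenates name and website. I could then use it in a single-level aggregation, which would let me order the buckets by _score. For example: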
aggs: {
  latest_results: {
    terms: {
      field: "website_name.keyword",
      order: {
        maximum_score: "desc",
      },
    },
    aggs: {
      maximum_score: {
        max: {
          script: {
            source: "_score",
          },
        },
      },
      hits: {
        top_hits: {
          size: 1,
          sort: [{ date: "desc" }],
        },
      },
    },
  },
},
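The concatenated field would have to be filled in before indexing. A minimal Python sketch of that step follows. The helper name with_combined_key and the "|" separator are assumptions for illustration; the field name website_name is taken from the aggregation above, and a .keyword sub-field is assumed to exist in the mapping.

```python
from typing import Any, Dict


def with_combined_key(doc: Dict[str, Any], separator: str = "|") -> Dict[str, Any]:
    """Return a copy of doc with a website_name field joining website and name.

    The separator is an assumption; pick one that cannot occur in either field
    so that two different pairs never produce the same key.
    """
    enriched = dict(doc)  # shallow copy, the caller's document is left untouched
    enriched["website_name"] = f"{doc['website']}{separator}{doc['name']}"
    return enriched
```

For example, with_combined_key({"name": "A", "website": "example.com", "date": 2}) adds "website_name": "example.com|A". Computing the key once at index time avoids evaluating a script for every document at query time.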
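You should be able to do this with a top_hits aggregation nested inside a terms aggregation that builds its key with a script.
According to the top_hits documentation:
sort - How the top matching hits should be sorted. By default the hits are sorted by the score of the main query.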
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "A",
            "type": "most_fields",
            "fields": ["name^3", "website"]
          }
        }
      ]
    }
  },
  "aggs": {
    "visitor": {
      "terms": {
        "script": "doc['name'].value +'-'+ doc['website'].value",
        "size": 10
      },
      "aggs": {
        "top_visitors": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
Your results will look like this:
"visitor" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "A-example.com",
      "doc_count" : 2,
      "top_visitors" : {
        "hits" : {
          "total" : {
            "value" : 2,
            "relation" : "eq"
          },
          "max_score" : 1.7260926,
          "hits" : [
            {
              "_index" : "test-52",
              "_type" : "_doc",
              "_id" : "vu_xUnQB5HlCKIdlWRy8",
              "_score" : 1.7260926,
              "_source" : {
                "name" : "A",
                "website" : "example.com",
                "date" : 1
              }
            }
          ]
        }
      }
    },
    {
      "key" : "A-something.com",
      "doc_count" : 2,
      "top_visitors" : {
        "hits" : {
          "total" : {
            "value" : 2,
            "relation" : "eq"
          },
          "max_score" : 1.7260926,
          "hits" : [
            {
              "_index" : "test-52",
              "_type" : "_doc",
              "_id" : "VWDxUnQBx_BqvGcp8U8j",
              "_score" : 1.7260926,
              "_source" : {
                "name" : "A",
                "website" : "something.com",
                "date" : 1
              }
            }
          ]
        }
      }
    }
  ]
}
Be careful: aggregations that use scripts can be resource-intensive and slow.
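Note that the query above leaves top_hits at its default sort (by score), which is why the sample buckets show date 1 rather than the latest document. The following Python sketch combines the script-keyed terms aggregation from this answer with the max-score ordering and the date sort from the question. The function names build_request and latest_documents and the aggregation names groups, max_score and latest are assumptions chosen for illustration, and the request body has not been run against a cluster here.

```python
from typing import Any, Dict, List


def build_request(user_input: str, size: int = 10) -> Dict[str, Any]:
    """Build a search body: one bucket per name/website pair, ranked by best score."""
    return {
        "size": 0,  # only the aggregation output is needed
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": user_input,
                            "type": "most_fields",
                            "fields": ["name^3", "website"],
                        }
                    }
                ]
            }
        },
        "aggs": {
            "groups": {
                "terms": {
                    # Same key script as in the answer above.
                    "script": "doc['name'].value + '-' + doc['website'].value",
                    "size": size,
                    # Rank buckets by the best score they contain.
                    "order": {"max_score": "desc"},
                },
                "aggs": {
                    "max_score": {"max": {"script": {"source": "_score"}}},
                    # Newest document of each bucket, as in the question.
                    "latest": {"top_hits": {"size": 1, "sort": [{"date": "desc"}]}},
                },
            }
        },
    }


def latest_documents(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pull the _source of the single top hit out of every bucket, in bucket order."""
    buckets = response["aggregations"]["groups"]["buckets"]
    return [bucket["latest"]["hits"]["hits"][0]["_source"] for bucket in buckets]
```

The dictionary returned by build_request("A") can be sent as the body of a search request with whichever client is in use, and latest_documents then flattens the buckets of the response into a plain list. If a precomputed field such as website_name.keyword is available, replacing the "script" entry with "field" would avoid the scripting cost mentioned above.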