Elasticsearch, ordering aggregations by geo distance and score
My mapping is the following:
PUT places
{
  "mappings": {
    "test": {
      "properties": {
        "id_product": { "type": "keyword" },
        "id_product_unique": { "type": "integer" },
        "location": { "type": "geo_point" },
        "suggest": { "type": "text" },
        "active": { "type": "boolean" }
      }
    }
  }
}
POST places/test
{
  "id_product": "A",
  "id_product_unique": 1,
  "location": {
    "lat": 1.378446,
    "lon": 103.763427
  },
  "suggest": ["coke", "zero"],
  "active": true
}
POST places/test
{
  "id_product": "A",
  "id_product_unique": 2,
  "location": {
    "lat": 1.878446,
    "lon": 108.763427
  },
  "suggest": ["coke", "zero"],
  "active": true
}
POST places/test
{
  "id_product": "B",
  "id_product_unique": 3,
  "location": {
    "lat": 1.478446,
    "lon": 104.763427
  },
  "suggest": ["coke"],
  "active": true
}
POST places/test
{
  "id_product": "C",
  "id_product_unique": 4,
  "location": {
    "lat": 1.218446,
    "lon": 102.763427
  },
  "suggest": ["coke", "light"],
  "active": true
}
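In my example there are 2 cans of Coke Zero ("id_product_unique" = 1 and 2), 1 can of Coke ("id_product_unique" = 3) and one can of Coke Light ("id_product_unique" = 4).
All of these cans are in different locations.
"id_product" is not unique, because the exact same "can of coke" can be sold in two different places (for example "id_product_unique" = 1 and 2).
Only "id_product_unique" and "location" change from one "can of coke" to another (2 identical cans of coke have the same "suggest" and "id_product" fields, but different "id_product_unique" and "location" values).
My goal is to search for products from a given GPS position and to show a unique result per id_product (the nearest one):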
POST /places/_search?size=0
{
  "aggs": {
    "group-by-type": {
      "terms": { "field": "id_product" },
      "aggs": {
        "min-distance": {
          "top_hits": {
            "sort": {
              "_script": {
                "type": "number",
                "script": {
                  "source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
                  "lang": "painless"
                },
                "order": "asc"
              }
            },
            "size": 1
          }
        }
      }
    }
  }
}
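To make the sort script concrete, here is a minimal Python sketch that reproduces its metric (the sum of the absolute latitude and longitude differences, in degrees) on the four sample documents and keeps the nearest one per id_product. The helper names `script_distance` and `nearest_per_product` are made up for this illustration. Note that this metric is not a geographic distance in metres; it only mirrors what the Painless script computes.

```python
# Sample documents from the question: (id_product, id_product_unique, lat, lon)
DOCS = [
    ("A", 1, 1.378446, 103.763427),
    ("A", 2, 1.878446, 108.763427),
    ("B", 3, 1.478446, 104.763427),
    ("C", 4, 1.218446, 102.763427),
]

ORIGIN = (1.178446, 101.763427)  # the point hard-coded in the sort script


def script_distance(lat, lon, origin=ORIGIN):
    """Same formula as the Painless script: |lat - lat0| + |lon - lon0|."""
    return abs(lat - origin[0]) + abs(lon - origin[1])


def nearest_per_product(docs, origin=ORIGIN):
    """Mimic terms + top_hits(size=1): keep the closest document per id_product."""
    best = {}
    for id_product, unique_id, lat, lon in docs:
        d = script_distance(lat, lon, origin)
        if id_product not in best or d < best[id_product][0]:
            best[id_product] = (d, unique_id)
    return best


if __name__ == "__main__":
    for key, (dist, unique_id) in sorted(nearest_per_product(DOCS).items()):
        print(key, unique_id, round(dist, 6))
```

For the sample data this keeps id_product_unique 1 for A, 3 for B and 4 for C, which is what the top_hits sub-aggregation with size 1 is meant to return.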
I would now like to apply a should query to this result list and reorder the results according to the computed score. I tried the following:
POST /places/_search?size=0
{
  "query": {
    "bool": {
      "filter": { "term": { "active": "true" } },
      "should": [
        { "match": { "suggest": "coke" } },
        { "match": { "suggest": "light" } }
      ]
    }
  },
  "aggs": {
    "group-by-type": {
      "terms": { "field": "id_product" },
      "aggs": {
        "min-distance": {
          "top_hits": {
            "sort": {
              "_script": {
                "type": "number",
                "script": {
                  "source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
                  "lang": "painless"
                },
                "order": "asc"
              }
            },
            "size": 1
          }
        }
      }
    }
  }
}
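But I can't figure out how to order the result by document score instead of by the distance sort value.
Any help would be great.
I managed to do it by adding a new aggregation, "max_score":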
"max_score": {
  "max": {
    "script": {
      "lang": "painless",
      "source": "_score"
    }
  }
}
and ordering the terms buckets by max_score.value desc:
"order": {"max_score.value": "desc"}
My final query is the following:
POST /places/_search?size=0
{
  "query": {
    "bool": {
      "filter": { "term": { "active": "true" } },
      "should": [
        { "match": { "suggest": "coke" } },
        { "match": { "suggest": "light" } }
      ]
    }
  },
  "aggs": {
    "group-by-type": {
      "terms": {
        "field": "id_product",
        "order": { "max_score.value": "desc" }
      },
      "aggs": {
        "min-distance": {
          "top_hits": {
            "sort": {
              "_script": {
                "type": "number",
                "script": {
                  "source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
                  "lang": "painless"
                },
                "order": "asc"
              }
            },
            "size": 1
          }
        },
        "max_score": {
          "max": {
            "script": {
              "lang": "painless",
              "source": "_score"
            }
          }
        }
      }
    }
  }
}
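The final query hard-codes the origin in the script source. As a hedged sketch, the request body could instead be built programmatically, with the origin passed through the script's `params` object. `build_search_body` and the parameter names `lat` and `lon` are assumptions made for this example, and the body would still have to be sent to `POST /places/_search?size=0` with whatever client you use.

```python
import json


def build_search_body(lat, lon, terms):
    """Build the grouped search: buckets ordered by best score, nearest hit per bucket."""
    distance_script = {
        "lang": "painless",
        # Same metric as in the question, but the origin comes from params
        "source": (
            "Math.abs(doc['location'].lat - params.lat)"
            " + Math.abs(doc['location'].lon - params.lon)"
        ),
        "params": {"lat": lat, "lon": lon},
    }
    return {
        "query": {
            "bool": {
                "filter": {"term": {"active": True}},
                "should": [{"match": {"suggest": t}} for t in terms],
            }
        },
        "aggs": {
            "group-by-type": {
                "terms": {
                    "field": "id_product",
                    # Buckets are ordered by the best document score they contain
                    "order": {"max_score.value": "desc"},
                },
                "aggs": {
                    "min-distance": {
                        "top_hits": {
                            "sort": {
                                "_script": {
                                    "type": "number",
                                    "script": distance_script,
                                    "order": "asc",
                                }
                            },
                            "size": 1,
                        }
                    },
                    "max_score": {
                        "max": {"script": {"lang": "painless", "source": "_score"}}
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_search_body(1.178446, 101.763427, ["coke", "light"])
    print(json.dumps(body, indent=2))
```

Keeping the coordinates in `params` leaves the script source identical between requests, so only the parameter values change when the search position moves.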
Response:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group-by-type": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 1,
          "max_score": {
            "value": 1.0300811529159546
          },
          "min-distance": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "places",
                  "_type": "test",
                  "_id": "VhJdOmIBKhzTB9xcDvfk",
                  "_score": null,
                  "_source": {
                    "id_product": "C",
                    "id_product_unique": 4,
                    "location": {
                      "lat": 1.218446,
                      "lon": 102.763427
                    },
                    "suggest": [
                      "coke",
                      "light"
                    ],
                    "active": true
                  },
                  "sort": [
                    1.0399999646503995
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "A",
          "doc_count": 2,
          "max_score": {
            "value": 0.28768208622932434
          },
          "min-distance": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "places",
                  "_type": "test",
                  "_id": "UhJcOmIBKhzTB9xc6ve-",
                  "_score": null,
                  "_source": {
                    "id_product": "A",
                    "id_product_unique": 1,
                    "location": {
                      "lat": 1.378446,
                      "lon": 103.763427
                    },
                    "suggest": [
                      "coke",
                      "zero"
                    ],
                    "active": true
                  },
                  "sort": [
                    2.1999999592114756
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "B",
          "doc_count": 1,
          "max_score": {
            "value": 0.1596570909023285
          },
          "min-distance": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "places",
                  "_type": "test",
                  "_id": "VRJcOmIBKhzTB9xc_vc0",
                  "_score": null,
                  "_source": {
                    "id_product": "B",
                    "id_product_unique": 3,
                    "location": {
                      "lat": 1.478446,
                      "lon": 104.763427
                    },
                    "suggest": [
                      "coke"
                    ],
                    "active": true
                  },
                  "sort": [
                    3.2999999020282695
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
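From what I gather, your use case is that you want the value of a particular field in the document to be taken into account when the relevance score is computed.
This is typical when you want to boost a document's relevance based on a field value (price, for example, or here the match on a particular product).
If you are searching for product A, then in this case that match matters more than the distance of the product itself. So if B is 2 miles from the origin and A is 5 miles from the origin, A is still the closest product of the kind you searched for.
What you need is a function score query that uses a decay function based on distance. I think you want the gauss type, whose decay follows a bell curve.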
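For intuition about how the decay shapes differ, the sketch below evaluates the gauss and exp curves as they are described in the Elasticsearch function score documentation: the value is 1 within `offset` of the origin and drops to `decay` (0.5 by default) at a distance of `offset + scale`. The function names are mine, and distances are plain numbers here rather than the unit strings Elasticsearch accepts.

```python
import math


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Bell-shaped decay: exp(-d^2 / (2 * sigma^2)), sigma^2 = -scale^2 / (2 * ln(decay))."""
    d = max(0.0, abs(distance) - offset)
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def exp_decay(distance, scale, offset=0.0, decay=0.5):
    """Exponential decay: exp(lambda * d), lambda = ln(decay) / scale."""
    d = max(0.0, abs(distance) - offset)
    return math.exp(math.log(decay) / scale * d)


if __name__ == "__main__":
    # Compare both curves at a few distances for scale = 2 (arbitrary units)
    for dist in (0.0, 1.0, 2.0, 4.0):
        print(dist, round(gauss_decay(dist, 2.0), 4), round(exp_decay(dist, 2.0), 4))
```

Both curves pass through `decay` at `offset + scale`; the gauss curve stays higher close to the origin and falls off faster beyond that point, which is why it tends to suit "nearby is fine, far away is not" ranking.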
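Here is an example that uses a decay function of the exp (exponential) type. It does the same thing, although on a different field type (a date) than yours; the idea should be the same.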
Suppose that instead of wanting to boost incrementally by the value of
a field, you have an ideal value you want to target and you want the
boost factor to decay the further away you move from the value. This
is typically useful in boosts based on lat/long, numeric fields like
price, or dates. In our contrived example, we are searching for books
on “search engines” ideally published around June 2014.
POST /bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "functions": [
        {
          "exp": {
            "publish_date": {
              "origin": "2014-06-15",
              "offset": "7d",
              "scale": "30d"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}
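Adapted to the `places` index from the question, the same pattern might look like the sketch below, which builds a function_score body with a gauss decay on the `location` geo_point. The `scale` of "100km", the `offset` of "0km" and the `boost_mode` of "multiply" are placeholder assumptions to be tuned, not values taken from the question, and `build_decay_query` is a hypothetical helper name.

```python
import json


def build_decay_query(lat, lon, terms, scale="100km", offset="0km"):
    """Build a function_score query: text relevance scaled by a gauss decay on distance."""
    base_query = {
        "bool": {
            "filter": {"term": {"active": True}},
            "should": [{"match": {"suggest": t}} for t in terms],
        }
    }
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": [
                    {
                        "gauss": {
                            "location": {
                                "origin": {"lat": lat, "lon": lon},
                                "offset": offset,  # assumed value
                                "scale": scale,    # assumed value
                            }
                        }
                    }
                ],
                # Multiply the text score by the decay value (assumed choice)
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_decay_query(1.178446, 101.763427, ["coke", "light"]), indent=2))
```

With "multiply", the text relevance is scaled by the decay value, so distance and match quality are traded off against each other; "replace", as in the quoted example, would discard the text score and rank by the decay value alone.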
Here are some helpful references:
Elasticsearch 6.2 Function Score document
The Closer the Better
Here is an Elasticsearch 2.x Decay Function example; although it is for a different version, I think it is very similar to your use case