ElasticSearch

Question

I have the following query, which works fine (this may not be the actual query):

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "location",
            "query": {
              "geo_distance": {
                "distance": "16090km",
                "distance_type": "arc",
                "location.point": {
                  "lat": "51.794177",
                  "lon": "-0.063055"
                }
              }
            }
          }
        },
        {
          "geo_distance": {
            "distance": "16090km",
            "distance_type": "arc",
            "location.point": {
              "lat": "51.794177",
              "lon": "-0.063055"
            }
          }
        }
      ]
    }
  }
}

However, I would like to do the following (as part of the query, but without affecting the existing query):

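Find the documents that have field_name = 1
Sort all documents with field_name = 1 by geo_distance
Remove duplicates that share the same values for field_name = 1 and field_name_2 = 2, keeping the closest item in the document results and dropping the rest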

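Update (further clarification):

Aggregations can't be used, because we want to manipulate the documents in the result.

The order of the documents must also be maintained, meaning:

If I have 20 documents sorted by a field, and 5 of them have field_name = 1, I want to sort those 5 by distance and eliminate 4 of them, while still maintaining the first sort. (Possibly doing the geo-distance sort and elimination before the actual query?)

I'm not too sure how to do this, and any help is appreciated. I'm currently using ElasticSearch DSL DRF, but I can easily convert the query to ElasticSearch DSL.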

Example documents (before processing):

[{
"field_name": 1,
"field_name_2": 2,
"location": ....
},
{
"field_name": 1,
"field_name_2": 2,
"location": ....
},
{
"field_name": 55,
"field_name_5": 22,
"location": ....
}]

Output (desired):

[{
"field_name": 1,
"field_name_2": 2,
"location": .... <- closest
},
{
"field_name": 55,
"field_name_5": 22,
"location": ....
}]
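
For reference, the desired behaviour can be sketched as client-side post-processing in Python. This is only an illustration of the requirement, not something Elasticsearch does for you. It assumes each document exposes its coordinates as location.point with numeric lat and lon (the example documents above elide the location content, so that shape is a guess), and it treats documents as duplicates when they share the same field_name and field_name_2 values. The function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def dedupe_keep_closest(docs, origin, key_fields=("field_name", "field_name_2")):
    """Keep only the closest document of each duplicate group, preserving order.

    Documents that lack any of the key fields are never treated as duplicates.
    ASSUMPTION: coordinates live at doc["location"]["point"]["lat"/"lon"].
    """
    def has_key(doc):
        return all(field in doc for field in key_fields)

    best = {}  # group key -> (distance in km, index of the closest doc so far)
    for index, doc in enumerate(docs):
        if not has_key(doc):
            continue
        key = tuple(doc[field] for field in key_fields)
        point = doc["location"]["point"]
        distance = haversine_km(origin[0], origin[1], point["lat"], point["lon"])
        if key not in best or distance < best[key][0]:
            best[key] = (distance, index)

    keep = {index for _, index in best.values()}
    return [doc for index, doc in enumerate(docs) if index in keep or not has_key(doc)]
```

Note that the surviving document stays at its own position in the original ordering, which is one possible reading of "maintaining the first sort".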

Answer 1

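One way to achieve what you want is to keep the query part as you have it now (so you still get the hits you need) and add an aggregation part that finds the closest document with the additional condition on field_name. The aggregation part would be made of:

a filter aggregation to only consider documents with field_name = 1
a geo_distance aggregation with a very small distance
a top_hits aggregation to return the document with the closest distance

The aggregation part looks like this: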

{
  "query": {
    ...same as you have now...
  },
  "aggs": {
    "field_name": {
      "filter": {
        "term": {
          "field_name": 1           <--- only select desired documents
        }
      },
      "aggs": {
        "by_distance": {              <--- aggregation name (arbitrary)
          "geo_distance": {
            "field": "location.point",
            "unit": "km",
            "distance_type": "arc",
            "origin": {
              "lat": "51.794177",
              "lon": "-0.063055"
            },
            "ranges": [
              {
                "to": 1               <---- single bucket for docs < 1km (change as needed)
              }
            ]
          },
          "aggs": {
            "closest": {
              "top_hits": {
                "size": 1,            <---- closest document
                "sort": [
                  {
                    "_geo_distance": {
                      "location.point": {
                        "lat": "51.794177",
                        "lon": "-0.063055"
                      },
                      "order": "asc",
                      "unit": "km",
                      "mode": "min",
                      "distance_type": "arc",
                      "ignore_unmapped": true
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
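
As a rough illustration of how this could be used from Python, the sketch below builds the same request body as a plain dict and pulls the closest hit out of a response that has the usual aggregation layout (filter, then geo_distance buckets, then top_hits). The helper names, the by_distance aggregation name, and the max_km parameter are made up for the example. If location is mapped as nested, as the query in the question suggests, the geo_distance aggregation would probably also need to be wrapped in a nested aggregation, which is not shown here.

```python
def closest_agg_body(query, lat, lon, field_value=1, max_km=1):
    """Build a search body: the existing query plus a 'closest document' aggregation."""
    origin = {"lat": lat, "lon": lon}
    return {
        "query": query,  # same query as before, passed in unchanged
        "aggs": {
            "field_name": {
                # only consider documents with the desired field_name
                "filter": {"term": {"field_name": field_value}},
                "aggs": {
                    "by_distance": {  # hypothetical aggregation name
                        "geo_distance": {
                            "field": "location.point",
                            "unit": "km",
                            "distance_type": "arc",
                            "origin": origin,
                            "ranges": [{"to": max_km}],  # single bucket: docs closer than max_km
                        },
                        "aggs": {
                            "closest": {
                                "top_hits": {
                                    "size": 1,  # keep only the nearest document
                                    "sort": [{
                                        "_geo_distance": {
                                            "location.point": origin,
                                            "order": "asc",
                                            "unit": "km",
                                            "distance_type": "arc",
                                        }
                                    }],
                                }
                            }
                        },
                    }
                },
            }
        },
    }


def closest_hit(response):
    """Return the nearest hit from the aggregation response, or None if the bucket is empty."""
    buckets = response["aggregations"]["field_name"]["by_distance"]["buckets"]
    for bucket in buckets:
        hits = bucket["closest"]["hits"]["hits"]
        if hits:
            return hits[0]
    return None
```

The body can then be sent with whichever client is in use, and closest_hit applied to the response dict it returns.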

Answer 2

This can be done using Field Collapsing, which is the equivalent of grouping. Here is an example of how this can be achieved:

{
  "collapse": {
    "field": "vin",
    "inner_hits": {
      "name": "closest_dealer",
      "size": 1,
      "sort": [
        {
          "_geo_distance": {
            "location.point": {
              "lat": "latitude",
              "lon": "longitude"
            },
            "order": "asc",
            "unit": "km",
            "distance_type": "arc",
            "nested_path": "location"
          }
        }
      ]
    }
  }
}

The collapse is done on the field vin; inner_hits is used to sort the grouped items and return the closest one (size = 1).
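
A minimal sketch of attaching such a collapse clause to an existing search body in Python, using plain dicts. The helper name and its parameters are hypothetical. The sort is ascending so that the single inner hit is the nearest one, and the nested_path key is carried over from the example above (newer Elasticsearch versions may expect a different syntax for nested sorting).

```python
def with_closest_collapse(body, lat, lon, group_field="vin", inner_name="closest_dealer"):
    """Return a copy of a search body with a collapse clause added.

    Each group of documents sharing group_field is collapsed into one hit,
    and inner_hits returns the single member nearest to (lat, lon).
    """
    new_body = dict(body)  # shallow copy, so the caller's dict is not modified
    new_body["collapse"] = {
        "field": group_field,
        "inner_hits": {
            "name": inner_name,
            "size": 1,  # only the nearest member of each group
            "sort": [{
                "_geo_distance": {
                    "location.point": {"lat": lat, "lon": lon},
                    "order": "asc",  # nearest first
                    "unit": "km",
                    "distance_type": "arc",
                    "nested_path": "location",  # as in the answer's example
                }
            }],
        },
    }
    return new_body
```

With elasticsearch-dsl, the resulting collapse dict can likely be passed along through Search.extra(collapse=...), since that method adds extra keys to the request body. The nearest document of each group is then read from the inner_hits section of each hit, under the name given in inner_name.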

ElasticSearch - Filtering a result and manipulating the documents

elasticsearch-dsl

elasticsearch-aggregation