ElasticSearch NEST combining AND with OR queries

Problem

How do I write NEST code that generates an Elasticsearch query for this simple boolean logic?

term1 && (term2 || term3 || term4)

Pseudocode for the logic I am trying to implement with NEST (5.2) queries against ElasticSearch (5.2):

// additional requirements
( truckOemName = "HYSTER" && truckModelName = "S40FT" && partCategoryCode = "RECO" && partID != "")

//Section I can't get working correctly
AND (
    ( SerialRangeInclusiveFrom <= "F187V-6785D" AND SerialRangeInclusiveTo >= "F187V-6060D" )
    OR 
    ( SerialRangeInclusiveFrom = "" || SerialRangeInclusiveTo = "" )
)

Interpreting the related documentation

The "Combining queries with || or should clauses" section of Writing Bool Queries mentions:

The bool query does not quite follow the same boolean logic you expect from a programming language. term1 && (term2 || term3 || term4) does not become

bool
|___must
|   |___term1
|
|___should
   |___term2
   |___term3
   |___term4

you could get back results that only contain term1

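This is exactly what I think is happening.

But their solution to this is beyond my understanding of how to apply it with NEST. Is the answer either of these?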

  1. Add parentheses to force evaluation order (I am)
  2. Use boost factor? (what?)

Code

Here is the NEST code

 var searchDescriptor = new SearchDescriptor<ElasticPart>();
 var terms = new List<Func<QueryContainerDescriptor<ElasticPart>, QueryContainer>>
 {
     s =>
         (s.TermRange(r => r.Field(f => f.SerialRangeInclusiveFrom)
              .LessThanOrEquals(dataSearchParameters.SerialRangeEnd))
          &&
          s.TermRange(r => r.Field(f => f.SerialRangeInclusiveTo)
              .GreaterThanOrEquals(dataSearchParameters.SerialRangeStart)))
         //None of the data that matches these ORs returns with the query this code generates, below.
         ||
         (!s.Exists(exists => exists.Field(f => f.SerialRangeInclusiveFrom))
          ||
          !s.Exists(exists => exists.Field(f => f.SerialRangeInclusiveTo))
         )
 };

 //Terms is the piece in question
 searchDescriptor.Query(s => s.Bool(bq => bq.Filter(terms))
     && !s.Terms(term => term.Field(x => x.OemID)
         .Terms(RulesHelper.GetOemExclusionList(exclusions))));

 searchDescriptor.Aggregations(a => a
     .Terms(aggPartInformation, t => t.Script(s => s.Inline(script)).Size(50000))
 );
 searchDescriptor.Type(string.Empty);
 searchDescriptor.Size(0);

 var searchResponse = ElasticClient.Search<ElasticPart>(searchDescriptor);

Here is the ES JSON query it generates

{
   "query":{
      "bool":{
         "must":[
            {
               "term":{ "truckOemName": { "value":"HYSTER" }}
            },
            {
               "term":{ "truckModelName": { "value":"S40FT" }}
            },
            {
               "term":{ "partCategoryCode": { "value":"RECO" }}
            },
            {
               "bool":{
                  "should":[
                     {
                        "bool":{
                           "must":[
                              {
                                 "range":{ "serialRangeInclusiveFrom": { "lte":"F187V-6785D" }}
                              },
                              {
                                 "range":{ "serialRangeInclusiveTo": { "gte":"F187V-6060D" }}
                              }
                           ]
                        }
                     },
                     {
                        "bool":{
                           "must_not":[
                              {
                                 "exists":{ "field":"serialRangeInclusiveFrom" }
                              }
                           ]
                        }
                     },
                     {
                        "bool":{
                           "must_not":[
                              {
                                 "exists":{ "field":"serialRangeInclusiveTo" }
                              }
                           ]
                        }
                     }
                  ]
               }
            },
            {
               "exists":{
                  "field":"partID"
               }
            }
         ]
      }
   }
}

Here is the query we would like it to generate, which seems to work.

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "term": { "truckOemName": { "value": "HYSTER" }}
              },
              {
                "term": {"truckModelName": { "value": "S40FT" }}
              },
              {
                "term": {"partCategoryCode": { "value": "RECO" }}
              },
              {
                "exists": { "field": "partID" }
              }
            ],
            "should": [
              {
                "bool": {
                  "must": [
                    {
                      "range": { "serialRangeInclusiveFrom": {"lte": "F187V-6785D"}}
                    },
                    {
                      "range": {"serialRangeInclusiveTo": {"gte": "F187V-6060D"}}
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must_not": [
                    {
                      "exists": {"field": "serialRangeInclusiveFrom"}
                    },
                    {
                      "exists": {  "field": "serialRangeInclusiveTo"}
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Documentation

With the overloaded operators for bool queries, it is not possible to express a must clause combined with a should clause, i.e.

term1 && (term2 || term3 || term4)

becomes

bool
|___must
   |___term1
   |___bool
       |___should
           |___term2
           |___term3
           |___term4

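This is a bool query with two must clauses, where the second must clause is itself a bool query in which at least one of the should clauses has to match. NEST combines queries like this because it matches the expectation for boolean logic in .NET.

If it did become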
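
As a rough illustration of the operator form described above, here is a minimal sketch that reuses field names from the question. It assumes the NEST 5.x package is referenced; the ElasticPart class shown is a hypothetical stand-in whose property names are inferred from the JSON field names, and the static Query<T> helper is used in place of the lambda style from the question.

```csharp
using Nest;

// Hypothetical document type: property names are inferred from the JSON
// field names in the question, not taken from the asker's real class.
public class ElasticPart
{
    public string TruckOemName { get; set; }
    public string SerialRangeInclusiveFrom { get; set; }
    public string SerialRangeInclusiveTo { get; set; }
}

public static class OperatorFormSketch
{
    // Builds term1 && (term2 || term3 || term4) with the overloaded operators.
    public static QueryContainer Build()
    {
        // term1
        QueryContainer oem = Query<ElasticPart>.Term(f => f.TruckOemName, "HYSTER");

        // term2
        QueryContainer fromInRange = Query<ElasticPart>.TermRange(r => r
            .Field(f => f.SerialRangeInclusiveFrom)
            .LessThanOrEquals("F187V-6785D"));

        // term3 and term4: unary ! negates a query
        QueryContainer noFrom = !Query<ElasticPart>.Exists(e => e
            .Field(f => f.SerialRangeInclusiveFrom));
        QueryContainer noTo = !Query<ElasticPart>.Exists(e => e
            .Field(f => f.SerialRangeInclusiveTo));

        // && contributes must clauses; the parenthesised || group is
        // wrapped as a nested bool query with should clauses.
        return oem && (fromInRange || noFrom || noTo);
    }
}
```

The parentheses matter here: they are what makes the OR group a single operand of &&, so it ends up as one nested bool query inside must.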

bool
|___must
|   |___term1
|
|___should
   |___term2
   |___term3
   |___term4

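then a document would be considered a match if it satisfied only the must clause. The should clauses act as a boost in this case: if a document matches one or more of the should clauses in addition to the must clauses, it gets a higher relevancy score, assuming term2, term3 and term4 are queries that calculate a relevancy score.

On this basis, the query you want to generate says that, for a document to be considered a match, it must match all 4 queries in the must clause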

"must": [
  {
    "term": { "truckOemName": { "value": "HYSTER" }}
  },
  {
    "term": {"truckModelName": { "value": "S40FT" }}
  },
  {
    "term": {"partCategoryCode": { "value": "RECO" }}
  },
  {
    "exists": { "field": "partID" }
  }
],

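Then, for documents matching the must clauses, if

  1. it has a serialRangeInclusiveFrom less than or equal to "F187V-6785D" and a serialRangeInclusiveTo greater than or equal to "F187V-6060D"

  or

  2. it has neither a serialRangeInclusiveFrom nor a serialRangeInclusiveTo field

then boost that document's relevancy score. The crucial point is that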

If a document matches the must clauses but does not match any of the should clauses, it will still be a match for the query (but have a lower relevancy score).

If this is the intent, this query can be constructed using the longer form of the Bool query.
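
A minimal sketch of what that longer form might look like for the desired JSON in the question. It assumes NEST 5.x and an already configured IElasticClient; the ElasticPart class is a hypothetical stand-in whose property names are inferred from the JSON field names, and the literal values are copied from the question rather than read from dataSearchParameters.

```csharp
using Nest;

// Hypothetical document type: property names are inferred from the JSON
// field names in the question, not taken from the asker's real class.
public class ElasticPart
{
    public string TruckOemName { get; set; }
    public string TruckModelName { get; set; }
    public string PartCategoryCode { get; set; }
    public string PartID { get; set; }
    public string SerialRangeInclusiveFrom { get; set; }
    public string SerialRangeInclusiveTo { get; set; }
}

public static class LongFormBoolSketch
{
    public static ISearchResponse<ElasticPart> Search(IElasticClient client)
    {
        return client.Search<ElasticPart>(s => s
            .Query(q => q
                .Bool(b => b
                    // All four must clauses have to match.
                    .Must(
                        m => m.Term(f => f.TruckOemName, "HYSTER"),
                        m => m.Term(f => f.TruckModelName, "S40FT"),
                        m => m.Term(f => f.PartCategoryCode, "RECO"),
                        m => m.Exists(e => e.Field(f => f.PartID))
                    )
                    // should clauses sit next to must, so they only boost
                    // the score; a document can match none of them.
                    .Should(
                        sh => sh.Bool(inRange => inRange.Must(
                            m => m.TermRange(r => r
                                .Field(f => f.SerialRangeInclusiveFrom)
                                .LessThanOrEquals("F187V-6785D")),
                            m => m.TermRange(r => r
                                .Field(f => f.SerialRangeInclusiveTo)
                                .GreaterThanOrEquals("F187V-6060D")))),
                        sh => sh.Bool(noRange => noRange.MustNot(
                            m => m.Exists(e => e.Field(f => f.SerialRangeInclusiveFrom)),
                            m => m.Exists(e => e.Field(f => f.SerialRangeInclusiveTo))))
                    )
                )
            )
        );
    }
}
```

If at least one of the serial-range conditions has to hold for a document to match at all, the operator-combined form from the question, which nests the should clauses inside a must clause, already expresses that.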