ElasticSearch - query documents by term and field priority

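I am currently using Elasticsearch, and I am trying to implement a query from a Java backend that retrieves documents from my index not only by term but also by field priority. Each document in my index has a term and a field that specifies its type.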

For example:
term: "Flu Shot"
type: "procedure"

term: "Fluphenazine"
type: "drug"

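I have created a query that searches by term, and the index returns the most relevant results matching that term. The feature I want to build is a query that returns results matching the same term, but ordered by the priority of the 'type' field. For example, when I type "flu", I want to get documents of type "procedure" first, followed by documents of type "drug". Currently, because many drugs start with "flu", the index returns only documents of type "drug".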

You can use function_score.

The function_score allows you to modify the score of documents that are retrieved by a query. To use function_score, the user has to define a query and one or more functions that compute a new score for each document returned by the query.
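To make that combination concrete, here is a plain-Java model of the arithmetic. It does not talk to Elasticsearch. It assumes every function is a "filter + weight" pair on the type field and that the defaults score_mode=multiply and boost_mode=multiply apply. The class FunctionScoreModel and its helpers are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Elasticsearch involved) of how function_score combines
// scores when every function is a "filter + weight" pair and the defaults
// score_mode=multiply and boost_mode=multiply apply.
public class FunctionScoreModel {

    // Simplified document: its "type" keyword and the score the inner query gave it.
    static final class Doc {
        final String term;
        final String type;
        final double queryScore;

        Doc(String term, String type, double queryScore) {
            this.term = term;
            this.type = type;
            this.queryScore = queryScore;
        }
    }

    // score_mode=multiply: multiply together the weights of all matching functions.
    // Each filter here is a term filter on "type", so at most one weight matches.
    // In this model, a document with no matching function keeps a factor of 1.
    static double functionScore(Doc doc, Map<String, Double> weightsByType) {
        double factor = 1.0;
        Double weight = weightsByType.get(doc.type);
        if (weight != null) {
            factor *= weight;
        }
        return factor;
    }

    // boost_mode=multiply: final score = inner query score * function score.
    static double finalScore(Doc doc, Map<String, Double> weightsByType) {
        return doc.queryScore * functionScore(doc, weightsByType);
    }

    // Sort by final score, highest first (ties keep their input order).
    static List<Doc> rank(List<Doc> docs, Map<String, Double> weightsByType) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> finalScore(d, weightsByType)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("procedure", 2.0, "drug", 1.0);
        List<Doc> docs = List.of(
                new Doc("Fluphenazine", "drug", 1.0),
                new Doc("Flu Shot", "procedure", 1.0));
        for (Doc d : rank(docs, weights)) {
            System.out.println(d.term + " -> " + finalScore(d, weights));
        }
    }
}
```

When the inner query gives every hit the same score, the weight alone decides the order. That is the situation with the wildcard query in this answer. If the inner query produces varying relevance scores, for example a BM25-scored match query, a multiplicative weight is not a strict tier. A drug with query score 3.0 still beats a procedure with 1.2, because 3.0 × 1 > 1.2 × 2. In that case, the weights have to be far enough apart to dominate the spread of query scores.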

Here is an example using your data (Elasticsearch server 7.9):

  1. Create the index and add the documents

     PUT /example_index
     {
       "mappings": {
         "properties": {
           "term": {"type": "text" },
           "type": {"type": "keyword"}
         }
       }
     }
    
     PUT /_bulk?refresh
     {"create": {"_index": "example_index", "_id": 1}}
     {"term": "Flu Shot", "type": "procedure"}
     {"create": {"_index": "example_index", "_id": 2}}
     {"term": "Fluphenazine", "type": "drug"}
     {"create": {"_index": "example_index", "_id": 3}}
     {"term": "Flu Shot2", "type": "procedure"}
     {"create": {"_index": "example_index", "_id": 4}}
     {"term": "Fluphenazine2", "type": "drug"}
    
  2. Query the documents with custom scoring logic

     GET /example_index/_search
     {
       "query": {
         "function_score": {
           "query": {
             "wildcard": {
               "term": {
                 "value": "*flu*"
               }
             }
           },
           "functions": [
             {
               "filter": {
                 "term": {
                   "type": "procedure"
                 }
               },
               "weight": 2
             },
             {
               "filter": {
                 "term": {
                   "type": "drug"
                 }
               },
               "weight": 1
             }
           ]
         }
       }
     }
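A Java backend has to send this same request body. The sketch below assumes the types arrive as an ordered priority list and derives the weights from that list: the first of n types gets weight n and the last gets 1. The class TypePriorityQuery and its build and quote helpers are hypothetical. The hand-built JSON string only illustrates the structure. In practice you would more likely use a JSON library or the Elasticsearch Java client. In the 7.x high-level REST client, for instance, QueryBuilders.functionScoreQuery with ScoreFunctionBuilders.weightFactorFunction expresses the same query.

```java
import java.util.List;

// Builds a function_score request body like the one above. Weights come from a
// priority list: with n types, the first gets weight n and the last gets 1.
public class TypePriorityQuery {

    // Minimal JSON string quoting that escapes only backslashes and double quotes
    // (hypothetical helper; a JSON library would handle all escaping cases).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String build(String field, String pattern, String typeField, List<String> typesByPriority) {
        StringBuilder functions = new StringBuilder();
        int n = typesByPriority.size();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                functions.append(",");
            }
            int weight = n - i; // e.g. [procedure, drug] -> 2, 1
            functions.append("{\"filter\":{\"term\":{")
                    .append(quote(typeField)).append(":").append(quote(typesByPriority.get(i)))
                    .append("}},\"weight\":").append(weight).append("}");
        }
        return "{\"query\":{\"function_score\":{"
                + "\"query\":{\"wildcard\":{" + quote(field) + ":{\"value\":" + quote(pattern) + "}}},"
                + "\"functions\":[" + functions + "]}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("term", "*flu*", "type", List.of("procedure", "drug")));
    }
}
```

For ["procedure", "drug"], this produces the same query as the request above, compacted onto one line, with weights 2 and 1. The pattern is inserted as-is, so any `*` or `?` in user input acts as a wildcard.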
    
  3. Result:

     {
       "took" : 2,
       "timed_out" : false,
       "_shards" : {
         "total" : 1,
         "successful" : 1,
         "skipped" : 0,
         "failed" : 0
       },
       "hits" : {
         "total" : {
           "value" : 4,
           "relation" : "eq"
         },
         "max_score" : 2.0,
         "hits" : [
           {
             "_index" : "example_index",
             "_type" : "_doc",
             "_id" : "1",
             "_score" : 2.0,
             "_source" : {
               "term" : "Flu Shot",
               "type" : "procedure"
             }
           },
           {
             "_index" : "example_index",
             "_type" : "_doc",
             "_id" : "3",
             "_score" : 2.0,
             "_source" : {
               "term" : "Flu Shot2",
               "type" : "procedure"
             }
           },
           {
             "_index" : "example_index",
             "_type" : "_doc",
             "_id" : "2",
             "_score" : 1.0,
             "_source" : {
               "term" : "Fluphenazine",
               "type" : "drug"
             }
           },
           {
             "_index" : "example_index",
             "_type" : "_doc",
             "_id" : "4",
             "_score" : 1.0,
             "_source" : {
               "term" : "Fluphenazine2",
               "type" : "drug"
             }
           }
         ]
       }
     }
    

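You can see that documents whose type is procedure score higher than documents whose type is drug. This is because we assigned different weights to the different types in the function_score. The wildcard query gives every matching document the same constant score of 1.0. By default, function_score multiplies that score by the weight of the matching function, because score_mode and boost_mode both default to multiply. So procedures end up with 1.0 × 2 = 2.0 and drugs with 1.0 × 1 = 1.0.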