Wrong score in Elasticsearch result

Question

I am not getting the correct scores for the results of my Elasticsearch query.

ES query:

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "(emergency) OR (emergency*) OR (*emergency) OR (*emergency*)",
            "fields": [
              "MDMGlobalData.Name1"
            ]
          }
        }
      ]
    }
  }
}

ES result:

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 798,
      "relation": "eq"
    },
    "max_score": 9.169065,
    "hits": [
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551037160",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PARAGON EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551040507",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551076447",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "COASTAL EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551100746",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551090880",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PAFFORD EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551106787",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "CAPROCK EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551021568",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "WILTON EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551124137",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY ONE"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551125549",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY ONE"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551133066",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      }
    ]
  }
}

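Ideally, the first results should be the documents whose Name1 is exactly "emergency" or starts with the word "emergency".

How can the first five results have almost the same score when their Name1 values are different?

Because of the wrong scoring, the results come back in a jumbled order. How can I correct the scores in the results?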

Answer 1

No, the scores are not wrong. Elasticsearch follows the Lucene scoring function.

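Reasons for the identical scores:

Each of your documents has only two terms: Emergency and one other word.
The word Emergency matches as is, and the field length is the same.
It occurs once in each document, i.e. the term frequencies are the same.
The term has the same relevance across all documents, i.e. the same IDF.
Coord is the same as well, since each document contains Emergency only once. (Coord belongs to the classic TF/IDF similarity; BM25, the default since Elasticsearch 5.0, has no such factor.)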

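But if you had a document such as Emergency X Y Z, it would score lower than the other documents you have. The term frequency is the same, but the field is longer, so field-length normalization lowers the score.

If you had a document containing only Emergency, it would score higher than all the others.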

Identical scores are perfectly normal in your scenario, because the engine has no way of knowing which "emergency" the user means.
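The field-length effect described above can be checked numerically. The following is a minimal sketch of the textbook BM25 formula, not Lucene's exact implementation (Lucene stores field lengths in a lossy encoded form, among other differences), so the absolute numbers will not match Elasticsearch output. The corpus statistics `N`, `n` and `avgdl` are hypothetical values chosen only for illustration.

```python
import math


def bm25_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term for one document."""
    # Inverse document frequency: identical for every document in the same index/shard.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: grows with the number of tokens in the field.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)


# Hypothetical corpus statistics (assumptions, not taken from the real index).
N, n, avgdl = 1000, 100, 2.0

names = ["EMERGENCY", "PARAGON EMERGENCY", "EMERGENCY MD", "EMERGENCY MD X"]
scores = {}
for name in names:
    tokens = name.lower().split()
    scores[name] = bm25_score(tokens.count("emergency"), len(tokens), avgdl, N, n)
    print(f"{name:20s} {scores[name]:.4f}")
```

With these inputs, the two-word names receive exactly the same score, the one-word name scores highest, and the three-word name scores lowest, because only the field length differs between the documents.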

Update:

{
    "query":{
        "bool":{
            "must":{
                "term":{
                    "MDMGlobalData.Name1":"emergency"
                }
            }
        }
    }
}

With sample data, the output is:

"hits": [
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "iN1hKnMBojxRtp6HNI7d",
        "_score": 0.10938574,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "g91TKnMBojxRtp6Hto4q",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PARAGON EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "hN1TKnMBojxRtp6H2I6A",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "hd1TKnMBojxRtp6H_I6_",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "COASTAL EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "h91VKnMBojxRtp6HYI4e",
        "_score": 0.07223585,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD X"
          }
        }
      }
    ]
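
To see which factors produced a given score, the search request can set `explain` to true, in which case each hit carries an `_explanation` section with the score breakdown. The sketch below only builds and prints the request body for the term query above; sending it to the index (for example `emerge` from the sample output) is left to whichever client you use.

```python
import json

# Same term query as in the update above, with per-hit score explanations enabled.
body = {
    "explain": True,
    "query": {
        "bool": {
            "must": {
                "term": {"MDMGlobalData.Name1": "emergency"}
            }
        }
    },
}

# Print the JSON body to send to the _search endpoint.
print(json.dumps(body, indent=2))
```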

elasticsearch

aws-elasticsearch