弹性搜索结果中的错误分数
Wrong score in elastic search result
没有得到弹性搜索查询结果的正确分数。
ES 查询 -
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "(emergency) OR (emergency*) OR (*emergency) OR (*emergency*)",
"fields": [
"MDMGlobalData.Name1"
]
}
}
]
}
}
}
ES 结果 -
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 798,
"relation": "eq"
},
"max_score": 9.169065,
"hits": [
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551037160",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "PARAGON EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551040507",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551076447",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "COASTAL EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551100746",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551090880",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "PAFFORD EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551106787",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "CAPROCK EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551021568",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "WILTON EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551124137",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY ONE"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551125549",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY ONE"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551133066",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
}
]
}
}
理想情况下,结果中的第一个集合应该是值为“emergency”或以“emergency”一词开头的 Name1
我们怎么可能前 5 个结果集的分数几乎相同?作为 Name1 的值是不同的。
由于打分错误,结果乱七八糟。
如何修正结果中的分数?
不,不必如此。因为 ES 遵循 Lucene scoring function
同分原因:
- 您在每个文档中只有两个术语 -
emergency and one more word
Emergency
单词按原样匹配。 Field Length is same
- 出现次数为1次。即 Term frequencies are same.
- 所有术语的相关性相同。 idf
- Coord 与您的文档相同,只包含一次
Emergency
但是如果你有一个文件Emergency X Y Z
,那么它的分数会低于你拥有的其他文件。因为term frequency
这个更高。
如果你只有Emergency
,这篇文档的分数会高于所有
在你的场景中有相同的分数是完全正常的,因为用户不知道 emergency
he/she 是什么意思。
更新:
{
"query":{
"bool":{
"must":{
"term":{
"MDMGlobalData.Name1":"emergency"
}
}
}
}
}
使用示例数据,输出:
"hits": [
{
"_index": "emerge",
"_type": "_doc",
"_id": "iN1hKnMBojxRtp6HNI7d",
"_score": 0.10938574,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "g91TKnMBojxRtp6Hto4q",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "PARAGON EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "hN1TKnMBojxRtp6H2I6A",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "hd1TKnMBojxRtp6H_I6_",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "COASTAL EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "h91VKnMBojxRtp6HYI4e",
"_score": 0.07223585,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD X"
}
}
}
]
没有得到弹性搜索查询结果的正确分数。
ES 查询 -
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "(emergency) OR (emergency*) OR (*emergency) OR (*emergency*)",
"fields": [
"MDMGlobalData.Name1"
]
}
}
]
}
}
}
ES 结果 -
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 798,
"relation": "eq"
},
"max_score": 9.169065,
"hits": [
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551037160",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "PARAGON EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551040507",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551076447",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "COASTAL EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551100746",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551090880",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "PAFFORD EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551106787",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "CAPROCK EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551021568",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "WILTON EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551124137",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY ONE"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551125549",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY ONE"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551133066",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
}
]
}
}
理想情况下,结果中的第一个集合应该是值为“emergency”或以“emergency”一词开头的 Name1
我们怎么可能前 5 个结果集的分数几乎相同?作为 Name1 的值是不同的。
由于打分错误,结果乱七八糟。 如何修正结果中的分数?
不,不必如此。因为 ES 遵循 Lucene scoring function
同分原因:
- 您在每个文档中只有两个术语 -
emergency and one more word
Emergency
单词按原样匹配。 Field Length is same- 出现次数为1次。即 Term frequencies are same.
- 所有术语的相关性相同。 idf
- Coord 与您的文档相同,只包含一次
Emergency
但是如果你有一个文件Emergency X Y Z
,那么它的分数会低于你拥有的其他文件。因为term frequency
这个更高。
如果你只有Emergency
,这篇文档的分数会高于所有
在你的场景中有相同的分数是完全正常的,因为用户不知道 emergency
he/she 是什么意思。
更新:
{
"query":{
"bool":{
"must":{
"term":{
"MDMGlobalData.Name1":"emergency"
}
}
}
}
}
使用示例数据,输出:
"hits": [
{
"_index": "emerge",
"_type": "_doc",
"_id": "iN1hKnMBojxRtp6HNI7d",
"_score": 0.10938574,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "g91TKnMBojxRtp6Hto4q",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "PARAGON EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "hN1TKnMBojxRtp6H2I6A",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "hd1TKnMBojxRtp6H_I6_",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "COASTAL EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "h91VKnMBojxRtp6HYI4e",
"_score": 0.07223585,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD X"
}
}
}
]