How can I rank exact matches higher in Azure Search?
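I have an index in Azure Search that contains person data such as first name and last name.
When I search for a three-letter last name with a query like
/indexes/customers-index/docs?api-version=2016-09-01&search=rau&searchFields=LastName
the name Rau is found, but it appears far down the result list.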
{
"@odata.context": "myurl/indexes('customers-index')/$metadata#docs(ID,FirstName,LastName)",
"value": [
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Liebetrau"
},
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Damerau"
},
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Rau"
Names like "Liebetrau" and "Damerau" are ranked higher.
Is there a way to get exact matches at the top?
EDIT
Querying the index definition with the REST API
GET https://myproduct.search.windows.net/indexes('customers-index')?api-version=2015-02-28-Preview
returns the following for LastName:
"name": "LastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": "prefix",
"searchAnalyzer": "standard",
"analyzer": null,
"synonymMaps": []
EDIT 1
The analyzer definition:
"scoringProfiles": [],
"defaultScoringProfile": null,
"corsOptions": null,
"suggesters": [],
"analyzers": [
{
"name": "prefix",
"tokenizer": "standard",
"tokenFilters": [
"lowercase",
"my_edgeNGram"
],
"charFilters": []
}
],
"tokenizers": [],
"tokenFilters": [
{
"name": "my_edgeNGram",
"minGram": 2,
"maxGram": 20,
"side": "back"
}
],
"charFilters": []
EDIT 2
Finally, I specified a ScoringProfile that I use at query time:
{
"name": "person-index",
"fields": [
{
"name": "ID",
"type": "Edm.String",
"searchable": false,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": true,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": null
},
{
"name": "LastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"analyzer": "my_standard"
},
{
"name": "PartialLastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": "prefix",
"searchAnalyzer": "standard",
"analyzer": null
}
],
"analyzers":[
{
"name":"my_standard",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "asciifolding" ]
},
{
"name":"prefix",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "my_edgeNGram" ]
}
],
"tokenFilters":[
{
"name":"my_edgeNGram",
"@odata.type":"#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"minGram":2,
"maxGram":20,
"side": "back"
}
],
"scoringProfiles":[
{
"name":"exactFirst",
"text":{
"weights":{ "LastName":2, "PartialLastName":1 }
}
}
]
}
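A query that applies this scoring profile might be built as in the sketch below. The query itself is not shown in the post, so the choice to search both LastName and PartialLastName is an assumption, as are the helper name build_search_url and the api-version, which is reused from the earlier query. The service name myproduct comes from the GET request above. With both fields searched, "Rau" should match in LastName (weight 2) as well as PartialLastName (weight 1), while "Liebetrau" and "Damerau" only match in PartialLastName.

```python
from urllib.parse import urlencode


def build_search_url(service, index, term, api_version="2016-09-01"):
    """Build a search URL that applies the 'exactFirst' scoring profile.

    Hypothetical helper: it searches both the exact-token field and the
    n-gram field so that the profile's weights can separate the two.
    """
    params = {
        "api-version": api_version,
        "search": term,
        "searchFields": "LastName,PartialLastName",  # assumed field list
        "scoringProfile": "exactFirst",
    }
    # Keep the comma in searchFields unescaped for readability.
    query = urlencode(params, safe=",")
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


url = build_search_url("myproduct", "person-index", "rau")
print(url)
```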
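The "prefix" analyzer set on the LastName field produces the following terms for the name Liebetrau: au, rau, trau, etrau, betrau, ebetrau, iebetrau, liebetrau. These are edge n-grams taken from the back of the word, with lengths from 2 to 20, as defined by the my_edgeNGram token filter in the index definition. The analyzer processes the other names in the same way.
When you search for the name rau, it matches every name that ends with those characters. That is why all documents in the result set have the same relevance score.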
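The term generation described here can be imitated in a few lines of Python. This is only a rough sketch of what a back-sided edge n-gram filter with minGram 2 and maxGram 20 emits for a single token, not the service's actual implementation; the function name back_edge_ngrams is made up for illustration.

```python
def back_edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    """Return the suffixes of `token` with lengths min_gram..max_gram, shortest first."""
    token = token.lower()  # mirrors the "lowercase" token filter
    longest = min(max_gram, len(token))
    return [token[-n:] for n in range(min_gram, longest + 1)]


names = ["Liebetrau", "Damerau", "Rau"]
index_terms = {name: back_edge_ngrams(name) for name in names}

# The standard search analyzer lowercases the query text into a single term.
query_term = "rau"
matches = [name for name, terms in index_terms.items() if query_term in terms]

for name, terms in index_terms.items():
    print(name, terms)
print("matches:", matches)
```

Each of the three names has "rau" among its indexed terms, so the query cannot tell the exact match apart from the names that merely end in "rau".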
You can test your analyzer configuration with the Analyze API.
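A request to the Analyze API could be prepared roughly as follows. The sketch only builds the request and does not send it. The helper name build_analyze_request and the placeholder key are hypothetical, and the api-version is simply reused from the search query in the question, so substitute whichever version your service supports.

```python
import json
import urllib.request


def build_analyze_request(service, index, text, analyzer, api_key, api_version="2016-09-01"):
    """Build (but do not send) a POST request to the index's /analyze endpoint."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}/analyze"
        f"?api-version={api_version}"
    )
    # The body names the text to analyze and the analyzer to run it through.
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_analyze_request(
    "myproduct", "customers-index", "Liebetrau", "prefix", "<admin-api-key>"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request), then parse the JSON response,
# which lists the tokens the analyzer produced for the text.
```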
To learn more about custom analyzers, see the Azure Search documentation on custom analyzers.
Hope that helps.