Add stopwords to a standard Azure Search analyzer?
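I'm using the en.microsoft analyzer in my Azure Search index. It works well for the most part, but I need to add some domain-specific stopwords. Is there a way to add stopwords to an existing analyzer? Or to implement a custom analyzer that inherits its behavior from the standard analyzer and overrides only the stopwords, leaving everything else as is?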
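While you can't inherit from an existing analyzer, you can create a pair of custom analyzers (one for indexing, one for search) that are functionally equivalent to en.microsoft but use your own stopword list. Here is what that looks like in the index definition payload of the REST API: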
{
  ...
  "analyzers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "my_search_analyzer",
      "tokenizer": "my_english_search_tokenizer",
      "tokenFilters": [ "my_asciifolding_search", "lowercase", "my_stopword_filter" ]
    },
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "my_index_analyzer",
      "tokenizer": "my_english_index_tokenizer",
      "tokenFilters": [ "my_asciifolding_index", "lowercase", "my_stopword_filter" ]
    }
  ],
  "tokenizers": [
    {
      "name": "my_english_search_tokenizer",
      "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
      "isSearchTokenizer": true,
      "language": "english"
    },
    {
      "name": "my_english_index_tokenizer",
      "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
      "isSearchTokenizer": false,
      "language": "english"
    }
  ],
  "tokenFilters": [
    {
      "name": "my_asciifolding_search",
      "@odata.type": "#Microsoft.Azure.Search.AsciiFoldingTokenFilter",
      "preserveOriginal": false
    },
    {
      "name": "my_asciifolding_index",
      "@odata.type": "#Microsoft.Azure.Search.AsciiFoldingTokenFilter",
      "preserveOriginal": true
    },
    {
      "name": "my_stopword_filter",
      "@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
      "stopwords": [ "put", "your", "custom", "stopwords", "here" ]
    }
  ]
}
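If you maintain the stopword list in code, it can help to generate this part of the payload and check that every name an analyzer refers to is actually defined before you send it. The following is only a sketch under a few assumptions: the helper names `build_analysis_section` and `unresolved_references` and the field name `content` are made up for illustration, and the stopwords are lowercased on the assumption that they must match tokens that have already passed through the `lowercase` filter. The `indexAnalyzer` and `searchAnalyzer` field properties are how a field is pointed at a separate analyzer for indexing and for search.

```python
import json

ODATA = "#Microsoft.Azure.Search."


def build_analysis_section(stopwords):
    """Build the analyzers/tokenizers/tokenFilters part of an index definition."""
    # Normalize the list: trim, lowercase, drop blanks and duplicates.
    words = sorted({w.strip().lower() for w in stopwords if w.strip()})

    def analyzer(kind):
        return {
            "@odata.type": ODATA + "CustomAnalyzer",
            "name": f"my_{kind}_analyzer",
            "tokenizer": f"my_english_{kind}_tokenizer",
            "tokenFilters": [f"my_asciifolding_{kind}", "lowercase", "my_stopword_filter"],
        }

    def tokenizer(kind):
        return {
            "name": f"my_english_{kind}_tokenizer",
            "@odata.type": ODATA + "MicrosoftLanguageStemmingTokenizer",
            "isSearchTokenizer": kind == "search",
            "language": "english",
        }

    def folding(kind):
        return {
            "name": f"my_asciifolding_{kind}",
            "@odata.type": ODATA + "AsciiFoldingTokenFilter",
            # As in the payload above: keep the original token only at index time.
            "preserveOriginal": kind == "index",
        }

    stop_filter = {
        "name": "my_stopword_filter",
        "@odata.type": ODATA + "StopwordsTokenFilter",
        "stopwords": words,
    }
    return {
        "analyzers": [analyzer("search"), analyzer("index")],
        "tokenizers": [tokenizer("search"), tokenizer("index")],
        "tokenFilters": [folding("search"), folding("index"), stop_filter],
    }


def unresolved_references(section, builtin_filters=("lowercase",)):
    """Return (analyzer, name) pairs whose tokenizer or filter is not defined."""
    tokenizers = {t["name"] for t in section["tokenizers"]}
    filters = {f["name"] for f in section["tokenFilters"]} | set(builtin_filters)
    missing = []
    for a in section["analyzers"]:
        if a["tokenizer"] not in tokenizers:
            missing.append((a["name"], a["tokenizer"]))
        missing.extend((a["name"], f) for f in a["tokenFilters"] if f not in filters)
    return missing


if __name__ == "__main__":
    section = build_analysis_section(["put", "your", "custom", "stopwords", "here"])
    assert unresolved_references(section) == []
    # Hypothetical field that uses the analyzer pair.
    field = {
        "name": "content",
        "type": "Edm.String",
        "searchable": True,
        "indexAnalyzer": "my_index_analyzer",
        "searchAnalyzer": "my_search_analyzer",
    }
    print(json.dumps({"fields": [field], **section}, indent=2))
```

The printed JSON covers only the `fields` and analysis sections; the rest of the index definition (the part elided as `...` above) still has to be supplied by you.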