Synonyms for Sankt and St
I am trying to get synonyms working with my existing setup. Currently I have these settings:
PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "german_normalization",
            "my_ascii_folding"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "german_normalization",
            "my_ascii_folding"
          ]
        }
      },
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st => sankt"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 15,
          "token_chars": [
            "letter",
            "digit",
            "symbol"
          ]
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
In this city index I have documents such as St. Wolfgang, Sankt Wolfgang, and so on. For me, St. and Sankt are synonyms, so if I search for Sankt, both documents should show up.
I created a new filter and added it to my autocomplete analyzer:
"my_synonym_filter": {
  "type": "synonym",
  "ignore_case": "true",
  "synonyms": [
    "sankt, st."
  ]
}
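So far so good. But I have the following problems:
Obviously, the dot after st is currently not analyzed, so it cannot be searched for. For the synonym, however, the dot matters.
The second problem is that if I search for sankt, the synonym is st, so I get every document that starts with st, such as Stuttgart. This also happens because the dot is not taken into account.
Do you know how I can achieve this? Let me know if you need more information.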
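Update:
After the discussion, I made the following changes to my settings:
- Changed the edge_ngram tokenizer to the standard tokenizer.
- Added an edgeNGram filter and added it to my analyzer.
- Removed the german_normalization and my_ascii_folding filters from my analyzers to simplify testing.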
PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "edge_filter"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "autocomplete",
          "filter": [
            "my_synonym_filter",
            "lowercase"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st => sankt"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "standard"
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
I added these 3 documents to the index:
"name":"Sankt Wolfgang",
"name":"Stuttgart",
"name":"St. Wolfgang"
Query string - results
st -> "St. Wolfgang", "Stuttgart"
st. -> "St. Wolfgang", "Sankt Wolfgang"
sankt -> "St. Wolfgang", "Sankt Wolfgang"
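This works fine for me. The key points are to make sure that you:
- put the synonym filter after the lowercase filter
- put the edge-n-gram filter last
- use the edge-n-gram filter only at index time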
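To see why the filter order matters, here is a rough Python sketch of the token pipeline. It is a simplified model, not Elasticsearch itself: the helper names (tokenize, lowercase, synonyms, edge_ngrams) are made up for illustration, and the regex only approximates the standard tokenizer by splitting on non-word characters.

```python
import re

# Mirrors the rule "sankt, st => sankt" (assumed single-token rule).
SYNONYMS = {"st": "sankt"}

def tokenize(text):
    # Rough stand-in for the standard tokenizer: the dot in "St." is dropped.
    return re.findall(r"\w+", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def synonyms(tokens):
    # Replace a token only when it matches a synonym rule exactly.
    return [SYNONYMS.get(t, t) for t in tokens]

def edge_ngrams(tokens, min_gram=1, max_gram=15):
    # Emit every prefix of each token between min_gram and max_gram characters.
    out = []
    for t in tokens:
        for n in range(min_gram, min(len(t), max_gram) + 1):
            out.append(t[:n])
    return out

# Synonyms applied to whole words, edge n-grams last.
good = edge_ngrams(synonyms(lowercase(tokenize("Stuttgart"))))

# Edge n-grams first: the prefix "st" of "stuttgart" is rewritten to "sankt".
bad = synonyms(edge_ngrams(lowercase(tokenize("Stuttgart"))))

print("sankt" in good)  # False
print("sankt" in bad)   # True
```

In the second ordering the prefix st of Stuttgart is treated as the abbreviation and replaced, which is how a search for sankt ends up matching Stuttgart.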
So we create the index:
PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "edge_filter"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st. => sankt"
          ]
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
Then we index the data:
PUT city/city/1
{
  "name": "St. Wolfgang"
}
PUT city/city/2
{
  "name": "Stuttgart"
}
PUT city/city/3
{
  "name": "Sankt Wolfgang"
}
Finally, searching for st or sankt returns only documents 1 and 3, not 2:
POST city/_search?q=name:st
POST city/_search?q=name:sankt
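The expected result can be checked with a small Python model of the two analyzers. This is only a sketch under assumptions: the function names are hypothetical, the regex approximates the standard tokenizer, and a document counts as a hit when any query term equals one of its indexed tokens.

```python
import re

# Mirrors the rule "sankt, st. => sankt"; the dot is removed by tokenization.
SYNONYMS = {"st": "sankt"}

def base_tokens(text):
    # Shared steps of both analyzers: tokenize, lowercase, apply synonyms.
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [SYNONYMS.get(t, t) for t in tokens]

def index_tokens(text, min_gram=1, max_gram=15):
    # Index-time analyzer: the shared steps plus edge n-grams as the last step.
    grams = set()
    for t in base_tokens(text):
        for n in range(min_gram, min(len(t), max_gram) + 1):
            grams.add(t[:n])
    return grams

def search_tokens(text):
    # Search-time analyzer: no edge n-grams.
    return base_tokens(text)

DOCS = {1: "St. Wolfgang", 2: "Stuttgart", 3: "Sankt Wolfgang"}
INDEX = {doc_id: index_tokens(name) for doc_id, name in DOCS.items()}

def search(query):
    terms = search_tokens(query)
    return sorted(d for d, grams in INDEX.items()
                  if any(t in grams for t in terms))

print(search("st"))     # [1, 3]
print(search("sankt"))  # [1, 3]
print(search("stu"))    # [2]
```

In this model, Stuttgart is still found by a longer prefix such as stu; only the exact token st is rewritten to sankt.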