How to handle multi-word synonyms
I'm trying to understand the results I get from Elasticsearch in a few cases. I defined this synonym list:
"product insert, product inserts, qc package, qc package inserts, qc package insert, package insert => package inserts"
I want every term to the left of the arrow to be treated as the term on the right. Here are my index settings:
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "tokenizer": "keyword",
            "type": "synonym",
            "synonyms": [
              "product insert, product inserts, package inserts, qc package, qc packages, qc insert, qc inserts, package insert, qc package insert, qc package inserts => package inserts"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}
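My problem is that when I search for some of the terms, such as "product insert", I don't get the results I expect, yet "product inserts" works fine. Is something wrong with my configuration? Did I miss a step?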
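I think you left out the mapping part. The analyzer has to be assigned to your field in the mapping so that the field can be searched through the synonyms, as shown below: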
{
  "settings": {
    "index" : {
      "analysis" : {
        "filter" : {
          "synonym_filter" : {
            "type" : "synonym",
            "synonyms" : [
              "product insert, product inserts, package inserts, qc package, qc packages, qc insert, qc inserts, package insert, qc package insert, qc package inserts => package inserts"
            ]
          }
        },
        "analyzer" : {
          "synonym_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase", "synonym_filter"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "synonym_analyzer"
      }
    }
  }
}
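I tested your settings, and my guess is that you haven't assigned the my_synonyms analyzer to your field.
Since I don't know how you defined your mapping, here is a working example.
Suppose your mapping and settings look like this (assigning my_synonyms to the data field is my guess at the missing piece):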
PUT /my_index
{
  "mappings": {
    "properties": {
      "data": {
        "type": "text",
        "analyzer": "my_synonyms",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "tokenizer": "keyword",
            "type": "synonym",
            "synonyms": [
              "product insert, product inserts, package inserts, qc package, qc packages, qc insert, qc inserts, package insert, qc package insert, qc package inserts => package inserts"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}
Index some data:
POST my_index/_doc/1
{
  "data": "package inserts"
}
Run a query that relies on the synonyms:
GET my_index/_search
{
  "query": {
    "match": {
      "data": "product insert"
    }
  }
}
Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "data" : "package inserts"
        }
      }
    ]
  }
}
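One way to confirm that the synonym rule is what makes this query match is to pass the query text through the field's analyzer with the _analyze API. This is only a sketch, and it assumes the my_index index from the example above has been created:

```
GET /my_index/_analyze
{
  "field": "data",
  "text": "product insert"
}
```

If my_synonyms is applied to the data field, the response should show the text rewritten to the rule's target, package inserts, instead of the original words.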
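If you don't assign the analyzer to your field, you only get results when your search query contains one of the words package or inserts. In effect, without the analyzer you are running a plain match query that uses Elasticsearch's default standard analyzer.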
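To check which case applies to your own index, you could inspect its mapping and look at what the standard analyzer produces for the same text. This is a sketch that assumes the test_index name from the question; the sample text is only an illustration:

```
GET /test_index/_mapping

GET /test_index/_analyze
{
  "analyzer": "standard",
  "text": "product insert"
}
```

If the field in the mapping response has no "analyzer": "my_synonyms" entry, it is analyzed with standard, which splits the text into the separate tokens product and insert, so the multi-word synonym rule is never applied.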
Hope this helps.