How to handle multi-word synonyms

I am trying to understand the results I get from Elasticsearch in a few cases. I defined this synonym list:

"product insert, product inserts, qc package, qc package inserts, qc package insert, package insert => package inserts"

I want every term to the left of the arrow to be treated as the term on the right. Here are my index settings:

PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "tokenizer": "keyword",
            "type": "synonym",
            "synonyms": [
              "product insert, product inserts, package inserts, qc package, qc packages, qc insert, qc inserts, package insert, qc package insert, qc package inserts => package inserts"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}

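My problem is that when I search for some of the terms, for example "product insert", I don't get the results I expect, whereas "product inserts" works fine. Is something wrong with my configuration? Am I missing a step?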

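I think you are missing the mapping part. The analyzer has to be assigned to your field in the mapping so that the field can be searched through the synonyms, as shown below: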

{
    "settings": {
        "index" : {
            "analysis" : {
                "filter" : {
                    "synonym_filter" : {
                        "type" : "synonym",
                        "synonyms" : [
                            "product insert, product inserts, package inserts, qc package, qc packages, qc insert, qc inserts, package insert, qc package insert, qc package inserts => package inserts"
                        ]
                    }
                },
                "analyzer" : {
                    "synonym_analyzer" : {
                        "tokenizer" : "standard",
                        "filter" : ["lowercase", "synonym_filter"] 
                    }
                }
            }
        }
    },
    "mappings": {
            "properties": {
              "title": { 
                "type": "text",
                "analyzer": "synonym_analyzer"
              }
            }     
    }
}

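I have tested your settings, and my guess is that you have not assigned the my_synonyms analyzer to your field.

Since I don't know how you defined your mapping, here is a working example.

Suppose your mapping and settings look like this: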

PUT /my_index
{
  "mappings": {
    "properties": {
      "data": {
        "type": "text",
        "analyzer": "my_synonyms",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_syn_filt": {
            "tokenizer": "keyword",
            "type": "synonym",
            "synonyms": [
              "product insert, product inserts, package inserts, qc package, qc packages, qc insert, qc inserts, package insert, qc package insert, qc package inserts => package inserts"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_syn_filt"
            ],
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}
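
Before indexing anything, one way to check that the analyzer behaves as intended is the `_analyze` API. The request below is only a sketch that assumes the my_index index and the my_synonyms analyzer defined above. If the synonym rule is applied, the response should contain a single token, package inserts.

GET my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text": "product insert"
}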

Index some data:

POST my_index/_doc/1
{
  "data":"package inserts"
}

Run a query that relies on the synonyms:

GET my_index/_search
{
  "query": {
      "match": {
        "data": "product insert"
      }
  }
}

Result:

{
 "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "data" : "package inserts"
        }
      }
    ]
  }
}

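If you don't assign the analyzer to your field, you will only get results when your search query contains one of the words in "package inserts". Without a custom analyzer, a simple match query is analyzed with Elasticsearch's default standard analyzer, so no synonyms are applied.

Hope this helps.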
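
To see this difference, the `_analyze` API can be run once against the field and once with the standard analyzer. This is only a sketch and assumes the my_index index and the data field from the example above.

GET my_index/_analyze
{
  "field": "data",
  "text": "product insert"
}

GET my_index/_analyze
{
  "analyzer": "standard",
  "text": "product insert"
}

The first request uses whatever analyzer is mapped to the data field. The second shows what happens without it: the standard analyzer splits the text into the tokens product and insert, and neither of them matches the tokens package and inserts that a document containing "package inserts" would be indexed with.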