Search keywords in ElasticSearch 7.6 with whitespace
I'm trying to implement a search based on city names in ElasticSearch 7.6, but I'm having trouble with names that contain whitespace, as in the following example:
Query: "toronto, new mexico, paris, lisbona, new york, sedro-woolley".
This is my mapping:
mapping = {
"mappings": {
"properties": {
"date": {
"type": "date"
},
"description": {
"type": "text",
"fielddata": True
},
}
}
}
This is my query:
{
"query" : {
"match": { "description": escaped_keywords }
},
"highlight" : {
"pre_tags" : ["<match>"],
"post_tags" : ["</match>"],
"fields" : {
"description" : {"number_of_fragments" : 0 }
}
}
}
escaped_keywords contains the keywords escaped beforehand, like this: "toronto new\ mexico paris lisbona new\ york sedro\-woolley"
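For reference, a string like that can be produced with a small helper such as the following (a hypothetical sketch; `escape_keywords` is not part of any library, it just backslash-escapes spaces and hyphens as shown above):

```python
# Hypothetical helper: backslash-escape spaces and hyphens inside each
# city name, then join all names into one keyword string.
def escape_keywords(cities):
    escaped = [c.replace(" ", r"\ ").replace("-", r"\-") for c in cities]
    return " ".join(escaped)

print(escape_keywords(["toronto", "new mexico", "paris", "lisbona",
                       "new york", "sedro-woolley"]))
# toronto new\ mexico paris lisbona new\ york sedro\-woolley
```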
So the query works for single-word city names and for names with a dash, but not for names containing a space (new york, new mexico), which get split into separate tokens (new, york, new, mexico).
I also tried wrapping the cities containing spaces in parentheses, like toronto (new mexico) paris lisbona (new york) sedro\-woolley, but the result didn't change.
EDIT: highlighting doesn't work for names containing dashes either. It returns the split words (e.g. [sedro, woolley] instead of [sedro-woolley]).
EDIT 2: My goal is to match a dynamic list of keywords (e.g. "new york", "toronto", "sedro-woolley") using highlight tags.
Here is a data sample:
{
"_index": "test_Whosebug",
"_type": "_doc",
"_id": "x4nKv3EBQE6DGGITWX-O",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"title": "Best places: New Mexico and Sedro-Woolley",
"description": "This is an example text containing some cities like New York and Toronto. So, there are also Milton-Freewater and Las Vegas!"
}
}
You need to define a custom analyzer with a mapping char filter that removes whitespace (\u0020) and hyphens (\u002D, i.e. -), so that the generated tokens match your requirements.
Index definition
{
"settings": {
"analysis": {
"char_filter": {
"my_space_char_filter": {
"type": "mapping",
"mappings": [
"\u0020=>",
"\u002D=>"
]
}
},
"analyzer": {
"splcharanalyzer": {
"char_filter": [
"my_space_char_filter"
],
"tokenizer": "standard",
"filter": [
"lowercase"
]
}
}
}
},
"mappings" :{
"properties" :{
"title" :{
"type" : "text",
"analyzer" : "splcharanalyzer"
}
}
}
}
Tokens generated by the custom splcharanalyzer
POST myindex/_analyze
{
"analyzer": "splcharanalyzer",
"text": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
}
{
"tokens": [
{
"token": "toronto",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "newmexico",
"start_offset": 9,
"end_offset": 19,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "paris",
"start_offset": 21,
"end_offset": 26,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "lisbona",
"start_offset": 28,
"end_offset": 35,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "newyork",
"start_offset": 37,
"end_offset": 45,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "sedrowoolley",
"start_offset": 47,
"end_offset": 60,
"type": "<ALPHANUM>",
"position": 5
}
]
}
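The analyzer's effect on this input can be approximated in plain Python. This is only an illustration of the token output above, not how Elasticsearch works internally (the char filter actually runs on the whole text before the standard tokenizer splits it):

```python
# Rough emulation of splcharanalyzer: strip spaces and hyphens (the
# mapping char filter), split on commas (standing in for the standard
# tokenizer on this input), and lowercase (the lowercase token filter).
def splchar_analyze(text):
    filtered = text.replace(" ", "").replace("-", "")
    return [t.lower() for t in filtered.split(",") if t]

print(splchar_analyze("toronto, new mexico, paris, lisbona, new york, sedro-woolley"))
# ['toronto', 'newmexico', 'paris', 'lisbona', 'newyork', 'sedrowoolley']
```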
Search query
{
"query": {
"match" : {
"title" : {
"query" : "sedro-woolley"
}
}
}
}
Search result
"hits": [
{
"_index": "white",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"title": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
}
}
]
Searching for new or york will not return any results.
{
"query": {
"match" : {
"title" : {
"query" : "york"
}
}
}
}
Result
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
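The reason is visible in the token list above: the char filter only removes characters, so a search for york is analyzed to the token york, which was never indexed, while new york collapses to newyork, which was. A minimal membership check illustrates this (the token set is copied from the _analyze output above; the normalization mirrors the analyzer only for this illustration):

```python
# Tokens splcharanalyzer produced at index time (see _analyze output).
indexed_tokens = {"toronto", "newmexico", "paris", "lisbona",
                  "newyork", "sedrowoolley"}

def query_token(query):
    # The same space/hyphen stripping and lowercasing is applied to the
    # query string at search time.
    return query.replace(" ", "").replace("-", "").lower()

print(query_token("new york") in indexed_tokens)  # True  -> document matches
print(query_token("york") in indexed_tokens)      # False -> zero hits
```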