Search keywords with whitespace in ElasticSearch 7.6

I am trying to implement a search based on city names in ElasticSearch 7.6, but I have a problem with names that contain whitespace, as in the following example:

Query: "toronto, new mexico, paris, lisbona, new york, sedro-woolley".

This is my mapping:

mapping = {
    "mappings": {
        "properties": {
            "date": { 
                "type": "date" 
            },
            "description": { 
                "type": "text", 
                "fielddata": True 
            },
        }
    }
}

This is my query:

{
    "query" : {
        "match": { "description": escaped_keywords }
    },
    "highlight" : {
        "pre_tags" : ["<match>"],
        "post_tags" : ["</match>"],
        "fields" : {
            "description" : {"number_of_fragments" : 0 }
        }
    }
}

escaped_keywords contains the keywords above after escaping, as follows: "toronto new\ mexico paris lisbona new\ york sedro\-woolley"

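So the query works for single-word city names and for names containing a dash, but not for names containing a space (new york, new mexico), which get split into (new, york, new, mexico).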
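The escaping has no effect because a match query does not parse any query syntax: it passes the whole string through the field's analyzer (the standard analyzer here, since the mapping sets none), which splits on whitespace and most punctuation and lowercases the result, so the backslashes are simply discarded. The following is a rough sketch of that behaviour. `approx_standard_analyze` is a made-up helper, and the regular expression only approximates the real tokenizer's Unicode segmentation rules.

```python
import re

def approx_standard_analyze(text):
    """Very rough stand-in for Elasticsearch's standard analyzer (an assumption,
    not the real implementation): keep runs of word characters, then lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

# The escaped string from the question, written as a raw string so the
# backslashes reach the function unchanged.
escaped_keywords = r"toronto new\ mexico paris lisbona new\ york sedro\-woolley"
tokens = approx_standard_analyze(escaped_keywords)
print(tokens)
```

In this sketch "new\ mexico" ends up as the two separate terms new and mexico, which is the splitting described above.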

I also tried wrapping the cities that contain a space in parentheses, like this: toronto (new mexico) paris lisbona (new york) sedro\-woolley, but the result did not change.

EDIT: Highlighting does not work for names that contain a dash. It returns the split words (e.g. [sedro, woolley] instead of [sedro-woolley]).

EDIT 2: My goal is to match a dynamic list of keywords (e.g. "new york", "toronto", "sedro-woolley") and mark them with highlight tags. Here is a data sample:

{
    "_index": "test_stackoverflow",
    "_type": "_doc",
    "_id": "x4nKv3EBQE6DGGITWX-O",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "title": "Best places: New Mexico and Sedro-Woolley",
        "description": "This is an example text containing some cities like New York and Toronto. So, there are also Milton-Freewater and Las Vegas!"
    }
}

You need to define a custom analyzer with a char filter that removes spaces and hyphens (-), so that the generated tokens match your requirements.

Index definition (in the char filter rules, \u0020 is the space character and \u002D is the hyphen)

{
    "settings": {
        "analysis": {
            "char_filter": {
                "my_space_char_filter": {
                    "type": "mapping",
                    "mappings": [
                        "\u0020=>",
                        "\u002D=>"
                    ]
                }
            },
            "analyzer": {
                "splcharanalyzer": {
                    "char_filter": [
                        "my_space_char_filter"
                    ],
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings" :{
        "properties" :{
            "title" :{
                "type" : "text",
                "analyzer" : "splcharanalyzer"
            }
        }
    }
}
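Since the question builds its mapping as a Python dict, the same definition could be written in Python roughly as follows. This is only a sketch: the function name `build_index_body` and its `field` parameter are illustrative and not part of the answer. The rule strings use the Python escapes \u0020 and \u002D, which decode to the same literal space and hyphen as their JSON counterparts.

```python
import json

def build_index_body(field="title"):
    """Return settings and mappings that apply splcharanalyzer to `field`.
    (`field` is a hypothetical parameter added for illustration.)"""
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    "my_space_char_filter": {
                        "type": "mapping",
                        "mappings": [
                            "\u0020=>",  # space is mapped to nothing
                            "\u002D=>",  # hyphen is mapped to nothing
                        ],
                    }
                },
                "analyzer": {
                    "splcharanalyzer": {
                        "char_filter": ["my_space_char_filter"],
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "splcharanalyzer"}
            }
        },
    }

index_body = build_index_body()
print(json.dumps(index_body, indent=2))
```

With the 7.x Python client, a dict like this would typically be passed as the body when creating the index, for example es.indices.create(index="myindex", body=index_body); that call is not executed in the sketch.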

Tokens generated by the custom splcharanalyzer

POST myindex/_analyze

{
  "analyzer": "splcharanalyzer",
  "text": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
}

{
    "tokens": [
        {
            "token": "toronto",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "newmexico",
            "start_offset": 9,
            "end_offset": 19,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "paris",
            "start_offset": 21,
            "end_offset": 26,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "lisbona",
            "start_offset": 28,
            "end_offset": 35,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "newyork",
            "start_offset": 37,
            "end_offset": 45,
            "type": "<ALPHANUM>",
            "position": 4
        },
        {
            "token": "sedrowoolley",
            "start_offset": 47,
            "end_offset": 60,
            "type": "<ALPHANUM>",
            "position": 5
        }
    ]
}
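The token list above can be reproduced approximately in plain Python, which makes the order of the stages visible: the char filter runs on the raw text first, the tokenizer then splits what is left, and the lowercase filter runs last. This is only a sketch. `simulate_splcharanalyzer` is a hypothetical helper, and a regular expression stands in for the standard tokenizer.

```python
import re

# Same rules as my_space_char_filter: delete spaces and hyphens.
CHAR_MAP = str.maketrans("", "", " -")

def simulate_splcharanalyzer(text):
    """Approximate the three analyzer stages in order."""
    filtered = text.translate(CHAR_MAP)          # char filter stage
    raw_tokens = re.findall(r"\w+", filtered)    # stand-in for the standard tokenizer
    return [tok.lower() for tok in raw_tokens]   # lowercase token filter

text = "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
print(simulate_splcharanalyzer(text))
```

The commas are what keep the cities apart here. In this sketch a sentence such as "New York and Toronto" collapses into the single token newyorkandtoronto, so it is worth running _analyze against real description text before relying on the analyzer.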

Search query

{
    "query": {
        "match" : {
            "title" : {
                "query" : "sedro-woolley"
            }
        }
    }
}

Search result

 "hits": [
            {
                "_index": "white",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
                }
            }
        ]

Searching for york on its own does not return any result, because the indexed token is newyork.

{
    "query": {
        "match" : {
            "title" : {
                "query" : "york"
            }
        }
    }
}

Result

 "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
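Both results follow from the way a match query works: the query text is analyzed with the same analyzer as the field, and the resulting terms are then looked up among the indexed terms. A small sketch of that logic, using a hypothetical `would_match` helper and a regex approximation of the tokenizer:

```python
import re

def analyze(text):
    """Approximate splcharanalyzer: drop spaces and hyphens, split, lowercase."""
    filtered = text.replace(" ", "").replace("-", "")
    return [tok.lower() for tok in re.findall(r"\w+", filtered)]

def would_match(query, document):
    """True if any analyzed query term is among the document's indexed terms
    (the default OR behaviour of a match query)."""
    indexed_terms = set(analyze(document))
    return any(term in indexed_terms for term in analyze(query))

doc = "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
for q in ["sedro-woolley", "new york", "york"]:
    print(q, "->", would_match(q, doc))
```

Under these assumptions "sedro-woolley" and "new york" are reduced to sedrowoolley and newyork and find their indexed counterparts, while york has no matching term.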