Path Hierarchy Tokenizer in ElasticSearch not working properly

Question

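For my project, which analyzes access logs, I need to get the Path Hierarchy Tokenizer working. The analyzer itself seems to work fine; it just does not seem to be applied to my indexed data. I have a feeling the mapping might be wrong.

Note: I am using Elasticsearch 5.6, and upgrading is not an option. I have previously made the mistake of using syntax that is not yet available in v5.6, so it is possible that I simply have a syntax problem here. However, I have not been able to spot my mistake.

This is part of my custom template: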

{
  "template": "beam-*",
  "order": 20,
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "custom_path_tree": {
          "tokenizer": "custom_hierarchy"
        },
        "custom_path_tree_reversed": {
          "tokenizer": "custom_hierarchy_reversed"
        }
      },
      "tokenizer": {
        "custom_hierarchy": {
          "type": "path_hierarchy",
          "delimiter": "/"
        },
        "custom_hierarchy_reversed": {
          "type": "path_hierarchy",
          "delimiter": "/",
          "reverse": "true"
        }
      }
    }
  },

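And this is the mapping. The object field contains the path. I want to be able to search object.tree and object.tree_reversed to determine the most visited categories in an online shop.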

  "mappings": {
    "logs": {
      "properties": {
        "object": {
          "type": "text",
          "fields": {
            "tree": {
              "type": "text",
              "analyzer": "custom_path_tree"
            },
            "tree_reversed": {
              "type": "text",
              "analyzer": "custom_path_tree_reversed"
            }
          }
        },
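
For intuition about what the tree_reversed sub-field would index, here is a rough Python emulation of a path_hierarchy tokenizer with reverse set to true. It is only a sketch and not Elasticsearch code: the function name reverse_path_hierarchy is made up for this example, and the behavior (one token per path suffix) is an assumption modeled on Lucene's reverse path tokenizer.

```python
def reverse_path_hierarchy(text, delimiter="/"):
    """Emulate path_hierarchy with reverse=true: emit every path suffix.

    Assumption: a suffix starts at offset 0 and right after each delimiter.
    """
    starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == delimiter]
    tokens = []
    for start in starts:
        suffix = text[start:]
        if suffix:  # skip the empty suffix after a trailing delimiter
            tokens.append(suffix)
    return tokens


path = "/belletristik/science-fiction/postapokalypse"
for token in reverse_path_hierarchy(path):
    print(token)
```

Under these assumptions, the last token is the bare leaf category (postapokalypse), which is what makes a reversed sub-field useful for looking up a category regardless of its parents.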

When I run this:

POST beam-2019.07.02/_analyze
{
  "analyzer": "custom_path_tree",
  "text": "/belletristik/science-fiction/postapokalypse"
}

I get:

{
  "tokens": [
    {
      "token": "/belletristik",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    },
    {
      "token": "/belletristik/science-fiction",
      "start_offset": 0,
      "end_offset": 29,
      "type": "word",
      "position": 0
    },
    {
      "token": "/belletristik/science-fiction/postapokalypse",
      "start_offset": 0,
      "end_offset": 44,
      "type": "word",
      "position": 0
    }
  ]
}

So the analyzer itself seems to work fine and does what it is supposed to do.

However, when I try to build a query:
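
As a rough illustration of what the tokenizer is doing in the output above, the following Python sketch emulates the forward path_hierarchy behavior: one token per path prefix, all starting at offset 0. It is not Elasticsearch code, and the function name path_hierarchy is made up for this example.

```python
def path_hierarchy(text, delimiter="/"):
    """Return (token, start_offset, end_offset) for every path prefix.

    A prefix ends just before each delimiter (except a leading one)
    and at the end of the input.
    """
    ends = [i for i, ch in enumerate(text) if ch == delimiter and i > 0]
    if text:
        ends.append(len(text))
    return [(text[:end], 0, end) for end in ends]


for token, start, end in path_hierarchy("/belletristik/science-fiction/postapokalypse"):
    print(token, start, end)
```

The end offsets this produces (13, 29, 44) line up with the _analyze response, and none of the tokens ends with a slash.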

GET beam-2019.07.03/_search
{
  "query": {
    "term": {
      "object.tree": "/belletristik/"
    }
  }
}

I get no results, even though there should be several hundred.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Maybe my query is wrong? Or is there something off with the mapping?

Answer 1

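A term query does not apply the analyzer to the input string at query time, so it tries to match /belletristik/ verbatim. If you look at the output of the analyzer, the token it generated is /belletristik. None of the generated tokens has a trailing slash (/), so the term you entered does not match any document.

Change the query as follows: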

GET beam-2019.07.03/_search
{
  "query": {
    "term": {
      "object.tree": "/belletristik"
    }
  }
}
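
The mismatch described above can be reproduced with a small Python sketch. It treats a term query as a verbatim lookup in the set of indexed tokens, which is a simplification; the token set is copied from the _analyze output in the question, and the helper name term_matches is made up for this example.

```python
# Tokens indexed for object.tree, copied from the _analyze output in the question.
INDEXED_TOKENS = {
    "/belletristik",
    "/belletristik/science-fiction",
    "/belletristik/science-fiction/postapokalypse",
}


def term_matches(term, indexed_tokens):
    """Emulate a term query: the input is compared as-is, with no analysis."""
    return term in indexed_tokens


print(term_matches("/belletristik/", INDEXED_TOKENS))  # False: trailing slash
print(term_matches("/belletristik", INDEXED_TOKENS))   # True
```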

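If you do not want to change the input term of the query, you can use a match query instead. A match query applies the field's analyzer to /belletristik/ as well, which produces the token /belletristik. The query therefore tries to match /belletristik and finds the documents.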

GET beam-2019.07.03/_search
{
  "query": {
    "match": {
      "object.tree": "/belletristik/"
    }
  }
}
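
Why the match query succeeds can be sketched the same way in Python. This is a simplified model, not Elasticsearch code: the helper names are made up, the match query is reduced to "any query token equals any indexed token" (its default OR behavior), and the handling of a trailing delimiter (emitting both /belletristik and /belletristik/) is an assumption modeled on Lucene's tokenizer. The path /sachbuch/ is a hypothetical category used only as a negative example.

```python
def path_hierarchy(text, delimiter="/"):
    """Emulate the forward path_hierarchy tokenizer: one token per path prefix."""
    ends = [i for i, ch in enumerate(text) if ch == delimiter and i > 0]
    if text:
        ends.append(len(text))
    return [text[:end] for end in ends]


def match_matches(query_text, doc_text):
    """Emulate a match query: analyze the query text, then OR over its tokens."""
    query_tokens = set(path_hierarchy(query_text))
    doc_tokens = set(path_hierarchy(doc_text))
    return bool(query_tokens & doc_tokens)


doc = "/belletristik/science-fiction/postapokalypse"
print(path_hierarchy("/belletristik/"))
print(match_matches("/belletristik/", doc))
print(match_matches("/sachbuch/", doc))  # hypothetical category, no overlap
```

In this sketch, a longer query path such as /belletristik/science-fiction would also match every document under /belletristik, because the shorter prefix is one of the query tokens; a term query avoids that.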

