not_analyzed field with doc_values still in fielddata cache

Question

While experimenting with fielddata vs. doc_values, I encountered a weird case. In my earlier mapping, I didn't use doc values at all. In my new mapping, I've added doc_values: true to all fields except analyzed string fields and booleans (not supported until 2.0).

In detail, this is what I did:

Before reindexing all my data, I restarted my ES 1.7 cluster and ran a query with sorting, aggregations and script fields to "warm up" the fielddata cache. I then queried the _cat/fielddata endpoint to see how much of the fielddata cache was in use. It looked like this:

    curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'

    id      host   ip            node  total  items.desc.raw more_fields...
    rKX7... myhost 192.168.1.100 Doom  32.9mb 2.3mb          ...

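As you can see, the field items.desc.raw uses 2.3mb of heap space. items is of type nested and contains a string multi-field, desc, with a not_analyzed sub-field named raw. In short, the mapping of the nested field looks like this: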

    "items": {
      "type": "nested",
      "properties": {
        "desc": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
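
The new mapping differs from this one only by doc_values: true on the eligible fields. As a rough illustration, the change could be applied with a small helper like the one below. This is only a sketch: the function names and the list of eligible types are assumptions made for the example (based on the rule stated above for ES 1.x: no doc values for analyzed strings or booleans), not part of Elasticsearch itself.

```python
import json

# Mapping fragment copied from the question (ES 1.x syntax).
OLD_MAPPING = {
    "items": {
        "type": "nested",
        "properties": {
            "desc": {
                "type": "string",
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        },
    }
}

# Assumption for this sketch: types treated as eligible besides
# not_analyzed strings. Adjust to whatever your mapping actually uses.
ELIGIBLE_TYPES = {"long", "integer", "short", "byte", "double", "float", "date"}


def supports_doc_values(field):
    """Return True if this sketch considers the field eligible for doc values."""
    field_type = field.get("type")
    if field_type == "string":
        # In ES 1.x a string field is analyzed unless "index" says otherwise.
        return field.get("index") == "not_analyzed"
    # Booleans, nested/object containers and unknown types are skipped.
    return field_type in ELIGIBLE_TYPES


def add_doc_values(properties):
    """Return a copy of a properties dict with doc_values set where eligible."""
    updated = {}
    for name, field in properties.items():
        field = dict(field)  # shallow copy so the input mapping is not mutated
        if supports_doc_values(field):
            field["doc_values"] = True
        # Recurse into multi-field sub-fields and nested/object properties.
        for key in ("fields", "properties"):
            if key in field:
                field[key] = add_doc_values(field[key])
        updated[name] = field
    return updated


NEW_MAPPING = add_doc_values(OLD_MAPPING)
print(json.dumps(NEW_MAPPING, indent=2))
```

With the fragment above, only items.desc.raw gains doc_values: true; the analyzed desc field and the nested items container are left unchanged.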

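After adding doc_values: true to items.desc.raw, reindexing the whole index, and again running some aggregations, sorts and scripts to warm up the fielddata cache, I queried the _cat/fielddata endpoint once more and got this: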

    curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'

    id      host   ip            node  total  items.desc.raw some_bools...
    tAB5... myhost 192.168.1.100 Yack  2.1mb  9.2kb          ...

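So fielddata usage has indeed dropped a lot (which is good). Apart from the boolean fields (i.e. some_bools above), which I expected to see, my nested not_analyzed string field also shows up, to my surprise, although with much lower space usage.

What could be the reason that items.desc.raw still shows up in the fielddata cache?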

Answer 1

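How could I forget about global ordinals. That is why I still see fielddata usage even after switching to doc_values: global ordinals cannot be stored in doc_values. They are built per shard on top of the per-segment ordinals and kept on the heap, so they are still counted in the fielddata stats.

See more details here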
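
To make the idea more concrete, here is a toy Python model of global ordinals. It is only a conceptual sketch under simplifying assumptions (the segment contents and the function name are made up for illustration, and Lucene's real structure is far more compact), but it shows why something has to live outside the per-segment doc values: each segment numbers its terms independently, so a shard-level lookup from segment ordinal to global ordinal is needed before values from different segments can be compared.

```python
# Toy model: each segment keeps its own sorted term dictionary, so the same
# term can have a different ordinal in every segment.
segments = [
    ["apple", "cherry"],           # segment 0: apple=0, cherry=1
    ["banana", "cherry", "date"],  # segment 1: banana=0, cherry=1, date=2
]


def build_global_ordinals(segments):
    """Return (global_terms, seg_to_global).

    seg_to_global[s][o] is the global ordinal of the term that has
    ordinal o in segment s.
    """
    # Shard-wide sorted list of distinct terms across all segments.
    global_terms = sorted({term for seg in segments for term in seg})
    global_ord = {term: i for i, term in enumerate(global_terms)}
    # One lookup array per segment; in this model, this is the extra
    # structure that lives in memory rather than in per-segment doc values.
    seg_to_global = [[global_ord[term] for term in seg] for seg in segments]
    return global_terms, seg_to_global


global_terms, seg_to_global = build_global_ordinals(segments)
print(global_terms)   # ['apple', 'banana', 'cherry', 'date']
print(seg_to_global)  # [[0, 2], [1, 2, 3]]
```

In this model, "cherry" has ordinal 1 in both segments but global ordinal 2. The lookup arrays are small compared with the field values themselves, which fits the observation that items.desc.raw dropped from 2.3mb to 9.2kb rather than to zero.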

