Kibana's index management won't update document count
I've started working with Elasticsearch and Kibana through elasticsearch-dsl, following this guide: https://elasticsearch-dsl.readthedocs.io/en/latest/index.html#persistence-example
Everything seems to work. However, when I refresh the stats in Kibana's index management panel, the document count doesn't update until I run a search (this might be a coincidence, but I doubt it).
Here is the code I use to insert into Elastic:
import datetime

from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=['localhost'])

# df is a pandas DataFrame holding the documents to index
for index, doc in df.iterrows():
    new_cluster = Cluster(meta={'id': doc.url_hashed},
                          title=doc.title,
                          cluster=doc.cluster,
                          url=doc.url,
                          paper=doc.paper,
                          published=doc.published,
                          entered=datetime.datetime.now())
    new_cluster.save()
其中 "cluster" 是定义我的索引结构的自定义 class:
from datetime import datetime
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections

class Cluster(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    cluster = Integer()
    url = Text()
    paper = Text()
    published = Date()
    entered = Date()

    class Index:
        name = 'cluster'

    def save(self, **kwargs):
        return super(Cluster, self).save(**kwargs)
This is the panel I'm looking at: https://www.screencast.com/t/zpEhv66Np
After running the "for" loop above and clicking the "Reload indices" button in Kibana, the numbers stay the same. They only change when I run a search from my script (just as a test):
from elasticsearch_dsl import Search

s2 = Search(using=client, index="cluster")
test_df = pd.DataFrame(d.to_dict() for d in s2.scan())
Why does this happen?
Thanks a lot!
First, you have 1 node (probably acting as both master and data node), and index management shows your index status as yellow.
This means the replica shards are not allocated (you can't have replicas with only 1 node, because a replica means putting those primary shards on another node; if you want 1 replica, you need at least 2 data nodes). You should set the number of replicas of your index to 0 to get the cluster back to green:
PUT /<YOUR_INDEX>/_settings
{
"index" : {
"number_of_replicas" : 0
}
}
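The same setting can also be applied from Python with the low-level elasticsearch client; here is a minimal sketch (the localhost host and the cluster index name are assumptions carried over from the question; check put_settings against your client version):

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['localhost'])

# Drop replicas so a single-node cluster can reach green status
es.indices.put_settings(index='cluster',
                        body={'index': {'number_of_replicas': 0}})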
As for the index count: after a bulk operation you need a flush to write the documents to disk. From the docs:
Flushing an index is the process of making sure that any data that is currently only stored in the transaction log is also permanently stored in the Lucene index. When restarting, Elasticsearch replays any unflushed operations from the transaction log into the Lucene index to bring it back into the state that it was in before the restart. Elasticsearch automatically triggers flushes as needed, using heuristics that trade off the size of the unflushed transaction log against the cost of performing each flush.
Once each operation has been flushed it is permanently stored in the Lucene index.
Basically, when you bulk N documents you won't see them immediately, because they haven't been written to the Lucene index yet. You can trigger a flush manually once the bulk operation has finished:
POST /<YOUR_INDEX>/_flush
Then check the number of documents in the index:
GET _cat/indices?v&s=index
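From Python, the same two steps might look like this (a sketch reusing the hypothetical es client from above; es.count is used instead of parsing the _cat output so the count can be checked programmatically):

# Flush pending operations from the transaction log into the Lucene index
es.indices.flush(index='cluster')

# Check how many documents the index now reports
print(es.count(index='cluster')['count'])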
You can also force a refresh every N seconds, for example:
PUT /<YOUR_INDEX>/_settings
{
"index" : {
"refresh_interval" : "1s"
}
}
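Since the question indexes documents one at a time through elasticsearch-dsl, another option is to request a refresh on each write. A hedged sketch: Document.save passes extra keyword arguments through to the underlying index call, so refresh=True should make the document visible immediately (at the cost of indexing throughput; verify against your client version):

# Refresh right after the write so the document is immediately
# searchable and counted; slows down heavy indexing
new_cluster.save(refresh=True)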
You can read more about this in the docs, but my advice is: if the document count matches the number of documents you bulked, don't worry about it, and prefer the Kibana dev tools over the index management GUI.