Kibana's index management won't update document count
I've started working with Elasticsearch and Kibana through elasticsearch-dsl, following this guide: https://elasticsearch-dsl.readthedocs.io/en/latest/index.html#persistence-example
Everything seems to work. However, when I refresh the stats in Kibana's index management panel, the document count doesn't update until I run a search (this might be a coincidence, but I doubt it).
Here is the code I use to insert into Elastic:
import datetime

from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=['localhost'])

# df is a pandas DataFrame holding the documents to index
for index, doc in df.iterrows():
    new_cluster = Cluster(meta={'id': doc.url_hashed},
                          title=doc.title,
                          cluster=doc.cluster,
                          url=doc.url,
                          paper=doc.paper,
                          published=doc.published,
                          entered=datetime.datetime.now())
    new_cluster.save()
其中 "cluster" 是定义我的索引结构的自定义 class:
from datetime import datetime
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections

class Cluster(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    cluster = Integer()
    url = Text()
    paper = Text()
    published = Date()
    entered = Date()

    class Index:
        name = 'cluster'

    def save(self, **kwargs):
        return super(Cluster, self).save(**kwargs)
This is the panel I'm looking at: https://www.screencast.com/t/zpEhv66Np
After running the "for" loop above and clicking the "Reload indices" button in Kibana, the numbers stay the same. They only change when I run a search from my script (just as a test):
from elasticsearch_dsl import Search

s2 = Search(using=client, index="cluster")
test_df = pd.DataFrame(d.to_dict() for d in s2.scan())
Why does this happen?
Thanks a lot!
First, you have 1 node (probably acting as both master and data node), and index management shows your index status as yellow.
This means the replica shards are not allocated (you can't have replicas with only 1 node, because a replica means putting those primary shards on another node; if you want 1 replica, you need at least 2 data nodes). You should set the number of replicas of your index to 0 to get the cluster back to green:
PUT /<YOUR_INDEX>/_settings
{
"index" : {
"number_of_replicas" : 0
}
}
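The same setting can also be applied from Python with the low-level elasticsearch client; here is a minimal sketch (the localhost host and the cluster index name are assumptions carried over from the question; check put_settings against your client version):

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['localhost'])

# Drop replicas so a single-node cluster can reach green status
es.indices.put_settings(index='cluster',
                        body={'index': {'number_of_replicas': 0}})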
As for the index count: after a bulk operation you need a flush to write the documents to disk. From the docs:
Flushing an index is the process of making sure that any data that is currently only stored in the transaction log is also permanently stored in the Lucene index. When restarting, Elasticsearch replays any unflushed operations from the transaction log into the Lucene index to bring it back into the state that it was in before the restart. Elasticsearch automatically triggers flushes as needed, using heuristics that trade off the size of the unflushed transaction log against the cost of performing each flush.
Once each operation has been flushed it is permanently stored in the Lucene index.
Basically, when you bulk N documents you won't see them immediately, because they haven't been written to the Lucene index yet. You can trigger a flush manually once the bulk operation has finished:
POST /<YOUR_INDEX>/_flush
Then check the number of documents in the index:
GET _cat/indices?v&s=index
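From Python, the same two steps might look like this (a sketch reusing the hypothetical es client from above; es.count is used instead of parsing the _cat output so the count can be checked programmatically):

# Flush pending operations from the transaction log into the Lucene index
es.indices.flush(index='cluster')

# Check how many documents the index now reports
print(es.count(index='cluster')['count'])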
You can also force a refresh every N seconds, for example:
PUT /<YOUR_INDEX>/_settings
{
"index" : {
"refresh_interval" : "1s"
}
}
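Since the question indexes documents one at a time through elasticsearch-dsl, another option is to request a refresh on each write. A hedged sketch: Document.save passes extra keyword arguments through to the underlying index call, so refresh=True should make the document visible immediately (at the cost of indexing throughput; verify against your client version):

# Refresh right after the write so the document is immediately
# searchable and counted; slows down heavy indexing
new_cluster.save(refresh=True)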
You can read more about this in the docs, but my advice is: if the document count matches the number of documents you bulked, don't worry about it, and prefer the Kibana dev tools over the index management GUI.