Elasticsearch: scroll between specified time frame
I have some data in Elasticsearch, as shown in the image.
I am using the example from the link below to do the scrolling:
https://gist.github.com/drorata/146ce50807d16fd4a6aa
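```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address
INDEX_NAME = "my-index"  # placeholder index name

page = es.search(
    index=INDEX_NAME,
    scroll='1m',
    size=1000,
    body={"query": {"match_all": {}}})
sid = page['_scroll_id']
# Number of hits in the current page; it must be refreshed after every scroll
scroll_size = len(page['hits']['hits'])
count = 0
# Start scrolling
print("Scrolling...")
while scroll_size > 0:
    count += 1
    print("Page: ", count)
    for hit in page['hits']['hits']:
        pass  # some code processing here
    page = es.scroll(scroll_id=sid, scroll='10m')
    # Update the scroll ID and the size of the page just fetched
    sid = page['_scroll_id']
    scroll_size = len(page['hits']['hits'])
```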
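My current requirement is to scroll, but only over documents between a specified start timestamp and end timestamp. I need help with how to do this using scroll.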
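Simply replace

```python
body={"query": {"match_all": {}}})
```

with

```python
body={"query": {"range": {"timestamp": {"gte": "2018-08-05T05:30:00Z", "lte": "2018-08-06T05:30:00Z"}}}})
```

`gte`/`lte` are inclusive bounds; use `gt`/`lt` if you want exclusive ones.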
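If the start and end times come from Python `datetime` objects rather than hard-coded strings, a small helper can build the same `range` body. This is a minimal sketch: the field name `timestamp` is taken from the answer, while the helper name `time_range_query` is hypothetical. It assumes the field is mapped as a `date` and that timezone-aware datetimes are passed in (naive ones would be treated as local time by `astimezone`).

```python
from datetime import datetime, timezone


def time_range_query(field, start, end):
    """Build a range query body for start <= field <= end (hypothetical helper)."""
    def fmt(dt):
        # Serialize as ISO 8601 in UTC with a trailing "Z", e.g. 2018-08-05T05:30:00Z
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {"query": {"range": {field: {"gte": fmt(start), "lte": fmt(end)}}}}


start = datetime(2018, 8, 5, 5, 30, tzinfo=timezone.utc)
end = datetime(2018, 8, 6, 5, 30, tzinfo=timezone.utc)
body = time_range_query("timestamp", start, end)
print(body)
# Pass it to the first call instead of match_all, e.g.:
# page = es.search(index=INDEX_NAME, scroll='1m', size=1000, body=body)
```

Only the initial `search` call takes the query; the subsequent `scroll` calls keep returning pages of that same filtered result set.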
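Sample code: the time range belongs in the ES query itself. Also, you should process the results of the first query, because the initial `search` call already returns the first page of hits.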
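```python
from elasticsearch import Elasticsearch

# Placeholder settings; adjust them for your cluster
source_es_ip = "localhost"
scroll_time = "2m"
scroll_size = 1000

es_query_dict = {"query": {"range": {"timestamp": {
    "gte": "2018-08-01T00:00:00Z", "lte": "2018-08-17T00:00:00Z"}}}}


def get_es_logs():
    es_client = Elasticsearch([f"http://{source_es_ip}:9200"], timeout=300)
    total_docs = 0
    # The first search call already returns the first page of hits
    page = es_client.search(scroll=scroll_time,
                            size=scroll_size,
                            body=es_query_dict)
    while True:
        sid = page['_scroll_id']
        details = page["hits"]["hits"]
        doc_count = len(details)
        if doc_count == 0:
            break  # no more hits: the scroll is exhausted
        total_docs += doc_count
        print("scroll size: " + str(doc_count))
        print("start bulk index docs")
        # index_bulk(details)
        print("end success")
        page = es_client.scroll(scroll_id=sid, scroll=scroll_time)
    es_client.clear_scroll(scroll_id=sid)  # free the server-side scroll context
    print("total docs: " + str(total_docs))
```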
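The same loop can also be wrapped in a generator, so callers simply iterate over hits and the scroll context is released even if they stop early. This is a sketch under the assumption that the client exposes the `search`, `scroll` and `clear_scroll` methods used above; the function name `scroll_hits` and its default values are hypothetical.

```python
def scroll_hits(es_client, index, query, scroll_time="2m", page_size=1000):
    """Yield every hit matching `query`, one scroll page at a time (hypothetical helper)."""
    page = es_client.search(index=index, scroll=scroll_time,
                            size=page_size, body=query)
    sid = page["_scroll_id"]
    try:
        while page["hits"]["hits"]:
            # The first page comes from search(); later ones come from scroll()
            yield from page["hits"]["hits"]
            page = es_client.scroll(scroll_id=sid, scroll=scroll_time)
            sid = page["_scroll_id"]  # the scroll ID may change between calls
    finally:
        # Release the server-side scroll context even if iteration stops early
        es_client.clear_scroll(scroll_id=sid)


# Usage (assumes es_client and es_query_dict as defined above):
# for hit in scroll_hits(es_client, "my-index", es_query_dict):
#     process(hit)  # process() is a placeholder for your own code
```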
Also take a look at `elasticsearch.helpers.scan`, which already implements the scrolling loop; just pass it `query={"query": {"range": {"timestamp": {"gt": ..., "lt": ...}}}}`.
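A sketch of the `helpers.scan` approach, assuming the `elasticsearch` Python client is installed and that the timestamps are strings the `timestamp` date field can parse. The wrapper name `scan_time_range` and the `scroll`/`size` values are illustrative choices, not part of the original answer; `gt`/`lt` are kept from the answer, so both bounds are exclusive.

```python
def scan_time_range(es, index, start, end, field="timestamp"):
    """Yield every hit with start < field < end via helpers.scan (sketch)."""
    # Imported here so this module loads even where the client is not installed
    from elasticsearch import helpers

    query = {"query": {"range": {field: {"gt": start, "lt": end}}}}
    # scan() issues the initial search, follows the scroll IDs and clears the
    # scroll context when it finishes; it yields individual hits, not pages
    return helpers.scan(es, query=query, index=index, scroll="2m", size=1000)


# Usage (assumes a cluster at localhost:9200 and an index named "my-index"):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# for hit in scan_time_range(es, "my-index",
#                            "2018-08-05T05:30:00Z", "2018-08-06T05:30:00Z"):
#     print(hit["_id"])
```

By default `scan` does not return hits in any particular order; pass `preserve_order=True` if order matters, at some cost in efficiency.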