Bulk inject docs to Elasticsearch with nanosecond timestamps

Question

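I'm trying to use the nanosecond support recently added in Elasticsearch 7.1 (actually available since 7.0). I'm not sure how to do this correctly.

Before 7.0, Elasticsearch only supported millisecond timestamps, and I used the _bulk API to inject documents.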

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

# Bulk-post docs to Elasticsearch in batches of batch_size.
# ES_HOST is defined elsewhere.
def es_bulk_insert(log_lines, batch_size=1000):
    headers = {'Content-Type': 'application/x-ndjson'}
    while log_lines:
        batch, log_lines = log_lines[:batch_size], log_lines[batch_size:]
        # The _bulk body is newline-delimited JSON and must end with a newline.
        batch = '\n'.join([x.es_post_payload for x in batch]) + '\n'
        request = AWSRequest(method='POST', url=f'{ES_HOST}/_bulk', data=batch, headers=headers)
        SigV4Auth(boto3.Session().get_credentials(), 'es', 'eu-west-1').add_auth(request)
        session = URLLib3Session()
        r = session.send(request.prepare())
        if r.status_code > 299:
            raise Exception(f'Received a bad response from Elasticsearch: {r.text}')

A log index is generated per day:

# e.g.:
# log-20190804
# log-20190805
def es_index(self):
    current_date = datetime.strftime(datetime.now(), '%Y%m%d')
    return f'{self.name}-{current_date}'

The timestamp has nanosecond precision ("2019-08-07T23:59:01.193379911Z"); before 7.0, Elasticsearch automatically mapped it to the date type.

"timestamp": {
    "type": "date"
},

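Now I want to map the timestamp field to the "date_nanos" type. From what I read here, I think I need to create the ES index with the correct mapping before calling es_bulk_insert() to upload the docs.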

GET https://{es_url}/log-20190823
If not exist (return 404)
PUT https://{es_url}/log-20190823/_mapping
{
 "properties": {
    "timestamp": {
      "type": "date_nanos" 
    }
  }
}
...
call es_bulk_insert()
...

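My questions are:
1. If I don't remap the old data (e.g. log-20190804), the timestamp field will have two mappings (date vs date_nanos). Will that cause a conflict when I search logs with Kibana?
2. I haven't seen many posts about using this new feature. Does it have a big performance impact? Has anyone used it in production?
3. Kibana does not support nanosecond search before 7.3, and I'm not sure whether it can sort by nanoseconds correctly. I will try it.

Thanks!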

Answer 1

You are right: for date_nanos you need to create the mapping explicitly, otherwise dynamic mapping will fall back to date.
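As a minimal sketch of creating the explicit mapping, assuming the typeless mappings of Elasticsearch 7.x: the create-index request (PUT on the index name) takes the mapping under a "mappings" key, whereas PUT .../_mapping updates an index that already exists. The helper name build_create_index_request and the example host are hypothetical; the request itself would still need to be signed and sent the same way as in es_bulk_insert().

```python
import json


def build_create_index_request(es_host, index_name, field="timestamp"):
    """Return (method, url, body) for creating an index whose `field` is date_nanos.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    body = {
        "mappings": {
            "properties": {
                field: {"type": "date_nanos"}
            }
        }
    }
    return "PUT", f"{es_host}/{index_name}", json.dumps(body)


# Example host is a placeholder, not a real endpoint.
method, url, body = build_create_index_request("https://es.example.com", "log-20190823")
print(method, url)
print(body)
```

Creating the index this way before the first bulk request means the timestamp field never gets a chance to be mapped dynamically as date.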

You are also right that Kibana supports date_nanos in general in 7.3, though IMO the relevant ticket is https://github.com/elastic/kibana/issues/31424.

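However, sorting doesn't work correctly yet. That is because both date (millisecond precision) and date_nanos (nanosecond precision) are represented as a long counting from the start of the epoch. So the first one has a value like 1546344630124 and the second one a value like 1546344630123456789, and comparing those raw numbers doesn't give you the expected sort order.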
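To make the problem concrete, here is a small sketch in plain Python (no Elasticsearch involved) using the two values above. The index labels and the to_nanos helper are made up for illustration. Compared as raw longs, the millisecond value always sorts first; once both are expressed in nanoseconds, the date_nanos document turns out to be the earlier one.

```python
NANOS_PER_MILLI = 1_000_000

# (label, raw long value, unit): one doc from a date index, one from a date_nanos index
docs = [
    ("old-index", 1546344630124, "ms"),        # date: milliseconds since the epoch
    ("new-index", 1546344630123456789, "ns"),  # date_nanos: nanoseconds since the epoch
]


def to_nanos(value, unit):
    """Convert a raw epoch value to nanoseconds."""
    return value * NANOS_PER_MILLI if unit == "ms" else value


# Sorting on the raw longs: the millisecond value is numerically far smaller.
raw_order = [name for name, value, unit in sorted(docs, key=lambda d: d[1])]

# Sorting after casting both to nanoseconds gives the chronological order.
fixed_order = [name for name, value, unit in sorted(docs, key=lambda d: to_nanos(d[1], d[2]))]

print(raw_order)    # ['old-index', 'new-index']
print(fixed_order)  # ['new-index', 'old-index']
```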

Elasticsearch has a search parameter, "numeric_type": "date_nanos", that casts both to nanosecond precision and thus orders them correctly (added in 7.2). However, Kibana doesn't use that parameter yet. I've raised an issue for that now.
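Outside Kibana you can set that parameter yourself in the sort clause. The sketch below only builds the search body; the helper name build_sorted_search_body is hypothetical, and the body would be POSTed to something like {ES_HOST}/log-*/_search (7.2 or later), signed the same way as the bulk request in the question.

```python
import json


def build_sorted_search_body(field="timestamp", order="asc"):
    """Build a search body that sorts date and date_nanos fields together.

    Hypothetical helper: "numeric_type": "date_nanos" asks Elasticsearch to
    cast the sort values of all matched indices to nanosecond precision.
    """
    return {
        "query": {"match_all": {}},
        "sort": [
            {field: {"order": order, "numeric_type": "date_nanos"}}
        ],
    }


print(json.dumps(build_sorted_search_body(), indent=2))
```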

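For performance: the release blog post has some details. There is obviously some overhead (including document size), so I would only use the higher precision if you really need it.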


elasticsearch

kibana

aws-elasticsearch