Set _id as update key in Logstash Elasticsearch

Question

My index looks like this:

{
"_index": "mydata",
"_type": "_doc",
"_id": "PuhnbG0B1IIlyY9-ArdR",
"_score": 1,
"_source": {
"age": 9,
"@version": "1",
"updated_on": "2019-01-01T00:00:00.000Z",
"id": 4,
"name": "Emma",
"@timestamp": "2019-09-26T07:09:11.947Z"
}
}

My Logstash config for updating the data is:

input {
    jdbc {
        jdbc_connection_string => "***"
        jdbc_driver_class => "***"
        jdbc_driver_library => "***"
        jdbc_user => ***
        statement => "SELECT * from agedata WHERE updated_on > :sql_last_value ORDER BY updated_on"
        use_column_value => true
        tracking_column => updated_on
        tracking_column_type => "timestamp"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "mydata"
        action => update
        document_id => "%{_id}"
        doc_as_upsert => true
    }
    stdout { codec => rubydebug }
}

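When I run the pipeline after updating a row, I expect the existing document with that _id to be updated with whatever changes I made to the row. Instead, Elasticsearch indexes it as a new document whose _id is the literal string: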
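The literal `%{_id}` comes from how Logstash resolves `%{...}` field references: `_id` is Elasticsearch document metadata, not a column of the jdbc row, so the reference has nothing to resolve against and is kept verbatim. The Python below is a simplified model of that behaviour, not Logstash's own code; the `sprintf` helper and the sample event are hypothetical, and real Logstash also supports nested `[a][b]` references and date formats, which are not modelled here.

```python
import re

# Simplified model of Logstash %{field} references (illustration only):
# a reference to a field present in the event is replaced by its value,
# a reference to a missing field is left in the string unchanged.
FIELD_REF = re.compile(r"%\{([^}]+)\}")


def sprintf(template: str, event: dict) -> str:
    """Resolve %{name} references in template against a flat event dict."""

    def resolve(match: re.Match) -> str:
        name = match.group(1)
        if name in event:
            return str(event[name])
        return match.group(0)  # unknown field: keep the literal "%{name}"

    return FIELD_REF.sub(resolve, template)


# One row from the jdbc input, as a Logstash event (hypothetical sample).
# It has an "id" column but no "_id" field: _id is Elasticsearch metadata.
row_event = {
    "id": 4,
    "name": "Emma",
    "age": 9,
    "updated_on": "2019-01-01T00:00:00.000Z",
}

print(sprintf("%{_id}", row_event))  # %{_id}
print(sprintf("%{id}", row_event))   # 4
print(sprintf("{_id}", row_event))   # {_id}  (no % sign, so never a reference)
```

Under this model, every row produces the same `document_id` string `%{_id}`, which matches the single document with that literal `_id` shown next.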

"_index": "agesep",
"_type": "_doc",
"_id": "%{_id}"

When I use document_id => "%{id}" instead, I get a duplicate. Actual:

{
"_index": "mydata",
"_type": "_doc",
"_id": "BuilbG0B1IIlyY9-4P7t",
"_score": 1,
"_source": {
"id": 1,
"age": 13,
"name": "Greg",
"updated_on": "2019-09-26T08:11:00.000Z",
"@timestamp": "2019-09-26T08:17:52.974Z",
"@version": "1"
}
}

Duplicate:

{
"_index": "mydata",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"age": 56,
"@version": "1",
"id": 1,
"name": "Greg",
"updated_on": "2019-09-26T08:18:00.000Z",
"@timestamp": "2019-09-26T08:20:14.561Z"
}
}

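How can I make the update use the existing _id in ES instead of creating a duplicate? I expect the data in the index to be updated based on _id, not a new document to be created for every update.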

Answer 1

I suggest using id instead of _id:

        document_id => "%{id}"
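To see why keying on the `id` column stops the duplicates, here is a minimal in-memory sketch of `action => update` with `doc_as_upsert => true`. The dict standing in for the index and the helpers `update_with_upsert` and `run_pipeline` are hypothetical, and the shallow merge is a simplification (Elasticsearch merges nested objects recursively, which does not matter for these flat rows).

```python
from typing import Dict, List

# Hypothetical stand-in for an Elasticsearch index: _id -> document source.
Index = Dict[str, dict]


def update_with_upsert(index: Index, doc_id: str, doc: dict) -> None:
    """Merge doc into index[doc_id], creating the document if it is missing."""
    if doc_id in index:
        index[doc_id].update(doc)  # partial update: new field values win
    else:
        index[doc_id] = dict(doc)  # doc_as_upsert: the doc itself is inserted


def run_pipeline(index: Index, rows: List[dict]) -> None:
    """Send each jdbc row keyed by its id column, as document_id => "%{id}" does."""
    for row in rows:
        update_with_upsert(index, str(row["id"]), row)


index: Index = {}
run_pipeline(index, [{"id": 1, "name": "Greg", "age": 13}])
# Later the same database row changes and is picked up again.
run_pipeline(index, [{"id": 1, "name": "Greg", "age": 56}])

print(len(index))         # 1
print(index["1"]["age"])  # 56
```

Because every version of a database row resolves to the same `_id`, later versions overwrite the earlier one. Documents indexed earlier without a `document_id` (such as the one with the auto-generated `_id` `BuilbG0B1IIlyY9-4P7t`) are not matched by this key, so they presumably stay in the index until they are deleted or the index is rebuilt.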


Tags: elasticsearch, logstash, kibana, elastic-stack