Update records in ElasticSearch

I want to update the logdate field for all records in a specific index. From what I have read so far, this does not seem to be possible. Am I right?

Here is a sample document:

{
   "_index": "logstash-01-2015",
   "_type": "ufdb",
   "_id": "AU__EvrALg15uxY1Wxf9",
   "_score": 1,
   "_source": {
      "message": "2015-08-14 06:50:05 [31946] PASS  level2      10.249.10.70    level2     ads       http://ad.360yield.com/unpixel.... GET",
      "@version": "1",
      "@timestamp": "2015-09-24T11:17:57.389Z",
      "type": "ufdb",
      "file": "/usr/local/ufdbguard/logs/ufdbguardd.log",
      "host": "PROXY-DEV",
      "offset": "3983281700",
      "logdate": "2015-08-14T04:50:05.000Z",
      "status": "PASS",
      "group": "level2",
      "clientip": "10.249.10.70",
      "category": "ads",
      "url": "http://ad.360yield.com/unpixel....",
      "method": "GET",
      "tags": [
         "_grokparsefailure"
      ]
   }
}

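You can use the partial update API (the _update endpoint), which changes only the fields you send and keeps the rest of the document.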

To test it, I created a simple index:

PUT /test_index

Then I created a document:

PUT /test_index/doc/1
{
   "message": "2015-08-14 06:50:05 [31946] PASS  level2      10.249.10.70    level2     ads       http://ad.360yield.com/unpixel.... GET",
   "@version": "1",
   "@timestamp": "2015-09-24T11:17:57.389Z",
   "type": "ufdb",
   "file": "/usr/local/ufdbguard/logs/ufdbguardd.log",
   "host": "PROXY-DEV",
   "offset": "3983281700",
   "logdate": "2015-08-14T04:50:05.000Z",
   "status": "PASS",
   "group": "level2",
   "clientip": "10.249.10.70",
   "category": "ads",
   "url": "http://ad.360yield.com/unpixel....",
   "method": "GET",
   "tags": [
      "_grokparsefailure"
   ]
}

Now I can apply a partial update to the document:

POST /test_index/doc/1/_update
{
    "doc": {
        "logdate": "2015-09-25T12:20:00.000Z"
    }
}
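To make the semantics of the "doc" body concrete, here is a minimal local Python model of the merge. The helper apply_partial_doc is hypothetical and not part of any Elasticsearch client; it assumes the documented behavior that fields in the partial document overwrite existing fields, nested objects are merged recursively, and all other fields are kept. Elasticsearch still reindexes the whole merged document internally.

```python
import copy
from typing import Any, Dict


def apply_partial_doc(source: Dict[str, Any], doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical local model of how the _update "doc" body is merged into _source."""
    merged = copy.deepcopy(source)  # never mutate the stored document
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = apply_partial_doc(merged[key], value)
        else:
            # Scalars and arrays from the partial doc replace the old value.
            merged[key] = copy.deepcopy(value)
    return merged


source = {
    "logdate": "2015-08-14T04:50:05.000Z",
    "status": "PASS",
    "tags": ["_grokparsefailure"],
}
updated = apply_partial_doc(source, {"logdate": "2015-09-25T12:20:00.000Z"})
print(updated["logdate"])
print(updated["status"])
```

Only logdate changes; status and tags come through untouched, which matches the document retrieved after the update.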

If I retrieve the document:

GET /test_index/doc/1

I can see that the logdate field has been updated (and _version has gone from 1 to 2):

{
   "_index": "test_index",
   "_type": "doc",
   "_id": "1",
   "_version": 2,
   "found": true,
   "_source": {
      "message": "2015-08-14 06:50:05 [31946] PASS  level2      10.249.10.70    level2     ads       http://ad.360yield.com/unpixel.... GET",
      "@version": "1",
      "@timestamp": "2015-09-24T11:17:57.389Z",
      "type": "ufdb",
      "file": "/usr/local/ufdbguard/logs/ufdbguardd.log",
      "host": "PROXY-DEV",
      "offset": "3983281700",
      "logdate": "2015-09-25T12:20:00.000Z",
      "status": "PASS",
      "group": "level2",
      "clientip": "10.249.10.70",
      "category": "ads",
      "url": "http://ad.360yield.com/unpixel....",
      "method": "GET",
      "tags": [
         "_grokparsefailure"
      ]
   }
}

Here is the code I used to test it:

http://sense.qbox.io/gist/236bf271df6d867f5f0c87eacab592e41d3095cf

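You're right, that's not possible.

There has been an open issue requesting Update by Query for a long time, and I'm not sure it will be implemented soon, because it is hard to support on top of the underlying Lucene engine: Lucene cannot modify a document in place, so every matching document has to be deleted and reindexed.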

An Update by Query plugin is available on GitHub, but it is experimental and I have never tried it.
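Without Update by Query, one workaround is to collect the IDs of every document in the index (for example with a scroll search, not shown) and send the same partial update for each of them through the _bulk API. The sketch below only builds the newline-delimited bulk body; the helper name build_bulk_update_body is hypothetical, and it assumes an Elasticsearch version that still uses mapping types (such as the ufdb type in the question).

```python
import json
from typing import Any, Dict, Iterable


def build_bulk_update_body(
    index: str, doc_type: str, ids: Iterable[str], partial_doc: Dict[str, Any]
) -> str:
    """Hypothetical helper: one partial "update" action per document ID, as NDJSON."""
    lines = []
    for doc_id in ids:
        # Action/metadata line: which document to update.
        lines.append(json.dumps({"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: the same partial document the _update API accepts.
        lines.append(json.dumps({"doc": partial_doc}))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_update_body(
    "logstash-01-2015",
    "ufdb",
    ["AU__EvrALg15uxY1Wxf9"],
    {"logdate": "2015-09-25T12:20:00.000Z"},
)
print(body, end="")
```

The resulting string would be POSTed to /_bulk in batches; each document is still deleted and reindexed internally, so this is only a convenience over calling _update once per document.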

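Update 2018-05-02

The original answer is quite old. Update By Query is now supported natively through the _update_by_query API (available since Elasticsearch 2.3).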
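As a sketch of what such a request could look like for the question's index, the Python below builds (but does not send) a POST to /logstash-01-2015/_update_by_query that sets logdate on every document with a Painless script. The base URL http://localhost:9200, the helper name build_update_by_query_request and the new date value are assumptions; the "source" key for the script text applies to recent versions, while older releases used "inline" instead.

```python
import json
import urllib.request


def build_update_by_query_request(
    base_url: str, index: str, new_logdate: str
) -> urllib.request.Request:
    """Hypothetical helper: build a POST /<index>/_update_by_query request (not sent)."""
    body = {
        # Select every document in the index.
        "query": {"match_all": {}},
        # Painless script executed against each matching document's _source.
        "script": {
            "source": "ctx._source.logdate = params.logdate",
            "lang": "painless",
            "params": {"logdate": new_logdate},
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_update_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_update_by_query_request(
    "http://localhost:9200", "logstash-01-2015", "2015-09-25T12:20:00.000Z"
)
print(req.get_method(), req.full_url)
# To execute it against a running cluster: urllib.request.urlopen(req)
```

Passing the value through params rather than concatenating it into the script text lets Elasticsearch reuse the compiled script across calls.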