Elasticsearch: load CSV data with context

I have 3 million records. The headers are value, type, other_fields, and so on.

Here I need to load the data like this.

In each record, I need to specify type as the context for value. Is there a way to do this with Logstash, or is there any other option?

val,val_type,id
Sunnyvale it labs, seller, 10223667

For this, I would use the new CSV ingest processor.

First, create an ingest pipeline to parse your CSV data:

PUT _ingest/pipeline/csv-parser
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "val",
          "val_type",
          "id"
        ]
      }
    },
    {
      "script": {
        "source": """
          def val = ctx.val;
          ctx.val = [
            'input': val,
            'contexts': [
              'type': [ctx.val_type]
            ]
          ]
          """
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
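
To make the three processors easier to reason about, here is a rough local imitation of the pipeline in Python. It is only a sketch for checking expectations offline, not how Elasticsearch implements the processors. The function name apply_csv_parser is hypothetical.

```python
import csv


def apply_csv_parser(message, target_fields=("val", "val_type", "id")):
    """Imitate the csv-parser pipeline for a single CSV line (illustration only)."""
    # csv processor: split the raw line and assign the columns to target_fields
    row = next(csv.reader([message]))
    doc = {"message": message}
    doc.update(zip(target_fields, row))
    # script processor: wrap val in an object carrying the input and its context
    doc["val"] = {"input": doc["val"], "contexts": {"type": [doc["val_type"]]}}
    # remove processor: drop the raw line
    del doc["message"]
    return doc


doc = apply_csv_parser("Sunnyvale it labs,seller,10223667")
print(doc)
```

Note that a line with spaces after the commas (as in the sample in the question) would keep those spaces in the parsed values. If that matters, the CSV processor's trim option is worth a look.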

Then, you can index your documents as follows:

PUT index/_doc/1?pipeline=csv-parser
{
  "message": "Sunnyvale it labs,seller,10223667"
}
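
With 3 million records, sending one request per document would be slow, so the same pipeline can also be applied through the _bulk API using the pipeline query parameter. Below is a minimal sketch in Python (standard library only) that turns raw CSV lines into bulk request bodies. The helper names, the batch size, the file path and the host are assumptions for illustration, and error handling is left out.

```python
import json
import urllib.request


def bulk_body(lines, index="index"):
    """Build an NDJSON _bulk body: one action line and one source line per CSV row."""
    parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps({"message": line}))
    # the bulk API expects the body to end with a newline
    return "\n".join(parts) + "\n"


def batches(iterable, size):
    """Yield lists of at most `size` items."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_bulk(body, host="http://localhost:9200", pipeline="csv-parser"):
    """POST one bulk body through the ingest pipeline (host is an assumption)."""
    req = urllib.request.Request(
        f"{host}/_bulk?pipeline={pipeline}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Small offline demonstration; nothing is sent to a cluster here.
rows = ["Sunnyvale it labs,seller,10223667\n", "\n"]
body = bulk_body(rows)
print(body, end="")
```

A full load could then look like this: open the file, skip the header line with next(f), and call send_bulk(bulk_body(chunk)) for each chunk from batches(f, 5000). The bulk response has an errors flag that should be checked for each batch.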

After ingestion, the document will look like this:

{
  "val_type": "seller",
  "id": "10223667",
  "val": {
    "input": "Sunnyvale it labs",
    "contexts": {
      "type": [
        "seller"
      ]
    }
  }
}
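
For the val object above to be usable by the context suggester, the index needs to map val as a completion field with a category context named type, and that mapping has to exist before the first document is indexed. The sketch below builds the request bodies as Python dictionaries. It assumes a typeless mapping (Elasticsearch 7 or later), keyword mappings for val_type and id, and a hypothetical suggestion name val-suggest.

```python
import json

# Body for `PUT index`: val is a completion field with one category context "type".
# The keyword mappings for val_type and id are assumptions, not taken from the question.
mapping = {
    "mappings": {
        "properties": {
            "val": {
                "type": "completion",
                "contexts": [{"name": "type", "type": "category"}],
            },
            "val_type": {"type": "keyword"},
            "id": {"type": "keyword"},
        }
    }
}


def suggest_query(prefix, val_type, size=5):
    """Body for `POST index/_search`: complete `prefix` within a single context value."""
    return {
        "suggest": {
            "val-suggest": {  # hypothetical suggestion name
                "prefix": prefix,
                "completion": {
                    "field": "val",
                    "size": size,
                    "contexts": {"type": [val_type]},
                },
            }
        }
    }


print(json.dumps(mapping, indent=2))
print(json.dumps(suggest_query("sunny", "seller"), indent=2))
```

The context name in the mapping ("type") has to match the key the script processor writes under contexts. Otherwise indexing the document is likely to fail.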

UPDATE: Logstash solution

This is also doable with Logstash. The configuration file would look something like this:

input {
    file {
        path => "/path/to/your/file.csv"
        sincedb_path => "/dev/null"
        start_position => "beginning"
    }
}
filter {
    csv {
        skip_header => true
        separator => ","
        columns => ["val", "val_type", "id"]
    }
    mutate {
        rename => { "val" => "value" }
        add_field => { 
            "[val][input]" => "%{value}" 
            "[val][contexts][type]" => "%{val_type}" 
        }
        remove_field => [ "value" ]
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "your-index"
    }    
}