How to resolve parsing error for CSV file in Logstash
I am using Filebeat to send a CSV file to Logstash and then on to Kibana, but I am getting a parsing error when the CSV file is ingested by Logstash.
This is the content of the CSV file:
time version id score type
May 6, 2020 @ 11:29:59.863 1 2 PPy_6XEBuZH417wO9uVe _doc
logstash.conf:
input {
  beats {
    port => 5044
  }
}
filter {
  csv {
    separator => ","
    columns => ["time","version","id","index","score","type"]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
Filebeat.yml:
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /etc/test/*.csv
    #- c:\programdata\elasticsearch\logs\*
And the error in Logstash:
[2020-05-27T12:28:14,585][WARN ][logstash.filters.csv ][main] Error parsing csv {:field=>"message", :source=>"time,version,id,score,type,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
[2020-05-27T12:28:14,586][WARN ][logstash.filters.csv ][main] Error parsing csv {:field=>"message", :source=>"\"May 6, 2020 @ 11:29:59.863\",1,2,PPy_6XEBuZH417wO9uVe,_doc,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
I do get some data into Kibana, but not what I want to see.
I managed to get it working locally. The mistakes I have noticed so far:
- Using ES reserved fields such as @timestamp, @version, etc.
- The timestamp is not in ISO8601 format; it has an @ sign in the middle of it.
- Your filter sets the separator to ",", but your CSV's actual separator is "\t" (a tab).
- From the error you can see that it is also trying to process your header line; I suggest removing it from the CSV, or using the skip_header option (see the sketch after this list).
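As a minimal sketch (not tested against your exact file), the original tab-separated CSV with its header row could instead be handled with a filter along these lines. Two assumptions: the separator string contains a literal tab character, since Logstash does not interpret the \t escape in config strings unless config.support_escapes is enabled, and skip_header drops a line when its values match the configured column names:
filter {
  csv {
    separator => "	"  # a literal tab character, not the two characters \t
    columns => ["time","version","id","score","type"]
    skip_header => true  # drop the header row instead of indexing it
  }
}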
Below is the logstash.conf file I used:
input {
  file {
    path => "C:/work/elastic/logstash-6.5.0/config/test.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["time","version","id","score","type"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "csv-test"
  }
}
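If you also want the time column to drive the event timestamp rather than the ingestion time, a date filter could be appended after the csv filter. A sketch, assuming the Joda-style pattern below matches the reformatted "May 6 2020 11:29:59.863" values used in the CSV:
filter {
  date {
    match => ["time", "MMM d yyyy HH:mm:ss.SSS"]  # e.g. May 6 2020 11:29:59.863
    target => "@timestamp"  # overwrite the default event timestamp
  }
}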
The CSV file I used:
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
From my Kibana: [screenshot]