Bytes from nginx logs are mapped as string, not number, in Elasticsearch
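Recently I deployed ELK and started forwarding logs from nginx via logstash-forwarder.
The problem is that in Elasticsearch (1.4.2) / Kibana (4), the "bytes" value of requests is mapped as a string.
I'm using the standard configuration that can be found everywhere.
I added new patterns for nginx logs to the logstash patterns: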
NGUSERNAME [a-zA-Z\.\@\-\+_%]+
NGUSER %{NGUSERNAME}
NGINXACCESS %{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float} %{NUMBER:upstream_time:float}
NGINXACCESS %{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float}
I added this configuration for logstash:
input {
  lumberjack {
    port => 5000
    type => "logs"
    ssl_certificate => "/etc/logstash/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/logstash/tls/private/logstash-forwarder.key"
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  } else if [type] == "nginx" {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
    date {
      match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
    geoip {
      source => "clientip"
    }
  }
}
output {
  elasticsearch_http {
    host => localhost
  }
}
But in Elasticsearch I still see it as a string, even though I defined "bytes" as long:
(?:%{NUMBER:bytes:long}|-)
Does anyone know how to store "bytes" as a numeric type?
Thanks
(?:%{NUMBER:bytes:long}|-)
You're on the right track, but "long" isn't a valid data type. Quoting the grok documentation (emphasis mine):
Optionally you can add a data type conversion to your grok pattern. By default all semantics are saved as strings. If you wish to convert a semantic’s data type, for example change a string to an integer then suffix it with the target data type. For example %{NUMBER:num:int} which converts the num semantic from a string to an integer. Currently the only supported conversions are int and float.
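Applied to the patterns in the question, the "bytes" capture would become %{NUMBER:bytes:int}. A sketch of the two NGINXACCESS definitions from the question with only that capture changed (everything else is copied unchanged):

```text
NGINXACCESS %{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes:int}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float} %{NUMBER:upstream_time:float}
NGINXACCESS %{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes:int}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float}
```

Elasticsearch does not change the type of a field that is already mapped in an existing index, so the numeric values will most likely only be mapped as a number in indices created after this change (with the default daily logstash-* indices, typically the next day's index).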
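Note that this doesn't control the data type actually used in the index on the Elasticsearch side, only the data type of the values in the JSON document sent to Elasticsearch (which may or may not affect which mapping ES uses). In a JSON context there's no distinction between ints and longs; scalar values are either numbers, booleans, or strings.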