
Grok pattern for data separated by pipe with whitespaces and optional values in it

I have a text file / log file in which the values are separated by pipe symbols, and the "|" separators are surrounded by a varying number of whitespaces.

Also, I would like to do this without using gsub.

Below is an example:

Does anyone know how to write a grok pattern to extract this for Logstash? I am quite new to it. Thanks in advance.

5000|       |       |applicationLog     |ClientLog      |SystemLog      |Green      |       |2014-01-07 11:58:48.76948      |12345 (0x1224)|1) Error 2)Sample Log | Configuration Manager

Since the amount of whitespace between the | separators is not consistent from field to field, you can match the separators with a non-greedy .*? and extract the remaining data with predefined grok patterns:

%{NUMBER:num}.*?%{WORD:2nd}.*?%{WORD:3rd}.*?%{WORD:4th}.*?%{WORD:5th}.*?%{TIMESTAMP_ISO8601}

This will give you:

{
  "num": [
    [
      "5000"
    ]
  ],
  "BASE10NUM": [
    [
      "5000"
    ]
  ],
  "2nd": [
    [
      "applicationLog"
    ]
  ],
  "3rd": [
    [
      "ClientLog"
    ]
  ],
  "4th": [
    [
      "SystemLog"
    ]
  ],
  "5th": [
    [
      "Green"
    ]
  ],
  "TIMESTAMP_ISO8601": [
    [
      "2014-01-07 11:58:48.76948"
    ]
  ],
  "YEAR": [
    [
      "2014"
    ]
  ],
  "MONTHNUM": [
    [
      "01"
    ]
  ],
  "MONTHDAY": [
    [
      "07"
    ]
  ],
  "HOUR": [
    [
      "11",
      null
    ]
  ],
  "MINUTE": [
    [
      "58",
      null
    ]
  ],
  "SECOND": [
    [
      "48.76948"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ]
}

You can test this at the online grok debugger.
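If you want to sanity-check the lazy-match approach outside Logstash, here is a rough Python sketch. The regexes below are hand-written approximations of the grok patterns, not the library's exact definitions, and the group names "second".."fifth" replace "2nd".."5th" because Python group names cannot start with a digit:

```python
import re

# Approximate stand-ins for %{NUMBER}, %{WORD}, and %{TIMESTAMP_ISO8601}.
NUMBER = r"(?P<num>\d+(?:\.\d+)?)"
TIMESTAMP = r"(?P<ts>\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?)"

# Same shape as the grok expression: lazy .*? skips the pipes and padding
# between the captured fields.
pattern = re.compile(
    NUMBER
    + r".*?(?P<second>\w+)"
    + r".*?(?P<third>\w+)"
    + r".*?(?P<fourth>\w+)"
    + r".*?(?P<fifth>\w+)"
    + r".*?" + TIMESTAMP
)

line = ("5000|       |       |applicationLog     |ClientLog      |"
        "SystemLog      |Green      |       |2014-01-07 11:58:48.76948"
        "      |12345 (0x1224)|1) Error 2)Sample Log | Configuration Manager")

m = pattern.search(line)
print(m.group("num"), m.group("second"), m.group("fifth"), m.group("ts"))
# → 5000 applicationLog Green 2014-01-07 11:58:48.76948
```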

Since you are new to grok, you may also want to read the grok filter plugin basics.
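For reference, the pattern above would sit inside a grok filter in your pipeline configuration, roughly like this (a minimal sketch; the input and output sections of the pipeline are up to you):

```
filter {
  grok {
    match => {
      "message" => "%{NUMBER:num}.*?%{WORD:2nd}.*?%{WORD:3rd}.*?%{WORD:4th}.*?%{WORD:5th}.*?%{TIMESTAMP_ISO8601}"
    }
  }
}
```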

If you can, I would also suggest looking at the dissect filter, which is faster and more efficient than grok:

The Dissect filter is a kind of split operation. Unlike a regular split operation where one delimiter is applied to the whole string, this operation applies a set of delimiters to a string value. Dissect does not use regular expressions and is very fast. However, if the structure of your text varies from line to line then Grok is more suitable. There is a hybrid case where Dissect can be used to de-structure the section of the line that is reliably repeated and then Grok can be used on the remaining field values with more regex predictability and less overall work to do.
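Assuming the column layout of the sample line is fixed, a dissect mapping for it could look roughly like this sketch (field names mirror the grok example; %{?name} discards an empty column, and since dissect splits on the literal "|" the captured values keep their trailing space padding, which mutate's strip option then trims):

```
filter {
  dissect {
    mapping => {
      "message" => "%{num}|%{?col2}|%{?col3}|%{2nd}|%{3rd}|%{4th}|%{5th}|%{?col8}|%{timestamp}|%{thread}|%{rest}"
    }
  }
  mutate {
    strip => ["2nd", "3rd", "4th", "5th", "timestamp"]
  }
}
```

Note that the final field (%{rest}) receives everything after the last mapped delimiter, so the embedded "|" inside "1) Error 2)Sample Log | Configuration Manager" is preserved rather than treated as another separator.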