Custom grok patterns - matching multiple patterns

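I asked a similar question before but got no replies, so I figured it was time to rephrase it in the hope of getting some much-needed help.

Ultimately I want to create an ingest pipeline, but I'm falling at the first hurdle: building a custom grok pattern, using the Grok Debugger in Kibana, that extracts two fields from a message. Given the following message: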

This is a document with a lengthy text it contains a number of paragraphs and at the end I'll add some markers that indicate additional information I'd like to pull out and add as additional fields. This is the end of the actual document with additional information being added prior to the closing bracket of the RTF.

additionalfield1: this is information associated with additionalfield1

additionalfield2: information associated with additionalfield2

I'm trying to create the following fields, but I can't seem to get both patterns to match; only one or the other does.

{
  "additionalfield1": ": this is information associated with additionalfield1",
  "additionalfield2": ": this is information associated with additionalfield2"

}

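The screenshot below shows what I did to match a single pattern; I'd like to understand how to match and extract both of the patterns above. As you can see from the screenshot, matching just one of them, in this case "additionalfield1", works fine, and the same is true if I switch to the other pattern, but if I try to look for both at once nothing is returned.

The screenshot below shows a failed attempt to extract both additionalfield1 and additionalfield2 (when they are present); in this case it only extracts additionalfield2.

Any help would be greatly appreciated.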

Update:

Clearly I don't understand this at all. The text obviously contains a number of newlines, but if I use the pattern

(?m)%{FINCLASS:finclass}

I extract additionalfield1.

If I extend it to

(?m)%{FINCLASS:finclass}(?m)%{MYCLASS:myclass}

with this under Custom Patterns:

FINCLASS : (?<=additionalfield1:\s)[^,\n]*
MYCLASS : (?<=additionalfield2:\s)[^,\n]*

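I get a message saying the pattern does not match, even though additionalfield1 is followed by the rest of its line and then a newline, so additionalfield2 always comes after that pattern, following the \n.

This is driving me crazy, so if you're willing to enlighten a newbie, please save me from tearing my hair out.

Try this:

Input: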

This is a document with a lengthy text it contains a number of paragraphs and at the end I'll add some markers that indicate additional information I'd like to pull out and add as additional fields. This is the end of the actual document with additional information being added prior to the closing bracket of the RTF.

additionalfield1: this is information associated with additionalfield1

additionalfield2: information associated with additionalfield2

GROK pattern:

additionalfield1: (?<additionalfield1>([^,]*))additionalfield2: (?<additionalfield2>([^,]*))

Output:

{
  "additionalfield1": [
    [
      "this is information associated with additionalfield1\n\n"
    ]
  ],
  "additionalfield2": [
    [
      "information associated with additionalfield2"
    ]
  ]
}
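
For anyone who wants to poke at this outside the Grok Debugger, here is a rough sketch that replays the patterns with Python's `re` module. It is only an approximation: Grok runs on the Oniguruma engine, not on Python's, so named groups are written `(?P<name>...)` here instead of `(?<name>...)`, and the `(?m)` prefix from the question is left out because it has no effect on `[^,\n]*`. The message is shortened, and `extract_fields` and the `TRIMMED` variant are names I made up for illustration. The sketch suggests why the concatenated version fails: the second look-behind is checked at the exact point where the first capture stops, and the text just before that point is not "additionalfield2: ".

```python
import re

# Shortened stand-in for the message in the question; the marker lines are unchanged.
MESSAGE = (
    "This is the end of the actual document.\n"
    "\n"
    "additionalfield1: this is information associated with additionalfield1\n"
    "\n"
    "additionalfield2: information associated with additionalfield2"
)

# The two look-behind patterns from the question, joined with nothing in between.
FINCLASS = r"(?<=additionalfield1:\s)[^,\n]*"
MYCLASS = r"(?<=additionalfield2:\s)[^,\n]*"
CONCATENATED = re.compile(FINCLASS + MYCLASS)

# Python spelling of the pattern from the answer above.
ANSWER = re.compile(
    r"additionalfield1: (?P<additionalfield1>[^,]*)"
    r"additionalfield2: (?P<additionalfield2>[^,]*)"
)

# Hypothetical variant: keep newlines out of the captures and let \s+ absorb the gap.
TRIMMED = re.compile(
    r"additionalfield1: (?P<additionalfield1>[^,\n]*)\s+"
    r"additionalfield2: (?P<additionalfield2>[^,\n]*)"
)


def extract_fields(pattern, message):
    """Return the named captures as a dict, or None if the pattern does not match."""
    match = pattern.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(extract_fields(CONCATENATED, MESSAGE))  # no match
    print(extract_fields(ANSWER, MESSAGE))        # first field keeps the trailing "\n\n"
    print(extract_fields(TRIMMED, MESSAGE))       # both fields without newlines
```

If the trailing newlines in additionalfield1 are unwanted, something along the lines of `TRIMMED` may be worth trying in the Grok Debugger as well, written with the `(?<name>...)` group syntax; I have only checked it against Python's engine.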