My negative lookahead is not working - why?

Question

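I have a text interspersed with various strings, dates, tabs and language codes. I want to extract the strings that follow a date + tab combination and are themselves followed by a language code like '[en]' and a tab, after which we do NOT have the string "BAD THINGS" (e.g. "2020-01-12\tSTRING WE NEED[en]\tGood things", as opposed to "2020-01-12\tSTRING WE DON'T NEED[en]\tBad things").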

This is the short sample text I am working with:

\n2021-01-12\tThis string is not needed [it]\tBad things\tBad things\n2021-01-12\tThis string is also not needed [en]\tBad things\tBad things\n2021-01-11\tString needed 1! [it]\tString needed 1! repeated here\tNot interesting here\n2021-01-11\tString needed 2 [fr]\tString needed 2 repeated here\tUnnecessary string\n2021-01-11\tString needed 3... [ru]\tString needed 3... repeated here\tAnother part we are not interested in

I made this regex to capture all the strings between the date and the language code:

(\d{4}-\d{2}-\d{2}\t)(.*?)(\[\w{2}\]\t)

This works fine (see here). However, when I add a negative lookahead to exclude the strings that are followed by "Bad things", my regex goes south:

(\d{4}-\d{2}-\d{2}\t)(.*?)(\[\w{2}\]\t)(?!Bad things)

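You can see the result here. I know my lookahead somehow makes the regex greedy, but I don't know how to avoid that; adding a ? after it does not work. Can you help me?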
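
The behaviour described above can be reproduced with a short Python sketch. It assumes an English rendering of the sample text with real tab and newline characters, and it assumes that "." was allowed to match newlines (re.DOTALL, or the "s" flag in an online tester); the question does not say which flags were in use, so treat that part as a guess. The names SAMPLE, PATTERN, per_line and across_lines are made up for this illustration. The point it shows is that a lazy `.*?` is only lazy about where it stops first: when the lookahead rejects a tag, the engine backtracks into the lazy group and lets it grow until a later tag passes.

```python
import re

# English rendering of the sample text (assumption: tabs/newlines are real characters).
SAMPLE = (
    "\n2021-01-12\tThis string is not needed [it]\tBad things\tBad things"
    "\n2021-01-12\tThis string is also not needed [en]\tBad things\tBad things"
    "\n2021-01-11\tString needed 1! [it]\tString needed 1! repeated here\tNot interesting here"
    "\n2021-01-11\tString needed 2 [fr]\tString needed 2 repeated here\tUnnecessary string"
    "\n2021-01-11\tString needed 3... [ru]\tString needed 3... repeated here"
    "\tAnother part we are not interested in"
)

# The pattern from the question, unchanged.
PATTERN = r"(\d{4}-\d{2}-\d{2}\t)(.*?)(\[\w{2}\]\t)(?!Bad things)"

# Default flags: "." stops at "\n", so a rejected line cannot reach a tag on a later line.
per_line = [m.group(2) for m in re.finditer(PATTERN, SAMPLE)]

# re.DOTALL: "." also matches "\n", so the lazy group keeps growing past rejected tags.
across_lines = [m.group(2) for m in re.finditer(PATTERN, SAMPLE, flags=re.DOTALL)]

print(per_line)
print(repr(across_lines[0]))
```

Under these assumptions the first DOTALL match starts at the first date and swallows both "Bad things" lines before stopping at the first acceptable tag, which looks like greediness even though the quantifier is lazy.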

Answer 1

Not sure if this covers all cases, but this seems to work:

(\d{4}-\d{2}-\d{2}\t)([^][]*)(\[\w{2}\]\t)(?!Bad things)

Demo here.

Explanation:

(\d{4}-\d{2}-\d{2}\t)   date followed by a tab
([^][]*)                any run of characters other than `[` and `]`
(\[\w{2}\]\t)           two-character tag in brackets, e.g. [en], followed by a tab
(?!Bad things)          negative lookahead: the text that follows must not be "Bad things"
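
As a rough check in Python, the sketch below runs this pattern over an English rendering of the sample text from the question. The names SAMPLE, FIXED and wanted are invented for the illustration, and the character class is written as `[^\[\]]*`, which is the escaped spelling of `[^][]*`. Because the middle group can never cross a `[`, a tag rejected by the lookahead cannot be absorbed into the captured string, so the match for that date simply fails.

```python
import re

# English rendering of the sample text (assumption: tabs/newlines are real characters).
SAMPLE = (
    "\n2021-01-12\tThis string is not needed [it]\tBad things\tBad things"
    "\n2021-01-12\tThis string is also not needed [en]\tBad things\tBad things"
    "\n2021-01-11\tString needed 1! [it]\tString needed 1! repeated here\tNot interesting here"
    "\n2021-01-11\tString needed 2 [fr]\tString needed 2 repeated here\tUnnecessary string"
    "\n2021-01-11\tString needed 3... [ru]\tString needed 3... repeated here"
    "\tAnother part we are not interested in"
)

# Same structure as the pattern above; only the bracket escaping differs.
FIXED = re.compile(r"(\d{4}-\d{2}-\d{2}\t)([^\[\]]*)(\[\w{2}\]\t)(?!Bad things)")

# Group 2 is the string between the date and the language tag; strip the trailing space.
wanted = [m.group(2).strip() for m in FIXED.finditer(SAMPLE)]

print(wanted)  # ['String needed 1!', 'String needed 2', 'String needed 3...']
```

One limitation follows directly from the character class: a wanted string that itself contains `[` or `]` would not be captured, which is probably among the cases the "not sure if this covers all cases" caveat has in mind.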

python

regex

regex-lookarounds