Regex lookbehind - excluding words from searches

I need to search my corpus for words such as game and shame, but I want the search to exclude three strings: a game/a shame, a/an/A/An WORD game, and a/an/A/An WORD shame, where WORD is a modifier, e.g. a great game or a huge shame.

If anyone could help me with this, that would be great, thanks!

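In my corpus, the optional WORDs that most often occur between the indefinite article a/an and game, or between a/an and shame, are great and real. So even excluding just these two would help me a lot.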
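For the narrower request (excluding only great and real), one workaround that stays within fixed-width lookbehind is to stack several negative lookbehinds, each of constant length. A minimal sketch, using Python's re module as a stand-in for AntConc's engine (an assumption; the pattern uses only constructs Perl-style engines also support). The helper name hits and the sample phrases are illustrative:

```python
import re

# Several fixed-width negative lookbehinds stacked in a row instead of one
# variable-width lookbehind. Each has a constant length, so the engine accepts
# it; "great" and "real" are the two modifiers named in the question.
PATTERN = re.compile(
    r"(?<!\b[Aa] )"        # "a shame", "A shame"
    r"(?<!\b[Aa]n )"       # "an shame", "An shame"
    r"(?<!\b[Aa] great )"  # "a great shame"
    r"(?<!\b[Aa] real )"   # "a real shame"
    r"\bshame\b"
)

def hits(text):
    """Return the start offsets of 'shame' occurrences that survive the filter."""
    return [m.start() for m in PATTERN.finditer(text)]

print(hits("what a great shame"))   # []
print(hits("what a real shame"))    # []
print(hits("they have no shame"))   # [13]
print(hits("a crying shame"))       # [9]
```

The last line shows the limitation: any modifier not listed (here crying) still gets through, so this only covers the most frequent cases.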

The lookbehind below excludes a/A perfectly:

    (?<!a\s|A\s)\bshame\b
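A quick check of why this lookbehind alone is not enough, run in Python's re (used only as a convenient test bed; the phrases are illustrative): both alternatives are exactly two characters wide, so the lookbehind is fixed-width, but it only inspects the two characters immediately before shame.

```python
import re

# The question's lookbehind: both alternatives are two characters wide,
# so even Python's fixed-width-only lookbehind accepts it.
ARTICLE_ONLY = re.compile(r"(?<!a\s|A\s)\bshame\b")

samples = {
    "no shame": "have no shame",
    "a shame": "what a shame",
    "a great shame": "which is a great shame",
}
for label, text in samples.items():
    print(label, bool(ARTICLE_ONLY.search(text)))
# no shame True
# a shame False
# a great shame True   <- the modifier hides the article from the lookbehind
```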

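To exclude the modifying WORD, I tried using ?\w inside the lookbehind, but it doesn't work. The regex below (without the ?) runs, and it still excludes examples such as a shame, but it also still returns unwanted examples such as a great shame and a crying shame; see concordance lines (3) and (4) in the sample text below: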

    (?<!a\s|A\s|a\b\w\b|A\b\w\b)\bshame\b
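The two extra alternatives in this attempt can never match, which is why they exclude nothing. A short check in Python's re with illustrative strings (Python stands in for AntConc here; \b and \w behave the same way in Perl-style engines):

```python
import re

# \b right after "a" only holds when the next character is NOT a word
# character, but \w then demands a word character, so a\b\w can never match.
print(re.search(r"a\b\w\b", "a great shame"))  # None

# Even with a space instead of \b, \w matches exactly ONE character,
# so "a \w shame" only covers one-letter modifiers.
print(bool(re.search(r"a \w shame", "a b shame")))      # True
print(bool(re.search(r"a \w shame", "a great shame")))  # False
```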

The tool I'm using to run the regex is AntConc, which supports Perl regular expressions.

Here is the sample concordance output from the search string below, including two unwanted hits (3 and 4):

    (?<!a\s|A\s)\bshame\b

1 (relevant match)

, people ogling from the sidelines.&nbsp; If you want a closer look, you have to ring for entry and wait to be admitted.&nbsp; I guess me and Saul just have no shame (or just know the benefits of our bank accounts being in hard currencies), because we wandered into plenty.&nbsp; Lots and lots of little boutiques and edgily designed fashion stores with music blaring.& abbutterflie.txt 47 1

2 (relevant match)

last twenty years and I've experienced all sorts of biggotry but I seriously thought that anti black nazism in football wass a thing of the past. You should all hang your heads in shame, bunch of [badword]s. adamdphillips.txt 57 1

3 (irrelevant match)

me monetarily as I wasn't that close to her, but she was really good friends with the other girl and it's messed that up for them a bit, which is a great shame. Anyway, Holly and I have since found somewhere to move in just the two of us. It's going to cost an absolute fortune and I'm going to be eating basics beans on aderyn.txt 60 1

4 (irrelevant match)

are loads of amazingly good bands out there, gigging up and down the country who will never get signed because no-one can figure out how to market them, and this is a crying shame. There are artists out there like <a href="http://www.angelsintheabattoir.com/" rel="nofollow">Thea Gilmore</a> and <a href="http://blog.amandapalmer.net/" rel="nofollow"> Amanda Palmer& aderyn.txt 60 2

5 (relevant match)

/><br />"There is no better time to show these terrorists that we have no fear of them. Instead we are forced, through the cowardly acts of our superiors, to hide in shame."<br /><br />But Herb Wiseman, high school consultant for Lee County, Florida, pointed to the July 7 London bombings.<br /><br />"What happens if kids get on aggy91.txt 64 1

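Because the modifier WORD can be of any length, excluding it would need a variable-length negative lookbehind, and those are not allowed, so the approach from the answer to your previous question does not carry over to this one.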
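One way to confirm the fixed-width restriction is to ask Python's re, which has the same limitation, to compile both kinds of lookbehind. Python is only a stand-in here, and compiles is a hypothetical helper, not part of any library:

```python
import re

def compiles(pattern):
    """Return True if Python's re accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<!a\s|A\s)\bshame\b"))      # True: width is always 2
print(compiles(r"(?<![Aa]n? \w+ )\bshame\b"))  # False: \w+ has no fixed width
```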

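I went with the (*SKIP)(*FAIL) technique instead. The first alternative matches the disqualified phrases (a/an, an optional word, then shame); (*FAIL) forces that match to fail, and (*SKIP) makes the next attempt start after the phrase instead of inside it, so the phrase is discarded. Only occurrences of shame matched by the second alternative are kept: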

    /[Aa]n?( \w+)? shame(*SKIP)(*FAIL)|shame/

3844 steps (Demo)
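Python's built-in re module has no (*SKIP)(*FAIL), but the same idea can be sketched with a common substitute: keep the alternation, capture the wanted branch, and ignore any match in which that group did not take part. This is a swapped-in technique, not the answer's exact mechanism; wanted_offsets is a hypothetical helper, and the test phrases come from concordance lines 3, 4 and 2 above.

```python
import re

# First branch consumes the unwanted phrase, so scanning resumes after it
# (the role of *SKIP); matches without group 1 are ignored (the role of *FAIL).
PATTERN = re.compile(r"[Aa]n?(?: \w+)? shame|(shame)")

def wanted_offsets(text):
    """Start offsets of 'shame' not preceded by a/an plus an optional word."""
    return [m.start(1) for m in PATTERN.finditer(text) if m.group(1)]

print(wanted_offsets("which is a great shame"))    # []
print(wanted_offsets("a crying shame"))            # []
print(wanted_offsets("hang your heads in shame"))  # [19]
```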

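Or, if you want to add word-boundary metacharacters (so that an a or an at the end of another word is not mistaken for an article, and shame inside a longer word is not matched):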

    /\b[Aa]n?( \w+)? shame\b(*SKIP)(*FAIL)|\bshame\b/

4762 steps (Demo)
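The word boundaries matter when the letters a or an end another word, or when shame starts a longer word. A sketch comparing the two patterns, with (*SKIP)(*FAIL) again emulated in Python's re by capturing the wanted branch; kept is a hypothetical helper, and "Obama shame" and "shameless" are invented examples:

```python
import re

# Both patterns from the answer, with the wanted branch captured as group 1.
NO_BOUNDARY = re.compile(r"[Aa]n?(?: \w+)? shame|(shame)")
WITH_BOUNDARY = re.compile(r"\b[Aa]n?(?: \w+)? shame\b|\b(shame)\b")

def kept(pattern, text):
    """Start offsets of matches where the wanted branch (group 1) matched."""
    return [m.start(1) for m in pattern.finditer(text) if m.group(1)]

# Without \b, the "a" ending "Obama" is taken for an article and the hit is lost.
print(kept(NO_BOUNDARY, "Obama shame"))    # []
print(kept(WITH_BOUNDARY, "Obama shame"))  # [6]

# Without \b, "shame" inside "shameless" is reported as a hit.
print(kept(NO_BOUNDARY, "shameless"))      # [0]
print(kept(WITH_BOUNDARY, "shameless"))    # []
```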