grep regex: search for any of a set of words

Question

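I want to search a large number of files for a set of words, in any order, with or without spaces or punctuation between them. So, for example, if I search for hello, there, friend, it should match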

hello there my friend
friend, hello there
theretherefriendhello

but not

hello friend
there there friend

I can't think of any way to do this. Is it even possible to do using grep, or some variation of grep?

Answer 1

You can use sed:

sed -n '/word1/{/word2/{/word3/p;};}' *.txt
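A minimal sketch of the sed approach run against the sample lines from the question, with word1..word3 replaced by hello, there and friend. The temporary file name lines.txt is a hypothetical stand-in for your real files.

```shell
#!/bin/sh
# Hypothetical sample file holding the five lines from the question.
tmpdir=$(mktemp -d)
printf '%s\n' 'hello there my friend' 'friend, hello there' \
    'theretherefriendhello' 'hello friend' 'there there friend' \
    > "$tmpdir/lines.txt"

# -n turns off automatic printing. Each {...} block runs only when the
# address in front of it matched, so "p" is reached only if the line
# contains hello, there and friend (in any order, anywhere on the line).
sed_out=$(sed -n '/hello/{/there/{/friend/p;};}' "$tmpdir/lines.txt")
printf '%s\n' "$sed_out"

rm -rf "$tmpdir"
```

It should print the first three sample lines and skip "hello friend" and "there there friend". The nesting is what gives you "and": each inner address is only tested on lines that already passed the outer one.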

Answer 2

is it even possible to do using grep, or some variation of grep?

You can use grep -P, i.e. Perl-compatible regex mode, with the following regular expression:

^(?=.*hello)(?=.*there)(?=.*friend).*$
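A minimal sketch of that regex used as a grep -P command on the question's sample lines. It assumes GNU grep built with PCRE support; -P is not part of POSIX grep, and some builds (e.g. BusyBox) lack it. The file name lines.txt is hypothetical.

```shell
#!/bin/sh
# Assumes GNU grep with PCRE support (grep -P).
tmpdir=$(mktemp -d)
printf '%s\n' 'hello there my friend' 'friend, hello there' \
    'theretherefriendhello' 'hello friend' 'there there friend' \
    > "$tmpdir/lines.txt"

# Each (?=.*word) is a zero-width lookahead evaluated at the start of the
# line (^), so every word must occur somewhere on the line, in any order.
pcre_out=$(grep -P '^(?=.*hello)(?=.*there)(?=.*friend).*$' "$tmpdir/lines.txt")
printf '%s\n' "$pcre_out"

rm -rf "$tmpdir"
```

Because the lookaheads consume nothing, they can be stacked for as many words as you need without listing every ordering.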


Answer 3

For this, I would use awk like so:

awk '/hello/ && /there/ && /friend/' file

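This checks whether the current line contains all of the strings hello, there and friend. If it does, the line is printed.

Why? Because the condition is then true, and when a pattern is true and no action is given, awk's default action is to print the current line.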
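A minimal sketch running the awk one-liner on the question's sample lines, followed by a hypothetical variant that takes the word list from a -v variable instead of hard-coding it. The variant assumes the words contain no whitespace, and it uses index() for a literal substring search, so regex metacharacters in the words are not interpreted.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' 'hello there my friend' 'friend, hello there' \
    'theretherefriendhello' 'hello friend' 'there there friend' \
    > "$tmpdir/lines.txt"

# The answer's program: pattern with no action, so matching lines print.
awk_out=$(awk '/hello/ && /there/ && /friend/' "$tmpdir/lines.txt")
printf '%s\n' "$awk_out"

# Hypothetical variant: words passed in one variable and split on blanks.
awk_var_out=$(awk -v words='hello there friend' '
    BEGIN { n = split(words, w, " ") }
    {
        for (i = 1; i <= n; i++)
            if (index($0, w[i]) == 0) next   # a word is missing: skip line
        print                                # every word was found
    }' "$tmpdir/lines.txt")
printf '%s\n' "$awk_var_out"

rm -rf "$tmpdir"
```

Both commands should print the same three lines; the variant only saves you from editing the program when the word list changes.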

Answer 4

In Basic and Extended REs, without using vendor- or version-specific extensions (such as Perl REs), you would need to handle this with something like:

egrep  -lr 'hello.*there.*friend|hello.*friend.*there|there.*hello.*friend|there.*friend.*hello|friend.*hello.*there|friend.*there.*hello' /path/

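Note that the -l option makes grep print only the names of matching files, and -r tells grep to search recursively. This solution works with almost every variant of grep you are likely to encounter (-r is not in POSIX, but GNU and BSD grep both support it).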
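A minimal sketch of the all-orderings pattern on a small hypothetical directory with one matching and one non-matching file. It spells egrep as grep -E, the POSIX form, since recent GNU grep releases print a warning when invoked as egrep.

```shell
#!/bin/sh
# Hypothetical directory: a.txt has all three words, b.txt lacks "there".
tmpdir=$(mktemp -d)
printf '%s\n' 'friend, hello there' > "$tmpdir/a.txt"
printf '%s\n' 'hello friend' > "$tmpdir/b.txt"

# The six alternatives are every ordering (3! = 6) of the three words.
files=$(grep -E -lr 'hello.*there.*friend|hello.*friend.*there|there.*hello.*friend|there.*friend.*hello|friend.*hello.*there|friend.*there.*hello' "$tmpdir")
printf '%s\n' "$files"

rm -rf "$tmpdir"
```

Only a.txt should be listed. Keep in mind that the alternation grows factorially: four words already need 24 orderings.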

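As REs go this is clearly not elegant, but it is convenient because it uses grep's built-in recursive search. If the RE bothers you, I would use awk or sed instead, wrapped in find if possible: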

find /path/ -type f -exec awk '/hello/&&/there/&&/friend/ {r=1} END {exit 1-r}' {} \; -print

Again, the output is a list of files, not a list of lines. Adapt it to your own requirements.
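A minimal sketch of the find + awk combination on a small hypothetical directory. The awk program exits with status 0 only if some line contained all three words (r is unset, i.e. 0, otherwise, so 1-r is 1); find treats a successful -exec as a passing test and then runs -print.

```shell
#!/bin/sh
# Hypothetical directory: a.txt has all three words, b.txt lacks "there".
tmpdir=$(mktemp -d)
printf '%s\n' 'friend, hello there' > "$tmpdir/a.txt"
printf '%s\n' 'hello friend' > "$tmpdir/b.txt"

# {} is replaced by each file name; -type f keeps directories away from awk.
found=$(find "$tmpdir" -type f \
    -exec awk '/hello/ && /there/ && /friend/ {r=1} END {exit 1-r}' {} \; \
    -print)
printf '%s\n' "$found"

rm -rf "$tmpdir"
```

Only a.txt should be printed. This runs one awk process per file, which is slower than grep -r on large trees but keeps the condition readable.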
