mawk does not understand the word boundary markers "\<" and "\>", nor some other escape sequences

Question

I just noticed that a fresh install of Ubuntu does not have gawk installed by default.

As a result, none of my awk expressions that contain the word boundary markers "\<" and "\>" work. Example:

$ readlink -e $(which awk)
/usr/bin/mawk
$ echo "word1 Bluetooth word3" | awk '/\<Bluetooth\>/'
$

EDIT0: On another system where gawk is installed, it works:

$ readlink -e $(which awk)
/usr/bin/gawk
$ echo "word1 Bluetooth word3" | awk '/\<Bluetooth\>/'
word1 Bluetooth word3
$

EDIT1: In addition, mawk shows strange behavior:

$ echo word1 word2 word3   | mawk '/^\w+/{print $1}'
word1
$ echo sebastien1 abc toto | mawk '/^\w+/{print $1}'

Here are some of the escape sequences that gawk understands:

$ man gawk | grep '\\[yswSW<>].*Matches'
   \y         Matches the empty string at either the beginning  or  the
   \<         Matches the empty string at the beginning of a word.
   \>         Matches the empty string at the end of a word.
   \s         Matches any whitespace character.
   \S         Matches any nonwhitespace character.
   \w         Matches any word-constituent character (letter, digit, or
   \W         Matches any character that is not word-constituent.

EDIT2: Ed Morton is right, mawk understands neither \w nor the other escape sequences that gawk understands:

$ man mawk | grep '\\[yswSW<>]'
$

Is there a way to match words that works in both mawk and gawk?

Answer 1

It depends on what you want to do with the match, but this may be all you need:

$ echo "word1 Bluetooth word3" | awk '/(^|[^[:alnum:]_])Bluetooth([^[:alnum:]_]|$)/'
word1 Bluetooth word3

There is no common escape sequence meaning "word boundary" that works in all awks, or even in all POSIX awks.
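As a minimal sketch, the same bracket-expression pattern can be wrapped in a small shell function so the word is passed as an argument. The name has_word is hypothetical, the word is assumed to contain no regular expression metacharacters or backslashes, and the awk in use is assumed to support the POSIX [[:alnum:]] class:

```shell
# has_word WORD: print the input lines that contain WORD as a whole word.
# has_word is a hypothetical helper name, not part of any awk.
# Assumes WORD holds no regex metacharacters or backslashes.
has_word() {
    # A dynamic regex built by string concatenation: WORD must be preceded
    # by start-of-line or a non-word character, and followed by a
    # non-word character or end-of-line.
    awk -v w="$1" '$0 ~ ("(^|[^[:alnum:]_])" w "([^[:alnum:]_]|$)")'
}

printf '%s\n' 'word1 Bluetooth word3' 'word1 Bluetooth2 word3' 'Bluetooth' |
    has_word Bluetooth
# prints:
# word1 Bluetooth word3
# Bluetooth
```

The second input line is rejected because the "2" right after "Bluetooth" is itself a word character.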

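If that is not all you need, edit your question to better explain what you want to do with the matched string, and provide sample input/output that demonstrates that usage.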

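Regarding your edit: mawk is not behaving strangely. You asked it to find lines that start with one or more w characters (w is a literal character, and in mawk \w is still that same literal character) and to print the first field of each such line. The first line you tested starts with w; the second does not.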
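To make that concrete, here is a small sketch that spells the pattern the way mawk is described as reading it, with a plain w in place of \w. The function name literal_w_first_field is hypothetical; the pattern itself behaves the same in any awk:

```shell
# literal_w_first_field: hypothetical name for illustration only.
# /^w+/ matches lines that begin with one or more literal "w" characters,
# which is how the answer says mawk interprets /^\w+/.
literal_w_first_field() {
    awk '/^w+/{print $1}'
}

printf '%s\n' 'word1 word2 word3' 'sebastien1 abc toto' | literal_w_first_field
# prints:
# word1
```

Only the first line is printed, which matches the two results shown in the question.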

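If you are trying to match word-constituent characters (which is what \w+ does in gawk), use [[:alnum:]_]+ in a POSIX awk, or [a-zA-Z0-9_]+ in any awk, assuming those character ranges are correct for your locale. If you want to print the word that matches that regexp, it would be: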

$ echo 'sebastien1 abc toto' |
    awk 'match($0,/^[[:alnum:]_]+/){print substr($0,RSTART,RLENGTH)}'
sebastien1
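For an awk that is not POSIX-compliant, a variant with explicit character ranges might look like the sketch below. The function name first_word is hypothetical, and the ranges A-Z, a-z and 0-9 are assumed to be correct for the current locale:

```shell
# first_word: hypothetical helper; prints the leading word of each line.
# Uses explicit ranges instead of [[:alnum:]], assuming they suit the locale.
first_word() {
    # match() sets RSTART and RLENGTH to the position and length of the match.
    awk 'match($0, /^[A-Za-z0-9_]+/) { print substr($0, RSTART, RLENGTH) }'
}

echo 'sebastien1 abc toto' | first_word
# prints:
# sebastien1
```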


awk