Word Count using AWK

Question

I have the following file:

this is a sample file
this file will be used for testing

I want to count the words using AWK.

The expected output is:

this 2
is 1
a 1
sample 1
file 2
will 1
be 1
used 1
for 1
testing 1

I wrote the following AWK command, but I get some errors:

cat anyfile.txt|awk -F" "'{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}'

Answer 1

It works fine for me:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile
used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1

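PS: You don't need to set -F" ", because by default awk splits fields on any run of whitespace.
PS2: Don't pipe cat into programs that can read files themselves, such as awk.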
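The errors from the question's command most likely come from the missing space between -F" " and the program text: the shell glues adjacent quoted strings into one word, so awk receives a single argument of the form -F {for(...)...} (an odd field separator) and no program at all. The sketch below writes the sample file to a temporary directory (the path is illustrative), prints the argument the shell actually builds for a shortened program, and then runs the question's command with the space restored and without cat.

```shell
#!/bin/sh
# Write the sample file from the question to a throwaway directory (illustrative path).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'this is a sample file\nthis file will be used for testing\n' > "$tmp/testfile"

# Without a space, -F" " and the quoted program become ONE argument:
# this prints "[-F {print $1}]", so awk would be left with no program text.
printf '[%s]\n' -F" "'{print $1}'

# With the space restored (and no cat), the command from the question works.
# sort is only there to make the output order deterministic.
awk -F" " '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' "$tmp/testfile" |
    LC_ALL=C sort
```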

You can append sort to the command to order the output by count:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile | sort -k 2 -n
a 1
be 1
for 1
is 1
sample 1
testing 1
used 1
will 1
file 2
this 2
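Because for (k in a) visits the keys in an unspecified order, piping through sort is the usual way to get a stable listing. A small variation, sketched here on the same sample file, lists the most frequent words first and breaks ties alphabetically; LC_ALL=C is an added assumption that only makes the tie order independent of the locale.

```shell
#!/bin/sh
# Recreate the sample file in a throwaway directory (illustrative path).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'this is a sample file\nthis file will be used for testing\n' > "$tmp/testfile"

# -k2,2nr: sort on field 2 (the count), numerically, in descending order.
# -k1,1:   break ties by the word itself.
awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' "$tmp/testfile" |
    LC_ALL=C sort -k2,2nr -k1,1
```

Appending | head -n 3 would keep only the three most frequent words.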

Answer 2

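Instead of looping over every field of every line and saving the words in an array ({for(i=1;i<=NF;i++) a[$i]++}), you can use gawk's support for a multi-character (regular expression) RS (record separator), which makes every word a separate record, and save each record in the array as below (somewhat faster):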

gawk '{a[$0]++} END{for (k in a) print k,a[k]}' RS='[[:space:]]+' file

Output:

used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1

In the gawk command above, I set the record separator to the whitespace character class [[:space:]]+, which matches one or more whitespace characters (spaces, tabs, newlines).
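POSIX leaves a multi-character RS unspecified, so the command above relies on a gawk extension. Also, if the input begins with whitespace, a regex RS can produce an empty first record, which would then be counted as a word. A portable sketch of the same one-word-per-record idea swaps in tr to do the splitting and adds an NF guard to skip empty records; the sample file below is an assumption that deliberately starts with blanks and contains a tab.

```shell
#!/bin/sh
# Sample input with leading blanks and a tab, to exercise the edge cases.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '  this is a sample file\nthis file\twill be used for testing\n' > "$tmp/testfile"

# tr turns every run of whitespace into a single newline, so each word is one
# line (record) for awk; the NF pattern skips the empty line that the leading
# blanks produce.
tr -s '[:space:]' '\n' < "$tmp/testfile" |
    awk 'NF {a[$0]++} END {for (k in a) print k, a[k]}' |
    LC_ALL=C sort
```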

Answer 3

Here is Perl code that produces output similar to Jotne's awk solution, but sorted:

perl -ne 'for (split /\s+/, $_){ $w{$_}++ }; END{ for $key (sort keys %w) { print "$key $w{$key}\n"}}' testfile

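$_ is the current line, which is split on whitespace (/\s+/)
Each word is then placed in $_ in turn
The %w hash stores how many times each word occurs
After the whole file has been processed, the END{} block runs
The keys of the %w hash are sorted alphabetically
Each word $key is printed with its count $w{$key}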
