How do you find the most frequent pair of consecutive words in a text using awk?
If your text looks like
Reservoir 1992 reviewed by Reservoir Har even RESERVOIR DOGS
the first thing to do is put each word on its own line:
tr -s '[[:punct:][:space:]]' '\n'
Reservoir
1992
reviewed
by
Reservoir
Har
even
RESERVOIR
DOGS
Then use awk to join each pair of consecutive lines:
awk 'NR == 1 { prev = $0; next }
{ print prev, $0; prev = $0 }'
Output:
Reservoir 1992
1992 reviewed
reviewed by
by Reservoir
Reservoir Har
Har even
even RESERVOIR
RESERVOIR DOGS
Can you use printf instead of print to get output like this? (See the answer below.)
Reservoir 1992
1992 reviewed
reviewed by
by Reservoir
Reservoir Har
Har even
even RESERVOIR
RESERVOIR DOGS
Then pipe through sort,
then uniq -c,
then sort -nr.
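The steps above can be chained into one pipeline. The following is a minimal sketch, not a definitive implementation: the function name top_bigrams is hypothetical, and the guard that skips empty lines is an added assumption to cope with text that starts with punctuation or whitespace.

```shell
# top_bigrams (hypothetical name): read text on stdin and print every pair of
# consecutive words with its count, most frequent pair first.
top_bigrams() {
    tr -s '[[:punct:][:space:]]' '\n' |        # one word per line
    awk 'NF == 0 { next }                      # assumption: drop an empty first line
         !seen   { prev = $0; seen = 1; next } # remember the first word
                 { print prev, $0; prev = $0 } # emit "previous current"
        ' |
    sort | uniq -c | sort -nr                  # count identical pairs, largest first
}

# Usage: show only the most frequent pair.
printf '%s\n' 'a b a b c' | top_bigrams | head -n 1
```

On this small input the first line reports a count of 2 for the pair "a b"; the padding in front of the count depends on the uniq implementation. Pairs that differ only in case, such as Reservoir and RESERVOIR, are counted separately unless the text is case-folded first.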
You are close:
awk 'FNR==1{prev=$0; next}
{printf "%s\t%s\n", prev, $0; prev=$0}' file
produces the word-pair output you describe.
This:
awk 'FNR==1{prev=$0; next}
{printf "%s\t%s\n", prev, $0; prev=$0}' file | column -t
Reservoir  1992
1992       reviewed
reviewed   by
by         Reservoir
Reservoir  Har
Har        even
even       RESERVOIR
RESERVOIR  DOGS
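produces the aligned output format. Note that the amount of spacing needed to make the columns line up varies from line to line. To produce it in awk alone, you generally need to read the file twice so that the column width is known before printing. The Unix utility column does that for you.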
If you want awk to do all the work, you can do something along these lines:
awk 'FNR==NR{if (length($0)>max) max=length($0); next}   # first pass: find the longest line
FNR==1{prev=$0; next}
{printf "%-*s\t%s\n", max, prev, $0; prev=$0}' file file
Reservoir 1992
1992 reviewed
reviewed by
by Reservoir
Reservoir Har
Har even
even RESERVOIR
RESERVOIR DOGS
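If the goal is only the single most frequent pair, the counting can also be done inside awk with an associative array, without sort or uniq. This is a swapped-in alternative rather than the method shown above, and it is only a sketch: the function name most_common_pair is hypothetical, and when several pairs tie for first place, which one is printed is unspecified.

```shell
# most_common_pair (hypothetical name): read text on stdin and print
# "<count> <word1> <word2>" for the most frequent pair of consecutive words.
most_common_pair() {
    awk '{
        gsub(/[[:punct:]]+/, " ")          # treat punctuation as a separator
        for (i = 1; i <= NF; i++) {
            if (have_prev) count[prev " " $i]++
            prev = $i; have_prev = 1       # pairs may span line boundaries
        }
    }
    END {
        for (pair in count)
            if (count[pair] > best) { best = count[pair]; top = pair }
        if (best) print best, top
    }'
}

# Usage:
printf '%s\n' 'a b a b c' | most_common_pair
```

Because prev is carried from one input line to the next, the last word of a line is paired with the first word of the following line, which matches what the tr-based pipeline does.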