How to get unique values in a row in bash (with awk?)
My dataset looks like this:
A B A
B C A B
D E A D
A D B
I would like the sorted unique values for each row:
A B
A B C
A D E
A B D
and then a count of how many rows each value appears in (tab-separated):
A 4
B 3
C 1
D 2
E 1
Since I'm bored right now:
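gawk 'BEGIN {
    PROCINFO["sorted_in"] = "@ind_str_asc"   # for-in loops visit indices in ascending string order
    OFS = "\t"
}
{
    delete row; s = ""                       # reset the per-row set and the output string
    for (i = 1; i <= NF; i++) row[$i]        # referencing an index creates it, so duplicates collapse
    for (e in row) {
        s = s ? s OFS e : e                  # join the sorted unique values with OFS
        total[e]++                           # count each value once per row
    }
    print s
}
END { for (e in total) print e, total[e] }' file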
Prints:
A B
A B C
A D E
A B D
A 4
B 3
C 1
D 2
E 1
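The script above relies on PROCINFO["sorted_in"], which is a gawk extension. If only a POSIX awk is available, something along these lines might work instead. This is a rough sketch, not part of the original answer: the function names row_uniques and value_counts are made up for illustration, the per-row sort is a hand-written insertion sort, and the counting step leans on sort and uniq -c.

```shell
#!/bin/sh
# Sketch of a gawk-free variant. row_uniques and value_counts are
# hypothetical helper names, not taken from the original answer.

# Print the sorted unique fields of each input line, tab-separated.
row_uniques() {
    awk '{
        n = 0
        split("", seen)                      # portable way to empty an array
        for (i = 1; i <= NF; i++)
            if (!($i in seen)) {
                seen[$i]
                vals[++n] = $i ""            # append "" to force string comparison
            }
        # insertion sort of vals[1..n] in ascending string order
        for (i = 2; i <= n; i++) {
            v = vals[i]
            for (j = i - 1; j >= 1 && vals[j] > v; j--)
                vals[j + 1] = vals[j]
            vals[j + 1] = v
        }
        s = ""
        for (i = 1; i <= n; i++)
            s = s (i > 1 ? "\t" : "") vals[i]
        print s
    }' "$1"
}

# Count, for each value, the number of rows it appears in.
value_counts() {
    row_uniques "$1" |
        tr '\t' '\n' |                       # one value per line
        awk 'NF' |                           # drop blanks produced by empty rows
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'           # "count value" -> "value<TAB>count"
}
```

Running row_uniques file and then value_counts file should give the per-row lists followed by the totals. Because every value is listed at most once per row after the first step, uniq -c ends up counting rows rather than raw occurrences.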