awk sum values of a column, if other columns remain constant

Question

I have a file (example.csv) with about 5 million lines. Each line has 4 columns (user, date, type, value), like this:

user1,2022-01-01,type1,0.1
user1,2022-01-01,type1,0.9
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0
user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user3,2022-01-01,type1,0.3
user3,2022-01-01,type1,0.2
user3,2022-01-01,type1,0.5

I want to sum the values (column 4 in this example) that share the same user, date, and type, so the expected output should look like this:

user1,2022-01-01,type1,1.0
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0
user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user3,2022-01-01,type1,1.0

I tried something like this to see whether it would work:

awk -F"," '!seen[$1]++;&&!seen[$2]++;&&!seen[$3]++;sum+=$4{print sum}' example.csv

But I am still far from a correct solution. Any suggestions?

Answer 1

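$ awk '
BEGIN {
    FS=OFS=","                     # read and write comma-separated fields
}
{
    a[$1 OFS $2 OFS $3]+=$4        # key: user,date,type; accumulate column 4
}
END {
    for(i in a)                    # iteration order is unspecified
        print i,sprintf("%.1f",a[i])
}' file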

Output:

user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user1,2022-01-01,type1,1.0
user3,2022-01-01,type1,1.0
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0

The output order depends on the awk implementation. If you need a specific order, pipe the output through sort or use GNU awk's PROCINFO["sorted_in"].
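As a minimal sketch of the sort option, the same aggregation can be piped through sort to get a stable order. The input is the sample from the question; the output file name summed.csv is an assumption made only for this illustration.

```shell
# Recreate the sample input from the question.
cat > example.csv <<'EOF'
user1,2022-01-01,type1,0.1
user1,2022-01-01,type1,0.9
user1,2022-01-02,type1,1.0
user1,2022-01-02,type2,1.0
user2,2022-01-01,type1,1.0
user2,2022-01-01,type2,1.0
user3,2022-01-01,type1,0.3
user3,2022-01-01,type1,0.2
user3,2022-01-01,type1,0.5
EOF

# Sum column 4 per (user, date, type) key, then sort by the three key
# columns so the order no longer depends on the awk implementation.
# summed.csv is a hypothetical output name used for this example.
awk '
BEGIN { FS = OFS = "," }
{ a[$1 OFS $2 OFS $3] += $4 }
END { for (i in a) print i, sprintf("%.1f", a[i]) }
' example.csv | LC_ALL=C sort -t, -k1,1 -k2,2 -k3,3 > summed.csv

cat summed.csv
```

With GNU awk, setting PROCINFO["sorted_in"] = "@ind_str_asc" in the BEGIN block should give the same ordering without the external sort, but that setting is gawk-specific and other awk implementations ignore it.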
