Find sum of column, divide column value, and print to new column using awk

So my data looks like this:

file Gibbs kcal rel pop pop2
RR2.out -1752.142111    -1099486.696073  0.000000 -0.0000 1.0000
RR1.out -1752.141887    -1099486.555511  0.140562 -0.2374 0.7891
RR4.out -1752.140564    -1099485.725315  0.970758 -1.6398 0.1947
RR3.out -1752.140319    -1099485.571575  1.124498 -1.8995 0.1502
RR5.out -1752.138532    -1099484.450215  2.245858 -3.7937 0.0227
RR6.out -1752.138493    -1099484.425742  2.270331 -3.8351 0.0218

I want to find the sum of column 6, then divide each value in column 6 by that sum, and print the results in a new column titled "weighted".

I am using:

 echo "weighted" >> allRE7
 awk 'NR==FNR{sum+=$6; next}{printf("%0.4f\n", $6/sum)}' input input >> out
 paste input out >> final

which gives me:

 file Gibbs kcal rel pop pop2   weighted
 RR2.out    -1752.142111    -1099486.696073  0.000000 -0.0000 1.0000    0.0000
 RR1.out    -1752.141887    -1099486.555511  0.140562 -0.2374 0.7891    0.4590
 RR4.out    -1752.140564    -1099485.725315  0.970758 -1.6398 0.1947    0.3622
 RR3.out    -1752.140319    -1099485.571575  1.124498 -1.8995 0.1502    0.0894
 RR5.out    -1752.138532    -1099484.450215  2.245858 -3.7937 0.0227    0.0689
 RR6.out    -1752.138493    -1099484.425742  2.270331 -3.8351 0.0218    0.0104
         0.0100

I don't know where the 0.0100 value comes from.

The problem is that the awk code also prints a weighted result for the header line. To eliminate it, replace:

awk 'NR==FNR{sum+=$6; next}{printf("%0.4f\n", $6/sum)}' input input >> out

with:

awk 'NR==FNR{sum+=$6; next} FNR>1{printf("%0.4f\n", $6/sum)}' input input >> out

The FNR>1 condition ensures that $6/sum is printed only for the data lines.
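To see where the stray 0.0100 comes from, the sketch below rebuilds the sample data and compares line counts before and after the fix. It assumes the `echo "weighted"` in the question was meant to go into the same file as the awk output (the question writes it to `allRE7`, but the result shown implies it ended up in `out`), and it uses `>` instead of `>>` so it can be rerun. The file names `out_before` and `out_after` are made up for this illustration.

```shell
#!/bin/sh
# Recreate the sample data from the question as "input".
cat > input <<'EOF'
file Gibbs kcal rel pop pop2
RR2.out -1752.142111    -1099486.696073  0.000000 -0.0000 1.0000
RR1.out -1752.141887    -1099486.555511  0.140562 -0.2374 0.7891
RR4.out -1752.140564    -1099485.725315  0.970758 -1.6398 0.1947
RR3.out -1752.140319    -1099485.571575  1.124498 -1.8995 0.1502
RR5.out -1752.138532    -1099484.450215  2.245858 -3.7937 0.0227
RR6.out -1752.138493    -1099484.425742  2.270331 -3.8351 0.0218
EOF

# Original script: the header is processed too. "pop2" converts to 0,
# so an extra 0.0000 is printed right after "weighted".
{ echo "weighted"
  awk 'NR==FNR{sum+=$6; next}{printf("%0.4f\n", $6/sum)}' input input
} > out_before

# Fixed script: FNR>1 skips the header on the second pass.
{ echo "weighted"
  awk 'NR==FNR{sum+=$6; next} FNR>1{printf("%0.4f\n", $6/sum)}' input input
} > out_after

# input has 7 lines; out_before has 8 (one too many), out_after has 7.
wc -l < input
wc -l < out_before
wc -l < out_after
```

Because `out_before` is one line longer than `input`, `paste` shifts every value down one row and leaves the last one (0.0100) on a line of its own. With `out_after`, `paste input out_after` lines "weighted" up with the header and each value with its own row.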

Improvement

The echo and paste commands are not necessary. Try:

$ awk 'NR==FNR{sum+=$6; next} FNR==1{print $0,"weighted"; next} {printf("%s %0.4f\n",$0,$6/sum)}' input input
file Gibbs kcal rel pop pop2 weighted
RR2.out -1752.142111    -1099486.696073  0.000000 -0.0000 1.0000 0.4590
RR1.out -1752.141887    -1099486.555511  0.140562 -0.2374 0.7891 0.3622
RR4.out -1752.140564    -1099485.725315  0.970758 -1.6398 0.1947 0.0894
RR3.out -1752.140319    -1099485.571575  1.124498 -1.8995 0.1502 0.0689
RR5.out -1752.138532    -1099484.450215  2.245858 -3.7937 0.0227 0.0104
RR6.out -1752.138493    -1099484.425742  2.270331 -3.8351 0.0218 0.0100

A variation on the above uses the ternary operator (hat tip: Ed Morton):

$ awk 'NR==FNR{sum+=$6; next} {print $0, (FNR>1 ? sprintf("%0.4f",$6/sum) : "weighted")}' input input
file Gibbs kcal rel pop pop2 weighted
RR2.out -1752.142111    -1099486.696073  0.000000 -0.0000 1.0000 0.4590
RR1.out -1752.141887    -1099486.555511  0.140562 -0.2374 0.7891 0.3622
RR4.out -1752.140564    -1099485.725315  0.970758 -1.6398 0.1947 0.0894
RR3.out -1752.140319    -1099485.571575  1.124498 -1.8995 0.1502 0.0689
RR5.out -1752.138532    -1099484.450215  2.245858 -3.7937 0.0227 0.0104
RR6.out -1752.138493    -1099484.425742  2.270331 -3.8351 0.0218 0.0100
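The `paste` in the question joins the new column with a tab, while the awk versions above join it with a space. If you want the tab back, one option (a sketch, not part of the answer above) is to set `OFS` on the ternary version. Writing to `final` with `>` is an assumption for illustration; the question appended with `>>`.

```shell
#!/bin/sh
# Recreate the sample data from the question as "input".
cat > input <<'EOF'
file Gibbs kcal rel pop pop2
RR2.out -1752.142111    -1099486.696073  0.000000 -0.0000 1.0000
RR1.out -1752.141887    -1099486.555511  0.140562 -0.2374 0.7891
RR4.out -1752.140564    -1099485.725315  0.970758 -1.6398 0.1947
RR3.out -1752.140319    -1099485.571575  1.124498 -1.8995 0.1502
RR5.out -1752.138532    -1099484.450215  2.245858 -3.7937 0.0227
RR6.out -1752.138493    -1099484.425742  2.270331 -3.8351 0.0218
EOF

# Same logic as the ternary version, with OFS set to a tab so the new column
# is separated the way paste separates it by default. No field is assigned,
# so $0 is printed with its original spacing intact.
awk -v OFS='\t' 'NR==FNR{sum+=$6; next}
  {print $0, (FNR>1 ? sprintf("%0.4f",$6/sum) : "weighted")}' input input > final

cat final
```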

You are also computing a weighted value for the header line.

To skip the header line, your awk script should be:

awk 'FNR==1{next}NR==FNR{sum+=$6; next}{printf("%0.4f\n", $6/sum)}' input input >> out
paste input out >> final 

A cleaner awk script that also does the job of the paste command (note that it drops the header line from the output) is:

awk 'FNR==1{next}NR==FNR{sum+=$6; next}{printf("%s %0.4f\n", $0, $6/sum)}' input input
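All of the scripts above read `input` twice, which does not work when the data arrives on a pipe. A single-pass alternative (a different technique from the answers above, shown as a sketch) buffers the rows in arrays and prints them in the `END` block. It assumes the same whitespace-separated layout with a one-line header and a column 6 that does not sum to zero; the function name `weighted` and the file `weighted_out` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the table on stdin and appends a "weighted" column.
weighted() {
  awk '
    NR == 1 { header = $0; next }                 # keep the header out of the sum
    { line[++n] = $0; val[n] = $6; sum += $6 }    # buffer each row and its column 6
    END {
      if (sum == 0) exit 1                        # avoid dividing by zero
      print header, "weighted"
      for (i = 1; i <= n; i++)
        printf("%s %0.4f\n", line[i], val[i] / sum)
    }'
}

# Example: feed the question's data through a pipe instead of a file.
printf '%s\n' \
  'file Gibbs kcal rel pop pop2' \
  'RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000' \
  'RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891' \
  'RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947' \
  'RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502' \
  'RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227' \
  'RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218' |
  weighted > weighted_out

cat weighted_out
```

The trade-off is memory: the whole table is held in awk arrays until the end, which is fine for a handful of rows like these but worth keeping in mind for very large inputs.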