Find sum of column, divide column value, and print to new column using awk
My data looks like this:
file Gibbs kcal rel pop pop2
RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000
RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891
RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947
RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502
RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227
RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218
I want to find the sum of column 6, divide each value in column 6 by that sum, and print the results in a new column titled "weighted".
I am using:
echo "weighted" >> allRE7
awk 'NR==FNR{sum+=$6; next}{printf("%0.4f\n", $6/sum)}' input input >> out
paste input out >> final
which gives me:
file Gibbs kcal rel pop pop2 weighted
RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000 0.0000
RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891 0.4590
RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947 0.3622
RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502 0.0894
RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227 0.0689
RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218 0.0104
0.0100
I don't know where the 0.0100 value comes from.
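The problem is that the awk code also prints a weighted result for the header line. The non-numeric header field evaluates to 0, so that extra line is 0.0000; it shifts every weight down one row and leaves the last one (0.0100) on a line of its own. To eliminate it, replace:
awk 'NR==FNR{sum+=$6; next}{printf("%0.4f\n", $6/sum)}' input input >> out
with:
awk 'NR==FNR{sum+=$6; next} FNR>1{printf("%0.4f\n", $6/sum)}' input input >> out
The FNR>1 condition ensures that $6/sum is printed only for data rows.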
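As a minimal sketch of the two-pass idiom, the snippet below runs the unguarded and the guarded command on a tiny made-up file. The two-column layout, the temporary file, and the variable names unguarded and guarded are assumptions for illustration only; the real data uses column 6 rather than column 2.

```shell
# Hypothetical sample data: a header line plus two data rows.
tmp=$(mktemp)
printf '%s\n' 'file pop2' 'a.out 1.0' 'b.out 3.0' > "$tmp"

# Pass 1 (NR==FNR): sum column 2; the header text counts as 0.
# Pass 2 without a guard: every line, header included, prints a value.
unguarded=$(awk 'NR==FNR{sum+=$2; next}{printf("%0.4f\n", $2/sum)}' "$tmp" "$tmp")

# Pass 2 with FNR>1: the header line is skipped.
guarded=$(awk 'NR==FNR{sum+=$2; next} FNR>1{printf("%0.4f\n", $2/sum)}' "$tmp" "$tmp")

printf 'unguarded:\n%s\n' "$unguarded"
printf 'guarded:\n%s\n' "$guarded"
rm -f "$tmp"
```

The unguarded run prints three values for a file with two data rows, and the first of them is the 0.0000 produced by the header; the guarded run prints exactly one value per data row.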
Improvement
The echo and paste commands are not needed. Try:
$ awk 'NR==FNR{sum+=$6; next} FNR==1{print $0,"weighted"; next} {printf("%s %0.4f\n",$0,$6/sum)}' input input
file Gibbs kcal rel pop pop2 weighted
RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000 0.4590
RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891 0.3622
RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947 0.0894
RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502 0.0689
RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227 0.0104
RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218 0.0100
A variation on the above uses the ternary operator (hat tip: Ed Morton):
$ awk 'NR==FNR{sum+=$6; next} {print $0, (FNR>1 ? sprintf("%0.4f",$6/sum) : "weighted")}' input input
file Gibbs kcal rel pop pop2 weighted
RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000 0.4590
RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891 0.3622
RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947 0.0894
RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502 0.0689
RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227 0.0104
RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218 0.0100
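The commands above read the file twice, which does not work when the data arrives on a pipe. As an alternative that does not come from the answers here, a single-pass sketch can buffer the lines and print them in an END block. The sample data, the two-column layout, and the names line, val, and single_pass are assumptions for illustration.

```shell
# Hypothetical sample data: a header line plus two data rows.
tmp=$(mktemp)
printf '%s\n' 'file pop2' 'a.out 1.0' 'b.out 3.0' > "$tmp"

# Single pass: buffer every line, sum column 2 of the data rows,
# then append the header label or the weight in the END block.
single_pass=$(awk '
    { line[NR] = $0 }                   # remember each line in input order
    NR > 1 { val[NR] = $2; sum += $2 }  # data rows only: keep value, add to sum
    END {
        print line[1], "weighted"
        for (i = 2; i <= NR; i++)
            printf("%s %0.4f\n", line[i], val[i] / sum)
    }' "$tmp")

printf '%s\n' "$single_pass"
rm -f "$tmp"
```

The trade-off is memory: the whole input is held in the line array, so the two-pass form is likely the better fit for very large files that can be reread.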
Your script also computes a weighted value for the header line.
To omit the header line, your awk script should be:
awk 'FNR==1{next}NR==FNR{sum+=$6; next}{printf("%0.4f\n", $6/sum)}' input input >> out
paste input out >> final
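A cleaner awk script that makes the paste command unnecessary is:
awk 'FNR==1{next}NR==FNR{sum+=$6; next}{printf("%s %0.4f\n", $0, $6/sum)}' input input
Because FNR==1{next} skips the header in both passes, this prints only the data rows.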