Extract date from datetime, change . to , and print the sum of different fields
aNumber bNumber startDate cost balanceAfter trafficCase Operator unknown3 MainAmount BALANCEBEFORE
22676239633 433 2014-07-02 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-02 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-02 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-02 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
22665799922 70110055 2014-07-03 10:16:45.000 20,00 0.50 0 Telmob 126260244 20.0000 0.5000
22676239633 433 2014-07-03 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-04 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-04 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-05 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
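This is a sample of the data I have. I want to sum cost, balanceAfter, MainAmount and BALANCEBEFORE each time the date changes. My concern is that the date is combined with the time, and that my decimal separator is a dot instead of a comma, so my awk script cannot do the operation.
Can I have an AWK script that first extracts only the date, so that in the end I get output that looks like this: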
Date Cost balanceAfter MainAmount BALANCEBEFORE
02/07/2014 2,00 379,3 0 379,3
03/07/2014 20,00 0,7 20 0,7
04/07/2014 2,00 309,6 0 309,6
05/07/2014 0,00 69,5 0 69,5
Here is my AWK script:
awk -F 'NR==1 {header=$0; next} {a[]+= a[]+= a[]+= a[]+=} END {for (i in a) {printf "%d\t%d\n", i, a[i]}; tot+=a[i]};' out.txt>output.doc
Edit: Following Etan Reisner's suggestion, this avoids a preprocessing step by using $NF to cope with the varying number of tokens in the Operator column.
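To see why counting from the end helps, here is a small sketch using two rows copied from the sample data (the variable names `rows` and `nf_demo` are only illustrative). With default whitespace splitting, "Short Code" takes two fields and "Airtel" takes one, so NF differs between rows, while $(NF-1) and $NF still point at MainAmount and BALANCEBEFORE.

```shell
# Two rows from the sample data: a two-word Operator and a one-word Operator.
rows='22676239633 433 2014-07-02 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-02 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000'

# For each row, print the field count followed by the last two fields.
nf_demo=$(printf '%s\n' "$rows" | awk '{ print NF, $(NF-1), $NF }')
printf '%s\n' "$nf_demo"
```

The first row has 12 fields and the second has 11, yet the last two fields line up in both.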
$ cat data.txt
aNumber bNumber startDate cost balanceAfter trafficCase Operator unknown3 MainAmount BALANCEBEFORE
22676239633 433 2014-07-02 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-02 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-02 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-02 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
22665799922 70110055 2014-07-03 10:16:45.000 20,00 0.50 0 Telmob 126260244 20.0000 0.5000
22676239633 433 2014-07-03 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-04 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-04 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-05 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
$ cat so2.awk
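NR > 1 {
    # With whitespace splitting: $3 = date, $4 = time, $5 = cost, $6 = balanceAfter.
    cost = $5;
    balanceAfter = $6;
    # Operator may be one or two tokens, so take the last two columns from the end.
    mainAmount = $(NF - 1);
    balanceBefore = $NF;
    # Turn a decimal comma into a dot so awk reads the value as a number.
    sub(",", ".", cost);
    sub(",", ".", balanceAfter);
    sub(",", ".", mainAmount);
    sub(",", ".", balanceBefore);
    # Accumulate per date.
    dateCost[$3] += cost;
    dateBalanceAfter[$3] += balanceAfter;
    dateMainAmount[$3] += mainAmount;
    dateBalanceBefore[$3] += balanceBefore;
}
END {
    printf("%s\t%s\t%s\t%s\t%s\n", "Date", "Cost", "BalanceAfter", "MainAmount", "BalanceBefore");
    # The iteration order of "for (i in array)" is unspecified.
    for (i in dateCost) {
        printf("%s\t%f\t%f\t%f\t%f\n", i, dateCost[i], dateBalanceAfter[i], dateMainAmount[i], dateBalanceBefore[i]);
    }
}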
$ awk -f so2.awk data.txt
Date Cost BalanceAfter MainAmount BalanceBefore
2014-07-02 2.000000 379.300000 0.000000 379.300000
2014-07-03 20.000000 0.700000 20.000000 0.700000
2014-07-04 2.000000 309.600000 0.000000 309.600000
2014-07-05 0.000000 69.500000 0.000000 69.500000
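One caveat with the array approach: awk does not specify the order in which `for (i in array)` visits keys, so the dates may not come out in order on every implementation. A minimal sketch of one way around this, assuming it is acceptable to print the header separately and pipe the rows through `sort` (only the cost column is summed here to keep it short; the names `data` and `sorted` are illustrative):

```shell
# The sample data from above, header included.
data='aNumber bNumber startDate cost balanceAfter trafficCase Operator unknown3 MainAmount BALANCEBEFORE
22676239633 433 2014-07-02 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-02 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-02 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-02 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
22665799922 70110055 2014-07-03 10:16:45.000 20,00 0.50 0 Telmob 126260244 20.0000 0.5000
22676239633 433 2014-07-03 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-04 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-04 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-05 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000'

sorted=$(
    # Header first, then the per-date sums in sorted order.
    printf 'Date\tCost\n'
    printf '%s\n' "$data" | LC_ALL=C awk '
        NR > 1 { c = $5; sub(",", ".", c); sum[$3] += c }
        END    { for (d in sum) printf "%s\t%.2f\n", d, sum[d] }
    ' | LC_ALL=C sort
)
printf '%s\n' "$sorted"
```

Because the dates are in yyyy-mm-dd form, a plain lexical sort is also a chronological sort.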
This does not require preprocessing the file:
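awk '
    BEGIN {print "Date Cost BalanceAfter MainAmount BalanceBefore"}
    NR == 1 {next}    # skip the header line
    function showday() {
        printf "%s\t%.2f\t%.1f\t%d\t%.1f\n", date, cost, bAfter, main, bBefore
    }
    # $3 is the date part of startDate. When it changes, print the finished
    # day and reset the sums (this relies on the input being ordered by date).
    date != $3 {
        if (date) showday()
        date = $3
        cost = bAfter = main = bBefore = 0
    }
    {
        sub(/,/, ".", $5)    # cost uses a decimal comma
        cost += $5
        bAfter += $6
        main += $(NF-1)
        bBefore += $NF
    }
    END {showday()}
' file | column -t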
Date Cost BalanceAfter MainAmount BalanceBefore
2014-07-02 2.00 379.3 0 379.3
2014-07-03 20.00 0.7 20 0.7
2014-07-04 2.00 309.6 0 309.6
2014-07-05 0.00 69.5 0 69.5
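Neither output above uses the dd/mm/yyyy dates and decimal commas shown in the question. The following is a sketch of one way to get that layout, not part of the original answers: it reuses the date-change approach and adds two helpers, `comma()` and a modified `showday()`, which are assumptions made for illustration. `LC_ALL=C` is set so that awk itself always works with a decimal dot before the final substitution.

```shell
# The sample data from above, header included.
data='aNumber bNumber startDate cost balanceAfter trafficCase Operator unknown3 MainAmount BALANCEBEFORE
22676239633 433 2014-07-02 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-02 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-02 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-02 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
22665799922 70110055 2014-07-03 10:16:45.000 20,00 0.50 0 Telmob 126260244 20.0000 0.5000
22676239633 433 2014-07-03 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-04 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-04 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-05 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000'

report=$(printf '%s\n' "$data" | LC_ALL=C awk '
    # Format n with the given printf format, then swap the decimal dot for a comma.
    function comma(fmt, n,    s) { s = sprintf(fmt, n); sub(/\./, ",", s); return s }
    # Print one finished day with the date rearranged from yyyy-mm-dd to dd/mm/yyyy.
    function showday(    d) {
        split(date, d, "-")
        printf "%s/%s/%s\t%s\t%s\t%d\t%s\n", d[3], d[2], d[1],
               comma("%.2f", cost), comma("%.1f", bAfter), main, comma("%.1f", bBefore)
    }
    BEGIN   { print "Date\tCost\tbalanceAfter\tMainAmount\tBALANCEBEFORE" }
    NR == 1 { next }
    date != $3 { if (date) showday(); date = $3; cost = bAfter = main = bBefore = 0 }
    {
        sub(/,/, ".", $5)
        cost += $5; bAfter += $6; main += $(NF-1); bBefore += $NF
    }
    END { if (date) showday() }
')
printf '%s\n' "$report"
```

Like the script it is based on, this assumes the input is already ordered by date.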