How to use sort and awk commands to sort dates in the 4th column of a file
I have the following file named st.txt:
Item Type Amount Date
Petrol expense -160 2020-01-23
Electricity expense -200 2020-03-24
Electricity expense -200 2020-04-24
Trim line expense -50 2020-05-30
Martha Burns income 150 2021-03-11
Highbury shops income 300 2021-03-14
I want to sort the data by date and print everything except the first line.
The following command works:
awk -F '\t' 'NR>1{print $4"\t"$1"\t"$2"\t"$3}' st.txt | sort -t"-" -n -k1 -k2 -k3
The output is then:
2020-01-23 Petrol expense -160
2020-03-24 Electricity expense -200
2020-04-24 Electricity expense -200
2020-05-30 Trim line expense -50
2021-03-11 Martha Burns income 150
2021-03-14 Highbury shops income 300
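How can I write this command so that the columns do not have to be rearranged and the date field stays in $4?
I tried the following, but it does not work:
awk -F '\t' 'NR>1{print $0}' st.txt | sort -t"-" -n -k 4,1 -k 4,2 -k 4,3
This command does not sort by date.
The output should be: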
Petrol expense -160 2020-01-23
Electricity expense -200 2020-03-24
Electricity expense -200 2020-04-24
Trim line expense -50 2020-05-30
Martha Burns income 150 2021-03-11
Highbury shops income 300 2021-03-14
Assuming the fields in your input file are tab-separated, as your code suggests:
$ tail -n +2 file | sort -t$'\t' -k4
Petrol expense -160 2020-01-23
Electricity expense -200 2020-03-24
Electricity expense -200 2020-04-24
Trim line expense -50 2020-05-30
Martha Burns income 150 2021-03-11
Highbury shops income 300 2021-03-14
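For reference, the attempt in the question goes wrong because -t"-" makes sort split each line on hyphens, so the field numbers no longer match the tab-separated columns, and a key such as -k 4,1 starts at field 4 but ends at field 1. Below is a minimal sketch of the tab-separated approach, assuming a POSIX sort and tail; the scratch file and the variable names tab and sorted are made up for illustration. It uses printf to produce the tab (a portable stand-in for the bash-specific $'\t') and -k4,4 to limit the key to the date column.

```shell
# Build a small tab-separated sample in a scratch file (hypothetical data subset).
tmp=$(mktemp)
tab=$(printf '\t')
printf 'Item\tType\tAmount\tDate\n' > "$tmp"
printf 'Highbury shops\tincome\t300\t2021-03-14\n' >> "$tmp"
printf 'Petrol\texpense\t-160\t2020-01-23\n' >> "$tmp"
printf 'Trim line\texpense\t-50\t2020-05-30\n' >> "$tmp"

# Skip the header, then sort on field 4 only, with tab as the separator.
# ISO dates (YYYY-MM-DD) order correctly as plain strings, so -n is not needed.
sorted=$(tail -n +2 "$tmp" | sort -t "$tab" -k4,4)
printf '%s\n' "$sorted"
rm -f "$tmp"
```

Because the date is the last field here, -k4 and -k4,4 behave the same; the explicit end position only matters once more columns follow the date.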
With GNU awk:
awk -F '\t' 'NR>1{a[$4]=$0} END{PROCINFO["sorted_in"] = "@ind_str_asc"; for(i in a){print a[i]}}' file
Output:
Petrol expense -160 2020-01-23
Electricity expense -200 2020-03-24
Electricity expense -200 2020-04-24
Trim line expense -50 2020-05-30
Martha Burns income 150 2021-03-11
Highbury shops income 300 2021-03-14
Given:
$ awk '{gsub(/\t/,"\\t")} 1' file
Item\tType\tAmount\tDate
Petrol\texpense\t-160\t2020-01-23
Electricity\texpense\t-200\t2020-03-24
Electricity\texpense\t-200\t2020-04-24
Trim line\texpense\t-50\t2020-05-30
Martha Burns\tincome\t150\t2021-03-11
Highbury shops\tincome\t300\t2021-03-14
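All of the approaches here depend on the file really being tab-separated, since items such as "Trim line" contain spaces. A quick way to check that assumption is to count fields per line; the following is only a sketch, with a made-up scratch file and the hypothetical variable names field_counts and ws_counts.

```shell
# Hypothetical scratch file holding a few of the rows shown above.
tmp=$(mktemp)
printf 'Item\tType\tAmount\tDate\n' > "$tmp"
printf 'Trim line\texpense\t-50\t2020-05-30\n' >> "$tmp"
printf 'Martha Burns\tincome\t150\t2021-03-11\n' >> "$tmp"

# Count tab-separated fields per line; a clean file reports a single value, 4.
field_counts=$(awk -F '\t' '{print NF}' "$tmp" | sort -u)
printf 'tab-split field counts: %s\n' "$field_counts"

# With awk's default whitespace splitting, "Trim line" becomes two fields,
# so the date is no longer reliably $4.
ws_counts=$(awk '{print NF}' "$tmp" | sort -u | tr '\n' ' ')
printf 'whitespace-split field counts: %s\n' "$ws_counts"
rm -f "$tmp"
```

If the tab-split count is not 4 on every line, the file probably uses spaces, and none of the field-4 sort keys will line up with the date.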
You can use the Decorate / Sort / Undecorate pattern with POSIX awk:
awk 'BEGIN{FS=OFS="\t"} FNR>1{print $4, $0}' file | sort | cut -f 2-
Or use a proper CSV parser set to use \t instead of a comma. Ruby is the easiest:
ruby -r csv -e '
options={:col_sep=>"\t", :headers=>true, :return_headers=>true}
data=CSV.parse($<.read, **options).to_a
header=data.shift.to_csv(**options)
data.sort_by{|r| r[3]}.each{|r| puts r.to_csv(**options)}
' file
Either prints:
Petrol expense -160 2020-01-23
Electricity expense -200 2020-03-24
Electricity expense -200 2020-04-24
Trim line expense -50 2020-05-30
Martha Burns income 150 2021-03-11
Highbury shops income 300 2021-03-14
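The Decorate / Sort / Undecorate pipeline above can be taken apart stage by stage to see what each command contributes. This is a sketch only, assuming tab-separated input; the scratch file, its rows (two of which deliberately share a date), and the variable names decorated and result are invented for illustration.

```shell
# Hypothetical scratch file; two rows share a date to show that neither is lost.
tmp=$(mktemp)
printf 'Item\tType\tAmount\tDate\n' > "$tmp"
printf 'Martha Burns\tincome\t150\t2021-03-11\n' >> "$tmp"
printf 'Petrol\texpense\t-160\t2020-01-23\n' >> "$tmp"
printf 'Electricity\texpense\t-200\t2020-01-23\n' >> "$tmp"

# Decorate: prefix every data row with its date, so the sort key is field 1.
decorated=$(awk 'BEGIN{FS=OFS="\t"} FNR>1{print $4, $0}' "$tmp")

# Sort: plain string comparison is enough for YYYY-MM-DD keys.
# Undecorate: cut -f 2- drops the prefixed date (tab is cut's default delimiter).
result=$(printf '%s\n' "$decorated" | sort | cut -f 2-)
printf '%s\n' "$result"
rm -f "$tmp"
```

Since every row is passed through the pipeline as its own line, rows with identical dates are all kept, which is not the case when rows are stored in an awk array indexed by the date alone.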