How to use the sort and awk commands to sort dates in the 4th column of a file

I have the following file named st.txt:

Item    Type    Amount  Date
Petrol  expense -160    2020-01-23
Electricity expense -200    2020-03-24
Electricity expense -200    2020-04-24
Trim line   expense -50 2020-05-30
Martha Burns    income  150 2021-03-11
Highbury shops  income  300 2021-03-14

I want to sort the data by date and print everything except the first line. The following command works:

awk -F '\t' 'NR>1{print $4"\t"$1"\t"$2"\t"$3}' st.txt | sort -t"-" -n -k1 -k2 -k3

The output is then:

2020-01-23  Petrol  expense -160
2020-03-24  Electricity expense -200
2020-04-24  Electricity expense -200
2020-05-30  Trim line   expense -50
2021-03-11  Martha Burns    income  150
2021-03-14  Highbury shops  income  300

How can I write this command without rearranging the columns, so that the date field stays in $4? I tried the following, but it doesn't work:

awk -F '\t' 'NR>1{print $0}' st.txt | sort -t"-" -n -k 4,1 -k 4,2 -k 4,3

This command does not sort the dates.

The output should be:

Petrol expense  -160    2020-01-23
Electricity expense -200    2020-03-24
Electricity expense -200    2020-04-24
Trim line   expense -50     2020-05-30
Martha Burns    income      150 2021-03-11
Highbury shops  income      300 2021-03-14
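Why the attempt fails: with -t"-", sort splits every line at each hyphen, including the minus sign of negative amounts such as -160, so hyphen field 4 is not the Date column, and lines with a positive amount have one field fewer. On top of that, -k 4,1 describes a key that starts at field 4 and ends at field 1, i.e. before it starts. (The reordered command works because, once the date leads the line, hyphen fields 1, 2 and 3 begin with the year, month and day.) A minimal sketch that makes the split visible, using awk -F'-' as a stand-in for the field splitting that sort -t"-" performs, on two of the sample rows:

```shell
# Split two rows of st.txt on "-" (as sort -t"-" does) and report
# how many fields each row has and what field 4 contains.
printf 'Petrol\texpense\t-160\t2020-01-23\nMartha Burns\tincome\t150\t2021-03-11\n' |
  awk -F'-' '{ printf "%d fields, field 4 = [%s]\n", NF, $4 }'
# Output:
# 4 fields, field 4 = [23]
# 3 fields, field 4 = []
```

Field 4 is the day for one row and empty for the other, so no hyphen-based key can line up with the Date column.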

Assuming the fields in your input file are tab-separated, as your code suggests:

$ tail -n +2 file | sort -t$'\t' -k4
Petrol  expense -160    2020-01-23
Electricity     expense -200    2020-03-24
Electricity     expense -200    2020-04-24
Trim line       expense -50     2020-05-30
Martha Burns    income  150     2021-03-11
Highbury shops  income  300     2021-03-14
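A note on the key: -k4 means "from field 4 to the end of the line", which here is the same as -k4,4 because Date is the last column; -k4,4 restricts the key to field 4 alone and keeps working if columns are ever added after Date. No -n is needed, because YYYY-MM-DD dates sort correctly as plain text. A minimal sketch, assuming a hypothetical tab-separated file sample.tsv holding the question's rows in shuffled order (st.txt itself is already in date order, so sorting it changes nothing); LC_ALL=C is an added assumption that makes the comparison byte-wise, and $'\t' is bash quoting, as in the answer above:

```shell
# Hypothetical sample: same columns as st.txt, rows deliberately shuffled.
printf '%s\t%s\t%s\t%s\n' \
  Item Type Amount Date \
  Petrol expense -160 2020-01-23 \
  Electricity expense -200 2020-04-24 \
  'Highbury shops' income 300 2021-03-14 \
  Electricity expense -200 2020-03-24 \
  'Trim line' expense -50 2020-05-30 \
  'Martha Burns' income 150 2021-03-11 > sample.tsv

# Skip the header, split on tabs only, and use field 4 alone as the sort key.
tail -n +2 sample.tsv | LC_ALL=C sort -t $'\t' -k4,4
```

The rows come out in the same date order as the output shown above, with the columns left in place.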

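Using GNU awk (because the date is the array index, lines that share a date overwrite each other and only the last one is kept):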

awk -F '\t' 'NR>1{a[$4]=$0} END{PROCINFO["sorted_in"] = "@ind_str_asc"; for(i in a){print a[i]}}' file

Output:

Petrol  expense -160    2020-01-23
Electricity     expense -200    2020-03-24
Electricity     expense -200    2020-04-24
Trim line       expense -50     2020-05-30
Martha Burns    income  150     2021-03-11
Highbury shops  income  300     2021-03-14

Given:

$ awk '{gsub(/\t/,"\\t")} 1' file
Item\tType\tAmount\tDate
Petrol\texpense\t-160\t2020-01-23
Electricity\texpense\t-200\t2020-03-24
Electricity\texpense\t-200\t2020-04-24
Trim line\texpense\t-50\t2020-05-30
Martha Burns\tincome\t150\t2021-03-11
Highbury shops\tincome\t300\t2021-03-14

You can use the Decorate / Sort / Undecorate pattern with POSIX awk:

awk 'BEGIN{FS=OFS="\t"} FNR>1{print $4, $0}' file | sort | cut -f 2-
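The pattern is easiest to follow with the intermediate step written to a file. A minimal sketch, assuming a small hypothetical input dsu.tsv and an intermediate file decorated.tsv (both names are illustrative only), with LC_ALL=C added to make the sort byte-wise:

```shell
# Hypothetical, deliberately unsorted sample in the same tab-separated layout as st.txt.
printf '%s\t%s\t%s\t%s\n' \
  Item Type Amount Date \
  'Martha Burns' income 150 2021-03-11 \
  Petrol expense -160 2020-01-23 \
  'Trim line' expense -50 2020-05-30 > dsu.tsv

# Decorate: skip the header and prepend the date ($4) as a temporary first field.
awk 'BEGIN{FS=OFS="\t"} FNR>1{print $4, $0}' dsu.tsv > decorated.tsv
cat decorated.tsv

# Sort: a plain whole-line sort now orders by date, because the date leads each line.
# Undecorate: cut drops the temporary first field again.
LC_ALL=C sort decorated.tsv | cut -f 2-
```

Because the decoration is removed at the end, the original columns come back unchanged, with Date still in field 4.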

Or use a proper CSV parser configured to use \t instead of a comma as the separator. Ruby makes this easiest:

ruby -r csv -e '
options={:col_sep=>"\t", :headers=>true, :return_headers=>true}
data=CSV.parse($<.read, **options).to_a
header=data.shift.to_csv(**options)
data.sort_by{|r| r[3]}.each{|r| puts r.to_csv(**options)}
' file

Either one prints:

Petrol  expense -160    2020-01-23
Electricity expense -200    2020-03-24
Electricity expense -200    2020-04-24
Trim line   expense -50 2020-05-30
Martha Burns    income  150 2021-03-11
Highbury shops  income  300 2021-03-14