AWK: comparing 2 columns from 2 CSV files, outputting to a third. How do I also get the output that doesn't match to another file?
I currently have the following script:
awk -F, 'NR==FNR { a[$1 FS $4]=$0; next } $1 FS $4 in a { printf a[$1 FS $4]; sub($1 FS $4,""); print }' file1.csv file2.csv > combined.csv
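This compares columns 1 and 4 of the two CSV files and writes the matching lines from both files to combined.csv. Is it possible to also output the lines from file 1 and file 2 that don't match the other file with the same awk line? Or do I need to do separate parses?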
File1
ResourceName,ResourceType,PatternType,User,Host,Operation,PermissionType
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow
File2
topic,groupName,Name,User,email,team,contact,teamemail,date,clienttype
BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
combined
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow,BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow,BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
Wanted additional files:
non matched file1:
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
non matched file2:
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
Again, I might be trying to do too much in one line. Would it be wiser to run another parse?
Assuming the key pairs of $1 and $4 are unique within each input file, then using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { next }
{ key = $1 FS $4 }
NR==FNR {
    file1[key] = $0
    next
}
key in file1 {
    print file1[key], $0 > "out_combined"
    delete file1[key]
    next
}
{
    print > "out_file2_only"
}
END {
    for (key in file1) {
        print file1[key] > "out_file1_only"
    }
}
$ awk -f tst.awk file{1,2}
$ head out_*
==> out_combined <==
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow,BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow,BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
==> out_file1_only <==
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
==> out_file2_only <==
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
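The script relies on each $1,$4 pair being unique within a file. A quick way to check that before trusting the three output files might look like the sketch below. The file dup_demo.csv and its rows are made up purely for illustration (they are not from the question), and the message format is an arbitrary choice.

```shell
# Hypothetical demo input: the last two data rows share the same $1,$4 pair.
cat > dup_demo.csv <<'EOF'
ResourceName,ResourceType,PatternType,User,Host,Operation,PermissionType
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Describe,Allow
EOF

# Report every line whose ($1,$4) key was already seen earlier in the same file.
dups=$(awk -F, '
    FNR == 1 { next }                  # skip the header line of each file
    seen[FILENAME, $1, $4]++ {         # true from the second occurrence onwards
        print FILENAME ":" FNR ": duplicate key " $1 FS $4
    }
' dup_demo.csv)
printf '%s\n' "$dups"
```

If this prints nothing for your real file1 and file2, the uniqueness assumption holds for them.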
The order of lines in out_file1_only will be shuffled by the "in" operator. If that's a problem then let us know, as it's an easy tweak to retain the input order.
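One possible version of that tweak, as a sketch rather than the answerer's own code: record each file1 key in a numerically indexed array as it is read, then walk that array in the END block instead of using "for (key in file1)". The names tst_ordered.awk, order and n are assumptions introduced here. The sample input is recreated from the question so the block runs on its own.

```shell
# Recreate the sample inputs from the question.
cat > file1 <<'EOF'
ResourceName,ResourceType,PatternType,User,Host,Operation,PermissionType
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow
EOF
cat > file2 <<'EOF'
topic,groupName,Name,User,email,team,contact,teamemail,date,clienttype
BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
EOF

# tst_ordered.awk is a hypothetical name for the order-preserving variant.
cat > tst_ordered.awk <<'EOF'
BEGIN { FS=OFS="," }
FNR==1 { next }                  # skip each file's header line
{ key = $1 FS $4 }
NR==FNR {
    file1[key] = $0
    order[++n] = key             # remember the order in which keys were read
    next
}
key in file1 {
    print file1[key], $0 > "out_combined"
    delete file1[key]            # whatever is left at END never matched
    next
}
{ print > "out_file2_only" }
END {
    # Walk the keys in input order; skip the ones deleted after a match.
    for (i = 1; i <= n; i++) {
        if (order[i] in file1) {
            print file1[order[i]] > "out_file1_only"
        }
    }
}
EOF

rm -f out_combined out_file1_only out_file2_only
awk -f tst_ordered.awk file1 file2
```

The cost is one extra array holding every file1 key, which is usually negligible next to the lines already stored in file1.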