AWK - compare column $1, append matching rows together (including duplicates)
I'm trying to parse two csv files, each containing thousands of rows. The data should be matched and appended based only on the data in the first column. I'm currently parsing the files and outputting to 3 files:
1 - key matched
2 - file1 only
3 - file2 only
The problem I've run into is that once it matches one row it moves on to the next line instead of finding the other entries. For questionable data I would rather output multiple rows containing some duplicates than miss any data. (For example, the name column varies depending on who entered the data.)
Input files
file1.csv
topic,group,name,allow
fishing,boaties,dave,yes
fishing,divers,steve,no
flying,red,luke,yes
walking,red,tom,yes
file2.csv
Resource,name,email,funny
fishing,frank,frank@home.com,no
swiming,lee,lee@wallbanger.com,no
driving,lee,lee@wallbanger.com,no
Current output
key matched
topic,group,name,allow,Resource,name,email,funny
fishing,divers,steve,no,fishing,frank,frank@home.com,no
file1_only
topic,group,user,allow
fishing,divers,steve,no
flying,red,luke,yes
walking,red,tom,yes
file2_only
Resource,user,email,funny
swiming,lee,lee@wallbanger.com,no
driving,lee,lee@wallbanger.com,no
Expected output
key matched
topic,group,name,allow,Resource,name,email,funny
fishing,divers,steve,no,fishing,frank,frank@home.com,no
fishing,boaties,dave,yes,fishing,frank,frank@home.com,no
file1_only
topic,group,user,allow
flying,red,luke,yes
walking,red,tom,yes
file2_only
Resource,user,email,funny
swiming,lee,lee@wallbanger.com,no
driving,lee,lee@wallbanger.com,no
So for each key in column 1 of file1, it needs to output/append every row in file2 whose column 1 matches that key.
Here is my current awk filter. I'm guessing I need to add a loop, if that's possible?
BEGIN { FS=OFS="," }
FNR==1 { next }
{ key = $1 }
NR==FNR {
    file1[key] = $0
    next
}
key in file1 {
    print file1[key], $0 > "./out_combined.csv"
    delete file1[key]
    next
}
{
    print > "./out_file2_only.csv"
}
END {
    for (key in file1) {
        print file1[key] > "./out_file1_only.csv"
    }
}
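The root cause can be demonstrated in isolation: `file1[key] = $0` keeps only one row per key, so duplicate keys from file1 overwrite each other before file2 is even read. A minimal sketch using the question's two `fishing` rows:

```shell
# A plain awk array holds one value per key: storing both "fishing"
# rows leaves only the last one, so dave's row is silently dropped.
printf 'fishing,boaties,dave,yes\nfishing,divers,steve,no\n' |
awk -F, '{ file1[$1] = $0 } END { for (k in file1) print k, "->", file1[k] }'
# → fishing -> fishing,divers,steve,no
```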
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
    if ( NR==FNR ) {
        file1hdr = $0
    }
    else {
        print file1hdr > "./out_file1_only.csv"
        print > "./out_file2_only.csv"
        print file1hdr, $0 > "./out_combined.csv"
    }
    next
}
{ key = $1 }
NR==FNR {
    # store every file1 row for this key, not just the last one
    file1[key,++cnt[key]] = $0
    next
}
{
    file2[key]    # remember that this key appeared in file2
    if ( key in cnt ) {
        for ( i=1; i<=cnt[key]; i++ ) {
            print file1[key,i], $0 > "./out_combined.csv"
        }
    }
    else {
        print > "./out_file2_only.csv"
    }
}
END {
    for ( key in cnt ) {
        if ( !(key in file2) ) {
            for ( i=1; i<=cnt[key]; i++ ) {
                print file1[key,i] > "./out_file1_only.csv"
            }
        }
    }
}
$ awk -f tst.awk file1.csv file2.csv
$ head out_*
==> out_combined.csv <==
topic,group,name,allow,Resource,name,email,funny
fishing,boaties,dave,yes,fishing,frank,frank@home.com,no
fishing,divers,steve,no,fishing,frank,frank@home.com,no
==> out_file1_only.csv <==
topic,group,name,allow
flying,red,luke,yes
walking,red,tom,yes
==> out_file2_only.csv <==
Resource,name,email,funny
swiming,lee,lee@wallbanger.com,no
driving,lee,lee@wallbanger.com,no
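The same many-to-many pairing can also be done with coreutils `join(1)` instead of awk. This is only a sketch: it assumes no embedded commas or quotes in any field, it needs the headers stripped and the input sorted on the key column, and `join` prints the key column only once, so the combined layout differs slightly from the awk output above.

```shell
# Work in a scratch directory and recreate the question's sample data.
cd "$(mktemp -d)"
printf '%s\n' 'topic,group,name,allow' 'fishing,boaties,dave,yes' \
  'fishing,divers,steve,no' 'flying,red,luke,yes' 'walking,red,tom,yes' > file1.csv
printf '%s\n' 'Resource,name,email,funny' 'fishing,frank,frank@home.com,no' \
  'swiming,lee,lee@wallbanger.com,no' 'driving,lee,lee@wallbanger.com,no' > file2.csv

# Strip the header line and sort each file on the key column.
tail -n +2 file1.csv | sort -t, -k1,1 > f1.sorted
tail -n +2 file2.csv | sort -t, -k1,1 > f2.sorted

# join pairs up duplicate keys as a cartesian product, so no match is lost.
join -t, f1.sorted f2.sorted     > out_combined.csv    # keys in both files
join -t, -v1 f1.sorted f2.sorted > out_file1_only.csv  # keys only in file1
join -t, -v2 f1.sorted f2.sorted > out_file2_only.csv  # keys only in file2
```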