Search one file's lines for a partial match in another file
I have 2 files. The first:
values.txt
test@
test1@
test3@
test4@
test6@
test7@
test8@
test9@
test10@
data.csv
"username","email"
"user","test@gmail.com"
"user1","test1@gmail.com"
"user2","test3@gmail.com"
"user4","test4@gmail.com"
"user456","loka@gmail.com"
"user789","lopa@gmail.com"
"user5","test7@gmail.com"
"user","xpos@gmail.com"
"user5","test9@gmail.com"
"user","xpx@gmail.com"
I would like the output to be:
"user","test@gmail.com"
"user1","test1@gmail.com"
"user2","test3@gmail.com"
"user4","test4@gmail.com"
"user5","test7@gmail.com"
"user5","test9@gmail.com"
What I have so far:
$ awk -F, -v q='"' 'NR==FNR{a[q $0 q]; next} $2 in a' values.txt data.csv > test1.csv
This only works when values.txt holds the full email, e.g. test9@gmail.com, and not just the prefix test9@.
The new file test1.csv contains:
"user5","test9@gmail.com"
....
....
I can't figure out how to match on a partial substring with awk.
You may use this awk:
awk -F, 'NR==FNR {a[$1]; next} {ea = $2; gsub(/^"|@.*$/, "", ea)} ea "@" in a' values.txt data.csv
"user","test@gmail.com"
"user1","test1@gmail.com"
"user2","test3@gmail.com"
"user4","test4@gmail.com"
"user5","test7@gmail.com"
"user5","test9@gmail.com"
A more readable version:
awk -F, 'NR == FNR {
   a[$1]                     # from values.txt, store each value as a key in array a
   next
}
{
   ea = $2                   # copy $2 into ea (email address)
   gsub(/^"|@.*$/, "", ea)   # strip the leading " and everything from @ onward
}
ea "@" in a                  # print the line if ea + "@" exists in array a
' values.txt data.csv
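To try the lookup approach end to end, here is a minimal sketch that recreates a shortened version of the sample files in a scratch directory and runs the same program. The `tmpdir` and `result` names and the use of `mktemp -d` are my own assumptions for illustration, not part of the question.

```shell
#!/bin/sh
# Recreate a shortened version of the sample files in a scratch directory
# (tmpdir and result are illustrative names, not from the question).
tmpdir=$(mktemp -d)
printf '%s\n' 'test@' 'test1@' 'test6@' 'test9@' > "$tmpdir/values.txt"
printf '%s\n' \
  '"username","email"' \
  '"user","test@gmail.com"' \
  '"user1","test1@gmail.com"' \
  '"user456","loka@gmail.com"' \
  '"user5","test9@gmail.com"' > "$tmpdir/data.csv"

# First file (NR == FNR): store each prefix from values.txt as an array key.
# Second file: cut the email field down to its local part, re-append "@",
# and print the line only if that string is a key in the array.
result=$(awk -F, '
  NR == FNR { a[$1]; next }
  { ea = $2; gsub(/^"|@.*$/, "", ea) }
  (ea "@") in a
' "$tmpdir/values.txt" "$tmpdir/data.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the match is an array lookup on the exact local part, test@ does not select test1@gmail.com, and the header line is skipped without any special handling.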
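Could you please try the following, written and tested with the shown samples in GNU awk. It looks like a few of your lines have trailing spaces; in case you want to remove them before matching, I have added gsub(/ +$/,"") to my solution.
awk '
{ gsub(/ +$/,"") }
FNR==NR{
  arr[$0]
  next
}
{
  for(key in arr){
    if(index($2,key)){
      print
      next
    }
  }
}' values.txt FS="," data.csv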
Explanation: Adding a detailed explanation for the above.
awk '                      ##Starting the awk program here.
{ gsub(/ +$/,"") }         ##Using gsub to remove trailing spaces from each line.
FNR==NR{                   ##This condition is TRUE only while values.txt is being read.
  arr[$0]                  ##Creating an entry in arr indexed by the current line.
  next                     ##next skips all further statements for this line.
}
{
  for(key in arr){         ##Looping over all keys stored in arr.
    if(index($2,key)){     ##Checking whether key occurs as a substring of the 2nd field.
      print                ##Printing the current line.
      next                 ##next skips the remaining keys and moves to the next line.
    }
  }
}' values.txt FS="," data.csv  ##Naming the input files; FS="," applies to data.csv only.
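One thing to be aware of with the loop version: index($2,key) is true when key occurs anywhere inside the field, so test1@ would also select an address such as xtest1@gmail.com. The sketch below shows that on made-up data and, as one possible adjustment rather than part of the answer above, requires the key to start at position 2 of the field (position 1 is the opening quote). The xtest1 row and the `tmpdir`, `anywhere` and `anchored` names are hypothetical.

```shell
#!/bin/sh
# Hypothetical two-row sample: the second address merely contains "test1@".
tmpdir=$(mktemp -d)
printf '%s\n' 'test1@' > "$tmpdir/values.txt"
printf '%s\n' \
  '"user1","test1@gmail.com"' \
  '"userx","xtest1@gmail.com"' > "$tmpdir/data.csv"

# Substring match anywhere in field 2: both rows are printed.
anywhere=$(awk '
  FNR == NR { arr[$0]; next }
  { for (key in arr) if (index($2, key)) { print; next } }
' "$tmpdir/values.txt" FS="," "$tmpdir/data.csv")

# Anchored match: the key must begin right after the opening quote,
# i.e. at character position 2 of field 2.
anchored=$(awk '
  FNR == NR { arr[$0]; next }
  { for (key in arr) if (index($2, key) == 2) { print; next } }
' "$tmpdir/values.txt" FS="," "$tmpdir/data.csv")

printf 'anywhere:\n%s\n' "$anywhere"
printf 'anchored:\n%s\n' "$anchored"
rm -rf "$tmpdir"
```

Which behaviour is right depends on whether the values in values.txt are meant as prefixes of the email or as arbitrary fragments; the question's sample data gives the same output either way.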