Search one file's lines for a partial match in another file

Question

I have 2 files. The first:

values.txt

test@
test1@
test3@
test4@
test6@
test7@    
test8@
test9@
test10@

data.csv

"username","email"
"user","test@gmail.com"
"user1","test1@gmail.com"
"user2","test3@gmail.com"
"user4","test4@gmail.com"
"user456","loka@gmail.com"
"user789","lopa@gmail.com"
"user5","test7@gmail.com"
"user","xpos@gmail.com"
"user5","test9@gmail.com"
"user","xpx@gmail.com"

I want the output to look like this:

"user","test@gmail.com"
"user1","test1@gmail.com"
"user2","test3@gmail.com"
"user4","test4@gmail.com"
"user5","test7@gmail.com"
"user5","test9@gmail.com"

What I have so far:

$ awk -F, -v q='"' 'NR==FNR{a[q $0 q]; next}
                     $2 in a' values.txt data.csv > test1.csv

This only works when values.txt holds the full email address, e.g. test9@gmail.com, and not just test9@. The new file test1.csv then contains:

"user5","test9@gmail.com"
 ....
 ....

I can't figure out how to match on a partial substring with awk.

Answer 1

You can use this awk:

awk -F, 'NR==FNR {a[$0]; next} {ea = $2; gsub(/^"|@.*$/, "", ea)} ea "@" in a' values.txt data.csv

"user","test@gmail.com"
"user1","test1@gmail.com"
"user2","test3@gmail.com"
"user4","test4@gmail.com"
"user5","test7@gmail.com"
"user5","test9@gmail.com"

A more readable version:

awk -F, 'NR == FNR {
   a[$0]                 # from values.txt store each value in array a
   next
}
{
   ea = $2               # copy $2 into ea (email address)
   gsub(/^"|@.*$/, "", ea) # strip starting " and text after @
}
ea "@" in a                # check if ea + "@" exists in array a
' values.txt data.csv
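
One caveat with this exact lookup: a key in values.txt that carries trailing whitespace (as the test7@ line in the question appears to) is stored with those spaces and never equals ea "@". The following is a minimal sketch, assuming trailing blanks in values.txt should be ignored. The sample files are abbreviated from the question and written to a temporary directory only to keep the example self-contained; the variable names key, lookup and result are illustrative.

```shell
#!/bin/sh
# Scratch directory so the example does not touch real files.
tmpdir=$(mktemp -d)

# Abbreviated sample data from the question; "test7@" has trailing spaces.
printf '%s\n' 'test@' 'test7@    ' 'test9@' > "$tmpdir/values.txt"
printf '%s\n' '"username","email"' \
  '"user","test@gmail.com"' \
  '"user456","loka@gmail.com"' \
  '"user5","test7@gmail.com"' \
  '"user5","test9@gmail.com"' > "$tmpdir/data.csv"

result=$(awk -F, '
NR == FNR {
   key = $0
   sub(/[ \t\r]+$/, "", key)   # drop trailing blanks (and a stray CR) from the key
   if (key != "") a[key]       # store the cleaned key, skipping empty lines
   next
}
{
   ea = $2
   gsub(/^"|@.*$/, "", ea)     # strip the leading quote and everything from @ on
   lookup = ea "@"             # rebuild the "name@" form used in values.txt
}
lookup in a                    # print the line when the key exists
' "$tmpdir/values.txt" "$tmpdir/data.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this sample data the test@, test7@ and test9@ rows are printed, and the header and the loka@ row are skipped.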

Answer 2

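Could you please try the following, written and tested with the shown samples in GNU awk. It looks like a few of your lines have trailing spaces; in case you want to remove them before matching the file contents, I have added gsub(/ +$/,"") to my solution.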

awk '
{ gsub(/ +$/,"") }
FNR==NR{
  arr[$0]
  next
}
{
  for(key in arr){
    if(index($2,key)){
      print
      next
    }
  }
}' values.txt FS="," data.csv

Explanation: a detailed explanation of the above.

awk '                               ##Start the awk program here.
{ gsub(/ +$/,"") }                  ##Use gsub to remove trailing spaces from each line.
FNR==NR{                            ##This condition is TRUE only while values.txt is being read.
  arr[$0]                           ##Create an entry in arr indexed by the current line.
  next                              ##next skips all further statements for this line.
}
{
  for(key in arr){                  ##Loop over the keys stored in arr.
    if(index($2,key)){              ##Check whether key occurs anywhere in the 2nd field.
      print                         ##Print the current line.
      next                          ##next skips the remaining keys and statements for this line.
    }
  }
}' values.txt FS="," data.csv       ##Input file names; FS="," takes effect before data.csv is read.
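
Note that index($2,key) is true whenever key occurs anywhere in the second field, so a key such as test@ would also select an address like mytest@gmail.com. If only addresses that begin with the key should match, one option is to prepend the opening quote to each key and require the match at position 1. This is a minimal sketch under that assumption; the row with mytest@gmail.com is hypothetical, added for illustration and not part of the question's data, and the variable name anchored is likewise illustrative.

```shell
#!/bin/sh
# Scratch directory so the example does not touch real files.
tmpdir=$(mktemp -d)

printf '%s\n' 'test@' 'test9@' > "$tmpdir/values.txt"
printf '%s\n' '"username","email"' \
  '"user","test@gmail.com"' \
  '"userX","mytest@gmail.com"' \
  '"user5","test9@gmail.com"' > "$tmpdir/data.csv"

anchored=$(awk '
{ gsub(/ +$/, "") }                 # remove trailing spaces from each line
FNR == NR {
  if ($0 != "") arr["\"" $0]        # store the key with the opening quote; skip empty lines
  next
}
{
  for (key in arr) {
    if (index($2, key) == 1) {      # key must sit at the very start of the 2nd field
      print
      next
    }
  }
}' "$tmpdir/values.txt" FS="," "$tmpdir/data.csv")

printf '%s\n' "$anchored"
rm -rf "$tmpdir"
```

Here the test@ and test9@ rows are printed while the hypothetical mytest@ row is not. The loop still costs one index() call per key for every data line, so for large key lists the array lookup from Answer 1 is likely the cheaper approach.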
