Match substring from a list to multiple columns in another file
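I'm new to Linux and Perl programming. I've exhausted my search options without finding an answer.
I have a master file, "master.txt", with two columns listing all known interactions: the two items on the same line are known to interact. I also have a list of items, "list.txt", that I want to use as search criteria, returning only the rows of the master file where listed items appear in both column 1 and column 2. All files are tab-delimited. For example:
If this is the master file, "master.txt":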
AppleP001 BallP002
AppleP002 CatP001
BallP001 DogP001
BallP002 AppleP001
CatP001 AppleP002
DogP001 BallP001
DogP002 ZebraP001
ElephantP001 CardinalP001
FishP001 AntelopeP001
And this is the search file, "list.txt":
Apple
Ball
Cat
Dog
The resulting file should contain only Apple*, Ball*, Cat* and Dog* in both columns, with duplicates removed.
I tried using grep:
grep -f list.txt master.txt > Sub_list.txt
But I get:
AppleP001 BallP002
AppleP002 CatP001
BallP001 DogP001
BallP002 AppleP001
CatP001 AppleP002
DogP001 BallP001
DogP002 ZebraP001
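How can I remove the duplicates (a line counts as a duplicate if it contains the same two items as another line, regardless of which column each item is in), drop the irrelevant rows from the output file, and get this?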
AppleP001 BallP002
AppleP002 CatP001
BallP001 DogP001
Any help is much appreciated! Thank you.
This is a little heavy if the files are huge, but the question doesn't mention that.
use warnings;
use strict;
use feature 'say';
use Path::Tiny;
use List::Util qw(uniq any all);
my ($file, $flist) = ('master.txt', 'list.txt');
my @search = path($flist)->lines({ chomp => 1 });
# Sort the words within each line, then filter out duplicate lines
my @filtered = uniq map { join ' ', sort split } path($file)->lines;
# Each word on the line needs to match a word in @search list
my @result = grep { all { found($_, \@search) } split } @filtered;
say for @result;
sub found { return any { $_[0] =~ /^$_/ } @{$_[1]} }
The output matches my understanding of the problem description:
AppleP001 BallP002
AppleP002 CatP001
BallP001 DogP001
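If for some reason you can't use Path::Tiny, which provides path, then instead of path(...)->lines open each file (checking that the open succeeded), read from the filehandle in list context, and do chomp @search;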
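The filehandle approach just described might look like the sketch below. It uses only core Perl. The helper name read_lines is my own choice for illustration and is not part of the code above.

```perl
use warnings;
use strict;

# read_lines is a hypothetical helper name (an assumption for illustration).
# It returns every line of the named file with the trailing newline removed.
sub read_lines {
    my ($name) = @_;
    open my $fh, '<', $name or die "Can't open $name: $!";
    my @lines = <$fh>;    # list context: reads all lines at once
    close $fh;
    chomp @lines;         # strip the newline from each element
    return @lines;
}
```

With that in place, my @search = read_lines($flist); stands in for the path($flist)->lines({ chomp => 1 }) call, and read_lines($file) can feed the map that builds @filtered.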
The last part, written out a little more explicitly:
# Each word on the line needs to match a word in @search list
my @result = grep {
my ($w1, $w2) = split;
any { $w1 =~ /^$_/ } @search and any { $w2 =~ /^$_/ } @search;
} @filtered;
Here is one in awk:
$ awk '
NR==FNR { a[$0]; next }      # read list and hash to a
{                            # process master
    b=""                     # reset buffer
    for(i in a)              # iterate thru a
        if(index($0,i)) {    # if list item is found in current master record
            b=$0             # set the record to buffer
            delete a[i]      # remove list entry from a
        }
    if(b) print b            # print b
}' list master               # mind the order
AppleP001 BallP002
AppleP002 CatP001
BallP001 DogP001