Compare columns from 2 files and print matching and non-matching rows in the same order as in file1, with YES/NO at the end of each row

Question

File 1

3 14573 ab712 A T
8 12099 ab002 G A
9 12874 ab790 A C
3 19879 ab734 G T

File 2

3 14573 ab712 A T
9 12874 ab790 A C

Expected output

3 14573 ab712 A T YES
8 12099 ab002 G A NO
9 12874 ab790 A C YES
3 19879 ab734 G T NO

I tried a Perl foreach loop over files 1 and 2.
The output it generated is as follows:

3 14573 ab712 A T YES
8 12099 ab002 G A NO
9 12874 ab790 A C NO
3 19879 ab734 G T NO
4 34565 ab992 C G NO
9 12874 ab790 A C YES
3 14573 ab712 A T NO
8 12099 ab002 G A NO
9 12874 ab790 A C NO
3 19879 ab734 G T NO
4 34565 ab992 C G NO

The script I tried:

foreach $arr1 (@arr1) {
  chomp $arr1;
  ($chr1, $pos1, $id1, $ref1, $alt1) = split(/\t/, $arr1);

  foreach $arr2 (@arr2) {
    chomp $arr2;  
    ($chr2, $pos2, $id2, $ref2, $alt2) = split(/\s/, $arr2);

    {
      if (($pos1 eq $pos2 ) && ($chr1 eq $chr2 )) {
        print "$chr1\t$pos1\t$ref1\t$alt1\tYES\n";
      } else {
        print "$chr1\t$pos1\t$ref1\t$alt1\tNO\n"
      }  
    }   
  }
}
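The nested loop prints one line for every (file1 row, file2 row) pair, so each file1 row is repeated once per file2 row, and at most one of those copies can say YES. Below is a minimal sketch of the same nested-loop idea that prints exactly once per file1 row. It records a match in a flag and prints only after the inner loop has finished. Like the question's script, it assumes that @arr1 and @arr2 hold the lines of file1 and file2 and that a match means equal chromosome and position. The sample lines are inlined only to keep the sketch self-contained. It splits both files on any whitespace (split ' ') rather than on /\t/ for one file and /\s/ for the other, in case the separators differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines standing in for the contents of file1 and file2.
my @arr1 = ("3 14573 ab712 A T\n", "8 12099 ab002 G A\n",
            "9 12874 ab790 A C\n", "3 19879 ab734 G T\n");
my @arr2 = ("3 14573 ab712 A T\n", "9 12874 ab790 A C\n");
chomp(@arr1, @arr2);

my @out;
for my $line1 (@arr1) {
    # split ' ' splits on runs of spaces or tabs
    my ($chr1, $pos1) = split ' ', $line1;
    my $found = 0;
    for my $line2 (@arr2) {
        my ($chr2, $pos2) = split ' ', $line2;
        if ($chr1 eq $chr2 && $pos1 eq $pos2) {
            $found = 1;
            last;    # one match is enough; stop scanning file2
        }
    }
    # Decide YES/NO only after the inner loop, so each file1 line appears once
    push @out, "$line1 " . ($found ? 'YES' : 'NO');
}
print "$_\n" for @out;
```

This still compares every file1 line against every file2 line. For large files, the hash lookup used in the answers scales better.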

Answer 1

You can read file2 into a hash and use it to look up the entries from file1.

Example:

#!/usr/bin/perl

use strict;
use warnings;
use Path::Tiny;

my @file1 = path("file1")->lines;
chomp @file1;
my %file2 = map {chomp; $_ => 1} path("file2")->lines;

for my $line (@file1) {
    print "$line " . (defined($file2{$line}) ? 'YES' : 'NO') . "\n";
}

If you only want to compare the first and second columns:

#!/usr/bin/perl

use strict;
use warnings;
use Path::Tiny;

my @file1 = path("file1")->lines;
chomp @file1;
my %file2 = map {my @f = split; $f[0].' '.$f[1] => 1} path("file2")->lines;

for my $line (@file1) {
    my @f=split/\s+/,$line;
    print "$line " . (defined($file2{$f[0].' '.$f[1]}) ? 'YES' : 'NO') . "\n";
}

Output in both cases:

3 14573 ab712 A T YES
8 12099 ab002 G A NO
9 12874 ab790 A C YES
3 19879 ab734 G T NO
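The full-line version compares lines byte for byte. A tab in one file and a space in the other therefore produces NO even when the fields agree. The question's script split file1 on /\t/ and file2 on /\s/, which hints that the separators might differ. Under that assumption, here is a minimal sketch that normalizes each line before building or probing the hash: it splits on whitespace and rejoins the fields with a single space. The helper name normalize is hypothetical, and the in-memory filehandles (opened on scalar references) stand in for the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rebuild a line from its whitespace-separated fields
# so that tabs, repeated spaces, and trailing whitespace do not affect matching.
sub normalize { return join ' ', split ' ', $_[0] }

# In-memory stand-ins for file1 (tab-separated) and file2 (space-separated).
my $file1_text = "3\t14573\tab712\tA\tT\n8\t12099\tab002\tG\tA\n"
               . "9\t12874\tab790\tA\tC\n3\t19879\tab734\tG\tT\n";
my $file2_text = "3 14573 ab712 A T\n9  12874 ab790 A C \n";

# Build the lookup hash from the normalized file2 lines.
open my $fh2, '<', \$file2_text or die $!;
my %file2;
while (my $line = <$fh2>) {
    $file2{ normalize($line) } = 1;
}

# Print each file1 line unchanged, followed by YES or NO.
open my $fh1, '<', \$file1_text or die $!;
my @out;
while (my $line = <$fh1>) {
    chomp $line;
    push @out, "$line " . ($file2{ normalize($line) } ? 'YES' : 'NO');
}
print "$_\n" for @out;
```

To use it on real data, open 'file1' and 'file2' instead of the scalar references.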

Answer 2

Your code is rather complicated, and I'm afraid I don't have time to work through it and correct what you're doing wrong.

However, I do have time to show you my solution (with comments):

#!/usr/bin/perl

# Always use these
use strict;
use warnings;

# Open file2...
open my $fh2, '<', 'file2' or die $!;

# ... and use its contents to construct a hash.
# The key of the hash is the line of data from the
# file (without the newline) and the value is the
# number 1.
# We can therefore use this hash to work out if a
# given line from file1 exists in file2.

my %file2 = map { chomp; $_ => 1 } <$fh2>;

# Open file1...
open my $fh1, '<', 'file1' or die $!;

# ... and process it a line at a time
while (<$fh1>) {
  # Remove the newline
  chomp;
  # Print the line
  print;
  # Find out if the line exists in file2
  # and print 'YES' or 'NO' as appropriate.
  print $file2{$_} ? ' YES' : ' NO';
  # Print a newline.
  print "\n";
}

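Update: Here is a version that matches only the first two fields of the input data. Given the sample input this shouldn't make any difference, but your code implies that this is what you want to match.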

#!/usr/bin/perl

# Always use these
use strict;
use warnings;

# Open file2...
open my $fh2, '<', 'file2' or die $!;

# ... and use its contents to construct a hash.
# The key of the hash is the first two fields from
# the line of data from the file and the value is the
# number 1.
# We can therefore use this hash to work out if a
# given line from file1 has the same first two fields
# as some line in file2.

my %file2 = map { join(' ', (split)[0,1]) => 1 } <$fh2>;

# Open file1...
open my $fh1, '<', 'file1' or die $!;

# ... and process it a line at a time
while (<$fh1>) {
  # Remove the newline
  chomp;
  # Print the line
  print;
  # Find out if the line's first two fields appear
  # in file2 and print 'YES' or 'NO' as appropriate.
  print $file2{join ' ', (split)[0,1]} ? ' YES' : ' NO';
  # Print a newline.
  print "\n";
}
