Print and use the data from two files simultaneously

Question

pdb1.pdb

ATOM    709  CA  THR    25     -29.789  33.001  72.164  1.00  0.00
ATOM    711  CB  THR    25     -29.013  31.703  72.370  1.00  0.00
ATOM    734  CG  THR    25     -29.838  30.458  72.573  1.00  0.00
ATOM    768  CE  THR    25     -28.541  28.330  71.361  1.00  0.00

pdb2.pdb

ATOM    765  N   ALA    25     -30.838  33.150  73.195  1.00  0.00
ATOM    764  N   LEU    26     -29.457  33.193  69.767  1.00  0.00
ATOM    783  N   VAL    27     -30.286  31.938  66.438  1.00  0.00
ATOM    798  N   GLY    28     -28.076  30.044  64.519  1.00  0.00

Desired output

709 CA 765 N 1.477 -29.789 33.001 72.164 -30.838 33.150 73.195
709 CA 764 N 2.427 -29.789 33.001 72.164 -29.457 33.193 69.767
709 CA 783 N 5.844 -29.789 33.001 72.164 -30.286 31.938 66.438

and so on.

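From both pdb1.pdb and pdb2.pdb I want to read the values in columns 2, 3, 6, 7 and 8, and then use columns 6, 7 and 8 (the x, y and z coordinates) to calculate the distance between each pair of atoms.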
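Assuming "distance" here means the ordinary Euclidean distance between the two (x, y, z) points (the question does not state the formula, but Answer 2 computes it this way), a minimal Perl sketch for a single pair could look like this. The helper name distance is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: Euclidean distance between two points,
# each given as a reference to an (x, y, z) array.
sub distance {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->[$_] - $q->[$_])**2 for 0 .. 2;
    return sqrt($sum);
}

# Atom 709 CA from pdb1.pdb and atom 765 N from pdb2.pdb
my @ca = (-29.789, 33.001, 72.164);
my @n  = (-30.838, 33.150, 73.195);
printf "%.3f\n", distance(\@ca, \@n);    # prints 1.478
```

Rounded to three decimals, this agrees with the first line of Answer 2's output.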

I tried the following, but it does not print anything.

Perl

open( f1, "pdb1.pdb" or die $! );
open( f2, "pdb2.pdb" or die $! );

while ( ( $line1 = <$f1> ) and ( $line2 = <$f2> ) ) {

    @splitted = split( ' ', $line1 );

    my @fields = split / /, $line1;

    print $fields[1], "\n";

    my $atom1 = @{ [ $line1 =~ m/\S+/g ] }[2];
    my $no1   = @{ [ $line1 =~ m/\w+/g ] }[3];

    my $x1 = @{ [ $line1 =~ m/\w+/g ] }[6];
    my $y1 = @{ [ $line1 =~ m/\w+/g ] }[7];
    my $z1 = @{ [ $line1 =~ m/\w+/g ] }[8];

    my $atom2 = @{ [ $line2 =~ m/\w+/g ] }[2];
    my $no2   = @{ [ $line2 =~ m/\w+/g ] }[3];

    my $x2 = @{ [ $line2 =~ m/\w+/g ] }[6];
    my $y2 = @{ [ $line2 =~ m/\w+/g ] }[7];
    my $z2 = @{ [ $line2 =~ m/\w+/g ] }[8];

    print $atom1;

    for ( $f1, $f2 ) {
        print $atom1 $no1 $x1 $y1 $z1 $atom2 $no2 $x2 $y2 $z2 "\n";
    }
}

close( $f1 );
close( $f2 );

Answer 1

Your code has a lot of syntax errors. I have made some changes to it that should get you started on what you want.

First, add use strict and use warnings at the top. That way Perl itself points out many of the problems, and a lot of the noise goes away.

use strict;
use warnings;

open(my $f1, "pdb1.pdb") or die $!;    
open(my $f2, "pdb2.pdb") or die $!;

while(defined(my $line1 = <$f1>) and defined(my $line2 = <$f2>))
{
    # print "I am here";
    my @splitted = split(' ', $line1);

    my @fields = split / /, $line1;

    #print $fields[1], "\n";

    # Note: \w+ matches neither "-" nor ".", so these regexes break a
    # coordinate such as -29.789 into the two matches "29" and "789".
    my $atom1 = @{[$line1 =~ m/\S+/g]}[2];
    my $no1   = @{[$line1 =~ m/\w+/g]}[3];

    my $x1 = @{[$line1 =~ m/\w+/g]}[6];
    my $y1 = @{[$line1 =~ m/\w+/g]}[7];
    my $z1 = @{[$line1 =~ m/\w+/g]}[8];

    my $atom2 = @{[$line2 =~ m/\w+/g]}[2];
    my $no2   = @{[$line2 =~ m/\w+/g]}[3];

    my $x2 = @{[$line2 =~ m/\w+/g]}[6];
    my $y2 = @{[$line2 =~ m/\w+/g]}[7];
    my $z2 = @{[$line2 =~ m/\w+/g]}[8];

    #print $atom1;

    # This loop runs once per filehandle, so each line is printed twice.
    for ($f1, $f2) {
        print "$atom1 $no1 $x1 $y1 $z1 $atom2 $no2 $x2 $y2 $z2\n";
    }
}

close ($f1);
close ($f2);
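One of the changes above is to the open calls. In the question, or die $! sits inside open's parentheses, where it tests the argument expression (which contains the always-true string "pdb1.pdb") instead of open's return value, so it can never fire when open fails. The file is also opened as the bareword handle f1 but read through the variable $f1, which is never assigned. A minimal sketch of the usual idiom, a three-argument open into a lexical filehandle with or die outside the parentheses, wrapped in a hypothetical helper open_for_reading:

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with a message that
# names the file. "or die" applies to the value returned by open().
sub open_for_reading {
    my ($name) = @_;
    open(my $fh, '<', $name) or die "Cannot open '$name': $!\n";
    return $fh;    # a lexical filehandle, read later with <$fh>
}

# A missing file now produces a clear error instead of failing silently.
my $ok = eval { open_for_reading('no-such-file.pdb'); 1 };
print $ok ? "opened\n" : "caught: $@";
```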

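Now to your question: your expected output does not match your logic. You are looping over both files at the same time, which pairs them up one-to-one (line 1 of file1 with line 1 of file2, line 2 with line 2, and so on) instead of pairing each line of file1 with every line of file2. So I think you need to rethink the looping part.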
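To see the difference between the two loop shapes, here is a small sketch that uses only the atom numbers from the sample files, hard-coded here in place of reading the files:

```perl
use strict;
use warnings;

my @file1 = qw(709 711 734 768);    # atom numbers from pdb1.pdb
my @file2 = qw(765 764 783 798);    # atom numbers from pdb2.pdb

# Lockstep: one line from each file per iteration, up to the shorter file
# (this is what the question's while loop does).
my @lockstep;
for my $i (0 .. ($#file1 < $#file2 ? $#file1 : $#file2)) {
    push @lockstep, "$file1[$i]-$file2[$i]";
}

# Nested: every line of file1 paired with every line of file2
# (this is what the desired output needs).
my @all_pairs;
for my $a1 (@file1) {
    for my $a2 (@file2) {
        push @all_pairs, "$a1-$a2";
    }
}

print scalar(@lockstep), " lockstep pairs: @lockstep\n";
print scalar(@all_pairs), " nested pairs, starting with: @all_pairs[0 .. 3]\n";
```

With four lines in each file, the lockstep loop yields 4 pairs and the nested loop yields 16, matching the 16 lines in Answer 2's output.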

The next thing you need to understand is how to split a line into columns.

@splitted = split(' ',$line1);

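If you split a line as above, you get all of its columns in an array. Column 1 is then at index 0, column 2 at index 1, and so on. (The single-space string ' ' is a special case for split: it splits on runs of whitespace and ignores leading whitespace. The pattern / /, which your code also uses, splits on every single space and therefore produces empty fields wherever several spaces occur in a row.)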
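A quick illustration of the difference, using the first line of pdb1.pdb as a hard-coded string:

```perl
use strict;
use warnings;

my $line = "ATOM    709  CA  THR    25     -29.789  33.001  72.164  1.00  0.00";

# ' ' is special: split on runs of whitespace, ignoring leading whitespace.
my @by_whitespace = split ' ', $line;

# / / is an ordinary pattern: every single space is a separator,
# so runs of spaces produce empty fields.
my @by_single_space = split / /, $line;

print scalar(@by_whitespace), " fields: @by_whitespace[0 .. 2]\n";    # 10 fields: ATOM 709 CA
print "second field with / /: '$by_single_space[1]'\n";              # ''
```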

So to get the first column you would do:

my $col1 = $splitted[0];

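If you are using those regular expressions only to get at the columns, you don't need them: you are already splitting the line, and each column is a separate element of the array.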
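For example, a list slice on the result of split picks out columns 2, 3, 6, 7 and 8 (indices 1, 2, 5, 6 and 7) in one step, whereas a \w+ regex cannot match the minus sign or the decimal point in a coordinate. A short sketch, again on a hard-coded line:

```perl
use strict;
use warnings;

my $line = "ATOM    709  CA  THR    25     -29.789  33.001  72.164  1.00  0.00";

# Columns 2, 3, 6, 7, 8 are indices 1, 2, 5, 6, 7 after split ' '.
my ($atom_no, $atom_name, $x, $y, $z) = (split ' ', $line)[1, 2, 5, 6, 7];
print "$atom_no $atom_name $x $y $z\n";    # 709 CA -29.789 33.001 72.164

# By contrast, \w+ stops at '-' and '.', so the coordinates fall apart.
my @words = $line =~ m/\w+/g;
print "regex index 6: $words[6]\n";        # 789, not -29.789
```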

Update:

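The problem you are running into is that you are iterating over the filehandles themselves. Reading from a filehandle consumes it: once the inner loop has read file2 to the end, nothing is left to pair with the next line of file1. Reading both files into arrays first, and iterating over the arrays, avoids this.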
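A minimal sketch of the effect, using in-memory filehandles (open on a reference to a string) as stand-ins for the two files:

```perl
use strict;
use warnings;

# In-memory "files" stand in for pdb1.pdb and pdb2.pdb.
my $data1 = "709\n711\n734\n";
my $data2 = "765\n764\n";
open(my $f1, '<', \$data1) or die $!;
open(my $f2, '<', \$data2) or die $!;

# Nested loop over filehandles: the inner loop reads $f2 to end-of-file on
# the first outer iteration, so later outer lines find nothing to pair with.
my @pairs;
while (my $l1 = <$f1>) {
    chomp $l1;
    while (my $l2 = <$f2>) {
        chomp $l2;
        push @pairs, "$l1-$l2";
    }
}
print "@pairs\n";    # 709-765 709-764
```

Rewinding with seek($f2, 0, 0) before each inner loop would also work, but reading the files into arrays once is simpler and avoids re-reading file2 for every line of file1.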

use strict;
use warnings;

open(my $f1, '<', 'pdb1.pdb') or die "Cannot open pdb1.pdb: $!";
open(my $f2, '<', 'pdb2.pdb') or die "Cannot open pdb2.pdb: $!";
my @in1 = <$f1>;    # read every line into memory, so each file
my @in2 = <$f2>;    # can be iterated over as often as needed
close($f1);
close($f2);
chomp(@in1, @in2);

foreach my $line1 (@in1) {        # use the array to iterate
    next unless $line1 =~ /\S/;   # skip blank lines
    # columns 2, 3, 6, 7, 8 are indices 1, 2, 5, 6, 7
    my ($atomno1, $atomname1, $x1, $y1, $z1) = (split " ", $line1)[1, 2, 5, 6, 7];

    foreach my $line2 (@in2) {
        next unless $line2 =~ /\S/;
        my ($atomno2, $atomname2, $x2, $y2, $z2) = (split " ", $line2)[1, 2, 5, 6, 7];

        # distance in three dimensions (x, y and z)
        my $dis = sqrt(($x2 - $x1)**2 + ($y2 - $y1)**2 + ($z2 - $z1)**2);
        printf "%s %s %s %s %.3f %s %s %s %s %s %s\n",
            $atomno1, $atomname1, $atomno2, $atomname2, $dis,
            $x1, $y1, $z1, $x2, $y2, $z2;
    }
}

Answer 2

It is probably simplest to read both files into memory, unless they are very large.

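This solution calls a subroutine read_file that builds, for each file, an array of hashes holding the five fields of interest. It then calculates the distance (delta) between every pair of records and reformats the output data.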
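The subroutine relies on two Perl features: split with no arguments, which splits $_ on whitespace exactly like split ' ', and hash slices, which assign or read several keys at once. A small sketch on a single hard-coded line from pdb2.pdb; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $line = "ATOM    765  N   ALA    25     -30.838  33.150  73.195  1.00  0.00";

# A hash slice assigns several keys at once from a list slice of the split fields.
my %record;
@record{qw/ id name x y z /} = (split ' ', $line)[1, 2, 5, 6, 7];

# An array of hash references, one per input line, as read_file builds.
my @records = (\%record);

# @{$r}{...} is a hash slice through a reference, as used when printing.
my $r = $records[0];
print join(' ', @{$r}{qw/ id name /}), "\n";    # 765 N
print join(' ', @{$r}{qw/ x y z /}), "\n";      # -30.838 33.150 73.195
```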

use strict;
use warnings 'all';

my $f1 = read_file('pdb1.pdb');
my $f2 = read_file('pdb2.pdb');

for my $r1 ( @$f1 ) {

    for my $r2 ( @$f2 ) {

        my ($dx, $dy, $dz) = map { $r1->{$_} - $r2->{$_} } qw/ x y z /;
        my $delta = sqrt( $dx * $dx + $dy * $dy + $dz * $dz );

        my @rec = (
            @{$r1}{qw/ id name /},
            @{$r2}{qw/ id name /},
            sprintf('%5.3f', $delta),
            @{$r1}{qw/ x y z /},
            @{$r2}{qw/ x y z /},
        );

        print "@rec\n";
    }
}

sub read_file {
    my ($file_name) = @_;

    open my $fh, '<', $file_name or die qq{Unable to open "$file_name" for input: $!};

    my @records;

    while ( <$fh> ) {
        next unless /\S/;
        my %record;
        @record{qw/ id name x y z /} = (split)[1,2,5,6,7];
        push @records, \%record;
    }

    \@records;
}

Output

709 CA 765 N 1.478 -29.789 33.001 72.164 -30.838 33.150 73.195
709 CA 764 N 2.427 -29.789 33.001 72.164 -29.457 33.193 69.767
709 CA 783 N 5.845 -29.789 33.001 72.164 -30.286 31.938 66.438
709 CA 798 N 8.374 -29.789 33.001 72.164 -28.076 30.044 64.519
711 CB 765 N 2.471 -29.013 31.703 72.370 -30.838 33.150 73.195
711 CB 764 N 3.032 -29.013 31.703 72.370 -29.457 33.193 69.767
711 CB 783 N 6.072 -29.013 31.703 72.370 -30.286 31.938 66.438
711 CB 798 N 8.079 -29.013 31.703 72.370 -28.076 30.044 64.519
734 CG 765 N 2.938 -29.838 30.458 72.573 -30.838 33.150 73.195
734 CG 764 N 3.937 -29.838 30.458 72.573 -29.457 33.193 69.767
734 CG 783 N 6.327 -29.838 30.458 72.573 -30.286 31.938 66.438
734 CG 798 N 8.255 -29.838 30.458 72.573 -28.076 30.044 64.519
768 CE 765 N 5.646 -28.541 28.330 71.361 -30.838 33.150 73.195
768 CE 764 N 5.199 -28.541 28.330 71.361 -29.457 33.193 69.767
768 CE 783 N 6.348 -28.541 28.330 71.361 -30.286 31.938 66.438
768 CE 798 N 7.069 -28.541 28.330 71.361 -28.076 30.044 64.519
