Perl: fast checking of overlapping intervals?
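I am trying to find overlapping intervals. I have one interval, 1000 to 5000 (only one is given as an example). It is checked against the intervals listed below. The script works, but it is very slow when there are thousands of intervals to check. Is there any way to make it faster? Thanks.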
#!/usr/bin/perl
use warnings;
use strict;
use v5.16;
use List::MoreUtils qw/ any /;
my $start = 1000;
my $end = 5000;
while ( my $line = <DATA> ) {
    chomp $line;
    my @element = split "\t", $line;
    my @checking_array = "";
    for my $checking_no ( $element[0] .. $element[1] ) {
        push @checking_array, $checking_no;
    }
    for my $value ( $start .. $end ) {
        if ( any { $_ eq $value } @checking_array ) {
            print "$start to $end found in $line\n";
            last;
        }
        else { next }
    }
}
__DATA__
780895 781139
3707570 3707794
13753925 13754168
2409582 2409790
6360880 6361084
8261045 8261250
4133539 4133772
7731897 7732188
8660252 8660539
12156253 12156504
9136875 9137168
16657849 16658107
5000 6000
4133539 4133772
7731897 7732188
8660252 8660539
4999 10000
12156253 12156504
3707570 3707794
13753925 13754168
2409582 2409790
6360880 6361084
Output:
1000 to 5000 found in 5000 6000
1000 to 5000 found in 4999 10000
You never need the numbers between the bounds! Just check the bounds.
            S---------E
  L-----H                         No overlap
        L-----H                   Overlap
              L-----H             Overlap
                   L-----H        Overlap
                         L----H   No overlap
         L---------------H        Overlap
So they overlap unless H < S or L > E.
while ( my $line = <DATA> ) {
    chomp $line;
    my ($lo, $hi) = split "\t", $line;
    if ( $lo <= $end && $hi >= $start ) {
        print "$start to $end found in $line\n";
    }
}
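The boundary test above can also be wrapped in a small helper and tried against one interval per row of the diagram. This is only a sketch: the sub name `overlaps` and the sample intervals are made up for illustration, and it assumes closed intervals with start <= end on both sides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns 1 when the
# closed intervals [$s, $e] and [$lo, $hi] share at least one point.
# Assumes $s <= $e and $lo <= $hi.
sub overlaps {
    my ($s, $e, $lo, $hi) = @_;
    return ( $lo <= $e && $hi >= $s ) ? 1 : 0;
}

# Illustrative intervals, one per row of the diagram, for S=1000, E=5000.
my @cases = (
    [  100,  900 ],    # entirely before S
    [  500, 1500 ],    # straddles S
    [ 2000, 3000 ],    # inside S..E
    [ 4500, 6000 ],    # straddles E
    [ 5001, 7000 ],    # entirely after E
    [  500, 9000 ],    # contains S..E
);

for my $case (@cases) {
    my ($lo, $hi) = @$case;
    printf "%5d %5d  %s\n", $lo, $hi,
        overlaps( 1000, 5000, $lo, $hi ) ? "Overlap" : "No overlap";
}
```

Each interval costs two numeric comparisons, no matter how wide it is, so thousands of intervals are checked in a single pass.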
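There is no need to check every value between $start and $end; you can simply compare the limits of the two ranges. I think this code is fairly straightforward.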
#!/usr/bin/perl
use strict;
use warnings 'all';
my $start = 1000;
my $end = 5000;
while ( my $line = <DATA> ) {
    my ($low, $high) = split ' ', $line;
    unless ( $high < $start or $low > $end ) {
        chomp $line;
        print qq{$start to $end found in "$line"\n};
    }
}
__DATA__
780895 781139
3707570 3707794
13753925 13754168
2409582 2409790
6360880 6361084
8261045 8261250
4133539 4133772
7731897 7732188
8660252 8660539
12156253 12156504
9136875 9137168
16657849 16658107
5000 6000
4133539 4133772
7731897 7732188
8660252 8660539
4999 10000
12156253 12156504
3707570 3707794
13753925 13754168
2409582 2409790
6360880 6361084
Output
1000 to 5000 found in "5000 6000"
1000 to 5000 found in "4999 10000"
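The `unless ( $high < $start or $low > $end )` test here and the `if ( $lo <= $end && $hi >= $start )` test in the other answer are the same condition by De Morgan's law. A minimal sketch that checks this on intervals around both boundaries follows; the sub names `overlap_if` and `overlap_unless` are hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ( $start, $end ) = ( 1000, 5000 );

# Hypothetical wrappers around the two conditions used in the answers.
sub overlap_if {
    my ( $lo, $hi ) = @_;
    return ( $lo <= $end && $hi >= $start ) ? 1 : 0;
}

sub overlap_unless {
    my ( $lo, $hi ) = @_;
    return 0 if $hi < $start or $lo > $end;
    return 1;
}

# Compare both forms on intervals near each boundary of $start..$end.
my $mismatches = 0;
for my $lo ( 990 .. 1010, 4990 .. 5010 ) {
    for my $hi ( $lo .. $lo + 20 ) {
        $mismatches++ if overlap_if( $lo, $hi ) != overlap_unless( $lo, $hi );
    }
}
print "mismatches: $mismatches\n";    # prints "mismatches: 0"
```

Which form to use is a matter of taste: the `unless` form reads as "skip the two non-overlapping cases", while the `if` form states the overlap condition directly.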