Create sets in Perl

I'm trying to find the sets for three input conditions (see the attached image).

For example:

C1:

I
want
to
create
a
set
in
perl
with
some
values

C2:

how
to
create
set
these
values

C3:

a
set
in
perl
with
values
like
these

This would produce a Venn diagram like the one in the attached image.

I know how to do this the clunky way, case by case:

use warnings;
use strict;

open my $C1, '<', 'C1.txt' or die "Can't read C1.txt [$!]\n";
open my $C2, '<', 'C2.txt' or die "Can't read C2.txt [$!]\n";
open my $C3, '<', 'C3.txt' or die "Can't read C3.txt [$!]\n";

# Lines are not chomped here, so every key keeps its trailing "\n".
my (%c1_vals, %c2_vals, %c3_vals);
$c1_vals{$_}++ while (<$C1>);
$c2_vals{$_}++ while (<$C2>);
$c3_vals{$_}++ while (<$C3>);

my $c1_c2_count = 0;   # in C1 and C2 but not C3
my $c1_c3_count = 0;   # in C1 and C3 but not C2
my $c1 = 0;            # in C1 only
my $total = 0;         # everything in C1
my $all = 0;           # in C1, C2 and C3

for my $val (keys %c1_vals) {
    $total++;
    $c1++ if not $c2_vals{$val} and not $c3_vals{$val};
    $c1_c2_count++ if $c2_vals{$val} and not $c3_vals{$val};
    $c1_c3_count++ if $c3_vals{$val} and not $c2_vals{$val};
    $all++ if $c2_vals{$val} and $c3_vals{$val};
}
print "c1 total = $total\n";
print "c1 = $c1\n";
print "c1 + c2  = $c1_c2_count\n";
print "c1 + c3 = $c1_c3_count\n";
print "c1+c2+c3 = $all\n";

Output:

c1 total = 11
c1 = 4
c1 + c2  = 2
c1 + c3 = 4
c1+c2+c3 = 1

But I'd like to know whether there is a simpler way, using a subroutine that reads each file from @ARGV and counts each set.

I've got this far, but can't think of an elegant way to finish it:

my %total;    # declared before parse() is first called

parse($_) foreach @ARGV;

sub parse {
    my $file = shift;
    open my $list, '<', $file or die "Can't read file '$file' [$!]\n";
    while (<$list>) {
        chomp;
        $total{$_}++;    # occurrences of each value across all files
    }
}
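One way the parse() idea could be finished, as a minimal sketch: instead of a single counter per value, record which sets each value appears in, then tally the exact combinations. This assumes one value per line and that a set's name is its file name without directory or extension (so c1.txt becomes c1); membership_counts() and %seen_in are hypothetical names introduced here, and only core Perl is used.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my %seen_in;    # value => { set name => 1, ... }

# Read one file and record which set (file name minus directory and extension)
# each value belongs to. A value repeated inside one file is recorded once.
sub parse {
    my $file = shift;
    my $name = basename($file);
    $name =~ s/\.[^.]+$//;                      # "c1.txt" -> "c1"
    open my $list, '<', $file or die "Can't read file '$file' [$!]\n";
    while (<$list>) {
        chomp;
        $seen_in{$_}{$name} = 1;
    }
    close $list;
}

# Hypothetical helper: count values by the exact combination of sets they
# occur in, keyed by labels such as "c1", "c1+c3" or "c1+c2+c3".
sub membership_counts {
    my %count;
    for my $sets (values %seen_in) {
        $count{ join '+', sort keys %$sets }++;
    }
    return \%count;
}

parse($_) foreach @ARGV;
my $counts = membership_counts();
printf "%s = %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Because every region of the Venn diagram becomes its own key, this gives all 7 numbers for three files (and all 2^n - 1 for n files) without special-casing any combination.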

Any help would be greatly appreciated!

Update

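To be clear, I want to find every intersection across all 3 datasets (all the numbers in the Venn diagram, 7 in total). I don't want to use modules, because I want to build this into a larger program without changing too much.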

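As long as you keep it to 32-64 sets or fewer (depending on the size of your Perl's integers), this is probably easier with bitwise operations: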

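# $C1, $C2 and $C3 are the filehandles opened as in the question.
my %c_vals;
while (<$C1>) { chomp; $c_vals{$_} |= 1 }   # bit 0: value is in C1
while (<$C2>) { chomp; $c_vals{$_} |= 2 }   # bit 1: value is in C2
while (<$C3>) { chomp; $c_vals{$_} |= 4 }   # bit 2: value is in C3

my $total       = grep { $_ & 1 }  values %c_vals;   # everything in C1
my $c1          = grep { $_ == 1 } values %c_vals;   # C1 only
my $c1_c2_count = grep { $_ == 3 } values %c_vals;   # C1 and C2, not C3
my $c1_c3_count = grep { $_ == 5 } values %c_vals;   # C1 and C3, not C2
my $all         = grep { $_ == 7 } values %c_vals;   # C1, C2 and C3

print "c1 total = $total\n";
print "c1 = $c1\n";
print "c1 + c2  = $c1_c2_count\n";
print "c1 + c3 = $c1_c3_count\n";
print "c1+c2+c3 = $all\n";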

...

my @count_in_set;
foreach my $val (values %c_vals) {
    $count_in_set[$val]++;               # tally values by membership bitmask (1..7)
}
for my $i (1 .. 7) {
    printf "Count in set %03b: %d\n", $i, $count_in_set[$i] // 0;
}
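To check the bitmask idea against the sample data from the question, here is a self-contained sketch that keeps the three lists in memory (an assumption so it runs without files). Bit 0 stands for C1, bit 1 for C2 and bit 2 for C3, so in the printed binary the rightmost digit is C1; for example `101` means "in C1 and C3 but not C2".

```perl
use strict;
use warnings;

# Sample data from the question, held in memory instead of files.
my @sets = (
    [qw(I want to create a set in perl with some values)],   # C1 -> bit 0 (1)
    [qw(how to create set these values)],                    # C2 -> bit 1 (2)
    [qw(a set in perl with values like these)],              # C3 -> bit 2 (4)
);

# Build value => membership bitmask.
my %mask;
for my $i (0 .. $#sets) {
    $mask{$_} |= 1 << $i for @{ $sets[$i] };
}

# Tally how many values fall into each of the 7 regions (masks 1..7).
my @count_in_set = (0) x 8;
$count_in_set[$_]++ for values %mask;

printf "Count in set %03b: %d\n", $_, $count_in_set[$_] for 1 .. 7;
```

With this data it prints 3, 1, 2, 1, 4, 1 and 2 for masks 001 through 111. The 001, 011, 101 and 111 counts (3, 2, 4, 2) are the "c1", "c1 + c2", "c1 + c3" and "c1+c2+c3" numbers; the other three masks are the C2/C3-only regions of the diagram.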

In the general case:

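my %vals;
my $n = 0;                                   # index of the current file (bit position)
foreach my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't read '$file': $!\n";
    chomp(my @lines = <$fh>);
    $vals{$_} |= 1 << $n for @lines;         # set bit $n for every value in this file
    $n++;
}
my @count_in_set;
foreach my $val (values %vals) {
    $count_in_set[$val]++;                   # tally values by membership bitmask
}
for my $i (1 .. (1 << $n) - 1) {
    printf "Count in set %0*b: %d\n", $n, $i, $count_in_set[$i] // 0;
}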
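The general version labels each region with a binary mask. If readable labels are preferred, a mask can be turned back into set names; the sketch below assumes `@names` holds the set names in command-line order (bit `$i` is `$names[$i]`), and `mask_label` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a membership bitmask into a label such as "c1+c3".
# Bit $i corresponds to $names[$i], i.e. the $i-th file on the command line.
sub mask_label {
    my ($mask, @names) = @_;
    return join '+', map { $names[$_] } grep { $mask & (1 << $_) } 0 .. $#names;
}

my @names = qw(c1 c2 c3);    # e.g. the file names with their extension stripped
for my $mask (1 .. (1 << @names) - 1) {
    printf "%-9s (%0*b)\n", mask_label($mask, @names), scalar @names, $mask;
}
```

For three sets this lists c1 (001), c2 (010), c1+c2 (011), c3 (100), c1+c3 (101), c2+c3 (110) and c1+c2+c3 (111), so it could replace the `%0*b` label in the counting loop.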

The CPAN module List::Compare provides a convenient API for operations on n lists, including list intersection.

For reading the files, File::Slurp provides a simple API that returns an array reference. A complete example:

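use strict;
use warnings;
use List::Compare;
use File::Slurp;

my @lists = ();
# read_file(..., array_ref => 1) returns a reference to an array of lines (newlines kept)
push(@lists, read_file( $_, array_ref => 1 ) ) foreach @ARGV;

my @intersection = List::Compare->new(@lists)->get_intersection();

print join('', @intersection);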
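Since the question asks to avoid modules, the same intersection can also be sketched in core Perl with a hash that counts in how many lists each value appears. This is a swapped-in technique, not List::Compare's implementation; `intersection` is a hypothetical helper, and the result is returned sorted.

```perl
use strict;
use warnings;

# Hypothetical helper: values present in every one of the given lists (core Perl only).
sub intersection {
    my @lists = @_;
    my %count;
    for my $list (@lists) {
        my %seen;
        $count{$_}++ for grep { !$seen{$_}++ } @$list;   # count each value once per list
    }
    return grep { $count{$_} == @lists } sort keys %count;
}

# Read each file named on the command line into a list of chomped lines.
my @lists;
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't read '$file': $!\n";
    chomp(my @lines = <$fh>);
    push @lists, \@lines;
}
print "$_\n" for intersection(@lists);
```

The per-list `%seen` hash matters: without it, a value listed twice in one file could reach the required count without being present in every file.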

Example usage: intersection.pl l1.txt l2.txt l3.txt

Output:

set
values

This program does what I need it to do.

It reads the lists from @ARGV and prints all 5 intersections for a given set. If it is run as perl set.pl c1 c2 c3, and the user enters c1 as the 'primary' set, the sets are defined as follows:

SetA: C1

SetB: C1 + C2

SetC: C1 + C3

SetD: C1 + C2 + C3

use warnings;
use strict;

unless (@ARGV == 3) {
    usage();
    exit 1;
}

print "Enter primary set: ";
chomp(my $set = <STDIN>);
# Note: $set is not used below; the first file on the command line is always the primary set.

my (%c1_vals, %c2_vals, %c3_vals);

my $count = 0;           # number of files parsed so far
my $c;                   # name of the file currently being parsed
my ($c1, $c2, $c3);      # set names (file names without extension)

parse($_) foreach @ARGV;

my $c1_c2_count = 0;
my $c1_c3_count = 0;
my $cond1 = 0;
my $total = 0;
my $all = 0;

for my $item (keys %c1_vals) {
    $total++;
    if (not $c2_vals{$item} and not $c3_vals{$item}) {
        $cond1++;                        # only in the primary set
    }
    if ($c2_vals{$item} and not $c3_vals{$item}) {
        $c1_c2_count++;
    }
    if ($c3_vals{$item} and not $c2_vals{$item}) {
        $c1_c3_count++;
    }
    if ($c2_vals{$item} and $c3_vals{$item}) {
        $all++;
    }
}

# print numbers for each set
print "$c1 total = $total\n";
print "$c1 = $cond1\n";
print "$c1 + $c2  = $c1_c2_count\n";
print "$c1 + $c3 = $c1_c3_count\n";
print "$c1+$c2+$c3 = $all\n";
my $check = $cond1 + $c1_c2_count + $c1_c3_count + $all;
print "check = $check\n";

# Read in each file. $ARGV[0] is the 'primary' set (i.e. the one for which intersecting lists are found).
sub parse {
    $count++;
    my $file = shift;
    ($c = $file) =~ s/\.[^.]+$//;              # "c1.txt" -> "c1"
    open my $list, '<', $file or die "Can't read file '$file' [$!]\n";
    while (<$list>) {
        chomp;
        my @split = split(/\t/);               # only the first tab-separated column is used
        if    ($count == 1) { $c1_vals{$split[0]}++; $c1 = $c; }
        elsif ($count == 2) { $c2_vals{$split[0]}++; $c2 = $c; }
        elsif ($count == 3) { $c3_vals{$split[0]}++; $c3 = $c; }
    }
}

sub usage {
    print "Usage: set.pl <list1> <list2> <list3>\n";
    print "Calculates intersections between different sets\n";
}

When run as perl set.pl c1.txt c2.txt c3.txt, it gives:

Enter primary set: c1

c1 total = 11
c1 = 3
c1 + c2  = 2
c1 + c3 = 4
c1+c2+c3 = 2
check = 11
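In the program above, the answer to "Enter primary set" is read into $set but never used: whichever file comes first on the command line is treated as primary. A minimal sketch of one way to honour the input is to reorder the file list before parsing. It assumes set names are file names without directory or extension; `set_name` and `order_by_primary` are hypothetical helpers.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: "data/c1.txt" -> "c1".
sub set_name {
    my $name = basename(shift);
    $name =~ s/\.[^.]+$//;
    return $name;
}

# Hypothetical helper: move the file whose set name matches $primary to the
# front, keeping the remaining files in their original order.
sub order_by_primary {
    my ($primary, @files) = @_;
    my @first = grep { set_name($_) eq $primary } @files;
    die "Expected exactly one file for primary set '$primary'\n" unless @first == 1;
    return (@first, grep { set_name($_) ne $primary } @files);
}

print join(' ', order_by_primary('c2', qw(c1.txt c2.txt c3.txt))), "\n";
```

In set.pl this would be a single line after reading $set and before the parse loop, e.g. `@ARGV = order_by_primary($set, @ARGV);`, so that the existing "first file is primary" logic applies to the set the user typed.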