Create sets in Perl
I'm trying to find the sets for three input conditions (see attached image).
For example:
C1:
I
want
to
create
a
set
in
perl
with
some
values
C2:
how
to
create
set
these
values
C3:
a
set
in
perl
with
values
like
these
would produce a set diagram like this:
I know how to do this in a clumsy way for each condition:
use warnings;
use strict;
open my $C1, '<', 'C1.txt';
open my $C2, '<', 'C2.txt';
open my $C3, '<', 'C3.txt';
my (%c1_vals, %c2_vals, %c3_vals);
$c1_vals{$_}++ while(<$C1>);
$c2_vals{$_}++ while(<$C2>);
$c3_vals{$_}++ while(<$C3>);
my $c1_c2_count = 0;
my $c1_c3_count = 0;
my $c1 = 0;
my $total = 0;
my $all = 0;
for my $val (keys %c1_vals){
$total++;
$c1++ if not $c2_vals{$val} and not $c3_vals{$val};
$c1_c2_count++ if $c2_vals{$val} and not $c3_vals{$val};
$c1_c3_count++ if $c3_vals{$val} and not $c2_vals{$val};
$all++ if $c2_vals{$val} and $c3_vals{$val};
}
print "c1 total = $total\n";
print "c1 = $c1\n";
print "c1 + c2 = $c1_c2_count\n";
print "c1 + c3 = $c1_c3_count\n";
print "c1+c2+c3 = $all\n";
c1 total = 11
c1 = 4
c1 + c2 = 2
c1 + c3 = 4
c1+c2+c3 = 1
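But I'd like to know whether there is a simpler way, using a subroutine that reads each file from @ARGV and computes each set.
I've got this far, but can't think of an elegant way to do it: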
parse($_) foreach @ARGV;
my %total;
sub parse {
my $file = shift;
open my $list, '<', $file or die "Can't read file '$file' [$!]\n";
while (<$list>) {
chomp;
$total{$_}++;
}
}
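Any help would be greatly appreciated!
Update
To be clear, I want to find all of the intersections of the 3 datasets (7 in total), i.e. all the numbers in the Venn diagram. I don't want to use a module, because I want to build this into a larger program without changing too much.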
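As long as you keep it to no more than 32 or 64 sets (depending on the integer width of your perl), this is probably easier with bitwise operations: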
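my %c_vals;
# One bit per file: 1 for C1, 2 for C2, 4 for C3.
# chomp first, so a missing final newline cannot turn one word into two keys.
while (my $line = <$C1>) { chomp $line; $c_vals{$line} |= 1 }
while (my $line = <$C2>) { chomp $line; $c_vals{$line} |= 2 }
while (my $line = <$C3>) { chomp $line; $c_vals{$line} |= 4 }
my $total = keys %c_vals;                                 # distinct items across all files
my $c1 = grep { $_ & 1 } values %c_vals;                  # items in C1, whatever else they are in
my $c1_c2_count = grep { ($_ & 3) == 3 } values %c_vals;  # items in both C1 and C2
my $c1_c3_count = grep { ($_ & 5) == 5 } values %c_vals;  # items in both C1 and C3
my $all = grep { $_ == 7 } values %c_vals;                # items in all three
print "total = $total\n";
print "c1 = $c1\n";
print "c1 + c2 = $c1_c2_count\n";
print "c1 + c3 = $c1_c3_count\n";
print "c1+c2+c3 = $all\n";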
...
my @count_in_set;
foreach my $val (values %c_vals) {
    $count_in_set[$val]++;
}
for (my $i=1; $i<=7; $i++) {
    # // 0 avoids an uninitialized-value warning for combinations with no items
    printf "Count in set %03b: %d\n", $i, $count_in_set[$i] // 0;
}
In the general case:
my %vals;
my $n = 0;
foreach my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't read file '$file' [$!]\n";
    while (my $line = <$fh>) {
        chomp $line;
        $vals{$line} |= 1 << $n;   # bit $n marks membership in file number $n
    }
    $n++;
}
my @count_in_set;
foreach my $val (values %vals) {
    $count_in_set[$val]++;
}
for (my $i=1; $i<=$#count_in_set; $i++) {
    printf "Count in set %0*b: %d\n", $n, $i, $count_in_set[$i] // 0;
}
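To make the bitmask idea easier to try without any input files, here is a minimal sketch that runs it on the word lists from the question. The helper name count_regions is made up for this illustration, and the lists are held in memory rather than read from @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one array ref per set and returns a hash ref
# mapping each membership bitmask to the number of distinct items that have it.
sub count_regions {
    my @sets = @_;
    my %mask;
    for my $n (0 .. $#sets) {
        # Bit $n is switched on for every item that occurs in set number $n.
        $mask{$_} |= 1 << $n for @{ $sets[$n] };
    }
    my %count;
    $count{$_}++ for values %mask;
    return \%count;
}

# The sample word lists from the question.
my @c1 = qw(I want to create a set in perl with some values);
my @c2 = qw(how to create set these values);
my @c3 = qw(a set in perl with values like these);

my $regions = count_regions(\@c1, \@c2, \@c3);

# The rightmost binary digit is C1, the middle one C2, the leftmost C3.
for my $m (1 .. 7) {
    printf "Count in set %03b: %d\n", $m, $regions->{$m} // 0;
}
```

For the sample data this prints 3 for 001 (C1 only), 1 for 010, 2 for 011, 1 for 100, 4 for 101, 1 for 110 and 2 for 111, which are the seven numbers of the Venn diagram.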
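The CPAN module List::Compare provides a convenient API for operations on n lists, including list intersection.
As for working with files, File::Slurp provides a simple API for getting an array reference. A complete example is: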
use List::Compare;
use File::Slurp;
my @lists = ();
push(@lists, read_file( $_, array_ref => 1 ) ) foreach @ARGV;
my @intersection = List::Compare->new(@lists)->get_intersection();
print join('', @intersection);
Example usage: intersection.pl l1.txt l2.txt l3.txt
Output:
set
values
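Since the question asks to avoid modules, the same n-list intersection can probably be done with a plain hash as well. This is only a sketch: the helper name intersection is invented here, and the lists are given inline instead of being read from files.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one array ref per list and returns, in sorted
# order, the items that occur in every list.
sub intersection {
    my @lists = @_;
    my %seen_in;
    for my $list (@lists) {
        my %uniq = map { $_ => 1 } @$list;   # ignore duplicates within one list
        $seen_in{$_}++ for keys %uniq;
    }
    # An item is in the intersection if it was seen in as many lists as exist.
    my @common = grep { $seen_in{$_} == @lists } sort keys %seen_in;
    return @common;
}

my @common = intersection(
    [qw(I want to create a set in perl with some values)],
    [qw(how to create set these values)],
    [qw(a set in perl with values like these)],
);
print "$_\n" for @common;
```

On the sample lists this prints set and values, the same two words as the List::Compare version.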
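This program does what I need it to do.
It reads the lists from @ARGV and prints all 5 intersections for the given set. If run as perl set.pl c1 c2 c3, and the user enters c1 as the 'primary' set, the sets are defined as follows:
SetA: C1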
SetB: C1 + C2
SetC: C1 + C3
SetD: C1 + C2 + C3
use warnings;
use strict;
unless ($#ARGV == 2) {
usage();
exit;
}
print "Enter primary set: ";
chomp(my $set = <STDIN>);
my (%c1_vals, %c2_vals, %c3_vals);
my $count = 0;
my $c;
my ($c1, $c2, $c3);
parse($_) foreach @ARGV;
my $c1_c2_count = 0;
my $c1_c3_count = 0;
my $cond1 = 0;
my $total = 0;
my $all = 0;
for my $item (keys %c1_vals){
$total++;
if (not $c2_vals{$item} and not $c3_vals{$item}){
$cond1++;
}
if ($c2_vals{$item} and not $c3_vals{$item}){
$c1_c2_count++;
}
if ($c3_vals{$item} and not $c2_vals{$item}){
$c1_c3_count++;
}
if ($c2_vals{$item} and $c3_vals{$item}){
$all++;
}
}
# print numbers for each set
print "$c1 total = $total\n";
print "$c1 = $cond1\n";
print "$c1 + $c2 = $c1_c2_count\n";
print "$c1 + $c3 = $c1_c3_count\n";
print "$c1+$c2+$c3 = $all\n";
my $check = ($cond1 + $c1_c2_count + $c1_c3_count + $all);
print "check = $check\n";
# read in each file. $ARGV[0] is set as the 'primary' set (ie that for which intersecting lists are found)
sub parse {
$count++;
my $file = shift;
($c = $file) =~ s/\.[^.]+$//;
open my $list, '<', $file or die "Can't read file '$file' [$!]\n";
while(<$list>) {
chomp;
if ($count == 1){
my @split = split(/\t/);
$c1_vals{$split[0]}++;
$c1 = $c;
}
if ($count == 2){
my @split = split(/\t/);
$c2_vals{$split[0]}++;
$c2 = $c;
}
if ($count == 3){
my @split = split(/\t/);
$c3_vals{$split[0]}++;
$c3 = $c;
}
}
}
sub usage {
print "Usage: set.pl <list1> <list2> <list3>\n";
print "Calculates intersections between different sets\n";
}
When run as perl set.pl c1.txt c2.txt c3.txt, it gives:
Enter primary set: c1
c1 total = 11
c1 = 3
c1 + c2 = 2
c1 + c3 = 4
c1+c2+c3 = 2
check = 11
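The three near-identical branches in parse could probably be folded into one reader that is called once per file. Here is a rough sketch of that idea. The helper names read_set and breakdown are invented for this example, and in-memory strings stand in for c1.txt, c2.txt and c3.txt so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads a tab-separated list from a filehandle and
# returns a hash ref whose keys are the first-column values.
sub read_set {
    my ($fh) = @_;
    my %set;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;          # skip blank lines
        my ($key) = split /\t/, $line;     # keep the first column only
        $set{$key} = 1;
    }
    return \%set;
}

# Hypothetical helper: sorts every item of the primary set into one of four
# groups, depending on which of the other two sets also contain it.
sub breakdown {
    my ($primary, $second, $third) = @_;
    my %count = (only => 0, with_second => 0, with_third => 0, all => 0);
    for my $item (keys %$primary) {
        my $in2 = exists $second->{$item};
        my $in3 = exists $third->{$item};
        my $region = $in2 && $in3 ? 'all'
                   : $in2         ? 'with_second'
                   : $in3         ? 'with_third'
                   :                'only';
        $count{$region}++;
    }
    return \%count;
}

# In-memory stand-ins for the three input files.
my @data = (
    join("\n", qw(I want to create a set in perl with some values)) . "\n",
    join("\n", qw(how to create set these values)) . "\n",
    join("\n", qw(a set in perl with values like these)) . "\n",
);

my @sets;
for my $text (@data) {
    open my $fh, '<', \$text or die "Can't open in-memory file: $!\n";
    push @sets, read_set($fh);
}

my $c = breakdown(@sets);
print "c1 = $c->{only}\n";
print "c1 + c2 = $c->{with_second}\n";
print "c1 + c3 = $c->{with_third}\n";
print "c1+c2+c3 = $c->{all}\n";
```

With the sample lists this gives 3, 2, 4 and 2, matching the output above. Passing the sets to breakdown in a different order would make a different file the primary one.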