Using Cygwin, how do I aggregate content in 1 column, and then do counts of occurrences in an array from another column?
For example:
20150401 A,C,R,AB,CD,EF,EE,FF
20150401 A,C,EF,FF,G
20150401 A,BB,C,EF,FG
20150401 R,AB,CD,EF,G
20150401 R,C,EF,EE,GG
20150402 A,C,EF,FF,G
20150402 D,DD,CD,FF,GG,AB,EE,EE
20150403 R,R,CD,EF,G,EE
20150403 A,C,EF,FF,G
20150403 D,CD,FF,EE,G,GG
20150403 F,EF,G,EE,C,AB
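How can I count how many times each item appears on each date, without specifying each item? Ideally, the output would give me a list of how many times "A" appeared on 20150401, 20150402 and 20150403, then how many times "C" appeared on 20150401, 20150402 and 20150403, and so on.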
Perl to the rescue!
Save the following as count.pl:
#!/usr/bin/perl
use warnings;
use strict;

my %table;
while (<>) {                                       # Read the input line by line.
    my ($date, $list) = split;                     # Split on whitespace.
    my @items = split /,/, $list;                  # Split the list on commas.
    $table{$_}{$date}++ for @items;                # Record the occurrences.
}

for my $item (sort keys %table) {                  # Iterate over the items.
    for my $date (sort keys %{ $table{$item} }) {  # Iterate over the dates.
        print "$item $date $table{$item}{$date}\n";
    }
}
Then run:
perl count.pl input-file
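For comparison, the same tally could probably be done with awk, if it is installed in your Cygwin setup. This is only an alternative sketch and not part of the Perl answer above: the function name count_items is made up for illustration, and it assumes every line is a date, whitespace, and then a comma-separated list with no spaces inside it.

```shell
#!/bin/sh
# count_items (hypothetical name): read "DATE ITEM,ITEM,..." lines from the
# files given as arguments (or from stdin when there are none) and print one
# "ITEM DATE COUNT" line per item/date pair.
count_items() {
    awk '
        NF >= 2 {                          # skip blank or malformed lines
            n = split($2, items, ",")      # split the list on commas
            for (i = 1; i <= n; i++)
                count[items[i] " " $1]++   # key is "ITEM DATE"
        }
        END {
            for (key in count)             # awk iterates in arbitrary order
                print key, count[key]
        }
    ' "$@" | LC_ALL=C sort                 # sort by item, then by date
}
```

After sourcing the function, call it as count_items input-file. On the sample data in the question, the lines for A should come out as:

A 20150401 3
A 20150402 1
A 20150403 1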