Perl - 排序复杂的数据结构
Perl - sorting complex data structure
我有一个旧的 perl 项目,一个事件日志的文本解析器,并收到一个请求,要求按事件 ID 对输出进行排序并删除重复的事件。所以解析器读取一个文本文件并将每个事件放入一个数组中。数组中的每个字段都包含一个具有多个键 -> 值对的散列。一个键称为序列,它包含事件的编号。我现在想根据每个数组字段的序列值对数组进行排序。其次,我想从数组中删除重复的相同序列号。
下面是我如何创建数组和哈希的一些代码,以便您了解数据结构:
open (my $mel, "<", $in_filename) or die "\nFile '$in_filename' does not exist or is not readable.\n";
my $i=0;
my $eventcount = 0;
while (<$mel>) {
# Separate events by "Date/Time" :
if (/^$/) {
next;
}
if (/^Date\/Time:\s(.*)$/) {
if ($eventcount >0) {
$i++;
}
$eventcount++; # eventcount initialized with ‘0’
}
# Gathering information of the MEL event :
if (/^Date\/Time:\s(.*)$/) {$MEL[$i]{date} = ; next;}
if (/^Sequence number:\s(\d+)$/) {$MEL[$i]{sequence} = ; next;}
if (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {$MEL[$i]{type} = lc ; next;}
if (/^Event category:\s(\w+)$/) {$MEL[$i]{category} = ; next;}
if (/^Priority:\s(\w+)/) {$MEL[$i]{priority} = ; next;}
if (/^Description:\s(.*)$/) {$MEL[$i]{description} = ; next;}
if (/^Event specific codes:\s(.*)$/) {$MEL[$i]{code} = ; next;}
if (/^Component location:\s(.*)$/) {$MEL[$i]{location} = ; next;}
if (/^Logged by:\s.*(.)$/) {$MEL[$i]{logged_by} = ; next;}
if (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {$MEL[$i]{version} = hex ;}
}
文本文件中的事件示例:
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200 <==============
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
所以基本上我想根据散列中的键值对包含对散列的引用的数组进行排序。
其次,当键的值也存在于不同的数组字段中时,我想从数组中删除一个字段。
我希望有人理解我的需要:-)
这可能吗?
您可以使用自定义排序块对数组进行排序:
my @sorted = sort { $a->{sequence} <=> $b->{sequence} } @MEL;
但是使用散列的散列比散列的数组更容易。
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $in_filename = ... ;
open my $mel, '<', $in_filename or die $!;
my %event;
my ($current, $id);
while (<$mel>) {
next if /^$/;
if (m{^Date/Time:\s(.*)$}) {
if (defined $id) {
$event{$id} = $current;
}
$current = { date => };
} elsif (/^Sequence number:\s(\d+)$/) {
$id = ;
} elsif (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {
$current->{type} = lc ;
} elsif (/^Event category:\s(\w+)$/) {
$current->{category} = ;
} elsif (/^Priority:\s(\w+)/) {
$current->{priority} = ;
} elsif (/^Description:\s(.*)$/) {
$current->{description} = ;
} elsif (/^Event specific codes:\s(.*)$/) {
$current->{code} = ;
} elsif (/^Component location:\s(.*)$/) {
$current->{location} = ;
} elsif (/^Logged by:\s.*(.)$/) {
$current->{logged_by} = ;
} elsif (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {
$current->{version} = hex ;
}
}
for my $e (sort { $a <=> $b } keys %event) {
say 'Sequence number:', $e;
for my $k (sort keys %{ $event{$e} }) {
say "$k: $event{$e}{$k}";
}
}
可以通过构建一个大型正则表达式来匹配大部分细节来进一步简化:
my $regex = qr/
Event\ type:\s(?<type>[0-9|a-f|A-F]{1,6})$
|Event\ category:\s(?<category>\w+)$
|Priority:\s(?<priority>\w+)
|Description:\s(?<description>.*)$
|Event\ specific\ codes:\s(?<code>.*)$
|Component\ location:\s(?<location>.*)$
|Logged\ by:\s.*(?<logged>.)$
|4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
/x;
while (<$mel>) {
next if /^$/;
if (m{^Date/Time:\s(.*)$}) {
if (defined $id) {
$current->{type} = lc $current->{type}
if exists $current->{type};
$current->{version} = hex $current->{version}
if exists $current->{version};
$event{$id} = $current;
}
$current = { date => };
} elsif (/^Sequence number:\s(\d+)$/) {
$id = ;
} elsif (/^$regex/) {
$current->{ (keys %+)[0] } = (values %+)[0];
} else {
warn "Skipping: $_";
}
}
问题描述不完整。不清楚记录是否同质(所有相同类型)。
好吧,如果上面的假设是正确的,那么任务就很简单了。
将文件拆分成记录,然后用事件编号作为关键字填充散列,并记录为跳过重复的值。
然后对key hash进行排序,输出记录。
use strict;
use warnings;
use feature 'say';
my %events;
my %seen;
my $data = do { local $/; <DATA> };
$data =~ s!\n(Date/Time)!\n\n!g;
my @data = split '\n\n', $data;
for my $record (@data) {
my $event = get_event_n( $record );
next if $seen{$event};
$seen{$event} = 1;
$events{$event} = $record;
}
say '----- Sorted Events -----';
for my $event (sort keys %events) {
say $events{$event};
say '-' x 45; # record separator as visual indicator
}
sub get_event_n {
my $record = shift;
my $sequence;
$record =~ /Sequence number:\s+(\d+)/;
$sequence = ;
return $sequence;
}
__DATA__
Date/Time: 2/3/20, 12:19:20 PM
Sequence number: 230
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:28 PM
Sequence number: 209
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 92, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
我的回答是基于@choroba 的正则表达式,但我认为这个更简单:
my $key = 'sequence'; #or other fields
my $keep = 'first'; #or 'last' record with identical $key
my $regex = qr{
Date/Time: \s* (?<date>.*)
|Sequence\ number: \s* (?<sequence>\d+)
|Event\ type: \s* (?<type>[0-9|a-f|A-F]{1,6})
|Event\ category: \s* (?<category>\w+)
|Priority: \s* (?<priority>\w+)
|Description: \s* (?<description>.*)
|Event\ specific\ codes: \s* (?<code>.*)
|Component\ location: \s* (?<location>.*)
|Logged\ by: \s* (?<logged_by>.*)
|4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
}x;
my @event=();
while (<>) {
m{^Date/Time:} and push @event, {};
m{^$regex} and @{$event[-1]}{keys %+} = values %+;
}
#special treatment for type and version: hex and lc
exists $$_{type} and $$_{type} = hex $$_{type} for @event;
exists $$_{version} and $$_{version} = lc $$_{version} for @event;
#mark for deletion
my %exists; $exists{$$_{$key}}++ and $$_{delete}=1
for $keep eq 'first' ? @event
: $keep eq 'last' ? reverse(@event)
: die "keep must be first or last";
#delete those marked
@event = grep !$$_{delete}, @event;
#sort by $key
@event = sort { $$a{$key} <=> $$b{$key} } @event;
我猜类型应该是 hex
ed,版本应该是 lc
ed,而不是像问题中那样反过来。
运行 喜欢:
perl script.pl input_file
我有一个旧的 perl 项目,一个事件日志的文本解析器,并收到一个请求,要求按事件 ID 对输出进行排序并删除重复的事件。所以解析器读取一个文本文件并将每个事件放入一个数组中。数组中的每个字段都包含一个具有多个键 -> 值对的散列。一个键称为序列,它包含事件的编号。我现在想根据每个数组字段的序列值对数组进行排序。其次,我想从数组中删除重复的相同序列号。
下面是我如何创建数组和哈希的一些代码,以便您了解数据结构:
open (my $mel, "<", $in_filename) or die "\nFile '$in_filename' does not exist or is not readable.\n";
my $i=0;
my $eventcount = 0;
while (<$mel>) {
# Separate events by "Date/Time" :
if (/^$/) {
next;
}
if (/^Date\/Time:\s(.*)$/) {
if ($eventcount >0) {
$i++;
}
$eventcount++; # eventcount initialized with ‘0’
}
# Gathering information of the MEL event :
if (/^Date\/Time:\s(.*)$/) {$MEL[$i]{date} = ; next;}
if (/^Sequence number:\s(\d+)$/) {$MEL[$i]{sequence} = ; next;}
if (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {$MEL[$i]{type} = lc ; next;}
if (/^Event category:\s(\w+)$/) {$MEL[$i]{category} = ; next;}
if (/^Priority:\s(\w+)/) {$MEL[$i]{priority} = ; next;}
if (/^Description:\s(.*)$/) {$MEL[$i]{description} = ; next;}
if (/^Event specific codes:\s(.*)$/) {$MEL[$i]{code} = ; next;}
if (/^Component location:\s(.*)$/) {$MEL[$i]{location} = ; next;}
if (/^Logged by:\s.*(.)$/) {$MEL[$i]{logged_by} = ; next;}
if (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {$MEL[$i]{version} = hex ;}
}
文本文件中的事件示例:
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200 <==============
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
所以基本上我想根据散列中的键值对包含对散列的引用的数组进行排序。
其次,当键的值也存在于不同的数组字段中时,我想从数组中删除一个字段。
我希望有人理解我的需要:-)
这可能吗?
您可以使用自定义排序块对数组进行排序:
my @sorted = sort { $a->{sequence} <=> $b->{sequence} } @MEL;
但是使用散列的散列比散列的数组更容易。
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $in_filename = ... ;
open my $mel, '<', $in_filename or die $!;
my %event;
my ($current, $id);
while (<$mel>) {
next if /^$/;
if (m{^Date/Time:\s(.*)$}) {
if (defined $id) {
$event{$id} = $current;
}
$current = { date => };
} elsif (/^Sequence number:\s(\d+)$/) {
$id = ;
} elsif (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {
$current->{type} = lc ;
} elsif (/^Event category:\s(\w+)$/) {
$current->{category} = ;
} elsif (/^Priority:\s(\w+)/) {
$current->{priority} = ;
} elsif (/^Description:\s(.*)$/) {
$current->{description} = ;
} elsif (/^Event specific codes:\s(.*)$/) {
$current->{code} = ;
} elsif (/^Component location:\s(.*)$/) {
$current->{location} = ;
} elsif (/^Logged by:\s.*(.)$/) {
$current->{logged_by} = ;
} elsif (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {
$current->{version} = hex ;
}
}
for my $e (sort { $a <=> $b } keys %event) {
say 'Sequence number:', $e;
for my $k (sort keys %{ $event{$e} }) {
say "$k: $event{$e}{$k}";
}
}
可以通过构建一个大型正则表达式来匹配大部分细节来进一步简化:
my $regex = qr/
Event\ type:\s(?<type>[0-9|a-f|A-F]{1,6})$
|Event\ category:\s(?<category>\w+)$
|Priority:\s(?<priority>\w+)
|Description:\s(?<description>.*)$
|Event\ specific\ codes:\s(?<code>.*)$
|Component\ location:\s(?<location>.*)$
|Logged\ by:\s.*(?<logged>.)$
|4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
/x;
while (<$mel>) {
next if /^$/;
if (m{^Date/Time:\s(.*)$}) {
if (defined $id) {
$current->{type} = lc $current->{type}
if exists $current->{type};
$current->{version} = hex $current->{version}
if exists $current->{version};
$event{$id} = $current;
}
$current = { date => };
} elsif (/^Sequence number:\s(\d+)$/) {
$id = ;
} elsif (/^$regex/) {
$current->{ (keys %+)[0] } = (values %+)[0];
} else {
warn "Skipping: $_";
}
}
问题描述不完整。不清楚记录是否同质(所有相同类型)。
好吧,如果上面的假设是正确的,那么任务就很简单了。
将文件拆分成记录,然后用事件编号作为关键字填充散列,并记录为跳过重复的值。
然后对key hash进行排序,输出记录。
use strict;
use warnings;
use feature 'say';
my %events;
my %seen;
my $data = do { local $/; <DATA> };
$data =~ s!\n(Date/Time)!\n\n!g;
my @data = split '\n\n', $data;
for my $record (@data) {
my $event = get_event_n( $record );
next if $seen{$event};
$seen{$event} = 1;
$events{$event} = $record;
}
say '----- Sorted Events -----';
for my $event (sort keys %events) {
say $events{$event};
say '-' x 45; # record separator as visual indicator
}
sub get_event_n {
my $record = shift;
my $sequence;
$record =~ /Sequence number:\s+(\d+)/;
$sequence = ;
return $sequence;
}
__DATA__
Date/Time: 2/3/20, 12:19:20 PM
Sequence number: 230
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:28 PM
Sequence number: 209
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 92, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
我的回答是基于@choroba 的正则表达式,但我认为这个更简单:
my $key = 'sequence'; #or other fields
my $keep = 'first'; #or 'last' record with identical $key
my $regex = qr{
Date/Time: \s* (?<date>.*)
|Sequence\ number: \s* (?<sequence>\d+)
|Event\ type: \s* (?<type>[0-9|a-f|A-F]{1,6})
|Event\ category: \s* (?<category>\w+)
|Priority: \s* (?<priority>\w+)
|Description: \s* (?<description>.*)
|Event\ specific\ codes: \s* (?<code>.*)
|Component\ location: \s* (?<location>.*)
|Logged\ by: \s* (?<logged_by>.*)
|4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
}x;
my @event=();
while (<>) {
m{^Date/Time:} and push @event, {};
m{^$regex} and @{$event[-1]}{keys %+} = values %+;
}
#special treatment for type and version: hex and lc
exists $$_{type} and $$_{type} = hex $$_{type} for @event;
exists $$_{version} and $$_{version} = lc $$_{version} for @event;
#mark for deletion
my %exists; $exists{$$_{$key}}++ and $$_{delete}=1
for $keep eq 'first' ? @event
: $keep eq 'last' ? reverse(@event)
: die "keep must be first or last";
#delete those marked
@event = grep !$$_{delete}, @event;
#sort by $key
@event = sort { $$a{$key} <=> $$b{$key} } @event;
我猜类型应该是 hex
ed,版本应该是 lc
ed,而不是像问题中那样反过来。
运行 喜欢:
perl script.pl input_file