Parsing a file in Perl and storing the data in a hash
I'm reading an input file and storing the data in a hash. Later I want to print the hash contents to a CSV file.
Here is the script:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash;
while(<DATA>){
    chomp;
    my ($e_id, $start, $end, $priority, $node);
    next unless /\S/;
    my ($key, $val) = split /\s*:\s*/;
    if($key =~ /eventId/)  { $e_id = $val; }
    if($key =~ /startTime/){ $start = $val; }
    if($key =~ /endTime/)  { $end = $val; }
    if($key =~ /Node/)     { $node = $val; }
    if($key =~ /Priority/) { $priority = $val; }
    $hash{$e_id}{'node'} = $node;
    $hash{$e_id}{'start'} = $start;
    $hash{$e_id}{'end'} = $end;
    $hash{$e_id}{'priority'} = $priority;
}
print Dumper(\%hash);
__DATA__
Priority : High
Node : Node1
startTime : 2020-08-18T03:40:00
endTime : 2020-08-18T03:45:00
eventId : 150
Text : This is for Node1 text
eventPlace : Router1

Priority : Medium
Node : Node2
startTime : 2020-08-19T00:00:10
endTime : 2020-08-19T00:00:40
eventId : 170
Text : This is for Node2 text
eventPlace : Router2
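But the hash is not printed as expected. The primary key of the hash should be $e_id, the secondary keys should be node, start, end and priority, and the values should be the corresponding values taken from the file for each eventId.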
I want the hash to be printed like this:
$VAR1 = {
          '150' => {
                     'end' => '2020-08-18T03:45:00',
                     'priority' => 'High',
                     'start' => '2020-08-18T03:40:00',
                     'node' => 'Node1'
                   },
          '170' => {
                     'end' => '2020-08-19T00:00:40',
                     'priority' => 'Medium',
                     'start' => '2020-08-19T00:00:10',
                     'node' => 'Node2'
                   }
        };
How can I do this? Please also suggest a proper way to read the file (I suspect I'm doing something wrong), because it emits the warning: Use of uninitialized value $e_id in hash element at a.pl line .., <DATA> line ..
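If you want to use variables like $node while reading different lines, you need to declare them outside the while loop. Otherwise, the my declaration wipes out the values from the previous lines. Just move the my line before the while line.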
Also, you only want to populate the hash once the record's information is complete. Wrap the assignments to $hash{$e_id} in
if ($key eq 'eventPlace') {
...
}
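Putting both suggestions together, a minimal sketch of the corrected loop could look like this. Like the suggestion above, it assumes that eventPlace is always the last line of a record. It also passes a limit of 2 to split: without it, split /\s*:\s*/ also splits at the colons inside the timestamps, so for startTime $val would hold only 2020-08-18T03. To keep the example self-contained, it reads the sample records from an in-memory string in a hypothetical $input variable instead of __DATA__.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Sample input (stand-in for the real file or __DATA__ section).
my $input = <<'END';
Priority : High
Node : Node1
startTime : 2020-08-18T03:40:00
endTime : 2020-08-18T03:45:00
eventId : 150
Text : This is for Node1 text
eventPlace : Router1

Priority : Medium
Node : Node2
startTime : 2020-08-19T00:00:10
endTime : 2020-08-19T00:00:40
eventId : 170
Text : This is for Node2 text
eventPlace : Router2
END

my %hash;
# Declared once, before the loop, so the values survive from line to line.
my ($e_id, $start, $end, $priority, $node);

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    next unless $line =~ /\S/;
    # Limit 2: split only at the first colon so timestamps stay intact.
    my ($key, $val) = split /\s*:\s*/, $line, 2;
    if    ( $key eq 'eventId' )   { $e_id     = $val }
    elsif ( $key eq 'startTime' ) { $start    = $val }
    elsif ( $key eq 'endTime' )   { $end      = $val }
    elsif ( $key eq 'Node' )      { $node     = $val }
    elsif ( $key eq 'Priority' )  { $priority = $val }
    elsif ( $key eq 'eventPlace' ) {
        # Assumption: eventPlace is the last line of each record,
        # so the record is complete at this point.
        $hash{$e_id} = {
            node     => $node,
            start    => $start,
            end      => $end,
            priority => $priority,
        };
        # Reset for the next record.
        ($e_id, $start, $end, $priority, $node) = ();
    }
}
close $fh;

print Dumper(\%hash);
```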
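You are re-creating these variables for every line of the file:
$e_id, $start, $end, $priority, $node
If you want to access these values while processing later lines, they can't be scoped to a loop body that repeats for every line of the file.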
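A tiny illustration of the scoping problem (the loop and variable names here are made up for the demonstration): a variable declared with my inside the loop body starts out undefined on every iteration, while one declared before the loop still holds the value from the previous iteration.

```perl
use strict;
use warnings;

my @inside;   # what a variable declared inside the loop sees
my @outside;  # what a variable declared before the loop sees

my $kept;
for my $line (qw(first second third)) {
    my $fresh;                        # re-created, and undefined, on every pass
    push @inside,  defined $fresh ? $fresh : 'undef';
    push @outside, defined $kept  ? $kept  : 'undef';
    $fresh = $line;                   # lost when this iteration ends
    $kept  = $line;                   # still there on the next iteration
}

print "inside:  @inside\n";   # inside:  undef undef undef
print "outside: @outside\n";  # outside: undef first second
```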
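Also, you assign the record's fields on every line, including before $e_id has been populated. You don't want to assign every field on every line of the file, and you need to wait until the entire record has been read before assigning to $hash{$e_id}.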
My solution:
use strict;
use warnings;
use Data::Dumper;
my %field_map = (
    'startTime' => 'start',
    'endTime'   => 'end',
    'Node'      => 'node',
    'Priority'  => 'priority',
);
my %recs;
my $id;
my $rec = { };
# Reads the same __DATA__ section as the question's script.
while (1) {
    $_ = <DATA>;
    # If end of file or end of record.
    if (!defined($_) || $_ =~ /^$/) {
        $recs{$id} = $rec if defined($id);
        # If end of file.
        last if !defined($_);
        # Start a new record.
        $id = undef;
        $rec = { };
        next;
    }
    chomp;
    # Limit 2: split only at the first colon so timestamps stay intact.
    my ($key, $val) = split(/\s*:\s*/, $_, 2);
    if ( $key eq 'eventId' ) {
        $id = $val;
    }
    elsif ( $field_map{$key} ) {
        $rec->{ $field_map{$key} } = $val;
    }
}
print Dumper(\%recs);
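There's no need to hard-code the entry names from the file. You can use a very simple loop that reads each whole entry into a hash at once as you read the file. This assumes that the records are separated by a blank line.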
use strict;
use warnings;
use Data::Dumper;
$/ = "";
my %data;
while(<DATA>) {
    my $rec = { split /\n| : /, $_ };
    $data{$rec->{eventId}} = $rec;
}
print Dumper \%data;
__DATA__
Priority : High
Node : Node1
startTime : 2020-08-18T03:40:00
endTime : 2020-08-18T03:45:00
eventId : 150
Text : This is for Node1 text
eventPlace : Router1

Priority : Medium
Node : Node2
startTime : 2020-08-19T00:00:10
endTime : 2020-08-19T00:00:40
eventId : 170
Text : This is for Node2 text
eventPlace : Router2
This prints:
$VAR1 = {
'170' => {
'endTime' => '2020-08-19T00:00:40',
'eventPlace' => 'Router2',
'startTime' => '2020-08-19T00:00:10',
'Node' => 'Node2',
'Priority' => 'Medium',
'eventId' => '170',
'Text' => 'This is for Node2 text'
},
'150' => {
'endTime' => '2020-08-18T03:45:00',
'eventPlace' => 'Router1',
'startTime' => '2020-08-18T03:40:00',
'Node' => 'Node1',
'Priority' => 'High',
'eventId' => '150',
'Text' => 'This is for Node1 text'
}
};
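A variation on the same idea, sketched under the same blank-line assumption. local $/ restores the record separator when the block exits, so the rest of the script still reads line by line. Splitting each line only at its first colon keeps a value intact even if it happens to contain " : ". The sample text deliberately has two blank lines between records: paragraph mode ($/ = "") treats a run of blank lines as a single separator, whereas $/ = "\n\n" would start the next record with the extra newline (this difference is documented in perlvar). The $input variable and the in-memory filehandle are stand-ins for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Sample input; in a real script this would come from a file handle.
my $input = <<'END';
Priority : High
Node : Node1
startTime : 2020-08-18T03:40:00
endTime : 2020-08-18T03:45:00
eventId : 150
Text : This is for Node1 text
eventPlace : Router1


Priority : Medium
Node : Node2
startTime : 2020-08-19T00:00:10
endTime : 2020-08-19T00:00:40
eventId : 170
Text : This is for Node2 text
eventPlace : Router2
END

my %data;
{
    # Paragraph mode, restored automatically when this block exits.
    local $/ = "";
    open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
    while ( my $para = <$fh> ) {
        # One "key : value" pair per non-blank line, split at the first colon only.
        my %rec = map { my ($k, $v) = split /\s*:\s*/, $_, 2; ($k, $v) }
                  grep { /\S/ } split /\n/, $para;
        $data{ $rec{eventId} } = \%rec;
    }
    close $fh;
}

print Dumper \%data;
```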
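Algorithm of the Perl code:
- fill @records with the input data by setting $/ = "\n\n"
- split each record into %hash
- remap the %hash fields into the %data hash to match the desired output
- fill the %events hash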
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
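# local restores the default record separator after the read.
my @records = do{ local $/ = "\n\n"; <DATA> };
my %events;
for ( @records ) {
    my(%hash,%data);
    # Split $_ at " : " (key/value) and at newlines (between lines).
    %hash = split " : |\n";
    @data{qw/node priority start end/} = @hash{qw/Node Priority startTime endTime/};
    $events{$hash{eventId}} = \%data;
}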
say Dumper(\%events);
__DATA__
Priority : High
Node : Node1
startTime : 2020-08-18T03:40:00
endTime : 2020-08-18T03:45:00
eventId : 150
Text : This is for Node1 text
eventPlace : Router1

Priority : Medium
Node : Node2
startTime : 2020-08-19T00:00:10
endTime : 2020-08-19T00:00:40
eventId : 170
Text : This is for Node2 text
eventPlace : Router2
Output:
$VAR1 = {
'170' => {
'start' => '2020-08-19T00:00:10',
'end' => '2020-08-19T00:00:40',
'node' => 'Node2',
'priority' => 'Medium'
},
'150' => {
'node' => 'Node1',
'priority' => 'High',
'end' => '2020-08-18T03:45:00',
'start' => '2020-08-18T03:40:00'
}
};
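The question also mentions writing the hash to a CSV file later. Here is a minimal sketch for a hash shaped like %events above. It assumes no value contains a comma, double quote or newline; for real-world data, the CPAN module Text::CSV handles quoting properly. The column order is fixed explicitly because Perl does not guarantee hash key order. The %events literal and the in-memory $csv buffer are placeholders for the script's real data and output file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder data, shaped like the %events hash built above.
my %events = (
    150 => { node => 'Node1', priority => 'High',
             start => '2020-08-18T03:40:00', end => '2020-08-18T03:45:00' },
    170 => { node => 'Node2', priority => 'Medium',
             start => '2020-08-19T00:00:10', end => '2020-08-19T00:00:40' },
);

# Fixed column order; hash key order in Perl is not guaranteed.
my @columns = qw(node start end priority);

my $csv = '';
open my $out, '>', \$csv or die "Cannot open in-memory output: $!";
print {$out} join(',', 'eventId', @columns), "\n";
for my $id ( sort { $a <=> $b } keys %events ) {
    # Hash slice through the reference; assumes no commas/quotes in values.
    print {$out} join(',', $id, @{ $events{$id} }{@columns}), "\n";
}
close $out;

print $csv;
```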