Perl: correlating log messages by time in the output

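I would like to ask for some hints. I am writing a script that parses logged data, formats it, and prints it to the screen. In my case the output has 3 attributes: "Date", "Time" and a "Message". Some messages occur in my log files over and over again. Is it possible to correlate them with Perl? For example, if an event was logged 9 times within 5 minutes, the output would be just 1 message with a count of 9.

My code is this: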

#!/usr/bin/perl
use strict;

#what should be searched in logs
my $regex = 'Error';

my @filtered_arr = ();
my @formated_rows = ();

my $filename = 'report3.txt';

while (<DATA>) {
   if (my $i =/\b(\d\d.\d\d.\d\d\d\d)\b/ .. /^\n+$/ ) {
      s/\n// if $i !~ /E0\z/; 

       my $logContent = "$_";

  open(my $fh, '>>', $filename) or die("Could not open file. $!");
  print $fh "$logContent";
  close $fh;
     }
}

  open my $formatedLog, $filename or die "Could not open $filename: $!";

    while( my $line = <$formatedLog>){
      while ($line =~ m/$regex/g) {
         $line =~ m/$regex/;
           push @filtered_arr, $line;
      }
}
  close $formatedLog;

  for my $row (@filtered_arr) {

    my $date = substr $row, 1, 10;  
    my $time = substr $row, 12, 8;
    my $stringDistance = (length $row) - 24;
    my $message = substr $row, 24, $stringDistance -1;

    # creating formated array (For AB)
    push @formated_rows, [$date,";", $time ,";", $message];
  }

# first pass over rows: compute the maximum width for each column
my @widths;
for my $output_row (@formated_rows) {
    for (my $col = 0; $col < @$output_row; $col++) {
        $widths[$col] = length $output_row->[$col] if length $output_row->[$col] > ($widths[$col] // 0);
    }
}

 # compute the format. for this data, it works out to "%-3s %-11s %-6s %-5s\n"
my $format = join(' ', map { "%-${_}s" } @widths) . "\n";

 # second pass: print each row using the format
for my $output_row (@formated_rows) {
    printf $format, @$output_row;
}

__DATA__

[05.09.2015 18:44:56] - Error 505
 some text about Error 505

[05.09.2015 18:45:56] - Error 505
 some text about Error 505

[05.09.2015 18:46:56] -  Error 505
 some text about Error 505

[05.09.2015 18:47:56] - Error 505
 some text about Error 505

[05.09.2015 18:48:56] - Error 505
 some text about Error 505

[05.09.2015 18:49:56] - Error 505
 some text about Error 505

[06.09.2015 12:46:56] - Error 404
 some text about Error 404

[06.09.2015 12:47:56] - Error 404
 some text about Error 404

[06.09.2015 12:48:56] - Error 404
 some text about Error 404

[06.09.2015 12:48:56] - Oracle Error
 some text about Oracle Error

[06.09.2015 12:49:56] - Error 404
 some text about Error 404

My output looks like this:

05.09.2015 ; 18:44:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:44:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:45:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:45:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:46:56 ;  Error 505 some text about Error 505
05.09.2015 ; 18:46:56 ;  Error 505 some text about Error 505
05.09.2015 ; 18:47:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:47:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:48:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:48:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:49:56 ; Error 505 some text about Error 505
05.09.2015 ; 18:49:56 ; Error 505 some text about Error 505
06.09.2015 ; 12:46:56 ; Error 404 some text about Error 404
06.09.2015 ; 12:46:56 ; Error 404 some text about Error 404
06.09.2015 ; 12:47:56 ; Error 404 some text about Error 404
06.09.2015 ; 12:47:56 ; Error 404 some text about Error 404
06.09.2015 ; 12:48:56 ; Error 404 some text about Error 404
06.09.2015 ; 12:48:56 ; Error 404 some text about Error 404
06.09.2015 ; 12:48:56 ; Oracle Error some text about Oracle Error
06.09.2015 ; 12:48:56 ; Oracle Error some text about Oracle Error
06.09.2015 ; 12:49:56 ; Error 404 some text about Error 404
06.09.2015 ; 12:49:56 ; Error 404 some text about Error 404

And the output I want to achieve:

05.09.2015 ; 18:44:56 ; Error 505 some text about Error 505 ; 12 <- (means it occurred 12 times)
06.09.2015 ; 12:46:56 ; Error 404 some text about Error 404 ; 8
06.09.2015 ; 12:48:56 ; Oracle Error some text about Oracle Error; 2

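Thanks for any hints, Jan

To handle (and compare) dates, have a look at Time::Piece, which should be a core module unless you are using a very old perl. You can create a Time::Piece object by parsing the timestamp string, so once you have extracted just the timestamp substring...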

my $datetime = Time::Piece->strptime( $timestamp, '%d.%m.%Y %H:%M:%S' );
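Extracting the timestamp substring could look roughly like the sketch below. The helper name parse_timestamp and the regex are assumptions made for illustration, based on the bracketed "[dd.mm.yyyy hh:mm:ss]" prefix in the sample log; adjust them to the real log format.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: pull the bracketed timestamp out of a log line
# and turn it into a Time::Piece object (returns undef if none is found).
sub parse_timestamp {
    my ($line) = @_;
    my ($timestamp) = $line =~ /\[(\d\d\.\d\d\.\d{4} \d\d:\d\d:\d\d)\]/
        or return;
    return Time::Piece->strptime( $timestamp, '%d.%m.%Y %H:%M:%S' );
}

my $datetime = parse_timestamp('[05.09.2015 18:46:56] - Error 505');

# Formatting the object with the same pattern gives back the input timestamp
print $datetime->strftime('%d.%m.%Y %H:%M:%S'), "\n";
```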

Then once you have that Time::Piece object, you can add and subtract seconds and compare it to other Time::Piece objects. You could do something like this to create a cutoff time...

my $end_time = $datetime;
$end_time += 1 while $end_time->minute !~ /(?:0|5)$/;

Given a $datetime of 05.09.2015 18:46:56, $end_time will be 05.09.2015 18:50:00. You can then go on converting each line's timestamp into a Time::Piece object and compare the objects numerically, i.e.

if ( $datetime < $end_time ) {
    # ... increase count for the current logs error
}
else {
    # ... define new $end_time from the current logs timestamp
}

Ultimately, I'll leave it up to you to actually implement these ideas in your script.
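For orientation only, here is one way the pieces above might fit together. This is a sketch under assumptions, not a finished solution: the helper names cutoff_after and correlate are made up, the events are passed in as [timestamp, message] pairs rather than read from the log, and the cutoff is computed arithmetically as the next five-minute boundary strictly after the timestamp (unlike the one-second loop above, which leaves a timestamp unchanged when its minute already ends in 0 or 5).

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: next five-minute boundary strictly after $t
sub cutoff_after {
    my ($t) = @_;
    return $t + ( 300 - $t->epoch % 300 );
}

# Hypothetical helper: collapse repeated messages into one summary per window.
# Each event is [ 'dd.mm.yyyy hh:mm:ss', 'message' ], in chronological order.
sub correlate {
    my (@events) = @_;
    my ( %open, @summary );    # %open maps a message to its current window
    for my $event (@events) {
        my ( $timestamp, $message ) = @$event;
        my $datetime = Time::Piece->strptime( $timestamp, '%d.%m.%Y %H:%M:%S' );
        my $window   = $open{$message};
        if ( $window && $datetime < $window->{end} ) {
            $window->{count}++;    # still inside the current window
        }
        else {
            # define a new window (and cutoff) from this event's timestamp
            $window = {
                first   => $timestamp,
                message => $message,
                count   => 1,
                end     => cutoff_after($datetime),
            };
            $open{$message} = $window;
            push @summary, $window;
        }
    }
    return @summary;
}

my @events = (
    [ '05.09.2015 18:44:56', 'Error 505' ],
    [ '05.09.2015 18:45:56', 'Error 505' ],
    [ '05.09.2015 18:46:56', 'Error 505' ],
    [ '06.09.2015 12:48:56', 'Oracle Error' ],
);
for my $w ( correlate(@events) ) {
    print join( ' ; ', split( / /, $w->{first} ), $w->{message}, $w->{count} ), "\n";
}
```

With these four sample events the first "Error 505" falls just before the 18:45:00 boundary, so it ends up in a window of its own, and the next two are counted together in the window that ends at 18:50:00.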

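Firstly, make your life easier by using a date/time library. Time::Piece is a standard part of the Perl distribution and works well here. Secondly, normalise your times to five-minute intervals. Thirdly, create a hash of errors keyed on the normalised time and the error message.

Something like this, perhaps: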

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use Time::Piece;

# Paragraph mode
$/ = '';

my $format = '[%d.%m.%Y %H:%M:%S]';

my %errors;

while (<DATA>) {
  # Convert to one line
  s/\n+/ /g;
  my ($time, $error) = split /\s+\-\s+/;
  $time = Time::Piece->strptime($time, $format);
  my $mins = $time->min;
  # Normalise to the start of the five-minute interval
  # by subtracting the correct number of seconds
  $time -= (($mins % 5) * 60) + $time->sec;
  $errors{$time}{$error}++;
}

foreach my $time (keys %errors) {
  foreach my $error (keys %{$errors{$time}}) {
    say "$time ; $error ; $errors{$time}{$error}"
  }
}

__DATA__
[05.09.2015 18:44:56] - Error 505
 some text about Error 505

[05.09.2015 18:45:56] - Error 505
 some text about Error 505

[05.09.2015 18:46:56] -  Error 505
 some text about Error 505

[05.09.2015 18:47:56] - Error 505
 some text about Error 505

[05.09.2015 18:48:56] - Error 505
 some text about Error 505

[05.09.2015 18:49:56] - Error 505
 some text about Error 505

[06.09.2015 12:46:56] - Error 404
 some text about Error 404

[06.09.2015 12:47:56] - Error 404
 some text about Error 404

[06.09.2015 12:48:56] - Error 404
 some text about Error 404

[06.09.2015 12:48:56] - Oracle Error
 some text about Oracle Error

[06.09.2015 12:49:56] - Error 404
 some text about Error 404
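The script above prints the buckets in hash order and uses the default Time::Piece string form for the time. If chronological output in the question's "date ; time ; message ; count" layout is wanted, a variant along the following lines might do. It is only a sketch: the helper name summarise is hypothetical, the events are passed in as [timestamp, message] pairs instead of being read from DATA, and the time shown is the start of the five-minute bucket, not the time of the first occurrence.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: group [timestamp, message] pairs into five-minute
# buckets and return one formatted line per bucket and message.
sub summarise {
    my (@events) = @_;
    my %errors;
    for my $event (@events) {
        my ( $timestamp, $message ) = @$event;
        my $epoch = Time::Piece->strptime( $timestamp, '%d.%m.%Y %H:%M:%S' )->epoch;
        # Round down to the start of the five-minute (300 second) bucket
        $errors{ $epoch - $epoch % 300 }{$message}++;
    }
    my @lines;
    for my $bucket ( sort { $a <=> $b } keys %errors ) {
        # gmtime is the version exported by Time::Piece; it matches the
        # UTC-based objects that strptime returns
        my $start = gmtime($bucket);
        my $label = $start->strftime('%d.%m.%Y ; %H:%M:%S');
        for my $message ( sort keys %{ $errors{$bucket} } ) {
            push @lines, "$label ; $message ; $errors{$bucket}{$message}";
        }
    }
    return @lines;
}

print "$_\n" for summarise(
    [ '06.09.2015 12:46:56', 'Error 404' ],
    [ '06.09.2015 12:48:56', 'Oracle Error' ],
    [ '06.09.2015 12:49:56', 'Error 404' ],
    [ '05.09.2015 18:44:56', 'Error 505' ],
);
```

Keying the hash on the epoch value of the bucket start, rather than on the stringified object, is a design choice made here so that a plain numeric sort gives chronological order.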