How to group log entries and count each subgroup in bash

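I want to analyze a log file. It contains several operations, and each operation consists of a set of sub-operations. I want to extract the number of sub-operations grouped by operation. This is easy in SQL, but I am stuck in bash.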

Here is a simplified version of the file:

    [21:30:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]

    otherlogs stuff ...

    [21:31:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]

    otherlogs stuff ...

    [21:31:21.690Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]

In each task id, the operation is the part before the first dot; the rest identifies the sub-operation.

I am looking for a result like the following, which I can store in a file:

    operationName            suboperationCount
    ingestion-4759-9-13-41         3
    ingestion-4757-10-17-4         4
    ingestion-4757-10-18-3         6

I have been trying combinations such as:

    cat xlogs.txt | grep 'ingestion' | uniq | wc -w > fileresult.txt

But this only returns a single global count.

Thanks!

You can use this grep + uniq command:

    grep -Eo '\bingestion-[0-9-]+' file.log | uniq -c
      4 ingestion-4759-9-13-41
      5 ingestion-4757-10-17-4
      7 ingestion-4757-10-18-3
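
One caveat with this approach: `uniq -c` only merges adjacent identical lines, so if entries for the same job are interleaved with entries for another job, that job is reported more than once. The following is a minimal sketch of the difference, assuming a made-up sample (`sample_ids`, with ids `a-1` and `b-2`) that does not come from the original log:

```shell
# Hypothetical sample: the ids for "a-1" are split by an entry for "b-2".
sample_ids='a-1
a-1
b-2
a-1'

# uniq -c alone counts runs of adjacent duplicates, so "a-1" shows up twice.
unsorted=$(printf '%s\n' "$sample_ids" | uniq -c | awk '{print $2 "=" $1}' | paste -sd' ' -)

# Sorting first brings identical ids together before they are counted.
sorted=$(printf '%s\n' "$sample_ids" | sort | uniq -c | awk '{print $2 "=" $1}' | paste -sd' ' -)

echo "without sort: $unsorted"
echo "with sort:    $sorted"
```

If the log always reports each job on a single line, as in the sample, the plain `uniq -c` is enough; `sort` only matters when the same job can appear in several places.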

EDIT: From the OP's comment we learned that only the ids inside Tasks should be counted. In that case you can try the following, strictly assuming there is only one Tasks string per line of Input_file.

    awk '
    {
      sub(/.*Tasks/,"Tasks")
      while(match($0,/ingestion-[0-9-]+/)){
        arr[substr($0,RSTART,RLENGTH)]++
        $0=substr($0,RSTART+RLENGTH)
      }
    }
    END{
      for(i in arr){
        print i,arr[i]
      }
    }'  Input_file
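
If the header row and a stable output order matter, a variant along these lines may help. It is only a sketch: `count_tasks` is a hypothetical helper name, the sample lines are shortened from the question's log, and it assumes every task list has the form `Tasks: [id, id, ...]` with all ids on a line belonging to the same operation.

```shell
# count_tasks is a hypothetical helper; it reads files given as arguments, or stdin.
count_tasks() {
  awk '
    BEGIN { print "operationName suboperationCount" }
    /Tasks: \[/ {
      line = $0
      sub(/.*Tasks: \[/, "", line)     # keep only the task list
      sub(/\].*/, "", line)            # drop the closing bracket and anything after it
      if (line == "") next             # empty task list: nothing to count
      n = split(line, tasks, /, */)    # one element per task id
      op = tasks[1]
      sub(/\..*/, "", op)              # operation name = part before the first dot
      if (!(op in count)) order[++k] = op   # remember first-seen order
      count[op] += n
    }
    END { for (j = 1; j <= k; j++) print order[j], count[order[j]] }
  ' "$@"
}

# Usage on a shortened sample:
result=$(printf '%s\n' \
  'x Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]' \
  'other log stuff' \
  'x Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1]' |
  count_tasks)
echo "$result"
```

To store the table as the question asks, something like `count_tasks xlogs.txt > fileresult.txt` should work. Keeping a separate `order` array avoids relying on `for (i in arr)`, whose iteration order awk does not specify.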


With awk, could you please try the following, written and tested with the shown samples.

    awk '
    {
      while(match($0,/ingestion-[0-9-]+/)){
        arr[substr($0,RSTART,RLENGTH)]++
        $0=substr($0,RSTART+RLENGTH)
      }
    }
    END{
      for(i in arr){
        print i,arr[i]
      }
    }' Input_file

Explanation: a detailed explanation of the above.

    awk '                                       ##Starting awk program from here.
    {
      while(match($0,/ingestion-[0-9-]+/)){     ##Running while loop as long as match function finds the regex in the current line.
        arr[substr($0,RSTART,RLENGTH)]++        ##Creating array arr which is indexed by the matched substring and increasing its value by 1 here.
        $0=substr($0,RSTART+RLENGTH)            ##Now saving rest of the line (after the matched regex above) into current line.
      }
    }
    END{                                        ##Starting END block of this awk program from here.
      for(i in arr){                            ##Traversing through all elements of arr here.
        print i,arr[i]                          ##Printing index of array and value of array with index of i.
      }
    }' Input_file                               ##Mentioning Input_file name here.
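
To see what `match` reports on each pass of the loop, the following sketch prints the matched text together with `RSTART` and `RLENGTH`. The one-line input is a made-up example, not taken from the original log.

```shell
# Hypothetical one-line input with two task ids for the same operation.
trace=$(echo 'x ingestion-1-2.1.5, ingestion-1-2.1.6' | awk '
{
  while (match($0, /ingestion-[0-9-]+/)) {
    # RSTART is the 1-based position of the match, RLENGTH is its length.
    printf "%s@%d+%d\n", substr($0, RSTART, RLENGTH), RSTART, RLENGTH
    $0 = substr($0, RSTART + RLENGTH)   # continue searching after the match
  }
}')
echo "$trace"
# prints:
# ingestion-1-2@3+13
# ingestion-1-2@7+13
```

The second `RSTART` is 7 rather than 22 because the line has already been shortened to the text after the first match.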

    $ grep -o  'ingestion[\.0-9-]*\.'  file | uniq -c
          3 ingestion-4759-9-13-41.1.
          4 ingestion-4757-10-17-4.1.
          6 ingestion-4757-10-18-3.1.
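
The same idea can be adjusted to print the bare operation name followed by its count. This is a sketch on a shortened sample, assuming, as in the question's log, that only task ids (and not the Job field) have a dot directly after the operation name:

```shell
# Sample shortened from the question's log.
log='Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44]
other log stuff
Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2]'

counts=$(printf '%s\n' "$log" |
  grep -o 'ingestion[0-9-]*\.' |   # task ids only: the Job field has no dot after the name
  sed 's/\.$//' |                  # drop the trailing dot
  uniq -c |                        # count adjacent identical names
  awk '{print $2, $1}')            # name first, count second
echo "$counts"
# prints:
# ingestion-4759-9-13-41 2
# ingestion-4757-10-17-4 1
```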

I added awk to your code, because it is readable:

    cat xlogs.txt | grep -o -E 'ingestion[0-9-]+' | uniq -c | awk '
         {if (NR == 1){
            print "operationName suboperationCount" > "fileresult.txt";
         }
         print $2 " " $1 > "fileresult.txt"
         }'