How to group log entries and count each subgroup in bash
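I want to analyze a log file. It contains several operations, and each operation contains a set of suboperations.
I want to extract the number of suboperations, grouped by operation.
This is easy in SQL, but I'm stuck in bash.
Here is a simplified version of the file: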
[21:30:21.538Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]
otherlogs stuff ...
[21:31:21.690Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]
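Each operation name is the part before the first dot; the rest identifies a suboperation.
I'm looking for a result like the following, which I can store in a file: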
operationName suboperationCount
ingestion-4759-9-13-41 3
ingestion-4757-10-17-4 4
ingestion-4757-10-18-3 6
I have been trying combinations such as
cat xlogs.txt | grep 'ingestion' | uniq | wc -w > fileresult.txt
but this only returns a global number.
Thanks!
You can use this grep + uniq command:
grep -Eo '\bingestion-[0-9-]+' file.log | uniq -c
4 ingestion-4759-9-13-41
5 ingestion-4757-10-17-4
7 ingestion-4757-10-18-3
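One caveat with the command above: uniq -c only collapses adjacent identical lines, so a job id that shows up again later in the log would get a second row. A minimal sketch of the usual workaround, sorting before counting, is below. The three-line sample and the ids ingestion-1-1 and ingestion-2-2 are hypothetical, and the \b anchor is left out only for portability.

```shell
# Hypothetical sample: the same job id appears on two non-adjacent lines.
log=$(mktemp)
cat > "$log" <<'EOF'
Job: ingestion-1-1; Tasks: [ingestion-1-1.1.1]
Job: ingestion-2-2; Tasks: [ingestion-2-2.1.1]
Job: ingestion-1-1; Tasks: [ingestion-1-1.1.2]
EOF

# Without sort, uniq -c would emit ingestion-1-1 twice.
# Sorting first makes equal ids adjacent, so each id gets exactly one row.
counts=$(grep -Eo 'ingestion-[0-9-]+' "$log" | LC_ALL=C sort | uniq -c)
printf '%s\n' "$counts"
rm -f "$log"
```

This prints one row per id. As with the original command, the id in the Job field is counted along with the task ids.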
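EDIT: From the OP's comment we learned that only the ids inside Tasks should be counted (the command above also counts the id in the Job field). In that case you can try the following, strictly assuming that each line of your Input_file contains only one Tasks string.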
awk '
{
  sub(/.*Tasks/,"Tasks")
  while(match($0,/ingestion-[0-9-]+/)){
    arr[substr($0,RSTART,RLENGTH)]++
    $0=substr($0,RSTART+RLENGTH)
  }
}
END{
  for(i in arr){
    print i,arr[i]
  }
}' Input_file
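The question also asks for a header row and a result file, and for (i in arr) visits the array in an unspecified order. The sketch below is one way to cover both, assuming first-seen order is wanted. The shortened log lines, the temporary file names, and the array names count and order are illustrative choices, not part of the original answer.

```shell
# Shortened, hypothetical version of the sample log.
log=$(mktemp)
cat > "$log" <<'EOF'
[21:30:21.538Z] Reporting bulk completion: Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z] Reporting bulk completion: Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1]
EOF

result=$(mktemp)
awk '
BEGIN { print "operationName suboperationCount" }
/Tasks: \[/ {
  sub(/.*Tasks: \[/, "")                  # keep only the task list
  while (match($0, /ingestion-[0-9-]+/)) {
    id = substr($0, RSTART, RLENGTH)
    if (!(id in count)) order[++n] = id   # remember first-seen order
    count[id]++
    $0 = substr($0, RSTART + RLENGTH)     # continue after this match
  }
}
END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
' "$log" > "$result"
cat "$result"
```

Redirecting the whole awk output with the shell keeps the program free of file names; replace "$result" with fileresult.txt to match the question.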
With awk, could you please try the following, written and tested with the shown samples.
awk '
{
  while(match($0,/ingestion-[0-9-]+/)){
    arr[substr($0,RSTART,RLENGTH)]++
    $0=substr($0,RSTART+RLENGTH)
  }
}
END{
  for(i in arr){
    print i,arr[i]
  }
}' Input_file
Explanation: a detailed, line-by-line explanation of the above.
awk '                                      ##Start of the awk program.
{
  while(match($0,/ingestion-[0-9-]+/)){    ##Loop while match() finds the regex in the current line.
    arr[substr($0,RSTART,RLENGTH)]++       ##Use the matched substring as the index of array arr and increase its count by 1.
    $0=substr($0,RSTART+RLENGTH)           ##Keep only the rest of the line (after the match) as the current line.
  }
}
END{                                       ##Start of the END block of this awk program.
  for(i in arr){                           ##Traverse all elements of arr (in unspecified order).
    print i,arr[i]                         ##Print each index (the id) and its count.
  }
}' Input_file                              ##Input_file is the name of the file to read.
$ grep -o 'ingestion[\.0-9-]*\.' file | uniq -c
3 ingestion-4759-9-13-41.1.
4 ingestion-4757-10-17-4.1.
6 ingestion-4757-10-18-3.1.
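The counts above are right, but the rows are in "count name" order and the names still carry a trailing ".1.". A possible follow-up, sketched under the assumption that an operation name never contains a dot, matches only up to the first dot, strips it, and swaps the columns. The two-line sample is a shortened, hypothetical version of the log.

```shell
# Shortened, hypothetical version of the sample log.
log=$(mktemp)
cat > "$log" <<'EOF'
Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44]
Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2]
EOF

# The trailing \. skips the Job field, which is followed by ";" instead of ".".
table=$(
  grep -oE 'ingestion-[0-9-]+\.' "$log" |
    LC_ALL=C sort | uniq -c |
    awk '{ sub(/\.$/, "", $2); print $2, $1 }'   # drop the dot, print name then count
)
printf '%s\n' "$table"
rm -f "$log"
```

The rows come out sorted by operation name, because sort runs before uniq -c.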
I added awk to your code because it is readable:
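cat xlogs.txt | grep -o -E 'ingestion[0-9-]+' | uniq -c | awk '
{
  if (NR == 1) {
    print "operationName suboperationCount" > "fileresult.txt"
  }
  # uniq -c prints "count name"; swap the columns to get "name count".
  # awk keeps the file open, so ">" truncates it only once.
  print $2 " " $1 > "fileresult.txt"
}'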