Extract content between two patterns from a file

Question

I want to extract the content between "The SUMMARY" and "End processing summary for first university", and also between "The SUMMARY" and "End processing summary for second university".

Here is my file:

Logs here
...
...
...
More logs here
...
...

The SUMMARY
Total students: 1200
Total teachers: 10
Total subjects: 20
Total attendance: 12000
End processing summary for first university

Logs here
...
...
...
More logs here
...
...

The SUMMARY
Total students: 1500
Total teachers: 12
Total subjects: 15
Total attendance: 20000
End processing summary for second university

Logs here
...
...
...
More logs here
...
...

The following works well:

firstUniversity=$(awk '/The SUMMARY/ && ++n == 1, /End processing summary for first university/' < theLog.log)

secondUniversity=$(awk '/The SUMMARY/ && ++n == 2, /End processing summary for second university/' < theLog.log)

However, sometimes the summary for the first university or the summary for the second university is missing, and then the code above does not work.

The first university block is missing:
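
The failure can be reproduced in a few lines. An awk range pattern stays open until its end pattern matches, so when the end line never appears the range runs to the end of the file. A minimal sketch, assuming an abbreviated, hypothetical sample (held in a shell variable rather than theLog.log) in which the first university block is missing:

```shell
# Abbreviated, hypothetical log: the first university block is missing.
log='Logs here
The SUMMARY
Total students: 1500
End processing summary for second university
Logs here'

# The first "The SUMMARY" in this input belongs to the second university,
# and the end line for the first university never appears, so the range
# stays open until end of input.
firstUniversity=$(printf '%s\n' "$log" |
  awk '/The SUMMARY/ && ++n == 1, /End processing summary for first university/')

printf '%s\n' "$firstUniversity"
```

Here firstUniversity ends up holding the second university's summary plus every log line after it, instead of being empty.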

Logs here
...
...
...
More logs here
...
...
Logs here
...
...
...
More logs here
...
...

The SUMMARY
Total students: 1500
Total teachers: 12
Total subjects: 15
Total attendance: 20000
End processing summary for second university

Logs here
...
...
...
More logs here
...
...

Or the second university block is missing:

Logs here
...
...
...
More logs here
...
...

The SUMMARY
Total students: 1200
Total teachers: 10
Total subjects: 20
Total attendance: 12000
End processing summary for first university

Logs here
...
...
...
More logs here

Is there a solution using sed or awk?

Answer 1

Please try the following, written and tested with the samples shown. In this solution I created the found_university variable for clarity.

awk '
/The SUMMARY/{
   found_summary=1
   val=found_university=""
}
/End processing summary for first university|End processing summary for second university/{
   found_university=1
   if(found_university && found_summary){
     print val ORS $0
   }
   val=found_university=found_summary=""
}
found_summary{
   val=(val?val ORS:"")$0
}
'  Input_file

You can also try the following, which drops the found_university variable and simply checks whether the university end line has appeared.

awk '
/The SUMMARY/{
   found_summary=1
   val=""
}
/End processing summary for first university|End processing summary for second university/{
   if(found_summary){
     print val ORS $0
   }
   val=found_summary=""
}
found_summary{
   val=(val?val ORS:"")$0
}
'   Input_file

Explanation: a detailed, line-by-line explanation of the code above. Please scroll right a little to see the comments :)

awk '                                                                                             ##Starting awk program from here.
/The SUMMARY/{                                                                                    ##Checking condition if line has string The SUMMARY then do following.
   found_summary=1                                                                                ##Setting found_summary as 1 here.
   val=""                                                                                         ##Nullifying variable val here.
}
/End processing summary for first university|End processing summary for second university/{       ##Checking condition if university string present in line then do following.
   if(found_summary){                                                                             ##Checking condition if found_summary is SET then do following.
     print val ORS $0                                                                             ##Printing variable val ORS and current line here.
   }
   val=found_summary=""                                                                           ##Nullifying variables val and found_summary here.
}
found_summary{                                                                                    ##Checking condition if found_summary is SET then do following.
   val=(val?val ORS:"")$0                                                                         ##Keep concatenating current line in val value.
}
'  Input_file                                                                                       ##Mentioning Input_file name here.
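
As a quick check, the script can be run against a shortened sample in which the second university block is missing. This is only a sketch: the sample text is an abbreviated, hypothetical stand-in for the real log, and it is piped in on standard input instead of being read from Input_file.

```shell
# Abbreviated, hypothetical log: only the first university block is present.
log='Logs here
The SUMMARY
Total students: 1200
End processing summary for first university
More logs here'

result=$(printf '%s\n' "$log" | awk '
/The SUMMARY/{
   found_summary=1
   val=""
}
/End processing summary for first university|End processing summary for second university/{
   if(found_summary){
     print val ORS $0
   }
   val=found_summary=""
}
found_summary{
   val=(val?val ORS:"")$0
}
')

printf '%s\n' "$result"
```

Note that this script prints every complete block it finds, whichever university it belongs to, so a log containing both summaries yields both blocks in one output.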

Answer 2

A slightly different awk approach:

cat extract.awk

/The SUMMARY/ {                                       # match starting line
  s = $0                                              # set s to current line
  p = 1                                               # set flag p to 1
  next                                                # skip the rules below so the start line is not added twice
}
p {                                                   # if flag p is set
   s = s ORS $0                                       # keep adding lines to s
}
$0 ~ "End processing summary for " kw " university" { # when we find end line
   print s                                            # print full text
   p = 0                                              # reset p to 0
}

Then use it as:

firstUniversity="$(awk -v kw='first' -f extract.awk inputFile)"
secondUniversity="$(awk -v kw='second' -f extract.awk inputFile)"

Without using an awk script file:

firstUniversity="$(awk -v kw='first' '/The SUMMARY/{s=$0; p=1; next} p{s = s ORS $0}
$0 ~ "End processing summary for " kw " university"{print s; p=0}' inputFile)"

secondUniversity="$(awk -v kw='second' '/The SUMMARY/{s=$0; p=1; next} p{s = s ORS $0}
$0 ~ "End processing summary for " kw " university"{print s; p=0}' inputFile)"
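
With this approach a missing block simply leaves the corresponding variable empty, which the calling script can test for. A minimal sketch, assuming a hypothetical helper function named extract_summary and an abbreviated, hypothetical sample log (piped on standard input) in which the first university block is missing:

```shell
# Hypothetical helper: print the summary block for one university ("first" or "second").
extract_summary() {
  awk -v kw="$1" '
    /The SUMMARY/ { s = $0; p = 1; next }   # start a new block; next keeps this line from being added twice
    p { s = s ORS $0 }                      # collect lines while inside a block
    $0 ~ "End processing summary for " kw " university" { print s; p = 0 }
  '
}

# Abbreviated, hypothetical log: the first university block is missing.
log='Logs here
The SUMMARY
Total students: 1500
End processing summary for second university
More logs here'

firstUniversity=$(printf '%s\n' "$log" | extract_summary first)
secondUniversity=$(printf '%s\n' "$log" | extract_summary second)

# An empty variable means the block was not found.
if [ -z "$firstUniversity" ]; then
  echo "first university summary is missing"
fi
printf '%s\n' "$secondUniversity"
```

Each new "The SUMMARY" line resets s, so log lines collected after an earlier block never leak into a later one.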

Tags: bash, awk, extract, pattern-matching