Printing lines between start & end variables in Bash

Question

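I'm writing a script to organize some log files into a human-readable format. The idea is that you run the script with an IP address, and it pulls out all the logs belonging to sessions involving that IP. So far, this is what my code does: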

-takes in external log data 
-takes in an IP address as an argument

-loops through each log
  -if it reaches a log containing the specified IP {
    -find the MSD number within that log
    -check if MSD is in "collected" array
  
    -if MSD is already in array {
      -resume the loop
    } else {
      -add MSD to the "collected" array
      ****-search all logs for corresponding MSD and echo them to output.txt
      -does not affect the order in which the logs were generated
    }
  -add a "--------" visual separator
  -resume loop, repeating the process each time it finds the IP with a new MSD

Here is my code:

OUTPUT_FILE=./output.txt
ARR=()
while IFS= read -r line; do
  if grep -q "$1" <<< "$line"; then
    MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
    if [[ ! " ${ARR[*]} " =~ " ${MSD} " ]]; then
      ARR+=($MSD)
      cat ./example_logs | grep $MSD >> $OUTPUT_FILE
      echo "------------------" >> $OUTPUT_FILE
    fi
  fi
done < ./example_logs

Here is the sample input:

Apr 12 03:42:45 fe1 msd[2645899]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 00:12:24 fe1 msd[2320005]: SMTPD started: connection from 46.xxx.xxx.xxx
Apr 12 00:12:24 fe1 msd[2320005]: Created UUID ....... for message
Apr 12 00:01:39 fe1 msd[2319095]: SMTPD started: connection from 85.xxx.xxx.xxx
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 00:01:39 fe1 msd[2319095]: Created UUID ....... for message
Apr 12 03:42:45 fe1 msd[2645899]: Created UUID ....... for message
Apr 12 00:12:24 fe1 msd[2320005]: CONN: 46.xxx.xxx.xxx -> 587 GeoIP = [LV] PTR = ....... 
Apr 12 00:01:39 fe1 msd[2319095]: CONN: 85.xxx.xxx.xxx -> 587 GeoIP = [NL] PTR = .......
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 00:12:24 fe1 msd[2320005]: EHLO command received, args: .......
... (400 more lines)

Here is a snippet of the current output:

#EXAMPLE LOGS (target IP is 3x.x.xx.xx)
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx 
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx 
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)

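My problem is at the place I marked with **** (the cat line)... From the output, you can see that two IPs share the same MSD. I don't know how to remove the unwanted logs. Any suggestions? So far I've tried creating $START and $END variables, but I'm sure I'm not doing it correctly...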

TARGET="connection from "
START="$(grep "$MSD" example_logs | grep "$TARGET")"
END="$(grep "$MSD" example_logs | grep -i "exiting")"

EDIT: I posted a crude solution to my own question below, in case anyone is wondering where I ended up with this.

Answer 1

Here's the crude solution I came up with after 3 cups of coffee and 15 hours of coding. I hope it may be useful to someone else out there!

rm -f ./output.txt
OUTPUT_FILE=./output.txt
#
#
# ================= PART 1: =================
# An empty array for values later
MSD_ARR=()
#
# Loop through each log in ./example_logs
#   -IF it reaches a log containing the target IP {
#       -find msd# within that log & save to variable
#       -check if current msd is in MSD_ARR
#
#       -IF current msd is NOT in the array {
#           -add msd to MSD_ARR
#           -grep all logs containing the current msd
#           -output logs to new file. Does not affect the order in which logs are listed.
#       } else {
#           -resume the loop.  
#       }
#    }
#
while IFS= read -r line; do
  if grep -q "$1" <<< "$line"; then
    MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
    if [[ ! " ${MSD_ARR[*]} " =~ " ${MSD} " ]]; then
      MSD_ARR+=($MSD)
      cat ./example_logs | grep $MSD >> $OUTPUT_FILE
      echo "------------------------" >> $OUTPUT_FILE
    fi
  fi
done < ./example_logs
#
#
# ================= PART 2: =================
#
# Loop through each log in ./output.txt
#   -IF it reaches a log containing the phrase "connection from" {
#       -find msd# within that log & save to variable
#       -IF the current log does NOT contain the target IP {
#           -grab the current line's position & save as a variable called "START"
#           -grab all positions of lines containing the phrase "Exiting" & split the values into an array called "EndArr"
#
#           -FOR LOOP over EndArr
#           -IF our START position is less than (comes before) our current END position (EndVal) {
#               use "sed" to delete logs, using START and EndVal as the range. Deletes inclusively
#               BREAK out of this loop to repeat the process with the next START value
#           }
#       }
#   }
#
while IFS= read -r line; do
  if grep -q "connection from" <<< "$line"; then
    MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
    if [[ ! " ${line} " =~ " $1 " ]]; then
      START="$(awk -v var="$line" 's=index($0,var){print NR}' output.txt)"
      END="$(awk -v var="Exiting" 'e=index($0,var){print NR}' output.txt)"
      EndArr=($END)
      for EndVal in "${EndArr[@]}"; do
        if [ "$START" -lt "$EndVal" ]; then
          sed -i "$START,$EndVal d" ./output.txt
          break
        fi
      done
    fi
  fi
done < ./output.txt
#
# ===========================================
# EXAMPLE USE:
# >bash filename [IP Address]
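
For comparison, the same filtering can probably be done in a single pass with awk instead of two loops plus sed. This is only a minimal sketch under assumptions: every session starts with a "connection from" line and ends with an "Exiting" line, the msd number sits in the 5th field, and the client IP is the last field of the start line. The function name filter_sessions is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the sessions opened by the target IP.
# Usage: filter_sessions IP LOGFILE
filter_sessions() {
  local ip=$1 logfile=$2
  awk -v ip="$ip" '
    {
      # pull the digits out of msd[xxxxxxx] in field 5; skip lines without them
      if (!match($5, /[[:digit:]]+/)) next
      msd = substr($5, RSTART, RLENGTH)
    }
    /connection from/ {
      # a new session starts for this msd: keep it only if it names the target IP
      keep[msd] = ($NF == ip)
      buf[msd] = ""
    }
    keep[msd] {
      # buffer the lines of a wanted session, in their original order
      buf[msd] = buf[msd] $0 "\n"
    }
    /Exiting/ && keep[msd] {
      # session finished: emit it, followed by the visual separator
      printf "%s", buf[msd]
      print "------------------"
      keep[msd] = 0
      buf[msd] = ""
    }
  ' "$logfile"
}
```

It could then be called as filter_sessions "$1" ./example_logs > "$OUTPUT_FILE". Because lines are buffered per msd, sessions that are interleaved in the raw log come out grouped. A session that never reaches its "Exiting" line is not printed by this sketch.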

Answer 2

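What you're trying to do can be done with a single call to awk and three rules. (Each rule is written as condition { commands }; if there is no condition, the rule runs for every record, i.e. every input line.) Basically, all you need to do is: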

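Get the 7-digit msd number from the 5th field (e.g. msd[xxxxxxx]);
If it is not the first line and the msd differs from that of the previous line, output your "------------------" separator;
Output the current line and update last with the current msd.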

To save to your "$OUTPUT_FILE", simply redirect the output of the command.

If you put it all together in a short awk script, you have:

awk '
  { # separate digits from msd[xxx] and save as msd
    # match() sets RSTART and RLENGTH (index and length of digits)
    match($5, /[[:digit:]]+/)
    msd = substr($5, RSTART, RLENGTH)   # assign substring of digits to msd
  }
  FNR > 1 && last != msd {  # if line > 1 and msd has changed
    print "------------------"
  }
  {
    print           # output line
    last = msd      # update last with msd
  }
' file > "$OUTPUT_FILE"
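
To see what the first rule does in isolation, here is a small sketch that runs the same match()/substr() extraction on a single line taken from the sample data. The shell variable names line and msd are only for illustration.

```shell
# One log line from the sample data
line='Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx'

# Field 5 is "msd[2406939]:"; match() locates the run of digits inside it
msd=$(printf '%s\n' "$line" | awk '{
  if (match($5, /[[:digit:]]+/))          # sets RSTART and RLENGTH on success
    print substr($5, RSTART, RLENGTH)     # the digits only
}')

echo "$msd"   # prints 2406939
```

Checking the return value of match() guards against lines whose 5th field has no digits, which the three-rule script does not need to do for the sample data.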

Example Use/Output

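Since your example data contains only one msd, there is no change in msd to capture. For the purposes of the example, the data has been duplicated and the msd changed, e.g.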

$ cat file
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
Apr 12 01:04:20 fe1 msd[2406940]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406940]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406940]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406940]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406940]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406940]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406940]: Exiting (bytes in: 149 out: 389)

Running the script now produces the following, with your separator at the msd transition:

$ awk '
>   { # separate digits from msd[xxx] and save as msd
>     # match() sets RSTART and RLENGTH (index and length of digits)
>     match($5, /[[:digit:]]+/)
>     msd = substr($5, RSTART, RLENGTH)   # assign substring of digits to msd
>   }
>   FNR > 1 && last != msd {  # if line > 1 and msd has changed
>     print "------------------"
>   }
>   {
>     print           # output line
>     last = msd      # update last with msd
>   }
> ' file
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
------------------
Apr 12 01:04:20 fe1 msd[2406940]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406940]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406940]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406940]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406940]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406940]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406940]: Exiting (bytes in: 149 out: 389)

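It isn't entirely clear from your question what you want to do with TARGET, START and END, so if I've misunderstood what you're trying to do, please update your question with further explanation and leave a comment below.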

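Using awk for this instead of a shell script and loops will be orders of magnitude more efficient. (On large logs the difference can be seconds with awk versus hours with a shell script.)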

bash