How do I extract log entries between two times without a date?

Question

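I'm trying to write an automated script that takes the most recent log entry and collects every log entry from the two hours before it, regardless of whether any entries exist in that period. The problem I keep running into while researching is that every example I find has dates attached, and my log has none. Sample log output: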


13:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
13:26:28.713687 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9287:9522, ack 13044, win 420, length 235
13:26:28.713766 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], ack 9522, win 24576, length 0
13:26:28.840650 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], seq 14286:15624, ack 9522, win 24576, length 1338
13:26:28.848949 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9522:9599, ack 14286, win 420, length 77
13:26:28.849002 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [P.], seq 15624:15674, ack 9599, win 24576, length 50
13:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
13:26:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

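So my timestamps come first on each line, with no date. And date doesn't like working with them: it gives me a date: invalid date '+%s' response and outputs nothing. What I have so far is: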

#!/bin/bash

truncate -s 0 twoHour.log

NEW=$(tail -n1 $1 | cut -d ":" -f1)
# echo $NEW
New=$(date -d "$NEW" +%s)
OLD=$(($NEW-2))
New=$(date -d "$OLD" +%s)
# echo $OLD
START=$(egrep "$NEW\:\d\d\:\d\d" $1 | tail | date -d +%s)
END=$(egrep "$OLD\:\d\d\:\d\d" $1 | head | date -d +%s)

while read line; do

    # Extract the date for each line.
    # First strip off everything up to the first "[".
    # Then remove everything after the first "]".
    # Finally, straighten up the format with the cleandate function
    date="${date%%.*}"
    date=$( cleandate "$date" )

    # If the date falls between d1 and d2, print it
    if [[ $date -ge $START && $date -le $END ]]; then
         echo "$line"
    fi

done

NEW and OLD are the hours used for the extraction. START and END are the boundaries; everything between them should be output line by line. $1 is the log file.

I've spent hours trying to adapt bash/awk scripts and searching for a ready-made one, and I still don't know how to get this working.

Answer 1

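sed can extract the lines delimited by a pair of regex addresses:
/^11:.*$/,/^13:26:28.849031 .*$/p

The first address can be refined further by taking the minute digits and adding them to the expression:
/^11:(2[6-9]|[3-5][0-9]).*$/,/^13:26:28.849031 .*$/p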

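last_line=$(tail -n1 test.txt)
end_time=$(cut -d ' ' -f1 <<<"$last_line")
end_hour="${end_time:0:2}"
min_msb="${end_time:3:1}"
min_lsb="${end_time:4:1}"
# Force base 10 so "08" and "09" are not parsed as octal, then zero-pad the
# result so it matches the log's HH format. Assumes end_hour >= 2 (no midnight crossing).
start_hour=$(printf '%02d' $((10#$end_hour - 2)))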

if [ "$min_msb" -lt 5 ];then
  min_next=$(($min_msb+1))
else
  min_next=5
fi

sed -rn "/^$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9]).*$/,/^$end_time .*$/p" test.txt
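
The minute part of that first address can be checked in isolation. What follows is a minimal sketch, assuming bash and GNU grep; the function name minute_range_regex is made up for illustration and is not part of the script above. Unlike the script, it drops the second alternative when the tens digit is already 5.

```shell
#!/bin/bash
# Hypothetical helper: build an ERE matching every two-digit minute from MM to 59.
minute_range_regex() {
  local mm=$1                          # two-digit minute, e.g. "26"
  local msb=${mm:0:1} lsb=${mm:1:1}
  if [ "$msb" -lt 5 ]; then
    # e.g. 26 -> (2[6-9]|[3-5][0-9])
    printf '(%s[%s-9]|[%s-5][0-9])' "$msb" "$lsb" "$((msb + 1))"
  else
    # 5x has no higher tens digit left, e.g. 57 -> (5[7-9])
    printf '(%s[%s-9])' "$msb" "$lsb"
  fi
}

re=$(minute_range_regex 26)
echo "$re"

# Of minutes 25, 26 and 40 in hour 11, only the last two fall in the range.
printf '11:25:00\n11:26:00\n11:40:00\n' | grep -E "^11:$re:"
```

Keep in mind that a sed address range only opens once some line matches the first address, so this approach relies on the log having at least one entry at or after the computed start minute.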

If the time span wraps past 24:00 (crosses midnight):

22:57:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
...
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

Then
Update: fixed the regex of the first sed address for the case where the range crosses midnight.

hour_range=2
last_line=$(tail -n1 test.txt)
#end_time=$(cut -d ' ' -f1 <<<"$last_line")
end_time="${last_line:0:8}"

start_time="$(date -d "$(date -d "$end_time" --iso=seconds) -$hour_range hour" '+%T')"
echo "Time range: $start_time - $end_time"

# Keep the two-digit strings so the regex matches zero-padded hours such as "09:";
# printf "%d" would strip the zero and reject "08"/"09" as invalid octal.
end_hour="${end_time:0:2}"
min_msb="${end_time:3:1}"
min_lsb="${end_time:4:1}"
start_hour="${start_time:0:2}"

if [ "$min_msb" -lt 5 ];then
  min_next=$(($min_msb+1))
else
  min_next=5
fi
# Crossed midnight
start_hour_expr="$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9])"
if [ "$start_hour" -gt "$end_hour" ];then
  start_hour_lsb_next=$((${start_hour:1:1} + 1))
  start_hour_next="${start_hour:0:1}${start_hour_lsb_next}"
  if [ "$start_hour_next" -eq 24 ]; then
     start_hour_next="00"
  fi
  start_hour_expr="($start_hour_expr|$start_hour_next:[0-5][0-9])"
fi

echo "sed expression:"
echo -e "/^$start_hour_expr.*$/,/^$end_time.*$/p \n"

sed -rn "/^$start_hour_expr.*$/,/^$end_time.*$/p" test.txt

Given

21:32:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
21:57:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
22:10:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:07:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
00:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

Returns

sed expression:
/^(22:(3[6-9]|[4-5][0-9])|23:[0-5][0-9]).*$/,/^00:36:28.*$/p 

23:07:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
00:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

Answer 2

Assumptions:

the offset (2 hours in OP's example) is less than 24 hours
each line starts with HH:MM:SS
the log may span multiple days

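Plan:

convert the offset (e.g., 2 hrs) to seconds; we'll call it offset_secs
grab the time from the last line of the file; we'll call it last_time
convert last_time to epoch seconds; we'll call it last_epoch
subtract offset_secs from last_epoch; we'll call it first_epoch
convert first_epoch back to an HH:MM:SS string; we'll call it first_time
to cope with timestamps in a file that spans multiple midnights, we'll keep the lines of interest in an array and reset the array whenever we find there is yet another midnight
in the awk END block we'll print the array of lines to stdout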
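
The arithmetic in this plan can be sketched in plain bash. This is only an illustration under the same assumptions: it works in seconds since midnight rather than a true epoch (so it sidesteps mktime() and time zones), and the function names to_secs and window_start are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the plan; values are seconds since midnight,
# not true epoch values.

# Convert HH:MM:SS to seconds since midnight (10# avoids octal parsing of "08"/"09").
to_secs() {
  local h m s
  IFS=: read -r h m s <<<"$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print first_time for a given last_time (HH:MM:SS) and an offset in seconds (< 24h).
window_start() {
  local last_secs first_secs
  last_secs=$(to_secs "$1")
  # Adding 86400 before the modulo keeps the result non-negative across midnight.
  first_secs=$(( (last_secs - $2 + 86400) % 86400 ))
  printf '%02d:%02d:%02d\n' \
    $((first_secs / 3600)) $((first_secs % 3600 / 60)) $((first_secs % 60))
}

offset_secs=$((2 * 60 * 60))
last_time="11:51:50"
first_time=$(window_start "$last_time" "$offset_secs")
echo "first_time=$first_time"

# String comparison is enough because HH:MM:SS is fixed-width and zero-padded.
if [[ "$first_time" > "$last_time" ]]; then
  echo "window spans midnight"
fi
```

With a last_time of 02:51:50 and a 4-hour offset the same function yields 22:51:50, which sorts after last_time and so signals that the window wraps around midnight.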

One GNU awk idea:

$ cat log.awk
BEGIN { FS="." }                                # set input field delimiter to "."

# first line of input is last line of log file; grab time and calculate the offset/start time

NR==1 { last_time   = $1
        last_epoch  = mktime( strftime("%Y %m %d") " " gensub(/:/," ","g",last_time))
        first_epoch = last_epoch - offset_secs
        first_time  = strftime("%H:%M:%S", first_epoch)

        if (first_time > last_time)
           spans_midnight=1
        next
      }

# for the rest of the input lines determine if the time falls within the last "offset_secs"

      { curr_time = $1
        if ( (  spans_midnight && curr_time >= first_time) ||
             (  spans_midnight && curr_time <= last_time)  ||
             ( !spans_midnight && curr_time >= first_time && curr_time <= last_time) )
           lines[++cnt]=$0
        else {                                  # outside the time range so ...
           delete lines                         # delete anything saved up to this point and ...
           cnt=0                                # reset the array index
        }
      }
END   { for (i=1;i<=cnt;i++)                    # print the lines that occurred within the last "offset_secs"
            print lines[i]
      }

Note: for more details on the mktime() and strftime() functions, see GNU awk: Time Functions

Test #1: 2-hour window; window does not cross midnight; file spans midnight

$ cat sample.log
22:22:00.896232 IP 104.16.42.63.https  ignore this line
06:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
07:22:00.896232 IP 104.16.42.63.https  ignore this line
09:23:00.896232 IP 104.16.42.63.https  ignore this line
09:51:49.896232 IP 104.16.42.63.https  ignore this line
09:51:50.896232 IP 104.16.42.63.https  keep this line
10:24:37.896232 IP 104.16.42.63.https  keep this line
11:51:50.896232 IP 104.16.42.63.https  keep this line

$ offset_secs=$((2*60*60))                   # 2 hours

$ awk -v offset_secs="${offset_secs}" -f log.awk <(tail -1 sample.log) sample.log
09:51:50.896232 IP 104.16.42.63.https  keep this line
10:24:37.896232 IP 104.16.42.63.https  keep this line
11:51:50.896232 IP 104.16.42.63.https  keep this line

Test #2: 4-hour window; window crosses midnight; file spans multiple midnights

$ cat sample.log
20:22:00.896232 IP 104.16.42.63.https  ignore this line
23:22:00.896232 IP 104.16.42.63.https  ignore this line
01:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
23:22:00.896232 IP 104.16.42.63.https  ignore this line
01:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
06:22:00.896232 IP 104.16.42.63.https  ignore this line
07:22:00.896232 IP 104.16.42.63.https  ignore this line
09:23:00.896232 IP 104.16.42.63.https  ignore this line
22:51:49.896232 IP 104.16.42.63.https  ignore this line
22:51:50.896232 IP 104.16.42.63.https  keep this line
23:07:37.896232 IP 104.16.42.63.https  keep this line
00:51:50.896232 IP 104.16.42.63.https  keep this line; crossed midnight
01:24:37.896232 IP 104.16.42.63.https  keep this line
02:51:50.896232 IP 104.16.42.63.https  keep this line

$ offset_secs=$((4*60*60))                   # 4 hours

$ awk -v offset_secs="${offset_secs}" -f log.awk <(tail -1 sample.log) sample.log
22:51:50.896232 IP 104.16.42.63.https  keep this line
23:07:37.896232 IP 104.16.42.63.https  keep this line
00:51:50.896232 IP 104.16.42.63.https  keep this line; crossed midnight
01:24:37.896232 IP 104.16.42.63.https  keep this line
02:51:50.896232 IP 104.16.42.63.https  keep this line

Answer 3

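egrep has limited regex support. You can use [0-9] or [[:digit:]], but not \d. If you want \d, you can use Perl-style regular expressions with grep -P.

You can also tell grep to output only the matched data with -o.

It's worth noting that egrep and grep -E are synonymous; I'd suggest being explicit and using grep -E, but that's just my preference.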

  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -o, --only-matching       show only the part of a line matching PATTERN
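
For instance, the HH:MM:SS prefix can be pulled out of a log line with a bracket expression instead of \d. A small sketch, assuming GNU grep; the sample line is copied from the question:

```shell
#!/bin/bash
line='13:26:28.713766 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], ack 9522, win 24576, length 0'

# -E enables the {2} repetition counts, -o prints only the matched text.
stamp=$(grep -Eo '^[0-9]{2}:[0-9]{2}:[0-9]{2}' <<<"$line")
echo "$stamp"
```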

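With tail and head, it looks like you want the last and the first line respectively. By default they output 10 lines. This can be controlled with -n 1.

Your date command is failing because it doesn't know where to read its input from: -d takes the next argument (here +%s) as the date string, and date does not read STDIN by default. You can specify -f - to indicate that the input file is STDIN (Pipe string to GNU Date for conversion - how to make it read from stdin?)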
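
A quick way to see -f - in action, assuming GNU date (the -f option is a GNU extension and may not exist in other implementations):

```shell
#!/bin/bash
# With -f -, date reads one date string per line from standard input.
# A bare time is interpreted as that time on the current day.
formatted=$(printf '13:26:28\n' | date -f - +%T)
echo "$formatted"

# Same input converted to epoch seconds; the value depends on today's date
# and the local time zone, so no expected output is shown.
printf '13:26:28\n' | date -f - +%s
```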

With that, the following should help you along.

START=$(egrep -o "$NEW:[0-9]{2}:[0-9]{2}\.[0-9]+" $1 | tail -n 1 | date +%s -f -)
END=$(egrep -o "$OLD:[0-9]{2}:[0-9]{2}\.[0-9]+" $1 | head -n 1 | date +%s -f -)

Tip: when troubleshooting a bash script, run it with bash -x to get a better view of what is going on.

[root@91192da89fc4 temp]# bash -x date-orig.sh log
+ truncate -s 0 twoHour.log
++ tail -n1 log
++ cut -d : -f1
+ NEW=13
++ date -d 13 +%s
+ New=1651582800
+ OLD=11
++ date -d 11 +%s
+ New=1651575600
++ egrep '13\:\d\d\:\d\d' log
++ tail
++ date -d +%s
date: invalid date '+%s'
+ START=
++ egrep '11\:\d\d\:\d\d' log
++ head
++ date -d +%s
date: invalid date '+%s'
+ END=
+ read line
