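How do I extract log entries between two times without a date?
I'm trying to build an automated script that takes the latest log entry and collects every log entry from the two hours before it, regardless of whether any entries exist in that period. The problem I keep running into is that every example I've found has a date attached to the timestamp, and mine doesn't. Sample log output: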
13:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
13:26:28.713687 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9287:9522, ack 13044, win 420, length 235
13:26:28.713766 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], ack 9522, win 24576, length 0
13:26:28.840650 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], seq 14286:15624, ack 9522, win 24576, length 1338
13:26:28.848949 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9522:9599, ack 14286, win 420, length 77
13:26:28.849002 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [P.], seq 15624:15674, ack 9599, win 24576, length 50
13:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
13:26:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526
So each line starts with a time and no date. date doesn't like that: it responds with date: invalid date ‘+%s’ and outputs nothing.
My current attempt is:
#!/bin/bash
truncate -s 0 twoHour.log
NEW=$(tail -n1 $1 | cut -d ":" -f1)
# echo $NEW
New=$(date -d "$NEW" +%s)
OLD=$(($NEW-2))
New=$(date -d "$OLD" +%s)
# echo $OLD
START=$(egrep "$NEW\:\d\d\:\d\d" $1 | tail | date -d +%s)
END=$(egrep "$OLD\:\d\d\:\d\d" $1 | head | date -d +%s)
while read line; do
# Extract the date for each line.
# First strip off everything up to the first "[".
# Then remove everything after the first "]".
# Finally, straighten up the format with the cleandate function
date="${date%%.*}"
date=$( cleandate "$date" )
# If the date falls between d1 and d2, print it
if [[ $date -ge $START && $date -le $END ]]; then
echo "$line"
fi
done
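NEW and OLD hold the hours to extract. START and END are the boundaries; everything between them should be output line by line. $1 is the log file.
I've spent hours trying to adapt bash/awk scripts and searching for a ready-made one, and I can't figure out how to make this work.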
sed can print the lines between two regular-expression addresses:
/^11:.*$/,/^13:26:28.849031 .*$/p
The first address can be refined further by taking the minute digits of the end time and adding them to the expression:
/^11:(2[6-9]|[3-5][0-9]).*$/,/^13:26:28.849031 .*$/p
last_line=$(tail -n1 test.txt)
end_time=$(cut -d ' ' -f1 <<<"$last_line")   # e.g. 13:26:28.849031
end_hour="${end_time:0:2}"
min_msb="${end_time:3:1}"                    # tens digit of the minutes
min_lsb="${end_time:4:1}"                    # ones digit of the minutes
# 10# forces base 10 so "08" and "09" are not parsed as octal; %02d restores the leading zero
start_hour=$(printf '%02d' $((10#$end_hour - 2)))
if [ "$min_msb" -lt 5 ];then
    min_next=$(($min_msb+1))
else
    min_next=5
fi
sed -rn "/^$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9]).*$/,/^$end_time .*$/p" test.txt
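To sanity-check the generated first address before handing it to sed, the minute part can be built separately and run through grep -E against a few timestamps. This is only a sketch: the minute_expr helper is a made-up name that mirrors the min_msb/min_lsb logic above.

```shell
#!/bin/bash
# Hypothetical helper: build the minute part of the first address from a
# two-digit end minute, following the same rules as the snippet above.
minute_expr() {
    local min_msb=${1:0:1} min_lsb=${1:1:1} min_next
    if [ "$min_msb" -lt 5 ]; then
        min_next=$((min_msb + 1))
    else
        min_next=5
    fi
    echo "($min_msb[$min_lsb-9]|[$min_next-5][0-9])"
}

pattern=$(minute_expr 26)    # minutes of 13:26:28
echo "$pattern"
# List which timestamps in hour 11 the first address would accept as a start line
printf '%s\n' 11:25:59 11:26:00 11:30:00 11:59:59 | grep -E "^11:$pattern"
```

Here 11:25:59 is rejected while the other three timestamps are accepted, which is the behaviour the refined address is meant to have.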
If the time range wraps past 24:00 (crosses midnight), for example:
22:57:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
...
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526
then:
Update: fixed the regex for sed's first address when the range crosses midnight.
hour_range=2
last_line=$(tail -n1 test.txt)
#end_time=$(cut -d ' ' -f1 <<<"$last_line")
end_time="${last_line:0:8}"
start_time="$(date -d "$(date -d "$end_time" --iso=seconds) -$hour_range hour" '+%T')"
echo "Time range: $start_time - $end_time"
# Keep the zero-padded strings for the regex; use 10# for numeric
# comparison so that "08" and "09" are not read as octal
end_hour="${end_time:0:2}"
min_msb="${end_time:3:1}"
min_lsb="${end_time:4:1}"
start_hour="${start_time:0:2}"
if [ "$min_msb" -lt 5 ];then
    min_next=$(($min_msb+1))
else
    min_next=5
fi
start_hour_expr="$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9])"
# Crossed midnight
if [ "$((10#$start_hour))" -gt "$((10#$end_hour))" ];then
    start_hour_next=$(printf '%02d' $(( (10#$start_hour + 1) % 24 )))
    start_hour_expr="($start_hour_expr|$start_hour_next:[0-5][0-9])"
fi
echo "sed expression:"
echo -e "/^$start_hour_expr.*$/,/^$end_time.*$/p \n"
sed -rn "/^$start_hour_expr.*$/,/^$end_time.*$/p" test.txt
Given:
21:32:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
21:57:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
22:10:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:07:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
00:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526
Returns
sed expression:
/^(22:(3[6-9]|[4-5][0-9])|23:[0-5][0-9]).*$/,/^00:36:28.*$/p
23:07:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
00:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526
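Assumptions:
- the offset (2 hours in the OP's example) is less than 24 hours
- every line starts with a timestamp in HH:MM:SS format
- the log may span multiple days
Plan:
- convert the offset (e.g., 2 hrs) into seconds; we'll call this offset_secs
- grab the time from the last line of the file; we'll call this last_time
- convert that timestamp to epoch seconds; we'll call this last_epoch
- subtract offset_secs from last_epoch; we'll call this first_epoch
- convert first_epoch back to an HH:MM:SS string; we'll call this first_time
- to handle files whose timestamps span multiple midnights, keep the lines of interest in an array and reset the array whenever we find a line outside the range, meaning another midnight still lies ahead
- during awk/END processing, print the array of lines to stdout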
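The arithmetic in the plan can also be sketched in plain bash. This is only an illustration: the helper names to_secs and window_start are made up here, and the sketch works on seconds since midnight rather than epoch seconds, which yields the same HH:MM:SS result as long as the offset is under 24 hours.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the answer): compute the start of the
# window from the last timestamp and an offset in seconds.

# Convert HH:MM:SS to seconds since midnight. 10# forces base 10 so that
# "08" and "09" are not rejected as invalid octal numbers.
to_secs() {
    local h m s
    IFS=: read -r h m s <<<"$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print the HH:MM:SS that lies offset_secs before last_time, wrapping at midnight.
window_start() {
    local last_time=$1 offset_secs=$2
    local first=$(( ( $(to_secs "$last_time") - offset_secs + 86400 ) % 86400 ))
    printf '%02d:%02d:%02d\n' $(( first / 3600 )) $(( first % 3600 / 60 )) $(( first % 60 ))
}

window_start 11:51:50 $((2*60*60))   # 2-hour window ending at 11:51:50
window_start 02:51:50 $((4*60*60))   # 4-hour window ending at 02:51:50, wraps past midnight
```

When the computed start time sorts after the end time, as in the second call, the window crosses midnight and a simple "between" comparison on the strings is no longer enough.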
One GNU awk idea:
$ cat log.awk
BEGIN { FS="." }                          # set input field delimiter to "."
# first line of input is last line of log file; grab time and calculate the offset/start time
NR==1 { last_time   = $1
        last_epoch  = mktime( strftime("%Y %m %d") " " gensub(/:/," ","g",last_time))
        first_epoch = last_epoch - offset_secs
        first_time  = strftime("%H:%M:%S", first_epoch)
        if (first_time > last_time)
           spans_midnight=1
        next
      }
# for the rest of the input lines determine if the time falls within the last "offset_secs"
      { curr_time = $1
        if ( (  spans_midnight && curr_time >= first_time) ||
             (  spans_midnight && curr_time <= last_time)  ||
             ( !spans_midnight && curr_time >= first_time && curr_time <= last_time) )
           lines[++cnt]=$0
        else {                            # outside the time range so ...
           delete lines                   # delete anything saved up to this point and ...
           cnt=0                          # reset the array index
        }
      }
END   { for (i=1;i<=cnt;i++)              # print the lines that occurred within the last "offset_secs"
            print lines[i]
      }
NOTE: see GNU awk: Time Functions for more details on the mktime() and strftime() functions.
Test #1: 2-hour window; window does not cross midnight; file spans a midnight
$ cat sample.log
22:22:00.896232 IP 104.16.42.63.https ignore this line
06:22:00.896232 IP 104.16.42.63.https ignore this line; crossed midnight
07:22:00.896232 IP 104.16.42.63.https ignore this line
09:23:00.896232 IP 104.16.42.63.https ignore this line
09:51:49.896232 IP 104.16.42.63.https ignore this line
09:51:50.896232 IP 104.16.42.63.https keep this line
10:24:37.896232 IP 104.16.42.63.https keep this line
11:51:50.896232 IP 104.16.42.63.https keep this line
$ offset_secs=$((2*60*60)) # 2 hours
$ awk -v offset_secs="${offset_secs}" -f log.awk <(tail -1 sample.log) sample.log
09:51:50.896232 IP 104.16.42.63.https keep this line
10:24:37.896232 IP 104.16.42.63.https keep this line
11:51:50.896232 IP 104.16.42.63.https keep this line
Test #2: 4-hour window; window crosses midnight; file spans multiple midnights
$ cat sample.log
20:22:00.896232 IP 104.16.42.63.https ignore this line
23:22:00.896232 IP 104.16.42.63.https ignore this line
01:22:00.896232 IP 104.16.42.63.https ignore this line; crossed midnight
23:22:00.896232 IP 104.16.42.63.https ignore this line
01:22:00.896232 IP 104.16.42.63.https ignore this line; crossed midnight
06:22:00.896232 IP 104.16.42.63.https ignore this line
07:22:00.896232 IP 104.16.42.63.https ignore this line
09:23:00.896232 IP 104.16.42.63.https ignore this line
22:51:49.896232 IP 104.16.42.63.https ignore this line
22:51:50.896232 IP 104.16.42.63.https keep this line
23:07:37.896232 IP 104.16.42.63.https keep this line
00:51:50.896232 IP 104.16.42.63.https keep this line; crossed midnight
01:24:37.896232 IP 104.16.42.63.https keep this line
02:51:50.896232 IP 104.16.42.63.https keep this line
$ offset_secs=$((4*60*60)) # 4 hours
$ awk -v offset_secs="${offset_secs}" -f log.awk <(tail -1 sample.log) sample.log
22:51:50.896232 IP 104.16.42.63.https keep this line
23:07:37.896232 IP 104.16.42.63.https keep this line
00:51:50.896232 IP 104.16.42.63.https keep this line; crossed midnight
01:24:37.896232 IP 104.16.42.63.https keep this line
02:51:50.896232 IP 104.16.42.63.https keep this line
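egrep has limited regex support. You can use [0-9] or [[:digit:]], but not \d. If you want \d, you can use Perl-style regular expressions with grep -P.
You can also tell grep to output only the part of the line that matches with -o.
It's worth noting that egrep and grep -E are synonyms; I recommend being explicit and using grep -E, but that is just my preference.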
-E, --extended-regexp PATTERN is an extended regular expression (ERE)
-P, --perl-regexp PATTERN is a Perl regular expression
-o, --only-matching show only the part of a line matching PATTERN
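As a small illustration of these flags, the sketch below pulls just the leading timestamp out of a log line with grep -oE. The sample line is a shortened, made-up version of the log format in the question, and the first_stamp name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print only the leading HH:MM:SS.ffffff of each input line.
# -E allows {2}-style repetition without backslashes; -o prints only the match.
# [0-9] is used instead of \d because POSIX ERE has no \d shorthand.
first_stamp() {
    grep -oE '^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+'
}

printf '%s\n' \
    '13:26:28.709883 IP host-a.https > host-b.46364: Flags [P.], length 151' \
    'a line without a leading timestamp' |
    first_stamp
```

Only the first input line produces output, because the second has nothing for the pattern to match.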
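With tail and head, it looks like you want the last and the first matching line respectively. By default they both output 10 lines; this can be controlled with -n 1.
Your date command fails because -d takes the next argument (+%s) as the date string, and date does not read the piped input on its own. You can specify -f - to indicate that the input file is STDIN (Pipe string to GNU Date for conversion - how to make it read from stdin?).
With those changes, the following should get you going.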
START=$(egrep -o "$NEW:[0-9]{2}:[0-9]{2}\.[0-9]+" "$1" | tail -n 1 | date +%s -f -)
END=$(egrep -o "$OLD:[0-9]{2}:[0-9]{2}\.[0-9]+" "$1" | head -n 1 | date +%s -f -)
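To see -f - in isolation: GNU date reads one date string per line from standard input and formats each of them. In this sketch (the stamp_to_epoch name is hypothetical) a bare time is interpreted as that time on the current day, so the epoch values change from day to day; TZ=UTC is set only to keep the example free of daylight-saving effects.

```shell
#!/bin/bash
# Hypothetical helper: convert each HH:MM:SS line on stdin to epoch seconds.
# Requires GNU date; "-f -" makes date read its date strings from stdin.
stamp_to_epoch() {
    TZ=UTC date -f - +%s
}

# Feed both times through a single date invocation so they share the same "today"
mapfile -t epochs < <(printf '%s\n' 11:26:28 13:26:28 | stamp_to_epoch)
echo "start=${epochs[0]} end=${epochs[1]}"
echo "window length: $(( epochs[1] - epochs[0] )) seconds"
```

Because the date is implied, two times taken from either side of midnight would both land on the current day, which is why the other answers handle the wrap-around explicitly.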
Tip: when troubleshooting a bash script, running it with bash -x gives you a much better view of what is going on.
[root@91192da89fc4 temp]# bash -x date-orig.sh log
+ truncate -s 0 twoHour.log
++ tail -n1 log
++ cut -d : -f1
+ NEW=13
++ date -d 13 +%s
+ New=1651582800
+ OLD=11
++ date -d 11 +%s
+ New=1651575600
++ egrep '13\:\d\d\:\d\d' log
++ tail
++ date -d +%s
date: invalid date '+%s'
+ START=
++ egrep '11\:\d\d\:\d\d' log
++ head
++ date -d +%s
date: invalid date '+%s'
+ END=
+ read line