Grep only last line with latest datetimes in a text file

I have a log file on a Linux OS (Red Hat) that records database events. The file looks like this:

2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z

I only want the line with the latest datetime for each user (x, y, z). So the output should look like this:

  2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
  2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
  2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z

We can use awk to get the lines that have a unique value in the last column (see: print unique lines based on field).


To make sure these are the latest lines (by datetime), I am assuming the following:

  • The file is always sorted from oldest to newest
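Since the whole approach depends on that ordering, it may be worth checking it first. The following is only a minimal sketch: the function name `is_sorted_by_timestamp` and the `sample` list are hypothetical, and it assumes every line starts with a 23-character `YYYY-MM-DD HH:MM:SS.mmm` timestamp and that all lines share the same UTC offset, so plain string comparison matches chronological order.

```python
def is_sorted_by_timestamp(lines):
    """Return True if no line has an earlier timestamp than the line before it."""
    # Assumption: the first 23 characters are 'YYYY-MM-DD HH:MM:SS.mmm' and the
    # UTC offset is the same on every line, so comparing the strings directly
    # orders them chronologically.
    stamps = [line[:23] for line in lines if line.strip()]
    return all(earlier <= later for earlier, later in zip(stamps, stamps[1:]))


# Hypothetical sample taken from the log format in the question.
sample = [
    "2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x",
    "2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y",
    "2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z",
]

print(is_sorted_by_timestamp(sample))
```

If the check fails, the reverse-and-deduplicate trick is not safe and the timestamps have to be compared explicitly.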

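So if we:

  • Reverse the file (so it runs newest -> oldest)
  • Keep only the first line seen for each user
  • Reverse it again (back to oldest -> newest)

we get the last failed attempt for each user: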

tac log.txt | awk -F" " '!_[$NF]++' | tac
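For readers less familiar with awk, the same three steps can be sketched in Python. This is an illustration rather than a replacement for the one-liner: the function name `last_line_per_user` and the `sample` list are hypothetical, and it relies on the same assumptions, namely that the user is the last whitespace-separated field and that the lines are ordered oldest to newest.

```python
def last_line_per_user(lines):
    """Keep, for each user, the last line in which that user appears."""
    seen = set()
    kept = []
    # Step 1: walk the lines newest-first (the role of the first `tac`).
    for line in reversed(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        user = fields[-1]  # assumption: the user is the last field
        # Step 2: keep only the first line seen per user (the awk filter).
        if user not in seen:
            seen.add(user)
            kept.append(line)
    # Step 3: restore oldest-first order (the role of the second `tac`).
    kept.reverse()
    return kept


# Hypothetical sample taken from the log format in the question.
sample = [
    "2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x",
    "2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x",
    "2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y",
    "2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z",
    "2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z",
]

for line in last_line_per_user(sample):
    print(line)
```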

Example on my local machine:

$
$ cat log.txt
2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
$
$ tac log.txt | awk -F" " '!_[$NF]++' | tac
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
$

See below

from collections import defaultdict
from datetime import datetime

data_str = '''2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z'''
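# Map each user to the list of timestamps at which a failure was logged.
holder = defaultdict(list)
for entry in data_str.split('\n'):
    fields = entry.split(' ')
    # fields[0] is the date, fields[1] the time, fields[-1] the user.
    # The UTC offset in fields[2] is ignored.
    holder[fields[-1]].append(datetime.strptime(fields[0] + ' ' + fields[1], '%Y-%m-%d %H:%M:%S.%f'))
# Print the most recent timestamp for each user.
for user, date_time_lst in holder.items():
    print(f'{user} --> {max(date_time_lst)}')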

Output

x --> 2021-08-04 09:36:05.223000
y --> 2021-08-04 09:37:50.350000
z --> 2021-08-04 09:39:01.372000
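The output above shows only the timestamp, while the question asks for the complete line. A possible variant that keeps the full line and does not depend on the file being sorted is sketched below. The function name `latest_line_per_user` and the `sample` list are hypothetical, and the sketch assumes, as the code above does, that the UTC offset is the same on every line and can be ignored.

```python
from datetime import datetime


def latest_line_per_user(lines):
    """Return, for each user, the full line carrying the latest timestamp."""
    latest = {}  # user -> (timestamp, full line)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumption: date and time are the first two fields, the user is the
        # last field, and the UTC offset is identical on every line.
        stamp = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S.%f")
        user = fields[-1]
        if user not in latest or stamp > latest[user][0]:
            latest[user] = (stamp, line)
    # Users come out in the order in which they first appear in the input.
    return [line for _, line in latest.values()]


# Hypothetical sample, deliberately out of chronological order.
sample = [
    "2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x",
    "2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x",
    "2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y",
    "2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y",
]

for line in latest_line_per_user(sample):
    print(line)
```

Comparing parsed timestamps costs a little more than the reverse-and-deduplicate pipeline, but it still gives the right line when entries are written out of order.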