Grep only the last line with the latest datetime for each user in a text file
I have a log file on Linux (Red Hat) that records database events. The file looks like this:
2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
I only want the line with the latest datetime for each user (x, y, z), so the result should look like this:
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
We can use awk to print only the rows that have a unique value in the last column (see "print unique lines based on field").
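To make sure those rows are the latest ones (by datetime), I assume the following:
- The file is always sorted from old to new
So if we:
- Reverse the file (so it runs new -> old)
- Keep the first row seen for each user
- Reverse it again (so it runs old -> new)
we get the last failed attempt for each user: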
tac log.txt | awk -F" " '!_[$NF]++' | tac
Example on my local machine:
$
$ cat log.txt
2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
$
$ tac log.txt | awk -F" " '!_[$NF]++' | tac
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
$
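The reverse, de-duplicate, reverse idea above can also be sketched in Python. This is only an illustration under the same assumption (lines are already sorted from old to new); the function name `last_line_per_key` and the `sample` list are made up for this example, and unlike awk it skips blank lines.

```python
def last_line_per_key(lines):
    """Return the last line seen for each value of the final field."""
    seen = set()
    kept = []
    # Walk the lines newest -> oldest, as the first `tac` would.
    for line in reversed(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines (an assumption of this sketch)
        key = fields[-1]  # the user name is the last field
        if key not in seen:
            seen.add(key)
            kept.append(line)
    # Restore oldest -> newest order, as the second `tac` would.
    kept.reverse()
    return kept


sample = [
    "2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x",
    "2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x",
    "2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y",
    "2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y",
    "2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z",
    "2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z",
]

for line in last_line_per_key(sample):
    print(line)
```

Like the awk version, this never parses the timestamps; it relies entirely on the order of the file.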
See below:
from collections import defaultdict
from datetime import datetime
data_str = '''2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z'''
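# Collect every timestamp seen for each user (the last field on the line).
holder = defaultdict(list)
for entry in data_str.split('\n'):
    fields = entry.split(' ')
    # Date and time are the first two fields; the '+03' offset is ignored.
    holder[fields[-1]].append(datetime.strptime(fields[0] + ' ' + fields[1], '%Y-%m-%d %H:%M:%S.%f'))
# Print the most recent timestamp per user.
for user, date_time_lst in holder.items():
    print(f'{user} --> {max(date_time_lst)}')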
Output
x --> 2021-08-04 09:36:05.223000
y --> 2021-08-04 09:37:50.350000
z --> 2021-08-04 09:39:01.372000
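The question asks for the full log line rather than just the timestamp, so here is a possible variant that keeps the whole line and does not depend on the file being sorted. It is a sketch, not a definitive implementation: the name `latest_line_per_user` and the `shuffled` list are hypothetical, and it assumes every line carries the same UTC offset (the `+03` field is ignored).

```python
from datetime import datetime


def latest_line_per_user(lines):
    """Return, for each user, the full line with the newest timestamp."""
    latest = {}  # user -> (timestamp, full line)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # not a log line in the expected shape
        # The first two fields hold the date and the time with milliseconds.
        stamp = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S.%f")
        user = fields[-1]
        if user not in latest or stamp > latest[user][0]:
            latest[user] = (stamp, line)
    # Sort the surviving lines by timestamp, oldest first.
    return [line for _, line in sorted(latest.values())]


# Deliberately out of order to show that sorting is not required.
shuffled = [
    "2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z",
    "2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x",
    "2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y",
    "2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x",
    "2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z",
    "2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y",
]

for line in latest_line_per_user(shuffled):
    print(line)
```

Parsing each timestamp costs more than the `tac`/awk pipeline, but it still gives the right answer if the log is merged from several sources or otherwise out of order.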