Printing lines between start & end variables in Bash
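I'm writing a script to organize some log files into a human-readable format. The idea behind my code is that you run the script with an IP address as an argument, and it pulls out all logs belonging to sessions that involve that IP. So far, this is what my code does: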
-takes in external log data
-takes in an IP address as an argument
-loops through each log
-if it reaches a log containing the specified IP {
-find the MSD number within that log
-check if MSD is in "collected" array
-if MSD is already in array {
-resume the loop
} else {
-add MSD to the "collected" array
****-search all logs for corresponding MSD and echo them to output.txt
-does not affect the order in which the logs were generated
}
-add a "--------" visual separator
-resume loop, repeating the process each time it finds the IP with a new MSD
Here is my code:
OUTPUT_FILE=./output.txt
ARR=()
while IFS= read -r line; do
    # "$1" is the IP address passed to the script
    if grep -qF "$1" <<< "$line"; then
        MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
        if [[ ! " ${ARR[*]} " =~ " ${MSD} " ]]; then
            ARR+=("$MSD")
            cat ./example_logs | grep "$MSD" >> "$OUTPUT_FILE"
            echo "------------------" >> "$OUTPUT_FILE"
        fi
    fi
done < ./example_logs
Here is the sample input:
Apr 12 03:42:45 fe1 msd[2645899]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 00:12:24 fe1 msd[2320005]: SMTPD started: connection from 46.xxx.xxx.xxx
Apr 12 00:12:24 fe1 msd[2320005]: Created UUID ....... for message
Apr 12 00:01:39 fe1 msd[2319095]: SMTPD started: connection from 85.xxx.xxx.xxx
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 00:01:39 fe1 msd[2319095]: Created UUID ....... for message
Apr 12 03:42:45 fe1 msd[2645899]: Created UUID ....... for message
Apr 12 00:12:24 fe1 msd[2320005]: CONN: 46.xxx.xxx.xxx -> 587 GeoIP = [LV] PTR = .......
Apr 12 00:01:39 fe1 msd[2319095]: CONN: 85.xxx.xxx.xxx -> 587 GeoIP = [NL] PTR = .......
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 00:12:24 fe1 msd[2320005]: EHLO command received, args: .......
... (400 more lines)
Here is a snippet of the current output:
#EXAMPLE LOGS (target IP is 3x.x.xx.xx)
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
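My problem is at the spot I marked with **** (the cat line)... From the output, you can see that two IPs share the same MSD. I can't figure out how to remove the unwanted logs. Any suggestions? So far I've tried creating $START and $END variables, but I'm sure I'm doing it wrong...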
TARGET="connection from "
START="$(grep "$MSD" example_logs | grep "$TARGET")"
END="$(grep "$MSD" example_logs | grep -i "exiting")"
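EDIT: I've posted my own crude solution to this question below, in case anyone wants to know where I ended up.
Here is the crude solution I arrived at after 3 cups of coffee and 15 hours of coding. I hope it may be useful to someone else out there!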
rm ./output.txt
OUTPUT_FILE=./output.txt
#
#
# ================= PART 1: =================
# An empty array for values later
MSD_ARR=()
#
# Loop through each log in ./output_logs
# -IF it reaches a log containing the target IP {
# -find msd# within that log & save to variable
# -check if current msd is in MSD_ARR
#
# -IF current msd is NOT in the array {
# -add msd to MSD_ARR
# -grep all logs containing the current msd
# -output logs to new file. Does not affect the order in which logs are listed.
# } else {
# -resume the loop.
# }
# }
#
while IFS= read -r line; do
    # "$1" is the target IP passed to the script
    if grep -qF "$1" <<< "$line"; then
        MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
        if [[ ! " ${MSD_ARR[*]} " =~ " ${MSD} " ]]; then
            MSD_ARR+=("$MSD")
            cat ./example_logs | grep "$MSD" >> "$OUTPUT_FILE"
            echo "------------------------" >> "$OUTPUT_FILE"
        fi
    fi
done < ./example_logs
#
#
# ================= PART 2: =================
#
# Loop through each log in ./output.txt
# -IF it reaches a log containing the phrase "connection from" {
# -find msd# within that log & save to variable
# -IF the current log does NOT contain the target IP {
# -grab the current line's position & save as a variable called "START"
# -grab all positions of lines containing the phrase "Exiting" & split the values into an array called "EndArr"
#
# -FOR LOOP over EndArr
# -IF our START position is less than (comes before) our current END position (EndVal) {
# use "sed" to delete logs, using START and EndVal as the range. Deletes inclusively
# BREAK out of this loop to repeat the process with the next START value
# }
# }
# }
#
while IFS= read -r line; do
    if grep -q "connection from" <<< "$line"; then
        MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
        # "$1" is the target IP; padding with spaces avoids partial matches
        if [[ ! " ${line} " =~ " $1 " ]]; then
            START="$(awk -v var="$line" 's=index($0,var){print NR}' output.txt)"
            END="$(awk -v var="Exiting" 'e=index($0,var){print NR}' output.txt)"
            EndArr=($END)
            for EndVal in "${EndArr[@]}"; do
                if [ "$START" -lt "$EndVal" ]; then
                    sed -i "$START,$EndVal d" ./output.txt
                    break
                fi
            done
        fi
    fi
done < ./output.txt
#
# ===========================================
# EXAMPLE USE:
# >bash filename [IP Address]
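The two passes above can probably be collapsed into a single awk pass that buffers each session and prints it only when it belongs to the target IP. What follows is a minimal sketch, not a tested replacement. It assumes that every session opens with a "connection from <IP>" line where the IP is the last field, that it closes with a line containing "Exiting", and that an msd number is only reused after the previous session with that number has exited. The function name filter_sessions is made up for illustration.

```shell
#!/usr/bin/env bash
# filter_sessions IP LOGFILE   (hypothetical helper name)
# Prints every complete session opened by IP, followed by a separator line.
filter_sessions() {
    awk -v ip="$1" '
    {
        # Take the digits from the msd[NNNN]: field; skip lines without them.
        if (!match($5, /[0-9]+/)) next
        msd = substr($5, RSTART, RLENGTH)
    }
    /connection from / {
        # A new session starts for this msd. Keep it only if the peer
        # address (assumed to be the last field) is the target IP.
        keep[msd] = ($NF == ip)
        buf[msd] = ""
    }
    keep[msd] {
        # Buffer the lines of a wanted session in their original order.
        buf[msd] = buf[msd] $0 "\n"
    }
    /Exiting/ {
        # Session is over: flush it if wanted, then forget this msd.
        if (keep[msd]) {
            printf "%s", buf[msd]
            print "------------------"
        }
        delete keep[msd]
        delete buf[msd]
    }
    ' "$2"
}
```

It would be called as filter_sessions 3x.x.xx.xx ./example_logs > output.txt. A session that never logs "Exiting" is silently dropped by this sketch, which may or may not be what you want.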
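A single call to awk with three rules will do what you want. (Each rule is written as condition { commands }; if there is no condition, the rule runs for every record, i.e. every input line.) Basically all you need to do is:
- get the 7-digit msd number from the 5th field (e.g. msd[xxxxxxx]);
- if it is not the first line and msd differs from the previous line's, output your "------------------" separator;
- output the current line and update your last variable with the current msd.
To save to your "$OUTPUT_FILE", just redirect the output of the command.
If you put it all together in a short awk script, you have: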
awk '
{   # separate digits from msd[xxx] and save as msd
    # match() sets RSTART and RLENGTH (index and length of the digits)
    match ($5,/[[:digit:]]+/)
    msd = substr($5,RSTART,RLENGTH)   # assign substring of digits to msd
}
FNR > 1 && last != msd {   # if line > 1 and msd has changed
    print "------------------"
}
{
    print         # output line
    last = msd    # update last with msd
}
' file > "$OUTPUT_FILE"
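To make the first rule less opaque, here is a small sketch that runs the same match()/substr() pair on a single line from the sample logs and prints what RSTART and RLENGTH end up holding. It uses [0-9] rather than [[:digit:]] only because some older awk builds lack POSIX character classes; for this input the two match the same characters. The variable names line and result are chosen purely for illustration.

```shell
# One line taken from the sample logs.
line='Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx'

# Field 5 is "msd[2406939]:". match() records where the first run of
# digits starts (RSTART) and how long it is (RLENGTH).
result=$(printf '%s\n' "$line" |
    awk '{ match($5, /[0-9]+/); print RSTART, RLENGTH, substr($5, RSTART, RLENGTH) }')

echo "$result"   # prints: 5 7 2406939
```

The digits start at character 5 of the field and run for 7 characters, which is exactly the substring that substr() hands back as msd.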
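Example Use/Output
Since your example data contains only one msd, there is no change in msd to capture. For the purposes of the example, the data has been duplicated and the msd changed, e.g.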
$ cat file
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
Apr 12 01:04:20 fe1 msd[2406940]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406940]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406940]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406940]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406940]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406940]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406940]: Exiting (bytes in: 149 out: 389)
Running the script now produces the following, with your separator at the point where msd changes:
$ awk '
> {   # separate digits from msd[xxx] and save as msd
>     # match() sets RSTART and RLENGTH (index and length of the digits)
>     match ($5,/[[:digit:]]+/)
>     msd = substr($5,RSTART,RLENGTH)   # assign substring of digits to msd
> }
> FNR > 1 && last != msd {   # if line > 1 and msd has changed
>     print "------------------"
> }
> {
>     print         # output line
>     last = msd    # update last with msd
> }
> ' file
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
------------------
Apr 12 01:04:20 fe1 msd[2406940]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406940]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406940]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406940]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406940]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406940]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406940]: Exiting (bytes in: 149 out: 389)
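It isn't entirely clear from your question what you want to do with TARGET, START and END, so if I have misunderstood what you are trying to do, please update your question with further explanation and drop a comment below.
Using awk for this instead of a shell script with loops will be orders of magnitude more efficient. (The difference can be seconds for awk versus hours of run time for a shell script on large logs.)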
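If part of the logic does have to stay in a Bash loop, one likely source of the slowdown is that the original script starts echo, cut and grep as separate processes for every line. As a rough sketch of how that could be avoided, the msd number can be pulled out with parameter expansion alone. The function name extract_msd and the global variable MSD it sets are assumptions made for this example.

```shell
# extract_msd LINE   (hypothetical helper)
# Sets the global variable MSD to the text between the first "[" and the
# first "]" that follows it, without starting any external process.
extract_msd() {
    local rest=${1#*\[}   # drop everything up to and including the first "["
    MSD=${rest%%\]*}      # drop everything from the first "]" onward
}

extract_msd 'Apr 12 00:12:24 fe1 msd[2320005]: CONN: 46.xxx.xxx.xxx -> 587 GeoIP = [LV] PTR = .......'
echo "$MSD"   # prints: 2320005
```

This gives the same value as the echo | cut | cut pipeline for the sample lines, including lines such as the CONN one that contain a second pair of brackets.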