Efficient way to read key=value pairs from each line of a file in bash
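I'd like advice on a better way to process a file in bash, specifically the key=value pairs on each line.
My task is to process log lines as follows:
- If a line contains "critical/warning", print the value of request_id on its own line.
- If the key IPA has the value "MASKED", append "MASK" to the request_id in the output.
I wrote the code below to handle it: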
while read line
do
    if [ $( echo "$line" | grep "critical/warning" | grep -c "request_id=") -gt 0 ]
    then
        request_id=$( echo "$line"| awk -F"request_id=" '{print $2}'| awk '{print $1}')
        if [ $(echo "$line" | grep -c "IPA=") -gt 0 ]
        then
            IPA=$(echo "$line"| awk -F"IPA=" '{print $2}'| awk '{print $1}');
            [[ "M$IPA" == "M\"MASKED\"" ]] && request_id="$request_id MASK"
        fi
        echo $request_id;
    fi
done < test.txt
Below is the sample log file:
Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://jalaltu.com" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0
Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?key=s2fwad2Es2" host=jalaltu.com request_id=b19a87a1-1bbb-4e67-b207-bd9f23d46afa IPA="108.31.000.000" dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https
Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65 IPA="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2 IPA="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 jalaltu app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?key=s2fwad2Es2 HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0" host=jalaltu.com request_id=d48278c2-5731-464e-be38-ab9ad84ac4a8 IPA="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https
Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Apr 10 11:17:36 jalaltu app/web.4: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0 HTTP/1.1" 200 3023 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36
Apr 10 11:17:36 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=8bb2413c-3c67-4180-8091-000313b8d9ca IPA="MASKED" dyno=web.3 connect=1ms service=32ms status=200 bytes=4435 protocol=https
Apr 10 11:17:36 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=10f93da3-2753-48a3-9485-857a93d8a88a IPA="MASKED" dyno=web.3 connect=1ms service=37ms status=200 bytes=4435 protocol=https
Below is the expected output for the sample log file:
b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK
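Assumptions:
- Although the sample data shows request_id always preceding IPA, I'm assuming that may not always be the case.
One idea using a single awk call (this should be a bit faster than the current bash loop, which spawns several echo/grep/awk subprocesses for every line):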
awk '
/critical[/]warning/ && /request_id/ {              # line contains both "critical/warning" and "request_id"
    mask = ""                                        # clear the "mask" variable
    for (i = 1; i <= NF; i++) {                      # loop through the input fields
        split($i, arr, "=")                          # split current field on "=", store results in array arr[]
        if (arr[1] == "request_id")                  # if field is "request_id" ...
            reqid = arr[2]                           # ... save the associated id
        if (arr[1] == "IPA" && arr[2] ~ "MASKED")    # if field is "IPA" and value matches "MASKED" ...
            mask = " MASK"                           # ... set the "mask" variable
    }
    print reqid mask                                 # print our variables
}
' log.dat
Note: strip the comments to declutter the code.
The above generates:
b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK
A different approach, using GNU awk:
awk '/critical\/warning/{
    id=gensub(/.*=/,"","g",$1)    # remove "request_id=" from $1
    if(/IPA="MASKED"/){
        print id,"MASK"
    }
    else{
        print id
    }
}' FPAT='request_id=[a-z0-9-]{36}' file
Output:
b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK
From man awk:
FPAT: A regular expression describing the contents of the fields in a record.
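Both gensub() and FPAT are GNU awk extensions, so the script above will not run on other awk implementations. As a minimal sketch of the same idea for a POSIX awk, the built-in match() function can locate the request_id text by its content. The function name extract_ids and the shortened sample lines are assumptions made for illustration, and the pattern assumes the id contains no spaces:

```shell
#!/bin/sh
# extract_ids is a hypothetical helper; it reads log lines on stdin.
extract_ids() {
    awk '
    /critical\/warning/ && match($0, /request_id=[^ ]+/) {
        # RSTART/RLENGTH delimit the "request_id=..." match;
        # skip the 11 characters of the "request_id=" key itself.
        id = substr($0, RSTART + 11, RLENGTH - 11)
        # index() does a plain substring search, so no regex escaping is needed.
        mask = index($0, "IPA=\"MASKED\"") ? " MASK" : ""
        print id mask
    }'
}

# Shortened, made-up lines; IPA comes before request_id in the first one.
printf '%s\n' \
    'Apr 10 11:17:36 critical/warning: at=info IPA="MASKED" request_id=abc-123 status=200' \
    'Apr 10 11:17:36 critical/warning: at=info request_id=def-456 IPA="108.31.000.000"' \
    'Apr 10 11:17:36 jalaltu app/web.3: IP_MASKED - - request_id=ignored' |
extract_ids
```

Because the whole record is searched rather than individual fields, the order of request_id and IPA on the line does not matter. For a real log, replace the printf pipeline with `extract_ids < log.dat`.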
A bash solution (no awk/grep/echo) using some pattern matching and parameter expansion... although for large files it won't be as fast as the single-awk-call solutions:
while read -r line
do
    [[ ! "${line}" =~ "critical/warning" ]] && continue    # if line does not contain "critical/warning" then skip to next line
    if [[ "${line}" =~ "request_id=" ]]                    # if line includes string "request_id=" ...
    then
        reqid="${line#*request_id=}"                       # strip off everything up to and including "request_id="
        reqid="${reqid%% *}"                               # then strip off everything from the first space to the end of the variable
    fi
    mask=""                                                # clear our mask string
    [[ "${line}" =~ 'IPA="MASKED"' ]] && mask=" MASK"      # if line includes string 'IPA="MASKED"' then set our "mask" variable
    printf "%s%s\n" "${reqid}" "${mask}"                   # print our variables
done < log.dat
This generates:
b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK
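If more than two keys are ever needed, one possible generalisation of the loop above is to load every key=value token of a line into a bash associative array and then look keys up by name. This is only a sketch under a few assumptions: bash 4 or later, values that contain no spaces (true of the sample log), and the hypothetical helper names parse_kv and report_line, which do not appear in the answers above.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
declare -A kv    # filled by parse_kv (hypothetical helper)

# Split a line on whitespace and store every key=value token in kv.
parse_kv() {
    local token
    local -a tokens
    kv=()
    read -ra tokens <<< "$1"
    for token in "${tokens[@]}"; do
        [[ $token == ?*=* ]] || continue      # skip tokens without a key and "="
        kv["${token%%=*}"]=${token#*=}        # key: text before first "="; value: the rest
    done
}

# Print the request_id (plus " MASK" when IPA is "MASKED") for critical/warning lines.
report_line() {
    local mask=""
    [[ $1 == *critical/warning* ]] || return 0
    parse_kv "$1"
    [[ -n ${kv[request_id]-} ]] || return 0   # nothing to print without a request_id
    [[ ${kv[IPA]-} == '"MASKED"' ]] && mask=" MASK"
    printf '%s%s\n' "${kv[request_id]}" "$mask"
}

# Made-up line; IPA deliberately appears before request_id.
report_line 'Apr 10 11:17:36 critical/warning: at=info IPA="MASKED" request_id=abc-123 status=200'
```

Since each key is looked up by name, the position of request_id relative to IPA is irrelevant. To process a file, call it from a loop such as `while read -r line; do report_line "$line"; done < log.dat`. Like the loop above, this stays slower than a single awk call on large files.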