Efficient way to read key=value pairs from each line of a file in bash to process them

I need advice on writing better bash code for processing a file, specifically the key=value pairs on each line.

I'm trying to process log lines as follows:

  1. If a line contains the word "critical/warning", print the value of request_id on its own line.
  2. If the value of the key IPA is "MASKED", append "MASK" after the request_id in the output.

I wrote the following code to handle it:

while read -r line
do
  # only lines containing both "critical/warning" and "request_id="
  if [ "$(echo "$line" | grep "critical/warning" | grep -c "request_id=")" -gt 0 ]
  then
    # text after "request_id=", up to the next whitespace
    request_id=$(echo "$line" | awk -F"request_id=" '{print $2}' | awk '{print $1}')
    if [ "$(echo "$line" | grep -c "IPA=")" -gt 0 ]
    then
      IPA=$(echo "$line" | awk -F"IPA=" '{print $2}' | awk '{print $1}')
      # IPA keeps its double quotes, so compare including the quotes
      [[ "$IPA" == '"MASKED"' ]] && request_id="$request_id MASK"
    fi
    echo "$request_id"
  fi
done < test.txt

Below is a sample log file:

Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://jalaltu.com" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0
Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?key=s2fwad2Es2" host=jalaltu.com request_id=b19a87a1-1bbb-4e67-b207-bd9f23d46afa IPA="108.31.000.000" dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65 IPA="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2 IPA="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 jalaltu app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?key=s2fwad2Es2 HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:35 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0" host=jalaltu.com request_id=d48278c2-5731-464e-be38-ab9ad84ac4a8 IPA="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:35 jalaltu app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q HTTP/1.1" 200 4263 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:36 jalaltu app/web.4: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=4eiramcmayu0 HTTP/1.1" 200 3023 "https://jalaltu.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36

Apr 10 11:17:36 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=8bb2413c-3c67-4180-8091-000313b8d9ca IPA="MASKED" dyno=web.3 connect=1ms service=32ms status=200 bytes=4435 protocol=https

Apr 10 11:17:36 critical/warning: at=info method=GET path="/backend/requests/editor/placeholder?shareLinkId=tosrve4v8q8q" host=jalaltu.com request_id=10f93da3-2753-48a3-9485-857a93d8a88a IPA="MASKED" dyno=web.3 connect=1ms service=37ms status=200 bytes=4435 protocol=https

Below is the expected output for the sample log file:

b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK

Assumptions:

  • Although the sample data always shows request_id before IPA, I'm assuming this may not always be the case
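Because request_id and IPA may appear in either order, one way to literally "read the key=value pairs" in bash is to split each matching line on whitespace, load every key=value token into an associative array, and then look up the keys you need. This is a minimal sketch, not a drop-in replacement: it assumes bash 4+ (for associative arrays), uses a hypothetical function name `process_log`, and assumes the values of interest contain no spaces (true for request_id and IPA in the sample; tokens from the quoted path or user agent are simply stored and ignored).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads log lines on stdin and prints request_id
# (followed by " MASK" when IPA="MASKED") for each "critical/warning" line.
# Requires bash 4+ for associative arrays.
process_log() {
    local line token key value
    local -a tokens
    local -A kv
    while IFS= read -r line; do
        [[ $line == *critical/warning* ]] || continue   # skip other lines
        kv=()                                            # forget pairs from the previous line
        read -r -a tokens <<< "$line"                    # split the line on whitespace
        for token in "${tokens[@]}"; do
            [[ $token == *=* ]] || continue              # keep only key=value tokens
            key=${token%%=*}                             # text before the first "="
            [[ -n $key ]] || continue                    # ignore tokens like "=foo"
            value=${token#*=}                            # text after the first "="
            kv[$key]=$value
        done
        [[ -n ${kv[request_id]:-} ]] || continue         # no request_id: print nothing
        if [[ ${kv[IPA]:-} == '"MASKED"' ]]; then        # IPA value keeps its quotes
            printf '%s MASK\n' "${kv[request_id]}"
        else
            printf '%s\n' "${kv[request_id]}"
        fi
    done
}
```

Usage: `process_log < test.txt`. Because lookups go through the array, the order of the keys on the line does not matter, and checking another key later is a one-line change.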

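One idea using a single awk call (this should be noticeably faster than the current bash loop, which spawns several echo/grep/awk subprocesses for every input line):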

awk '
/critical[/]warning/ &&                                           # if line contains "critical/warning" and ...
/request_id/ { mask=""                                            # line contains "request_id", clear the "mask" variable
               for (i=1 ; i<=NF; i++)                             # loop through our input fields 
                   { split($(i),arr,"=")                          # split current field on "=", store results in array "arr[]"
                     if ( arr[1] == "request_id" )                # if field is "request_id" ...
                        { reqid = arr[2] }                        # save the associated id
                     if ( arr[1] == "IPA" && arr[2] ~ "MASKED" )  # if field is "IPA" and value matches "MASKED" ...
                        { mask = " MASK"  }                       # set our "mask" variable
                   }
                print reqid mask                                  # print our variables
             }
' log.dat

Note: remove the comments to declutter the code.

The above generates:

b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK

A different approach using GNU awk:

awk '/critical\/warning/{
       id=gensub(/.*=/,"","g",$1)   # remove "request_id=" from $1
       if(/IPA="MASKED"/){
         print id,"MASK"
       }
       else{
         print id
       }
     }' FPAT='request_id=[a-z0-9-]{36}' file

Output:

b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK

From man awk (GNU awk):

FPAT: A regular expression describing the contents of the fields in a record.
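To see what FPAT does on its own, the sketch below extracts only the text matching `request_id=[a-z0-9-]{36}` as a field, so the rest of the line (including the other key=value pairs) never becomes a field at all. It assumes GNU awk is installed as `gawk` (FPAT is a gawk extension; POSIX awk and mawk do not support it), and the function name `first_request_field` is hypothetical. It prints the number of fields found followed by `$1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each input line, print "NF: $1" where fields are
# defined by FPAT (what a field looks like) rather than FS (what separates fields).
# Requires GNU awk (gawk) 4.0 or later.
first_request_field() {
    gawk -v FPAT='request_id=[a-z0-9-]{36}' '{ print NF ": " $1 }'
}
```

A sample critical/warning line yields `1: request_id=<uuid>`, while a line with no request_id yields `0: `, since no text matches the field pattern.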

A bash solution (no awk/grep/echo) using some pattern matching and parameter expansion. For large files, though, this won't be as fast as a solution with a single awk call:

while IFS= read -r line
do
    [[ ! "${line}" =~ "critical/warning" ]] && continue     # if line does not contain "critical/warning" then skip to next line

    reqid=""                                                # clear any id left over from the previous line
    if [[ "${line}" =~ "request_id=" ]]                     # if line includes string "request_id=" ...
    then
        reqid="${line#*request_id=}"                        # strip off everything up to and including "request_id="
        reqid="${reqid%% *}"                                # then strip off everything from the first space to the end of the variable
    fi

    mask=""                                                 # clear our mask string
    [[ "${line}" =~ 'IPA="MASKED"' ]] && mask=" MASK"       # if line includes string 'IPA="MASKED"' then set our "mask" variable

    printf "%s%s\n" "${reqid}" "${mask}"                    # print our variables

done < log.dat

This generates:

b19a87a1-1bbb-4e67-b207-bd9f23d46afa
910b07d1-3f71-4347-a1a7-bfa20384ef65
097bf65e-e189-4f9f-9dfb-4758cff411b2
d48278c2-5731-464e-be38-ab9ad84ac4a8
8bb2413c-3c67-4180-8091-000313b8d9ca MASK
10f93da3-2753-48a3-9485-857a93d8a88a MASK
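The two parameter expansions that do the extraction in the bash loop can be checked in isolation: `${var#*request_id=}` removes the shortest prefix ending in `request_id=`, and `${var%% *}` removes the longest suffix starting at a space, leaving just the ID. This is a minimal sketch with a hypothetical function name `extract_request_id`; it assumes the ID is followed by a space or ends the line, and it prints nothing when the key is absent (on its own, `${line#*request_id=}` would leave the whole line unchanged in that case).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of request_id from one log line,
# using only bash parameter expansion (no awk/grep/echo subprocesses).
extract_request_id() {
    local line=$1 rest
    [[ $line == *request_id=* ]] || return 1   # key absent: print nothing, return failure
    rest=${line#*request_id=}                  # drop everything up to and including "request_id="
    printf '%s\n' "${rest%% *}"                # drop everything from the first space onward
}
```

For example, `extract_request_id 'host=jalaltu.com request_id=abc-123 IPA="MASKED"'` prints `abc-123`.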