Carving data from log file

Question

I have a log file containing data like this:

 time=1460196536.247325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=13ms requests=517 option1=0 option2=0 errors=0 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278

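I am trying to write a bash script that carves the following values out of each line of the log file and writes them to a second file:

time (converted to local time, GMT+2)
latency99
requests
errors

Desired output in the second file: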

 time    latency99   requests    errors

 12:08:56   13         517          0

Is using a regex the easiest way to do this?

Answer 1

Here is a solution for Bash version 4 and later, using an associative array:

#!/bin/bash
# Assoc array to hold data.
declare -A data
# Log file (the input file), taken from the first argument.
logfile=$1
# Output file, taken from the second argument.
output_file=$2

# Print column names for required values.
printf '%-20s %-10s %-10s %-10s\n' time latency99 requests errors > "$output_file"
# Iterate over each line in $logfile
while read -ra arr; do
    # Insert keys and values into 'data' array.
    for i in "${arr[@]}"; do
        data["${i%=*}"]="${i#*=}"
    done
    # Convert time to GMT+2
    gmt2_time=$(TZ=GMT+2 date -d "@${data[time]}" '+%T')
    # Append the parsed values to the output file.
    printf '%-20s %-10s %-10s %-10s\n' "$gmt2_time" "${data[latency99]%ms}" "${data[requests]}" "${data[errors]}" >> "$output_file"
done < "$logfile"

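As you can see, the script accepts two arguments. The first is the name of the log file and the second is the output file. For every line in the log file, the parsed data is written to the output file as one line.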
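
The loop relies on two parameter expansions to split each `key=value` token. A minimal sketch of what they do on a single token follows; the helper names `pair_key` and `pair_value` are hypothetical and are not part of the script above.

```shell
# Hypothetical helpers illustrating the two expansions used in the loop.
pair_key()   { printf '%s\n' "${1%=*}"; }   # drop the shortest suffix matching "=*"
pair_value() { printf '%s\n' "${1#*=}"; }   # drop the shortest prefix matching "*="

token='latency99=13ms'
key=$(pair_key "$token")
value=$(pair_value "$token")
# "${value%ms}" strips a trailing "ms", as the script does for latency99.
printf '%s %s\n' "$key" "${value%ms}"
```

Because both expansions work on the first or last `=` only, a value such as `0.00%` passes through unchanged.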

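Note that I used GMT+2 as the value of the TZ variable. Use your exact zone as the value instead, for example TZ="Europe/Berlin". You may want to use the tzselect tool to find the correct string for your region. Be aware that in POSIX-style TZ strings the sign is inverted, so TZ=GMT+2 is two hours behind UTC, not ahead of it.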
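
To see the effect of the TZ value on the conversion, here is a small sketch. It assumes GNU date (which accepts `@<seconds>` with `-d`); the function name `epoch_to_time` is hypothetical. The POSIX-style strings `GMT+2` and `GMT-2` are used only because they need no zone database; a named zone such as Europe/Berlin is still the better choice in practice.

```shell
# Hypothetical helper: format epoch seconds ($1) as HH:MM:SS in time zone $2.
# Assumes GNU date. The fractional part of the timestamp is dropped first.
epoch_to_time() {
    TZ="$2" date -d "@${1%.*}" '+%T'
}

epoch_to_time 1460196536.247325 'GMT+2'   # POSIX sign is inverted: 2 hours behind UTC
epoch_to_time 1460196536.247325 'GMT-2'   # 2 hours ahead of UTC
```

The sample timestamp 1460196536 is 10:08:56 UTC, so the first call prints 08:08:56 and the second prints 12:08:56, which is the value the question asks for.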

To test it, I created the following log file, containing 3 lines of different input:

time=1260196536.242325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=10ms requests=100 option1=0 option2=0 errors=1 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
time=1460246536.244325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=20ms requests=200 option1=0 option2=0 errors=2 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
time=1260236536.147325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=30ms requests=300 option1=0 option2=0 errors=3 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278

Let's run a test (the script is named sof):

$ ./sof logfile parsed_logfile
$ cat parsed_logfile
time                 latency99  requests   errors    
12:35:36             10         100        1         
22:02:16             20         200        2         
23:42:16             30         300        3

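Edit:

Following the OP's requests in the comments, and further discussion in chat, I edited the script to include the following features:

Remove the ms suffix from the value of latency99.
Read input from the log file line by line, parse it, and write the results to the chosen file.
Include the column names only in the first line of the output.
Convert the time value to GMT+2.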
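
For comparison with the regex approach the question asks about, here is a minimal sketch that pulls one value out of a line with sed. The function name `extract_field` is hypothetical, and the pattern assumes the value consists only of digits and dots, so a trailing `ms` is dropped as a side effect.

```shell
# Hypothetical helper: print the numeric value of key $1 from a log line on stdin.
# The key must start the line or follow a blank, so it is matched as a whole token.
extract_field() {
    sed -nE "s/(^|.*[[:blank:]])$1=([0-9.]+).*/\2/p"
}

line='time=1460196536.247325 latency95=11ms latency99=13ms requests=517 errors=0 accLatency99=649ms accErrors=278'
printf '%s\n' "$line" | extract_field latency99
printf '%s\n' "$line" | extract_field requests
```

This runs sed once per field, whereas the associative-array loop above parses every key in a single pass over the line.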

Answer 2

Here is an awk script for you. Assuming the log file is mc.log and the script is saved as mc.awk, you would run it with GNU awk like this: awk -f mc.awk mc.log

mc.awk:

    BEGIN{
        OFS="\t"
        # some "" to align header and values in output
        print "time", "", "latency99", "requests", "errors"
    }

    function getVal( str) {
        # strip leading "key=" and trailing "ms" from str
        gsub(/^.*=/, "", str)
        gsub(/ms$/, "", str)
        return str
    }

    function fmtTime( timeStamp ){
        val=getVal( timeStamp )
        return strftime( "%H:%M:%S", val)
    }

    {
        # some "" to align header and values in output
        print fmtTime($1), getVal($4), "", getVal($5), "", getVal($8)
    }

Answer 3

Here is an awk version (not GNU-specific). Converting the date would require calling an external program:

#!/usr/bin/awk -f

BEGIN {
    FS="([[:alpha:]]+)?[[:blank:]]*[[:alnum:]]+="
    OFS="\t"
    print "time", "latency99", "requests", "errors"
}
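{
    # $1 is empty because every record starts with a key; with this FS,
    # time, latency99, requests and errors land in fields 2, 5, 6 and 9.
    print $2, $5, $6, $9
}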
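
The external call mentioned above could look like the following sketch. It is not the answer's code: it swaps the FS regex for a plain `key=value` lookup, assumes GNU date is available, and the function name `carve_log` and its time zone argument are hypothetical.

```shell
# Hypothetical helper: read log lines on stdin and print time, latency99,
# requests and errors, converting the epoch time by calling date(1).
# $1 is the TZ value to use (defaults to UTC). Assumes GNU date.
carve_log() {
    awk -v tz="${1:-UTC}" '
    BEGIN {
        OFS = "\t"
        print "time", "latency99", "requests", "errors"
    }
    {
        split("", kv)                       # clear values from the previous line
        for (i = 1; i <= NF; i++) {         # store every key=value token
            eq = index($i, "=")
            kv[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        epoch = kv["time"]
        sub(/\..*$/, "", epoch)             # drop fractional seconds
        if (epoch !~ /^[0-9]+$/) next       # skip lines without a usable time
        lat = kv["latency99"]
        sub(/ms$/, "", lat)                 # strip the ms suffix
        local_time = ""
        cmd = "TZ=" tz " date -d @" epoch " +%T"
        cmd | getline local_time            # run date and read its output
        close(cmd)
        print local_time, lat, kv["requests"], kv["errors"]
    }'
}
```

It would be used as `carve_log GMT-2 < mc.log > parsed_logfile`. Starting one date process per line is slow on large logs, which is the trade-off against GNU awk's built-in strftime.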
