Converting a formatted file to json with awk (dealing with non-empty lines)

Question

This is a sample of the file I want to convert to json.

Name: Jack
Address: Fancy road and some special characters :"'$@|,
   City
   Country
ID: 1

The special characters are the double quote, single quote, $, @, and pipe. I thought I could use a record separator in awk:

awk -F ":" '{RS="\n"}{print $1}'

However, what I get is:

Name:
Address
   City
   Country
ID

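I tried changing the record separator to "^[a-zA-Z0-9]" to catch strings that don't start with a space, but somehow that doesn't work. Another attempt was to simply parse the file line by line and format the output based on each line's content, but that is slow.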

Ideally, I would convert the file to:

{
"Name": "Jack",
"Address": "Fancy road and some special characters :\"'$@|, City, Country",
"ID": "1"
}

Answer 1

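idk why your question talks about non-empty lines when there are no empty lines in your example, but here is a solution using GNU awk for the 3rd arg to match() and for gensub():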

$ cat tst.awk
BEGIN { printf "{" }

match($0,/^(\S[^:]+):\s*(.*)/,a) {
    prt()
    key = a[1]
    val = a[2]
    next
}

{ val = gensub(/,\s*$/,"",1,val) gensub(/^\s*/,", ",1) }

END { prt(); print "\n}" }

function prt() {
    if (key != "") {
        printf "%s\n\"%s\": \"%s\"", (++c>1?",":""), key, gensub(/"/,"\\\\&","g",val)
    }
}

$ awk -f tst.awk file
{
"Name": "Jack",
"Address": "Fancy road and some special characters :\"'$@|, City, Country",
"ID": "1"
}

Some additional notes on the code:

match()

The match function searches the string, string, for the longest, leftmost substring matched by the regular expression, regexp. It returns the character position, or index, of where that substring begins (1, if it starts at the beginning of string).

\S

Matches any character that is not whitespace. Think of it as shorthand for ‘[^[:space:]]’.

\s

Matches any whitespace character. Think of it as shorthand for ‘[[:space:]]’.
