JSON fields with the same name
Keys in a JSON object are supposed to be unique (see e.g. Does JSON syntax allow duplicate keys in an object?). However, suppose I have a file with the following contents:
{
"a" : "1",
"b" : "2",
"a" : "3"
}
Is there a simple way to convert the duplicate keys into an array, so that the file becomes:
{
"a" : [ {"key": "1"}, {"key": "3"}],
"b" : "2"
}
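or something similar that combines the duplicate keys into an array (or some other way to extract the values of the duplicate keys)?
There is a solution in Java: Convert JSON object with duplicate keys to JSON array
Is there any way to do this with awk/bash/python?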
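For the Python part of the question, here is a minimal standard-library sketch. `json.loads` accepts an `object_pairs_hook`, which receives every (key, value) pair of an object in document order, duplicates included, before any dict is built. The function name `wrap_duplicates` is hypothetical; it produces exactly the shape asked for above (unique keys keep their plain value, duplicated keys become a list of `{"key": value}` objects).

```python
import json

def wrap_duplicates(pairs):
    """object_pairs_hook: `pairs` holds every (key, value) of one JSON object,
    duplicates included, in document order."""
    counts = {}
    for key, _ in pairs:
        counts[key] = counts.get(key, 0) + 1
    result = {}
    for key, value in pairs:
        if counts[key] == 1:
            result[key] = value  # unique key: keep the plain value
        else:
            # duplicated key: collect each occurrence as {"key": value}
            result.setdefault(key, []).append({"key": value})
    return result

text = '{"a": "1", "b": "2", "a": "3"}'
print(json.dumps(json.loads(text, object_pairs_hook=wrap_duplicates)))
# {"a": [{"key": "1"}, {"key": "3"}], "b": "2"}
```

The hook is called for every object at every nesting level, so nested objects are handled the same way.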
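If your input really is a flat JSON object whose values are primitives, this should work (the `select(length == 2)` drops the closing event that `--stream` emits at the end of the object, which would otherwise add a `null` to the last key's array):
jq -s --stream 'map(select(length == 2)) | group_by(.[0]) | map({"key": .[0][0][0], "value": map(.[1])}) | from_entries'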
{
"a": [
"1",
"3"
],
"b": [
"2"
]
}
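A rough Python counterpart of this `group_by` filter, assuming the standard `json` module: every key maps to a list of all its values, even when it occurs only once. The hook name `group_all` is hypothetical. Unlike the jq filter, it keeps keys in first-occurrence order instead of sorting them, and it also applies to nested objects.

```python
import json
from collections import defaultdict

def group_all(pairs):
    """object_pairs_hook: collect all values of each key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)  # singletons become one-element lists too
    return dict(grouped)

text = '{"a": "1", "b": "2", "a": "3"}'
print(json.dumps(json.loads(text, object_pairs_hook=group_all)))
# {"a": ["1", "3"], "b": ["2"]}
```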
For more complex input, this would require actually understanding how --stream is supposed to be used, which is beyond me.
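Building on Santiago's answer using -s --stream, the following filter builds up the result one key/value pair at a time, thereby preserving both the order of the keys and the order of the values for each key: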
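# Run with jq -s --stream (the input is the slurped array of stream events)
reduce (.[] | select(length==2)) as $kv ({};   # skip closing events, which have length 1
  $kv[0][0] as $k
  | $kv[1] as $v
  | (.[$k]|type) as $t
  | if $t == "null" then .[$k] = $v            # first occurrence: plain value
    elif $t == "array" then .[$k] += [$v]      # already an array: append
    else .[$k] = [ .[$k], $v ]                 # second occurrence: start an array
    end)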
For the given input, the result is:
{
"a": [
"1",
"3"
],
"b": "2"
}
To show that the order of the values for each key is preserved, consider the following input:
{
"c" : "C",
"a" : "1",
"b" : "2",
"a" : "3",
"b" : "1"
}
The output produced by the filter above is:
{
"c": "C",
"a": [
"1",
"3"
],
"b": [
"2",
"1"
]
}
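The same merging rule can be sketched in Python with an `object_pairs_hook` whose three branches mirror the three branches of the jq `reduce`. The name `merge_pairs` is hypothetical. One small difference: the first branch here tests whether the key was already seen, whereas the jq filter tests whether the existing value has type "null". Like the jq filter, if the first value is itself an array, later values are appended to it.

```python
import json

def merge_pairs(pairs):
    """object_pairs_hook: one occurrence keeps its plain value, repeated
    occurrences are collected into a list, in document order."""
    merged = {}
    for key, value in pairs:
        if key not in merged:
            merged[key] = value                  # first occurrence
        elif isinstance(merged[key], list):
            merged[key].append(value)            # already a list: append
        else:
            merged[key] = [merged[key], value]   # second occurrence: start a list
    return merged

text = '{"c": "C", "a": "1", "b": "2", "a": "3", "b": "1"}'
print(json.dumps(json.loads(text, object_pairs_hook=merge_pairs)))
# {"c": "C", "a": ["1", "3"], "b": ["2", "1"]}
```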
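Building on peak's answer, the following filter also works with multi-object input and nested objects, and without the slurp option (-s).
It is not an answer to the original question, but since the jq FAQ links here, it may be useful to some visitors.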
File jqmergekeys.txt:
def consumestream($arr): # Reads stream elements from stdin until we have enough elements to build one object and returns them as array
input as $inp
| if $inp|has(1) then consumestream($arr+[$inp]) # input=keyvalue pair => Add to array and consume more
elif ($inp[0]|has(1)) then consumestream($arr) # input=closing subkey => Skip and consume more
else $arr end; # input=closing root object => return array
def convert2obj($stream): # Converts an object in stream notation into an object, and merges the values of duplicate keys into arrays
reduce ($stream[]) as $kv ({}; # This function is based on
$kv[0] as $k
| $kv[1] as $v
| (getpath($k)|type) as $t # type of existing value under the given key
| if $t == "null" then setpath($k;$v) # value not existing => set value
elif $t == "array" then setpath($k; getpath($k) + [$v] ) # value is already an array => add value to array
else setpath($k; [getpath($k), $v ]) # single value => put existing and new value into an array
end);
def mainloop(f): (convert2obj(consumestream([input]))|f),mainloop(f); # Consumes streams forever, converts them into an object and applies the user provided filter
def mergeduplicates(f): try mainloop(f) catch if .=="break" then empty else error end; # Catches the "break" thrown by jq if there's no more input
#---------------- User code below --------------------------
mergeduplicates(.) # merge duplicate keys in input, without any additional filters
#mergeduplicates(select(.layers)|.layers.frame) # merge duplicate keys in input and apply some filter afterwards
Example:
tshark -T ek | jq -nc --stream -f ./jqmergekeys.txt
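For a Python pipeline over newline-delimited or concatenated JSON documents (such as the output of `tshark -T ek`), one possible sketch uses `json.JSONDecoder.raw_decode` in a loop, skipping whitespace between documents. The names `iter_documents` and `merge_pairs` are hypothetical. Note the different semantics for nested objects: the hook merges per object, so `{"x": {"a": 1}, "x": {"a": 2}}` becomes `{"x": [{"a": 1}, {"a": 2}]}`, whereas the path-based jq filter above merges down to the leaves.

```python
import json

def merge_pairs(pairs):
    """object_pairs_hook: one occurrence keeps its plain value, repeated
    occurrences are collected into a list, in document order."""
    merged = {}
    for key, value in pairs:
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], list):
            merged[key].append(value)
        else:
            merged[key] = [merged[key], value]
    return merged

def iter_documents(text, hook=merge_pairs):
    """Yield every top-level JSON value in `text`, applying `hook` to each
    object at every nesting level."""
    decoder = json.JSONDecoder(object_pairs_hook=hook)
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value

for doc in iter_documents('{"a": 1, "a": 2}\n{"x": {"b": 1, "b": 2}, "y": 3}\n'):
    print(json.dumps(doc, separators=(",", ":")))  # compact, like jq -c
```

To read from a pipe, pass `sys.stdin.read()` as `text`.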
Here is a simple alternative that generalizes well:
reshape.jq
def augmentpath($path; $value):
getpath($path) as $v
| setpath($path; $v + [$value]);
reduce (inputs | select(length==2)) as $pv
({}; augmentpath($pv[0]; $pv[1]) )
Usage
jq -n --stream -f reshape.jq input.json
Output
With the given input:
{
"a": [
"1",
"3"
],
"b": [
"2"
]
}
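The path-based idea of reshape.jq can be approximated in Python, assuming the input is a JSON object. The sketch keeps each object as its raw pairs (hypothetical `Pairs` class), emits (path, value) leaves roughly like the length-2 events of `jq --stream`, and appends each value to a list at its path, as `augmentpath` does. Simplifications: only objects are descended into (arrays are treated as leaves, and objects inside arrays stay as pair lists), and empty objects produce no events.

```python
import json

class Pairs(list):
    """A JSON object kept as its raw (key, value) pairs, duplicates included."""

def leaf_events(node, path=()):
    """Yield (path, value) for each leaf; only objects are descended into."""
    if isinstance(node, Pairs):
        for key, value in node:
            yield from leaf_events(value, path + (key,))
    else:
        yield path, node

def reshape(text):
    """Append every leaf value to a list stored at its path."""
    tree = json.loads(text, object_pairs_hook=Pairs)
    result = {}
    for path, value in leaf_events(tree):
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create intermediate objects
        node.setdefault(path[-1], []).append(value)
    return result

print(reshape('{"a": "1", "b": "2", "a": "3"}'))
# {'a': ['1', '3'], 'b': ['2']}
```

Because merging happens by path, `{"x": {"a": 1}, "x": {"a": 2}}` becomes `{'x': {'a': [1, 2]}}`.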
Postscript
If avoiding singleton arrays is important, the def of augmentpath can be modified, or a post-processing step can be added.
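A post-processing step of that kind could look like this in Python, assuming the always-array output has already been parsed with `json.loads`. The function name `unwrap_singletons` is hypothetical. A caveat: it cannot tell a list created by the merging apart from a genuine one-element array in the original data, so both are unwrapped.

```python
import json

def unwrap_singletons(node):
    """Replace every one-element list by its only element, recursing into
    objects; lists with several elements are left as they are."""
    if isinstance(node, dict):
        return {key: unwrap_singletons(value) for key, value in node.items()}
    if isinstance(node, list) and len(node) == 1:
        return unwrap_singletons(node[0])
    return node

reshaped = json.loads('{"a": ["1", "3"], "b": ["2"]}')
print(unwrap_singletons(reshaped))  # {'a': ['1', '3'], 'b': '2'}
```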