Merge JSON arrays with duplicate keys

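I want to merge two JSON arrays with the help of jq. Each object in the arrays contains a name field, which lets me group the two arrays and merge them into one.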

Labels

[
  {
    "name": "power_branch",
    "description": "master"
  },
  {
    "name": "test_branch",
    "description": "main"
  }
]

Runners

[
  {
    "name": "power_branch",
    "runner": "power",
    "runner_tag": "macos"
  },
  {
    "name": "power_branch",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "runner": "tester",
    "runner_tag": ""
  },
  {
    "name": "development",
    "runner": "dev",
    "runner_tag": "ubuntu"
  }
]

Expected output

[
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "macos"
  },
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "description": "main",
    "runner": "tester",
    "runner_tag": ""
  }
]

I tried the following script, but the power_branch entry gets overwritten; instead, I want another entry with the different runner_tag.

#!/usr/bin/bash

LABELS='[{"name": "power_branch","description": "master"},{"name": "test_branch","description": "main"}]'
RUNNERS='
[
  { "name": "power_branch", "runner": "power", "runner_tag": "macos" },
  { "name": "power_branch", "runner": "power", "runner_tag": "ubuntu" },
  { "name": "test_branch", "runner": "tester", "runner_tag": "" },
  { "name": "development", "runner": "dev", "runner_tag": "ubuntu" }
]
'

FINAL=$(jq -s '[ .[0] + .[1] | group_by(.name)[] | select(length > 1) | add]' <(echo "$LABELS") <(echo "$RUNNERS"))
echo "$FINAL"

Output

[
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "description": "main",
    "runner": "tester",
    "runner_tag": ""
  }
]

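If you have two files, labels.json and runners.json, you can read the latter (runners) in as a variable using --argjson and then, using map, append to each element of the input array (labels) the corresponding fields determined with select:
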
jq --argjson runners "$(cat runners.json)" '
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' labels.json
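
To see why this avoids the overwriting from the question, here is a minimal sketch with inline data trimmed from the question. The shell variables labels, runners, and merged are illustrative names only. In the question's attempt, add collapses a whole group into one object, so the last runner_tag wins; here, each runner matched by select produces its own merged object inside map.

```shell
#!/usr/bin/env bash
# Inline sample data, trimmed from the question (illustrative only).
labels='[{"name":"power_branch","description":"master"}]'
runners='[{"name":"power_branch","runner_tag":"macos"},
          {"name":"power_branch","runner_tag":"ubuntu"},
          {"name":"development","runner_tag":"ubuntu"}]'

# For each label, emit one merged object per runner with the same name.
# Runners without a matching label (development) never appear.
merged=$(printf '%s' "$labels" | jq -c --argjson runners "$runners" '
  map(.name as $name | . + ($runners[] | select(.name == $name)))
')

echo "$merged"
```

The result should contain two power_branch objects, one with runner_tag macos and one with ubuntu.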

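However, combining --argjson with command substitution puts the entire runners array on your shell's command line (--argjson takes two strings: a name and a value), which can easily overflow if the runners array is large enough.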

Therefore, instead of using command substitution "$(…)", you could read in the runners file directly, using either --slurpfile at the cost of another iteration level [][], or (despite the manual saying not to; read more about that in the comments) --argfile with just one iteration level as before:

jq --slurpfile runners runners.json '
  map(.name as $name | . + ($runners[][] | select(.name == $name)))
' labels.json

jq --argfile runners runners.json '
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' labels.json
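
The extra iteration level arises because --slurpfile binds the variable to an array of all JSON values found in the file, so a file holding a single array arrives wrapped in an outer array. A quick way to check this, as a sketch that assumes mktemp is available and uses the illustrative names tmp, outer, and inner:

```shell
#!/usr/bin/env bash
# Temporary file holding a single two-element array (illustrative data).
tmp=$(mktemp)
echo '[{"name":"a"},{"name":"b"}]' > "$tmp"

# $runners is an array of the values in the file: here, one value.
outer=$(jq -n --slurpfile runners "$tmp" '$runners | length')
# The original array is the first (and only) element of that outer array.
inner=$(jq -n --slurpfile runners "$tmp" '$runners[0] | length')

echo "outer=$outer inner=$inner"
rm -f "$tmp"
```

This is why the --slurpfile variant iterates with $runners[][] rather than $runners[].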

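To avoid all of these issues, @peak suggests reading each file with input, in combination with the -n option. Note that the two files must be given in exactly this order, because they are read sequentially.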

jq -n 'input as $runners | input |
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' runners.json labels.json

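Since the labels array is passed directly as the filter's main input (as opposed to runners, which is stored in a variable for later use), this can be simplified further by dropping the -n option again. Without -n, the first file becomes the main input and input reads the next one, so the order of the files still matters, and labels.json now has to come first:

jq 'input as $runners |
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' labels.json runners.json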
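
How -n changes which file is consumed by which part of the filter can be checked with two tiny files. This is a sketch; the directory and the file names a.json and b.json are made up for illustration.

```shell
#!/usr/bin/env bash
# Two throwaway files, each holding a single JSON string.
dir=$(mktemp -d)
echo '"first"'  > "$dir/a.json"
echo '"second"' > "$dir/b.json"

# With -n there is no main input; each call to input reads the next file.
with_n=$(jq -nc '[input, input]' "$dir/a.json" "$dir/b.json")

# Without -n the first file is the main input (.), and input reads the second.
without_n=$(jq -c '[., input]' "$dir/a.json" "$dir/b.json")

echo "$with_n"
echo "$without_n"
rm -rf "$dir"
```

Both commands see the files in the same order, so whichever value should end up as . has to be listed first when -n is dropped.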

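Finally, here's yet another approach, using the SQL-style operators INDEX and JOIN, which were introduced in jq v1.6. It also uses the technique with just one input, and the order of the files still matters, because we need the runners array as the filter's main input.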

jq '
  JOIN(INDEX(input[]; .name); .name) | map(select(.[1]) | add)
' runners.json labels.json
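
To make the two steps visible, the intermediate values can be inspected separately. The following is a sketch with inline data trimmed from the question, assuming jq 1.6 or later; the shell variables index, pairs, and joined are illustrative names.

```shell
#!/usr/bin/env bash
# Inline sample data, trimmed from the question (illustrative only).
labels='[{"name":"test_branch","description":"main"}]'
runners='[{"name":"test_branch","runner":"tester"},
          {"name":"development","runner":"dev"}]'

# INDEX builds a lookup object keyed by each label's name.
index=$(printf '%s' "$labels" | jq -c 'INDEX(.[]; .name)')

# JOIN pairs every runner with its lookup result (null if there is none).
pairs=$(printf '%s' "$runners" | jq -c --argjson idx "$index" 'JOIN($idx; .name)')

# Drop pairs whose second element is null, then merge each remaining pair.
joined=$(printf '%s' "$pairs" | jq -c 'map(select(.[1]) | add)')

echo "$index"
echo "$pairs"
echo "$joined"
```

Because JOIN iterates over the runners, duplicate names among the runners each get their own pair, which is what keeps both power_branch entries in the full example.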