Convert JSON using jq based on specific constraints

Question

I have a JSON file 'OpenEnded_mscoco_val2014.json'. The file contains 121,512 questions.
Here are some examples:

"questions": [
{
  "question": "What is the table made of?",
  "image_id": 350623,
  "question_id": 3506232
},
{
  "question": "Is the food napping on the table?",
  "image_id": 350623,
  "question_id": 3506230
},
{
  "question": "What has been upcycled to make lights?",
  "image_id": 350623,
  "question_id": 3506231
},
{
  "question": "Is this an Spanish town?",
  "image_id": 8647,
  "question_id": 86472
}

]

I use jq -r '.questions | [map(.question), map(.image_id), map(.question_id)] | @csv' OpenEnded_mscoco_val2014_questions.json >> temp.csv to convert the JSON to CSV.
But the CSV output lists all the questions followed by all the image_ids, which is what the code above does.
The expected output is:

"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230

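Is it possible to keep only the results with image_id <= 10000, and to group questions that have the same image_id? For example, results 1, 2 and 3 of the JSON could be combined into 3 questions, 1 image_id and 3 question_ids.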

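Edit: The first problem is solved by a possible duplicate question. I would like to know whether it is possible to use a comparison operator on jq's command line when converting the JSON file. In this case, that means getting all fields from the JSON only if image_id <= 10000.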
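The command above builds three per-field arrays (all questions, then all image_ids, then all question_ids), so the values end up grouped by field rather than by question. The question is tagged python, so here is a minimal Python sketch (not jq) of the underlying fix: transposing the three columns with zip turns them into one row per question. Quoting only non-numeric fields with csv.QUOTE_NONNUMERIC is an assumption meant to resemble the expected output above.

```python
import csv
import io

# The four sample questions from the post, already parsed (e.g. by json.load).
questions = [
    {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
    {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
    {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
    {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472},
]

# Column-oriented, like [map(.question), map(.image_id), map(.question_id)].
columns = [
    [q["question"] for q in questions],
    [q["image_id"] for q in questions],
    [q["question_id"] for q in questions],
]

# zip(*columns) transposes the three columns into one tuple per question.
rows = [list(row) for row in zip(*columns)]

def to_csv(rows):
    """Render rows as CSV text, quoting strings but not numbers."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(rows), end="")
```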

Answer 1

1) Given your input (suitably elaborated to make it valid JSON), the following query produces CSV output as shown:

$ jq -r '.questions[] | [.question, .image_id, .question_id] | @csv'

"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
"What has been upcycled to make lights?",350623,3506231
"Is this an Spanish town?",8647,86472

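The key thing to remember here is that @csv requires a flat array, but, as with all jq filters, you can feed it a stream (here, one flat array per question).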
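A rough Python analogue of this streaming approach, assuming the file has been wrapped into valid JSON as the answer describes (the inline RAW string is a hypothetical stand-in for the file): a generator yields one flat list per question, and each list becomes one CSV line.

```python
import csv
import io
import json

# Hypothetical stand-in for the questions file, wrapped so it is valid JSON.
RAW = """{"questions": [
  {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
  {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
  {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
  {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472}
]}"""

FIELDS = ("question", "image_id", "question_id")

def rows(doc):
    """Yield one flat list per question, like .questions[] | [.question, .image_id, .question_id]."""
    for q in doc["questions"]:
        yield [q[f] for f in FIELDS]

def to_csv(row_iter):
    """Write each row as one CSV line; strings are quoted, numbers are not."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for row in row_iter:
        # Rows should be flat: a nested list would be stringified, not rejected.
        writer.writerow(row)
    return buf.getvalue()

print(to_csv(rows(json.loads(RAW))), end="")
```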

2) To filter using the criterion .image_id <= 10000, just insert the appropriate select/1 filter:

.questions[]
| select(.image_id <= 10000)
| [.question, .image_id, .question_id]
| @csv
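The same filter as a minimal Python sketch (not jq); max_image_id is a hypothetical parameter standing in for the hard-coded 10000, and the sample list is copied from the question.

```python
QUESTIONS = [
    {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
    {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
    {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
    {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472},
]

def select_rows(questions, max_image_id=10000):
    """Keep questions with image_id <= max_image_id, like select(.image_id <= 10000)."""
    return [
        [q["question"], q["image_id"], q["question_id"]]
        for q in questions
        if q["image_id"] <= max_image_id
    ]

print(select_rows(QUESTIONS))
```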

3) To sort by image_id, use sort_by(.image_id):

.questions
| sort_by(.image_id)
| .[]
| [.question, .image_id, .question_id]
| @csv
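For comparison, a Python sketch of sort_by(.image_id), not the jq implementation. Python's sorted() is stable, so questions that share an image_id keep their input order; the sample list is copied from the question.

```python
QUESTIONS = [
    {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
    {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
    {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
    {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472},
]

def sort_by_image_id(questions):
    """Return a new list ordered by image_id; the input list is left unchanged."""
    return sorted(questions, key=lambda q: q["image_id"])

for q in sort_by_image_id(QUESTIONS):
    print(q["image_id"], q["question_id"])
```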

4) To group by .image_id, you can pipe the output of the following into your own pipeline:

.questions | group_by(.image_id)

However, you will have to decide exactly how the objects in each group should be combined.
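As one possible answer to that decision, here is a Python sketch (an illustration, not jq) that mimics group_by(.image_id) by sorting and then applying itertools.groupby, and merges each group into the shape the asker described: one image_id with its list of questions and question_ids. The layout returned by the hypothetical combine() helper is an assumption, not something jq prescribes.

```python
from itertools import groupby
from operator import itemgetter

QUESTIONS = [
    {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
    {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
    {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
    {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472},
]

def group_by_image_id(questions):
    """Mimic group_by(.image_id): sort by the key, then collect runs of equal keys."""
    key = itemgetter("image_id")
    return [list(group) for _, group in groupby(sorted(questions, key=key), key=key)]

def combine(group):
    """Hypothetical merge: one image_id, plus all its questions and question_ids."""
    return {
        "image_id": group[0]["image_id"],
        "questions": [q["question"] for q in group],
        "question_ids": [q["question_id"] for q in group],
    }

for group in group_by_image_id(QUESTIONS):
    print(combine(group))
```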

Answer 2

With the -r option, the following filter

  .questions[] | [ .[] ] | @csv

produces

"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
"What has been upcycled to make lights?",350623,3506231
"Is this an Spanish town?",8647,86472

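The [ .[] ] form takes every value of each object in the object's own key order, which works here because every sample object lists its keys in the same order. A small Python analogue (an illustration, not jq) shows why naming the fields explicitly, as in Answer 1, is the more robust choice if key order could vary between objects.

```python
import json

# Two encodings of the same question; only the key order differs.
a = json.loads('{"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472}')
b = json.loads('{"image_id": 8647, "question_id": 86472, "question": "Is this an Spanish town?"}')

def values_row(obj):
    """Like [ .[] ]: take every value, in the order the keys appear."""
    return list(obj.values())

def explicit_row(obj):
    """Like [.question, .image_id, .question_id]: fixed column order."""
    return [obj["question"], obj["image_id"], obj["question_id"]]

print(values_row(a), values_row(b))  # same data, different column order
print(explicit_row(a) == explicit_row(b))  # True
```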
To filter the data, use select. For example, with the -r option, the following filter

  .questions[] | select(.image_id <= 10000) | [ .[] ] | @csv

produces the subset

"Is this an Spanish town?",8647,86472

To group the data, use group_by. The following filter

    .questions
  | group_by(.image_id)[]
  | [ .[] | [ .[] ] | @csv ]

produces the grouped data

[
  "\"Is this an Spanish town?\",8647,86472"
]
[
  "\"What is the table made of?\",350623,3506232",
  "\"Is the food napping on the table?\",350623,3506230",
  "\"What has been upcycled to make lights?\",350623,3506231"
]

This is not very useful in this form, and probably not what you want, but it demonstrates the basic approach.
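Putting the pieces together for the asker's full request, here is a hedged Python sketch (not jq) that filters on image_id, groups, and writes one CSV line per image_id. The row layout (image_id, then the questions joined with " | ", then the question_ids joined the same way) and the separator are hypothetical choices; pick whatever format the downstream consumer expects.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

QUESTIONS = [
    {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
    {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
    {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
    {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472},
]

def grouped_csv(questions, max_image_id=10000, sep=" | "):
    """Filter by image_id, group by image_id, and emit one CSV line per group."""
    key = itemgetter("image_id")
    # Filter first, then sort so that groupby sees equal image_ids next to each other.
    kept = sorted((q for q in questions if q["image_id"] <= max_image_id), key=key)
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for image_id, group in groupby(kept, key=key):
        group = list(group)
        writer.writerow([
            image_id,
            sep.join(q["question"] for q in group),
            # Joined ids become a string, so this column is quoted.
            sep.join(str(q["question_id"]) for q in group),
        ])
    return buf.getvalue()

print(grouped_csv(QUESTIONS), end="")
```

With the default threshold only the image 8647 survives the filter; raising max_image_id lets the three questions for image 350623 through as a single line.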

Tags: python, csv, json, filtering, jq