jq: transform a JSON structure by finding output values in a nested input array

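First of all, I apologize for the title. English is not my first language, but I would not know what to call the thing I am trying to accomplish in my native language either.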

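What I want to do is take some input (generated automatically by downloading a page with curl and then converting it from HTML to JSON in a very crude way with pup) and transform it into something that is easier to work with later. The input looks like this: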

[
 {
  "children": [
   {
    "class": "label label-info",
    "tag": "span",
    "text": "Lesson"
   },
   {
    "tag": "h2",
    "text": "Is That So?"
   },
   {
    "tag": "p",
    "text": "Learn how to provide shortened answers with そうです and stay in the conversation with そうですか."
   },
   {
    "class": "btn btn-primary",
    "href": "https://www.nihongomaster.com/japanese/lessons/view/62/is-that-so",
    "tag": "a",
    "text": "Read Lesson"
   }
  ],
  "class": "row col-sm-12",
  "tag": "div"
 },
 {
  "children": [
   {
    "class": "label label-warning",
    "tag": "span",
    "text": "Drills"
   },
   {
    "tag": "h2",
    "text": "Yes, That Is So."
   },
   {
    "tag": "p",
    "text": "Practice the phrases and vocab from the lesson, Is That So?"
   }
  ],
  "class": "row col-sm-12",
  "tag": "div"
 }
]

The output I want pulls various values out of each object's children array, like this:

[
  {
    "title": "Is That So?", // <-- in other words, find "tag" == "h2" and output "text" value
    "perex": "Learn how to provide shortened answers with そうです and stay in the conversation with そうですか.", // "tag" == "p", "text" value
    "type": "lesson", // "tag" == "span", "text" value (lowercased if possible? Not needed though)
    "link": "https://www.nihongomaster.com/japanese/lessons/view/62/is-that-so" // "tag" == "a", "href" value
  },
  {
    "title": "Yes, That Is So.",
    "perex": "Practice the phrases and vocab from the lesson, Is That So?",
    "type": "drills",
    "link": null // Can be missing!
  }
]

I have tried various experiments with the select function, but got almost nothing usable out of them, so I am not sure my attempts are worth sharing.

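While writing the question above, I stumbled upon the right solution. I figured I should share the answer here rather than keep the knowledge to myself. If this is against the site rules, feel free to delete the whole question and answer (and I am sorry if that is the case).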

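select was indeed the key, but I was not using it the right way when I wrote the question. Here is the complete jq command that does what I need and covers all of the requirements above:

  • how to select nested values by searching the children array;
  • how to lowercase the type value;
  • how to handle a link value that is sometimes missing;
  • (I did not realize it at the time, but I sometimes want to change the form of the link, so I added that as well).

def format(link):
  if link | tostring | startswith("/")
  then "https://www.nihongomaster.com" + link
  else link
  end;

[ .[]
  | { title: (.children[] | select(.tag == "h2").text),
      type:  (.children[] | select(.tag == "span").text | ascii_downcase),
      perex: (.children[] | select(.tag == "p").text),
      link:  format((.children[] | select(.tag == "a").href) // null) }
]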
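
The `// null` fallback on link matters more than it may look. As a minimal sketch (the two-entry input below is made up for illustration and is not the real page data): when a value expression inside an object construction produces no output, jq emits no object at all for that entry, so an entry without an "a" child would silently disappear from the result.

```shell
#!/bin/sh
# Hypothetical input: the second entry has no "a" child.
input='[{"children":[{"tag":"h2","text":"A"},{"tag":"a","href":"/x"}]},{"children":[{"tag":"h2","text":"B"}]}]'

# Without a fallback, the entry that lacks a link is dropped entirely.
without_fallback=$(printf '%s' "$input" | jq -c '[.[] | {title: (.children[] | select(.tag == "h2").text), link: (.children[] | select(.tag == "a").href)}]')

# With `// null`, the entry is kept and its link becomes null.
with_fallback=$(printf '%s' "$input" | jq -c '[.[] | {title: (.children[] | select(.tag == "h2").text), link: ((.children[] | select(.tag == "a").href) // null)}]')

echo "$without_fallback"   # [{"title":"A","link":"/x"}]
echo "$with_fallback"      # [{"title":"A","link":"/x"},{"title":"B","link":null}]
```

The same applies to title, type and perex in the command above: they have no fallback, so an entry missing an h2, span or p child would be dropped, and an entry with two matching children would produce two objects.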

Nothing beats rubber duck debugging.

Here is a direct solution to the original question:

[
  .[]
  | .children
  | { title: [.[] | select(.tag == "h2") | .text][0],
      perex: [.[] | select(.tag == "p") | .text][0],
      type:  [.[] | select(.tag == "span") | .text | ascii_downcase][0],
      link:  [.[] | select(.tag == "a") | .href][0] }
]

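The key point here is the idiom [...][0], which handles every possible number of items produced by ... (including zero): when nothing matches, [][0] evaluates to null.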
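
To see the idiom at work, here is a small sketch with three made-up children arrays holding zero, one and two h2 items (the data is hypothetical). It also compares against `first(...)`, assuming jq 1.5 or later, which yields nothing at all when there is no match.

```shell
#!/bin/sh
# Hypothetical children arrays with zero, one and two h2 items.
input='[[], [{"tag":"h2","text":"One"}], [{"tag":"h2","text":"First"},{"tag":"h2","text":"Second"}]]'

# [...][0] always yields exactly one value per array: null when nothing matches,
# otherwise the first match. ascii_downcase sits inside the brackets, so it only
# ever runs on actual strings, never on null.
picked=$(printf '%s' "$input" | jq -c 'map([.[] | select(.tag == "h2") | .text | ascii_downcase][0])')

# first(...) produces no output for the empty array, so that slot vanishes.
with_first=$(printf '%s' "$input" | jq -c 'map(first(.[] | select(.tag == "h2") | .text))')

echo "$picked"       # [null,"one","first"]
echo "$with_first"   # ["One","First"]
```

Because every field is guaranteed to produce exactly one value, the object construction in the solution above emits exactly one object per input entry.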