jq: transform a JSON structure by finding output values in a nested input array
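First of all, sorry about the title. English is not my first language, but I would not even know what to call what I am trying to accomplish in my native language.
What I am trying to do is take some input (generated automatically by downloading a page with curl and then converting it from HTML to JSON in a very crude way with pup) and transform it into something that is easier to work with later. The input looks like this: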
[
  {
    "children": [
      {
        "class": "label label-info",
        "tag": "span",
        "text": "Lesson"
      },
      {
        "tag": "h2",
        "text": "Is That So?"
      },
      {
        "tag": "p",
        "text": "Learn how to provide shortened answers with そうです and stay in the conversation with そうですか."
      },
      {
        "class": "btn btn-primary",
        "href": "https://www.nihongomaster.com/japanese/lessons/view/62/is-that-so",
        "tag": "a",
        "text": "Read Lesson"
      }
    ],
    "class": "row col-sm-12",
    "tag": "div"
  },
  {
    "children": [
      {
        "class": "label label-warning",
        "tag": "span",
        "text": "Drills"
      },
      {
        "tag": "h2",
        "text": "Yes, That Is So."
      },
      {
        "tag": "p",
        "text": "Practice the phrases and vocab from the lesson, Is That So?"
      }
    ],
    "class": "row col-sm-12",
    "tag": "div"
  }
]
The output I want pulls various values out of each object's children array, like this:
[
  {
    "title": "Is That So?", // <-- in other words, find "tag" == "h2" and output "text" value
    "perex": "Learn how to provide shortened answers with そうです and stay in the conversation with そうですか.", // "tag" == "p", "text" value
    "type": "lesson", // "tag" == "span", "text" value (lowercased if possible? Not needed though)
    "link": "https://www.nihongomaster.com/japanese/lessons/view/62/is-that-so" // "tag" == "a", "href" value
  },
  {
    "title": "Yes, That Is So.",
    "perex": "Practice the phrases and vocab from the lesson, Is That So?",
    "type": "drills",
    "link": null // Can be missing!
  }
]
I ran various experiments with the select function, but got almost nothing usable out of them, so I am not sure my attempts are worth sharing.
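While writing the question above, I stumbled upon the right solution. I figured I should share the answer here rather than keep the knowledge to myself. If this is against the site rules, feel free to delete the whole question and answer (and I apologize if that is the case).
select is indeed the key, but I was not using it the right way when I wrote the question. Here is the complete jq command that does what I need, and it demonstrates all of the requirements above: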
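- how to select a nested value by searching the children array;
- how to lowercase the type value;
- how to handle the link value that is sometimes missing;
- (I did not realize it at the time, but I sometimes want to change the form of link, so I added that as well).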
# Turn a site-relative link ("/...") into an absolute URL; leave anything else (including null) alone.
def format(link):
  if link | tostring | startswith("/")
  then "https://www.nihongomaster.com" + link
  else link
  end;

[
  .[]
  | {
      title: .children[] | select(.tag == "h2").text,
      type: .children[] | select(.tag == "span").text | ascii_downcase,
      perex: .children[] | select(.tag == "p").text,
      # `// null` keeps the entry when there is no "a" child.
      link: format(((.children[] | select(.tag == "a").href) // null))
    }
]
There is nothing quite like rubber duck debugging.
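For anyone wondering why the `// null` fallback on link matters: when no child matches, `.children[] | select(...)` produces an empty stream, and an object whose value is an empty stream is not emitted at all. The sketch below shows the difference on a tiny made-up input (the titles "A" and "B" and the "/x" link are placeholders, not data from the real page), and assumes jq is available on the PATH.

```shell
#!/bin/sh
# Hypothetical two-entry input: the first entry has an "a" child, the second does not.
input='[{"children":[{"tag":"h2","text":"A"},{"tag":"a","href":"/x"}]},
        {"children":[{"tag":"h2","text":"B"}]}]'

# No fallback: the second entry has no "a" child, so no object is produced for it.
without_fallback=$(printf '%s' "$input" | jq -c '
  [.[] | { title: (.children[] | select(.tag == "h2").text),
           link:  (.children[] | select(.tag == "a").href) }]')

# With `// null`: the empty stream is replaced by null and the entry survives.
with_fallback=$(printf '%s' "$input" | jq -c '
  [.[] | { title: (.children[] | select(.tag == "h2").text),
           link:  ((.children[] | select(.tag == "a").href) // null) }]')

echo "$without_fallback"   # [{"title":"A","link":"/x"}]
echo "$with_fallback"      # [{"title":"A","link":"/x"},{"title":"B","link":null}]
```

The same caveat applies to title, type and perex in the command above: if one of those children were ever missing, the whole entry would silently disappear from the result.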
Here is a direct solution to the original question:
[
  .[]
  | .children
  | { title: [.[] | select(.tag == "h2") | .text][0],
      perex: [.[] | select(.tag == "p") | .text][0],
      type: [.[] | select(.tag == "span") | .text | ascii_downcase][0],
      link: [.[] | select(.tag == "a") | .href][0] }
]
The key point here is the idiom [...][0], which handles every possibility for the number of items produced by ... (including 0).
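To make that concrete, here is a small sketch that runs the title part of the filter against three made-up children arrays with zero, one and two h2 entries (the "A" and "B" texts are placeholders), again assuming jq is on the PATH. For comparison, the last command uses the bare stream form instead of the idiom and counts how many objects it produces.

```shell
#!/bin/sh
# The [...][0] idiom, applied directly to a children array.
filter='{title: [.[] | select(.tag == "h2") | .text][0]}'

none=$(printf '%s' '[]' | jq -c "$filter")
one=$(printf '%s' '[{"tag":"h2","text":"A"}]' | jq -c "$filter")
two=$(printf '%s' '[{"tag":"h2","text":"A"},{"tag":"h2","text":"B"}]' | jq -c "$filter")

echo "$none"   # {"title":null}   zero matches: [][0] is null
echo "$one"    # {"title":"A"}    one match
echo "$two"    # {"title":"A"}    two matches: only the first is kept

# Bare stream form for comparison: two matches produce two objects instead of one.
bare_count=$(printf '%s' '[{"tag":"h2","text":"A"},{"tag":"h2","text":"B"}]' |
  jq -c '[{title: (.[] | select(.tag == "h2") | .text)}] | length')
echo "$bare_count"   # 2
```

So the idiom always yields exactly one object per entry, which is what the desired output needs.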