How do I remove a block of text from multiple repetitive JSON files where there is a small change between the files?
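I have a JSON file with repeating sections, and I am trying to write a script that removes a specific block of text from multiple files. A Python script would be preferred; otherwise, from what I have found while searching, sed could also work, although I know nothing about it.
Here is an example of the format of my JSON file: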
{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
},
1. How do I remove the following from the JSON file?
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},
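My other question is:
2. How can I adjust the script to account for a different "FindMe" URL across multiple files? For example, the second file would contain the following, and so on for the other files: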
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox",
"Description": "There Are Approximately 5,000 Mammal Species."
},
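I think regular expressions would help, but I have not been able to understand them well enough to use them in a script.
Any help is appreciated, thank you.
Update:
I would like the end result to look like this: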
{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
},
Assuming your full JSON contains a list of dictionaries (as your sample suggests), then:
JSON = {"data": [{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
}]}
# Keep every entry whose Type_species is not 'Mammal'; the FindMe URL is never inspected.
JSON['data'] = [d for d in JSON['data'] if d['Animal']['Type_species'] != 'Mammal']
print(JSON)
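To cover the "multiple files" part of the question, the same filter can be applied to every file in a folder. The following is only a sketch under some assumptions: each file is valid JSON with a top-level "data" list as in the snippet above, the files end in .json, and overwriting them in place is acceptable. The names remove_species and clean_files are made up for illustration. Because the filter looks only at Type_species, a different "FindMe" URL in each file makes no difference.

```python
import json
from pathlib import Path


def remove_species(doc, species="Mammal"):
    """Return a copy of doc whose "data" list omits entries of the given species."""
    kept = [
        d for d in doc["data"]
        if d.get("Animal", {}).get("Type_species") != species
    ]
    return {**doc, "data": kept}


def clean_files(folder, species="Mammal"):
    """Rewrite every *.json file in folder in place, dropping matching entries."""
    for path in sorted(Path(folder).glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        cleaned = remove_species(doc, species)
        path.write_text(json.dumps(cleaned, indent=2), encoding="utf-8")
```

Calling clean_files("path/to/folder") would process each file in turn; it is worth trying it on a copy of the folder first, since the originals are overwritten.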
This might work for you (GNU sed):
sed '/^\s*{/{:a;N;/^\(\s*\){.*\n},/!ba;/"Type_species": "Mammal"/d}' file
This gathers the lines of each animal's block and, if the block contains "Type_species": "Mammal", deletes it.
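If the files really look like the sample in the question (objects separated by commas, with a trailing comma and no enclosing brackets), they are not valid JSON on their own, which is why a line-based tool is tempting. A possible Python alternative, offered as a sketch rather than a tested solution for your data, is to wrap the text in brackets before parsing. The function name drop_species and the assumption about the file layout are mine, not from the question.

```python
import json


def drop_species(text, species="Mammal"):
    """Parse comma-separated JSON objects, drop one species, return the same layout."""
    # Assumption: the file is a run of objects separated by commas, possibly
    # with a trailing comma, and is not wrapped in [ ].
    body = text.strip().rstrip(",")
    records = json.loads("[" + body + "]")
    kept = [
        r for r in records
        if r.get("Animal", {}).get("Type_species") != species
    ]
    # Re-emit each object followed by a comma, as in the question's sample.
    return "".join(json.dumps(r, indent=4) + ",\n" for r in kept)
```

This avoids regular expressions entirely: the block is selected by its parsed Type_species value, so differences in the "FindMe" URL between files do not need any pattern matching.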