Remove JSON lines based on a specific string

I have a JSON file with the following content:

[{"headline":"Ntugamo court issues criminal summons against Rukutana",
"url_src":"\/news\/headlines\/67240-ntugamo-court-issues-criminal-summons-against-rukutana"},
{"headline":"Corruption: Equal Opportunities Commission boss granted bail",
"url_src":"\/news\/headlines\/67239-corruption-equal-opportunities-commission-boss-granted-bail"},
{"headline":"Bobi Wine to launch corruption manifesto in Mbarara rejects EC security team",
"url_src":"https:\/\/www.monitor.co.ug"}]

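I am trying to find and remove every {} section that contains the word "corruption", including the braces themselves.

For example, in this case the .py script would remove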

{"headline":"Corruption: Equal Opportunities Commission boss granted bail",

"url_src":"/news/headlines/67239-corruption-equal-opportunities-commission-boss-granted-bail"}

and also remove

{"headline":"Bobi Wine to launch corruption manifesto in Mbarara rejects EC security team","url_src":"https:\/\/www.monitor.co.ug"}

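Can this be done in Python 2.7?

You can use a list comprehension to iterate over each dict in the list.

On each iteration, convert the dict to a string and use if "corruption" not in str(d).lower() to check whether "corruption" appears in the lowercased string. If it does not, keep the dict: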

import json

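# Load the JSON array into a list of dicts
with open("j.json", "rb") as f:
    lst = json.load(f)

# Keep only the dicts whose string form does not mention "corruption"
lst = [d for d in lst if "corruption" not in str(d).lower()]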

print(lst)

Output:

[{'headline': 'Ntugamo court issues criminal summons against Rukutana',
  'url_src': '/news/headlines/67240-ntugamo-court-issues-criminal-summons-against-rukutana'}]
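
Because str(d) includes the dict's keys as well as its values, a keyword that happens to appear in a key name would match every record. As a rough sketch of a stricter variant, the check can be limited to the values. The helper name drop_matching is hypothetical, and the code assumes every value is a string, as in the sample data:

```python
import json

# Hypothetical helper (not part of the answer above): keep a record only
# if none of its values contains the keyword, ignoring case.
# Assumes every value in each record is a string.
def drop_matching(records, keyword):
    keyword = keyword.lower()
    return [
        rec for rec in records
        if not any(keyword in value.lower() for value in rec.values())
    ]

# The sample data from the question, kept as raw JSON text
raw = r'''[{"headline":"Ntugamo court issues criminal summons against Rukutana",
"url_src":"\/news\/headlines\/67240-ntugamo-court-issues-criminal-summons-against-rukutana"},
{"headline":"Corruption: Equal Opportunities Commission boss granted bail",
"url_src":"\/news\/headlines\/67239-corruption-equal-opportunities-commission-boss-granted-bail"},
{"headline":"Bobi Wine to launch corruption manifesto in Mbarara rejects EC security team",
"url_src":"https:\/\/www.monitor.co.ug"}]'''

kept = drop_matching(json.loads(raw), "corruption")
print(len(kept))
```

For the sample data this keeps the same single record as the str(d) approach; the two only differ when the keyword occurs in a key name.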

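To write the list back to the JSON file, use json.dump. Note that the encoding argument to open() exists only in Python 3; on Python 2.7, drop it (json.dump escapes non-ASCII characters by default, so the output is plain ASCII either way):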

with open("j.json", "w", encoding="utf8") as f:
    json.dump(lst, f)
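
Putting the read, filter, and write steps together, one possible end-to-end sketch looks like this. The function name filter_json_file and the output file name j_filtered.json are illustrative assumptions; writing to a second file rather than overwriting j.json is a design choice made here so the original data survives a mistaken keyword:

```python
import json
import os
import tempfile

def filter_json_file(src_path, dst_path, keyword):
    """Hypothetical helper: write src_path's records to dst_path,
    skipping any record whose string form contains keyword.
    Returns the number of records removed."""
    with open(src_path, "r") as f:
        records = json.load(f)
    # keyword is expected in lowercase, since the record is lowercased
    kept = [d for d in records if keyword not in str(d).lower()]
    with open(dst_path, "w") as f:
        json.dump(kept, f, indent=2)
    return len(records) - len(kept)

# Demo in a temporary directory, using the sample data from the question
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "j.json")
dst = os.path.join(tmp_dir, "j_filtered.json")

sample = [
    {"headline": "Ntugamo court issues criminal summons against Rukutana",
     "url_src": "/news/headlines/67240-ntugamo-court-issues-criminal-summons-against-rukutana"},
    {"headline": "Corruption: Equal Opportunities Commission boss granted bail",
     "url_src": "/news/headlines/67239-corruption-equal-opportunities-commission-boss-granted-bail"},
    {"headline": "Bobi Wine to launch corruption manifesto in Mbarara rejects EC security team",
     "url_src": "https://www.monitor.co.ug"},
]
with open(src, "w") as f:
    json.dump(sample, f)

removed = filter_json_file(src, dst, "corruption")
print(removed)
```

No encoding argument is passed to open(), so the sketch should behave the same on Python 2.7 and Python 3.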