Use a regular expression to replace extra quotes inside JSON
My JSON string contains an unexpected quote, which makes json.loads(jstr) fail.
json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
So I want to use a regular expression to match and remove the quotes inside the value of "content". Based on another solution, I tried the following:
import re
json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
pa = re.compile(r'(:\s+"[^"]*)"(?=[^"]*",)')
pa.findall(json_str)
[out]: []
Is there any way to fix the string?
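A note on why the attempt above returns an empty list: `\s+` requires at least one whitespace character after the colon, but `"content":"abcd` has none, so the pattern can only start at `: "efg.`, where the lookahead then fails. The following is a minimal sketch, assuming each value contains at most one stray quote; it relaxes `\s+` to `\s*` and substitutes the captured group back. The names `fixed` and `data` are illustrative only.

```python
import json
import re

json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''

# \s* (instead of \s+) lets the pattern start directly after "content":
pa = re.compile(r'(:\s*"[^"]*)"(?=[^"]*",)')

# Keep group 1 (the value up to the stray quote) and drop the quote itself.
fixed = pa.sub(r'\1', json_str)
data = json.loads(fixed)
print(data['content'])
# abcd: efg.
```

This only handles one extra quote per value; a value with several stray quotes would still fail to parse.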
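As @jonrsharpe pointed out, you would be far better off cleaning the source.
That said, if you cannot control where the extra quotes come from, you can use (*SKIP)(*FAIL) from the newer regex module together with negative lookarounds, like this: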
"[^"]+":\s*"[^"]+"[,}]\s*(*SKIP)(*FAIL)|(?<![,:])"(?![:,]\s*["}])
In Python:
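import json
import regex as re  # third-party module; the stdlib re does not support (*SKIP)(*FAIL)
json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
# clean the json: skip well-formed "key":"value" pairs, then remove any stray quote
rx = re.compile(r'''"[^"]+":\s*"[^"]+"[,}]\s*(*SKIP)(*FAIL)|(?<![,:])"(?![:,]\s*["}])''')
json_str = rx.sub('', json_str)
# load it (the result is named data so the json module is not shadowed)
data = json.loads(json_str)
print(data['id'])
# 9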
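If installing the third-party regex module is not an option, the same idea can be approximated with the standard re module. This is a sketch of a different technique, not the answer's method: the well-formed pair is captured in a group and written back unchanged by a callback, which stands in for (*SKIP)(*FAIL). The function name `keep_pairs` is hypothetical.

```python
import json
import re

json_str = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''

# Branch 1 (group 1): a well-formed "key":"value" pair.
# Branch 2: a stray quote that is not next to a key/value delimiter.
rx = re.compile(r'''("[^"]+":\s*"[^"]+"[,}]\s*)|(?<![,:])"(?![:,]\s*["}])''')

def keep_pairs(match):
    # Well-formed pairs are returned unchanged; stray quotes become ''.
    return match.group(1) or ''

cleaned = rx.sub(keep_pairs, json_str)
data = json.loads(cleaned)
print(data['content'])
# abcd: efg.
```

Like the original pattern, this assumes flat objects whose values are all quoted strings; numbers, nested objects, or values containing `",` would need a different approach.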
A possible solution that I ended up using:
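import json
import re

# filename: path to a file with one JSON object per line (defined elsewhere)
# match the "content" value up to the start of the next key
pa = re.compile(r'"content":\s?"(.*?","\w)')
whole = []
with open(filename) as fin:
    for eachline in fin:
        for s in pa.findall(eachline):
            s = s[:-4]  # drop the trailing '","' and the first character of the next key
            s_fix = s.replace('"', '')
            eachline = eachline.replace(s, s_fix)
        data = json.loads(eachline)
        whole.append(data)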
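The per-line fix above can also be written as a small function that is easy to try on a single string. This is only a sketch under stated assumptions: the helper name `fix_content_quotes` is hypothetical, and it assumes "content" is followed by another key, as in the sample data.

```python
import json
import re

# Group 1: the key and opening quote; group 2: the raw value;
# group 3: the closing quote plus the start of the next key.
CONTENT_RE = re.compile(r'("content":\s*")(.*?)(",\s*"\w)')

def fix_content_quotes(line):
    """Remove every double quote inside the "content" value of one JSON line."""
    def repl(m):
        return m.group(1) + m.group(2).replace('"', '') + m.group(3)
    return CONTENT_RE.sub(repl, line)

line = '''{"id":"9","ctime":"2018-02-13","content":"abcd: "efg.","hots":"103b","date_sms":"2017-11-22"}'''
data = json.loads(fix_content_quotes(line))
print(data['content'])
# abcd: efg.
```

Using a substitution callback avoids the `s[:-4]` slicing and the second `replace` pass, but it will not match if "content" is the last key in the object.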