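Python: convert one JSON structure to a nested structure
How can I convert the JSON format below into the target format below? I have 50,000 entries.
Basically, I want to take each unique country from the array and group all entries that share the same country name under a single array.
Original JSON: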
[
{
"unilist": [
{
"country": "United States",
"name": "The College of New Jersey",
"web_page": "http://www.tcnj.edu"
},
{
"country": "United States",
"name": "Abilene Christian University",
"web_page": "http://www.acu.edu/"
},
{
"country": "United States",
"name": "Adelphi University",
"web_page": "http://www.adelphi.edu/"
},
{
"country": "China",
"name": "Harbin Medical University",
"web_page": "http://www.hrbmu.edu.cn/"
},
{
"country": "China",
"name": "Harbin Normal University",
"web_page": "http://www.hrbnu.edu.cn/"
}
...
]
}
]
Target format:
{
"unilist" : {
"United States" : [
{"name" : "The College of New Jersey", "web_page" : "http://www.tcnj.edu"},
{"name" : "Abilene Christian University", "web_page" : "http://www.acu.edu/"},
{"name" : "Adelphi University", "web_page" : "http://www.adelphi.edu/"}
],
"China" : [
{"name" : "Harbin Medical University", "web_page" : "http://www.hrbmu.edu.cn/"}
],
...
}
}
Update
My attempt (in Python 2.7.11) is below. It does not work as expected and raises a TypeError:
from collections import defaultdict
import json
from pprint import pprint

with open('old_list.json') as orig_json:
    newlist = defaultdict(list)
    for country in orig_json[0]['unilist']:
        newlist[country['country']].append({'name': country['name'], 'web_page': country['web_page']})

with open('new_list.json', 'w') as fp:
    json.dump(newlist,fp)

pprint.pprint(dict(newlist))
TypeError:
Traceback (most recent call last):
  File "convert.py", line 8, in <module>
    for country in orig_json[0]['unilist']:
TypeError: 'file' object has no attribute '__getitem__'
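The traceback points at the cause: orig_json is the file object returned by open(), and a file object cannot be indexed. Its contents have to be parsed with json.load before [0]['unilist'] can be applied. A minimal sketch of that step, assuming the file holds the structure shown above; the in-memory raw string is a hypothetical stand-in for old_list.json so that the snippet runs on its own:

```python
import io
import json

# Hypothetical stand-in for the contents of old_list.json (one entry only).
raw = (u'[{"unilist": [{"country": "China", '
       u'"name": "Harbin Normal University", '
       u'"web_page": "http://www.hrbnu.edu.cn/"}]}]\n')

handle = io.StringIO(raw)      # file-like object, similar to what open() returns
data = json.load(handle)       # parse first: data is now a list holding one dict
entries = data[0]['unilist']   # indexing works on the parsed list, not on the file

print(entries[0]['country'])
```

With a real file, the same call would be json.load(orig_json) inside the with block.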
The following produces almost the same output as the target, except that the "unilist" key is missing. At least it does group the entries by country:
import json
from collections import defaultdict

with open('original.json', 'r') as original:
    orig_json = original.read()[1:-1]  # Remove outermost list brackets ([]) to enable parsing data as JSON data, not a list
    oj = json.loads(orig_json)

newlist = defaultdict(list)
for country in oj['unilist']:
    newlist[country['country']].append({'name': country['name'],
                                        'web_page': country['web_page']})

with open('new.json', 'w') as outfile:
    json.dump(newlist, outfile)
This saves newlist to the JSON file 'new.json'.
Output:
{'China': [{'name': 'Harbin Medical University',
            'web_page': 'http://www.hrbmu.edu.cn/'},
           {'name': 'Harbin Normal University',
            'web_page': 'http://www.hrbnu.edu.cn/'}],
 'United States': [{'name': 'The College of New Jersey',
                    'web_page': 'http://www.tcnj.edu'},
                   {'name': 'Abilene Christian University',
                    'web_page': 'http://www.acu.edu/'},
                   {'name': 'Adelphi University',
                    'web_page': 'http://www.adelphi.edu/'}]}
If I come up with a better way to get the exact target output, I will update this answer. In the meantime, I hope this helps.
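One possible way to reach the exact target is to wrap the grouped dict under a "unilist" key before dumping it. Parsing the whole document and taking element [0] also avoids slicing the brackets off the raw string, which fails if the file ends with a newline. This is a sketch under assumptions rather than a definitive solution: the names group_by_country and convert_file are hypothetical, and the sample list is just the five entries from the question.

```python
import json
from collections import defaultdict


def group_by_country(data):
    """Group entries of data[0]['unilist'] by country, nested under 'unilist'."""
    grouped = defaultdict(list)
    for entry in data[0]['unilist']:
        # Keep only name and web_page; the country becomes the dict key.
        grouped[entry['country']].append(
            {'name': entry['name'], 'web_page': entry['web_page']})
    return {'unilist': dict(grouped)}


def convert_file(src_path, dst_path):
    """Hypothetical helper: read src_path, write the regrouped JSON to dst_path."""
    with open(src_path) as src:
        data = json.load(src)
    with open(dst_path, 'w') as dst:
        json.dump(group_by_country(data), dst, indent=2)


# Sample data copied from the question, already parsed into Python objects.
sample = [{'unilist': [
    {'country': 'United States', 'name': 'The College of New Jersey',
     'web_page': 'http://www.tcnj.edu'},
    {'country': 'United States', 'name': 'Abilene Christian University',
     'web_page': 'http://www.acu.edu/'},
    {'country': 'United States', 'name': 'Adelphi University',
     'web_page': 'http://www.adelphi.edu/'},
    {'country': 'China', 'name': 'Harbin Medical University',
     'web_page': 'http://www.hrbmu.edu.cn/'},
    {'country': 'China', 'name': 'Harbin Normal University',
     'web_page': 'http://www.hrbnu.edu.cn/'},
]}]

result = group_by_country(sample)
print(json.dumps(result, indent=2, sort_keys=True))
```

For the real data, a call such as convert_file('old_list.json', 'new_list.json') would do the same on disk. The grouping is a single pass over the list, so 50,000 entries should not be a problem.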