Python: convert one JSON structure to a nested structure

Question

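How can I convert the JSON below into the target format shown further down? I have 50,000 entries.
Basically, I want to take each unique country from the array and collect all entries that share that country name under a single array.

Original JSON: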

[
    {
        "unilist": [
                {
                    "country": "United States",
                    "name": "The College of New Jersey",
                    "web_page": "http://www.tcnj.edu"
                },
                {
                    "country": "United States",
                    "name": "Abilene Christian University",
                    "web_page": "http://www.acu.edu/"
                },
                {
                    "country": "United States",
                    "name": "Adelphi University",
                    "web_page": "http://www.adelphi.edu/"
                },
                {
                    "country": "China",
                    "name": "Harbin Medical University",
                    "web_page": "http://www.hrbmu.edu.cn/"
                },
                {
                    "country": "China",
                    "name": "Harbin Normal University",
                    "web_page": "http://www.hrbnu.edu.cn/"
                }
                ...
                ]
    }
]

Target format:

{
"unilist" : {
        "United States" : [
          {"name" : "The College of New Jersey", "web_page" : "http://www.tcnj.edu"},
          {"name" : "Abilene Christian University", "web_page" : "http://www.acu.edu/"},
          {"name" : "Adelphi University", "web_page" : "http://www.adelphi.edu/"}
        ],
        "China" : [
          {"name" : "Harbin Medical University", "web_page" : "http://www.hrbmu.edu.cn/"},
          {"name" : "Harbin Normal University", "web_page" : "http://www.hrbnu.edu.cn/"}
        ],
        ...
    }
}

Update

My attempt (in Python 2.7.11) is below, but it does not work as expected; I get the following TypeError:

from collections import defaultdict
import json
from pprint import pprint

with open('old_list.json') as orig_json:    
    newlist = defaultdict(list)

for country in orig_json[0]['unilist']:
    newlist[country['country']].append({'name': country['name'], 'web_page': country['web_page']})

with open('new_list.json', 'w') as fp:
            json.dump(newlist,fp)


pprint.pprint(dict(newlist))

The TypeError:

Traceback (most recent call last):
  File "convert.py", line 8, in <module>
    for country in orig_json[0]['unilist']:
TypeError: 'file' object has no attribute '__getitem__'

Answer 1

This produces almost exactly the target output, except that the "unilist" key is missing. It does at least group the entries by country:

import json
from collections import defaultdict

with open('original.json', 'r') as original:
    # Parse the whole file; the top level is a one-element list wrapping the object
    oj = json.load(original)[0]

newlist = defaultdict(list)

for country in oj['unilist']:
    newlist[country['country']].append({'name': country['name'], 
                                        'web_page': country['web_page']})

with open('new.json', 'w') as outfile:
    json.dump(newlist, outfile)

This saves newlist to the JSON file 'new.json'.

Output:

{'China': [{'name': 'Harbin Medical University',
            'web_page': 'http://www.hrbmu.edu.cn/'},
           {'name': 'Harbin Normal University',
            'web_page': 'http://www.hrbnu.edu.cn/'}],
 'United States': [{'name': 'The College of New Jersey',
                    'web_page': 'http://www.tcnj.edu'},
                   {'name': 'Abilene Christian University',
                    'web_page': 'http://www.acu.edu/'},
                   {'name': 'Adelphi University',
                    'web_page': 'http://www.adelphi.edu/'}]}
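
As for the TypeError in the question: it appears to come from indexing the file object itself rather than the parsed data. Here is a minimal sketch of the difference, assuming Python 3 and using an in-memory io.StringIO as a stand-in for the real file (the one-entry sample string is shortened for illustration):

```python
import io
import json

# Stand-in for open('old_list.json'); the contents are a shortened sample.
fake_file = io.StringIO(
    '[{"unilist": [{"country": "China", '
    '"name": "Harbin Medical University", '
    '"web_page": "http://www.hrbmu.edu.cn/"}]}]'
)

# fake_file[0] would raise a TypeError, because a file object cannot be indexed.
# json.load parses the text into Python lists and dicts first.
data = json.load(fake_file)
first_entry = data[0]['unilist'][0]
print(first_entry['country'])
```

Once the data has been parsed, orig_json[0]['unilist'] in the question's loop would refer to a plain list of dicts.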

If I come up with a better way to get the exact target output, I will update this answer. In the meantime, I hope this helps.
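
One possible way to get the exact target output is to wrap the grouped result under a "unilist" key. This is a rough sketch, not a tested solution for the full 50,000 entries: the function name group_by_country and the shortened sample data are made up for illustration, and it assumes every top-level element has a "unilist" array whose items carry "country", "name" and "web_page".

```python
import json
from collections import defaultdict


def group_by_country(records):
    """Return {"unilist": {country: [{"name": ..., "web_page": ...}, ...]}}."""
    grouped = defaultdict(list)
    for wrapper in records:            # each top-level element holds a "unilist" array
        for uni in wrapper['unilist']:
            grouped[uni['country']].append(
                {'name': uni['name'], 'web_page': uni['web_page']})
    # Convert to a plain dict and nest it under the "unilist" key
    return {'unilist': dict(grouped)}


# Shortened sample in the same shape as the original JSON
sample = [{'unilist': [
    {'country': 'United States', 'name': 'Adelphi University',
     'web_page': 'http://www.adelphi.edu/'},
    {'country': 'China', 'name': 'Harbin Medical University',
     'web_page': 'http://www.hrbmu.edu.cn/'},
    {'country': 'China', 'name': 'Harbin Normal University',
     'web_page': 'http://www.hrbnu.edu.cn/'},
]}]

result = group_by_country(sample)
print(json.dumps(result, indent=2))
```

With the real file, the parsed result of json.load would be passed in instead of sample, and result could be written out with json.dump as above.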
