Easiest way to split JSON file using Python
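I'm working on an interactive visualization of the World Happiness Report from 2015 to 2020. The data is split across 6 CSV files. Using pandas, I have cleaned the data and concatenated it into one large JSON file with the following format: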
[
  {
    "Country": "Switzerland",
    "Year": 2015,
    "Happiness Rank": 1,
    "Happiness Score": 7.587000000000001
  },
  {
    "Country": "Iceland",
    "Year": 2015,
    "Happiness Rank": 2,
    "Happiness Score": 7.561
  },
  {
    "Country": "Switzerland",
    "Year": 2016,
    "Happiness Rank": 2,
    "Happiness Score": 7.5089999999999995
  },
  {
    "Country": "Iceland",
    "Year": 2016,
    "Happiness Rank": 3,
    "Happiness Score": 7.501
  },
  {
    "Country": "Switzerland",
    "Year": 2017,
    "Happiness Rank": 3,
    "Happiness Score": 7.49399995803833
  },
  {
    "Country": "Iceland",
    "Year": 2017,
    "Happiness Rank": 1,
    "Happiness Score": 7.801
  }
]
Now I would like to programmatically restructure the JSON file so that it has the following format:
{
  "2015": {
    "Switzerland": {
      "Happiness Rank": 1,
      "Happiness Score": 7.587000000000001
    },
    "Iceland": {
      "Happiness Rank": 2,
      "Happiness Score": 7.561
    }
  },
  "2016": {
    "Switzerland": {
      "Happiness Rank": 2,
      "Happiness Score": 7.5089999999999995
    },
    "Iceland": {
      "Happiness Rank": 3,
      "Happiness Score": 7.501
    }
  },
  "2017": {
    "Switzerland": {
      "Happiness Rank": 3,
      "Happiness Score": 7.49399995803833
    },
    "Iceland": {
      "Happiness Rank": 1,
      "Happiness Score": 7.801
    }
  }
}
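It has to be done programmatically, since there are over 900 distinct (country, year) pairs. I want the JSON in this format because it makes the file more readable and makes it easier to select the appropriate data. If I want the rank of Iceland in 2015, I can just do data["2015"]["Iceland"]["Happiness Rank"] (the year is a string, because JSON object keys are always strings).
Does anyone know the easiest/most convenient way to do this in Python?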
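I'm assuming you still have the original pandas DataFrame that this JSON was created from. Using pandas, you can do df = df.groupby(['Year', 'Country']). You can then follow the procedure in "pandas groupby to nested json" to convert it to JSON.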
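The pandas route can be sketched as follows. This is only one possible reading of the suggestion above, not necessarily the procedure from the linked question: it groups on Year alone and lets to_dict(orient="index") produce the per-country level. The sample records and the variable names df and nested are assumptions made for illustration.

```python
import pandas as pd

# Small sample in the shape shown in the question (assumed, for illustration).
records = [
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
    {"Country": "Switzerland", "Year": 2016, "Happiness Rank": 2, "Happiness Score": 7.509},
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
]
df = pd.DataFrame(records)

# One outer key per year; inside each year, index the rows by country and
# turn every row into a {column: value} dictionary.
nested = {
    str(year): group.drop(columns="Year").set_index("Country").to_dict(orient="index")
    for year, group in df.groupby("Year")
}

print(nested["2015"]["Iceland"]["Happiness Rank"])
```

Depending on the pandas version, the leaf values may be NumPy scalars rather than plain Python numbers, so it is worth checking that json.dumps accepts the result before writing it to a file.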
You might find groupby from the itertools module useful. I was able to do this with:
import itertools
groups = itertools.groupby(data, lambda x: x["Year"])
newdict = {str(year): {entry["Country"]:entry for entry in group} for year, group in groups}
where data is the data in the form of the example you gave.
This keeps the original fields in each dict, but they can easily be removed:
for countries in newdict.values():
    for c in countries.values():
        del c["Year"]
        del c["Country"]
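Two things are worth keeping in mind with this approach: itertools.groupby only merges consecutive items that share a key, and the del loop modifies the same dictionaries that are stored in data. A minimal sketch that sorts first and builds new dictionaries instead is shown here; the function name nest_by_year and the drop parameter are made up for this example.

```python
import itertools
from operator import itemgetter

def nest_by_year(records, drop=("Year", "Country")):
    """Group records by year, then by country, without mutating the input."""
    # groupby only merges *consecutive* items, so sort by the grouping key first.
    ordered = sorted(records, key=itemgetter("Year"))
    return {
        str(year): {
            # Build a fresh dict per entry, leaving out the grouping fields.
            entry["Country"]: {k: v for k, v in entry.items() if k not in drop}
            for entry in group
        }
        for year, group in itertools.groupby(ordered, key=itemgetter("Year"))
    }

# Deliberately unsorted sample in the shape shown in the question.
data = [
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
]
nested = nest_by_year(data)
print(nested["2015"]["Iceland"]["Happiness Rank"])
```

Because the inner dictionaries are new objects, data still contains the "Year" and "Country" fields afterwards.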
If data is your original list of dictionaries:
def by_year(data):
    from itertools import groupby
    from operator import itemgetter
    retain_keys = ("Happiness Rank", "Happiness Score")
    for year, group in groupby(data, key=itemgetter("Year")):
        as_tpl = tuple(group)
        yield str(year), dict(zip(map(itemgetter("Country"), as_tpl), [{k: d[k] for k in retain_keys} for d in as_tpl]))
print(dict(by_year(data)))
Output:
{'2015': {'Switzerland': {'Happiness Rank': 1, 'Happiness Score': 7.587000000000001}, 'Iceland': {'Happiness Rank': 2, 'Happiness Score': 7.561}}, '2016': {'Switzerland': {'Happiness Rank': 2, 'Happiness Score': 7.5089999999999995}, 'Iceland': {'Happiness Rank': 3, 'Happiness Score': 7.501}}, '2017': {'Switzerland': {'Happiness Rank': 3, 'Happiness Score': 7.49399995803833}, 'Iceland': {'Happiness Rank': 1, 'Happiness Score': 7.801}}}
This assumes that the dictionaries in data are already grouped by year.
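If that ordering cannot be guaranteed, a single pass with dict.setdefault avoids the requirement altogether. The following is a sketch under that assumption rather than part of the answer above; nest_records is a hypothetical helper name, and the JSON round trip is done in memory to keep the example self-contained.

```python
import json

def nest_records(records):
    """Build {year: {country: remaining fields}} in one pass, in any input order."""
    nested = {}
    for record in records:
        # Copy so the caller's dictionaries are left untouched.
        fields = dict(record)
        year = str(fields.pop("Year"))
        country = fields.pop("Country")
        # Create the per-year dict on first sight of that year.
        nested.setdefault(year, {})[country] = fields
    return nested

# Years deliberately interleaved to show that order does not matter.
data = [
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
]
nested = nest_records(data)

# Serialize and parse again; the year keys stay strings throughout.
text = json.dumps(nested, indent=2)
reloaded = json.loads(text)
print(reloaded["2015"]["Iceland"]["Happiness Rank"])
```

To work with files instead of strings, json.load and json.dump with open file objects take the place of json.loads and json.dumps.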