Complex JSON to CSV using Python and pandas DataFrame
I know this question has been asked many times, but I still can't convert my JSON to CSV.
My JSON file looks like this:
{
    "itemCostPrices": {
        "Id": 1,
        "costPrices": [{
            "costPrice": 83.56,
            "currencyCode": "GBP",
            "startDateValid": "2010-09-06",
            "endDateValid": "2011-05-01",
            "postCalculatedCostPriceFlag": false,
            "promoCostPriceFlag": true
        }]
    },
    "eventId": null,
    "eventDateTime": null
}
Try this code:
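import json
import pandas as pd

def flatten_dict(d, acc=None):
    # Recursively copy every leaf key/value into one flat dict.
    # Only the last key name is kept, so a repeated key overwrites the earlier value.
    if acc is None:
        acc = {}  # avoid a shared mutable default argument
    for k, v in d.items():
        if isinstance(v, dict):
            flatten_dict(v, acc)
        elif isinstance(v, list):
            for item in v:
                flatten_dict(item, acc)
        else:
            acc[k] = v
    return acc

with open('tmp.json') as f:
    data = json.load(f)

# A single top-level object loads as a dict; wrap it so we always iterate over records
if isinstance(data, dict):
    data = [data]

df = pd.DataFrame([flatten_dict(d, {}) for d in data])
df.to_csv('tmp.csv', index=False)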
Explanation of the code:
1) Read the JSON file and load it into Python objects:
with open('tmp.json') as f:
    data = json.load(f)
Assuming tmp.json contains a list of two such records, you get:
[{'eventDateTime': None,
  'eventId': None,
  'itemCostPrices': {'Id': 1,
                     'costPrices': [{'costPrice': 83.56,
                                     'currencyCode': 'GBP',
                                     'endDateValid': '2011-05-01',
                                     'postCalculatedCostPriceFlag': False,
                                     'promoCostPriceFlag': True,
                                     'startDateValid': '2010-09-06'}]}},
 {'eventDateTime': None,
  'eventId': None,
  'itemCostPrices': {'Id': 2,
                     'costPrices': [{'costPrice': 99.56,
                                     'currencyCode': 'EUR',
                                     'endDateValid': '2017-05-01',
                                     'postCalculatedCostPriceFlag': False,
                                     'promoCostPriceFlag': True,
                                     'startDateValid': '2018-09-06'}]}}]
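The None, False and True values in this output come from the way the json module maps JSON types to Python types. The small sketch below shows that mapping. It also shows that the top-level type depends on the file. The single object in the question loads as a dict, while the list output above implies a file that contains a JSON array. The sample strings are for illustration only.

```python
import json

# JSON literals map to Python values: null -> None, true/false -> True/False,
# objects -> dict, arrays -> list, numbers -> int/float
record = json.loads('{"Id": 1, "costPrice": 83.56, "eventId": null, "promoCostPriceFlag": true}')
print(record)
# {'Id': 1, 'costPrice': 83.56, 'eventId': None, 'promoCostPriceFlag': True}

# The top-level type follows the document: an object gives a dict, an array gives a list
print(type(json.loads('{"a": 1}')).__name__)    # dict
print(type(json.loads('[{"a": 1}]')).__name__)  # list
```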
2) Flatten each dictionary:
flat_data = [flatten_dict(d, {}) for d in data]
You get the following list of flat dictionaries:
[{'Id': 1,
  'costPrice': 83.56,
  'currencyCode': 'GBP',
  'startDateValid': '2010-09-06',
  'endDateValid': '2011-05-01',
  'postCalculatedCostPriceFlag': False,
  'promoCostPriceFlag': True,
  'eventId': None,
  'eventDateTime': None},
 {'Id': 2,
  'costPrice': 99.56,
  'currencyCode': 'EUR',
  'startDateValid': '2018-09-06',
  'endDateValid': '2017-05-01',
  'postCalculatedCostPriceFlag': False,
  'promoCostPriceFlag': True,
  'eventId': None,
  'eventDateTime': None}]
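flatten_dict keeps only the last key name at each level. If costPrices holds more than one entry, or if two nested objects share a key, later values silently overwrite earlier ones. For example, a record with two cost prices ends up with only the second costPrice. The sketch below shows a hypothetical variant, flatten_with_paths, which is not part of the original answer. It keeps the dotted key path and indexes list elements, so nothing is overwritten. The cost of this approach is that records with different list lengths produce different columns.

```python
def flatten_with_paths(d, parent="", sep="."):
    """Hypothetical variant of flatten_dict that keeps the full key path."""
    flat = {}
    for k, v in d.items():
        key = f"{parent}{sep}{k}" if parent else k
        if isinstance(v, dict):
            flat.update(flatten_with_paths(v, key, sep))
        elif isinstance(v, list):
            # Index list elements so several entries do not overwrite each other
            for i, item in enumerate(v):
                item_key = f"{key}{sep}{i}"
                if isinstance(item, dict):
                    flat.update(flatten_with_paths(item, item_key, sep))
                else:
                    flat[item_key] = item
        else:
            flat[key] = v
    return flat

# Sample record with two cost prices (illustrative data, not from the question)
record = {
    "itemCostPrices": {"Id": 1, "costPrices": [
        {"costPrice": 83.56, "currencyCode": "GBP"},
        {"costPrice": 90.00, "currencyCode": "GBP"},
    ]},
    "eventId": None,
}
print(flatten_with_paths(record))
```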
3) Load the dictionaries into a pandas DataFrame:
df = pd.DataFrame(flat_data)
You get:
   Id  costPrice  currencyCode  endDateValid  eventDateTime  eventId  postCalculatedCostPriceFlag  promoCostPriceFlag  startDateValid
0   1      83.56           GBP    2011-05-01           None     None                        False                True      2010-09-06
1   2      99.56           EUR    2017-05-01           None     None                        False                True      2018-09-06
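The columns above appear in alphabetical order, which likely reflects an older pandas version. Recent versions keep the dictionaries' key order instead. To avoid depending on the version, you can pass the desired columns (and subset) explicitly with columns=. The pd.to_datetime conversion at the end is optional and shown only as an illustration. The column list here is an assumption.

```python
import pandas as pd

flat_data = [
    {"Id": 1, "costPrice": 83.56, "currencyCode": "GBP",
     "startDateValid": "2010-09-06", "endDateValid": "2011-05-01",
     "postCalculatedCostPriceFlag": False, "promoCostPriceFlag": True,
     "eventId": None, "eventDateTime": None},
    {"Id": 2, "costPrice": 99.56, "currencyCode": "EUR",
     "startDateValid": "2018-09-06", "endDateValid": "2017-05-01",
     "postCalculatedCostPriceFlag": False, "promoCostPriceFlag": True,
     "eventId": None, "eventDateTime": None},
]

# columns= selects and orders the keys taken from each dict
# (a listed name missing from a record would become NaN)
columns = ["Id", "currencyCode", "costPrice", "startDateValid", "endDateValid"]
df = pd.DataFrame(flat_data, columns=columns)

# Date strings stay as text (object dtype) unless converted explicitly
df["startDateValid"] = pd.to_datetime(df["startDateValid"])
print(df.dtypes)
```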
4) Save the DataFrame as CSV:
df.to_csv('tmp.csv', index=False)
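As an alternative to the hand-written flatten_dict (a swapped-in technique, not the approach in the answer), pandas 1.0 and later provides pd.json_normalize, which flattens nested records directly. The minimal sketch below assumes the same two records as above, inlined instead of read from tmp.json. record_path emits one row per costPrices entry, so several prices per item are not collapsed. Parent fields listed in meta are repeated on each row, and nested meta paths are named with dots (itemCostPrices.Id).

```python
import pandas as pd

# Sample records shaped like the JSON in the question (inlined for illustration)
data = [
    {"itemCostPrices": {"Id": 1, "costPrices": [{
        "costPrice": 83.56, "currencyCode": "GBP",
        "startDateValid": "2010-09-06", "endDateValid": "2011-05-01",
        "postCalculatedCostPriceFlag": False, "promoCostPriceFlag": True}]},
     "eventId": None, "eventDateTime": None},
    {"itemCostPrices": {"Id": 2, "costPrices": [{
        "costPrice": 99.56, "currencyCode": "EUR",
        "startDateValid": "2018-09-06", "endDateValid": "2017-05-01",
        "postCalculatedCostPriceFlag": False, "promoCostPriceFlag": True}]},
     "eventId": None, "eventDateTime": None},
]

# record_path points at the nested list; meta lists the parent fields to repeat
df = pd.json_normalize(
    data,
    record_path=["itemCostPrices", "costPrices"],
    meta=[["itemCostPrices", "Id"], "eventId", "eventDateTime"],
)

# Without a path argument, to_csv returns the CSV text instead of writing a file;
# None values are written as empty fields (default na_rep="")
csv_text = df.to_csv(index=False)
print(csv_text)
```

In a real script you would pass the result of json.load(f) as data and write the file with df.to_csv('tmp.csv', index=False), as in step 4.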