Convert JSON with sub-fields to CSV in Python

Question

I have a file, jsonoutput.txt, containing sample JSON output like this:

[{"fruit": "orange", "id": 1, "countries": ["Portugal"], "color": "Orange"},
 {"fruit": "apple", "id": 2, "countries": ["Spain"], "color": "red"}]

I need the output as CSV (an Excel file):

fruit   id  countries  color
orange  1   Portugal   Orange
apple   2   Spain      red

Right now, I am getting this instead:

fruit   id  countries      color
orange  1   [u'Portugal']  Orange
apple   2   [u'Spain']     red

How can I remove the [], the u, and the quotes ('') from the countries column?
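The brackets, the u prefix, and the quotes are most likely not stored in the data itself; they appear when a whole Python list is converted to text, which yields its repr. Under Python 2, the repr of a unicode string carries a u prefix, which is where [u'Portugal'] comes from. A minimal sketch of the difference, run under Python 3 (where the u no longer appears):

```python
countries = ["Portugal"]

# Converting the list itself to text keeps the list syntax (brackets and quotes)
as_repr = str(countries)

# Joining the elements produces plain text that fits in a single CSV cell
as_text = ", ".join(countries)

print(as_repr)  # ['Portugal']
print(as_text)  # Portugal
```

Joining the list items before writing, rather than stripping characters from the repr afterwards, is the approach the answer below takes as well.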

print(json.dumps(fruits)) gives me the JSON output.

Here is what I tried in order to convert the JSON to xlsx:

import tablib

data = tablib.Dataset(headers=('Fruit', 'id', 'Countries', 'Color'))
importfile = 'jsonoutput.txt'
with open(importfile, 'r') as f:
    data.json = f.read()
data_export = data.export('xlsx')
# The with block closes the file automatically, so no f.close() is needed
with open('output.xlsx', 'wb') as f:
    f.write(data_export)
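For comparison, the standard library alone can produce the CSV the question asks for, as long as list values are joined before writing. A minimal sketch, assuming jsonoutput.txt holds a valid JSON array of flat objects; flatten_record and records_to_csv are hypothetical helper names, and the raw string stands in for the file contents so the example is self-contained:

```python
import csv
import io
import json

# Stand-in for the contents of jsonoutput.txt (assumed to be a valid JSON array)
raw = '''[{"fruit": "orange", "id": 1, "countries": ["Portugal"], "color": "Orange"},
 {"fruit": "apple", "id": 2, "countries": ["Spain"], "color": "red"}]'''


def flatten_record(record, list_sep=", "):
    """Hypothetical helper: join any list value into one string per cell."""
    return {
        key: list_sep.join(str(v) for v in value) if isinstance(value, list) else value
        for key, value in record.items()
    }


def records_to_csv(records, out):
    """Hypothetical helper: write a list of dicts as CSV, one row per record."""
    rows = [flatten_record(r) for r in records]
    if not rows:
        return  # nothing to write
    # Column order follows the keys of the first record
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


records = json.loads(raw)
buffer = io.StringIO()
records_to_csv(records, buffer)
print(buffer.getvalue())
```

To work on the real file, something like with open('jsonoutput.txt') as src, open('output.csv', 'w', newline='') as dst: records_to_csv(json.load(src), dst) should do; the csv module documentation recommends newline='' when opening the output file. Excel can open the resulting CSV directly.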

Answer 1

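You can use pandas.io.json.json_normalize (since pandas 1.0 the same function is available as pandas.json_normalize, and the pandas.io.json import path is deprecated):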

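import pandas as pd
# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize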

d = [
    {"fruit": "orange", "id":1, "countries": ["Portugal"], "color": "Orange"},
    {"fruit": "apple", "id":2, "countries": ["Portugal"], "color": "red"}
]

df = pd.concat([json_normalize(d[i]) for i in range(len(d))], ignore_index=True)
df['countries'] = df['countries'].str.join(' ')

    fruit   id  countries   color
0   orange  1   Portugal    Orange
1   apple   2   Portugal    red

To save it as an .xlsx file, use:

df.to_excel('filename.xlsx', index=False)

Edit:

json_normalize is a function that normalizes semi-structured JSON data into a flat table.

I now realize that my code can be simplified to:

df = json_normalize(d) # no need for `pd.concat`

### Output:
#   fruit   id  countries   color
# 0 orange  1   ['Portugal']    Orange
# 1 apple   2   ['Portugal']    red
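The output above still shows ['Portugal'] because json_normalize flattens nested objects into columns but leaves list values as they are (unless record_path is used to expand them). The sketch below is a simplified pure-Python stand-in for that idea on a single record, not pandas' implementation: nested dictionaries become dotted column names (json_normalize also uses "." as its default separator), while lists are copied unchanged. The flatten helper and the nested "details" field are hypothetical, added only for illustration:

```python
def flatten(obj, parent_key="", sep="."):
    """Hypothetical stand-in for json_normalize on one record.

    Nested dicts become dotted keys; lists and scalars are kept as-is.
    """
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the sub-object and merge its flattened keys
            flat.update(flatten(value, new_key, sep))
        else:
            # Scalars and lists are copied unchanged
            flat[new_key] = value
    return flat


# "details" is a hypothetical nested field added for illustration
record = {"fruit": "apple", "id": 2, "details": {"color": "red"}, "countries": ["Spain"]}
flat = flatten(record)
print(flat)
# {'fruit': 'apple', 'id': 2, 'details.color': 'red', 'countries': ['Spain']}
```

Because the list survives flattening, a separate join step is still needed to turn each countries cell into plain text.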

To remove the [] from the countries column, I used pandas.Series.str.join, which is pandas' equivalent of Python's str.join.

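This is needed because each value in the countries column is initially a list of elements, which has to be joined into a single string: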

df['countries'] = df['countries'].str.join(' ')

After joining the items, the countries column no longer contains lists:

    fruit   id  countries   color
0   orange  1   Portugal    Orange
1   apple   2   Portugal    red
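Series.str.join applies the join to the list in every cell of the column. A pure-Python equivalent over a plain list of cells, shown here without pandas, makes that behavior explicit. It also illustrates why the separator is worth choosing deliberately: with ' ', a multi-country cell such as the hypothetical ["United Kingdom", "Spain"] becomes ambiguous, whereas ', ' keeps the items distinguishable:

```python
# Each cell of the countries column holds a Python list
# (the third cell is a hypothetical multi-country example)
countries_column = [["Portugal"], ["Spain"], ["United Kingdom", "Spain"]]

# Same idea as df['countries'].str.join(sep): join the items in every cell
joined_space = [" ".join(cell) for cell in countries_column]
joined_comma = [", ".join(cell) for cell in countries_column]

print(joined_space)  # ['Portugal', 'Spain', 'United Kingdom Spain']
print(joined_comma)  # ['Portugal', 'Spain', 'United Kingdom, Spain']
```

In the pandas version, the equivalent choice is simply df['countries'].str.join(', ') instead of str.join(' ').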

