Pandas read CSV with embedded JSON into dataframe
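I need to read in a CSV file with Pandas, and one of the columns in the CSV is JSON data. However, once I bring the file in, the JSON is corrupted(?) and I can't use json_normalize() on it.
I can't attach the file, but here is some sample code that demonstrates the problem: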
df = pd.DataFrame({'location_id':[1,2,3], 'visits':[{"ABCD":9,"DEFG":8,"ASDF":6},
{"XYZR":4,"ABCD":4},
{"ASDF":4}]})
pd.json_normalize(df.visits)
# OUTPUTS THE NORMALIZED JSON JUST FINE
df.to_csv('test_visits.csv')
df2 = pd.read_csv('test_visits.csv')
pd.json_normalize(df2.visits)
# RESULTS IN ERROR:
# AttributeError: 'str' object has no attribute 'values'
Am I missing something in read_csv() that would make the JSON usable?
Thanks in advance.
In [77]: df = pd.DataFrame({'location_id':[1,2,3], 'visits':[{"ABCD":9,"DEFG":8,"ASDF":6},
...: {"XYZR":4,"ABCD":4},
...: {"ASDF":4}]})
In [78]: df
Out[78]:
   location_id                             visits
0            1  {'ABCD': 9, 'DEFG': 8, 'ASDF': 6}
1            2             {'XYZR': 4, 'ABCD': 4}
2            3                        {'ASDF': 4}
In [79]: pd.json_normalize(df["visits"])
Out[79]:
   ABCD  DEFG  ASDF  XYZR
0   9.0   8.0   6.0   NaN
1   4.0   NaN   NaN   4.0
2   NaN   NaN   4.0   NaN
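This happens because once you write the dataframe to CSV and read it back, pandas reads the visits column as plain strings. So when you try to normalize it, it raises an error saying 'str' object has no attribute 'values', because each cell is now a str rather than a dict.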
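To see the type change directly, here is a minimal sketch that round-trips the sample frame through an in-memory buffer instead of test_visits.csv (the io.StringIO buffer and index=False are my choices for the illustration, not part of the question) and inspects the cell types before and after:

```python
import io

import pandas as pd

df = pd.DataFrame({
    "location_id": [1, 2, 3],
    "visits": [
        {"ABCD": 9, "DEFG": 8, "ASDF": 6},
        {"XYZR": 4, "ABCD": 4},
        {"ASDF": 4},
    ],
})

# Write to an in-memory buffer (stand-in for a file on disk) and read it back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df2 = pd.read_csv(buffer)

# Collect the Python types found in the visits column before and after
types_before = df["visits"].map(type).unique().tolist()
types_after = df2["visits"].map(type).unique().tolist()
first_cell = df2["visits"].iloc[0]

print(types_before)      # dict objects in the original frame
print(types_after)       # str objects after the CSV round trip
print(repr(first_cell))  # the text that to_csv actually stored
```

Note that the stored text is the Python repr of the dict (single-quoted keys), not strict JSON, which matters when choosing a parser for it.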
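- The problem is that the 'visits' column is of type str (e.g. '{"ABCD":9,"DEFG":8,"ASDF":6}').
- When loading the CSV with .read_csv, use the converters parameter to apply ast.literal_eval to the 'visits' column, which converts each str to a dict.
  - converters: a dict of functions for converting values in certain columns. Keys can be either integers or column labels.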
from ast import literal_eval
import pandas as pd
# load the csv using the converters parameter with literal_eval;
# index_col=0 reads back the index column that df.to_csv() wrote by default
df2 = pd.read_csv('test_visits.csv', index_col=0, converters={'visits': literal_eval})
# normalize the visits, join it to location_id and drop the visits column
df2 = df2.join(pd.json_normalize(df2.visits)).drop(columns=['visits'])
# display(df2)
   location_id  ABCD  DEFG  ASDF  XYZR
0            1   9.0   8.0   6.0   NaN
1            2   4.0   NaN   NaN   4.0
2            3   NaN   NaN   4.0   NaN
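If the column in your real file holds strict JSON text (double-quoted keys, possibly true/false/null) rather than the Python dict repr that to_csv writes for the sample frame, literal_eval may not parse it, and json.loads is probably the better converter. The sketch below is an assumption about such a file, not something taken from the question: it serializes the dicts with json.dumps before writing, uses an in-memory buffer in place of a file, and passes the Series to json_normalize as a list.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({
    "location_id": [1, 2, 3],
    "visits": [
        {"ABCD": 9, "DEFG": 8, "ASDF": 6},
        {"XYZR": 4, "ABCD": 4},
        {"ASDF": 4},
    ],
})

# Store real JSON text in the column instead of the Python dict repr
df["visits"] = df["visits"].map(json.dumps)

# In-memory buffer used here as a stand-in for a CSV file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# json.loads turns each JSON string back into a dict while the file is parsed
df2 = pd.read_csv(buffer, converters={"visits": json.loads})

# Flatten the dicts into columns and attach them to location_id
flat = df2.join(pd.json_normalize(df2["visits"].tolist())).drop(columns=["visits"])
print(flat)
```

Either converter works the same way with read_csv; which one fits depends only on how the text in the column was written.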