How to move values from pandas columns with dicts into new columns containing just the value

Question

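I have a deeply nested JSON file that I normalized into a pandas dataframe. As a result, all the keys became columns and the values became rows. The problem is that some columns still contain dictionaries and were not normalized properly. The dataframe has over 8,000 rows and 3,000 columns, so this can't be done by hand.

For example:

I have the following column, named: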

Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp

which contains dictionaries, like this:

[{'Desc': 'MISCELLANEOUS', 'ExclusionCd': '01', 'ExclusionAmt': '13'}, {'Desc': 'GRANT REFUNDS', 'RelatedOrExemptFunctionIncmAmt': '159502'}]

As you can see, there are Desc, ExclusionCd, ExclusionAmt, and so on.

My dataframe already has columns named after these:

Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.BusinessCd
Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.Desc
Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.ExclusionAmt
Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.ExclusionCd
Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.RelatedOrExemptFunctionIncmAmt
Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.UnrelatedBusinessTaxblIncmAmt

How can I move these values into their respective columns? Keep in mind that I may have hundreds of these, so I can't do it by hand. It needs to be automated. Thanks!

Answer 1

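The posted data shows a single record; it is not clear whether the file contains a list of such records.
With pandas, the best way to parse a list of JSON records is pd.json_normalize.
- The example is too long to post in the answer, but note the result when you run this locally.
The key Return.ReturnData.IRS990PF is not available in the PasteBin data sample.
- Of the IRS990 keys, only IRS990 and IRS990Schedule... are available.
An alternative way to unpack values that are lists of dicts is to use the meta parameter of json_normalize.
- See this answer for a good example.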
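
The meta alternative can be illustrated with a minimal sketch. It is not taken from the linked answer: the record below is made up from the question's example, "EIN" is a hypothetical top-level key, and the long key path is shortened to a single level. meta is normally paired with record_path, which names the list of dicts to flatten.

```python
import pandas as pd

# Hypothetical record shaped like the question's example; "EIN" is an invented key.
records = [
    {
        "EIN": "000000001",
        "OtherRevenueDescribedGrp": [
            {"Desc": "MISCELLANEOUS", "ExclusionCd": "01", "ExclusionAmt": "13"},
            {"Desc": "GRANT REFUNDS", "RelatedOrExemptFunctionIncmAmt": "159502"},
        ],
    }
]

# record_path names the list of dicts to flatten (one output row per dict);
# meta names top-level fields to repeat on each of those rows;
# record_prefix is prepended to the column names that come from the dicts.
flat = pd.json_normalize(
    records,
    record_path="OtherRevenueDescribedGrp",
    meta=["EIN"],
    record_prefix="OtherRevenueDescribedGrp.",
)
print(flat)
```

Each dict becomes its own row, and keys missing from a dict are left as NaN. Note that record_path expects every record to contain that path, so records without it would likely have to be filtered or handled separately.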

import json
from pathlib import Path

import pandas as pd

# path to file (raw string so the backslashes are not read as escapes such as \t)
p = Path(r'c:\some_path\test.json')

# read in the JSON file
with p.open('r', encoding='utf-8') as f:
    data = json.load(f)

# parse with pandas
df = pd.json_normalize(data)

# if there's a list of dictionaries in the resulting dataframe, it can be unpacked with something like
# the following, which returns one DataFrame per row (cells that are NaN would need to be skipped first)
df['Return.ReturnData.IRS990ScheduleO.SupplementalInformationDetail'].apply(pd.json_normalize)
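
To automate this across many columns, one possible approach is sketched below. It is only an illustration under stated assumptions: the helper names expand_list_of_dicts and list_of_dict_columns are hypothetical, 'Grp' stands in for the long column name from the question, the 'INTEREST' value is invented, and the affected cells are assumed to hold lists of dicts (or missing values). Each dict gets its own row via DataFrame.explode, and its keys are written into '<column>.<key>' columns, filling columns that already exist rather than duplicating them.

```python
import numpy as np
import pandas as pd


def expand_list_of_dicts(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Give each dict in df[col] its own row and spread its keys into '<col>.<key>' columns."""
    # One row per list element; the original index label repeats for each element.
    exploded = df.explode(col)
    # Flatten each dict; cells that are not dicts (e.g. missing values) contribute nothing.
    cells = [c if isinstance(c, dict) else {} for c in exploded[col]]
    values = pd.json_normalize(cells).add_prefix(f"{col}.")
    out = exploded.drop(columns=col)
    for name in values.columns:
        new = values[name].to_numpy()
        if name in out.columns:
            # Column already exists: keep its value wherever the dict had nothing for this key.
            new = np.where(pd.isna(new), out[name].to_numpy(), new)
        out[name] = new  # positional assignment, row order is unchanged
    return out


def list_of_dict_columns(df: pd.DataFrame) -> list:
    """Names of object columns with at least one cell that is a non-empty list of dicts."""
    def is_dict_list(cell):
        return isinstance(cell, list) and len(cell) > 0 and all(isinstance(x, dict) for x in cell)

    candidates = df.select_dtypes(include="object")
    return [c for c in candidates.columns if candidates[c].map(is_dict_list).any()]


# Toy frame mirroring the question; 'Grp' stands in for the long column name.
df = pd.DataFrame({
    "Grp": [
        [{"Desc": "MISCELLANEOUS", "ExclusionCd": "01", "ExclusionAmt": "13"},
         {"Desc": "GRANT REFUNDS", "RelatedOrExemptFunctionIncmAmt": "159502"}],
        None,
    ],
    "Grp.Desc": [None, "INTEREST"],  # column already created by json_normalize
})

for col in list_of_dict_columns(df):
    df = expand_list_of_dicts(df, col)
print(df)
```

Because a list can hold several dicts, the result has one row per dict, with the original index label repeated. Expanding several such columns in turn multiplies rows accordingly, so it may be preferable to expand each column into its own table keyed by the original index.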

python

json

pandas

json-normalize