How to import multiple json sections (all in one file) into python/pandas

Question

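I'm trying to process a json file so it can be used in another program that takes excel files. My json file has multiple sections/arrays: one holds some basic info such as the number of records and the report name, one holds the column names, and another holds each individual row/record as an array.

I've tried reading the data with pandas.read_json and json.loads, but I keep getting errors. If I delete all but one section (such as rows), I can get it to read, although it puts all the columns into a single column, as if it doesn't recognize the comma-separated column names as separate columns.

Ideally I don't want to edit this file by hand; I'd like to process it in python into a pandas dataframe so that I can manipulate it further and export it for other uses.

Any help would be greatly appreciated. This is my first post, so if there's anything I can do better, please let me know!

Here is a representation of the json data; the actual data has more columns and many more rows/records (usually 700+).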

{
"count": 2,
"name": "report1",
"columnNames": [
    "Incident ID",
    "Status",
    "Subject"
],
"rows": [
    [
        "2460636",
        "Resolved",
        "login help"
    ],
    [
        "2460637",
        "Unresolved",
        "email help"
    ]
    ]
}

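I'm trying to get the columnNames section to become the column names of the pandas dataframe, and each "row" to become a record in the dataframe.

I've looked at other examples, but I haven't come across one with a similar json format.

I've tried both pandas.read_json("example.json") and json.loads to load the data, but each fails with a different error that I can't seem to get around.

Running pandas.read_json("example.json") returns "arrays must all be same length".

The result should be that the columnNames section/array becomes the column names of the pandas dataframe and each "row" becomes a record in the dataframe.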

Answer 1

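I'll give you a general approach that I think you can build on.

Create a pandas dataframe with three dummy column names.
Insert all the rows as required.
Rename the columns using the columnNames section of the json.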
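
The three steps above might look roughly like the sketch below. It assumes the file contents have already been parsed into a dict (an inline string and `json.loads` stand in for the file here), and the placeholder names `col0`, `col1`, ... are made up for illustration. The number of placeholders is taken from the length of `columnNames` rather than fixed at three.

```python
import json
import pandas as pd

# Inline stand-in for the file contents (assumption: same layout as in the question).
raw = """
{
  "count": 2,
  "name": "report1",
  "columnNames": ["Incident ID", "Status", "Subject"],
  "rows": [
    ["2460636", "Resolved", "login help"],
    ["2460637", "Unresolved", "email help"]
  ]
}
"""
data = json.loads(raw)

# Steps 1 and 2: build the frame from the rows under dummy column names.
placeholders = [f"col{i}" for i in range(len(data["columnNames"]))]
df = pd.DataFrame(data["rows"], columns=placeholders)

# Step 3: rename the dummy columns using the columnNames section.
df = df.rename(columns=dict(zip(placeholders, data["columnNames"])))
print(df)
```

If the dummy names are not needed for anything else, passing `columns=data["columnNames"]` straight to the `pd.DataFrame` constructor should give the same result in one step.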

Answer 2

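Since I haven't seen your full json file, I don't know whether this will do everything you need, but based on your test data this does create a pandas df from the data in a json-format dictionary.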

import pandas as pd

test_dict={
    "count": 2,
    "name": "report1",
    "columnNames": [
        "Incident ID",
        "Status",
        "Subject"
    ],
    "rows": [
        [
            "2460636",
            "Resolved",
            "login help"
        ],
        [
            "2460637",
            "Unresolved",
            "email help"
        ]
        ]
    }

def make_df(json_dat):  # call this whenever you want a new df from a parsed json dict
    incident_id = []
    status = []
    subject = []

    for row in json_dat.get('rows'):  # loop over all rows and append each field to its list
        incident_id.append(row[0])
        status.append(row[1])
        subject.append(row[2])

    # build the df: each list becomes a row, then transpose so the lists end up as columns
    df = pd.DataFrame([incident_id, status, subject],
                      index=['incident_id', 'status', 'subject']).T

    return df

#you can call the function now every time you need to make a df, potentially generating a dictionary of dfs based on the name of the json files 

df1= make_df(test_dict)
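
As a possible variation on the idea of a dictionary of dfs, here is a sketch that does not hardcode the three columns and keys each frame by the report's `name` field. The function name `report_to_df` and the second report (`report2`) are hypothetical, invented only to show more than one entry; real reports would come from `json.load` on each file.

```python
import pandas as pd

def report_to_df(report):
    """Build a DataFrame from a dict that has 'columnNames' and 'rows' keys."""
    return pd.DataFrame(report["rows"], columns=report["columnNames"])

# Two reports in the question's layout; the second one is made up for illustration.
reports = [
    {
        "count": 2,
        "name": "report1",
        "columnNames": ["Incident ID", "Status", "Subject"],
        "rows": [
            ["2460636", "Resolved", "login help"],
            ["2460637", "Unresolved", "email help"],
        ],
    },
    {
        "count": 1,
        "name": "report2",
        "columnNames": ["Incident ID", "Status"],
        "rows": [["2460700", "Resolved"]],
    },
]

# One DataFrame per report, keyed by the report name.
dfs = {report["name"]: report_to_df(report) for report in reports}
print(dfs["report1"])
```

Because the column names are read from each report, this should cope with reports that have different numbers of columns, as long as every row has one value per column name.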

Answer 3

Use `pd.json_normalize` to unpack your `json` file

pd.json_normalize

import pandas as pd
import json

with open('test.json') as f:
    data = json.load(f)

json_data = pd.json_normalize(data)

Output:

                       columnNames  count  name                            rows
0   [Incident ID, Status, Subject]  2   report1 [[2460636, Resolved, login help], [2460637, Un...

Unpack rows:

df_rows = pd.json_normalize(data, record_path=['rows'], meta=['name'])
df_rows.rename({0: data['columnNames'][0],
                1: data['columnNames'][1],
                2: data['columnNames'][2]}, axis=1, inplace=True)

Output of df_rows:

     Incident ID        Status       Subject       name
0        2460636      Resolved    login help    report1
1        2460637    Unresolved    email help    report1

The json isn't particularly well formatted; something like the following would be easier to unpack:

{
    "count": 2,
    "name": "report1",
    "rows": [{
            "Incident ID": "2460636",
            "Status": "Resolved",
            "Subject": "login help"
        }, {
            "Incident ID": "2460637",
            "Status": "Unresolved",
            "Subject": "email help"
        }
    ]
}
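
If the file cannot be produced in that shape, one option is to reshape it in python after loading. The sketch below assumes the original layout from the question, pairs each row with `columnNames` to get one dict per record, and then normalizes without any renaming step. The variable names `records` and `reshaped` are illustrative only.

```python
import pandas as pd

# Parsed json in the original layout from the question.
data = {
    "count": 2,
    "name": "report1",
    "columnNames": ["Incident ID", "Status", "Subject"],
    "rows": [
        ["2460636", "Resolved", "login help"],
        ["2460637", "Unresolved", "email help"],
    ],
}

# Pair each row with the column names to get one dict per record.
records = [dict(zip(data["columnNames"], row)) for row in data["rows"]]
reshaped = {"count": data["count"], "name": data["name"], "rows": records}

# With a list of dicts under "rows", the keys become the column names directly.
df = pd.json_normalize(reshaped, record_path="rows", meta=["name"])
print(df)
```

The `meta=["name"]` argument repeats the report name on every row, matching the `name` column in the output shown earlier.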
