How to convert CSV to nested JSON in Python

I have a CSV file in the following format:

a b c d e
1 2 3 4 5
9 8 7 6 5

I want to convert this CSV file to nested JSON in the following format:

[{"a": 1,
  "Purchase": {
      "b": 2,
      "c": 3,
      "d": 4},
  "Sales": {
      "d": 4,
      "e": 5}},
 {"a": 9,
  "Purchase": {
      "b": 8,
      "c": 7},
  "Sales": {
      "d": 6,
      "e": 5}}]

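How can I do this transformation? I can't figure out how to perform this conversion in Python. Keep in mind that this is only a sample table; my real table has many columns and thousands of rows, so doing it by hand is not practical.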

So far I have tried this code:

with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    for r in reader:
        r["purchase"] = {"b": r['b'],
                        "c": r['c'],
                        }

Here I am trying to add another key-value pair holding the dictionary I need, but it did not work. I would do the same for Sales; this is just a sample.

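A simple approach is to add more columns and then use the to_json method in pandas:

import pandas as pd

# Pass sep=... to read_csv if the file is not comma-separated.
df = pd.read_csv('your_file.csv')
# Each new column holds one dict per row, built from the selected columns.
df['Purchase'] = df[['b','c','d']].to_dict('records')
df['Sales'] = df[['d','e']].to_dict('records')
# Keep only the top-level keys and serialize one JSON object per row.
out = df[['a', 'Purchase', 'Sales']].to_json(orient='records', indent=4)
print(out)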

Output:

[
    {
        "a":1,
        "Purchase":{
            "b":2,
            "c":3,
            "d":4
        },
        "Sales":{
            "d":4,
            "e":5
        }
    },
    {
        "a":9,
        "Purchase":{
            "b":8,
            "c":7,
            "d":6
        },
        "Sales":{
            "d":6,
            "e":5
        }
    }
]

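You don't need any third-party library for this; the standard csv and json modules are enough. Just specify the correct dialect, for example tab-separated: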

import csv
import json


with open("tmp4.csv", "r") as f:
    result = [
        {
            "a": row["a"],
            "Purchase": {
                "b": row["b"],
                "c": row["c"],
            },
            "Sales": {
                "d": row["d"],
                "e": row["e"],
            },
        }
        for row in csv.DictReader(f, dialect='excel-tab')
    ]
assert (
    json.dumps(result)
    == '[{"a": "1", "Purchase": {"b": "2", "c": "3"}, "Sales": {"d": "4", "e": "5"}}, {"a": "9", "Purchase": {"b": "8", "c": "7"}, "Sales": {"d": "6", "e": "5"}}]'
)
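
Note that csv.DictReader yields every value as a string, which is why the assertion above compares against "1" rather than 1. If the numeric output shown in the question is wanted, one option is to convert each field while the rows are read. A minimal sketch, assuming every column holds an integer and using an in-memory io.StringIO in place of the real file (the helper name nest_row is hypothetical):

```python
import csv
import io
import json

# Stand-in for the real file; assumes tab-separated integer columns.
sample = io.StringIO("a\tb\tc\td\te\n1\t2\t3\t4\t5\n9\t8\t7\t6\t5\n")


def nest_row(row):
    """Build the nested record for one DictReader row, casting values to int."""
    values = {key: int(value) for key, value in row.items()}
    return {
        "a": values["a"],
        "Purchase": {"b": values["b"], "c": values["c"]},
        "Sales": {"d": values["d"], "e": values["e"]},
    }


result = [nest_row(row) for row in csv.DictReader(sample, dialect="excel-tab")]
print(json.dumps(result, indent=4))
```

If some columns are not integers, int() would raise ValueError, so the cast would need to be applied only to the columns known to be numeric.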

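When you do r["purchase"] = {"b": ...}, you assign the dictionary back to the per-row object r, which is discarded at the end of each loop iteration. Instead, create a new dictionary for each record and append it to a list. Like this: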

import csv

result = []
with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    for r in reader:
        result.append({
            "a": r["a"],
            "Purchase" : {
                "b": r["b"],
                "c": r["c"],
                "d": r["d"],
            },
            "Sales": {
                "d": r["d"],
                "e": r["e"],
            },
        })

Or use a list comprehension to create result:

with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    result = [{
        "a": r["a"],
        "Purchase" : {
            "b": r["b"],
            "c": r["c"],
            "d": r["d"],
        },
        "Sales": {
            "d": r["d"],
            "e": r["e"],
        },
    } for r in reader]
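
Since the real table has many columns, writing out every key by hand may get tedious. One possible generalization is to describe the grouping once and build each record from that description. This is only a sketch under assumptions: the TOP_LEVEL list, the GROUPS mapping, and the function name nest are hypothetical, and the sample is read from an in-memory comma-separated string rather than from new_data.csv:

```python
import csv
import io
import json

# Hypothetical layout description: columns kept at the top level,
# and the columns that go under each nested key.
TOP_LEVEL = ["a"]
GROUPS = {
    "Purchase": ["b", "c", "d"],
    "Sales": ["d", "e"],
}


def nest(row):
    """Build one nested record from a flat DictReader row."""
    record = {col: row[col] for col in TOP_LEVEL}
    for name, cols in GROUPS.items():
        record[name] = {col: row[col] for col in cols}
    return record


# Stand-in for the real file; assumes comma-separated values.
sample = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n9,8,7,6,5\n")
result = [nest(r) for r in csv.DictReader(sample)]
print(json.dumps(result, indent=4))
```

With this shape, adding another nested key or moving a column between groups only means editing GROUPS. The values remain strings, as with any csv.DictReader output.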