Explode function

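This is my first question here. I have searched here and across the web, but I can't seem to find an answer to my problem. I am trying to explode a list in a JSON file into multiple columns and rows. Everything I have tried so far has been unsuccessful.

I am doing this for multiple JSON files in a directory, so that the result prints out in a DataFrame like this. Goal: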

did Version Nodes rds time c sc f uc
did Version Nodes rds time c sc f uc
did Version Nodes rds time c sc f uc
did Version Nodes rds time c sc f uc

Instead, I get this in my DataFrame:

did Version Nodes rds fusage
did Version Nodes rds everything in fusage
did Version Nodes rds everything in fusage
did Version Nodes rds everything in fusage

A sample of the JSON I am working with. The JSON structure will not change.

{
  "did": "123456789",
  "mId": "1a2b3cjsks",
  "timestamp": "2021-11-26T11:10:58.322000",
  "beat": {
    "did": "123456789",
    "collectionTime": "2010-05-26 11:10:58.004783+00",
    "Nodes": 6,
    "Version": "v1.4.6-2",
    "rds": "0.00B",
    "fusage": [
      {
        "time": "2010-05-25",
        "c": "string",
        "sc": "string",
        "f": "string",
        "uc": "int"
      },
      {
        "time": "2010-05-19",
        "c": "string",
        "sc": "string",
        "f": "string",
        "uc": "int"
      },
      {
        "t": "2010-05-23",
        "c": "string",
        "sc": "string",
        "f": "string",
        "uc": "int"
      },
      {
        "time": "2010-05-23",
        "c": "string",
        "sc": "string",
        "f": "string",
        "uc": "int"
      }
    ]
  }
}

My end goal is to export the DataFrame to CSV so that it can be ingested. Thank you all for your help.

Using Python 3.8.10 and pandas 1.3.4.

Python code below:

import csv
import glob
import json
import os
import pandas as pd

tempdir = '/dir/to/files/json_temp'
json_files = os.path.join(tempdir, '*.json')
file_list = glob.glob(json_files)
dfs = []

for file in file_list:
    with open(file) as f:
        data = pd.json_normalize(json.loads(f.read()))
        dfs.append(data)
        df = pd.concat(dfs, ignore_index=True)
        df.explode('fusage')
        print(df)

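If you are going to use the explode function, then afterwards apply pd.Series to the column that contains the fusage list (beat.fusage) to get a Series for each list item.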

/dir/to/files
├── example-v1.4.6-2.json
└── example-v2.2.2-2.json
...
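for file in file_list:
    with open(file) as f:
        # Flatten one file into a single row; nested keys become 'beat.<key>' columns
        data = pd.json_normalize(json.load(f))
        dfs.append(data)

# Combine all files once, after the loop has finished
df = pd.concat(dfs, ignore_index=True)
# One row per fusage entry, then expand each dict into its own columns
fusage_list = df.explode('beat.fusage')['beat.fusage'].apply(pd.Series)
# Join on the repeated row index so the file-level columns repeat per fusage entry
df = pd.concat([df, fusage_list], axis=1)

# show desired columns
df = df[['did', 'beat.Version', 'beat.Nodes', 'beat.rds', 'time', 'c', 'sc', 'f', 'uc']]
print(df)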

Output from df:
         did beat.Version  beat.Nodes beat.rds        time       c      sc       f   uc
0  123456789     v1.4.6-2           6    0.00B  2010-05-25  string  string  string  int
0  123456789     v1.4.6-2           6    0.00B  2010-05-19  string  string  string  int
0  123456789     v1.4.6-2           6    0.00B         NaN  string  string  string  int
0  123456789     v1.4.6-2           6    0.00B  2010-05-23  string  string  string  int
1  123777777     v2.2.2-2           4    0.00B  2010-05-25  string  string  string  int
1  123777777     v2.2.2-2           4    0.00B  2010-05-19  string  string  string  int
1  123777777     v2.2.2-2           4    0.00B         NaN  string  string  string  int
1  123777777     v2.2.2-2           4    0.00B  2010-05-23  string  string  string  int
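The NaN values in the time column correspond to the sample entry that uses the key t instead of time. As an alternative to explode plus pd.Series, pd.json_normalize can unpack the nested list directly through its record_path and meta parameters. The following is only a minimal sketch under some assumptions: the JSON is inlined (and trimmed) rather than read from the directory, flatten_fusage is a hypothetical helper name, and the rename step is just one way to get the column names shown in the question's goal.

```python
import json

import pandas as pd

# Inline stand-in for one file's contents, trimmed from the sample in the question.
raw = """
{
  "did": "123456789",
  "beat": {
    "Nodes": 6,
    "Version": "v1.4.6-2",
    "rds": "0.00B",
    "fusage": [
      {"time": "2010-05-25", "c": "string", "sc": "string", "f": "string", "uc": "int"},
      {"time": "2010-05-19", "c": "string", "sc": "string", "f": "string", "uc": "int"}
    ]
  }
}
"""


def flatten_fusage(doc):
    """Hypothetical helper: one row per beat.fusage entry, file-level fields repeated."""
    return pd.json_normalize(
        doc,
        record_path=["beat", "fusage"],
        meta=["did", ["beat", "Version"], ["beat", "Nodes"], ["beat", "rds"]],
    )


# In practice there would be one frame per file in file_list.
frames = [flatten_fusage(json.loads(raw))]
df = pd.concat(frames, ignore_index=True)

# Rename the meta columns and reorder to match the layout asked for in the question.
df = df.rename(columns={"beat.Version": "Version", "beat.Nodes": "Nodes", "beat.rds": "rds"})
df = df[["did", "Version", "Nodes", "rds", "time", "c", "sc", "f", "uc"]]

# With no path argument, to_csv returns the CSV text; pass a file path to write to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Because each file is already expanded to one row per fusage entry before the frames are concatenated, ignore_index=True gives a unique row index, which may be more convenient for the CSV export mentioned in the question.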