Dataframe from list of lists with different length

How can I convert a list like the one below into a DataFrame with 5 columns?

[[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
 [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
 [['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
 [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
  ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]

Flatten the raw data into a single list of rows, then create the DataFrame:

import pandas as pd

data = [[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
        [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
         ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
         ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
         ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
        [['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
         ['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
        [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
         ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]
lst = []
for entry in data:
    for sub in entry:
        lst.append(sub)
df = pd.DataFrame(data=lst, columns=['A', 'B', 'C', 'D', 'E'])
print(df)

Output

             A  B                     C      D    E
0   30/09/2015  C        ETERNITON NM H   1,73  400
1   05/08/2019  C            CIELOON NM   7,75  500
2   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
3   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
4   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
5   25/03/2015  C          CETIPON NM H  31,17   10
6   25/03/2015  C          CETIPON NM H  31,17    9
7   25/03/2015  C          CETIPON NM H  31,17   10
8   25/03/2015  C          CETIPON NM H  31,17   10
9   25/03/2015  C          CETIPON NM H  31,17   10
10  25/03/2015  C          CETIPON NM H  31,17   10
11  25/03/2015  C          CETIPON NM H  31,17   10
12  25/03/2015  C          CETIPON NM H  31,17   10
13  25/03/2015  C          CETIPON NM H  31,17   10
14  25/03/2015  C          CETIPON NM H  31,17   10
15  25/03/2015  C         WEGON EJ NM H  30,88   99
16  16/12/2019  C     IRBBRASIL REON NM  36,72  100
17  16/12/2019  C  ITAUUNIBANCOON EJ N1  31,45  200

Just flatten the list to get the rows, then convert them to a DataFrame:

import pandas as pd

# `l` is the nested list from the question
flat = [row for item in l for row in item]
df = pd.DataFrame(flat, columns=['A','B','C','D','E'])
print(df)
             A  B                     C      D    E
0   30/09/2015  C        ETERNITON NM H   1,73  400
1   05/08/2019  C            CIELOON NM   7,75  500
2   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
3   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
4   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
5   25/03/2015  C          CETIPON NM H  31,17   10
6   25/03/2015  C          CETIPON NM H  31,17    9
7   25/03/2015  C          CETIPON NM H  31,17   10
8   25/03/2015  C          CETIPON NM H  31,17   10
9   25/03/2015  C          CETIPON NM H  31,17   10
10  25/03/2015  C          CETIPON NM H  31,17   10
11  25/03/2015  C          CETIPON NM H  31,17   10
12  25/03/2015  C          CETIPON NM H  31,17   10
13  25/03/2015  C          CETIPON NM H  31,17   10
14  25/03/2015  C          CETIPON NM H  31,17   10
15  25/03/2015  C         WEGON EJ NM H  30,88   99
16  16/12/2019  C     IRBBRASIL REON NM  36,72  100
17  16/12/2019  C  ITAUUNIBANCOON EJ N1  31,45  200
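
Since the snippet above relies on a list `l` that is never defined, here is a minimal self-contained sketch of the same one-level flattening. The shortened sample `nested` and the helper name `flatten_one_level` are made up for illustration; the point is only that groups holding different numbers of rows do not matter once they are flattened.

```python
import pandas as pd

# Shortened sample: three groups holding 1, 2 and 1 rows respectively
nested = [
    [['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
    [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
     ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
    [['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
]


def flatten_one_level(groups):
    """Return every row from every group as one flat list (hypothetical helper)."""
    return [row for group in groups for row in group]


flat = flatten_one_level(nested)
df = pd.DataFrame(flat, columns=['A', 'B', 'C', 'D', 'E'])
print(df.shape)  # (4, 5)
```

Only the outer level is removed, so each 5-item row stays intact and becomes one DataFrame row.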

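Expand the records with pandas explode, then create the DataFrame (no column names are passed, so the columns get the default integer labels 0 to 4):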

import pandas as pd
lst = [[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
 [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
  ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
 [['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
  ['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
 [['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
  ['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]
df = pd.DataFrame(list(pd.Series(lst).explode()))
print(df)
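
As a rough sketch of what the explode step does, the version below runs on a shortened sample and passes explicit column names. The sample `sample` and the labels A to E are assumptions borrowed from the other answers, not part of this one.

```python
import pandas as pd

# Shortened sample with the same nesting as the question's list
sample = [
    [['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
    [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
     ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
]

# Each Series element is one group (a list of rows);
# explode() turns every row of every group into its own element
rows = pd.Series(sample).explode()

# tolist() yields a plain list of 5-item rows; column names are illustrative
df = pd.DataFrame(rows.tolist(), columns=list("ABCDE"))
print(df)
```

Note that `explode()` repeats the original group index for each row it produces; building the DataFrame from `rows.tolist()` discards it and gives a fresh 0-based index.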

Here is another solution, using chain.from_iterable:

import pandas as pd
from itertools import chain

# `data` is the nested list from the question, named as in the first answer
pd.DataFrame(chain.from_iterable(data), columns=list("ABCDE"))

             A  B                     C      D    E
0   30/09/2015  C        ETERNITON NM H   1,73  400
1   05/08/2019  C            CIELOON NM   7,75  500
2   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
3   05/08/2019  C     M.DIASBRANCOON NM  39,40  100
    ...