Dataframe from list of lists with different length
How can I convert a list like the one below into a DataFrame with 5 columns?
[[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
[['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
[['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
[['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]
Normalize the raw data and create the df:
import pandas as pd
data = [[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
[['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
[['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
[['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]
# Collect every inner row into one flat list of 5-field rows
lst = []
for entry in data:
    for sub in entry:
        lst.append(sub)
df = pd.DataFrame(data=lst, columns=['A', 'B', 'C', 'D', 'E'])
print(df)
Output
A B C D E
0 30/09/2015 C ETERNITON NM H 1,73 400
1 05/08/2019 C CIELOON NM 7,75 500
2 05/08/2019 C M.DIASBRANCOON NM 39,40 100
3 05/08/2019 C M.DIASBRANCOON NM 39,40 100
4 05/08/2019 C M.DIASBRANCOON NM 39,40 100
5 25/03/2015 C CETIPON NM H 31,17 10
6 25/03/2015 C CETIPON NM H 31,17 9
7 25/03/2015 C CETIPON NM H 31,17 10
8 25/03/2015 C CETIPON NM H 31,17 10
9 25/03/2015 C CETIPON NM H 31,17 10
10 25/03/2015 C CETIPON NM H 31,17 10
11 25/03/2015 C CETIPON NM H 31,17 10
12 25/03/2015 C CETIPON NM H 31,17 10
13 25/03/2015 C CETIPON NM H 31,17 10
14 25/03/2015 C CETIPON NM H 31,17 10
15 25/03/2015 C WEGON EJ NM H 30,88 99
16 16/12/2019 C IRBBRASIL REON NM 36,72 100
17 16/12/2019 C ITAUUNIBANCOON EJ N1 31,45 200
Just flatten the list to get the rows, then convert to a DataFrame:
import pandas as pd
# l is the nested list from the question
flat = [row for item in l for row in item]
df = pd.DataFrame(flat, columns=['A','B','C','D','E'])
print(df)
A B C D E
0 30/09/2015 C ETERNITON NM H 1,73 400
1 05/08/2019 C CIELOON NM 7,75 500
2 05/08/2019 C M.DIASBRANCOON NM 39,40 100
3 05/08/2019 C M.DIASBRANCOON NM 39,40 100
4 05/08/2019 C M.DIASBRANCOON NM 39,40 100
5 25/03/2015 C CETIPON NM H 31,17 10
6 25/03/2015 C CETIPON NM H 31,17 9
7 25/03/2015 C CETIPON NM H 31,17 10
8 25/03/2015 C CETIPON NM H 31,17 10
9 25/03/2015 C CETIPON NM H 31,17 10
10 25/03/2015 C CETIPON NM H 31,17 10
11 25/03/2015 C CETIPON NM H 31,17 10
12 25/03/2015 C CETIPON NM H 31,17 10
13 25/03/2015 C CETIPON NM H 31,17 10
14 25/03/2015 C CETIPON NM H 31,17 10
15 25/03/2015 C WEGON EJ NM H 30,88 99
16 16/12/2019 C IRBBRASIL REON NM 36,72 100
17 16/12/2019 C ITAUUNIBANCOON EJ N1 31,45 200
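The order of the two `for` clauses in the comprehension is easy to get backwards, so here is a minimal sketch on a shortened sample. The name `sample` and the trimmed rows are assumptions made for illustration; the real data from the question has the same shape.

```python
import pandas as pd

# Hypothetical, shortened sample with the same shape as the question's data:
# a list of groups, each group holding a different number of 5-field rows.
sample = [
    [['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
    [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
     ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
]

# The clauses read left to right like nested loops:
# the outer loop walks the groups, the inner loop walks the rows of each group.
flat = [row for group in sample for row in group]

df = pd.DataFrame(flat, columns=['A', 'B', 'C', 'D', 'E'])
print(df.shape)
```

The groups have different lengths, but every row has exactly five fields, which is why the flattened list fits a 5-column DataFrame.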
Expand the records with pandas explode, then create the DataFrame:
import pandas as pd
lst = [[['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
[['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100'],
['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
[['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '9'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'CETIPON NM H', '31,17', '10'],
['25/03/2015', 'C', 'WEGON EJ NM H', '30,88', '99']],
[['16/12/2019', 'C', 'IRBBRASIL REON NM', '36,72', '100'],
['16/12/2019', 'C', 'ITAUUNIBANCOON EJ N1', '31,45', '200']]]
df = pd.DataFrame(list(pd.Series(lst).explode()))
print(df)
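The snippet above leaves the columns numbered 0 to 4. As a rough sketch, assuming you also want named columns and want to keep track of which group each row came from, the index that `explode` produces can be reused. The name `sample` and the shortened rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical, shortened sample with the same shape as the question's data.
sample = [
    [['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
    [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
     ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
]

# One Series element per group; explode() turns each inner row into its own
# element and repeats the group's position in the index.
exploded = pd.Series(sample).explode()

# Reuse that index so each row remembers which group it belonged to.
df = pd.DataFrame(exploded.tolist(), index=exploded.index, columns=list('ABCDE'))
print(df)
```

If the group number is not needed, dropping the `index=` argument gives the default 0..n-1 index instead.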
Here is another solution, using chain.from_iterable:
import pandas as pd
from itertools import chain
pd.DataFrame(chain.from_iterable(data), columns=list("ABCDE"))
A B C D E
0 30/09/2015 C ETERNITON NM H 1,73 400
1 05/08/2019 C CIELOON NM 7,75 500
2 05/08/2019 C M.DIASBRANCOON NM 39,40 100
3 05/08/2019 C M.DIASBRANCOON NM 39,40 100
...
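For reference, `chain.from_iterable` removes exactly one level of nesting and yields the items lazily, which is what this data needs: the groups disappear and the 5-field rows stay intact. A small sketch on a shortened sample follows; the name `sample` and the trimmed rows are assumptions for illustration.

```python
from itertools import chain

import pandas as pd

# Hypothetical, shortened sample with the same shape as the question's data.
sample = [
    [['30/09/2015', 'C', 'ETERNITON NM H', '1,73', '400']],
    [['05/08/2019', 'C', 'CIELOON NM', '7,75', '500'],
     ['05/08/2019', 'C', 'M.DIASBRANCOON NM', '39,40', '100']],
]

# chain.from_iterable flattens one level only: groups -> rows.
# Materializing it with list() makes the intermediate result easy to inspect.
rows = list(chain.from_iterable(sample))

df = pd.DataFrame(rows, columns=list("ABCDE"))
print(len(rows), df.shape)
```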