Python Pandas Join with Index value
I want to join 2 DataFrames, for example:
DataFrame1:
A 1
B 2
C 3
D 4
DataFrame2:
A 1
A 3
B 4
B 3
B 7
C 4
D 6
D 8
The result should look like this:
A 1 1
    3
B 2 4
    3
    7
C 3 4
D 4 6
    8
I have tried join, merge, and concat, but none of them worked. Can you help me?
One way to do it is like this:
df_out = pd.concat([df1,df2])
df_out.set_index([0,df_out.groupby([0]).cumcount()])[1].unstack()
Output:
     0    1    2    3
0
A  1.0  1.0  3.0  NaN
B  2.0  4.0  3.0  7.0
C  3.0  4.0  NaN  NaN
D  4.0  6.0  8.0  NaN
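For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. It assumes both frames were built without column names, so the columns are labeled 0 and 1, and the sample data is copied from the question. The names `stacked`, `position`, and `wide` are mine. `cumcount` numbers the rows within each key, which gives `unstack` a unique (key, position) index to pivot on.

```python
import pandas as pd

# Sample data from the question; columns default to 0 and 1
df1 = pd.DataFrame([["A", 1], ["B", 2], ["C", 3], ["D", 4]])
df2 = pd.DataFrame([["A", 1], ["A", 3], ["B", 4], ["B", 3],
                    ["B", 7], ["C", 4], ["D", 6], ["D", 8]])

# Stack both frames on top of each other
stacked = pd.concat([df1, df2])

# Running counter per key: 0 for the first row of "A", 1 for the second, ...
position = stacked.groupby(0).cumcount()

# Index by (key, position), keep the value column, pivot position into columns
wide = stacked.set_index([0, position])[1].unstack()
print(wide)
```

Keys with fewer rows are padded with NaN, which is why the values come out as floats.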
Edit: updated to match the expected output:
df_out = pd.concat([df1,df2])
df_out = df_out.set_index([0,df_out.groupby([0]).cumcount()])[1]
df_out.sort_index()
Output:
A  0    1
   1    1
   2    3
B  0    2
   1    4
   2    3
   3    7
C  0    3
   1    4
D  0    4
   1    6
   2    8
Name: 1, dtype: int64
Edit: one more update...
df_out = pd.concat([df1,df2])
df_out = df_out.set_index([0,df_out.groupby([0]).cumcount()])[1]
df_out = df_out.sort_index().to_frame()
df_out = df_out.reset_index().drop('level_1', axis=1)
df_out[0] = df_out[0].mask(df_out[0].duplicated()).fillna('')
print(df_out)
Output:
    0  1
0   A  1
1      1
2      3
3   B  2
4      4
5      3
6      7
7   C  3
8      4
9   D  4
10     6
11     8
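If only this final layout is needed, the steps can probably be shortened. The sketch below is my own variation, not the code above: it swaps the cumcount index for a stable sort on the key column, so the df1 row stays ahead of the df2 rows within each key. It again assumes unnamed columns 0 and 1 and the sample data from the question.

```python
import pandas as pd

# Sample data from the question; columns default to 0 and 1
df1 = pd.DataFrame([["A", 1], ["B", 2], ["C", 3], ["D", 4]])
df2 = pd.DataFrame([["A", 1], ["A", 3], ["B", 4], ["B", 3],
                    ["B", 7], ["C", 4], ["D", 6], ["D", 8]])

# A stable sort groups rows by key while preserving their concat order,
# so each key's df1 value comes first, followed by its df2 values
out = (pd.concat([df1, df2])
         .sort_values(0, kind="stable")
         .reset_index(drop=True))

# Blank out repeated keys so each key is shown only on its first row
out[0] = out[0].mask(out[0].duplicated(), "")
print(out)
```

Note that the blanked cells are empty strings, so this is a display format; keep the unblanked frame around if you still need to group or filter by key.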
You can do it like this:
import pandas as pd
data1 = [['A', 1],
         ['B', 2],
         ['C', 3],
         ['D', 4]]
data2 = [['A', 1],
         ['A', 3],
         ['B', 4],
         ['B', 3],
         ['B', 7],
         ['C', 4],
         ['D', 6],
         ['D', 8]]
df1 = pd.DataFrame(data=data1, columns=['c1', 'c2'])
df2 = pd.DataFrame(data=data2, columns=['c1', 'c2'])
result = pd.concat([df1, df2]).groupby('c1')['c2'].apply(list)
print(result)
Output:
c1
A       [1, 1, 3]
B    [2, 4, 3, 7]
C          [3, 4]
D       [4, 6, 8]
Name: c2, dtype: object
Or without column names:
df1 = pd.DataFrame(data=data1)
df2 = pd.DataFrame(data=data2)
result = pd.concat([df1, df2]).groupby(0)[1].apply(list)
print(result)
Output:
0
A       [1, 1, 3]
B    [2, 4, 3, 7]
C          [3, 4]
D       [4, 6, 8]
Name: 1, dtype: object
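If the goal is the three-column layout from the question (key, value from DataFrame1, values from DataFrame2), a plain merge on the key followed by blanking the repeats may get closer. This is only a sketch under my reading of the expected output. It reuses the c1/c2 column names from above, and the suffixes `_left` and `_right` are my own choice.

```python
import pandas as pd

data1 = [['A', 1], ['B', 2], ['C', 3], ['D', 4]]
data2 = [['A', 1], ['A', 3], ['B', 4], ['B', 3],
         ['B', 7], ['C', 4], ['D', 6], ['D', 8]]
df1 = pd.DataFrame(data=data1, columns=['c1', 'c2'])
df2 = pd.DataFrame(data=data2, columns=['c1', 'c2'])

# One-to-many merge: each df1 row is repeated once per matching df2 row
merged = df1.merge(df2, on='c1', suffixes=('_left', '_right'))

# True for every row after the first within a key
repeated = merged['c1'].duplicated()

# Blank the repeated key and the repeated df1 value; the cast to object
# lets the column hold both integers and empty strings
merged['c2_left'] = merged['c2_left'].astype(object).mask(repeated, '')
merged['c1'] = merged['c1'].mask(repeated, '')
print(merged.to_string(index=False))
```

As with any blanked table, this is for display only. A key that exists in only one of the frames is dropped by the default inner merge, so pass `how='left'` or `how='outer'` if that matters for your data.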