Pandas: Merging two DataFrames, adding columns and deleting duplicate rows
I have two DataFrames, say, the material inventory reports for January and February:
January report
code  description    qty_jan  amount_jan
WP1   Wooden Part-1     1000       50000
MP1   Metal Part-1       500        5000
GL1   Glass-1            100        2500
February report
code  description    qty_feb  amount_feb
WP1   Wooden Part-1     1200       60000
MP2   Metal Part-2       300        3000
GL1   Glass-1             50        1250
GL2   Glass-2            200        5000
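To experiment with the answers, the two reports can be rebuilt as DataFrames. This is a minimal sketch that assumes the names df1 (January) and df2 (February), which is what the answers use; the values are copied from the tables.

```python
import pandas as pd

# January report: one row per material code
df1 = pd.DataFrame({
    'code': ['WP1', 'MP1', 'GL1'],
    'description': ['Wooden Part-1', 'Metal Part-1', 'Glass-1'],
    'qty_jan': [1000, 500, 100],
    'amount_jan': [50000, 5000, 2500],
})

# February report: MP1 is absent, MP2 and GL2 are new
df2 = pd.DataFrame({
    'code': ['WP1', 'MP2', 'GL1', 'GL2'],
    'description': ['Wooden Part-1', 'Metal Part-2', 'Glass-1', 'Glass-2'],
    'qty_feb': [1200, 300, 50, 200],
    'amount_feb': [60000, 3000, 1250, 5000],
})

print(df1)
print(df2)
```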
To monitor the stock-taking progress of each material, I want to merge the two reports as follows:
code  description    qty_jan  amount_jan  qty_feb  amount_feb
WP1   Wooden Part-1     1000       50000     1200       60000
MP1   Metal Part-1       500        5000        0           0
MP2   Metal Part-2         0           0      300        3000
GL1   Glass-1            100        2500       50        1250
GL2   Glass-2              0           0      200        5000
Note: a material that is not listed in a report is treated as having zero stock.
How can I merge these two reports?
You can use an outer join in DataFrame.merge and then replace the missing values with 0:
# outer join keeps every material that appears in either report; gaps become 0
df = df1.merge(df2, on=['code', 'description'], how='outer').fillna(0)
print(df)
   code    description  qty_jan  amount_jan  qty_feb  amount_feb
0   WP1  Wooden Part-1   1000.0     50000.0   1200.0     60000.0
1   MP1   Metal Part-1    500.0      5000.0      0.0         0.0
2   GL1        Glass-1    100.0      2500.0     50.0      1250.0
3   MP2   Metal Part-2      0.0         0.0    300.0      3000.0
4   GL2        Glass-2      0.0         0.0    200.0      5000.0
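The value columns come out as floats because the join fills unmatched cells with NaN, which forces a float dtype. If you want integers as in the desired table, a minimal sketch is to fill and cast only the value columns. It assumes the sample data from the question (rebuilt inline so the snippet runs on its own); value_cols is a name introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'code': ['WP1', 'MP1', 'GL1'],
    'description': ['Wooden Part-1', 'Metal Part-1', 'Glass-1'],
    'qty_jan': [1000, 500, 100],
    'amount_jan': [50000, 5000, 2500],
})
df2 = pd.DataFrame({
    'code': ['WP1', 'MP2', 'GL1', 'GL2'],
    'description': ['Wooden Part-1', 'Metal Part-2', 'Glass-1', 'Glass-2'],
    'qty_feb': [1200, 300, 50, 200],
    'amount_feb': [60000, 3000, 1250, 5000],
})

# Outer join on both key columns keeps materials listed in either month
out = df1.merge(df2, on=['code', 'description'], how='outer')

# A missing row means zero stock; once the NaNs are filled, cast back to int
value_cols = ['qty_jan', 'amount_jan', 'qty_feb', 'amount_feb']
out[value_cols] = out[value_cols].fillna(0).astype(int)
print(out)
```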
Another idea, using concat:
# index both reports by the key columns and align them side by side (outer join by default)
df = pd.concat([df1.set_index(['code', 'description']),
                df2.set_index(['code', 'description'])], axis=1).fillna(0).reset_index()
print(df)
  code    description  qty_jan  amount_jan  qty_feb  amount_feb
0  GL1        Glass-1    100.0      2500.0     50.0      1250.0
1  GL2        Glass-2      0.0         0.0    200.0      5000.0
2  MP1   Metal Part-1    500.0      5000.0      0.0         0.0
3  MP2   Metal Part-2      0.0         0.0    300.0      3000.0
4  WP1  Wooden Part-1   1000.0     50000.0   1200.0     60000.0
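The merge and concat versions print their rows in different orders, but they should hold the same data. A quick sketch that checks this, assuming each (code, description) pair appears at most once per report (concat along axis=1 needs unique index values to align the frames; with duplicate keys, merge would instead produce one row per matching pair). The names keys, via_merge and via_concat are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'code': ['WP1', 'MP1', 'GL1'],
    'description': ['Wooden Part-1', 'Metal Part-1', 'Glass-1'],
    'qty_jan': [1000, 500, 100],
    'amount_jan': [50000, 5000, 2500],
})
df2 = pd.DataFrame({
    'code': ['WP1', 'MP2', 'GL1', 'GL2'],
    'description': ['Wooden Part-1', 'Metal Part-2', 'Glass-1', 'Glass-2'],
    'qty_feb': [1200, 300, 50, 200],
    'amount_feb': [60000, 3000, 1250, 5000],
})
keys = ['code', 'description']

via_merge = df1.merge(df2, on=keys, how='outer').fillna(0)
via_concat = (pd.concat([df1.set_index(keys), df2.set_index(keys)], axis=1)
              .fillna(0)
              .reset_index())

# Row order may differ between the two approaches, so sort on the key first
a = via_merge.sort_values('code').reset_index(drop=True)
b = via_concat.sort_values('code').reset_index(drop=True)
pd.testing.assert_frame_equal(a, b)  # raises AssertionError if they differ
print(b)
```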