Python: How to look up values in one dataframe and multiply them into another dataframe based on matching columns
I have a dataframe df1 with the following structure:
A B C D
10 9 9 4
5 4 4 9
5 10 6 4
9 9 9 4
4 7 10 7
9 7 4 8
5 7 8 9
10 4 10 6
I have another dataframe df2 as follows:
name factor
A 2
B 3
C 4
D 8
How can I look up the factors for A, B, C and D in df2, multiply the matching columns of df1 [A, B, C, D] by them, and get df3 like this:
A B C D
20 27 36 32
10 12 16 72
10 30 24 32
18 27 36 32
8 21 40 56
18 21 16 64
10 21 32 72
20 12 40 48
Use mul to multiply by a Series created with set_index:
df3 = df1.mul(df2.set_index('name')['factor'])
print (df3)
A B C D
0 20 27 36 32
1 10 12 16 72
2 10 30 24 32
3 18 27 36 32
4 8 21 40 56
5 18 21 16 64
6 10 21 32 72
7 20 12 40 48
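For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frames are rebuilt by hand from the tables in the question, and the variable name `factors` is only illustrative.

```python
import pandas as pd

# df1 and df2 rebuilt from the tables shown in the question
df1 = pd.DataFrame({
    "A": [10, 5, 5, 9, 4, 9, 5, 10],
    "B": [9, 4, 10, 9, 7, 7, 7, 4],
    "C": [9, 4, 6, 9, 10, 4, 8, 10],
    "D": [4, 9, 4, 4, 7, 8, 9, 6],
})
df2 = pd.DataFrame({"name": ["A", "B", "C", "D"], "factor": [2, 3, 4, 8]})

# Turn df2 into a Series indexed by column name, then multiply column-wise
factors = df2.set_index("name")["factor"]
df3 = df1.mul(factors)
print(df3)
```

Because the Series index matches the column labels of df1, each column is multiplied by its own factor.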
Details:
print (df2.set_index('name')['factor'])
name
A 2
B 3
C 4
D 8
Name: factor, dtype: int64
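The multiplication matches on labels, not on position: mul aligns the Series index with the DataFrame columns. A small sketch of that, using a shortened two-row df1 and a deliberately shuffled Series (both made up for illustration):

```python
import pandas as pd

# Two rows of df1 are enough to show the alignment
df1 = pd.DataFrame({"A": [10, 5], "B": [9, 4], "C": [9, 4], "D": [4, 9]})

# Same factors as df2, but listed in a different order
shuffled = pd.Series({"D": 8, "B": 3, "A": 2, "C": 4}, name="factor")

# Each column is still multiplied by the factor carrying the same label
result = df1.mul(shuffled)
print(result[["A", "B", "C", "D"]])
```

So the row order of df2 should not matter, as long as the names in its name column match the column labels of df1.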
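EDIT:
If some categories are missing from df2, mul returns NaN for those columns. You can use fillna to restore the original df1 values there: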
s = df2.set_index('name').drop('D')['factor']
print (s)
name
A 2
B 3
C 4
Name: factor, dtype: int64
df3 = df1.mul(s).fillna(df1)
print (df3)
A B C D
0 20.0 27.0 36.0 4.0
1 10.0 12.0 16.0 9.0
2 10.0 30.0 24.0 4.0
3 18.0 27.0 36.0 4.0
4 8.0 21.0 40.0 7.0
5 18.0 21.0 16.0 8.0
6 10.0 21.0 32.0 9.0
7 20.0 12.0 40.0 6.0
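As the output shows, the fillna route converts the result to floats. One possible alternative (not part of the original answer) is to reindex the Series to the columns of df1 with a neutral factor of 1, which should keep integer dtypes. A sketch with a shortened df1; the name `full` is hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [10, 5], "B": [9, 4], "C": [9, 4], "D": [4, 9]})
df2 = pd.DataFrame({"name": ["A", "B", "C", "D"], "factor": [2, 3, 4, 8]})

# Simulate a missing category by dropping D, as in the answer
s = df2.set_index("name")["factor"].drop("D")

# Give every df1 column a factor; missing ones get 1 (leave values unchanged)
full = s.reindex(df1.columns, fill_value=1)
df3 = df1.mul(full)
print(df3)
```

This avoids the intermediate NaN values, at the cost of silently treating every unknown column as unscaled.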
EDIT 1:
If you need to combine the DataFrame with the Series in a custom calculation:
import numpy as np  # needed for np.log below

s = df2.set_index('name')['factor']
print (s)
name
A 2
B 3
C 4
D 8
Name: factor, dtype: int64
df1['A'] = (np.log(df1['A']) * s['A']) ** 3
print (df1)
A B C D
0 97.664572 9 9 4
1 33.351293 4 4 9
2 33.351293 10 6 4
3 84.862013 9 9 4
4 21.313578 7 10 7
5 84.862013 7 4 8
6 33.351293 7 8 9
7 97.664572 4 10 6
For all columns (applied to the original df1, not to the df1 modified above):
df1 = (np.log(df1) * s) ** 3
print (df1)
A B C D
0 97.664572 286.409295 678.896108 1364.068975
1 33.351293 71.933325 170.508622 5431.168861
2 33.351293 329.617932 368.145163 1364.068975
3 84.862013 286.409295 678.896108 1364.068975
4 21.313578 198.944581 781.316579 3772.578718
5 84.862013 198.944581 170.508622 4603.732789
6 33.351293 198.944581 575.466599 5431.168861
7 97.664572 71.933325 781.316579 2945.161306
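Note that both snippets above assign back to df1, so running them one after the other would apply the formula to column A twice. A sketch that writes the all-columns result to a new variable instead (`out` is an illustrative name, and df1 is shortened to two rows):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [10, 5], "B": [9, 4], "C": [9, 4], "D": [4, 9]})
s = pd.Series({"A": 2, "B": 3, "C": 4, "D": 8}, name="factor")

# Natural log of every cell, scaled per column by s, then cubed;
# df1 itself is left untouched
out = (np.log(df1) * s) ** 3
print(out)
```

The multiplication by s aligns on column labels in the same way as mul.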