Relative weight within row with max value in python DataFrame

Question

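I am trying to calculate the relative weights of df1 within each row, capped at a maximum of 0.5. So far I can compute the relative weights (df2), but without the cap. Here is a simple example: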

import pandas as pd
df1 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'ID1':[0,0,2,1,1], 
    'ID2':[1,3,1,1,2],
    'ID3':[1,0,0,1,0],
    'ID4':[1,1,7,1,0],
    'ID5':[0,6,0,0,1]})

df1:
    Dates       ID1 ID2 ID3 ID4 ID5
0   2021-01-01  0   1   1   1   0
1   2021-01-02  0   3   0   1   6
2   2021-01-03  2   1   0   7   0
3   2021-01-04  1   1   1   1   0
4   2021-01-05  1   2   0   0   1

df1 = df1.set_index('Dates').T
df2 = df1.transform(lambda x: x/sum(x)).T
df2.round(2)

df2:            
Dates       ID1     ID2     ID3     ID4     ID5         
2021-01-01  0.00    0.33    0.33    0.33    0.00
2021-01-02  0.00    0.30    0.00    0.10    0.60
2021-01-03  0.20    0.10    0.00    0.70    0.00
2021-01-04  0.25    0.25    0.25    0.25    0.00
2021-01-05  0.25    0.50    0.00    0.00    0.25

I am trying to get df3, where each relative weight is capped at a maximum of 0.5:

df3:            
Dates       ID1     ID2     ID3     ID4     ID5         
2021-01-01  0.00    0.33    0.33    0.33    0.00
2021-01-02  0.00    0.30    0.00    0.10    0.50
2021-01-03  0.20    0.10    0.00    0.50    0.00
2021-01-04  0.25    0.25    0.25    0.25    0.00
2021-01-05  0.25    0.50    0.00    0.00    0.25

When I use the following modified function, I get an error: Transform function failed

df1.transform(lambda x: x/sum(x) if x/sum(x) < 0.5 else 0.5).T

Many thanks!

Answer 1

You can use apply(..., axis=1) and then clip the values at an upper bound of 0.5 (this assumes Dates is always the first column; alternatively, we could set it as the index):

df1[df1.columns[1:]] = df1[df1.columns[1:]].apply(lambda x:x/sum(x), axis=1).clip(upper=0.5)
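The question's attempt fails because inside transform the lambda receives a whole column (a Series), so x/sum(x) < 0.5 is a boolean Series rather than a single True/False, and if cannot evaluate it. pandas raises a ValueError about an ambiguous truth value, which the pandas version used in the question wraps as "Transform function failed". A minimal sketch, assuming you want to keep the question's transpose/transform layout, that caps element-wise with Series.clip instead of if/else (the name wide is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Dates': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'ID1': [0, 0, 2, 1, 1],
    'ID2': [1, 3, 1, 1, 2],
    'ID3': [1, 0, 0, 1, 0],
    'ID4': [1, 1, 7, 1, 0],
    'ID5': [0, 6, 0, 0, 1]})

# Same layout as in the question: one column per date, one row per ID
# ('wide' is an illustrative name so df1 itself is not overwritten)
wide = df1.set_index('Dates').T

# x is a whole column here, so cap it element-wise with clip instead of using `if`
df3 = wide.transform(lambda x: (x / x.sum()).clip(upper=0.5)).T
print(df3.round(2))
```

The final .T turns the result back into one row per date, matching the df3 shown in the question.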

Answer 2

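for col in df1.columns:  # df1 here is the transposed frame from the question (one column per date)
    total = sum(df1[col])  # compute the column total once instead of twice per element
    df1[col] = df1[col].apply(lambda x: x/total if x/total < 0.5 else 0.5)
df3 = df1.T  # back to one row per date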

Have fun!
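Because the loop divides each column by its own total, it only yields row-wise weights when df1 is the transposed frame built in the question (one column per date). A minimal sketch, assuming that layout, that checks the loop against a vectorized DataFrame.div plus clip; the names wide, looped and vectorized are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Dates': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'ID1': [0, 0, 2, 1, 1],
    'ID2': [1, 3, 1, 1, 2],
    'ID3': [1, 0, 0, 1, 0],
    'ID4': [1, 1, 7, 1, 0],
    'ID5': [0, 6, 0, 0, 1]})

# Illustrative name: the question's transposed layout, one column per date
wide = df1.set_index('Dates').T

# Loop version from the answer, with the column total computed once
looped = wide.copy()
for col in looped.columns:
    total = looped[col].sum()
    looped[col] = looped[col].apply(lambda x: x / total if x / total < 0.5 else 0.5)

# Vectorized equivalent: divide every column by its own sum, then cap at 0.5
vectorized = wide.div(wide.sum(axis=0), axis=1).clip(upper=0.5)

# Raises AssertionError if the two approaches disagree
pd.testing.assert_frame_equal(looped, vectorized)

df3 = vectorized.T  # one row per date again
print(df3.round(2))
```

The vectorized form avoids calling a Python lambda once per element, which matters more as the frame grows.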

Answer 3

Instead of transposing and applying a function column by column, we can normalize the rows directly:

df3 = df1.copy().set_index('Dates')
df3 = df3.div(df3.sum(axis=1), axis=0).clip(upper=0.5).round(2).reset_index()

Output:

        Dates   ID1   ID2   ID3   ID4   ID5
0  2021-01-01  0.00  0.33  0.33  0.33  0.00
1  2021-01-02  0.00  0.30  0.00  0.10  0.50
2  2021-01-03  0.20  0.10  0.00  0.50  0.00
3  2021-01-04  0.25  0.25  0.25  0.25  0.00
4  2021-01-05  0.25  0.50  0.00  0.00  0.25

Does this work for you?
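Note that capping changes the row totals: in the output above, 2021-01-02 sums to 0.9 and 2021-01-03 to 0.8, because the excess above 0.5 is dropped rather than redistributed. A small sketch that wraps the steps of this answer in a reusable function and checks the row sums; capped_row_weights, cap and index_col are hypothetical names, not part of pandas:

```python
import pandas as pd

def capped_row_weights(df: pd.DataFrame, cap: float = 0.5, index_col: str = 'Dates') -> pd.DataFrame:
    """Hypothetical helper: row-wise relative weights, each capped at `cap`."""
    values = df.set_index(index_col)
    weights = values.div(values.sum(axis=1), axis=0)  # each row now sums to 1
    return weights.clip(upper=cap)                    # capping can lower some row sums

df1 = pd.DataFrame({
    'Dates': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'ID1': [0, 0, 2, 1, 1],
    'ID2': [1, 3, 1, 1, 2],
    'ID3': [1, 0, 0, 1, 0],
    'ID4': [1, 1, 7, 1, 0],
    'ID5': [0, 6, 0, 0, 1]})

df3 = capped_row_weights(df1)
print(df3.round(2))
print(df3.sum(axis=1).round(2))  # capped rows now sum to less than 1
```

Rounding is left to the caller here, so the row sums are computed on the unrounded weights.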
