Minimum of values in pandas calculation
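I want to convert a piece of Python code I wrote earlier to pandas, so that the work can be done directly in a dataframe instead of messing around with csv files.
I want to calculate the health of a device based on multiple values (attributes).
Suppose I have the following df: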
     A    B  C
0    7  NaN  8
1    3    3  5
2    8    1  7
3  NaN    0  3
4    8    2  7
I want to calculate the health as follows.
Note that def attributeHealth is still in its old form and has not been converted to pandas, since that is the part where I get stuck. This is the code that worked with the csv library:
df['Health'] = attributeHealth(df['A'], 10, 0.4) * attributeHealth(df['B'], 5, 0.5) * attributeHealth(df['C'], 2, 0.8) * 100

def attributeHealth(name, weight, limit):
    if row[name] != 'NULL':
        attrHealth = 1 - min(int(row[name])*weight/100, limit)
    else:
        attrHealth = 1
    return attrHealth
I tried reducing it to a single attribute first, but it seems I cannot use min() this way:
inputDF['health'] = 1 - min(inputDF['A']* 2/100, 0.7)
Thanks in advance!
You can use DataFrame.apply for this:
inputDF['health'] = inputDF.apply(lambda row: 1 - min(row['A'] * 2/100, 0.7),
                                  axis=1)
With axis=1, apply calls the given callable (here a lambda) once per row and returns the results as a Series.
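One caveat with the row-wise version: Python's built-in min returns its first argument when the comparison against NaN is false, so rows where A is missing come out as NaN rather than 1. The sketch below handles that case explicitly. It assumes the sample frame from the question (rebuilt here as inputDF), and the helper name row_health is hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt so the snippet runs on its own.
inputDF = pd.DataFrame({
    "A": [7, 3, 8, np.nan, 8],
    "B": [np.nan, 3, 1, 0, 2],
    "C": [8, 5, 7, 3, 7],
})

def row_health(row, name="A", weight=2, limit=0.7):
    """Hypothetical helper: health of one attribute for a single row."""
    value = row[name]
    if pd.isna(value):
        # A missing attribute does not reduce the health.
        return 1.0
    return 1 - min(value * weight / 100, limit)

# axis=1 passes each row to row_health as a Series.
health_apply = inputDF.apply(row_health, axis=1)
print(health_apply)
```

This keeps the shape of the original csv-based function, at the cost of one Python-level call per row.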
You can use numpy.minimum and then fill in the missing values with reindex:
import numpy as np

inputDF['health'] = ((1 - np.minimum(inputDF['A'].dropna() * 2/100, 0.7))
                     .reindex(inputDF.index, fill_value=1))
A similar solution:
inputDF['health'] = 1 - np.minimum(inputDF['A'].dropna() * 2/100, 0.7)
inputDF['health'] = inputDF['health'].fillna(1)
print(inputDF)
     A    B  C  health
0  7.0  NaN  8    0.86
1  3.0  3.0  5    0.94
2  8.0  1.0  7    0.84
3  NaN  0.0  3    1.00
4  8.0  2.0  7    0.84
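For context on why the attempt in the question fails: built-in min compares the whole Series against 0.7, which produces a boolean Series, and pandas refuses to reduce that to a single True or False, so a ValueError is raised. np.minimum compares element by element instead. The sketch below shows both behaviours on column A alone. Series.clip(upper=...) is swapped in here as another elementwise option and is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Column A from the question.
a = pd.Series([7, 3, 8, np.nan, 8], name="A")
scaled = a * 2 / 100

# Built-in min needs one truth value for "0.7 < scaled", which a Series cannot give.
try:
    min(scaled, 0.7)
    builtin_min_failed = False
except ValueError:
    builtin_min_failed = True

# Elementwise alternatives; both leave NaN where A is missing.
via_minimum = 1 - np.minimum(scaled, 0.7)
via_clip = 1 - scaled.clip(upper=0.7)

# Missing attributes count as full health.
health = via_clip.fillna(1)
print(builtin_min_failed)
print(health)
```

Because NaN passes through both np.minimum and clip, the dropna step is optional when fillna(1) follows.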
Putting it all together:
def attributeHealth(col, weight, limit):
    # return a Series (column)
    return ((1 - np.minimum(col.dropna() * weight/100, limit))
            .reindex(col.index, fill_value=1))

a = attributeHealth(inputDF['A'], 10, 0.4)
b = attributeHealth(inputDF['B'], 5, 0.5)
c = attributeHealth(inputDF['C'], 2, 0.8)
inputDF['Health'] = (a * b * c) * 100
print(inputDF)
     A    B  C  Health
0  7.0  NaN  8   50.40
1  3.0  3.0  5   53.55
2  8.0  1.0  7   49.02
3  NaN  0.0  3   94.00
4  8.0  2.0  7   46.44
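To try this end to end, the sample frame has to be built first. The following is a minimal sketch under a few assumptions: the data is copied from the question, the snake_case name attribute_health and the params dictionary are illustrative choices, and fillna(1) is used in place of the dropna/reindex pair. It is a variant of the answer, not its exact code.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
inputDF = pd.DataFrame({
    "A": [7, 3, 8, np.nan, 8],
    "B": [np.nan, 3, 1, 0, 2],
    "C": [8, 5, 7, 3, 7],
})

# (weight, limit) per attribute, using the values from the question.
params = {"A": (10, 0.4), "B": (5, 0.5), "C": (2, 0.8)}

def attribute_health(col, weight, limit):
    """Health factor of one column; missing values count as 1."""
    return (1 - np.minimum(col * weight / 100, limit)).fillna(1)

# One column of factors per attribute, multiplied across each row.
factors = pd.DataFrame({
    name: attribute_health(inputDF[name], weight, limit)
    for name, (weight, limit) in params.items()
})
inputDF["Health"] = factors.prod(axis=1) * 100
print(inputDF)
```

Keeping the weights and limits in one dictionary means a new attribute only needs a new entry, not another multiplication term.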