Selecting a column's value based on the value in another column, then applying it to each row of a pandas DataFrame
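I want to compute the value of S using the formula S = (25400/CN) − 254.
The CN value I have to pick depends on the amc_active condition, which is 1, 2 or 3.
If amc_active at index 0 (the 1st row) is 1, I have to take the CN value from column cn1, i.e. 47.
If amc_active is 3, I have to take the CN value from column cn3, e.g. 95 in the 4th row, and so on.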
       cn1  cn2  cn3  amc_active
0       47   56   78           1
1       55   61   87           2
2       36   67   73           1
3       42   84   95           3
...    ...  ...  ...         ...
17410   42   84   96           3
17411   48   81   85           1
17412   55   59   82           1
17413   57   86   93           2
17414   36   87   91           2
For this, I am using if/elif conditions:
if (df_col_all_merged['amc_active'] == 1):
    cn_for_s = df_col_all_merged['cn1']
elif (df_col_all_merged['amc_active'] == 2):
    cn_for_s = df_col_all_merged['cn2']
elif (df_col_all_merged['amc_active'] == 3):
    cn_for_s = df_col_all_merged['cn3']
but I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-43-40c3b3817474> in <module>
----> 1 if (df_col_all_merged['amc_active'] == 1):
2 cn_for_s = df_col_all_merged['cn1']
3 elif (df_col_all_merged['amc_active'] == 2):
4 cn_for_s = df_col_all_merged['cn2']
5 elif (df_col_all_merged['amc_active'] == 3):
~\Anaconda3\envs\geocube\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1327
1328 def __nonzero__(self):
-> 1329 raise ValueError(
1330 f"The truth value of a {type(self).__name__} is ambiguous. "
1331 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
And with this:
def select_cn(df_col_all_merged):
    result = cn_for_s
    if (df_col_all_merged['amc_active'] == 1):
        cn_for_s = df_col_all_merged['cn1']
    elif (df_col_all_merged['amc_active'] == 2):
        cn_for_s = df_col_all_merged['cn2']
    elif (df_col_all_merged['amc_active'] == 3):
        cn_for_s = df_col_all_merged['cn3']
    return result
df_col_all_merged['s_mm'] = (25400/select_cn(df_col_all_merged)) - 254
the error is:
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
<ipython-input-54-df43eddeac39> in <module>
----> 1 df_col_all_merged['s_mm'] = (num_const/select_cn(df_col_all_merged)) - dev_const
<ipython-input-51-7405a6dd24db> in select_cn(df_col_all_merged)
1 def select_cn(df_col_all_merged):
----> 2 result = cn_for_s
3 if (df_col_all_merged['amc_active'] == 1):
4 cn_for_s = df_col_all_merged['cn1']
5 elif (df_col_all_merged['amc_active'] == 2):
UnboundLocalError: local variable 'cn_for_s' referenced before assignment
How can I fix this?
You can use NumPy's fancy indexing:
import numpy as np
# get the values of the `cn*` columns as a 2D array
cn_123 = df_col_all_merged[["cn1", "cn2", "cn3"]].to_numpy()
# index into it as "(row_index, amc_active_value - 1)"
rows = np.arange(len(df_col_all_merged))
cols = df_col_all_merged["amc_active"].to_numpy() - 1
cn = cn_123[rows, cols]
# apply the formula
df_col_all_merged["s_mm"] = (25400 / cn) - 254
Each row is indexed by its amc_active value (minus 1, because NumPy indices start at 0), which gives:
       cn1  cn2  cn3  amc_active        s_mm
0       47   56   78           1  286.425532
1       55   61   87           2  162.393443
2       36   67   73           1  451.555556
3       42   84   95           3   13.368421
17410   42   84   96           3   10.583333
17411   48   81   85           1  275.166667
17412   55   59   82           1  207.818182
17413   57   86   93           2   41.348837
17414   36   87   91           2   37.954023
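One pitfall with this kind of indexing is that a stray amc_active of 0 becomes index -1 and silently selects cn3. Below is a minimal, self-contained sketch that adds a guard against that. The helper name `select_cn_fancy` and the four-row `df_sample` (copied from the top of the question's table) are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def select_cn_fancy(df: pd.DataFrame) -> np.ndarray:
    """Pick cn1/cn2/cn3 per row according to amc_active (hypothetical helper)."""
    amc = df["amc_active"].to_numpy()
    # Guard: without this, a 0 would turn into index -1 and pick cn3 silently.
    if not np.isin(amc, [1, 2, 3]).all():
        raise ValueError("amc_active must contain only 1, 2 or 3")
    cn_123 = df[["cn1", "cn2", "cn3"]].to_numpy()
    # Pair each row number with its 0-based column position.
    return cn_123[np.arange(len(df)), amc - 1]


# Hypothetical sample: the first four rows shown in the question.
df_sample = pd.DataFrame({
    "cn1": [47, 55, 36, 42],
    "cn2": [56, 61, 67, 84],
    "cn3": [78, 87, 73, 95],
    "amc_active": [1, 2, 1, 3],
})

cn = select_cn_fancy(df_sample)
df_sample["s_mm"] = 25400 / cn - 254
print(df_sample)
```

Whether raising is the right reaction to unexpected values depends on the data; masking those rows with NaN would be another reasonable choice.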
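Alternatively, np.select can stand in for the multiple if-elif branches:
import numpy as np
df = df_col_all_merged  # shorthand for the same DataFrame
# form the conditions and the corresponding choices
conditions = [df.amc_active.eq(1), df.amc_active.eq(2), df.amc_active.eq(3)]
choices = [df.cn1, df.cn2, df.cn3]
# per row, take the choice whose condition is True
cn = np.select(conditions, choices)
# apply the formula
df_col_all_merged["s_mm"] = (25400 / cn) - 254
But given the convenient structure of df.amc_active, whose values map directly onto column positions, this might not be the best option. (You could also omit the last condition and choice and pass that choice as the default of np.select, i.e. as the "else".)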
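The "else" variant mentioned above might look like the following minimal sketch. The four-row `df_sample` and the name `cn_select` are assumptions for illustration. Note that with this form every row whose amc_active is neither 1 nor 2 falls through to cn3, including unexpected values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first four rows shown in the question.
df_sample = pd.DataFrame({
    "cn1": [47, 55, 36, 42],
    "cn2": [56, 61, 67, 84],
    "cn3": [78, 87, 73, 95],
    "amc_active": [1, 2, 1, 3],
})

# Only two explicit conditions; cn3 plays the role of the "else" branch.
conditions = [
    df_sample["amc_active"].eq(1).to_numpy(),
    df_sample["amc_active"].eq(2).to_numpy(),
]
choices = [df_sample["cn1"].to_numpy(), df_sample["cn2"].to_numpy()]
cn_select = np.select(conditions, choices, default=df_sample["cn3"].to_numpy())

df_sample["s_mm"] = 25400 / cn_select - 254
print(df_sample)
```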
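Use DataFrame.melt as an alternative way to do the lookup:
# reshape to long format, keeping the original row index
df1 = df.melt('amc_active', ignore_index=False)
# keep only the rows whose column name ('variable') equals 'cn' + amc_active
CN = df1.loc['cn' + df1['amc_active'].astype(str) == df1['variable'], 'value']
# the assignment aligns CN with df on the index
df['new'] = (25400 / CN) - 254
print(df)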
       cn1  cn2  cn3  amc_active         new
0       47   56   78           1  286.425532
1       55   61   87           2  162.393443
2       36   67   73           1  451.555556
3       42   84   95           3   13.368421
17410   42   84   96           3   10.583333
17411   48   81   85           1  275.166667
17412   55   59   82           1  207.818182
17413   57   86   93           2   41.348837
17414   36   87   91           2   37.954023
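As for the two errors in the question: `df_col_all_merged['amc_active'] == 1` compares the whole column at once and yields a boolean Series, which `if` cannot reduce to a single True or False, hence the ValueError. The UnboundLocalError arises because `cn_for_s` is assigned inside `select_cn`, which makes it a local name, yet it is read on the function's first line before any assignment. If the if/elif style is preferred, a row-wise version could be sketched as below. The names `select_cn_row` and `df_sample` are assumptions for illustration, and this is likely to be noticeably slower than the vectorized approaches on a frame with thousands of rows.

```python
import pandas as pd

# Hypothetical sample: the first four rows shown in the question.
df_sample = pd.DataFrame({
    "cn1": [47, 55, 36, 42],
    "cn2": [56, 61, 67, 84],
    "cn3": [78, 87, 73, 95],
    "amc_active": [1, 2, 1, 3],
})


def select_cn_row(row):
    """Return the CN value for a single row; amc_active is a scalar here."""
    if row["amc_active"] == 1:
        return row["cn1"]
    elif row["amc_active"] == 2:
        return row["cn2"]
    elif row["amc_active"] == 3:
        return row["cn3"]
    raise ValueError(f"unexpected amc_active: {row['amc_active']}")


# axis=1 calls the function once per row, so each comparison sees one value.
cn_rowwise = df_sample.apply(select_cn_row, axis=1)
df_sample["s_mm"] = 25400 / cn_rowwise - 254
print(df_sample)
```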