有条件的 if then else 在数据框上

Conditional if then else on a dataframe

对于下面的 df,我想应用以下规则,如果 SPY 或 SCZ 的值为正,则将具有最高值的列的标签添加到新列中,否则如果两者都不是是正数,然后将列 'ACTIVE' 的值设置为最高值的列标签,TLT 或 TIP。

# pseudo code rules
if df.SPY > 0 and df.SCZ > 0:
   if df.SPY > df.SCZ:
      df['ACTIVE'] = 'SPY'
   else:
      df['ACTIVE'] = 'SCZ'
else:
   if df.TIP > df.TLT:
      df['ACTIVE'] = 'TIP'
   else:
      df['ACTIVE'] = 'TLT'


#sample data
df3 = pd.DataFrame({'TLT':[0.077,0.064,0.034,0.028], 
                    'TIP':[0.021,0.014,0.005,0.005], 
                    'SPY':[0.055,-0.063,0.09,0.094],
                    'SCZ':[0.062,0.060,0.068,0.069]})

# condition is only referencing the last row, need it to apply to each row
if (df.SPY[-1:] < 0).bool() & (df.SCZ[-1:] < 0).bool():    
    df['ACTIVE'] = df.iloc[:,0:2].apply(pd.Series.idxmax, axis=1)
else:
    # this will set the last column to the highest performing asset
    df['ACTIVE'] = df.iloc[:,2:4].apply(pd.Series.idxmax, axis=1) 

您可以使用 pandas.DataFrame.apply 方法,该方法将在 DataFrame 的每一行上应用提供的函数。

def active(row):
  if row['SPY'] > 0 and row['SCZ'] > 0:
    if row['SPY'] > row['SCZ']:
      return 'SPY'
    return 'SCZ'
  
  if row['TIP'] > row['TLT']:
    return 'TIP'

  return 'TLT'

df3['ACTIVE'] = df3.apply(active, axis = 1)
     TLT    TIP     SPY     SCZ
0   0.077   0.021   0.055   0.062
1   0.064   0.014   -0.063  0.060
2   0.034   0.005   0.090   0.068
3   0.028   0.005   0.094   0.069

您可以使用 np.where 并以矢量化的方式进行。

df3['Active'] = np.where((df3['SPY'] > 0) & (df3['SCZ'] > 0),
                         np.where(df3['SPY'] > df3['SCZ'], 'SPY', 'SCZ'),
                         np.where(df3['TIP'] > df3['TLT'], 'TIP', 'TLT'))
df3
     TLT    TIP     SPY     SCZ     Active
0   0.077   0.021   0.055   0.062   SCZ
1   0.064   0.014   -0.063  0.060   TLT
2   0.034   0.005   0.090   0.068   SPY
3   0.028   0.005   0.094   0.069   SPY

我会为此任务使用 Python 内置 列表理解

首先,让我们提取确定新字段到其自身函数的逻辑:

def evaluate_row(row) -> str:
    if row['SPY'] > 0 and row['SCZ'] > 0:
        if row['SPY'] > row['SCZ']:
            return 'SPY'
        else:
            return 'SCZ'
    else:
        if row['TIP'] > row['TLT']:
            return 'TIP'
        else:
            return 'TLT'

现在我们只需使用此函数通过列表理解生成一个列表,该列表将填充数据框中的新列'ACTIVE'。很简单:

df3['ACTIVE'] = [evaluate_row(r[1]) for r in df3.iterrows()]

注意:我们之所以使用r[1]是因为.iterrows()方法返回的元组。

结果是:

>>> df3
     TLT    TIP    SPY    SCZ ACTIVE
0  0.077  0.021  0.055  0.062    SCZ
1  0.064  0.014 -0.063  0.060    TLT
2  0.034  0.005  0.090  0.068    SPY
3  0.028  0.005  0.094  0.069    SPY