Vectorizing a function to use an entire dataframe column instead of a single value

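I have a function that sets a color. Currently I loop through a dataframe and pass a single value to the function, which cross-references that value to its corresponding color and returns the color. I would now like to pass an entire column of the dataframe (instead of looping through it) and return an array of color values.

Here is a simplified version of the current function that takes a single value (I just set a single value here rather than showing the whole loop over the dataframe):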

    import pandas as pd

    def set_LineQualityColor(LineQ):
        data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
                ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
                ['lightgray', 9]]
        df = pd.DataFrame(data, columns=['CR', 'LineQuality'])
        # Select the row whose LineQuality equals LineQ and take its color
        c = df[df['LineQuality'] == LineQ]['CR'].values[0]
        return c

    LQ = 4
    c = set_LineQualityColor(LQ)

How can I make this work when LineQ is a column of a dataframe? That is:

    c = set_LineQualityColor(df.LQ)

Or is there a more efficient way to do this? I'm new to Python. Thanks.

You can pass in a new dataframe (or a single column) and join the two to get the result.

>>> def set_LineQualityColor_df(LineQ):
...     data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2],['seagreen', 3],
...             ['mistyrose', 4], ['lightcoral', 4.1],['rosybrown', 5], ['indianred', 5.1],
...             ['lightgray', 9]]
...     df = pd.DataFrame(data, columns = ['CR', 'LineQuality'])
...     #c=df[df['LineQuality']==LineQ]['CR'].values[0]
...     c = df.set_index('LineQuality').join(LineQ)
...     return c
...
>>> df_lineQ = pd.DataFrame({ 'LineQuality': [4,5]})
>>> set_LineQualityColor_df(df_lineQ).head(5)
                     CR  LineQuality
LineQuality
0.0                    grey          4.0
1.0          cornflowerblue          5.0
2.0              lightgreen          NaN
3.0                seagreen          NaN
4.0               mistyrose          NaN

You can also pass a specific dataframe column.

>>> set_LineQualityColor_df(df_lineQ.LineQuality).head(5)
                         CR  LineQuality
LineQuality
0.0                    grey          4.0
1.0          cornflowerblue          5.0
2.0              lightgreen          NaN
3.0                seagreen          NaN
4.0               mistyrose          NaN
>>>
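
Note that `join` aligns on index labels, so in the output above the lookup key 0.0 is paired with the first input row (4.0) rather than the value 4 being looked up. If the goal is a value-based lookup, a left merge on the `LineQuality` column is one option. The following is only a sketch under that assumption; the names `lookup`, `merged`, and `colors` are made up for illustration, and the input is given as floats so both key columns have the same dtype.

```python
import pandas as pd

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
# Hypothetical name for the color table; LineQuality is float because of 4.1 and 5.1
lookup = pd.DataFrame(data, columns=['CR', 'LineQuality'])

# Input values as floats so the merge keys share a dtype with the lookup table
df_lineQ = pd.DataFrame({'LineQuality': [4.0, 5.0]})

# A left merge keeps one row per input row, in the input order,
# and matches on the LineQuality values rather than on the index
merged = df_lineQ.merge(lookup, on='LineQuality', how='left')
colors = merged['CR'].tolist()
print(colors)
```

Values with no match in the lookup table would come back as NaN in the `CR` column instead of raising an error.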

Set LineQuality as the index.

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2],['seagreen', 3], 
            ['mistyrose', 4], ['lightcoral', 4.1],['rosybrown', 5], ['indianred', 5.1], 
            ['lightgray', 9]]

df = pd.DataFrame(data, columns = ['CR', 'LineQuality'])
df.set_index(['LineQuality'], drop=True, inplace=True)

This gives the following dataframe:

                         CR
LineQuality                
0.0                    grey
1.0          cornflowerblue
2.0              lightgreen
3.0                seagreen
4.0               mistyrose
4.1              lightcoral
5.0               rosybrown
5.1               indianred
9.0               lightgray

Then use loc to do the lookup.

LQ_df = pd.DataFrame([1, 5, 4, 9, 4.1, 0, 4.0], columns=['LQ'])

LQ = LQ_df['LQ']

df.loc[LQ, 'CR']

This gives the following Series:

LineQuality
1.0    cornflowerblue
5.0         rosybrown
4.0         mistyrose
9.0         lightgray
4.1        lightcoral
0.0              grey
4.0         mistyrose
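
One caveat with the `loc` lookup: in recent pandas versions, passing a list-like of labels to `.loc` raises a `KeyError` if any label is missing from the index. If the column might contain values that have no color assigned, `reindex` is a possible alternative that returns NaN for those rows. This is a sketch only; the value 7 is an invented example of a quality code that is not in the table.

```python
import pandas as pd

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
df = pd.DataFrame(data, columns=['CR', 'LineQuality']).set_index('LineQuality')

# 7 is a made-up value with no entry in the lookup table
LQ = pd.Series([1, 7, 4.1], name='LQ')

# reindex looks up each label and fills missing ones with NaN instead of raising
colors = df['CR'].reindex(LQ.to_numpy())
result = colors.tolist()
print(result)
```

A default color could then be filled in with `colors.fillna('some_default')`, where the default name is up to you.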

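There is no point in creating the df dataframe on every function call, so it is better to create it once before calling the function. You can then define a function that uses df.loc as before: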

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2],['seagreen', 3], 
            ['mistyrose', 4], ['lightcoral', 4.1],['rosybrown', 5], ['indianred', 5.1], 
            ['lightgray', 9]]

lineq_color_lookup = pd.DataFrame(data, columns = ['CR', 'LineQuality'])
lineq_color_lookup.set_index(['LineQuality'], drop=True, inplace=True)

def get_LineQualityColor(LineQ):
    return lineq_color_lookup.loc[LineQ, 'CR'] # .tolist() if you want it as a list

I also renamed the function to get_LineQualityColor, since it doesn't set anything; it only returns the color corresponding to the given LineQuality.
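
For a small fixed table like this one, a plain dictionary with `Series.map` may be a simpler alternative to a lookup dataframe. This is not what the answer above uses, just a sketch of the same idea; `LINEQ_COLORS` and the `color` column name are invented for the example.

```python
import pandas as pd

# Hypothetical module-level mapping from LineQuality code to color, built once
LINEQ_COLORS = {0: 'grey', 1: 'cornflowerblue', 2: 'lightgreen', 3: 'seagreen',
                4: 'mistyrose', 4.1: 'lightcoral', 5: 'rosybrown', 5.1: 'indianred',
                9: 'lightgray'}

def get_LineQualityColor(LineQ):
    """Return a Series of colors for a Series of LineQuality values."""
    # map looks up every element in the dict; unknown values become NaN
    return LineQ.map(LINEQ_COLORS)

df = pd.DataFrame({'LQ': [1, 5, 4, 9, 4.1, 0, 4.0]})
# Store the colors next to the values they were derived from
df['color'] = get_LineQualityColor(df['LQ'])
colors = df['color'].tolist()
print(colors)
```

The result keeps the index of the original column, so it can be assigned straight back to the dataframe as shown.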