Vectorizing a function to use an entire dataframe column instead of a single value
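I have a function that sets a color. Currently, I loop through a dataframe and pass a single value to the function, which cross-references that value to its corresponding color and returns the color. I would now like to pass in an entire column of the dataframe (instead of looping through the dataframe) and get back an array of color values.
Here is a simplified version of the current function that takes a single value (I just set a single value here rather than showing the whole loop over the dataframe):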
import pandas as pd

def set_LineQualityColor(LineQ):
    data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
            ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
            ['lightgray', 9]]
    df = pd.DataFrame(data, columns=['CR', 'LineQuality'])
    # Select the row whose LineQuality equals LineQ and take its color
    c = df[df['LineQuality'] == LineQ]['CR'].values[0]
    return c

LQ = 4
c = set_LineQualityColor(LQ)
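How can I make this work when LineQ is a column in a dataframe? That is:
c = set_LineQualityColor(df.LQ)
Or is there a more efficient way to do this? I'm new to Python. Thanks.
You can pass in a new dataframe (or a single column) and join the two to get the result.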
>>> def set_LineQualityColor_df(LineQ):
...     data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
...             ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
...             ['lightgray', 9]]
...     df = pd.DataFrame(data, columns=['CR', 'LineQuality'])
...     #c=df[df['LineQuality']==LineQ]['CR'].values[0]
...     c = df.set_index('LineQuality').join(LineQ)
...     return c
...
>>> df_lineQ = pd.DataFrame({'LineQuality': [4, 5]})
>>> set_LineQualityColor_df(df_lineQ).head(5)
                         CR  LineQuality
LineQuality
0.0                    grey          4.0
1.0          cornflowerblue          5.0
2.0              lightgreen          NaN
3.0                seagreen          NaN
4.0               mistyrose          NaN
You can also pass a specific dataframe column.
>>> set_LineQualityColor_df(df_lineQ.LineQuality).head(5)
                         CR  LineQuality
LineQuality
0.0                    grey          4.0
1.0          cornflowerblue          5.0
2.0              lightgreen          NaN
3.0                seagreen          NaN
4.0               mistyrose          NaN
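One thing worth noting about the output above: join aligns the lookup table's LineQuality index with the row index of the input (0 and 1), not with the input's values, so the value 4 ends up next to grey rather than mistyrose. If the goal is one color per input value, a merge on the LineQuality column may be closer to what the question asks for. The following is only a sketch; the names lookup and colors_by_value are made up for illustration and do not appear in the answers.

```python
import pandas as pd

# Lookup table from the question: one color name per LineQuality value.
data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
lookup = pd.DataFrame(data, columns=['CR', 'LineQuality'])

def colors_by_value(line_q):
    """Hypothetical helper: return one color per input row, matched on value."""
    # Accept either a Series or a one-column DataFrame.
    values = line_q.iloc[:, 0] if isinstance(line_q, pd.DataFrame) else line_q
    # Cast to float so the key dtype matches the lookup's LineQuality column.
    left = pd.DataFrame({'LineQuality': values.astype(float).to_numpy()})
    # how='left' keeps the input order and length; unmatched values become NaN.
    return left.merge(lookup, on='LineQuality', how='left')['CR']

df_lineQ = pd.DataFrame({'LineQuality': [4, 5]})
print(colors_by_value(df_lineQ['LineQuality']).tolist())
```

With the two-row input from the answer, this pairs 4 with mistyrose and 5 with rosybrown.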
Set LineQuality as the index.
data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
df = pd.DataFrame(data, columns=['CR', 'LineQuality'])
df.set_index(['LineQuality'], drop=True, inplace=True)
This gives the following dataframe:
                         CR
LineQuality
0.0                    grey
1.0          cornflowerblue
2.0              lightgreen
3.0                seagreen
4.0               mistyrose
4.1              lightcoral
5.0               rosybrown
5.1               indianred
9.0               lightgray
Then use loc to do the lookup.
LQ_df = pd.DataFrame([1, 5, 4, 9, 4.1, 0, 4.0], columns=['LQ'])
LQ = LQ_df['LQ']
df.loc[LQ, 'CR']
This gives the following Series:
LineQuality
1.0    cornflowerblue
5.0         rosybrown
4.0         mistyrose
9.0         lightgray
4.1        lightcoral
0.0              grey
4.0         mistyrose
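Two properties of the loc lookup may matter in practice. The result is indexed by the LineQuality values rather than by the original row index, and in recent pandas versions a list-like lookup raises KeyError if any value is missing from the index. As a possible alternative, here is a sketch using Series.map, which keeps the original index and yields NaN for unknown values. The extra value 7 and the fallback color 'black' are assumptions added for illustration; neither appears in the question.

```python
import pandas as pd

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
# Series of colors indexed by LineQuality; map() looks values up in this index.
lookup = pd.DataFrame(data, columns=['CR', 'LineQuality']).set_index('LineQuality')['CR']

# Same values as in the answer, plus 7, which has no color assigned (assumption).
LQ_df = pd.DataFrame([1, 5, 4, 9, 4.1, 0, 4.0, 7], columns=['LQ'])

# map() returns NaN for values missing from the lookup; fillna supplies a
# hypothetical default color instead of raising an error.
LQ_df['color'] = LQ_df['LQ'].map(lookup).fillna('black')
print(LQ_df)
```

Because the result keeps the index of LQ_df, it can be assigned straight back as a new column.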
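There is no point in creating the df dataframe every time the function is called, so it is better to create it once before calling the function. You can then define the function to use df.loc as shown above: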
data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
lineq_color_lookup = pd.DataFrame(data, columns=['CR', 'LineQuality'])
lineq_color_lookup.set_index(['LineQuality'], drop=True, inplace=True)

def get_LineQualityColor(LineQ):
    return lineq_color_lookup.loc[LineQ, 'CR']  # .tolist() if you want it as a list
I also renamed the function to get_LineQualityColor, because it does not set anything; it only returns the color corresponding to the given LineQuality.
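For completeness, here is a sketch of how get_LineQualityColor might be called with a whole column and the result stored back in the dataframe. The dataframe name lines and the column name color are hypothetical. The returned Series is indexed by LineQuality values, so the sketch converts it with to_numpy() before assigning; a direct assignment would try to align on those labels instead of on row position.

```python
import pandas as pd

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
# Build the lookup once, outside the function, as the answer recommends.
lineq_color_lookup = pd.DataFrame(data, columns=['CR', 'LineQuality'])
lineq_color_lookup.set_index(['LineQuality'], drop=True, inplace=True)

def get_LineQualityColor(LineQ):
    return lineq_color_lookup.loc[LineQ, 'CR']

# Hypothetical dataframe with an LQ column, using the values from the answer.
lines = pd.DataFrame({'LQ': [1, 5, 4, 9, 4.1, 0, 4.0]})

# Pass the whole column; to_numpy() drops the LineQuality index so the
# colors are assigned row by row.
lines['color'] = get_LineQualityColor(lines['LQ']).to_numpy()

# A single value still works and returns a single color.
single = get_LineQualityColor(4)
print(lines)
print(single)
```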