Reuse a function with multiple string values in pandas

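I want to simplify a function so that it returns columns based on a single string value only. In the example below, df contains two different colors. I want to pass each color to a function, but the output should contain only the columns related to that color.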

With multiple colors and multiple outputs in the function, the returned df becomes too large.

import pandas as pd
import numpy as np

d = ({
    'Date' : ['1/1/18','1/1/18','2/1/18','3/1/18','1/2/18','1/3/18','2/1/19','3/1/19'],
    'Val' : ['A','B','C','D','A','B','C','D'],
    # np.nan rather than np.NaN: the NaN alias was removed in NumPy 2.0
    'Blue' : ['Blue', 'Blue', 'Blue', np.nan, np.nan, 'Blue', np.nan, np.nan],
    'Red' : [np.nan, np.nan, np.nan, 'Red', 'Red', np.nan, 'Red', 'Red']
    })

df = pd.DataFrame(data = d)

df['Date'] = pd.to_datetime(df['Date'], format = '%d/%m/%y')

df['Count'] = df.Date.map(df.groupby('Date').size())


def func(df, val):
    
    df['%s_cat' % val] = df['Count'] * 2

    return df


blue = func(df, 'Blue')


red = func(df, 'Red')

Expected output (Blue):

        Date Val  Blue   Count  Blue_cat 
0 2018-01-01   A  Blue       2         4        
1 2018-01-01   B  Blue       2         4        
2 2018-01-02   C  Blue       1         2               
5 2018-03-01   B  Blue       1         2       

Expected output (Red):

        Date Val  Blue  Red  Count   Red_cat
3 2018-01-03   D   NaN  Red      1         2
4 2018-02-01   A   NaN  Red      1         2
6 2019-01-02   C   NaN  Red      1         2
7 2019-01-03   D   NaN  Red      1         2

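Filter the rows with boolean indexing and call DataFrame.copy on the result. The copy avoids SettingWithCopyWarning: without it, pandas warns when you later modify values in the filtered DataFrame, because those modifications do not propagate back to the original data.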

def func(df, val):
    df = df[df[val].eq(val)].copy()
    df[f'{val}_cat'] = df['Count'] * 2

    return df

blue = func(df, 'Blue')
print(blue)
        Date Val  Blue  Red  Count  Blue_cat
0 2018-01-01   A  Blue  NaN      2         4
1 2018-01-01   B  Blue  NaN      2         4
2 2018-01-02   C  Blue  NaN      1         2
5 2018-03-01   B  Blue  NaN      1         2

red = func(df, 'Red')
print(red)
        Date Val Blue  Red  Count  Red_cat
3 2018-01-03   D  NaN  Red      1        2
4 2018-02-01   A  NaN  Red      1        2
6 2019-01-02   C  NaN  Red      1        2
7 2019-01-03   D  NaN  Red      1        2
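
The outputs above still carry the other color's column, filled with NaN. If the goal is to keep only the columns related to the requested color, one possible variant is sketched below. The name func_only_own_columns and the results dictionary are hypothetical, not part of the original answer. The sketch assumes that any column that is entirely NaN in the filtered rows belongs to another color, so dropna(axis=1, how='all') would also drop an unrelated column that happens to be all NaN in that subset.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'Date': ['1/1/18', '1/1/18', '2/1/18', '3/1/18', '1/2/18', '1/3/18', '2/1/19', '3/1/19'],
    'Val': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'Blue': ['Blue', 'Blue', 'Blue', np.nan, np.nan, 'Blue', np.nan, np.nan],
    'Red': [np.nan, np.nan, np.nan, 'Red', 'Red', np.nan, 'Red', 'Red'],
})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%y')
df['Count'] = df['Date'].map(df.groupby('Date').size())


def func_only_own_columns(df, val):
    # Hypothetical variant of func: keep the rows where column `val` equals `val`,
    # drop every column that is entirely NaN in that subset (assumed to be the
    # other colors), and copy so the new column is added to an independent frame.
    out = df[df[val].eq(val)].dropna(axis=1, how='all').copy()
    out[f'{val}_cat'] = out['Count'] * 2
    return out


# One filtered DataFrame per color, keyed by the color name.
results = {color: func_only_own_columns(df, color) for color in ['Blue', 'Red']}

print(results['Blue'])
print(results['Red'])
```

Because each call works on a copy, the original df is left without any of the _cat columns, and every entry in results stays small no matter how many colors are processed.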