Reuse a function with multiple string values in pandas
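I want to simplify a function so that it returns only the columns relevant to a single string value. In the example below, df contains two different colours. I want to pass each colour to a function, but I only want the output to contain the columns related to that colour.
With many colours and several outputs per function, the returned df becomes too large.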
import pandas as pd
import numpy as np

d = ({
    'Date' : ['1/1/18','1/1/18','2/1/18','3/1/18','1/2/18','1/3/18','2/1/19','3/1/19'],
    'Val' : ['A','B','C','D','A','B','C','D'],
    'Blue' : ['Blue', 'Blue', 'Blue', np.nan, np.nan, 'Blue', np.nan, np.nan],
    'Red' : [np.nan, np.nan, np.nan, 'Red', 'Red', np.nan, 'Red', 'Red']
    })

df = pd.DataFrame(data = d)
df['Date'] = pd.to_datetime(df['Date'], format = '%d/%m/%y')
# Number of rows sharing each date
df['Count'] = df.Date.map(df.groupby('Date').size())

def func(df, val):
    # Adds the new column to the df that was passed in, so every call widens the same frame
    df['%s_cat' % val] = df['Count'] * 2
    return df

blue = func(df, 'Blue')
red = func(df, 'Red')
Expected output (blue):

        Date Val  Blue  Count  Blue_cat
0 2018-01-01   A  Blue      2         4
1 2018-01-01   B  Blue      2         4
2 2018-01-02   C  Blue      1         2
5 2018-03-01   B  Blue      1         2

Expected output (red):

        Date Val Blue  Red  Count  Red_cat
3 2018-01-03   D  NaN  Red      1        2
4 2018-02-01   A  NaN  Red      1        2
6 2019-01-02   C  NaN  Red      1        2
7 2019-01-03   D  NaN  Red      1        2
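Use boolean indexing with DataFrame.copy to avoid a SettingWithCopyWarning. Without the copy, if you later modify values in the filtered DataFrame, you will find that the modifications do not propagate back to the original data, and pandas warns about it: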
def func(df, val):
    # Keep only the rows where column `val` holds the value `val`, as an independent copy
    df = df[df[val].eq(val)].copy()
    df[f'{val}_cat'] = df['Count'] * 2
    return df

blue = func(df, 'Blue')
print(blue)

        Date Val  Blue  Red  Count  Blue_cat
0 2018-01-01   A  Blue  NaN      2         4
1 2018-01-01   B  Blue  NaN      2         4
2 2018-01-02   C  Blue  NaN      1         2
5 2018-03-01   B  Blue  NaN      1         2

red = func(df, 'Red')
print(red)

        Date Val Blue  Red  Count  Red_cat
3 2018-01-03   D  NaN  Red      1        2
4 2018-02-01   A  NaN  Red      1        2
6 2019-01-02   C  NaN  Red      1        2
7 2019-01-03   D  NaN  Red      1        2
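
Note that the filtered result still carries the other colour's column (all NaN). If you would rather drop it, as in the expected blue output in the question, something along these lines might work. This is only a sketch: func_only and its colours parameter are names made up here for illustration, and it assumes you can list the colour columns up front.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/1/18', '1/1/18', '2/1/18', '3/1/18', '1/2/18', '1/3/18', '2/1/19', '3/1/19'],
    'Val': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'Blue': ['Blue', 'Blue', 'Blue', np.nan, np.nan, 'Blue', np.nan, np.nan],
    'Red': [np.nan, np.nan, np.nan, 'Red', 'Red', np.nan, 'Red', 'Red'],
})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%y')
df['Count'] = df['Date'].map(df.groupby('Date').size())


def func_only(df, val, colours):
    """Hypothetical variant of func: filter rows for `val`, drop the other colour columns."""
    # Every colour column except the one being asked for
    others = [c for c in colours if c != val]
    # Boolean indexing, then drop the unrelated columns; copy() keeps the result independent
    out = df[df[val].eq(val)].drop(columns=others).copy()
    out[f'{val}_cat'] = out['Count'] * 2
    return out


colours = ['Blue', 'Red']  # assumption: the colour columns are known in advance
# One filtered frame per colour, keyed by colour name
frames = {c: func_only(df, c, colours) for c in colours}
print(frames['Blue'])
print(frames['Red'])
```

Because each call works on a copy, the original df is left untouched and no Blue_cat or Red_cat column accumulates on it between calls.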