Apply multiple filters on a column using nested tuples
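I'm trying to add an additional condition to a tuple filter.
Here is the current, working tuple filter without the additional condition (discussed later):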
import pandas as pd
data = [['A',23], ['D',50], ['C',32], ['D',21], ['D',24], ['B',20], ['C',68], ['A',52], ['A',41],[ 'D',44], ['B',29], ['B',70], ['B',33], ['C',56], ['A',72]]
df = pd.DataFrame(data, columns = ['group', 'age'])
group_mask = {(20, 30): 'A', (25, 30): 'B', (65, 70): 'C', (40, 50): 'D'}
df['range'] = df['group'].map({v:k for k, v in group_mask.items()})
df['in_range'] = (df['range'].str[0] <= df['age']) & (df['age'] <= df['range'].str[1])
#filtered
df = df[df['in_range']]
df.drop(columns=['range', 'in_range'], inplace=True)
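The code above filters the DataFrame down to the rows where age is equal to, or falls between, the bounds set in group_mask for each respective group.
This produces the following output: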
group age
0 A 23
1 D 50
6 C 68
9 D 44
10 B 29
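However, I need to take an additional condition (column) into account: the gender column. The age filter range for each group differs depending on gender.
The data has been modified to include this additional column: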
data = [['A', 'male', 23], ['D','female',50], ['C','male',32], ['D','male',21], ['D','female',24], ['B','female',20], ['C','male',68], ['A','male',52], ['A','male',41],[ 'D','male',44], ['B','female',29], ['B','female',70], ['B','female',33], ['C','female',56], ['A','female',72]]
df = pd.DataFrame(data, columns = ['group', 'gender', 'age'])
However, adapting the existing group_mask tuple filter to include gender-dependent ranges is where I'm stuck, as shown below.
I tried going from....
group_mask = {(20, 30): 'A', (25, 30): 'B', (65, 70): 'C', (40, 50): 'D'}
to....
group_mask = {(((20, 30), 'A') , 'male' ), (((25, 30), 'B') , 'male' ), (((65, 70), 'C') , 'male' ), (((40, 50), 'D'), 'male' ), \
(((60, 80), 'A') , 'female'), (((15, 30), 'B'), 'female'), (((50, 60), 'C'), 'female'), (((30, 40), 'D'), 'female' )}
...and then reapplying the map and the filter....
df['range'] = df[['group', 'gender']].map({v:k for k, v in group_mask .items()})
df['in_range'] = (df['range'].str[0] <= df['age']) & (df['age'] <= df['range'].str[1])
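However, this raises the error AttributeError: 'DataFrame' object has no attribute 'map'
First, I'm not sure whether the modified group_mask is in the correct format, and second, I'm not sure how to correct the map call.
Any help is appreciated. Thanks in advance.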
You can restructure group_mask like this:
group_mask = {(30, 40): ('D', 'female'), (25, 30): ('B', 'male'), (40, 50): ('D', 'male'), (65, 70): ('C', 'male'), (60, 80): ('A', 'female'), (20, 30): ('A', 'male'), (15, 30): ('B', 'female'), (50, 60): ('C', 'female')}
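A side note on the format question: the group_mask attempted in the question is written with set-literal syntax (there are no key: value pairs), so it is a set of nested tuples rather than a dict and has no .items(). Keying by range also means that two (group, gender) pairs could never share the same range, because dict keys must be unique. As a minimal sketch, assuming you are free to choose the layout, the lookup can be keyed by (group, gender) directly. The names range_by_key and age_in_range are hypothetical and only used for illustration.

```python
# Hypothetical layout: key on (group, gender), value is the inclusive (low, high) age range.
# The ranges are the ones given in the question.
range_by_key = {
    ('A', 'male'): (20, 30), ('B', 'male'): (25, 30),
    ('C', 'male'): (65, 70), ('D', 'male'): (40, 50),
    ('A', 'female'): (60, 80), ('B', 'female'): (15, 30),
    ('C', 'female'): (50, 60), ('D', 'female'): (30, 40),
}

def age_in_range(group, gender, age):
    """Return True if age lies within the inclusive range for (group, gender)."""
    low, high = range_by_key[(group, gender)]
    return low <= age <= high

print(age_in_range('B', 'female', 29))  # row 10 of the sample data
print(age_in_range('D', 'female', 50))  # row 1 of the sample data
```

This is the same dict that {v: k for k, v in group_mask.items()} produces from the range-keyed version above, just written out directly.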
To apply the mapping, you can do this:
df['range']=df.apply(lambda x: [(x[0],x[1])], axis=1, result_type='expand')[0].map({v:k for k, v in group_mask .items()})
df['in_range'] = (df['range'].str[0] <= df['age']) & (df['age'] <= df['range'].str[1])
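For comparison, here is a sketch of an alternative that avoids the row-wise apply, assuming the range-keyed group_mask from this answer. It converts the mask into a small lookup table and joins it onto the data with merge, then tests the bounds with Series.between (inclusive on both ends by default). The names bounds, merged, and filtered are illustrative, not part of the original code. It may also be worth knowing that recent pandas versions warn about positional access such as x[0] on a row whose labels are strings, which this version sidesteps.

```python
import pandas as pd

data = [['A', 'male', 23], ['D', 'female', 50], ['C', 'male', 32], ['D', 'male', 21],
        ['D', 'female', 24], ['B', 'female', 20], ['C', 'male', 68], ['A', 'male', 52],
        ['A', 'male', 41], ['D', 'male', 44], ['B', 'female', 29], ['B', 'female', 70],
        ['B', 'female', 33], ['C', 'female', 56], ['A', 'female', 72]]
df = pd.DataFrame(data, columns=['group', 'gender', 'age'])

group_mask = {(30, 40): ('D', 'female'), (25, 30): ('B', 'male'),
              (40, 50): ('D', 'male'), (65, 70): ('C', 'male'),
              (60, 80): ('A', 'female'), (20, 30): ('A', 'male'),
              (15, 30): ('B', 'female'), (50, 60): ('C', 'female')}

# Build a lookup table with one row per (group, gender) and its age bounds.
bounds = pd.DataFrame(
    [(group, gender, low, high) for (low, high), (group, gender) in group_mask.items()],
    columns=['group', 'gender', 'low', 'high'],
)

# A left merge keeps every row of df in its original order; each
# (group, gender) pair appears once in bounds, so no rows are duplicated.
merged = df.merge(bounds, on=['group', 'gender'], how='left')

# Inclusive check: low <= age <= high.
in_range = merged['age'].between(merged['low'], merged['high'])

# merged has a fresh 0..n-1 index, so apply the mask positionally to df.
filtered = df[in_range.to_numpy()]
print(filtered)
```

With the sample data, the rows kept are those at index 0, 5, 6, 9, 10, 13 and 14. A (group, gender) pair missing from group_mask would get NaN bounds after the merge and be filtered out.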