Iterating over a DataFrame, evaluating column values, and setting value to a third column
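I have been trying to iterate over a DataFrame, or apply a function, in order to change the contents of one column based on two other columns of the same DataFrame.
I have a df like this: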
import pandas as pd

df = pd.DataFrame({'Age_type': pd.Series(['Adult', 'Adult', 'Child', 'Child']),
                   'Gender': pd.Series(['Female', 'Male', 'Female', 'Female'])})
   Gender Age_type Group
0  Female    Adult
1    Male    Adult
2  Female    Child
3  Female    Child
I want to assign a group to each case, along these lines:
if gender == 'Female' and age_type == 'Adult':
    group = 'Group A'
elif gender == 'Female' and age_type == 'Child':
    group = 'Group B'
elif gender == 'Male' and age_type == 'Adult':
    group = 'Group C'
elif gender == 'Male' and age_type == 'Child':
    group = 'Group D'
I tried using .apply(function) because, as far as I know, you should never modify a DataFrame while iterating over it (so that rules out a for loop?).
I tried:
def set_group(data):
    gender = data['Gender']
    age_type = data['Age_type']
    if gender == 'Female' and age_type == 'Adult':
        data['Group'] = 'Group A'
    elif gender == 'Female' and age_type == 'Child':
        data['Group'] = 'Group B'
    elif gender == 'Male' and age_type == 'Adult':
        data['Group'] = 'Group C'
    elif gender == 'Male' and age_type == 'Child':
        data['Group'] = 'Group D'
    return None

df['Group'].apply(set_group)
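But I keep getting an error like this:
TypeError: string indices must be integers, not str
Any idea how to iterate over a DataFrame, read some columns, and set the value of another column based on them?
Thanks!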
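A note on the error itself: df['Group'].apply(set_group) calls set_group once per value of the Group column, so data is a single string and data['Gender'] tries to index that string with another string, which raises the TypeError. Below is a minimal sketch of a row-wise variant, assuming the frame from the question. It returns the label instead of assigning into the row, and passes axis=1 so that each call receives one row as a Series.

```python
import pandas as pd

df = pd.DataFrame({
    'Age_type': ['Adult', 'Adult', 'Child', 'Child'],
    'Gender': ['Female', 'Male', 'Female', 'Female'],
})

def set_group(row):
    # row is a Series holding one row, so label-based lookup works here
    gender = row['Gender']
    age_type = row['Age_type']
    if gender == 'Female' and age_type == 'Adult':
        return 'Group A'
    elif gender == 'Female' and age_type == 'Child':
        return 'Group B'
    elif gender == 'Male' and age_type == 'Adult':
        return 'Group C'
    elif gender == 'Male' and age_type == 'Child':
        return 'Group D'
    return None  # no rule matched

# axis=1 applies the function to each row rather than to each column
df['Group'] = df.apply(set_group, axis=1)
```

This stays close to the original attempt, but it still runs a Python function per row, so a vectorized approach will likely be faster on large frames.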
Try this:
import numpy as np

# dfx is the DataFrame from the question; start with an empty label
dfx['group'] = ""
# each np.where fills in one label and keeps the existing value elsewhere
dfx['group'] = np.where((dfx['Gender']=='Female')&(dfx['Age_type']=='Adult'),'A', dfx['group'])
dfx['group'] = np.where((dfx['Gender']=='Female')&(dfx['Age_type']=='Child'),'B', dfx['group'])
dfx['group'] = np.where((dfx['Gender']=='Male')&(dfx['Age_type']=='Adult'),'C', dfx['group'])
dfx['group'] = np.where((dfx['Gender']=='Male')&(dfx['Age_type']=='Child'),'D', dfx['group'])
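The same masks can probably be evaluated in one pass with np.select, which takes a list of conditions and a matching list of choices. This is only a sketch under the assumption that dfx has the two columns from the question; the names conditions and choices are made up for illustration.

```python
import numpy as np
import pandas as pd

dfx = pd.DataFrame({
    'Age_type': ['Adult', 'Adult', 'Child', 'Child'],
    'Gender': ['Female', 'Male', 'Female', 'Female'],
})

# One boolean mask per rule; the first mask that is True wins for each row
conditions = [
    (dfx['Gender'] == 'Female') & (dfx['Age_type'] == 'Adult'),
    (dfx['Gender'] == 'Female') & (dfx['Age_type'] == 'Child'),
    (dfx['Gender'] == 'Male') & (dfx['Age_type'] == 'Adult'),
    (dfx['Gender'] == 'Male') & (dfx['Age_type'] == 'Child'),
]
choices = ['A', 'B', 'C', 'D']

# default is used for rows that match none of the conditions
dfx['group'] = np.select(conditions, choices, default='')
```

Keeping the rules in two parallel lists makes it easier to add or reorder cases than editing a chain of np.where calls.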
How about this?
In [96]: df
Out[96]:
  Age_type  Gender
0    Adult  Female
1    Adult    Male
2    Child  Female
3    Child  Female
In [97]: m = {'FemaleAdult': 'Group A',
    ...:      'FemaleChild': 'Group B',
    ...:      'MaleAdult': 'Group C',
    ...:      'MaleChild': 'Group D'}
In [98]: df['group'] = (df.Gender + df.Age_type).map(m)
In [99]: df
Out[99]:
  Age_type  Gender    group
0    Adult  Female  Group A
1    Adult    Male  Group C
2    Child  Female  Group B
3    Child  Female  Group B
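One caveat with concatenated keys is that, in general, two different pairs of strings can join into the same key. A possible variation, sketched here as an assumption rather than as part of the answer above, keys the dictionary on (Gender, Age_type) tuples and looks each row up with dict.get, so unmatched rows end up as None.

```python
import pandas as pd

df = pd.DataFrame({
    'Age_type': ['Adult', 'Adult', 'Child', 'Child'],
    'Gender': ['Female', 'Male', 'Female', 'Female'],
})

# Tuple keys keep the two fields separate instead of joining them into one string
m = {
    ('Female', 'Adult'): 'Group A',
    ('Female', 'Child'): 'Group B',
    ('Male', 'Adult'): 'Group C',
    ('Male', 'Child'): 'Group D',
}

# zip pairs the two columns row by row; dict.get returns None for unknown pairs
df['group'] = [m.get(key) for key in zip(df['Gender'], df['Age_type'])]
```

The list comprehension is plain Python rather than a vectorized pandas call, so it trades some speed for unambiguous keys.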