How to count occurrences based on multiple criteria in a DataFrame
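I'm trying to figure out how to count occurrences in a DataFrame based on multiple criteria.
In this particular example, I'd like to know the number of female passengers in Pclass 3.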
   PassengerId  Pclass     Sex   Age  SibSp  Parch   Ticket     Fare Cabin Embarked
0          892       3    male  34.5      0      0   330911   7.8292   NaN        Q
1          893       3  female  47.0      1      0   363272   7.0000   NaN        S
2          894       2    male  62.0      0      0   240276   9.6875   NaN        Q
3          895       3    male  27.0      0      0   315154   8.6625   NaN        S
4          896       3  female  22.0      1      1  3101298  12.2875   NaN        S
Here are a few of my failed attempts:
len(test[test["Sex"] == "female", test["Pclass"] == 3])
sum(test.Pclass == 3 & test.Sex == "female")
test.[test["Sex"] == "female", test["Pclass"] == 3].count()
None of them seem to work.
In the end I wrote my own function, but there must be a simpler way to calculate this.
def countif(sex, pclass):
    x = 0
    for i in range(0,len(test)):
        s = test.iloc[i]['Sex']
        c = test.iloc[i]['Pclass']
        if s == sex and c == pclass:
            x = x + 1
    return x
Thanks in advance.
There are a few ways to do this:
import numpy as np
import pandas as pd

# Rebuild the sample rows from the question
test = pd.DataFrame({'PassengerId': {0: 892, 1: 893, 2: 894, 3: 895, 4: 896},
                     'Pclass': {0: 3, 1: 3, 2: 2, 3: 3, 4: 3},
                     'Sex': {0: 'male', 1: 'female', 2: 'male', 3: 'male', 4: 'female'},
                     'Age': {0: 34.5, 1: 47.0, 2: 62.0, 3: 27.0, 4: 22.0},
                     'SibSp': {0: 0, 1: 1, 2: 0, 3: 0, 4: 1},
                     'Parch': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1},
                     'Ticket': {0: 330911, 1: 363272, 2: 240276, 3: 315154, 4: 3101298},
                     'Fare': {0: 7.8292, 1: 7.0, 2: 9.6875, 3: 8.6625, 4: 12.2875},
                     'Cabin': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
                     'Embarked': {0: 'Q', 1: 'S', 2: 'Q', 3: 'S', 4: 'S'}})
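You need to put each boolean condition in parentheses and join them with &. Without the parentheses, & binds more tightly than ==, so Python evaluates 3 & test.Sex first: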
sum((test.Pclass == 3) & (test.Sex == "female"))
len(test[(test.Pclass == 3) & (test.Sex == "female")])
test[(test["Sex"] == "female") & (test["Pclass"] == 3)].shape[0]
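As a rough sketch of the same idea, the mask can be built once and summed directly, or the filter can be written as a query string. The frame name `passengers` and its two-column layout are made up for illustration; only the `Pclass` and `Sex` values come from the question.

```python
import pandas as pd

# Hypothetical miniature of the question's data: only the two columns that matter.
passengers = pd.DataFrame({
    "Pclass": [3, 3, 2, 3, 3],
    "Sex": ["male", "female", "male", "male", "female"],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (passengers["Pclass"] == 3) & (passengers["Sex"] == "female")

# True counts as 1, so summing the mask counts the matching rows.
count_mask = int(mask.sum())

# The same filter as a query string, where the keyword `and` is allowed.
count_query = len(passengers.query("Pclass == 3 and Sex == 'female'"))

print(count_mask, count_query)
```

Calling `mask.sum()` avoids building the filtered DataFrame just to measure its length, which may matter on larger data.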
Or you can do this:
tab = pd.crosstab(test.Pclass, test.Sex)
Sex     female  male
Pclass
2            0     1
3            2     2
tab.iloc[tab.index==3]['female']
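If a single number is wanted rather than a one-row Series, label-based lookup is one option. The sketch below also shows `groupby(...).size()` as another way to get every combination at once. The `passengers` frame is a made-up miniature of the question's data, not the original `test` frame.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
passengers = pd.DataFrame({
    "Pclass": [3, 3, 2, 3, 3],
    "Sex": ["male", "female", "male", "male", "female"],
})

# Cross-tabulate, then pick one cell by row label and column label.
tab = pd.crosstab(passengers["Pclass"], passengers["Sex"])
from_crosstab = int(tab.loc[3, "female"])

# groupby + size gives a Series indexed by (Pclass, Sex) pairs.
# Combinations that never occur (here: class 2, female) are simply absent.
counts = passengers.groupby(["Pclass", "Sex"]).size()
from_groupby = int(counts.loc[(3, "female")])

print(from_crosstab, from_groupby)
```

Unlike the `groupby` result, the crosstab fills missing combinations with 0, so it is the safer choice when a pair might not appear in the data.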