How to count occurrences based on multiple criteria in a DataFrame

Question

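I am trying to figure out how to count the number of occurrences in a DataFrame using multiple criteria. In this particular example, I'd like to know the number of female passengers in Pclass 3.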

       PassengerId  Pclass     Sex   Age  SibSp  Parch   Ticket     Fare  Cabin  Embarked
    0          892       3    male  34.5      0      0   330911   7.8292    NaN         Q
    1          893       3  female  47.0      1      0   363272   7.0000    NaN         S
    2          894       2    male  62.0      0      0   240276   9.6875    NaN         Q
    3          895       3    male  27.0      0      0   315154   8.6625    NaN         S
    4          896       3  female  22.0      1      1  3101298  12.2875    NaN         S

Here are a few of my failed attempts:

    len(test[test["Sex"] == "female", test["Pclass"] == 3])
    sum(test.Pclass == 3 & test.Sex == "female")
    test.[test["Sex"] == "female", test["Pclass"] == 3].count()

None of them seemed to work. In the end I wrote my own function, but there must be a simpler way to calculate this.

def countif(sex, pclass):
    x = 0
    for i in range(len(test)):
        s = test.iloc[i]['Sex']
        c = test.iloc[i]['Pclass']
        if s == sex and c == pclass:
            x = x + 1
    return x

Thanks in advance.

Answer 1

There are a few ways to do this:

import numpy as np
import pandas as pd

test = pd.DataFrame({'PassengerId': {0: 892, 1: 893, 2: 894, 3: 895, 4: 896},
      'Pclass': {0: 3, 1: 3, 2: 2, 3: 3, 4: 3}, 
      'Sex': {0: 'male', 1: 'female', 2: 'male', 3: 'male', 4: 'female'}, 
      'Age': {0: 34.5, 1: 47.0, 2: 62.0, 3: 27.0, 4: 22.0}, 
      'SibSp': {0: 0, 1: 1, 2: 0, 3: 0, 4: 1}, 
      'Parch': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1}, 
      'Ticket': {0: 330911, 1: 363272, 2: 240276, 3: 315154, 4: 3101298}, 
      'Fare': {0: 7.8292, 1: 7.0, 2: 9.6875, 3: 8.6625, 4: 12.2875}, 
      'Cabin': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan}, 
      'Embarked': {0: 'Q', 1: 'S', 2: 'Q', 3: 'S', 4: 'S'}})

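You need to wrap each boolean condition in parentheses and join them with &. In Python, & binds more tightly than ==, so without the parentheses the expression is evaluated as test.Pclass == (3 & test.Sex) == "female", which is not the intended comparison: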

sum((test.Pclass == 3) & (test.Sex == "female"))
len(test[(test.Pclass == 3) & (test.Sex == "female")])
test[(test["Sex"] == "female") & (test["Pclass"] == 3)].shape[0]
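As a minimal sketch of how the loop-based countif from the question could be replaced by a parenthesized boolean mask: the helper name `count_matching` and the trimmed three-column frame below are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Trimmed copy of the sample data (only the columns needed here).
test = pd.DataFrame({
    "PassengerId": [892, 893, 894, 895, 896],
    "Pclass": [3, 3, 2, 3, 3],
    "Sex": ["male", "female", "male", "male", "female"],
})


def count_matching(df, sex, pclass):
    """Count rows where Sex == sex and Pclass == pclass (hypothetical helper)."""
    # Each comparison yields a boolean Series; the parentheses are required
    # because & has higher precedence than ==.
    mask = (df["Sex"] == sex) & (df["Pclass"] == pclass)
    # True counts as 1, so summing the mask gives the number of matches.
    return int(mask.sum())


female_third_class = count_matching(test, "female", 3)
print(female_third_class)
```

Summing the mask avoids building the filtered DataFrame that len(test[...]) and .shape[0] create, which may matter on larger frames.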

Or you can build a cross-tabulation of the two columns:

tab = pd.crosstab(test.Pclass, test.Sex)

Sex     female  male
Pclass
2            0     1
3            2     2

tab.loc[3, 'female']  # count of female passengers in Pclass 3
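The crosstab approach can be run end to end as in the sketch below. It assumes a trimmed two-column version of the sample data and uses label-based .loc lookup so that the result is a plain scalar; the variable name `female_third_class` is illustrative only.

```python
import pandas as pd

# Trimmed copy of the sample data (only the two columns being counted).
test = pd.DataFrame({
    "Pclass": [3, 3, 2, 3, 3],
    "Sex": ["male", "female", "male", "male", "female"],
})

# Rows are Pclass values, columns are Sex values, cells are row counts.
tab = pd.crosstab(test["Pclass"], test["Sex"])

# Look up one cell by row label and column label.
female_third_class = int(tab.loc[3, "female"])
print(female_third_class)
```

This is convenient when counts for several combinations are needed, since the table is computed once and combinations that do not occur (such as Pclass 2, female here) show up as 0.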
