Groupby gender and count missing values in height column
I have a table like the one below, and I need the value counts of the Gender column, but only for rows where Height is missing.
Age Gender Height Weight NonAlcoholicDrink AlcoholicDrink
0 19.0 Male NaN NaN Coffee NaN
1 NaN Female 64.50 128.70 Water Liquor
2 21.0 Male 71.47 182.95 Coffee Beer
3 32.0 Female 57.30 103.40 Green Tea Wine
4 32.0 Female 53.80 138.40 Black Tea Liquor
5 20.0 Male 73.38 204.59 Pepsi NaN
6 20.0 Male 70.46 225.25 Coffee NaN
7 32.0 Female 54.10 157.80 Black Tea Liquor
8 49.0 Female 64.80 152.60 Gatorade Beer
9 45.0 Male NaN 196.55 Coffee Liquor
How can I do this?
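To make the question reproducible, here is a sketch that rebuilds the sample table above as a pandas DataFrame. The values are transcribed from the printed table (NaN cells become `np.nan`). The variable name `df` is an assumption chosen to match the answer's code.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the printed table; NaN cells become np.nan.
df = pd.DataFrame({
    "Age": [19.0, np.nan, 21.0, 32.0, 32.0, 20.0, 20.0, 32.0, 49.0, 45.0],
    "Gender": ["Male", "Female", "Male", "Female", "Female",
               "Male", "Male", "Female", "Female", "Male"],
    "Height": [np.nan, 64.50, 71.47, 57.30, 53.80,
               73.38, 70.46, 54.10, 64.80, np.nan],
    "Weight": [np.nan, 128.70, 182.95, 103.40, 138.40,
               204.59, 225.25, 157.80, 152.60, 196.55],
    "NonAlcoholicDrink": ["Coffee", "Water", "Coffee", "Green Tea", "Black Tea",
                          "Pepsi", "Coffee", "Black Tea", "Gatorade", "Coffee"],
    "AlcoholicDrink": [np.nan, "Liquor", "Beer", "Wine", "Liquor",
                       np.nan, np.nan, "Liquor", "Beer", "Liquor"],
})
print(df)
```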
One way to get the answer is to create a new DataFrame containing only the rows where Height is np.nan (there are 2 in the example above):
missing_height = df[df['Height'].isnull()]
Then call value_counts() on the Gender column of the new DataFrame to get what you need:
missing_height['Gender'].value_counts(ascending=False)
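This gives the number of rows with a missing height for each gender (for the sample table, Male: 2). Note that ascending=False is already the default for value_counts(), so the argument can be omitted.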
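A self-contained sketch of the answer's approach follows, keeping only the Gender and Height columns from the sample table (an assumption for brevity, since the other columns are not used). It also shows a `groupby` alternative that is not part of the original answer: summing a boolean "is missing" mask per gender. Unlike `value_counts` on the filtered rows, the `groupby` version also lists genders that have zero missing heights.

```python
import numpy as np
import pandas as pd

# Only the two relevant columns, transcribed from the sample table.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Female",
               "Male", "Male", "Female", "Female", "Male"],
    "Height": [np.nan, 64.50, 71.47, 57.30, 53.80,
               73.38, 70.46, 54.10, 64.80, np.nan],
})

# Approach from the answer: keep rows with a missing Height, then count genders.
missing_height = df[df["Height"].isnull()]
counts = missing_height["Gender"].value_counts()
print(counts)

# groupby alternative: True counts as 1 when summed, so this gives
# the number of missing heights per gender, including genders with 0.
missing_per_gender = df["Height"].isnull().groupby(df["Gender"]).sum()
print(missing_per_gender)
```

For the sample data, `counts` has a single entry (Male: 2), while `missing_per_gender` has Female: 0 and Male: 2.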