Cutting a column into quantiles based on categories in another column using Python

Question

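In the Titanic dataset, I want to cut the Fare column into quantiles separately for each embarkation port (the Embarked column), which takes the values C, S and Q.

For example: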

pd.qcut(titanic_train['Fare'],q=3,labels=['Low','Med','High'])

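This cuts the Fare column into quantiles regardless of embarkation port, which may not be accurate when passengers boarded at different ports.

Expected solution: I want a function that cuts the Fare column into 3 quantiles separately for each embarkation port.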

Answer 1

Group by Embarked with pandas groupby first, then apply qcut to each group's Fare values through transform, as follows:

df["Quantile"] = df.groupby("Embarked")["Fare"].transform(lambda x: pd.qcut(x, 3, labels=['Low','Med','High']))

>>> df.head()
   PassengerId  Survived  Pclass  ... Cabin Embarked  Quantile
0            1         0       3  ...   NaN        S       Low
1            2         1       1  ...   C85        C      High
2            3         1       3  ...   NaN        S       Low
3            4         1       1  ...  C123        S      High
4            5         0       3  ...   NaN        S       Low
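To see how the per-port result differs from a single global qcut, here is a minimal, self-contained sketch. It does not use the real Titanic file: the small DataFrame below is made-up data, and the column name GlobalQuantile is a hypothetical name introduced only for comparison.

```python
import pandas as pd

# Toy stand-in for the Titanic data; the fares are invented for illustration.
df = pd.DataFrame({
    "Embarked": ["S"] * 6 + ["C"] * 6,
    "Fare": [7.0, 8.0, 10.0, 13.0, 26.0, 52.0,
             15.0, 30.0, 60.0, 80.0, 110.0, 250.0],
})

labels = ["Low", "Med", "High"]

# Terciles over the whole column, ignoring the port (the approach from the question).
df["GlobalQuantile"] = pd.qcut(df["Fare"], q=3, labels=labels)

# Terciles computed separately within each port (the approach from the answer).
df["Quantile"] = df.groupby("Embarked")["Fare"].transform(
    lambda x: pd.qcut(x, q=3, labels=labels)
)

print(df)
```

In this toy data the fare of 52.0 from S falls in the middle global tercile but in the top tercile among S passengers, which is the distinction the question asks for. Two caveats that may apply to real data: groupby skips rows whose Embarked value is missing by default, so those rows would likely end up without a label, and qcut raises an error when a group's fares are so repetitive that the bin edges are not unique.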

python

data-science