Calculating and graphing rates per category in Python

Question

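I'm working on an employee attrition problem. I have a department column and a column indicating whether each employee quit.

Since some departments have more employees working in them, I expect them to account for most of the attrition.

So I would like to plot the attrition rate per department. For example, 200 of the 1,000 people working in IT quit (a 20% attrition rate).

What I have been doing is creating pivot tables and crosstabs to get the relevant information, then calculating the attrition rate by hand. But I feel this could be done more efficiently.

What I currently have:

Sample dataframe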

import numpy as np
import pandas as pd

df = pd.DataFrame({'department': ['sales','sales','HR','sales','sales','HR','sales','R&D','HR','sales'],
                   'quit': ['yes','no','no','yes','no','yes','yes','no','yes','no'],
                   'employee_count': [1,1,1,1,1,1,1,1,1,1]})

Pivot table to see how many employees are in each department

pivot = pd.pivot_table(df, values='employee_count', 
                    columns=['department'], aggfunc=np.sum)

which gives the output

department	HR	R&D	sales
employee_count	3	1	6

Crosstab to find the number of employees who quit in each department

pivot2=pd.crosstab(values=df['employee_count'],
                  index=df['department'],
                  columns=df['quit'],
                  aggfunc=np.sum)

quit	no	yes
HR	1	2
R&D	1	NaN
sales	3	3

Manual calculation and plotting

import matplotlib.pyplot as plt

plt.rcParams['axes.facecolor'] = 'white'
fig = plt.figure()
fig.patch.set_facecolor('white')

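# attrition rates (percent) worked out by hand from the crosstab
names = ['HR','sales']
values = [66.66666667,50]

# draw on the figure created above instead of opening a second, empty one
plt.bar(names, values)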

Any help would be greatly appreciated. Thank you for your time.

Answer 1

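Pandas supports grouping by specific columns. In your case, we can aggregate the results by department and by whether the employee quit. A pivot then reshapes the table, and finally we compute the total headcount and the quit rate for each department: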

# obtain the relevant results by grouping
agg_df = df.groupby(by=['department', 'quit']).agg({'employee_count': 'sum'})
agg_df.reset_index(drop=False, inplace=True)

# pivot the table for further usage
pivot_df = agg_df.pivot(columns=['quit'], index=['department'], values='employee_count')
pivot_df.fillna(0, inplace=True)

# calculate final statistics
pivot_df['total'] = pivot_df['yes'] + pivot_df['no']
pivot_df['quit_rate'] = pivot_df['yes'] / pivot_df['total']
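
The same rate can probably be reached in fewer steps, and the question also asked for a plot, so here is a minimal sketch of both. It assumes the `quit` column holds only the strings 'yes' and 'no', as in the sample data, and it rebuilds the sample frame so it runs on its own. The name `quit_rate` and the axis labels are just illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question (employee_count is not needed here).
df = pd.DataFrame({
    'department': ['sales', 'sales', 'HR', 'sales', 'sales',
                   'HR', 'sales', 'R&D', 'HR', 'sales'],
    'quit': ['yes', 'no', 'no', 'yes', 'no',
             'yes', 'yes', 'no', 'yes', 'no'],
})

# The mean of a boolean column is the fraction of True values,
# so grouping by department gives the share of leavers directly.
# Assumption: 'quit' contains only 'yes' / 'no'.
quit_rate = df['quit'].eq('yes').groupby(df['department']).mean()

# Convert to percent and draw one bar per department.
ax = (quit_rate * 100).plot.bar(rot=0)
ax.set_xlabel('department')
ax.set_ylabel('attrition rate (%)')
plt.tight_layout()
```

In a script you would follow this with `plt.show()` or `plt.savefig(...)`. Unlike the manual version, departments with no leavers (R&D here) show up with a rate of 0 rather than being dropped.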

python

matplotlib

pandas