How to create a column that measures the number of items that exist in another string column?

Question

I have a dataframe containing employees and their levels.

import pandas as pd
d = {'employees': ["John", "Jamie", "Ann", "Jane", "Kim", "Steve"],
     'Level': ["A/Ba", "C/A", "A", "C", "Ba/C", "D"]}
df = pd.DataFrame(data=d)

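How can I add a new column that counts how many employees share a level with each employee? For example, John would get 3, because 2 other employees have A (Jamie and Ann) and 1 other employee has Ba (Kim). Note that John himself is not included in the count.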

My goal is for the final dataframe to look like this:
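To pin down the counting rule independently of either answer, here is a plain-Python brute-force sketch. It assumes a level never repeats within one employee's string (hence the sets), and that another employee sharing two levels with someone counts once per shared level, which matches what both answers compute. `level_sets`, `count_shared` and the `brute_force` column are hypothetical names used only for illustration.

```python
import pandas as pd

d = {'employees': ["John", "Jamie", "Ann", "Jane", "Kim", "Steve"],
     'Level': ["A/Ba", "C/A", "A", "C", "Ba/C", "D"]}
df = pd.DataFrame(data=d)

# Assumption: each employee's levels are unique, so a set is enough
level_sets = [set(s.split('/')) for s in df['Level']]

def count_shared(i):
    """Hypothetical helper: for employee i, sum over each of their levels
    the number of *other* employees who also hold that level."""
    total = 0
    for lvl in level_sets[i]:
        total += sum(1 for j, other in enumerate(level_sets)
                     if j != i and lvl in other)
    return total

df['brute_force'] = [count_shared(i) for i in range(len(df))]
print(df)
```

This is quadratic in the number of employees, so it is only useful as a reference to check the vectorized answers against.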

Answer 1

Try this:

df['Number of levels'] = df['Level'].str.split('/').explode().map(df['Level'].str.split('/').explode().value_counts()).sub(1).groupby(level=0).sum()

Output:

>>> df
  employees Level  Number of levels
0      John  A/Ba                 3
1     Jamie   C/A                 4
2       Ann     A                 2
3      Jane     C                 2
4       Kim  Ba/C                 3
5     Steve     D                 0
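The one-liner above builds the exploded Series twice inside a single expression. The same logic split into named intermediate steps reads as follows. The variable names `exploded`, `level_totals` and `others` are illustrative and not part of the answer. This version also explodes the column only once.

```python
import pandas as pd

d = {'employees': ["John", "Jamie", "Ann", "Jane", "Kim", "Steve"],
     'Level': ["A/Ba", "C/A", "A", "C", "Ba/C", "D"]}
df = pd.DataFrame(data=d)

# One row per (employee, level) pair; the original row index is repeated
exploded = df['Level'].str.split('/').explode()

# How many employees hold each level in total, e.g. A -> 3, Ba -> 2
level_totals = exploded.value_counts()

# Look up each level's total, then drop the employee's own occurrence
others = exploded.map(level_totals).sub(1)

# Collapse back to one value per original row; assignment aligns on the index
df['Number of levels'] = others.groupby(level=0).sum()
print(df)
```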

Answer 2

exploded = df.Level.str.split("/").explode()
counts = exploded.groupby(exploded).transform("count").sub(1)
df["Num Levels"] = counts.groupby(level=0).sum()

We first explode the Level column by splitting on "/", so that we can get at each individual level:

>>> exploded = df.Level.str.split("/").explode()
>>> exploded

0     A
0    Ba
1     C
1     A
2     A
3     C
4    Ba
4     C
5     D
Name: Level, dtype: object
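The rest of this answer relies on one property of `explode`: every piece it produces keeps the index label of the row it came from, which is why 0 and 4 appear twice above. A tiny sketch with a made-up two-row Series (the index labels 10 and 20 are arbitrary, chosen so the repetition is easy to see):

```python
import pandas as pd

s = pd.Series(["A/Ba", "C"], index=[10, 20])

exploded = s.str.split("/").explode()
print(exploded)

# "A" and "Ba" both carry label 10 because they came from the same row
print(exploded.index.tolist())
```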

Now we need the count of each element in this Series, so we group the Series by its own values and transform with "count":

>>> exploded.groupby(exploded).transform("count")
0    3
0    2
1    3
1    3
2    3
3    3
4    2
4    3
5    1
Name: Level, dtype: int64
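This `transform("count")` step yields the same numbers as the `.map(... .value_counts())` lookup in Answer 1; the two answers differ only in how they attach each level's total to the exploded rows. A quick check, with the exploded Series rebuilt by hand so the snippet runs on its own:

```python
import pandas as pd

exploded = pd.Series(["A", "Ba", "C", "A", "A", "C", "Ba", "C", "D"],
                     index=[0, 0, 1, 1, 2, 3, 4, 4, 5], name="Level")

# Answer 2: group the Series by its own values, broadcast each group's size
via_transform = exploded.groupby(exploded).transform("count")

# Answer 1: count each distinct value once, then look the counts up per element
via_map = exploded.map(exploded.value_counts())

print(via_transform.tolist())
print(via_map.tolist())
```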

Since this count includes the element itself, while we only want the other occurrences, we subtract 1 to get counts:

>>> counts = exploded.groupby(exploded).transform("count").sub(1)
>>> counts
0    2
0    1
1    2
1    2
2    2
3    2
4    1
4    2
5    0
Name: Level, dtype: int64

Now we need to "go back" to one row per employee, and the index is our helper: we group by it (that is what level=0 means) and sum the counts:

>>> counts.groupby(level=0).sum()
0    3
1    4
2    2
3    2
4    3
5    0
Name: Level, dtype: int64
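In isolation, `groupby(level=0).sum()` simply adds up all values that share an index label, and the later column assignment lines the result up with the dataframe by index. A minimal sketch with hypothetical numbers and a throwaway dataframe (`per_row`, `x` and `total` are illustrative names):

```python
import pandas as pd

# Hypothetical per-(row, level) counts with a repeated index, as after explode
counts = pd.Series([2, 1, 2, 2, 0], index=[0, 0, 1, 1, 2])

# Values sharing an index label are summed into one value per original row
per_row = counts.groupby(level=0).sum()
print(per_row)

# Column assignment aligns on the index labels 0, 1, 2
df = pd.DataFrame({"x": ["a", "b", "c"]})
df["total"] = per_row
print(df)
```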

This is the final result, which we assign to df["Num Levels"], giving:

  employees Level  Num Levels
0      John  A/Ba           3
1     Jamie   C/A           4
2       Ann     A           2
3      Jane     C           2
4       Kim  Ba/C           3
5     Steve     D           0

All of this can be written as a "one-liner", but that may hurt readability and make further debugging harder!

df["Num Levels"] = (df.Level
                      .str.split("/")
                      .explode()
                      .pipe(lambda ex: ex.groupby(ex))
                      .transform("count")
                      .sub(1)
                      .groupby(level=0)
                      .sum())
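A different technique, used in neither answer: `Series.str.get_dummies(sep="/")` turns the Level column into a 0/1 indicator matrix with one column per level, after which the count is a single matrix-vector product. This sketch assumes the same counting rule as above; note that it creates one column per distinct level, so it suits data with a modest number of levels. `others_per_level` is an illustrative name.

```python
import pandas as pd

d = {'employees': ["John", "Jamie", "Ann", "Jane", "Kim", "Steve"],
     'Level': ["A/Ba", "C/A", "A", "C", "Ba/C", "D"]}
df = pd.DataFrame(data=d)

# One indicator column per level: John's row has A=1, Ba=1, C=0, D=0
dummies = df['Level'].str.get_dummies(sep='/')

# Employees holding each level, minus one for the employee themself
others_per_level = dummies.sum() - 1

# Per row, add up (holders - 1) over only the levels that row has
df['Num Levels'] = dummies.dot(others_per_level)
print(df)
```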

python

group-by

dataframe

pandas

pandas-groupby