How to count the IDs with the same prefix and store the total in another column

Question

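I have a dataset in which I noticed that the IDs carry classification information. Basically, the last 2 digits of an ID are its sub-ID (01, 02, 03, etc.) within a family, and the digits before them identify the family. An example is below. I am trying to build another column (column 2) that stores how many sub-IDs exist for the same family. For example, 22302 belongs to family 223, which has 3 members: 22301, 22302 and 22303. That would give me a new feature for classification modeling. I am not sure whether there is a better way to extract this information. In any case, can someone tell me how to compute the number of IDs in the same class (as shown in column 2)?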

ID      Sameclass
23401   1
22302   3
43201   1
144501  2
144502  2
22301   3
22303   3

Answer 1

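You can use str slicing to drop the last two digits of each ID, group by the resulting prefix, and use transform to write each group's size back to every row: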

df['New']=df.groupby(df.ID.astype(str).str[:-2]).ID.transform('size')
df
Out[223]: 
       ID  Sameclass  New
0   23401          1    1
1   22302          3    3
2   43201          1    1
3  144501          2    2
4  144502          2    2
5   22301          3    3
6   22303          3    3
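
For readers who want to run the approach from scratch, here is a minimal self-contained sketch. It rebuilds the sample data from the question and assumes the family prefix is always everything except the last two digits; the column name family_size is a hypothetical choice made for this illustration, not something from the original post.

```python
import pandas as pd

# Sample IDs taken from the question.
df = pd.DataFrame({"ID": [23401, 22302, 43201, 144501, 144502, 22301, 22303]})

# Family prefix = the ID with its last two digits dropped.
# Converting to str first also covers IDs that are stored as text.
family = df["ID"].astype(str).str[:-2]

# transform("size") computes each group's row count and broadcasts it
# back to every row of that group, so the result aligns with df.
# "family_size" is a hypothetical column name used for illustration.
df["family_size"] = df.groupby(family)["ID"].transform("size")

print(df)
```

Note that size counts rows, so an ID that appears twice would be counted twice; if the data may contain duplicate IDs, transform("nunique") counts distinct IDs per family instead. When the IDs are stored as integers, df["ID"] // 100 yields the same prefix without the string conversion.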

classification

feature-extraction

tabular

pandas