lambda/groupby conditions creating yes/no column - python, pandas

Question

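I have a df that I can filter with lambda/groupby. However, instead of filtering, I want to add a new column that indicates whether each row's group meets the condition. I get an error when I try this with apply.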

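Note that the filter keeps a group only if "all" of its hr values are >= 5. The new column should therefore show 'yes' only when every value in the group is >= 5, and 'no' if 1 or more values in the group are below 5.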

Original code used for filtering:

df = df.groupby('id').filter(lambda x: (x.hr >= 5).all())

import pandas as pd

# Sample data; "new_col" holds the expected result
data = {
    "id": [11111, 11111, 11111, 11111,
           333, 333, 333, 333, 333, 333,
           5678, 5678, 5678, 5678, 5678],
    "hr": [4, 2, 5, 4, 5, 7, 6, 8, 5, 6, 7, 8, 6, 2, 4],
    "new_col": ['no', 'no', 'no', 'no', 'yes', 'yes',
                'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'no'],
}
df = pd.DataFrame(data)
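
For reference, here is a minimal sketch of what the filter does on this sample data. It rebuilds only the id and hr columns from the data above; the names kept and kept_ids are illustrative and not part of the original code. Only the group whose hr values are all >= 5 survives, which is why filtering alone cannot label the rows that get dropped.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [11111] * 4 + [333] * 6 + [5678] * 5,
    "hr": [4, 2, 5, 4, 5, 7, 6, 8, 5, 6, 7, 8, 6, 2, 4],
})

# filter() keeps a whole group only when the lambda returns True for it,
# i.e. when every hr value in that group is >= 5.
kept = df.groupby("id").filter(lambda x: (x.hr >= 5).all())

# Collect the ids that survived the filter.
kept_ids = sorted(kept["id"].unique().tolist())
print(kept_ids)  # [333]
```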

Original table:

╔═══════╦════╗
║  id   ║ hr ║
╠═══════╬════╣
║ 11111 ║  4 ║
║ 11111 ║  2 ║
║ 11111 ║  5 ║
║ 11111 ║  4 ║
║   333 ║  5 ║
║   333 ║  7 ║
║   333 ║  6 ║
║   333 ║  8 ║
║   333 ║  5 ║
║   333 ║  6 ║
║  5678 ║  7 ║
║  5678 ║  8 ║
║  5678 ║  6 ║
║  5678 ║  2 ║
║  5678 ║  4 ║
╚═══════╩════╝

Desired result:

╔═══════╦════╦═════════╗
║  id   ║ hr ║ new_col ║
╠═══════╬════╬═════════╣
║ 11111 ║  4 ║ no      ║
║ 11111 ║  2 ║ no      ║
║ 11111 ║  5 ║ no      ║
║ 11111 ║  4 ║ no      ║
║   333 ║  5 ║ yes     ║
║   333 ║  7 ║ yes     ║
║   333 ║  6 ║ yes     ║
║   333 ║  8 ║ yes     ║
║   333 ║  5 ║ yes     ║
║   333 ║  6 ║ yes     ║
║  5678 ║  7 ║ no      ║
║  5678 ║  8 ║ no      ║
║  5678 ║  6 ║ no      ║
║  5678 ║  2 ║ no      ║
║  5678 ║  4 ║ no      ║
╚═══════╩════╩═════════╝

Please advise. Thanks.

Answer 1

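You can use groupby with transform, where the lambda evaluates the condition for each group and the result is broadcast back to every row of that group. Then use np.where to assign the 'yes' and 'no' values: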

import numpy as np

df['new_col'] = np.where(df.groupby('id')['hr'].transform(lambda x: (x >= 5).all()), 'yes', 'no')

Output:

       id  hr new_col
0   11111   4      no
1   11111   2      no
2   11111   5      no
3   11111   4      no
4     333   5     yes
5     333   7     yes
6     333   6     yes
7     333   8     yes
8     333   5     yes
9     333   6     yes
10   5678   7      no
11   5678   8      no
12   5678   6      no
13   5678   2      no
14   5678   4      no
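
As a possible variation on the approach above, the lambda can be avoided by computing the row-wise comparison first and then reducing it per group with the built-in "all" aggregation. This is a sketch that assumes the same sample data as the question; the intermediate name all_ge_5 is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [11111] * 4 + [333] * 6 + [5678] * 5,
    "hr": [4, 2, 5, 4, 5, 7, 6, 8, 5, 6, 7, 8, 6, 2, 4],
})

# Row-wise boolean (hr >= 5), grouped by id, then reduced with "all";
# transform broadcasts each group's result back to all of its rows.
all_ge_5 = df["hr"].ge(5).groupby(df["id"]).transform("all")

# Map the boolean mask to the 'yes'/'no' labels.
df["new_col"] = np.where(all_ge_5, "yes", "no")
print(df)
```

Passing the aggregation by name lets pandas use its own grouped implementation rather than calling a Python function once per group, which may matter on larger frames.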
