Filtering unique values that are Inactive but never Active under the same unique value
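I'm looking for a way to keep the unique values that have an Inactive status, but only when the same unique value never also appears with an Active status.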
df:
Unique_value Status
1 Active <- Has both active and inactive, must be inactive only
1 Active <- Has both active and inactive, must be inactive only
1 Inactive <- Has both active and inactive, must be inactive only
1 Inactive <- Has both active and inactive, must be inactive only
2 Inactive <- Has inactive only
2 Inactive <- Has inactive only
2 Inactive <- Has inactive only
3 Inactive <- Has inactive only (cancelled okay to be filtered out)
3 Cancelled <- Has inactive only (cancelled okay to be filtered out)
3 Inactive <- Has inactive only (cancelled okay to be filtered out)
Expected output:
Unique_value Status
2 Inactive
3 Inactive
This is what I've tried so far, but I don't think it's correct.
p = ['Inactive', 'Active']
df.groupby('Unique_value')['Status'].apply(lambda x: (x =='Inactive') != set(p))
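Let's try:
# Keep only rows whose group contains no 'Active' status
g = df[df.groupby('Unique_value')['Status'].transform(lambda x: not x.eq('Active').any())]
# Of those, keep the 'Inactive' rows and collapse repeats
g[g['Status'].eq('Inactive')].drop_duplicates()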
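For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The frame is rebuilt from the sample data in the question, and the helper name inactive_only is made up for illustration; it uses a vectorised transform("any") in place of the lambda.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Unique_value": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Status": ["Active", "Active", "Inactive", "Inactive",
               "Inactive", "Inactive", "Inactive",
               "Inactive", "Cancelled", "Inactive"],
})

def inactive_only(frame):
    # Hypothetical helper: True on every row whose group has at least one 'Active'.
    has_active = (
        frame["Status"].eq("Active")
        .groupby(frame["Unique_value"])
        .transform("any")
    )
    # Keep 'Inactive' rows from groups without any 'Active', one row per pair.
    kept = frame[~has_active & frame["Status"].eq("Inactive")]
    return kept.drop_duplicates().reset_index(drop=True)

result = inactive_only(df)
print(result)
```

Id 1 is dropped because it has Active rows, and the Cancelled row of id 3 is filtered out by the final Inactive check.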
First check whether any value in each group is Active, and whether any is Inactive. Then drop the groups where both conditions are true:
m1 = df["Status"].eq("Active").groupby(df["Unique_value"]).transform("any")
m2 = df["Status"].eq("Inactive").groupby(df["Unique_value"]).transform("any")
df[~(m1 & m2)].groupby("Unique_value", as_index=False).first()
Unique_value Status
0 2 Inactive
1 3 Inactive
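One thing to be aware of with the two-mask approach: a group that is Active only would not be dropped, since only m1 is true for it. The sketch below illustrates this with a hypothetical id 4 that is not in the question's data, and shows a stricter variant. The names as_written and strict are made up for illustration.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical id 4 that is Active only.
df = pd.DataFrame({
    "Unique_value": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    "Status": ["Active", "Active", "Inactive", "Inactive",
               "Inactive", "Inactive", "Inactive",
               "Inactive", "Cancelled", "Inactive",
               "Active"],
})

by_id = df["Unique_value"]
m1 = df["Status"].eq("Active").groupby(by_id).transform("any")
m2 = df["Status"].eq("Inactive").groupby(by_id).transform("any")

# Same expression as the answer: id 4 survives because m2 is False for it.
as_written = df[~(m1 & m2)].groupby("Unique_value", as_index=False).first()

# Stricter variant: Inactive present, Active absent, and keep Inactive rows only.
strict = (
    df[m2 & ~m1 & df["Status"].eq("Inactive")]
    .drop_duplicates()
    .reset_index(drop=True)
)

print(as_written)
print(strict)
```

If the data can never contain an Active-only group, the two versions give the same ids and the shorter expression is enough.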