How to compare and flag closest string rows

My goal is to use the transform() or apply() function to detect, within each uuid, whether consecutive rows are equal based on the res column.

My dataframe:

import pandas as pd

data = [[111, 123, "aa", 0],
        [111, 124, "bb", 1], 
        [111, 125, "bb", 2],
        [111, 126, "cc", 0],
        [111, 127, "dd", 1],
        [111, 128, "cc", 2],
        [222, 133, "xx", 1],
        [222, 134, "yy", 2],
        [222, 135, "zz", 0], 
        [222, 136, "zz", 1],] 
df = pd.DataFrame(data, columns = ["uuid", "foo_id", "res", "num"]) 

What I'm looking for (the last column is the expected flag):

111, 123, "aa", 0, 0 
111, 124, "bb", 1, 1 
111, 125, "bb", 2, 1
111, 126, "cc", 0, 0
111, 127, "dd", 1, 0
111, 128, "cc", 2, 0
222, 133, "xx", 1, 0
222, 134, "yy", 2, 0
222, 135, "zz", 0, 1
222, 136, "zz", 1, 1

I tried:

df['flag'] = df.groupby('uuid')['res'].transform(lambda x: 1 if x == x.shift(-1) else 0)

This raises:

*ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().*
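The error comes from the `if` inside the lambda: `transform` passes the lambda the whole `res` Series of one group, so `x == x.shift(-1)` is a boolean Series, and Python's `if` cannot collapse several booleans into one. A minimal sketch reproducing this on a small made-up Series (the three values are illustrative, not taken from the question):

```python
import pandas as pd

s = pd.Series(["bb", "bb", "cc"])

# Element-wise comparison: the result is a boolean Series, not a single bool.
# The last element is compared with NaN (nothing follows it), which gives False.
mask = s == s.shift(-1)
print(mask.tolist())  # [True, False, False]

# `1 if mask else 0` needs one truth value, so pandas raises instead of guessing.
try:
    flag = 1 if mask else 0
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Vectorised alternative: convert the whole boolean Series to 0/1 at once.
flags = mask.astype(int)
print(flags.tolist())  # [1, 0, 0]
```

So instead of branching per group, the lambda has to return the converted boolean Series itself.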

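IIUC, you can compare each value with the previous and next value in the same group using series.shift (series.duplicated would also flag repeats that are not adjacent, such as the two "cc" rows of uuid 111):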

# Within one uuid group: 1 if res equals the previous or the next res, else 0
f = lambda x: (x.eq(x.shift()) | x.eq(x.shift(-1))).astype(int)
df['flag'] = df.groupby('uuid')['res'].transform(f)

print(df)

   uuid  foo_id res  num  flag
0   111     123  aa    0     0
1   111     124  bb    1     1
2   111     125  bb    2     1
3   111     126  cc    0     0
4   111     127  dd    1     0
5   111     128  cc    2     0
6   222     133  xx    1     0
7   222     134  yy    2     0
8   222     135  zz    0     1
9   222     136  zz    1     1
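A possible variant, sketched under the assumption that the data is exactly the question's `df` (rebuilt here so the block runs on its own): `SeriesGroupBy.shift` already shifts within each `uuid` group, so the neighbour comparison can be done on whole columns without a per-group lambda. For contrast, the block also computes `DataFrame.duplicated(subset=["uuid", "res"], keep=False)`, which marks every repeated value in a group, including the two non-adjacent `cc` rows, and therefore does not match the expected output.

```python
import pandas as pd

data = [[111, 123, "aa", 0],
        [111, 124, "bb", 1],
        [111, 125, "bb", 2],
        [111, 126, "cc", 0],
        [111, 127, "dd", 1],
        [111, 128, "cc", 2],
        [222, 133, "xx", 1],
        [222, 134, "yy", 2],
        [222, 135, "zz", 0],
        [222, 136, "zz", 1]]
df = pd.DataFrame(data, columns=["uuid", "foo_id", "res", "num"])

# Shift "res" inside each uuid group. The first/last row of a group gets NaN,
# and comparing with NaN yields False, so rows never match across groups.
g = df.groupby("uuid")["res"]
same_as_prev = df["res"].eq(g.shift(1))
same_as_next = df["res"].eq(g.shift(-1))
df["flag"] = (same_as_prev | same_as_next).astype(int)

# For contrast: duplicated(keep=False) marks every repeated (uuid, res) pair,
# adjacent or not, so the two "cc" rows of uuid 111 are flagged as well.
df["dup"] = df.duplicated(subset=["uuid", "res"], keep=False).astype(int)

print(df)
```

The `flag` column comes out as 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, matching the expected output in the question, while `dup` differs on rows 3 and 5.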