How to compare and flag closest string rows
My goal is to use transform() or apply() to detect whether consecutive rows are equal, based on the res column.
My DataFrame:
import pandas as pd

data = [[111, 123, "aa", 0],
        [111, 124, "bb", 1],
        [111, 125, "bb", 2],
        [111, 126, "cc", 0],
        [111, 127, "dd", 1],
        [111, 128, "cc", 2],
        [222, 133, "xx", 1],
        [222, 134, "yy", 2],
        [222, 135, "zz", 0],
        [222, 136, "zz", 1]]
df = pd.DataFrame(data, columns=["uuid", "foo_id", "res", "num"])
What I am looking for:
111, 123, "aa", 0, 0
111, 124, "bb", 1, 1
111, 125, "bb", 2, 1
111, 126, "cc", 0, 0
111, 127, "dd", 1, 0
111, 128, "cc", 2, 0
222, 133, "xx", 1, 0
222, 134, "yy", 2, 0
222, 135, "zz", 0, 1
222, 136, "zz", 1, 1
What I tried:
df['flag'] = df.groupby('uuid')['res'].transform(lambda x: 1 if x == x.shift(-1) else 0)
This returns:
*ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().*
IIUC, you can compare each value with the previous and the next value within each group, using Series.shift and Series.eq:
f = lambda x: (x.eq(x.shift()) | x.eq(x.shift(-1))).astype(int)
df['flag'] = df.groupby('uuid')['res'].transform(f)
print(df)
    uuid  foo_id res  num  flag
0    111     123  aa    0     0
1    111     124  bb    1     1
2    111     125  bb    2     1
3    111     126  cc    0     0
4    111     127  dd    1     0
5    111     128  ee    2     0
6    111     129  dd    3     0
7    222     133  xx    1     0
8    222     134  yy    2     0
9    222     135  zz    0     1
10   222     136  zz    1     1
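
For reference, here is a minimal, self-contained sketch of the same idea run on the exact data from the question. It first shows why the original attempt fails: x == x.shift(-1) produces a whole boolean Series, and a Python if needs a single True/False. It then applies the element-wise comparison per group. The helper name flag_adjacent_repeats is made up for this illustration and is not part of pandas.

```python
import pandas as pd

data = [
    [111, 123, "aa", 0],
    [111, 124, "bb", 1],
    [111, 125, "bb", 2],
    [111, 126, "cc", 0],
    [111, 127, "dd", 1],
    [111, 128, "cc", 2],
    [222, 133, "xx", 1],
    [222, 134, "yy", 2],
    [222, 135, "zz", 0],
    [222, 136, "zz", 1],
]
df = pd.DataFrame(data, columns=["uuid", "foo_id", "res", "num"])

# Why the original attempt fails: the comparison is a boolean Series,
# and converting a multi-element Series to one bool raises ValueError.
comparison = df["res"] == df["res"].shift(-1)
try:
    bool(comparison)
    raised = False
except ValueError:
    raised = True


def flag_adjacent_repeats(s: pd.Series) -> pd.Series:
    """Hypothetical helper: 1 where a value equals its previous or next neighbour."""
    same_as_prev = s.eq(s.shift())    # compare with the row above
    same_as_next = s.eq(s.shift(-1))  # compare with the row below
    return (same_as_prev | same_as_next).astype(int)


# transform() calls the helper once per uuid group, so the shifts never
# compare the last row of one group with the first row of the next.
df["flag"] = df.groupby("uuid")["res"].transform(flag_adjacent_repeats)
print(df)
```

Because only direct neighbours are compared, the two "cc" rows in group 111 stay at 0: they repeat, but they are not adjacent.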