Do these characters have some sort of mapping function? "[1]", "[2]", "[3]", ..., "[n]"

Question

I'm using this line of code

df_mask = ~df[new_col_titles[:1]].apply(lambda x : x.str.contains('|'.join(filter_list), flags=re.IGNORECASE)).any(axis=1)

to create a mask for my df. The filter list is

filter_list = ["[1]", "[2]", "[3]", "[4]", "[5]", "[6]", "[7]", "[8]", "[9]", ..."[n]"]

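But I'm getting strange results. I expected it to filter only the rows whose column 0 contains [1]...[n], but it is also filtering out rows that don't contain these elements. There is a pattern, though: it filters out rows with numbers that have "characters" attached, I mean things like £55, 2010), 55*, 55 *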

Can anyone explain what's going on here, and whether there is a workaround?

Answer 1

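If you want to match the items in the filter list literally, use re.escape() to escape the special characters. In a regular expression, [1] is a character class that matches only the single digit 1, not the string [1]. So '|'.join(filter_list) builds a pattern that matches any value containing one of those digits, which is why rows like £55 or 2010) are being excluded.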
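A quick check with Python's standard re module illustrates the difference. The helper name matches is hypothetical, introduced only for this sketch; re.search and re.escape are the real library functions.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the regex pattern is found anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

# "£15" contains the digit 1 but not the literal text "[1]".
print(matches("[1]", "£15"))                      # True: [1] is a character class for "1"
print(re.escape("[1]"))                           # \[1\]
print(matches(re.escape("[1]"), "£15"))           # False: only the literal "[1]" matches
print(matches(re.escape("[1]"), "see note [1]"))  # True
```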

df_mask = ~df[new_col_titles[:1]].apply(lambda x : x.str.contains('|'.join(re.escape(f) for f in filter_list), flags=re.IGNORECASE)).any(axis=1)
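To see the effect on whole rows without needing pandas, here is a minimal stdlib-only sketch that mimics the `~...any(axis=1)` logic for a single column. The sample strings, the shortened filter_list, and the keep_mask helper are all hypothetical stand-ins, not the asker's real data.

```python
import re

# Hypothetical sample values standing in for column 0 of the DataFrame.
rows = ["£55", "2010)", "55*", "see note [3]", "plain text"]
# Shortened, hypothetical stand-in for the question's filter_list.
filter_list = ["[1]", "[2]", "[3]", "[4]", "[5]"]

def keep_mask(values, patterns, escape):
    """Mimic ~str.contains(...): True for values that match none of the patterns."""
    parts = [re.escape(p) for p in patterns] if escape else patterns
    regex = re.compile("|".join(parts), flags=re.IGNORECASE)
    return [regex.search(v) is None for v in values]

# Unescaped: "[1]|[2]|..." matches any of the digits 1-5, so only "plain text" survives.
print(keep_mask(rows, filter_list, escape=False))
# Escaped: only the value containing the literal "[3]" is filtered out.
print(keep_mask(rows, filter_list, escape=True))
```

With the unescaped pattern, every value containing one of the digits 1 to 5 is dropped, which matches the symptom described in the question.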

See Reference - What does this regex mean?
