Unable to remove accented special characters in a string despite using regex

Question

I have the following code:

import re
oldstr="HRÂ Director,Â LearningÂ"
newstr = re.sub(r"[-()\"#/@;:<>{}`+=&~|.!?,^]", " ", oldstr)
print(newstr)

The code above does not work.

Current result: "HRÂ Director,Â LearningÂ"

Expected result: "HR Director, Learning"

How can I achieve this?

Answer 1

Converting my comment to an answer so that future visitors can easily find the solution.

You may use:

import re
oldstr="HRÂ Director,Â LearningÂ"
newstr = re.sub(r'[^\x00-\x7f]+|[-()"#/@;:<>{}`+=&~|.!?,^]+', "", oldstr)
print(newstr)

Output:

HR Director Learning

[^\x00-\x7f] matches any non-ASCII character, that is, any character outside the code point range 0x00 to 0x7F.
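The character class in the question lists only ASCII punctuation, so "Â" never matches it. The regex above also deletes the comma, which the expected result keeps. As a minimal sketch, assuming the goal is exactly the expected string from the question, the non-ASCII alternative can be used on its own. The function name `strip_non_ascii` is hypothetical, and the sample is written with `\u00c2` escapes (which is "Â") and ordinary ASCII spaces so that its contents are unambiguous.

```python
import re


def strip_non_ascii(text: str) -> str:
    """Remove every run of non-ASCII characters, keeping ASCII punctuation."""
    # [^\x00-\x7f]+ matches one or more characters outside the ASCII range.
    cleaned = re.sub(r"[^\x00-\x7f]+", "", text)
    # Trim whitespace that may be left at either end after the removal.
    return cleaned.strip()


# "\u00c2" is "Â"; the separators here are assumed to be plain ASCII spaces.
oldstr = "HR\u00c2 Director,\u00c2 Learning\u00c2"
print(strip_non_ascii(oldstr))
```

If the separators in the real data are non-breaking spaces (U+00A0) rather than ASCII spaces, they are non-ASCII too and would be removed along with "Â", so they may need to be replaced with a regular space first.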

Answer 2

You can also do it this way:

def _removeNonAscii(s): 
    return "".join(i for i in s if ord(i)<128)

Here is how it works on a sample of my code:

s = "HRÂ Director,Â LearningÂ"
def _removeNonAscii(s): 
    return "".join(i for i in s if ord(i)<128)

print(_removeNonAscii(s))

Output:

HR Director, Learning
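The same filtering can also be sketched with the built-in `str.encode` and `bytes.decode` methods instead of a per-character `ord` check. This is an alternative offered for comparison, not part of the answer above; the function name `remove_non_ascii` is hypothetical, and the sample again assumes ordinary ASCII spaces between the words.

```python
def remove_non_ascii(s: str) -> str:
    """Drop every character that cannot be represented in ASCII."""
    # errors="ignore" silently skips characters that cannot be encoded,
    # so only code points below 128 survive the round trip.
    return s.encode("ascii", errors="ignore").decode("ascii")


# "\u00c2" is "Â"; the separators are assumed to be plain ASCII spaces.
s = "HR\u00c2 Director,\u00c2 Learning\u00c2"
print(remove_non_ascii(s))
```

Both versions keep exactly the characters with a code point below 128, so they should give the same result for any input string.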
