Replace non-ASCII character in Python: e.g., ' vs. ’

Question

I want you’ll to be reduced to you ll (not youll). This is what I am doing:

>>> clean = "you'll"
>>> import string
>>> clean = filter(lambda x: x in string.printable, clean)
>>> print clean
you'll

>>> clean = "you’ll" 
>>> clean = filter(lambda x: x in string.printable, clean)
>>> print clean
youll

This is what I tried:

>>> clean = "you'll"
>>> clean =clean.replace('\'',' ')
>>> print clean
you ll
>>> clean = "you’ll"
>>> clean =clean.replace('’',' ')
>>> print clean
you ll

This works fine, but when I put it in my script I get:

SyntaxError: Non-ASCII character '\xe2' in file sc.py on line 177, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

So I added this at the very top of the script:

# -*- coding: utf-8 -*-

but then I get

clean =clean.replace('’',' ')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

I'm kind of out of ideas.

Answer 1

You can use replace() to replace the apostrophe with a space, like this:

print "you'll".replace("'", " ")

This prints you ll
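
Note that the line above only covers the ASCII apostrophe. As a rough sketch, assuming the text is already a unicode string (the sample value and the variable names are made up for illustration), both the straight and the curly apostrophe (U+2019) can be replaced in one pass. Writing the curly quote as the escape \u2019 keeps the source file pure ASCII, so no encoding declaration is needed:

```python
# Sketch: replace both the ASCII apostrophe and U+2019 with a space.
# Assumes `text` is a unicode string, not a byte str.
text = u"you\u2019ll and you'll"

for apostrophe in (u"'", u"\u2019"):
    text = text.replace(apostrophe, u" ")

print(text)
```

If the text arrives as bytes (for example, read from a file on Python 2), it would need to be decoded first.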

Answer 2

This may not be the best answer, but a simple solution is to just handle the exception:

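clean2 = ""
for ch in clean:
    try:
        # Replace the ASCII apostrophe with a space; keep everything else.
        clean2 += " " if ch == "'" else ch
    except UnicodeDecodeError:
        # Fall back to a space when the character cannot be decoded.
        clean2 += " "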
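
The same per-character idea can be sketched without exception handling if the text is decoded first. This is only an illustration under assumptions: replace_non_printable is a hypothetical helper name, the input bytes are assumed to be UTF-8, and "printable" is taken to mean membership in string.printable, as in the question.

```python
import string


def replace_non_printable(text, replacement=u" "):
    """Swap every character outside string.printable for `replacement`."""
    return u"".join(
        ch if ch in string.printable else replacement for ch in text
    )


raw = b"you\xe2\x80\x99ll"      # UTF-8 bytes containing U+2019
decoded = raw.decode("utf-8")   # one unicode character per visible character
cleaned = replace_non_printable(decoded)
print(cleaned)
```

Because the string is decoded before the loop, the curly quote is a single character and is replaced by a single space. The ASCII apostrophe is left alone here, since it is part of string.printable.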

Answer 3

You need to decode the string:

# -*- coding: utf-8 -*- 
clean = "you’ll".decode('utf-8')
clean = clean.replace('’'.decode('utf-8'),' ')
print clean

This prints

you ll

as expected.
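
For context on where the error in the question comes from: when a byte str meets a unicode string, Python 2 decodes the byte str implicitly with the ASCII codec. The sketch below performs that decode explicitly so the failure is visible (the variable names are illustrative only):

```python
# The UTF-8 encoding of U+2019 (right single quotation mark) is three bytes.
curly_bytes = b"\xe2\x80\x99"

# Python 2 performs this ASCII decode implicitly when mixing byte and
# unicode strings; doing it by hand reproduces the same error.
try:
    curly_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding with the actual encoding succeeds and yields one character.
curly = curly_bytes.decode("utf-8")
print(len(curly))
```

The first byte, 0xe2, is outside the ASCII range, which is why the traceback in the question points at position 0.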

python

unicode

python-2.7