Replace non-ASCII characters in Python: e.g., ' vs. ’
I want you’ll to be reduced to you ll (not youll). This is what I'm doing:
>>> clean = "you'll"
>>> import string
>>> clean = filter(lambda x: x in string.printable, clean)
>>> print clean
you'll
>>> clean = "you’ll"
>>> clean = filter(lambda x: x in string.printable, clean)
>>> print clean
youll
Here is what I tried:
>>> clean = "you'll"
>>> clean =clean.replace('\'',' ')
>>> print clean
you ll
>>> clean = "you’ll"
>>> clean =clean.replace('’',' ')
>>> print clean
you ll
This works fine in the interpreter, but when I put it in my script I get:
SyntaxError: Non-ASCII character '\xe2' in file sc.py on line 177, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
So I added this at the very top of the script:
# -*- coding: utf-8 -*-
But then I get:
clean =clean.replace('’',' ')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
I'm kind of out of ideas.
You can use replace() to replace the apostrophe with a space, like this:
print "you'll".replace("'", " ")
This prints you ll
This may not be the best answer, but a simple workaround is to just handle the exception:
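clean2 = ""
for ch in clean:
    try:
        # Swap the ASCII apostrophe for a space; keep every other character.
        clean2 += " " if ch == "'" else ch
    except UnicodeDecodeError:
        # Append the placeholder 'vs.' if handling ch raises a decode error.
        clean2 += 'vs.'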
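The same character-by-character idea can be sketched without relying on an exception. In Python 2, looping over a byte string visits individual bytes, so ’ (UTF-8 bytes E2 80 99) shows up as three separate items; decoding first makes the loop see one character. The helper name blank_out and its encoding parameter are hypothetical, and the sketch assumes that every non-ASCII character, not only ’, should become a space. It is written to run on both Python 2 and Python 3.

```python
import string

# Characters the question's filter() call would have kept.
PRINTABLE = set(string.printable)


def blank_out(text, encoding="utf-8"):
    """Replace apostrophes and non-printable/non-ASCII characters with spaces.

    Hypothetical helper: byte strings are assumed to be UTF-8 encoded.
    """
    if isinstance(text, bytes):
        # Decode first so each loop step sees a whole character, not a byte.
        text = text.decode(encoding)
    return u"".join(
        ch if ch in PRINTABLE and ch != u"'" else u" "
        for ch in text
    )


print(blank_out(u"you\u2019ll"))  # you ll
print(blank_out("you'll"))        # you ll
```

Unlike the filter() call in the question, which drops the character and yields youll, this keeps the word boundary by substituting a space.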
You need to decode the string:
# -*- coding: utf-8 -*-
clean = "you’ll".decode('utf-8')
clean = clean.replace('’'.decode('utf-8'),' ')
print clean
This prints you ll, as expected.
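A variation on the decode approach, as a rough sketch: writing the curly apostrophe as the escape \u2019 keeps the source file pure ASCII, so the script needs no coding declaration at all. The function name replace_apostrophes is hypothetical, and including the left quote U+2018 alongside U+2019 and the ASCII apostrophe is an assumption about what should count as an apostrophe. The code is meant to run on both Python 2 and Python 3.

```python
# Code points to turn into a space. U+2018 is an assumed extra;
# the thread itself only mentions ' (U+0027) and the curly ’ (U+2019).
APOSTROPHES = {0x27: u" ", 0x2018: u" ", 0x2019: u" "}


def replace_apostrophes(text, encoding="utf-8"):
    """Return text as Unicode with apostrophe-like characters replaced by spaces.

    Hypothetical helper: byte strings are assumed to be UTF-8 encoded.
    """
    if isinstance(text, bytes):
        # Work on decoded text so ’ is one character rather than three bytes.
        text = text.decode(encoding)
    return text.translate(APOSTROPHES)


print(replace_apostrophes(u"you\u2019ll"))         # you ll
print(replace_apostrophes(b"you\xe2\x80\x99ll"))  # you ll
```

Using translate() with a table means further characters can be added in one place instead of chaining more replace() calls.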