Removing a backslash from a string
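I have a string that is a sentence like
I don't want it, there'll be others
but the text actually looks like this:
I don\'t want it, there\'ll be other
For some reason, a \ shows up next to every ' in the text. It was read in from another source. I want to remove it, but I can't. I have tried: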
sentence.replace("\'","'")
sentence.replace(r"\'","'")
sentence.replace("\","")
sentence.replace(r"\","")
sentence.replace(r"\\","")
I know that \ is used to escape something, so I'm not sure how to do this with the quotes.
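The \ is only there to escape the ' character. It is visible only in the representation (repr) of the string; it is not actually a character in the string. See the following demo: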
>>> repr("I don't want it, there'll be others")
'"I don\'t want it, there\'ll be others"'
>>> print("I don't want it, there'll be others")
I don't want it, there'll be others
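One way to check whether a backslash is really stored in the string, rather than being added by repr, is to test for it directly. A minimal sketch; the names `shown` and `stored` are made up for illustration:

```python
# Written with \' escapes, but the escape is resolved when the literal is parsed,
# so no backslash ends up in the string.
shown = 'I don\'t want it'
# A raw string literal keeps the backslash as a real character in the data.
stored = r"I don\'t want it"

print("\\" in shown, len(shown))    # False 15
print("\\" in stored, len(stored))  # True 16
```

If the membership test is False for your own data, there is nothing to remove and the backslash is only an artifact of how the string is displayed.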
Try using:
sentence.replace("\\", "")
You need two backslashes because the first acts as the escape character and the second is the character you want to replace.
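Keep in mind that str.replace returns a new string and leaves the original untouched, so the result has to be assigned to something. A short sketch, assuming the backslashes really are in the data (simulated here with a raw string literal; `cleaned` is just an illustrative name):

```python
# Raw literal: the backslashes are real characters in the string.
sentence = r"I don\'t want it, there\'ll be other"

sentence.replace("\\", "")            # result is discarded; sentence is unchanged
cleaned = sentence.replace("\\", "")  # "\\" denotes a single backslash character

print(cleaned)  # I don't want it, there'll be other
```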
Better to use a regular expression to remove the backslashes:
>>> import re
>>> re.sub(r"\\'", "'", r"I don\'t want it, there\'ll be other")
"I don't want it, there'll be other"
If your text was scraped and you did not clean it by unescaping before processing it with NLP tools, you can easily unescape the HTML markup, e.g.:
In Python 2.x:
>>> import sys; sys.version
'2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]'
>>> import HTMLParser
>>> txt = """I don\'t want it, there\'ll be other"""
>>> HTMLParser.HTMLParser().unescape(txt)
"I don't want it, there'll be other"
In Python 3:
>>> import sys; sys.version
'3.4.0 (default, Jun 19 2015, 14:20:21) \n[GCC 4.8.2]'
>>> import html
>>> txt = """I don\'t want it, there\'ll be other"""
>>> html.unescape(txt)
"I don't want it, there'll be other"
See also: How do I unescape HTML entities in a string in Python 3.1?
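To make the HTML unescaping suggestion concrete, here is a small Python 3 sketch with input that actually contains entities. The sample strings are invented for illustration; note that html.unescape only decodes HTML character references and does not touch literal backslashes.

```python
import html

# Invented sample: apostrophes encoded as numeric character references.
scraped = "I don&#39;t want it, there&#x27;ll be others &amp; more"
print(html.unescape(scraped))  # I don't want it, there'll be others & more

# A literal backslash is not an HTML entity, so it passes through unchanged.
with_backslash = r"I don\'t want it"
print(html.unescape(with_backslash) == with_backslash)  # True
```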