Using re.sub to remove double quotes

Question

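I'm trying to remove double quotes from a text file in Python. The statement print re.sub(r'"', '', line) works in the interpreter, but not when I use it in a file. Why is that?

Straight from the interpreter: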

>>> 
>>> import re  
>>> str = "bill"  
>>> print re.sub(r'"', '', str)  
bill
>>>

From my .py file:

def remove_quotes (filename):  
    with open(filename, 'rU') as file:  
        print re.sub(r'"', '', file.read())

Output:

“Bill”  
“pretty good”       bastante bien  

“friendship”        amistad  
 “teenager”     adolescent

OK, as col6y pointed out, I'm dealing with fancy left/right quotes. Trying to get rid of them:

>>> line  
'\xe2\x80\x9cBill\xe2\x80\x9d\n'  
text = line.replace(u'\xe2\x80\x9c', '')  
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

Tried another character encoding:

text = line.replace(u"\u201c", '')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

Answer 1

In your interpreter example, you say:

>>> 
>>> import re  
>>> str = "bill"  
>>> print re.sub(r'"', '', str)  
bill
>>>

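However, the string "bill" doesn't contain any quotation marks, so this doesn't test anything. If you try print str, you'll see that it had no quotes to begin with: the quotes only mark the value as a string literal and are not part of the string itself. (You wouldn't always want quotes in your strings.) If you want the string to include quotes, you can write "\"bill\"" or '"bill"'.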
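As a quick check of the point above, here is a minimal sketch comparing the two literals. The variable names plain and quoted are illustrative only, and print is written with parentheses so the snippet also runs on Python 3:

```python
plain = "bill"      # the quotes delimit the literal; they are not part of the value
quoted = '"bill"'   # single-quoted literal that contains two double-quote characters

print(len(plain))               # 4
print(len(quoted))              # 6
print('"' in plain)             # False
print(quoted.replace('"', ''))  # bill
```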

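However, this doesn't explain the real problem in your other program. To understand that, note the difference between “, ”, and ". They look similar, but they are different characters, and to a computer they are entirely distinct. Your file contains “ and ”, but you are replacing ". You need to replace the other two as well.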

Also, as @MikeT pointed out, using file.read().replace(...) instead of re.sub(..., file.read()) would be easier. re.sub is meant for regular expressions, but you don't need their power here.
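The UnicodeDecodeError in the question most likely arises because, in Python 2, line is a byte string: calling replace on it with a unicode argument makes Python implicitly decode line as ASCII, which fails on byte 0xe2. Note also that u'\xe2\x80\x9c' is three separate code points, not the curly quote U+201C. The sketch below assumes the file is UTF-8 encoded (the bytes \xe2\x80\x9c are the UTF-8 encoding of U+201C), decodes first, and then removes all three quote characters. The names QUOTES, strip_quotes, and strip_quotes_re are hypothetical helpers, not part of any library:

```python
import re

raw = b'\xe2\x80\x9cBill\xe2\x80\x9d\n'  # the bytes shown in the question
text = raw.decode('utf-8')                # assumption: the file is UTF-8

# Straight double quote, left curly quote, right curly quote.
QUOTES = u'"\u201c\u201d'

def strip_quotes(s):
    """Remove the three quote characters with plain str.replace calls."""
    for q in QUOTES:
        s = s.replace(q, u'')
    return s

def strip_quotes_re(s):
    """Same result with re.sub and a character class, for comparison."""
    return re.sub(u'["\u201c\u201d]', u'', s)

# Both approaches agree on the sample line.
assert strip_quotes(text) == strip_quotes_re(text) == u'Bill\n'
```

For a fixed handful of characters the replace loop is arguably easier to read; the re.sub version is only worth it if the set of characters to remove grows.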

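You should also be aware that file.read() with no argument reads the entire file into memory as a single string. That works for small files, but for large ones consider iterating over the lines instead (for line in file, or file.readlines()).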
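Putting the pieces together, a line-by-line version of remove_quotes might look like the following. This is a sketch under stated assumptions: the encoding parameter and its 'utf-8' default are guesses about the input file, and the function returns the cleaned text instead of printing it. io.open is used because it accepts an encoding argument on both Python 2 and Python 3:

```python
import io

def remove_quotes(filename, encoding='utf-8'):
    """Return the file's text with straight and curly double quotes removed.

    Assumption: the file is encoded as `encoding` (UTF-8 by default).
    """
    cleaned = []
    with io.open(filename, 'r', encoding=encoding) as f:
        for line in f:  # iterate lazily instead of reading everything at once
            for q in u'"\u201c\u201d':
                line = line.replace(q, u'')
            cleaned.append(line)
    return u''.join(cleaned)
```

Calling print(remove_quotes('words.txt')) would then print the cleaned text, where words.txt is a placeholder file name.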
