Match unescaped quotes in quoted csv

Question

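I looked through several Stack Overflow posts with similar titles, and none of the accepted answers worked for me.

I have a CSV file in which every "cell" of data is separated by commas and quoted (including numbers). Each line ends with a newline character.

Some of the text "cells" contain quotes, and I want to find these with a regular expression so that I can escape them properly.

Example line: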

"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n

I only want to match the " in E 60" and in AD"8, and none of the other " characters.

What regular expression (preferably Python-friendly) can I use to do this?

Answer 1

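Edit: updated with the regex from @sundance to avoid matching at the start of the line and before the newline.

You could try replacing only the quotes that are not next to a comma, the start of the line, or the newline: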

import re

newline = re.sub(r'(?<!^)(?<!,)"(?!,|$)', '', line)
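To see what the pattern matches, here is a minimal sketch that runs it against the example line from the question. The `fix_quotes` helper, its `replacement` parameter, and the `STRAY_QUOTE` name are made up for illustration; only the pattern comes from the snippet above. Note that the substitution above deletes the stray quotes. Passing `'""'` as the replacement doubles them instead, which is the escaping Python's `csv` module reads by default.

```python
import csv
import re

# Same pattern as above: a quote that is not at the start of the line,
# not preceded by a comma, and not followed by a comma or the end of the line.
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')

def fix_quotes(line, replacement=''):
    """Hypothetical helper: substitute every stray quote in one CSV line."""
    return STRAY_QUOTE.sub(replacement, line)

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'

removed = fix_quotes(line)        # stray quotes deleted, as in the answer
doubled = fix_quotes(line, '""')  # stray quotes doubled instead

print(repr(removed))
print(repr(doubled))

# The doubled form parses with the csv module's default dialect.
row = next(csv.reader([doubled]))
print(row)
```

One caveat, as far as I can tell: a stray quote that sits directly next to a comma inside a cell looks the same as a surrounding quote to this pattern, so it will not be matched.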

Answer 2

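This does not use a regular expression. Instead, it uses Python's string functions to find and escape the quotes that lie between a string's leftmost and rightmost quotes.

It uses the string's .find() and .rfind() methods to locate the surrounding " characters. It then performs the replacement on any other " characters that appear inside those outer quotes. Doing it this way makes no assumption about where the surrounding quotes sit relative to the , separators, so it leaves any surrounding whitespace unchanged (for example, it leaves the '\n' at the end of each line as-is).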

def escape_internal_quotes(item):
    left = item.find('"') + 1
    right = item.rfind('"')
    if left < right:
        # only do the substitution if two surrounding quotes are found
        item = item[:left] + item[left:right].replace('"', '\\"') + item[right:]
    return item

line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'
escaped = [escape_internal_quotes(item) for item in line.split(',')]
print(repr(','.join(escaped)))

This results in the following (repr shows each backslash doubled):

'"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60\\"","AD\\"8"\n'
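As a quick check that a backslash-escaped line can be read back, here is a sketch that assumes the consumer is Python's `csv` module. Backslash escaping is not that module's default, so `escapechar` has to be set explicitly. The `escaped_line` and `row` names are only for illustration.

```python
import csv

# The escaped line, written as a Python literal: each \\ below is a
# single backslash in the actual string.
escaped_line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60\\"","AD\\"8"\n'

# Tell the reader that a backslash escapes the next character and that
# quotes are not escaped by doubling.
reader = csv.reader([escaped_line], escapechar='\\', doublequote=False)
row = next(reader)
print(row)
```

The inner quotes come back as part of the cell values, and the empty cell between the two adjacent commas is read as an empty string.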


Tags: python, regex, csv, regex-lookarounds