Removing extra space between the word and apostrophe

Question

I have a list of strings that contain verb contractions. My list looks like this:

["What 's your name?", "Isn 't it beautiful?",...]

I want to remove the space between the word and the apostrophe, so the new list would be:

["What's your name?", "Isn't it beautiful?",...]

I used replace(), but the list contains 5,500 strings with many different forms of contraction. The code below only replaces one form of contraction.

s = s.replace("'s","is")

What should I do to remove the extra space between the word and the apostrophe?

Answer 1

This should do it:

l = ["What 's your name?", "Isn 't it beautiful"]
lNew = [i.replace(" '","'") for i in l]

This gives:

lNew = ["What's your name?", "Isn't it beautiful"]

You seem to be using the same symbol for the apostrophe and for the string delimiter, but I'm sure they are different in your program.
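If the apostrophe in the real data is a different character from the straight quote, the same replace() idea can be applied to each candidate character in turn. The following is only a sketch: the helper name `strip_space_before_apostrophe` is hypothetical, and the assumption that the data may contain the typographic apostrophe U+2019 does not come from the question.

```python
def strip_space_before_apostrophe(lines, apostrophes=("'", "\u2019")):
    """Remove a single space that sits directly before an apostrophe.

    `apostrophes` lists the characters treated as apostrophes; U+2019 is
    included as an assumption about what the real data might contain.
    """
    cleaned = []
    for line in lines:
        for mark in apostrophes:
            # " '" becomes "'", and likewise for each other apostrophe character
            line = line.replace(" " + mark, mark)
        cleaned.append(line)
    return cleaned


l = ["What 's your name?", "Isn \u2019t it beautiful"]
lNew = strip_space_before_apostrophe(l)
print(lNew)
```

Like the list comprehension in this answer, this only removes one space directly before the apostrophe, so a form such as "do n't" is left unchanged.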

Does this help?

Answer 2

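You can try a regular expression in this way. (It also collapses runs of several spaces before the apostrophe, but it will not handle cases like the "do n't" you mentioned in the comments.)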

import re
s = ["What 's your name?", "Isn 't it beautiful?"]
s = [re.sub(r'\s+\'', "'", i) for i in s]

The output will be:

>>> s
["What's your name?", "Isn't it beautiful?"]
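To make the scope of this pattern concrete, here is a small sketch that applies the same `\s+'` substitution to a few extra inputs. The helper name `close_up` and the sample sentences are illustrative only and are not part of the answer.

```python
import re

# Same pattern as in the answer: one or more whitespace characters
# immediately followed by a straight apostrophe.
SPACE_BEFORE_APOSTROPHE = re.compile(r"\s+'")


def close_up(lines):
    """Replace whitespace-plus-apostrophe with a bare apostrophe."""
    return [SPACE_BEFORE_APOSTROPHE.sub("'", line) for line in lines]


samples = [
    "What   's your name?",  # several spaces are collapsed
    "I do n't know",         # no space before the apostrophe, so unchanged
    "She said 'hello'",      # an opening quote is also pulled back
]
result = close_up(samples)
print(result)
```

The last sample shows a side effect worth checking for in the data: any single-quoted word loses the space before its opening quote, because the pattern cannot tell a quotation mark from an apostrophe.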

Answer 3

(?<=[a-zA-Z])\s+(?=[a-z]*'\s*[a-z])

You can try this:

https://regex101.com/r/18GHqw/1

import re

regex = r"(?<=[a-zA-Z])\s+(?=[a-z]*'\s*[a-z])"

test_str = ("'What 's your name?','Isn 't it beautiful?'\n\n"
"Jesus ' cross\"\n"
"do n't\"\n"
"sdsda   sdsd'  sdsd")

# Report every run of whitespace that the pattern would remove.
matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # The pattern defines no capture groups, so this loop does nothing here;
    # it only matters if groups are added to the regex.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

Note: for Python 2.7 compatibility, prefix the regex with ur"" and prefix the test string and the substitution with u"".
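The script in this answer only reports where the pattern matches; it does not change the strings. A minimal sketch of the substitution step, using the same regex with re.sub, might look like this. The helper name `join_contractions` and the sample list are assumptions made for illustration.

```python
import re

# Same pattern as in the answer: whitespace that follows a letter and is
# followed by optional lowercase letters, an apostrophe, optional
# whitespace, and a lowercase letter.
PATTERN = re.compile(r"(?<=[a-zA-Z])\s+(?=[a-z]*'\s*[a-z])")


def join_contractions(lines):
    """Delete every whitespace run matched by PATTERN."""
    return [PATTERN.sub("", line) for line in lines]


samples = ["What 's your name?", "Isn 't it beautiful?", "do n't", "Jesus ' cross"]
fixed = join_contractions(samples)
print(fixed)

# Caveat: the lookahead also matches the space in front of a contraction
# that is already written correctly, so the preceding word gets glued on.
overmatched = join_contractions(["I don't know"])
print(overmatched)
```

Because of that caveat, it seems safer to run the pattern only on text known to contain the split forms, or to review a sample of the changed strings before applying it to all 5,500.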

python

string

text-mining