Removing extra space between the word and apostrophe
I have a list of strings that contain verb contractions.
My list looks like this:
["What 's your name?", "Isn 't it beautiful?",...]
I want to remove the space between the word and the apostrophe, so the new list would be:
["What's your name?", "Isn't it beautiful?",...]
I used replace(), but the list contains 5,500 strings with different forms of contractions, and the code below only replaces one form:
s = s.replace("'s","is")
What should I do to remove the extra space between the word and the apostrophe?
This should do it:
l = ["What 's your name?", "Isn 't it beautiful"]
lNew = [i.replace(" '","'") for i in l]
This gives:
lNew = ["What's your name?", "Isn't it beautiful"]
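You appear to use the same symbol for the apostrophe and the string delimiter, but I'm sure they differ in your program.
Does this help?
You can try using a regular expression this way. (It also removes runs of more than one space before the apostrophe, but it will not fix cases like do n't that you mentioned in the comments.)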
import re
s = ["What 's your name?","Isn 't it beautiful?"]
s = [re.sub(r'\s+\'', "'", i) for i in s]
The output will be:
>>> s
["What's your name?", "Isn't it beautiful?"]
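To make the difference between the two answers above concrete, here is a small comparison sketch. The sample strings (with three spaces and a tab before the apostrophe) are made up for illustration and are not from the original list.

```python
import re

# Made-up inputs: three spaces in the first string, a tab in the second.
samples = ["What   's your name?", "Isn\t't it beautiful?"]

# str.replace removes exactly one space directly before the apostrophe.
with_replace = [s.replace(" '", "'") for s in samples]

# \s+ removes any run of whitespace (spaces, tabs) before the apostrophe.
with_regex = [re.sub(r"\s+'", "'", s) for s in samples]

print(with_replace)
print(with_regex)
```

Both variants also remove the space before an opening single quote (for example in He said 'hi'), so they are probably only safe when the text contains no quoted phrases.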
(?<=[a-zA-Z])\s+(?=[a-z]*'\s*[a-z])
You can try this:
https://regex101.com/r/18GHqw/1
import re
regex = r"(?<=[a-zA-Z])\s+(?=[a-z]*'\s*[a-z])"
test_str = ("'What 's your name?','Isn 't it beautiful?'\n\n"
            "Jesus ' cross\"\n"
            "do n't\"\n"
            "sdsda sdsd' sdsd")
matches = re.finditer(regex, test_str)
for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
Note: for Python 2.7 compatibility, prefix the regex with ur"" and prefix the test string and the substitution with u"".
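One thing to watch for: if the lookaround pattern above is used with re.sub, it also matches the space in front of an already-correct contraction, so "I don't know" would become "Idon't know". A narrower alternative, offered only as a sketch, is to join the space solely before a known contraction suffix. The suffix list and the helper name join_contractions below are assumptions for illustration, not part of the original answers, and the list would likely need extending for real data.

```python
import re

# Assumed suffix list (not from the original post); extend it for your data.
CONTRACTION = re.compile(
    r"(?<=[A-Za-z])\s+(n't|'(?:s|t|re|ve|ll|d|m))\b",
    re.IGNORECASE,
)

def join_contractions(text):
    """Remove whitespace between a word and a known contraction suffix."""
    return CONTRACTION.sub(r"\1", text)

samples = [
    "What 's your name?",
    "Isn 't it beautiful?",
    "do n't",
    "I don't know",   # already correct, should stay unchanged
    "He said 'hi'",   # quoted word, should stay unchanged
]
cleaned = [join_contractions(s) for s in samples]
print(cleaned)
```

Because only listed suffixes are joined, a case like Jesus ' cross is left untouched by this sketch and would need its own rule.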