Python str.maketrans Remove Punctuation with Empty Space
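I am using str.maketrans together with the string module in Python 3 for some simple text preprocessing: lowercasing, and removing digits and punctuation. The problem is that when the punctuation is removed, all the words end up joined together with no space between them! For example, say I have the following text: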
text='[{"Hello":"List:","Test"321:[{"Hello":"Airplane Towel for Kitchen"},{"Hello":2 " Repair massive utilities "2},{"Hello":"Some 3 appliance for our kitchen"2}'
import string
text=text.lower()
text=text.translate(str.maketrans('','',string.digits))
This works fine and gives:
'[{"hello":"list:","test":[{"hello":"airplane towel for kitchen"},{"hello": " repair massives utilities "},{"hello":"some appliance for our kitchen"}'
But as soon as I try to remove the punctuation:
text=text.translate(str.maketrans('','',string.punctuation))
it gives me this:
'hellolisttesthelloairplane towel for kitchenhello nbsprepair massives utilitiesnbsphellosome appliance for our kitchen'
Ideally it should produce:
'hello list test hello airplane towel for kitchen hello nbsp repair massives utilities nbsp hello some appliance for our kitchen'
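I have no particular reason for doing this with maketrans, but I like it because it is fast and simple, and I am a bit stuck on solving this. Thanks!
Disclaimer: I already know how to do it with re, as follows: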
import re
s = "string.]With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
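Note that the re call above also replaces punctuation with an empty string, so it glues words together in the same way. A minimal sketch of a variant that substitutes a space instead and then collapses the extra whitespace is shown here; the function name is hypothetical, and keep in mind that \w matches the underscore, so "_" is kept, unlike with string.punctuation.

```python
import re

def regex_strip_punctuation(s: str) -> str:
    """Replace each run of non-word, non-space characters with one space."""
    spaced = re.sub(r'[^\w\s]+', ' ', s)
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return ' '.join(spaced.split())

print(regex_strip_punctuation("string.]With. Punctuation?"))
# → string With Punctuation
```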
Well... this works:
txt = text.translate(str.maketrans(string.punctuation, ' ' * len(string.punctuation))).replace(' '*4, ' ').replace(' '*3, ' ').replace(' '*2, ' ').strip()
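The chained replace() calls above only shrink runs of spaces up to a limited length, so a long run of punctuation can still leave double spaces behind. A possible alternative, sketched here under the assumption that any amount of whitespace should collapse to one space, is to let " ".join(text.split()) do the collapsing. The three-argument form of str.maketrans also allows deleting digits in the same pass. The names CLEAN_TABLE and clean are made up for this illustration.

```python
import string

# One translation table: each punctuation character maps to a space,
# and every digit (third argument) is deleted outright.
CLEAN_TABLE = str.maketrans(
    string.punctuation,
    ' ' * len(string.punctuation),
    string.digits,
)

def clean(text: str) -> str:
    """Lowercase, drop digits, turn punctuation into single spaces."""
    spaced = text.lower().translate(CLEAN_TABLE)
    # split() with no arguments splits on any whitespace run, so the
    # join leaves exactly one space between words and none at the ends.
    return ' '.join(spaced.split())

print(clean('[{"Hello":"List:","Test"321:[{"Hello":"Airplane Towel"}'))
# → hello list test hello airplane towel
```

Because split() handles runs of any length, no fixed number of replace() calls is needed.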