SMS language text expander - Pandas
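The aim is to replace SMS abbreviations in a text with their expansions. I do this by comparing the text against column values held in pandas, which are read into Python from an xlsx file: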
word expansion
fyi for your information
gtg got to go
brb be right back
gtg2 got to go too
fyii sample test
Current attempt:
Courtesy:
import re
import pandas as pd
# Load the abbreviation table (columns: word, expansion)
sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion))  # convert into dictionary
words = "fyi gtg gtg2 fyii really "
# Escape the keys so they are matched literally inside the regex
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)
Output:
for your information got to go got to go2 for your informationi really
Expected output:
for your information got to go got to go too sample test really
How can I check word by word, so that only whole words are replaced?
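I'm not sure this is exactly what you want, but you could try putting a word boundary (\b) at the end of each word in your pattern, so that only whole words are considered: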
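import re
import pandas as pd
# Load the abbreviation table (columns: word, expansion)
sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion))  # convert into dictionary
words = "fyi gtg gtg2 fyii really "
# Escape the keys so they are matched literally inside the regex
rep = dict((re.escape(k), v) for k, v in rep.items())
# Append \b to every alternative so a key cannot match a prefix of a longer word
pattern = re.compile(r"\b|".join(rep.keys()) + r"\b")  # This line changes
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)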
Output:
for your information got to go got to go too sample test really
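The pattern above only anchors the end of each abbreviation, so a key could still match at the tail of a longer word (for example "fyi" inside "xfyi"). A minimal sketch of a slightly stricter variant, assuming every key starts and ends with a letter or digit so that \b behaves as expected: wrap the whole alternation in \b on both sides. The plain dict below is a stand-in for dict(zip(sdf.word, sdf.expansion)), and the expand helper is a hypothetical name, not part of the original code.

```python
import re

# Stand-in for dict(zip(sdf.word, sdf.expansion)) built from expansion.xlsx
rep = {
    "fyi": "for your information",
    "gtg": "got to go",
    "brb": "be right back",
    "gtg2": "got to go too",
    "fyii": "sample test",
}

# Longest keys first as a precaution, so a longer abbreviation is tried
# before any shorter key that is a prefix of it
keys = sorted(rep, key=len, reverse=True)

# \b on both sides: an abbreviation is replaced only when it is a whole word
pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")


def expand(text):
    """Replace every whole-word abbreviation in text with its expansion."""
    # The matched text is the unescaped key, so it can index rep directly
    return pattern.sub(lambda m: rep[m.group(0)], text)


print(expand("fyi gtg gtg2 fyii really "))
```

Because the lookup uses the matched text itself, the dictionary keys no longer need to be stored in escaped form; re.escape is applied only while building the pattern.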