SMS language text expander - Pandas

Question

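The goal is to replace SMS abbreviations in a text with their expansions. I do this by comparing each word against column values stored in pandas, which I read into Python from an xlsx file: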

word    expansion
fyi     for your information
gtg     got to go
brb     be right back
gtg2    got to go too
fyii    sample test

Current attempt:


import re
import pandas as pd

sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion))  # convert into dictionary
words = "fyi gtg gtg2 fyii really "
# escape the keys so they are safe to use inside a regex
rep = dict((re.escape(k), v) for k, v in rep.items())
# one alternation of all abbreviations, with no word boundaries
pattern = re.compile("|".join(rep.keys()))
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)

Output:

for your information got to go got to go2 for your informationi really

Expected output:

for your information got to go got to go too sample test really

How can I check word by word, so that only whole words are replaced?

Answer 1

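I'm not sure this is exactly what you want, but you can try putting a word boundary (\b) at the end of each word in your pattern, so that only whole words are considered: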

import re
import pandas as pd

sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion))  # convert into dictionary
words = "fyi gtg gtg2 fyii really "
rep = dict((re.escape(k), v) for k, v in rep.items())
# append \b to every alternative so "fyi" no longer matches inside "fyii"
pattern = re.compile(r"\b|".join(rep.keys()) + r"\b")  # This line changes
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)

Output:

for your information got to go got to go too sample test really
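
One caveat with the pattern above: with \b only at the end of each alternative, an abbreviation can still match at the tail of a longer word (for example the "fyi" in "notfyi"). Below is a minimal sketch of a variant that puts a boundary on both sides. It is not part of the original answer; the `expansions` dict is a hypothetical stand-in for the two spreadsheet columns, and `expand_sms` is just an illustrative helper name.

```python
import re

# Hypothetical stand-in for the word/expansion columns of expansion.xlsx.
expansions = {
    "fyi": "for your information",
    "gtg": "got to go",
    "brb": "be right back",
    "gtg2": "got to go too",
    "fyii": "sample test",
}


def expand_sms(text, mapping):
    """Replace whole-word abbreviations in text with their expansions."""
    if not mapping:
        # An empty alternation would match at every word boundary.
        return text
    # Longest keys first, so "gtg2" is tried before "gtg". This is defensive:
    # the boundaries below already rule out partial matches for these keys.
    keys = sorted(mapping, key=len, reverse=True)
    # \b on both sides: a match must start and end at a word boundary.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b"
    )
    # The matched text is the original key, so it can be looked up directly.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(expand_sms("fyi gtg gtg2 fyii really", expansions))
print(expand_sms("notfyi fyi", expansions))
```

Keep in mind that \b treats letters, digits and underscore as word characters, so abbreviations containing punctuation may need a different boundary test.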

Tags: python, text, nltk, str-replace, pandas