Check for special characters in text with a Python function

Question

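To convert a text file into a feature data frame, I am writing a custom function. I now want the function to find question and exclamation marks in the text input and turn them into values in a df column. Part of my function looks like this: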

discount = ['[%]','[€]','[$]','[£]','korting','deal','discount','reduct','remise','voucher', 
            'descuento', 'rebaja', 'скидка', 'sconto','rabat','alennus','kedvezmény',
            '할인','折扣','ディスカウント','diskon']
data = [text_input.split()]

for word in data:
    if any(char in discount for char in word):
        df['discount'] = 1
    else:
        df['discount'] = 0
for word in data:
    if any(char == '!' for char in word):
        df['exclamation'] = 1
    else:
        df['exclamation'] = 0
for word in data:
    if any(char == '?' for char in word):
        df['question'] = 1
    else:
        df['question'] = 0

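The problem is that if the text input contains, for example, 'discount!', it recognizes neither the '!' nor the word 'discount', and both columns end up as 0. If I remove the '!' from 'discount', it does recognize the word.

So I would like to know how I need to split my text_input to make sure the '!' is stripped from the words. Or is there a more efficient way to find these characters?

Thanks in advance!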

Answer 1

For example, you can use a regular expression to split text_input at a space or a '!'. It is also easy to add extra special characters to the regular expression.

import re
data = re.split('[! ]', text_input)
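One detail worth checking when splitting this way: adjacent delimiters (such as '!' followed by a space) leave empty strings in the result. A minimal sketch, assuming a made-up sample input and adding '?' to the character class, that filters those out:

```python
import re

text_input = "Big discount! Only today?"  # hypothetical sample input

# Split on spaces, '!' and '?'. Adjacent delimiters produce empty strings,
# so drop those before using the tokens.
tokens = [t for t in re.split(r"[!? ]", text_input) if t]
print(tokens)  # ['Big', 'discount', 'Only', 'today']
```

Note that splitting discards the punctuation itself, so the '!' and '?' checks still have to run on the original string.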

Answer 2

Managed to solve it. Here is my updated, working code:

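import re

# Split into words on '*', '?', '!' and spaces (used for the keyword check)
data_str = [re.split(r'[*?! ]', text_input)]
# Collect every character that is not an ASCII letter or digit (used for the '!' and '?' checks)
data_chr = [re.findall(r'[^A-Za-z0-9]', text_input)]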

for word in data_str:
    if any(phrase in word for phrase in discount):
        df['discount'] = 1
    else:
        df['discount'] = 0
for word in data_chr:
    if '!' in word:
        df['exclamation'] = 1
    else:
        df['exclamation'] = 0
for word in data_chr:
    if '?' in word:
        df['question'] = 1
    else:
        df['question'] = 0
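As a possible alternative, the same three flags can be computed without splitting at all, by testing for substrings in the whole text. The sketch below is only an illustration: `text_flags` and `DISCOUNT_TERMS` are hypothetical names, and the symbols from the question's list are written without their square brackets on the assumption that they were meant as literal characters.

```python
# Keywords taken from the question. The symbols are written without the
# square brackets, since plain substring matching needs no regex syntax.
DISCOUNT_TERMS = ['%', '€', '$', '£', 'korting', 'deal', 'discount', 'reduct',
                  'remise', 'voucher', 'descuento', 'rebaja', 'скидка',
                  'sconto', 'rabat', 'alennus', 'kedvezmény', '할인', '折扣',
                  'ディスカウント', 'diskon']


def text_flags(text):
    """Return 0/1 flags for discount terms, '!' and '?' (hypothetical helper)."""
    lowered = text.lower()  # make the keyword check case-insensitive
    return {
        'discount': int(any(term in lowered for term in DISCOUNT_TERMS)),
        'exclamation': int('!' in text),
        'question': int('?' in text),
    }


flags = text_flags('discount!')
print(flags)  # {'discount': 1, 'exclamation': 1, 'question': 0}
```

Each value can then be assigned to its column, for example df['discount'] = flags['discount']. One trade-off: substring matching also fires inside longer words ('deal' matches 'ideal'), so splitting into words first may still be preferable if that matters.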

python

if-statement

special-characters

dataframe