Remove special characters and substrings from strings in a column

Question

I'm pretty new to all this. I'm using Python with the pandas library to work with large datasets that look like this, for example:

           date                              text
0   Jul 31 2020       "test sentence numerouno"
1   Jul 31 2020       (second sentence) unonumero
2   Jul 31 2020       testuno sentence!!!

Now I'm looking for a function or loop that removes a defined set of substrings as well as special characters.

So specifically, sticking with this example, I want to remove all the special characters " ( ) ! as well as the substring uno from the 'text' column.

So the output should look like this:

           date                         text
0   Jul 31 2020       test sentence numero
1   Jul 31 2020       second sentence numero
2   Jul 31 2020       test sentence

Thanks for your help! <3

Answer 1

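You can use str.replace with the following pattern, which removes every run of characters that is not a letter or a space, as well as the substring uno: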

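# regex=True is required in pandas 2.0+, where str.replace matches literally by default
df['text'] = df['text'].str.replace(r'[^ A-Za-z]+|uno', '', regex=True)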

print(df.text)
0      test sentence numero
1    second sentence numero
2             test sentence
Name: text, dtype: object

See the demo.
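Since the question asks for a defined set of substrings and characters rather than "everything that is not a letter", here is a minimal sketch of that variant. It rebuilds the sample frame from the question; the names special_chars, substrings, and pattern are illustrative assumptions, not part of the original answer. Each item is passed through re.escape so that characters such as ( and ) are matched literally, and a final whitespace cleanup is added as an extra step.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["Jul 31 2020"] * 3,
    "text": [
        '"test sentence numerouno"',
        "(second sentence) unonumero",
        "testuno sentence!!!",
    ],
})

# Hypothetical lists of things to strip; adjust to your own data.
special_chars = ['"', "(", ")", "!"]
substrings = ["uno"]

# Escape each item so regex metacharacters are treated as literal text,
# then join everything into a single alternation pattern.
pattern = "|".join(re.escape(s) for s in special_chars + substrings)

df["text"] = (
    df["text"]
    .str.replace(pattern, "", regex=True)   # drop listed characters and substrings
    .str.replace(r"\s+", " ", regex=True)   # collapse repeated whitespace
    .str.strip()                            # trim leading/trailing spaces
)

print(df["text"].tolist())
```

Unlike the [^ A-Za-z]+ pattern, this keeps digits and any punctuation that is not explicitly listed, which may or may not be what you want.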

python

substring

pandas