Filtering a column of strings by a list without doing an exact match
I have a pandas DataFrame that looks like this:
Tweets
0 RT @cizzorz: THE CHILLER TRAP *TEMPLE RUN* OBS...
1 Disco Domination receives a change in order to...
2 It's time for the Week 3 #FallSkirmish Trials!...
3 Dance your way to victory in the new Disco Dom...
4 Patch v6.02 is available now with a return fro...
5 Downtime for patch v6.02 has begun. Find out a...
6 ⛏️... soon
7 Launch into patch v6.02 Wednesday, October 10!...
8 Righteous Fury.\n\nThe Wukong and Dark Vanguar...
9 RT @wbgames: WB Games is happy to bring @Fortn...
I also have a list, say:
my_list = ['Launch', 'Dance', 'Issue']
I filter the DataFrame with the following command:
ndata = data[data['Tweets'].str.contains( "|".join(my_list), regex=True)].reset_index(drop=True)
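To see why this command is sensitive to case, it may help to look at the pattern it builds. The sketch below uses only the standard `re` module, on the assumption that `str.contains` applies a regex search to each string; the `samples` list is hypothetical and is not taken from the real Tweets column.

```python
import re

my_list = ['Launch', 'Dance', 'Issue']

# "|".join(...) builds a regex alternation: 'Launch|Dance|Issue'
pattern = "|".join(my_list)

# Hypothetical sample strings, made up for illustration only
samples = [
    "Launch into patch",
    "launch day",
    "LAUNCH!",
    "@launch soon",
    "Launch, then dance",
]

# re.search matches substrings, but is case-sensitive by default
matches = [bool(re.search(pattern, s)) for s in samples]
for s, m in zip(samples, matches):
    print(f"{s!r}: {m}")
```

Only the strings containing the exact capitalization `Launch` match; trailing punctuation is not a problem because the search looks for substrings rather than whole words.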
However, the filter does not work properly in the following cases:
Working    Not Working
Launch     'launch', 'launch,', 'Launch,', 'LAUNCH', '@launch'
The expected output is every sentence that contains any of the following words:
'launch', 'launch,', 'Launch,', 'LAUNCH', '@launch'
You need to make sure that contains ignores case:
import re
.
.
.
ndata = data[data['Tweets'].str.contains("|".join(my_list), regex=True,
                                         flags=re.IGNORECASE)].reset_index(drop=True)
#                                        ^^^^^^^^^^^^^^^^^^^
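As a self-contained variant of the same idea, here is a minimal sketch that uses the `case=False` argument of `str.contains` instead of `flags=re.IGNORECASE`. The sample DataFrame is hypothetical, and the `re.escape` call and `na=False` are extra precautions that are not part of the original answer.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real Tweets column
data = pd.DataFrame({"Tweets": [
    "Launch into patch v6.02",
    "launch, coming soon",
    "LAUNCH day is here",
    "thanks @launch team",
    "Downtime has begun",
]})
my_list = ['Launch', 'Dance', 'Issue']

# re.escape guards against regex metacharacters in the search terms
pattern = "|".join(re.escape(word) for word in my_list)

# case=False makes the match case-insensitive; na=False treats missing values as non-matches
mask = data['Tweets'].str.contains(pattern, case=False, regex=True, na=False)
ndata = data[mask].reset_index(drop=True)
print(ndata)
```

With this sample data the first four rows are kept and the "Downtime" row is dropped, since it contains none of the words in any capitalization.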