How to filter a list of tuples for every row in a pandas dataframe?
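Hello, I'm trying to clean up my dataframe by filtering a list of tuples so that only the tuples whose second element starts with 'V' remain.
I have a pandas dataframe called 'df_my_string'.
A sample looks like this: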
verbs_tokens
[('[', 'NNS'), ("'Europe", "''"), ('was', 'VBD'), ('always', 'RB'), ('the', 'DT'), ('future', 'NN'), ('.', '.'), ("'", "''"), (']', 'NN')]
[('[', 'IN'), ("'Europe", 'CD'), ('marks', 'NNS'), ('its', 'PRP$'), ('anniversary', 'NN'), (',', ','), ('it', 'PRP'), ('is', 'VBZ')]
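What I need is to keep, for every row, only the tuples whose second value starts with "V".
I have tried many approaches, but I can't figure out how to do it: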
#df_my_string['clean_verbs_tokens']=filter((lambda x: x[1].startswith('V')),df_my_string[['verbs_tokens']])
#df_my_string['clean_verbs_tokens'] = df_my_string.verbs_tokens.apply(lambda x: str(x[0][1]).startswith('V'))
#df_my_string['clean_verbs_tokens'] = [tup for tup in df_my_string['verbs_tokens'] if str(tup[0][1])=='V']
#df_my_string['clean_verbs_tokens'] = [item for item in df_my_string['verbs_tokens'] if pd.Series(re.search('^V.*',item[0][1])).reset_index(drop=True).values]
Expected output:
verbs_tokens
[('was', 'VBD')]
[('is', 'VBZ')]
Try:
df_my_string['clean_verbs_tokens'] = df_my_string["verbs_tokens"].apply(lambda x: [t for t in x if t[1].lower().startswith("v")])
>>> df_my_string['clean_verbs_tokens']
0 [(was, VBD)]
1 [(is, VBZ)]
Name: clean_verbs_tokens, dtype: object
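For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the two sample rows from the question (how the real `df_my_string` is constructed is an assumption on my part) and applies the same list comprehension.

```python
import pandas as pd

# Sample rows copied from the question; the way the frame is built here is an assumption.
df_my_string = pd.DataFrame({
    "verbs_tokens": [
        [('[', 'NNS'), ("'Europe", "''"), ('was', 'VBD'), ('always', 'RB'),
         ('the', 'DT'), ('future', 'NN'), ('.', '.'), ("'", "''"), (']', 'NN')],
        [('[', 'IN'), ("'Europe", 'CD'), ('marks', 'NNS'), ('its', 'PRP$'),
         ('anniversary', 'NN'), (',', ','), ('it', 'PRP'), ('is', 'VBZ')],
    ]
})

# Keep only the (token, tag) pairs whose tag starts with "V" (case-insensitive).
df_my_string["clean_verbs_tokens"] = df_my_string["verbs_tokens"].apply(
    lambda pairs: [t for t in pairs if t[1].lower().startswith("v")]
)

print(df_my_string["clean_verbs_tokens"].tolist())
# [[('was', 'VBD')], [('is', 'VBZ')]]
```

The `.lower()` call makes the match case-insensitive; drop it if only upper-case tags should count.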
# This is wrong because x is a list of tuples,
# so the condition is applied only to
# the first tuple and yields a single True/False per row:
df_my_string['clean_verbs_tokens'] = df_my_string.verbs_tokens.apply(lambda x: str(x[0][1]).startswith('V'))
# Try this instead (startswith also copes with an empty tag, where tup[1][0] would raise IndexError):
df_my_string['clean_verbs_tokens'] = df_my_string.verbs_tokens.apply(lambda x: [tup for tup in x if tup[1].startswith("V")])
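To make the difference concrete, here is a small sketch that runs both versions side by side. The two short rows are made-up illustrative data (not the question's full sample), chosen so that one row happens to start with a verb.

```python
import pandas as pd

# Hypothetical, shortened rows for illustration only.
rows = pd.Series([
    [('[', 'NNS'), ('was', 'VBD'), ('always', 'RB')],
    [('is', 'VBZ'), ('it', 'PRP')],
])

# Original attempt: inspects only the tag of the first tuple, so each row becomes a bool.
first_only = rows.apply(lambda x: str(x[0][1]).startswith('V'))

# Fix: test every tuple in the row and keep the ones that match.
filtered = rows.apply(lambda x: [tup for tup in x if tup[1].startswith('V')])

print(first_only.tolist())  # [False, True]
print(filtered.tolist())    # [[('was', 'VBD')], [('is', 'VBZ')]]
```

The first result tells you only whether each row begins with a verb; the second is the per-row filtered list the question asks for.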
Here is a solution:
import pandas as pd

df = pd.DataFrame(
    {
        'Tuples': [
            [('[', 'IN'), ("'Europe", 'CD'), ('marks', 'NNS'), ('its', 'PRP$'), ('anniversary', 'NN'), (',', ','), ('it', 'PRP'), ('is', 'VBZ')],
            [('[', 'NNS'), ("'Europe", "''"), ('was', 'VBD'), ('always', 'RB'), ('the', 'DT'), ('future', 'NN'), ('.', '.'), ("'", "''"), (']', 'NN')],
        ]
    }
)
Define a function that finds the tuples whose second element starts with a given character:
def find_char(tuples, char):
    start_with_char = []
    for tp in tuples:
        # tp[1][:1] is the first character of the tag ('' if the tag is empty)
        if tp[1][:1] == char:
            start_with_char.append(tp)
    return start_with_char
Apply the function to your dataframe:
df['Tuples'].apply(lambda row: find_char(row ,'V') )
Result:
0 [(is, VBZ)]
1 [(was, VBD)]
Note: this solution gives you the list of tuples whose second element starts with the given character.
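If you ever need to match more than one tag prefix, a possible variation is sketched below. The name `find_prefix` and the shortened rows are hypothetical; the sketch relies only on the fact that `str.startswith` accepts either a single string or a tuple of strings.

```python
import pandas as pd

def find_prefix(tuples, prefixes):
    """Return the (token, tag) pairs whose tag starts with any of the given prefixes."""
    # str.startswith accepts a single string or a tuple of strings.
    return [tp for tp in tuples if tp[1].startswith(prefixes)]

# Hypothetical, shortened rows for illustration only.
df = pd.DataFrame({
    'Tuples': [
        [('marks', 'NNS'), ('it', 'PRP'), ('is', 'VBZ')],
        [('was', 'VBD'), ('always', 'RB'), ('future', 'NN')],
    ]
})

verbs = df['Tuples'].apply(lambda row: find_prefix(row, 'V'))
verbs_and_adverbs = df['Tuples'].apply(lambda row: find_prefix(row, ('V', 'RB')))

print(verbs.tolist())              # [[('is', 'VBZ')], [('was', 'VBD')]]
print(verbs_and_adverbs.tolist())  # [[('is', 'VBZ')], [('was', 'VBD'), ('always', 'RB')]]
```

Unlike the single-character comparison, this also works for prefixes longer than one character, such as 'VB'.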