pandas: return a value from a list column based on a condition in another list column
I have a large DataFrame with list columns similar to the one below, but with more rows and columns:
import pandas as pd
data = {'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is'], ['fourth', 'value', 'is']],
        'Second': [['adj', 'noun'], ['adj', 'noun'], ['adj', 'noun', 'verb'], ['adj', 'noun', 'verb']]}
df = pd.DataFrame(data, columns=['First', 'Second'])
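I want to return the value from the first column based on a condition in the second column. Specifically, I'd like a third column that holds the element of First at the position where Second equals 'adj', as shown below.
Desired third column:
First
second
third
fourth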
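Since my dataset is large, I have at least tried filtering it down to the rows that contain the value 'adj', but I don't know how to continue from there: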
df[['First','Second']][df['Second'].map(set(['adj']).issubset)]
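The filter above works as a first step: mapping set(['adj']).issubset over the Second column yields one boolean per row, and that mask can index the frame. A minimal sketch; the third row is changed here (an illustrative assumption, not the question's data) so that it lacks 'adj' and the mask has something to exclude:

```python
import pandas as pd

data = {'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is'], ['fourth', 'value', 'is']],
        # Third row deliberately has no 'adj' (illustrative change to the question's data)
        'Second': [['adj', 'noun'], ['adj', 'noun'], ['noun', 'noun', 'verb'], ['adj', 'noun', 'verb']]}
df = pd.DataFrame(data)

# set.issubset accepts any iterable, so each list is tested directly: is {'adj'} contained in it?
mask = df['Second'].map(set(['adj']).issubset)

# Boolean indexing keeps only the rows whose Second list contains 'adj'
filtered = df.loc[mask, ['First', 'Second']]

print(mask.tolist())            # [True, True, False, True]
print(filtered.index.tolist())  # [0, 1, 3]
```

The filtered frame still holds whole lists, so a per-row lookup of the position of 'adj' is still needed to pull out a single value.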
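If 'adj' is always present in every list, get its position in the second list with .index and select the element at that position from the first list: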
df['new'] = [a[b.index('adj')] for a, b in df[['First','Second']].to_numpy()]
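This version relies on that assumption: list.index raises ValueError when the item is absent, so a single row without 'adj' aborts the whole list comprehension and no column is created. A small sketch with made-up data showing the failure:

```python
import pandas as pd

# Illustrative data: the second row has no 'adj'
df = pd.DataFrame({'First': [['First', 'value'], ['third', 'value', 'is']],
                   'Second': [['adj', 'noun'], ['noun', 'noun', 'verb']]})

error_raised = False
try:
    df['new'] = [a[b.index('adj')] for a, b in df[['First', 'Second']].to_numpy()]
except ValueError as exc:
    # list.index raises ValueError for a missing item
    error_raised = True
    print('ValueError:', exc)

print(error_raised)         # True
print('new' in df.columns)  # False: the assignment never ran
```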
More generally, if 'adj' may be missing from some lists:
df['new'] = [a[b.index('adj')] if 'adj' in b else None
for a, b in df[['First','Second']].to_numpy()]
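The same conditional lookup can be wrapped in a small function so that the tag and the column names become parameters. pick_by_tag below is a hypothetical helper, not a pandas API; it iterates with zip over the two Series instead of converting to a NumPy array first, and it returns the element at the first matching position (the data is illustrative):

```python
import pandas as pd


def pick_by_tag(df, values_col, tags_col, tag, default=None):
    """Hypothetical helper: for each row, return the element of values_col at the
    position of the first occurrence of tag in tags_col, or default if tag is absent."""
    return [vals[tags.index(tag)] if tag in tags else default
            for vals, tags in zip(df[values_col], df[tags_col])]


df = pd.DataFrame({'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is']],
                   'Second': [['adj', 'noun'], ['noun', 'adj'], ['noun', 'noun', 'verb']]})

# One new column per tag of interest
df['adj'] = pick_by_tag(df, 'First', 'Second', 'adj')
df['noun'] = pick_by_tag(df, 'First', 'Second', 'noun')
print(df[['adj', 'noun']])
```

Only the first match is used: in the third row 'noun' appears twice, and the lookup picks 'third' from position 0.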
Alternative with apply:
f = lambda x: x['First'][x['Second'].index('adj')] if 'adj' in x['Second'] else None
df['new'] = df.apply(f, axis=1)
print(df)
First Second new
0 [First, value] [adj, noun] First
1 [second, value] [adj, noun] second
2 [third, value, is] [adj, noun, verb] third
3 [fourth, value, is] [adj, noun, verb] fourth
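For a large frame, another option (not part of the answer above) is to reshape instead of looping in Python: DataFrame.explode accepts a list of columns since pandas 1.3, turning each row into one row per (value, tag) pair while keeping the original index. This requires the two lists in each row to have the same length. A sketch, with a fifth row added as an assumption to show the missing-'adj' case; whether it is faster than the list comprehension depends on the data and is worth measuring:

```python
import pandas as pd

data = {'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is'],
                  ['fourth', 'value', 'is'], ['fifth', 'value']],
        'Second': [['adj', 'noun'], ['adj', 'noun'], ['adj', 'noun', 'verb'],
                   ['adj', 'noun', 'verb'], ['noun', 'noun']]}
df = pd.DataFrame(data)

# One row per paired element; the original row label is repeated as the index
long = df[['First', 'Second']].explode(['First', 'Second'])

# Keep the values tagged 'adj', then take the first such value per original row
first_adj = long.loc[long['Second'] == 'adj', 'First'].groupby(level=0).first()

# Assignment aligns on the index, so rows with no 'adj' get NaN
df['new'] = first_adj
print(df)
```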