Create a column with multiple selection criteria based on contained string values
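I have a large DataFrame and I am trying to create a column that returns the total_amount value when the following conditions are met: the first column contains any of the values in the list val1, and any of the other columns (second, third, fourth) contains any of the values in the lists targets and targets2.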
    first     second    third        fourth  total_amount
    Top;Tier  None      FIT,Special  Rising  5000
    Internal  None      None         Black   6000
    None      Existing  None         Pink    800
    def func(row):
        val1 = ['primary', 'Internal', 'found', 'Led', 'Yes - found']
        targets = ['Top', 'Special', 'FIT', 'Global', 'Silver', 'Gold']
        targets2 = ['Top', 'Gold', 'Beginner', 'Rising', 'Global', 'Excluded']
        if row['first'].str.contains('|'.join(val1)) and \
           (row['second'].str.contains('|'.join(targets))
                or row['third'].str.contains('|'.join(targets))
                or row['fourth'].str.contains('|'.join(targets2))):
            return row['total_amount']
        else:
            return 0

    df['verified_amount'] = df.apply(func, axis=1)
Expected output:

    first     second    third        fourth  total_amount  verified_amt
    Top;Tier  None      FIT,Special  Rising  5000          5000
    Internal  None      None         Black   6000          6000
    None      Existing  None         Pink    800           0
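A note on the row-wise attempt: inside `df.apply(func, axis=1)` each `row['first']` is a single Python value, not a Series, so it has no `.str` accessor and the call raises `AttributeError`. Below is a minimal sketch of a row-wise version that works. It assumes the `None` entries in the sample are real missing values, and `contains_any` is a hypothetical helper introduced here for illustration, not part of pandas.

```python
import re

import pandas as pd

val1 = ['primary', 'Internal', 'found', 'Led', 'Yes - found']
targets = ['Top', 'Special', 'FIT', 'Global', 'Silver', 'Gold']
targets2 = ['Top', 'Gold', 'Beginner', 'Rising', 'Global', 'Excluded']


def contains_any(value, words):
    """Hypothetical helper: True if value is a string containing any word."""
    if not isinstance(value, str):
        return False  # None / NaN never match
    # re.escape keeps characters such as '-' or '.' literal in the pattern
    pattern = '|'.join(re.escape(w) for w in words)
    return re.search(pattern, value) is not None


def func(row):
    # Strict reading of the question: first matches AND any other column matches
    first_ok = contains_any(row['first'], val1)
    other_ok = (contains_any(row['second'], targets)
                or contains_any(row['third'], targets)
                or contains_any(row['fourth'], targets2))
    return row['total_amount'] if first_ok and other_ok else 0


# Sample data from the question (None assumed to be missing values)
df = pd.DataFrame({
    'first': ['Top;Tier', 'Internal', None],
    'second': [None, None, 'Existing'],
    'third': ['FIT,Special', None, None],
    'fourth': ['Rising', 'Black', 'Pink'],
    'total_amount': [5000, 6000, 800],
})
df['verified_amount'] = df.apply(func, axis=1)
print(df)
```

With the sample data this strict AND logic gives 0 for every row, which does not match the expected output, so the expected output appears to follow OR logic instead. Row-wise `apply` is also slow on a large DataFrame, so a vectorized approach is usually preferable.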
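You can create the conditions separately, optionally joining the second and third columns with Series.str.cat so that one test covers both, and then set the new column with numpy.where by chaining the conditions with | for bitwise OR or & for bitwise AND: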
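    import numpy as np

    val1 = ['primary', 'Internal', 'found', 'Led', 'Yes - found']
    targets = ['Top', 'Special', 'FIT', 'Global', 'Silver', 'Gold']
    targets2 = ['Top', 'Gold', 'Beginner', 'Rising', 'Global', 'Excluded']

    # na=False treats missing values as "no match", so each mask is purely boolean
    m1 = df['first'].str.contains('|'.join(val1), na=False)
    # second and third are joined into one string so a single test covers both
    m2 = df['second'].str.cat(df['third'], na_rep='').str.contains('|'.join(targets), na=False)
    m3 = df['fourth'].str.contains('|'.join(targets2), na=False)

    # OR of all three conditions: this reproduces the expected output
    df['verified_amount'] = np.where(m1 | m2 | m3, df['total_amount'], 0)

    # if you need AND combined with OR (gives a different output for the sample data):
    # df['verified_amount'] = np.where(m1 & (m2 | m3), df['total_amount'], 0)

    print(df)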
          first    second        third  fourth  total_amount  verified_amount
    0  Top;Tier      None  FIT,Special  Rising          5000             5000
    1  Internal      None         None   Black          6000             6000
    2      None  Existing         None    Pink           800                0
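To compare the two ways of combining the masks, here is a self-contained sketch that rebuilds the sample data and computes both variants side by side. It assumes the `None` entries are real missing values. The column names `verified_or` and `verified_and` and the `sep=' '` argument are additions made for this illustration: the separator keeps the end of `second` and the start of `third` from fusing into an accidental match.

```python
import numpy as np
import pandas as pd

# Sample data from the question (None assumed to be missing values)
df = pd.DataFrame({
    'first': ['Top;Tier', 'Internal', None],
    'second': [None, None, 'Existing'],
    'third': ['FIT,Special', None, None],
    'fourth': ['Rising', 'Black', 'Pink'],
    'total_amount': [5000, 6000, 800],
})

val1 = ['primary', 'Internal', 'found', 'Led', 'Yes - found']
targets = ['Top', 'Special', 'FIT', 'Global', 'Silver', 'Gold']
targets2 = ['Top', 'Gold', 'Beginner', 'Rising', 'Global', 'Excluded']

# One boolean mask per condition; na=False maps missing values to False
m1 = df['first'].str.contains('|'.join(val1), na=False)
joined = df['second'].str.cat(df['third'], sep=' ', na_rep='')
m2 = joined.str.contains('|'.join(targets), na=False)
m3 = df['fourth'].str.contains('|'.join(targets2), na=False)

# Variant 1: any condition is enough (OR)
df['verified_or'] = np.where(m1 | m2 | m3, df['total_amount'], 0)
# Variant 2: first must match AND one of the other columns must match
df['verified_and'] = np.where(m1 & (m2 | m3), df['total_amount'], 0)

print(df[['total_amount', 'verified_or', 'verified_and']])
```

On the sample data the OR variant yields 5000, 6000, 0, matching the expected output, while the AND variant yields 0 for all three rows because no row satisfies both the `first` condition and one of the others.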