COUNTIF in pandas (Python) for multiple columns with a wildcard
I have a dataset in Excel that I want to replicate.
My Python code is as follows:
from functools import reduce
import pandas as pd
data_frames = [df_mainstore, df_store_A, df_store_B]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=["Id_number"], how='outer'), data_frames)
print(df_merged)
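Since I am merging several data frames (the number and names of the columns may vary), it would also be tedious to write out every column, as is done in this example: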
isY = lambda x:int(x=='Y')
countEmail= lambda row: isY(row['Store Contact A']) + isY(row['Store B Contact'])
df['Contact Email'] = df.apply(countEmail,axis=1)
I am also struggling with this expression: isY = lambda x:int(x=='@')
How can I add a "Contact has Email" column in a way similar to Excel?
You can use filter to select the columns whose names contain Contact, then use str.contains with the right pattern, and finally any along each row, like this:
import pandas as pd

# data sample
df_merged = pd.DataFrame({'id': [0, 1, 2, 3],
                          'Store A': list('abcd'),
                          'Store Contact A': ['aa@bb.cc', '', 'e', 'f'],
                          'Store B': list('ghij'),
                          'Store B Contact': ['kk@ll.m', '', 'nn@ooo.pp', '']})
# define the pattern as in the link
pat = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
# create the column as wanted: True if any Contact column in the row matches
# (na=False treats missing values, e.g. from an outer merge, as non-matches)
df_merged['Contact has Email'] = (df_merged.filter(like='Contact')
                                  .apply(lambda x: x.str.contains(pat, na=False))
                                  .any(axis=1))
print(df_merged)
   id Store A Store Contact A Store B Store B Contact  Contact has Email
0   0       a        aa@bb.cc       g         kk@ll.m               True
1   1       b                       h                              False
2   2       c               e       i       nn@ooo.pp               True
3   3       d               f       j                              False
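If a count is wanted instead of a yes/no flag, which is closer to what Excel's COUNTIF returns, the same filter and str.contains steps can be summed per row instead of reduced with any. This is only a sketch built on the sample data above; the column name 'Contact Email Count' and the None value in the sample are assumptions added for illustration.

```python
import pandas as pd

# Same sample data as above, with one missing value (None) added as an
# assumption to mimic what an outer merge can produce.
df_merged = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'Store A': list('abcd'),
    'Store Contact A': ['aa@bb.cc', '', 'e', 'f'],
    'Store B': list('ghij'),
    'Store B Contact': ['kk@ll.m', None, 'nn@ooo.pp', ''],
})

pat = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

# Boolean frame: one column per Contact column, True where the cell matches.
matches = (df_merged.filter(like='Contact')
           .apply(lambda col: col.str.contains(pat, na=False)))

# Summing booleans along each row counts the matching cells.
# 'Contact Email Count' is a hypothetical column name.
df_merged['Contact Email Count'] = matches.sum(axis=1)
print(df_merged[['id', 'Contact Email Count']])
```

Because True counts as 1 when summed, the new column holds the number of contact columns in each row that look like an email address.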
Alternatively, you can use pandas.Series.str.contains on each column and combine the results with |:
df_merged['Contact has Email'] = (
    df_merged['Store Contact A'].str.contains('@', na=False)
    | df_merged['Store B Contact'].str.contains('@', na=False)
)
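Since the question mentions that column names may vary after merging, the '@' check can also be applied without naming each column. The following is a rough sketch, assuming every contact column has 'Contact' somewhere in its name; the variable names contact_cols and has_at are made up for illustration.

```python
import pandas as pd

df_merged = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'Store A': list('abcd'),
    'Store Contact A': ['aa@bb.cc', '', 'e', 'f'],
    'Store B': list('ghij'),
    'Store B Contact': ['kk@ll.m', '', 'nn@ooo.pp', ''],
})

# Collect the contact columns BEFORE adding the result column, because
# 'Contact has Email' itself contains the word 'Contact'.
contact_cols = [c for c in df_merged.columns if 'Contact' in c]

# regex=False makes '@' a plain substring test; na=False treats NaN as no match.
has_at = df_merged[contact_cols].apply(
    lambda col: col.str.contains('@', regex=False, na=False)
)

df_merged['Contact has Email'] = has_at.any(axis=1)
print(df_merged)
```

If this is rerun on a frame that already has the 'Contact has Email' column, that column should be excluded from contact_cols, since it holds booleans rather than strings.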