How do I incorporate an if statement in my lambda function to exclude blank values?
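I am trying to exclude any blank values when using the lambda function below, so that no extra commas appear in my output. If I run the code without the if statement, I get extra commas in the values of the comb_words column. How do I incorporate an if statement to exclude blank values and prevent any extra commas in my output?
Code: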
import numpy as np
import pandas as pd

# dataframe
df = pd.DataFrame(data={
    'col1': [123, 123, 456, 456, 789, 789],
    'col2': ["", 'I eat cake.', 'We run fast.', 'We eat cake?', 'I run faster!', 'I eat candy.'],
    'col2_new': ["", 'i eat cake', 'we run fast', 'we eat cake', 'i run faster', 'i eat candy'],
})
# words to search on
search_words1 = ['run fast', 'eat cake', 'faster', 'candy']
# create columns based on search words found
for n in search_words1:
    df[n] = np.where(df['col2_new'].str.contains(n), n, "")
# combine words into a single column only if value is not blank
cols = ['run fast', 'eat cake', 'faster', 'candy']
# this is the line that raises the ValueError shown below
df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
df
Original dataframe:
col1 col2 col2_new
0 123
1 123 I eat cake. i eat cake
2 456 We run fast. we run fast
3 456 We eat cake? we eat cake
4 789 I run faster! i run faster
5 789 I eat candy. i eat candy
Error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-28-117bb81b84df> in <module>
10 cols = ['run fast','eat cake','faster','candy']
11
---> 12 df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
13
14 # df = df.drop_duplicates(subset =['call_id','comb_words'])
~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
6876 kwds=kwds,
6877 )
-> 6878 return op.get_result()
6879
6880 def applymap(self, func) -> "DataFrame":
~\anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
184 return self.apply_raw()
185
--> 186 return self.apply_standard()
187
188 def apply_empty_result(self):
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
294 try:
295 result = libreduction.compute_reduction(
--> 296 values, self.f, axis=self.axis, dummy=dummy, labels=labels
297 )
298 except ValueError as err:
pandas\_libs\reduction.pyx in pandas._libs.reduction.compute_reduction()
pandas\_libs\reduction.pyx in pandas._libs.reduction.Reducer.get_result()
<ipython-input-28-117bb81b84df> in <lambda>(row)
10 cols = ['run fast','eat cake','faster','candy']
11
---> 12 df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
13
14 # df = df.drop_duplicates(subset =['call_id','comb_words'])
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1477 def __nonzero__(self):
1478 raise ValueError(
-> 1479 f"The truth value of a {type(self).__name__} is ambiguous. "
1480 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1481 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
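The traceback points at `if row`: inside `apply(..., axis=1)` each `row` is a whole Series, and pandas refuses to reduce a Series to a single True/False. A minimal sketch of one way to keep the lambda is to test each value rather than the row, assuming blanks are always the empty string `""` as in the question's data (the small frame below is rebuilt only to keep the example self-contained):

```python
import numpy as np
import pandas as pd

# Rebuild only the column the search runs on (values taken from the question).
df = pd.DataFrame({
    'col2_new': ['', 'i eat cake', 'we run fast', 'we eat cake',
                 'i run faster', 'i eat candy'],
})
cols = ['run fast', 'eat cake', 'faster', 'candy']

# One column per search word: the word itself if found, otherwise "".
for n in cols:
    df[n] = np.where(df['col2_new'].str.contains(n), n, '')

# The condition lives inside the generator, so it is evaluated per value
# and never asks for the truth value of the whole row.
df['comb_words'] = df[cols].apply(
    lambda row: ' , '.join(v for v in row if v != ''), axis=1
)
print(df['comb_words'].tolist())
```

Joining an empty sequence yields `""`, so the all-blank first row needs no separate `else` branch.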
Desired output:
col1 col2 col2_new run fast eat cake faster candy comb_words
0 123
1 123 I eat cake. i eat cake eat cake eat cake
2 456 We run fast. we run fast run fast run fast
3 456 We eat cake? we eat cake eat cake eat cake
4 789 I run faster! i run faster run fast faster run fast , faster
5 789 I eat candy. i eat candy candy candy
Without a conditional statement, you can use:
df['comb_words'] = df[cols].stack().loc[lambda x: x != ''] \
.groupby(level=0).apply(lambda x: ' , '.join(x))
print(df)
# Output
col1 col2 col2_new run fast eat cake faster candy comb_words
0 123 NaN
1 123 I eat cake. i eat cake eat cake eat cake
2 456 We run fast. we run fast run fast run fast
3 456 We eat cake? we eat cake eat cake eat cake
4 789 I run faster! i run faster run fast faster run fast , faster
5 789 I eat candy. i eat candy candy candy
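In the output above, row 0 ends up as NaN because every one of its values is filtered out before the groupby, whereas the desired output shows a blank there. A hedged sketch of one way to close that gap, assuming an empty string is the wanted placeholder, is to reindex the joined result against the original index (the frame below hard-codes the four search-word columns so the example runs on its own):

```python
import pandas as pd

# The four search-word columns as they look after the np.where step.
df = pd.DataFrame({
    'run fast': ['', '', 'run fast', '', 'run fast', ''],
    'eat cake': ['', 'eat cake', '', 'eat cake', '', ''],
    'faster':   ['', '', '', '', 'faster', ''],
    'candy':    ['', '', '', '', '', 'candy'],
})
cols = ['run fast', 'eat cake', 'faster', 'candy']

# Long format: one entry per (row, column) pair, then drop the blanks.
stacked = df[cols].stack()
non_blank = stacked[stacked != '']

# Join the remaining words per original row label.
joined = non_blank.groupby(level=0).agg(' , '.join)

# Rows with no words vanished during filtering; bring them back as "".
df['comb_words'] = joined.reindex(df.index, fill_value='')
print(df['comb_words'].tolist())
```

Calling `.fillna('')` on the assigned column afterwards would be an equivalent way to get the same placeholder.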
There is no need for a complicated lambda; you can simply write a function and pass it to apply:
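# ...
def func(row):
    # `if not row` would raise the same ValueError, so test each value instead
    words = [w for w in row.values.astype(str) if w != ""]
    # joining an empty list returns "", which covers the all-blank row
    return ' , '.join(words)

df['comb_words'] = df[cols].apply(func, axis=1)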