How do I incorporate an if statement in my lambda function to exclude blank values?

Question

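I want to exclude any blank values when using the lambda function below, which would prevent extra commas in my output. If I run the code without the if statement, I get extra commas in the values of the comb_words column. How do I incorporate an if statement to exclude blank values and prevent any extra commas in my output?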

Code:

import numpy as np
import pandas as pd

# dataframe
df = pd.DataFrame(data={
    'col1': [123, 123, 456, 456, 789, 789],
    'col2': ["", 'I eat cake.', 'We run fast.', 'We eat cake?', 'I run faster!', 'I eat candy.'],
    'col2_new': ["", 'i eat cake', 'we run fast', 'we eat cake', 'i run faster', 'i eat candy'],
})

# words to search on
search_words1 = ['run fast', 'eat cake', 'faster', 'candy']

# create columns based on search words found
for n in search_words1:
    df[n] = np.where(df['col2_new'].str.contains(n), n, "")

# combine words into a single column only if value is not blank
cols = ['run fast','eat cake','faster','candy']

df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
df

Original DataFrame:

   col1  col2           col2_new
0   123     
1   123  I eat cake.    i eat cake
2   456  We run fast.   we run fast
3   456  We eat cake?   we eat cake
4   789  I run faster!  i run faster
5   789  I eat candy.   i eat candy

Error message:

 ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-117bb81b84df> in <module>
     10 cols = ['run fast','eat cake','faster','candy']
     11 
---> 12 df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
     13 
     14 # df = df.drop_duplicates(subset =['call_id','comb_words'])

~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   6876             kwds=kwds,
   6877         )
-> 6878         return op.get_result()
   6879 
   6880     def applymap(self, func) -> "DataFrame":

~\anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    184             return self.apply_raw()
    185 
--> 186         return self.apply_standard()
    187 
    188     def apply_empty_result(self):

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    294             try:
    295                 result = libreduction.compute_reduction(
--> 296                     values, self.f, axis=self.axis, dummy=dummy, labels=labels
    297                 )
    298             except ValueError as err:

pandas\_libs\reduction.pyx in pandas._libs.reduction.compute_reduction()

pandas\_libs\reduction.pyx in pandas._libs.reduction.Reducer.get_result()

<ipython-input-28-117bb81b84df> in <lambda>(row)
     10 cols = ['run fast','eat cake','faster','candy']
     11 
---> 12 df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
     13 
     14 # df = df.drop_duplicates(subset =['call_id','comb_words'])

~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1477     def __nonzero__(self):
   1478         raise ValueError(
-> 1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1481         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Expected output:

   col1   col2          col2_new     run fast    eat cake   faster  candy   comb_words
0   123                                                                         
1   123   I eat cake.   i eat cake               eat cake                   eat cake
2   456   We run fast.  we run fast  run fast                               run fast
3   456   We eat cake?  we eat cake              eat cake                   eat cake
4   789   I run faster! i run faster run fast               faster          run fast , faster
5   789   I eat candy.  i eat candy                                 candy   candy

Answer 1

Without a conditional statement, you can use:

df['comb_words'] = df[cols].stack().loc[lambda x: x != ''] \
                           .groupby(level=0).apply(lambda x: ' , '.join(x))
print(df)

# Output
   col1           col2      col2_new  run fast  eat cake  faster  candy         comb_words
0   123                                                                                NaN
1   123    I eat cake.    i eat cake            eat cake                          eat cake
2   456   We run fast.   we run fast  run fast                                    run fast
3   456   We eat cake?   we eat cake            eat cake                          eat cake
4   789  I run faster!  i run faster  run fast            faster         run fast , faster
5   789   I eat candy.   i eat candy                              candy              candy
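
Note that row 0, which has no matching words, comes out as NaN rather than an empty string, because nothing is left for that row once the blanks are filtered out. If an empty string is wanted instead, one option is to reindex the joined result against the original index. The following is only a minimal sketch: the `demo` frame and its values are illustrative stand-ins, not the asker's full data.

```python
import pandas as pd

# Illustrative stand-in for the keyword columns built in the question.
demo = pd.DataFrame({
    "run fast": ["", "", "run fast", "run fast"],
    "eat cake": ["", "eat cake", "", ""],
    "faster": ["", "", "", "faster"],
})
cols = ["run fast", "eat cake", "faster"]

# Stack into one long Series, drop the blanks, then join per original row.
joined = (
    demo[cols].stack()
    .loc[lambda s: s != ""]
    .groupby(level=0)
    .apply(lambda s: " , ".join(s))
)

# Rows with no matches are missing from `joined`; reindex fills them with "".
demo["comb_words"] = joined.reindex(demo.index, fill_value="")
print(demo["comb_words"].tolist())
```

Whether NaN or an empty string is preferable depends on how the column is used afterwards; NaN is easier to detect with `isna()`, while an empty string matches the expected output in the question.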

Answer 2

Instead of a complex lambda, you can simply write a function and pass it to apply:

# ...

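def func(row):
    # Testing `not row` on a whole Series raises the same ValueError,
    # so filter the individual values instead and join what remains.
    words = [str(value) for value in row if value != ""]
    return ' , '.join(words)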


df['comb_words'] = df[cols].apply(func, axis=1)
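
If you would rather keep a lambda, the condition can go inside the join as a filter on each value, instead of being applied to the row as a whole. Evaluating a whole Series in a boolean context is what raises "The truth value of a Series is ambiguous". The following is a minimal sketch, assuming a small illustrative frame (`demo` is a hypothetical stand-in for the asker's data):

```python
import pandas as pd

# Illustrative stand-in for the keyword columns built in the question.
demo = pd.DataFrame({
    "run fast": ["", "", "run fast", "run fast"],
    "eat cake": ["", "eat cake", "", ""],
    "faster": ["", "", "", "faster"],
})
cols = ["run fast", "eat cake", "faster"]

# The `if` lives inside the generator, so it tests one value at a time.
demo["comb_words"] = demo[cols].apply(
    lambda row: " , ".join(word for word in row if word != ""),
    axis=1,
)
print(demo["comb_words"].tolist())
```

Joining an empty sequence returns an empty string, so rows without any match need no separate branch.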
