Create a new column in a Python DataFrame based on the presence of NaN in specific columns

Question

I have a DataFrame that contains NaN values:

df = pd.DataFrame({"A": [10,20,30, np.nan], "B": [20, np.nan, 10,np.nan]})

      A     B
0  10.0  20.0
1  20.0   NaN
2  30.0  10.0
3   NaN   NaN

I want to create a new column 'C'. For each row, if either column 'A' or column 'B' contains NaN, 'C' should be set to 0; otherwise it should be 1.

The result I want looks like this:

      A     B  C
0  10.0  20.0  1
1  20.0   NaN  0
2  30.0  10.0  1
3   NaN   NaN  0

I tried the following code:

df['C'] = df.apply(lambda row:0 if (row['A']=='NaN' or row['B']=='NaN')  else 1, axis=1)

I get the following df. Column C is always set to 1.

      A     B  C
0  10.0  20.0  1
1  20.0   NaN  1
2  30.0  10.0  1
3   NaN   NaN  1

I also tried the following code:

df['C'] = df.apply(lambda row:0 if (row['A'].isnull() or row['B'].isnull())  else 1, axis=1)

It raises the following error:

AttributeError: ("'float' object has no attribute 'isnull'", 'occurred at index 0')

Answer 1

Use notnull + all:

df['C'] = df.notnull().all(axis=1).astype(int)
df

      A     B  C
0  10.0  20.0  1
1  20.0   NaN  0
2  30.0  10.0  1
3   NaN   NaN  0
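
For context, here is a minimal sketch of why the two attempts in the question fail, together with a variant of the one-liner above that only looks at columns 'A' and 'B'. The `cols` list and the helper variable names are my own, chosen for illustration; restricting the check to `cols` is an assumption about what you would want if the frame had other columns as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, np.nan], "B": [20, np.nan, 10, np.nan]})

# NaN is a float, so comparing it with the string 'NaN' is never True,
# and NaN does not even compare equal to itself.
nan_equals_string = (np.nan == "NaN")
nan_equals_nan = (np.nan == np.nan)

# Inside apply(axis=1), row['A'] is a plain float with no .isnull() method.
# The module-level pd.isnull() does accept scalars.
scalar_check = pd.isnull(df.loc[3, "A"])

# Vectorized check limited to the columns of interest (hypothetical `cols`).
cols = ["A", "B"]
df["C"] = df[cols].notnull().all(axis=1).astype(int)
print(df)
```

If you do want to keep the `apply` approach, swapping `row['A'].isnull()` for `pd.isnull(row['A'])` should work too, though the vectorized form avoids the row-by-row Python loop.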
