Fill nulls in columns with non-null values from other columns

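Given a DataFrame with several similar columns that contain nulls, how can I dynamically fill the nulls in one column with non-null values from the other columns, without explicitly naming those other columns? For example, select the first column, category1, and fill its null rows with the values from the other columns in the same row.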

import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}

df = pd.DataFrame(data)
df = df.set_index('year')
df

      category1  category2  category3
year
2010        NaN       10.0       10.0
2011       21.0       21.0       21.0
2012        NaN       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0

Expected result after filling category1:

      category1  category2  category3
year
2010       10.0       10.0       10.0
2011       21.0       21.0       21.0
2012       20.0       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0

IIUC, you can do it this way:

In [369]: df['category1'] = df['category1'].fillna(df['category2'])

In [370]: df
Out[370]:
      category1  category2  category3
year
2010       10.0       10.0       10.0
2011       21.0       21.0       21.0
2012       20.0       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0
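
To avoid naming category2 explicitly, the same fillna idea can be repeated over every other column. This is only a minimal sketch: it assumes the target column is category1 and that the remaining columns should be tried from left to right. The `target` variable is introduced here purely for illustration.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data).set_index('year')

target = 'category1'  # illustrative name for the column to fill

# Try each remaining column in turn; fillna only touches rows that are
# still null, so earlier columns take precedence over later ones.
for col in df.columns.drop(target):
    df[target] = df[target].fillna(df[col])

print(df)
```

Rows where every column is NaN (2014 here) stay NaN, and the other columns are left unchanged.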

You can use first_valid_index with a condition to handle rows where all values are NaN:

def f(x):
    if x.first_valid_index() is None:
        return None
    else:
        return x[x.first_valid_index()]

df['a'] = df.apply(f, axis=1)

print (df)
      category1  category2  category3     a
year                                       
2010        NaN       10.0       10.0  10.0
2011       21.0       21.0       21.0  21.0
2012        NaN       20.0       20.0  20.0
2013       10.0       10.0       10.0  10.0
2014        NaN        NaN        NaN   NaN
2015       30.0       30.0       30.0  30.0
2016       31.0        NaN       31.0  31.0
2017       45.0       45.0       45.0  45.0
2018       23.0       23.0       23.0  23.0
2019       56.0       56.0       56.0  56.0
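
If row-wise apply is too slow on a large frame, one possible vectorized alternative is to back-fill along the columns and take the leftmost column, which should give the first non-null value of each row. This is a sketch under the assumption that column order defines the priority; `first_valid` is a name made up for this example.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data).set_index('year')

# bfill(axis=1) propagates values from right to left within each row,
# so the first column ends up holding the first non-null value of the row.
first_valid = df.bfill(axis=1).iloc[:, 0]

# Fill only the nulls in category1; all-NaN rows remain NaN.
df['category1'] = df['category1'].fillna(first_valid)
print(df)
```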

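Try this, which fills from the row-wise median (it matches the expected output here because the non-null values in each row are identical):

df['category1'] = df['category1'].fillna(df.median(axis=1))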
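
The median approach and a first-non-null approach only agree when the other columns hold the same value. The small frame below is hypothetical data, invented to show the difference, and is not part of the question.

```python
import pandas as pd

# Hypothetical rows: in the first row the other columns disagree (10 vs 30).
demo = pd.DataFrame({'category1': [None, 5.0],
                     'category2': [10.0, 5.0],
                     'category3': [30.0, 5.0]})

# The median skips NaN, so the first row yields the midpoint of 10 and 30.
by_median = demo['category1'].fillna(demo.median(axis=1))

# Back-filling along the row picks the first non-null value instead.
by_first_valid = demo['category1'].fillna(demo.bfill(axis=1).iloc[:, 0])

print(by_median.tolist())       # [20.0, 5.0]
print(by_first_valid.tolist())  # [10.0, 5.0]
```

When the columns can differ, the median may produce a value that appears in none of them, so which approach fits depends on what the filled value is supposed to mean.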