Python / Pandas: Fill NaN in order - linear interpolation --> ffill --> bfill
I have a df:
     company  year       revenues
0  company 1  2019  1,425,000,000
1  company 1  2018  1,576,000,000
2  company 1  2017  1,615,000,000
3  company 1  2016  1,498,000,000
4  company 1  2015  1,569,000,000
5  company 2  2019            nan
6  company 2  2018  1,061,757,075
7  company 2  2017            nan
8  company 2  2016    573,414,893
9  company 2  2015    599,402,347
I want to fill the nan values in a specific order: first linear interpolation, then forward fill, then backward fill. I currently have:
# Numeric columns to impute, skipping any 'total' and 'year' columns
f_2_impute = [x for x in cl_data.columns if cl_data[x].dtypes != 'O' and 'total' not in x and 'year' not in x]

def ffbf(x):
    return x.ffill().bfill()

group_with = ['company']
for x in f_2_impute:
    # transform keeps the original index, so the result assigns back cleanly
    cl_data[x] = cl_data.groupby(group_with)[x].transform(ffbf)
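This performs ffill() and then bfill() within each company. Ideally, I would like a function that first tries to linearly interpolate the missing values, then forward fills whatever is left, then backward fills the rest.
Is there a quick way to do this? Thanks in advance.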
I think you first need to convert the column to floats if it contains , as a thousands separator:
df = pd.read_csv(file, thousands=',')
Or:
df['revenues'] = df['revenues'].replace(',','', regex=True).astype(float)
Then add interpolate() to the function, ahead of the fills:
def ffbf(x):
    return x.interpolate().ffill().bfill()
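To show how the pieces fit together, here is a minimal end-to-end sketch on the sample data from the question. The function name interp_ffill_bfill and the use of groupby(...).transform(...) are my own choices for illustration, not something the question requires. It also assumes the rows are already in the order you want to interpolate along (here, descending year within each company), since the default linear method treats neighbouring rows as equally spaced.

```python
import numpy as np
import pandas as pd

# Sample data from the question; revenues are strings with thousands separators
df = pd.DataFrame({
    "company": ["company 1"] * 5 + ["company 2"] * 5,
    "year": [2019, 2018, 2017, 2016, 2015] * 2,
    "revenues": [
        "1,425,000,000", "1,576,000,000", "1,615,000,000",
        "1,498,000,000", "1,569,000,000",
        np.nan, "1,061,757,075", np.nan, "573,414,893", "599,402,347",
    ],
})

# Strip the separators and convert to float so interpolation is possible
df["revenues"] = df["revenues"].replace(",", "", regex=True).astype(float)


def interp_ffill_bfill(s):
    # Hypothetical helper: linear interpolation first, then forward fill,
    # then backward fill for anything still missing (e.g. leading NaNs)
    return s.interpolate().ffill().bfill()


# Apply per company so values never leak from one company into another
df["revenues"] = df.groupby("company")["revenues"].transform(interp_ffill_bfill)

print(df)
```

With this data, row 7 (company 2, 2017) is interpolated between its two neighbours, while row 5 (company 2, 2019) has no earlier value in its group, so it is only filled by the final bfill().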