Interpolation of missing values

I am trying to fill in the missing values of a Python (pandas) DataFrame by linear interpolation. Is there a way to do this?

Here is one solution, though I am not sure it is the best one.

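import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 10000, 20000, None],
                   "C": [1, None, None, 8, None],
                   "D": [14, 99, None, None, 6]})

# For each NaN, extend the line through the two nearest preceding
# non-NaN values. This extrapolates forward from earlier rows; it never
# looks at the values that come after the gap.
for col in df.columns:
    c = df.columns.get_loc(col)
    for j in range(2, len(df)):
        if pd.isna(df.iloc[j, c]):
            m = -1  # position of the nearest preceding non-NaN value
            for k in range(j - 1, -1, -1):
                if not pd.isna(df.iloc[k, c]):
                    if m == -1:
                        m = k
                    else:
                        # k is the second-nearest preceding non-NaN value
                        slope = (df.iloc[m, c] - df.iloc[k, c]) / (m - k)
                        # Assign via df.iloc[row, col] to avoid chained assignment
                        df.iloc[j, c] = df.iloc[m, c] + slope * (j - m)
                        break
print(df)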

Output:

      A        B          C      D
0  12.0      NaN   1.000000   14.0
1   4.0      2.0        NaN   99.0
2   5.0  10000.0        NaN  184.0
3   6.0  20000.0   8.000000  269.0
4   1.0  30000.0  10.333333    6.0
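
Note that the loop above extends the line through the two valid values before each gap, so it never uses the value after the gap: column A gets 6.0 at row 3 even though the next value is 1.0. For comparison, here is a minimal sketch using pandas' built-in DataFrame.interpolate on the same data. Passing limit_area='inside' is my own choice here, so that only gaps with valid values on both sides are filled; the name `filled` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 10000, 20000, None],
                   "C": [1, None, None, 8, None],
                   "D": [14, 99, None, None, 6]})

# Linear interpolation down each column, treating rows as equally spaced.
# limit_area='inside' (an assumption about what is wanted) leaves leading
# and trailing NaNs untouched instead of filling them.
filled = df.interpolate(method='linear', limit_area='inside')
print(filled)
```

With this, A at row 3 becomes 3.0 (halfway between 5 and 1) and D at rows 2 and 3 becomes 68.0 and 37.0, while the NaNs at the ends of B and C stay as they are.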

An alternative answer that does the interpolation with pandas. The code below was written for Python 3.7:

Import the libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Create the DataFrame

df = pd.DataFrame({
    'x': [0, np.nan, 2, np.nan, 3, np.nan, 6, np.nan, 10],
    'y': [0, np.nan, 4, np.nan, 6, np.nan, 8, np.nan, 20],
})

Interpolate the missing (NaN) values: linear

df['ix'] = df['x'].interpolate(method='linear')
df['iy'] = df['y'].interpolate(method='linear')
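
To make it clearer what method='linear' does here, the following is a small sketch (the names `s`, `filled` and `manual` are mine, not part of the answer). It relies on the fact that 'linear' treats the rows as equally spaced and ignores the index, so when every gap is a single row, as in this data, each filled value should simply be the mean of its two neighbours.

```python
import numpy as np
import pandas as pd

# Same values as the 'y' column above
s = pd.Series([0, np.nan, 4, np.nan, 6, np.nan, 8, np.nan, 20])
filled = s.interpolate(method='linear')

# Manual check: every gap is one row wide, so take the mean of the
# values directly before and after each NaN.
manual = s.copy()
for pos in np.flatnonzero(s.isna().to_numpy()):
    manual.iloc[pos] = (s.iloc[pos - 1] + s.iloc[pos + 1]) / 2

print(filled.tolist())  # [0.0, 2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 14.0, 20.0]
print(np.allclose(filled, manual))  # True
```

For gaps wider than one row the same idea holds, except that the filled values are spread evenly along the line between the two surrounding valid values.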

Plot the original and the interpolated data

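# Rows where x or y is NaN are simply not drawn in the 'original' series
plt.scatter(df['x'], df['y'], label='original')
# Hollow red circles, so the original points stay visible underneath
plt.scatter(df['ix'], df['iy'], marker='o', facecolors='none',
            edgecolors='red', s=200, label='interpolated')
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Original data with linear-interpolated data')
plt.show()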