Interpolation of missing values
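I am trying to fill in missing values in a Python DataFrame using a linear method. Is there a way to do this?
Here is one solution, though I'm not sure it is the best one.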
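import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 10000, 20000, None],
                   "C": [1, None, None, 8, None],
                   "D": [14, 99, None, None, 6]})

# For each NaN at row j (j > 1), extend the straight line through the two
# nearest non-NaN values above it. This only looks at earlier rows, so it
# extrapolates from them rather than interpolating between neighbours.
for col in df.columns:
    c = df.columns.get_loc(col)
    for j in range(len(df)):
        if j > 1 and pd.isna(df.iat[j, c]):
            m = -1  # position of the nearest non-NaN value above row j
            for k in range(j - 1, -1, -1):
                if not pd.isna(df.iat[k, c]):
                    if m == -1:
                        m = k
                    else:
                        # k is the second-nearest non-NaN value above row j
                        slope = (df.iat[m, c] - df.iat[k, c]) / (m - k)
                        # .iat writes into df directly (no chained assignment)
                        df.iat[j, c] = df.iat[m, c] + slope * (j - m)
                        break
print(df)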
Output:
      A        B          C      D
0  12.0      NaN   1.000000   14.0
1   4.0      2.0        NaN   99.0
2   5.0  10000.0        NaN  184.0
3   6.0  20000.0   8.000000  269.0
4   1.0  30000.0  10.333333    6.0
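Note that the loop above continues the line through the two known values above each gap, so it is really extrapolating: A[3] becomes 6.0 even though its neighbours are 5 and 1. If what you want is interpolation between the nearest known values on either side of a gap, a minimal sketch with numpy.interp might look like this. The helper name interpolate_column is my own (hypothetical), the row position is assumed to be the x coordinate, and leading or trailing NaNs are deliberately left alone because they have a known value on only one side.

```python
import numpy as np

def interpolate_column(values):
    """Fill interior NaNs by linear interpolation over row positions.

    Hypothetical helper for illustration; assumes evenly spaced rows.
    """
    y = np.asarray(values, dtype=float)
    known = ~np.isnan(y)
    if not known.any():
        return y.copy()  # nothing to anchor the interpolation on
    x = np.arange(len(y))
    first, last = x[known][0], x[known][-1]
    # Only gaps that have a known value on both sides are filled.
    interior = ~known & (x > first) & (x < last)
    filled = y.copy()
    filled[interior] = np.interp(x[interior], x[known], y[known])
    return filled

a = interpolate_column([12, 4, 5, np.nan, 1])
c = interpolate_column([1, np.nan, np.nan, 8, np.nan])
print(a)  # A[3] is 3.0, halfway between 5 and 1
print(c)  # C[1] and C[2] are filled, the trailing NaN stays
```

With this version A[3] comes out as 3.0 instead of 6.0, and the last value of C stays NaN instead of being projected forward.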
An alternative answer that uses pandas for the interpolation. The code below was run with Python 3.7:
Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Create the DataFrame
df = pd.DataFrame({
    'x': [0, np.nan, 2, np.nan, 3, np.nan, 6, np.nan, 10],
    'y': [0, np.nan, 4, np.nan, 6, np.nan, 8, np.nan, 20]
})
Interpolate the missing (NaN) values: linear
df['ix'] = df['x'].interpolate(method='linear')
df['iy'] = df['y'].interpolate(method='linear')
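Create a plot with the interpolated values
# Original points (rows where x or y is NaN are not drawn)
plt.scatter(df['x'], df['y'], label='original')
# Interpolated points as large hollow red circles around every row
plt.scatter(df['ix'], df['iy'], marker='o', facecolor='none', color='red', s=200, label='interpolated')
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Original data with linear-interpolated data')
plt.show()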
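To tie this back to the question, the same method can be called on a whole DataFrame at once. The sketch below applies it to the DataFrame from the question. The variable names filled and inside_only are mine, and limit_area is assumed to be available, which should hold for any reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, np.nan, 1],
                   "B": [np.nan, 2, 10000, 20000, np.nan],
                   "C": [1, np.nan, np.nan, 8, np.nan],
                   "D": [14, 99, np.nan, np.nan, 6]})

# Default: interior gaps are interpolated column by column; trailing NaNs
# are filled with the last known value, leading NaNs are left as NaN.
filled = df.interpolate(method="linear")

# limit_area="inside" fills only gaps surrounded by known values.
inside_only = df.interpolate(method="linear", limit_area="inside")

print(filled)
print(inside_only)
```

With the defaults, A[3] becomes 3.0 and D[2], D[3] become 68.0 and 37.0, while B[4] is simply carried forward as 20000.0. If carrying the last value forward is not what you want, the limit_area="inside" variant leaves B[4] and C[4] as NaN.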