高斯过程回归(Kriging)与径向基函数插值
Gaussian process regression (Kriging) vs Radial Basis Function interpolation
我在平面图上对传感器之间的温度数据实施了 2 种类型的插值。由于我不是很精通我使用的包的底层过程和数学,我发现很难理解为什么它们通过 pcolormesh 的输出如此不同。
我用了scipy.interpolate.Rbf
和sklearn.gaussian_process
。
这些是图片。
Radial Basis Function interpolation
RBF 示例看起来与在网络上找到的实现完全一样,但 GPR 示例显示这些长线而不是圆形。在 Scikit Learn 的 GPR 实现中,什么参数会调节这些形状?为什么即使 GPR 的温度结果发生轻微变化,它们的形状甚至有时颜色强度也会如此不同。
平面图上的9个传感器(点)均匀分布。
RBF 代码。
# Set X and Y Coordinates for each sensor (pixels)
days_data['xCoordinate'] = days_data.nodeid.apply(lambda id: createXCoord(id))
days_data['yCoordinate'] = days_data.nodeid.apply(lambda id: createYCoord(id))
# Define location of "sensors" on the axes
x = days_data.xCoordinate.to_list()
y = days_data.yCoordinate.to_list()
z = days_data.avgtemperature.to_list() #temperature
# Use Gaussian function
rbf_adj = Rbf(x, y, z, function = 'gaussian')
# Set extent to which colour mesh stretches over
# the underlying image
x_fine = np.linspace(0, 1000, 81) #81 - num of samples
y_fine = np.linspace(0, 700, 81)
x_grid, y_grid = np.meshgrid(x_fine, y_fine)
z_grid = rbf_adj(x_grid.ravel(), y_grid.ravel()).reshape(x_grid.shape)
# Remove the colorbar created by the previous plot, if any
# To avoid a new colorbar being plotted alongside the previous one each time a different date is selected
try:
cb = p.colorbar
cb.remove()
except:
pass
# plot the pcolor on the Axes. Use alpha to set the transparency
p=ax.pcolor(x_fine, y_fine, z_grid, alpha=0.3)
ax.invert_yaxis() #invert Y axis for X and Y to have same starting point
# Add a colorbar for the pcolor field
fig.colorbar(p,ax=ax)
GPR 代码
# Define location of "sensors" on the axes
x = days_data.xCoordinate.to_list()
y = days_data.yCoordinate.to_list()
z = days_data.avgtemperature.to_list() #temperature
X = np.array([[a, b] for a, b in zip(x, y)])
# Set extent to which colour mesh stretches over
# the underlying image
x_fine = np.linspace(0, 1000, 81) #81 - num of samples
y_fine = np.linspace(0, 700, 82)
X_fine = np.array([[a_fine, b_fine] for a_fine, b_fine in zip(x_fine, y_fine)])
x_grid, y_grid = np.meshgrid(x_fine, y_fine)
# Instantiate a Gaussian Process model
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, z)
z_grid, sigma = gp.predict(X_fine, return_std=True)
# Remove the colorbar created by the previous plot, if any
# To avoid a new colorbar being plotted alongside the previous one each time a different date is selected
try:
cb = p.colorbar
cb.remove()
except:
pass
# plot the pcolor on the Axes. Use alpha to set the transparency
p = ax.pcolor(x_grid, y_grid, np.meshgrid(z_grid, y_fine)[0], alpha=0.3)
ax.invert_yaxis() #invert Y axis for X and Y to have same starting point
# Add a colorbar for the pcolor field
fig.colorbar(p,ax=ax)
我猜想高斯过程的尺度参数在x和y方向上差别很大,x尺度参数小,y尺度参数相对较大。这样,x 距离小的两个点相关性低,y 距离小的两个点相关性高:这会在温度曲线中创建垂直 "bands"。
我能够用以下数据模拟这个,OpenTURNS。你没有提供温度,所以我不得不从图形中猜测它们。
import numpy as np
import openturns as ot
coordinates = ot.Sample([[100.0,100.0],[500.0,100.0],[900.0,100.0], \
[100.0,350.0],[500.0,350.0],[900.0,350.0], \
[100.0,600.0],[500.0,600.0],[900.0,600.0]])
observations = ot.Sample([25.0,25.0,10.0,20.0,25.0,20.0,15.0,25.0,25.0],1)
# Extract coordinates
x = np.array(coordinates[:,0])
y = np.array(coordinates[:,1])
# Plot the data with a scatter plot and a color map
import matplotlib.pyplot as plt
fig = plt.figure()
plt.scatter(x, y, c=observations, cmap='viridis')
plt.colorbar()
plt.show()
这会产生:
可以使用以下脚本从中拟合克里格元模型。我使用了平方指数协方差模型。
def fitKriging(coordinates, observations, covarianceModel, basis):
'''
Fit the parameters of a kriging metamodel.
'''
algo = ot.KrigingAlgorithm(coordinates, observations, covarianceModel, basis)
algo.run()
krigingResult = algo.getResult()
krigingMetamodel = krigingResult.getMetaModel()
return krigingResult, krigingMetamodel
inputDimension = 2
basis = ot.ConstantBasisFactory(inputDimension).build()
covarianceModel = ot.SquaredExponential([1.]*inputDimension, [1.0])
krigingResult, krigingMetamodel = fitKriging(coordinates, observations, covarianceModel, basis)
为了绘制这个克里格元模型的预测,我使用了以下基于 pcolor
函数的脚本。
def plotKrigingPredictions(krigingMetamodel):
'''
Plot the predictions of a kriging metamodel.
'''
# Create the mesh of the box [0., 1000.] * [0., 700.]
myInterval = ot.Interval([0., 0.], [1000., 700.])
# Define the number of interval in each direction of the box
nx = 20
ny = 20
myIndices = [nx-1, ny-1]
myMesher = ot.IntervalMesher(myIndices)
myMeshBox = myMesher.build(myInterval)
# Predict
vertices = myMeshBox.getVertices()
predictions = krigingMetamodel(vertices)
# Format for plot
X = np.array(vertices[:,0]).reshape((ny,nx))
Y = np.array(vertices[:,1]).reshape((ny,nx))
predictions_array = np.array(predictions).reshape((ny,nx))
# Plot
plt.figure()
plt.pcolor(X, Y, predictions_array)
plt.colorbar()
plt.show()
return
plotKrigingPredictions(krigingMetamodel)
这会产生:
您可以看到与预测中相同的波段。
查看协方差模型解释了原因:
>>> covarianceModel = krigingResult.getCovarianceModel()
>>> print(covarianceModel)
SquaredExponential(scale=[0.167256,1.5929], amplitude=[6.75753])
x 尺度为 0.1672,比 y 尺度 1.593 小得多。这是因为我们使用了各向异性协方差模型,其中 x 尺度 theta 可以不同于 y 尺度 theta。
为了解决这个问题,我们可以使用各向同性协方差模型,其中 x 尺度保持等于 y 尺度。然而,仅将 x 和 y 标度设置为给定值并仅估计振幅参数 sigma 会更简单。
以下脚本将尺度设置为先前估计的平均值,并且仅估计 sigma。
scales = covarianceModel.getScale()
meanScale = (scales[0]+scales[1])/2.0
covarianceModel.setScale([meanScale]*2)
covarianceModel.setActiveParameter([2]) # Enable sigma (amplitude) only
# Learn amplitude only
krigingResult, krigingMetamodel = fitKriging(coordinates, observations, covarianceModel, basis)
covarianceModel = krigingResult.getCovarianceModel()
print("Covariance model=",covarianceModel)
这会打印:
Covariance model= SquaredExponential(scale=[0.880079,0.880079], amplitude=[6.1472])
最后,脚本如下:
plotKrigingPredictions(krigingMetamodel)
地块:
由于 theta 参数在两个方向上都相同,因此温度现在是球形的,正如我猜你所期望的那样。
我在平面图上对传感器之间的温度数据实施了 2 种类型的插值。由于我不是很精通我使用的包的底层过程和数学,我发现很难理解为什么它们通过 pcolormesh 的输出如此不同。
我用了scipy.interpolate.Rbf
和sklearn.gaussian_process
。
这些是图片。
Radial Basis Function interpolation
RBF 示例看起来与在网络上找到的实现完全一样,但 GPR 示例显示这些长线而不是圆形。在 Scikit Learn 的 GPR 实现中,什么参数会调节这些形状?为什么即使 GPR 的温度结果发生轻微变化,它们的形状甚至有时颜色强度也会如此不同。
平面图上的9个传感器(点)均匀分布。
RBF 代码。
# Set X and Y Coordinates for each sensor (pixels)
days_data['xCoordinate'] = days_data.nodeid.apply(lambda id: createXCoord(id))
days_data['yCoordinate'] = days_data.nodeid.apply(lambda id: createYCoord(id))
# Define location of "sensors" on the axes
x = days_data.xCoordinate.to_list()
y = days_data.yCoordinate.to_list()
z = days_data.avgtemperature.to_list() #temperature
# Use Gaussian function
rbf_adj = Rbf(x, y, z, function = 'gaussian')
# Set extent to which colour mesh stretches over
# the underlying image
x_fine = np.linspace(0, 1000, 81) #81 - num of samples
y_fine = np.linspace(0, 700, 81)
x_grid, y_grid = np.meshgrid(x_fine, y_fine)
z_grid = rbf_adj(x_grid.ravel(), y_grid.ravel()).reshape(x_grid.shape)
# Remove the colorbar created by the previous plot, if any
# To avoid a new colorbar being plotted alongside the previous one each time a different date is selected
try:
cb = p.colorbar
cb.remove()
except:
pass
# plot the pcolor on the Axes. Use alpha to set the transparency
p=ax.pcolor(x_fine, y_fine, z_grid, alpha=0.3)
ax.invert_yaxis() #invert Y axis for X and Y to have same starting point
# Add a colorbar for the pcolor field
fig.colorbar(p,ax=ax)
GPR 代码
# Define location of "sensors" on the axes
x = days_data.xCoordinate.to_list()
y = days_data.yCoordinate.to_list()
z = days_data.avgtemperature.to_list() #temperature
X = np.array([[a, b] for a, b in zip(x, y)])
# Set extent to which colour mesh stretches over
# the underlying image
x_fine = np.linspace(0, 1000, 81) #81 - num of samples
y_fine = np.linspace(0, 700, 82)
X_fine = np.array([[a_fine, b_fine] for a_fine, b_fine in zip(x_fine, y_fine)])
x_grid, y_grid = np.meshgrid(x_fine, y_fine)
# Instantiate a Gaussian Process model
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, z)
z_grid, sigma = gp.predict(X_fine, return_std=True)
# Remove the colorbar created by the previous plot, if any
# To avoid a new colorbar being plotted alongside the previous one each time a different date is selected
try:
cb = p.colorbar
cb.remove()
except:
pass
# plot the pcolor on the Axes. Use alpha to set the transparency
p = ax.pcolor(x_grid, y_grid, np.meshgrid(z_grid, y_fine)[0], alpha=0.3)
ax.invert_yaxis() #invert Y axis for X and Y to have same starting point
# Add a colorbar for the pcolor field
fig.colorbar(p,ax=ax)
我猜想高斯过程的尺度参数在x和y方向上差别很大,x尺度参数小,y尺度参数相对较大。这样,x 距离小的两个点相关性低,y 距离小的两个点相关性高:这会在温度曲线中创建垂直 "bands"。
我能够用以下数据模拟这个,OpenTURNS。你没有提供温度,所以我不得不从图形中猜测它们。
import numpy as np
import openturns as ot
coordinates = ot.Sample([[100.0,100.0],[500.0,100.0],[900.0,100.0], \
[100.0,350.0],[500.0,350.0],[900.0,350.0], \
[100.0,600.0],[500.0,600.0],[900.0,600.0]])
observations = ot.Sample([25.0,25.0,10.0,20.0,25.0,20.0,15.0,25.0,25.0],1)
# Extract coordinates
x = np.array(coordinates[:,0])
y = np.array(coordinates[:,1])
# Plot the data with a scatter plot and a color map
import matplotlib.pyplot as plt
fig = plt.figure()
plt.scatter(x, y, c=observations, cmap='viridis')
plt.colorbar()
plt.show()
这会产生:
可以使用以下脚本从中拟合克里格元模型。我使用了平方指数协方差模型。
def fitKriging(coordinates, observations, covarianceModel, basis):
'''
Fit the parameters of a kriging metamodel.
'''
algo = ot.KrigingAlgorithm(coordinates, observations, covarianceModel, basis)
algo.run()
krigingResult = algo.getResult()
krigingMetamodel = krigingResult.getMetaModel()
return krigingResult, krigingMetamodel
inputDimension = 2
basis = ot.ConstantBasisFactory(inputDimension).build()
covarianceModel = ot.SquaredExponential([1.]*inputDimension, [1.0])
krigingResult, krigingMetamodel = fitKriging(coordinates, observations, covarianceModel, basis)
为了绘制这个克里格元模型的预测,我使用了以下基于 pcolor
函数的脚本。
def plotKrigingPredictions(krigingMetamodel):
'''
Plot the predictions of a kriging metamodel.
'''
# Create the mesh of the box [0., 1000.] * [0., 700.]
myInterval = ot.Interval([0., 0.], [1000., 700.])
# Define the number of interval in each direction of the box
nx = 20
ny = 20
myIndices = [nx-1, ny-1]
myMesher = ot.IntervalMesher(myIndices)
myMeshBox = myMesher.build(myInterval)
# Predict
vertices = myMeshBox.getVertices()
predictions = krigingMetamodel(vertices)
# Format for plot
X = np.array(vertices[:,0]).reshape((ny,nx))
Y = np.array(vertices[:,1]).reshape((ny,nx))
predictions_array = np.array(predictions).reshape((ny,nx))
# Plot
plt.figure()
plt.pcolor(X, Y, predictions_array)
plt.colorbar()
plt.show()
return
plotKrigingPredictions(krigingMetamodel)
这会产生:
您可以看到与预测中相同的波段。
查看协方差模型解释了原因:
>>> covarianceModel = krigingResult.getCovarianceModel()
>>> print(covarianceModel)
SquaredExponential(scale=[0.167256,1.5929], amplitude=[6.75753])
x 尺度为 0.1672,比 y 尺度 1.593 小得多。这是因为我们使用了各向异性协方差模型,其中 x 尺度 theta 可以不同于 y 尺度 theta。
为了解决这个问题,我们可以使用各向同性协方差模型,其中 x 尺度保持等于 y 尺度。然而,仅将 x 和 y 标度设置为给定值并仅估计振幅参数 sigma 会更简单。
以下脚本将尺度设置为先前估计的平均值,并且仅估计 sigma。
scales = covarianceModel.getScale()
meanScale = (scales[0]+scales[1])/2.0
covarianceModel.setScale([meanScale]*2)
covarianceModel.setActiveParameter([2]) # Enable sigma (amplitude) only
# Learn amplitude only
krigingResult, krigingMetamodel = fitKriging(coordinates, observations, covarianceModel, basis)
covarianceModel = krigingResult.getCovarianceModel()
print("Covariance model=",covarianceModel)
这会打印:
Covariance model= SquaredExponential(scale=[0.880079,0.880079], amplitude=[6.1472])
最后,脚本如下:
plotKrigingPredictions(krigingMetamodel)
地块:
由于 theta 参数在两个方向上都相同,因此温度现在是球形的,正如我猜你所期望的那样。