Issue in finding RMSE in PCA reconstruction in Python
I am trying to find the root-mean-square error between the original samples from Xdata and the reconstructed samples in recon for different numbers of components. But when I use the code below:
import math
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error

components = [2, 6, 10, 20]
for n in components:
    pca = PCA(n_components=n)
    recon = pca.inverse_transform(pca.fit_transform(Xdata[0].reshape(1, -1)))
    rmse = math.sqrt(mean_squared_error(Xdata[0].reshape(1, -1), recon))
    print("RMSE: {} with {} components".format(rmse, n))
the RMSE is always 0.0 for every number of components. Why is that?
For reference, this is the content of Xdata[0]:
array([-8.47058824e-06, -6.12352941e-05, -3.18529412e-04, -1.09905882e-03, -2.64370588e-03, -4.39111765e-03, -8.70000000e-03, -2.35560000e-02, -6.03388235e-02, -1.52837471e-01, -3.48945353e-01, -4.86196588e-01, -5.51568706e-01, -5.38629706e-01, -5.34948000e-01, -5.70773824e-01, -5.45583000e-01, -4.30446353e-01, -2.76558000e-01, -1.10208882e-01, -4.35031765e-02, -2.09613529e-02, -1.25080588e-02, -9.00317647e-03, -5.04900000e-03, -2.75576471e-03, -1.03394118e-03, -1.78058824e-04, -7.53529412e-05, -2.54647059e-04])
PCA is a form of dimensionality reduction. To quote the wiki:
It is commonly used for dimensionality reduction by projecting each
data point onto only the first few principal components to obtain
lower-dimensional data while preserving as much of the data's
variation as possible.
To me, your data X[0] is only a single entry (1 dimension).. how much further can you reduce it?
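To make this concrete, here is a minimal sketch of what happens when PCA is fitted on a single reshaped sample (the toy vector x below is made up and just stands in for Xdata[0]). A single sample supports at most n_components=1, the centered data is all zeros, and inverse_transform simply returns the stored mean, which is the sample itself, so the error is exactly 0:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error

# Toy stand-in for Xdata[0]: one sample with 30 features
x = np.linspace(-0.5, 0.5, 30).reshape(1, -1)

# With a single sample, n_components can be at most min(n_samples, n_features) = 1
# (sklearn may emit a variance warning when fitting on one sample, but it still runs)
pca = PCA(n_components=1)
recon = pca.inverse_transform(pca.fit_transform(x))

# Centering one sample gives all zeros, so the reconstruction is just the mean,
# which equals the sample itself -> the error is 0
print(np.sqrt(mean_squared_error(x, recon)))  # prints 0.0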
If you want to test the rmse of the first entry, you still need to fit the PCA on the full data (to capture its variance) and only restrict the rmse computation to that single data point (although it may not make much sense, since for n=1 it is not really an rmse but the squared residual).
You can see this below:
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error

iris = datasets.load_iris()
Xdata = iris.data

components = [2, 3]
for n in components:
    pca = PCA(n_components=n)
    recon = pca.inverse_transform(pca.fit_transform(Xdata))
    rmse = mean_squared_error(Xdata[0], recon[0], squared=False)
    print("RMSE: {} with {} components".format(rmse, n))
Output:
RMSE: 0.014003180182090432 with 2 components
RMSE: 0.0011312185356586826 with 3 components
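Applied back to the original loop, the same fix looks like the sketch below, under the assumption that Xdata is the full 2-D array of samples (each row being a vector like the one shown above) and contains at least 20 rows, so that every value in components is a valid n_components:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error

# Assumption: Xdata is the complete (n_samples, 30) array, with n_samples >= 20
components = [2, 6, 10, 20]
for n in components:
    pca = PCA(n_components=n)
    # Fit on ALL samples so the principal components capture the data's variance
    recon = pca.inverse_transform(pca.fit_transform(Xdata))
    # Evaluate the reconstruction error for the first sample only
    rmse = np.sqrt(mean_squared_error(Xdata[0], recon[0]))
    print("RMSE: {} with {} components".format(rmse, n))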