Project PCA back into original scales with explained_variance_ratio_ condition

Question

I have two questions about PCA in scikit-learn.

Suppose I have the following data:

fullmatrix =[[2.5, 2.4],
             [0.5, 0.7],
             [2.2, 2.9],
             [1.9, 2.2],
             [3.1, 3.0],
             [2.3, 2.7],
             [2.0, 1.6],
             [1.0, 1.1],
             [1.5, 1.6],
             [1.1, 0.9]]

Now I compute the PCA:

from sklearn.decomposition import PCA

sklearn_pca = PCA()
Y_sklearn = sklearn_pca.fit_transform(fullmatrix)
print(Y_sklearn)  # the data projected onto both eigenvectors (principal axes)

# fraction of the total variance explained by each eigenvector
print(sklearn_pca.explained_variance_ratio_)

# eigenvectors (one per row), ordered by decreasing eigenvalue
print(sklearn_pca.components_)
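To make clear what Y_sklearn contains, here is a minimal sketch that reproduces fit_transform by hand. It assumes the default whiten=False and re-declares fullmatrix as a NumPy array (an assumption for convenience; PCA also accepts the nested list). The transform simply subtracts the fitted column means (mean_) and projects onto the rows of components_.

```python
import numpy as np
from sklearn.decomposition import PCA

fullmatrix = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                       [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

pca = PCA()  # whiten=False by default
Y_sklearn = pca.fit_transform(fullmatrix)

# Reproduce the projection by hand: centre each column on the fitted mean,
# then take dot products with each principal axis (each row of components_).
centered = fullmatrix - pca.mean_
Y_manual = centered @ pca.components_.T

print(np.allclose(Y_manual, Y_sklearn))
```

Because of this centring step, undoing the projection requires adding mean_ back, not just multiplying by components_.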

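First question: How can I project Y_sklearn back into the original scale? (I know we should get back exactly fullmatrix, since I am using all the eigenvectors; this is just to check that it was done correctly.)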

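Second question: How can I set a threshold on the minimum acceptable total variance, based on sklearn_pca.explained_variance_ratio_? For example, suppose I want to keep adding eigenvectors until the cumulative explained_variance_ratio_ exceeds 95%. In this case it is easy: we only need the first eigenvector, because it explains a fraction 0.96318131 (about 96.3%) of the variance. But how can we do this in a more automated way?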
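For reference, explained_variance_ratio_ is each eigenvalue of the sample covariance matrix divided by the sum of all eigenvalues. The sketch below (assuming only NumPy and scikit-learn, with the data re-declared as X) recomputes the ratio by hand and checks it against the fitted model; the first entry should come out near the 0.963 quoted above.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Sample covariance with columns as variables (ddof=1 by default).
cov = np.cov(X, rowvar=False)
# eigvalsh returns eigenvalues in ascending order; reverse to match PCA's ordering.
eigvals = np.linalg.eigvalsh(cov)[::-1]
ratio_manual = eigvals / eigvals.sum()
print(ratio_manual)

pca = PCA().fit(X)
print(np.allclose(ratio_manual, pca.explained_variance_ratio_))
```

Since the ratios sum to 1, "keep eigenvectors until 95% is explained" amounts to finding the first position where their cumulative sum reaches 0.95.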

Answer 1

First: sklearn_pca.inverse_transform(Y_sklearn)

Second:
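A minimal sketch of the first answer, assuming the default whiten=False. With whitening off, inverse_transform amounts to multiplying the scores by components_ and adding mean_ back; the hand-written version is shown alongside so the two can be compared. Keeping only the first k columns gives an approximate reconstruction (k=1 below is just an illustrative choice).

```python
import numpy as np
from sklearn.decomposition import PCA

fullmatrix = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                       [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

sklearn_pca = PCA()
Y_sklearn = sklearn_pca.fit_transform(fullmatrix)

# Built-in inverse: maps the scores back into the original feature space.
X_back = sklearn_pca.inverse_transform(Y_sklearn)

# Same thing by hand (valid when whiten=False): scores @ components_ + mean_.
X_manual = Y_sklearn @ sklearn_pca.components_ + sklearn_pca.mean_

# Approximate reconstruction using only the first principal component.
k = 1
X_k1 = Y_sklearn[:, :k] @ sklearn_pca.components_[:k] + sklearn_pca.mean_

print(np.allclose(X_back, fullmatrix))  # all components kept, so this is exact
print(np.allclose(X_manual, X_back))
```

The full reconstruction is exact (up to floating-point error) because the rows of components_ are orthonormal, so projecting and multiplying back undoes the rotation.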

import numpy as np

thr = 0.95
# Boolean mask: has the cumulative explained variance reached the threshold?
is_exceeds = np.cumsum(sklearn_pca.explained_variance_ratio_) >= thr
# argmax gives the index of the first True entry; add 1 to turn that index
# into the minimum number of eigenvectors needed to keep this much variance
k = np.argmax(is_exceeds) + 1
# Or simply initialize the model with the threshold as n_components
# (a float between 0 and 1) and let scikit-learn choose k
sklearn_pca = PCA(n_components=thr)
Y_sklearn = sklearn_pca.fit_transform(fullmatrix)
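Here is a self-contained sketch comparing both options from the second answer on the question's data. It assumes a threshold strictly below 1.0: if no cumulative value reaches thr, argmax on an all-False mask returns 0 and k would silently be 1. The fitted number of components is read from n_components_.

```python
import numpy as np
from sklearn.decomposition import PCA

fullmatrix = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                       [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
thr = 0.95

# Option 1: fit all components, then choose k from the cumulative ratio.
full_pca = PCA().fit(fullmatrix)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)
k = int(np.argmax(cumulative >= thr)) + 1  # first index reaching thr, plus 1
Y_k = full_pca.transform(fullmatrix)[:, :k]

# Option 2: pass the threshold as a float n_components and let scikit-learn pick k.
auto_pca = PCA(n_components=thr)
Y_auto = auto_pca.fit_transform(fullmatrix)

print(k, auto_pca.n_components_)
```

Option 2 is shorter, but option 1 keeps every component available, which is convenient if you want to try several thresholds without refitting.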

