Feature Extraction and higher sensitivity
When I perform feature extraction (PCA and LDA) followed by logistic regression on the WBCD dataset, my sensitivity improves, but accuracy changes. I have been trying to find literature that explains or investigates how feature extraction can improve a classifier's sensitivity, but I cannot find anything.
Feature extraction reduces the dimensionality of the data. This is typically done either to build a smaller system (reducing computational cost) and/or to reduce noise (yielding a cleaner signal).
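As an illustration (not necessarily your exact setup), here is a minimal scikit-learn sketch comparing logistic regression on the raw features against the same classifier on PCA scores. It uses scikit-learn's bundled copy of the Wisconsin breast cancer data, and "sensitivity" is computed as recall on the malignant class; variable names and the 95%-variance cutoff are my own choices, not from your experiment:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn ships the Wisconsin Diagnostic Breast Cancer data;
# in this copy, label 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: logistic regression on all 30 original features.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
baseline.fit(X_tr, y_tr)

# Feature extraction first: keep the components explaining ~95% of the
# variance, then fit the same classifier on the PCA scores.
pca_lr = make_pipeline(StandardScaler(), PCA(n_components=0.95),
                       LogisticRegression(max_iter=5000))
pca_lr.fit(X_tr, y_tr)

acc_base = accuracy_score(y_te, baseline.predict(X_te))
# Sensitivity here = recall on the malignant class (pos_label=0).
sens_base = recall_score(y_te, baseline.predict(X_te), pos_label=0)
acc_pca = accuracy_score(y_te, pca_lr.predict(X_te))
sens_pca = recall_score(y_te, pca_lr.predict(X_te), pos_label=0)
print(f"baseline: acc={acc_base:.3f} sens={sens_base:.3f}")
print(f"PCA+LR:   acc={acc_pca:.3f} sens={sens_pca:.3f}")
```

Whether sensitivity actually improves with PCA depends on the split, the scaling, and the number of components retained, which is part of why a general literature result is hard to find.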
There is a concise introduction in An Introduction to Statistical Learning (available here), in the chapter on Unsupervised Learning (p. 373), which I think is what you are after.
Take PCA as an example. From An Introduction to Statistical Learning:
When faced with a large set of correlated variables, principal
components allow us to summarize this set with a smaller number of
representative variables that collectively explain most of the
variability in the original set. The principal component directions
are presented in Section 6.3.1 as directions in feature space along
which the original data are highly variable. These directions also
define lines and subspaces that are as close as possible to the data
cloud. To perform principal components regression, we simply use
principal components as predictors in a regression model in place of
the original larger set of variables.
Principal component analysis (PCA) refers to the process by which
principal components are computed, and the subsequent use of these
components in understanding the data. PCA is an unsupervised approach,
since it involves only a set of features X1, X2,...,Xp, and no
associated response Y. Apart from producing derived variables for use
in supervised learning problems, PCA also serves as a tool for data
visualization (visualization of the observations or visualization of
the variables). We now discuss PCA in greater detail, focusing on the
use of PCA as a tool for unsupervised data exploration, in keeping
with the topic of this chapter.
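The variance-summarization idea in that excerpt can be sketched directly: standardize the features, compute all principal components, and count how many are needed to explain most of the variability (again a sketch on scikit-learn's copy of the data; the 95% threshold is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
Z = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA().fit(Z)
# Cumulative share of variance explained by the leading components.
cum = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cum, 0.95)) + 1
print(f"{k} of {Z.shape[1]} components explain 95% of the variance")
```

A small k relative to the 30 original features is exactly the "smaller number of representative variables" the book describes, and those k score columns are what you then feed to the regression.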
My go-to resource is The Elements of Statistical Learning (freely available here). PCA is discussed in detail from p. 534 onwards, where it is applied to handwritten digit data to make the problem more tractable.