Python float error with numerical data

Question

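I'm trying to run PCA with sklearn on a dataset with 162 columns and 69,000 rows. I keep getting the float error below, even though I've checked that the data is all numeric. What am I doing wrong? Any help is much appreciated.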

    >>> data = np.loadtxt("PCAdata.txt")
    >>> trans = data.transpose()
    >>> trans
    array([[0., 0., 1., ..., 0., 0., 1.],
           [0., 0., 1., ..., 1., 0., 2.],
           [0., 0., 1., ..., 0., 0., 1.],
           ...,
           [1., 0., 1., ..., 0., 0., 1.],
           [0., 0., 1., ..., 0., 0., 2.],
           [0., 0., 1., ..., 0., 0., 2.]])
    >>> sscaler = preprocessing.StandardScaler().fit(trans)
    >>> sscaler
    StandardScaler(copy=True, with_mean=True, with_std=True)
    >>> pca = PCA(n_components=2)
    >>> pca.fit(sscaler)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 329, in fit
        self._fit(X)
      File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 370, in _fit
        copy=self.copy)
      File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array
        array = np.array(array, dtype=dtype, order=order, copy=copy)
    TypeError: float() argument must be a string or a number

Answer 1

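The fit method does not return a matrix; it returns the fitted StandardScaler object itself. sklearn raises the error because the argument you passed to pca.fit, sscaler, is that scaler object and not a numeric matrix. To get the scaled data matrix, either use the fit_transform method or call fit and transform separately.

Example: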

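    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    data = np.random.randint(0, 3, (100, 10))  # stand-in numeric data
    scaler = StandardScaler()
    data = scaler.fit_transform(data)  # returns the scaled array
    pca = PCA()
    data = pca.fit_transform(data)  # PCA receives an array, not the scaler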
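
If you would rather call fit and transform separately, a minimal sketch might look like the following. The random array and the variable names (rng, fitted, scaled, reduced) are placeholders of mine and not the asker's data. It also checks that fit hands back the scaler object itself, which is why passing it straight to PCA fails.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the real dataset (assumption)
rng = np.random.RandomState(0)
data = rng.randint(0, 3, size=(100, 10)).astype(float)

scaler = StandardScaler()
fitted = scaler.fit(data)        # learns mean/std; returns the scaler, not an array
scaled = scaler.transform(data)  # this call produces the scaled array

# fit() gives back the same estimator object
same_object = fitted is scaler

# Separate fit + transform should match fit_transform on the same data
matches = np.allclose(scaled, StandardScaler().fit_transform(data))

pca = PCA(n_components=2)
reduced = pca.fit_transform(scaled)  # pass the scaled array, not the scaler
```

Applied to the question's session, the same idea would be to pass sscaler.transform(trans) to pca.fit instead of sscaler.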

pca

python-2.7

scikit-learn