Python: Concatenating scipy sparse matrices

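I am trying to concatenate 2 sparse matrices with the hstack function. xtrain_cat is the output of DictVectorizer (which encodes the categorical values), and xtrain_num comes from a CSV file read with pandas.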

    xtrain_num = sparse.csr_matrix(xtrain_num)
    print type(xtrain_num)
    print xtrain_cat.shape
    print xtrain_num.shape
    x_train_data = hstack(xtrain_cat,xtrain_num)

Output and error:

(1000, 2778)
<class 'scipy.sparse.csr.csr_matrix'>
<class 'scipy.sparse.csr.csr_matrix'>
(1000, 2778)
(1000, 968)
Traceback (most recent call last):
  File "D:\Projects\Zohair\Bosch\Bosch.py", line 360, in <module>
    x_train_data = hstack(xtrain_cat,xtrain_num)
  File "C:\Users\Public\Documents\anaconda2\lib\site-packages\scipy\sparse\construct.py", line 464, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Users\Public\Documents\anaconda2\lib\site-packages\scipy\sparse\construct.py", line 547, in bmat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D

Can anyone identify what the problem is?

You should try:

x_train_data = hstack((xtrain_cat,xtrain_num))

It takes a sequence:

blocks : sequence of sparse matrices with compatible shapes
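
As a minimal sketch of what "takes a sequence" means in practice, the example below stacks two small sparse matrices side by side. The names `left` and `right` are hypothetical stand-ins for xtrain_cat and xtrain_num, and the tiny shapes are chosen only for illustration. It is written for Python 3, so `print` is a function here.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for xtrain_cat and xtrain_num.
# Both must have the same number of rows to be stacked horizontally.
left = sparse.csr_matrix(np.eye(3))           # shape (3, 3)
right = sparse.csr_matrix(np.ones((3, 2)))    # shape (3, 2)

# The blocks go in as ONE argument: a tuple (or list) of matrices.
# The second positional parameter of hstack is `format`, not another block.
combined = sparse.hstack((left, right), format="csr")

print(combined.shape)
print(type(combined))
```

A list works equally well, e.g. `sparse.hstack([left, right])`. Passing `format="csr"` is optional; without it the result may come back in COO format, as in the session shown in this answer.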


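With a defined as a sparse matrix, I can reproduce your error when the extra parentheses are omitted (and it goes away when they are added):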

In [19]: sparse.hstack(a, a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-7c450ab4fda0> in <module>()
----> 1 sparse.hstack(a, a)

/usr/local/lib/python2.7/dist-packages/scipy/sparse/construct.pyc in hstack(blocks, format, dtype)
    454 
    455     """
--> 456     return bmat([blocks], format=format, dtype=dtype)
    457 
    458 

/usr/local/lib/python2.7/dist-packages/scipy/sparse/construct.pyc in bmat(blocks, format, dtype)
    537 
    538     if blocks.ndim != 2:
--> 539         raise ValueError('blocks must be 2-D')
    540 
    541     M,N = blocks.shape

ValueError: blocks must be 2-D

In [20]: sparse.hstack((a, a))
Out[20]: 
<3x8 sparse matrix of type '<type 'numpy.float64'>'
    with 0 stored elements in COOrdinate format>