How do I sum the rows of a sparse matrix into the first column and zero the other columns, keeping the dimensions of the original matrix?
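I have a sparse matrix B, and I want to get a sparse matrix A from B by summing each row into the first column, then dividing the first column by 2, and making the other columns zero.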
from numpy import array
from scipy.sparse import csr_matrix
row = array([0,0,1,2,2,2])
col = array([0,2,2,0,1,2])
data = array([1,2,3,4,5,6])
B = csr_matrix( (data,(row,col)), shape=(3,3) )
A = B.copy()
A = A.sum(axis=1)/2
# A's shape becomes 3 x 1 instead of 3 x 3 here!
I think there are several ways to solve this. Yours is fine.
In [275]: import numpy as np
...: from scipy.sparse import csr_matrix
...:
...: row = np.array([0,0,1,2,2,2])
...: col = np.array([0,2,2,0,1,2])
...: data = np.array([1,2,3,4,5,6.]) # make float
...:
...: B = csr_matrix( (data,(row,col)), shape=(3,3) )
In [276]: A = B.copy()
In [277]: A
Out[277]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
Assignment works:
In [278]: A[:,0] = A.sum(axis=1)/2
/usr/local/lib/python3.6/dist-packages/scipy/sparse/_index.py:126: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
self._set_arrayXarray(i, j, x)
In [279]: A[:,1:] = 0
/usr/local/lib/python3.6/dist-packages/scipy/sparse/_index.py:126: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
self._set_arrayXarray(i, j, x)
In [280]: A
Out[280]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 9 stored elements in Compressed Sparse Row format>
In [283]: A.eliminate_zeros()
In [284]: A
Out[284]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
In [285]: A.A
Out[285]:
array([[1.5, 0. , 0. ],
[1.5, 0. , 0. ],
[7.5, 0. , 0. ]])
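The efficiency warning is mainly meant to discourage iterative or repeated assignment. For a one-time action like this, I think it can be ignored.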
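If the warning bothers you, the message itself points at one way around it: do the assignments in lil format and convert back afterwards. The following is only a sketch of that idea using the same B as above; I have not timed it against the plain csr assignment, and the name half_row_sums is mine.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6.])  # float, as in the session above
B = csr_matrix((data, (row, col)), shape=(3, 3))

# Dense (3, 1) array holding half of each row sum.
half_row_sums = np.asarray(B.sum(axis=1)) / 2

# lil format is designed for changing the sparsity structure.
A = B.tolil(copy=True)
A[:, 0] = half_row_sums   # overwrite the first column
A[:, 1:] = 0              # clear every other column

# Back to csr for arithmetic; drop any explicitly stored zeros.
A = A.tocsr()
A.eliminate_zeros()
print(A.toarray())
```

The conversions to lil and back have their own cost, so for a single assignment this may not be any faster than simply ignoring the warning.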
Or if we start with an all-zeros A:
In [286]: A = csr_matrix(np.zeros(B.shape)) # may be better method
In [287]: A[:,0] = B.sum(axis=1)/2
/usr/local/lib/python3.6/dist-packages/scipy/sparse/_index.py:126: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
self._set_arrayXarray(i, j, x)
In [288]: A
Out[288]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
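As the comment in the session hints, there may be a better way to make the empty A. One option, sketched here, is to pass only the shape to csr_matrix, which avoids allocating a dense array of zeros first. The assignment still changes the sparsity structure, so I silence the warning locally; that part is my choice, not something required.

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6.])
B = csr_matrix((data, (row, col)), shape=(3, 3))

# Empty sparse matrix built from the shape alone, no dense intermediate.
A = csr_matrix(B.shape, dtype=float)

with warnings.catch_warnings():
    # One-off assignment, so the efficiency warning is suppressed here only.
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    A[:, 0] = np.asarray(B.sum(axis=1)) / 2

print(A.toarray())
```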
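Alternatively, the column matrix of row sums can be used to construct A directly, using the same style of inputs that was used to define B: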
In [289]: A1 = B.sum(axis=1)/2
In [290]: A1
Out[290]:
matrix([[1.5],
[1.5],
[7.5]])
In [296]: row = np.arange(3)
In [297]: col = np.zeros(3,int)
In [298]: data = A1.A1
In [299]: A = csr_matrix((data, (row, col)), shape=(3,3))
In [301]: A
Out[301]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
In [302]: A.A
Out[302]:
array([[1.5, 0. , 0. ],
[1.5, 0. , 0. ],
[7.5, 0. , 0. ]])
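The direct construction above hard-codes the size 3. A sketch of the same steps wrapped in a function for any 2-D sparse matrix could look like this; the function name is made up for illustration, and the eliminate_zeros call is an extra I added so that rows summing to zero are not stored.

```python
import numpy as np
from scipy.sparse import csr_matrix


def half_row_sums_in_first_column(B):
    """Return a csr matrix shaped like B with sum(row)/2 in column 0."""
    n_rows = B.shape[0]
    # B.sum(axis=1) is an (n_rows, 1) result; flatten it to 1-D.
    data = np.asarray(B.sum(axis=1)).ravel() / 2
    rows = np.arange(n_rows)
    cols = np.zeros(n_rows, dtype=int)  # every value goes in column 0
    A = csr_matrix((data, (rows, cols)), shape=B.shape)
    A.eliminate_zeros()  # do not store rows whose sum is 0
    return A


B = csr_matrix(
    (np.array([1, 2, 3, 4, 5, 6.]),
     (np.array([0, 0, 1, 2, 2, 2]), np.array([0, 2, 2, 0, 1, 2]))),
    shape=(3, 3),
)
A = half_row_sums_in_first_column(B)
print(A.toarray())
```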
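I don't know which approach is fastest. Your sparse.hstack looks fine, but under the hood hstack builds the row, col, data arrays from the coo formats of its inputs and makes a new coo_matrix. It is reliable, but not particularly streamlined.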
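For comparison, here is my guess at what an hstack version looks like. I am assuming it stacks the halved row sums next to an empty block, which may differ from your actual code; the names first_col and zero_block are mine.

```python
import numpy as np
from scipy import sparse

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6.])
B = sparse.csr_matrix((data, (row, col)), shape=(3, 3))

# (n, 1) sparse column holding half of each row sum.
first_col = sparse.csr_matrix(np.asarray(B.sum(axis=1)) / 2)

# (n, m - 1) block with no stored elements for the remaining columns.
zero_block = sparse.csr_matrix((B.shape[0], B.shape[1] - 1))

# hstack joins the blocks side by side and returns the requested format.
A = sparse.hstack([first_col, zero_block], format="csr")
print(A.toarray())
```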
Here I demonstrate how to load the values into an array directly:
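import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0,0,1,2,2,2])
col = np.array([0,2,2,0,1,2])
data = np.array([1,2,3,4,5,6])
B = csr_matrix( (data,(row,col)), shape=(3,3) )
# toarray() turns the empty sparse matrix into a dense 3x3 integer array
# (np.int was removed from NumPy, so the builtin int is used instead)
sparseMatrix = csr_matrix((3, 3), dtype=int).toarray()
# fill the dense array one (row, col, value) triple at a time
for r, c, value in zip(row, col, data):
    sparseMatrix[r, c] = value
print(sparseMatrix.sum(axis=1)/2)
print((B.sum(axis=1)/2).flatten())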
Output:
[1.5 1.5 7.5]
[[1.5 1.5 7.5]]
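The second line of output keeps its double brackets because summing a csr_matrix along an axis gives a numpy matrix, and a numpy matrix stays two-dimensional even after flatten(). A small sketch of the difference, with variable names of my own choosing:

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
B = csr_matrix((data, (row, col)), shape=(3, 3))

m = B.sum(axis=1) / 2        # numpy matrix, shape (3, 1)

flat_matrix = m.flatten()    # still a matrix, shape (1, 3)
flat_a1 = m.A1               # 1-D ndarray, shape (3,)
flat_array = np.asarray(m).ravel()  # same values, also 1-D

print(flat_matrix.shape, flat_a1.shape, flat_array.shape)
```

Either m.A1 or np.asarray(m).ravel() gives the plain 1-D values that the csr_matrix((data, (row, col))) constructor expects.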