Convert list of tuples to sparse coo_matrix or sort it directly

Question

My original problem was to sort a SciPy sparse coo_matrix named weights.

Before sorting, it looks like this:

  (0, 1)    2.0
  (0, 3)    4.0
  (1, 0)    5.0
  (3, 3)    1.0
  (0, 0)    5.0
  (2, 4)    1.0
  (1, 2)    2.0
  (1, 4)    2.0
  (0, 2)    3.0
  (0, 4)    1.0
  (2, 0)    5.0
  (4, 3)    3.0
  (3, 4)    3.0
  (2, 1)    0.0
  (3, 0)    5.0
  (3, 2)    0.0
  (2, 2)    0.0
  (4, 4)    0.0
  (4, 0)    0.0
  (3, 1)    2.0

I want the following (final) result:

  (0, 0)    5.0
  (0, 1)    2.0
  (0, 2)    3.0
  (0, 3)    4.0
  (0, 4)    1.0
  (1, 0)    5.0
  (1, 2)    2.0
  (1, 4)    2.0
  (2, 0)    5.0
  (2, 1)    0.0
  (2, 2)    0.0
  (2, 4)    1.0
  (3, 0)    5.0  
  (3, 1)    2.0
  (3, 2)    0.0
  (3, 3)    1.0 
  (3, 4)    3.0
  (4, 0)    0.0
  (4, 3)    3.0
  (4, 4)    0.0

I tried to do it like this:

# pair up (row, col, value) for every stored entry
weights_tuples = zip(weights.row, weights.col, weights.data)
sorted_weights_tuples = sorted(weights_tuples, key=lambda x: (x[0], x[1]))

This does sort the entries, but it does not produce the right output format:

[(0, 0, 5.0), (0, 1, 2.0), (0, 3, 4.0), (0, 4, 1.0), (1, 1, 4.0), (1, 2, 2.0), (1, 4, 2.0), (2, 0, 5.0), (2, 1, 0.0), (2, 2, 0.0), (2, 3, 0.0), (3, 0, 5.0), (3, 1, 2.0), (3, 2, 0.0), (3, 3, 1.0), (3, 4, 3.0), (4, 0, 0.0), (4, 1, 0.0), (4, 2, 2.0), (4, 3, 3.0)]

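My question is: how can I convert this result into the right format, or is there a better way to sort the coo_matrix so that it comes out in the right format directly?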

Thanks in advance.
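For the first half of the question (turning the sorted list of tuples back into a coo_matrix), one option is to "unzip" the triples with zip(*...) and pass them to the (data, (row, col)) constructor. This is a minimal sketch; the small weights matrix below is hypothetical stand-in data, not the asker's.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical stand-in for `weights`, with entries stored in arbitrary order.
weights = coo_matrix(
    (np.array([2.0, 4.0, 5.0, 5.0]), (np.array([0, 0, 1, 0]), np.array([1, 3, 0, 0]))),
    shape=(2, 4),
)

# Sort the (row, col, value) triples by row, then by column.
weights_tuples = zip(weights.row, weights.col, weights.data)
sorted_weights_tuples = sorted(weights_tuples, key=lambda t: (t[0], t[1]))

# zip(*...) turns the list of triples into three tuples: rows, cols, values.
rows, cols, vals = zip(*sorted_weights_tuples)
sorted_weights = coo_matrix(
    (np.array(vals), (np.array(rows), np.array(cols))),
    shape=weights.shape,
)

print(sorted_weights)
```

Note that the unpacking into rows, cols, vals fails if the list is empty, so a matrix with no stored entries would need a separate check.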

Answer 1

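@hpaulj's comment gives the relevant hint, but here is how to use it without relying on implementation details of coo_matrix's methods, and without the overhead of converting to CSR: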

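First, use np.lexsort to get the permutation that puts the entries in the right order, then build a new coo_matrix with the constructor that takes the sparse (data, (row, col)) representation as input: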

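import numpy as np
from scipy.sparse import coo_matrix
# np.lexsort uses its LAST key as the primary one: sort by row, then by col
order = np.lexsort((weights.col, weights.row))
sorted_weights = coo_matrix((weights.data[order], (weights.row[order], weights.col[order])),
                            shape=weights.shape)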
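To check the np.lexsort approach end to end, the sketch below rebuilds the question's 5 x 5 matrix from the entries listed there (in their original, unsorted order) and applies the same permutation. It assumes a SciPy version where coo_matrix exposes .row, .col and .data; the explicit 0.0 entries are kept as stored entries.

```python
import numpy as np
from scipy.sparse import coo_matrix

# (row, col, value) entries from the question, in their original unsorted order.
entries = [
    (0, 1, 2.0), (0, 3, 4.0), (1, 0, 5.0), (3, 3, 1.0), (0, 0, 5.0),
    (2, 4, 1.0), (1, 2, 2.0), (1, 4, 2.0), (0, 2, 3.0), (0, 4, 1.0),
    (2, 0, 5.0), (4, 3, 3.0), (3, 4, 3.0), (2, 1, 0.0), (3, 0, 5.0),
    (3, 2, 0.0), (2, 2, 0.0), (4, 4, 0.0), (4, 0, 0.0), (3, 1, 2.0),
]
row, col, data = (np.array(x) for x in zip(*entries))
weights = coo_matrix((data, (row, col)), shape=(5, 5))

# Primary key: row (last key passed to lexsort); secondary key: col.
order = np.lexsort((weights.col, weights.row))
sorted_weights = coo_matrix(
    (weights.data[order], (weights.row[order], weights.col[order])),
    shape=weights.shape,
)

# Print in the same "(row, col)  value" layout as the question.
for r, c, v in zip(sorted_weights.row, sorted_weights.col, sorted_weights.data):
    print(f"({r}, {c})\t{v}")
```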

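If you don't mind making your code a little more obscure, you can improve performance by replacing np.lexsort with

np.argsort(N * weights.row + weights.col)

where N is the number of columns (weights.shape[1]), so that every (row, col) pair gets a distinct key in row-major order. For a square matrix such as this 5 x 5 example, that is the same as the number of rows.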
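The single-key argsort variant can be wrapped in a small helper, hypothetically named sort_coo_row_major. For the key row * N + col to give row-major order, N has to exceed every column index, i.e. it should be the column count; on a square matrix the row count happens to work too, but the hypothetical wide 2 x 5 matrix below shows what goes wrong otherwise. The key is computed in int64 and argsort is asked for a stable sort, both as precautions (overflow on large shapes, duplicate entries keeping their original order as with np.lexsort).

```python
import numpy as np
from scipy.sparse import coo_matrix


def sort_coo_row_major(m):
    """Return a copy of COO matrix `m` with its entries in row-major order.

    Sorts on the single integer key row * n_cols + col. The multiplier is the
    number of COLUMNS, so keys from different rows never interleave.
    """
    n_cols = m.shape[1]
    key = m.row.astype(np.int64) * n_cols + m.col
    # kind="stable" keeps duplicate (row, col) entries in their original order.
    order = np.argsort(key, kind="stable")
    return coo_matrix((m.data[order], (m.row[order], m.col[order])), shape=m.shape)


# Hypothetical wide 2 x 5 matrix: here the row count (2) would be the wrong N.
m = coo_matrix(
    (np.array([1.0, 2.0, 3.0]), (np.array([1, 0, 0]), np.array([0, 4, 1]))),
    shape=(2, 5),
)

s = sort_coo_row_major(m)
print(list(zip(s.row.tolist(), s.col.tolist())))  # [(0, 1), (0, 4), (1, 0)]

# Using the number of rows as N mis-orders this matrix:
bad = np.argsort(m.row.astype(np.int64) * m.shape[0] + m.col, kind="stable")
print(list(zip(m.row[bad].tolist(), m.col[bad].tolist())))  # [(0, 1), (1, 0), (0, 4)]
```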

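Note that, as @hpaulj said in his comment, coo_matrix.sum_duplicates does this in place and sets coo_matrix.has_canonical_format accordingly to indicate that the rows and columns are sorted. However, the implementation of sum_duplicates passes weights.col and weights.row to np.lexsort in swapped order, so out of the box it gives you the opposite of what you want (column-major instead of row-major order). This also means that the has_canonical_format flag does not really pin down a unique format, which is already noted as a bug on GitHub.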
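If you do want sum_duplicates for what it is named after, merging repeated (row, col) entries (the np.lexsort approach keeps duplicates as separate stored entries), one defensive option is to call it and then impose row-major order explicitly instead of trusting the order implied by has_canonical_format. This is a minimal sketch; the helper name canonical_row_major and the example matrix are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix


def canonical_row_major(m):
    """Return a copy of COO matrix `m` with duplicates summed and entries in (row, col) order.

    sum_duplicates() is used only to merge repeated (row, col) entries; the
    row-major order is then imposed explicitly with np.lexsort, instead of
    relying on whatever order sum_duplicates leaves behind.
    """
    m = m.copy()
    m.sum_duplicates()  # in place on the copy; merges duplicates
    order = np.lexsort((m.col, m.row))  # primary key: row, secondary key: col
    return coo_matrix((m.data[order], (m.row[order], m.col[order])), shape=m.shape)


# Hypothetical 2 x 3 matrix with the entry (1, 0) stored twice.
m = coo_matrix(
    (np.array([1.0, 2.0, 3.0, 4.0]), (np.array([1, 0, 1, 0]), np.array([0, 2, 0, 1]))),
    shape=(2, 3),
)
c = canonical_row_major(m)
print(list(zip(c.row.tolist(), c.col.tolist(), c.data.tolist())))
# [(0, 1, 4.0), (0, 2, 2.0), (1, 0, 4.0)]
```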

Tags: python, sorting, scipy, sparse-matrix