Using Numpy to find combination of rows in an array such that each column sums to the same value

Question

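I am trying to use numpy to find configurations of rows in a matrix such that the column sums over those rows are all the same value. For example, for the matrix/array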

[[0,0,0,1],
 [1,0,1,0],
 [1,1,0,0],
 [0,1,0,0]]

I would like to get the first, second, and last rows as output, because

  0,0,0,1
  1,0,1,0
  0,1,0,0 +
  -------
= 1,1,1,1

Does numpy have any built-in tools that could help me get this?

Answer 1

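One solution is to enumerate the power set of the rows and then check every possible subset of rows against the summation condition. A matrix with n rows has 2^n subsets, so this gets slow quickly as the number of rows grows.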

Using the standard itertools recipe for the power set:

from itertools import chain, combinations

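def powerset(iterable):
    """Return an iterator over every subset of iterable, as tuples, from the empty tuple to the full set."""
    xs = list(iterable)
    return chain.from_iterable(combinations(xs, n) for n in range(len(xs) + 1))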

Here is a working example with some synthetic data:

In [79]: data
Out[79]: 
array([[0, 1, 1],
       [0, 0, 1],
       [1, 0, 1],
       [0, 1, 1],
       [0, 0, 0],
       [0, 1, 0],
       [1, 1, 1],
       [1, 1, 0],
       [1, 1, 1],
       [0, 1, 0]], dtype=int32)

In [80]: def is_constant(array):
    ...:     return (array == array[0]).all()
    ...: 

In [81]: solution = []

In [82]: for candidate in powerset(range(len(data))):
    ...:     if candidate and is_constant(data[candidate, :].sum(axis=0)):
    ...:         solution.append(candidate)
    ...:

This gives, for example:

In [83]: solution
Out[83]: 
[(4,),
 (6,),
 (8,),
 (1, 7),
 (2, 5),
 (2, 9),
 (4, 6),
 (4, 8),
 (6, 8),
 (0, 2, 7),
 (1, 4, 7),
 (1, 6, 7),
 (1, 7, 8),
 (2, 3, 7),
 (2, 4, 5),
 (2, 4, 9),
 (2, 5, 6),
 (2, 5, 8),
 (2, 6, 9),
 (2, 8, 9),
 (4, 6, 8),
 (0, 2, 4, 7),
 (0, 2, 6, 7),
 (0, 2, 7, 8),
 (1, 2, 5, 7),
 (1, 2, 7, 9),
 (1, 4, 6, 7),
 (1, 4, 7, 8),
 (1, 6, 7, 8),
 (2, 3, 4, 7),
 (2, 3, 6, 7),
 (2, 3, 7, 8),
 (2, 4, 5, 6),
 (2, 4, 5, 8),
 (2, 4, 6, 9),
 (2, 4, 8, 9),
 (2, 5, 6, 8),
 (2, 6, 8, 9),
 (0, 2, 4, 6, 7),
 (0, 2, 4, 7, 8),
 (0, 2, 6, 7, 8),
 (1, 2, 4, 5, 7),
 (1, 2, 4, 7, 9),
 (1, 2, 5, 6, 7),
 (1, 2, 5, 7, 8),
 (1, 2, 6, 7, 9),
 (1, 2, 7, 8, 9),
 (1, 4, 6, 7, 8),
 (2, 3, 4, 6, 7),
 (2, 3, 4, 7, 8),
 (2, 3, 6, 7, 8),
 (2, 4, 5, 6, 8),
 (2, 4, 6, 8, 9),
 (0, 2, 4, 6, 7, 8),
 (1, 2, 4, 5, 6, 7),
 (1, 2, 4, 5, 7, 8),
 (1, 2, 4, 6, 7, 9),
 (1, 2, 4, 7, 8, 9),
 (1, 2, 5, 6, 7, 8),
 (1, 2, 6, 7, 8, 9),
 (2, 3, 4, 6, 7, 8),
 (1, 2, 4, 5, 6, 7, 8),
 (1, 2, 4, 6, 7, 8, 9)]

We can verify the solution for a couple of these cases:

In [84]: data[(1, 2, 4, 6, 7, 8, 9), :].sum(axis=0)
Out[84]: array([4, 4, 4])

In [85]: data[(0, 2, 4, 6, 7), :].sum(axis=0)
Out[85]: array([3, 3, 3])
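
To tie this back to the matrix in the question, here is a minimal self-contained sketch of the same brute-force search. The helper name constant_sum_subsets is my own for illustration, and I index with list(candidate) rather than the tuple directly, purely for readability.

```python
from itertools import chain, combinations

import numpy as np


def powerset(iterable):
    """All subsets of `iterable`, from the empty tuple up to the full set."""
    xs = list(iterable)
    return chain.from_iterable(combinations(xs, n) for n in range(len(xs) + 1))


def is_constant(array):
    """True if every entry equals the first entry."""
    return bool((array == array[0]).all())


def constant_sum_subsets(data):
    """Hypothetical helper: row-index tuples whose column sums are all equal."""
    return [
        candidate
        for candidate in powerset(range(len(data)))
        # Skip the empty subset, then test the column sums of the chosen rows.
        if candidate and is_constant(data[list(candidate)].sum(axis=0))
    ]


# The matrix from the question.
matrix = np.array([[0, 0, 0, 1],
                   [1, 0, 1, 0],
                   [1, 1, 0, 0],
                   [0, 1, 0, 0]])

question_solutions = constant_sum_subsets(matrix)
print(question_solutions)  # [(0, 1, 3)]
```

The only subset found is rows 0, 1 and 3, i.e. the first, second and last rows that the question asks for.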

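To extend this to a more specific use case, you could use itertools.combinations to generate only subsets of a particular size, such as subsets of exactly 2 rows or exactly 3 rows.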
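
As a rough sketch of the fixed-size variant, using the same synthetic data as above (the helper name subsets_of_size is hypothetical, not part of numpy or itertools):

```python
from itertools import combinations

import numpy as np

# Same synthetic data as in the worked example.
data = np.array([[0, 1, 1],
                 [0, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0],
                 [0, 1, 0],
                 [1, 1, 1],
                 [1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]])


def subsets_of_size(data, k):
    """Hypothetical helper: tuples of exactly k row indices with equal column sums."""
    found = []
    for candidate in combinations(range(len(data)), k):
        sums = data[list(candidate)].sum(axis=0)
        if (sums == sums[0]).all():
            found.append(candidate)
    return found


pairs = subsets_of_size(data, 2)
print(pairs)  # [(1, 7), (2, 5), (2, 9), (4, 6), (4, 8), (6, 8)]
```

This checks only the C(n, k) subsets of size k instead of all 2^n subsets, which matches the two-row entries in the full result list.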

Or you could just filter unwanted results (such as trivial solutions consisting of a single row) out of the result set given in my example.
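
A possible sketch of that filtering step follows. What counts as "unwanted" is an assumption on my part: here I drop one-row solutions, and optionally any subset that contains an all-zero row, since adding such a row never changes the sums. The names nontrivial and no_zero_rows are illustrative only.

```python
from itertools import chain, combinations

import numpy as np

data = np.array([[0, 1, 1],
                 [0, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0],
                 [0, 1, 0],
                 [1, 1, 1],
                 [1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]])


def powerset(n):
    """Power set of the integers {0, ..., n-1}."""
    xs = list(range(n))
    return chain.from_iterable(combinations(xs, k) for k in range(n + 1))


def is_constant(array):
    return bool((array == array[0]).all())


# Full result set, computed as in the worked example.
solution = [c for c in powerset(len(data))
            if c and is_constant(data[list(c)].sum(axis=0))]

# Filter 1: drop trivial one-row solutions.
nontrivial = [rows for rows in solution if len(rows) > 1]

# Filter 2 (assumed notion of "unwanted"): drop subsets containing an all-zero row.
zero_rows = set(np.flatnonzero(~data.any(axis=1)).tolist())
no_zero_rows = [rows for rows in nontrivial if zero_rows.isdisjoint(rows)]

print(nontrivial[:3])
print(no_zero_rows[:3])
```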

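Note that you can simplify the function definition of powerset (the one I used is taken straight from the Python documentation on itertools recipes). Instead of passing an iterable that gets converted to a list, you can pass an integer and skip directly to returning the final chain.from_iterable result, and then just pass len(data) as the argument to powerset in my example, like this: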

from itertools import chain, combinations

def powerset(N):
    """Power set of integers {0, ..., N-1}."""
    xs = list(range(N))
    return chain.from_iterable(combinations(xs, n) for n in range(N + 1))

...
for candidate in powerset(len(data)):
    ...
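
If you would rather keep the loop inside numpy, the same exhaustive search can also be written with array operations. This is only a sketch of an alternative, not what I used above: it encodes every non-empty subset as a 0/1 mask and uses one matrix product to get all the column sums. The function name is hypothetical, and because it materializes 2^n - 1 masks at once, it is only reasonable for a small number of rows.

```python
import numpy as np


def constant_sum_subsets_vectorized(data):
    """Hypothetical helper: brute-force search over all non-empty row subsets via 0/1 masks."""
    n = len(data)
    # Integers 1 .. 2**n - 1; bit i of each integer says whether row i is selected.
    codes = np.arange(1, 2 ** n)
    masks = (codes[:, None] >> np.arange(n)) & 1   # shape (2**n - 1, n)
    # Each row of `sums` holds the column sums of one subset.
    sums = masks @ data                            # shape (2**n - 1, n_columns)
    # Keep subsets whose column sums all equal the first column's sum.
    keep = (sums == sums[:, :1]).all(axis=1)
    found = [tuple(np.flatnonzero(m).tolist()) for m in masks[keep]]
    # Order by subset size, then lexicographically, like the itertools version.
    return sorted(found, key=lambda rows: (len(rows), rows))


matrix = np.array([[0, 0, 0, 1],
                   [1, 0, 1, 0],
                   [1, 1, 0, 0],
                   [0, 1, 0, 0]])

vectorized_solutions = constant_sum_subsets_vectorized(matrix)
print(vectorized_solutions)  # [(0, 1, 3)]
```

The trade-off is memory for speed: the itertools version is lazy and uses almost no memory, while this one holds every mask in an array.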

python

numpy

linear-algebra

data-science