两个 numpy 数组的笛卡尔积,有条件
Cartesian product of two numpy arrays, with condition
下面给定 1d 数组 w 和 x,我可以使用以下代码形成笛卡尔积。
import numpy as np
w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])
V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])
print(V1)
[[1 1]
[1 2]
[1 3]
[1 4]
[2 1]
[2 2]
[2 3]
[2 4]
[3 1]
[3 2]
[3 3]
[3 4]
[4 1]
[4 2]
[4 3]
[4 4]]
但是,我希望输出 V1 包含 ONLY 数组行,其中 w < x(如下所示)。我可以用循环来做到这一点,但我希望能快一点。
[[1 2]
[1 3]
[1 4]
[2 3]
[2 4]
[3 4]]
方法 #1
给定 ONLY array rows where w < x
,这将用于成对组合,这是实现相同结果的一种方法 -
In [81]: r,c = np.nonzero(w[:,None]<x) # or np.less.outer(w,x)
In [82]: np.c_[w[r], x[c]]
Out[82]:
array([[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]])
方法 #2
使用纯粹基于掩码的方法,它将是 -
In [93]: mask = np.less.outer(w,x)
In [94]: s = (len(w), len(x))
In [95]: np.c_[np.broadcast_to(w[:,None], s)[mask], np.broadcast_to(x, s)[mask]]
Out[95]:
array([[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]])
基准测试
使用相对较大的数组:
In [8]: np.random.seed(0)
...: w = np.random.randint(0,1000,(1000))
...: x = np.random.randint(0,1000,(1000))
In [9]: %%timeit
...: r,c = np.nonzero(w[:,None]<x)
...: np.c_[w[r], x[c]]
11.3 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %%timeit
...: mask = np.less.outer(w,x)
...: s = (len(w), len(x))
...: np.c_[np.broadcast_to(w[:,None], s)[mask], np.broadcast_to(x, s)[mask]]
10.5 ms ± 275 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [11]: import itertools
# @Akshay Sehgal's soln
In [12]: %timeit [i for i in itertools.product(w,x) if i[0]<i[1]]
105 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
用 itertools 试试这一行方法 -
import itertools
w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])
[i for i in itertools.product(w,x) if i[0]<i[1]]
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
Itertools 非常快 memory-efficient。它应该非常快。
您还可以通过以下方式过滤您自己的解决方案:
V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])
V1[V1[:,0]<V1[:,1]]
当然,@Divakar 提出的解决方案更快,因为它不进行冗余计算。
使用@Divakar 的 benchit 包进行基准测试:
#@Proposed solution here
def m1(x):
V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])
return V1[V1[:,0]<V1[:,1]]
#@Divakar's approach 1
def m2(x):
r,c = np.nonzero(w[:,None]<x) # or np.less.outer(w,x)
return np.c_[w[r], x[c]]
#Divakar's approach 2
def m3(x):
mask = np.less.outer(w,x)
s = (len(w), len(x))
return np.c_[np.broadcast_to(w[:,None], s)[mask], np.broadcast_to(x, s)[mask]]
in_ = [np.arange(n) for n in [10,100,1000]]
w = x.copy()
下面给定 1d 数组 w 和 x,我可以使用以下代码形成笛卡尔积。
import numpy as np
w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])
V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])
print(V1)
[[1 1]
[1 2]
[1 3]
[1 4]
[2 1]
[2 2]
[2 3]
[2 4]
[3 1]
[3 2]
[3 3]
[3 4]
[4 1]
[4 2]
[4 3]
[4 4]]
但是,我希望输出 V1 包含 ONLY 数组行,其中 w < x(如下所示)。我可以用循环来做到这一点,但我希望能快一点。
[[1 2]
[1 3]
[1 4]
[2 3]
[2 4]
[3 4]]
方法 #1
给定 ONLY array rows where w < x
,这将用于成对组合,这是实现相同结果的一种方法 -
In [81]: r,c = np.nonzero(w[:,None]<x) # or np.less.outer(w,x)
In [82]: np.c_[w[r], x[c]]
Out[82]:
array([[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]])
方法 #2
使用纯粹基于掩码的方法,它将是 -
In [93]: mask = np.less.outer(w,x)
In [94]: s = (len(w), len(x))
In [95]: np.c_[np.broadcast_to(w[:,None], s)[mask], np.broadcast_to(x, s)[mask]]
Out[95]:
array([[1, 2],
[1, 3],
[1, 4],
[2, 3],
[2, 4],
[3, 4]])
基准测试
使用相对较大的数组:
In [8]: np.random.seed(0)
...: w = np.random.randint(0,1000,(1000))
...: x = np.random.randint(0,1000,(1000))
In [9]: %%timeit
...: r,c = np.nonzero(w[:,None]<x)
...: np.c_[w[r], x[c]]
11.3 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %%timeit
...: mask = np.less.outer(w,x)
...: s = (len(w), len(x))
...: np.c_[np.broadcast_to(w[:,None], s)[mask], np.broadcast_to(x, s)[mask]]
10.5 ms ± 275 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [11]: import itertools
# @Akshay Sehgal's soln
In [12]: %timeit [i for i in itertools.product(w,x) if i[0]<i[1]]
105 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
用 itertools 试试这一行方法 -
import itertools
w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])
[i for i in itertools.product(w,x) if i[0]<i[1]]
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
Itertools 非常快 memory-efficient。它应该非常快。
您还可以通过以下方式过滤您自己的解决方案:
V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])
V1[V1[:,0]<V1[:,1]]
当然,@Divakar 提出的解决方案更快,因为它不进行冗余计算。
使用@Divakar 的 benchit 包进行基准测试:
#@Proposed solution here
def m1(x):
V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])
return V1[V1[:,0]<V1[:,1]]
#@Divakar's approach 1
def m2(x):
r,c = np.nonzero(w[:,None]<x) # or np.less.outer(w,x)
return np.c_[w[r], x[c]]
#Divakar's approach 2
def m3(x):
mask = np.less.outer(w,x)
s = (len(w), len(x))
return np.c_[np.broadcast_to(w[:,None], s)[mask], np.broadcast_to(x, s)[mask]]
in_ = [np.arange(n) for n in [10,100,1000]]
w = x.copy()