Get incidents upper matrix
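I have a matrix in which every row/col has a number (called an incident) associated with it, for example:
     9  7  6
9 [[1, 2, 3],
7  [4, 5, 6],
6  [7, 8, 9]]
I want to get the incidents of the upper triangle of the matrix in two lists, one for the rows and one for the columns (because my matrix is symmetric). For example: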
row = [9, 9, 9, 7, 7, 6]
col = [9, 7, 6, 7, 6, 6]
I can do this for row with:
import numpy as np
myIncidents = [9, 7, 6]
row = np.array(myIncidents).repeat(np.arange(len(myIncidents), 0, -1)) # [9, 9, 9, 7, 7, 6]
But I don't know how to achieve the same for col. Any suggestions?
You can use triu_indices and advanced indexing:
Incidents = np.array([9,7,6])
row,col = np.triu_indices(len(Incidents))
row,col = Incidents[row],Incidents[col]
row
# array([9, 9, 9, 7, 7, 6])
col
# array([9, 7, 6, 7, 6, 6])
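To see why this works, it can help to print the index arrays themselves. The following is only a sketch: the helper name `incident_pairs` and its `k` pass-through are my own additions for illustration, not part of the answer above. `np.triu_indices(n, k)` returns the positions (i, j) with j >= i + k in row-major order, so `k=1` would skip the diagonal.

```python
import numpy as np

def incident_pairs(incidents, k=0):
    """Return (row, col) incident labels for the upper triangle.

    Hypothetical helper: k=0 includes the diagonal, k=1 excludes it.
    """
    incidents = np.asarray(incidents)
    # Positions (i, j) of the upper triangle, in row-major order
    i, j = np.triu_indices(len(incidents), k=k)
    # Advanced indexing maps positions to their incident labels
    return incidents[i], incidents[j]

i, j = np.triu_indices(3)
print(i)  # [0 0 0 1 1 2]
print(j)  # [0 1 2 1 2 2]

row, col = incident_pairs([9, 7, 6])
print(row)  # [9 9 9 7 7 6]
print(col)  # [9 7 6 7 6 6]

# Without the diagonal
row, col = incident_pairs([9, 7, 6], k=1)
print(row)  # [9 9 7]
print(col)  # [7 6 6]
```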
For small to medium-sized arrays, itertools is often faster than numpy:
import itertools as it
np.fromiter(it.chain.from_iterable(it.combinations_with_replacement([9,7,6],2)),int).reshape(2,-1,order="F")
# array([[9, 9, 9, 7, 7, 6],
#        [9, 7, 6, 7, 6, 6]])
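The one-liner is dense, so here is a more spelled-out sketch of the same idea. `combinations_with_replacement` emits one (row, col) pair per upper-triangle cell; the one-liner flattens those pairs and uses `reshape(2, -1, order="F")` to separate first from second elements. The variant below uses `zip(*pairs)` for that transposition instead, which is my substitution for readability and not what the answer uses. It assumes a non-empty input.

```python
import itertools as it
import numpy as np

incidents = [9, 7, 6]

# Each pair (a, b) is one upper-triangle cell, emitted in row-major order
pairs = list(it.combinations_with_replacement(incidents, 2))
print(pairs)
# [(9, 9), (9, 7), (9, 6), (7, 7), (7, 6), (6, 6)]

# zip(*pairs) transposes the list of pairs into (rows, cols)
row, col = (np.array(x) for x in zip(*pairs))
print(row)  # [9 9 9 7 7 6]
print(col)  # [9 7 6 7 6 6]
```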
Here is one way to get row and col using masking:
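def triu_elements(a):
    # a must be a 1-D NumPy array (a[:, None] does not work on a list)
    n = len(a)
    r1 = np.broadcast_to(a, (n, n))           # column labels: every row is a
    r2 = np.broadcast_to(a[:, None], (n, n))  # row labels: every column is a
    mask = ~np.tri(n, k=-1, dtype=bool)       # True on and above the diagonal
    return r2[mask], r1[mask]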
Sample run:
In [56]: myIncidents = np.array([9,7,6])
In [57]: triu_elements(myIncidents)
Out[57]: (array([9, 9, 9, 7, 7, 6]), array([9, 7, 6, 7, 6, 6]))
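For a 3-element input, the intermediate arrays can be printed to see what the mask selects. This is just an illustrative sketch; the names `row_labels` and `col_labels` are mine and correspond to `r2` and `r1` in the function above.

```python
import numpy as np

a = np.array([9, 7, 6])
n = len(a)

col_labels = np.broadcast_to(a, (n, n))           # every row is [9, 7, 6]
row_labels = np.broadcast_to(a[:, None], (n, n))  # every column is [9, 7, 6]

# np.tri(n, k=-1) is True strictly below the diagonal; negating it
# leaves True on and above the diagonal
mask = ~np.tri(n, k=-1, dtype=bool)
print(mask)
# [[ True  True  True]
#  [False  True  True]
#  [False False  True]]

# Boolean indexing walks the mask in row-major order
print(row_labels[mask])  # [9 9 9 7 7 6]
print(col_labels[mask])  # [9 7 6 7 6 6]

# The same mask can be built with np.triu on a matrix of ones
assert np.array_equal(mask, np.triu(np.ones((n, n), dtype=bool)))
```

Because `np.broadcast_to` returns read-only views, no n-by-n copies of the labels are made before the mask is applied.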
Timings: comparing against @Paul Panzer's solutions, including the np.triu_indices based one, on various datasets.
Set #1 (small):
In [105]: Incidents = np.random.randint(0,100,(100))
# @Paul Panzer's solution-1
In [106]: %%timeit
...: rowID,colID = np.triu_indices(len(Incidents))
...: row,col = Incidents[rowID],Incidents[colID]
10000 loops, best of 3: 66.8 µs per loop
# @Paul Panzer's solution-2
In [116]: %timeit np.fromiter(it.chain.from_iterable(it.combinations_with_replacement(Incidents,2)),int).reshape(2,-1,order="F")
1000 loops, best of 3: 259 µs per loop
In [107]: %timeit triu_elements(Incidents)
10000 loops, best of 3: 38.3 µs per loop
Set #2 (large):
In [99]: Incidents = np.random.randint(0,100,(1000))
In [100]: %%timeit
...: rowID,colID = np.triu_indices(len(Incidents))
...: row,col = Incidents[rowID],Incidents[colID]
100 loops, best of 3: 6.24 ms per loop
In [101]: %timeit triu_elements(Incidents)
1000 loops, best of 3: 1.7 ms per loop
Set #3 (very large):
In [121]: Incidents = np.random.randint(0,100,(10000))
In [122]: %%timeit
...: rowID,colID = np.triu_indices(len(Incidents))
...: row,col = Incidents[rowID],Incidents[colID]
1 loop, best of 3: 1.08 s per loop
In [123]: %timeit triu_elements(Incidents)
1 loop, best of 3: 421 ms per loop
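The numbers above come from IPython's %timeit and depend on the machine and NumPy version. A rough way to rerun the comparison as a plain script is sketched below, using the standard-library timeit module. The wrapper names `via_triu_indices` and `via_mask`, the seed, and the repeat counts are my own choices, not part of the original benchmark.

```python
import timeit
import numpy as np

def via_triu_indices(a):
    # triu_indices plus advanced indexing
    i, j = np.triu_indices(len(a))
    return a[i], a[j]

def via_mask(a):
    # Broadcast views plus a boolean upper-triangle mask
    n = len(a)
    r1 = np.broadcast_to(a, (n, n))
    r2 = np.broadcast_to(a[:, None], (n, n))
    mask = ~np.tri(n, k=-1, dtype=bool)
    return r2[mask], r1[mask]

rng = np.random.default_rng(0)
for n in (100, 1000):
    a = rng.integers(0, 100, n)
    # Both approaches must agree before their timings are worth comparing
    for x, y in zip(via_triu_indices(a), via_mask(a)):
        assert np.array_equal(x, y)
    for f in (via_triu_indices, via_mask):
        # Best of 3 runs of 10 calls each, reported per call
        best = min(timeit.repeat(lambda: f(a), number=10, repeat=3)) / 10
        print(f"n={n:5d} {f.__name__:18s} {best * 1e6:10.1f} µs per call")
```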