Get incidents upper matrix

I have a matrix in which every row/col is labelled with a number (called an incident), for example:

    9  7  6
9 [[1, 2, 3],
7  [4, 5, 6],
6  [7, 8, 9]]

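I want to get the incidents of the upper triangle of the matrix in two lists, one for the rows and one for the columns (the upper triangle is enough because my matrix is symmetric). For example: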

row = [9, 9, 9, 7, 7, 6]
col = [9, 7, 6, 7, 6, 6]

I can do this for the rows with:

import numpy as np

myIncidents = [9, 7, 6]
row = np.array(myIncidents).repeat(np.arange(len(myIncidents), 0, -1))  # [9, 9, 9, 7, 7, 6]

But I don't know how to achieve the same for col. Any suggestions?

You can use triu_indices and advanced indexing:

Incidents = np.array([9,7,6])
row,col = np.triu_indices(len(Incidents))
row,col = Incidents[row],Incidents[col]

row
# array([9, 9, 9, 7, 7, 6])
col
# array([9, 7, 6, 7, 6, 6])
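
To see why this works, it may help to look at the intermediate index arrays. The sketch below is only an illustration: the names matrix, row_idx, col_idx and values are my own, not from the answer. It prints the index pairs for the 3x3 example and reuses them to pull the upper-triangle values out of the matrix from the question.

```python
import numpy as np

# Example data from the question; the variable names are illustrative.
incidents = np.array([9, 7, 6])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Index pairs (i, j) with i <= j, listed row by row.
row_idx, col_idx = np.triu_indices(len(incidents))
print(row_idx)  # [0 0 0 1 1 2]
print(col_idx)  # [0 1 2 1 2 2]

# Advanced indexing maps each position to its incident label.
row = incidents[row_idx]  # [9 9 9 7 7 6]
col = incidents[col_idx]  # [9 7 6 7 6 6]

# The same index pairs select the matching matrix entries.
values = matrix[row_idx, col_idx]
print(values)  # [1 2 3 5 6 9]
```

So row[k], col[k] and values[k] all describe the same upper-triangle cell.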

For small to medium-sized arrays, itertools is often faster than numpy:

import itertools as it
np.fromiter(it.chain.from_iterable(it.combinations_with_replacement([9,7,6],2)),int).reshape(2,-1,order="F")
# array([[9, 9, 9, 7, 7, 6],
#        [9, 7, 6, 7, 6, 6]])
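
The one-liner is fairly dense, so here is an unpacked sketch of the same idea using plain Python lists. The names pairs, row and col are mine. combinations_with_replacement yields the (row, col) label pairs in upper-triangle order, and zip(*pairs) transposes them into two sequences. In the one-liner, that transposition is done by reshape(2,-1,order="F").

```python
import itertools as it

incidents = [9, 7, 6]

# Every pair (incidents[i], incidents[j]) with i <= j, in row-major order.
pairs = list(it.combinations_with_replacement(incidents, 2))
print(pairs)  # [(9, 9), (9, 7), (9, 6), (7, 7), (7, 6), (6, 6)]

# Transpose the list of pairs into one tuple of row labels and one of column labels.
row, col = zip(*pairs)
print(row)  # (9, 9, 9, 7, 7, 6)
print(col)  # (9, 7, 6, 7, 6, 6)
```

Pairs are formed by position rather than by value, so repeated incident numbers in the input should not cause problems.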

Here's one way to get row and col using masking:

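def triu_elements(a):
    n = len(a)
    # r1[i, j] == a[j]: the column label at every position
    r1 = np.broadcast_to(a,(n,n))
    # r2[i, j] == a[i]: the row label at every position
    r2 = np.broadcast_to(a[:,None],(n,n))
    # True on and above the main diagonal, i.e. the upper triangle
    mask = ~np.tri(n,k=-1,dtype=bool)
    # boolean indexing picks the masked elements row by row
    return r2[mask],r1[mask]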

Sample run:

In [56]: myIncidents = np.array([9,7,6])

In [57]: triu_elements(myIncidents)
Out[57]: (array([9, 9, 9, 7, 7, 6]), array([9, 7, 6, 7, 6, 6]))
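
As a sanity check, here is a small sketch that prints the mask for n = 3 and compares the masking approach with the triu_indices approach on random data. The function is repeated so the snippet runs on its own. The random generator, the seed and the size 50 are arbitrary choices of mine.

```python
import numpy as np

def triu_elements(a):
    n = len(a)
    r1 = np.broadcast_to(a, (n, n))           # column labels
    r2 = np.broadcast_to(a[:, None], (n, n))  # row labels
    mask = ~np.tri(n, k=-1, dtype=bool)       # upper triangle incl. diagonal
    return r2[mask], r1[mask]

# The mask for a 3x3 matrix: True on and above the diagonal.
mask3 = ~np.tri(3, k=-1, dtype=bool)
print(mask3)
# [[ True  True  True]
#  [False  True  True]
#  [False False  True]]

# Compare against the triu_indices approach on arbitrary test data.
rng = np.random.default_rng(0)
incidents = rng.integers(0, 100, size=50)
i, j = np.triu_indices(len(incidents))
row, col = triu_elements(incidents)
same = np.array_equal(row, incidents[i]) and np.array_equal(col, incidents[j])
print(same)  # True
```

Both methods walk the upper triangle row by row, which is why the outputs line up element for element.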

Timings on various datasets

Comparing triu_elements against @Paul Panzer's solutions (np.triu_indices, plus the itertools one on the small set).

Set #1 (small):

In [105]: Incidents = np.random.randint(0,100,(100))

# @Paul Panzer's solution-1
In [106]: %%timeit
     ...: rowID,colID = np.triu_indices(len(Incidents))
     ...: row,col = Incidents[rowID],Incidents[colID]
10000 loops, best of 3: 66.8 µs per loop

# @Paul Panzer's solution-2
In [116]: %timeit np.fromiter(it.chain.from_iterable(it.combinations_with_replacement(Incidents,2)),int).reshape(2,-1,order="F")
1000 loops, best of 3: 259 µs per loop

In [107]: %timeit triu_elements(Incidents)
10000 loops, best of 3: 38.3 µs per loop

Set #2 (large):

In [99]: Incidents = np.random.randint(0,100,(1000))

In [100]: %%timeit
     ...: rowID,colID = np.triu_indices(len(Incidents))
     ...: row,col = Incidents[rowID],Incidents[colID]
100 loops, best of 3: 6.24 ms per loop

In [101]: %timeit triu_elements(Incidents)
1000 loops, best of 3: 1.7 ms per loop

Set #3 (very large):

In [121]: Incidents = np.random.randint(0,100,(10000))

In [122]: %%timeit
     ...: rowID,colID = np.triu_indices(len(Incidents))
     ...: row,col = Incidents[rowID],Incidents[colID]
1 loop, best of 3: 1.08 s per loop

In [123]: %timeit triu_elements(Incidents)
1 loop, best of 3: 421 ms per loop
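
To rerun the comparison outside IPython, something like the following sketch with the standard timeit module should work. The helper names via_triu_indices and best_time, and the repeat/number settings, are my own choices and not part of the answers above. Absolute numbers will depend on the machine and the NumPy version, so no expected output is shown.

```python
import timeit
import numpy as np

def via_triu_indices(a):
    # triu_indices + advanced indexing
    i, j = np.triu_indices(len(a))
    return a[i], a[j]

def triu_elements(a):
    # broadcasting + boolean mask
    n = len(a)
    r1 = np.broadcast_to(a, (n, n))
    r2 = np.broadcast_to(a[:, None], (n, n))
    mask = ~np.tri(n, k=-1, dtype=bool)
    return r2[mask], r1[mask]

def best_time(func, arg, repeat=3, number=5):
    """Best average time per call in seconds (hypothetical helper)."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number

results = {}
for n in (100, 1000):  # add 10000 for the very large case (slower to run)
    incidents = np.random.randint(0, 100, n)
    for func in (via_triu_indices, triu_elements):
        results[(func.__name__, n)] = best_time(func, incidents)

for (name, n), seconds in results.items():
    print(f"{name:>18}  n={n:<6} {seconds * 1e6:10.1f} us per call")
```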