如何用单个 numpy 数组操作替换这个三重 For 循环?

How to replace this triple For loop with a single numpy array operation?

对于回合制游戏,我想计算每个玩家可以在地图的每个区域移动或生成的最大单位数。

我需要计算的所有数据都已经存储在几个 numpy 数组中,但我正在努力寻找高级数组索引技术来尽可能快地进行计算。

为了帮助解决这个问题,我用一些 For 循环以最简单的方式重写了函数:

import numpy as np

def get_max_units_on_zone_per_player(unitCountPerPlayer, zoneOwner, playerAvailableUnits, zoneLinks, blockedMovesPerPlayer):
    """
    Parameters
    ----------
    unitCountPerPlayer: np.array((zoneCount, playerCount), dtype=int)
        How many units each player has on a zone
    zoneOwner: np.array(zoneCount, dtype=int)
        Which player is owning a zone (-1 for none)
    playerAvailableUnits: np.array(playerCount, dtype=int)
        How many units each player can spawn
    zoneLinks: np.array((zoneCount, zoneCount), dtype=int)
        > 0 if zone1 is connected to zone2 (directed and weighted graph)
    blockedMovesPerPlayer: np.array((playerCount, zoneCount, zoneCount), dtype=bool)
        True if player can not move from zone1 to zone2

    Returns
    -------
    np.array((zoneCount, playerCount), dtype=int)
        Maximum count of units each player can have on each zone
    """

    zoneCount, playerCount = unitCountPerPlayer.shape

    # Adding units already on zone
    result = np.zeros((zoneCount, playerCount), dtype=int) + unitCountPerPlayer

    for p in xrange(playerCount):
        for z1 in xrange(zoneCount):

            if zoneOwner[z1] in (-1, p):
                # Player can spawn on neutral or owned zones
                result[z1, p] += playerAvailableUnits[p]

            for z2 in xrange(zoneCount):

                if zoneLinks[z1, z2] > 0 and not blockedMovesPerPlayer[p, z1, z2]:
                    # If z1 and z2 are connected and player can move from z1 to z2, adding units count on z1 to z2
                    result[z2, p] += unitCountPerPlayer[z1, p]
    return result

问题是我无法使用这个每次调用大约需要 30 毫秒的函数,而且我确信它是可重写的,因为一些 numpy 操作应该少于 5 毫秒来处理。

有人可以帮我解决这个问题吗?还有一步一步的过程,以便下次我可以自己做吗?我已经多次阅读 numpy 关于数组和索引的文档,但它远未 crystal 清楚,我就是想不通。

编辑:根据要求,这里有一些可以用作样本的随机数据:

zoneCount=8 ; playerCount=2

unitCountPerPlayer:
[[1 2]
 [1 3]
 [1 3]
 [3 2]
 [1 2]
 [3 2]
 [0 2]
 [3 2]]

zoneOwner:
[ 1  0 -1 -1 -1  0 -1 -1]

playerAvailableUnits:
[2 2]

zoneLinks:
[[0 1 1 1 0 1 0 0]
 [1 0 0 1 0 0 0 1]
 [1 1 1 1 0 1 0 1]
 [0 1 1 1 1 0 1 0]
 [0 0 1 1 1 0 1 1]
 [0 0 1 1 1 1 1 1]
 [1 0 0 0 0 1 0 1]
 [1 1 1 1 0 1 1 1]]

blockedMovesPerPlayer:
[[[False False False False False False False False]
  [ True False False False False False False False]
  [ True False False False False False False False]
  [False False False False False False False False]
  [False False False False False False False False]
  [False False False False False False False False]
  [ True False False False False False False False]
  [ True False False False False False False False]]

 [[False  True False False False  True False False]
  [False False False False False False False False]
  [False  True False False False  True False False]
  [False  True False False False False False False]
  [False False False False False False False False]
  [False False False False False  True False False]
  [False False False False False  True False False]
  [False  True False False False  True False False]]]

get_max_units_on_zone_per_player():
[[ 1 14]
 [11  3]
 [15 18]
 [18 20]
 [10 10]
 [13  2]
 [12 12]
 [14 18]]

Copy/paste-able数据:

zoneCount = 8
playerCount = 2

unitCountPerPlayer = np.array([[1,2], [1,3], [1,3], [3,2],
                               [1,2], [3,2], [0,2], [3,2]])

zoneOwner = np.array([1, 0, -1, -1, -1, 0, -1, -1])

playerAvailableUnits = np.array([2,2])

zoneLinks = np.array([[0,1,1,1,0,1,0,0], [1,0,0,1,0,0,0,1],
                      [1,1,1,1,0,1,0,1], [0,1,1,1,1,0,1,0],
                      [0,0,1,1,1,0,1,1], [0,0,1,1,1,1,1,1],
                      [1,0,0,0,0,1,0,1], [1,1,1,1,0,1,1,1]])

bmpp = [[[False, False, False, False, False, False, False, False],
         [ True, False, False, False, False, False, False, False],
         [ True, False, False, False, False, False, False, False],
         [False, False, False, False, False, False, False, False],
         [False, False, False, False, False, False, False, False],
         [False, False, False, False, False, False, False, False],
         [ True, False, False, False, False, False, False, False],
         [ True, False, False, False, False, False, False, False]],
        [[False,  True, False, False, False,  True, False, False],
         [False, False, False, False, False, False, False, False],
         [False,  True, False, False, False,  True, False, False],
         [False,  True, False, False, False, False, False, False],
         [False, False, False, False, False, False, False, False],
         [False, False, False, False, False,  True, False, False],
         [False, False, False, False, False,  True, False, False],
         [False,  True, False, False, False,  True, False, False]]]
blockedMovesPerPlayer = np.array(bmpp)

[更新:实施numpy方式,避免for循环]

这是我的 new 实现 get_max_units_on_zone_per_player():

def get_max_units_on_zone_per_player(unitCountPerPlayer, zoneOwner, playerAvailableUnits, zoneLinks, blockedMovesPerPlayer):
    result = unitCountPerPlayer.copy()
    result[zoneOwner < 0] += playerAvailableUnits
    _z1 = np.where(zoneOwner >= 0)
    result[_z1, zoneOwner[_z1]] += playerAvailableUnits[zoneOwner[_z1]]
    _p, _z1, _z2 = np.where(np.logical_and(zoneLinks > 0, np.logical_not(blockedMovesPerPlayer)))
    np.add.at(result, [_z2, _p], unitCountPerPlayer[_z1, _p])
    return result

我使用以下设置测试了这两个实现:

zoneCount = 100
playerCount = 1000
maxUnits = 500

unitCountPerPlayer = np.random.randint(0, maxUnits, size=(zoneCount, playerCount))
zoneOwner = np.random.randint(-1, playerCount, size=zoneCount)
playerAvailableUnits = np.random.randint(0, maxUnits, size=playerCount)
zoneLinks = np.random.randint(0, maxUnits, size=(zoneCount, zoneCount))
blockedMovesPerPlayer = np.random.randint(0, 2, size=(playerCount, zoneCount, zoneCount), dtype=bool)

这是测试结果(%timeit

  • fbparis 的原始实现:

    每个循环 7.27 秒 ± 10 毫秒(7 次运行的平均值 ± 标准偏差,每次 1 个循环)

  • 我的实施:

    每个循环 645 ms ± 490 µs(7 次运行的平均值 ± 标准偏差,每次 1 个循环)