加速多重排列

Question

我希望加快我的代码的速度，该代码需要大约 80 毫秒才能从 sympy 生成 300 个集合的 multiset_permutations。理想情况下，这只需要几毫秒；而且项目越多，速度越慢。

我怎样才能使我的代码更快？多线程？或者转换为 C？非常感谢任何有关加快速度的帮助。

import numpy as np
from time import monotonic
from sympy.utilities.iterables import multiset_permutations

milli_time = lambda: int(round(monotonic() * 1000))

start_time = milli_time()

num_indices = 5
num_items = 300
indices = np.array([list(multiset_permutations(list(range(num_indices)))) for _ in range(num_items)])

print(indices)    

[[[0 1 2 3 4]
  [0 1 2 4 3]
  [0 1 3 2 4]
  ...
  [4 3 1 2 0]
  [4 3 2 0 1]
  [4 3 2 1 0]]

 [[0 1 2 3 4]
  [0 1 2 4 3]
  [0 1 3 2 4]
  ...
  [4 3 1 2 0]
  [4 3 2 0 1]
  [4 3 2 1 0]]

 [[0 1 2 3 4]
  [0 1 2 4 3]
  [0 1 3 2 4]
  ...
  [4 3 1 2 0]
  [4 3 2 0 1]
  [4 3 2 1 0]]

 ...

 [[0 1 2 3 4]
  [0 1 2 4 3]
  [0 1 3 2 4]
  ...
  [4 3 1 2 0]
  [4 3 2 0 1]
  [4 3 2 1 0]]

 [[0 1 2 3 4]
  [0 1 2 4 3]
  [0 1 3 2 4]
  ...
  [4 3 1 2 0]
  [4 3 2 0 1]
  [4 3 2 1 0]]

 [[0 1 2 3 4]
  [0 1 2 4 3]
  [0 1 3 2 4]
  ...
  [4 3 1 2 0]
  [4 3 2 0 1]
  [4 3 2 1 0]]]

print('Multiset Perms:', milli_time() - start_time, 'milliseconds')

Multiset Perms: 88 milliseconds

** 代码更新以减少 2/3 的额外计算**

import itertools
import numpy as np
from time import time, monotonic
from sympy.utilities.iterables import multiset_permutations

milli_time = lambda: int(round(monotonic() * 1000))

start_time = milli_time()

num_colors = 5
color_range = list(range(num_colors))
total_media = 300

def all_perms(elements):
    if len(elements) <= 1:
        yield elements  # Only permutation possible = no permutation
    else:
        # Iteration over the first element in the result permutation:
        for (index, first_elmt) in enumerate(elements):
            other_elmts = elements[:index]+elements[index+1:]
            for permutation in all_perms(other_elmts): 
                yield [first_elmt] + permutation

multiset = list(multiset_permutations(color_range))
# multiset = list(itertools.permutations(color_range))
# multiset = list(all_perms(color_range))

_range = range(total_media)
perm_indices = np.array([multiset for _ in _range])

print('Multiset Perms:', milli_time() - start_time)

Multiset Perms: 34 milliseconds

Answer 1

首先，您不需要重新计算排列。

此外，np.array([multiset for _ in _range]) 很昂贵，因为 Numpy 必须转换 multiset total_media 次。您可以使用 np.array([multiset]).repeat(total_media, axis=0).

解决该问题

最后，sympy 并不是执行此类计算的最快实现。更快的实现包括使用 itertools 代替：

num_colors = 5
total_media = 300
color_range = list(range(num_colors))
multiset = list(set(itertools.permutations(color_range)))
perm_indices = np.array([multiset], dtype=np.int32).repeat(total_media, axis=0)

但是，这个基于 itertools 的实现不保留排列的顺序。如果这很重要，您可以在从 multiset 转换而来的 Numpy 数组上使用 np.sort（使用特定轴并在应用 repeat 之前）。

在我的机器上，这大约需要 0.15 毫秒。

加速多重排列

Speed up Multiset Permutations

python

numpy

permutation