Cython 从现有数组和变量创建新数组的最快方法是什么

Question

假设我有一个数组

from array import array
myarr = array('l', [1, 2, 3])

和一个变量： myvar = 4 创建新数组的最快方法是什么：

newarray = array('l', [1, 2, 3, 4])

您可以假设所有元素都是 'long' 类型

我尝试创建一个新数组并使用 array.append() 不确定它是否最快。我正在考虑使用 memoryview，例如： malloc(4*sizeof(long)) 但我不知道如何将较短的数组复制到内存视图的一部分。然后将最后一个元素插入最后一个位置。

我对 Cython 还很陌生。感谢您的帮助！

更新：我比较了以下三种方法：

Cython：[100000 次循环，3 次循环中的最佳次数：每次循环 5.94 微秒]

from libc.stdlib cimport malloc

def cappend(long[:] arr, long var, size_t N):
    cdef long[:] result = <long[:(N+1)]>malloc((N+1)*sizeof(long))
    result.base[:N] = arr
    result.base[N] = var
    return result

数组： [1000000 次循环，最好的 3 次：每次循环 1.21 微秒]

from array import array
import copy
def pyappend(arr, x):
    result = copy.copy(arr)
    result.append(x)
    return result

list append: [1000000 次循环，最好的 3 次：每次循环 480 ns]

def pylistappend(lst, x):
    result = lst[:]
    result.append(x)
    return result

有希望改进cython部分，打败array one吗？

Answer 1

Cython 比 "normal" python 更能让我们访问 array.array 的内部结构，因此我们可以利用它来加速代码：

对于您的小示例，几乎 7（通过消除大部分开销）。
对于更大的输入，通过消除不必要的数组复制，2 系数。

继续阅读以了解更多详情。

尝试为如此小的输入优化函数有点不寻常，但并非没有（至少理论上）兴趣。

所以让我们从您的函数作为基准开始：

a=array('l', [1,2,3])
%timeit pyappend(a, 8)
1.03 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

lst=[1,2,3]
%timeit pylistappend(lst, 8)
279 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

我们必须知道：我们衡量的不是复制成本，而是开销成本（python解释器、调用函数等），例如a 有 3 个或 5 个元素：

a=array('l', range(5))
%timeit pyappend(a, 8)
1.03 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

在数组版本中，我们有更多的开销，因为我们通过 copy 模块进行了间接访问，我们可以尝试消除它：

def pyappend2(arr, x): 
      result = array('l',arr)                   
      result.append(x)                               
      return result

%timeit pyappend2(a, 8)
496 ns ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

那更快。现在让我们使用 cython - 这将消除解释器成本：

 %%cython
 def cylistappend(lst, x):      
     result = lst[:]                                  
     result.append(x)                            
     return result

 %%cython
 from cpython cimport array
 def cyappend(array.array arr, long long int x):
      cdef array.array res = array.array('l', arr)
      res.append(x)
      return res   

 %timeit cylistappend(lst, 8)
 193 ns ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
 %%timeit cyappend(a, 8)
 421 ns ± 8.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cython 版本 list 快约 33%，array 快约 10%。构造函数 array.array() 需要一个可迭代对象，但我们已经有了一个 array.array，因此我们使用 cpython 中的功能来访问 array.array 对象的内部并改善这种情况一点点：

 %%cython
 from cpython cimport array
 def cyappend2(array.array arr, long long int x):
     cdef array.array res = array.copy(arr)
     res.append(x)
     return res

 %timeit cyappend2(a, 8)
 305 ns ± 7.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

下一步我们需要知道array.array如何追加元素：通常，it over-allocates, so append() has amortized cost O(1), however after array.copy the new array is exactly the needed number of elements and the next append invokes reallocation. We need to change that (see here用于描述使用的函数）：

  %%cython
  from cpython cimport array
  from libc.string cimport memcpy
  def cyappend3(array.array arr, long long int x):
           cdef Py_ssize_t n=len(arr)
           cdef array.array res = array.clone(arr,n+1,False)
           memcpy(res.data.as_voidptr, arr.data.as_voidptr, 8*n)#that is pretty sloppy..
           res.data.as_longlongs[n]=x
           return res

 %timeit cyappend3(a, 8)
 154 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

与您的函数类似，内存分配过多，因此我们不再需要调用 resize()。现在我们比 list 更快，比原始 python 版本快近 7 倍。

让我们比较一下更大数组大小的时间（a=array('l',range(1000))、lst=list(range(1000))，其中复制数据占用了大部分运行时间：

  pyappend             1.84 µs  #copy-module is slow!
  pyappend2            1.02 µs
  cyappend             0.94 µs  #cython no big help - we are copying twice
  cyappend2            0.90 µs  #still copying twice
  cyappend3            0.43 µs  #copying only once -> twice as fast!

  pylistappend         4.09 µs # needs to increment refs of integers
  cylistappend         3.85 µs # the same as above

现在，消除 array.array 不必要的副本可以得到预期的因子 2。

对于更大的数组（10000 个元素），我们看到以下内容：

  pyappend             6.9 µs  #copy-module is slow!
  pyappend2            4.8 µs
  cyappend2            4.4 µs  
  cyappend3            4.4 µs

版本之间不再有差异（如果丢弃慢速复制模块）。这样做的原因是 array.array 对如此大量的元素改变了行为：当复制它时过度分配从而避免在第一个 append().

之后重新分配

我们可以很容易地检查它：

b=array('l', array('l', range(10**3)))#emulate our functions
b.buffer_info()
[] (94481422849232, 1000)
b.append(1)
b.buffer_info()
[] (94481422860352, 1001) # another pointer address -> reallocated
...
b=array('l', array('l', range(10**4)))
b.buffer_info()
[](94481426290064, 10000)
b.append(33)
b.buffer_info()
[](94481426290064, 10001) # the same pointer address -> no reallocation!

Cython 从现有数组和变量创建新数组的最快方法是什么

What is the fastest way in Cython to create a new array from an existing array and a variable

python

arrays

cpython

cython