numpy padding matrix of different row size

Question

I have a numpy array whose rows have different sizes

a = np.array([[1,2,3,4,5],[1,2,3],[1]], dtype=object)  # ragged rows need dtype=object on recent NumPy

I want to turn this into a dense matrix (fixed n x m size, no variable-length rows). So far I have tried something like this

size = (len(a),5)    
result = np.zeros(size)
result[[0],[len(a[0])]]=a[0]

but I get an error message telling me

shape mismatch: value array of shape (5,) could not be broadcast to indexing result of shape (1,)

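I also tried padding with np.pad, but according to the numpy.pad documentation it seems I need to specify the previous size of the rows in pad_width (which is variable, and which produced errors when I tried -1, 0 and the largest row size).

I know I can pad the lists row by row as shown here, but I need to do this with a much larger array of data.

If someone can help me with an answer to this question, I would be glad to hear it.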

Answer 1

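There really is no way to pad a jagged array so that it loses its jaggedness without iterating over the rows of the array. You even have to iterate over the array twice: once to find the maximum length you need to pad to, and once more to actually do the padding.

The code proposal you linked to will get the job done, but not very efficiently, because it adds zeroes in a Python for-loop that iterates over the elements of each row, whereas the appending could be precalculated, pushing more of the work to C.

The code below precomputes an array of the minimal required dimensions, filled with zeroes, and then simply writes each row of the jagged array M into place, which is far more efficient.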

import random
import numpy as np
M = [[random.random() for n in range(random.randint(0,m))] for m in range(10000)] # play-data

def pad_to_dense(M):
    """Appends the minimal required amount of zeroes at the end of each 
     array in the jagged array `M`, such that `M` loses its jaggedness."""

    maxlen = max(len(r) for r in M)

    Z = np.zeros((len(M), maxlen))
    for enu, row in enumerate(M):
        Z[enu, :len(row)] = row  # Z starts out as zeros, so plain assignment is enough
    return Z

To give you some idea of the speed:

from timeit import timeit
n = [10, 100, 1000, 10000]
s = [timeit(stmt='Z = pad_to_dense(M)', setup='from __main__ import pad_to_dense; import numpy as np; from random import random, randint; M = [[random() for n in range(randint(0,m))] for m in range({})]'.format(ni), number=1) for ni in n]
print('\n'.join(map(str,s)))
# 7.838103920221329e-05
# 0.0005027339793741703
# 0.01208890089765191
# 0.8269036808051169
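
As a variation on the function above, the per-row assignment can be replaced by a single boolean-mask assignment. This is only a sketch and not part of the original answer: the name `pad_to_dense_masked` and the `fill_value` parameter are made up for illustration, and it still loops over the rows once to measure their lengths, so it does not get around the iteration described earlier.

```python
import numpy as np

def pad_to_dense_masked(rows, fill_value=0.0):
    """Right-pad every row of `rows` with `fill_value` up to the longest row.

    Hypothetical helper; assumes `rows` is a non-empty sequence of 1-D sequences.
    """
    # Pass 1: measure each row.
    lengths = np.array([len(r) for r in rows])
    # mask[i, j] is True where row i holds a real value in column j.
    mask = np.arange(lengths.max()) < lengths[:, None]
    out = np.full(mask.shape, fill_value, dtype=float)
    # Pass 2: one flat copy. The True cells are visited in row-major order,
    # which matches the order of the concatenated rows.
    out[mask] = np.concatenate([np.asarray(r, dtype=float) for r in rows])
    return out

dense = pad_to_dense_masked([[1, 2, 3, 4, 5], [1, 2, 3], [1]])
print(dense)
```

For the array from the question this yields a 3 x 5 float array with zeros to the right of the shorter rows. Whether it is faster than the loop version depends on the data, so it is worth timing on your own input.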

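If you want to prepend zeroes in front of the arrays rather than append them, that is a simple enough change to the code that I'll leave it to you.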
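
For readers who want to see that change spelled out, here is one possible sketch of the prepending variant. The name `pad_to_dense_left` is hypothetical; the only real difference from the appending version is the slice each row is written into.

```python
import numpy as np

def pad_to_dense_left(M):
    """Prepend zeroes to each row of the jagged array `M` so that all rows
    reach the length of the longest row (hypothetical left-padding variant)."""
    maxlen = max(len(r) for r in M)
    Z = np.zeros((len(M), maxlen))
    for i, row in enumerate(M):
        # Write the row into the rightmost len(row) columns.
        # Using maxlen - len(row) as the start also handles empty rows,
        # where a negative-index slice such as Z[i, -0:] would cover the whole row.
        Z[i, maxlen - len(row):] = row
    return Z

left = pad_to_dense_left([[1, 2, 3], [1], []])
print(left)
```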

Answer 2

You can use numpy.pad

Do something like this

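import numpy as np
# Ragged input needs dtype=object; recent NumPy versions raise an error without it.
a = np.array([[1,2,3,4,5],[1,2,3],[1]], dtype=object)
l = np.array([len(a[i]) for i in range(len(a))])  # length of each row
width = l.max()
b=[]
for i in range(len(a)):
    if len(a[i]) != width:
        # pad with zeros on the right, up to the widest row
        x = np.pad(a[i], (0,width-len(a[i])), 'constant',constant_values = 0)
    else:
        x = a[i]
    b.append(x)
b = np.array(b)
print(b)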

The above code outputs something like this.

b = [[1, 2, 3, 4, 5],
     [1, 2, 3, 0, 0],
     [1, 0, 0, 0, 0]]
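
The same np.pad idea can probably be written more compactly with a list comprehension. The sketch below is an assumption about how one might condense it, not the answerer's code; it starts from a plain Python list of rows (the name `rows` is made up here) and relies on np.pad accepting any array-like input.

```python
import numpy as np

rows = [[1, 2, 3, 4, 5], [1, 2, 3], [1]]   # illustrative input, same values as `a`
width = max(len(r) for r in rows)          # widest row decides the column count

# Pad each row on the right with zeros, then stack the equal-length rows.
b_compact = np.stack([
    np.pad(r, (0, width - len(r)), mode="constant", constant_values=0)
    for r in rows
])
print(b_compact)
```

A pad width of (0, 0) leaves a row unchanged, so the explicit check for rows that already have full width is not needed in this form.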

You can read back your input version of the data by doing the following

a = []
for i in range(len(b)):
    a.append(b[i][0:l[i]])
a = np.array(a, dtype=object)  # rows differ in length again, so store them as objects
print(a)

which gives you the following output

a = array([array([1, 2, 3, 4, 5]), array([1, 2, 3]), array([1])], dtype=object)

Hope this helps someone who is struggling with the same problem I was. Thanks.
