NumPy: change elements based on a threshold, then add element by element

Question

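I have 3247 matrices, each 197x10. I need to scan through them: if a value is greater than 1, set it to 1; if a value is less than or equal to 1, set it to 0. Then I have to take each modified matrix and add it to the modified matrices from the other 3246 sets. Here is what I have so far: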

for i in range(LOWER, UPPER + 1):
    fname = file_name+str(i)+".txt"
    cur_resfile = np.genfromtxt(fname, delimiter = ",", skiprows = 1)
    m_cur = cur_resfile

    m_cur[m_cur <= 1] = 0
    m_cur[m_cur > 1 ] = 1

    m_ongoing = m_ongoing + m_cur

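I want m_ongoing to hold a running sum that I can then save to a file. However, it is not working: it seems to just write the last m_cur from the loop. If I run the loop 3 times in total, some cells are 1 in all three matrices, so I would expect a few 3s. I would definitely expect a lot of 2s, but I only see 1s and 0s.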

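What is the best way to accomplish what I'm trying to do?

-Change values based on a condition

-Take a large number of matrices and add them element by element to create a running sum for each cell.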

Answer 1

You can use numpy.clip()

for i in range(LOWER, UPPER + 1):
    fname = file_name+str(i)+".txt"

    cur_resfile = np.genfromtxt(fname, delimiter = ",", skiprows = 1)

    m_ongoing += cur_resfile.clip(0,1)
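
One caveat is worth checking on a small example: clip(0, 1) only bounds values to the range [0, 1]; it does not binarize them. A value of exactly 1 stays 1 and a fraction such as 0.5 stays 0.5, whereas the question asks for everything less than or equal to 1 to become 0. A minimal sketch, using a made-up array (the name `sample` and its values are hypothetical, chosen to hit the edge cases):

```python
import numpy as np

# Hypothetical sample values, chosen to cover the edge cases.
sample = np.array([-2.0, 0.5, 1.0, 1.5, 3.0])

clipped = sample.clip(0, 1)              # bounds values to [0, 1]
thresholded = (sample > 1).astype(int)   # 1 where value > 1, else 0

print(clipped.tolist())      # [0.0, 0.5, 1.0, 1.0, 1.0]
print(thresholded.tolist())  # [0, 0, 0, 1, 1]
```

The two results agree only when the data contains no values in the interval (0, 1].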

Edit, to answer the question that was actually asked:

m_ongoing = np.zeros((197,10))

for i in range(LOWER, UPPER + 1):
    fname = file_name+str(i)+".txt"
    cur_resfile = np.genfromtxt(fname, delimiter = ",", skiprows = 1)

    # add one to every cell where cur_resfile > 1
    m_ongoing[cur_resfile > 1] += 1
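
For a self-contained check of this approach, here is a minimal sketch that writes three small CSV files to a temporary directory and accumulates the counts. The file prefix, the 2x3 shape, and the data are made up for illustration. It also uses `skip_header=1` rather than `skiprows=1`, because `skiprows` was removed from `genfromtxt` in NumPy 1.10.

```python
import os
import tempfile

import numpy as np

# Made-up data: three 2x3 matrices standing in for the 197x10 files.
matrices = [
    np.array([[0.5, 2.0, 1.0], [3.0, 0.0, 1.5]]),
    np.array([[1.2, 2.5, 0.0], [0.1, 0.0, 4.0]]),
    np.array([[5.0, 1.0, 1.0], [2.0, 0.9, 1.1]]),
]

with tempfile.TemporaryDirectory() as tmp:
    file_name = os.path.join(tmp, "res")  # hypothetical file prefix

    # Write each matrix with a one-line header, like the original files.
    for i, m in enumerate(matrices, start=1):
        np.savetxt(file_name + str(i) + ".txt", m,
                   delimiter=",", header="a,b,c", comments="")

    # Running count of how often each cell exceeded 1.
    m_ongoing = np.zeros((2, 3), dtype=int)
    for i in range(1, len(matrices) + 1):
        cur = np.genfromtxt(file_name + str(i) + ".txt",
                            delimiter=",", skip_header=1)
        m_ongoing[cur > 1] += 1

print(m_ongoing.tolist())  # [[2, 2, 0], [2, 0, 3]]
```

Cells holding exactly 1 are not counted, which matches the rule in the question.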

Answer 2

As @RootTwo hinted, clip() is a nice NumPy built-in. For performance reasons, though, you can use vectorized operations on a 3D "stack" of your data.

Example:

import numpy as np
#simulating your data as a list of 3247 2D matrices, each 197x10
some_data = [np.random.randint(-2,2,(197,10)) for _i in range(3247)]
#stack the matrices
X = np.dstack(some_data)
print(X.shape)

(197, 10, 3247)

Y = X.clip(0,1)
Z = Y.sum(axis=2)
#Z is now the output you want!
print(Z.shape)

(197, 10)
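
If the strict rule from the question is what is needed (count a cell only when its value is greater than 1), the same stacked layout can be reduced with a boolean comparison instead of clip. A minimal sketch, assuming simulated integer data drawn from a wider range so that some values exceed 1; the range, the seed, and the smaller count of 50 matrices are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
# Simulated data: 50 matrices of shape 197x10 with integers in [-2, 4).
some_data = [rng.integers(-2, 4, size=(197, 10)) for _ in range(50)]

X = np.dstack(some_data)   # shape (197, 10, 50)
Z = (X > 1).sum(axis=2)    # per-cell count of values greater than 1

print(X.shape, Z.shape)
```

Summing a boolean array counts its True entries, so Z holds, for each cell, the number of matrices in which that cell exceeded 1.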

Edit: added timing results and changed my answer

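So it seems my suggestion to build a depth-wise stack and apply the clip and sum functions once was ill-advised. I ran some timing tests and found that the incremental approach is faster, most likely because of the overhead of allocating the large 3D array.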

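Here is the test, in which I left out the data loading, since that is the same either way. These are the results of comparing the two approaches in IPython with the %timeit magic.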

import numpy as np
# some_data is simulated as in the above code sample
def f1(some_data):
    x = some_data[0]
    x = x.clip(0,1)
    for y in some_data[1:]:
        x += y.clip(0,1)
    return x

def f2(some_data):
    X = np.dstack(some_data)
    X = X.clip(0,1)
    X = X.sum(axis=2)
    return X

%timeit x1 = f1(some_data)

10 loops, best of 3: 28.1 ms per loop

%timeit x2 = f2(some_data)

10 loops, best of 3: 103 ms per loop

So performing the process incrementally gives a speedup of about 3.7x over doing it as a single operation after stacking the data.
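
Outside IPython, a comparison along these lines can be run with the standard-library timeit module. The sketch below is a reconstruction under stated assumptions, not the original benchmark: it uses fewer matrices (200) to keep the run short, the function names `incremental` and `stacked` are made up here, and it first checks that both approaches return the same result. Timings vary by machine and NumPy version, so no numbers are shown.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
# Simulated data: 200 matrices of shape 197x10 with integers in [-2, 2).
some_data = [rng.integers(-2, 2, size=(197, 10)) for _ in range(200)]

def incremental(data):
    # clip() returns a new array, so the input matrices are not modified.
    total = data[0].clip(0, 1)
    for m in data[1:]:
        total += m.clip(0, 1)
    return total

def stacked(data):
    # Build one 3D array, then clip and reduce along the depth axis.
    return np.dstack(data).clip(0, 1).sum(axis=2)

same = np.array_equal(incremental(some_data), stacked(some_data))
print("results agree:", same)

runs = 10
t_inc = timeit.timeit(lambda: incremental(some_data), number=runs)
t_stk = timeit.timeit(lambda: stacked(some_data), number=runs)
print(f"incremental: {t_inc / runs * 1e3:.2f} ms per call")
print(f"stacked:     {t_stk / runs * 1e3:.2f} ms per call")
```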
