Turning a for loop function into vectorized form with numpy

Question

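I am trying to make my program faster by using numpy arrays, but every time I try to rewrite the vanilla Python in vectorized form it gives me errors. How can I vectorize the code so that I do not have to use the for loop below? Inside the loop I have linear regression and standard deviation formulas, which depend on the PC_list values being processed.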

PC_list= [457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000]

# x_mean and x_squared_mean are used for the linear regression and the standard deviation
x_mean = number/2*(1 + number)
x_squared_mean = number*(number+1)*(2*number+1)/6

for i in range(len(PC_list)-number):

    y_mean = sum(PC_list[i:i+number])/number   
    xy_mean = sum([x * (i + 1) for i, x in enumerate(PC_list[i:i+number])])/number

    # Linear regression slope (m) and vertical shift (b)
    m = (x_mean* y_mean- xy_mean)/((x_mean)**2- x_squared_mean)
    b = y_mean - m*x_mean

    # Sample standard deviation = sqrt(sum((value - y_mean)**2 for each value in the window) / (n - 1))
    std = (sum([(k - y_mean)**2 for k in PC_list[i:i+number]])/(number-1))**0.5

    #Upper and lower boundary calculations 
    Upper_Boundary = round((m*(i)+b + Upper*std),1)
    Lower_Boundary = round((m*(i)+b + Lower*std),1)

    #appends the upper and lower boundary to a list
    upper.append(Upper_Boundary)
    lower.append(Lower_Boundary)
    
    
    #Boundary x and y positions appended in list for graphing
    Boundary_x = number + i
    Boundary_x_list.append(Boundary_x)

Answer 1

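There is a good walkthrough of simple linear regression with Python and NumPy here: Simple Linear Regression in Python

The first thing I would suggest is converting the original dataset to a numpy array.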

import numpy as np

# Prices from the question (the dependent variable, y)
y = np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])

# Independent variable: 1-based positions of the prices
# (assumed here, mirroring the question's loop; the original snippet never defined it)
x = np.arange(1, y.size + 1)

# Calculating the mean of an array is a single method call
x_mean = x.mean()
y_mean = y.mean()

# values of the array are squared first and then we take the mean
x_squared_mean = np.power(x, 2).mean()

# slope (m): covariance of x and y divided by the variance of x
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum(np.power(x - x_mean, 2))

# intercept (b)
intercept = y_mean - (slope * x_mean)

# regression line
reg_line = intercept + slope * x

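This is only an example, but in general the first step is to convert your data to a numpy array; you then have access to all the loop-free functions that are implemented in C.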
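Applied to the loop in the question, the same idea might look like the sketch below. It is not a definitive implementation: the question never shows the values of `number`, `Upper` and `Lower`, so the ones used here (5, 2 and -2) are placeholders, and `prices`, `windows`, `x` and `i` are names introduced only for illustration. The sketch relies on `numpy.lib.stride_tricks.sliding_window_view` (available from NumPy 1.20) to build all windows at once, and it reproduces the loop's arithmetic as written rather than checking the statistics.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

PC_list = [457.334015, 424.440002, 394.795990, 408.903992, 398.821014, 402.152008,
           435.790985, 423.204987, 411.574005, 404.424988, 399.519989, 377.181000,
           375.467010, 386.944000, 383.614990, 375.071991, 359.511993, 328.865997,
           320.510010, 330.079010, 336.187012, 352.940002, 365.026001, 361.562012,
           362.299011, 378.549011, 390.414001, 400.869995, 394.773010, 382.556000]

# Placeholder values: not given in the question, chosen only for this sketch
number = 5    # window length
Upper = 2     # multiplier of std for the upper boundary
Lower = -2    # multiplier of std for the lower boundary

prices = np.asarray(PC_list)

# Same constants as in the question
x_mean = number / 2 * (1 + number)
x_squared_mean = number * (number + 1) * (2 * number + 1) / 6

# One row per window. The loop runs over range(len(PC_list) - number),
# which skips the final window, so the last row is dropped here too.
windows = sliding_window_view(prices, number)[:-1]
i = np.arange(windows.shape[0])     # the loop index, for all windows at once
x = np.arange(1, number + 1)        # 1-based positions inside a window

y_mean = windows.mean(axis=1)
xy_mean = (windows * x).sum(axis=1) / number

# Linear regression slope (m) and vertical shift (b), one value per window
m = (x_mean * y_mean - xy_mean) / (x_mean**2 - x_squared_mean)
b = y_mean - m * x_mean

# Sample standard deviation of each window (ddof=1 divides by number - 1)
std = windows.std(axis=1, ddof=1)

# Upper and lower boundaries, rounded to one decimal as in the loop
upper = np.round(m * i + b + Upper * std, 1)
lower = np.round(m * i + b + Lower * std, 1)

# x positions of the boundaries for graphing
Boundary_x_list = number + i
```

Here `upper`, `lower` and `Boundary_x_list` are arrays rather than lists; call `.tolist()` on them if a plotting routine needs plain lists. Note that `x_mean` and `x_squared_mean`, as written in the question, are the sum of 1..number and the sum of its squares; dividing each by `number` would give the means that the textbook least-squares slope uses.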
