How to effectively unpack Monte Carlo simulations in Python leveraging numba? SOLVED
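I am trying to build a Monte Carlo simulation efficiently, because in my use case I need to run this simulation 70*10^6 times. I hope someone more experienced, especially with performance, can give me some ideas to try.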
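I have the following inputs:
- Demand
  - Each column is a product, each row is a month.
  - The demand for some products in certain months is estimated by a triangular-distribution tuple (min, mode, max). For these values I run a Monte Carlo simulation 1000 times.
- Stock
  - Same layout: each column is a product, each row is a month.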
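The output I want is:
- The median of the distribution of the sum of available products (`np.median(np.sum(available_products))`), i.e. the median over the 1000 simulations of the sum of `available_products`, where `available_products = stock - demand` (negative values clipped to zero).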
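In formula form, with notation introduced here only for illustration: $S_{rc}$ is the stock for month $r$ and product $c$, $D^{(j)}_{rc}$ is the demand in Monte Carlo simulation $j$ (zero for cells without a triangular tuple), and $N = 1000$ is the number of simulations. Negative availability is clipped to zero, as in the code below:

```latex
\begin{aligned}
A^{(j)} &= \sum_{r=1}^{R} \sum_{c=1}^{C} \max\!\left(S_{rc} - D^{(j)}_{rc},\ 0\right), \qquad j = 1, \dots, N \\
\text{result} &= \operatorname{median}\left(A^{(1)}, A^{(2)}, \dots, A^{(N)}\right)
\end{aligned}
```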
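But I ran into some problems:
- Speed: my intuition is that there is some clever way to do the computation with vectorized functions, but I couldn't come up with anything, so I fell back on ordinary loops. If you have any idea of a different approach that could be faster, please let me know.
- FIXED: I could not assign values into the array. In my solution I could not use
  `demand_j[index_demand_not_0][k] = dict_demand_values_simulations[k][j]`
  because indexing with the tuple returned by `np.where` is advanced (fancy) indexing, which returns a copy, so the chained assignment only modified a temporary array.
- Solution: access the `demand_j` position directly via `demand_j[row, col]`.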
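A small sketch of the pitfall behind the FIXED item, using a toy 2x3 grid (the values and the name `mask_source` are illustrative, not from the question). Chained advanced indexing writes into a copy; indexing the target cell directly, or assigning through the whole index tuple in one step, writes into `demand_j` itself:

```python
import numpy as np

# Toy 2x3 grid; the non-zero cells stand in for the triangular tuples (illustrative values).
mask_source = np.array([[0, 1, 0], [1, 0, 0]])
idx = np.where(mask_source != 0)  # rows [0, 1], cols [1, 0]

demand_j = np.zeros((2, 3))

# Chained assignment: demand_j[idx] is advanced indexing and returns a NEW array,
# so "[0] = ..." modifies that temporary copy and demand_j stays all zeros.
demand_j[idx][0] = 5.0
print(demand_j.sum())  # 0.0

# Indexing the target cell directly writes into demand_j itself.
demand_j[idx[0][0], idx[1][0]] = 5.0
print(demand_j[0, 1])  # 5.0

# A single advanced-indexing assignment also writes in place, for all selected cells at once.
demand_j[idx] = [5.0, 7.0]
print(demand_j[1, 0])  # 7.0
```

The last form is what makes it possible to drop the per-cell loop entirely once the samples are laid out in the same order as `idx`.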
Here is the code using a 3D array for the demand, as suggested by @Glauco:
```python
import numpy as np
from numba import jit


@jit(nopython=True, nogil=True, fastmath=True)
def calc_triangular_dist(demand_distribution, num_monte):
    # Draw num_monte samples from a triangular distribution (left, mode, right)
    return np.random.triangular(
        demand_distribution[0], demand_distribution[1], demand_distribution[2], size=num_monte
    )


def demand3d():
    # Goal: the median, over the 1000 Monte Carlo simulations, of the sum of available
    # products, i.e. np.median(np.sum(available_products)) with available_products = stock - demand.
    # Each demand is simulated 1000 times, giving 1000 demand arrays and therefore
    # a distribution of 1000 sums of available products.

    # Input: each column is a product, each row a month.
    # Tuples are (min, mode, max) of a triangular distribution; 0.0 means no demand.
    # dtype=object is required because the array mixes floats and tuples (a ragged array);
    # recent NumPy versions raise a ValueError without it.
    demand_triangular = np.array(
        [
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, (4.5, 5.5, 8.25)],
            [(2.1, 3.1, 4.65), 0.0, 0.0, (4.5, 5.5, 8.25)],
        ],
        dtype=object,
    )
    # Stock of available products; each column is a product, each row a month.
    stock = np.array([[30, 30, 30, 22], [30, 30, 30, 22], [30, 30, 30, 22]])
    num_sim_monte_carlo = 1000

    # Problem 1) How to unpack each simulated demand array efficiently, given that
    # the real case has 70 tuples to simulate?
    row, col = demand_triangular.shape
    # Indices of the non-zero entries, i.e. the triangular-distribution tuples
    index_demand_not_0 = np.where(demand_triangular != 0)
    demand_j = np.zeros(shape=(row, col, num_sim_monte_carlo), dtype=float)
    triangular_len = len(index_demand_not_0[0])  # number of cells to simulate
    for k in range(triangular_len):  # one triangular distribution per non-zero cell
        demand_j[index_demand_not_0[0][k], index_demand_not_0[1][k]] = calc_triangular_dist(
            demand_triangular[index_demand_not_0][k], num_sim_monte_carlo
        )

    # Sum of available products for each of the 1000 simulations
    sums_available_simulations = np.zeros(shape=num_sim_monte_carlo)
    for j in range(num_sim_monte_carlo):  # one iteration per Monte Carlo simulation
        available = stock - demand_j[:, :, j]
        available[available < 0] = 0  # clip negative availability to zero
        sums_available_simulations[j] = np.sum(available)
    print("Median of distribution of available is: ", np.median(sums_available_simulations))


if __name__ == "__main__":
    demand3d()
```
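As a further sketch (not from the original thread), the per-simulation loop at the end of `demand3d` can be written as one broadcast expression: `stock[:, :, np.newaxis]` has shape (rows, cols, 1) and broadcasts against `demand_j` of shape (rows, cols, n_sim). The helper name `sum_available_vectorized` is hypothetical, and the random `demand_j` below is only test data used to compare against the loop:

```python
import numpy as np


def sum_available_vectorized(stock, demand_j):
    """Sum of max(stock - demand, 0) over months and products, for every simulation.

    stock:    (rows, cols) array
    demand_j: (rows, cols, n_sim) array of simulated demand
    returns:  (n_sim,) array, one sum per Monte Carlo simulation
    """
    available = stock[:, :, np.newaxis] - demand_j  # broadcast to (rows, cols, n_sim)
    np.clip(available, 0, None, out=available)      # negative availability -> 0, in place
    return available.sum(axis=(0, 1))


# Compare against the loop from the question on random test data
rng = np.random.default_rng(0)
stock = np.array([[30, 30, 30, 22], [30, 30, 30, 22], [30, 30, 30, 22]])
demand_j = rng.uniform(0, 40, size=(3, 4, 1000))  # some cells exceed stock, so clipping matters

loop_sums = np.zeros(1000)
for j in range(1000):
    available = stock - demand_j[:, :, j]
    available[available < 0] = 0
    loop_sums[j] = np.sum(available)

vec_sums = sum_available_vectorized(stock, demand_j)
print(np.allclose(loop_sums, vec_sums))  # True
print(np.median(vec_sums))
```

The trade-off is one temporary array the same size as `demand_j`; for very large simulation counts a compiled loop avoids that allocation.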
The suggested approach using a 3D array performs much better :). Now that I only have arrays, I can try to improve it further with numba.
Timings:
- Baseline: 0.4067141
- 1) Monte Carlo per loop: 0.0355861
- 2) Demand 3D: 0.0179643
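Following up on the idea of trying numba on the remaining arrays, here is a minimal sketch (not from the original thread) of the aggregation loop as a numba kernel. `sum_available_numba` is a hypothetical name; the `try/except` fallback only exists so the snippet runs without numba installed, in which case it is plain (slow) Python. The simulation axis is the innermost loop so that `demand_j` is read in memory order for a C-contiguous array:

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs (slowly) without numba installed
    def njit(func):
        return func


@njit
def sum_available_numba(stock, demand_j):
    # stock: (rows, cols); demand_j: (rows, cols, n_sim), assumed C-contiguous.
    rows, cols, n_sim = demand_j.shape
    out = np.zeros(n_sim)
    for r in range(rows):
        for c in range(cols):
            s = stock[r, c]
            # Innermost loop over the last (contiguous) axis for cache-friendly access.
            for j in range(n_sim):
                a = s - demand_j[r, c, j]
                if a > 0.0:  # only positive availability counts (same as clipping to 0)
                    out[j] += a
    return out


stock = np.array([[3.0, 1.0]])
demand_j = np.array([[[1.0, 4.0], [0.0, 0.5]]])  # shape (1, 2, 2)
result = sum_available_numba(stock, demand_j)
print(result)  # values: 3.0 and 0.5
```

This kernel never allocates a (rows, cols, n_sim) temporary. When timing it, keep in mind that the first call also includes JIT compilation, so benchmark a second call.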
Thanks
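You can remove the inner loop with array programming plus fancy indexing, which speeds up the assignment to `demand_j`.

Another point: you can generate `demand_j` once, adding a dimension (`num_sim_monte_carlo`) so that it becomes a 3D array. Inside the loop you then only read values, instead of creating them on every iteration.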
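A minimal sketch of the first suggestion, assuming the triangular parameters are stored as a plain float table instead of an object array of tuples (`build_demand`, `rows_idx`, `cols_idx` and `params` are illustrative names). NumPy's `Generator.triangular` broadcasts its `left`, `mode` and `right` arguments, so every cell is sampled in one call and written into `demand_j` with a single advanced-indexing assignment. Note that this swaps in `np.random.default_rng` for the legacy `np.random` functions and is not numba-compiled:

```python
import numpy as np


def build_demand(shape, rows_idx, cols_idx, params, num_sim, rng):
    """Return a (rows, cols, num_sim) demand array filled without a per-cell Python loop.

    shape:              (rows, cols) of the month x product grid
    rows_idx, cols_idx: 1-D int arrays, the (month, product) of each simulated cell
    params:             (n_cells, 3) float array of (min, mode, max) per cell
    """
    demand_j = np.zeros((*shape, num_sim))
    # Column slices keep a trailing axis, so each has shape (n_cells, 1) and broadcasts.
    left, mode, right = params[:, 0:1], params[:, 1:2], params[:, 2:3]
    # One call draws an (n_cells, num_sim) block; row k holds the samples for cell k.
    samples = rng.triangular(left, mode, right, size=(len(params), num_sim))
    # Advanced indexing on the first two axes writes every cell's samples at once.
    demand_j[rows_idx, cols_idx] = samples
    return demand_j


# The question's example: 3 months x 4 products with triangular demand in three cells,
# listed in the same order that np.where(demand_triangular != 0) returns them.
rows_idx = np.array([1, 2, 2])
cols_idx = np.array([3, 0, 3])
params = np.array([[4.5, 5.5, 8.25], [2.1, 3.1, 4.65], [4.5, 5.5, 8.25]])

rng = np.random.default_rng(42)
demand_j = build_demand((3, 4), rows_idx, cols_idx, params, 1000, rng)
print(demand_j.shape)  # (3, 4, 1000)
```

With 70 tuples, `params` simply has 70 rows. If the input stays as the object array from the question, the table can be built once with `np.array(list(demand_triangular[index_demand_not_0]), dtype=float)`, and `rows_idx, cols_idx = index_demand_not_0`.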