Generating the best exponential distribution with percentile constraints

Question

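I am trying to generate the best-fitting exponential distribution with the following characteristics:

1. The results lie in the range [0, 500].

2. The CDF percentiles are close to the following (percentile: value) pairs: 0.3: 50, 0.5: 100, 0.8: 200, 0.9: 300, 0.95: 400, 1: 500

First, I tried to get the lambda coefficient from the median constraint of 100: lambda = ln(2)/100 = 0.006931, and then plotted the distribution: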

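import numpy as np
import matplotlib.pyplot as plt

# np.random.exponential takes the scale (1/lambda), not the rate
data = np.random.exponential((1/0.006931), size=1000)
plt.hist(data,bins=30)
plt.show()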
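As a quick sanity check on the median-based rate, the exponential CDF can be evaluated at the other target values. This is only a sketch: the helper `exp_cdf` and the variable names are illustrative and not part of the original code.

```python
import numpy as np

median = 100.0
lam = np.log(2) / median   # rate implied by a median of 100
scale = 1.0 / lam          # scale, as expected by np.random.exponential

def exp_cdf(x, lam):
    """CDF of an exponential distribution with rate lam (illustrative helper)."""
    return 1.0 - np.exp(-lam * np.asarray(x, dtype=float))

values = np.array([50, 100, 200, 300, 400, 500])
targets = np.array([0.3, 0.5, 0.8, 0.9, 0.95, 1.0])
cdf_at_values = exp_cdf(values, lam)

# Compare the target percentiles with what the exponential CDF gives
for v, t, c in zip(values, targets, cdf_at_values):
    print(f"x={v:3d}  target={t:.2f}  exponential CDF={c:.3f}")
```

The median is matched exactly by construction, while the other points only come close. In particular, an exponential CDF never reaches exactly 1 at a finite value, so the 1: 500 target can only be approximated.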

Rescaling to [0, 500]:

data = (data - data.min()) / (data.max() - data.min()) * 500
plt.hist(data,bins=30)
plt.show()

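Then I rescaled the results to 500 and plotted the histogram and the CDF, but the curve is far from the red dots, which mark the percentiles I want the distribution to pass close to.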

x = np.sort(data)
y = np.arange(1, len(x) +1) /len(x)
_ = plt.plot(x,y, marker ='.', linestyle='none')
x_percentile = np.array([0,50, 100, 200, 300, 400, 500])  # 0.9 maps to 300, as stated above
y_percentile = np.array([0,30, 50, 80, 90, 95, 100])
plt.scatter(x_percentile, y_percentile/100,color='r')
plt.xlabel('results')
plt.ylabel('ECDF')
plt.show()

How can I find the distribution function that most closely matches my constraints?

Answer 1

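You may be able to use scipy.optimize.curve_fit to find the "best" exponential. Assuming you have the constraint that the CDF should be 0 at 0, i.e. the offset is 0 and therefore loc=0, we only need to fit the scale parameter (scale = 1/lambda in scipy.stats.expon):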

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import expon
from scipy.optimize import curve_fit

def fitfun(x, *a):
    ex = expon(loc=0, scale=a[0])
    return ex.cdf(x)

y = 0, 0.3, 0.5, 0.8, 0.9, 0.95, 1
x = 0, 50, 100, 200, 300, 400, 500
p, _ = curve_fit(fitfun, x, y, 100/np.log(2))
print(p[0]) # 133.99106748543082
solution = expon(loc=0, scale=p[0])

X = np.arange(600)
plt.plot(X, solution.cdf(X))
plt.plot(x, y, marker='o', ls='')
plt.show()
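To judge how close the fit is without reading it off the plot, the fitted CDF can be compared with the targets numerically. A small sketch, assuming the scale value printed above; the names `fitted` and `residuals` are illustrative.

```python
import numpy as np
from scipy.stats import expon

scale = 133.99106748543082  # scale printed by the fit above
x = np.array([0, 50, 100, 200, 300, 400, 500])
y = np.array([0, 0.3, 0.5, 0.8, 0.9, 0.95, 1])

# Fitted CDF at the constraint points and its deviation from the targets
fitted = expon(loc=0, scale=scale).cdf(x)
residuals = fitted - y

for xi, yi, fi in zip(x, y, fitted):
    print(f"x={xi:3d}  target={yi:.2f}  fitted={fi:.4f}")
print("largest absolute deviation:", np.max(np.abs(residuals)))
```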

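With curve_fit you can add uncertainties to the fit to make the constraints 0-0 and 1-500 in the CDF stronger than the others, or you can use a different optimization routine. By default (when no bounds are given), curve_fit performs a least-squares fit using Levenberg-Marquardt. This is just an idea, not a perfect solution.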
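One way to express stronger constraints is the `sigma` argument of curve_fit, where a smaller value gives a point more weight. The sketch below is only an illustration: the `sigma` values are arbitrary assumptions, not taken from the question. Note that with loc=0 the CDF is always 0 at x=0, so in practice only the extra weight on the 1-500 point changes the result.

```python
import numpy as np
from scipy.stats import expon
from scipy.optimize import curve_fit

def fitfun(x, scale):
    # Exponential CDF with the offset fixed at 0
    return expon(loc=0, scale=scale).cdf(x)

x = np.array([0, 50, 100, 200, 300, 400, 500], dtype=float)
y = np.array([0, 0.3, 0.5, 0.8, 0.9, 0.95, 1])

# Illustrative uncertainties: smaller sigma means stronger weight.
# The end points get ten times less uncertainty than the others.
sigma = np.array([0.1, 1, 1, 1, 1, 1, 0.1])

p0 = [100 / np.log(2)]  # start from the median-based scale
p_unweighted, _ = curve_fit(fitfun, x, y, p0=p0)
p_weighted, _ = curve_fit(fitfun, x, y, p0=p0, sigma=sigma)

print("unweighted scale:", p_unweighted[0])
print("weighted scale:  ", p_weighted[0])
```

Weighting the 1-500 point pulls the scale down so that the CDF at 500 gets closer to 1, at the cost of a worse match at the lower percentiles.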

python

statistics

distribution

exponential