Finding the mean and standard deviation of a distribution given its name and parameters
Using scipy.stats, I ran the following code on the iris dataset:
import scipy.stats as st

def get_best_distribution(data):
    dist_names = ["norm", "exponweib", "weibull_max", "weibull_min", "pareto", "genextreme"]
    dist_results = []
    params = {}
    for dist_name in dist_names:
        dist = getattr(st, dist_name)
        param = dist.fit(data)
        params[dist_name] = param
        # Applying the Kolmogorov-Smirnov test
        D, p = st.kstest(data, dist_name, args=param)
        print("p value for "+dist_name+" = "+str(p))
        dist_results.append((dist_name, p))
    # select the best fitted distribution
    best_dist, best_p = (max(dist_results, key=lambda item: item[1]))
    # store the name of the best fit and its p value
    print("Best fitting distribution: "+str(best_dist))
    print("Best p value: "+ str(best_p))
    print("Parameters for the best fit: "+ str(params[best_dist]))
    return best_dist, best_p, params[best_dist]
From this I get:
Best fitting distribution: invgauss
Best p value: 0.8268700800511397
Parameters for the best fit: (0.016421213754032188, 1.5064355144322001, 309.4166651914064)
best_result = {"virginica": {"distribution": "invgauss", "parameters": [0.016421213754032188, 1.5064355144322001, 309.4166651914064]}}
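I now want to get the mean and standard deviation (or the variance) from best_result. I have looked for something similar, but I cannot figure out how to do this with SciPy.
Any insights would be appreciated!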
Instead of saving the name of the distribution, save the distribution object. To do that, change
dist_results.append((dist_name, p))
to
dist_results.append((dist, p))
Then change the three print statements and the return statement in the function to
print("Best fitting distribution:", best_dist.name)
print("Best p value: "+ str(best_p))
print("Parameters for the best fit:", params[best_dist.name])
return best_dist, best_p, params[best_dist.name]
Then you can do this:
dist, p, par = get_best_distribution(data)
print("mean:", dist.mean(*par))
print("std: ", dist.std(*par))
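If you only have the stored name and parameters, as in best_result, and do not want to change the function, you can look the distribution up by name again. Here is a minimal sketch, assuming the parameters are in the order returned by fit() (shape parameters first, then loc and scale). The helper name summarize_fit is made up for this example and is not part of SciPy.

```python
import scipy.stats as st

def summarize_fit(dist_name, params):
    """Return (mean, std, var) for a scipy.stats distribution,
    given its name and the parameter tuple produced by fit()."""
    dist = getattr(st, dist_name)  # e.g. "invgauss" -> st.invgauss
    frozen = dist(*params)         # "freeze" the distribution with its parameters
    return frozen.mean(), frozen.std(), frozen.var()

# Values copied from the question
best_result = {
    "virginica": {
        "distribution": "invgauss",
        "parameters": [0.016421213754032188, 1.5064355144322001, 309.4166651914064],
    }
}

for label, fit in best_result.items():
    mean, std, var = summarize_fit(fit["distribution"], fit["parameters"])
    print(label, "mean:", mean, "std:", std, "var:", var)
```

Freezing the distribution once avoids passing *params to every method call. The variance is just the square of the standard deviation, so dist.var(*par) works in the snippet above as well.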