Confidence interval of normally distributed samples
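I want to find the confidence interval of a sample that follows a normal distribution.
To test the code, I first created a sample and then tried to plot the confidence interval in a Jupyter notebook (Python kernel):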
%matplotlib notebook
import pandas as pd
import numpy as np
import statsmodels.stats.api as sms
import matplotlib.pyplot as plt
s= np.random.normal(0,1,2000)
# s= range(10,14) <---this sample has the right CI
# s = (0,0,1,1,1,1,1,2) <---this sample has the right CI
# confidence interval
# I think this is the function I misunderstand
ci=sms.DescrStatsW(s).tconfint_mean()
plt.figure()
_ = plt.hist(s, bins=100)
# confidence interval left line
one_x12, one_y12 = [ci[0], ci[0]], [0, 20]
# confidence interval right line
two_x12, two_y12 = [ci[1], ci[1]], [0, 20]
plt.plot(one_x12, one_y12, two_x12, two_y12, marker = 'o')
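The green and yellow lines are supposed to mark the confidence interval, but they are not in the right place.
I may have misunderstood this function:
sms.DescrStatsW(s).tconfint_mean()
but the documentation says it returns the confidence interval.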
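A likely source of the confusion: tconfint_mean returns a confidence interval for the population mean, not the range where 95% of the observations fall. That interval is mean ± t(0.975, n−1) · s/√n, and its width shrinks like 1/√n, so with n = 2000 both lines land very close to the sample mean. With only 4 or 8 values (the commented-out samples) the standard error s/√n is much larger and the interval is correspondingly wider. The sketch below computes that formula by hand with NumPy and SciPy; the helper name mean_confint_t is hypothetical, and the claim that it matches DescrStatsW(s).tconfint_mean() (default alpha=0.05) applies to unweighted data. It seeds np.random.default_rng(100), so its numbers differ from np.random.seed(100).

```python
import numpy as np
from scipy import stats


def mean_confint_t(x, confidence=0.95):
    """Hypothetical helper: t-based confidence interval for the population mean.

    For unweighted data this is the same formula DescrStatsW(x).tconfint_mean()
    uses: mean +/- t_{(1+confidence)/2, n-1} * s / sqrt(n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(n)                  # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * sem, mean + t_crit * sem


rng = np.random.default_rng(100)
s = rng.normal(0, 1, 2000)
lo, hi = mean_confint_t(s)
print(lo, hi)  # a narrow interval, roughly +/-0.044 around the sample mean

lo4, hi4 = mean_confint_t([10, 11, 12, 13])
print(lo4, hi4)  # much wider: only 4 observations
```

To cross-check, sms.DescrStatsW(s).tconfint_mean() on the same array should give the same pair.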
This is the figure I expected:
%matplotlib notebook
import pandas as pd
import numpy as np
import statsmodels.stats.api as sms
import matplotlib.pyplot as plt
s= np.random.normal(0,1,2000)
plt.figure()
_ = plt.hist(s, bins=100)
sd = np.std(s, axis=0)  # standard deviation of the sample
# confidence interval left line
one_x12, one_y12 = [-1.96 * sd, -1.96 * sd], [0, 20]
# confidence interval right line
two_x12, two_y12 = [1.96 * sd, 1.96 * sd], [0, 20]
plt.plot(one_x12, one_y12, two_x12, two_y12, marker = 'o')
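The expected lines sit at ±1.96 times the standard deviation of the data, which is roughly where 95% of the individual observations fall. The sketch below checks that fraction and compares the 1.96·sd half-width with the 1.96·sem half-width that governs a confidence interval for the mean. It centers the range on the sample mean rather than on 0 (nearly identical for this sample) and uses np.random.default_rng(100), so exact numbers will not match np.random.seed(100).

```python
import numpy as np

rng = np.random.default_rng(100)
s = rng.normal(0, 1, 2000)

# Spread of individual observations
sd = np.std(s)
inside = np.mean(np.abs(s - s.mean()) <= 1.96 * sd)
print(f"fraction within mean +/- 1.96*sd: {inside:.3f}")  # close to 0.95

# Spread of the sample mean (standard error)
sem = np.std(s, ddof=1) / np.sqrt(s.size)
print(f"1.96*sd  = {1.96 * sd:.3f}")   # about 1.96
print(f"1.96*sem = {1.96 * sem:.3f}")  # about 0.044, roughly sqrt(2000) times smaller
```

The ratio sd/sem is about √2000 ≈ 44.7, which is why the two sets of lines look so different on the same histogram.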
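The question seems to be "what function is there to calculate the confidence interval?".
Since the given data are normally distributed, the range that contains 95% of the values (the one shown in your expected figure) can be computed simply with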
ci = scipy.stats.norm.interval(0.95, loc=0, scale=1)
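0.95 is the confidence level (the first argument of interval, named alpha in older SciPy versions and confidence in newer ones). It requests the central 95% interval, which for a normal distribution extends 1.96 standard deviations on either side of the mean.
(https://en.wikipedia.org/wiki/1.96)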
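The 1.96 is the 97.5th percentile of the standard normal distribution: a central 95% interval leaves 2.5% of the probability in each tail. A short check with scipy.stats.norm (variable names are only for illustration):

```python
from scipy.stats import norm

confidence = 0.95
z = norm.ppf((1 + confidence) / 2)  # 97.5th percentile of N(0, 1)
print(z)                             # about 1.96

lower, upper = norm.interval(confidence, loc=0, scale=1)
print(lower, upper)                  # (-z, z)

# Probability mass between the two endpoints
print(norm.cdf(upper) - norm.cdf(lower))  # 0.95 up to floating-point error
```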
loc=0 is the mean and scale=1 is sigma (the standard deviation); see the 68–95–99.7 rule.
(https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule)
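Because loc and scale only shift and stretch the standard normal, norm.interval(0.95, loc=mu, scale=sigma) is mu ± 1.96·sigma. The values mu = 10 and sigma = 2 below are arbitrary illustration values. With real data the true mean and sigma are unknown; plugging in the sample mean and sample standard deviation, as in the second half, is an assumption of this sketch and ignores the uncertainty in those estimates.

```python
import numpy as np
from scipy.stats import norm

# Known parameters: the interval is mu +/- 1.96 * sigma
mu, sigma = 10.0, 2.0
lower, upper = norm.interval(0.95, loc=mu, scale=sigma)
print(lower, upper)  # about (6.08, 13.92)

# Unknown parameters: plug-in estimates from a sample (assumption of this sketch)
rng = np.random.default_rng(100)
s = rng.normal(mu, sigma, 2000)
est_lower, est_upper = norm.interval(0.95, loc=s.mean(), scale=s.std(ddof=1))
print(est_lower, est_upper)  # close to the interval above
```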
See @bogatron's answer to "Compute a confidence interval from sample data" for more details.
The following code produces the plot you want. I seeded the random number generator for reproducibility.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

np.random.seed(100)  # seed the global RNG (np.random.seed returns None)
s = np.random.normal(0, 1, 2000)
plt.figure()
_ = plt.hist(s, bins=100)
sigma = 1
mean = 0
ci = scipy.stats.norm.interval(0.95, loc=mean, scale=sigma)
print(ci)
# confidence interval left line
one_x12, one_y12 = [ci[0], ci[0]], [0, 20]
# confidence interval right line
two_x12, two_y12 = [ci[1], ci[1]], [0, 20]
plt.plot(one_x12, one_y12, two_x12, two_y12, marker='o')
plt.show()
print(ci) outputs:
(-1.959963984540054, 1.959963984540054)
Here is the resulting plot.
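If you want both kinds of interval on one histogram, a sketch along these lines could work. It assumes plug-in estimates (sample mean and standard deviation) for the range covering about 95% of observations and uses scipy.stats.t.interval for the 95% confidence interval of the mean. It draws the lines with ax.axvline instead of plt.plot with hard-coded y-values, so they always span the full plot height; the colors and the seed are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(100)
s = rng.normal(0, 1, 2000)
n = s.size

# Range covering about 95% of individual observations (plug-in estimates)
obs_lo, obs_hi = stats.norm.interval(0.95, loc=s.mean(), scale=s.std(ddof=1))

# 95% confidence interval for the mean (same formula as tconfint_mean)
sem = s.std(ddof=1) / np.sqrt(n)
mean_lo, mean_hi = stats.t.interval(0.95, n - 1, loc=s.mean(), scale=sem)

fig, ax = plt.subplots()
ax.hist(s, bins=100)
for x in (obs_lo, obs_hi):
    ax.axvline(x, color="tab:green", linestyle="--")  # observation range
for x in (mean_lo, mean_hi):
    ax.axvline(x, color="tab:red")                    # CI of the mean
ax.set_xlabel("value")
ax.set_ylabel("count")
```

In a script, call plt.show() afterwards to display the figure; in Jupyter it is rendered automatically.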