How to create a function to test normality of each variable
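I'm trying to build a function, to be called iteratively, that returns i) the Jarque-Bera test statistic, ii) the Jarque-Bera p-value, iii) the slope, intercept and r (the square root of the coefficient of determination) of the probplot fit, and iv) the probplot itself. Everything is meant to be returned for one variable at a time.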
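As background for items i) and ii): the Jarque-Bera statistic combines the sample skewness S and excess kurtosis K of n observations as JB = n/6 · (S² + K²/4), and under normality it is compared against a chi-squared distribution with 2 degrees of freedom. The sketch below, on hypothetical normally distributed data, checks that formula against `scipy.stats.jarque_bera`. It uses `scipy.stats.skew` and `scipy.stats.kurtosis` (population-moment estimators). pandas' `Series.skew()` and `Series.kurt()` apply a small-sample bias correction, so plugging those in instead would give a slightly different statistic.

```python
import numpy as np
import scipy.stats as ss

# Hypothetical sample data; the question's DataFrame is not available
rng = np.random.default_rng(0)
x = rng.normal(loc=25.0, scale=4.0, size=500)

n = x.size
S = ss.skew(x)       # biased sample skewness (default bias=True)
K = ss.kurtosis(x)   # biased excess kurtosis (Fisher definition: 0 for a normal)
jb_manual = n / 6 * (S**2 + K**2 / 4)
p_manual = ss.chi2.sf(jb_manual, df=2)  # upper tail of chi-squared with 2 dof

jb_scipy, p_scipy = ss.jarque_bera(x)
print(jb_manual, jb_scipy)
print(p_manual, p_scipy)
```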
import matplotlib.pyplot as plt
import scipy.stats as ss

def normality(c):
    JB_test_stat = ss.jarque_bera(c)[0]
    JB_pval = ss.jarque_bera(c)[1]
    # each probplot call with plot=plt draws into pyplot's current figure again
    probplot_slope = ss.probplot(c, plot = plt)[1][0]
    probplot_interc = ss.probplot(c, plot = plt)[1][1]
    probplot_r = ss.probplot(c, plot = plt)[1][2]
    return(print("Skewness:", c.skew(),
                 "\nExcess kurtosis:", c.kurt(),
                 "\nJarque-Bera stat:", JB_test_stat, " pvalue:", JB_pval,
                 "\nSlope:", probplot_slope, "Intercept:", probplot_interc, "r:", probplot_r, "\n"))
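Unfortunately, when I call the function on the columns of my DataFrame, df[numeric_cols], where numeric_cols is a list of column names,
for c in numeric_cols:
    normality(df[c])
I correctly get all the numeric results in the return statement, but at the bottom there is a single, messy plot with all the variables drawn on top of each other, whereas I expected each variable's numeric results to come with its own probability plot.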
Skewness: 0.1004187952160102
Excess kurtosis: -0.543819517693596
Jarque-Bera stat: 7.593972235734294 pvalue: 0.022438296430201454
Slope: 4.3135147782152465 Intercept: 25.5 r: 0.9947611456706487

Skewness: -0.1560130144763728
Excess kurtosis: -1.2824901951466612
Jarque-Bera stat: 38.56183464454786 pvalue: 4.23061985443951e-09
Slope: 11.492550446207257 Intercept: 19.535714285714285 r: 0.9668502992894236

Skewness: 0.2347601433103727
Excess kurtosis: -1.242639192300385
Jarque-Bera stat: 39.0662449724179 pvalue: 3.287552452491127e-09
Slope: 11.545683807955731 Intercept: 15.714285714285714 r: 0.9647448407831439

Skewness: 0.24353437856100904
Excess kurtosis: -1.1969521906230485
Jarque-Bera stat: 36.98912338336009 pvalue: 9.287822622106034e-09
Slope: 1013.985374629207 Intercept: 1411.4436090225563 r: 0.9682492605786011

Skewness: 2.837876986150242
Excess kurtosis: 9.516628330654008
Jarque-Bera stat: 2675.4455000782764 pvalue: 0.0
Slope: 2.6057664781688454 Intercept: 1.8533834586466167 r: 0.7776054895177505

Skewness: 2.406153102778617
Excess kurtosis: 7.002529753885085
Jarque-Bera stat: 1573.6596724989513 pvalue: 0.0
Slope: 1.714847443415902 Intercept: 1.287593984962406 r: 0.8152919114915671

Skewness: 0.9337529310147361
Excess kurtosis: 0.45862734243889847
Jarque-Bera stat: 81.22389376608798 pvalue: 0.0
Slope: 605.3354149443196 Intercept: 717.75 r: 0.9550404156079808

Skewness: -3.030640857636996
Excess kurtosis: 15.686541621050898
Jarque-Bera stat: 6154.761075129672 pvalue: 0.0
Slope: 11.37955609488042 Intercept: 77.82387218045113 r: 0.8711740556551902

Skewness: 6.398317104228115
Excess kurtosis: 49.10097819497357
Jarque-Bera stat: 56029.69126113364 pvalue: 0.0
Slope: 0.41431397013222515 Intercept: 0.1917293233082707 r: 0.48503363895959983

Skewness: 6.204252341215679
Excess kurtosis: 47.28662289867727
Jarque-Bera stat: 52010.755388690835 pvalue: 0.0
Slope: 0.4947086253584861 Intercept: 0.23496240601503762 r: 0.5050004904368586

Skewness: 2.06633193738682
Excess kurtosis: 5.770784034742405
Jarque-Bera stat: 1098.0175308306793 pvalue: 0.0
Slope: 0.12821997057404685 Intercept: 0.11328947368421052 r: 0.8619773533976459

Skewness: 2.9189857433086495
Excess kurtosis: 16.837230233306762
Jarque-Bera stat: 6909.724155123523 pvalue: 0.0
Slope: 0.07805612907589729 Intercept: 0.07265037593984962 r: 0.8632361803763113

Skewness: 1.2633082232077495
Excess kurtosis: 1.5265390704578943
Jarque-Bera stat: 190.6495836394772 pvalue: 0.0
Slope: 2.09821120102269 Intercept: 2.1146616541353382 r: 0.9211028014650718

Skewness: 3.091346622737553
Excess kurtosis: 8.530683362863476
Jarque-Bera stat: 2421.371001114453 pvalue: 0.0
Slope: 0.16657862407594715 Intercept: 0.09022556390977444 r: 0.5658043763386988
How can I fix this?
Thanks in advance.
Just add a plt.figure() call inside your function, so that each call to the function opens a new figure.
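A small sketch of what the plt.figure() fix changes, assuming two hypothetical columns and the non-interactive Agg backend: with `plot=plt`, `probplot` draws through pyplot's state machine, which always targets the current figure, so without a new figure every variable lands on the same plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as ss

# Two hypothetical variables standing in for the question's df columns
rng = np.random.default_rng(1)
columns = {"a": rng.normal(size=200), "b": rng.exponential(size=200)}

# Without plt.figure(): plot=plt draws into pyplot's current figure,
# so both probability plots pile up in one figure
plt.close("all")
for values in columns.values():
    ss.probplot(values, plot=plt)
figs_without = len(plt.get_fignums())

# With plt.figure(): each call starts a fresh figure that becomes current
plt.close("all")
for values in columns.values():
    plt.figure()
    ss.probplot(values, plot=plt)
figs_with = len(plt.get_fignums())

print(figs_without, figs_with)  # one shared figure vs. one figure per variable
```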
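On a side note, using return(print('stuff')) is redundant: print() returns None, so the function returns None either way. If you really want to print the results, just use print without return.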
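A minimal illustration of why `return(print(...))` adds nothing, using a hypothetical helper that mirrors the question's pattern: the caller only ever gets `None` back.

```python
def summarize(values):
    # Hypothetical helper that copies the question's return(print(...)) pattern
    return print("Mean:", sum(values) / len(values))

result = summarize([1, 2, 3])  # prints "Mean: 2.0"
print(result)  # None: print() itself returns None
```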
It is more pythonic, and generally better practice, to return the values you are currently printing and then print them outside the function:
import matplotlib.pyplot as plt
import scipy.stats as ss

def normality(c):
    # Start a new figure so each variable gets its own probability plot
    plt.figure()
    JB_test_stat, JB_pval = ss.jarque_bera(c)
    # Call probplot once: it returns ((osm, osr), (slope, intercept, r)) and draws the plot
    _, (probplot_slope, probplot_interc, probplot_r) = ss.probplot(c, plot=plt)
    return c.skew(), c.kurt(), JB_test_stat, JB_pval, probplot_slope, probplot_interc, probplot_r

for c in numeric_cols:
    # c is a column name, so unpack into plain variables (you cannot assign to c.skew())
    skew, kurt, JB_test_stat, JB_pval, probplot_slope, probplot_interc, probplot_r = normality(df[c])
    print("Skewness:", skew,
          "\nExcess kurtosis:", kurt,
          "\nJarque-Bera stat:", JB_test_stat,
          " pvalue:", JB_pval,
          "\nSlope:", probplot_slope,
          "Intercept:", probplot_interc,
          "r:", probplot_r, "\n")
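If you want all the numbers in one table rather than scattered print output, one option (a sketch, not the answer's code) is to have the function return a dict and collect the rows into a pandas DataFrame. It also swaps plt.figure() for an explicit Axes from plt.subplots(), passed to probplot's plot argument, which SciPy accepts in place of the pyplot module. The data, the column names a and b, and the dict keys are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as ss


def normality(c):
    """Return normality diagnostics for one Series and draw its probplot on its own Axes."""
    fig, ax = plt.subplots()  # a new figure and Axes for every variable
    jb_stat, jb_pval = ss.jarque_bera(c)
    # probplot returns ((osm, osr), (slope, intercept, r)) when fit=True (the default)
    _, (slope, intercept, r) = ss.probplot(c, plot=ax)
    ax.set_title(f"Probability plot: {c.name}")
    return {"skewness": c.skew(), "excess_kurtosis": c.kurt(),
            "jb_stat": jb_stat, "jb_pval": jb_pval,
            "slope": slope, "intercept": intercept, "r": r}


# Hypothetical stand-in for the question's df and numeric_cols
rng = np.random.default_rng(3)
df = pd.DataFrame({"a": rng.normal(size=300), "b": rng.exponential(size=300)})
numeric_cols = ["a", "b"]

plt.close("all")
# One row per variable, one column per diagnostic
summary = pd.DataFrame([normality(df[col]) for col in numeric_cols], index=numeric_cols)
print(summary)
```

Returning a dict keeps each value labelled, so adding or removing a diagnostic does not break a long tuple unpacking at the call site.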