Python forward stepwise regression 'Not in Index'
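I'm working through some tutorials on the Boston housing data, with the help of a few forward stepwise examples I found online. I keep getting an error saying that one of the variables is not in the index.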
import statsmodels.api as sm
import pandas as pd
from sklearn.model_selection import train_test_split
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version.
from sklearn.datasets import load_boston
boston_dataset = load_boston()
#create dataframe from boston
X = pd.DataFrame(boston_dataset.data, columns = boston_dataset.feature_names)
y = boston_dataset.target
#split data into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size = 0.2, random_state=5)
Here is the regression loop, taken from this website; there is also a nearly identical slice of code here:
def forward_regression(X, y,
                       initial_list=[],
                       threshold_in=0.01,
                       threshold_out = 0.05,
                       verbose=True):
    initial_list = []
    included = list(initial_list)
    while True:
        changed=False
        # forward step
        excluded = list(set(X.columns)-set(included))
        new_pval = pd.Series(index=excluded)
        for new_column in excluded:
            model = sm.OLS(y, sm.add_constant(pd.DataFrame(X[included+[new_column]]))).fit()
            new_pval[new_column] = model.pvalues[new_column]
        best_pval = new_pval.min()
        if best_pval < threshold_in:
            best_feature = new_pval.argmin()
            included.append(best_feature)
            changed=True
            if verbose:
                print('Add {} with p-value {}'.format(best_feature, best_pval))
        if not changed:
            break
    return included
Once I run forward_regression(X_train, Y_train), I receive the error that a variable is not in the index.
Any suggestions are appreciated!
You need to use idxmin() instead of argmin(). The latter returns the integer position, whereas idxmin() returns the label.
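To see the difference on a small case, here is a minimal sketch. The p-values and the tiny DataFrame are made up for illustration and are not from the Boston data. It assumes a pandas version where argmin() returns a position (1.0 or later). It shows why appending the result of argmin() to the list of included columns breaks the later column lookup.

```python
import pandas as pd

# Hypothetical p-values keyed by column name (values invented for illustration).
new_pval = pd.Series({"CRIM": 0.20, "LSTAT": 0.001, "RM": 0.03})

position = new_pval.argmin()  # integer position of the smallest value
label = new_pval.idxmin()     # index label of the smallest value
print(position, label)

# A small frame with the same column names.
X = pd.DataFrame({"CRIM": [0.1, 0.2], "LSTAT": [4.9, 9.1], "RM": [6.5, 6.4]})

# Selecting by label works.
selected = X[[label]]

# Selecting by the integer position fails, because the integer is not a column label.
try:
    X[[position]]
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print("KeyError:", exc)
```

Inside the loop, the same thing happens on the second pass: included holds an integer, so X[included + [new_column]] raises a KeyError.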
The fixed function is:
def forward_regression(X, y,
                       initial_list=[],
                       threshold_in=0.01,
                       threshold_out = 0.05,
                       verbose=True):
    initial_list = []
    included = list(initial_list)
    while True:
        changed=False
        # forward step
        excluded = list(set(X.columns)-set(included))
        # dtype=float avoids an object-dtype Series on newer pandas versions
        new_pval = pd.Series(index=excluded, dtype=float)
        for new_column in excluded:
            model = sm.OLS(y, sm.add_constant(pd.DataFrame(X[included+[new_column]]))).fit()
            new_pval[new_column] = model.pvalues[new_column]
        best_pval = new_pval.min()
        if best_pval < threshold_in:
            # Change argmin -> idxmin
            best_feature = new_pval.idxmin()
            included.append(best_feature)
            changed=True
            if verbose:
                print('Add {} with p-value {}'.format(best_feature, best_pval))
        if not changed:
            break
    return included