DataConversionWarning with GridSearchCV in sklearn
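I repeatedly get the following warning when using GridSearchCV in sklearn:
"DataConversionWarning: Copying input dataframe for slicing."
I tried running some of the models individually, outside of GridSearchCV, and did not get any warning. The warning also does not stop the grid search from finding a model.
I have two questions:
1) What does this warning mean?
2) What effect, if any, does it have on my output?
The relevant part of the code is below: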
import os
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv(os.path.join(filepath, "Modeling_Set.csv"))  # loads the main data
keep_vars = pd.read_csv(os.path.join(filepath, "keep_vars.csv"))  # loads the list of variables to keep
model_vars = keep_vars[keep_vars['keep'] == 1]['name']  # names of the variables to keep
modeling_df = df[model_vars]  # DataFrame restricted to the kept variables
model_feature_vars = model_vars[:-1]  # every kept variable except the last one

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    modeling_df[model_feature_vars], modeling_df['Segment'],
    test_size=0.30, random_state=42)

# Parameter ranges for the grid search over the decision-tree pipeline
max_depth = range(2, 20, 2)
min_samples_split = range(2, 10, 2)
features = range(2, len(X_train.columns))

# Parameter grid; the prefixes match the pipeline step names below
parametersDT = {'feature_selection__k': features,
                'feature_selection__score_func': (chi2, f_classif),
                'classification__criterion': ('gini', 'entropy'),
                'classification__max_depth': max_depth,
                'classification__min_samples_split': min_samples_split}

# Feature selection followed by a decision tree
DT_with_K_Best = Pipeline([
    ('feature_selection', SelectKBest()),
    ('classification', DecisionTreeClassifier())
])

clf_DT = GridSearchCV(DT_with_K_Best, parametersDT, cv=10, verbose=2,
                      scoring='f1_weighted', n_jobs=-2)
clf_DT.fit(X_train, y_train)
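As far as I know, this only means that the DataFrame you are passing in is copied before it is fed to the model.
This should not affect the training results. It is purely an efficiency concern and has nothing to do with the classifier's performance.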
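If the repeated message is just noise, one option is to filter it with Python's standard `warnings` module. The following is a minimal sketch, not a definitive fix: `noisy_fit` is a hypothetical stand-in for `clf_DT.fit(X_train, y_train)`, and the fallback class exists only so the sketch runs without scikit-learn installed.

```python
import warnings

try:
    from sklearn.exceptions import DataConversionWarning
except ImportError:
    # Assumption: stand-in class so the sketch runs without scikit-learn.
    class DataConversionWarning(UserWarning):
        pass


def noisy_fit():
    """Hypothetical stand-in for clf_DT.fit(X_train, y_train)."""
    warnings.warn("Copying input dataframe for slicing.",
                  DataConversionWarning)
    return "fitted"


# record=True collects any warning that gets through the filters,
# which lets us check that the targeted one was really suppressed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only this category and this message prefix; the most
    # recently added filter is checked first, so it wins over "always".
    warnings.filterwarnings(
        "ignore",
        message="Copying input dataframe",
        category=DataConversionWarning,
    )
    result = noisy_fit()

print(result, len(caught))
```

Two caveats, both hedged: with `n_jobs` set to anything other than 1, the fitting happens in worker processes that may not inherit a filter set in the parent process, so the message can still appear. Passing plain NumPy arrays (for example `X_train.values`) instead of DataFrames is another approach that would likely avoid the DataFrame copy altogether, though that is an assumption not confirmed by the answer above.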