How many combinations will GridSearchCV run for this?

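I'm using sklearn to run a grid search on a random forest classifier. It has been running longer than I expected, and I am trying to estimate how much time this process has left. I thought the total number of fits it would make would be 3*3*3*3*5 = 405.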

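from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# X and y are defined elsewhere (pandas objects, given the .values calls)
clf = RandomForestClassifier(n_jobs=-1, oob_score=True, verbose=1)
param_grid = {'n_estimators': [50, 200, 500],
              'max_depth': [2, 3, 5],
              'min_samples_leaf': [1, 2, 5],
              # 'auto' was deprecated in scikit-learn 1.1 and removed in 1.3;
              # for classifiers it meant the same as 'sqrt'
              'max_features': ['auto', 'log2', 'sqrt']
              }

gscv = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5)
gscv.fit(X.values, y.values.reshape(-1,))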

From the output, I can see it cycling through batches of tasks, where each batch has as many tasks as the number of estimators:

[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2min
[Parallel(n_jobs=-1)]: Done 184 tasks | elapsed: 5.3min
[Parallel(n_jobs=-1)]: Done 200 out of 200 tasks | elapsed: 6.2min finished
[Parallel(n_jobs=8)]: Done 34 tasks | elapsed: 0.5s
[Parallel(n_jobs=8)]: Done 184 tasks | elapsed: 3.0s
[Parallel(n_jobs=8)]: Done 200 tasks out of 200 tasks | elapsed: 3.2s finished
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1min
[Parallel(n_jobs=-1)]: Done 50 tasks out of 50 tasks | elapsed: 1.5min finished
[Parallel(n_jobs=8)]: Done 34 tasks | elapsed: 0.5s
[Parallel(n_jobs=8)]: Done 50 out of 50 tasks | elapsed: 0.8s finished

I counted the "finished" lines and there are currently 680 of them. I thought it would be done at 405. Did I calculate this wrong?

Your calculation seems correct. The grid size is the product of the number of values for each parameter, which in this case is 81:

>>> from sklearn.model_selection import ParameterGrid

>>> pg = ParameterGrid(param_grid)
>>> len(pg)
81

For each of these you have five cross-validation folds, for 405 fits in total. The "tasks" are a separate measure altogether.
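The same arithmetic can be checked without scikit-learn. This is a minimal sketch; `count_fits` is a hypothetical helper, and it assumes GridSearchCV's default `refit=True`, under which the best candidate is retrained once more on the full data after cross-validation, so the grand total is one more than the cross-validation count.

```python
import math

# The grid from the question.
param_grid = {
    'n_estimators': [50, 200, 500],
    'max_depth': [2, 3, 5],
    'min_samples_leaf': [1, 2, 5],
    'max_features': ['auto', 'log2', 'sqrt'],
}

def count_fits(grid, n_folds, refit=True):
    """Hypothetical helper: (candidates, CV fits, total fits) for an exhaustive grid search."""
    # Number of candidates = product of the number of values per parameter.
    n_candidates = math.prod(len(values) for values in grid.values())
    cv_fits = n_candidates * n_folds
    # With refit=True the best candidate is fitted once more on all the data.
    return n_candidates, cv_fits, cv_fits + (1 if refit else 0)

print(count_fits(param_grid, 5))  # (81, 405, 406)
```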

The verbose argument is passed through to the parent class BaseForest, and from there to joblib's Parallel.
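Because these progress lines come from joblib's Parallel, a saved copy of the output can be tallied more precisely than by hand. A minimal sketch, assuming the output was captured as text and that completion lines look like the excerpt in the question; `count_finished` is a hypothetical helper, and the regular expression tolerates both the "Done 200 out of 200" and "Done 200 tasks out of 200" forms seen there.

```python
import re
from collections import Counter

# Matches completion lines such as
# "[Parallel(n_jobs=8)]: Done 200 out of 200 tasks | elapsed: 3.2s finished"
DONE = re.compile(
    r"\[Parallel\(n_jobs=(-?\d+)\)\]: Done\s+(\d+)\s+(?:tasks\s+)?out of\s+(\d+)"
)

def count_finished(log_text):
    """Hypothetical helper: count 'finished' lines per n_jobs and record each batch size."""
    by_n_jobs = Counter()
    batch_sizes = []
    for line in log_text.splitlines():
        if not line.rstrip().endswith("finished"):
            continue  # skip intermediate progress lines
        match = DONE.search(line)
        if match:
            by_n_jobs[match.group(1)] += 1
            batch_sizes.append(int(match.group(3)))
    return by_n_jobs, batch_sizes

sample = "\n".join([
    "[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2min",
    "[Parallel(n_jobs=-1)]: Done 200 out of 200 tasks | elapsed: 6.2min finished",
    "[Parallel(n_jobs=8)]: Done 200 tasks out of 200 tasks | elapsed: 3.2s finished",
    "[Parallel(n_jobs=-1)]: Done 50 tasks out of 50 tasks | elapsed: 1.5min finished",
    "[Parallel(n_jobs=8)]: Done 50 out of 50 tasks | elapsed: 0.8s finished",
])
counts, sizes = count_finished(sample)
print(dict(counts), sizes)  # {'-1': 2, '8': 2} [200, 200, 50, 50]
```

In the sample, every n_jobs=-1 batch is followed by an n_jobs=8 batch of the same size, which is worth checking against the full log.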

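I am not sure exactly what constitutes a task in this case, but the number of top-level grid-search fits should be 405. Keep in mind that each of these fits trains an entire forest of trees.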
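If the pattern in the log holds, each forest prints one "finished" line after building its trees (n_jobs=-1) and another after predicting the held-out fold (n_jobs=8, same task count), so the expected line count is roughly twice the number of fits. This is a rough sketch under that assumption, read off the log rather than from documentation; `expected_progress` is a hypothetical helper, and `score_train=True` models `return_train_score=True`, where the training folds are scored as well.

```python
import math

# The grid from the question.
param_grid = {
    'n_estimators': [50, 200, 500],
    'max_depth': [2, 3, 5],
    'min_samples_leaf': [1, 2, 5],
    'max_features': ['auto', 'log2', 'sqrt'],
}

def expected_progress(grid, n_folds, refit=True, score_train=False):
    """Hypothetical helper: rough 'finished'-line and tree totals to compare with the log."""
    n_candidates = math.prod(len(v) for v in grid.values())
    cv_fits = n_candidates * n_folds
    # Assumption: one line for building the forest, one for scoring the test fold,
    # plus one more per fit if the training folds are scored too.
    lines_per_fit = 3 if score_train else 2
    # The final refit only builds trees, so it adds a single line.
    finished_lines = cv_fits * lines_per_fit + (1 if refit else 0)
    # Each n_estimators value is paired with every combination of the other parameters.
    other_combos = n_candidates // len(grid['n_estimators'])
    # Trees grown during cross-validation only; the refit's size depends on the winner.
    cv_trees = n_folds * other_combos * sum(grid['n_estimators'])
    return finished_lines, cv_trees

print(expected_progress(param_grid, 5))  # (811, 101250)
```

Under that assumption, the 680 lines counted in the question would correspond to about 340 of the 405 fits. The log already shows 200-tree fits taking about four times as long as 50-tree ones (6.2 vs 1.5 min), so the remaining time is better judged by trees left than by fits left.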