What does _name_estimators do in the following code?

Question

from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, _name_estimators
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


class MajorityVoteClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, classifiers, vote='classlabel', weights=None):
        self.classifiers = classifiers
        self.named_classifiers = {key: value for key, value
                                  in _name_estimators(classifiers)}
        self.vote = vote
        self.weights = weights


clf1 = LogisticRegression(penalty='l2', C=0.001, random_state=1)
clf2 = DecisionTreeClassifier(max_depth=1, criterion='entropy', random_state=0)
clf3 = KNeighborsClassifier(n_neighbors=1, p=2, metric='minkowski')

pipe1 = Pipeline([['sc', StandardScaler()], ['clf', clf1]])
pipe3 = Pipeline([['sc', StandardScaler()], ['clf', clf3]])
mv_clf = MajorityVoteClassifier(classifiers=[pipe1, clf2, pipe3])

I can't understand how _name_estimators works. Could someone please explain what _name_estimators does in this code?

Answer 1

You can try it out in interactive mode:

from sklearn.pipeline import _name_estimators

estimators = ['a', 'a', 'b' ]
_name_estimators(estimators)
# >>> [('a-1', 'a'), ('a-2', 'a'), ('b', 'b')]

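So basically it returns a list of (name, estimator) tuples in which every name is unique. The name is the estimator's class name in lowercase (or the string itself, if you pass a string, as above). If the same name occurs more than once, each occurrence gets a numeric suffix (-1, -2, and so on). The second element of each tuple is the original estimator, unchanged.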
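To make the naming rule concrete, here is a minimal pure-Python sketch that mimics the behaviour described above. It is an approximation written for illustration, not scikit-learn's actual source; `name_estimators_sketch` and the `Scaler` class are hypothetical names that do not exist in the library.

```python
from collections import Counter


def name_estimators_sketch(estimators):
    """Illustrative stand-in for sklearn.pipeline._name_estimators (hypothetical)."""
    # A string names itself; any other object is named after its lowercased class.
    names = [e if isinstance(e, str) else type(e).__name__.lower()
             for e in estimators]
    totals = Counter(names)  # how often each base name occurs overall
    seen = Counter()         # how often each base name has occurred so far
    result = []
    for name, est in zip(names, estimators):
        if totals[name] > 1:
            # Duplicated names get a running suffix: -1, -2, ...
            seen[name] += 1
            name = f"{name}-{seen[name]}"
        result.append((name, est))
    return result


class Scaler:  # hypothetical placeholder class, only used to show class-based naming
    pass


print(name_estimators_sketch(["a", "a", "b"]))
# [('a-1', 'a'), ('a-2', 'a'), ('b', 'b')]
print([name for name, _ in name_estimators_sketch([Scaler(), Scaler(), "b"])])
# ['scaler-1', 'scaler-2', 'b']
```

The suffix is only added when a name collides, which is why 'b' stays as it is in both calls.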

Answer 2

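You give the function _name_estimators a list of n estimators, and it returns a list of n tuples. The first element of each tuple is a string with the estimator's name, and the second element is the estimator object itself.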

from sklearn.linear_model import LinearRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import _name_estimators
clf = GaussianNB()
clf2 = LinearRegression()
res = _name_estimators([clf, clf2])
print(res)
print(type(res))
print()
for p in res:
    print(type(p[0]))
    print(type(p[1]))


#[('gaussiannb', GaussianNB(priors=None, var_smoothing=1e-09)), ('linearregression', LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False))]
#<class 'list'>

#<class 'str'>
#<class 'sklearn.naive_bayes.GaussianNB'>
#<class 'str'>
#<class 'sklearn.linear_model.base.LinearRegression'>
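Applied to the list from the question, the same call should give the keys that end up in `named_classifiers`. The sketch below assumes a scikit-learn version in which the private helper `_name_estimators` is still importable from `sklearn.pipeline`. It also uses default hyperparameters for brevity, since the names depend only on the class of each estimator.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, _name_estimators
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same shape as the question: two pipelines and one bare classifier.
pipe1 = Pipeline([("sc", StandardScaler()), ("clf", LogisticRegression())])
clf2 = DecisionTreeClassifier(max_depth=1)
pipe3 = Pipeline([("sc", StandardScaler()), ("clf", KNeighborsClassifier(n_neighbors=1))])

# Equivalent to the dict comprehension in MajorityVoteClassifier.__init__.
named_classifiers = dict(_name_estimators([pipe1, clf2, pipe3]))

print(list(named_classifiers))
# ['pipeline-1', 'decisiontreeclassifier', 'pipeline-2']
```

Both pipelines share the class name Pipeline, so they get the suffixes -1 and -2, while the decision tree keeps its plain lowercased class name. The values of the dictionary are the very same estimator objects that were passed in.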

python

pipeline

scikit-learn