How can I make one function for 8 separate classifiers?

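I want to calculate the f1 score for the label "negative" for 8 different models. My code for the first 3 models, along with the dataframe that holds the results, is below. How can I create a function so that I don't have to write separate code for each model?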

# Train model with vectorizer and classifier
# Model training
from sklearn.model_selection import train_test_split

Independent_var = reviews_english['tokenized']
Dependent_var = reviews_english['sentiment']

IV_train, IV_test, DV_train, DV_test = train_test_split(Independent_var, Dependent_var, test_size=0.2, random_state=500)

print('IV_train :', len(IV_train))
print('IV_test :', len(IV_test))
print('DV_train :', len(DV_train))
print('DV_test :', len(DV_test))

#Calculate f1 score for all 8 models

#RandomForestClassifier

model = Pipeline([('vectorizer', tvec),('classifier', RandomForestClassifier)])

# Model learning
model.fit(IV_train, DV_train)

# Model prediction on training and test data
pred_train = model.predict(IV_train)
pred_test = model.predict(IV_test)

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
f1_rand = f1_score(DV_test, pred_test, pos_label='negative', average='binary')

#MultinomialNB

model = Pipeline([('vectorizer', tvec),('classifier', MultinominalNB)])

# Model learning
model.fit(IV_train, DV_train)

# Model prediction on training and test data
pred_train = model.predict(IV_train)
pred_test = model.predict(IV_test)

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
f1_multi = f1_score(DV_test, pred_test, pos_label='negative', average='binary')

#BernoulliNB

model = Pipeline([('vectorizer', tvec),('classifier', BernoulliNB)])

# Model learning
model.fit(IV_train, DV_train)

# Model prediction on training and test data
pred_train = model.predict(IV_train)
pred_test = model.predict(IV_test)

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
f1_bern = f1_score(DV_test, pred_test, pos_label='negative', average='binary')

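IFF your code is the same for all of the models, you can simply iterate over them. You have already created a list of classifiers, clf_list, so just pass each classifier to a function that performs all of the common steps. Note that if some steps are unique to a particular model, you will need to write separate functions for them (usually) or add if...else blocks where needed.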

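import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.pipeline import Pipeline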

RandomForestClassifier = RandomForestClassifier()
MultinominalNB = MultinomialNB()
...  # the lines above
KNeighborsClassifier = KNeighborsClassifier(n_neighbors=5)

clf_list = [RandomForestClassifier, MultinominalNB, BernoulliNB,
            XGBClassifier, GradientBoostingClassifier, LogisticRegression,
            LinearSVC, KNeighborsClassifier]
# only the names, for your dataframe, order must match:
clf_names = ['RandomForestClassifier', 
             'MultinominalNB',
             ...  # add the rest
             'KNeighborsClassifier']

def do_something_with_classifier(clf):
    tvec = TfidfVectorizer()
    model = Pipeline([('vectorizer', tvec),('classifier', clf)])
    # Model learning
    model.fit(IV_train, DV_train)  # where are these variables from?
    # Model prediction on training and test data
    pred_train = model.predict(IV_train)
    pred_test = model.predict(IV_test)
    return f1_score(DV_test, pred_test, pos_label='negative', average='binary')

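data = []
for clf in clf_list:
    data.append(do_something_with_classifier(clf))
# or the same thing as a list comprehension (use one or the other, not both):
# data = [do_something_with_classifier(clf) for clf in clf_list]

# pair each name with its score so the rows match the two column headers
model_comparison = pd.DataFrame(list(zip(clf_names, data)),
                                columns=['model', 'f1 score "negative"'])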
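The function above reads IV_train, IV_test, DV_train and DV_test from the enclosing scope, which is what the comment inside it is asking about. One way around that is to pass the data in explicitly. Here is a minimal sketch of that idea; the name negative_f1 and the toy reviews are made up for illustration, and clone() is used only so the estimator you pass in is not fitted as a side effect.

```python
from sklearn.base import clone
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import Pipeline


def negative_f1(clf, X_train, y_train, X_test, y_test):
    """Fit a TF-IDF + classifier pipeline and return F1 for the 'negative' label."""
    # clone() returns an unfitted copy, so the caller's estimator stays untouched
    model = Pipeline([('vectorizer', TfidfVectorizer()),
                      ('classifier', clone(clf))])
    model.fit(X_train, y_train)
    pred_test = model.predict(X_test)
    return f1_score(y_test, pred_test, pos_label='negative', average='binary')


# Toy data, invented purely for illustration (not the asker's reviews)
X_train = ['bad awful terrible', 'awful bad', 'good great lovely', 'great good']
y_train = ['negative', 'negative', 'positive', 'positive']
X_test = ['terrible bad', 'lovely great']
y_test = ['negative', 'positive']

# One call per classifier; the data travels with the call instead of via globals
scores = [negative_f1(clf, X_train, y_train, X_test, y_test)
          for clf in (MultinomialNB(), BernoulliNB())]
print(scores)
```

With the data passed as arguments, the same function can be reused on a different split without touching module-level variables.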

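As an aside, rather than creating a variable that points to each classifier instance just so you can add it to a list, build the list directly from the instances and skip the separate variables. Or better still, since you need a text label for each one in the dataframe, create a dictionary where the keys are your label/text and the values are the classifier instances, then pass each value to the generic function: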

classifiers = {
    'RandomForestClassifier': RandomForestClassifier(),
    'MultinomialNB': MultinomialNB(),
    'BernoulliNB': BernoulliNB(),
    ...  # add the rest here
    'KNeighborsClassifier': KNeighborsClassifier(n_neighbors=5),
}

data = [[name, do_something_with_classifier(clf)] for name, clf in classifiers.items()]
model_comparison = pd.DataFrame(data, columns=['model', 'f1 score "negative"'])
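For reference, here is a minimal end-to-end sketch of the dictionary approach that runs on its own. The toy reviews and the choice of only two classifiers are assumptions made for illustration; your real train/test split and all 8 classifiers would go in their place. Sorting by score at the end is an optional extra, not something the question asked for.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-ins for IV_train / DV_train / IV_test / DV_test
IV_train = ['bad awful terrible', 'awful bad', 'good great lovely', 'great good']
DV_train = ['negative', 'negative', 'positive', 'positive']
IV_test = ['terrible bad', 'lovely great']
DV_test = ['negative', 'positive']

# Keys are the labels for the dataframe, values are unfitted classifier instances
classifiers = {
    'MultinomialNB': MultinomialNB(),
    'LogisticRegression': LogisticRegression(),
}

rows = []
for name, clf in classifiers.items():
    # A fresh vectorizer per pipeline, so no fitted state is shared between models
    model = Pipeline([('vectorizer', TfidfVectorizer()), ('classifier', clf)])
    model.fit(IV_train, DV_train)
    score = f1_score(DV_test, model.predict(IV_test),
                     pos_label='negative', average='binary')
    rows.append({'model': name, 'f1 score "negative"': score})

# One row per model, best score first
model_comparison = (pd.DataFrame(rows)
                    .sort_values('f1 score "negative"', ascending=False)
                    .reset_index(drop=True))
print(model_comparison)
```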