如何从普通的机器学习技术转变为交叉验证?
How to change from normal machine learning technique to cross validation?
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import accuracy_score
X = data['Review']
y = data['Category']
tfidf = TfidfVectorizer(ngram_range=(1,1))
classifier = LinearSVC()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
clf = Pipeline([
('tfidf', tfidf),
('clf', classifier)
])
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy_score(y_test, y_pred)
这是训练模型和预测的代码。我需要知道我的模型性能。那么我应该在哪里改成cross_val_score?
来自 sklearn documentation
The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset.
你的情况是
from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores)
使用这个:(这是我之前项目的一个例子)
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
kfolds = KFold(n_splits=5, shuffle=True, random_state=42)
def cv_f1(model, X, y):
score = np.mean(cross_val_score(model, X, y,
scoring="f1",
cv=kfolds))
return (score)
model = ....
score_f1 = cv_f1(model, X_train, y_train)
你可以有多个得分。您应该只更改 scoring="f1"。
如果您想查看每次折叠的分数,只需删除 np.mean
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import accuracy_score
X = data['Review']
y = data['Category']
tfidf = TfidfVectorizer(ngram_range=(1,1))
classifier = LinearSVC()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
clf = Pipeline([
('tfidf', tfidf),
('clf', classifier)
])
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy_score(y_test, y_pred)
这是训练模型和预测的代码。我需要知道我的模型性能。那么我应该在哪里改成cross_val_score?
来自 sklearn documentation
The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset.
你的情况是
from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores)
使用这个:(这是我之前项目的一个例子)
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
kfolds = KFold(n_splits=5, shuffle=True, random_state=42)
def cv_f1(model, X, y):
score = np.mean(cross_val_score(model, X, y,
scoring="f1",
cv=kfolds))
return (score)
model = ....
score_f1 = cv_f1(model, X_train, y_train)
你可以有多个得分。您应该只更改 scoring="f1"。 如果您想查看每次折叠的分数,只需删除 np.mean