Plotting history of accuracy in BaggingClassifier
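I trained a simple random forest algorithm and a bagging classifier (n_estimators = 100). Is it possible to plot the accuracy history of the bagging classifier? And how can I compute the variance over the 100 samples?
So far I have just printed the accuracy of both algorithms: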
from sklearn import tree
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# X, y: the feature matrix and labels (not shown in the question)

# DecisionTree
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.90)
clf2 = tree.DecisionTreeClassifier()
clf2.fit(X_train, y_train)
pred2 = clf2.predict(X_test)
acc2 = clf2.score(X_test, y_test)
acc2  # 0.6983930778739185

# Bagging
clf3 = BaggingClassifier(tree.DecisionTreeClassifier(), max_samples=0.5, max_features=0.5,
                         n_estimators=100, verbose=2)
clf3.fit(X_train, y_train)
pred3 = clf3.predict(X_test)
acc3 = clf3.score(X_test, y_test)
acc3  # 0.911619283065513
I don't think you can get this information from a fitted BaggingClassifier. But you can create such a plot by fitting it with different values of n_estimators:
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X, X_test, y, y_test = train_test_split(iris.data,
                                        iris.target,
                                        test_size=0.20)
estimators = list(range(1, 20))
accuracy = []
for n_estimators in estimators:
    clf = BaggingClassifier(DecisionTreeClassifier(max_depth=1),
                            max_samples=0.2,
                            n_estimators=n_estimators)
    clf.fit(X, y)
    acc = clf.score(X_test, y_test)
    accuracy.append(acc)
plt.plot(estimators, accuracy)
plt.xlabel("Number of estimators")
plt.ylabel("Accuracy")
plt.show()
(Of course, the iris dataset is easy to classify with a single DecisionTreeClassifier, so I set max_depth=1 in this example.)
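On the variance part of the question: a minimal sketch, assuming "the variance over the 100 samples" means the spread of test accuracy across the 100 fitted base estimators. It relies on the fitted attributes estimators_ and estimators_features_ of BaggingClassifier; each base tree was trained on a subset of the features, so it has to be scored on the same columns. The iris dataset, the 50/50 split and random_state=0 stand in for the question's unspecified data (assumptions).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris stands in for the question's X, y (assumption).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = BaggingClassifier(DecisionTreeClassifier(), max_samples=0.5,
                        max_features=0.5, n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Score every base tree on the feature subset it was trained on.
# BaggingClassifier fits its base estimators on label indices into
# clf.classes_, so predictions are mapped back to the original labels.
per_estimator_acc = np.array([
    np.mean(clf.classes_[est.predict(X_test[:, feats]).astype(int)] == y_test)
    for est, feats in zip(clf.estimators_, clf.estimators_features_)
])

print("mean accuracy of the base trees:", per_estimator_acc.mean())
print("variance of their accuracy:", per_estimator_acc.var())
print("accuracy of the whole ensemble:", clf.score(X_test, y_test))
```

Comparing the mean base-tree accuracy with clf.score shows how much the ensemble gains over a typical single tree in this setup.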
For statistically meaningful results, you should fit the BaggingClassifier several times for each value of n_estimators and average the accuracies you obtain.
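A minimal sketch of that suggestion, assuming 10 repeats per setting (n_repeats is an arbitrary, hypothetical choice) and a fixed train/test split (random_state=0), so that only the bagging randomness varies between repeats via a different random_state per fit. It plots the mean accuracy with one standard deviation as error bars using plt.errorbar; the standard deviation also gives a rough picture of the run-to-run variance.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Fixed split so that only the bagging randomness changes between repeats.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

estimators = list(range(1, 20))
n_repeats = 10  # independent fits per n_estimators (arbitrary choice)

mean_acc, std_acc = [], []
for n_estimators in estimators:
    scores = []
    for seed in range(n_repeats):
        clf = BaggingClassifier(DecisionTreeClassifier(max_depth=1),
                                max_samples=0.2,
                                n_estimators=n_estimators,
                                random_state=seed)  # different seed per repeat
        clf.fit(X_train, y_train)
        scores.append(clf.score(X_test, y_test))
    mean_acc.append(np.mean(scores))
    std_acc.append(np.std(scores))

# Mean accuracy per setting, with +/- one standard deviation as error bars.
plt.errorbar(estimators, mean_acc, yerr=std_acc, capsize=3)
plt.xlabel("Number of estimators")
plt.ylabel(f"Mean accuracy over {n_repeats} fits")
plt.show()
```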