Difference in the calculation of accuracy between Keras and scikit-learn
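I am currently doing multi-label image classification with a CNN in Keras.
In addition to the accuracy reported by Keras, we rechecked the results with scikit-learn using several metrics (recall, precision, F1 score, and accuracy).
We found that Keras reports an accuracy of about 90%, while scikit-learn reports only about 60%.
I don't know why this happens, so please tell me.
Is something wrong with the Keras calculation?
We use sigmoid as the activation function, binary_crossentropy as the loss function, and adam as the optimizer.
Keras training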
input_tensor = Input(shape=(img_width, img_height, 3))
base_model = MobileNetV2(include_top=False, weights='imagenet')
#model.summary()
x = base_model.output
x = GlobalAveragePooling2D()(x)
#x = Dense(2048, activation='relu')(x)
#x = Dropout(0.5)(x)
x = Dense(1024, activation = 'relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(6, activation = 'sigmoid')(x)
for layer in base_model.layers:
    layer.trainable = False
model = Model(inputs = base_model.input, outputs = predictions)
print("{} layers".format(len(model.layers)))
model.compile(optimizer=sgd, loss="binary_crossentropy", metrics=["acc"])
history = model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val), batch_size=64, verbose=2)
model_evaluate()
Keras reports about 90% accuracy.
Check with scikit-learn
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
thresholds=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
y_pred = model.predict(X_test)
for val in thresholds:
    print("For threshold: ", val)
    pred = y_pred.copy()
    pred[pred >= val] = 1
    pred[pred < val] = 0
    accuracy = accuracy_score(y_test, pred)
    precision = precision_score(y_test, pred, average='micro')
    recall = recall_score(y_test, pred, average='micro')
    f1 = f1_score(y_test, pred, average='micro')
    print("Micro-average quality numbers")
    print("Acc: {:.4f}, Precision: {:.4f}, Recall: {:.4f}, F1-measure: {:.4f}".format(accuracy, precision, recall, f1))
Output (scikit-learn)
For threshold: 0.1
Micro-average quality numbers
Acc: 0.0727, Precision: 0.3776, Recall: 0.8727, F1-measure: 0.5271
For threshold: 0.2
Micro-average quality numbers
Acc: 0.1931, Precision: 0.4550, Recall: 0.8033, F1-measure: 0.5810
For threshold: 0.3
Micro-average quality numbers
Acc: 0.3323, Precision: 0.5227, Recall: 0.7403, F1-measure: 0.6128
For threshold: 0.4
Micro-average quality numbers
Acc: 0.4574, Precision: 0.5842, Recall: 0.6702, F1-measure: 0.6243
For threshold: 0.5
Micro-average quality numbers
Acc: 0.5059, Precision: 0.6359, Recall: 0.5858, F1-measure: 0.6098
For threshold: 0.6
Micro-average quality numbers
Acc: 0.4597, Precision: 0.6993, Recall: 0.4707, F1-measure: 0.5626
For threshold: 0.7
Micro-average quality numbers
Acc: 0.3417, Precision: 0.7520, Recall: 0.3383, F1-measure: 0.4667
For threshold: 0.8
Micro-average quality numbers
Acc: 0.2205, Precision: 0.7863, Recall: 0.2132, F1-measure: 0.3354
For threshold: 0.9
Micro-average quality numbers
Acc: 0.1063, Precision: 0.8987, Recall: 0.1016, F1-measure: 0.1825
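In multi-label classification, a prediction can be counted as correct in two ways:
1. A sample counts as correct only if all of its sub-labels are predicted correctly. Example: in the demo dataset, y_true has 5 outputs, and y_pred gets 3 of them completely right. In this case, the accuracy is 60%.
2. Each sub-label is counted on its own. Example: the demo dataset y_true contains 15 label predictions in total, and y_pred gets 10 of them right. In this case, the accuracy is 66.7%.
scikit-learn handles multi-label classification as described in point 1, whereas the Keras accuracy metric follows point 2. A code example is given below.
Code: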
import tensorflow as tf
from sklearn.metrics import accuracy_score
import numpy as np
# A demo dataset
y_true = np.array([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 1]])
kacc = tf.keras.metrics.Accuracy()
_ = kacc.update_state(y_true, y_pred)
print(f'Keras Accuracy acc: {kacc.result().numpy()*100:.3}')
kbacc = tf.keras.metrics.BinaryAccuracy()
_ = kbacc.update_state(y_true, y_pred)
print(f'Keras BinaryAccuracy acc: {kbacc.result().numpy()*100:.3}')
print(f'SkLearn acc: {accuracy_score(y_true, y_pred)*100:.3}')
Output:
Keras Accuracy acc: 66.7
Keras BinaryAccuracy acc: 66.7
SkLearn acc: 60.0
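As a rough sketch of where these two numbers come from, the same counts can be reproduced by hand with NumPy. The variable names (matches, per_label_acc, exact_match_acc) are illustrative and not part of either library.

```python
import numpy as np

# Same demo data as above: 5 samples, 3 labels each.
y_true = np.array([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 1]])

# Element-wise comparison, shape (5, 3).
matches = (y_true == y_pred)

# Method 2: every label counts on its own (10 correct labels out of 15).
per_label_acc = matches.mean()

# Method 1: a sample counts only if all of its labels match
# (3 fully correct rows out of 5).
exact_match_acc = matches.all(axis=1).mean()

print(f"per-label: {per_label_acc:.3f}, exact-match: {exact_match_acc:.3f}")
```

The gap widens as the number of labels grows. As a back-of-the-envelope estimate for the 6 labels in the question, if each label were right 90% of the time and the errors were independent (an assumption, not something the question confirms), only about 0.9^6 ≈ 0.53 of the samples would have every label right.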
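So you have to pick one of the two options. If you go with method 1, you have to implement the accuracy metric yourself. However, multi-label training is usually done with sigmoid and binary_crossentropy loss. binary_crossentropy is computed per label, in line with method 2, so you should evaluate the same way.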
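If method 1 is what you want to track during training, one possible way to write it is as a plain function over tensors. This is a minimal sketch, not an official Keras metric: the name exact_match_accuracy, the default threshold of 0.5, and the probabilities in y_prob are assumptions made for illustration.

```python
import tensorflow as tf

def exact_match_accuracy(y_true, y_pred, threshold=0.5):
    """Return 1.0 per sample if every label matches after thresholding, else 0.0."""
    y_true = tf.cast(y_true, tf.float32)
    # Turn sigmoid outputs into hard 0/1 decisions.
    y_hat = tf.cast(y_pred >= threshold, tf.float32)
    # A sample is correct only when all of its labels agree.
    all_correct = tf.reduce_all(tf.equal(y_true, y_hat), axis=-1)
    return tf.cast(all_correct, tf.float32)

# Quick check on the demo labels, with made-up probabilities that
# threshold to the same y_pred as in the example above.
y_true = tf.constant([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1]])
y_prob = tf.constant([[0.9, 0.2, 0.1], [0.8, 0.1, 0.3], [0.4, 0.3, 0.2],
                      [0.1, 0.2, 0.3], [0.7, 0.4, 0.6]])
score = float(tf.reduce_mean(exact_match_accuracy(y_true, y_prob)))
print(f"exact-match accuracy: {score:.3f}")
```

A function like this can usually be passed as metrics=[exact_match_accuracy] in model.compile, in which case Keras averages the per-sample values over the batches. At a threshold of 0.5 the result should then be comparable to what accuracy_score reports on the same data.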