sklearn AssertionError: not equal to tolerance on custom estimator
For learning purposes, I am creating a custom classifier using the scikit-learn interface. I came up with the following code:
import numpy as np
from sklearn.utils.estimator_checks import check_estimator
from sklearn.base import BaseEstimator, ClassifierMixin, check_X_y
from sklearn.utils.validation import check_array, check_is_fitted, check_random_state


class TemplateEstimator(BaseEstimator, ClassifierMixin):
    def __init__(self, threshold=0.5, random_state=None):
        self.threshold = threshold
        self.random_state = random_state

    def fit(self, X, y):
        self.random_state_ = check_random_state(self.random_state)
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)
        self.fitted_ = True
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        y_hat = self.random_state_.choice(self.classes_, size=X.shape[0])
        return y_hat


check_estimator(TemplateEstimator())
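This classifier simply guesses at random. I did my best to follow the scikit-learn documentation and the guide on developing my own estimator. However, I get the following error: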
AssertionError:
Arrays are not equal
Classifier cant predict when only one class is present.
Mismatched elements: 10 / 10 (100%)
Max absolute difference: 1.
Max relative difference: 1.
x: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
y: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
I can't be sure, but I suspect the randomness (i.e. self.random_state_) is causing the error. I am using sklearn version 1.0.2.
The first thing to note is that you get much better output if you use parametrize_with_checks with pytest instead of check_estimator. It looks like this:
from sklearn.utils.estimator_checks import parametrize_with_checks


@parametrize_with_checks([TemplateEstimator()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)
If you run this with pytest, the output includes the following failed tests:
FAILED ../../../../tmp/1.py::test_sklearn_compatible_estimator[TemplateEstimator()-check_pipeline_consistency] - AssertionError:
FAILED ../../../../tmp/1.py::test_sklearn_compatible_estimator[TemplateEstimator()-check_classifiers_train] - AssertionError
FAILED ../../../../tmp/1.py::test_sklearn_compatible_estimator[TemplateEstimator()-check_classifiers_train(readonly_memmap=True)] - AssertionError
FAILED ../../../../tmp/1.py::test_sklearn_compatible_estimator[TemplateEstimator()-check_classifiers_train(readonly_memmap=True,X_dtype=float32)] - AssertionError
FAILED ../../../../tmp/1.py::test_sklearn_compatible_estimator[TemplateEstimator()-check_classifiers_regression_target] - AssertionError: Did not raise: [<class 'ValueErr...
FAILED ../../../../tmp/1.py::test_sklearn_compatible_estimator[TemplateEstimator()-check_methods_sample_order_invariance] - AssertionError:
FAILED ../../../../tmp/1.py::test_sklearn_compatible_estimator[TemplateEstimator()-check_methods_subset_invariance] - AssertionError:
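Some of these tests check for consistency of the output, which does not apply in your case since you return random values. For that, you need to set the non_deterministic estimator tag. Other tests, such as check_classifiers_regression_target, check whether you do the right validation and raise the right errors, which you don't. So you either need to fix that or add the no_validation tag. Another issue is that check_classifiers_train checks whether your model gives reasonable output for a given problem. Since you are returning random values, those conditions are not met. You can set the poor_score estimator tag to skip those checks.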
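To illustrate why the consistency checks cannot pass for a random guesser, here is a minimal NumPy-only sketch. The helper name random_predict is made up for illustration and is not part of scikit-learn; it only mimics what the predict method in the question does.

```python
import numpy as np


def random_predict(rng, classes, X):
    """Guess one label per row of X, ignoring the features entirely."""
    return rng.choice(classes, size=X.shape[0])


rng = np.random.RandomState(0)
classes = np.array([0, 1])
X = np.zeros((1000, 3))

# Two calls on exactly the same input.
first = random_predict(rng, classes, X)
second = random_predict(rng, classes, X)

# Each call advances the random stream, so the outputs are not reproducible.
print(np.array_equal(first, second))
```

The two results differ even though the input is identical, which is presumably what the sample-order and subset invariance checks trip over.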
You can add these tags by adding this to your estimator:
class TemplateEstimator(BaseEstimator, ClassifierMixin):
    ...

    def _more_tags(self):
        return {
            "non_deterministic": True,
            "no_validation": True,
            "poor_score": True,
        }
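If you would rather fix the validation than declare no_validation, one option is to reject continuous targets before fitting. The following is only a sketch: it assumes check_classification_targets from sklearn.utils.multiclass is available in your installed version, and validate_classification_data is a hypothetical helper name, not something from the question or from scikit-learn.

```python
import numpy as np
from sklearn.utils.multiclass import check_classification_targets
from sklearn.utils.validation import check_X_y


def validate_classification_data(X, y):
    """Hypothetical helper: validate X and y, rejecting continuous targets."""
    X, y = check_X_y(X, y)
    # Raises ValueError when y looks like a regression target,
    # for example non-integer floats.
    check_classification_targets(y)
    return X, y


X = np.random.RandomState(0).uniform(size=(6, 2))

# Integer class labels pass validation.
validate_classification_data(X, np.array([0, 1, 0, 1, 1, 0]))

# Continuous targets are rejected with a ValueError.
try:
    validate_classification_data(X, np.array([0.1, 1.7, 2.3, 0.4, 1.1, 3.9]))
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Calling a helper like this at the top of fit would replace the bare check_X_y call. Whether that alone satisfies every validation check in your scikit-learn version is something to confirm by rerunning the test suite.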
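But even then, two tests would still fail if you use the main branch or a nightly build of scikit-learn. I believe this needs a fix, and I have opened an issue for it upstream (EDIT: the fix is now merged and will be available in the next release). You can avoid those failures by marking the tests as expected failures in the tags. In the end, your estimator would look like this: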
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.estimator_checks import parametrize_with_checks
from sklearn.utils.validation import (
    check_array,
    check_is_fitted,
    check_random_state,
    check_X_y,
)


class TemplateEstimator(BaseEstimator, ClassifierMixin):
    def __init__(self, threshold=0.5, random_state=None):
        self.threshold = threshold
        self.random_state = random_state

    def fit(self, X, y):
        # Turn the random_state parameter into a RandomState instance.
        self.random_state_ = check_random_state(self.random_state)
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)
        self.fitted_ = True
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        # Pick a random known class for every sample.
        y_hat = self.random_state_.choice(self.classes_, size=X.shape[0])
        return y_hat

    def _more_tags(self):
        return {
            "non_deterministic": True,
            "no_validation": True,
            "poor_score": True,
            # Mark the two checks that still fail as expected failures.
            "_xfail_checks": {
                "check_methods_sample_order_invariance": "This test shouldn't be running at all!",
                "check_methods_subset_invariance": "This test shouldn't be running at all!",
            },
        }


@parametrize_with_checks([TemplateEstimator()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)