What features are causing TypeError: '<' not supported between instances of 'str' and 'int'

How can I find the features that are causing this error?

c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py:614: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\numpy\core\fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
TypeError: '<' not supported between instances of 'str' and 'int'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\ensemble\_forest.py", line 331, in fit
    y, expanded_class_weight = self._validate_y_class_weight(y)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\ensemble\_forest.py", line 605, in _validate_y_class_weight
    y_original)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\class_weight.py", line 167, in compute_sample_weight
    y=y_full)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\class_weight.py", line 66, in compute_class_weight
    i = np.searchsorted(classes, c)
  File "<__array_function__ internals>", line 6, in searchsorted
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\numpy\core\fromnumeric.py", line 1343, in searchsorted
    return _wrapfunc(a, 'searchsorted', v, side=side, sorter=sorter)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\numpy\core\fromnumeric.py", line 67, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "c:\users\pc\appdata\local\programs\python\python37\lib\site-packages\numpy\core\fromnumeric.py", line 44, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
TypeError: '<' not supported between instances of 'str' and 'int'

  FitFailedWarning)
Cross validation scores with F1 scoring [nan nan nan nan nan]
AVG Cross validation score with F1 scoring nan 

I am building an ML model, and whenever I try to train it I get this error. My data types are as follows:

label          object
f1             object
f2             object
f3             object
f4             object
f5             object
f6             object
f7             object
f8             float64
f9             float64
f10            float64
f11            float64
f12            float64
f13            int64
f14            float64
f15            object
f16            object
f17            int64
f18            int64
f19            int64
f20            int64
f21            int64
f22            int64

There are no NaN values in my columns, and no column contains mixed values (strings and numbers together). This is how I transform the columns:

columns_for_encoding = ['f1',
                       'f2',
                       'f3',
                       'f4',
                       'f5',
                       'f6',
                       'f7',
                       'f15',
                       'f16']

columns_for_scaling = ['f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22']

transformerVectoriser = ColumnTransformer(transformers=[('Vector Cat', OneHotEncoder(handle_unknown = "ignore"), columns_for_encoding),
                                                        ('Normalizer', Normalizer(), columns_for_scaling)],
                                          remainder='passthrough') 

Then I train the models:

classifiers = [["RandomForestClassifier 30", RandomForestClassifier(max_depth = 30, n_estimators = 175, random_state = 42, class_weight = {1: 3.5, 0: 1})],
               ["LogisticRegression", LogisticRegression(max_iter = 5000, class_weight = {1: 3.5, 0: 1})], 
               ["GradientBoostingClassifier", GradientBoostingClassifier(max_depth = 25, n_estimators = 175, random_state = 42)]]

for class_ in classifiers:
    
    name = class_[0]
    clf = class_[1]
    print(name)
    
    pipeline = Pipeline([('transformer', transformerVectoriser),
                         ('classifier', clf)])

    cv_score_f1 = cross_val_score(pipeline, features, results, cv=5, scoring = 'f1')
    cv_score_f1.sort()
    print('Cross validation scores with F1 scoring', cv_score_f1)
    cv_score_f1 = round(np.average(cv_score_f1), 5)
    print("AVG Cross validation score with F1 scoring", cv_score_f1, '\n')

    cv_score_acc = cross_val_score(pipeline, features, results, cv=5, scoring = 'accuracy')
    cv_score_acc.sort()
    print('Cross validation scores with accuracy scoring', cv_score_acc)
    cv_score_acc = round(np.average(cv_score_acc), 5)
    print("AVG Cross validation score with accuracy scoring", cv_score_acc, '\n')
    print()

Is there a way to find out which column is causing the error?

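I see that your label column has dtype object, which means it holds strings. In class_weight = {1: 3.5, 0: 1}, however, the keys are integers. The traceback points the same way: the failure is in compute_class_weight, at np.searchsorted(classes, c), where the string class labels are compared with the integer keys, so the cause is the label rather than any feature column. Either give class_weight keys that match the actual class labels, or encode the labels with LabelEncoder.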
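A quick way to confirm this before fitting is to compare the class_weight keys with the labels that actually occur. The helper below is only a sketch: the name unmatched_class_weight_keys is hypothetical (it is not part of scikit-learn), and it assumes the labels are hashable values such as strings or integers.

```python
def unmatched_class_weight_keys(y, class_weight):
    """Return the class_weight keys that never occur among the labels in y.

    Hypothetical helper for illustration; y can be a list, a NumPy array
    or a pandas Series of hashable labels.
    """
    labels = set(y)
    return [key for key in class_weight if key not in labels]


# String labels with integer keys: every key is unmatched.
print(unmatched_class_weight_keys(['yes', 'no', 'yes'], {1: 3.5, 0: 1}))
# prints [1, 0]

# Keys that match the labels: nothing is reported.
print(unmatched_class_weight_keys(['yes', 'no', 'yes'], {'yes': 3.5, 'no': 1}))
# prints []
```

With the data from the question, the call would be unmatched_class_weight_keys(results, {1: 3.5, 0: 1}); a non-empty result means the keys and the labels disagree.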

Using an example dataset in which my label is 'yes' or 'no':

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder, Normalizer, LabelEncoder
from sklearn.compose import ColumnTransformer
import pandas as pd
import numpy as np

df = pd.DataFrame({'f1': np.random.uniform(0, 1, 100),
                   'f2': np.random.choice(['a', 'b', 'c'], 100),
                   'label': np.random.choice(['yes', 'no'], 100)})

df.dtypes
f1       float64
f2        object
label     object

If we set up the pipeline the way you did, I get a similar error:

columns_for_encoding = ['f2']
columns_for_scaling = ['f1']

transformerVectoriser = ColumnTransformer(
    transformers=[('Vector Cat', OneHotEncoder(handle_unknown="ignore"), columns_for_encoding),
                  ('Normalizer', Normalizer(), columns_for_scaling)],
    remainder='passthrough')

pipeline = Pipeline([('transformer', transformerVectoriser),
                         ('classifier', RandomForestClassifier(class_weight = {1: 3.5, 0: 1}))])

pipeline.fit(df[['f1','f2']],df['label'])

If we define the weights with the actual class labels as keys, it works:

pipeline = Pipeline([('transformer', transformerVectoriser),
                     ('classifier', RandomForestClassifier(class_weight={'yes': 3.5, 'no': 1}))])

pipeline.fit(df[['f1','f2']],df['label'])

Pipeline(steps=[('transformer',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('Vector Cat',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['f2']),
                                                 ('Normalizer', Normalizer(),
                                                  ['f1'])])),
                ('classifier',
                 RandomForestClassifier(class_weight={'no': 1, 'yes': 3.5}))])
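The other option, encoding the labels, might look like the sketch below. It rebuilds the same toy data with a fixed seed (the seed, the random_state and the variable names are my own choices, not from the question) and keeps the integer class_weight keys. LabelEncoder sorts the classes, so it is worth checking encoder.classes_ to see which label became 0 and which became 1. Integer labels should also suit scoring='f1', which by default treats the label 1 as the positive class.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, Normalizer, OneHotEncoder

# Toy data shaped like the example above; the seed is an arbitrary choice.
rng = np.random.default_rng(0)
df = pd.DataFrame({'f1': rng.uniform(0, 1, 100),
                   'f2': rng.choice(['a', 'b', 'c'], 100),
                   'label': rng.choice(['yes', 'no'], 100)})

# Encode the string labels as integers. Classes are sorted,
# so index 0 is 'no' and index 1 is 'yes' for this data.
encoder = LabelEncoder()
y = encoder.fit_transform(df['label'])
print(list(encoder.classes_))

transformer = ColumnTransformer(
    transformers=[('Vector Cat', OneHotEncoder(handle_unknown='ignore'), ['f2']),
                  ('Normalizer', Normalizer(), ['f1'])],
    remainder='passthrough')

# The integer keys now match the encoded labels.
pipeline = Pipeline([
    ('transformer', transformer),
    ('classifier', RandomForestClassifier(class_weight={1: 3.5, 0: 1},
                                          random_state=42)),
])

scores = cross_val_score(pipeline, df[['f1', 'f2']], y, cv=5, scoring='f1')
print(scores)
```

If the original string labels are needed later, encoder.inverse_transform maps predictions back to them.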