Python index is out of bounds for axis 0 with size 3
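I have the following mock test dataframe. All columns have object dtype, except the 'Defect' column, which is int and is the target feature.
I take the following steps:
- Create the dataframe
- Split it into X and y
- Build a pipeline that one-hot encodes the categorical columns
- Use cross-validation to measure the model's accuracy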
import pandas as pd
data = {1 : ['test', '2222', '1111', '3333', '1111'],
2 : ['aaa', 'aaa', 'bbbb', 'ccccc', 'aaa'],
3 : ['x', 'y', 'z', 't', 'x'],
'Defect': [0, 1, 0, 1, 0]
}
data = pd.DataFrame(data)
X = data.drop('Defect', axis = 'columns')
y = data['Defect']
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
ohe = OneHotEncoder(handle_unknown='ignore')
cat_cols = make_column_selector(dtype_include = 'object')
preprocessor = make_column_transformer((make_pipeline(ohe), cat_cols))
pipe = make_pipeline(preprocessor, LogisticRegression())
from sklearn.model_selection import cross_val_score
scores = cross_val_score(pipe, X, y, cv=3, scoring='accuracy')
print(scores)
Unfortunately, my scores output is [nan nan nan], and below the output I get this error message:
... The above exception was the direct cause of the following exception: ...
ValueError: all features must be in [0, 2] or [-3, 0]...
Why does this happen? If I change the data type of one column, the code seems to work.
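It doesn't seem to like column names that start at 1. Most likely, because the column names are integers, the column transformer treats the values returned by make_column_selector (1, 2, 3) as positions rather than labels, and position 3 does not exist in a three-column X. That would match the "[0, 2] or [-3, 0]" range in the error. Try this: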
# Column names now start at 0 instead of 1
data = {0 : ['test', '2222', '1111', '3333', '1111'],
1 : ['aaa', 'aaa', 'bbbb', 'ccccc', 'aaa'],
2 : ['x', 'y', 'z', 't', 'x'],
'Defect': [0, 1, 0, 1, 0]
}
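Another option, if you want to keep the labels 1, 2, 3, is to make the column names strings, so they can only be read as labels. Here is a minimal sketch under that assumption (it is my reading of the error, not something confirmed by the traceback). It builds X directly with dtype=object so the dtype selector still matches on pandas versions that infer a dedicated string dtype, and it passes the encoder directly instead of wrapping it in make_pipeline:

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Same mock data as in the question. dtype=object is set explicitly
# (an assumption of this sketch) so the columns stay object dtype.
X = pd.DataFrame(
    {
        1: ['test', '2222', '1111', '3333', '1111'],
        2: ['aaa', 'aaa', 'bbbb', 'ccccc', 'aaa'],
        3: ['x', 'y', 'z', 't', 'x'],
    },
    dtype=object,
)
y = pd.Series([0, 1, 0, 1, 0], name='Defect')

# Turn the integer labels 1, 2, 3 into the strings '1', '2', '3',
# so the selector returns names that cannot be mistaken for positions.
X.columns = X.columns.astype(str)

ohe = OneHotEncoder(handle_unknown='ignore')
cat_cols = make_column_selector(dtype_include='object')
preprocessor = make_column_transformer((ohe, cat_cols))

# Sanity check: 4 + 3 + 4 distinct categories across the three columns
encoded = preprocessor.fit_transform(X)
print(encoded.shape)  # (5, 11)

pipe = make_pipeline(preprocessor, LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=3, scoring='accuracy')
print(scores)
```

With only five rows you may still see a warning that the smallest class has fewer members than cv=3, but the scores should no longer be nan. If you want to see the underlying exception instead of nan scores, cross_val_score also accepts error_score='raise'.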