DataFrameMapper scikit-learn ValueError: all the input array dimensions except for the concatenation axis must match exactly
I have been trying to use DataFrameMapper to add several preprocessing transformations on my DataFrame to my scikit-learn pipeline.
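import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, Normalizer
from sklearn.tree import DecisionTreeClassifier
from sklearn_pandas import DataFrameMapper

# Load the abalone dataset; the file has no header row, so supply column names
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
names = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']
df = pd.read_csv(url, names=names)
# Map each column to its transformer (this is the mapping that triggers the error)
mapper = DataFrameMapper(
[('Height', Normalizer()), ('Sex', LabelBinarizer())]
)
stages = []
stages += [("mapper", mapper)]
estimator = DecisionTreeClassifier()
stages += [("dtree", estimator)]
pipeline = Pipeline(stages)
# Separate the label column from the features
labelCol = 'Rings'
target = df[labelCol]
data = df.drop(labelCol, axis=1)
train_data, test_data, train_target, expected = train_test_split(data, target, test_size=0.25, random_state=33)
model = pipeline.fit(train_data, train_target)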
However, I get the following error:
Traceback (most recent call last):
File "app/experimenter/sklearn/transformations.py", line 65, in <module>
model = pipeline.fit(train_data, train_target)
File "/Library/Python/2.7/site-packages/sklearn/pipeline.py", line 268, in fit
Xt, fit_params = self._fit(X, y, **fit_params)
File "/Library/Python/2.7/site-packages/sklearn/pipeline.py", line 234, in _fit
Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
File "/Library/Python/2.7/site-packages/sklearn/base.py", line 497, in fit_transform
return self.fit(X, y, **fit_params).transform(X)
File "/Library/Python/2.7/site-packages/sklearn_pandas/dataframe_mapper.py", line 225, in transform
stacked = np.hstack(extracted)
File "/Library/Python/2.7/site-packages/numpy/core/shape_base.py", line 288, in hstack
return _nx.concatenate(arrs, 1)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What am I missing?
Thanks :)
You will have to change the structure of the DataFrameMapper:
mapper = DataFrameMapper(
[(['Height'], Normalizer()), ('Sex', LabelBinarizer())]
)
This is a subtle detail that is described in the sklearn_pandas documentation:
Map the Columns to Transformations
The difference between specifying the column selector as 'column' (as a simple string) and ['column'] (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one dimensional array will be passed, while in the second case it will be a 2-dimensional array with one column, i.e. a column vector.
[...]
Be aware that some transformers expect a 1-dimensional input (the label-oriented ones) while some others, like OneHotEncoder or Imputer, expect 2-dimensional input, with the shape [n_samples, n_features].
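To make the shape issue concrete, here is a minimal sketch that uses only NumPy. The height values and the encoded Sex matrix are made up for illustration, and the (1, n) row shape assumes the transformer treats a 1-dimensional input as a single sample, which is one plausible way the mismatch in the traceback can arise (the exact behaviour depends on your scikit-learn version):

```python
import numpy as np

# Made-up values standing in for four rows of the 'Height' column
heights = np.array([0.095, 0.090, 0.135, 0.125])

as_1d = heights                 # what a 'Height' selector passes: shape (4,)
as_2d = heights.reshape(-1, 1)  # what a ['Height'] selector passes: shape (4, 1)

# Made-up stand-in for LabelBinarizer output on 'Sex' with three classes: shape (4, 3)
sex_encoded = np.array([[0, 0, 1],
                        [0, 0, 1],
                        [1, 0, 0],
                        [0, 1, 0]])

# Assumption: the transformer interprets the 1-D input as one sample with
# four features, so its output is a single row of shape (1, 4).
row_like = as_1d.reshape(1, -1)

try:
    np.hstack([row_like, sex_encoded])  # row counts differ: 1 vs 4
    stacking_failed = False
except ValueError as err:
    stacking_failed = True
    print("hstack failed:", err)

# With a column vector, both blocks have four rows and stack cleanly.
stacked = np.hstack([as_2d, sex_encoded])
print(stacked.shape)
```

The first hstack call fails because the two blocks disagree on the number of rows, while the column-vector version gives one row per sample, which is what the downstream estimator expects.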