Why do a DeprecationWarning and a ValueError show up even when the input arrays have the same shape and length?
I am using train_test_split to partition my data. I have two columns to fit, a car's 'horsepower' and 'price', each containing 199 elements. So I tried the following code:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
lm = LinearRegression()
x_train, x_test, y_train, y_test = train_test_split(df['horsepower'], df['price'], test_size=0.3, random_state=0)
model = lm.fit(x_train, y_train)
predictions = lm.predict(x_test)
# Now, just to recheck:
print(x_train.shape == y_train.shape)
>>> True
# And
len(x_train)
>>> 139
len(y_train)
>>> 139
But all I get is a DeprecationWarning and a ValueError:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17
and will raise ValueError in 0.19. Reshape your data either using
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample.
and
ValueError: Found input variables with inconsistent numbers of samples: [1, 139]
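Sklearn requires your X data to have the shape (n_row, n_column).
When you select a single column from a DataFrame with df['horsepower'], you get a pandas.Series, so your shape is (n_row,). The shapes of x_train and y_train are then equal, which is why your check prints True, but X is still one-dimensional, and sklearn expects it to be two-dimensional.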
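The difference between the two selections can be checked directly. This is a minimal sketch; the three-row df below is a hypothetical stand-in for the data in the question, which is not shown.

```python
import pandas as pd

# Hypothetical three-row stand-in for the question's df (values are made up)
df = pd.DataFrame({"horsepower": [111, 154, 102], "price": [13495, 16500, 13950]})

series = df["horsepower"]    # single brackets -> pandas.Series, 1-D
frame = df[["horsepower"]]   # double brackets -> pandas.DataFrame, 2-D

print(type(series).__name__, series.shape)  # Series (3,)
print(type(frame).__name__, frame.shape)    # DataFrame (3, 1)
```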
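To avoid this, you have two options:
- Select your column like this: df[['horsepower']]. This gives you a new DataFrame, so the shape is (n_row, n_column).
- Do a reshape before fitting your model: x_train = x_train.values.reshape(-1,1) and x_test = x_test.values.reshape(-1,1). Going through .values is needed because a pandas.Series has no reshape method in current pandas versions; the reshape is done on the underlying NumPy array.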
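Both options can be sketched end to end as follows. The df here is synthetic and purely an assumption for illustration (199 rows, with price set to 100 * horsepower + 5000 so the fit is easy to verify); the names lm2, x_train_1d and x_train_2d are likewise hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's df: 199 rows, exactly linear relation
df = pd.DataFrame({"horsepower": [float(50 + i) for i in range(199)]})
df["price"] = 100.0 * df["horsepower"] + 5000.0

# Option 1: double brackets keep X two-dimensional, shape (n_row, 1)
x_train, x_test, y_train, y_test = train_test_split(
    df[["horsepower"]], df["price"], test_size=0.3, random_state=0
)
lm = LinearRegression()
lm.fit(x_train, y_train)
predictions = lm.predict(x_test)

# Option 2: split the 1-D Series, then reshape the underlying NumPy arrays
x_train_1d, x_test_1d, y_train_2, y_test_2 = train_test_split(
    df["horsepower"], df["price"], test_size=0.3, random_state=0
)
x_train_2d = x_train_1d.to_numpy().reshape(-1, 1)
x_test_2d = x_test_1d.to_numpy().reshape(-1, 1)
lm2 = LinearRegression().fit(x_train_2d, y_train_2)
predictions_2 = lm2.predict(x_test_2d)

print(x_train.shape, x_train_2d.shape)  # both are (139, 1)
```

Option 1 is usually the smaller change, since the column name stays attached to X; option 2 is useful when the data already exists as a one-dimensional array.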