Why do a DeprecationWarning and a ValueError show up even when the shapes and lengths of the input arrays are the same?

I am using train_test_split to partition my data. I have 2 columns to fit, namely a car's 'horsepower' and 'price', each containing 199 elements. So I tried the following code:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    lm=LinearRegression()

    x_train,x_test,y_train,y_test =train_test_split(df['horsepower'],df['price'],test_size=0.3,random_state=0)

    model = lm.fit(x_train, y_train)
    predictions = lm.predict(x_test)

    # Now, just to recheck:
    print(x_train.shape == y_train.shape)
    # True

    # And
    len(x_train)
    # 139

    len(y_train)
    # 139

But all I get is a DeprecationWarning and a ValueError:

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

ValueError: Found input variables with inconsistent numbers of samples: [1, 139]

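Sklearn requires your X data to be two-dimensional, with shape (n_row, n_column).

When you select a single column from a DataFrame with df['horsepower'], you get a pandas.Series, so the shape is (n_row,). Sklearn reads that 1d input as a single sample, which is why it reports [1, 139] even though x_train and y_train have the same length.

To avoid this, you have two options: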
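The difference between the two selections is easy to check directly. The snippet below is a minimal sketch; the small DataFrame and its values are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame; the values are invented.
df = pd.DataFrame({
    "horsepower": [111, 154, 102, 115],
    "price": [13495, 16500, 13950, 17450],
})

series = df["horsepower"]    # single brackets return a pandas.Series (1d)
frame = df[["horsepower"]]   # double brackets return a pandas.DataFrame (2d)

print(type(series).__name__, series.shape)
print(type(frame).__name__, frame.shape)
```

The Series has a one-element shape tuple, while the one-column DataFrame has the (n_row, n_column) shape that sklearn expects for X.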

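  • Select your column like this: df[['horsepower']]. This gives you a new DataFrame, so the shape is (n_row, n_column).
  • Reshape before fitting your model. Recent pandas versions no longer provide Series.reshape, so reshape the underlying NumPy array:
        x_train = x_train.values.reshape(-1, 1)
        x_test = x_test.values.reshape(-1, 1)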
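Both options can be sketched end to end as follows. The question's DataFrame is not shown, so this example builds a synthetic one with 199 rows; the generated values and the names s_train, s_test, t_train, t_test and lm2 are assumptions made only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (199 rows); values are invented.
rng = np.random.default_rng(0)
horsepower = rng.integers(50, 250, size=199)
df = pd.DataFrame({
    "horsepower": horsepower,
    "price": 100.0 * horsepower + rng.normal(0, 500, size=199),
})

# Option 1: select the feature with double brackets so X is already 2d.
x_train, x_test, y_train, y_test = train_test_split(
    df[["horsepower"]], df["price"], test_size=0.3, random_state=0
)
lm = LinearRegression()
lm.fit(x_train, y_train)
predictions = lm.predict(x_test)

# Option 2: split the 1d Series, then reshape the NumPy arrays to one column.
s_train, s_test, t_train, t_test = train_test_split(
    df["horsepower"], df["price"], test_size=0.3, random_state=0
)
lm2 = LinearRegression()
lm2.fit(s_train.to_numpy().reshape(-1, 1), t_train)
predictions2 = lm2.predict(s_test.to_numpy().reshape(-1, 1))

print(x_train.shape, predictions.shape)
```

With the same random_state both options split the rows identically, so the two models should produce matching predictions; the only difference is where the data becomes two-dimensional.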