How to predict a new housing price using scikit-learn?

Question

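I have no machine learning experience yet, so I decided to try an online course. The project I am working on uses the Boston housing dataset.

I would like to know how to add a new DataFrame, boston_df2, to my current DataFrame, boston_df1, so that I can make new predictions. I tried the append approach shown below. My end goal is to make a price prediction on boston_df_append (boston_df1 + boston_df2).

I noticed that someone asked a very similar question, but it did not get a clear answer: How to make prediction using the Boston housing dataset?

Please don't downvote me for asking a similar question. I'm still learning. =)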


# import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2
from sklearn.ensemble import RandomForestRegressor

# load boston data
boston = load_boston()

# create boston_df1 DataFrame
boston_df1 = pd.DataFrame(boston['data'], columns=boston['feature_names'])
boston_df1['target'] = pd.Series(boston['target'])

# random seed
np.random.seed(42)

# create the data
X = boston_df1.drop('target', axis=1)
y = boston_df1['target']

# split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# instantiate and fit model
model = RandomForestRegressor().fit(X_train, y_train)

# make predictions on the held-out test set
y_preds = model.predict(X_test)
y_preds[-1]

# check model score (R^2 for a regressor, not accuracy)
model.score(X_test, y_test)

I would like to add the data for a new house, boston_df2, and predict its price.

# new house data (a plain dict at this point, not yet a DataFrame)
boston_df2 = {'CRIM': 0.6, 'ZN': 0.0, 'INDUS': 2.5, 'CHAS': 0.0, 'NOX': 0.6, 'RM': 8.0, 'AGE': 80, 'DIS': 5.0, 'RAD': 2.0, 'TAX': 300.0, 'PTRATIO': 20.0, 'B': 400.0, 'LSTAT': 10.0}

# note: DataFrame.append was removed in pandas 2.0
boston_df_append = boston_df1.append(boston_df2, ignore_index=True)

Does anyone have suggestions on how I can do this? Thanks in advance for any help! =)

Answer 1

To predict the new price, building on your code:

boston_df2 = pd.DataFrame.from_dict(boston_df2, orient='index').T
boston_df2['target'] = model.predict(boston_df2)

>>> boston_df2['target']
0    35.981
Name: target, dtype: float64

>>> boston_df2
   CRIM   ZN  INDUS  CHAS  NOX   RM   AGE  DIS  RAD    TAX  PTRATIO      B  LSTAT  target
0   0.6  0.0    2.5   0.0  0.6  8.0  80.0  5.0  2.0  300.0     20.0  400.0   10.0  35.981
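
For readers on newer library versions, here is a minimal self-contained sketch of the same idea. It rests on a few assumptions that are not part of the original post: the training data is a random stand-in (so the predicted number is meaningless), the names new_house and new_row are only illustrative, and pd.concat takes the place of the append call from the question. The new row is aligned to the training columns first, the price is predicted for it, and only then is it concatenated onto the existing frame.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

# ASSUMPTION: random stand-in data so the sketch runs without load_boston.
# With the real data, reuse boston_df1 and the fitted model from the question.
rng = np.random.default_rng(42)
boston_df1 = pd.DataFrame(rng.random((100, len(feature_names))),
                          columns=feature_names)
boston_df1['target'] = rng.random(100) * 50

X = boston_df1.drop('target', axis=1)
y = boston_df1['target']
model = RandomForestRegressor(n_estimators=20, random_state=42).fit(X, y)

# The new house from the question, kept as a plain dict.
new_house = {'CRIM': 0.6, 'ZN': 0.0, 'INDUS': 2.5, 'CHAS': 0.0, 'NOX': 0.6,
             'RM': 8.0, 'AGE': 80, 'DIS': 5.0, 'RAD': 2.0, 'TAX': 300.0,
             'PTRATIO': 20.0, 'B': 400.0, 'LSTAT': 10.0}

# A list holding one dict becomes a one-row DataFrame; reindex puts the
# columns in the same order the model was trained on.
new_row = pd.DataFrame([new_house]).reindex(columns=X.columns)

# predict() returns one value per input row, so this array has length 1.
predicted = model.predict(new_row)
new_row = new_row.assign(target=predicted)

# pd.concat stands in for the DataFrame.append call used in the question.
boston_df_append = pd.concat([boston_df1, new_row], ignore_index=True)

print(new_row)
print(boston_df_append.tail(1))
```

Predicting before concatenating keeps the target column out of the features passed to the model; the appended frame is only a record of the result, not something the model needs as input.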

Tags: python, machine-learning, prediction, pandas, scikit-learn