How to predict a new housing price using scikit-learn?
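I currently have no experience with machine learning, so I decided to try an online course. The project I am working on uses the Boston housing dataset.
I would like to know how to add a new DataFrame, boston_df2, to my current DataFrame, boston_df1, so that I can make a new prediction. I tried the append approach shown below. My end goal is to make a price prediction on boston_df_append (boston_df1 + boston_df2).
I noticed that someone asked a very similar question, but it did not get a clear answer: How to make prediction using the Boston housing dataset?
Please do not downvote me for asking a similar question. I am still learning. =)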
#import boston housing dataset
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
#load boston data
boston = load_boston()
boston;
#create boston_df1 DataFrame
boston_df1 = pd.DataFrame(boston['data'], columns = boston['feature_names'])
boston_df1['target'] = pd.Series(boston['target'])
#Random seed
np.random.seed(42)
#create the data
X = boston_df1.drop('target', axis=1)
y = boston_df1['target']
#split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
#instantiate and fit model
model = RandomForestRegressor().fit(X_train, y_train)
#make predictions on the held-out test data
y_preds = model.predict(X_test)
y_preds[-1]
#check model score (R^2 on the test set, since this is a regressor)
model.score(X_test, y_test)
I would like to add the data for a new house, boston_df2, to predict that house's price.
#add boston_df2 DataFrame
boston_df2 = {'CRIM': 0.6, 'ZN': 0.0, 'INDUS': 2.5, 'CHAS': 0.0, 'NOX': 0.6, 'RM': 8.0, 'AGE': 80, 'DIS': 5.0, 'RAD': 2.0, 'TAX': 300.0, 'PTRATIO': 20.0, 'B': 400.0, 'LSTAT': 10.0}
boston_df_append = boston_df1.append(boston_df2, ignore_index=True)
Does anyone have a suggestion for how I can do this? Thanks in advance for any help! =)
To predict the new price, building on your code:
boston_df2 = pd.DataFrame.from_dict(boston_df2, orient='index').T
boston_df2['target'] = model.predict(boston_df2)
>>> boston_df2['target']
0    35.981
Name: target, dtype: float64
>>> boston_df2
   CRIM   ZN  INDUS  CHAS  NOX   RM   AGE  DIS  RAD    TAX  PTRATIO      B  LSTAT  target
0   0.6  0.0    2.5   0.0  0.6  8.0  80.0  5.0  2.0  300.0     20.0  400.0   10.0  35.981
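If you also want the combined boston_df_append from the question, here is a minimal sketch of one way to do it on a recent library stack. Two facts drive it: load_boston was removed in scikit-learn 1.2, and DataFrame.append was removed in pandas 2.0, with pd.concat as the replacement. Everything else is an assumption for illustration. The training data is synthetic (generated with make_regression, so the predicted value means nothing as a price), and the names feature_names, df1, df2, new_house and df_all are hypothetical; only the column names come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# ASSUMPTION: synthetic stand-in for the Boston data, which is no longer
# shipped with scikit-learn >= 1.2. Only the column names match the question.
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
X_arr, y_arr = make_regression(n_samples=200, n_features=len(feature_names),
                               noise=0.1, random_state=42)
df1 = pd.DataFrame(X_arr, columns=feature_names)
df1['target'] = y_arr

# Fit on the feature columns only, as in the question
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(df1[feature_names], df1['target'])

# New house as a plain dict (values taken from the question)
new_house = {'CRIM': 0.6, 'ZN': 0.0, 'INDUS': 2.5, 'CHAS': 0.0, 'NOX': 0.6,
             'RM': 8.0, 'AGE': 80, 'DIS': 5.0, 'RAD': 2.0, 'TAX': 300.0,
             'PTRATIO': 20.0, 'B': 400.0, 'LSTAT': 10.0}

# Build a one-row DataFrame whose columns follow the training order,
# then predict and store the result in a new 'target' column
df2 = pd.DataFrame([new_house], columns=feature_names)
df2['target'] = model.predict(df2)

# pd.concat takes the place of the removed DataFrame.append
df_all = pd.concat([df1, df2], ignore_index=True)
print(df_all.tail(1))
```

Passing columns=feature_names when building df2 keeps the column order identical to the one used for fitting, which matters because the model is sensitive to feature order. Note that the prediction has to happen before the concat: a row appended without a target would simply hold NaN there.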