How to get the same results across different runs of RandomForest in sklearn
I am using a random forest classifier for classification, and I get different results on every run. My code is below.
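import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

input_file = 'sample.csv'
df1 = pd.read_csv(input_file)
df2 = pd.read_csv(input_file)
X = df1.drop(['lable'], axis=1)  # Features
y = df2['lable']  # Labels
# Note: no random_state is passed to the split here
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = RandomForestClassifier(random_state=42, class_weight="balanced")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))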
Following the suggestions in other answers, I added the parameters n_estimators and random_state. However, it did not work for me.
I have attached the csv file here:
I am happy to provide more details if needed.
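You also need to set the random state for the train-test split. Without it, train_test_split shuffles the rows differently on every run, so the model is trained and evaluated on different data each time even though the classifier itself is seeded.
The following code gives reproducible results. The recommended practice is to keep the random_state values fixed and not to change them as a way of improving performance.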
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

df1 = pd.read_csv('sample.csv')
X = df1.drop(['lable'], axis=1)  # Features
y = df1['lable']  # Labels

# Seed the split so the same rows land in train and test on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)

# Seed the forest so bootstrapping and feature sampling are repeatable
clf = RandomForestClassifier(random_state=42, class_weight="balanced")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
Output:
Accuracy: 0.6777777777777778
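To check the reproducibility claim without the original csv, here is a minimal sketch that repeats the whole pipeline twice and compares the scores. The synthetic data from make_classification and the helper name run_once are assumptions for illustration only; they stand in for sample.csv and are not part of the original question or answer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for sample.csv (assumption: the real data is not available here)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def run_once(split_seed=5, model_seed=42):
    """Hypothetical helper: split, fit, and score with both seeds fixed."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=split_seed
    )
    clf = RandomForestClassifier(random_state=model_seed, class_weight="balanced")
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


first = run_once()
second = run_once()
print("Run 1:", first)
print("Run 2:", second)
print("Identical:", first == second)
```

With both seeds fixed, the two runs should report the same accuracy. Passing split_seed=None would let the split vary between calls, which is the behaviour described in the question.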