Customizing the Random Forest classifier in sklearn
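For personal purposes, I'm trying to modify the Random Forest Classifier class in sklearn to do what I need. Basically, I want the trees of my random forest to use some predefined subsamples of features and cases, so I'm modifying the default class. I'm trying to inherit all the methods and structure of the original sklearn class, so that the fit method of my customized random forest class can take the original sklearn parameters.
For example, I want my customized class to accept the same parameters as the original: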
clf = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=None, max_features=None...)
clf = Customized_RF(n_estimators=10, max_depth=2, random_state=None, max_features=None...)
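But I'm having some difficulty doing this. Specifically, it seems to be related to the super().__init__ call, and I get the following error: TypeError: object.__init__() takes no arguments
I'm following the GitHub repository as a guide.
Am I doing something wrong, or missing some obvious step?
Here is my current approach: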
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class Customized_RF:
    def __init__(self, n_estimators=10, criterion='gini', max_depth=None, random_state=None):
        super().__init__(base_estimator=DecisionTreeClassifier(),
                         n_estimators=n_estimators,
                         estimator_params=("criterion", "max_depth"))  # Here's where the error happens
        self.n_estimators = n_estimators
        if random_state is None:
            self.random_state = np.random.RandomState()
        else:
            self.random_state = np.random.RandomState(random_state)
        self.criterion = criterion
        self.max_depth = max_depth

    def fit(self, X, y, max_features=None, cutoff=None, bootstrap_frac=0.8):
        """
        max_features: number of features that each estimator will use,
                      including the fixed features.
        bootstrap_frac: the size of the bootstrap sample that each estimator will use.
        cutoff: feature index from which the feature subsampling starts.
                Each tree draws a random set of features before and after
                the cutoff. Assumes the feature matrix has not been sorted
                or otherwise altered (e.g. made sparse).
        """
        self.estimators = []
        self.feats_used = []  # feature indexes chosen for each estimator
        self.n_classes = np.unique(y).shape[0]
        if max_features is None:
            max_features = X.shape[1]  # if max_features is None, select all features for every estimator, like the original
        if cutoff is None:
            cutoff = int(X.shape[1] / 2)  # pick the central index number of the x vector
        print('Cutoff x vector: {}'.format(cutoff))
        n_samples = X.shape[0]
        n_bs = int(bootstrap_frac * n_samples)  # fraction of samples to be used for every estimator (DT)
        for i in range(self.n_estimators):
            replace=False)
            feats_left = self.random_state.choice(cutoff + 1, int(max_features / 2), replace=False)  # inclusive cutoff
            feats_right = self.random_state.choice(range(cutoff + 1, X.shape[1]), int(max_features / 2), replace=False)  # exclusive cutoff
            feats = np.concatenate((feats_left, feats_right)).tolist()
            self.feats_used.append(feats)
            print('Chosen feature indexes for estimator number {0}: {1}'.format(i, feats))
            bs_sample = self.random_state.choice(n_samples,
                                                 size=n_bs,
                                                 replace=True)
            dtc = DecisionTreeClassifier(random_state=self.random_state)
            dtc.fit(X[bs_sample][:, feats], y[bs_sample])
            self.estimators.append(dtc)

    def predict_proba(self, X):
        out = np.zeros((X.shape[0], self.n_classes))
        for i in range(self.n_estimators):
            out += self.estimators[i].predict_proba(X[:, self.feats_used[i]])
        return out / self.n_estimators

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)

    def score(self, X, y):
        return (self.predict(X) == y).mean()
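If you want to derive your own class from another class, the class definition needs to reference the base class, e.g. class MyClass(BaseClass). super() then refers to that base class.
In your case the base class is missing, so Python falls back to the generic class object, whose __init__ accepts no extra arguments.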
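As a minimal sketch of that point, the snippet below compares two classes that make the same super().__init__ call, one declared without a base class and one deriving from a base. The names Base, WithoutBase, WithBase and try_construct are hypothetical and exist only for this illustration. The exact wording of the TypeError message depends on the Python version, so the sketch only checks the exception type.

```python
class Base:
    """Stand-in for a library class whose __init__ takes keyword arguments."""
    def __init__(self, n_estimators=10):
        self.n_estimators = n_estimators


class WithoutBase:
    # No base class listed, so super() resolves to the built-in `object`.
    def __init__(self, n_estimators=10):
        super().__init__(n_estimators=n_estimators)  # object.__init__ rejects this


class WithBase(Base):
    # Base class listed, so super() resolves to Base.
    def __init__(self, n_estimators=10):
        super().__init__(n_estimators=n_estimators)


def try_construct(cls):
    """Return "ok" if the class can be instantiated, else the error text."""
    try:
        cls(n_estimators=5)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


results = {cls.__name__: try_construct(cls) for cls in (WithoutBase, WithBase)}
print(results)
```

Only the class that names a base in its definition gets past the super().__init__ call.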
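It isn't clear from your question whether the base class you want is DecisionTreeClassifier or RandomForestClassifier. In either case, you'll need to change the arguments passed in __init__ so they match what that base class accepts.
Minor: check the replace=False) line; it is invalid syntax.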
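One way this could look, assuming RandomForestClassifier is the intended base class, is sketched below. It is an illustration rather than a complete solution: it forwards only a handful of constructor arguments, drops base_estimator and estimator_params (the public RandomForestClassifier constructor does not take them), and keeps sklearn's own fit, so the cutoff-based feature subsampling from the question would still have to be added by overriding fit. The toy data is made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class Customized_RF(RandomForestClassifier):
    """Sketch: a subclass that forwards its arguments to the real base class."""

    def __init__(self, n_estimators=10, criterion="gini", max_depth=None,
                 random_state=None, max_features=None):
        # super() now refers to RandomForestClassifier, which accepts these.
        super().__init__(
            n_estimators=n_estimators,
            criterion=criterion,
            max_depth=max_depth,
            random_state=random_state,
            max_features=max_features,
        )


# Toy data (hypothetical), just to show the constructor call from the question.
rng = np.random.RandomState(0)
X = rng.rand(40, 6)
y = (X[:, 0] > 0.5).astype(int)

clf = Customized_RF(n_estimators=10, max_depth=2, random_state=0, max_features=None)
clf.fit(X, y)
print(len(clf.estimators_), clf.predict(X[:3]).shape)
```

Because every __init__ argument is stored by the parent under the same name, get_params and clone should keep working for the subclass.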