Python Random Forest Regressor erroring on NaN values, despite removal
I have a clean dataset with zero NaN values, but I keep hitting the same error from the regressor. My DataFrame is called new_player_data.
I tried
list(new_player_data.where(new_player_data.isna()).count() > 0)
which returns
[False,
False,
False,
False,
False,
False]
about two hundred times over. I thought some of the floats might be too large, so I tried this:
for i in new_player_data.columns[:]:
    if new_player_data[i].dtype == float:
        new_player_data[i] = round(new_player_data[i], 2)
No matter what I do, I still get:
regressor.fit(X_train, y_train)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-327-3a664017ddaa> in <module>
----> 1 regressor.fit(X_train, y_train)
/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
248
249 # Validate or convert input data
--> 250 X = check_array(X, accept_sparse="csc", dtype=DTYPE)
251 y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
252 if sample_weight is not None:
/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
571 if force_all_finite:
572 _assert_all_finite(array,
--> 573 allow_nan=force_all_finite == 'allow-nan')
574
575 shape_repr = _shape_repr(array.shape)
/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
54 not allow_nan and not np.isfinite(X).all()):
55 type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 56 raise ValueError(msg_err.format(type_err, X.dtype))
57
58
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Any ideas on what else I can check here? I'm at a loss.
Credit to @gmds for leading me to the answer: it turned out to be inf values, which I found with
infs = np.where(np.isinf(new_player_data))
infs
out: (array([ 261, 1162, 1190, 1339, 1365, 1451, 1656, 1736, 1878, 1954, 2189,
2299, 2741, 3137, 3162, 3799, 3821, 3881, 4305]),
array([ 3, 43, 43, 3, 43, 43, 43, 43, 43, 43, 23, 43, 3, 43, 43, 43, 3,
23, 43]))
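The row and column positions above are easier to act on once they are mapped back to column names. Here is a minimal sketch of that, assuming a small stand-in frame (`df`, with made-up column names and values), since the real new_player_data isn't shown:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_player_data; columns and values are made up.
df = pd.DataFrame({
    "points": [10.0, np.inf, 3.5, 7.0],
    "minutes": [30.0, 12.0, 0.0, 25.0],
    "ratio": [0.33, -np.inf, np.inf, 0.28],
    "team": ["A", "B", "A", "C"],
})

# np.isinf only works on numeric data, so restrict to numeric columns first.
numeric = df.select_dtypes(include=[np.number])

# Boolean frame marking +inf / -inf cells, with the original labels attached.
inf_mask = pd.DataFrame(
    np.isinf(numeric.to_numpy()),
    index=numeric.index,
    columns=numeric.columns,
)

# Number of infinite values per column, keeping only the affected columns.
inf_counts = inf_mask.sum()
inf_counts = inf_counts[inf_counts > 0]
print(inf_counts)

# Rows that contain at least one infinite value.
bad_rows = df[inf_mask.any(axis=1)]
print(bad_rows)
```

Seeing which columns are affected (here only three distinct column positions show up) can hint at where the infs come from, for example a ratio column with a zero denominator.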
I then replaced them like this:
pd.options.mode.use_inf_as_na = True
infs = np.where(np.isinf(new_player_data))
infs
out: (array([], dtype=int64), array([], dtype=int64))
Thanks to gmds for pointing me in the right direction!
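For anyone who would rather not rely on a global pandas option (I believe `mode.use_inf_as_na` is deprecated in recent pandas releases), here is a sketch of an explicit alternative. It assumes a made-up stand-in frame `df`, and dropping the affected rows is just one possible choice; imputing would work too. Since the traceback shows the forest validating X as float32, the sketch also masks values that are finite in float64 but too large for float32:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_player_data; values are made up.
df = pd.DataFrame({
    "points": [10.0, np.inf, 3.5, 7.0, 2.0],
    "ratio": [0.33, 0.5, -np.inf, 0.28, 0.1],
    "huge": [1.0, 2.0, 3.0, 1e300, 5.0],  # 1e300 is finite in float64 only
})

# Replace +inf / -inf with NaN explicitly.
cleaned = df.replace([np.inf, -np.inf], np.nan)

# Values beyond the float32 range would turn into inf after a float32 cast,
# so mask those as NaN as well.
f32_max = np.finfo(np.float32).max
cleaned = cleaned.mask(cleaned.abs() > f32_max)

# Drop the rows that now contain NaN (imputing is the other option).
cleaned = cleaned.dropna()

# Same kind of finiteness check the error message refers to, on a float32 copy.
as_f32 = cleaned.to_numpy().astype(np.float32)
assert np.isfinite(as_f32).all()
print(cleaned)
```

If the rows need to stay aligned with y_train, apply the same row filter to the target before fitting.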