Trying to set a subset of a vector to equal another vector, but everything gets set to 0

Question

I'm trying to get into Python for statistics, coming from an R background. I've set up a cross-validation script for a dataset I've been working with:

cvIndex = np.remainder(np.arange(dat.shape[0]), 10)
pred = np.arange(dat.shape[0])

for i in range(10):
    #get training and test set
    trFeatures = dat[cvIndex != i, :]
    teFeatures = dat[cvIndex == i, :]
    trY = y[cvIndex != i]

    #fit random forest
    rf = RandomForestClassifier(n_estimators = 500, random_state = 42)
    rf.fit(trFeatures, trY);

    #make and store prediction
    tePred = rf.predict_proba(teFeatures)[:, 1]
    pred[cvIndex == i] = tePred.copy()

print(pred)

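This returns a vector of all zeros. As far as I can tell, this is the right way to set a subset of a vector equal to another vector (in fact, I've tried it with some dummy vectors and it worked). The other obvious potential problem is that tePred might be all zeros, but pulling out any particular case (for example, i=9) gives: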

i = 9
#get training and test set
trFeatures = dat[cvIndex != i, :]
teFeatures = dat[cvIndex == i, :]
trY = y[cvIndex != i]

#fit random forest
rf = RandomForestClassifier(n_estimators = 500, random_state = 42)
rf.fit(trFeatures, trY);

#make and store prediction
tePred = rf.predict_proba(teFeatures)[:, 1]

print(tePred[1:50])

[ 0.264  0.034  0.02   0.002  0.     0.014  0.     0.     0.     0.102
  0.14   0.     0.024  0.002  0.     0.002  0.004  0.     0.044  0.     0.382
  0.042  0.     0.004  0.     0.112  0.002  0.074  0.     0.016  0.012
  0.004  0.     0.     0.006  0.002  0.01   0.     0.     0.     0.     0.004
  0.002  0.002  0.044  0.004  0.     0.     0.004]

Any help would be much appreciated.

Answer 1

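This looks like integer coercion to me. np.arange returns an integer array, which you then update in place. An in-place assignment cannot change the array's dtype, so the right-hand side is cast to int. Since you are assigning probabilities, every value below 1 is truncated to 0, which is why the result is all zeros.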
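To see the coercion in isolation, here is a minimal sketch using made-up values (the `probs` array and the mask are illustrative only, not taken from the question's data):

```python
import numpy as np

# Integer array, like the one np.arange produces in the question.
pred = np.arange(5)

# Made-up probabilities standing in for predict_proba output.
probs = np.array([0.264, 0.034, 0.02, 0.9, 0.5])

# Masked assignment writes into the existing integer buffer,
# so each float is truncated toward zero before being stored.
mask = np.ones(5, dtype=bool)
pred[mask] = probs

print(pred.dtype.kind)  # 'i' means a signed integer dtype
print(pred)
```

The array keeps its integer dtype after the assignment, and no warning is raised, which is what makes this easy to miss.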

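Since you end up overwriting every element of pred, you don't need to initialize it to anything in particular. Using np.empty(dat.shape[0]), which defaults to a float dtype, instead of np.arange should fix your code.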
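As a rough sketch of that fix, the loop below fills a float array fold by fold. To keep it self-contained, the random forest is replaced by a hypothetical `fake_probs` array of random numbers, and `n` is an arbitrary sample count; neither comes from the question.

```python
import numpy as np

n = 25  # arbitrary number of samples, for illustration only
cv_index = np.remainder(np.arange(n), 10)

# Hypothetical stand-in for the per-sample output of predict_proba.
rng = np.random.default_rng(42)
fake_probs = rng.random(n)

# np.empty allocates an uninitialized float64 array by default,
# so the fractional values survive the assignment.
pred = np.empty(n)

for i in range(10):
    mask = cv_index == i
    pred[mask] = fake_probs[mask]  # every element is written exactly once

print(pred.dtype)
```

Because np.empty leaves the memory uninitialized, this is only safe when the folds cover every index, as they do here. If some entries might be skipped, np.zeros(n) or np.full(n, np.nan) would be a more defensive choice.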

Two unrelated side notes:

There's no need to copy tePred in the last line of the loop.
Python, like C, uses zero-based indexing, so tePred[1:50] skips the first element.

