Random sample of two 100x100 multidimensional arrays with the same row numbers, in Python

Question

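I have two multidimensional arrays (matrices) in numpy: one is the training set (100 x 100) and the other holds the class labels (100 x 1). I want to use random sampling, but I don't know how to pick the same row numbers from both matrices.

For example,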

k=np.random.choice(10,replace=False)
temp_data=data.ix[k]
temp_datat=datat.ix[k]

Would this draw the same 10 random rows from both of my arrays, data and datat?

Answer 1

You can generate one random selection of row indices and use it to pick the same rows from both arrays.

Something like,

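# replace=False so that no row index is drawn twice
k = np.random.choice(100, 10, replace=False)
row1 = arr1[k]
row2 = arr2[k]

This picks 10 rows. The first argument, 100, means the indices are drawn from 0 to 99 (both inclusive); the second argument is the number of indices to draw. With replace=False the 10 indices are all distinct.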
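A self-contained sketch of the same idea, using NumPy's newer `Generator` API. The array names, the toy data, and the seed are assumptions made for illustration; they are not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility

# Toy stand-ins for the question's arrays (assumed shapes: 100 x 100 and 100 x 1).
data = rng.random((100, 100))
labels = rng.integers(0, 2, size=(100, 1))

# Draw 10 distinct row indices once, then reuse them for both arrays.
k = rng.choice(data.shape[0], size=10, replace=False)
data_sample = data[k]
label_sample = labels[k]

print(data_sample.shape, label_sample.shape)  # (10, 100) (10, 1)
```

Because both arrays are indexed with the same `k`, row `i` of `data_sample` still corresponds to row `i` of `label_sample`.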

Answer 2

A different approach from the one @Umang Gupta suggested, which may help if you also want to keep track of the rows that were not selected:

# Suppose X_train is your 100 x 100 dataset
# and y_train is your array of labels
idx = np.arange(len(X_train))
np.random.shuffle(idx)  # shuffles idx in place

NUM_SAMPLES = 50
sampled_idxs = idx[:NUM_SAMPLES]
rest_idxs = idx[NUM_SAMPLES:]

X_samples = X_train[sampled_idxs]
X_rest = X_train[rest_idxs]
y_samples = y_train[sampled_idxs]
y_rest = y_train[rest_idxs]

If you have Scikit-Learn installed, you can use train_test_split instead:

from sklearn.model_selection import train_test_split
X_samples, X_rest, y_samples, y_rest = train_test_split(X_train, y_train,
                                                        train_size=NUM_SAMPLES,
                                                        random_state=123)
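The shuffle-based split from earlier in this answer can also be wrapped in a small function. This is only a sketch: `split_rows` is a hypothetical helper name, and the toy arrays are built so that the alignment between rows and labels is easy to check.

```python
import numpy as np

def split_rows(X, y, num_samples, seed=None):
    """Hypothetical helper: split X and y into sampled and remaining rows."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))  # shuffled copy of 0 .. len(X)-1
    sampled, rest = idx[:num_samples], idx[num_samples:]
    return X[sampled], X[rest], y[sampled], y[rest]

# Toy data: row i starts with the value 100*i and carries the label i.
X_train = np.arange(100 * 100).reshape(100, 100)
y_train = np.arange(100).reshape(100, 1)

X_s, X_r, y_s, y_r = split_rows(X_train, y_train, num_samples=50, seed=123)
print(X_s.shape, X_r.shape, y_s.shape, y_r.shape)  # (50, 100) (50, 100) (50, 1) (50, 1)
```

Using `permutation` rather than an in-place shuffle leaves the caller's arrays untouched, and passing a seed makes the split repeatable.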

Answer 3

You can use numpy.random.randint() to do this. A complete working example follows:

# toy data
In [29]: train_set = np.random.random_sample((100, 100))

# of course, class labels have to be discrete :)
In [30]: class_label = np.random.random_sample((100, 1))

# number of samples that need to be picked (a.k.a batch_size)
In [31]: num_samples = 5

# generate sample indices in the range of 100
In [32]: sample_idxs = np.unique(np.random.randint(train_set.shape[0], size=num_samples))

In [33]: sample_idxs
Out[33]: array([24, 30, 37, 73, 74])

# index into the array to get the actual entries
In [34]: (train_set[sample_idxs]).shape
Out[34]: (5, 100)

# index into the class_label array to get the corresponding label entries    
In [35]: (class_label[sample_idxs]).shape
Out[35]: (5, 1)

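There is one caveat, though: even after many runs you may not have sampled the entire dataset, and the same example may be used in more than one training run.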
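The caveat above can be made concrete, along with a related one: `np.unique` drops repeated indices, so a randint-based draw can return fewer rows than requested. A minimal sketch, assuming you want fixed-size batches that cover the whole dataset once per pass (the function name `iter_batches`, the batch size, and the toy data are hypothetical):

```python
import numpy as np

def iter_batches(X, y, batch_size, rng):
    """Yield aligned (X, y) batches that visit every row exactly once."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
train_set = rng.random((100, 100))
class_label = rng.integers(0, 3, size=(100, 1))

# Indices drawn with replacement may repeat, so np.unique can shrink the batch.
drawn = np.unique(rng.integers(train_set.shape[0], size=50))
print(len(drawn) <= 50)  # True

batches = list(iter_batches(train_set, class_label, batch_size=20, rng=rng))
print(len(batches), batches[0][0].shape, batches[0][1].shape)  # 5 (20, 100) (20, 1)
```

Calling `iter_batches` again gives a fresh permutation, so each pass sees every row once but in a different order.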
