Clarification between Epoch and iteration
This answer points out the difference between an epoch and an iteration when training a neural network. However, when I look at the source code for the solver API in the Stanford CS231n course (and I assume most libraries do the same), each iteration selects batch_size examples at random with replacement. So there is no guarantee that every example is seen in each epoch, is there?
Does an epoch then only mean that every example is seen in expectation? Or am I misunderstanding this?
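To make my concern concrete: one epoch here is num_train / batch_size iterations, i.e. num_train draws with replacement in total, so a given example is missed with probability about (1 - 1/num_train)^num_train ≈ 1/e ≈ 37%. A quick simulation of this (my own sketch with made-up sizes, not part of the course code):
import numpy as np

# Estimate how much of the training set is never sampled during one "epoch"
# when minibatches are drawn with replacement, as in the solver code below.
num_train = 50000      # hypothetical training-set size
batch_size = 100       # hypothetical batch size
iterations_per_epoch = num_train // batch_size

rng = np.random.default_rng(0)
seen = np.zeros(num_train, dtype=bool)
for _ in range(iterations_per_epoch):
    batch_mask = rng.choice(num_train, batch_size)  # with replacement (the default)
    seen[batch_mask] = True

print('fraction never seen in this epoch: %.3f' % (1.0 - seen.mean()))
# Prints roughly 0.37, close to (1 - 1/num_train) ** num_train ~= 1/e.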
Relevant source code:
def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]
    batch_mask = np.random.choice(num_train, self.batch_size)
    X_batch = self.X_train[batch_mask]
    y_batch = self.y_train[batch_mask]

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.iteritems():
        dw = grads[p]
        config = self.optim_configs[p]
        next_w, next_config = self.update_rule(w, dw, config)
        self.model.params[p] = next_w
        self.optim_configs[p] = next_config
def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]
    iterations_per_epoch = max(num_train / self.batch_size, 1)
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in xrange(num_iterations):
        self._step()

        # Maybe print training loss
        if self.verbose and t % self.print_every == 0:
            print '(Iteration %d / %d) loss: %f' % (
                t + 1, num_iterations, self.loss_history[-1])

        # At the end of every epoch, increment the epoch counter and decay the
        # learning rate.
        epoch_end = (t + 1) % iterations_per_epoch == 0
        if epoch_end:
            self.epoch += 1
            for k in self.optim_configs:
                self.optim_configs[k]['learning_rate'] *= self.lr_decay

        # Check train and val accuracy on the first iteration, the last
        # iteration, and at the end of each epoch.
        first_it = (t == 0)
        last_it = (t == num_iterations - 1)
        if first_it or last_it or epoch_end:
            train_acc = self.check_accuracy(self.X_train, self.y_train,
                                            num_samples=1000)
            val_acc = self.check_accuracy(self.X_val, self.y_val)
            self.train_acc_history.append(train_acc)
            self.val_acc_history.append(val_acc)

            if self.verbose:
                print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                    self.epoch, self.num_epochs, train_acc, val_acc)

            # Keep track of the best model
            if val_acc > self.best_val_acc:
                self.best_val_acc = val_acc
                self.best_params = {}
                for k, v in self.model.params.iteritems():
                    self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params
Thanks.
I believe that, as you say, the Stanford course is effectively using "epoch" with the looser meaning of "expected number of times each example is seen during training". In my experience, however, most implementations treat running an epoch as seeing each example in the training set exactly once, and I would say sampling with replacement was chosen here only for simplicity. If you have a good amount of data you will most likely not notice the difference, but it is still more correct to sample without replacement until no examples are left.
You can check, for example, how Keras does the training in its source code; it is a bit involved, but the important point is that make_batches is called to split the (possibly shuffled) examples into batches, which matches your original idea of an "epoch".
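For contrast, here is a minimal sketch of that shuffle-then-split scheme (my own illustration of the idea, not Keras's actual make_batches code): every index is used exactly once per epoch, so an epoch really is one full pass over the data.
import numpy as np

def epoch_batches(num_train, batch_size, rng):
    # Yield index arrays that together cover every training example exactly once
    # (sampling without replacement): shuffle all indices, then slice them.
    order = rng.permutation(num_train)
    for start in range(0, num_train, batch_size):
        yield order[start:start + batch_size]

# Hypothetical usage with a tiny dataset:
rng = np.random.default_rng(0)
for batch_mask in epoch_batches(num_train=5, batch_size=2, rng=rng):
    print(batch_mask)   # each example appears exactly once across the batches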