Clarification: epoch vs. iteration

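This answer points out the difference between an epoch and an iteration when training a neural network. However, when I look at the source code of the solver API in the Stanford CS231n course (and I assume the same holds for most libraries), in each iteration batch_size examples are chosen at random, with replacement. Doesn't that mean there is no guarantee that every example is seen in each epoch?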

So does an epoch only mean that all examples are seen in expectation? Or am I misunderstanding something?
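The "seen in expectation" reading can be made precise for this kind of sampling. A minimal derivation, assuming every draw is uniform and independent, and writing N for num_train, B for batch_size and c_i for the number of times example i is drawn in one epoch (these symbols are illustrative, not from the course), with B dividing N so that iterations_per_epoch = N/B:

```latex
\text{draws per epoch} = \frac{N}{B}\cdot B = N,
\qquad c_i \sim \operatorname{Binomial}\!\left(N, \tfrac{1}{N}\right)

\mathbb{E}[c_i] = N \cdot \frac{1}{N} = 1,
\qquad \Pr[c_i = 0] = \left(1 - \frac{1}{N}\right)^{N} \xrightarrow{\,N\to\infty\,} e^{-1} \approx 0.368
```

So each example is drawn once on average, but in a given epoch roughly 37% of the examples are not drawn at all, and only about 63% are seen at least once.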

Relevant source code:

  def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]
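    # np.random.choice samples WITH replacement by default (replace=True), so
    # the same example can appear several times in a batch or in an epoch.
    batch_mask = np.random.choice(num_train, self.batch_size)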
    X_batch = self.X_train[batch_mask]
    y_batch = self.y_train[batch_mask]

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.iteritems():
      dw = grads[p]
      config = self.optim_configs[p]
      next_w, next_config = self.update_rule(w, dw, config)
      self.model.params[p] = next_w
      self.optim_configs[p] = next_config

  def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]
    iterations_per_epoch = max(num_train / self.batch_size, 1)
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in xrange(num_iterations):
      self._step()

      # Maybe print training loss
      if self.verbose and t % self.print_every == 0:
        print '(Iteration %d / %d) loss: %f' % (
               t + 1, num_iterations, self.loss_history[-1])

      # At the end of every epoch, increment the epoch counter and decay the
      # learning rate.
      epoch_end = (t + 1) % iterations_per_epoch == 0
      if epoch_end:
        self.epoch += 1
        for k in self.optim_configs:
          self.optim_configs[k]['learning_rate'] *= self.lr_decay

      # Check train and val accuracy on the first iteration, the last
      # iteration, and at the end of each epoch.
      first_it = (t == 0)
      last_it = (t == num_iterations - 1)  # t ranges over 0 .. num_iterations - 1
      if first_it or last_it or epoch_end:
        train_acc = self.check_accuracy(self.X_train, self.y_train,
                                        num_samples=1000)
        val_acc = self.check_accuracy(self.X_val, self.y_val)
        self.train_acc_history.append(train_acc)
        self.val_acc_history.append(val_acc)

        if self.verbose:
          print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                 self.epoch, self.num_epochs, train_acc, val_acc)

        # Keep track of the best model
        if val_acc > self.best_val_acc:
          self.best_val_acc = val_acc
          self.best_params = {}
          for k, v in self.model.params.iteritems():
            self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params
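To check the "seen in expectation" reading empirically, the sketch below replays one epoch of the sampling done in _step() and counts how often each example is drawn. It is a standalone illustration, not part of the solver: simulate_one_epoch and the dataset sizes are hypothetical, and it uses NumPy's Generator API (rng.choice), which, like np.random.choice, samples with replacement by default.

```python
import numpy as np

def simulate_one_epoch(num_train, batch_size, seed=0):
    """Hypothetical helper: replay one epoch of the solver's minibatch sampling
    and return how many times each training example was drawn."""
    rng = np.random.default_rng(seed)
    iterations_per_epoch = max(num_train // batch_size, 1)
    counts = np.zeros(num_train, dtype=int)
    for _ in range(iterations_per_epoch):
        batch_mask = rng.choice(num_train, batch_size)  # with replacement
        np.add.at(counts, batch_mask, 1)  # add.at also counts repeated indices
    return counts

counts = simulate_one_epoch(num_train=50000, batch_size=100)
print("mean draws per example:", counts.mean())
print("fraction never drawn:", (counts == 0).mean())
```

The mean is exactly 1 because the epoch makes num_train draws in total, while the fraction never drawn lands close to e^-1, i.e. roughly a third of the training set is skipped in any single epoch.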

Thanks.

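I believe that, as you say, the Stanford course effectively uses "epoch" in the looser sense of "expected number of times each example is seen during training". In my experience, however, most implementations treat one epoch as a single pass over every example in the training set, and I'd say the course simply chose sampling with replacement for simplicity. With a large dataset you will most likely not notice the difference, but the more correct approach is to sample without replacement until no examples are left.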
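A minimal sketch of sampling without replacement that could stand in for the np.random.choice line in the solver, assuming NumPy. EpochSampler is a hypothetical helper, not part of CS231n. Note that passing replace=False to np.random.choice would only prevent duplicates inside a single batch, not across the batches of an epoch; a per-epoch permutation handles both.

```python
import numpy as np

class EpochSampler:
    """Hypothetical helper: hands out minibatch indices WITHOUT replacement,
    reshuffling only after the current permutation has been used up."""

    def __init__(self, num_train, batch_size, seed=0):
        self.num_train = num_train
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)
        self._order = self.rng.permutation(num_train)  # shuffled 0 .. num_train-1
        self._pos = 0

    def next_batch(self):
        # Not enough unused indices left for a full batch: start a new epoch.
        if self._pos + self.batch_size > self.num_train:
            self._order = self.rng.permutation(self.num_train)
            self._pos = 0
        batch = self._order[self._pos:self._pos + self.batch_size]
        self._pos += self.batch_size
        return batch
```

In a solver, the sampler would be created once (e.g. a hypothetical self.sampler set in the constructor) and _step() would use batch_mask = self.sampler.next_batch(). When num_train is a multiple of batch_size, every example is used exactly once per epoch; otherwise the leftover partial batch is dropped, which matches the solver's integer iterations_per_epoch.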

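For example, you can look at how Keras does the training in its source code; it is a bit involved, but the important point is that make_batches is called to split the (possibly shuffled) examples into batches, which matches your original idea of an "epoch".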
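The make_batches idea can be sketched as follows: shuffle an index array once per epoch, then cut it into consecutive (start, end) slices. make_batch_ranges and epoch_batches are hypothetical names written for this illustration; they follow the approach described above but are not Keras's actual code.

```python
import numpy as np

def make_batch_ranges(size, batch_size):
    """Split range(size) into (start, end) slices; the last slice may be shorter."""
    num_batches = (size + batch_size - 1) // batch_size  # ceiling division
    return [(i * batch_size, min(size, (i + 1) * batch_size))
            for i in range(num_batches)]

def epoch_batches(num_train, batch_size, rng, shuffle=True):
    """Yield index arrays for one epoch: shuffle once, then slice without replacement."""
    index_array = np.arange(num_train)
    if shuffle:
        rng.shuffle(index_array)  # in-place shuffle
    for start, end in make_batch_ranges(num_train, batch_size):
        yield index_array[start:end]

print(make_batch_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Unlike a sampler that drops the remainder, this version keeps the final, smaller batch, so every example is visited exactly once per epoch regardless of whether batch_size divides the dataset size.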