Unable to train on TPU in Google Colab: InternalError: 9 root error(s) found

Question

BATCH_SIZE = 64
HEIGHT, WIDTH = 124, 124

Training set:   14906 images, 6 classes.
Validation set:  3726 images, 6 classes.

with strategy.scope():
  model = create_model()
  model = complile_model(model,lr=0.0001)
  callbacks = create_callbacks()
epochs = 5
steps_per_epoch  = 14906//BATCH_SIZE
validation_steps = 3726//BATCH_SIZE

history = model.fit(train_dataset,
                    epochs=epochs,
                    steps_per_epoch=steps_per_epoch,
                    validation_data=validation_dataset, 
                    validation_steps=validation_steps)

I am trying to train this model on the TPU provided by Google Colab, but I am unable to do so. Please help me with this. A screenshot is attached.

Answer 1

ImageDataGenerator also uses PyFunction under the hood, so it is not compatible with TPUs. Instead, you have to load the images with the tf.data API. This tutorial explains how to do that.
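To make the suggestion above more concrete, here is a minimal sketch of an image pipeline built only from tf.data ops. It assumes the images are JPEG files and that you already have a list of file paths and integer labels; `load_image` and `make_dataset` are hypothetical helper names, not something from the question. BATCH_SIZE, HEIGHT and WIDTH mirror the values in the question.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 64
HEIGHT, WIDTH = 124, 124


def load_image(path, label):
    """Read one image file, decode it, resize it, and scale it to [0, 1]."""
    image = tf.io.read_file(path)
    # Assumption: the files are JPEGs with 3 color channels.
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [HEIGHT, WIDTH]) / 255.0
    return image, label


def make_dataset(paths, labels, batch_size=BATCH_SIZE, training=True):
    """Build a batched dataset from file paths and labels (hypothetical helper)."""
    dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
    if training:
        # Shuffle the (cheap) path strings, then repeat indefinitely.
        dataset = dataset.shuffle(len(paths)).repeat()
    dataset = dataset.map(load_image, num_parallel_calls=AUTOTUNE)
    # drop_remainder=True keeps every batch the same static shape.
    dataset = dataset.batch(batch_size, drop_remainder=True)
    return dataset.prefetch(AUTOTUNE)
```

Something like `train_dataset = make_dataset(train_paths, train_labels)` could then be passed to model.fit in place of the generator, where `train_paths` and `train_labels` are whatever lists you build from your own directory layout.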

Answer 2

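The dataset must call repeat():

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# parse_tfrecord_fn and prepare_sample are your own parsing/preprocessing functions.
def get_dataset(filenames, batch_size):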
    dataset = (
        tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTOTUNE)
        .map(parse_tfrecord_fn, num_parallel_calls=AUTOTUNE)
        .map(prepare_sample, num_parallel_calls=AUTOTUNE)
        .repeat()
        .shuffle(batch_size * 10)
        .batch(batch_size)
        .prefetch(AUTOTUNE)
    )
    return dataset
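A quick way to see why repeat() matters here: as far as I understand Keras, when steps_per_epoch is set, fit() keeps pulling batches from the same iterator across epochs, so the training dataset has to supply steps_per_epoch * epochs batches in total. The sketch below only does that arithmetic with the numbers from the question; `batches_needed` and `batches_available` are made-up helper names for illustration.

```python
import math


def batches_needed(num_samples, batch_size, epochs):
    """Batches fit() requests when steps_per_epoch = num_samples // batch_size."""
    steps_per_epoch = num_samples // batch_size
    return steps_per_epoch * epochs


def batches_available(num_samples, batch_size, drop_remainder=False):
    """Batches one non-repeated pass over the data can yield."""
    if drop_remainder:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


needed = batches_needed(14906, 64, epochs=5)
available = batches_available(14906, 64)
print(needed, available)  # 1160 233
```

With 14906 samples and a batch size of 64, five epochs ask for 1160 batches while a single pass yields only 233, so a dataset without repeat() would run dry during the second epoch.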

data-science

google-colaboratory