How does tf.keras.Model distinguish between features and label(s) in tf.data.Dataset and in TFRecords?

Question

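I am trying to create tfrecords files from CSV data, then use tf.data.TFRecordDataset() to create a Dataset from them, and then feed that Dataset to a tf.keras.Model. (In practice, I am using spark-tensorflow-connector to create the tfrecords files directly from Spark DataFrames.)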

In the fit() method of tf.keras.Model, the argument x is the input data. It can be:

A tf.data dataset. Should return a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
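A minimal sketch of the two tuple layouts the documentation describes, assuming TensorFlow 2.x. The random arrays, shapes, and the one-layer model are purely illustrative; the point is only that the position inside each dataset element decides what is input, target, and sample weight.

```python
import numpy as np
import tensorflow as tf

# Illustrative data: 32 samples with 4 features each, binary labels, uniform weights.
features = np.random.rand(32, 4).astype("float32")
labels = np.random.randint(0, 2, size=(32, 1)).astype("float32")
weights = np.ones((32,), dtype="float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# (inputs, targets): element 0 is fed to the model, element 1 to the loss.
ds_xy = tf.data.Dataset.from_tensor_slices((features, labels)).batch(8)
history_xy = model.fit(ds_xy, epochs=1, verbose=0)

# (inputs, targets, sample_weights): element 2 weights each sample's loss term.
ds_xyw = tf.data.Dataset.from_tensor_slices((features, labels, weights)).batch(8)
history_xyw = model.fit(ds_xyw, epochs=1, verbose=0)
```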

Q1: Does tf.keras.Model know where to split the features from the labels here? That is, the features are the inputs and the labels are the targets.

But in some examples, I don't see any "tuple" in the construction of the tfrecords file or of the tf.data.Dataset. For example, in the following example:

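import pickle
import tensorflow as tf

# Hypothetical definitions of the helpers the original example relies on, sketched so the snippet runs.
def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def read_pickle_from_file(filename):
  with tf.io.gfile.GFile(filename, 'rb') as f:
    return pickle.load(f, encoding='bytes')

def convert_to_tfrecord(input_files, output_file):
  """Converts a file to TFRecords."""
  print('Generating %s' % output_file)
  with tf.io.TFRecordWriter(output_file) as record_writer:
    for input_file in input_files:
      data_dict = read_pickle_from_file(input_file)
      data = data_dict[b'data']
      labels = data_dict[b'labels']
      num_entries_in_batch = len(labels)
      for i in range(num_entries_in_batch):
        # 'label' is stored as just another named feature; nothing here marks it as a target.
        example = tf.train.Example(features=tf.train.Features(
            feature={
                'image': _bytes_feature(data[i].tobytes()),
                'label': _int64_feature(labels[i])
            }))
        record_writer.write(example.SerializeToString())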

...

# Read dataset from tfrecords
dataset = tf.data.TFRecordDataset(tfrecords_files)
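To see why no tuple appears at this point, here is a minimal sketch (assuming TensorFlow 2.x; the temporary path and the single record are made up for illustration) that writes one record and inspects what a bare TFRecordDataset yields: each element is one serialized tf.train.Example as a scalar string, with no split into features and label yet.

```python
import os
import tempfile
import tensorflow as tf

# Write a single illustrative record to a temporary file (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"\x00\x01\x02"])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
    }))
    writer.write(example.SerializeToString())

dataset = tf.data.TFRecordDataset([path])
# Each element is an opaque serialized proto: a scalar tf.string tensor,
# not an (inputs, targets) tuple. Keras cannot train on this directly.
raw = next(iter(dataset))
```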

Q2: So how does this tf.keras.models.Sequential() model know where to find the features and where to find the labels? Why doesn't the model treat 'label' as one of the data features?

Answer 1

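You need to look at the complete code example, i.e., the other files that actually run the training. The key piece is the parse_and_decode function in this file: it parses the TFRecords file (without such a parsing function the data cannot be interpreted) and returns an (image, label) tuple for each example. That function is then mapped over the dataset in the create_datasets function.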
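The parse-then-map step described above can be sketched as follows. This is a hypothetical reimplementation, not the code from the linked file: it assumes TensorFlow 2.x and that images were stored as raw uint8 bytes of shape IMAGE_SHAPE (an assumption made for this sketch). The tuple returned by the parser is what later tells Keras which part is input and which is target.

```python
import os
import tempfile
import numpy as np
import tensorflow as tf

IMAGE_SHAPE = (32, 32, 3)  # assumption: images were written as raw uint8 bytes of this shape

def parse_and_decode(serialized):
    # Hypothetical parser matching the 'image'/'label' features written by convert_to_tfrecord.
    feature_description = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, feature_description)
    image = tf.io.decode_raw(parsed["image"], tf.uint8)
    image = tf.cast(tf.reshape(image, IMAGE_SHAPE), tf.float32) / 255.0
    label = tf.cast(parsed["label"], tf.int32)
    # Returning (inputs, target) is what defines the feature/label split for Keras.
    return image, label

# Write two illustrative records, then map the parser over the raw dataset.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for lbl in (3, 5):
        pixels = np.zeros(IMAGE_SHAPE, dtype=np.uint8)
        example = tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[pixels.tobytes()])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[lbl])),
        }))
        writer.write(example.SerializeToString())

dataset = tf.data.TFRecordDataset([path]).map(parse_and_decode)
first_image, first_label = next(iter(dataset))
```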

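So the dataset passed to model.fit is actually a dataset of tuples. As far as I know, if you pass a tf.data.Dataset as the input to fit, it must be a dataset of (inputs, labels) tuples: the first element is used as the model's input and the second as the target for the loss function.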
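The "first element is input" rule still holds when the input side is a dictionary of named features. A minimal sketch, assuming TensorFlow 2.x; the feature name "image", the shapes, and the three-class model are illustrative. The model's inputs are declared as a dict so Keras can match dataset keys to Input layers by name, while the second tuple element goes to the loss.

```python
import numpy as np
import tensorflow as tf

# Illustrative tensors; the key "image" mirrors the TFRecord feature name.
images = np.random.rand(16, 8).astype("float32")
labels = np.random.randint(0, 3, size=(16,)).astype("int32")

# Functional model with dict inputs, so dataset dict keys are matched by name.
inp = tf.keras.Input(shape=(8,), name="image")
out = tf.keras.layers.Dense(3, activation="softmax")(inp)
model = tf.keras.Model(inputs={"image": inp}, outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Element 0 (a dict of features) goes to the model; element 1 is the target.
ds = tf.data.Dataset.from_tensor_slices(({"image": images}, labels)).batch(4)
history = model.fit(ds, epochs=1, verbose=0)
```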

Answer 2

In this example,

            feature={
                'image': _bytes_feature(data[i].tobytes()),
                'label': _int64_feature(labels[i])
            }))

Here, image and label are a tuple, where image is of type bytes and label is of type int64. You can read more here.
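Once such a record is parsed, it comes back keyed by feature name ("image", "label"), and one common way to hand Keras the (inputs, targets) pair it expects is to split the label out in a map step. A minimal sketch, assuming TensorFlow 2.x, with in-memory tensors standing in for parsed records; split_features_and_label is a hypothetical helper name.

```python
import tensorflow as tf

# Stand-in for parsed records: each element is a dict with "image" and "label" keys.
records = tf.data.Dataset.from_tensor_slices({
    "image": tf.zeros([4, 6], dtype=tf.float32),
    "label": tf.constant([0, 1, 1, 0], dtype=tf.int64),
})

def split_features_and_label(record):
    # Every key except "label" becomes model input; "label" becomes the target.
    features = {k: v for k, v in record.items() if k != "label"}
    return features, record["label"]

pairs = records.map(split_features_and_label)
features, label = next(iter(pairs))
```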

tensorflow

tensorflow-datasets

tf.keras

tensorflow2.0