How to get the labels of a dataset built with ImageDataGenerator for a dual-input CNN model?

Question

Can someone help me get the labels of validation_set when the model takes a pair of images as input and ImageDataGenerator supplies the image batches, as shown below:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

GEN = ImageDataGenerator(rescale=1./255)

def two_inputs(generator, X1, X2, batch_size, img_height, img_width):
    # One directory iterator per input branch; shuffle=False keeps both in the same file order
    U = generator.flow_from_directory(X1,
                                      target_size=(img_height, img_width),
                                      batch_size=batch_size,
                                      shuffle=False,
                                      class_mode='binary',
                                      seed=1221)
    V = generator.flow_from_directory(X2,
                                      target_size=(img_height, img_width),
                                      batch_size=batch_size,
                                      shuffle=False,
                                      class_mode='binary',
                                      seed=1221)
    while True:
        X1i = next(U)
        X2i = next(V)
        yield [X1i[0], X2i[0]], X2i[1]   # Yield both image batches and their shared labels

With this setup I can get predictions with preds = base_model.predict_generator(val_flow), where val_flow is:

val_flow = two_inputs(generator= GEN,
                      X1 = val_05_dirs,
                      X2 = val_06_dirs,
                      batch_size = batch_size,
                      img_height=img_height,
                      img_width=img_width
                      )

I need to get fpr and tpr using fpr, tpr, _ = metrics.roc_curve(LABELS, preds).

So I am trying to get the LABELS for the complete val_flow, which reads from the two folders val_05_dirs and val_06_dirs.

Thanks in advance.

Answer 1

I have put together a simple code example. You can adapt it to fit your use case.
Code:

import tensorflow as tf

GEN = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

folder_path = r'C:\Users\Aniket\.keras\datasets\flower_photos'

def two_inputs(generator, X1, X2, batch_size, img_height, img_width):
    # One directory iterator per input branch; shuffle=False keeps both in the same file order
    U = generator.flow_from_directory(X1,
                                      target_size=(img_height, img_width),
                                      batch_size=batch_size,
                                      shuffle=False,
                                      class_mode='binary',
                                      seed=1221)
    V = generator.flow_from_directory(X2,
                                      target_size=(img_height, img_width),
                                      batch_size=batch_size,
                                      shuffle=False,
                                      class_mode='binary',
                                      seed=1221)
    while True:
        X1i = next(U)
        X2i = next(V)
        yield [X1i[0], X2i[0]], X2i[1]   # Yield both image batches and their shared labels

# The same folder is used for both inputs here, just to demonstrate
custom_gen = two_inputs(GEN, folder_path, folder_path, 1000, 256, 256)

Here, my flower_photos directory contains 5 subdirectories, and the subdirectory names serve as the image labels.

Output:

Found 3670 images belonging to 5 classes.

Now iterate over the generator.
Code:

val_labels = []
for image, labels in custom_gen:
    val_labels += list(labels.astype('int32'))
    break

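Note: without the break, this loop would run forever, because the generator keeps yielding batches from your data endlessly.

If you don't want that, run the loop only this many times:

no_of_times = math.ceil(total_samples / batch_size)

Round the count up so that a final, smaller batch is still included when the batch size does not divide the total number of samples evenly. If the loop runs more times than this, the generator starts a second pass over the data and you will append duplicate labels to the end of the list.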
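A minimal sketch of such a bounded loop. It assumes only that the generator yields (inputs, labels) batches forever and that the last batch of a pass may be shorter than batch_size. The names collect_labels and fake_two_input_batches are hypothetical; the second one is just a stand-in for two_inputs so the snippet runs without any image folders.

```python
import math

def collect_labels(batch_generator, total_samples, batch_size):
    """Collect the labels of exactly one pass over an endless batch generator."""
    # Round up so that a final, smaller batch is still read.
    steps = math.ceil(total_samples / batch_size)
    labels = []
    for _ in range(steps):
        _inputs, batch_labels = next(batch_generator)
        # int(...) works for plain numbers and for NumPy scalars alike.
        labels.extend(int(y) for y in batch_labels)
    # Trim in case a generator pads its last batch instead of shortening it.
    return labels[:total_samples]

def fake_two_input_batches(all_labels, batch_size):
    """Hypothetical stand-in for two_inputs(): loops forever over the labels."""
    while True:
        for start in range(0, len(all_labels), batch_size):
            chunk = all_labels[start:start + batch_size]
            # Placeholder strings stand in for the two image batches.
            yield ["images_a", "images_b"], chunk

demo_labels = [0, 1, 1, 0, 1, 0, 0]
val_labels = collect_labels(fake_two_input_batches(demo_labels, 3), len(demo_labels), 3)
print(val_labels)
```

With the real generator you would pass a freshly created two_inputs(...) generator, so that the labels start from the first file and line up with predictions taken in the same order. Since the iterators are created with shuffle=False, the classes attribute of a directory iterator should also give the same labels in file order, if you keep a reference to the iterator.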

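The labels you get will be integers. If you want the mapping between class names and integers, you can use:

# U is the iterator created inside two_inputs; keep a reference to it
# (or build an identical iterator) so that it is accessible here
mapping = U.class_indices
mapping

Output: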

{'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}
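To turn the integer labels back into class names, the mapping can be inverted. This is a small sketch; the dictionary is copied from the output above, and the val_labels values are made up for illustration.

```python
# Mapping copied from the output above; in practice it comes from class_indices.
class_indices = {'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}

# Invert name -> index into index -> name.
index_to_class = {index: name for name, index in class_indices.items()}

# Hypothetical integer labels, as collected from the generator.
val_labels = [0, 4, 2, 2]
val_label_names = [index_to_class[i] for i in val_labels]
print(val_label_names)  # ['daisy', 'tulips', 'roses', 'roses']
```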
