How to get the labels of a dataset built using ImageDataGenerator for a dual-input CNN model?

Can someone help me get the labels of validation_set when the model takes a pair of images as input and ImageDataGenerator supplies the image batches, as shown below:

GEN = ImageDataGenerator(rescale = 1./255)

def two_inputs(generator, X1, X2, batch_size, img_height, img_width):
    U = generator.flow_from_directory(X1,
                                            target_size=(img_height, img_width),
                                            batch_size=batch_size,
                                            shuffle= False,
                                            class_mode='binary',
                                            seed=1221)
    V = generator.flow_from_directory(X2,
                                            target_size=(img_height, img_width),
                                            batch_size=batch_size,
                                            shuffle= False,
                                            class_mode='binary',
                                            seed=1221)
    while True:
        X1i = U.next()
        X2i = V.next()
        yield [X1i[0], X2i[0]], X2i[1]   # Yield both images and their mutual label

With this setup I can get predictions via preds = base_model.predict_generator(val_flow), where val_flow is:

val_flow = two_inputs(generator= GEN,
                      X1 = val_05_dirs,
                      X2 = val_06_dirs,
                      batch_size = batch_size,
                      img_height=img_height,
                      img_width=img_width
                      )

I need the labels to compute fpr and tpr with fpr, tpr, _ = metrics.roc_curve(LABELS, preds).

So I am trying to get LABELS for the complete val_flow, which reads from the two folders val_05_dirs and val_06_dirs.

Thanks in advance.

I created a simple code example. You can adapt it to your use case.
Code:

import tensorflow as tf

GEN = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

folder_path = r'C:\Users\Aniket\.keras\datasets\flower_photos'

def two_inputs(generator, X1, X2, batch_size, img_height, img_width):
    # shuffle=False keeps both iterators in the same, deterministic file order
    U = generator.flow_from_directory(X1,
                                      target_size=(img_height, img_width),
                                      batch_size=batch_size,
                                      shuffle=False,
                                      class_mode='binary',
                                      seed=1221)
    V = generator.flow_from_directory(X2,
                                      target_size=(img_height, img_width),
                                      batch_size=batch_size,
                                      shuffle=False,
                                      class_mode='binary',
                                      seed=1221)
    while True:
        X1i = next(U)   # (images, labels) batch from the first directory
        X2i = next(V)   # (images, labels) batch from the second directory
        yield [X1i[0], X2i[0]], X2i[1]   # Yield both images and their mutual label

custom_gen = two_inputs(GEN, folder_path, folder_path, 1000, 256, 256)

Here, my flower_photos directory contains 5 subdirectories, and the subdirectory names serve as the labels of the images.

Output:

Found 3670 images belonging to 5 classes.
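To illustrate how subdirectory names turn into integer labels, here is a minimal sketch that does not use Keras at all. It assumes the class indices follow the alphanumeric order of the subdirectory names, which is how the Keras documentation describes flow_from_directory. The helper class_indices_from_directory and the temporary folder are hypothetical and exist only for this demonstration.

```python
import os
import tempfile

def class_indices_from_directory(root):
    """Map each subdirectory name to an integer, in sorted (alphanumeric) order.

    Assumption: this mirrors how flow_from_directory assigns label indices.
    """
    class_names = sorted(
        entry for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )
    return {name: index for index, name in enumerate(class_names)}

# Build a throwaway directory tree that imitates flower_photos.
with tempfile.TemporaryDirectory() as root:
    for name in ['tulips', 'daisy', 'roses', 'sunflowers', 'dandelion']:
        os.mkdir(os.path.join(root, name))
    demo_mapping = class_indices_from_directory(root)

print(demo_mapping)
```

The creation order of the folders does not matter here; only the sorted names decide which integer each class gets.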

Now iterate over the generator.
Code:

val_labels = []
for image, labels in custom_gen:
    val_labels += list(labels.astype('int32'))
    break

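Note: without the break, the loop runs forever, because this generator keeps yielding batches from your data indefinitely.

If you don't want that, run the loop only this many times:

no_of_times = total_samples // batch_size

Make sure the total number of samples is evenly divisible by the batch size. Otherwise the count above skips the final partial batch, and if you run extra steps to compensate, the generator wraps around and you end up appending duplicate labels to the end of the list.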
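A bounded version of the loop can be sketched as follows. This is only an illustration under stated assumptions: collect_labels and fake_two_input_generator are hypothetical helpers, and the stand-in generator imitates what I understand Keras directory iterators to do (a partial last batch, then wrapping around to the start). Rounding the step count up and trimming the result means the batch size does not have to divide the sample count evenly.

```python
import math
from itertools import islice

def collect_labels(batch_generator, total_samples, batch_size):
    """Gather one label per sample from an endless (inputs, labels) generator."""
    steps = math.ceil(total_samples / batch_size)  # include the partial last batch
    labels = []
    for _inputs, batch_labels in islice(batch_generator, steps):
        labels.extend(int(y) for y in batch_labels)
    # Trim in case the generator pads or wraps around within the last batch.
    return labels[:total_samples]

def fake_two_input_generator(all_labels, batch_size):
    """Stand-in for two_inputs: loops forever and yields ([x1, x2], labels)."""
    while True:
        for start in range(0, len(all_labels), batch_size):
            chunk = all_labels[start:start + batch_size]
            yield [chunk, chunk], chunk

demo_labels = [0, 0, 1, 1, 1, 0, 1]
collected = collect_labels(fake_two_input_generator(demo_labels, 3),
                           total_samples=len(demo_labels), batch_size=3)
print(collected)
```

With the real generator the call would look like collect_labels(custom_gen, 3670, 1000). It is probably safest to create a fresh generator for the prediction call, so that labels and predictions both start from the first batch.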

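The labels you get are integers. If you want the mapping from class names to those integers, you can read class_indices from the directory iterator (U is the iterator created inside two_inputs, so it has to be accessible where you run this):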

mapping = U.class_indices
mapping

Output:

{'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}
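To turn the integer labels back into class names, the mapping can be inverted. A minimal sketch, assuming mapping is the dictionary shown above; the val_labels values here are made up for illustration and are not real output.

```python
# Mapping as printed above (class name -> integer label).
mapping = {'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}

# Invert it so that an integer label looks up its class name.
index_to_name = {index: name for name, index in mapping.items()}

# Illustrative integer labels, standing in for the collected val_labels.
val_labels = [0, 0, 2, 4, 1]
label_names = [index_to_name[label] for label in val_labels]
print(label_names)
```

For roc_curve the integer labels are what you pass in; the names are mainly useful for reading the results.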