Multi-Label Image Classification

Question

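I have tried this on my own but could not get it to work, so I am posting here for guidance.

I am working on multi-label image classification, with a slightly different scenario. What confuses me is how to map the labels and their attributes to the image Ids so that they can be used for training and testing.

Here is the code I am working with: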

import os
import numpy as np
import pandas as pd
from keras.utils import to_categorical
from collections import Counter
from keras.callbacks import Callback
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from sklearn.model_selection import train_test_split

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from matplotlib import pyplot as plt  # the plotting code below refers to plt
from tensorflow.keras import backend

def create_tag_mapping(mapping_csv):
    labels = set()
    for i in range(len(mapping_csv)):
        tags = mapping_csv['Labels'][i].split(' ')
        labels.update(tags)
    labels = list(labels)
    labels.sort()
    labels_map = {labels[i]:i for i in range(len(labels))}
    inv_labels_map = {i:labels[i] for i in range(len(labels))}
    return labels_map, inv_labels_map

# create a mapping of filename to tags
def create_file_mapping(mapping_csv):
    mapping = dict()
    for i in range(len(mapping_csv)):
        name, tags = mapping_csv['Id'][i], mapping_csv['Labels'][i]
        mapping[name] = tags.split(' ')
    return mapping

# create a one hot encoding for one list of tags
def one_hot_encode(tags, mapping):
    # create empty vector
    encoding = np.zeros(len(mapping), dtype='uint8')
    # mark 1 for each tag in the vector
    for tag in tags:
        encoding[mapping[tag]] = 1
    return encoding

def load_dataset(path, file_mapping, tag_mapping):
    photos, targets = list(), list()
    # enumerate files in the directory
    for filename in os.listdir(path):
        # load image
        photo = load_img(path + filename, target_size=(760,415))
        # convert to numpy array
        photo = img_to_array(photo, dtype='uint8')
        # get tags
        tags = file_mapping[filename[:-4]]
        # one hot encode tags
        target = one_hot_encode(tags, tag_mapping)
        # store
        photos.append(photo)
        targets.append(target)
    X = np.asarray(photos, dtype='uint8')
    y = np.asarray(targets, dtype='uint8')
    return X, y

trainingLabels = 'labels.csv'
# load the mapping file
mapping_csv = pd.read_csv(trainingLabels)


# create a mapping of tags to integers
tag_mapping, _ = create_tag_mapping(mapping_csv)

# create a mapping of filenames to tag lists
file_mapping = create_file_mapping(mapping_csv)


# load the png images
folder = 'dataset/'

X, y = load_dataset(folder, file_mapping, tag_mapping)
print(X.shape, y.shape)

trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.3, random_state=1)
print(trainX.shape, trainY.shape, testX.shape, testY.shape)

img_x,img_y=760,415
trainX=trainX.reshape(trainX.shape[0], img_x,img_y,3)
testX=testX.reshape(testX.shape[0], img_x,img_y,3)

trainX=trainX.astype('float32')
testX=testX.astype('float32')

trainX /= 255
testX /=255

trainY=to_categorical(trainY,3)
testY=to_categorical(testY,3)
print(trainX.shape)
print(trainY.shape)

model = Sequential()
model.add(Conv2D(32, (5, 5), strides=(1,1), activation='relu', input_shape=(img_x, img_y,3)))
model.add(MaxPooling2D((2, 2), strides=(2,2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='sigmoid'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history=model.fit(trainX, trainY, batch_size=2, epochs=5, verbose=1)
plt.plot(history.history['acc'])
plt.plot(history.history['loss'])
plt.title('Accuracy and loss')
plt.xlabel('epoch')
plt.ylabel('accuracy/loss')
plt.legend(['Accuracy','loss'],loc='upper left')
plt.show()

score=model.evaluate(testX,testY,verbose=0)
print('test loss',score[0])
print('test accuracy',score[1])

I have attached an image file that should give a clear picture of my problem.

The examples I have followed have multiple labels for each image, but in my case I have multiple labels plus an attribute for each of them.

Answer 1

If your goal is to predict 'L', 'M' and 'H', you are using the wrong loss function. You should use binary_crossentropy. In this case the target has shape batch × 3.
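
As a rough illustration of the target layout this implies, here is a NumPy-only sketch. The tag vocabulary, the sample rows and the names `TAGS`, `TAG_TO_INDEX` and `multi_hot` are made up for the example; in the question's code, `one_hot_encode` already produces vectors of this form.

```python
import numpy as np

# Hypothetical tag vocabulary; the real one would come from the CSV in the question.
TAGS = ['H', 'L', 'M']
TAG_TO_INDEX = {tag: i for i, tag in enumerate(TAGS)}

def multi_hot(tags):
    """Return a 0/1 vector with a 1 at the position of every tag present."""
    encoding = np.zeros(len(TAG_TO_INDEX), dtype='float32')
    for tag in tags:
        encoding[TAG_TO_INDEX[tag]] = 1.0
    return encoding

# Three made-up samples, each carrying a different set of tags.
samples = [['L'], ['L', 'H'], ['H', 'L', 'M']]
y = np.stack([multi_hot(tags) for tags in samples])

print(y.shape)  # (3, 3): batch x number of tags
print(y)
```

Targets of this shape can be fed to a `Dense(3, activation='sigmoid')` output as they are. Passing them through `to_categorical` as well, as the question does, adds an extra axis that such an output layer does not match.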

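categorical_crossentropy assumes that the output is a categorical distribution: a vector of values that sum to 1. In other words, there are several possibilities, but only one of them can be correct.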

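binary_crossentropy assumes that each number in the output vector is a (conditionally) independent binary distribution, so each number lies between 0 and 1, but they do not necessarily sum to 1, because it may well be that all of them are true.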
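
The difference between the two assumptions can be seen on a single made-up logit vector. The sketch below writes out the textbook definitions in NumPy; it is not Keras's implementation, and the helper names and input values are chosen only for illustration.

```python
import numpy as np

def softmax(logits):
    """Categorical distribution: non-negative values that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(logits):
    """Independent Bernoulli probabilities, one per output."""
    return 1.0 / (1.0 + np.exp(-logits))

def categorical_crossentropy(target, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1.0)
    return -np.sum(target * np.log(probs))

def binary_crossentropy(target, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs))

logits = np.array([2.0, 1.0, -1.0])  # made-up network outputs

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)
print(p_softmax.sum())  # 1 up to rounding
print(p_sigmoid.sum())  # not constrained to 1

one_hot = np.array([1.0, 0.0, 0.0])    # exactly one class is correct
multi_hot = np.array([1.0, 1.0, 0.0])  # several tags can be correct at once
print(categorical_crossentropy(one_hot, p_softmax))
print(binary_crossentropy(multi_hot, p_sigmoid))
```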

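If your goal is to predict the value of each of label 1, ..., label 6, then you should model a categorical distribution for each label. You have six labels, each with 3 possible values, so you need 18 numbers (logits). In this case the target has shape batch × 6 × 3.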

model.add(Dense(18, activation=None))  # raw logits, no activation yet

Because you do not want a single distribution over 18 values, but six distributions over 3 values each, you need to reshape the logits first:

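model.add(Reshape((6, 3)))  # from tensorflow.keras.layers import Reshape, Softmax
model.add(Softmax())        # softmax over the last axis, i.e. per label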
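
To make the effect of the reshape concrete, here is a NumPy sketch that mimics it on random numbers standing in for the 18 logits. The batch size and the random values are arbitrary assumptions; only the shapes matter.

```python
import numpy as np

def softmax(x, axis=-1):
    """Softmax along one axis, with the usual max-subtraction for stability."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
batch_size = 4
logits = rng.normal(size=(batch_size, 18))  # stand-in for the Dense(18) output

# Group the 18 logits into 6 labels with 3 values each.
reshaped = logits.reshape(batch_size, 6, 3)

# Normalise within each label, not across all 18 numbers.
probs = softmax(reshaped, axis=-1)

print(probs.shape)         # (4, 6, 3)
print(probs.sum(axis=-1))  # every entry is 1 up to rounding
```

Each consecutive group of three logits becomes one label's distribution, so the one-hot targets need to be laid out in the same label order.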

Answer 2

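Based on the discussion above, here is the solution to the problem. As I mentioned, we have 5 labels in total, and each label takes one of three tags (L, M, H), so we can encode them like this: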

# create a one hot encoding for one list of tags
def custom_encode(tags, mapping):
    # create empty vector
    encoding=[]
    for tag in tags:
        if tag == 'L':
            encoding.append([1,0,0])
        elif tag == 'M':
            encoding.append([0,1,0])
        else:
            encoding.append([0,0,1])
    return encoding
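
A possible variant of the same encoding, sketched with an explicit lookup table so that an unexpected tag raises a `KeyError` instead of being silently encoded as H. The names `TAG_TO_ONE_HOT` and `encode_tags` and the sample tag lists are invented for this sketch.

```python
import numpy as np

# Hypothetical lookup table; the order L, M, H matches the encoding above.
TAG_TO_ONE_HOT = {
    'L': [1, 0, 0],
    'M': [0, 1, 0],
    'H': [0, 0, 1],
}

def encode_tags(tags):
    """Encode a list of 5 tags as an array of shape (5, 3)."""
    return np.array([TAG_TO_ONE_HOT[tag] for tag in tags], dtype='float32')

# One image: five labels, each with one tag.
y_single = encode_tags(['L', 'L', 'L', 'M', 'H'])
print(y_single.shape)  # (5, 3)

# Stacking several images gives the batch x 5 x 3 target the model expects.
tag_lists = [
    ['L', 'L', 'L', 'M', 'H'],
    ['L', 'H', 'L', 'M', 'H'],
]
y_batch = np.stack([encode_tags(tags) for tags in tag_lists])
print(y_batch.shape)  # (2, 5, 3)
```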

So the encoded y vectors look like this:

Labels       Tags             Encoded Tags
Label1 ----> [L,L,L,M,H] ---> [ [1,0,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1] ]
Label2 ----> [L,H,L,M,H] ---> [ [1,0,0], [0,0,1], [1,0,0], [0,1,0], [0,0,1] ]
Label3 ----> [L,M,L,M,H] ---> [ [1,0,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1] ]
Label4 ----> [M,M,L,M,H] ---> [ [0,1,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1] ]
Label5 ----> [M,L,L,M,H] ---> [ [0,1,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1] ]

The final layers would look like this:

 model.add(Dense(15)) #because we have total 5 labels and each has 3 tags so 15 neurons will be on final layer
 model.add(Reshape((5,3))) # each 5 have further 3 tags we need to reshape it
 model.add(Activation('softmax'))
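
Going the other way, predictions of shape batch × 5 × 3 can be turned back into tags by taking the argmax over the last axis. This is a sketch under the assumption that the tag order is L, M, H as in the encoding above; `INDEX_TO_TAG`, `decode_predictions` and the probabilities are made up for illustration.

```python
import numpy as np

INDEX_TO_TAG = ['L', 'M', 'H']  # assumed to match the order used when encoding

def decode_predictions(probs):
    """Map probabilities of shape (batch, 5, 3) to lists of tag strings."""
    indices = np.argmax(probs, axis=-1)  # most likely tag index per label
    return [[INDEX_TO_TAG[i] for i in row] for row in indices]

# Made-up softmax output for a single image.
probs = np.array([[
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]])

print(decode_predictions(probs))  # [['L', 'H', 'L', 'M', 'H']]
```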

neural-network

python-3.x

multilabel-classification

deep-learning

conv-neural-network