Ignore padding class (0) during multi class classification

Question

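I have a problem where, given a set of tokens, I need to predict another token. For this task I use an embedding layer with Vocab-size + 1 as input_size. The +1 is because the sequences are padded with zeros. For example, given a Vocab-size of 10 000 and max_sequence_len=6, x_train looks like: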

array([[    0,     0,     0,    11,    22,     4],
       [   29,     6,    12,    29,  1576,    29],
       ...,
       [    0,     0,    67,  8947,  7274,  7019],
       [    0,     0,     0,    15, 10000,    50]])

y_train consists of integers between 1 and 10000. In other words, this becomes a multi-class classification problem with 10000 classes.

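My question: when I specify the output size in the output layer, I would like to specify 10000, but if I do, the model will predict classes 0-9999. Another approach is to set the output size to 10001, but then the model can predict the 0-class (padding), which is unwanted.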

Since y_train is mapped from 1 to 10000, I could remap it to 0-9999, but because the labels share their mapping with the input, this seems like an unnecessary workaround.
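For reference, the remapping workaround mentioned above could look roughly like the sketch below. It is only an illustration: the label values are made up, and `pred_shifted` is a hypothetical stand-in for `model.predict(...).argmax(axis=-1)` from a model with a 10000-unit output layer.

```python
import numpy as np

# Hypothetical labels in the question's 1..10000 range.
y_train = np.array([11, 4, 10000, 1])

# Shift to 0..9999 so a 10000-unit softmax layer can be trained on them directly.
y_shifted = y_train - 1

# Stand-in for the argmax of the model's predictions (assumed perfect here).
pred_shifted = y_shifted.copy()

# Shift back to recover token ids that share the input vocabulary mapping.
pred_tokens = pred_shifted + 1
```

The cost of this approach is that every consumer of the labels or predictions has to remember the offset of 1, which is the inconvenience the question is trying to avoid.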

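EDIT:
I realize, as @Andrey pointed out in the comments, that I could allow 10001 classes and simply add padding to the vocabulary, although I am never interested in the network predicting 0.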

How can I tell the model to predict labels 1-10000 while having 10000 classes, not 10001?

Answer 1

I would use the following approach:

import tensorflow as tf

# Variable-length sequences of token ids; id 0 is reserved for padding.
inputs = tf.keras.layers.Input(shape=(None,), dtype='int32')
x = tf.keras.layers.Embedding(10001, 512)(inputs)  # input covers the full vocab plus padding [10001]
x = tf.keras.layers.Dense(10000, activation='softmax')(x)  # trained weights cover only the real tokens [10000]
z = tf.zeros(tf.shape(x)[:-1])[..., tf.newaxis]
x = tf.concat([z, x], axis=-1)  # prepend a constant zero at index 0 (to avoid predicting 0)
model = tf.keras.Model(inputs=inputs, outputs=x)

# Random demo data: inputs may contain padding (0), labels are real tokens only (1..10000).
x_demo = tf.random.uniform([10, 10], 0, 10001, dtype=tf.int32)
y_demo = tf.random.uniform([10, 10], 1, 10001, dtype=tf.int32)
model.compile(loss='sparse_categorical_crossentropy')
model.fit(x_demo, y_demo)

pred = model.predict(x_demo)  # shape (10, 10, 10001); index 0 always holds 0, the minimum possible value
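
To see why prepending a zero keeps class 0 from ever being predicted, here is a small NumPy sketch of the same idea, independent of Keras. The vocabulary size of 10 and the random logits are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 10  # small stand-in for the 10000 real tokens

# Softmax over the real tokens only (what the Dense(10000) layer outputs).
logits = rng.normal(size=(4, vocab_size))
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)  # shape (4, 10)

# Prepend a constant zero column so that index i corresponds to token id i.
zeros = np.zeros(probs.shape[:-1] + (1,))
padded = np.concatenate([zeros, probs], axis=-1)  # shape (4, 11)

# Every softmax output is strictly positive, so argmax can never pick index 0.
pred = padded.argmax(axis=-1)
```

Each row of `padded` still sums to 1, so it remains a valid probability distribution over ids 0..10, with the padding id fixed at probability 0. Labels can therefore stay in their original 1-based mapping, provided no label is ever 0.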

nlp

python-3.x

keras

tensorflow

tensorflow2.0