StringLookup equivalent for tensorflow v2.1.0

Question

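I am trying to build a recommendation model similar to this example. However, the example uses TensorFlow v2.4.0, and for my work I need to use v2.1.0. The StringLookup layer does not seem to exist in v2.1.0. Is there an equivalent way to achieve exactly the same thing in 2.1.0? I need to use it in a model like this: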

user_model = tf.keras.Sequential([
  tf.keras.layers.experimental.preprocessing.StringLookup(
      vocabulary=unique_user_ids, mask_token=None),
  tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
])

Answer 1

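You can use tf.strings.to_hash_bucket_strong to hash your strings to indices, as long as you don't care about the mapping order and can tolerate two different strings occasionally landing in the same bucket.

Example: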
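Since a hash does not guarantee that different strings get different indices, it may help to estimate how likely a collision-free mapping is for a given bucket count. The following is a minimal sketch in plain Python, assuming the hash spreads ids uniformly over the buckets (an idealization); the helper name prob_all_distinct is hypothetical and not part of TensorFlow.

```python
def prob_all_distinct(num_ids: int, num_buckets: int) -> float:
    """Probability that num_ids ids fall into pairwise distinct buckets,
    assuming each id is hashed uniformly and independently (an assumption)."""
    if num_ids > num_buckets:
        return 0.0  # pigeonhole: at least two ids must share a bucket
    p = 1.0
    for i in range(num_ids):
        # the i-th id must avoid the i buckets that are already taken
        p *= (num_buckets - i) / num_buckets
    return p


# Six ids hashed into increasingly many buckets.
for buckets in (6, 60, 600):
    print(buckets, round(prob_all_distinct(6, buckets), 3))
```

Under this assumption, six ids in exactly six buckets end up collision-free only about 1.5% of the time, so in practice you would usually pick num_buckets well above the number of unique ids, or accept that some ids share an embedding.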

import tensorflow as tf

print(tf.__version__)
# 2.1.0

NUM_BUCKETS = 6
EMB_DIM = 128

# five known user ids plus an '[UNK]' placeholder for unknown strings
user_ids = tf.constant(['463', '112', '666', '932', '878', '[UNK]'])

# hash each string to a bucket in 0-5; hashing does not guarantee distinct
# buckets for distinct strings, although it does for this input and key
idxs = tf.strings.to_hash_bucket_strong(
    input=user_ids,
    num_buckets=NUM_BUCKETS,
    key=[1, 2])

print(idxs)
# <tf.Tensor: shape=(6,), dtype=int64, numpy=array([2, 3, 4, 0, 5, 1])>

# And now you can apply your embeddings to the indices you've generated
emb = tf.keras.layers.Embedding(
    input_dim=NUM_BUCKETS,
    output_dim=EMB_DIM)

assert emb(idxs).shape == (6, 128)
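
If the mapping order does matter, one possible alternative is a static lookup table from tf.lookup, which assigns each known id its position in the vocabulary and sends everything else to an out-of-vocabulary bucket. This is a sketch under assumptions rather than a drop-in StringLookup replacement: the variable names and the embedding size of 32 are illustrative, and the index layout (known ids at 0..N-1, unknown ids at N) may differ from the one StringLookup uses.

```python
import tensorflow as tf

# Illustrative vocabulary; in the question this would be unique_user_ids.
unique_user_ids = ['463', '112', '666', '932', '878']

# Map each known id to its position in the vocabulary (values must be int64).
init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(unique_user_ids),
    values=tf.range(len(unique_user_ids), dtype=tf.int64))

# One extra bucket collects every string that is not in the vocabulary.
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets=1)

idxs = table.lookup(tf.constant(['463', '878', 'never-seen']))

# Vocabulary size plus the single out-of-vocabulary bucket.
emb = tf.keras.layers.Embedding(
    input_dim=len(unique_user_ids) + 1,
    output_dim=32)
vectors = emb(idxs)
```

Unlike the hash-based approach, known ids can never collide here; only unknown strings share the extra bucket.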

tensorflow

tensorflow2.0