PyTorch equivalent of TensorFlow Keras StringLookup?

Question

I am currently working with PyTorch and I'm missing a layer: tf.keras.layers.StringLookup, which helps with processing IDs. Is there a workaround for doing something similar in PyTorch?

An example of the functionality I'm looking for:

vocab = ["a", "b", "c", "d"]
data = tf.constant([["a", "c", "d"], ["d", "a", "b"]])
layer = tf.keras.layers.StringLookup(vocabulary=vocab)
layer(data)

Outputs:
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[1, 3, 4],
       [4, 1, 2]])>

Answer 1

You can use the torchtext library. Install it with:

python3 -m pip install torchtext

Then you can do this:

from torchtext.vocab import vocab
from collections import OrderedDict

tokens = ['a', 'b', 'c', 'd']
v1 = vocab(OrderedDict([(token, 1) for token in tokens]))
v1.lookup_indices(["a","b","c"])

This is the result:

([0, 1, 2],)
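
Note that these indices start at 0, whereas the Keras layer in the question reserves index 0 for out-of-vocabulary tokens and numbers the vocabulary from 1. If matching the Keras output matters, a minimal dependency-free sketch could look like the one below. The class name StringLookup and its oov_index parameter are illustrative choices made here, not part of torchtext or PyTorch.

```python
from typing import Dict, List, Sequence


class StringLookup:
    """Hypothetical helper: maps strings to integer ids, Keras-style.

    Index 0 is reserved for out-of-vocabulary tokens, so the first
    vocabulary entry gets id 1 (an assumption mirroring the question's output).
    """

    def __init__(self, vocabulary: Sequence[str], oov_index: int = 0) -> None:
        self.oov_index = oov_index
        # Vocabulary ids start right after the reserved OOV index.
        self.token_to_id: Dict[str, int] = {
            token: i for i, token in enumerate(vocabulary, start=oov_index + 1)
        }

    def __call__(self, batch: Sequence[Sequence[str]]) -> List[List[int]]:
        # Unknown tokens fall back to the OOV index instead of raising KeyError.
        return [
            [self.token_to_id.get(token, self.oov_index) for token in row]
            for row in batch
        ]


layer = StringLookup(["a", "b", "c", "d"])
ids = layer([["a", "c", "d"], ["d", "a", "b"]])
print(ids)  # [[1, 3, 4], [4, 1, 2]]
```

When every row has the same length, the nested list can be turned into a tensor with torch.tensor(ids).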

Answer 2

You can use the torchnlp package (PyTorch-NLP):

pip install pytorch-nlp

from torchnlp.encoders import LabelEncoder

data = ["a", "c", "d", "e", "d"]
encoder = LabelEncoder(data, reserved_labels=['unknown'], unknown_index=0)

enl = encoder.batch_encode(data)

print(enl)

tensor([1, 2, 3, 4, 3])
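
To make the behaviour concrete, here is a small pure-Python sketch of what an encoder of this kind does. It is not torchnlp's actual implementation: it assigns ids in first-seen order, keeps index 0 for an "unknown" label, and supports decoding. SimpleLabelEncoder and its method names are hypothetical.

```python
from typing import Iterable, List


class SimpleLabelEncoder:
    """Hypothetical stand-in illustrating a label encoder with an unknown slot."""

    def __init__(self, samples: Iterable[str], unknown_label: str = "unknown") -> None:
        # The unknown label occupies index 0; other labels follow in first-seen order.
        self.index_to_label: List[str] = [unknown_label]
        self.label_to_index = {unknown_label: 0}
        for label in samples:
            if label not in self.label_to_index:
                self.label_to_index[label] = len(self.index_to_label)
                self.index_to_label.append(label)

    def encode(self, labels: Iterable[str]) -> List[int]:
        # Labels never seen during construction map to the unknown index 0.
        return [self.label_to_index.get(label, 0) for label in labels]

    def decode(self, indices: Iterable[int]) -> List[str]:
        # Inverse lookup: turn ids back into their labels.
        return [self.index_to_label[i] for i in indices]


data = ["a", "c", "d", "e", "d"]
encoder = SimpleLabelEncoder(data)
encoded = encoder.encode(data)
print(encoded)                  # [1, 2, 3, 4, 3]
print(encoder.encode(["zzz"]))  # [0]
print(encoder.decode(encoded))  # ['a', 'c', 'd', 'e', 'd']
```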

Answer 3

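You can use collections.Counter together with torchtext's vocab object to construct a lookup function from your vocabulary. You can then easily pass sequences through it and get their encodings as tensors: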

import torch
from torchtext.vocab import vocab
from collections import Counter

tokens = ["a", "b", "c", "d"]
samples = [["a", "c", "d"], ["d", "a", "b"]]

# Build string lookup
lookup = vocab(Counter(tokens))

>>> torch.tensor([lookup(s) for s in samples])
tensor([[0, 2, 3],
        [3, 0, 1]])
