No values returned during tokenization (texts_to_sequences)

Question

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
test_sentence1 = "This is the worst flight experience of my life!"
tokenizer = Tokenizer(num_words=5000)
sequences = tokenizer.texts_to_sequences([test_sentence1])
print(sequences)
text = pad_sequences(sequences, maxlen=200)
print(text)

Output: sequences --> [[]]

I get no output when I tokenize with texts_to_sequences: the result is an empty list.

Answer 1

Try this:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
test_sentence1 = "This is the worst flight experience of my life!"
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts([test_sentence1])  # insert this step in your original code
sequences = tokenizer.texts_to_sequences([test_sentence1])
print(sequences)
text = pad_sequences(sequences, maxlen=200)
print(text)

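You need to fit the tokenizer on your texts before converting them to sequences. Until fit_on_texts has been called, the tokenizer's vocabulary (word_index) is empty, so texts_to_sequences finds no known words and returns [[]].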
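To see why the unfitted tokenizer returns an empty list, here is a minimal pure-Python sketch of the fit-then-transform pattern. `TinyTokenizer` is a hypothetical, simplified stand-in written for illustration; it is not the Keras class and ignores options such as `num_words`, `filters`, and `oov_token`. It only mirrors the behaviour that matters here: words missing from the vocabulary are silently dropped.

```python
import re
from collections import Counter


class TinyTokenizer:
    """Hypothetical, simplified stand-in for a fit-then-transform tokenizer."""

    def __init__(self):
        # The vocabulary stays empty until fit_on_texts is called.
        self.word_index = {}

    @staticmethod
    def _split(text):
        # Lowercase the text and keep runs of letters, digits, and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def fit_on_texts(self, texts):
        counts = Counter(word for text in texts for word in self._split(text))
        # More frequent words get lower indices; indices start at 1.
        for rank, (word, _) in enumerate(counts.most_common(), start=1):
            self.word_index[word] = rank

    def texts_to_sequences(self, texts):
        # Words that are not in the vocabulary are skipped.
        return [
            [self.word_index[w] for w in self._split(text) if w in self.word_index]
            for text in texts
        ]


sentence = "This is the worst flight experience of my life!"

# Without fitting: the vocabulary is empty, so every word is dropped.
unfitted = TinyTokenizer()
before = unfitted.texts_to_sequences([sentence])
print(before)  # [[]]

# After fitting: every word has an index, so the sequence is populated.
fitted = TinyTokenizer()
fitted.fit_on_texts([sentence])
after = fitted.texts_to_sequences([sentence])
print(after)  # [[1, 2, 3, 4, 5, 6, 7, 8, 9]]
```

With the real Tokenizer, a quick check is to print tokenizer.word_index before calling texts_to_sequences: if it is an empty dict, the tokenizer has not been fitted yet.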

machine-learning

python-3.x