FailedPreconditionError: Table not initialized

I am trying to build an NLP neural network with the following code:

Imports:

import zipfile
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization,Embedding,Input,GlobalAveragePooling1D,Dense
from sklearn.model_selection import train_test_split

Download and unzip the dataset:

# Download data (same as from Kaggle)
!wget "https://storage.googleapis.com/ztm_tf_course/nlp_getting_started.zip"

# Unzip data
zip_ref = zipfile.ZipFile("nlp_getting_started.zip", "r")
zip_ref.extractall()
zip_ref.close()

Split the data into training and validation sets:

train_df = pd.read_csv("train.csv")

train_sentences, val_sentences, train_labels, val_labels = train_test_split(train_df['text'], train_df['target'], train_size=.8)
average_output_sequence_length = 15

Create the neural network:

input = Input(shape=(1,),dtype='string')
x = TextVectorization(max_tokens=10000,
                      ngrams=5,
                      standardize='lower_and_strip_punctuation',
                      output_mode='int',
                      output_sequence_length = average_output_sequence_length)(input)
x = Embedding(input_dim=22,embeddings_initializer='uniform',output_dim=128, name= 'embeding_layer')(x)
x = GlobalAveragePooling1D()(x)
output = Dense(1,activation='sigmoid')(x)

model = tf.keras.Model(input,output)

model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'] )

model.fit(x=train_sentences,y=train_labels,epochs=6,validation_data=(val_sentences,val_labels))

Unfortunately, when I run the code, I get the following error:

Epoch 1/6
---------------------------------------------------------------------------
FailedPreconditionError                   Traceback (most recent call last)
<ipython-input-52-991547d73612> in <module>()
     13 model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'] )
     14 
---> 15 model.fit(x=train_sentences,y=train_labels,epochs=6,validation_data=(val_sentences,val_labels))

1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57     ctx.ensure_initialized()
     58     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 59                                         inputs, attrs, num_outputs)
     60   except core._NotOkStatusException as e:
     61     if name is not None:

FailedPreconditionError:  Table not initialized.
     [[node model_6/text_vectorization_7/string_lookup_7/None_Lookup/LookupTableFindV2
 (defined at /usr/local/lib/python3.7/dist-packages/keras/layers/preprocessing/index_lookup.py:669)
]] [Op:__inference_train_function_5066]

Errors may have originated from an input operation.

Update: I got it working after changing the way I use the functional API:

Create the TextVectorization layer:

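text_vectorization_layer = TextVectorization(max_tokens=10000,
                                             ngrams=5,
                                             standardize='lower_and_strip_punctuation',
                                             output_mode='int',
                                             output_sequence_length=average_output_sequence_length
                                             )
# Learn the vocabulary from the training text before the layer is used in the model
text_vectorization_layer.adapt(train_sentences)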

Finally, create the neural network:

input = Input(shape=(1,),dtype='string')
x = text_vectorization_layer(input)
x = Embedding(input_dim=22,embeddings_initializer='uniform',output_dim=128, name= 'embeding_layer')(x)
x = GlobalAveragePooling1D()(x)
output = Dense(1,activation='sigmoid')(x)

model = tf.keras.Model(input,output)

model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'] )

model.fit(x=train_sentences,y=train_labels,epochs=6,validation_data=(val_sentences,val_labels))

Question:

However, I still don't understand why these changes fix the problem.

In other words, why are these two not equivalent?

x = text_vectorization_layer(input)

x = TextVectorization(max_tokens=10000,
                      ngrams=5,
                      standardize='lower_and_strip_punctuation',
                      output_mode='int',
                      output_sequence_length = average_output_sequence_length)(input)

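The TextVectorization layer is a preprocessing layer: it has to be instantiated before it is called, so that its state (the vocabulary) can be set first. As the docs explain: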

The vocabulary for the layer must be either supplied on construction or learned via adapt().
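
A minimal sketch of the two options the docs mention, assuming a TensorFlow 2.x release that exposes `TextVectorization` under `tf.keras.layers` (as the question's imports already do). The vocabulary words and sentences are made-up illustrative data. In both cases the lookup table is filled before the layer is ever called:

```python
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

# Option 1: supply the vocabulary on construction (illustrative words only).
fixed_vectorizer = TextVectorization(
    output_mode="int",
    output_sequence_length=4,
    vocabulary=["fire", "flood", "sunny"],
)

# Option 2: learn the vocabulary from data with adapt().
adapted_vectorizer = TextVectorization(output_mode="int", output_sequence_length=4)
adapted_vectorizer.adapt(tf.constant(["forest fire near town", "sunny day at the beach"]))

print(fixed_vectorizer.get_vocabulary())  # ['', '[UNK]', 'fire', 'flood', 'sunny']

# "and" is not in the vocabulary, so it maps to the OOV index 1; 0 is padding.
fixed_ids = fixed_vectorizer(tf.constant(["fire and flood"])).numpy()
print(fixed_ids)  # [[2 1 3 0]]

# 9 distinct words plus the padding ('') and OOV ('[UNK]') tokens.
print(len(adapted_vectorizer.get_vocabulary()))  # 11
```

In `int` mode the layer reserves index 0 for padding and index 1 for out-of-vocabulary tokens, which is why the real words start at 2.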

Another important piece of information can be found here:

Crucially, these layers are non-trainable. Their state is not set during training; it must be set before training, either by initializing them from a precomputed constant, or by "adapting" them on data
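
Applied to the question, the pattern looks roughly like the sketch below. It uses a handful of made-up sentences as hypothetical stand-ins for `train_sentences`/`train_labels` and keeps the question's TextVectorization settings. One deliberate change: `Embedding` is sized with `len(text_vectorizer.get_vocabulary())` instead of 22, since the vectorizer can emit any token id below its vocabulary size. The last lines check that the layer has no trainable weights and that `fit()` leaves the adapted vocabulary untouched:

```python
import tensorflow as tf
from tensorflow.keras.layers import (TextVectorization, Embedding, Input,
                                     GlobalAveragePooling1D, Dense)

# Hypothetical stand-ins for train_sentences / train_labels from the question.
train_sentences = tf.constant(["forest fire near la ronge",
                               "what a lovely sunny day",
                               "flood warning issued for the valley",
                               "i love this song"])
train_labels = tf.constant([[1.0], [0.0], [1.0], [0.0]])

# 1. Create the layer outside the model, with the question's settings.
text_vectorizer = TextVectorization(max_tokens=10000,
                                    ngrams=5,
                                    standardize="lower_and_strip_punctuation",
                                    output_mode="int",
                                    output_sequence_length=15)

# 2. Fill its lookup table from the training text BEFORE the model uses it.
text_vectorizer.adapt(train_sentences)
vocab_before = text_vectorizer.get_vocabulary()

# 3. Only now call the (already adapted) layer on the symbolic string input.
inputs = Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
# Token ids range from 0 to len(vocab) - 1, so input_dim must be len(vocab).
x = Embedding(input_dim=len(vocab_before), output_dim=128)(x)
x = GlobalAveragePooling1D()(x)
outputs = Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Reshape the inputs to (batch, 1) to match Input(shape=(1,)).
history = model.fit(tf.reshape(train_sentences, (-1, 1)), train_labels,
                    epochs=2, verbose=0)

# No trainable weights, and training did not change the vocabulary.
print(len(text_vectorizer.trainable_weights))
print(text_vectorizer.get_vocabulary() == vocab_before)
```

With the question's real data, `adapt()` would simply be called on the `train_sentences` Series returned by `train_test_split`.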

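It is also worth noting that the TextVectorization layer uses an underlying StringLookup layer, which likewise has to be initialized beforehand. Otherwise, you get the FailedPreconditionError: Table not initialized that you posted.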
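
To see what the "table" in that error message refers to, here is a small sketch that uses `StringLookup` directly with a fixed, illustrative vocabulary. The string-to-id table is populated at construction time, so it is ready by the time the layer is called:

```python
import tensorflow as tf

# StringLookup maps each string to an integer id through an internal lookup table.
# Passing `vocabulary` fills that table at construction time (illustrative words).
lookup = tf.keras.layers.StringLookup(vocabulary=["fire", "flood"])

ids = lookup(tf.constant(["fire", "sunny", "flood"])).numpy()
print(ids)                      # "sunny" is out of vocabulary, so it gets the OOV id 0
print(lookup.get_vocabulary())  # the OOV token "[UNK]" is prepended at index 0
```

Unlike TextVectorization, a plain StringLookup has no mask token by default, so the OOV token sits at index 0 and the vocabulary words start at 1.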