NumPy array of tensorflow.keras.preprocessing.text.Tokenizer.texts_to_sequences gives weird output: list([2]) instead of [[2]]
Converting the output of tensorflow.keras.preprocessing.text.Tokenizer.texts_to_sequences to a NumPy array gives strange output for the training labels, as shown below:
(training_label_list[0:10]) = [list([1]) list([1]) list([1]) list([1]) list([1]) list([1]) list([1]) list([1]) list([1]) list([1])]
but it prints a normal array for the validation labels:
(validation_label_list[0:10]) = [[16]
[16]
[16]
[16]
[16]
[16]
[16]
[16]
[16]
[16]]
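In other words, type(training_label_list[0]) = <class 'list'>
but
type(validation_label_list[0]) = <class 'numpy.ndarray'>
As a result, training the model with Keras Model.fit fails with the following error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).
Here is a Link to a Google Colab notebook that reproduces the error easily.
The complete code to reproduce the error is given below: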
!pip install tensorflow==2.1
# For Preprocessing the Text => To Tokenize the Text
from tensorflow.keras.preprocessing.text import Tokenizer
# If the Two Articles are of different length, pad_sequences will make the length equal
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Package for performing Numerical Operations
import numpy as np
Unique_Labels_List = ['India', 'USA', 'Australia', 'Germany', 'Bhutan', 'Nepal', 'New Zealand', 'Israel', 'Canada', 'France', 'Ireland', 'Poland', 'Egypt', 'Greece', 'China', 'Spain', 'Mexico']
Train_Labels = Unique_Labels_List[0:14]
#print('Train Labels = {}'.format(Train_Labels))
Val_Labels = Unique_Labels_List[14:]
#print('Val_Labels = {}'.format(Val_Labels))
No_Of_Train_Items = [248, 200, 200, 218, 248, 248, 249, 247, 220, 200, 200, 211, 224, 209]
No_Val_Items = [212, 200, 219]
T_L = []
for Each_Label, Item in zip(Train_Labels, No_Of_Train_Items):
    T_L.append([Each_Label] * Item)
T_L = [item for sublist in T_L for item in sublist]
V_L = []
for Each_Label, Item in zip(Val_Labels, No_Val_Items):
    V_L.append([Each_Label] * Item)
V_L = [item for sublist in V_L for item in sublist]
len(T_L)
len(V_L)
label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(Unique_Labels_List)
# Since it should be a Numpy Array, we should Convert the Sequences to Numpy Array, for both Training and
# Test Labels
training_label_list = np.array(label_tokenizer.texts_to_sequences(T_L))
validation_label_list = np.array(label_tokenizer.texts_to_sequences(V_L))
print('(training_label_list[0:10]) = {}'.format((training_label_list[0:10])))
print('(validation_label_list[0:10]) = {}'.format((validation_label_list[0:10])))
print('type(training_label_list[0]) = ', type(training_label_list[0]))
print('type(validation_label_list[0]) = ', type(validation_label_list[0]))
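I would greatly appreciate it if someone could suggest how to get the training labels and validation labels in the same format; I have spent a lot of time on this.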
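Your problem is that when you convert the training data to a NumPy array, the resulting array is made up of list elements, which is why you get the error
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).
The error is subtler than it looks; some people report that they had to switch back from 2.1.0 to 2.0.0. See also: What is the difference between Numpy's array() and asarray() functions?
Personally, I would try the following:
- Use training_label_list = np.asarray(label_tokenizer.texts_to_sequences(T_L)) instead of np.array.
- According to List of lists into numpy array, you will have to force the conversion (odd, but it should work):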
x=[[1,2],[1,2,3],[1]]
y=numpy.array([numpy.array(xi) for xi in x])
type(y)
>>><type 'numpy.ndarray'>
type(y[0])
>>><type 'numpy.ndarray'>
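As a side note on the first suggestion, here is a minimal sketch (plain NumPy, all variable names are illustrative) of what differs between np.array and np.asarray: only the copy behaviour when the input is already an ndarray. Neither call changes how a ragged list of lists is interpreted, so swapping one for the other is not expected to remove the list elements by itself.

```python
import numpy as np

existing = np.array([1, 2, 3])

copied = np.array(existing)    # np.array makes a copy by default
same = np.asarray(existing)    # np.asarray returns the input when no conversion is needed

copy_is_new_object = copied is not existing
asarray_is_same_object = same is existing

# For a plain Python list, both calls build an equivalent new array.
from_list_a = np.array([[1], [2], [3]])
from_list_b = np.asarray([[1], [2], [3]])
same_result = np.array_equal(from_list_a, from_list_b)

print(copy_is_new_object, asarray_is_same_object, same_result)
```

Also worth knowing: recent NumPy releases raise a ValueError when asked to build an array from sequences of different lengths unless dtype=object is passed, so the forced conversion quoted above may behave differently depending on the installed version.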
While trying to help you with this, I found an interesting fact about NumPy conversion:
Case 1:
my_list = [[1,2],[2],[3]]
my_numpy_array = np.array(my_list)
print(type(my_numpy_array))
print(type(my_numpy_array[0]))
<class 'numpy.ndarray'>
<class 'list'>
Case 2:
my_list = [[1],[2],[3]]
my_numpy_array = np.array(my_list)
print(type(my_numpy_array))
print(type(my_numpy_array[0]))
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
Short conclusion: if the sublists differ in length, they are evidently kept as lists rather than converted to NumPy arrays.
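In the question's data, the likely source of the unequal lengths is the two-word label 'New Zealand': with its default settings Tokenizer splits on whitespace, so that label becomes two token ids while every other label becomes one. The sketch below uses hand-written sequences as stand-ins for the texts_to_sequences output (the values and the helper name sequence_lengths are assumptions for illustration) and checks the lengths before converting.

```python
import numpy as np

# Illustrative stand-ins for texts_to_sequences output:
# a two-word label such as 'New Zealand' yields two token ids.
train_sequences = [[1], [2], [7, 8], [9]]
val_sequences = [[16], [17], [18]]


def sequence_lengths(sequences):
    """Return the set of distinct sequence lengths (hypothetical helper)."""
    return {len(seq) for seq in sequences}


train_lengths = sequence_lengths(train_sequences)  # more than one length: ragged
val_lengths = sequence_lengths(val_sequences)      # a single length: rectangular

# dtype=object is passed explicitly here; older NumPy versions built this
# object array silently (as in the question), newer ones require the dtype.
ragged = np.array(train_sequences, dtype=object)
regular = np.array(val_sequences)

print(train_lengths, ragged.shape, type(ragged[0]))
print(val_lengths, regular.shape, type(regular[0]))
```

A length check like this, run before np.array, makes the mismatch visible at the point where it is introduced rather than later inside Model.fit.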
I tested your code, and it works now:
training_label_seq = np.asarray(label_tokenizer.texts_to_sequences(T_L))
training_label_seq = np.array([np.array(training_element) for training_element in training_label_seq])
validation_label_seq = np.asarray(label_tokenizer.texts_to_sequences(V_L))
print('(training_label_seq[0:10]) = {}'.format((training_label_seq[0:10])))
print('(validation_label_seq[0:10]) = {}'.format((validation_label_seq[0:10])))
print('type(training_label_list[0]) = ', type(training_label_seq[0]))
print('type(validation_label_seq[0]) = ', type(validation_label_seq[0]))
(training_label_seq[0:10]) = [array([1]) array([1]) array([1]) array([1]) array([1]) array([1])
array([1]) array([1]) array([1]) array([1])]
(validation_label_seq[0:10]) = [[16]
[16]
[16]
[16]
[16]
[16]
[16]
[16]
[16]
[16]]
type(training_label_list[0]) = <class 'numpy.ndarray'>
type(validation_label_seq[0]) = <class 'numpy.ndarray'>
Replacing np.array with np.hstack solved this problem for me; see this Stack Overflow Answer.
Now, the correct output is
(training_label_seq[0:10]) = [1 1 1 1 1 1 1 1 1 1]
(validation_label_seq[0:10]) = [16 16 16 16 16 16 16 16 16 16]
type(training_label_list[0]) = <class 'numpy.int64'>
type(validation_label_seq[0]) = <class 'numpy.int64'>
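One caveat with this approach, sketched below under the assumption that some label tokenizes to more than one id (as a two-word label would): np.hstack concatenates every sequence into one flat array, so the result can end up longer than the number of labels and no longer line up with the samples. The sequences and the label_to_id mapping are illustrative and not part of the original code; the mapping shows one way to get exactly one integer per label.

```python
import numpy as np

# Illustrative sequences; [7, 8] mimics a two-word label.
sequences = [[1], [2], [7, 8], [9]]

flat = np.hstack(sequences)                   # all token ids in one 1-D array
lengths_match = len(flat) == len(sequences)   # False when any label has 2+ tokens

# Hypothetical alternative: map each whole label string to a single id.
unique_labels = ['India', 'USA', 'New Zealand', 'Israel']
labels = ['India', 'India', 'New Zealand', 'USA']
label_to_id = {label: idx for idx, label in enumerate(unique_labels)}
label_ids = np.array([label_to_id[label] for label in labels])

print(flat, lengths_match)
print(label_ids)
```

Comparing len(flat) with the number of input labels is a cheap sanity check to run before passing the array to Model.fit.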
The working code is in this Google Colab Link.
The working code is also given below (in case the link above does not work):
!pip install tensorflow==2.1
# For Preprocessing the Text => To Tokenize the Text
from tensorflow.keras.preprocessing.text import Tokenizer
# If the Two Articles are of different length, pad_sequences will make the length equal
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Package for performing Numerical Operations
import numpy as np
Unique_Labels_List = ['India', 'USA', 'Australia', 'Germany', 'Bhutan', 'Nepal', 'New Zealand', 'Israel', 'Canada', 'France', 'Ireland', 'Poland', 'Egypt', 'Greece', 'China', 'Spain', 'Mexico']
Train_Labels = Unique_Labels_List[0:14]
#print('Train Labels = {}'.format(Train_Labels))
Val_Labels = Unique_Labels_List[14:]
#print('Val_Labels = {}'.format(Val_Labels))
No_Of_Train_Items = [248, 200, 200, 218, 248, 248, 249, 247, 220, 200, 200, 211, 224, 209]
No_Val_Items = [212, 200, 219]
T_L = []
for Each_Label, Item in zip(Train_Labels, No_Of_Train_Items):
    T_L.append([Each_Label] * Item)
T_L = [item for sublist in T_L for item in sublist]
V_L = []
for Each_Label, Item in zip(Val_Labels, No_Val_Items):
    V_L.append([Each_Label] * Item)
V_L = [item for sublist in V_L for item in sublist]
len(T_L)
len(V_L)
label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(Unique_Labels_List)
# Since it should be a Numpy Array, we should Convert the Sequences to Numpy Array, for both Training and
# Test Labels
training_label_list = np.hstack(label_tokenizer.texts_to_sequences(T_L))
validation_label_list = np.hstack(label_tokenizer.texts_to_sequences(V_L))
print('(training_label_list[0:10]) = {}'.format((training_label_list[0:10])))
print('(validation_label_list[0:10]) = {}'.format((validation_label_list[0:10])))
print('type(training_label_list[0]) = ', type(training_label_list[0]))
print('type(validation_label_list[0]) = ', type(validation_label_list[0]))