Solving SVHN using Tensorflow Error: "Resource exhausted: OOM when allocating tensor.."
I am trying to solve the "SVHN" dataset classification problem using the convolutional neural network provided here: https://www.tensorflow.org/versions/0.6.0/tutorials/deep_cnn/index.html#convolutional-neural-networks
This is how I read and format the data:
import scipy.io
import tensorflow as tf

read_input = scipy.io.loadmat('data/train_32x32.mat')
converted_label = tf.cast(read_input['y'], tf.int32)
converted_image = tf.cast(read_input['X'], tf.float32)
# SVHN stores X as (height, width, channels, num_examples); move the example axis first.
reshaped_image = tf.transpose(converted_image, [3, 0, 1, 2])
In the _generate_image_and_label_batch function I modified the code slightly, since the input images in train_32x32.mat and test_32x32.mat are already in 4D format:
images, label_batch = tf.train.shuffle_batch(
    [image, label],
    batch_size=FLAGS.batch_size,
    enqueue_many=True,  # the input tensors already contain a whole batch of examples
    num_threads=num_preprocess_threads,
    capacity=min_queue_examples + 3 * FLAGS.batch_size,
    min_after_dequeue=min_queue_examples)
I am getting these errors:
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 4
W tensorflow/core/kernels/cast_op.cc:66] Resource exhausted: OOM when allocating tensor with shapedim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x7f1c180015a0 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
[[Node: Cast_1 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/cpu:0"](Cast_1/x)]]
W tensorflow/core/kernels/cast_op.cc:66] Resource exhausted: OOM when allocating tensor with shapedim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x7f1c280ea810 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 32 } dim { size: 32 } dim { size: 3 } dim { size: 73257 }
[[Node: Cast_1 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/cpu:0"](Cast_1/x)]]
Killed
Please let me know if I've made a mistake anywhere in my logic.
Thanks,
Sarah
Note that your data contains 32*32*3*73257 entries, which is about 900 MB as floats and 1800 MB as doubles. So you allocate 1800 MB at read_input['X'], then TF converts it into a tensor to feed into cast, which is another 900 MB. The output of tf.cast is another 900 MB tensor, and the output of transpose is yet another 900 MB tensor.
So you may need 4.5 GB of RAM for this to run.
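As a quick sanity check on those numbers (a sketch; the exact dtype that scipy.io.loadmat returns depends on how the .mat file was saved):

entries = 32 * 32 * 3 * 73257    # elements in the SVHN training array
print(entries * 4 / 2.0**20)     # ~859 MB per float32 copy
print(entries * 8 / 2.0**20)     # ~1717 MB per float64 copy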
Generally this approach (converting to Constant nodes) is only recommended for "small" problems. There is a hard 2 GB limit on what you can put into a constant, and even smaller values (i.e. >100 MB) can cause problems if you move to a GPU (example here).
A more scalable approach is to use an input pipeline, as in the CIFAR example.
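Until you move to a full input pipeline, a cheaper variant of your current approach (a sketch, assuming the single float32 array still fits in RAM; this is not the CIFAR pipeline itself) is to do the cast and transpose once in NumPy before the data ever enters the graph, and to feed it through a placeholder instead of baking it into a Constant node:

import numpy as np
import scipy.io
import tensorflow as tf

read_input = scipy.io.loadmat('data/train_32x32.mat')

# One ~900 MB float32 copy in NumPy instead of the Constant + cast +
# transpose chain of copies inside the TF graph.
images_np = np.transpose(read_input['X'], [3, 0, 1, 2]).astype(np.float32)
labels_np = read_input['y'].astype(np.int32)

# A placeholder keeps the array out of the GraphDef, so the 2 GB
# constant limit no longer applies.
images = tf.placeholder(tf.float32, shape=images_np.shape)
labels = tf.placeholder(tf.int32, shape=labels_np.shape)
# Build the model on `images`/`labels`, then pass
# feed_dict={images: images_np, labels: labels_np} to session.run().

This keeps only one extra copy of the data in memory and sidesteps the constant-size limits mentioned above.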