Object Detection API model_main_tf2.py: Dst tensor is not initialized
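I am trying to use TensorFlow with a GPU, but I keep running into problems. I am really about to give up...
I am using the Object Detection API with TensorFlow 2.2.0, and I tried running the file model_main_tf2.py: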
python model_main_tf2.py --model_dir=/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/models/faster_rcnn_inception_resnet_v2 --pipeline_config_path=/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/models/faster_rcnn_inception_resnet_v2/pipeline.config
I get the following output:
2021-03-18 20:48:33.947464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-03-18 20:48:33.984880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:8e:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-18 20:48:33.988155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:9c:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-18 20:48:33.988792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-18 20:48:33.991147: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-18 20:48:33.993016: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-18 20:48:33.993360: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-18 20:48:33.995848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-18 20:48:33.997723: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-18 20:48:34.003189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-18 20:48:34.017701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
2021-03-18 20:48:34.018129: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-03-18 20:48:34.042797: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3000000000 Hz
2021-03-18 20:48:34.060539: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe080000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-18 20:48:34.060586: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-03-18 20:48:34.498255: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6cb0aa0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-18 20:48:34.498292: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
2021-03-18 20:48:34.498300: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla V100-PCIE-16GB, Compute Capability 7.0
2021-03-18 20:48:34.499612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:8e:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-18 20:48:34.500303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:9c:00.0 name: Tesla V100-PCIE-16GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-18 20:48:34.500390: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-18 20:48:34.500408: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-18 20:48:34.500424: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-18 20:48:34.500438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-18 20:48:34.500453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-18 20:48:34.500467: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-18 20:48:34.500482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-18 20:48:34.510455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
2021-03-18 20:48:34.510513: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-18 20:48:34.515846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-18 20:48:34.515864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0 1
2021-03-18 20:48:34.515876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N Y
2021-03-18 20:48:34.515883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1: Y N
2021-03-18 20:48:34.520362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3595 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:8e:00.0, compute capability: 7.0)
2021-03-18 20:48:34.521752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 52 MB memory) -> physical GPU (device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 0000:9c:00.0, compute capability: 7.0)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
I0318 20:48:34.543391 140628099540800 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
INFO:tensorflow:Maybe overwriting train_steps: None
I0318 20:48:34.547359 140628099540800 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0318 20:48:34.547507 140628099540800 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Reading unweighted datasets: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
I0318 20:48:36.083467 140628099540800 dataset_builder.py:163] Reading unweighted datasets: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
INFO:tensorflow:Reading record datasets for input file: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
I0318 20:48:36.085170 140628099540800 dataset_builder.py:80] Reading record datasets for input file: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
INFO:tensorflow:Number of filenames to read: 1
I0318 20:48:36.085289 140628099540800 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0318 20:48:36.085340 140628099540800 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0318 20:48:36.091829 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0318 20:48:36.120102 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:96: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0318 20:48:48.361122 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:96: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:282: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0318 20:48:56.421583 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:282: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2021-03-18 20:49:12.346383: W tensorflow/core/common_runtime/bfc_allocator.cc:434] Allocator (GPU_1_bfc) ran out of memory trying to allocate 162.68MiB (rounded to 170581504)
Current allocation summary follows.
2021-03-18 20:49:12.346462: I tensorflow/core/common_runtime/bfc_allocator.cc:934] BFCAllocator dump for GPU_1_bfc
2021-03-18 20:49:12.346493: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (256): Total Chunks:3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 48B client-requested in use in bin.
2021-03-18 20:49:12.346519: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (512): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346547: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1024): Total Chunks:1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-03-18 20:49:12.346572: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2048): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346596: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4096): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346621: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8192): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346645: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16384): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346670: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (32768): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346694: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (65536): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346719: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (131072): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346743: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (262144): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346767: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (524288): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346791: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1048576): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346816: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2097152): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346840: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4194304): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346864: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8388608): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346888: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16777216): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347077: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (33554432): TotalChunks: 1, Chunks in use: 0. 52.56MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347101: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (67108864): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347125: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (134217728): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347149: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (268435456): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347176: I tensorflow/core/common_runtime/bfc_allocator.cc:957] Bin for 162.68MiB was 128.00MiB, Chunk State:
2021-03-18 20:49:12.347197: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Next region of size 55115776
2021-03-18 20:49:12.347225: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000000 of size 256 next 1
2021-03-18 20:49:12.347247: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000100 of size 1280 next 2
2021-03-18 20:49:12.347267: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000600 of size 256 next 3
2021-03-18 20:49:12.347288: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000700 of size 256 next 4
2021-03-18 20:49:12.347309: I tensorflow/core/common_runtime/bfc_allocator.cc:990] Free at 7fda08000800 of size 55113728 next 18446744073709551615
2021-03-18 20:49:12.347329: I tensorflow/core/common_runtime/bfc_allocator.cc:995] Summary of in-use Chunks by size:
2021-03-18 20:49:12.347352: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 3 Chunks of size 256 totalling 768B
2021-03-18 20:49:12.347374: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 1280 totalling 1.2KiB
2021-03-18 20:49:12.347395: I tensorflow/core/common_runtime/bfc_allocator.cc:1002] Sum Total of in-use chunks: 2.0KiB
2021-03-18 20:49:12.347416: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] total_region_allocated_bytes_: 55115776 memory_limit_: 55115776 available bytes: 0 curr_region_allocation_bytes_: 110231552
2021-03-18 20:49:12.347444: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats:
Limit: 55115776
InUse: 2048
MaxInUse: 2048
NumAllocs: 6
MaxAllocSize: 1280
2021-03-18 20:49:12.347509: W tensorflow/core/common_runtime/bfc_allocator.cc:439] *___________________________________________________________________________________________________
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1986, in execution_mode
yield
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 655, in _next_internal
output_shapes=self._flat_output_shapes)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2363, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[{{node RemoteCall}}]] [Op:IteratorGetNext]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "model_main_tf2.py", line 134, in <module>
tf.compat.v1.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 131, in main
record_summaries=FLAGS.record_summaries)
File "/tf/EPhotoCompteur_Object_Detection/models/research/object_detection/model_lib_v2.py", line 554, in train_loop
unpad_groundtruth_tensors)
File "/tf/EPhotoCompteur_Object_Detection/models/research/object_detection/model_lib_v2.py", line 338, in load_fine_tune_checkpoint
features, labels = iter(input_dataset).next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 292, in next
return self.__next__()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 296, in __next__
return self.get_next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 316, in get_next
self._iterators[i].get_next_as_list_static_shapes(new_name))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 1112, in get_next_as_list_static_shapes
return self._iterator.get_next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py", line 581, in get_next
result.append(self._device_iterators[i].get_next())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 741, in get_next
return self._next_internal()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 661, in _next_internal
return structure.from_compatible_tensor_list(self._element_spec, ret)
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1989, in execution_mode
executor_new.wait()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/executor.py", line 67, in wait
pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[{{node RemoteCall}}]]
2021-03-18 20:49:22.355141: W tensorflow/core/common_runtime/bfc_allocator.cc:434] Allocator (GPU_1_bfc) ran out of memory trying to allocate 162.68MiB (rounded to 170581504)
Current allocation summary follows.
2021-03-18 20:49:22.355187: I tensorflow/core/common_runtime/bfc_allocator.cc:934] BFCAllocator dump for GPU_1_bfc
2021-03-18 20:49:22.355202: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (256): Total Chunks:3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 48B client-requested in use in bin.
2021-03-18 20:49:22.355211: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (512): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355220: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1024): Total Chunks:1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-03-18 20:49:22.355229: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2048): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355237: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4096): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355245: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8192): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355253: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16384): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355262: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (32768): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355270: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (65536): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355278: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (131072): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355286: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (262144): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355293: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (524288): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355301: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1048576): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355309: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2097152): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355317: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4194304): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355325: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8388608): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355333: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16777216): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355342: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (33554432): TotalChunks: 1, Chunks in use: 0. 52.56MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355350: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (67108864): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355358: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (134217728): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355367: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (268435456): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355376: I tensorflow/core/common_runtime/bfc_allocator.cc:957] Bin for 162.68MiB was 128.00MiB, Chunk State:
2021-03-18 20:49:22.355383: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Next region of size 55115776
2021-03-18 20:49:22.355396: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000000 of size 256 next 1
2021-03-18 20:49:22.355404: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000100 of size 1280 next 2
2021-03-18 20:49:22.355412: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000600 of size 256 next 3
2021-03-18 20:49:22.355418: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000700 of size 256 next 16
2021-03-18 20:49:22.355425: I tensorflow/core/common_runtime/bfc_allocator.cc:990] Free at 7fda08000800 of size 55113728 next 18446744073709551615
2021-03-18 20:49:22.355433: I tensorflow/core/common_runtime/bfc_allocator.cc:995] Summary of in-use Chunks by size:
2021-03-18 20:49:22.355441: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 3 Chunks of size 256 totalling 768B
2021-03-18 20:49:22.355449: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 1280 totalling 1.2KiB
2021-03-18 20:49:22.355456: I tensorflow/core/common_runtime/bfc_allocator.cc:1002] Sum Total of in-use chunks: 2.0KiB
2021-03-18 20:49:22.355463: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] total_region_allocated_bytes_: 55115776 memory_limit_: 55115776 available bytes: 0 curr_region_allocation_bytes_: 110231552
2021-03-18 20:49:22.355475: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats:
Limit: 55115776
InUse: 2048
MaxInUse: 34816
NumAllocs: 20
MaxAllocSize: 12800
2021-03-18 20:49:22.355503: W tensorflow/core/common_runtime/bfc_allocator.cc:439] *___________________________________________________________________________________________________
If this is a GPU memory problem, I don't know how to solve it, and I need your help ;) Thanks!
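One way to check the memory hypothesis is to read the "Created TensorFlow device" lines in the log, which report how much memory TensorFlow could claim on each GPU. The sketch below pulls those numbers out of the two lines copied from the output above. The helper names and the 163 MB threshold are illustrative assumptions (the threshold is chosen only because the failed allocation was 162.68MiB); they are not part of TensorFlow.

```python
import re

# The two "Created TensorFlow device" lines, copied from the log above.
LOG = """\
2021-03-18 20:48:34.520362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3595 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:8e:00.0, compute capability: 7.0)
2021-03-18 20:48:34.521752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 52 MB memory) -> physical GPU (device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 0000:9c:00.0, compute capability: 7.0)
"""

# Matches e.g. "device:GPU:1 with 52 MB memory".
DEVICE_RE = re.compile(r"device:GPU:(\d+) with (\d+) MB memory")


def gpu_memory_from_log(log_text):
    """Map GPU index -> megabytes TensorFlow reported for that device."""
    return {int(idx): int(mb) for idx, mb in DEVICE_RE.findall(log_text)}


def starved_gpus(memory_by_gpu, min_mb):
    """Return the GPU indices whose reported memory is below min_mb."""
    return sorted(idx for idx, mb in memory_by_gpu.items() if mb < min_mb)


memory = gpu_memory_from_log(LOG)
print(memory)                            # {0: 3595, 1: 52}
print(starved_gpus(memory, min_mb=163))  # [1]
```

Both cards are 16GB V100s, yet TensorFlow reports 3595 MB on GPU:0 and 52 MB on GPU:1. This suggests, though the log alone does not prove it, that something else was already holding most of the memory when training started. A tool such as nvidia-smi could confirm that.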
This may be caused by the batch_size in the pipeline.config file. Try reducing it to 1 and see if that works.
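As a rough sketch of that suggestion, the snippet below rewrites the first batch_size that appears after train_config in the text of a pipeline.config. The sample config and the helper name are hypothetical. The regex is a shortcut that assumes the usual `batch_size: N` layout rather than a real protobuf text-format parser, so editing the file by hand works just as well.

```python
import re

# Hypothetical excerpt of a pipeline.config; the values are placeholders.
SAMPLE_CONFIG = """\
train_config: {
  batch_size: 8
  num_steps: 25000
}
eval_config: {
  batch_size: 1
}
"""


def set_train_batch_size(config_text, batch_size):
    """Return config_text with the first batch_size after train_config replaced."""
    start = config_text.find("train_config")
    if start == -1:
        raise ValueError("no train_config block found")
    head, tail = config_text[:start], config_text[start:]
    # Replace only the first occurrence so eval_config stays untouched.
    new_tail, replaced = re.subn(
        r"(batch_size\s*:\s*)\d+", rf"\g<1>{batch_size}", tail, count=1
    )
    if replaced == 0:
        raise ValueError("no batch_size found after train_config")
    return head + new_tail


print(set_train_batch_size(SAMPLE_CONFIG, 1))
```

To apply it to a real file, read pipeline.config into a string, pass it through the function, and write the result back before rerunning model_main_tf2.py.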
INFO:tensorflow:Maybe overwriting train_steps: None
I0318 20:48:34.547359 140628099540800 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0318 20:48:34.547507 140628099540800 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Reading unweighted datasets: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
I0318 20:48:36.083467 140628099540800 dataset_builder.py:163] Reading unweighted datasets: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
INFO:tensorflow:Reading record datasets for input file: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
I0318 20:48:36.085170 140628099540800 dataset_builder.py:80] Reading record datasets for input file: ['/tf/EPhotoCompteur_Object_Detection/workspace/training_demo/annotations/TRAIN.record']
INFO:tensorflow:Number of filenames to read: 1
I0318 20:48:36.085289 140628099540800 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0318 20:48:36.085340 140628099540800 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0318 20:48:36.091829 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0318 20:48:36.120102 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:96: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0318 20:48:48.361122 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:96: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:282: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0318 20:48:56.421583 140628099540800 deprecation.py:323] From /tf/EPhotoCompteur_Object_Detection/models/research/object_detection/inputs.py:282: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2021-03-18 20:49:12.346383: W tensorflow/core/common_runtime/bfc_allocator.cc:434] Allocator (GPU_1_bfc) ran out of memory trying to allocate 162.68MiB (rounded to 170581504)
Current allocation summary follows.
2021-03-18 20:49:12.346462: I tensorflow/core/common_runtime/bfc_allocator.cc:934] BFCAllocator dump for GPU_1_bfc
2021-03-18 20:49:12.346493: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (256): Total Chunks:3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 48B client-requested in use in bin.
2021-03-18 20:49:12.346519: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (512): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346547: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1024): Total Chunks:1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-03-18 20:49:12.346572: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2048): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346596: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4096): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346621: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8192): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346645: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16384): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346670: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (32768): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346694: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (65536): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346719: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (131072): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346743: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (262144): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346767: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (524288): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346791: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1048576): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346816: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2097152): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346840: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4194304): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346864: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8388608): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.346888: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16777216): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347077: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (33554432): TotalChunks: 1, Chunks in use: 0. 52.56MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347101: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (67108864): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347125: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (134217728): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347149: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (268435456): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:12.347176: I tensorflow/core/common_runtime/bfc_allocator.cc:957] Bin for 162.68MiB was 128.00MiB, Chunk State:
2021-03-18 20:49:12.347197: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Next region of size 55115776
2021-03-18 20:49:12.347225: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000000 of size 256 next 1
2021-03-18 20:49:12.347247: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000100 of size 1280 next 2
2021-03-18 20:49:12.347267: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000600 of size 256 next 3
2021-03-18 20:49:12.347288: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000700 of size 256 next 4
2021-03-18 20:49:12.347309: I tensorflow/core/common_runtime/bfc_allocator.cc:990] Free at 7fda08000800 of size 55113728 next 18446744073709551615
2021-03-18 20:49:12.347329: I tensorflow/core/common_runtime/bfc_allocator.cc:995] Summary of in-use Chunks by size:
2021-03-18 20:49:12.347352: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 3 Chunks of size 256 totalling 768B
2021-03-18 20:49:12.347374: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 1280 totalling 1.2KiB
2021-03-18 20:49:12.347395: I tensorflow/core/common_runtime/bfc_allocator.cc:1002] Sum Total of in-use chunks: 2.0KiB
2021-03-18 20:49:12.347416: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] total_region_allocated_bytes_: 55115776 memory_limit_: 55115776 available bytes: 0 curr_region_allocation_bytes_: 110231552
2021-03-18 20:49:12.347444: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats:
Limit: 55115776
InUse: 2048
MaxInUse: 2048
NumAllocs: 6
MaxAllocSize: 1280
2021-03-18 20:49:12.347509: W tensorflow/core/common_runtime/bfc_allocator.cc:439] *___________________________________________________________________________________________________
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1986, in execution_mode
yield
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 655, in _next_internal
output_shapes=self._flat_output_shapes)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2363, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[{{node RemoteCall}}]] [Op:IteratorGetNext]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "model_main_tf2.py", line 134, in <module>
tf.compat.v1.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 131, in main
record_summaries=FLAGS.record_summaries)
File "/tf/EPhotoCompteur_Object_Detection/models/research/object_detection/model_lib_v2.py", line 554, in train_loop
unpad_groundtruth_tensors)
File "/tf/EPhotoCompteur_Object_Detection/models/research/object_detection/model_lib_v2.py", line 338, in load_fine_tune_checkpoint
features, labels = iter(input_dataset).next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 292, in next
return self.__next__()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 296, in __next__
return self.get_next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 316, in get_next
self._iterators[i].get_next_as_list_static_shapes(new_name))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 1112, in get_next_as_list_static_shapes
return self._iterator.get_next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py", line 581, in get_next
result.append(self._device_iterators[i].get_next())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 741, in get_next
return self._next_internal()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 661, in _next_internal
return structure.from_compatible_tensor_list(self._element_spec, ret)
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1989, in execution_mode
executor_new.wait()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/executor.py", line 67, in wait
pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[{{node RemoteCall}}]]
2021-03-18 20:49:22.355141: W tensorflow/core/common_runtime/bfc_allocator.cc:434] Allocator (GPU_1_bfc) ran out of memory trying to allocate 162.68MiB (rounded to 170581504)
Current allocation summary follows.
2021-03-18 20:49:22.355187: I tensorflow/core/common_runtime/bfc_allocator.cc:934] BFCAllocator dump for GPU_1_bfc
2021-03-18 20:49:22.355202: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (256): Total Chunks:3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 48B client-requested in use in bin.
2021-03-18 20:49:22.355211: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (512): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355220: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1024): Total Chunks:1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-03-18 20:49:22.355229: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2048): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355237: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4096): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355245: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8192): Total Chunks:0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355253: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16384): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355262: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (32768): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355270: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (65536): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355278: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (131072): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355286: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (262144): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355293: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (524288): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355301: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (1048576): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355309: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (2097152): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355317: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (4194304): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355325: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (8388608): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355333: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (16777216): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355342: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (33554432): TotalChunks: 1, Chunks in use: 0. 52.56MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355350: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (67108864): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355358: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (134217728): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355367: I tensorflow/core/common_runtime/bfc_allocator.cc:941] Bin (268435456): TotalChunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-03-18 20:49:22.355376: I tensorflow/core/common_runtime/bfc_allocator.cc:957] Bin for 162.68MiB was 128.00MiB, Chunk State:
2021-03-18 20:49:22.355383: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Next region of size 55115776
2021-03-18 20:49:22.355396: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000000 of size 256 next 1
2021-03-18 20:49:22.355404: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000100 of size 1280 next 2
2021-03-18 20:49:22.355412: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000600 of size 256 next 3
2021-03-18 20:49:22.355418: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7fda08000700 of size 256 next 16
2021-03-18 20:49:22.355425: I tensorflow/core/common_runtime/bfc_allocator.cc:990] Free at 7fda08000800 of size 55113728 next 18446744073709551615
2021-03-18 20:49:22.355433: I tensorflow/core/common_runtime/bfc_allocator.cc:995] Summary of in-use Chunks by size:
2021-03-18 20:49:22.355441: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 3 Chunks of size 256 totalling 768B
2021-03-18 20:49:22.355449: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 1280 totalling 1.2KiB
2021-03-18 20:49:22.355456: I tensorflow/core/common_runtime/bfc_allocator.cc:1002] Sum Total of in-use chunks: 2.0KiB
2021-03-18 20:49:22.355463: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] total_region_allocated_bytes_: 55115776 memory_limit_: 55115776 available bytes: 0 curr_region_allocation_bytes_: 110231552
2021-03-18 20:49:22.355475: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats:
Limit: 55115776
InUse: 2048
MaxInUse: 34816
NumAllocs: 20
MaxAllocSize: 12800
2021-03-18 20:49:22.355503: W tensorflow/core/common_runtime/bfc_allocator.cc:439] *___________________________________________________________________________________________________
If this is a GPU memory problem, I don't know how to solve it, and I need your help ;) Thanks!
This is probably caused by the batch_size in your pipeline.config file. Try reducing it to 1 and see whether that works.
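As a rough sketch of that change, the snippet below rewrites the first batch_size field that follows train_config in the text of a pipeline.config. The helper name set_train_batch_size and the sample config contents are hypothetical and only illustrate the edit; your real file will have many more fields, and editing it by hand works just as well.

```python
import re


def set_train_batch_size(config_text, new_size):
    """Return config_text with the first batch_size after train_config replaced.

    Hypothetical helper for illustration only; it treats pipeline.config as
    plain text and does not depend on TensorFlow or the Object Detection API.
    """
    start = config_text.find("train_config")
    if start == -1:
        raise ValueError("no train_config block found")
    head, tail = config_text[:start], config_text[start:]
    # Replace only the first batch_size that appears after train_config,
    # so a batch_size inside eval_config is left untouched.
    tail, replaced = re.subn(
        r"batch_size\s*:\s*\d+", "batch_size: {}".format(new_size), tail, count=1
    )
    if replaced == 0:
        raise ValueError("no batch_size field found after train_config")
    return head + tail


# Made-up sample, not the asker's actual configuration.
sample = """model { faster_rcnn { num_classes: 1 } }
train_config {
  batch_size: 8
  num_steps: 25000
}
eval_config {
  batch_size: 1
}
"""

print(set_train_batch_size(sample, 1))
```

To apply it to a real file, read pipeline.config into a string, pass it through the function, and write the result back before rerunning model_main_tf2.py. Note that the log above reports GPU:1 being created with only 52 MB of memory (and GPU:0 with 3595 MB), so if the error persists with batch_size: 1, it may be worth checking whether another process is already holding memory on those GPUs.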