如何在 Tensorflow 2.x 中打印准确性和其他指标?
How to print Accuracy and other metrics in Tensorflow 2.x?
我正在使用自定义数据集进行对象检测项目。我的问题是,真的很难理解我应该如何以及在何处进行更改以评估我的训练集(准确性、mAP 指标)。我在 colab 上使用 tensorflow 2.3.0,现在我只得到损失值,如下图所示:
.
此外,这是我的张量板的图片:
.
为了训练模型,我使用 model_main_tf2.py、
!python /content/gdrive/My\ Drive/models/research/object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir={model_dir} \
--alsologtostderr \
--num_train_steps={num_steps} \
--sample_1_of_n_eval_examples=10 \
--eval_training_data=True \
--sample_1_of_n_eval_on_train_examples=10 \
--num_eval_steps={num_eval_steps}
在我的配置文件中: eval_config { metrics_set: "coco_detection_metrics" use_moving_averages: false }
我已经尝试了各种方法,就像 eval.py(我读到它适用于 tensorflow 1.x),但是我遇到了很多错误,或者就像对象检测中的其他脚本一样来自 github、object_detection(repository).
的存储库
目前最重要的是准确性。我发现loss大概定义在model_lib_v2.py at 845-858 line:
eval_metrics = {}
for evaluator in evaluators:
eval_metrics.update(evaluator.evaluate())
for loss_key in loss_metrics:
eval_metrics[loss_key] = loss_metrics[loss_key].result()
eval_metrics = {str(k): v for k, v in eval_metrics.items()}
tf.logging.info('Eval metrics at step %d', global_step)
for k in eval_metrics:
tf.compat.v2.summary.scalar(k, eval_metrics[k], step=global_step)
tf.logging.info('\t+ %s: %f', k, eval_metrics[k])
return eval_metrics
但我不知道如何更改代码以增加准确性。
如果有帮助,我使用 ssd_mobilenet_v2_fpnlite_640x640 模型和 gdrive 来加载数据和 运行 脚本。
更新:
我使用的配置文件如下:
model {
ssd {
num_classes: 18
image_resizer {
fixed_shape_resizer {
height: 640
width: 640
}
}
feature_extractor {
type: "ssd_mobilenet_v2_fpn_keras"
depth_multiplier: 1.0
min_depth: 16
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.9999998989515007e-05
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.009999999776482582
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019165
scale: true
epsilon: 0.0010000000474974513
}
}
use_depthwise: true
override_base_feature_extractor_hyperparams: true
fpn {
min_level: 3
max_level: 7
additional_layer_depth: 128
}
}
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.75
unmatched_threshold: 0.25
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.9999998989515007e-05
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.009999999776482582
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019165
scale: true
epsilon: 0.0010000000474974513
}
}
depth: 128
num_layers_before_predictor: 4
kernel_size: 3
class_prediction_bias_init: -4.599999904632568
share_prediction_tower: true
use_depthwise: true
}
}
anchor_generator {
multiscale_anchor_generator {
min_level: 3
max_level: 7
anchor_scale: 4.0
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
scales_per_octave: 2
}
}
post_processing {
batch_non_max_suppression {
score_threshold: 9.99999993922529e-09
iou_threshold: 0.6000000238418579
max_detections_per_class: 100
max_total_detections: 100
use_static_shapes: false
}
score_converter: SIGMOID
}
normalize_loss_by_num_matches: true
loss {
localization_loss {
weighted_smooth_l1 {
}
}
classification_loss {
weighted_sigmoid_focal {
gamma: 2.0
alpha: 0.25
}
}
classification_weight: 1.0
localization_weight: 1.0
}
encode_background_as_zeros: true
normalize_loc_loss_by_codesize: true
inplace_batchnorm_update: true
freeze_batchnorm: false
}
}
train_config {
batch_size: 16
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_crop_image {
min_object_covered: 0.0
min_aspect_ratio: 0.75
max_aspect_ratio: 3.0
min_area: 0.75
max_area: 1.0
overlap_thresh: 0.0
}
}
sync_replicas: true
optimizer {
momentum_optimizer {
learning_rate {
cosine_decay_learning_rate {
learning_rate_base: 0.07999999821186066
total_steps: 50000
warmup_learning_rate: 0.026666000485420227
warmup_steps: 1000
}
}
momentum_optimizer_value: 0.8999999761581421
}
use_moving_average: false
}
fine_tune_checkpoint: "/content/gdrive/My Drive/models/research/deploy/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/ckpt-0"
num_steps: 20000
startup_delay_steps: 0.0
replicas_to_aggregate: 8
max_number_of_boxes: 1
unpad_groundtruth_tensors: false
fine_tune_checkpoint_type: "detection"
fine_tune_checkpoint_version: V2
}
train_input_reader {
label_map_path: "/content/gdrive/My Drive/models/research/deploy/label_map.pbtxt"
tf_record_input_reader {
input_path: "/content/gdrive/My Drive/models/research/object_detection/data/train.record"
}
}
eval_config {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
}
eval_input_reader {
label_map_path: "/content/gdrive/My Drive/models/research/deploy/label_map.pbtxt"
shuffle: false
num_epochs: 1
tf_record_input_reader {
input_path: "/content/gdrive/My Drive/models/research/object_detection/data/test.record"
}
}
尝试使用这段代码后:
!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--checkpoint_dir='object_detection/training' \
--alsologtostderr
我最终设法得到了一些不同于零的结果(也许关于检查点的某事发生了变化)。结果:
2020-09-20 10:11:00.899773: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0920 10:11:02.925597 140679676843904 model_lib_v2.py:925] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0920 10:11:02.925838 140679676843904 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0920 10:11:02.925928 140679676843904 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0920 10:11:02.926012 140679676843904 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0920 10:11:02.926136 140679676843904 model_lib_v2.py:940] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
2020-09-20 10:11:02.934837: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-20 10:11:02.971958: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.972512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
2020-09-20 10:11:02.972554: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:02.974022: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:02.975587: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-20 10:11:02.975923: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-20 10:11:02.980298: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-20 10:11:02.981636: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-20 10:11:02.985271: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-20 10:11:02.985387: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.985948: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.986434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-20 10:11:02.991809: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2200000000 Hz
2020-09-20 10:11:02.991994: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12e1480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-20 10:11:02.992021: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-20 10:11:03.103553: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.104214: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12e1640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-20 10:11:03.104245: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-09-20 10:11:03.104426: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.104982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
2020-09-20 10:11:03.105024: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:03.105075: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:03.105097: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-20 10:11:03.105121: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-20 10:11:03.105140: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-20 10:11:03.105158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-20 10:11:03.105181: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-20 10:11:03.105258: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.105829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.106297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-20 10:11:03.106340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:03.719107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-20 10:11:03.719168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-09-20 10:11:03.719181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-09-20 10:11:03.719388: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.720044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.720576: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-09-20 10:11:03.720625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13962 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0920 10:11:03.765423 140679676843904 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0920 10:11:03.768426 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0920 10:11:03.799010 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0920 10:11:07.302673 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0920 10:11:08.378596 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Waiting for new checkpoint at object_detection/training
I0920 10:11:10.765872 140679676843904 checkpoint_utils.py:125] Waiting for new checkpoint at object_detection/training
INFO:tensorflow:Found new checkpoint at object_detection/training/ckpt-7
I0920 10:11:10.769416 140679676843904 checkpoint_utils.py:134] Found new checkpoint at object_detection/training/ckpt-7
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py:702: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
W0920 10:11:10.791586 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py:702: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/eval_util.py:878: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0920 10:11:51.835231 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/eval_util.py:878: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2020-09-20 10:11:56.921874: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:57.287284: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
INFO:tensorflow:Finished eval step 0
I0920 10:11:59.687197 140679676843904 model_lib_v2.py:799] Finished eval step 0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/utils/visualization_utils.py:617: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
W0920 10:11:59.819776 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/utils/visualization_utils.py:617: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
INFO:tensorflow:Performing evaluation on 43 images.
I0920 10:12:04.423586 140679676843904 coco_evaluation.py:282] Performing evaluation on 43 images.
creating index...
index created!
INFO:tensorflow:Loading and preparing annotation results...
I0920 10:12:04.423980 140679676843904 coco_tools.py:116] Loading and preparing annotation results...
INFO:tensorflow:DONE (t=0.00s)
I0920 10:12:04.426340 140679676843904 coco_tools.py:138] DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.17s).
Accumulating evaluation results...
DONE (t=0.04s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.728
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.806
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.728
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.784
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.784
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.784
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.784
INFO:tensorflow:Eval metrics at step 5000
I0920 10:12:04.649628 140679676843904 model_lib_v2.py:853] Eval metrics at step 5000
INFO:tensorflow: + DetectionBoxes_Precision/mAP: 0.727510
I0920 10:12:04.672720 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP: 0.727510
INFO:tensorflow: + DetectionBoxes_Precision/mAP@.50IOU: 0.999325
I0920 10:12:04.675546 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP@.50IOU: 0.999325
INFO:tensorflow: + DetectionBoxes_Precision/mAP@.75IOU: 0.806042
I0920 10:12:04.676922 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP@.75IOU: 0.806042
INFO:tensorflow: + DetectionBoxes_Precision/mAP (small): -1.000000
I0920 10:12:04.678196 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP (small): -1.000000
INFO:tensorflow: + DetectionBoxes_Precision/mAP (medium): -1.000000
I0920 10:12:04.679378 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP (medium): -1.000000
INFO:tensorflow: + DetectionBoxes_Precision/mAP (large): 0.727510
I0920 10:12:04.680430 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP (large): 0.727510
INFO:tensorflow: + DetectionBoxes_Recall/AR@1: 0.783721
I0920 10:12:04.681638 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@1: 0.783721
INFO:tensorflow: + DetectionBoxes_Recall/AR@10: 0.783721
I0920 10:12:04.682775 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@10: 0.783721
INFO:tensorflow: + DetectionBoxes_Recall/AR@100: 0.783721
I0920 10:12:04.683973 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@100: 0.783721
INFO:tensorflow: + DetectionBoxes_Recall/AR@100 (small): -1.000000
I0920 10:12:04.685043 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@100 (small): -1.000000
INFO:tensorflow: + DetectionBoxes_Recall/AR@100 (medium): -1.000000
I0920 10:12:04.686104 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@100 (medium): -1.000000
INFO:tensorflow: + DetectionBoxes_Recall/AR@100 (large): 0.783721
I0920 10:12:04.687381 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@100 (large): 0.783721
INFO:tensorflow: + Loss/localization_loss: 0.089791
I0920 10:12:04.688443 140679676843904 model_lib_v2.py:856] + Loss/localization_loss: 0.089791
INFO:tensorflow: + Loss/classification_loss: 0.336598
I0920 10:12:04.689506 140679676843904 model_lib_v2.py:856] + Loss/classification_loss: 0.336598
INFO:tensorflow: + Loss/regularization_loss: 0.117549
I0920 10:12:04.690544 140679676843904 model_lib_v2.py:856] + Loss/regularization_loss: 0.117549
INFO:tensorflow: + Loss/total_loss: 0.543938
I0920 10:12:04.691550 140679676843904 model_lib_v2.py:856] + Loss/total_loss: 0.543938
INFO:tensorflow:Waiting for new checkpoint at object_detection/training
I0920 10:16:10.791572 140679676843904 checkpoint_utils.py:125] Waiting for new checkpoint at object_detection/training
Traceback (most recent call last):
File "object_detection/model_main_tf2.py", line 114, in <module>
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "object_detection/model_main_tf2.py", line 89, in main
wait_interval=300, timeout=FLAGS.eval_timeout)
File "/usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py", line 966, in eval_continuously
checkpoint_dir, timeout=timeout, min_interval_secs=wait_interval):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 184, in checkpoints_iterator
checkpoint_dir, checkpoint_path, timeout=timeout)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 132, in wait_for_new_checkpoint
time.sleep(seconds_to_sleep)
KeyboardInterrupt
但是,tensorboard 尚未更新这些指标(AP、AR 等)。可能我做错了。关于准确性,我仍然没有取得任何进展,所以如果有人知道一些事情,那将非常有帮助。谢谢!
您似乎已经找到了问题的解决方案。太好了,但您只需要再等一会儿,评估指标就会显示在您的张量板上。
从默认设置开始,逐步进行培训。
第一个 运行 训练为 dameon:
!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--alsologtostderr &
然后 运行 在不同的控制台或 shell 上进行评估。评估将自动选择较新的检查点(默认情况下,这是每 1000 步一次。因此您必须等待训练达到第 1000 步 + 等待评估完成 1 个时期或 test.record 中的图像数量)
!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--alsologtostderr \
--checkpoint_dir='object_detection/training'
这是您可以用来可视化的示例配置。
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
batch_size: 1
num_visualizations: 10
max_num_boxes_to_visualize: 5
visualize_groundtruth_boxes: true
eval_interval_secs: 30
}
eval_input_reader: {
label_map_path: "path/to/label_map.pbtxt"
shuffle: true
queue_capacity: 100 #depending on your GPU/TPU/CPU
num_epochs: 1 #what encompasses when to upload results for mAP and AR, if you rather have a number provide that under [num_examples <= test.record total size else error]
tf_record_input_reader {
input_path: "path/to/test.record"
}
}
这里有几点需要注意:TF20 issue
batch_size: 1 #This has to be 1. TF2 throws errors
num_epochs: 1 #provide this as 1
eval_interval_secs: #something based on your dataset & gpu config. Default is 300
num_epochs: 1 #what encompasses when to upload results for mAP and AR, if you rather have a number provide that under [num_examples <= test.record total size else error]
最重要的是,等待。等待至少两次评估完成(通常是第一次,它基本上 运行s 用于第 0 步,这是无用的(因为你的训练必须达到检查点(默认如上文所述,第 1000 次))
我正在使用自定义数据集进行对象检测项目。我的问题是,真的很难理解我应该如何以及在何处进行更改以评估我的训练集(准确性、mAP 指标)。我在 colab 上使用 tensorflow 2.3.0,现在我只得到损失值,如下图所示:
此外,这是我的张量板的图片:
为了训练模型,我使用 model_main_tf2.py、
!python /content/gdrive/My\ Drive/models/research/object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir={model_dir} \
--alsologtostderr \
--num_train_steps={num_steps} \
--sample_1_of_n_eval_examples=10 \
--eval_training_data=True \
--sample_1_of_n_eval_on_train_examples=10 \
--num_eval_steps={num_eval_steps}
在我的配置文件中: eval_config { metrics_set: "coco_detection_metrics" use_moving_averages: false }
我已经尝试了各种方法,就像 eval.py(我读到它适用于 tensorflow 1.x),但是我遇到了很多错误,或者就像对象检测中的其他脚本一样来自 github、object_detection(repository).
的存储库目前最重要的是准确性。我发现loss大概定义在model_lib_v2.py at 845-858 line:
eval_metrics = {}
for evaluator in evaluators:
eval_metrics.update(evaluator.evaluate())
for loss_key in loss_metrics:
eval_metrics[loss_key] = loss_metrics[loss_key].result()
eval_metrics = {str(k): v for k, v in eval_metrics.items()}
tf.logging.info('Eval metrics at step %d', global_step)
for k in eval_metrics:
tf.compat.v2.summary.scalar(k, eval_metrics[k], step=global_step)
tf.logging.info('\t+ %s: %f', k, eval_metrics[k])
return eval_metrics
但我不知道如何更改代码以增加准确性。
如果有帮助,我使用 ssd_mobilenet_v2_fpnlite_640x640 模型和 gdrive 来加载数据和 运行 脚本。
更新: 我使用的配置文件如下:
model {
ssd {
num_classes: 18
image_resizer {
fixed_shape_resizer {
height: 640
width: 640
}
}
feature_extractor {
type: "ssd_mobilenet_v2_fpn_keras"
depth_multiplier: 1.0
min_depth: 16
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.9999998989515007e-05
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.009999999776482582
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019165
scale: true
epsilon: 0.0010000000474974513
}
}
use_depthwise: true
override_base_feature_extractor_hyperparams: true
fpn {
min_level: 3
max_level: 7
additional_layer_depth: 128
}
}
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.75
unmatched_threshold: 0.25
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.9999998989515007e-05
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.009999999776482582
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019165
scale: true
epsilon: 0.0010000000474974513
}
}
depth: 128
num_layers_before_predictor: 4
kernel_size: 3
class_prediction_bias_init: -4.599999904632568
share_prediction_tower: true
use_depthwise: true
}
}
anchor_generator {
multiscale_anchor_generator {
min_level: 3
max_level: 7
anchor_scale: 4.0
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
scales_per_octave: 2
}
}
post_processing {
batch_non_max_suppression {
score_threshold: 9.99999993922529e-09
iou_threshold: 0.6000000238418579
max_detections_per_class: 100
max_total_detections: 100
use_static_shapes: false
}
score_converter: SIGMOID
}
normalize_loss_by_num_matches: true
loss {
localization_loss {
weighted_smooth_l1 {
}
}
classification_loss {
weighted_sigmoid_focal {
gamma: 2.0
alpha: 0.25
}
}
classification_weight: 1.0
localization_weight: 1.0
}
encode_background_as_zeros: true
normalize_loc_loss_by_codesize: true
inplace_batchnorm_update: true
freeze_batchnorm: false
}
}
train_config {
batch_size: 16
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_crop_image {
min_object_covered: 0.0
min_aspect_ratio: 0.75
max_aspect_ratio: 3.0
min_area: 0.75
max_area: 1.0
overlap_thresh: 0.0
}
}
sync_replicas: true
optimizer {
momentum_optimizer {
learning_rate {
cosine_decay_learning_rate {
learning_rate_base: 0.07999999821186066
total_steps: 50000
warmup_learning_rate: 0.026666000485420227
warmup_steps: 1000
}
}
momentum_optimizer_value: 0.8999999761581421
}
use_moving_average: false
}
fine_tune_checkpoint: "/content/gdrive/My Drive/models/research/deploy/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/ckpt-0"
num_steps: 20000
startup_delay_steps: 0.0
replicas_to_aggregate: 8
max_number_of_boxes: 1
unpad_groundtruth_tensors: false
fine_tune_checkpoint_type: "detection"
fine_tune_checkpoint_version: V2
}
train_input_reader {
label_map_path: "/content/gdrive/My Drive/models/research/deploy/label_map.pbtxt"
tf_record_input_reader {
input_path: "/content/gdrive/My Drive/models/research/object_detection/data/train.record"
}
}
eval_config {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
}
eval_input_reader {
label_map_path: "/content/gdrive/My Drive/models/research/deploy/label_map.pbtxt"
shuffle: false
num_epochs: 1
tf_record_input_reader {
input_path: "/content/gdrive/My Drive/models/research/object_detection/data/test.record"
}
}
尝试使用这段代码后:
!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--checkpoint_dir='object_detection/training' \
--alsologtostderr
我最终设法得到了一些不同于零的结果(也许关于检查点的某事发生了变化)。结果:
2020-09-20 10:11:00.899773: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0920 10:11:02.925597 140679676843904 model_lib_v2.py:925] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0920 10:11:02.925838 140679676843904 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0920 10:11:02.925928 140679676843904 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0920 10:11:02.926012 140679676843904 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0920 10:11:02.926136 140679676843904 model_lib_v2.py:940] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
2020-09-20 10:11:02.934837: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-20 10:11:02.971958: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.972512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
2020-09-20 10:11:02.972554: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:02.974022: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:02.975587: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-20 10:11:02.975923: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-20 10:11:02.980298: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-20 10:11:02.981636: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-20 10:11:02.985271: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-20 10:11:02.985387: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.985948: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.986434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-20 10:11:02.991809: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2200000000 Hz
2020-09-20 10:11:02.991994: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12e1480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-20 10:11:02.992021: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-20 10:11:03.103553: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.104214: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12e1640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-20 10:11:03.104245: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-09-20 10:11:03.104426: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.104982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
2020-09-20 10:11:03.105024: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:03.105075: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:03.105097: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-20 10:11:03.105121: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-20 10:11:03.105140: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-20 10:11:03.105158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-20 10:11:03.105181: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-20 10:11:03.105258: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.105829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.106297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-20 10:11:03.106340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:03.719107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-20 10:11:03.719168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-09-20 10:11:03.719181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-09-20 10:11:03.719388: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.720044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.720576: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-09-20 10:11:03.720625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13962 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0920 10:11:03.765423 140679676843904 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0920 10:11:03.768426 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0920 10:11:03.799010 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0920 10:11:07.302673 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0920 10:11:08.378596 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Waiting for new checkpoint at object_detection/training
I0920 10:11:10.765872 140679676843904 checkpoint_utils.py:125] Waiting for new checkpoint at object_detection/training
INFO:tensorflow:Found new checkpoint at object_detection/training/ckpt-7
I0920 10:11:10.769416 140679676843904 checkpoint_utils.py:134] Found new checkpoint at object_detection/training/ckpt-7
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py:702: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
W0920 10:11:10.791586 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py:702: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/eval_util.py:878: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0920 10:11:51.835231 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/eval_util.py:878: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2020-09-20 10:11:56.921874: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:57.287284: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
INFO:tensorflow:Finished eval step 0
I0920 10:11:59.687197 140679676843904 model_lib_v2.py:799] Finished eval step 0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/utils/visualization_utils.py:617: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
W0920 10:11:59.819776 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/utils/visualization_utils.py:617: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
INFO:tensorflow:Performing evaluation on 43 images.
I0920 10:12:04.423586 140679676843904 coco_evaluation.py:282] Performing evaluation on 43 images.
creating index...
index created!
INFO:tensorflow:Loading and preparing annotation results...
I0920 10:12:04.423980 140679676843904 coco_tools.py:116] Loading and preparing annotation results...
INFO:tensorflow:DONE (t=0.00s)
I0920 10:12:04.426340 140679676843904 coco_tools.py:138] DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.17s).
Accumulating evaluation results...
DONE (t=0.04s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.728
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.999
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.806
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.728
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.784
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.784
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.784
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.784
INFO:tensorflow:Eval metrics at step 5000
I0920 10:12:04.649628 140679676843904 model_lib_v2.py:853] Eval metrics at step 5000
INFO:tensorflow: + DetectionBoxes_Precision/mAP: 0.727510
I0920 10:12:04.672720 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP: 0.727510
INFO:tensorflow: + DetectionBoxes_Precision/mAP@.50IOU: 0.999325
I0920 10:12:04.675546 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP@.50IOU: 0.999325
INFO:tensorflow: + DetectionBoxes_Precision/mAP@.75IOU: 0.806042
I0920 10:12:04.676922 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP@.75IOU: 0.806042
INFO:tensorflow: + DetectionBoxes_Precision/mAP (small): -1.000000
I0920 10:12:04.678196 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP (small): -1.000000
INFO:tensorflow: + DetectionBoxes_Precision/mAP (medium): -1.000000
I0920 10:12:04.679378 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP (medium): -1.000000
INFO:tensorflow: + DetectionBoxes_Precision/mAP (large): 0.727510
I0920 10:12:04.680430 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Precision/mAP (large): 0.727510
INFO:tensorflow: + DetectionBoxes_Recall/AR@1: 0.783721
I0920 10:12:04.681638 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@1: 0.783721
INFO:tensorflow: + DetectionBoxes_Recall/AR@10: 0.783721
I0920 10:12:04.682775 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@10: 0.783721
INFO:tensorflow: + DetectionBoxes_Recall/AR@100: 0.783721
I0920 10:12:04.683973 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@100: 0.783721
INFO:tensorflow: + DetectionBoxes_Recall/AR@100 (small): -1.000000
I0920 10:12:04.685043 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@100 (small): -1.000000
INFO:tensorflow: + DetectionBoxes_Recall/AR@100 (medium): -1.000000
I0920 10:12:04.686104 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@100 (medium): -1.000000
INFO:tensorflow: + DetectionBoxes_Recall/AR@100 (large): 0.783721
I0920 10:12:04.687381 140679676843904 model_lib_v2.py:856] + DetectionBoxes_Recall/AR@100 (large): 0.783721
INFO:tensorflow: + Loss/localization_loss: 0.089791
I0920 10:12:04.688443 140679676843904 model_lib_v2.py:856] + Loss/localization_loss: 0.089791
INFO:tensorflow: + Loss/classification_loss: 0.336598
I0920 10:12:04.689506 140679676843904 model_lib_v2.py:856] + Loss/classification_loss: 0.336598
INFO:tensorflow: + Loss/regularization_loss: 0.117549
I0920 10:12:04.690544 140679676843904 model_lib_v2.py:856] + Loss/regularization_loss: 0.117549
INFO:tensorflow: + Loss/total_loss: 0.543938
I0920 10:12:04.691550 140679676843904 model_lib_v2.py:856] + Loss/total_loss: 0.543938
INFO:tensorflow:Waiting for new checkpoint at object_detection/training
I0920 10:16:10.791572 140679676843904 checkpoint_utils.py:125] Waiting for new checkpoint at object_detection/training
Traceback (most recent call last):
File "object_detection/model_main_tf2.py", line 114, in <module>
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "object_detection/model_main_tf2.py", line 89, in main
wait_interval=300, timeout=FLAGS.eval_timeout)
File "/usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py", line 966, in eval_continuously
checkpoint_dir, timeout=timeout, min_interval_secs=wait_interval):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 184, in checkpoints_iterator
checkpoint_dir, checkpoint_path, timeout=timeout)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 132, in wait_for_new_checkpoint
time.sleep(seconds_to_sleep)
KeyboardInterrupt
但是,tensorboard 尚未更新这些指标(AP、AR 等)。可能我做错了。关于准确性,我仍然没有取得任何进展,所以如果有人知道一些事情,那将非常有帮助。谢谢!
您似乎已经找到了问题的解决方案。太好了,但您只需要再等一会儿,评估指标就会显示在您的张量板上。
从默认设置开始,逐步进行培训。
第一个 运行 训练为 dameon:
!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--alsologtostderr &
然后 运行 在不同的控制台或 shell 上进行评估。评估将自动选择较新的检查点(默认情况下,这是每 1000 步一次。因此您必须等待训练达到第 1000 步 + 等待评估完成 1 个时期或 test.record 中的图像数量)
!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--alsologtostderr \
--checkpoint_dir='object_detection/training'
这是您可以用来可视化的示例配置。
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
batch_size: 1
num_visualizations: 10
max_num_boxes_to_visualize: 5
visualize_groundtruth_boxes: true
eval_interval_secs: 30
}
eval_input_reader: {
label_map_path: "path/to/label_map.pbtxt"
shuffle: true
queue_capacity: 100 #depending on your GPU/TPU/CPU
num_epochs: 1 #what encompasses when to upload results for mAP and AR, if you rather have a number provide that under [num_examples <= test.record total size else error]
tf_record_input_reader {
input_path: "path/to/test.record"
}
}
这里有几点需要注意:TF20 issue
batch_size: 1 #This has to be 1. TF2 throws errors
num_epochs: 1 #provide this as 1
eval_interval_secs: #something based on your dataset & gpu config. Default is 300
num_epochs: 1 #what encompasses when to upload results for mAP and AR, if you rather have a number provide that under [num_examples <= test.record total size else error]
最重要的是,等待。等待至少两次评估完成(通常是第一次,它基本上 运行s 用于第 0 步,这是无用的(因为你的训练必须达到检查点(默认如上文所述,第 1000 次))