如何在 Tensorflow 2.x 中打印准确性和其他指标？

Question

我正在使用自定义数据集进行对象检测项目。我的问题是，真的很难理解我应该如何以及在何处进行更改以评估我的训练集（准确性、mAP 指标）。我在 colab 上使用 tensorflow 2.3.0，现在我只得到损失值，如下图所示： .

此外，这是我的张量板的图片： .

为了训练模型，我使用 model_main_tf2.py、

!python /content/gdrive/My\ Drive/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path={pipeline_file} \
    --model_dir={model_dir} \
    --alsologtostderr \
    --num_train_steps={num_steps} \
    --sample_1_of_n_eval_examples=10 \
    --eval_training_data=True \
    --sample_1_of_n_eval_on_train_examples=10 \
    --num_eval_steps={num_eval_steps}

在我的配置文件中： eval_config { metrics_set: "coco_detection_metrics" use_moving_averages: false }

我已经尝试了各种方法，就像 eval.py（我读到它适用于 tensorflow 1.x），但是我遇到了很多错误，或者就像对象检测中的其他脚本一样来自 github、object_detection(repository).

的存储库

目前最重要的是准确性。我发现loss大概定义在model_lib_v2.py at 845-858 line:

 eval_metrics = {}

  for evaluator in evaluators:
    eval_metrics.update(evaluator.evaluate())
  for loss_key in loss_metrics:
    eval_metrics[loss_key] = loss_metrics[loss_key].result()

  eval_metrics = {str(k): v for k, v in eval_metrics.items()}
  tf.logging.info('Eval metrics at step %d', global_step)
  for k in eval_metrics:
    tf.compat.v2.summary.scalar(k, eval_metrics[k], step=global_step)
    tf.logging.info('\t+ %s: %f', k, eval_metrics[k])

  return eval_metrics

但我不知道如何更改代码以增加准确性。

如果有帮助，我使用 ssd_mobilenet_v2_fpnlite_640x640 模型和 gdrive 来加载数据和运行脚本。

更新： 我使用的配置文件如下：

model {
  ssd {
    num_classes: 18
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2_fpn_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.9999998989515007e-05
          }
        }
        initializer {
          random_normal_initializer {
            mean: 0.0
            stddev: 0.009999999776482582
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.996999979019165
          scale: true
          epsilon: 0.0010000000474974513
        }
      }
      use_depthwise: true
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 3
        max_level: 7
        additional_layer_depth: 128
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.75
        unmatched_threshold: 0.25
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.9999998989515007e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.009999999776482582
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.996999979019165
            scale: true
            epsilon: 0.0010000000474974513
          }
        }
        depth: 128
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.599999904632568
        share_prediction_tower: true
        use_depthwise: true
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993922529e-09
        iou_threshold: 0.6000000238418579
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 16
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.07999999821186066
          total_steps: 50000
          warmup_learning_rate: 0.026666000485420227
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.8999999761581421
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/content/gdrive/My Drive/models/research/deploy/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/ckpt-0"
  num_steps: 20000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 1
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "/content/gdrive/My Drive/models/research/deploy/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/content/gdrive/My Drive/models/research/object_detection/data/train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/content/gdrive/My Drive/models/research/deploy/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/gdrive/My Drive/models/research/object_detection/data/test.record"
  }
}

Answer 1

尝试使用这段代码后：

!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--checkpoint_dir='object_detection/training' \
--alsologtostderr

我最终设法得到了一些不同于零的结果（也许关于检查点的某事发生了变化）。结果：

2020-09-20 10:11:00.899773: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0920 10:11:02.925597 140679676843904 model_lib_v2.py:925] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0920 10:11:02.925838 140679676843904 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0920 10:11:02.925928 140679676843904 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0920 10:11:02.926012 140679676843904 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0920 10:11:02.926136 140679676843904 model_lib_v2.py:940] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
2020-09-20 10:11:02.934837: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-20 10:11:02.971958: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.972512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
2020-09-20 10:11:02.972554: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:02.974022: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:02.975587: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-20 10:11:02.975923: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-20 10:11:02.980298: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-20 10:11:02.981636: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-20 10:11:02.985271: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-20 10:11:02.985387: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.985948: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:02.986434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-20 10:11:02.991809: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2200000000 Hz
2020-09-20 10:11:02.991994: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12e1480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-20 10:11:02.992021: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-20 10:11:03.103553: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.104214: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12e1640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-20 10:11:03.104245: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-09-20 10:11:03.104426: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.104982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
2020-09-20 10:11:03.105024: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:03.105075: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:03.105097: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-20 10:11:03.105121: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-20 10:11:03.105140: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-20 10:11:03.105158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-20 10:11:03.105181: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-20 10:11:03.105258: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.105829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.106297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-20 10:11:03.106340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-20 10:11:03.719107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-20 10:11:03.719168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-20 10:11:03.719181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-20 10:11:03.719388: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.720044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-20 10:11:03.720576: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-09-20 10:11:03.720625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13962 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0920 10:11:03.765423 140679676843904 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0920 10:11:03.768426 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0920 10:11:03.799010 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0920 10:11:07.302673 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0920 10:11:08.378596 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Waiting for new checkpoint at object_detection/training
I0920 10:11:10.765872 140679676843904 checkpoint_utils.py:125] Waiting for new checkpoint at object_detection/training
INFO:tensorflow:Found new checkpoint at object_detection/training/ckpt-7
I0920 10:11:10.769416 140679676843904 checkpoint_utils.py:134] Found new checkpoint at object_detection/training/ckpt-7
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py:702: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
W0920 10:11:10.791586 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py:702: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/eval_util.py:878: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0920 10:11:51.835231 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/eval_util.py:878: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2020-09-20 10:11:56.921874: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-20 10:11:57.287284: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
INFO:tensorflow:Finished eval step 0
I0920 10:11:59.687197 140679676843904 model_lib_v2.py:799] Finished eval step 0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/object_detection/utils/visualization_utils.py:617: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
W0920 10:11:59.819776 140679676843904 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/object_detection/utils/visualization_utils.py:617: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
INFO:tensorflow:Performing evaluation on 43 images.
I0920 10:12:04.423586 140679676843904 coco_evaluation.py:282] Performing evaluation on 43 images.
creating index...
index created!
INFO:tensorflow:Loading and preparing annotation results...
I0920 10:12:04.423980 140679676843904 coco_tools.py:116] Loading and preparing annotation results...
INFO:tensorflow:DONE (t=0.00s)
I0920 10:12:04.426340 140679676843904 coco_tools.py:138] DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.17s).
Accumulating evaluation results...
DONE (t=0.04s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.728
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.999
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.806
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.728
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.784
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.784
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.784
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.784
INFO:tensorflow:Eval metrics at step 5000
I0920 10:12:04.649628 140679676843904 model_lib_v2.py:853] Eval metrics at step 5000
INFO:tensorflow:    + DetectionBoxes_Precision/mAP: 0.727510
I0920 10:12:04.672720 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Precision/mAP: 0.727510
INFO:tensorflow:    + DetectionBoxes_Precision/mAP@.50IOU: 0.999325
I0920 10:12:04.675546 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Precision/mAP@.50IOU: 0.999325
INFO:tensorflow:    + DetectionBoxes_Precision/mAP@.75IOU: 0.806042
I0920 10:12:04.676922 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Precision/mAP@.75IOU: 0.806042
INFO:tensorflow:    + DetectionBoxes_Precision/mAP (small): -1.000000
I0920 10:12:04.678196 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Precision/mAP (small): -1.000000
INFO:tensorflow:    + DetectionBoxes_Precision/mAP (medium): -1.000000
I0920 10:12:04.679378 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Precision/mAP (medium): -1.000000
INFO:tensorflow:    + DetectionBoxes_Precision/mAP (large): 0.727510
I0920 10:12:04.680430 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Precision/mAP (large): 0.727510
INFO:tensorflow:    + DetectionBoxes_Recall/AR@1: 0.783721
I0920 10:12:04.681638 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Recall/AR@1: 0.783721
INFO:tensorflow:    + DetectionBoxes_Recall/AR@10: 0.783721
I0920 10:12:04.682775 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Recall/AR@10: 0.783721
INFO:tensorflow:    + DetectionBoxes_Recall/AR@100: 0.783721
I0920 10:12:04.683973 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Recall/AR@100: 0.783721
INFO:tensorflow:    + DetectionBoxes_Recall/AR@100 (small): -1.000000
I0920 10:12:04.685043 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Recall/AR@100 (small): -1.000000
INFO:tensorflow:    + DetectionBoxes_Recall/AR@100 (medium): -1.000000
I0920 10:12:04.686104 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Recall/AR@100 (medium): -1.000000
INFO:tensorflow:    + DetectionBoxes_Recall/AR@100 (large): 0.783721
I0920 10:12:04.687381 140679676843904 model_lib_v2.py:856]  + DetectionBoxes_Recall/AR@100 (large): 0.783721
INFO:tensorflow:    + Loss/localization_loss: 0.089791
I0920 10:12:04.688443 140679676843904 model_lib_v2.py:856]  + Loss/localization_loss: 0.089791
INFO:tensorflow:    + Loss/classification_loss: 0.336598
I0920 10:12:04.689506 140679676843904 model_lib_v2.py:856]  + Loss/classification_loss: 0.336598
INFO:tensorflow:    + Loss/regularization_loss: 0.117549
I0920 10:12:04.690544 140679676843904 model_lib_v2.py:856]  + Loss/regularization_loss: 0.117549
INFO:tensorflow:    + Loss/total_loss: 0.543938
I0920 10:12:04.691550 140679676843904 model_lib_v2.py:856]  + Loss/total_loss: 0.543938
INFO:tensorflow:Waiting for new checkpoint at object_detection/training
I0920 10:16:10.791572 140679676843904 checkpoint_utils.py:125] Waiting for new checkpoint at object_detection/training
Traceback (most recent call last):
  File "object_detection/model_main_tf2.py", line 114, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "object_detection/model_main_tf2.py", line 89, in main
    wait_interval=300, timeout=FLAGS.eval_timeout)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py", line 966, in eval_continuously
    checkpoint_dir, timeout=timeout, min_interval_secs=wait_interval):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 184, in checkpoints_iterator
    checkpoint_dir, checkpoint_path, timeout=timeout)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 132, in wait_for_new_checkpoint
    time.sleep(seconds_to_sleep)
KeyboardInterrupt

但是，tensorboard 尚未更新这些指标（AP、AR 等）。可能我做错了。关于准确性，我仍然没有取得任何进展，所以如果有人知道一些事情，那将非常有帮助。谢谢！

Answer 2

您似乎已经找到了问题的解决方案。太好了，但您只需要再等一会儿，评估指标就会显示在您的张量板上。

从默认设置开始，逐步进行培训。

第一个运行训练为 dameon:

!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--alsologtostderr &

然后运行在不同的控制台或 shell 上进行评估。评估将自动选择较新的检查点（默认情况下，这是每 1000 步一次。因此您必须等待训练达到第 1000 步 + 等待评估完成 1 个时期或 test.record 中的图像数量）

!python object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_file} \
--model_dir='object_detection/training' \
--alsologtostderr \
--checkpoint_dir='object_detection/training'

这是您可以用来可视化的示例配置。

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
  num_visualizations: 10
  max_num_boxes_to_visualize: 5
  visualize_groundtruth_boxes: true
  eval_interval_secs: 30
}

eval_input_reader: {
  label_map_path: "path/to/label_map.pbtxt"
  shuffle: true
  queue_capacity: 100 #depending on your GPU/TPU/CPU
  num_epochs: 1 #what encompasses when to upload results for mAP and AR, if you rather have a number provide that under [num_examples <= test.record total size else error]
  tf_record_input_reader {
    input_path: "path/to/test.record"
  }
}

这里有几点需要注意：TF20 issue

batch_size: 1 #This has to be 1. TF2 throws errors

num_epochs: 1 #provide this as 1

eval_interval_secs: #something based on your dataset & gpu config. Default is 300

num_epochs: 1 #what encompasses when to upload results for mAP and AR, if you rather have a number provide that under [num_examples <= test.record total size else error]

最重要的是，等待。等待至少两次评估完成（通常是第一次，它基本上运行s 用于第 0 步，这是无用的（因为你的训练必须达到检查点（默认如上文所述，第 1000 次））

如何在 Tensorflow 2.x 中打印准确性和其他指标？

How to print Accuracy and other metrics in Tensorflow 2.x?

object-detection

tensorflow2.x