Weird loss pattern when using two losses in caffe

I am training a CNN in caffe and I get the following weird loss pattern:

I0425 16:38:58.305482 23335 solver.cpp:398]     Test net output #0: loss = nan (* 1 = nan loss)
I0425 16:38:58.305524 23335 solver.cpp:398]     Test net output #1: loss_intermediate = inf (* 1 = inf loss)
I0425 16:38:59.235857 23335 solver.cpp:219] Iteration 0 (-4.2039e-45 iter/s, 20.0094s/50 iters), loss = 18284.4
I0425 16:38:59.235926 23335 solver.cpp:238]     Train net output #0: loss = 18274.9 (* 1 = 18274.9 loss)
I0425 16:38:59.235942 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 9.46859 (* 1 = 9.46859 loss)
I0425 16:38:59.235955 23335 sgd_solver.cpp:105] Iteration 0, lr = 1e-06
I0425 16:39:39.330327 23335 solver.cpp:219] Iteration 50 (1.24704 iter/s, 40.0948s/50 iters), loss = 121737
I0425 16:39:39.330410 23335 solver.cpp:238]     Train net output #0: loss = 569.695 (* 1 = 569.695 loss)
I0425 16:39:39.330425 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 121168 (* 1 = 121168 loss)
I0425 16:39:39.330433 23335 sgd_solver.cpp:105] Iteration 50, lr = 1e-06
I0425 16:40:19.372197 23335 solver.cpp:219] Iteration 100 (1.24868 iter/s, 40.0421s/50 iters), loss = 34088.4
I0425 16:40:19.372268 23335 solver.cpp:238]     Train net output #0: loss = 369.577 (* 1 = 369.577 loss)
I0425 16:40:19.372283 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 33718.8 (* 1 = 33718.8 loss)
I0425 16:40:19.372292 23335 sgd_solver.cpp:105] Iteration 100, lr = 1e-06
I0425 16:40:59.501541 23335 solver.cpp:219] Iteration 150 (1.24596 iter/s, 40.1297s/50 iters), loss = 21599.6
I0425 16:40:59.501606 23335 solver.cpp:238]     Train net output #0: loss = 478.262 (* 1 = 478.262 loss)
I0425 16:40:59.501621 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 21121.3 (* 1 = 21121.3 loss)
...
I0425 17:09:01.895849 23335 solver.cpp:219] Iteration 2200 (1.24823 iter/s, 40.0568s/50 iters), loss = 581.874
I0425 17:09:01.895912 23335 solver.cpp:238]     Train net output #0: loss = 532.049 (* 1 = 532.049 loss)
I0425 17:09:01.895926 23335 solver.cpp:238]     Train net output #1: loss_intermediate = 49.8377 (* 1 = 49.8377 loss)
I0425 17:09:01.895936 23335 sgd_solver.cpp:105] Iteration 2200, lr = 1e-06

FYI: My network basically consists of two stages, which is why I have two losses. The first stage can be seen as a coarse stage, and the second as an upsampling stage on top of the coarse output.
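
For reference, a minimal sketch of how two such losses might be attached in a Caffe prototxt. The blob names and the EuclideanLoss type are hypothetical placeholders, since the actual net definition is not shown here:

layer {
  name: "loss_intermediate"
  type: "EuclideanLoss"        # placeholder loss type; depends on the task
  bottom: "coarse_output"      # hypothetical blob: output of the coarse stage
  bottom: "label_coarse"       # hypothetical target for the coarse stage
  top: "loss_intermediate"
}
layer {
  name: "loss"
  type: "EuclideanLoss"        # placeholder loss type
  bottom: "upsampled_output"   # hypothetical blob: output of the upsampling stage
  bottom: "label"
  top: "loss"
}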

My question is: Is this a typical loss pattern? At first the loss is high and the intermediate_loss is low in the first iteration; then it essentially flips in the following iterations, so the loss becomes lower and the intermediate_loss higher. In the end only the intermediate_loss converges.

"typical" 并不是一个真正适用的术语。模型和拓扑种类繁多,您可以找到许多奇怪的损失级数的例子。

In your case, the intermediate loss most likely starts out low because it "doesn't know any better" yet. Once the later layers have trained enough to give reliable feedback to the intermediate layers, the intermediate stage starts learning enough to be noticeably wrong, and its loss rises accordingly.

The final loss computation is tied directly to the ground truth; it learns from the first iteration onward, so its progression from high loss to low loss is easier to understand.
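
A note on reading the log: the loss reported for each iteration appears to be the weighted sum of the two net outputs (at iteration 50, for example, 1 * 569.695 + 1 * 121168 ≈ 121737). The "(* 1 = ...)" factor is each loss layer's loss_weight, which defaults to 1 for loss layers; a hypothetical prototxt snippet setting it explicitly:

layer {
  name: "loss_intermediate"
  type: "EuclideanLoss"     # placeholder loss type, as in the sketch above
  bottom: "coarse_output"   # hypothetical blob names
  bottom: "label_coarse"
  top: "loss_intermediate"
  loss_weight: 0.5          # hypothetical value; the log would then print "(* 0.5 = ... loss)"
}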