What is utility?

reinforcement-learning

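In Q-learning, the objective is to maximize the expected utility.

The Wikipedia article on Q-learning (https://en.wikipedia.org/wiki/Q-learning) uses the term expected utility in the following passages: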

It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment.

But it never defines what utility is. What does utility mean here?

When utility is maximized, what exactly is being maximized?

In this context, "utility" means functionality or usefulness, so "maximum functionality" or "maximum usefulness".

Plugging the word into Google gives:

the state of being useful, profitable, or beneficial.

In general, utility means being profitable or beneficial (as @Rob posted in his answer).

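In the context of Q-learning, utility is closely related to the action-value function (the two can be treated as synonyms), as the Wikipedia passages you quoted suggest. Here, the action-value function of a policy π is an estimate of the return (the long-term reward) that the agent will obtain when it takes action a in a given state s and follows policy π thereafter. So, when you maximize utility, you are really maximizing the reward your agent will collect. Since rewards are defined to reflect progress toward a goal, you are maximizing the "quantity" of goal achievement.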
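To make this concrete, here is a minimal sketch of tabular Q-learning on a toy problem. Everything in it is an assumption made up for illustration and not part of the question: a four-state line world, a reward of 1.0 for reaching the last state, and the names `step`, `q_learning`, `GAMMA`, and `ALPHA`. It only aims to show that the learned "utility" of a state-action pair is the discounted reward the agent expects to collect from there on.

```python
import random

# Hypothetical toy environment: states 0..3 on a line, state 3 is the goal (terminal).
N_STATES = 4
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed value)
ALPHA = 0.5        # learning rate (assumed value)


def step(state, action):
    """Deterministic transition; reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, seed=0):
    """Learn Q(s, a), the expected discounted return, with the Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Act uniformly at random; Q-learning is off-policy, so it can
            # still learn the values of the greedy (optimal) policy.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best utility of the next state.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for s, values in enumerate(q_learning()):
        print(s, [round(v, 3) for v in values])
```

Under these assumptions, the value for moving right should approach 1.0 in state 2, 0.9 in state 1, and 0.81 in state 0: each step away from the goal discounts the eventual reward by `GAMMA`. That number is the utility being maximized, and picking the action with the largest value in each state is what "maximizing expected utility" amounts to in this sketch.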
