Stationarity concept in sequential decision-making in reinforcement learning

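The following is an excerpt on sequential decision problems from the book Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (Chapter 17, Section 17.1).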

Stationarity for preferences means the following:

if two state sequences [s0, s1, s2, . . .] and [s0', s1', s2', . . .] begin with the same state (i.e., s0 = s0'), then the two sequences should be preference-ordered the same way as the sequences [s1, s2, . . .] and [s1', s2', . . .].

In English, this means that if you prefer one future to another starting tomorrow, then you should still prefer that future if it were to start today instead.

I have a hard time understanding the last sentence.


Any guidance and explanation would be much appreciated.

Another definition of stationarity, from Wikipedia, may help in understanding the idea:

In mathematics and statistics, a stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time.

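The key concept is "does not change when shifted in time". Applied to preferences, this means your preferences should be the same regardless of when you form them. That is, your preference about day 3 should be the same whether you are on day 2 (tomorrow) or on day 1 (today). In other words, shifting two futures by the same amount of time, for example by prepending the same state s0 so that they begin today rather than tomorrow, should not change which of the two you prefer.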
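To make the "shift in time" idea concrete, here is a minimal sketch under two assumptions that are not part of the excerpt: each state is represented only by its reward R(s), so a state sequence becomes a list of numbers, and the function names are hypothetical. With discounted rewards, U([s0, s1, ...]) = R(s0) + γ·U([s1, s2, ...]). When s0 = s0', the R(s0) term is shared and γ > 0 only rescales the tails, so comparing the full sequences gives the same answer as comparing the tails. For contrast, the sketch also uses quasi-hyperbolic (β-δ) discounting, a swapped-in example of a utility that is *not* stationary. It is not from the book.

```python
from typing import Callable, Sequence

# Assumption: a state is represented only by its reward R(s), so a
# "state sequence" is simply a list of floats.
Utility = Callable[[Sequence[float]], float]


def discounted_utility(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """U([s0, s1, ...]) = R(s0) + gamma*R(s1) + gamma^2*R(s2) + ..."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def quasi_hyperbolic_utility(rewards: Sequence[float],
                             beta: float = 0.5, delta: float = 0.95) -> float:
    """Hypothetical non-stationary utility: R(s0) + beta * sum_{t>=1} delta^t R(s_t).
    The extra factor beta applies to everything except the first step, so the
    first step of a sequence is treated specially."""
    if not rewards:
        return 0.0
    tail = sum(r * delta ** t for t, r in enumerate(rewards[1:], start=1))
    return rewards[0] + beta * tail


def prefers(utility: Utility, a: Sequence[float], b: Sequence[float]) -> bool:
    """True if sequence a is strictly preferred to sequence b."""
    return utility(a) > utility(b)


def ordering_survives_shift(utility: Utility, a: Sequence[float],
                            b: Sequence[float], s0: float = 0.0) -> bool:
    """Stationarity check for one pair: prepending the same first state s0
    to both futures (starting them 'today' instead of 'tomorrow') must not
    change which one is preferred."""
    return prefers(utility, a, b) == prefers(utility, [s0, *a], [s0, *b])


# Two futures: reward 10 right away versus reward 11 one step later.
a = [10.0, 0.0]
b = [0.0, 11.0]

print(ordering_survives_shift(discounted_utility, a, b))        # True
print(ordering_survives_shift(quasi_hyperbolic_utility, a, b))  # False
```

Under discounting, future a is preferred both before and after the shift. Under the β-δ utility, a (10 now) beats b (11 later), but once both are pushed one step later, b wins. That preference reversal is exactly what the stationarity assumption rules out.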

