Stationarity concept in sequential decision problems in reinforcement learning

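The following is an excerpt on sequential decision problems from the book Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, Chapter 17, Section 17.1.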

Stationarity for preferences means the following:

if two state sequences [s0, s1, s2, . . .] and [s0', s1', s2', . . .] begin with the same state (i.e., s0 = s0'), then the two sequences should be preference-ordered the same way as the sequences [s1, s2, . . .] and [s1', s2', . . .].

In English, this means that if you prefer one future to another starting tomorrow, then you should still prefer that future if it were to start today instead.

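I am having a hard time understanding the last sentence:

In English, this means that if you prefer one future to another starting tomorrow, then you should still prefer that future if it were to start today instead.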

Any advice or explanation would be appreciated.

Another definition of stationarity, from Wikipedia, may help in understanding the idea:

In mathematics and statistics, a stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time.

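The key concept is "does not change when shifted in time". Applied to preferences, this means a preference should be the same regardless of when it is made. That is, your preference about day 3 should be the same whether you are on day 2 (tomorrow) or day 1 (today).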
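To make this concrete, here is a minimal sketch that assumes preferences are given by discounted rewards, one of the utility forms discussed in the same section of the book. The reward values, the discount factor 0.9, and the function names are all hypothetical and chosen only for illustration. It checks that putting the same first state in front of two futures does not change which of them is preferred, because U([s0, s1, ...]) = R(s0) + gamma * U([s1, ...]) and gamma is positive.

```python
def discounted_utility(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over a finite reward sequence (assumed preference model)."""
    return sum(gamma**t * r for t, r in enumerate(rewards))


def prefers(a, b, gamma=0.9):
    """True if reward sequence a is strictly preferred to reward sequence b."""
    return discounted_utility(a, gamma) > discounted_utility(b, gamma)


# Hypothetical rewards for two futures [s1, s2, ...] and [s1', s2', ...].
future_a = [1.0, 0.0, 5.0]
future_b = [2.0, 1.0, 0.0]

# Hypothetical reward of the shared first state s0 = s0'.
shared_first_reward = 3.0

# Preference between the futures on their own.
pref_without_s0 = prefers(future_a, future_b)

# Preference after the same first state is placed in front of both.
pref_with_s0 = prefers([shared_first_reward] + future_a,
                       [shared_first_reward] + future_b)

print(pref_without_s0, pref_with_s0)
```

Both comparisons give the same answer, which is what stationarity requires: the shared first state adds the same amount to both utilities, and everything after it is scaled by the same positive factor.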