Stationarity concept in sequential decision problems in reinforcement learning
The following is an excerpt on sequential decision problems from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (Chapter 17, Section 17.1).
Stationarity for preferences means the following:
if two state sequences [s0, s1, s2, . . .] and [s0', s1', s2', . . .] begin with the same state (i.e., s0 = s0'), then the two sequences should be preference-ordered the same way as the sequences [s1, s2, . . .] and [s1', s2', . . .].
In English, this means that if you prefer one future to another starting tomorrow, then you should still prefer that future if it were to start today instead.
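One way to see why the condition is natural: assume, as the same section of the book goes on to argue is the only coherent choice under stationarity, that utilities are additive discounted rewards. Writing the reward of a state as R(s) and assuming bounded rewards with 0 < γ < 1 (both assumptions for this sketch), the shared first state contributes the same R(s_0) to both sequences, and everything after it is just the tail's utility scaled by γ:

```latex
U_h([s_0, s_1, s_2, \ldots]) = \sum_{t=0}^{\infty} \gamma^{t} R(s_t)
  = R(s_0) + \gamma \, U_h([s_1, s_2, \ldots]), \qquad 0 < \gamma < 1

\text{If } s_0 = s_0' : \quad
U_h([s_0, s_1, s_2, \ldots]) - U_h([s_0', s_1', s_2', \ldots])
  = \gamma \bigl( U_h([s_1, s_2, \ldots]) - U_h([s_1', s_2', \ldots]) \bigr)
```

Because γ > 0, both differences have the same sign, so the future preferred from tomorrow on (the tails) is also the one preferred when it is viewed from today, behind the common first state.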
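I have a hard time understanding the last sentence:
"In English, this means that if you prefer one future to another starting tomorrow, then you should still prefer that future if it were to start today instead."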
Any guidance or explanation would be appreciated.
Another definition of stationarity, from Wikipedia, may help in understanding the idea:
In mathematics and statistics, a stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time.
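The key concept is invariance under a shift in time. Applied to preferences, this means a preference should be the same regardless of when it is made: shifting two futures by the same amount of time should not change which one you prefer. For example, whether you are on day 1 (today) or day 2 (tomorrow), your preference about day 3 should be the same.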
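To make the time-shift idea concrete, here is a minimal sketch, assuming rewards attach to states and a sequence's utility is a weighted sum of its rewards. The states `a`, `b`, `c`, their rewards, the discount factor 0.9 and the hyperbolic weight 1 / (1 + k*t) are all hypothetical choices for illustration, not taken from the book. `find_violation` looks for two sequences that start in the same state but whose preference order flips once that shared state is dropped, i.e. once both futures are moved one step earlier.

```python
from itertools import product

# Hypothetical per-state rewards R(s); the states and values are illustrative only.
REWARD = {"a": 1.0, "b": -0.5, "c": 2.0}


def discounted_weight(t, gamma=0.9):
    """Exponential discounting: weight gamma**t for the reward at step t."""
    return gamma ** t


def hyperbolic_weight(t, k=1.0):
    """Hyperbolic weighting 1 / (1 + k*t), used here only as a non-stationary contrast."""
    return 1.0 / (1.0 + k * t)


def utility(seq, weight):
    """Utility of a finite state sequence: sum over t of weight(t) * R(s_t)."""
    return sum(weight(t) * REWARD[s] for t, s in enumerate(seq))


def compare(x, y, weight):
    """Return 1 if x is preferred to y, -1 if y is preferred, 0 if indifferent."""
    ux, uy = utility(x, weight), utility(y, weight)
    return (ux > uy) - (ux < uy)


def find_violation(sequences, weight):
    """Return a pair (x, y) that shares a first state but whose ordering flips
    when that first state is dropped, or None if stationarity holds on this set."""
    for x, y in product(sequences, repeat=2):
        if x[0] == y[0] and compare(x, y, weight) != compare(x[1:], y[1:], weight):
            return x, y
    return None


# All length-3 sequences over the three hypothetical states.
sequences = [tuple(p) for p in product("abc", repeat=3)]

print(find_violation(sequences, discounted_weight))  # None
print(find_violation(sequences, hyperbolic_weight))
```

With exponential discounting no violating pair exists on this set. With the hyperbolic weighting, ('a', 'a', 'b') versus ('a', 'b', 'c') is one pair that flips: if the futures start today, ('a', 'b') is preferred to ('b', 'c'), but if the same two futures start tomorrow (after the common state 'a'), ('b', 'c') wins. That reversal is exactly what stationarity rules out.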