Continuous action spaces in Reinforcement Learning - How does the agent choose action value from a continuous space?

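I have been studying reinforcement learning for the past few days, and I have seen example problems such as the Mountain Car problem and the Cart Pole problem.

In these problems, the action space is described as discrete. For example, in the Cart Pole problem, the agent can either move left or move right.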

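But the examples do not say how much. Since all of these movements are really continuous-space actions, how does the agent decide how far to move left or how far to move right? In other words, how does the agent decide which real value to choose from a continuous action space?

Also, I have been using ReinforcementLearning.jl in Julia and would like to know how to represent range constraints on the action space. For example, the real value that the agent chooses as its action should lie in a range such as [10.00, 20.00[. How can this be done?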

But the examples don't talk about how much. How does the agent decide how much to move left or how much to move right, given that all these movements are continuous-space actions? So I want to know how the agent decides what real value to choose from a continuous action space.

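A common solution is to assume that the agent's output follows a normal distribution. You then only need to design an agent that predicts the mean and the standard deviation. Finally, sample a random action from that distribution and pass it to the environment.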
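
As a rough illustration of the normal-distribution approach, here is a minimal sketch in plain Python. Python is used only for illustration; this is not ReinforcementLearning.jl code. The linear "policy", its weights, and all function names are hypothetical stand-ins for whatever model actually predicts the mean and standard deviation.

```python
import math
import random


def gaussian_policy(observation, weights):
    """Hypothetical linear policy: maps an observation to (mean, std)."""
    mean = sum(w * x for w, x in zip(weights["mean"], observation))
    log_std = sum(w * x for w, x in zip(weights["log_std"], observation))
    # Predicting log(std) and exponentiating keeps std strictly positive.
    return mean, math.exp(log_std)


def sample_action(observation, weights, rng):
    """Draw one real-valued action from Normal(mean, std)."""
    mean, std = gaussian_policy(observation, weights)
    return rng.gauss(mean, std)


rng = random.Random(0)  # seeded only to make the sketch reproducible
weights = {"mean": [0.5, -0.2], "log_std": [0.0, 0.1]}  # made-up numbers
observation = [1.0, 2.0]

action = sample_action(observation, weights, rng)
print(action)  # a single real number, passed to the environment as the action
```

In practice the mean and standard deviation would come from a trained model such as a neural network, and the sampled value may still need to be clipped or rescaled if the environment only accepts a bounded range.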

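Another possible solution is to discretize the continuous action space, turning it into a discrete action space problem. You then sample an action at random from within the predicted bin.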
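
The discretization idea can be sketched in the same way. This is again plain Python for illustration only, with hypothetical function names. The range 10 to 20 is borrowed from the question, while the number of bins and the bin probabilities are made-up values standing in for a policy's output.

```python
import random


def make_bins(low, high, n_bins):
    """Split the range from low to high into n_bins equal-width (lo, hi) pairs."""
    width = (high - low) / n_bins
    return [(low + i * width, low + (i + 1) * width) for i in range(n_bins)]


def sample_from_bins(bin_probs, bins, rng):
    """Pick a bin according to bin_probs, then draw uniformly inside that bin."""
    index = rng.choices(range(len(bins)), weights=bin_probs, k=1)[0]
    lo, hi = bins[index]
    # random.uniform may or may not return the upper end point due to rounding.
    return index, rng.uniform(lo, hi)


rng = random.Random(0)  # seeded only to make the sketch reproducible
bins = make_bins(10.0, 20.0, 4)
bin_probs = [0.1, 0.2, 0.6, 0.1]  # hypothetical discrete policy output

index, action = sample_from_bins(bin_probs, bins, rng)
print(index, action)  # the chosen bin and the real-valued action inside it
```

More bins give finer control over the action at the cost of a larger discrete action set for the agent to learn over.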

Also, I have been using ReinforcementLearning.jl in Julia and wanted to know a way I could represent range constraints on the action space in it. For example, the real value that the agent chooses as its action should lie in a range like [10.00, 20.00[. I want to know how this can be done.

You can take a look at the implementation details of PendulumEnv. Currently, it uses .. from IntervalSets.jl to describe a continuous range.

reinforcement-learning

julia