Need help in understanding PPO Lag #334

jyao97 · 2024-05-21T08:17:45Z


Hi folks,

I am researching PPO Lag and am trying to understand the Lagrangian relaxation of the constraints. As I understand it, the original optimization goal of PPO is to maximize the advantage function, but once cost control is added, the constraint is on a cumulative sum of costs, which looks more like a state value or action value than an advantage, i.e. $\max A(s, a) \quad \text{s.t.} \quad \sum c < d$. The direct relaxation would then be $A(s, a) - \lambda (\sum c - d)$, which seems meaningless. How can I move the cost penalty into the advantage term?

Thanks a lot!
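
For concreteness, here is a minimal sketch of one common way to read the relaxation. It is not taken from any particular library. It assumes the constraint is placed on the expected cumulative cost $J_c \le d$ rather than on a per-sample sum, so the Lagrangian is $J_r - \lambda (J_c - d)$. Under that assumption the policy gradient of $J_c$ can be written with a cost advantage $A_c$ in the same way the reward term uses $A_r$, and the constant $\lambda d$ has no gradient with respect to the policy, so only $A_r - \lambda A_c$ reaches the PPO surrogate. The function names, the $1/(1+\lambda)$ rescaling, and the learning rate are illustrative assumptions.

```python
from typing import List


def lagrangian_advantage(adv_r: List[float], adv_c: List[float], lam: float) -> List[float]:
    """Combine reward and cost advantages into a single advantage for PPO.

    Assumption: adv_r and adv_c are per-sample advantage estimates (e.g. from
    GAE) for the reward and the cost signal, computed by separate critics.
    """
    scale = 1.0 + lam  # optional rescaling so the magnitude stays comparable as lam grows
    return [(r - lam * c) / scale for r, c in zip(adv_r, adv_c)]


def update_multiplier(lam: float, mean_episode_cost: float, cost_limit: float,
                      lr: float = 0.05) -> float:
    """One projected gradient step on the multiplier.

    Minimising L = J_r - lam * (J_c - d) over lam >= 0 raises lam when the
    measured cost exceeds the limit d and lowers it otherwise.
    """
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))


# Hypothetical numbers, only to show the shapes involved.
adv = lagrangian_advantage(adv_r=[1.0, -0.5], adv_c=[0.5, 2.0], lam=1.0)
lam_next = update_multiplier(lam=1.0, mean_episode_cost=30.0, cost_limit=25.0)
```

In this reading the multiplier is held fixed during the policy update and adjusted between updates from the measured episode cost, which is where the $\sum c - d$ term actually enters.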


Replies: 0 comments
