Multi Objective Reward Discussion #110
@JacobHanouna, I would look at 'Lower Partial Moment'-based risk measures.
@Kismuz Thank you for sharing, this is a very interesting paper. I was already interested in the Sortino and Omega ratios for modelling drawdowns, so it is great to have one formalism that unites them. Backtrader actually has limited support for the PyFolio project. I think the most challenging part is what to do once you have one (or a family) of those risk ratios, because in the end we have only one reward function: each part of the reward needs to be scalarized and weighted so we can construct the final reward value. The paper above offers a framework to tackle this challenge by dynamically learning the weights between the parts of the reward instead of manually trying to find a static reward shape across all of them. @Kismuz, I think it might be worth looking into as part of the design for BTGym 2.0.
@JacobHanouna,
omega = kappa(n=1) + 1. The only trouble with these, when converting them to a loss or reward term, is that kappa uses the whole set of data to make a single estimation, while a loss/reward is usually estimated from a single point. In the case of a classification loss this can be tackled in an SGD-like manner (make the estimation from an i.i.d. batch instead of the whole dataset); in the case of a reward it can be tricky.
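A minimal sketch of what these measures look like in code, assuming a plain NumPy array of per-step or per-episode returns (the function names `lower_partial_moment`, `kappa` and `omega` are just illustrative), including the batch-style estimation mentioned above:

```python
import numpy as np

def lower_partial_moment(returns, threshold=0.0, order=2):
    # Mean of the order-th power of shortfalls below the threshold return.
    shortfall = np.clip(threshold - np.asarray(returns, dtype=float), 0.0, None)
    return np.mean(shortfall ** order)

def kappa(returns, threshold=0.0, n=2):
    # Kappa_n = (mean return - threshold) / LPM_n ** (1/n); n=2 gives Sortino.
    lpm = lower_partial_moment(returns, threshold, n)
    return (np.mean(returns) - threshold) / (lpm ** (1.0 / n) + 1e-12)

def omega(returns, threshold=0.0):
    # Identity mentioned above: omega = kappa(n=1) + 1.
    return kappa(returns, threshold, n=1) + 1.0

# SGD-like estimation: use an i.i.d. mini-batch of returns instead of the
# full history to get a (noisy) estimate of the ratio.
batch = np.random.normal(loc=0.001, scale=0.01, size=64)
print(kappa(batch, n=2), omega(batch))
```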
@Kismuz, first, thank you for sharing the Kappa paper and code.
OK, I agree. Following the Hindsight Experience Replay paper, maybe we can use the kappa summary value of each episode as a goal we want to achieve, so that when we optimize the policy we optimize with respect to f(s, a, g)? From the paper:
The best results for the task they were trying to accomplish were obtained without using any reward shaping. @Kismuz, do you think such an approach is viable here as well?
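A rough, hypothetical sketch of how that HER-style idea could look (the name `her_relabel`, the episode layout and the sparse 0/-1 reward are my own assumptions, not anything from the paper or BTGym): the kappa summary actually achieved over the episode is relabelled in hindsight as the goal g, and the transitions are stored for a goal-conditioned critic f(s, a, g).

```python
def her_relabel(episode, target_kappa, kappa_fn):
    """episode: list of dicts with 'state', 'action', 'return' per step."""
    # Goal achieved in hindsight: the kappa summary of this episode's returns.
    achieved = kappa_fn([step['return'] for step in episode])
    transitions = []
    for step in episode:
        # Store each transition twice: once with the goal we wanted,
        # once with the goal we actually achieved (hindsight relabelling).
        for goal in (target_kappa, achieved):
            # Sparse goal-achievement reward, i.e. no reward shaping.
            reward = 0.0 if achieved >= goal else -1.0
            transitions.append((step['state'], step['action'], goal, reward))
    return transitions
```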
I have recently been thinking about how to incorporate risk-adjusted returns, like the Sharpe ratio, as a way to form a richer and more complex reward function.
The idea is to find the optimal policy for high returns, but in a way that also minimizes risk.
The currently available way to play around with the idea is to create a custom reward function of the following form:
Reward = a * profit + b * risk
where 'a' and 'b' are hyperparameters that need to be manually tuned.
The risk term itself can be obtained by using a backtrader analyzer (I haven't checked yet how it integrates, but I think it is possible).
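For illustration, a hedged sketch of that scalarization (the numbers are placeholders; in practice the risk value could come from something like backtrader's drawdown analyzer, with 'b' negative so that risk acts as a penalty):

```python
def scalarized_reward(profit, risk, a=1.0, b=-0.5):
    # Reward = a * profit + b * risk, with 'a' and 'b' hand-tuned
    # hyperparameters; b < 0 so higher risk lowers the reward.
    return a * profit + b * risk

# Hypothetical per-step values; 'risk' could e.g. be the current drawdown
# fraction reported by an analyzer.
print(scalarized_reward(profit=0.02, risk=0.05))
```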
During a survey on the subject, I came across the following paper:
Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning
According to the paper, it is possible to learn the relation between the different reward objectives as part of the learning process, in an off-policy setup, using a more general form of Hindsight Experience Replay, where final reward = W * (reward vector).
I think it can be a powerful tool that will allow a way to control risk depending on the scenario.
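As I understand it, the scalarization the paper generalizes over is just a dot product with a weight vector W. A minimal sketch, where the choice of three objectives and the Dirichlet sampling of W are my own illustrative assumptions:

```python
import numpy as np

def scalarize(reward_vector, w):
    # final reward = W * (reward vector): one scalar reward per step.
    return float(np.dot(w, reward_vector))

# Instead of fixing W by hand, sample (or condition on) a weighting per
# episode so the agent learns to generalize across trade-offs.
w = np.random.dirichlet(np.ones(3))      # convex weights over 3 objectives
r_vec = np.array([0.02, -0.05, -0.01])   # e.g. profit plus two risk penalties
print(scalarize(r_vec, w))
```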
Thoughts and ideas on the topic would be appreciated.