Are there any real-life cases of successful application of reinforcement learning in trading / asset management? #121
@Kismuz It's been a while since your last update :-) Thanks for the information. If J E Moody & Company LLC succeeded in applying RL to trading in practice 15+ years ago and turned it into a business as a small team, without today's computing power, software stack, and the recent advances in ML/DL, which part do you think was most likely the key to their success, from the algorithmic side? Thanks.
@mysl, there are many math and CS experts, and even more die-hard traders and asset managers. The mix of both kinds of expertise in one head or team is still rare. It took courage and independent vision to propose model-free policy search while living in the era of linear-regression-based econometric models.
From 'Multi Objective Reward Discussion' #110:
Differential risk-adjusted measurement is just a brilliant concept! In principle this derivation can be applied to the Nth lower partial moment to obtain a whole family of differential risk-adjusted measures. @Kismuz, you did real detective work for this post :) While looking for related articles on Google, I came across a nice survey on the topic.
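To make the idea concrete, here is a minimal sketch of the Differential Sharpe Ratio as described in the Moody & Saffell papers: exponential moving estimates of the first and second moments of returns are maintained, and the instantaneous reward is the first-order expansion of the Sharpe ratio's change per step. The class name, parameter names, and initial values are illustrative, not from BTGym.

```python
class DifferentialSharpe:
    """Online Differential Sharpe Ratio (after Moody & Saffell).

    Keeps exponential moving estimates a (first moment) and b
    (second moment) of returns; each new return r yields an
    instantaneous reward d_t, the first-order change of the
    Sharpe ratio with respect to the adaptation rate eta.
    """

    def __init__(self, eta=0.01, a0=0.0, b0=1e-6):
        self.eta = eta  # adaptation rate of the moving averages
        self.a = a0     # EMA of returns
        self.b = b0     # EMA of squared returns (b0 > 0 avoids div by zero)

    def step(self, r):
        da = r - self.a
        db = r ** 2 - self.b
        denom = (self.b - self.a ** 2) ** 1.5
        # guard against a degenerate variance estimate
        d = 0.0 if denom <= 0 else (self.b * da - 0.5 * self.a * db) / denom
        # update the moving averages only after computing the reward
        self.a += self.eta * da
        self.b += self.eta * db
        return d
```

The same scheme generalizes to the lower-partial-moment family mentioned above by replacing `b` with a moving estimate of the relevant downside moment.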
@Kismuz, differential risk-adjusted measures are based on a moving-average statistic of previous rewards. So during the first stage (the moving-average initialization period) we get no risk-adjusted reward feedback to learn from. Is that a problem from the RL standpoint? Or is it OK to have a very sparse reward at the beginning and dense rewards from then on (especially given that the sparsity lasts for an arbitrary number of actions and is not due to environment dynamics)?
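One simple way to handle the initialization period the question raises is to mask the reward to zero until the moving moments have had a fixed number of updates, then switch to the dense differential reward. This is a hypothetical helper, not BTGym code; the `warmup` length and initial values are illustrative assumptions.

```python
def masked_differential_sharpe(returns, eta=0.01, warmup=50):
    """Per-step Differential Sharpe rewards, zeroed during warm-up.

    During the first `warmup` steps the moving moments a, b are
    still dominated by their initial values, so the reward is
    masked to 0.0 instead of feeding noisy estimates to the learner.
    """
    a, b = 0.0, 1e-6
    out = []
    for t, r in enumerate(returns):
        da, db = r - a, r ** 2 - b
        denom = (b - a ** 2) ** 1.5
        d = 0.0 if denom <= 0 else (b * da - 0.5 * a * db) / denom
        a += eta * da
        b += eta * db
        out.append(0.0 if t < warmup else d)
    return out
```

An alternative to masking is to pre-fill `a` and `b` from a held-out slice of history before the episode starts, so the agent sees dense rewards from step one.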
I’ve been asked this question repeatedly, and my answer has always been something like ‘I have no evidence’. Unfortunately, in this domain every person able to say something valuable instantly turns covert, vague and mysterious when it comes to real application.
But I think we can now track at least one case of successful application of RL in asset management. What is incredible about it is that we are talking about two-decade-old research.
I consider myself a decent information retriever, so it is an absolute shame I’ve missed this line of work until now. Had I not, my work on BTGym would have progressed several times faster. Though I have independently repeated some of the findings (such as time-series preprocessing via a differencing stack of moving averages, or recurrent policies), other key features didn’t come so easily. I mainly mean performance functions such as the Differential Sharpe Ratio or the Downside Deviation Ratio found in these papers (I’m quite sure that my ignorance of domain-specific performance functions, and my attempts to use only linear combinations of returns as the source of reward, are the main cause of the suboptimal performance of the algorithms included in BTGym).
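For readers unfamiliar with the preprocessing idea mentioned above, here is a short sketch of a "differencing stack of moving averages": smooth the price at several scales and keep only the differences between adjacent scales, which removes the absolute price level and leaves multi-scale trend information. The window sizes are illustrative assumptions, not BTGym's actual settings.

```python
import numpy as np

def ma_difference_stack(prices, windows=(8, 16, 32, 64)):
    """State features as differences between stacked moving averages.

    Returns an array of shape (len(windows) - 1, n): each row is the
    difference between two adjacent smoothing scales, aligned on the
    most recent n points common to all windows.
    """
    prices = np.asarray(prices, dtype=float)
    mas = []
    for w in windows:
        kernel = np.ones(w) / w
        # 'valid' convolution: each output uses exactly w prices
        mas.append(np.convolve(prices, kernel, mode='valid'))
    n = min(len(m) for m in mas)
    # align all series on their most recent n points, then difference
    stacked = np.stack([m[-n:] for m in mas])
    return np.diff(stacked, axis=0)
```

Because each feature is a difference of two smoothed series, a constant trend maps to constant features, so the representation is stationary with respect to the price level.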
So here is the stack of accessible works by John Moody, Matthew Saffell et al.:
https://www.researchgate.net/scientific-contributions/31497186_Matthew_Saffell
https://www.researchgate.net/scientific-contributions/10597646_John_Moody
Going through the full stack from 1996 through 2004, one can easily see the evolution of the ideas. Some key papers to read are:
Learning to Trade via Direct Reinforcement https://pdfs.semanticscholar.org/1a49/99c918c6206cd9804c48f7dce1bac6ec5b4a.pdf
Performance Functions and Reinforcement Learning for Trading Systems and Portfolios http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.8437&rep=rep1&type=pdf
Reinforcement Learning for Trading https://pdfs.semanticscholar.org/93b8/17deef9dd5afc66ccf43174a07ddaa49854f.pdf
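The core setup in these papers (direct, recurrent reinforcement) is compact enough to sketch: the position is a bounded recurrent function of recent returns and the previous position, and profit subtracts a transaction cost on position changes. This is a forward-pass-only sketch under assumed parameter names; in the papers, `w`, `u` and `bias` are trained by gradient ascent on a performance function such as the Differential Sharpe Ratio (not shown here).

```python
import numpy as np

def rrl_forward(returns, w, u, bias, delta=0.001):
    """Forward pass of a recurrent direct-reinforcement trader.

    Position f_t in [-1, 1] is tanh of a linear map of the last
    len(w) returns plus a recurrent term on the previous position;
    per-step profit is f_{t-1} * r_t minus cost delta * |f_t - f_{t-1}|.
    """
    k = len(w)
    f_prev = 0.0
    positions, profits = [], []
    for t in range(k, len(returns)):
        x = returns[t - k:t]  # window of recent returns
        f = np.tanh(np.dot(w, x) + u * f_prev + bias)
        # profit from holding f_prev over return r_t, minus trading cost
        profits.append(f_prev * returns[t] - delta * abs(f - f_prev))
        positions.append(f)
        f_prev = f
    return np.array(positions), np.array(profits)
```

The recurrence on `f_prev` is what lets the learned policy internalize transaction costs and avoid overtrading, which is one of the papers' central points.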
Think about it: this is the early 2000s. No TensorFlow, no PPO, none of the deep-learning stack we take for granted would exist for another decade.
Then, in 2004, the line breaks off. The last publicly available document is Saffell’s 2005 thesis:
Knowledge discovery for time series
https://www.semanticscholar.org/paper/Knowledge-discovery-for-time-series-Moody-Saffell/002165064501911ca06679ba762bd7ffc00bf44d
… and a slide deck from a Paris conference.
Given industry realities, there are obvious reasons for such a publication line to stop:
I think these guys are remarkable. A small team, no fuss, no conference talks, no outside investors, no fancy landing pages, but a 2018 ‘HFM US Performance Award for Best Quantitative Strategy under $1B’.
As for me, everything says: “If you want evidence of financial RL going practical, look no further”.