
Are there any real-life cases of successful application of reinforcement learning in trading / asset management? #121

Kismuz opened this issue Oct 25, 2019 · 4 comments



Kismuz commented Oct 25, 2019

I’ve been asked this question repeatedly, and my answer has always been something like "I have no evidence." Unfortunately, in this domain every person who is able to say something valuable instantly turns covert, vague, and mysterious when it comes to real applications.

But I think we can now trace at least one case of successful application of RL in asset management. What is incredible about it is that we are talking about research work that is two decades old.

I consider myself a decent information retriever, so it is an absolute shame that I’ve missed this thread until now. Had I not, my work on BTGym would have progressed several times faster. Though I have independently arrived at some of the same findings (like time-series preprocessing via differencing a stack of moving averages, or recurrent policies), other key features didn’t come so easily. I mainly mean performance functions like the Differential Sharpe Ratio or the Downside Deviation Ratio found in these papers (I’m absolutely sure that my ignorance of domain-specific performance functions, and my attempts to use only linear combinations of returns as the source of reward, are the main cause of the suboptimal performance of the algorithms included in BTGym).
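For illustration, the preprocessing trick mentioned above (differencing a stack of moving averages) can be sketched roughly like this; the window sizes are arbitrary placeholders, not values taken from the papers or from BTGym:

```python
import numpy as np

def moving_average(x, w):
    """Causal moving average; early steps use however many samples exist."""
    csum = np.cumsum(x)
    ma = np.empty(len(x), dtype=float)
    for t in range(len(x)):
        lo = max(0, t - w + 1)
        total = csum[t] - (csum[lo - 1] if lo > 0 else 0.0)
        ma[t] = total / (t - lo + 1)
    return ma

def ma_difference_features(prices, windows=(8, 16, 32, 64)):
    """Stack moving averages of increasing span and difference adjacent
    pairs, turning a non-stationary price series into roughly
    stationary trend features."""
    prices = np.asarray(prices, dtype=float)
    mas = np.stack([moving_average(prices, w) for w in windows])
    return mas[:-1] - mas[1:]  # shape: (len(windows) - 1, n_steps)
```

Each row is a fast-minus-slow difference; in a BTGym-like setting these would feed the policy network instead of raw prices.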

So here is a stack of accessible works by John Moody, Matthew Saffell, et al.:
https://www.researchgate.net/scientific-contributions/31497186_Matthew_Saffell
https://www.researchgate.net/scientific-contributions/10597646_John_Moody

Going through the full stack from 1996 through 2004, one can easily see the evolution of ideas. Some key papers to read are:
Learning to Trade via Direct Reinforcement https://pdfs.semanticscholar.org/1a49/99c918c6206cd9804c48f7dce1bac6ec5b4a.pdf

Performance Functions and Reinforcement Learning for Trading Systems and Portfolios http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.8437&rep=rep1&type=pdf

Reinforcement Learning for Trading https://pdfs.semanticscholar.org/93b8/17deef9dd5afc66ccf43174a07ddaa49854f.pdf

Think about it: this was the early 2000s, with no TensorFlow, no PPO, and none of the deep learning stack we would come to take for granted a decade later.

Then, in 2004, the line breaks off. The last publicly available document is Saffell’s 2005 thesis:

Knowledge discovery for time series
https://www.semanticscholar.org/paper/Knowledge-discovery-for-time-series-Moody-Saffell/002165064501911ca06679ba762bd7ffc00bf44d
… and a PowerPoint presentation from some conference in Paris.

Given industry realities, there are a few plausible reasons for such a line to be interrupted:

  • no practical result could be made of this research, so everything just stops when the grants are exhausted; but that is not the case here, because at least one of the researchers should have surfaced later with publications on some freshly funded topic;
  • there are good practical prospects, so the group gets hired by an institution and signs confidentiality papers; nothing to find here, because any practical applications and reports would be locked away forever on some corporate intranet;
  • the prospects are so promising and novel that one can start one’s own business. Indeed, a company was founded in 2011 as ‘J E Moody & Company LLC’ in Portland, OR: http://www.jemoody.com/

I think these guys are remarkable. Small team, no fuss, no conference talks, no private investors, no fancy landing pages, yet a 2018 ‘HFM US Performance Award for Best Quantitative Strategy under $1B’.

As for me, everything about it says: “If you want evidence of financial RL going practical, look no further.”


mysl commented Oct 26, 2019

@Kismuz Long time since your last update :-) Thanks for the information. If J E Moody & Company LLC succeeded 15+ years ago in applying RL to trading in practice and turned it into a business as a small team, without today’s computing power, software stack, and the recent advances in the ML/DL field, which part do you think was most likely the key to their success from the algorithmic point of view? Thanks.


Kismuz commented Oct 26, 2019

@mysl, there are many math and CS experts, and even more die-hard traders and asset managers. The mix of both kinds of expertise in one head, or in one team, is still rare. It took courage and independent vision to propose model-free policy search while living in the era of linear regression-based econometric models.

from algorithm's aspect?

  • a differentiable utility function is number one; proper feature extraction is number two, IMHO.
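For reference, the differential Sharpe ratio from the Moody & Saffell papers keeps exponential moving estimates A and B of the first and second moments of returns, and uses the first-order term of the Sharpe ratio’s expansion in the adaptation rate eta as a per-step reward. A minimal sketch (class and variable names are mine):

```python
class DifferentialSharpe:
    """Per-step reward based on Moody & Saffell's differential Sharpe ratio.

    A and B are exponential moving estimates of the first and second
    moments of returns; the reward is the first-order term of the
    Sharpe ratio's Taylor expansion in the adaptation rate eta.
    """

    def __init__(self, eta=0.01):
        self.eta = eta
        self.A = 0.0  # EMA of returns
        self.B = 0.0  # EMA of squared returns

    def step(self, r):
        dA = r - self.A
        dB = r * r - self.B
        var = self.B - self.A ** 2
        if var > 0.0:
            reward = (self.B * dA - 0.5 * self.A * dB) / var ** 1.5
        else:
            reward = 0.0  # undefined until the variance estimate is positive
        self.A += self.eta * dA
        self.B += self.eta * dB
        return reward
```

Feed it per-period trade returns; note that the reward stays at zero during the EMA warm-up, until the variance estimate becomes positive.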


JaCoderX commented Oct 27, 2019

From 'Multi Objective Reward Discussion' #110:

I have been recently thinking on how to incorporate risk adjusted returns, like sharpe ratio, as a way to form a richer and more complex reward function.

A differential risk-adjusted measurement is just a brilliant concept!
I went over the derivations of the ‘Differential Sharpe Ratio’ and the ‘Differential Downside Deviation Ratio’, and they are both quite interesting, although I will probably need to read them a couple more times to really get them.

In principle, this derivation can be applied to the n-th Lower Partial Moment to get a whole family of these differential risk-adjusted measurements.
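As a rough illustration of that idea (this finite-difference surrogate is my own simplification, not the analytic expansion derived in the papers), one can track an EMA of returns together with an EMA of the n-th lower partial moment and emit the step change in their ratio as the per-step reward:

```python
class DifferentialLPMRatio:
    """Finite-difference surrogate for a differential risk-adjusted
    reward built on the n-th lower partial moment (LPM).

    n = 1 penalizes expected shortfall; n = 2 recovers the downside
    deviation ratio family. The emitted reward is the step change in
    the running ratio A / LPM_n ** (1/n).
    """

    def __init__(self, n=2, eta=0.01):
        self.n = n
        self.eta = eta
        self.A = 0.0     # EMA of returns
        self.L = 0.0     # EMA of the n-th lower partial moment
        self.prev = 0.0  # previous ratio value

    def step(self, r):
        self.A += self.eta * (r - self.A)
        downside = max(-r, 0.0) ** self.n
        self.L += self.eta * (downside - self.L)
        if self.L <= 0.0:
            self.prev = 0.0
            return 0.0  # ratio undefined until a loss has been observed
        ratio = self.A / self.L ** (1.0 / self.n)
        reward, self.prev = ratio - self.prev, ratio
        return reward
```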

@Kismuz, you did real detective work for this post :)
It really seems like they have been on the right path for 20+ years.

While looking for related articles on Google, I came across this nice survey of the topic:
Reinforcement Learning in Financial Markets

@JaCoderX
Contributor

@Kismuz, differential risk-adjusted measures are based on keeping moving-average statistics of the previous rewards. This means that during the first stage (the moving-average initialization period), we don’t get any risk-adjusted reward feedback to learn from.

Is this a problem from an RL standpoint? Or is it OK to have a very sparse reward at the beginning and dense rewards from that point onward? (Especially given that the sparse rewards are due to an arbitrary number of actions taken, not to the environment dynamics.)
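One common workaround (an assumption on my part, not something from the papers) is to initialize the moving-average statistics on a burn-in slice of pre-episode history, so the risk-adjusted reward is dense from the first action:

```python
import numpy as np

def warm_start_ema(history, eta=0.01):
    """Run the differential-reward EMAs over a burn-in slice of
    pre-episode returns so the first in-episode reward is already
    well-defined, instead of being zero during moving-average warm-up."""
    A, B = 0.0, 0.0
    for r in np.asarray(history, dtype=float):
        A += eta * (r - A)      # EMA of returns
        B += eta * (r * r - B)  # EMA of squared returns
    return A, B
```

The returned A and B would seed the reward tracker’s state before the episode starts.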
