Feature request: Enable Spark Support #381
Comments
@ProbStub Very cool feature request! You mention scaling across many portfolios and across many assets, which to me are two quite different problems. The former can in most cases be parallelized easily anyway. For the latter, I'm really curious to hear what use cases you have in mind that require e.g. Spark to compute.
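For instance, here is a minimal sketch of what I mean by easy parallelization in the many-portfolios case: each optimization is independent, so it maps cleanly onto Spark workers. The toy data is made up, and it assumes pypfopt is importable on the workers.

```python
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pypfopt import EfficientFrontier, expected_returns, risk_models

def optimize_one(prices: pd.DataFrame) -> dict:
    """Run one independent min-volatility optimization on one portfolio's prices."""
    mu = expected_returns.mean_historical_return(prices)
    S = risk_models.sample_cov(prices)
    ef = EfficientFrontier(mu, S)
    ef.min_volatility()
    return ef.clean_weights()

spark = SparkSession.builder.getOrCreate()

# Toy stand-in: 100 independent 5-asset portfolios with 250 days of prices.
rng = np.random.default_rng(0)
portfolios = [
    pd.DataFrame(
        100 * np.exp(np.cumsum(rng.normal(0, 0.01, (250, 5)), axis=0)),
        columns=[f"A{i}" for i in range(5)],
    )
    for _ in range(100)
]

# Each portfolio is shipped to a worker and optimized independently.
weights = spark.sparkContext.parallelize(portfolios).map(optimize_one).collect()
```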
@phschiele Yes, two very different use cases. A large number of portfolios may result from the number of users and would, on its own, not be a challenge. However, the system would work with look-through ETF decomposition and re-compose the portfolios given a selection of metrics. An ETF portfolio may contain several positions in different share classes, and even after aggregating up to the same underlying securities the number of positions may be quite large. Hope that adds a little color.
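To illustrate the kind of re-composition I mean, a toy sketch in plain pandas (all tickers, column names, and the share-class mapping are made up):

```python
import pandas as pd

# Portfolio of ETF positions: the weight of each ETF in the portfolio.
portfolio = pd.DataFrame({"etf": ["ETF1", "ETF2"], "weight": [0.6, 0.4]})

# Look-through table: each ETF's holdings and their weights within the ETF.
holdings = pd.DataFrame({
    "etf":      ["ETF1", "ETF1", "ETF2", "ETF2"],
    "security": ["AAA.A", "BBB",  "AAA.B", "CCC"],
    "hweight":  [0.5,     0.5,    0.7,     0.3],
})

# Collapse share classes (e.g. AAA.A / AAA.B) onto one issuer-level security.
share_class_map = {"AAA.A": "AAA", "AAA.B": "AAA"}
holdings["security"] = holdings["security"].replace(share_class_map)

# Effective weight of each underlying = ETF weight x holding weight,
# summed across ETFs after the share classes are collapsed.
lookthrough = (
    holdings.merge(portfolio, on="etf")
            .assign(eff_weight=lambda d: d["hweight"] * d["weight"])
            .groupby("security", as_index=False)["eff_weight"].sum()
)
print(lookthrough)
```

With real fixed-income ETFs this table runs to thousands of rows per fund, which is where the distributed data frame starts to matter.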
Gotta be honest, this is a bit beyond me! I've never had to deal with portfolios of more than a few hundred assets.
In my experience, the optimization itself usually becomes the bottleneck first. With that said, it's cool that the koalas project will be included directly in PySpark, thanks for sharing that!
@robertmartin8 It is admittedly more of an issue with fixed-income ETFs, but some large equity ETFs (e.g. VT) have a huge number of holdings, and once you add alternatives and hard-to-value assets the numbers increase further. I am more than happy to work on this quietly and propose a PR in a few weeks' time; it would help to know how you would prefer such an extension to be done, if at all. @phschiele: The optimization bottleneck is certainly a challenge, though that problem only arises when and if the computations become possible at all. Additionally, one may want to reduce the target holdings to far fewer than the original 1000+ positions, either as part of the optimization or using other techniques.
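On the holdings-reduction point, one naive post-processing route as a sketch (not a proposal for the PR): optimize over the full universe, then zero out near-zero weights with pypfopt's existing `clean_weights` cutoff. The toy universe below stands in for a much larger look-through book.

```python
import numpy as np
import pandas as pd
from pypfopt import EfficientFrontier, expected_returns, risk_models

# Toy universe of 50 assets standing in for a 1000+ position look-through book.
rng = np.random.default_rng(1)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, (500, 50)), axis=0)),
    columns=[f"S{i}" for i in range(50)],
)

ef = EfficientFrontier(
    expected_returns.mean_historical_return(prices),
    risk_models.sample_cov(prices),
)
ef.min_volatility()
weights = ef.clean_weights(cutoff=0.001)        # drop positions below 10bp
held = {k: v for k, v in weights.items() if v != 0}
print(f"{len(held)} positions retained out of {len(weights)}")
```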
@ProbStub did you go any further with this? |
@blair-anson I had to stop working on this due to other commitments and the limited interest, but I'm happy to pick it up again. There have been a few Spark releases since, so chances are this might be easier now. Back then, the pandas `cov` function was still missing from the Pandas API on Spark.
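For reference, a quick check on a recent release: if I recall correctly, `DataFrame.cov` landed in pandas-on-Spark around Spark 3.3, after this thread started. The returns below are made-up numbers.

```python
import pyspark.pandas as ps

# Hypothetical daily returns for two assets.
returns = ps.DataFrame({"a": [0.01, -0.02, 0.03], "b": [0.00, 0.01, -0.01]})
print(returns.cov())  # now available in pandas-on-Spark (missing back then)
```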
Is your feature request related to a problem?
I would like to perform optimizations over large portfolios / a large number of assets, e.g. more than 1M portfolios, and to use Apache Spark rather than Pandas/NumPy.
Describe the feature you'd like
Large asset covariance computations should leverage parallel structures, and portfolio optimization should scale to a larger number of portfolios, similar to the mocked-up implementation on this fork.
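A minimal sketch of the intended flow (not the fork's actual API, and assuming a Spark release where `DataFrame.cov` exists in pandas-on-Spark): estimate the inputs on the cluster, then hand plain pandas objects to pypfopt's existing optimizers. The parquet path is a placeholder, and `mu` here is a simple annualized arithmetic mean rather than pypfopt's own estimators.

```python
import pyspark.pandas as ps
from pypfopt import EfficientFrontier

# Wide price table on the cluster: one column per asset (placeholder path).
prices = ps.read_parquet("/data/prices.parquet")
returns = prices.pct_change().dropna()

mu = (returns.mean() * 252).to_pandas()   # annualized mean returns, on Spark
S = (returns.cov() * 252).to_pandas()     # annualized covariance, on Spark

ef = EfficientFrontier(mu, S)             # the optimization itself stays local
ef.min_volatility()
print(ef.clean_weights())
```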
Additional context
Hi,
I am currently thinking about using pypfopt in a Spark environment. I have mocked up my idea on a fork here; it is work in progress. The API and other changes are purely for demonstration purposes and would clearly need much more discussion if a PR were to be considered. Spark 3.2's `pyspark.pandas` will also simplify the effort.
Before going further, I have two questions:
Cheers, and looking forward to your thoughts.
Best regards,
Prob