Feature request: Enable Spark Support #381

ProbStub · 2021-10-14T16:43:48Z

Is your feature request related to a problem?
I like to perform large portfolio/a large number of asset optimizations, e.g. more than 1m portfolios and use Apache Spark rather than Pandas/Numpy.

Describe the feature you'd like
Large asset covariance computations should leverage parallel structures and portfolio optimization should be possible to scale to a larger number of portfolios. Similar to the mocked up implementation on this fork.

Additional context
Hi,

I am currently thinking about using pypfopt in a spark environment. I have mocked up my idea on a fork here - it's work in progress. API and other changes are purely for demonstration purpose and clearly would need much more discussion if a PR were to be considered - also spark 3.2’s pyspark.pandas will simplify the effort.

Before going ahead further I would have two questions:

Is spark support generally desired for pypfopt?
If yes, should that be an extension of the API as in the mock, or should it be a more stand-alone kind of implementation?

Cheers and happily looking forward to your thoughts.

Best regards,
Prob

phschiele · 2021-10-14T17:48:01Z

@ProbStub Very cool feature request!

You mention scaling across many portfolios and assets, which to me are two quite different problems. The former can anyways be easily parallelized in most cases. For the latter, I'm really curious to hear what use cases you have in mind that require e.g. spark to compute.

ProbStub · 2021-10-14T19:24:05Z

@phschiele Yes two very different use cases. A large number of portfolios may result from the number of users and would, on its own not be a challange. However, the system would work with look-through ETF decomposition and re-compose the portfolios given a selection of metrics. An ETF portfolio may contain several positions of different share classes and even after aggregating up to the same securities the number of positions may be quite large. Hope that adds a little color.

robertmartin8 · 2021-10-19T19:21:26Z

Gotta be honest, this is a bit beyond me! I've never had to deal with portfolios of more than a few hundred assets.

phschiele · 2021-10-19T19:55:30Z

In my experience, the optimization usually would become a bottleneck first.

With that said, it's cool that the koalas project will directly be included into PySpark, thanks for sharing that!

ProbStub · 2021-10-20T12:07:38Z

@robertmartin8 It is admittedly more an issue with fixed income ETFs but some large equity ETFs (e.g. VT) have a huge number of holdings. Adding alternatives and hard to value assets the numbers increase further.

If I were to extend pypfopt with PySpark capabilities, would you prefere that to be:
a) totally seperate from the existing methods and classes (which would allow for a much better use of SparkML pipelines)?
b) an extension to become an option of the existing API, much like in the mock-up fork, (allowing for better adoption of alternative compute backend for users)?
c) placed into a different package so that these features do not affect pypfopt design going forward (which would allow more degrees of freedom, given Spark has it's own wrapper for BLAS)?

I am more than happy to work on this quietly and propose a PR in a few weeks time. It would help to know how you would prefere to do such an extension, if at all.

@phschiele: The optimization bottleneck is certainly a challange. That problem may only arrise when and if computations become possible. Additionally, one may want to actually reduce target holdings to a lot less than the original 1000+ positions as part of the optimization or using other techniques.

blair-anson · 2022-06-24T07:18:19Z

@ProbStub did you go any further with this?

ProbStub · 2022-06-24T07:35:32Z

@blair-anson I have had to stop working on this due to other commitments and the limited interest. Happy to pick it up again. There have been a few Spark releases since, so chances are this might be easier now. Back then then pandas cov function was still missing on the Pandas API for Spark.

ProbStub added the enhancement New feature or request label Oct 14, 2021

ProbStub changed the title ~~Feature request: [your feature]~~ Feature request: Enable Spark Support Oct 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: Enable Spark Support #381

Feature request: Enable Spark Support #381

ProbStub commented Oct 14, 2021

phschiele commented Oct 14, 2021

ProbStub commented Oct 14, 2021

robertmartin8 commented Oct 19, 2021

phschiele commented Oct 19, 2021

ProbStub commented Oct 20, 2021

blair-anson commented Jun 24, 2022

ProbStub commented Jun 24, 2022 •

edited

Loading

Feature request: Enable Spark Support #381

Feature request: Enable Spark Support #381

Comments

ProbStub commented Oct 14, 2021

phschiele commented Oct 14, 2021

ProbStub commented Oct 14, 2021

robertmartin8 commented Oct 19, 2021

phschiele commented Oct 19, 2021

ProbStub commented Oct 20, 2021

blair-anson commented Jun 24, 2022

ProbStub commented Jun 24, 2022 • edited Loading

ProbStub commented Jun 24, 2022 •

edited

Loading