Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Suppose someone wants to build a library that is usable by both Rust and Python DataFusion users. They have written a UDF in Rust that implements the Rust DataFusion traits (whether scalar, aggregate, or window). Right now, if that user wants to use their UDF in datafusion-python, they need to expose a variety of methods that basically mimic the trait functions of the Rust code. For scalar UDFs the interface also requires a bit of wrangling to convert ColumnarValue to PyArrow objects.
While it is possible to do this, it is likely error-prone and tedious for implementers to write and maintain this code.
Describe the solution you'd like
We have an established pattern of adding foreign table providers via an FFI interface using PyCapsule. This makes adding a TableProvider very easy. In our example code, the function to expose a table provider is only 6 lines of code and will likely require minimal maintenance.
It would be nice to expose all of the varieties of user defined functions via FFI, so that UDFs follow the same established pattern and users can easily reuse their Rust code.
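To make the shape of that pattern concrete, here is a rough, Python-only sketch of a PyCapsule handoff using `ctypes`. Everything specific to UDFs in it is an assumption for illustration: the `__datafusion_scalar_udf__` method name, the capsule name, and the `FakeFfiUdf` struct are hypothetical stand-ins, and a real implementation would export an FFI-stable struct from Rust rather than a `ctypes` structure.

```python
import ctypes

# Hypothetical capsule name; it must stay alive as long as the capsule does,
# because PyCapsule_New stores the pointer without copying the string.
CAPSULE_NAME = b"datafusion_scalar_udf"

# Bind the CPython capsule functions through ctypes.
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_IsValid = ctypes.pythonapi.PyCapsule_IsValid
PyCapsule_IsValid.restype = ctypes.c_int
PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]


class FakeFfiUdf(ctypes.Structure):
    """Stand-in for an FFI-stable UDF struct (hypothetical layout)."""
    _fields_ = [("version", ctypes.c_int64)]


class MyUdfExporter:
    """Plays the role of the library author's exported UDF object."""

    def __init__(self):
        # Keep the struct alive for as long as the exporter exists.
        self._ffi = FakeFfiUdf(version=1)

    def __datafusion_scalar_udf__(self):
        # Hypothetical dunder: wrap a pointer to the struct in a named capsule.
        return PyCapsule_New(ctypes.addressof(self._ffi), CAPSULE_NAME, None)


def import_udf(obj):
    """Plays the role of the consumer: validate the capsule, then read it."""
    capsule = obj.__datafusion_scalar_udf__()
    if not PyCapsule_IsValid(capsule, CAPSULE_NAME):
        raise TypeError("expected a capsule named 'datafusion_scalar_udf'")
    ptr = PyCapsule_GetPointer(capsule, CAPSULE_NAME)
    return FakeFfiUdf.from_address(ptr).version


exporter = MyUdfExporter()
udf_version = import_udf(exporter)
```

The point of the sketch is that the exporter only needs one small method, while all of the unwrapping lives on the consumer side. That is the property that keeps the existing table provider example short.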
Describe alternatives you've considered
I did a brief proof of concept where I used Python calls to the required functions. This did work, but it took quite a bit of code and I suspect it will be difficult to maintain.
Additional context
This may provide additional value in that it would get us much closer to being able to expose a SessionContext via FFI, which would benefit both the datafusion-ray and ballista projects.