-
Notifications
You must be signed in to change notification settings - Fork 1.5k
RFC: Demonstrate what a function package might look like -- encoding expressions #8046
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
b3e25be
to
c441a0d
Compare
@@ -710,30 +704,6 @@ impl BuiltinScalarFunction { | |||
BuiltinScalarFunction::Digest => { | |||
utf8_or_binary_to_binary_type(&input_expr_types[0], "digest") | |||
} | |||
BuiltinScalarFunction::Encode => Ok(match input_expr_types[0] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metadata information about the functions is now moved into functions/encoding.rs
module, along side its implementation
datafusion/expr/src/udf.rs
Outdated
} | ||
} | ||
|
||
/// Convenience trait for implementing ScalarUDF. See [`ScalarUDF::from_impl()`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This echo's the trait that @2010YOUY01 proposed in ) #7752, but does so in a way that is backwards compatible (makes a ScalarUDF out of the trait, to retain backwards compatibly)
|
||
pub(super) struct EncodeFunc {} | ||
|
||
static ENCODE_SIGNATURE: OnceLock<Signature> = OnceLock::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is what encode
and decode
look like using the ScalarUDF
API -- I think they are much clearer when all this type information is in one place (though I still kept it separate from the implementation to show the implementation did not change at all)
use std::collections::HashMap; | ||
use std::sync::Arc; | ||
|
||
/// Registers the `encode` and `decode` functions with the function registry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the conditional registration of these functions based on feature flag -- there are probably nicer ways to do this but I don't think it is any worse than the current solution.
datafusion/functions/src/lib.rs
Outdated
|
||
/// Registers all "built in" functions from this crate with the provided registry | ||
pub fn register_all(registry: &mut HashMap<String, Arc<ScalarUDF>>) { | ||
encoding::register(registry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I envision extending this list with other packages over time.
@2010YOUY01 and @viirya I wonder if you have any thoughts on this approach / proposal? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this looks great. I have several questions/suggestions:
-
to support the expr_fns encode and decode, I think we will need a Expr::ScalarFunction call or something that can take a function by name rather than fully resolved function
Now constructing an Expr
for built-in functions is stateless (does not require context), so it's tricky to be backwards compatible for Expr
API, the best solution I can think of is to also support initializing a UDF Expr
with only name string, and resolve them during logical plan optimization.
-
Extract registration functions from SessionContext into their own trait / consolidate the function registry code rather than passing
around a set of HahsMaps.... And make a way to actually modify them
It's a good idea to pack 3 HashMaps for scalar/aggr/window UDFs into a new struct like FunctionRegistry
👍🏼
datafusion/functions/src/lib.rs
Outdated
pub mod utils; | ||
|
||
/// Registers all "built in" functions from this crate with the provided registry | ||
pub fn register_all(registry: &mut HashMap<String, Arc<ScalarUDF>>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should support registering a single function here, there might be a use case that the user wants to override only one function from a function package
(possibly by changing this interface to something like
pub fn register_all() {
register_package(encoding::all_functions());
register_function(my_encoding::decode()); // override a method in default function package
}
datafusion/expr/src/udf.rs
Outdated
/// Returns this function's name | ||
pub fn name(&self) -> &str { | ||
&self.name | ||
} | ||
/// Returns this function's signature | ||
pub fn signature(&self) -> &Signature { | ||
&self.signature | ||
} | ||
/// return the return type of this function given the types of the arguments | ||
pub fn return_type(&self, args: &[DataType]) -> Result<DataType> { | ||
// Old API returns an Arc of the datatype for some reason | ||
let res = (self.return_type)(args)?; | ||
Ok(res.as_ref().clone()) | ||
} | ||
/// return the implementation of this function | ||
pub fn fun(&self) -> &ScalarFunctionImplementation { | ||
&self.fun | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the case this set of interfaces is internal-faced for execution, we might extend it during separating function packages?
And trait FunctionImplementation
is the user-faced API for defining functions in separate crates
Yes, I agree this approach is the best I can come up with.
👍 Unfortunately that name is already taken :) Maybe |
@@ -34,6 +34,17 @@ pub trait FunctionRegistry { | |||
|
|||
/// Returns a reference to the udwf named `name`. | |||
fn udwf(&self, name: &str) -> Result<Arc<WindowUDF>>; | |||
|
|||
/// Registers a new `ScalarUDF`, returning any previously registered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a new proposed API -- to allow registering new scalar UDFs with a FunctionRegistry.
@@ -1228,30 +1229,4 @@ mod test { | |||
unreachable!(); | |||
} | |||
} | |||
|
|||
#[test] | |||
fn encode_function_definitions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these tests add a lot -- they simply encode the signature again. This is also covered by actually calling encode()
via the expr API / dataframe tests which is done.
pub mod expr_fn { | ||
use super::*; | ||
/// Return encode(arg) | ||
pub fn encode(args: Vec<Expr>) -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here are the new expr_fn implementations
Which issue does this PR close?
Builds on #8039
Demonstrates what #8045 might look like
Rationale for this change
This PR demonstrates what a function package API might look like by removing encoding expressions encode/decode from
BuiltInScalarFunction
enum and adding it in a separate crate (datafusion-functions
)What changes are included in this PR?
FunctionImplementation
trait and integration intoScalarUDF
to make it easier to write ScalarUDFs;datafusion-functions
crate that has the implementation ofencode
anddecode
.SessionState::new()
, similarly to the automatically registeredListingTable
sOpen Questions:
expr_fn
s encode and decode, I think we will need aExpr::ScalarFunction
call or something that can take a function by name rather than fully resolved functionaround a set of HahsMaps.... And make a way to actually modify them
Are these changes tested?
Are there any user-facing changes?