feat: add .pipe() to ibis #8926

Closed
1 task done
koaning opened this issue Apr 10, 2024 · 0 comments
Labels
feature Features or general enhancements

koaning commented Apr 10, 2024

Is your feature request related to a problem?

I'm working on an ibis livestream today and while there's a bunch of stuff to like, I noticed that if I want to apply functions to tables that I gotta do that via something like:

def aggregate(tbl):
    return tbl.group_by(tbl.developer).aggregate([tbl.count().name("c")])

aggregate(t)

This works, but I worry about the elegance in the long run. In this case the function is doing a simple aggregation but what if I have a function that adds sessions and another that removes bots before I aggregate? The call might look something like this.

aggregate(remove_bots(add_session(t)))

It may get even more nested if these functions take arguments as well.

aggregate(remove_bots(add_session(t, cutoff=5), limit=10))

What I like to do instead when I write polars or pandas is to do something like this:

(
  df
    .pipe(add_session, cutoff=5)
    .pipe(remove_bots, limit=10)
    .pipe(aggregate)
)

I was able to get something similar working in ibis via:

from ibis.expr.types.relations import Table

def pipe(tbl, func, *args, **kwargs):
    return func(tbl, *args, **kwargs)

Table.pipe = pipe

This is a hacky solution by all means, but at a quick glance ... it seems to pull off the trick!

t.pipe(aggregate)
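
To make the equivalence with the nested call concrete, here is a small sketch that does not depend on ibis at all. `FakeTable` is a hypothetical stand-in for a table that only records which steps were applied, and the bodies of `add_session`, `remove_bots`, and `aggregate` are placeholders made up for illustration. Only the body of `pipe` matches the snippet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTable:
    """Hypothetical stand-in for an ibis table; it only records applied steps."""

    steps: tuple = ()

    def pipe(self, func, *args, **kwargs):
        # Same body as the monkeypatched `pipe`: pass the table as first argument.
        return func(self, *args, **kwargs)


def _record(tbl, label):
    # Return a new table with one more step appended (placeholder logic).
    return FakeTable(tbl.steps + (label,))


def add_session(tbl, cutoff):
    return _record(tbl, f"add_session(cutoff={cutoff})")


def remove_bots(tbl, limit):
    return _record(tbl, f"remove_bots(limit={limit})")


def aggregate(tbl):
    return _record(tbl, "aggregate")


t = FakeTable()

# Nested style: reads inside-out.
nested = aggregate(remove_bots(add_session(t, cutoff=5), limit=10))

# Piped style: reads top-to-bottom, in the order the steps run.
piped = (
    t
    .pipe(add_session, cutoff=5)
    .pipe(remove_bots, limit=10)
    .pipe(aggregate)
)

assert nested == piped
```

Both styles apply the same functions in the same order; the piped version just keeps each function next to its own arguments.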

What is the motivation behind your request?

This approach should allow folks to split their concerns a bit more. I should mention that I am somewhat opinionated on this. I've given a somewhat popular Python talk about this style of writing pipelines and I also introduced the .pipe() method in polars.

It feels like something that would fit this library too and if folks don't mind ... I'll gladly have a proper stab at it.

Describe the solution you'd like

Something like this should work.

from ibis.expr.types.relations import Table

def pipe(tbl, func, *args, **kwargs):
    return func(tbl, *args, **kwargs)

Table.pipe = pipe

I'd love to hear about any edge cases, if any. I'm totally unaware of the details in this library so I'll gladly hear it if I'm overlooking something. The typing probably needs to be considered, but to me it feels like this feature may be relatively easy to add.

What version of ibis are you running?

8.0.0

What backend(s) are you using, if any?

I tried this on DuckDB and Pandas.

Code of Conduct

  • I agree to follow this project's Code of Conduct