Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add useful predicates to filter based on the previous value #341

Merged
merged 4 commits into from
Nov 29, 2024

Conversation

llucax
Copy link
Contributor

@llucax llucax commented Nov 28, 2024

This PR started as a way to remove duplicates from a receiver, but then was expanded to OnlyIfPrevious, a more generic approach accepting any type of sub-predicate to compare with the previous message.

OnlyIfPrevious is used as a base to implement also a specific predicate to remove the duplicates (ChangedOnly), as this is a very common use case.

OnlyIfPrevious could be made even more generic by accepting also a function to determine what to save as the last message instead of always saving the new message. Such functionality would allow to implement a filter that ensured the received messages are monotonically increasing for example, by saving as the last message the max(old, new) instead of just the new message.

But this is left as a future improvement, as it is also not trivial because such function should be able to also receive the sentinel, which can complicate the API a bit.

@llucax llucax requested a review from a team as a code owner November 28, 2024 13:16
@llucax llucax self-assigned this Nov 28, 2024
@github-actions github-actions bot added part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests labels Nov 28, 2024
@llucax llucax added part:utilities Affects the utility receivers (`Timer`, `Event`, `FileWatcher`) and removed part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests labels Nov 28, 2024
@llucax llucax added this to the v1.4.0 milestone Nov 28, 2024
@llucax llucax added type:enhancement New feature or enhancement visitble to users part:experimental Affects the experimental package labels Nov 28, 2024
@llucax
Copy link
Contributor Author

llucax commented Nov 28, 2024

I needed this to avoid sending configuration updates if the configuration didn't change and thought it might be generally useful.

@github-actions github-actions bot added part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests labels Nov 28, 2024
@llucax llucax enabled auto-merge November 28, 2024 13:32
Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>
This directory should have been removed when the `util` package was
removed.

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>
Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>
The new `ChangedOnly` predicate is is just a special case of
`OnlyIfPrevious` that skips messages that are equal to the previous one.

This is such a common use case that it deserves its own predicate.

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>
@github-actions github-actions bot added the part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) label Nov 28, 2024
Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great :)

@llucax llucax added this pull request to the merge queue Nov 28, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 28, 2024
@llucax llucax added this pull request to the merge queue Nov 29, 2024
Merged via the queue into frequenz-floss:v1.x.x with commit 1a91c6a Nov 29, 2024
14 checks passed
@llucax llucax deleted the predicates branch November 29, 2024 08:21
@llucax
Copy link
Contributor Author

llucax commented Nov 29, 2024

But this is left as a future improvement, as it is also not trivial because such function should be able to also receive the sentinel, which can complicate the API a bit.

@shsms
Copy link
Contributor

shsms commented Nov 29, 2024

The names for these are very confusing to me. Does OnlyIfPrevious drop things that match the given predicate or keep them? And does OnlyChanged have the same meaning for Only as OnlyIfPrevious?

I'd suggest Keep and Drop as prefixes, so DropIfPrevious, KeepChanged or DropUnchanged, etc.

I also realize that the filter method's documentation could be more explicit, e.g., drop if the predicate returns true, etc.

@llucax
Copy link
Contributor Author

llucax commented Nov 29, 2024

Yeah, this is a problem I always have with filter in general, but it is a built-in in Python, so it is a problem that doesn't go away if we change it. I'm not opposed to make it more explicit and even eventually rename filter even if goes outside Python's built-in lingo. I like the drop/keep terms (maybe skip instead of drop?). Maybe we could deprecate filter() and add a drop_if()/skip_if() and keep_if() methods to replace it.

I actually considered various options and leaned towards the ones I chose after bouncing with ChatGPT, but always trying to stay in tune with how Python's built-in filter works (which is the common way it is used in functional languages).

In functional programming, filter is a higher-order function that processes a data structure (usually a list) in some order to produce a new data structure containing exactly those elements of the original data structure for which a given predicate returns the Boolean value true.

(as a lisp guy I'm surprised is not natural for you :P)

So, filter is positive, so you get the stuff that evaluates to true, only is used in that sense, you get only what's changed (so if prev != new is True). OnlyIfPrevious is the same, you get the stuff if the given predicate evaluates to True.

I also realize that the filter method's documentation could be more explicit, e.g., drop if the predicate returns true, etc.

Yeah, no, is the other way around 😆

I'd prefer to do the current v1.4 release with this as it is, as I would like to use it already, but we can create issues for future improvements, the predicates are experimental so we can do breaking changes until we stabilize it.

@shsms
Copy link
Contributor

shsms commented Nov 29, 2024

The problem is that the filter(ChangedOnly) reads like "filter out changed only", and that threw me off.

(maybe skip instead of drop?)

I think drop is better because the filter drops messages from the stream.

Maybe we could deprecate filter() and add a drop_if()/skip_if() and keep_if() methods to replace it.

I'm not sure we should do that because the alternative names are unfamiliar and might be hard to find. But clarifying the predicate names goes a long way already.

@llucax
Copy link
Contributor Author

llucax commented Nov 29, 2024

The problem is that the filter(ChangedOnly) reads like "filter out changed only", and that threw me off.

But but this is a problem with the ambiguity of filter itself. You'll have the same issue with filter(is_equal). I do too. I always have to double check.

@llucax
Copy link
Contributor Author

llucax commented Nov 29, 2024

I'm not sure we should do that because the alternative names are unfamiliar and might be hard to find. But clarifying the predicate names goes a long way already.

For me is one or the other, once you know filter() is positive (accepts stuff that evaluates to True), then filter(ChangedOnly()) looks perfectly clear to me. I mean, we could remove the Only, that was added to make it more clear. For me the issue is with the filter() name.

For more context, here is copilot response to my ask for name options:


Ah, good point about avoiding "filter" since it would lead to redundant expressions like receiver.filter(DuplicateFilter()). Let me suggest some alternatives that would read better in that context:

  1. SkipDuplicates - Simple and reads naturally: receiver.filter(SkipDuplicates())
  2. DistinctConsecutive - Describes the behavior precisely: receiver.filter(DistinctConsecutive())
  3. ChangeDetector - Focuses on what passes through: receiver.filter(ChangeDetector())
  4. UniqueSequence - Emphasizes the sequential nature: receiver.filter(UniqueSequence())
  5. ConsecutiveChanges - Most clear about what gets through: receiver.filter(ConsecutiveChanges())
  6. DistinctOnly - Concise and clear: receiver.filter(DistinctOnly())
  7. ChangesOnly - Very readable in context: receiver.filter(ChangesOnly())

Among these, I would recommend either ChangesOnly or DistinctConsecutive because:

  1. ChangesOnly:

    • Reads extremely naturally in code: receiver.filter(ChangesOnly())
    • Makes it clear what gets through (only changes)
    • Short but descriptive
    • Follows the "Only" suffix pattern common in filters
  2. DistinctConsecutive:

    • Very precise about the behavior
    • Makes it clear it works on consecutive items only
    • Follows mathematical terminology (distinct)
    • Still reads well: receiver.filter(DistinctConsecutive())

My top choice would be ChangesOnly because it creates the most readable code while accurately describing what the class does.

Would you like me to elaborate on any of these suggestions or propose more alternatives with different naming patterns?


I decided to believe "only" was a common pattern in filters but maybe it hallucinated because googling I can't find anything about that being a pattern for filter predicates.

Asking about that:


I need to correct my previous suggestion as I can't actually find good examples of predicates or filters using the "Only" suffix pattern. Looking at some common libraries and languages:

Python's itertools and built-ins:

  • filterfalse(), dropwhile(), takewhile()
  • filter()

Rust's Iterator:

  • filter(), filter_map()
  • skip(), skip_while()
  • take(), take_while()

C++'s algorithms:

  • remove_if()
  • copy_if()
  • find_if()

Java's Stream:

  • filter()
  • takeWhile()
  • dropWhile()

Looking at these, better naming patterns would follow:

  1. Action verbs: Skip, Take, Drop, Filter
  2. Clear predicate names: Changed, Distinct, Unique
  3. Combinations with If or While: TakeWhile, DropWhile

So I should revise my suggestion - ChangedOnly is not actually following common patterns in the industry. Better alternatives would be:

  • Changed - follows predicate naming pattern
  • SkipUnchanged - follows action verb pattern
  • DistinctConsecutive - follows predicate + qualifier pattern

Would you like me to elaborate on any of these patterns with more examples from other libraries?


Still not really giving predicate names examples, but at least it confirms that both skip and drop are used and drop seems to be more popular.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
part:docs Affects the documentation part:experimental Affects the experimental package part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) part:utilities Affects the utility receivers (`Timer`, `Event`, `FileWatcher`) type:enhancement New feature or enhancement visitble to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants