Improve support for finding duplicate rows #1028
Comments
I'll reiterate how I see a transformation like duplicates-only working in the distant future. In general, to see only duplicates, you have to perform these steps:
At the moment, we are encapsulating all of the above steps into a single monolithic transformation and calling it What I'd like to see in the future is that we give the user general tools to perform the operations listed above, so that they can compose them into that exact sequence and get the same result. That implies the pipeline-type interface that we've been talking about lately.
This is blocked by #1065 since we'll probably be using the query builder to identify duplicate rows.
I'm marking this as unblocked now that #1065 is completed.
Closing this: it's too old, and the requirements are likely out of date. We can create a new issue if we need this functionality in the future.
Problem
We have implemented a filter to identify duplicate rows. However, after implementing it, we've realized that it doesn't make sense as a filter like our other filters.
Our other filters work the same regardless of the order in which they are applied. This is not the case for "has duplicates". Depending on the other filters applied, we may end up returning different results and confusing the user. We don't face this problem with any other filter because each of them relies only on the data in a single row. "Has duplicates" relies on all of the rows that are visible when it is applied.
As an example, here's a table:
Order 1
Imagine the user applies filters in this order: "Year" > 1993, "Favorite" is TRUE, "Title" has duplicates.
Filter 1: "Year" > 1993
Filter 2: "Favorite" is TRUE
Filter 3: "Title" has duplicates
Result: 0 results.
Order 2
If the user instead applies the filters in this order: "Year" > 1993, "Title" has duplicates, and "Favorite" is TRUE.
Filter 1: "Year" > 1993
Filter 2: "Title" has duplicates
Filter 3: "Favorite" is TRUE
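The order dependence described above can be reproduced with a small sketch. The rows below are invented for illustration and are not the table from this issue; the function names (`year_after_1993`, `favorite_is_true`, `title_has_duplicates`, `apply_in_order`) are hypothetical and do not correspond to anything in the codebase. The sketch assumes that "has duplicates" counts titles among only the rows that survive the earlier filters.

```python
from collections import Counter

# Hypothetical rows, invented for illustration; not the table from this issue.
ROWS = [
    {"Title": "Heat", "Year": 1995, "Favorite": True},
    {"Title": "Heat", "Year": 1996, "Favorite": False},
    {"Title": "Fargo", "Year": 1996, "Favorite": True},
    {"Title": "Alien", "Year": 1979, "Favorite": True},
]


def year_after_1993(rows):
    # Row-local: the decision depends only on the row itself.
    return [r for r in rows if r["Year"] > 1993]


def favorite_is_true(rows):
    # Row-local: the decision depends only on the row itself.
    return [r for r in rows if r["Favorite"]]


def title_has_duplicates(rows):
    # Not row-local: the decision depends on every row still visible.
    counts = Counter(r["Title"] for r in rows)
    return [r for r in rows if counts[r["Title"]] > 1]


def apply_in_order(rows, filters):
    # Apply each filter to the output of the previous one.
    for f in filters:
        rows = f(rows)
    return rows


order_1 = apply_in_order(
    ROWS, [year_after_1993, favorite_is_true, title_has_duplicates]
)
order_2 = apply_in_order(
    ROWS, [year_after_1993, title_has_duplicates, favorite_is_true]
)

print(len(order_1), len(order_2))
```

With these invented rows, the first order returns no rows, because the "Favorite" filter removes one of the two matching titles before duplicates are counted. The second order returns one row, because duplicates are counted while both are still visible.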
Proposed solution
A few ideas:
Design notes
Additional context