Rework mutate.data.frame() to better implement .keep #6035

Merged on Oct 1, 2021 (4 commits)


DavisVaughan
Member

Closes #6007
Closes #5967

I'm hoping that this is a much simpler variant of #6007 that is overall easier to understand and test.

Essentially there are 4 rules that always hold:

  • Group variables are always retained and never move
  • Modified variables are always retained and never move
  • New variables are always retained; their position is controlled by .before and .after. The default is to add them to the end.
  • NULL expressions remove columns entirely
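
To make the four rules concrete, here is a small sketch. It assumes a dplyr release that includes this change; the data frame and the column names `g`, `x`, `y`, `z`, `w` are made up for illustration.

```r
library(dplyr)

# Hypothetical data: `g` is the grouping variable.
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6, z = 7:9)
by_g <- group_by(df, g)

# `y` is modified and stays where it was; `w` is new and goes to the end.
out_default <- mutate(by_g, y = y * 2, w = x + 1)
names(out_default)
#> "g" "x" "y" "z" "w"

# .before only moves the new column `w`; `g` and the modified `y` do not move.
out_before <- mutate(by_g, y = y * 2, w = x + 1, .before = x)
names(out_before)
#> "g" "w" "x" "y" "z"

# A NULL expression removes the column entirely.
out_null <- mutate(by_g, z = NULL)
names(out_null)
#> "g" "x" "y"
```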

This takes care of everything except .keep, and we now have a language to refer to the different types of columns. There are two extra column types:

  • Used variables are variables from the original data used to generate other columns. The set of used variables excludes group variables and modified variables.
  • Unused variables are variables from the original data that were never touched. This again excludes group variables and modified variables.

Now for .keep:

  • "all":
    • Retain group, modified, new, used, unused.
    • Drop nothing.
  • "used":
    • Retain group, modified, new, used.
    • Drop unused.
  • "unused":
    • Retain group, modified, new, unused.
    • Drop used.
  • "none":
    • Retain group, modified, new.
    • Drop used, unused.
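
The table of .keep outcomes above can be checked with a single call per option. This is an illustrative sketch that assumes a dplyr release containing this change; `keep_names()` is a hypothetical helper, and in the example `g` is the group variable, `y` is modified, `w` is new, `x` is used, and `z` is unused.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6, z = 7:9)
by_g <- group_by(df, g)

# Hypothetical helper: run the same mutate() under a given .keep value
# and return the resulting column names.
keep_names <- function(keep) {
  names(mutate(by_g, y = y * 2, w = x + 1, .keep = keep))
}

keep_names("all")     # nothing dropped
#> "g" "x" "y" "z" "w"
keep_names("used")    # unused `z` dropped
#> "g" "x" "y" "w"
keep_names("unused")  # used `x` dropped
#> "g" "y" "z" "w"
keep_names("none")    # both `x` and `z` dropped
#> "g" "y" "w"
```

Note that `y` appears on the right-hand side of `y = y * 2` but counts as modified, not used, so it is retained under every option.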

So the implementation is:

  • We let dplyr_col_modify() add modified and new variables to the data frame. This works nicely as modified variables overwrite without changing the location, and new variables are added to the end.
  • We let relocate() change the position of the new variables as needed.
  • We apply the .keep rules outlined above to figure out what to drop from the set of used/unused variables. Note that the column order doesn't change at all here. This is strictly about dropping columns.
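
The three steps can be walked through by hand with exported dplyr functions. This is only a sketch of the idea, not the code in this PR: the used/unused bookkeeping is hard-coded here (the `unused` vector is a stand-in), whereas mutate() tracks it internally.

```r
library(dplyr)

df <- tibble(x = 1:3, y = 4:6, z = 7:9)

# Step 1: dplyr_col_modify() overwrites `y` in place and appends the new `w`.
cols <- list(y = df$y * 2, w = df$x + 1)
out <- dplyr_col_modify(df, cols)
step1 <- names(out)
#> "x" "y" "z" "w"

# Step 2: relocate() moves only the new columns (as if .before = x were given).
new <- setdiff(names(cols), names(df))
out <- relocate(out, all_of(new), .before = x)
step2 <- names(out)
#> "w" "x" "y" "z"

# Step 3: emulate .keep = "used" by dropping the unused columns.
# Dropping never reorders what remains.
unused <- "z"  # assumption: determined by hand for this example
out <- out[setdiff(names(out), unused)]
step3 <- names(out)
#> "w" "x" "y"
```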

An important invariant that falls out here is that .keep plays no role in the column ordering, and I think that is valuable. I think giving .keep = "none" special behavior in a few places that changed column order is what made this so hard to get correct before.
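
One way to check this invariant is to compare each .keep result against the `"all"` result and confirm that the surviving columns appear in the same relative order. The snippet below is a rough check under the assumption of a dplyr release containing this change; `run()` and the example data are hypothetical.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6, z = 7:9)
by_g <- group_by(df, g)

# Hypothetical helper: same expressions and same .after, varying only .keep.
run <- function(keep) {
  names(mutate(by_g, y = y * 2, w = x + 1, .after = g, .keep = keep))
}

all_names <- run("all")

# For every other option, the kept columns should be a subsequence of the
# "all" ordering: .keep drops columns but never moves them.
order_preserved <- vapply(
  c("used", "unused", "none"),
  function(keep) {
    kept <- run(keep)
    identical(kept, all_names[all_names %in% kept])
  },
  logical(1)
)
order_preserved
```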

@romainfrancois
Member

Thanks. Should the description here contribute to the documentation of mutate()?

@DavisVaughan
Member Author

@romainfrancois I've refreshed the docs for .keep and for the return value of mutate().

In particular I like that this part of .keep is now highlighted up front:

Grouping columns and columns created by ... are always kept.

Successfully merging this pull request may close these issues.

transmute() shouldn't change order of variables