-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Bug or inconsistent or illogical processing of "suffix" in consecutive joins. #6553
Comments
I don’t see any inconsistency — there are no non-join columns with the same name when joining a table with columns A, B1, and B2 and a table with columns A and B. |
fwiw, this is how I'd get your expected result using the method described in #5700. You might also look at that issue for an explanation of how to do this in a more standardized / repeatable way with the {dm} package library(dplyr, warn.conflicts = FALSE)
df1 <- tibble(A = 1:3, B = letters[1:3])
df2 <- tibble(A = 1:3, B = letters[4:6])
df3 <- tibble(A = 1:3, B = letters[7:9])
df4 <- tibble(A = 1:3, B = letters[10:12])
df1 %>%
inner_join(df2, by = "A", suffix = c("", "2")) %>%
inner_join(df3, by = "A", suffix = c("", "3")) %>%
inner_join(df4, by = "A", suffix = c("1", "4"))
#> # A tibble: 3 × 5
#> A B1 B2 B3 B4
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 a d g j
#> 2 2 b e h k
#> 3 3 c f i l Created on 2022-11-19 with reprex v2.0.2 |
@hadley -- thank you for pointing out to focus on the matched columns. That did lead me to two solutions that feel consistent and intuitive to me.
|
When joining tibbles with columns that have the same name but are not join columns, the
suffix
argument allows you to provide a scheme to rename the columns from the tibbles. However, when you perform consecutive joins there is an inconsistent behavior. The first join correctly applies the suffixes. The second does not. All subsequent ones appear to.I was especially surprised that it wasn't just with piped joined. Saving the intermediate joins to variables gives the same effect.
My expected output from the
reprex
below is:There may be a "reason" why
dplyr
behaves this way. But it is very confusing and, more importantly, would require an elaborate workaround to achieve the desired result.Created on 2022-11-18 with reprex v2.0.2
The text was updated successfully, but these errors were encountered: