Modify how arrange sorts strings #7044

Closed
prubin73 opened this issue Jun 26, 2024 · 3 comments

Comments

@prubin73

When sorting a data frame/tibble based on a character column, arrange uses a different sort order than what is used by sort and by most (all?) spreadsheet programs. This creates issues when working on data coming from/going to a spreadsheet. Interestingly, use of the desc function within arrange switches the sort order to conform to sort and the spreadsheets.

# Demonstrate sorting discrepancy between `arrange` and `sort`.

# Create sample data. The second column is just to ensure that sorting does not
# convert a data frame into a vector.
df <- data.frame(Label = c("bama", "mama", "1000x", "BAnn", "10:00x"), Index = 1:5)

# Sort the rows into ascending label order using `dplyr::arrange`.
df |> dplyr::arrange(Label) |> print()
#>    Label Index
#> 1  1000x     3
#> 2 10:00x     5
#> 3   BAnn     4
#> 4   bama     1
#> 5   mama     2
# Sort the rows into ascending label order using `sort`.
df[sort(df$Label, index.return = TRUE)$ix, ] |> print()
#>    Label Index
#> 5 10:00x     5
#> 3  1000x     3
#> 1   bama     1
#> 4   BAnn     4
#> 2   mama     2
# Sort with `arrange` in "not descending" order.
df |> dplyr::arrange(-dplyr::desc(Label)) |> print()
#>    Label Index
#> 1 10:00x     5
#> 2  1000x     3
#> 3   bama     1
#> 4   BAnn     4
#> 5   mama     2
@DavisVaughan
Member

This is intended: arrange uses the C locale by default. See the .locale argument:
https://dplyr.tidyverse.org/reference/arrange.html

You probably want to specify .locale = "en"
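
A minimal sketch of that suggestion, reusing the data frame from the report. It assumes dplyr 1.1.0 or later and that the stringi package is installed, since a locale identifier other than "C" needs it. The names by_c and by_en are only illustrative.

```r
library(dplyr)

df <- data.frame(Label = c("bama", "mama", "1000x", "BAnn", "10:00x"), Index = 1:5)

# Default behaviour: character columns are sorted in the C locale.
by_c <- df |> arrange(Label)
print(by_c)

# Explicit locale: this needs the stringi package, so guard the call.
by_en <- NULL
if (requireNamespace("stringi", quietly = TRUE)) {
  by_en <- df |> arrange(Label, .locale = "en")
  print(by_en)
}
```

With .locale = "en" the rows should come out in a dictionary-style order rather than byte order, which is closer to what sort and spreadsheets typically produce in an English session.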

@prubin73
Author

Thanks. I wondered if locale was an issue, but failed to read the fine print. (I assumed it would use the operating system's default locale.) It's interesting that arrange defaults to the C locale but desc apparently does not.
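
A possible explanation, sketched in base R only: as far as I can tell, desc(x) computes -xtfrm(x), and xtfrm() ranks character vectors using the session's collation locale, so arrange ends up sorting a numeric key instead of the strings themselves. The snippet below is an illustration under that assumption, not a statement about dplyr internals. Its locale-dependent results will vary with the session locale.

```r
labels <- c("bama", "mama", "1000x", "BAnn", "10:00x")

# Radix sorting of character vectors always uses C-locale (byte) order.
c_sorted <- sort(labels, method = "radix")

# The default method for character vectors collates in the session locale.
locale_sorted <- sort(labels)

# xtfrm() maps the strings to numeric ranks, also using the session locale.
# Negating such a key twice, as in -desc(Label), gives back these ranks.
key <- xtfrm(labels)
via_xtfrm <- labels[order(key)]

print(c_sorted)
print(locale_sorted)
print(identical(via_xtfrm, locale_sorted))
```

If the session locale is itself "C", all three orderings coincide and the discrepancy from the report will not show up.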

@DavisVaughan
Member

Yeah, that's a good point. I'll open another issue about that in particular.
