Consider .sort = FALSE for summarise(), reframe(), and slice_sample()
#6663
Comments
I would like this feature. Currently, the only way I can see to achieve this type of sorting is by using the deprecated version of the function
which I think is a less elegant way to write the above suggestion. A new argument
I would like to note that this caught me by surprise today. I anticipated that [...]
Buried deep in the docs is this quote
which provides an answer to the quandary. Earlier in the same doc it says
This doesn't actually make sense to me, because we're going from many rows to one, so it is not actually possible to keep the order of the keys. Perhaps the "first appearance" portion can be added to this part of the doc? Repro:
library(dplyr)

user <- starwars |>
  select(height, species) |>
  group_by(species) |>
  summarise(avg_height = mean(height))

answer <- starwars |>
  select(height, species) |>
  summarise(avg_height = mean(height), .by = species)

waldo::compare(user, answer)
#> old vs new
#> species avg_height
#> - old[1, ] Aleena 79.0000
#> + new[1, ] Human NA
#> - old[2, ] Besalisk 198.0000
#> + new[2, ] Droid NA
#> - old[3, ] Cerean 198.0000
#> + new[3, ] Wookiee 231.0000
#> - old[4, ] Chagrian 196.0000
#> + new[4, ] Rodian 173.0000
#> - old[5, ] Clawdite 168.0000
#> + new[5, ] Hutt 175.0000
#> - old[6, ] Droid NA
#> + new[6, ] NA 175.0000
#> - old[7, ] Dug 112.0000
#> + new[7, ] Yoda's species 66.0000
#> - old[8, ] Ewok 88.0000
#> + new[8, ] Trandoshan 190.0000
#> - old[9, ] Geonosian 183.0000
#> + new[9, ] Mon Calamari 180.0000
#> - old[10, ] Gungan 208.6667
#> + new[10, ] Ewok 88.0000
#> and 28 more ...
#>
#> old$species | new$species
#> [1] "Aleena" - "Human" [1]
#> [2] "Besalisk" - "Droid" [2]
#> [3] "Cerean" - "Wookiee" [3]
#> [4] "Chagrian" - "Rodian" [4]
#> [5] "Clawdite" - "Hutt" [5]
#> [6] "Droid" - NA [6]
#> [7] "Dug" - "Yoda's species" [7]
#> [8] "Ewok" - "Trandoshan" [8]
#> [9] "Geonosian" - "Mon Calamari" [9]
#> [10] "Gungan" - "Ewok" [10]
#> ... ... ... and 28 more ...
#>
#> `old$avg_height`: 79 198 198 196 168 NA 112 88 183 209 and 28 more...
#> `new$avg_height`: NA NA 231 173 175 175 66 190 180 88 ...
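The same ordering difference can be seen on a much smaller input. This is only a sketch on a made-up data frame (`df`, `g`, and `x` are illustrative names, not from the repro above), assuming dplyr 1.1.0 or later so that `.by` is available:

```r
library(dplyr)

# Toy data: group "b" appears before group "a".
df <- tibble(
  g = c("b", "a", "b", "a"),
  x = c(1, 2, 3, 4)
)

# group_by() + summarise(): group keys come back sorted in ascending order.
by_group <- df |>
  group_by(g) |>
  summarise(total = sum(x))

# summarise(.by = ): group keys come back in order of first appearance.
by_arg <- df |>
  summarise(total = sum(x), .by = g)

by_group$g  # "a" "b"
by_arg$g    # "b" "a"
```

The per-group totals are identical in both results; only the row order differs.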
With the introduction of .by, we no longer sort group keys automatically. There are a whole host of good reasons for this as outlined here #5664 (comment), and I am mostly confident this is the right long-term default for dplyr.
However, I am empathetic to the fact that users do often like to see their summary results sorted in ascending order. Right now, our recommendation is:
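A minimal sketch of that pattern, assuming a made-up data frame `df` with a grouping column `g` (both names are illustrative): summarise with `.by`, then sort the result explicitly with `arrange()`.

```r
library(dplyr)

# Toy data with groups appearing in the order "b", "a", "c".
df <- tibble(
  g = c("b", "a", "c", "a"),
  x = c(1, 2, 3, 4)
)

# Summarise per group with .by, then sort the result explicitly.
ascending <- df |>
  summarise(total = sum(x), .by = g) |>
  arrange(g)

# arrange() also allows descending order via desc().
descending <- df |>
  summarise(total = sum(x), .by = g) |>
  arrange(desc(g))
```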
This is nice because you get the full power of arrange(), including desc() and .locale.
I think we should consider a .sort argument like:
.sort = FALSE would be the default for reasons mentioned above.
[...] group_by() to .by (even though most of the time the ordering isn't important).
[...] group_by(). If you need anything fancier, call arrange().
.sort = TRUE errors on unorderable types like clock's year-month-weekday.
[...] .data.frame method, as opposed to the generic, because dbplyr probably won't want to enforce a sort order? Uncertain.
Basically, this leaves the idea of a groupby + summarise operation theoretically pure (because it shouldn't require orderable keys), but also gives users a convenient way to optionally opt in to sorted results.
There are 3 functions that would get this argument:
summarise()
reframe()
slice_sample() (goes with "slice() and slice_head/tail/min/max() should act like a filter() not a reframe()" #6662)
The following would not get .sort because they aren't about row ordering:
filter()
mutate()
slice() and slice_min/max/head/tail() (after "slice() and slice_head/tail/min/max() should act like a filter() not a reframe()" #6662 is changed)
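Until something like `.sort` exists, the proposed behaviour for summarise() can be approximated with a small wrapper. The sketch below is purely illustrative: `summarise_by()` is a hypothetical helper, not a dplyr function, and it assumes dplyr 1.1.0 or later (for `.by` and `pick()`). It sorts with arrange()'s defaults rather than implementing any of the finer points discussed above, such as the data frame method restriction.

```r
library(dplyr)

# Hypothetical helper, NOT part of dplyr: mimics the proposed `.sort` argument.
summarise_by <- function(.data, ..., .by, .sort = FALSE) {
  # Per-operation grouping; keys come back in order of first appearance.
  out <- summarise(.data, ..., .by = {{ .by }})
  if (.sort) {
    # Opt-in: sort ascending by the grouping columns using arrange()'s defaults.
    out <- arrange(out, pick({{ .by }}))
  }
  out
}

# Usage on a made-up data frame (names are illustrative).
df <- tibble(g = c("b", "a", "b"), x = 1:3)

unsorted <- summarise_by(df, total = sum(x), .by = g)
sorted <- summarise_by(df, total = sum(x), .by = g, .sort = TRUE)
```

Keeping `.sort = FALSE` as the default in the wrapper mirrors the proposal: sorting stays opt-in, and anything fancier (descending order, a specific locale) is still a job for arrange().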