Convenient geoms #135

Open
5 tasks
DominiqueMakowski opened this issue May 31, 2021 · 14 comments

@DominiqueMakowski
Member

DominiqueMakowski commented May 31, 2021

One way of robustifying our plots and making see useful beyond being a plotting companion to the other packages would be to add some useful geoms. We have some early tests in geom_violinhalf / geom_violinpoint / geom_poolpoint, but these were done back in the days when these functionalities were not easily available and when we (I) didn't have much ggplot programming experience (it's still not my area). As a result, the current geoms are not really as flexible and robust as they should be.

Here are some examples of useful geoms:

  • Updating geom_violinhalf (or simply removing) using ggdist's geom_halfeye
  • Updating geom_violinpoint based on the above
  • Adding some sort of geom_numericdescribe (or something like that) that would combine the raw jittered points on one side and their distribution as a half violin (or dots-density) on the other (so far resembling a classic raincloud plot), and why not some sleek shading for the quantiles behind the points. It would be super useful to have a one-geom solution for an elegant summary of points, which we could add as a background for pointranges related to estimated means/CI
  • geom_lightbeam as a helper for robust lighthouse plots
  • ...
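
To make the raincloud idea concrete, here is a minimal sketch that layers two existing ggdist stats rather than defining a new geom. It assumes ggdist is installed, and the dataset (iris), the justification offsets, and the styling are illustrative choices only. It is not an implementation of the proposed geom_numericdescribe, and the quantile shading is not covered.

```r
library(ggplot2)
library(ggdist)

# Sketch only: a raincloud-style display built from existing ggdist stats.
# The offsets and styling below are illustrative assumptions.
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  # Density slab on the right of each group, plus the default
  # point-and-interval summary at the group position
  stat_halfeye(side = "right", justification = -0.15, alpha = 0.6) +
  # Raw observations as a dot plot on the left of each group
  stat_dots(side = "left", justification = 1.15) +
  theme_minimal()

p
```

A dedicated geom would mostly wrap these two layers and expose a few shared arguments, so that users do not have to coordinate the side and justification settings by hand.
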
@bwiernik
Contributor

We can also consider an alias geom_raincloud. I can take a look at the existing geoms.

@bwiernik
Contributor

The big issue with ggdist (which I love dearly) is its tidyverse dependencies. We could consider helping to reduce those if @mjskay would be interested in that sort of contribution.

@mjskay

mjskay commented Jun 1, 2021

> The big issue with ggdist (which I love dearly) is its tidyverse dependencies. We could consider helping to reduce those if @mjskay would be interested in that sort of contribution.

I'm happy to take contributions to ggdist that reduce deps. I looked at the dependencies {ggdist} has in the current development version (and what those depend on, etc.) and the dependencies {see} has in the current CRAN version (and what those depend on, etc.), and took the difference:

> ggdist_deps = pak::pkg_deps(".")   # executed on ggdist dev version
> see_deps = pak::pkg_deps("see")
> setdiff(ggdist_deps$package, see_deps$package)
 [1] "ggdist"         "HDInterval"     "distributional" "dplyr"          "forcats"        "generics"      
 [7] "numDeriv"       "purrr"          "tidyr"          "tidyselect" 

It turns out {see} already depends on a bunch of tidyverse stuff (like vctrs, tibble, rlang, glue, etc) directly or indirectly, so there aren't that many extra deps from ggdist's dependency tree (coincidentally, both packages have exactly 31 direct or indirect dependencies). A few of the extra deps ggdist has are not easily worked around:

  • "HDInterval" is for calculating highest-density intervals. In principle this is basically one function and could be duplicated into ggdist, but I don't much see the upside of maintaining it myself.
  • "distributional" is needed for the stat_dist_... family and can't be removed. It also brings in "generics" and "numDeriv" (those aren't direct dependencies of ggdist).
  • "dplyr" is needed because several functions in ggdist support grouped tibbles, notably the point_interval() family. This also brings in "tidyselect".

That leaves "forcats", "purrr", and "tidyr", all of which I suspect could be removed with varying levels of effort if someone wanted to take a stab at it. If there's interest I'd say open an issue on the ggdist repo and I'm happy to chat :).

As an aside, if there's interest in adopting {ggdist} in {see} in some capacity or other, I'd also be happy to chat about whether there are missing distributional visualization types that could be helpful. {ggdist} is intended to be quite flexible and general with respect to distribution visualization, so if there's something you can't do I'd like to know about it :).

@bwiernik
Contributor

bwiernik commented Jun 1, 2021

@mattansb It would be really cool to support analytic uncertainty distribution visualizations with bayestestR.

@mattansb
Member

mattansb commented Jun 1, 2021

@bwiernik Like some advanced version of stat_summary(fun = mean_cl_normal)?

@bwiernik
Contributor

bwiernik commented Jun 1, 2021

À la https://mjskay.github.io/ggdist/articles/freq-uncertainty-vis.html (i.e., add methods for the various posterior visualizations in bayestestR that work for frequentist/MLE models using analytic distributions).

@mattansb
Member

mattansb commented Jun 1, 2021

Are we talking about making a geom/stat? Or adding these options to the plotting methods?

@bwiernik
Contributor

bwiernik commented Jun 1, 2021

ggdist has already done much of that work; for example, I often use stat_dist_slabinterval() in my work. I'm thinking of adding something like plot.see_dist_ci() that would produce a confidence distribution visualization similar to plot.see_ci(), using, for example, ggdist::stat_dist_slabinterval().
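
As a rough illustration of what such a method might draw, the sketch below plots analytic confidence distributions for the coefficients of an lm() fit, using a scaled and shifted Student t with the residual degrees of freedom. The model, the `est` data frame, and its column names are assumptions made for this example. plot.see_dist_ci() does not exist yet, and this is not a proposal for its internals. It uses stat_dist_halfeye() with the dist/arg1/arg2/arg3 aesthetics; newer ggdist releases may steer users toward stat_halfeye() with distributional objects instead.

```r
library(ggplot2)
library(ggdist)

# Example model; any lm() fit would do
m <- lm(mpg ~ hp + wt, data = mtcars)

# Collect estimates and standard errors with base R only
coefs <- as.data.frame(coef(summary(m)))
est <- data.frame(
  term      = rownames(coefs),
  estimate  = coefs[["Estimate"]],
  std_error = coefs[["Std. Error"]],
  df        = df.residual(m)
)
est <- est[est$term != "(Intercept)", ]

# One confidence distribution per coefficient:
# student_t(df, mu = estimate, sigma = std_error)
p <- ggplot(est, aes(y = term)) +
  stat_dist_halfeye(
    aes(dist = "student_t", arg1 = df, arg2 = estimate, arg3 = std_error)
  ) +
  labs(x = "Coefficient", y = NULL)

p
```
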

@mattansb
Member

Agree! (Don't know why you tagged me, but I agree 😉) - what say you @DominiqueMakowski ?

@DominiqueMakowski
Member Author

I agree

@IndrajeetPatil
Member

> Don't know why you tagged me

Because you are amazing, brah
easystats/modelbased#119 (comment)

@IndrajeetPatil
Member

> Updating geom_violinhalf (or simply removing) using ggdist's geom_halfeye

Why not use ggridges::geom_density_ridges?

We already rely on ggridges, so we don't even need to gain an additional dependency.

library(ggridges)
library(ggplot2)

ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(
    # points
    jittered_points = TRUE,
    position = position_raincloud(
      adjust_vlines = TRUE,
      width = 0.02,
      height = 0.2
    ),
    point_size = 2,
    point_alpha = 0.5,
    quantile_lines = TRUE,
    # density
    scale = 0.7,
    alpha = 0.5,
    # quantile lines
    vline_size = 1,
    vline_color = "red"
  ) +
  coord_flip()
#> Picking joint bandwidth of 0.181

Created on 2021-06-10 by the reprex package (v2.0.0)

@bwiernik
Contributor

We could switch the dependency to ggdist. It would give us a lot more flexibility, especially for analytic distributions.

@DominiqueMakowski
Member Author

I think it's worth the change
