StatsMakie's grammar and future #109
Comments
Love to get this started :) Can you share the code that generated the graphic? |
The graphic was taken from the tidybayes package; I'm not certain how exactly it was generated: http://mjskay.github.io/tidybayes/index.html |
Hm, well, if you have things you like specifically, it'd be really nice to collect syntax examples ;) |
I'm happy to provide examples of plots in other packages that I think are useful and the syntax from that package for generating them. Is that what you mean? |
Yes, that'd be amazing! Ideally snippet plus output! |
I think of this as a couple of separate issues, as you mention yourself. StatsMakie (and StatsPlots) have suffered a bit from scope creep, and each is now a package with several different capabilities. IMHO the gog part could be separate. Anyway, the gog part started IIRC at the last vizcon as a follow-up to repeated requests from the user base (of Plots) for a ggplot2-like syntax, so we had a presentation of ggplot2 and started discussing whether we could make a gog syntax that was more julian and better suited than ggplot2. There were these concerns:
@piever beautifully (IMHO) resolved this with the StatsMakie design, which implements the logic of grammar-of-graphics, modularises with other plotting commands, and thinks out of the box with respect to ggplot2. (BTW Julia has a package with almost 100% the same syntax as ggplot2 - Gadfly). The primitives you're showing examples of there are AFAICS just what's called
The idea of StatsPlots was simply to provide recipes for the packages in the JuliaStats ecosystem. The rationale for this is pragmatic - the maintainers of JuliaStats consistently rejected PRs adding Plots recipes, so in the end we put them in a package that would take deps on all JuliaStats packages and implement the recipes there. I'd be very open to dropping the Plots dep of StatsPlots so that the recipes in there could be used with Makie out of the box, if we are able to make that happen. Coincidentally the author of StatsMakie (@piever) is also the primary maintainer of StatsPlots :-) |
@sethaxen you can get a good impression of the considerations behind the design here by reading the discussion on this PR: #7 and this previous issue JuliaPlots/Plots.jl#1530 |
Thanks for bringing this up so thoroughly! This will make for a very interesting discussion at vizcon. My personal view is that there are two very distinct ways to find a nicer syntax for complex plots. There is the GoG way, and then there is the Plots "recipe" way (handle combinations of complex types recursively, until you get to simpler types that you know how to plot). The question explored at vizcon 2018 was the following: is it enough to come up with a few "well chosen" custom types, so that the "recipe" approach can actually also offer a grammar of graphics syntax? The result generated the following "recipes":
There are still big TODOs (we do not have scales, layout support, automated labels and legends, confidence intervals), but a lot of the functionality of GoG is there. The main plus, in my mind, is that you can use any function as a "Statistics", as long as it returns something for which there is a recipe (as the pipeline will continue). I believe it is mostly straightforward to make things overloadable. For now, The only non-trivial part of the implementation so far is |
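As a language-neutral illustration of the "recipe" idea described above (reduce complex types recursively until you reach something you know how to plot), here is a minimal Python sketch. It is not StatsMakie or Plots code: the types `Scatter`, `Histogram`, and `Grouped` and the function `recipe` are hypothetical names made up for this example, and Python's `functools.singledispatch` stands in for Julia's multiple dispatch.

```python
from dataclasses import dataclass
from functools import singledispatch

# Hypothetical types for illustration only; none of these are StatsMakie names.

@dataclass
class Scatter:
    """A 'simple' type that a backend would know how to draw directly."""
    xs: list
    ys: list

@dataclass
class Histogram:
    """A 'statistic': reduces raw values to bin centers and counts."""
    values: list
    nbins: int

@dataclass
class Grouped:
    """A 'complex' type: split values by key, then recurse on each group."""
    keys: list
    values: list
    nbins: int

@singledispatch
def recipe(obj):
    # No recipe registered for this type: the pipeline cannot continue.
    raise TypeError(f"no recipe for {type(obj).__name__}")

@recipe.register
def _(obj: Scatter):
    # Base case: already plottable, so the recursion stops here.
    return [obj]

@recipe.register
def _(obj: Histogram):
    lo, hi = min(obj.values), max(obj.values)
    width = (hi - lo) / obj.nbins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * obj.nbins
    for v in obj.values:
        # Clamp so that the maximum value lands in the last bin.
        i = min(int((v - lo) / width), obj.nbins - 1)
        counts[i] += 1
    centers = [lo + (i + 0.5) * width for i in range(obj.nbins)]
    # Return a simpler type and let the pipeline continue.
    return recipe(Scatter(centers, counts))

@recipe.register
def _(obj: Grouped):
    groups = {}
    for k, v in zip(obj.keys, obj.values):
        groups.setdefault(k, []).append(v)
    plots = []
    for vals in groups.values():
        plots.extend(recipe(Histogram(vals, obj.nbins)))
    return plots
```

Calling `recipe(Grouped(keys, values, nbins))` yields one `Scatter` per group. A user could add a new statistic by registering one more method that returns any type which already has a recipe.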
Thanks to you both for the explanations and links to previous discussions. These are very helpful. My main priority is that it should be straightforward for a user to alter the pipeline through overloading and dispatch. For example, as in #107, for some plots/keyword arguments, one may want to do some processing after symbols are converted to columns but before splitting into groups. Or if a user wants to extend the GoG by adding a new step to the pipeline, it should be possible to do that. (e.g. to implement the probabilistic grammar of graphics above, one might want something like |
I've opened MakieOrg/AlgebraOfGraphics.jl#2 to discuss a possible standalone grammar of graphics design. |
This is a large super-issue that I hope will drive some discussion and split into smaller issues. Perhaps this is a project/discussion for vizcon2 (I will not be in attendance).
I would really like to see StatsMakie as a first-class tool for interactively visualizing outputs of probabilistic programming languages like Turing, Gen, Soss, and Stan (explicitly, random draws from high-dimensional probability distributions, usually with a shape like `(nchains, ndraws, nvariables)`).
Due to Makie's interactivity, it could be a prime candidate for building an interactive dashboard for visualizing draws, not just after the analysis has completed, but in an on-line fashion.
To proceed, it would be helpful to formalize/document the grammar of graphics design for StatsMakie.
This would aid in determining at which levels different features should be implemented.
Some useful references for related GoGs are
I personally like ggplot2's GoG, augmented by tidybayes.
In ggplot2's GoG, the plots contain:
tidybayes is a package that augments ggplot2 to aid in plotting outputs of PPLs. It operates at the level of mappings, statistical transformations, and geoms.
The main thing it adds is the ability to flatten multidimensional draws to something like a dataframe that can then be understood by a GoG (i.e. "tidy data").
Many of the additional stats/geoms are drop-in replacements for some in ggplot2, often changing defaults, combining geoms, or applying some principles from the probabilistic GoG.
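To make the flattening step mentioned above concrete, here is a minimal sketch in plain Python. It assumes the draws are stored as nested lists indexed as `draws[chain][draw][variable]`, and it produces one record per value. The function name `tidy_draws` and the record keys are hypothetical choices for this example, not the tidybayes API, and the indices are 0-based.

```python
def tidy_draws(draws, names):
    """Flatten draws[chain][draw][variable] into one record per value.

    `draws` is a nested list of shape (nchains, ndraws, nvariables);
    `names` holds one name per variable. Both are assumptions of this sketch.
    """
    rows = []
    for chain, chain_draws in enumerate(draws):
        for draw, values in enumerate(chain_draws):
            if len(values) != len(names):
                raise ValueError("each draw needs one value per variable name")
            for name, value in zip(names, values):
                rows.append(
                    {"chain": chain, "draw": draw, "variable": name, "value": value}
                )
    return rows


# Two chains, two draws each, two variables.
draws = [
    [[0.1, 1.0], [0.2, 2.0]],
    [[0.3, 3.0], [0.4, 4.0]],
]
rows = tidy_draws(draws, ["mu", "sigma"])
```

Each record in `rows` can then be mapped to aesthetics like any other column-oriented data, e.g. `variable` to facets and `chain` to color.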
Having some of these primitives for visualizing uncertainty in StatsMakie would be great:
With these primitives, you can build an impressive array of statistical plots.
I'm sure the devs here have given a lot of thought to the future of StatsMakie.
It would be useful to know how StatsMakie's GoG compares to the one above (is it missing elements that are needed? does it have additional elements? etc.).
For example, with `Data` and symbols, we currently have a convenient syntax to extract columns from a dataframe, but can this be extended? E.g. what if instead of a dataframe, we have some other structure that uses `getindex` to access values? What if the data is in an iterator, and we'd like to on-line update plots? Can a useful API for grouping, etc be defined that could be overloaded by individual packages?
Lastly, what is the intended relationship between StatsPlots and StatsMakie? With the coming ability of Makie to consume Plots recipes, should recipes whenever possible be implemented without respect to either package? And should then this GoG be StatsMakie-specific or in some more generic package?
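On the question above about data structures that expose values through `getindex` rather than a dataframe, the following Python sketch shows one shape such an overloadable API could take. It is only an illustration under stated assumptions: `column`, `group_by`, and the `Chains` container are hypothetical names, and Python's `__getitem__` plays the role of Julia's `getindex`.

```python
from collections import defaultdict

def column(data, key):
    """Extract a column from anything that supports data[key]."""
    return list(data[key])

def group_by(data, by, *columns):
    """Split the named columns into groups keyed by the values of `by`."""
    keys = column(data, by)
    cols = {name: column(data, name) for name in columns}
    groups = defaultdict(lambda: {name: [] for name in columns})
    for i, k in enumerate(keys):
        for name in columns:
            groups[k][name].append(cols[name][i])
    return dict(groups)

class Chains:
    """Hypothetical non-dataframe container exposing columns via __getitem__."""

    def __init__(self, **cols):
        self._cols = cols

    def __getitem__(self, key):
        return self._cols[key]


# The same grouping code works for a plain dict and for the custom container.
table = {"chain": [1, 1, 2], "mu": [0.1, 0.2, 0.3]}
chains = Chains(chain=[1, 1, 2], mu=[0.1, 0.2, 0.3])
from_table = group_by(table, "chain", "mu")
from_chains = group_by(chains, "chain", "mu")
```

Because `group_by` only relies on `column`, a package would only need to make its container indexable by key (or provide its own `column` method) to take part in the grouping step.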
cc-ing @cscherrer, @cpfiffer, @trappmartin