This repository has been archived by the owner on Jun 29, 2021. It is now read-only.

StatsMakie's grammar and future #109

Open
sethaxen opened this issue Mar 6, 2020 · 10 comments

@sethaxen
Member

sethaxen commented Mar 6, 2020

This is a large super-issue that I hope will drive some discussion and split into smaller issues. Perhaps this is a project/discussion for vizcon2 (I will not be in attendance).

I would really like to see StatsMakie as a first-class tool for interactively visualizing outputs of probabilistic programming languages like Turing, Gen, Soss, and Stan (explicitly, random draws from high-dimensional probability distributions, usually with a shape like (nchains, ndraws, nvariables)).
Due to Makie's interactivity, it could be a prime candidate for building an interactive dashboard for visualizing draws, not just after the analysis has completed, but in an on-line fashion.

To proceed, it would be helpful to formalize/document the grammar of graphics design for StatsMakie.
This would aid in determining at which levels different features should be implemented.

Some useful references for related GoGs are:

  • Wilkinson's Grammar of Graphics
  • Hadley Wickham's thesis, in particular Chapter 3 on layered GoG, which explains the design of ggplot2 in R. See also the succinct explanation here.
  • This paper on a probabilistic grammar of graphics. I haven't read it very closely, but it uses current research on the interpretability of visualizations to propose changes to a ggplot2-style GoG so that probabilities are communicated more accurately. This GoG's major changes happen at the data/aesthetic level, annotating groups with probabilistic information (see Fig 5).

I personally like ggplot2's GoG, augmented by tidybayes.

In ggplot2's GoG, the plots contain:

  1. a default dataset and set of mappings from variables to aesthetics,
  2. one or more layers, with each layer having each of the following:
    • one geometric object
    • one statistical transformation
    • one position adjustment
    • optionally, one dataset and set of aesthetic mappings,
  3. one scale for each aesthetic mapping used,
  4. a coordinate system,
  5. the facet specification.
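
As a rough illustration of that structure, here is a minimal sketch in Python. The class and field names are hypothetical and are not ggplot2's or StatsMakie's actual types; the only point is that each layer may carry its own data and mappings and otherwise falls back to the plot-level defaults.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Layer:
    geom: str                       # one geometric object, e.g. "point"
    stat: str = "identity"          # one statistical transformation
    position: str = "identity"      # one position adjustment
    data: Optional[Any] = None      # optional layer-specific dataset
    mapping: Optional[dict] = None  # optional layer-specific aesthetic mappings

@dataclass
class Plot:
    data: Any                                   # default dataset
    mapping: dict                               # default aesthetic mappings
    layers: list = field(default_factory=list)
    scales: dict = field(default_factory=dict)  # one scale per aesthetic used
    coord: str = "cartesian"                    # coordinate system
    facet: Optional[str] = None                 # facet specification

    def resolved(self, layer):
        """Return (data, mapping) for a layer, falling back to plot defaults."""
        data = layer.data if layer.data is not None else self.data
        mapping = {**self.mapping, **(layer.mapping or {})}
        return data, mapping

plot = Plot(data={"x": [1, 2, 3], "y": [2, 4, 6]}, mapping={"x": "x", "y": "y"})
plot.layers.append(Layer(geom="point"))
plot.layers.append(Layer(geom="line", stat="smooth", mapping={"color": "group"}))
```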

tidybayes is a package that augments ggplot2 to aid in plotting outputs of PPLs. It operates at the level of mappings, statistical transformations, and geoms.
The main thing it adds is the ability to flatten multidimensional draws to something like a dataframe that can then be understood by a GoG (i.e. "tidy data").
Many of the additional stats/geoms are drop-in replacements for some in ggplot2, often changing defaults, combining geoms, or applying some principles from the probabilistic GoG.
Having some of these primitives for visualizing uncertainty in StatsMakie would be great:
[Figure: the tidybayes slabinterval family of stats and geoms]
With these primitives, you can build an impressive array of statistical plots.
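
To make the "flatten to tidy data" step concrete, here is a small Python sketch. It assumes draws are stored as nested lists indexed as draws[chain][draw][variable]; the function name and the output column names are made up for illustration and are not tidybayes's actual output format.

```python
def tidy_draws(draws, variable_names):
    """Flatten draws[chain][draw][variable] into one row per (chain, draw, variable)."""
    rows = []
    for chain_idx, chain in enumerate(draws):
        for draw_idx, draw in enumerate(chain):
            for name, value in zip(variable_names, draw):
                rows.append({"chain": chain_idx, "draw": draw_idx,
                             "variable": name, "value": value})
    return rows

# 2 chains, 2 draws per chain, 2 variables
draws = [
    [[0.1, 1.0], [0.2, 1.1]],
    [[0.3, 1.2], [0.4, 1.3]],
]
rows = tidy_draws(draws, ["mu", "sigma"])
```

Each row can then be treated like a dataframe row, so grouping by chain or variable becomes an ordinary GoG grouping.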

I'm sure the devs here have given a lot of thought to the future of StatsMakie.
It would be useful to know how StatsMakie's GoG compares to the one above (is it missing elements that are needed? Does it have additional elements?).
For example, with Data and symbols, we currently have a convenient syntax to extract columns from a dataframe, but can this be extended?
E.g. what if instead of a dataframe, we have some other structure that uses getindex to access values? What if the data is in an iterator, and we'd like to on-line update plots? Can a useful API for grouping, etc be defined that could be overloaded by individual packages?
Lastly, what is the intended relationship between StatsPlots and StatsMakie? With the coming ability of Makie to consume Plots recipes, should recipes whenever possible be implemented without respect to either package? And should then this GoG be StatsMakie-specific or in some more generic package?
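
To illustrate the on-line case, here is a minimal Python sketch of what I have in mind. Everything in it is hypothetical (OnlineHistogram and consume are not part of any package): the statistic keeps its own state and is fed batches from an iterator, and a real dashboard would redraw after each batch.

```python
from collections import Counter
from itertools import islice

class OnlineHistogram:
    """Bin counts that can be updated as new draws arrive."""
    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.counts = Counter()

    def update(self, values):
        for v in values:
            # Floor division gives the index of the bin containing v.
            self.counts[int(v // self.bin_width)] += 1

def consume(draw_iterator, hist, batch_size):
    """Feed the histogram in batches and return the number of batches seen."""
    n_batches = 0
    while True:
        batch = list(islice(draw_iterator, batch_size))
        if not batch:
            break
        hist.update(batch)  # a plot would be refreshed here
        n_batches += 1
    return n_batches

hist = OnlineHistogram(bin_width=1.0)
n_batches = consume(iter([0.2, 0.7, 1.5, 2.1, 2.9]), hist, batch_size=2)
```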

cc-ing @cscherrer, @cpfiffer, @trappmartin

@SimonDanisch
Member

Love to get this started :) Can you share the code that generated the graphic?

@sethaxen
Member Author

sethaxen commented Mar 7, 2020

The graphic was taken from the tidybayes package; I'm not certain exactly how it was generated: http://mjskay.github.io/tidybayes/index.html

@SimonDanisch
Member

Hm, well, if you have things you like specifically, it'd be really nice to collect syntax examples ;)

@sethaxen
Member Author

sethaxen commented Mar 7, 2020

I'm happy to provide examples of plots in other packages that I think are useful and the syntax from that package for generating them. Is that what you mean?

@SimonDanisch
Member

Yes, that'd be amazing! Best would be snippet plus output!

@mkborregaard
Member

I think of this as a couple of separate issues, as you mention yourself. StatsMakie (like StatsPlots) has suffered a bit from scope creep, and is now a package with several different capabilities. IMHO the gog part could be separate.

Anyway, the gog part started, IIRC, at the last vizcon as a follow-up to repeated requests from the user base (of Plots) for a ggplot2-like syntax, so we had a presentation of ggplot2 and started discussing whether we could make a gog syntax that was more julian and better suited than ggplot2. There were these concerns:

  1. GGPlot2 is based on DataFrames, which is how you always have your data in R. Julia, on the other hand, is built around user types; the very thing that makes Plots and Makie stand out is that you can build recipes that dispatch on user types.
  2. GGPlot2 is a completely separate syntax from R plotting. You can't mix and match with other plotting calls - which has led to an explosion of "gg-" packages that implement existing functionality in GOG logic.

@piever beautifully (IMHO) resolved this with the StatsMakie design, which implements the logic of grammar-of-graphics, modularises with other plotting commands, and thinks out of the box with respect to ggplot2. (BTW Julia has a package with almost 100% the same syntax as ggplot2 - Gadfly).

The primitives you're showing examples of there are AFAICS just what's called PlotTypes in Makie - some should already exist here (density, violin, cdf, histogram), the black thing is just a particular version of a boxplot that can be overplotted when desired - and you've got a nice PR for dotplot :-).

The idea of StatsPlots was simply to provide recipes for the packages in the JuliaStats ecosystem. The rationale for this is pragmatic - the maintainers of JuliaStats consistently rejected PRs adding Plots recipes, so in the end we put them in a package that would take deps on all JuliaStats packages and implement the recipes there. I'd be very open to dropping the Plots dep of StatsPlots so that the recipes in there could be used with Makie out of the box, if we are able to make that happen. Coincidentally, the author of StatsMakie (@piever) is also the primary maintainer of StatsPlots :-)

@mkborregaard
Member

mkborregaard commented Mar 7, 2020

@sethaxen you can get a good impression of the considerations behind the design here by reading the discussion on this PR: #7 and this previous issue JuliaPlots/Plots.jl#1530

@piever
Member

piever commented Mar 9, 2020

Thanks for bringing this up so thoroughly! This will make for a very interesting discussion at vizcon.

My personal view is that there are two very distinct ways to find a nicer syntax for complex plots. There is the GoG way, and then there is the Plots "recipe" way (handle combinations of complex types recursively, until you get to simpler types that you know how to plot).

The question explored at vizcon 2018 was the following. Is it enough to come up with a few "well chosen" custom types, so that the "recipe" approach can actually also offer a grammar of graphics syntax?

The result generated the following "recipes":

  • If you encounter a Data argument, replace symbols by corresponding columns.
  • If you encounter a Group argument, split and style the data according to the relevant variables.
  • If you encounter a Style argument, style the data without splitting.
  • If you encounter an Analysis or Function, apply it to the other arguments and continue with the pipeline.
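
The rules above can be sketched as a small recursive rewriter. This is a Python toy with hypothetical names, not StatsMakie's implementation: it assumes a table is a dict of columns and that plain strings stand in for symbols, and it leaves out Style for brevity.

```python
class Data:
    def __init__(self, table):
        self.table = table  # dict: column name -> list of values

class Group:
    def __init__(self, **by):
        self.by = by  # attribute name -> column (or column name before Data is resolved)

def find(args, pred):
    """Index of the first argument satisfying pred, or None."""
    return next((i for i, a in enumerate(args) if pred(a)), None)

def pipeline(args, attrs=None):
    """Rewrite args step by step until only plain columns remain."""
    attrs = dict(attrs or {})
    i = find(args, lambda a: isinstance(a, Data))
    if i is not None:
        # Data: replace column names by the corresponding columns.
        table, rest = args[i].table, args[:i] + args[i + 1:]
        lookup = lambda a: table[a] if isinstance(a, str) else a
        rest = [Group(**{k: lookup(v) for k, v in a.by.items()})
                if isinstance(a, Group) else lookup(a) for a in rest]
        return pipeline(rest, attrs)
    i = find(args, lambda a: isinstance(a, Group))
    if i is not None:
        # Group: split every column by group key and style each part.
        group, rest = args[i], args[:i] + args[i + 1:]
        names = list(group.by)
        keys = list(zip(*(group.by[n] for n in names)))
        out = []
        for key in sorted(set(keys)):
            idx = [j for j, k in enumerate(keys) if k == key]
            part = [[c[j] for j in idx] if isinstance(c, list) else c for c in rest]
            out += pipeline(part, {**attrs, **dict(zip(names, key))})
        return out
    i = find(args, callable)
    if i is not None:
        # Analysis or function: apply it to the other arguments and continue.
        rest = args[:i] + args[i + 1:]
        return pipeline(list(args[i](*rest)), attrs)
    return [{"columns": args, "attrs": attrs}]

def mean_point(x, y):
    return [sum(x) / len(x)], [sum(y) / len(y)]

table = {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40], "g": ["a", "b", "a", "b"]}
series = pipeline([Data(table), Group(color="g"), mean_point, "x", "y"])
```

Here series holds one entry per group, each with the columns returned by mean_point and a color attribute, which is the point where an ordinary plotting recipe would take over.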

There are still big TODOs (we do not have scales, layout support, automated labels and legends, confidence intervals), but a lot of the functionality of GoG is there. The main plus, in my mind, is that you can use any function as a "Statistics", as long as it returns something for which there is a recipe (as the pipeline will continue).

I believe it is mostly straightforward to make things overloadable. For now, Data fully supports anything that implements the Tables interface, so the methods to overload are Tables.getcolumn and Tables.columnnames.

The only non-trivial part of the implementation so far is Group. This is implemented using functionality from StructArrays. I can imagine providing a select(t, cols; by) that would do column selection and grouping, and different packages can overload it for different table types. On the other hand, this type of API probably also belongs in some Tables.jl / TableOperations.jl-like interface package.
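
To give an idea of what such a select could do, here is a rough Python sketch for a table stored as a dict of columns. The signature mirrors the hypothetical select(t, cols; by) above and does not exist in any package yet.

```python
def select(table, cols, by=()):
    """Pick `cols` from a column table and return one sub-table per group key.

    Without `by`, a single sub-table is returned under the empty key ().
    """
    if not by:
        return {(): {c: list(table[c]) for c in cols}}
    groups = {}
    n_rows = len(table[by[0]])
    for row in range(n_rows):
        key = tuple(table[b][row] for b in by)
        sub = groups.setdefault(key, {c: [] for c in cols})
        for c in cols:
            sub[c].append(table[c][row])
    return groups

table = {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40], "g": ["a", "b", "a", "b"]}
grouped = select(table, ["x", "y"], by=["g"])
```

A package with its own table type would provide its own method with the same contract.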

@sethaxen
Member Author

Thanks to you both for the explanations and links to previous discussions. These are very helpful.

My main priority is that it should be straightforward for a user to alter the pipeline through overloading and dispatch. For example, as in #107, for some plots/keyword arguments, one may want to do some processing after symbols are converted to columns but before splitting into groups.

Or if a user wants to extend the GoG, adding a new step to the pipeline, it should be possible to do that. (e.g. to implement the probabilistic grammar of graphics above, one might want something like Condition that has slightly different behavior from Group)
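
As a sketch of the kind of extensibility I mean, here it is in Python with functools.singledispatch standing in for Julia's multiple dispatch. apply_step is a hypothetical hook, and this Condition is only my guess at a step that behaves like Group but also attaches each group's share of the data; it is not taken from the paper.

```python
from functools import singledispatch

class Group:
    def __init__(self, column):
        self.column = column  # one group label per row

class Condition:
    def __init__(self, column):
        self.column = column  # one group label per row

@singledispatch
def apply_step(step, values):
    raise TypeError(f"no pipeline step registered for {type(step).__name__}")

@apply_step.register(Group)
def _(step, values):
    """Split values into one list per distinct group label."""
    parts = {}
    for label, value in zip(step.column, values):
        parts.setdefault(label, []).append(value)
    return parts

@apply_step.register(Condition)
def _(step, values):
    """Like Group, but annotate each part with its empirical probability."""
    parts = apply_step(Group(step.column), values)
    n = len(values)
    return {label: {"values": vs, "probability": len(vs) / n}
            for label, vs in parts.items()}

result = apply_step(Condition(["a", "b", "a", "a"]), [1, 2, 3, 4])
```

The new step is added without touching the code for Group, which is the property I would like the pipeline to have.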

@piever
Member

piever commented Mar 12, 2020

I've opened MakieOrg/AlgebraOfGraphics.jl#2 to discuss a possible standalone grammar of graphics design.


4 participants