ggD3

grammar of graphics in D3

Why the grammar of graphics in D3?

It needs to be done. As I get better at using D3, I sometimes think that it is as general as it can and should be. Then, hours later, when I more or less have the chart I wanted, I realize that development has overshadowed whatever statistical question I was thinking about. Panning, zooming, tooltips, and brushing are valuable features that should be available in exploratory data analysis. Faceting and interactivity can go a long way toward eliminating the silly plots with dual x or y axes that are often showcased as an advanced feature of many packages (with apologies to their fans and to the authors of packages that include them).

I first got into R, like many people, by learning Hadley Wickham's ggplot2. Its goal of reducing the effort between brain and picture is laudable and should be replicated in D3. Gigamonkey has started one, but it looks like he has more pressing things to do. I've been casually writing and learning JavaScript for a couple of years, but I have no formal CS pedigree and have never embarked on something this big. I'd like it to work well, eventually. After starting and restarting twice, I now have something that works and appears to be extensible. If you have advice on architecture or features, or can point out something blatantly dumb that I'm doing, please let me know. In the meantime, I plan to build out the following features and would love some collaborators.

Facets

Faceting is the feature I find most useful in ggplot2, and I wonder why other libraries don't start with a plan for small multiples. Fixed and free scales both work, and I'd like to adopt free space as well. Eventually, you will be able to pass a reusable chart to a ggd3.plot() object and have it faceted.
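To make the fixed versus free distinction concrete, here is a minimal plain-JavaScript sketch. It is not ggD3's actual implementation, and the helper names extent and facetDomains are hypothetical. With fixed scales every panel shares one domain computed over all the data; with free scales each panel gets a domain from its own rows.

```javascript
// Hypothetical helpers for illustration; not part of ggD3.

// Return [min, max] of a numeric array.
function extent(values) {
  return [Math.min.apply(null, values), Math.max.apply(null, values)];
}

// Group rows by a facet variable and compute the domain of `field`
// for each panel. scales === 'free' gives one domain per panel;
// anything else gives one shared domain.
function facetDomains(rows, facetBy, field, scales) {
  var groups = {};
  rows.forEach(function (row) {
    var key = String(row[facetBy]);
    (groups[key] = groups[key] || []).push(row[field]);
  });
  var shared = extent(rows.map(function (row) { return row[field]; }));
  var domains = {};
  Object.keys(groups).forEach(function (key) {
    domains[key] = scales === 'free' ? extent(groups[key]) : shared;
  });
  return domains;
}

var rows = [
  {am: 0, mpg: 15}, {am: 0, mpg: 21},
  {am: 1, mpg: 22}, {am: 1, mpg: 33}
];
var fixedDomains = facetDomains(rows, 'am', 'mpg', 'fixed');
var freeDomains = facetDomains(rows, 'am', 'mpg', 'free');
```

In this toy data both fixed panels span 15 to 33, while the free panels span 15 to 21 and 22 to 33.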

Composable

So far this is working okay. A chart has a layers method that appends a layer to an array of layers. Erase them by passing an empty array. Each layer has its own aesthetic map and inherits any aesthetics it leaves undefined from the chart object. An entry in the array can be a string naming a geom, a layer object, or a geom object.
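The inheritance rule can be sketched in a few lines of plain JavaScript. The function resolveAes is a hypothetical name used only for illustration; how ggD3 resolves aesthetics internally may differ.

```javascript
// Hypothetical helper for illustration; not part of ggD3.
// Returns a new aesthetic map: layer values win, and any aesthetic
// the layer leaves undefined falls back to the chart's mapping.
function resolveAes(chartAes, layerAes) {
  var resolved = {};
  Object.keys(chartAes).forEach(function (key) {
    resolved[key] = chartAes[key];
  });
  Object.keys(layerAes || {}).forEach(function (key) {
    if (layerAes[key] !== undefined) {
      resolved[key] = layerAes[key];
    }
  });
  return resolved;
}

var chartAes = {x: 'cyl', y: 'mpg', fill: 'gear'};
var pointAes = resolveAes(chartAes, {fill: 'am', size: 'hp'});
// pointAes is {x: 'cyl', y: 'mpg', fill: 'am', size: 'hp'}
```

Returning a fresh object keeps the chart's own map untouched, so several layers can override different aesthetics without affecting each other.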

Style

I'd like a handful of css files that can be applied at the chart level to easily change the theme and allow for customization or quick overrides of defaults.

Geoms

Currently in fairly working order are point, line, histogram, density, bar, boxplot, text and abline. Hline and vline should come next without too much effort, then path, ribbon, area and error/linerange.

Todo: Axis pan and zoom

For continuous axes, a default zoom behavior would save a lot of hassle when dealing with uncertain differences in domains between facets or groups of geoms. If someone has an example of how to do such a thing on an ordinal axis, I'd like to see it.
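For the continuous case, the domain arithmetic behind zooming and panning is small. The sketch below uses hypothetical helpers (zoomDomain and panDomain) in plain JavaScript and leaves out event handling and redrawing entirely.

```javascript
// Hypothetical helpers for illustration; not part of ggD3 or D3.

// Zoom a continuous [min, max] domain by factor k around a focus value.
// k > 1 zooms in, k < 1 zooms out; the focus value keeps the same
// relative position on the axis.
function zoomDomain(domain, focus, k) {
  return [
    focus - (focus - domain[0]) / k,
    focus + (domain[1] - focus) / k
  ];
}

// Pan a domain by shifting both ends by the same data-space offset.
function panDomain(domain, offset) {
  return [domain[0] + offset, domain[1] + offset];
}

var zoomed = zoomDomain([0, 100], 50, 2);   // [25, 75]
var panned = panDomain(zoomed, 10);         // [35, 85]
```

An ordinal axis has no numeric domain to rescale this way, which is why it remains the open question.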

Todo: Brushing

I'd like each geom to have a default brush behavior that exports the brush range to drive behaviors or highlight the same ranges or same data points in other facets.
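As a rough idea of what exporting a brush range could drive, here is a plain-JavaScript sketch that takes a range brushed in one panel and finds the matching rows in every panel. The names inBrush and highlightAcrossFacets are hypothetical, and the planned ggD3 behavior may end up looking quite different.

```javascript
// Hypothetical helpers for illustration; not part of ggD3.

// True when a row's value for `field` falls inside the brushed range.
function inBrush(row, field, range) {
  return row[field] >= range[0] && row[field] <= range[1];
}

// Given the facets (arrays of rows keyed by panel name) and a brush
// range exported from one panel, return the rows to highlight per panel.
function highlightAcrossFacets(facets, field, range) {
  var highlighted = {};
  Object.keys(facets).forEach(function (name) {
    highlighted[name] = facets[name].filter(function (row) {
      return inBrush(row, field, range);
    });
  });
  return highlighted;
}

var facets = {
  panelA: [{mpg: 15}, {mpg: 21}],
  panelB: [{mpg: 22}, {mpg: 33}]
};
var hits = highlightAcrossFacets(facets, 'mpg', [20, 25]);
// hits.panelA holds the 21 mpg row; hits.panelB holds the 22 mpg row
```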

Get Started

The basics

This example features a Gaussian density curve drawn over a histogram with 30 bins. Proportions are shown rather than frequency counts so that the density curve is visible on the same scale. With stacked histograms, the proportions add up to more than one. I'll have to fix that.


// Declared at the top level of the page script so that the console
// example further down can reach irisHistogram.
var gaussian = ggd3.geoms.density().kernel('gaussianKernel');
// frequency(false) draws proportions instead of counts.
var proportionHist = ggd3.geoms.histogram().frequency(false);
var irisHistogram = ggd3.plot()
                .facet({titleSize: [0,0]})
                .margins({left: 40, bottom: 50})
                .aes({x: "Sepal.Width", fill: "Species",
                        color: "Species"})
                .yScale({axis:{ticks: 4},
                            offset:50, label:''})
                .xScale({axis:{ticks:5}, offset:20})
                .layers([proportionHist, gaussian])
                .dtypes({"Species": ['string', 'few'],
                        "Sepal.Width": ['number', 'many', ',.2f'],
                        "Sepal.Length": ['number', 'many', ',.2f'],
                        "Petal.Width": ['number', 'many', ',.2f'],
                        "Petal.Length": ['number', 'many', ',.2f']
                    })
                .data(iris)
                .width(600)
                .height(350);
irisHistogram.draw(d3.select('#iris'));
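The stacking problem mentioned above is easy to reproduce with a little arithmetic. The sketch below assumes, and this is only my guess at the cause, that each group's proportions are computed against that group's own count. The helper names binProportions and sum are hypothetical.

```javascript
// Hypothetical helpers for illustration; not part of ggD3.

// Count values into equal-width bins over [min, max] and divide each
// count by `denominator` to get a proportion per bin.
function binProportions(values, min, max, nBins, denominator) {
  var counts = [];
  for (var i = 0; i < nBins; i++) { counts.push(0); }
  var width = (max - min) / nBins;
  values.forEach(function (v) {
    // Clamp so that v === max lands in the last bin.
    var index = Math.min(Math.floor((v - min) / width), nBins - 1);
    counts[index] += 1;
  });
  return counts.map(function (c) { return c / denominator; });
}

function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var groupA = [2.0, 2.5, 3.0, 3.5];
var groupB = [2.2, 2.8, 3.1, 3.9];
var total = groupA.length + groupB.length;

// Each group divided by its own size: every group sums to 1,
// so the stacked bars sum to the number of groups (2 here).
var perGroup = sum(binProportions(groupA, 2, 4, 4, groupA.length)) +
               sum(binProportions(groupB, 2, 4, 4, groupB.length));

// Each group divided by the overall size: the stack sums to 1.
var overall = sum(binProportions(groupA, 2, 4, 4, total)) +
              sum(binProportions(groupB, 2, 4, 4, total));
```

Under that assumption, dividing by the overall row count when the position is stacked would be one way to keep the total at one.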
                    

To see the histograms side-by-side, crack open the console and type the following:


irisHistogram.layers()[0].position('dodge');
// Reset the y scale so that it is retrained for the dodged bars.
irisHistogram.yScale(null);
irisHistogram.yScale({axis:{ticks: 4},
                            offset:50, label:''});
irisHistogram.draw(d3.select('#iris'));
                    

A little fancier

This example employs a facet according to whether the car is American. The use of D3's "silhouette" option on the stack layout is a bit gratuitous, but I like it for show.

I feel I've got a pretty good method for aggregating continuous variables and mapping them to aesthetics. Here the y variable is aggregated to its mean, and the alpha variable is aggregated, by default, to its median. For kicks, I threw in the point geom as well, though you can't see the y-axis because it is not drawn with the "silhouette" option.


// Stacked bars of mean mpg, centered with the "silhouette" offset.
var carLayer = ggd3.layer()
                    .position('stack')
                    .stat({y: 'mean'})
                    .geom(ggd3.geoms.bar().offset('silhouette'));
// Raw observations as jittered points on top of the bars.
var jitterLayer = ggd3.layer().position('jitter').geom('point');
var carChart = ggd3.plot()
                    .layers([carLayer, jitterLayer])
                    .width(250)
                    .facet({x:'am', nrows: 1})
                    .dtypes({"gear": ["string"], "cyl": ["string"]})
                    .data(cars)
                    .aes({x: "cyl", y: "mpg", fill: "gear",
                         alpha: "hp"});
carChart.draw(d3.select('#cars'));
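The aggregation described above, mean for y and median for alpha, amounts to a group-and-reduce step. Here is a minimal sketch in plain JavaScript with hypothetical helpers (mean, median, aggregate) and made-up sample rows; ggD3's own code may be organized differently.

```javascript
// Hypothetical helpers for illustration; not part of ggD3.

function mean(values) {
  return values.reduce(function (a, b) { return a + b; }, 0) / values.length;
}

function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group rows by `groupBy`, then reduce each requested field with its
// statistic. `stats` maps a field name to a function of an array.
function aggregate(rows, groupBy, stats) {
  var groups = {};
  rows.forEach(function (row) {
    var key = String(row[groupBy]);
    (groups[key] = groups[key] || []).push(row);
  });
  return Object.keys(groups).map(function (key) {
    var out = {};
    out[groupBy] = key;
    Object.keys(stats).forEach(function (field) {
      out[field] = stats[field](groups[key].map(function (row) {
        return row[field];
      }));
    });
    return out;
  });
}

// Made-up rows, not the real cars data.
var sample = [
  {cyl: 4, mpg: 30, hp: 60},
  {cyl: 4, mpg: 26, hp: 90},
  {cyl: 4, mpg: 22, hp: 110},
  {cyl: 8, mpg: 15, hp: 200},
  {cyl: 8, mpg: 17, hp: 250}
];
var summary = aggregate(sample, 'cyl', {mpg: mean, hp: median});
```

Each row of summary then carries one value per aesthetic, which is what a bar needs for its height and opacity.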
                    

And yes, there are no axis labels yet. Those will happen in due time. Check out the examples; they're still a little rough, but they make sense. Thanks for reading. Hit me up on GitHub.