R's Flavours of Stacked Dot Plots

The humble stacked dot plot is, I think, often preferable to the histogram as a means of graphing distributions of small data sets. To gauge how closely a histogram approximates an underlying population distribution, one must take into account the number of points that the histogram is based on (the sample size). Many readers fail to do this, and all too often the sample size is not provided within the graph. A dot plot, however, draws every observation as its own dot, so any reader can see the sample size at a glance and make an immediate guess at how closely the graph follows the shape of the underlying distribution.
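To make the point concrete, here is a minimal sketch that draws one small sample both ways. The simulated data (15 rounded normal draws) and the seed are assumptions chosen purely for illustration; they have nothing to do with the geyser data used in the rest of this post.

```r
# Illustrative only: a small simulated sample, rounded so that ties can stack
set.seed(42)
small <- round(rnorm(15, mean = 70, sd = 10))

op <- par(mfrow = c(1, 2))   # two panels side by side
hist(small, main = "Histogram", xlab = "Value")
stripchart(small, method = "stack", offset = 0.5, pch = 1,
           main = "Stacked dot plot", xlab = "Value")
par(op)                      # restore the previous layout
```

In the histogram the sample size has to be inferred from the bar heights; in the dot plot the 15 observations can simply be counted.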

Several R functions implement stacked dot plots. But which one to use?

There's this one from the base graphics package:

stripchart(faithful$waiting, method="stack", offset=0.5, pch=1)
An R dot plot.
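One caveat, as far as I can tell from the documentation: stripchart's "stack" method only stacks points whose values coincide exactly. faithful$waiting is recorded in whole minutes, so it stacks nicely, but a continuous variable such as faithful$eruptions probably needs rounding first. A sketch, where rounding to one decimal place is an arbitrary choice of mine:

```r
# Continuous values rarely tie, so round them before stacking
eruptions.rounded <- round(faithful$eruptions, 1)
stripchart(eruptions.rounded, method = "stack", offset = 0.5, pch = 1,
           xlab = "Eruption duration (minutes, rounded to 0.1)")
```

A coarser rounding gives taller stacks and a smoother-looking shape, much like a wider bin in a histogram.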

There's this from the BHH2 package:

require(BHH2)
dotPlot(faithful$waiting)
An R dot plot.

This is from the qualityTools package:

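require(qualityTools)
# qualityTools and BHH2 both define dotPlot(); qualify the call so the intended one runs
qualityTools::dotPlot(faithful$waiting)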
An R dot plot.

And, of course, ggplot2's implementation:

require(ggplot2)
ggplot(faithful, aes(x = waiting)) + geom_dotplot(binwidth = 1.5) + theme_bw()
An R dot plot.

You can indicate the mean and standard deviation:

require(BHH2)
x <- faithful$waiting
BHH2::dotPlot(x, xlab="Waiting Time")  # qualified in case qualityTools' dotPlot() is also attached
mean.x <- mean(x, na.rm=TRUE)
sd.x   <- sd(x, na.rm=TRUE)
horiz.line.y <- 0.65

# Draw the vertical lines
lines(rep(mean.x       , 2), c(horiz.line.y-0.05, horiz.line.y+0.05))
lines(rep(mean.x - sd.x, 2), c(horiz.line.y-0.05, horiz.line.y+0.05))
lines(rep(mean.x + sd.x, 2), c(horiz.line.y-0.05, horiz.line.y+0.05))

# Draw the horizontal line
lines(c(mean.x - sd.x, mean.x + sd.x), rep(horiz.line.y, 2))

# Add the descriptive text
text(mean.x       , 0.75, expression(bar(x)))
text(mean.x - sd.x, 0.75, expression(paste(bar(x), " - 1s")))
text(mean.x + sd.x, 0.75, expression(paste(bar(x), " + 1s")))
An R dot plot.
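If you annotate plots like this often, the steps above can be wrapped in a small function. The sketch below is my own hypothetical helper (add.mean.sd is not part of any package), and it uses base R's segments() in place of repeated lines() calls. The default y positions are assumptions that happen to suit a single-group plot; adjust them to the coordinate system of whichever plotting function you use.

```r
# Hypothetical helper: annotate the current plot with the mean and +/- 1 SD
add.mean.sd <- function(x, y = 0.65, tick = 0.05, label.y = y + 0.1) {
  m  <- mean(x, na.rm = TRUE)
  s  <- sd(x, na.rm = TRUE)
  at <- c(m - s, m, m + s)
  segments(at, y - tick, at, y + tick)   # three vertical ticks
  segments(at[1], y, at[3], y)           # horizontal bar joining them
  text(at, label.y,
       expression(bar(x) - 1 * s, bar(x), bar(x) + 1 * s))
  invisible(at)                          # return the tick positions
}

# Usage with the base-graphics dot plot
stripchart(faithful$waiting, method = "stack", offset = 0.5, pch = 1,
           xlab = "Waiting Time")
at <- add.mean.sd(faithful$waiting)
```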

Or you can overlay a boxplot:

require(BHH2)
x <- faithful$waiting
BHH2::dotPlot(x, xlab="Waiting Time")  # qualified in case qualityTools' dotPlot() is also attached
horiz.line.y <- 0.65

# Draw the vertical lines
boxplot.x <- boxplot.stats(x)$stats
for (x.pt in boxplot.x) lines(rep(x.pt, 2), c(horiz.line.y-0.05, horiz.line.y+0.05))

# Draw the horizontal lines
lines(c(boxplot.x[1], boxplot.x[2]), rep(horiz.line.y, 2))
lines(c(boxplot.x[2], boxplot.x[4]), c(horiz.line.y-0.05, horiz.line.y-0.05))
lines(c(boxplot.x[2], boxplot.x[4]), c(horiz.line.y+0.05, horiz.line.y+0.05))
lines(c(boxplot.x[4], boxplot.x[5]), rep(horiz.line.y, 2))
An R dot plot.
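The index arithmetic above is easier to follow once you know what boxplot.stats returns: its stats element is a vector of five numbers, in order from the lower whisker to the upper whisker. A quick way to see this is to label them (the names are added here for readability only; boxplot.stats itself returns an unnamed vector):

```r
# Label the five numbers that boxplot.stats() returns, in order
stats <- boxplot.stats(faithful$waiting)$stats
names(stats) <- c("lower whisker", "lower hinge", "median",
                  "upper hinge", "upper whisker")
print(stats)
```

So boxplot.x[2] and boxplot.x[4] mark the ends of the box, and boxplot.x[1] and boxplot.x[5] mark the ends of the whiskers.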

It's often useful to stack dot plots on top of one another to show differences in distributions between groups. Here's how it's done with stripchart:

xlim <- c(min(faithful$waiting), max(faithful$waiting))
faithful$group <- as.factor(ifelse(faithful$eruptions < 3, "Eruptions < 3", "Eruptions >= 3"))
par(las=1, mar=c(5.1,11.0,4.1,2.1))
stripchart(waiting~group, data=faithful, method="stack", xlim=xlim, pch=1,
           main="Waiting Time by Eruptions for Old Faithful Geyser", xlab="Waiting Time")
An R dot plot.

And using BHH2:

require(BHH2)
par(mfcol=c(2,1))
xlim <- c(min(faithful$waiting), max(faithful$waiting))
BHH2::dotPlot(faithful[faithful$eruptions <3,]$waiting, main="eruptions < 3", xlim=xlim, xlab="Waiting Time")
BHH2::dotPlot(faithful[faithful$eruptions>=3,]$waiting, main="eruptions >= 3", xlim=xlim, xlab="Waiting Time")
An R dot plot.
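A word of warning about par(mfcol=...): the setting sticks, so every later plot in the session is drawn into the same two-row layout until you change it back. par() returns the old values of the settings it changes, which makes them easy to restore. Here is a sketch of that pattern using the base stripchart function, so it runs without any extra packages; the variable names short and long are mine.

```r
# Switch to a 2 x 1 layout, remembering the previous settings
op <- par(mfcol = c(2, 1))

xlim  <- range(faithful$waiting)
short <- faithful$waiting[faithful$eruptions <  3]
long  <- faithful$waiting[faithful$eruptions >= 3]

stripchart(short, method = "stack", pch = 1, xlim = xlim,
           main = "eruptions < 3", xlab = "Waiting Time")
stripchart(long, method = "stack", pch = 1, xlim = xlim,
           main = "eruptions >= 3", xlab = "Waiting Time")

par(op)  # restore the previous layout
```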

For some reason, you can't use par's mfcol parameter to lay out multiple qualityTools dot plots. So it's back to the powerful but abstruse ggplot2:

require(ggplot2)
faithful$eruptions_banding <- faithful$eruptions<3
ggplot(faithful, aes(x=waiting)) +
   geom_dotplot(binwidth=1) +
   theme_bw() +
   facet_wrap(~ eruptions_banding, ncol=1)
An R dot plot.
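Faceting on a logical column labels the panels TRUE and FALSE, which isn't very reader-friendly. A possible variation, sketched below, facets on a descriptive factor instead and hides the y axis, since the ggplot2 documentation notes that the y-axis numbers of a dot plot binned along x are not meaningful. The data frame name df and the label text are my own choices.

```r
library(ggplot2)

# Work on a copy and build a factor with readable panel labels
df <- faithful
df$group <- factor(ifelse(df$eruptions < 3, "Eruptions < 3", "Eruptions >= 3"))

p <- ggplot(df, aes(x = waiting)) +
  geom_dotplot(binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) +  # hide the meaningless y axis
  theme_bw() +
  facet_wrap(~ group, ncol = 1) +
  xlab("Waiting Time")
print(p)
```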