## R's Flavours of Stacked Dot Plots

The humble stacked dot plot is, I think, often preferable to the histogram as a means of graphing distributions of small data sets. To gauge how closely a histogram approximates an underlying population distribution, one must take into account the number of points that the histogram is based on (the sample size). Many readers fail to do this, and all too often the sample size is not provided within the graph. A dot plot, by contrast, shows every observation, so any reader can see the sample size at a glance and make an immediate guess at how closely the graph follows the shape of the underlying distribution.
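To make the point concrete, here is a minimal sketch using only base graphics. The data are simulated purely for illustration (15 draws from a standard normal, rounded so that points stack); the names `small` and `old.par` are my own. The histogram gives no hint that it rests on so few points, while the dot plot makes it obvious.

```r
# Simulated data, for illustration only: 15 draws from a standard normal.
set.seed(1)
small <- round(rnorm(15), 1)  # round so that equal values stack

# Two panels, one above the other; keep the old settings to restore later.
old.par <- par(mfrow = c(2, 1))

# The histogram hides the sample size unless you read the count axis closely.
hist(small, main = "Histogram", xlab = "Value")

# The stacked dot plot draws one symbol per observation.
stripchart(small, method = "stack", offset = 0.5, pch = 1,
           main = "Stacked dot plot", xlab = "Value")

par(old.par)
```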

Several R functions implement stacked dot plots. But which one to use?

There's this one from the base graphics package:

stripchart(faithful$waiting, method="stack", offset=0.5, pch=1)

There's this from the BHH2 package:

require(BHH2)
dotPlot(faithful$waiting)

This is from the qualityTools package:

require(qualityTools)
dotPlot(faithful$waiting)

And, of course, ggplot2's implementation:

require(ggplot2)
ggplot(faithful, aes(x = waiting)) + geom_dotplot(binwidth = 1.5) + theme_bw()

You can indicate the mean and standard deviation:

require(BHH2)
x <- faithful$waiting
dotPlot(x, xlab="Waiting Time")
mean.x <- mean(x, na.rm=TRUE)
sd.x   <- sd(x, na.rm=TRUE)
horiz.line.y <- 0.65

# Draw the vertical lines
lines(rep(mean.x       , 2), c(horiz.line.y-0.05, horiz.line.y+0.05))
lines(rep(mean.x - sd.x, 2), c(horiz.line.y-0.05, horiz.line.y+0.05))
lines(rep(mean.x + sd.x, 2), c(horiz.line.y-0.05, horiz.line.y+0.05))

# Draw the horizontal line
lines(c(mean.x - sd.x, mean.x + sd.x), rep(horiz.line.y, 2))

text(mean.x       , 0.75, expression(bar(x)))
text(mean.x - sd.x, 0.75, expression(paste(bar(x), " - 1s")))
text(mean.x + sd.x, 0.75, expression(paste(bar(x), " + 1s")))

Or you can overlay a boxplot:

require(BHH2)
x <- faithful$waiting
dotPlot(x, xlab="Waiting Time")
horiz.line.y <- 0.65

# Draw the vertical lines
boxplot.x <- boxplot.stats(x)$stats
for (x.pt in boxplot.x) lines(rep(x.pt, 2), c(horiz.line.y-0.05, horiz.line.y+0.05))

# Draw the horizontal lines
lines(c(boxplot.x[1], boxplot.x[2]), rep(horiz.line.y, 2))                      # lower whisker
lines(c(boxplot.x[2], boxplot.x[4]), c(horiz.line.y-0.05, horiz.line.y-0.05))  # bottom of box
lines(c(boxplot.x[2], boxplot.x[4]), c(horiz.line.y+0.05, horiz.line.y+0.05))  # top of box
lines(c(boxplot.x[4], boxplot.x[5]), rep(horiz.line.y, 2))                      # upper whisker

It's often useful to stack dot plots on top of one another to show differences in distributions between groups. Here's how it's done with stripchart:

xlim <- c(min(faithful$waiting), max(faithful$waiting))
faithful$group <- as.factor(ifelse(faithful$eruptions < 3, "Eruptions < 3", "Eruptions >= 3"))
par(las=1, mar=c(5.1,11.0,4.1,2.1))
stripchart(waiting~group, data=faithful, method="stack", xlim=xlim, pch=1,
           main="Waiting Time by Eruptions for Old Faithful Geyser", xlab="Waiting Time")

And using BHH2:

require(BHH2)
par(mfcol=c(2,1))
xlim <- c(min(faithful$waiting), max(faithful$waiting))
dotPlot(faithful[faithful$eruptions <  3, ]$waiting, main="eruptions < 3",  xlim=xlim, xlab="Waiting Time")
dotPlot(faithful[faithful$eruptions >= 3, ]$waiting, main="eruptions >= 3", xlim=xlim, xlab="Waiting Time")

And for some reason you can't use the mfcol parameter to create multiple dot plots with qualityTools. So it's back to the powerful but obtuse ggplot2:

require(ggplot2)
faithful$eruptions_banding <- faithful$eruptions < 3
ggplot(faithful, aes(x=waiting)) +
  geom_dotplot(binwidth=1) +
  theme_bw() +
  facet_wrap(~ eruptions_banding, ncol=1)
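
Because `eruptions_banding` is a logical vector, the facet strips in the plot above are labelled FALSE and TRUE. Here is a minimal sketch of one way around that, assuming you would rather see descriptive labels: facet on a factor instead. The copy `geyser` and the column `group` are names chosen for illustration, and hiding the y axis is optional (the counts on a `geom_dotplot` y axis are not meaningful).

```r
library(ggplot2)

# Work on a copy so the built-in data set is left untouched.
geyser <- faithful

# Explicit levels fix both the strip labels and the order of the panels.
band.labels <- c("Eruptions < 3", "Eruptions >= 3")
geyser$group <- factor(
  ifelse(geyser$eruptions < 3, band.labels[1], band.labels[2]),
  levels = band.labels
)

p <- ggplot(geyser, aes(x = waiting)) +
  geom_dotplot(binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) +  # hide the uninformative y axis
  xlab("Waiting Time") +
  theme_bw() +
  facet_wrap(~ group, ncol = 1)

print(p)
```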