R's Flavours of Stacked Dot Plots
Written by Peter Rosenmai on 25 Nov 2013. Last revised 13 Jan 2014.
The humble stacked dot plot is, I think, often preferable to the histogram as a means of graphing distributions
of small data sets. To gauge how closely a histogram approximates an underlying population distribution, one must
take into account the number of points that the histogram is based on (the sample size). Many readers fail to do
this—and all too often the sample size is not provided within the graph. However, a dot plot lets any reader
make an immediate guess at how closely the graph follows the shape of the underlying distribution.
Several R functions implement stacked dot plots. But which one to use?
There's this one from the base graphics package:
stripchart(faithful$waiting, method = "stack", offset = 0.5, pch = 1)
There's this from the BHH2 package:
require(BHH2)
dotPlot(faithful$waiting)
This is from the qualityTools package:
require(qualityTools)
dotPlot(faithful$waiting)
And, of course, ggplot2's implementation:
require(ggplot2)
ggplot(faithful, aes(x = waiting)) +
  geom_dotplot(binwidth = 1.5) +
  theme_bw()
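One difference between these functions is worth a quick illustration. geom_dotplot bins the data (hence its binwidth argument), whereas stripchart with method = "stack" only stacks points whose values coincide. That works for faithful$waiting, which is recorded in whole minutes, but a variable measured more finely needs rounding first. The sketch below uses faithful$eruptions; the 0.1-minute rounding is an assumption chosen for illustration, not a recommendation.

```r
# Eruption durations are recorded more finely than waiting times,
# so fewer points share exactly the same value.
eruptions <- faithful$eruptions

# Assumption: rounding to 0.1 minute is coarse enough to form stacks.
binned <- round(eruptions, 1)

stripchart(binned, method = "stack", offset = 0.5, pch = 1,
           xlab = "Eruption time (minutes, rounded to 0.1)")
```

A coarser rounding gives taller stacks and a smoother outline, much as a larger binwidth does in geom_dotplot.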
You can indicate the mean and standard deviation:
require(BHH2)
x <- faithful$waiting
dotPlot(x, xlab = "Waiting Time")
mean.x <- mean(x, na.rm = TRUE)
sd.x <- sd(x, na.rm = TRUE)
# Height at which the mean +/- 1 sd bracket is drawn
horiz.line.y <- 0.65
# Vertical ticks at the mean and at one standard deviation either side
lines(rep(mean.x, 2), c(horiz.line.y - 0.05, horiz.line.y + 0.05))
lines(rep(mean.x - sd.x, 2), c(horiz.line.y - 0.05, horiz.line.y + 0.05))
lines(rep(mean.x + sd.x, 2), c(horiz.line.y - 0.05, horiz.line.y + 0.05))
# Horizontal bar joining the outer ticks
lines(c(mean.x - sd.x, mean.x + sd.x), rep(horiz.line.y, 2))
# Labels above the ticks
text(mean.x, 0.75, expression(bar(x)))
text(mean.x - sd.x, 0.75, expression(paste(bar(x), " - 1s")))
text(mean.x + sd.x, 0.75, expression(paste(bar(x), " + 1s")))
Or you can overlay a boxplot:
require(BHH2)
x <- faithful$waiting
dotPlot(x, xlab = "Waiting Time")
horiz.line.y <- 0.65
# Five numbers: lower whisker, lower hinge, median, upper hinge, upper whisker
boxplot.x <- boxplot.stats(x)$stats
# Vertical tick at each of the five statistics
for (x.pt in boxplot.x) {
  lines(rep(x.pt, 2), c(horiz.line.y - 0.05, horiz.line.y + 0.05))
}
# Lower whisker
lines(c(boxplot.x[1], boxplot.x[2]), rep(horiz.line.y, 2))
# Bottom and top edges of the box
lines(c(boxplot.x[2], boxplot.x[4]), c(horiz.line.y - 0.05, horiz.line.y - 0.05))
lines(c(boxplot.x[2], boxplot.x[4]), c(horiz.line.y + 0.05, horiz.line.y + 0.05))
# Upper whisker
lines(c(boxplot.x[4], boxplot.x[5]), rep(horiz.line.y, 2))
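Both overlays above are built from repeated lines() calls. As a rough sketch of a more compact alternative, the vectorised base functions segments() and rect() can draw the same marks in a few calls. To keep the example free of package dependencies it uses stripchart as a stand-in for BHH2's dotPlot; the heights 0.8 and 0.5 are assumptions that suit stripchart's coordinates (a single group sits at y = 1) and would need adjusting for other dot plot functions.

```r
x <- faithful$waiting
half <- 0.05  # half-height of the ticks and of the box

# Base-graphics stand-in for the dot plot
stripchart(x, method = "stack", offset = 0.5, pch = 1, xlab = "Waiting Time")

# Mean with one standard deviation either side, drawn at y = 0.8
sd_marks <- mean(x) + c(-1, 0, 1) * sd(x)
segments(sd_marks, 0.8 - half, sd_marks, 0.8 + half)  # three ticks
segments(sd_marks[1], 0.8, sd_marks[3], 0.8)          # joining bar

# Box plot drawn at y = 0.5
box_stats <- boxplot.stats(x)$stats
names(box_stats) <- c("lower whisker", "lower hinge", "median",
                      "upper hinge", "upper whisker")
segments(box_stats, 0.5 - half, box_stats, 0.5 + half)      # five ticks
segments(box_stats[c(1, 4)], 0.5, box_stats[c(2, 5)], 0.5)  # two whiskers
rect(box_stats[2], 0.5 - half, box_stats[4], 0.5 + half)    # box outline
```

Naming the five statistics makes the indices used in the overlay easier to follow. Neither version draws points beyond the whiskers; boxplot.stats(x)$out holds them if they are wanted.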
It's often useful to stack dot plots on top of one another to show differences in distributions between groups. Here's how it's done with stripchart:
xlim <- range(faithful$waiting)
faithful$group <- as.factor(
  ifelse(faithful$eruptions < 3, "Eruptions < 3", "Eruptions >= 3"))
par(las = 1, mar = c(5.1, 11.0, 4.1, 2.1))
stripchart(waiting ~ group, data = faithful, method = "stack", xlim = xlim, pch = 1,
           main = "Waiting Time by Eruptions for Old Faithful Geyser",
           xlab = "Waiting Time")
And using BHH2:
require(BHH2)
# Two plots, one above the other
par(mfcol = c(2, 1))
# Shared x-axis limits so the two plots are directly comparable
xlim <- range(faithful$waiting)
dotPlot(faithful[faithful$eruptions < 3, ]$waiting, main = "eruptions < 3",
        xlim = xlim, xlab = "Waiting Time")
dotPlot(faithful[faithful$eruptions >= 3, ]$waiting, main = "eruptions >= 3",
        xlim = xlim, xlab = "Waiting Time")
For some reason, you can't use the mfcol parameter to create multiple dot plots with qualityTools, so it's back to the powerful but abstruse ggplot2:
require(ggplot2)
faithful$eruptions_banding <- faithful$eruptions<3
ggplot(faithful, aes(x = waiting)) +
  geom_dotplot(binwidth = 1) +
  theme_bw() +
  facet_wrap(~ eruptions_banding, ncol = 1)
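Because eruptions_banding is a logical vector, the two panels are titled FALSE and TRUE. A possible refinement, sketched here, is to facet on a labelled factor instead, and to hide the y axis, whose count scale is not meaningful for geom_dotplot. The names plot_data and band are illustrative choices, not part of the original code.

```r
library(ggplot2)

# Work on a copy so the built-in data set is left untouched
plot_data <- faithful

# Labelled factor: the facet strips show these labels, in this order
plot_data$band <- factor(plot_data$eruptions < 3,
                         levels = c(TRUE, FALSE),
                         labels = c("Eruptions < 3", "Eruptions >= 3"))

p <- ggplot(plot_data, aes(x = waiting)) +
  geom_dotplot(binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) +  # hide the y axis
  theme_bw() +
  facet_wrap(~ band, ncol = 1)

print(p)
```

By default facet_wrap gives both panels the same x-axis scale, so the two distributions can be compared directly without computing xlim by hand.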