Calculating a Distance Matrix for Geographic Points Using R

Here's an example of how to calculate a distance matrix for geographic points (expressed as decimal latitudes and longitudes) using R:

> df.cities <- data.frame(name = c("New York City", "Chicago", "Los Angeles", "Atlanta"),
+                         lat  = c(       40.75170,  41.87440,      34.05420,  33.75280),
+                         lon  = c(      -73.99420, -87.63940,    -118.24100, -84.39360))
> round(GeoDistanceInMetresMatrix(df.cities) / 1000)
              New York City Chicago Los Angeles Atlanta
New York City             0    1148        3945    1204
Chicago                1148       0        2808     945
Los Angeles            3945    2808           0    3116
Atlanta                1204     945        3116       0

For example, the distance matrix above shows that the shortest distance along the earth's surface (that is, accounting for the earth's curvature) between Los Angeles and New York City is 3,945 km.
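
As a rough cross-check of that figure, here is a minimal sketch of the haversine formula in base R. It is not the method used by the code in this post: HaversineKm() is a hypothetical helper, and it assumes a perfectly spherical earth with a mean radius of 6,371 km.

```r
# Hypothetical helper (not part of the original code): haversine distance in km
# on a sphere. The default radius of 6371 km is an assumed mean Earth radius.
HaversineKm <- function(lat1, lon1, lat2, lon2, radius.km = 6371){
   to.rad <- pi / 180
   d.lat  <- (lat2 - lat1) * to.rad
   d.lon  <- (lon2 - lon1) * to.rad
   a <- sin(d.lat / 2)^2 + cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(d.lon / 2)^2
   return(2 * radius.km * asin(sqrt(a)))
}

# New York City to Los Angeles, using the coordinates from df.cities
nyc.la.km <- HaversineKm(40.75170, -73.99420, 34.05420, -118.24100)
print(round(nyc.la.km))
```

The spherical result should land within about 1% of the 3,945 km shown in the matrix. A sphere is only an approximation of the earth's shape, so a small difference is expected.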

And here's the code for the GeoDistanceInMetresMatrix() function that generates the matrix. It relies on gdist() from the Imap package, which must be installed:

ReplaceLowerOrUpperTriangle <- function(m, triangle.to.replace){
   # If triangle.to.replace="lower", replaces the lower triangle of a square matrix with its upper triangle.
   # If triangle.to.replace="upper", replaces the upper triangle of a square matrix with its lower triangle.

   if (nrow(m) != ncol(m)) stop("Supplied matrix must be square.")
   if      (tolower(triangle.to.replace) == "lower") tri <- lower.tri(m)
   else if (tolower(triangle.to.replace) == "upper") tri <- upper.tri(m)
   else stop("triangle.to.replace must be set to 'lower' or 'upper'.")
   m[tri] <- t(m)[tri]
   return(m)
}

GeoDistanceInMetresMatrix <- function(df.geopoints){
   # Returns a matrix (M) of distances between geographic points.
   # M[i,j] = M[j,i] = Distance between (df.geopoints$lat[i], df.geopoints$lon[i]) and
   # (df.geopoints$lat[j], df.geopoints$lon[j]).
   # The row and column names are given by df.geopoints$name.


   GeoDistanceInMetres <- function(g1, g2){
      # Returns a vector of distances. (But if g1$index > g2$index, returns zero.)
      # The 1st value in the returned vector is the distance between g1[[1]] and g2[[1]].
      # The 2nd value in the returned vector is the distance between g1[[2]] and g2[[2]]. Etc.
      # Each g1[[x]] or g2[[x]] must be a list with named elements "index", "lat" and "lon".
      # E.g. g1 <- list(list("index"=1, "lat"=12.1, "lon"=10.1), list("index"=3, "lat"=12.1, "lon"=13.2))

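      DistM <- function(g1, g2){
         # Skip the lower triangle (it is mirrored from the upper triangle later).
         if (g1$index > g2$index) return(0)
         # Imap::gdist() fails immediately with a clear error if Imap is not installed.
         return(Imap::gdist(lat.1=g1$lat, lon.1=g1$lon, lat.2=g2$lat, lon.2=g2$lon, units="m"))
      }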
      return(mapply(DistM, g1, g2))
   }

   n.geopoints <- nrow(df.geopoints)

   # The index column is used to ensure we only do calculations for the upper triangle of points
   df.geopoints$index <- 1:n.geopoints

   # Create a list of lists
   list.geopoints <- by(df.geopoints[,c("index", "lat", "lon")], 1:n.geopoints, function(x){return(list(x))})

   # Get a matrix of distances (in metres)
   mat.distances <- ReplaceLowerOrUpperTriangle(outer(list.geopoints, list.geopoints, GeoDistanceInMetres), "lower")

   # Set the row and column names
   rownames(mat.distances) <- df.geopoints$name
   colnames(mat.distances) <- df.geopoints$name

   return(mat.distances)
}
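
The triangle-mirroring step performed by ReplaceLowerOrUpperTriangle() can be seen in isolation on a small matrix. This is only an illustrative sketch: the matrix m.demo and its values are made up, with the upper triangle filled in and the lower triangle left at zero, as happens inside GeoDistanceInMetresMatrix().

```r
# Made-up 3 x 3 matrix: only the upper triangle holds real values.
m.demo <- matrix(c(0, 0, 0,
                   1, 0, 0,
                   2, 3, 0), nrow = 3)   # filled column by column

# Copy the upper triangle into the lower triangle, as
# ReplaceLowerOrUpperTriangle(m.demo, "lower") does.
tri <- lower.tri(m.demo)
m.demo[tri] <- t(m.demo)[tri]

print(m.demo)
print(isSymmetric(m.demo))
```

Because the distance from A to B equals the distance from B to A, computing one triangle and mirroring it roughly halves the number of distance calculations.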

An Example: Starbucks Crowding

You can use the above code to, for instance, find those Starbucks stores in New York that have more than 70 other Starbucks stores within one km:

Figure: Starbucks stores in New York State.
> # Using locations for Starbucks stores in New York. (Downloaded on 29/01/2014 from
> # https://opendata.socrata.com/Business/All-Starbucks-Locations-in-the-US/txu4-fsic)
> head(df.starbucks.ny)
                                              name      lat       lon
1 1770 West Main Street, Suite 1400, Riverhead, NY 40.91709 -72.70946
2                   10095 Main Road, Mattituck, NY 40.98587 -72.53907
3             24 Montauk Highway, Hampton Bays, NY 40.87864 -72.52142
4                       485 CR-111, Manorville, NY 40.85249 -72.78954
5              2488 Montauk Hwy, Bridgehampton, NY 40.93760 -72.30177
6             385 Route 25 A, 14, Miller Place, NY 40.94175 -72.98710
> distance.mat.m <- GeoDistanceInMetresMatrix(df.starbucks.ny)
> stores.within.1.km.count <- colSums(x=(distance.mat.m < 1000), na.rm=TRUE) - 1
> sort(stores.within.1.km.count[which(stores.within.1.km.count > 70)], decreasing=TRUE)
       1166 Avenue of the Americas, New York, NY   76
1101 - 1109 Avenue of the Americas, New York, NY   72
       1100 Avenue of the Americas, New York, NY   71

Wow. A Starbucks within one km of 76 other Starbucks stores!
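
The neighbour-counting step can be tried on a toy matrix without downloading any data. The sketch below uses four made-up points (A to D) with invented distances in metres. It counts, for each point, how many other points lie within 1,000 m. Subtracting 1 removes the point itself, whose distance to itself is zero.

```r
# Made-up symmetric distance matrix in metres (not real data).
pts <- c("A", "B", "C", "D")
toy.mat.m <- matrix(c(   0,  400,  900, 5000,
                       400,    0, 1200, 5200,
                       900, 1200,    0, 4800,
                      5000, 5200, 4800,    0),
                    nrow = 4, byrow = TRUE, dimnames = list(pts, pts))

# Each column counts the entries under 1000 m, including the zero on the
# diagonal, so subtract 1 to exclude the point itself.
within.1.km.count <- colSums(toy.mat.m < 1000, na.rm = TRUE) - 1

# Keep only points with at least one neighbour, most crowded first.
print(sort(within.1.km.count[within.1.km.count >= 1], decreasing = TRUE))
```

Here A has two neighbours within 1 km (B and C), B and C each have one (A), and D has none.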