## Calculating a Distance Matrix for Geographic Points Using R

Here's an example of how to calculate a distance matrix for geographic points (expressed as decimal latitudes and longitudes) using R:

> df.cities <- data.frame(name = c("New York City", "Chicago", "Los Angeles", "Atlanta"),
+                         lat  = c(       40.75170,  41.87440,      34.05420,  33.75280),
+                         lon  = c(      -73.99420, -87.63940,    -118.24100, -84.39360))
> round(GeoDistanceInMetresMatrix(df.cities) / 1000)
New York City Chicago Los Angeles Atlanta
New York City             0    1148        3945    1204
Chicago                1148       0        2808     945
Los Angeles            3945    2808           0    3116
Atlanta                1204     945        3116       0

For example, the above distance matrix shows that the straight-line distance—accounting for curvature of the earth—between Los Angeles and NYC is 3,945 km.

And here's the code for the GeoDistanceInMetresMatrix() function that generates the matrix:

ReplaceLowerOrUpperTriangle <- function(m, triangle.to.replace){
# If triangle.to.replace="lower", replaces the lower triangle of a square matrix with its upper triangle.
# If triangle.to.replace="upper", replaces the upper triangle of a square matrix with its lower triangle.

if (nrow(m) != ncol(m)) stop("Supplied matrix must be square.")
if      (tolower(triangle.to.replace) == "lower") tri <- lower.tri(m)
else if (tolower(triangle.to.replace) == "upper") tri <- upper.tri(m)
else stop("triangle.to.replace must be set to 'lower' or 'upper'.")
m[tri] <- t(m)[tri]
return(m)
}

GeoDistanceInMetresMatrix <- function(df.geopoints){
# Returns a matrix (M) of distances between geographic points.
# M[i,j] = M[j,i] = Distance between (df.geopoints\$lat[i], df.geopoints\$lon[i]) and
# (df.geopoints\$lat[j], df.geopoints\$lon[j]).
# The row and column names are given by df.geopoints\$name.

GeoDistanceInMetres <- function(g1, g2){
# Returns a vector of distances. (But if g1\$index > g2\$index, returns zero.)
# The 1st value in the returned vector is the distance between g1[] and g2[].
# The 2nd value in the returned vector is the distance between g1[] and g2[]. Etc.
# Each g1[[x]] or g2[[x]] must be a list with named elements "index", "lat" and "lon".
# E.g. g1 <- list(list("index"=1, "lat"=12.1, "lon"=10.1), list("index"=3, "lat"=12.1, "lon"=13.2))

DistM <- function(g1, g2){
require("Imap")
return(ifelse(g1\$index > g2\$index, 0, gdist(lat.1=g1\$lat, lon.1=g1\$lon, lat.2=g2\$lat, lon.2=g2\$lon, units="m")))
}
return(mapply(DistM, g1, g2))
}

n.geopoints <- nrow(df.geopoints)

# The index column is used to ensure we only do calculations for the upper triangle of points
df.geopoints\$index <- 1:n.geopoints

# Create a list of lists
list.geopoints <- by(df.geopoints[,c("index", "lat", "lon")], 1:n.geopoints, function(x){return(list(x))})

# Get a matrix of distances (in metres)
mat.distances <- ReplaceLowerOrUpperTriangle(outer(list.geopoints, list.geopoints, GeoDistanceInMetres), "lower")

# Set the row and column names
rownames(mat.distances) <- df.geopoints\$name
colnames(mat.distances) <- df.geopoints\$name

return(mat.distances)
}

### An Example: Starbucks Crowding

You can use the above code to, for instance, find those Starbucks stores in New York that have 70 or more other Starbucks stores within one km: > # Using locations for Starbucks stores in New York. (Downloaded on 29/01/2014 from
name      lat       lon
1 1770 West Main Street, Suite 1400, Riverhead, NY 40.91709 -72.70946
2                   10095 Main Road, Mattituck, NY 40.98587 -72.53907
3             24 Montauk Highway, Hampton Bays, NY 40.87864 -72.52142
4                       485 CR-111, Manorville, NY 40.85249 -72.78954
5              2488 Montauk Hwy, Bridgehampton, NY 40.93760 -72.30177
6             385 Route 25 A, 14, Miller Place, NY 40.94175 -72.98710
> distance.mat.m <- GeoDistanceInMetresMatrix(df.starbucks.ny)
> stores.within.1.km.count <- colSums(x=(distance.mat.m < 1000), na.rm=TRUE) - 1
> sort(stores.within.1.km.count[which(stores.within.1.km.count > 70)], decreasing=TRUE)
1166 Avenue of the Americas, New York, NY   76
1101 - 1109 Avenue of the Americas, New York, NY   72
1100 Avenue of the Americas, New York, NY   71

Wow. A Starbucks within one km of 76 other Starbucks stores!