Estimating the Distance Between GPS Points While Accounting for Circular Error Probable (CEP)

Given two GPS points recorded as being d metres apart with circular error probable (CEP) of c1 and c2 metres respectively, the true distance between the recorded points has the distribution

D ~ simga * sqrt(C)

where:

  • C is a non-central chi-square random variable having 2 degrees of freedom and non-centrality parameter d2 / σ2
  • sigma = m * sqrt(c_1^2 + c_1^2 )
  • m = 1 / sqrt(-2 ln(0.5)) is approx. equal to 0.85

(I give a proof of this easy result below.)

This yields a handful of useful functions:

sigma_cepdistance <- function(cep1, cep2){
   m <- 1 / sqrt(-2 * log(0.5))
   return(m * sqrt(cep1^2 + cep2^2))
}

ncp_cepdistance <- function(d, cep1, cep2){
   return((d / sigma_cepdistance(cep1, cep2))^2)
}

qcepdistance <- function(p, d, cep1, cep2){
   # Quantile function for D.
   sigma <- sigma_cepdistance( cep1, cep2)
   ncp   <- ncp_cepdistance(d, cep1, cep2)
   return(sigma * sqrt(qchisq(p, 2, ncp)))
}

pcepdistance <- function(q, d, cep1, cep2){
   # Distribution function for D.
   sigma <- sigma_cepdistance( cep1, cep2)
   ncp   <- ncp_cepdistance(d, cep1, cep2)
   return(pchisq((q/sigma)^2, 2, ncp))
}

rcepdistance <- function(n, d, cep1, cep2){
   # Random generation function for D.
   sigma <- sigma_cepdistance( cep1, cep2)
   ncp   <- ncp_cepdistance(d, cep1, cep2)
   return(sigma * sqrt(rchisq(n, 2, ncp)))
}

dcepdistance <- function(x, d, cep1, cep2){
   # Density function for D.
   sigma <- sigma_cepdistance( cep1, cep2)
   ncp   <- ncp_cepdistance(d, cep1, cep2)
   return(2*x/(sigma^2) * dchisq((x/sigma)^2, 2, ncp))
}

median_cepdistance <- function(d, cep1, cep2){
   # Returns the median of D.
   return(qcepdistance(0.5, d, cep1, cep2))
}

mean_cepdistance <- function(d, cep1, cep2, n_simulation_points=100000){
   # Returns the mean of D using simulation.
   return(mean(rcepdistance(n_simulation_points, d, cep1, cep2)))
}

For example, if d=10, c1=6 and c2=8 (metres), the dcepdistance() function shows that distribution of the actual distances (in metres) looks like:

x <- seq(0, 40, 0.01)
plot(x, dcepdistance(x, d=10, cep1=6, cep2=8), type='l')
Density function for the distribution D

And we can estimate the mean and median:

paste("Sample mean distance:", round(mean_cepdistance(d=10, cep1=6, cep2=8), 1))
[1] "Sample mean distance: 14.1"
paste("Sample mean distance:", round(median_cepdistance(d=10, cep1=6, cep2=8), 1))
[1] "Sample median distance: 13.5"

In this case the mean and median distances (14.1 and 13.4 metres, respectively) are quite a lot more than the distance (d=10 metres) from which the points were generated. This can happen when d is small relative to c1 and c2.

If that seems strange, imagine that d=10 metres and that c1 and c2 are 1000 metres. In this case, even though the GPS points are recorded as being only 10 metres apart, the innacuracy of the GPS devices is so high that it is very likely that the distance between the actual points (the points at which the GPS data was recorded) is a lot more than 10 metres.

Here's a graph that shows how d becomes an increasingly poor estimator of the mean and median of D as sqrt(c12 + c22) increases relative to d:

median_cepdistance_for_d_equals_1 <- function(sqrt_of_sum_of_cep_squared){
   return(median_cepdistance(1, 0, sqrt_of_sum_of_cep_squared))
}

mean_cepdistance_for_d_equals_1 <- function(sqrt_of_sum_of_cep_squared){
   return(mean_cepdistance( 1, 0, sqrt_of_sum_of_cep_squared, 100000))
}

sqrt_of_sum_of_cep_squared <- seq(0, 2, 0.01)
median_cep <- sapply(sqrt_of_sum_of_cep_squared, FUN=median_cepdistance_for_d_equals_1)
mean_cep   <- sapply(sqrt_of_sum_of_cep_squared, FUN=mean_cepdistance_for_d_equals_1)
plot(x=sqrt_of_sum_of_cep_squared, y=median_cep, type="l", col="black",
     xlab="sqrt(cep1^2 + cep2^2) / d", ylab="mean / d OR median / d")
points(x=sqrt_of_sum_of_cep_squared, y=mean_cep, type="l", col="red")
legend("topleft", c("mean / d", "median / d"), col=c("red", "black"), lty=c(1, 1))
Graph that showing how d becomes an increasingly poor estimator of the mean and median of D as sqrt(c_1^2 + c_2^2) increases relative to d

For example, for d=10, c1=6 and c2=8 (metres), we have sqrt(c12 + c22)/d = 1. The above graph indicates that the median and mean of D is therefore about 13.5 metres and 14 metres respectively—which corresponds to the values of 13.5 and 14.1, respectively, that we calculated above.

All this assumes that the north-south errors and the east-west errors of the GPS devices are uncorrelated and have the same standard deviation—which is, I believe, a reasonable assumption.

Proof

Consider two bivariate-normal random variables, the covariance matrices of which are scalar multiples of the identity matrix:

Equation for distribution of B_1 and B_2

We have that the distance between a point from B1 and a point from B2 has the distribution:

D ~ sqrt(Z_x^2+Z_y^2) ~ sigma_z sqrt((Z_x/sigma_z)^2+(Z_y/sigma_z)^2)

Where

Z_x ~ N(µ_2x - µ_1x, sigma_z^2), Z_y ~ N(µ_2y - µ_1y, sigma_z^2), sigma_z^2 = sigma_1^2 + sigma_2^2

Hence

D ~ sigma_z sqrt(C)

where C has a non-central chi-square distribution with 2 degrees of freedom and non-centrality parameter

lambda = ((µ_2x - µ_1x)^2 + (µ_2y - µ_1y)^2) / (sigma_z^2) = d^2/sigma^2

where d is the Euclidean distance between the means of the two distributions.

We then use the constant

m = 1 / sqrt(-2ln(0.5))

to convert between standard deviations and CEP (Levi, Sakitt and Hobson, 1989, p. 125).

References

Levi, B., Sakitt, M. and Hobson, A. (1989). The Future of land-based strategic missiles. 1st ed. New York: American Institute of Physics.