Calculating Influence Matrices

I present here R code to calculate the influence that a set of points have upon each other, where influence is a function of (1) the distance between the points and (2) the inherent influence of the points.

Example: Light Bulbs Shining on Light Bulbs

Consider, for example, five light bulbs with brightness given by this vector:

> brightness
[1] 3 4 2 1 2

Now, suppose that the distance between the light bulbs (in metres) is given by this distance matrix:

> distance.matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    6   11    4    5
[2,]    6    0   12    9   10
[3,]   11   12    0   14   15
[4,]    4    9   14    0    4
[5,]    5   10   15    4    0

This matrix tells us, for instance, that bulbs two and three are 12 metres apart. Note that the distance matrix is symmetrical about a zero diagonal. This corresponds with the ordinary notion of distance: Any point is a zero distance from itself, and the distance from point A to point B equals the distance from point B to point A. However, my code permits non-symmetric distances: If bulb two is "uphill" from bulb three, [2, 3] will be greater than [2, 3].

Okay, next thing we need is a function that gives the light (the influence) that one light bulb receives from another as a function of the brightness of the bulbs and the distance between them. Let's use the inverse square of the distance between the bulbs:

LightReceived <- function(brightness.of.receiver, brightness.of.giver, distance.from.receiver.to.giver){
   # The brightness.of.receiver doesn't come into the equation in this case
   return(brightness.of.giver / (distance.from.receiver.to.giver^2))
}

All done! Let's run the code:

> influence.matrix <- InfluenceMatrix(LightReceived, distance.matrix, brightness)
> influence.matrix
           [,1]       [,2]        [,3]        [,4]        [,5]
[1,]         NA 0.11111111 0.016528926 0.062500000 0.080000000
[2,] 0.08333333         NA 0.013888889 0.012345679 0.020000000
[3,] 0.02479339 0.02777778          NA 0.005102041 0.008888889
[4,] 0.18750000 0.04938272 0.010204082          NA 0.125000000
[5,] 0.12000000 0.04000000 0.008888889 0.062500000          NA

I call the resulting matrix an influence matrix. It tells us, for instance, that bulb five throws four times as much light on bulb one as it does on bulb two: [1, 5] is 0.08 and [2, 5] is 0.02.

To get the total light shone on a bulb by the other bulbs, we take the row sums:

> rowSums(influence.matrix, na.rm=TRUE)
[1] 0.2701400 0.1295679 0.0665621 0.3720868 0.2313889

So we see that bulb four receives the most light and bulb three receives the least light.

Now, let's consider a more practical example.

Example: Doctors Influencing Doctors

I decided to write this blog piece after listening to a talk by Mastadon C in February 2014 at the Society of Data Miners in London. The speaker described in that talk a study of the uptake of new treatments by general medical practitioners (GPs). The study tested the assumption that GPs are more likely to adopt new treatments if their practices are located near other practices, particularly large ones. I found myself wondering how the required calculations were done; I then went home and wrote my InfluenceMatrix() function (see below).

Okay, so let's consider five medical practices in Camden Town, London, described by the following dataframe:

> medical.practices
                  name      lat       lon doctors
1 Adelaide Med. Centre 51.54564 -0.161203       8
2      Queens Crescent 51.55151 -0.151676       9
3   Prince Of Wales Rd 51.54788 -0.148071      11
4            Fourtrees 51.54954 -0.153221       2
5             Ampthill 51.53625 -0.137257       5

I use my GeoDistanceInMetresMatrix() function to obtain a distance matrix from the longitudes and latitudes of the practices:

> distance.matrix.in.km <- GeoDistanceInMetresMatrix(medical.practices) / 1000
> distance.matrix.in.km
                     Adelaide Med. Centre Queens Crescent Prince Of Wales Rd Fourtrees Ampthill
Adelaide Med. Centre            0.0000000       0.9291940          0.9444937 0.7032550 1.962810
Queens Crescent                 0.9291940       0.0000000          0.4749221 0.2443764 1.971147
Prince Of Wales Rd              0.9444937       0.4749221          0.0000000 0.4018999 1.496379
Fourtrees                       0.7032550       0.2443764          0.4018999 0.0000000 1.847585
Ampthill                        1.9628102       1.9711466          1.4963786 1.8475850 0.000000

One again, I'll use the inverse square to get the influence that one medical practice has on another:

InfluenceReceived <- function(influence.of.receiver, influence.of.giver, distance.from.receiver.to.giver){
   # The influence.of.receiver doesn't come into the equation in this case
   return(influence.of.giver / (distance.from.receiver.to.giver^2))
}

To calculate the assumed influence that medical practices have upon each other, I suppose that influence is proportional to the number of doctors in a practice. So the influence acting on these five practices is:

> influence.acting.on <- 10000 * # Rescale for ease of reading the results
+                        rowSums(InfluenceMatrix(InfluenceReceived,
+                                                distance.matrix,
+                                                medical.practices$doctors),
+                                na.rm=TRUE)
> setNames(influence.acting.on, medical.practices$name)
Adelaide Med. Centre     Queens Crescent  Prince Of Wales Rd           Fourtrees            Ampthill
         0.28096535          0.92811717          0.63485299          2.36445719          0.09891337

So Fourtrees—being located very close to Queens Crescent, the second largest practice—receives the most influence.

Calculating the Influence Matrix

Here's the code for the InfluenceMatrix() function used in the above examples. It calculates an influence matrix using a supplied function, a distance matrix and a vector of influences. It's short but complicated—sorry! I like my code simple, but sometimes it's better to build for speed, not comfort.

InfluenceMatrix <- function(influence.received.fn, distance.matrix, influences){
   # Calculates an influence matrix in which the value at row i and column j is the influence of
   # a j-th thing on an i-th thing.
   #
   # influence.received.fn:
   # A function that takes three vectors -- influence.of.receiver, influence.of.giver,
   # distance.from.receiver.to.giver -- of equal length and returns a vector of that same
   # length that gives the influence of the giver on the receiver.
   #
   # distance.matrix:
   # A square matrix in which the value at row i and column j is the distance from thing j to
   # thing i. In most cases, this matrix will be symmetrical about the diagonal with a zero
   # diagonal. However, an asymmetrical matrix may be used to incorporate "uphill" and
   # "downhill" distances.
   #
   # influences:
   # A vector giving the influence of the things under consideration. The i-th element of
   # this vector is the influence of the i-th thing.
   margin <- 1:nrow(distance.matrix)
   influence.matrix <- outer(margin, margin, function(irow, icol){influence.received.fn(influences[irow], influences[icol], mapply(function(i.row, i.col){distance.matrix[i.row, i.col]}, irow, icol))})
   diag(influence.matrix) <- NA
   return(influence.matrix)
}