Written by Peter Rosenmai on 22 Feb 2014.
I present here R code to calculate the influence that a set of points have upon each other, where influence is a function of (1) the distance between the points and (2) the inherent influence of the points.
Consider, for example, five light bulbs with brightness given by this vector:
Now, suppose that the distance between the light bulbs (in metres) is given by this distance matrix:
This matrix tells us, for instance, that bulbs two and three are 12 metres apart. Note that the distance matrix is symmetrical about a zero diagonal. This corresponds with the ordinary notion of distance: Any point is a zero distance from itself, and the distance from point A to point B equals the distance from point B to point A. However, my code permits non-symmetric distances: If bulb two is "uphill" from bulb three, [2, 3] will be greater than [3, 2].
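The properties just described can be checked in a few lines. The matrix below is a hypothetical stand-in, not the original data: only the 12 metres between bulbs two and three comes from the text, and the variable names are assumptions.

```r
# Hypothetical distances in metres between five bulbs (illustration only).
# Only the 12 m between bulbs two and three is taken from the text.
distances <- matrix(c( 0,  6,  9,  4,  5,
                       6,  0, 12,  7, 10,
                       9, 12,  0,  8, 11,
                       4,  7,  8,  0,  3,
                       5, 10, 11,  3,  0),
                    nrow = 5, byrow = TRUE)

distances[2, 3]             # distance between bulbs two and three
isSymmetric(distances)      # is [i, j] equal to [j, i] everywhere?
all(diag(distances) == 0)   # is every bulb zero metres from itself?

# A non-symmetric variant: one direction between bulbs two and three
# is made longer than the other.
uphill <- distances
uphill[3, 2] <- 15
isSymmetric(uphill)
```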
Okay, next thing we need is a function that gives the light (the influence) that one light bulb receives from another as a function of the brightness of the bulbs and the distance between them. Let's use the inverse square of the distance between the bulbs:
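A minimal sketch of such a function might look like this. The name `InverseSquare`, its argument order and the rule that a zero distance contributes nothing are all assumptions made for illustration; the original function may well differ.

```r
# Light received from a source of the given brightness at the given distance.
# Assumption: a zero distance (a bulb and itself) contributes nothing.
InverseSquare <- function(influence, distance) {
  ifelse(distance == 0, 0, influence / distance^2)
}

# Hypothetical values: a bulb of brightness 2 seen from 5 m and from 10 m.
InverseSquare(2, 5)
InverseSquare(2, 10)   # doubling the distance quarters the light
```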
All done! Let's run the code:
I call the resulting matrix an influence matrix. It tells us, for instance, that bulb five throws four times as much light on bulb one as it does on bulb two: [1, 5] is 0.08 and [2, 5] is 0.02.
To get the total light shone on a bulb by the other bulbs, we take the row sums:
So we see that bulb four receives the most light and bulb three receives the least light.
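The row-sum step can be sketched on a smaller, made-up example. The three bulbs, their brightness values and their distances below are hypothetical, and the influence matrix is built directly with `sweep()` rather than with the InfluenceMatrix() function.

```r
# Hypothetical brightness and distances for three bulbs (illustration only).
brightness <- c(4, 1, 2)
distances  <- matrix(c(0, 2, 4,
                       2, 0, 2,
                       4, 2, 0), nrow = 3, byrow = TRUE)

# Entry [i, j] is the light bulb j throws on bulb i: brightness[j] / distance^2.
# sweep() multiplies column j of 1 / distances^2 by brightness[j].
influence <- sweep(1 / distances^2, 2, brightness, `*`)
diag(influence) <- 0   # assumption: a bulb does not light itself

total.light <- rowSums(influence)
which.max(total.light)   # index of the bulb receiving the most light
which.min(total.light)   # index of the bulb receiving the least light
```

Row i is summed because row i holds everything that bulb i receives; the column sums would instead give the total light each bulb throws on the others.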
Now, let's consider a more practical example.
I decided to write this blog piece after listening to a talk by Mastodon C in February 2014 at the Society of Data Miners in London. The speaker described in that talk a study of the uptake of new treatments by general medical practitioners (GPs). The study tested the assumption that GPs are more likely to adopt new treatments if their practices are located near other practices, particularly large ones. I found myself wondering how the required calculations were done; I then went home and wrote my InfluenceMatrix() function (see below).
Okay, so let's consider five medical practices in Camden Town, London, described by the following dataframe:
I use my GeoDistanceInMetresMatrix() function to obtain a distance matrix from the longitudes and latitudes of the practices:
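GeoDistanceInMetresMatrix() itself is not reproduced here. As a rough stand-in, a distance matrix can be sketched with the haversine formula, which treats the Earth as a sphere. The function name `HaversineMatrix`, the radius of 6,371,000 metres and the coordinates are assumptions for illustration, not the author's code or data.

```r
# Hypothetical stand-in for a geographic distance matrix; NOT the author's
# GeoDistanceInMetresMatrix(). Uses the haversine formula on a spherical Earth.
HaversineMatrix <- function(lat, lon, radius = 6371000) {
  to.rad <- pi / 180
  lat <- lat * to.rad
  lon <- lon * to.rad
  # outer() builds every pairwise difference in one step.
  d.lat <- outer(lat, lat, "-")
  d.lon <- outer(lon, lon, "-")
  a <- sin(d.lat / 2)^2 + outer(cos(lat), cos(lat)) * sin(d.lon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  2 * radius * asin(pmin(sqrt(a), 1))
}

# Made-up coordinates near Camden Town, for illustration only.
lat <- c(51.5390, 51.5450, 51.5480)
lon <- c(-0.1426, -0.1500, -0.1380)
geo.distances <- HaversineMatrix(lat, lon)
round(geo.distances)   # pairwise distances in metres
```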
Once again, I'll use the inverse square to get the influence that one medical practice has on another:
To calculate the assumed influence that medical practices have upon each other, I suppose that influence is proportional to the number of doctors in a practice. So the influence acting on these five practices is:
So Fourtrees—being located very close to Queens Crescent, the second largest practice—receives the most influence.
Here's the code for the InfluenceMatrix() function used in the above examples. It calculates an influence matrix using a supplied function, a distance matrix and a vector of influences. It's short but complicated—sorry! I like my code simple, but sometimes it's better to build for speed, not comfort.
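The original listing is not reproduced here. As a rough illustration of the idea only, the sketch below builds an influence matrix from a vector of influences, a distance matrix and a supplied function. The name `InfluenceMatrixSketch`, its argument order, the zeroed diagonal and the sample data are all assumptions, and the sketch favours readability over speed.

```r
# A sketch only; not the author's InfluenceMatrix(). Entry [i, j] is the
# influence that point j exerts on point i.
InfluenceMatrixSketch <- function(influences, distances, f) {
  n <- length(influences)
  stopifnot(is.matrix(distances), all(dim(distances) == n))
  # Repeat each source's influence down its column so it lines up with
  # the matching entry of the distance matrix.
  source.influence <- matrix(influences, nrow = n, ncol = n, byrow = TRUE)
  result <- matrix(f(source.influence, distances), nrow = n)
  diag(result) <- 0   # assumption: a point does not influence itself
  result
}

# Hypothetical practices: doctor counts and distances in metres.
doctors   <- c(2, 8, 4)
distances <- matrix(c(  0, 100, 400,
                      100,   0, 200,
                      400, 200,   0), nrow = 3, byrow = TRUE)
inverse.square <- function(influence, distance) influence / distance^2

m <- InfluenceMatrixSketch(doctors, distances, inverse.square)
rowSums(m)   # total influence acting on each practice
```

Because the supplied function is applied to whole matrices at once, it needs to be vectorised; a function written for single values could be wrapped with `Vectorize()` first.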