Calculating Kaplan Meier Survival Curves and Their Confidence Intervals in SQL Server

I provide here a SQL Server script to calculate Kaplan Meier survival curves and their confidence intervals (plain, log and log-log) for time-to-event data.

Example 1: Customer Attrition, Ungrouped, Without Censoring

Suppose a web-application company has seen its ten customers cancel their subscriptions after 0.5, 1, 10, 10, 11, 13.5, 14, 19, 19.5 and 30 months (measured from the start of their respective subscriptions). Based on this data, we want to estimate the probability that a new customer remains a customer for more than, say, 12 months, and we want a confidence interval around that estimate.

We create the input table:

create table #input
(
    [Group] nvarchar(255),
    [TimeToEvent] float,
    [Event] int
);
insert into #input values('Web-App Ltd',  0.5, 1);
insert into #input values('Web-App Ltd',  1.0, 1);
insert into #input values('Web-App Ltd', 10.0, 1);
insert into #input values('Web-App Ltd', 10.0, 1);
insert into #input values('Web-App Ltd', 11.0, 1);
insert into #input values('Web-App Ltd', 13.5, 1);
insert into #input values('Web-App Ltd', 14.0, 1);
insert into #input values('Web-App Ltd', 19.0, 1);
insert into #input values('Web-App Ltd', 19.5, 1);
insert into #input values('Web-App Ltd', 30.0, 1);
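To show what the script has to compute, here is a minimal Python sketch of the Kaplan-Meier product-limit estimate and the three pointwise interval types. It assumes Greenwood's variance formula and the usual transforms (plain S(t), log S(t), and log(-log S(t))). The SQL script's exact formulas and edge-case handling may differ, and the function names are illustrative only.

```python
import math
from collections import Counter
from statistics import NormalDist

def kaplan_meier(times, events):
    """Kaplan-Meier table: (time, at_risk, events, S(t), Greenwood sum) per event time."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    s, g, table = 1.0, 0.0, []
    for t in sorted(deaths):
        n = sum(1 for u in times if u >= t)  # subjects still at risk just before t
        d = deaths[t]
        s *= 1.0 - d / n                     # product-limit step
        # Greenwood's running sum of d / (n (n - d)); undefined once everyone has failed
        g = g + d / (n * (n - d)) if n > d else math.inf
        table.append((t, n, d, s, g))
    return table

def survival_at(table, t):
    """S(t) and the Greenwood sum of the right-continuous step function at time t."""
    s, g = 1.0, 0.0
    for time, _, _, s_t, g_t in table:
        if time > t:
            break
        s, g = s_t, g_t
    return s, g

def confidence_interval(s, g, kind="plain", level=0.95):
    """Pointwise CI for S(t) on the plain, log or log-log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    if s <= 0.0 or s >= 1.0 or math.isinf(g):
        return (s, s)  # degenerate cases: no usable variance on a transformed scale
    if kind == "plain":
        half = z * s * math.sqrt(g)          # Greenwood SE of S(t) is S * sqrt(g)
        return (max(0.0, s - half), min(1.0, s + half))
    if kind == "log":
        half = z * math.sqrt(g)              # SE of log S(t) is sqrt(g)
        return (s * math.exp(-half), min(1.0, s * math.exp(half)))
    if kind == "log-log":
        half = z * math.sqrt(g) / abs(math.log(s))  # SE of log(-log S(t))
        return (s ** math.exp(half), s ** math.exp(-half))
    raise ValueError(f"unknown interval type: {kind}")

times = [0.5, 1.0, 10.0, 10.0, 11.0, 13.5, 14.0, 19.0, 19.5, 30.0]
events = [1] * len(times)  # Example 1 has no censoring
s12, g12 = survival_at(kaplan_meier(times, events), 12)
for kind in ("plain", "log", "log-log"):
    low, high = confidence_interval(s12, g12, kind)
    print(f"{kind:8s} S(12) = {s12:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

For the ten Web-App Ltd customers this gives S(12) = 0.5, because 5 of the 10 customers lasted beyond 12 months. The plain 95% interval is roughly (0.19, 0.81). The log and log-log versions are asymmetric and always stay inside [0, 1], which is the usual reason to prefer them for small samples.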

Calling SPSS Modeler from R

SPSS Modeler streams can be executed from R via input files and command-line calls. It's a hacky technique, but it works. And it's an easy way to make use of Modeler's excellent Expert Modeler functionality.

First, create an example of the data file that you want Modeler to read in. For example, if you want to run Modeler on a single time series, your data file will probably be a text file comprising a date column and a value column.

Then create and save a Modeler stream that reads in that file, fits the required model and produces an output file. This is fairly easy, so I won't cover it here. But this is how it might look:

An example of an SPSS Modeler stream.
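The round trip itself is language-agnostic. In R it would be write.csv, system2 and read.csv. Here is a minimal Python sketch of the same three steps: write the input file the saved stream expects, run Modeler from the command line, and read back the stream's output file. The file names, the column names, and the command in MODELER_COMMAND are hypothetical. In particular, check the executable name and the -stream/-execute flags against your Modeler version's command-line documentation.

```python
import csv
import subprocess

def run_modeler_stream(series, input_csv, output_csv, command):
    """Write a (date, value) input file, run an external command, read the output file.

    `command` is the full argument list for the external call, e.g. a Modeler
    executable plus the saved stream to execute.
    """
    # 1. Overwrite the input file that the saved stream has been set up to read.
    with open(input_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "value"])
        writer.writerows(series)
    # 2. Run the stream; check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(command, check=True)
    # 3. Read back whatever the stream's output node wrote.
    with open(output_csv, newline="") as f:
        return list(csv.DictReader(f))

# Hypothetical invocation (assumed executable name, stream file and flags).
MODELER_COMMAND = ["modelerclient", "-stream", "forecast.str", "-execute"]
```

Passing the command in as a list keeps the sketch testable without Modeler installed. It also avoids shell quoting problems with paths such as C:\Program Files.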


Kaplan Meier Survival Curve Grapher

Beta Distribution PDF Grapher

Here's a D3-rendered graph of the probability density function (PDF) of the beta distribution. Move the sliders to change the shape parameters or the scale of the y-axis.
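The curve behind a grapher like this is just the beta density f(x; a, b) = x^(a-1) (1-x)^(b-1) / B(a, b), with B(a, b) = Γ(a)Γ(b)/Γ(a+b). The following is a minimal Python sketch of computing it. It is not the D3 page's own code. It works in log space so that large shape parameters don't overflow, and the grid helper is a hypothetical stand-in for whatever feeds the plot.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x (this sketch treats only 0 < x < 1)."""
    if a <= 0 or b <= 0:
        raise ValueError("shape parameters a and b must be positive")
    if not 0.0 < x < 1.0:
        # Sketch convention: 0 outside the open interval. At the endpoints themselves
        # the true density can be positive (a == 1 or b == 1) or infinite (a < 1 or b < 1).
        return 0.0
    # log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta)

def pdf_curve(a, b, points=200):
    """(x, f(x)) pairs at the midpoints of `points` equal cells of (0, 1), for plotting."""
    xs = [(i + 0.5) / points for i in range(points)]
    return [(x, beta_pdf(x, a, b)) for x in xs]
```

Moving a shape slider then amounts to recomputing pdf_curve(a, b). The y-axis slider only rescales the plot, not the density.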

Installing Rpy2

Here's how I installed the rpy2 module for communicating between R and Python. I did this for R version 3.0.2, Python version 3.4.1, and rpy2 version 2.4.4 on a 64-bit machine running Windows 7.

Setting System Variables

First, I got the full pathname of my R executable by right-clicking the R icon in my Start menu and selecting Properties. It was C:\Program Files\R\R-3.0.2\bin\x64\Rgui.exe. It took only a moment of poking around to find the full pathname of my Python executable: C:\Anaconda3\python.exe.

Time to look at my system variables. I right-clicked on Computer in my Start menu and selected Properties; I then clicked on Advanced System Settings in the window that appeared. In the Advanced tab of the System Properties window, I clicked the Environment Variables button. The lower half of the resulting Environment Variables window showed my system variables.

The first system variable I had to deal with was Path. It didn't include the directory in which my R executable sits, so I added it: C:\Program Files\R\R-3.0.2\bin\x64\.

I then added an R_HOME system variable and set it to the top-level directory of my R installation: C:\Program Files\R\R-3.0.2\.

And I added an R_USER system variable and set it to the directory that the rpy2 module would install into: C:\Anaconda3\Lib\site-packages\rpy2\.


Estimating the Distance between GPS Points Accounting for Circular Error Probable (CEP)

Given two GPS points recorded as being $d$ metres apart, with circular error probable (CEP) of $c_1$ and $c_2$ metres respectively, the true distance $D$ between the points has the distribution

$$D \sim \sigma \sqrt{C}$$

where:

  • $C$ is a non-central chi-square random variable with 2 degrees of freedom and non-centrality parameter $d^2 / \sigma^2$
  • $\sigma = m \sqrt{c_1^2 + c_2^2}$
  • $m = 1 / \sqrt{-2 \ln 0.5} \approx 0.85$

(I give a proof of this easy result below.)
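To get a feel for the result numerically, here is a minimal Python sketch that draws from D by simulation. It uses the fact that C = (d/σ + Z₁)² + Z₂² for independent standard normals Z₁, Z₂ is exactly a non-central chi-square with 2 degrees of freedom and non-centrality d²/σ². Monte Carlo is a swap-in here for the exact distribution. With SciPy available, scipy.stats.ncx2(df=2, nc=d**2 / sigma**2) gives the CDF of C directly. The example values d = 10 and c₁ = c₂ = 3 are made up.

```python
import math
import random

M = 1.0 / math.sqrt(-2.0 * math.log(0.5))  # CEP-to-sigma factor m, about 0.85

def combined_sigma(c1, c2):
    """Per-axis standard deviation of the difference between the two position errors."""
    return M * math.sqrt(c1 ** 2 + c2 ** 2)

def sample_true_distance(d, c1, c2, n=100_000, seed=0):
    """Monte Carlo draws of D = sigma * sqrt(C), C ~ non-central chi-square(2, d^2 / sigma^2)."""
    rng = random.Random(seed)
    sigma = combined_sigma(c1, c2)
    draws = []
    for _ in range(n):
        # Recorded offset (d, 0) plus isotropic Gaussian error; |(x, y)|^2 / sigma^2 is C.
        x = d + rng.gauss(0.0, sigma)
        y = rng.gauss(0.0, sigma)
        draws.append(math.hypot(x, y))
    return draws

draws = sorted(sample_true_distance(10.0, 3.0, 3.0))
n = len(draws)
print(f"median {draws[n // 2]:.2f} m, 95% range {draws[int(0.025 * n)]:.2f}-{draws[int(0.975 * n)]:.2f} m")
```

Two quick sanity checks follow from the definitions. The factor m makes a single point's error radius fall within c exactly half the time, because 1 - exp(-1 / (2m²)) = 0.5. And E[D²] = d² + 2σ², since E[C] = 2 + d²/σ².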
