Generating Random Numbers From Any Non Negative Function

Here's some R code that generates random numbers from the probability distribution described by a given non-negative function. This can be useful when running simulations or generating datasets for testing purposes.

For example, suppose you want to generate random data from a distribution that looks something like this back-of-an-envelope sketch:

A back-of-an-envelope sketch of a graph.

To do this, you:

  1. Mimic your function (the one sketched above) by creating a spline function that interpolates between a bunch of points (perhaps generated using an interactive plot).
  2. Generate a cumulative distribution function (CDF) from that function.
  3. Take the inverse of it.
  4. Run the resulting function on numbers selected from a uniform distribution.

Here's the code and the graphs it generates illustrating the above steps:

spline_function <- function(x, y){
   # Returns a spline function (made up of cubic polynomials) that interpolates the
   # points given by the x and y vectors. The function has range [min(x), max(x)].
   fn <- splinefun(x, y, method="natural")
   min_x <- min(x)
   max_x <- max(x)
   f <- function(x){
      y <- fn(x)
      y[x < min_x | max_x < x] <- NA
      return(y)
   }
   return(f)
}

cdf <- function(fn, min_x, max_x){
   # Returns a cumulative distribution function for a non-negative function over a given range.
   f <- function(x){
      y <- rep(NA, length(x))
      total_area_under_curve <- integrate(fn, min_x, max_x)$value
      apply_fn <- function(xval){integrate(fn, min_x, min(xval, max_x))$value}
      y[min_x <= x & x <= max_x] <- sapply(x[min_x <= x & x <= max_x], FUN=apply_fn) / total_area_under_curve
      y[x < min_x] <- 0
      y[x > max_x] <- 1
      return(y)
   }
   return(f)
}

inverse <- function(fn, min_x, max_x){
   # Returns the inverse of a function for a given range.
   # E.g. inverse(sin, 0, pi/2)(sin(pi/4)) equals pi/4 because 0 <= pi/4 <= pi/2
   fn_inv <- function(y){
      uniroot((function(x){fn(x) - y}), lower=min_x, upper=max_x)[1]$root
   }
   return(Vectorize(fn_inv))
}

# The points to interpolate between.
x <- c(  7,   8,   9,  9.5,   10,  11,  12,   13, 14)
y <- c(0.1, 0.2, 0.5, 0.58, 0.35, 0.3, 0.1, 0.15,  0)
xlim <- c(5, 15)

# Set parameters to draw three graphs in the same plot window.
op <- par(mfrow=c(3, 1))

# Create and plot a spline function that interpolates between the points.
fn <- spline_function(x, y)
curve(fn, xlim[1], xlim[2], n=1000, main="Spline Interpolation Function")
points(x, y)

# Turn the spline function into a cumulative distribution function and graph it.
cdf_fn <- cdf(fn, min(x), max(x))
curve(cdf_fn, xlim[1], xlim[2], n=1000, main="CDF from Spline Interpolation Function")

# Invert the CDF and use it to get some random points.
n_random_points <- 10000
cdf_inv <- inverse(cdf_fn, min(x), max(x))
random_points <- cdf_inv(runif(n_random_points))

# Use a histogram to check that the distribution of the random points is similar to
# that of the spline function created from the supplied data.
hist(random_points, xlim=xlim, breaks=50, main="Random Points from Spline Interpolation Function")

# Reset the parameters
par(op)
Three graphs showing how random numbers may be generated from a given non-negative function.