## Generating Random Numbers From Any Non Negative Function

Here's some R code that generates random numbers from the probability distribution described by a given non-negative function. This can be useful when running simulations or generating datasets for testing purposes.

For example, suppose you want to generate random data from a distribution that looks something like this back-of-an-envelope sketch: To do this, you:

1. Mimic your function (the one sketched above) by creating a spline function that interpolates between a bunch of points (perhaps generated using an interactive plot).
2. Generate a cumulative distribution function (CDF) from that function.
3. Take the inverse of it.
4. Run the resulting function on numbers selected from a uniform distribution.

Here's the code and the graphs it generates illustrating the above steps:

spline_function <- function(x, y){
# Returns a spline function (made up of cubic polynomials) that interpolates the
# points given by the x and y vectors. The function has range [min(x), max(x)].
fn <- splinefun(x, y, method="natural")
min_x <- min(x)
max_x <- max(x)
f <- function(x){
y <- fn(x)
y[x < min_x | max_x < x] <- NA
return(y)
}
return(f)
}

cdf <- function(fn, min_x, max_x){
# Returns a cumulative distribution function for a non-negative function over a given range.
f <- function(x){
y <- rep(NA, length(x))
total_area_under_curve <- integrate(fn, min_x, max_x)\$value
apply_fn <- function(xval){integrate(fn, min_x, min(xval, max_x))\$value}
y[min_x <= x & x <= max_x] <- sapply(x[min_x <= x & x <= max_x], FUN=apply_fn) / total_area_under_curve
y[x < min_x] <- 0
y[x > max_x] <- 1
return(y)
}
return(f)
}

inverse <- function(fn, min_x, max_x){
# Returns the inverse of a function for a given range.
# E.g. inverse(sin, 0, pi/2)(sin(pi/4)) equals pi/4 because 0 <= pi/4 <= pi/2
fn_inv <- function(y){
uniroot((function(x){fn(x) - y}), lower=min_x, upper=max_x)\$root
}
return(Vectorize(fn_inv))
}

# The points to interpolate between.
x <- c(  7,   8,   9,  9.5,   10,  11,  12,   13, 14)
y <- c(0.1, 0.2, 0.5, 0.58, 0.35, 0.3, 0.1, 0.15,  0)
xlim <- c(5, 15)

# Set parameters to draw three graphs in the same plot window.
op <- par(mfrow=c(3, 1))

# Create and plot a spline function that interpolates between the points.
fn <- spline_function(x, y)
curve(fn, xlim, xlim, n=1000, main="Spline Interpolation Function")
points(x, y)

# Turn the spline function into a cumulative distribution function and graph it.
cdf_fn <- cdf(fn, min(x), max(x))
curve(cdf_fn, xlim, xlim, n=1000, main="CDF from Spline Interpolation Function")

# Invert the CDF and use it to get some random points.
n_random_points <- 10000
cdf_inv <- inverse(cdf_fn, min(x), max(x))
random_points <- cdf_inv(runif(n_random_points))

# Use a histogram to check that the distribution of the random points is similar to
# that of the spline function created from the supplied data.
hist(random_points, xlim=xlim, breaks=50, main="Random Points from Spline Interpolation Function")

# Reset the parameters
par(op) 