4 min read

How to draw sample from a generic distribution?

In this post, I am going to show two methods to draw samples from a generic distribution.

But before we get started, we should define what do I mean generic distribution. Here is one example:

$$ f(x)= \begin{cases} 0 & \text{if \(x\) < 0} \\ c \cdot \sqrt{x} & \text{if $0 < x < 1 $} \\ 0 & \text{if \(x > 1\)} \end{cases} $$

First, let’s take a look at the probability density function (pdf):
 

Monte_Carlo
 

Obviously, this is not any type of distribution we have learned before (not Gaussian, not Poisson, not uniform etc), and therefore, we wouldn’t be able to use build-in functions to draw samples from (in R programming language, we have rnorm(), rpois(), runif()).

Despite the limitations, there are still ways to draw samples from, which I will discuss below:

 
   
   
 

Method 1: inverse transform sampling

You may have noticed that our function has a constant \(c\). In order for this curve to be a legit probability density function, we have to make sure that the area under the curve is 1. In mathematical notation, we have $$ \int_{0}^{1} c \sqrt{x} = 1 $$

Finding the integral isn’t difficult. We have $$ F = \frac{2c}{3} \cdot x^{1.5} $$

We have to make sure \(F |_{0}^{1} = 1\), therefore we solve \(c = 1.5\)

 

Plug \(c = 1.5\) into our pdf and cdf, we pdf:

$$ f(x)= \begin{cases} 0 & \text{if \(x\) < 0} \\ 1.5 \cdot \sqrt{x} & \text{if $0 < x < 1 $} \\ 0 & \text{if \(x > 1\)} \end{cases} $$

And we have cdf:

$$ cdf(x)= \begin{cases} 0 & \text{if \(x\) < 0} \\ x^{1.5} & \text{if $0 < x < 1 $} \\ 1 & \text{if \(x > 1\)} \end{cases} $$

   

The steps I described above are pretty standard and boring, but here comes to the interesting part that allows us to sample from this generic distribution.

Step1: Inverse the cdf

We first find the inverse of cumulative density function. \(x = y^{2/3}\)

Step2: Sampling

We draw sample y from a uniform distribution, then raise this sample to the power of \(2/3\), and we will get a customized sampler!

Sampler <- function(n = 1){
     y = runif(n, min = 0, max = 1)
     x = y ^(2/3)
     return(x)
}

# draw one sample 
Sampler()
## [1] 0.6835192


# draw 10000 samples
hist(Sampler(10000), probability = T, main = "Histogram, inverse transform sampling")
lines(seq(0,1, 0.01), 1.5 * sqrt(seq(0,1, 0.01)), col = "red" )

The red line represents the theoretical pdf, and it perfectly matches this histogram!

 
   
 

Method 2: accept-reject sampling.

This is a trick that allows you to draw samples without explicitly calculating \(c\). It is particularly useful if it’s difficult to normalize the probability (e.g. Bayesian statistics).

Here are the steps involved:

  • Step1: Find an easy-to-sample distribution \(g(x)\) (normal, uniform, etc) and multiply by a constant \(k\), so that \(k \times g(x)\) is always greater than our customized distribution \(f(x)\).

  • Step2: Draw a sample from \(g(x)\)

  • Step3: Calculate the probability of acceptance: \(p (accept) = \frac{f(x)}{k \cdot g(x)}\)

  • Step4: Collect samples. The bag of your collected samples will have a distribution of \(f(x)\).

Notice that you don’t need the full format of f(x). In our case, you can ignore \(c\), and set \(f(x) = \sqrt x\)

# our collecting bag
accept = c()

# draw 100000 samples from a uniform distribution
samples = runif(100000, 0, 1)

# iterate all the samples
for (sample in samples){
   accpt_prob = sqrt(sample)/(2 * 1)           # calculate the probability of acceptance
    if (rbinom(n = 1, size = 1, prob = accpt_prob) == 1){ # flip a unfair coin with probability we just calculated. If head up, we accept.
        accept = c(accept, sample)               # add the accepted samples to our bag
    }
}

# draw 10000 samples
hist(accept, probability = T, main = "Histogram, accept-reject sampling.")
lines(seq(0,1, 0.01), 1.5 * sqrt(seq(0,1, 0.01)), col = "red" )

# proportion of samples accepted
length(accept)/100000
## [1] 0.33196

Notice here, I chose the constant \(k = 2\). We are sure that \(2 \times U(0, 1)\) is always larger than our customized pdf.

Also, notice in my code, I use \(f(x) = \sqrt x\), but not \(f(x) = 1.5 \sqrt x\), although either way works.

You might find that I drew 100000 samples, but only a small proportion of samples are accepted. This is one of a big disadvantage of accept-reject sampling!

Being able to draw samples without knowing its full format is very advantageous. This accept-reject sampling is a core foundation of MCMC sampling, which I will demonstrate in the next few posts

 
   
 

Summary

In this post, we learned two ways of drawing samples form a customized distribution. Either one has its pros and cons.

Method Advantage Disadvantage
Inverse Transform Sampling: Utilizes all the samples Need to calculate cdf; Inverse cdf sometimes could be very difficult
Accept Reject Sampling: Do not require full format of pdf, easy to implement Waste samples

 
   
   

Reference:

ritvikmath: https://www.youtube.com/watch?v=OXDqjdVVePY
Ben Lambert: https://www.youtube.com/watch?v=rnBbYsysPaU