GENERATING RANDOM VARIATES

İrem Tanrıverdi
10 min read · Jun 27, 2022

INVERSE TRANSFORMATION METHOD

One of the basic tools required in computational statistics is to simulate random variables from specified probability distributions. To simulate drawing a random observation from a finite population, a method of generating random observations from a discrete uniform distribution is required. Therefore, a proper generator of uniform random numbers is essential. All methods of generating random variables from other probability distributions depend on a uniform random number generator.

  • We generate random numbers from the uniform distribution with the runif() function.

Sample size;

If the sample size is reasonably large, then the true probability that the event occurs should be close to the percentage of times that the event occurs in our sample. The larger the sample size is, the closer we expect our estimate to be to the true value. Estimates vary, and rare events are harder to sample, so we should always repeat our estimate a few times to see what kind of variation we obtain in our answers.
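As a quick sketch of this idea (a hypothetical die-rolling example of my own, not part of the original notes), we can repeat an estimate of P(rolling a 6) = 1/6 a few times at two sample sizes and watch the spread of the estimates shrink as n grows:

# Hypothetical illustration: repeat an estimate of P(roll a 6) = 1/6
# five times at each of two sample sizes.
set.seed(1)
for (n in c(100, 10000)) {
  estimates <- replicate(5, mean(sample(1:6, n, replace = TRUE) == 6))
  cat("n =", n, ":", round(estimates, 3), "\n")
}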

Generating a random sample from an unknown probability distribution

The sample() function can be used to sample from a finite population, with or without replacement.

y <- sample(-100:100, size = 10, replace = FALSE)
y
## [1] -75 -83 -63  81   7 -86   2  29 -33 -22

set.seed(56789) # for reproducibility
x <- sample(-2:2, size = 50, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.2, 0.2))
table(x)
## x
## -2 -1  0  1  2
##  8  8 16 12  6

prop.table(table(x))
## x
##   -2   -1    0    1    2
## 0.16 0.16 0.32 0.24 0.12

After generating a sample, we should check the histogram, density, and summary statistics (mean, variance) of the sample!

z <- sample(-100:100, size = 500, replace = TRUE)
data_z <- as.data.frame(z) # ggplot needs a data frame
library(ggplot2)
h <- ggplot(data_z, aes(x = z)) +
  geom_histogram(color = "violet", fill = "pink", bins = 30) +
  labs(title = "Histogram of the sample Z")

b <- ggplot(data_z, aes(x = z)) +
  geom_density(alpha = 0.4, color = "blue", fill = "blue") +
  labs(title = "Density of the sample Z")

library(ggpubr)
ggarrange(h, b)

cat("Mean of sample Z is=", mean(z), "\n")
## Mean of sample Z is= 0.09
cat("Variance of sample Z is=", var(z), "\n")
## Variance of sample Z is= 3333.541

Random Generators of Common Probability Distributions

For many commonly used probability distributions, R provides:

  • the probability mass function (pmf) or density (pdf),
  • the cumulative distribution function (cdf),
  • the quantile function, and
  • a random generator.

For example, we mostly use the uniform distribution in this lecture. Let's see how the probability functions of the uniform distribution are organized.

  • dunif(x, min = 0, max = 1, log = FALSE)
  • punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
  • qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
  • runif(n, min = 0, max = 1)

dunif gives the density, punif gives the distribution function, qunif gives the quantile function, and runif generates random deviates.
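Here is a minimal sketch of how these four functions fit together (the values in the comments are exact for the standard uniform):

dunif(0.5)        # density at 0.5 -> 1
punif(0.5)        # cdf: P(U <= 0.5) -> 0.5
qunif(0.8)        # quantile: the u with P(U <= u) = 0.8 -> 0.8
qunif(punif(0.3)) # qunif inverts punif -> 0.3
runif(3)          # three random deviates in (0, 1)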

For the most commonly used discrete distribution, the binomial;

  • dbinom(x, size, prob, log = FALSE)
  • pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
  • qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)
  • rbinom(n, size, prob)
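As a quick sanity check (example values of my own choosing), the binomial cdf is just the cumulative sum of the pmf:

pbinom(3, size = 10, prob = 0.3)        # P(X <= 3) -> 0.6496107
sum(dbinom(0:3, size = 10, prob = 0.3)) # same value from the pmf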

Let's generate a sample of size 100 from X ~ Unif(2, 5).

u <- runif(100, min = 2, max = 5)
data_u <- data.frame(u)
ggplot(data_u, aes(x = u)) +
  geom_histogram(color = "blue", fill = "blue", bins = 30, alpha = 0.4) +
  labs(title = "Histogram of the sample U")

Let's generate a sample of size 1000 from P ~ Binom(n = 100, p = 0.3).

p <- rbinom(n = 1000, size = 100, prob = 0.3)
data_p <- as.data.frame(p)
ggplot(data_p, aes(x = p)) +
  geom_histogram(color = "darkred", fill = "red", bins = 30, alpha = 0.4) +
  labs(title = "Histogram of the sample P")

Some Univariate Probability Functions Available in R;

There are several techniques for generating random samples (numbers) from a distribution. The transformation methods are:

– Inverse transform method

– Acceptance-rejection method

– Convolution method

– Composition method

Steps of transformation

  • Generate a random variate uniformly distributed on [0, 1]; this is also called a random number.
  • Use an appropriate transformation to convert the random number into a random variate from the desired distribution.

Here we focus on generating samples from one distribution at a time.

The first technique to be mentioned is the Inverse Transformation Method.

Inverse Transform Method

Inverse transform sampling (also known as inversion sampling, the inverse probability integral transform, the inverse transformation method, Smirnov transform, or the golden rule) is a basic method for generating sample numbers at random from any probability distribution given its cumulative distribution function.

Inverse transformation sampling takes a uniform sample u between 0 and 1 (standard uniform distribution), interprets it as a probability, and then returns the largest number x from the domain of the distribution such that P(−∞ < X < x) ≤ u.

In other words, we randomly choose a proportion of the area under the curve and return the number such that exactly that proportion of the area lies to the left of it. Intuitively, we are unlikely to pick a number far out in the tails, because choosing one would require u to fall in the small amount of area there, very close to zero or one.

Computationally, this method involves calculating the quantile function of the distribution; in other words, computing the cumulative distribution function (CDF) of the distribution and then inverting that function.
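R's q-functions are exactly these inverted CDFs, so we can already see the idea in action. A small sketch using the standard normal (an illustration of my own, not one of the original examples):

set.seed(361)
u <- runif(10000)  # uniform "probabilities"
z <- qnorm(u)      # invert the normal cdf: inverse transform sampling
c(mean(z), sd(z))  # should be close to 0 and 1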

Proposition: Let F be any continuous distribution function and let U be a Uniform(0,1) random variable:

U ∼ Unif(0, 1)

Define

X = F⁻¹(U)

Then the random variable X has distribution function F.

Explanation

Let Fₓ denote the distribution function of X = F⁻¹(U). Then,

Fₓ(x) = P(X ≤ x) = P(F⁻¹(U) ≤ x)

Now, since F is a distribution function, F(x) is a monotone increasing function of x, and so the inequality a ≤ b is equivalent to the inequality F(a) ≤ F(b). Hence, from the previous equation we see that

Fₓ(x) = P(X ≤ x) = P(F(F⁻¹(U)) ≤ F(x))

We know that F(F⁻¹(U)) = U. Then,

Fₓ(x) = P(X ≤ x) = P(U ≤ F(x))

And, since U is Uniform(0,1),

Fₓ(x) = F(x)

The above proposition thus shows that we can generate a random variable X from the continuous distribution function F by generating a random number U and then setting

X = F ⁻¹(U)

Theorem (Probability Integral Transformation):

The inverse transform method of generating random variables applies the probability integral transformation. Define the inverse transformation

Fₓ⁻¹(u) = inf{x : Fₓ(x) = u}, 0 < u < 1

If U ∼ Uniform(0,1), then for all x ∈ ℝ

P[Fₓ⁻¹(U) ≤ x] = P[inf{t : Fₓ(t) = U} ≤ x] = P[U ≤ Fₓ(x)]

and then

P[Fₓ⁻¹(U) ≤ x] = Fᵤ(Fₓ(x)) = Fₓ(x)

Therefore Fₓ⁻¹(U) has the same distribution as X. Thus, to generate a random observation of X, first generate a Uniform(0,1) variate u and deliver the inverse value Fₓ⁻¹(u). The method is easy to apply provided that Fₓ⁻¹ is easy to compute, and it can be applied to generate either continuous or discrete random variables.

This method can be summarized as follows:

  • Derive the expression for the inverse distribution function F⁻¹(u).
  • Generate a uniform random number U (using the runif() function).
  • Obtain the desired X from X = F⁻¹(U).
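Here is a compact sketch of these three steps for the standard logistic distribution (an illustrative distribution I chose because its cdf inverts easily by hand): F(x) = 1/(1 + e^(−x)), so F⁻¹(u) = log(u/(1 − u)).

set.seed(361)
u <- runif(10000)      # step 2: uniform random numbers
x <- log(u / (1 - u))  # step 3: apply the inverse cdf F^{-1}(u)
c(mean(x), var(x))     # should be near 0 and pi^2/3 = 3.29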

Why do we use the uniform distribution to generate numbers?

  • A good random number generator is fast and not memory intensive.
  • It can reproduce a given stream of random numbers.
  • It can produce several different independent “streams” of random numbers.

Continuous Case

Example 1) Generate random variables from the exponential distribution (λ = 5) by using the inverse transform method.

The pdf of the exponential distribution with parameter λ is

f(x) = λe^(−λx), x ≥ 0

so the cdf is F(x) = 1 − e^(−λx). Setting F(x) = u and solving for x gives F⁻¹(u) = −log(1 − u)/λ; and since 1 − U is also Uniform(0,1), we may simply use x = −log(u)/λ.

First generate numbers from the standard uniform and name them u. Then define x according to the formula above.

lambda <- 5
set.seed(361)
u <- runif(n = 10000)
x <- -log(u) / lambda
head(x,20)
## [1] 0.08889541 0.17184086 0.05525627 0.13791990 0.24135235 0.08027214 0.12736651 0.08856022 0.13043548 0.20631929 0.41607879 0.06178493
## [13] 0.02935462 0.07407742 0.14880980 0.10281559 0.09212092 0.63833793 0.02678287 0.34341852

We have now generated 10000 numbers from the exponential distribution with parameter λ = 5.

After generating a sample, we should check the histogram, density, and summary statistics (mean, variance) of the sample!

data_x <- as.data.frame(x) # ggplot needs a data frame
library(ggplot2)
h <- ggplot(data_x, aes(x = x)) +
  geom_histogram(color = "violet", fill = "pink", bins = 30) +
  labs(title = "Histogram of Exponential distribution")

b <- ggplot(data_x, aes(x = x)) +
  geom_density(alpha = 0.4, color = "blue", fill = "blue") +
  labs(title = "Density of Exponential distribution")

library(ggpubr)
ggarrange(h, b)

Now, calculate the sample mean to compare with the expectation of exponential distribution.

cat("Mean of sample X is=", mean(x), "\n")## Mean of sample X is= 0.1996129cat("Variance of sample X is=", var(x), "\n")## Variance of sample X is= 0.0411155

The expectation of the exponential distribution is 1/λ = 1/5 = 0.2.

The variance of the exponential distribution is 1/λ² = 1/25 = 0.04.
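As an optional extra check (my own addition), we can compare the inverse-transform sample against R's built-in generator rexp() with the same rate; the quantiles should agree closely:

set.seed(361)
x2 <- rexp(10000, rate = 5)                          # built-in generator
qqplot(x, x2, main = "Inverse transform vs rexp(5)") # compare quantiles
abline(0, 1, col = "red")                            # points should hug this line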

Example 2) Use the inverse transformation method to simulate a random sample from the Pareto(9, 2) distribution.

The pdf of the Pareto(a, b) distribution is

f(x) = a bᵃ / x^(a+1), x ≥ b

so the cdf is F(x) = 1 − (b/x)ᵃ, x ≥ b. Setting F(x) = u and solving for x gives the inverse x = b(1 − u)^(−1/a).

# Set up the parameters
n <- 10000
a <- 9
b <- 2

# Generate the random variables
u <- runif(n)
x <- b * (1 - u)^(-1/a)
head(x, 20)
## [1] 2.825940 2.537858 2.070218 2.090651 2.665472 2.219546 2.057625 2.118054 2.101973 2.096153 2.087118 2.379728 2.587322 2.029828
## [15] 2.096480 2.977251 2.256091 2.392884 2.270208 2.237627

We now have a random sample of size 10000 from the Pareto distribution with parameters a = 9 and b = 2.

hist(x, prob = TRUE, main = "Pareto(9,2)")
y <- seq(1, 40, 0.01)
lines(y, a*b^a/y^(a+1), col = "red")

summary(x)
##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  2.000   2.066   2.156   2.253   2.337   5.398
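We can also compare against theory: for Pareto(a, b) with a > 1, the mean is ab/(a − 1) = 9·2/8 = 2.25, which agrees well with the sample mean 2.253 above.

a * b / (a - 1) # theoretical mean -> 2.25, close to mean(x) = 2.253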

Discrete Case

The inverse transform method can also be applied to discrete distributions. If X is a discrete random variable with probability mass function given by

P(X = xᵢ) = pᵢ, x₀ < x₁ < …, ∑ pᵢ = 1

the inverse transformation is X = xᵢ if F(xᵢ₋₁) < U ≤ F(xᵢ).

The following algorithm can be used;

  • Define the pmf pᵢ on the support points xᵢ, i = 0, 1, …, k.
  • Generate a uniform random number U.
  • If U ≤ p₀, deliver X = x₀;
  • else if U ≤ p₀ + p₁, deliver X = x₁;
  • else if U ≤ p₀ + p₁ + p₂, deliver X = x₂; and so on.

Example 3) Let X be a discrete random variable with probability mass function given by P(X = 0) = 0.3, P(X = 1) = 0.2, P(X = 2) = 0.5. Generate 1000 variates of X by using the discrete inverse transform method.

n <- 1000 # sample size

x <- numeric(n) # create an empty vector to store the generated variables.

prob <- c(0.3, 0.2, 0.5) # write probability masses.

## Generate n rv’s from that distribution.
set.seed(361) # for reproducibility
for (i in 1:n){
u <- runif(1)
if(u <= prob[1]){
x[i] <- 0}
else if(u <= sum(prob[1:2])){
x[i] <- 1}
else{
x[i] <- 2}
}

# In order to confirm, we look at the relative frequency of each x.
# For this, find the proportion of each number.

x0 <- length(which(x==0))/n
x1 <- length(which(x==1))/n
x2 <- length(which(x==2))/n
est.prob <- c(x0,x1,x2)
est.prob
## [1] 0.318 0.197 0.485
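For larger samples, the loop can be replaced by a vectorized version. Here is a sketch (my own addition) using cumsum() and findInterval(), which should give essentially the same relative frequencies:

set.seed(361)
u <- runif(n)                              # all the uniforms at once
support <- c(0, 1, 2)
cdf <- cumsum(prob)                        # 0.3, 0.5, 1.0
x_vec <- support[findInterval(u, cdf) + 1] # number of cdf values <= u picks the class
table(x_vec) / n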

Q4) Consider a coin-tossing example with a biased coin where P(H) = 0.6. You toss that coin 4 times, and the random variable Y is the number of heads that occur.

A) Find the pmf of the distribution of Y, and from that distribution generate a random sample of size 1000 by using the inverse transformation method.

The distribution of this random variable is simply Binomial(n = 4, p = 0.6), with pmf

P(Y = y) = C(4, y) (0.6)ʸ (0.4)⁴⁻ʸ, y = 0, 1, 2, 3, 4

which gives the probabilities 0.0256, 0.1536, 0.3456, 0.3456, 0.1296 for y = 0, …, 4.

n <- 1000 # sample size
y <- numeric(n) # create an empty vector to store the generated variables.
prob <- c(0.0256, 0.1536, 0.3456, 0.3456, 0.1296) # write probability masses.

i <- 0 # for iteration
set.seed(361)
while( i <= n ){
u <- runif(1)
y[i] <- ifelse(u <= prob[1],0,
ifelse(u <= sum(prob[1:2]),1,
ifelse(u <= sum(prob[1:3]),2,
ifelse(u <= sum(prob[1:4]),3,4))))
i <- i + 1
}
#Estimated Probabilities.
table(y)/n
## y
## 0 1 2 3 4
## 0.033 0.154 0.351 0.335 0.127

B) Compute the mean and variance of the generated sample and compare the results with theoretical expectation and variance.

# domain of y
d <- seq(0, 4)
theoretical.mean <- sum(prob * d)
est.mean <- mean(y)
theoretical.var <- sum(prob * d^2) - (sum(prob * d))^2
est.var <- var(y)
vec <- round(c(theoretical.mean, est.mean, theoretical.var, est.var), 4)
out <- matrix(vec, nrow = 2)
rownames(out) <- c("Theoretical", "Estimated")
colnames(out) <- c("Mean", "Variance")
out
##              Mean Variance
## Theoretical 2.400   0.9600
## Estimated   2.369   0.9938

Advantages and Disadvantages

Disadvantages:

  1. Requires a closed-form expression for the cdf F(x) and its inverse.
  2. Speed: often slow, because of the number of comparisons required (especially in the discrete case).

Advantages:

The inverse transform method preserves monotonicity and correlation, which helps in

  1. Variance reduction methods.
  2. Generating truncated distributions (see the sketch below).
  3. Order statistics.
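For example, here is a minimal sketch of advantage 2 (an illustration of my own): to sample from Exp(rate = 5) truncated to x > 0.5, draw the uniform from (F(0.5), 1) instead of (0, 1) and invert as usual.

set.seed(361)
lambda <- 5
c0 <- 0.5                                                 # truncation point
u <- runif(10000, min = pexp(c0, rate = lambda), max = 1) # restricted uniforms
x_trunc <- qexp(u, rate = lambda)                         # inverse cdf of the exponential
min(x_trunc)                                              # every value exceeds 0.5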


İrem Tanrıverdi

Research and Teaching Assistant. MSc in Statistics. Interested in programming, machine learning and artificial intelligence.