TRANSFORMATION

SUMS AND MIXTURES

İrem Tanrıverdi
Jun 27, 2022

Sums and Mixtures

Sums and mixtures of random variables are special types of transformations.

Convolution: the distribution of a sum of independent random variables is called a convolution.

$S = X_1 + X_2 + \dots + X_n = \sum_{i=1}^{n} X_i$

For example, with independent random variables and N(μ, σ²) written with the variance as its second parameter;

  1. X ~ N(1, 5)
  2. Y ~ N(2, 3)
  3. Z = X + Y ~ N(3, 8)

or;

  1. v ~𝛘²₂
  2. u ~𝛘²₆
  3. z = v + u ~𝛘²₈

or;

  1. w ~ Gamma(2, 4)
  2. z ~ Gamma(7, 4)
  3. s = w + z ~ Gamma(9, 4)

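These are examples of convolution: in each case the sum of independent variables from one family stays in that family (for the Gamma case, the two variables must share the same rate parameter).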
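A quick way to check these three claims is to simulate each sum with R's built-in generators and compare the sample mean and variance with those of the claimed distribution. This is a minimal sketch, assuming that N(μ, σ²) is written with the variance as its second parameter and that Gamma(shape, rate) uses the rate, as in Example 1 below; the seed and sample size are arbitrary choices.

```r
# Simulation check of the three convolution examples (sketch).
# Assumptions: N(mean, variance) and Gamma(shape, rate) parameterizations,
# seed 361 and n = 10^5 draws chosen for illustration.
set.seed(361)
n <- 10^5

# Normal: X ~ N(1, 5) and Y ~ N(2, 3); rnorm() expects a standard deviation
z_norm <- rnorm(n, mean = 1, sd = sqrt(5)) + rnorm(n, mean = 2, sd = sqrt(3))

# Chi-square: v ~ chi^2(2) and u ~ chi^2(6)
z_chisq <- rchisq(n, df = 2) + rchisq(n, df = 6)

# Gamma with a common rate: w ~ Gamma(2, 4) and z ~ Gamma(7, 4)
s_gamma <- rgamma(n, shape = 2, rate = 4) + rgamma(n, shape = 7, rate = 4)

# sample moments next to the moments of N(3, 8), chi^2(8) and Gamma(9, 4)
check <- rbind(
  normal = c(mean(z_norm), var(z_norm), 3, 8),
  chisq  = c(mean(z_chisq), var(z_chisq), 8, 16),
  gamma  = c(mean(s_gamma), var(s_gamma), 9 / 4, 9 / 16)
)
colnames(check) <- c("est_mean", "est_var", "mean", "var")
round(check, 4)
```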

Mixtures: a mixture is a random variable whose distribution is a weighted combination of other distributions. Mixtures can be discrete or continuous.

  • If the random variable X has a distribution function that is a weighted sum $F_X(x) = \sum_i \theta_i F_{X_i}(x)$ for some sequence of random variables $X_i$ and weights $\theta_i > 0$ with $\sum_i \theta_i = 1$, then X is a discrete mixture. The constants $\theta_i$ are called mixing weights.
  • If the distribution of X is $F_X(x) = \int F_{X|Y=y}(x)\, f_Y(y)\, dy$ for a family X|Y=y indexed by real numbers y and a weighting function $f_Y$ with $\int f_Y(y)\, dy = 1$, then X is a continuous mixture.
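As an illustration of the continuous case (an illustrative choice, not one of the examples in this post), a Student t variable can be written as a normal whose precision is drawn from a Gamma distribution: if Y ~ Gamma(shape = v/2, rate = v/2) and X | Y = y ~ N(0, 1/y), then X ~ t(v). The sketch below draws y first and then x given y; v = 10, the seed and the sample size are assumptions.

```r
# Continuous mixture sketch (illustrative choice):
# Y ~ Gamma(shape = v/2, rate = v/2) and X | Y = y ~ N(0, 1/y) give X ~ t(v).
set.seed(361)
n <- 10^5
v <- 10

y <- rgamma(n, shape = v / 2, rate = v / 2)  # one draw of the mixing variable per observation
x <- rnorm(n, mean = 0, sd = 1 / sqrt(y))    # conditional normal draw given each y

# t(v) has mean 0 and variance v / (v - 2), i.e. 1.25 for v = 10
c(sample_mean = mean(x), sample_var = var(x), t_var = v / (v - 2))

# sample quantiles next to the t(v) quantiles from qt()
probs <- c(0.05, 0.5, 0.95)
round(rbind(sample = quantile(x, probs), t_quantile = qt(probs, df = v)), 4)
```

Averaging the conditional normal CDFs over the Gamma weights is exactly the integral in the second bullet, with $f_Y$ the Gamma density.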

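For example;

  1. X ~ N(1, 5)
  2. Y ~ N(2, 3)
  3. Z is the 50/50 mixture of X and Y, with $F_Z(x) = 0.5 F_X(x) + 0.5 F_Y(x)$ (this is not the weighted sum 0.5X + 0.5Y, which for independent X and Y is N(1.5, 2))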

or;

  1. w ~ Gamma(2, 4)
  2. z ~ Gamma(7, 4)
  3. s is the mixture with $F_s(x) = 0.7 F_w(x) + 0.3 F_z(x)$
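Drawing from a discrete mixture means first picking a component with probabilities θᵢ and then sampling from that component, which is different from forming a weighted sum of the variables. A minimal sketch for the 50/50 normal mixture above, assuming the second normal parameter is the variance; the seed and n are arbitrary.

```r
# Discrete mixture sketch: choose a component label, then draw from that component.
# Assumption: N(mean, variance) parameterization, as in the examples above.
set.seed(361)
n <- 10^5

# 50/50 mixture of N(1, 5) and N(2, 3)
k <- sample(1:2, size = n, replace = TRUE, prob = c(0.5, 0.5))  # component labels
mu <- c(1, 2)
sigma <- sqrt(c(5, 3))                                          # standard deviations
mix <- rnorm(n, mean = mu[k], sd = sigma[k])

# weighted sum 0.5 X + 0.5 Y of independent X and Y, for contrast
lin <- 0.5 * rnorm(n, 1, sqrt(5)) + 0.5 * rnorm(n, 2, sqrt(3))

c(mix_mean = mean(mix), mix_var = var(mix), lin_mean = mean(lin), lin_var = var(lin))
```

Both samples should have mean close to 1.5, but the mixture's variance, 0.5·5 + 0.5·3 + 0.5·(1 - 1.5)² + 0.5·(2 - 1.5)² = 4.25, is about twice that of the weighted sum, 0.25·5 + 0.25·3 = 2.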

Transformation Methods

Relationships between some distributions are given below;

Note that the difference between the Erlang and Gamma distributions is that the Gamma shape parameter t can be any positive real number, while the Erlang shape parameter t must be a positive integer.

Example 1)

Generate a random sample of size n = 10⁵ from Gamma(4, 9) by using the relation between the Exponential and Gamma distributions. Use only the runif() function to generate the random variables.

(Hint: the sum of t independent Exponential(𝝀) random variables is distributed as Gamma(t, 𝝀).)

Also, show that your sample has the properties of Gamma(4,9).

E(X)= t/𝝀

Var(X)=t/𝝀²
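Both moments follow from the sum-of-exponentials representation: expectations of a sum add, and because the summands are independent their variances add too. A short worked derivation, writing Y_j for the exponential summands:

```latex
X = \sum_{j=1}^{t} Y_j, \qquad Y_j \overset{iid}{\sim} \text{Exponential}(\lambda), \qquad E(Y_j) = \frac{1}{\lambda}, \quad \text{Var}(Y_j) = \frac{1}{\lambda^2}

E(X) = \sum_{j=1}^{t} E(Y_j) = \frac{t}{\lambda} = \frac{4}{9} \approx 0.44444

\text{Var}(X) = \sum_{j=1}^{t} \text{Var}(Y_j) = \frac{t}{\lambda^2} = \frac{4}{81} \approx 0.04938
```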

Generating Random Variables from Gamma Distribution.

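Let X be a random variable from the Gamma(t, 𝜆) distribution, with PDF

$f(x) = \frac{\lambda^t}{\Gamma(t)} x^{t-1} e^{-\lambda x}, \quad x > 0$

The inverse transform method is not practical here, since the Gamma CDF has no closed-form inverse.

We know that the sum of t independent Exponential(𝜆) random variables is distributed as Gamma(t, 𝜆). This leads to the following transformation based on t uniform random numbers:

$X = -\frac{1}{\lambda}\sum_{j=1}^{t} \log U_j \sim \text{Gamma}(t, \lambda), \quad U_1, \dots, U_t \overset{iid}{\sim} U(0, 1)$

The inverse transform of the Exponential(𝜆) distribution is;

$X = -\frac{1}{\lambda}\log U, \quad U \sim U(0, 1)$

(U is used in place of 1 - U, since both are U(0, 1).)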

set.seed(361)

#parameters
n <- 10^4 # the exercise statement asks for 10^5; increase n to match it
t <- 4
lambda <- 9

U <- matrix(runif(n*t), nrow = t, ncol = n) # t x n matrix: each column holds the t uniforms for one Gamma draw
logU <- -log(U) / lambda # inverse transform method: Exponential(lambda) variables

gamma_y <- colSums(logU) # column sums: n draws from Gamma(t, lambda)

hist(gamma_y, prob = TRUE, main = "Gamma Distribution (4,9)")
y <- seq(0, 2.5, 0.01)
lines(y, dgamma(y, shape = t, rate = lambda), col = "red", lwd = 2.5)

#theoretical parameters
theoretical_mean <- t/lambda
theoretical_var <- t/lambda^2

#sample statistics
est_mean <- mean(gamma_y)
est_var <- var(gamma_y)

control <- matrix(round(c(theoretical_mean, theoretical_var, est_mean, est_var), 5), nrow = 2, byrow = TRUE)

colnames(control) <- c("Mean","Variance")
rownames(control) <- c("Theoretical", "Estimated")
control

Theoretical mean = 0.44444

Theoretical variance = 0.04938

Estimated mean = 0.44201

Estimated variance = 0.04864
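Matching the mean and variance does not by itself pin down the distribution, so a further check is to compare sample quantiles with qgamma(). This sketch regenerates the sample with n = 10⁵ (the size asked for in the exercise) so that it runs on its own; the probability levels are arbitrary choices.

```r
# Quantile check for the sum-of-exponentials Gamma(4, 9) sampler (sketch).
set.seed(361)
n <- 10^5
t <- 4
lambda <- 9

U <- matrix(runif(n * t), nrow = t, ncol = n)  # t uniforms per Gamma draw
gamma_y <- colSums(-log(U) / lambda)           # sum of t Exponential(lambda) variables

probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(sample = quantile(gamma_y, probs),
            theoretical = qgamma(probs, shape = t, rate = lambda)), 4)
```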

Example 2)

Generate a sample of size n = 10⁵ from Pareto(1, 18) by using the relation between the Exponential(8), Erlang and Pareto distributions. Use only the runif() function to generate the random variables.

Also, show that your sample has the properties of Pareto(1,18).

Let X be a random variable from the Pareto(1, 18) distribution (scale 1, shape t = 18), with PDF

$f(x) = \frac{t}{x^{t+1}} = \frac{18}{x^{19}}, \quad x \ge 1$

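We know that the sum of t independent Exponential(8) random variables is distributed as Erlang(t, 8). Moreover, if X ~ Exponential(𝜆) and Y ~ Erlang(t, 𝜆) are independent, then 1 + X/Y ~ Pareto(1, t). With t = 18 and 𝜆 = 8, this leads to a transformation based on t + 1 uniform random numbers: t for the Erlang variable and one for the independent exponential numerator.

The inverse transform of the Exponential(𝜆) distribution is

$X = -\frac{1}{\lambda}\log U, \quad U \sim U(0, 1)$

Then, to generate a random sample from the Erlang distribution, we sum t exponential variables generated by the inverse transform method;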
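Why 1 + X/Y is Pareto: condition on Y and use the Gamma moment generating function $E(e^{sY}) = (\lambda/(\lambda - s))^t$ for $s < \lambda$. A short derivation for independent X ~ Exponential(𝜆) and Y ~ Erlang(t, 𝜆):

```latex
P\left(\frac{X}{Y} > z\right) = E\left[P\left(X > zY \mid Y\right)\right] = E\left[e^{-\lambda z Y}\right] = \left(\frac{\lambda}{\lambda + \lambda z}\right)^{t} = (1 + z)^{-t}, \qquad z \ge 0

P\left(1 + \frac{X}{Y} > x\right) = x^{-t}, \qquad x \ge 1
```

The second line is the survival function of the Pareto distribution with scale 1 and shape t. The rate 𝜆 cancels, so any common rate (here 8) gives the same result.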

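library(sads) # provides dpareto()
set.seed(361)

#parameters
n <- 10^5
t <- 18
lambda <- 8

U <- matrix(runif(n*t), nrow = t, ncol = n) # t x n matrix of uniforms

exponential <- -log(U) / lambda # inverse transform method
erlang <- colSums(exponential) # column sums: n draws from Erlang(t, lambda)

# numerator: Exponential(lambda) draws generated separately, so they are independent of erlang
x <- -log(runif(n)) / lambda
pareto <- 1 + x / erlang # Pareto(scale = 1, shape = t)

# almost all of the Pareto(1, 18) mass lies below 1.5
hist(pareto, prob = TRUE, xlim = c(1, 1.5),
     breaks = 100,
     main = "Pareto(1, 18)")
y <- seq(1, 1.5, 0.001)
lines(y, dpareto(y, shape = t, scale = 1), col = "red", lwd = 1.5)

#theoretical vs. sample mean and variance of Pareto(scale = 1, shape = t)
c(theoretical_mean = t/(t - 1), est_mean = mean(pareto))
c(theoretical_var = t/((t - 1)^2 * (t - 2)), est_var = var(pareto))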
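The histogram can be complemented with the Kolmogorov-Smirnov distance to the Pareto(1, 18) CDF, F(x) = 1 - x^(-18) for x ≥ 1. A self-contained sketch; ppareto_1() is a hypothetical helper written here so that the check does not depend on the sads package, and the seed is an arbitrary choice.

```r
# KS distance between 1 + X/Y and the Pareto(scale = 1, shape) CDF (sketch).
# Hypothetical helper: Pareto(scale = 1, shape) CDF written out by hand.
ppareto_1 <- function(q, shape) ifelse(q < 1, 0, 1 - q^(-shape))

set.seed(361)
n <- 10^5
t <- 18
lambda <- 8

# Erlang(t, lambda) as column sums, plus an independent Exponential(lambda) numerator
erlang <- colSums(-log(matrix(runif(n * t), nrow = t, ncol = n)) / lambda)
x <- -log(runif(n)) / lambda
pareto <- 1 + x / erlang

# extra argument 'shape' is passed on to ppareto_1()
ks <- ks.test(pareto, ppareto_1, shape = t)
ks$statistic
```

A small statistic (well below 0.01 at this sample size) indicates that the empirical CDF tracks the Pareto CDF closely.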

Example 3)

Generate a random sample of size 10⁵ from the 𝝌² distribution with 10 degrees of freedom by using the relation between the chi-square and standard normal distributions. Check the histogram and compare the estimated mean and variance with the theoretical expectation and variance of the 𝝌²(v) distribution (v = 10):

E(X) = v and Var(X) = 2v

We can use the fact that the sum of v independent squared standard normal random variables has a chi-square distribution with v degrees of freedom;
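The theoretical moments follow from this representation, using $E(Z^4) = 3$ for a standard normal and the independence of the $Z_i$:

```latex
X = \sum_{i=1}^{v} Z_i^2, \qquad Z_i \overset{iid}{\sim} N(0, 1)

E(Z_i^2) = \text{Var}(Z_i) = 1, \qquad \text{Var}(Z_i^2) = E(Z_i^4) - \left(E(Z_i^2)\right)^2 = 3 - 1 = 2

E(X) = v = 10, \qquad \text{Var}(X) = 2v = 20
```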

set.seed(361)
n <- 10^5
v <- 10


Z <- matrix(rnorm(v*n,0,1),nrow = v, ncol = n)
SquaredZ <- Z^2

X <- colSums(SquaredZ)

hist(X, prob=TRUE, main ="Chi-Square Distribution")
y = seq(0,35,0.01)
lines(y,dchisq(y,v),col="red",lwd = 2.5)
#theoretical parameters
theoretical_mean <- v
theoretical_var <- 2*v

#sample statistics
est_mean <- mean(X)
est_var <- var(X)

control <- matrix(round(c(theoretical_mean, theoretical_var, est_mean, est_var), 5), nrow = 2, byrow = TRUE)

colnames(control) <- c("Mean","Variance")
rownames(control) <- c("Theoretical", "Estimated")
control

Theoretical mean = 10.00

Theoretical variance = 20.00

Estimated mean = 9.99888

Estimated variance = 19.90146

Example 4)

Generate a random sample of size n = 10⁵ from the Student t distribution with 40 degrees of freedom by using the following transformation method. Compare the histogram with the Student t density curve.

Hint: If 𝑍 ~ N(0, 1) and V ~ 𝝌²(v) are independent, then

$T = \frac{Z}{\sqrt{V/v}} \sim t_v$

has the Student t distribution with v degrees of freedom.

We generate the 𝝌²(v) variable as a sum of v squared standard normal variables, as in Example 3.

set.seed(361)
n <- 10^5
v <- 40


Z1 <- rnorm(n) #standard normal dist

Z2 <- matrix(rnorm(n*v),v,n)
Z2_sqr <- Z2^2
V <- colSums(Z2_sqr) #chi-square dist with 40 dof


t <- Z1 / (sqrt(V / v))

hist(t, prob = TRUE, main = "Student-T Distribution")
y <- seq(-4,4,0.01)
lines(y, dt(y,v),col = "Red", lwd = 2.5)
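Two simple numerical checks for the t(40) sample: the variance of t(v) is v/(v - 2) for v > 2, and about 5% of draws should fall outside ±qt(0.975, 40). A sketch that regenerates the sample so it runs on its own; t_sample is a renamed version of the t vector in the code above, chosen to avoid masking R's t() function.

```r
# Moment and tail check for Z / sqrt(V / v) with v = 40 (sketch).
set.seed(361)
n <- 10^5
v <- 40

Z1 <- rnorm(n)                                            # standard normal numerator
V <- colSums(matrix(rnorm(n * v), nrow = v, ncol = n)^2)  # chi-square with v dof
t_sample <- Z1 / sqrt(V / v)

# theoretical variance of t(v) is v / (v - 2) = 40 / 38
c(sample_var = var(t_sample), theoretical_var = v / (v - 2))

# fraction of draws beyond the two-sided 5% critical value; should be close to 0.05
mean(abs(t_sample) > qt(0.975, df = v))
```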

Example 5)

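Let X be a random variable from the Logistic(𝜇, s) distribution, with PDF;

$f(x) = \frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^2}, \quad -\infty < x < \infty$

Generate a random sample of size n = 10⁵ from Logistic(5, 7) by using the following transformation method. Compare the histogram with the Logistic density curve.

Remember that if X and Y are independent Exponential(1) random variables, then

$\mu - s\log\left(\frac{X}{Y}\right) \sim \text{Logistic}(\mu, s)$

The inverse transform of the Exponential(𝜆) distribution is

$X = -\frac{1}{\lambda}\log U, \quad U \sim U(0, 1)$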
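The relation between two exponentials and the logistic distribution can be derived by conditioning on Y, using $E(e^{-cY}) = 1/(1 + c)$ for Y ~ Exponential(1) and c ≥ 0:

```latex
P\left(\log\frac{X}{Y} \le w\right) = P\left(X \le e^{w} Y\right) = 1 - E\left[e^{-e^{w} Y}\right] = 1 - \frac{1}{1 + e^{w}} = \frac{1}{1 + e^{-w}}

\mu - s \log\frac{X}{Y} = \mu + s \log\frac{Y}{X} \sim \text{Logistic}(\mu, s)
```

So log(X/Y) is standard logistic; swapping the roles of X and Y shows that log(Y/X) is standard logistic too, and shifting by μ and scaling by s gives Logistic(μ, s).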

set.seed(361)

n <- 10^5
lambda <- 1
mu <- 5
s <- 7

u1 <- runif(n)
x <- -log(u1) / lambda # inverse transform method for the first exponential variable

u2 <- runif(n)
y <- -log(u2) / lambda # inverse transform method for the second exponential variable

logistic <- mu - (s*log(x/y)) # Logistic(mu, s)

hist(logistic, prob = TRUE, main = "Logistic(5, 7)")
a <- seq(-100, 100, 0.01)
lines(a, (exp(-(a-mu)/s) / (s*(1 + exp(-(a-mu)/s))^2)), col = "Red", lwd = 2.5)
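The red curve above is the Logistic(μ, s) PDF written out by hand; base R also provides it as dlogis(x, location, scale). A sketch that confirms the two agree and compares the sample moments with E(X) = μ and Var(X) = s²π²/3; it regenerates the sample so it runs on its own, and the grid spacing is an arbitrary choice.

```r
# Logistic(5, 7) check: hand-written density vs dlogis(), plus moments (sketch).
set.seed(361)
n <- 10^5
mu <- 5
s <- 7

x <- -log(runif(n))  # Exponential(1)
y <- -log(runif(n))  # Exponential(1), independent of x
logistic <- mu - s * log(x / y)

# the hand-written density agrees with R's dlogis()
a <- seq(-100, 100, 0.5)
manual <- exp(-(a - mu) / s) / (s * (1 + exp(-(a - mu) / s))^2)
all.equal(manual, dlogis(a, location = mu, scale = s))

# sample mean and variance vs. mu and s^2 * pi^2 / 3
c(sample_mean = mean(logistic), sample_var = var(logistic), theoretical_var = s^2 * pi^2 / 3)
```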

Example 6)

Let X be a random variable from the F(2, 8) distribution.

Generate a random sample of size n = 10⁵ from F(2, 8) by using the following transformation method. Compare the histogram with the F density curve.

Remember that if U ~ 𝝌²(m) and V ~ 𝝌²(n) are independent, then

$F = \frac{U/m}{V/n} \sim F(m, n)$

set.seed(361)
N <- 10^5 # sample size
m <- 2 # numerator degrees of freedom
n <- 8 # denominator degrees of freedom

z1 <- matrix(rnorm(m*N), m, N) #standard normal matrix, m x N
z2 <- matrix(rnorm(n*N), n, N) #standard normal matrix, n x N

squared_z1 <- z1^2 #chi-square 1 dof matrix
squared_z2 <- z2^2 #chi-square 1 dof matrix

U <- colSums(squared_z1) #chi-square 2 dof
V <- colSums(squared_z2) #chi-square 8 dof

Rv.F <- (U/m)/(V/n) # F(2,8)

hist(Rv.F, prob = TRUE, main = "F-distribution(2,8)")

a <- seq(0, 50, 0.01)
lines(a, df(a, m, n), col = "red", lwd = 2.5)
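Because the F(2, 8) histogram has a long right tail, numerical checks help: E(F) = n/(n - 2) for n > 2 (here 8/6), and about 5% of draws should exceed qf(0.95, 2, 8). A self-contained sketch using the same construction; the seed is an arbitrary choice.

```r
# Mean and upper-tail check for the F(2, 8) construction (sketch).
set.seed(361)
N <- 10^5
m <- 2
n <- 8

U <- colSums(matrix(rnorm(m * N), nrow = m)^2)  # chi-square with m dof
V <- colSums(matrix(rnorm(n * N), nrow = n)^2)  # chi-square with n dof
Rv.F <- (U / m) / (V / n)

# E(F) = n / (n - 2) for n > 2; here 8 / 6
c(sample_mean = mean(Rv.F), theoretical_mean = n / (n - 2))

# proportion above the 95th percentile of F(2, 8); should be close to 0.05
mean(Rv.F > qf(0.95, df1 = m, df2 = n))
```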


İrem Tanrıverdi

Research and Teaching Assistant. MSc in Statistics. Interested in programming, machine learning and artificial intelligence.