# TRANSFORMATION

## SUMS AND MIXTURES

Sums and mixtures of random variables are special types of transformation.

**Convolution:** The distribution of a sum of independent random variables is the convolution of their distributions:

$S = X_1 + X_2 + \dots + X_n = \sum_{i=1}^{n} X_i$

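For example, for independent normal variables written as N(mean, variance):

- X ~ N(1, 5)
- Y ~ N(2, 3)
- Z = X + Y ~ N(3, 8)

or, for independent chi-square variables:

- v ~ χ²(2)
- u ~ χ²(6)
- z = v + u ~ χ²(8)

or, for independent Gamma variables with a common rate:

- w ~ Gamma(2, 4)
- z ~ Gamma(7, 4)
- s = w + z ~ Gamma(9, 4)

These are examples of convolution.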
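The Gamma example can be checked by simulation. The following is a minimal sketch, assuming Gamma(shape, rate) notation and using R's built-in `rgamma()`; the seed, the sample size and the name `conv_check` are arbitrary choices made for illustration.

```r
set.seed(361)                      # arbitrary seed, for reproducibility only
n <- 10^5

# Two independent Gamma samples with a common rate of 4
w <- rgamma(n, shape = 2, rate = 4)
z <- rgamma(n, shape = 7, rate = 4)
s <- w + z                         # convolution: should behave like Gamma(9, 4)

# Compare sample moments with the Gamma(9, 4) values 9/4 and 9/16
conv_check <- rbind(Theoretical = c(Mean = 9 / 4, Variance = 9 / 16),
                    Estimated   = c(Mean = mean(s), Variance = var(s)))
round(conv_check, 4)
```

Matching moments do not prove the distributional claim, but a large mismatch would reveal an error in the parameters.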

**Mixtures:** A mixture distribution is a weighted combination of other distributions. The mixing can be discrete or continuous.

- If the random variable X has a distribution that is a weighted sum $F_X(x) = \sum_i \theta_i F_{X_i}(x)$ for some sequence of random variables $X_i$ and constants $\theta_i > 0$ such that $\sum_i \theta_i = 1$, then X is a discrete mixture. The constants $\theta_i$ are called mixing weights.
- If the distribution of X is $F_X(x) = \int_{-\infty}^{\infty} F_{X|Y=y}(x)\, f_Y(y)\, dy$ for a family X|Y=y indexed by real numbers y and a weighting function $f_Y$ such that $\int_{-\infty}^{\infty} f_Y(y)\, dy = 1$, then X is a continuous mixture.

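For example, a mixture of two normal distributions with equal weights:

- X ~ N(1, 5)
- Y ~ N(2, 3)
- $F_Z(z) = 0.5\,F_X(z) + 0.5\,F_Y(z)$

or, a mixture of two Gamma distributions:

- w ~ Gamma(2, 4)
- z ~ Gamma(7, 4)
- $F_S(s) = 0.7\,F_w(s) + 0.3\,F_z(s)$

Note that the weights apply to the distribution functions, not to the variables: the random variable 0.5X + 0.5Y is a scaled sum, not a mixture.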
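Sampling from a discrete mixture means choosing a component at random for each draw. The sketch below illustrates this for the two normal components with weights 0.5 each, and contrasts the result with the weighted sum of the variables; N(mean, variance) notation is assumed, and the names `pick_x`, `mixture` and `weighted_sum` are hypothetical.

```r
set.seed(361)                            # arbitrary seed
n <- 10^5

x <- rnorm(n, mean = 1, sd = sqrt(5))    # N(1, 5), second parameter read as variance
y <- rnorm(n, mean = 2, sd = sqrt(3))    # N(2, 3)

# Mixture: each draw comes from X or from Y, with probability 0.5 each
pick_x  <- runif(n) < 0.5
mixture <- ifelse(pick_x, x, y)

# For contrast: the weighted sum of the variables themselves
weighted_sum <- 0.5 * x + 0.5 * y

# Both have mean 1.5, but the variances differ:
# mixture:      0.5 * (5 + 1^2) + 0.5 * (3 + 2^2) - 1.5^2 = 4.25
# weighted sum: 0.25 * 5 + 0.25 * 3 = 2
c(var_mixture = var(mixture), var_weighted_sum = var(weighted_sum))
```

The two samples share a mean but not a variance, which is a quick way to see that mixing distributions and averaging variables are different operations.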

# Transformation Methods

Relationships between some distributions are given below.

Note that the difference between the Erlang and Gamma distributions is that **in a Gamma distribution, the shape parameter t can be any positive real number, but in an Erlang distribution, t must be a positive integer**.

## Example 1)

Generate a random sample of size n = 10⁴ from Gamma(4, 9) by using the relation between the Exponential and Gamma distributions. Try to use only the runif() function to generate random variables.

(Hint: The sum of t independent Exponential(λ) random variables is distributed as Gamma(t, λ).)

Also, show that your sample has the properties of Gamma(4, 9):

E(X) = t/λ

Var(X) = t/λ²

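## Generating Random Variables from the Gamma Distribution

Let X be a random variable from the Gamma(t, λ) distribution with shape t and rate λ. Its PDF is

$f(x) = \frac{\lambda^t}{\Gamma(t)} x^{t-1} e^{-\lambda x}, \quad x > 0$

The inverse transform method cannot be applied directly in this case, since the Gamma CDF has no closed-form inverse.

We know that the sum of t independent Exponential(λ) random variables is distributed as Gamma(t, λ). This leads to the following transformation based on t uniform random numbers.

The inverse transform of the Exponential(λ) distribution, for U ~ Uniform(0, 1), is

$X = -\frac{1}{\lambda} \ln(U)$

Summing t independent terms of this form gives $Y = -\frac{1}{\lambda} \sum_{j=1}^{t} \ln(U_j) \sim \text{Gamma}(t, \lambda)$.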
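The inverse transform for the Exponential(λ) distribution follows from its CDF. A short derivation sketch, where F denotes the Exponential CDF and u a value in (0, 1) (both symbols are introduced here for illustration):

$$
\begin{aligned}
&F(x) = 1 - e^{-\lambda x}, \quad x \ge 0 \\
&u = F(x) \;\Longrightarrow\; x = -\frac{1}{\lambda}\ln(1 - u) \\
&U \sim \text{Uniform}(0, 1) \;\Longrightarrow\; 1 - U \sim \text{Uniform}(0, 1) \\
&\text{so } -\frac{1}{\lambda}\ln(U) \sim \text{Exponential}(\lambda)
\end{aligned}
$$

Replacing 1 − U by U saves one subtraction per draw and does not change the distribution.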

```r
set.seed(361)

# parameters
n <- 10^4      # sample size
t <- 4         # shape: number of exponentials summed per draw
lambda <- 9    # rate

# t x n matrix of uniforms: each column holds the t draws behind one Gamma value
U <- matrix(runif(n * t), nrow = t, ncol = n)
exp_draws <- -log(U) / lambda    # inverse transform method: Exponential(lambda) draws
gamma_y <- colSums(exp_draws)    # column sums give Gamma(t, lambda) draws

hist(gamma_y, prob = TRUE, main = "Gamma Distribution (4,9)")
y <- seq(0, 2.5, 0.01)
lines(y, dgamma(y, t, lambda), col = "red", lwd = 2.5)

# theoretical parameters
theoretical_mean <- t / lambda
theoretical_var <- t / lambda^2

# sample statistics
est_mean <- mean(gamma_y)
est_var <- var(gamma_y)

control <- matrix(round(c(theoretical_mean, theoretical_var, est_mean, est_var), 5),
                  nrow = 2, byrow = TRUE)
colnames(control) <- c("Mean", "Variance")
rownames(control) <- c("Theoretical", "Estimated")
control
```

Theoretical mean = 0.44444

Theoretical variance = 0.04938

Estimated mean = 0.44201

Estimated variance = 0.04864

## Example 2)

Generate a sample of size n = 10⁵ from Pareto(1, 18) by using the relation between the Exponential(8), Erlang and Pareto distributions. Try to use only the runif() function to generate random variables.

Also, show that your sample has the properties of Pareto(1, 18).

Let X be a random variable from the Pareto(1, 18) distribution, with scale 1 and shape 18. Its PDF is

$f(x) = \frac{18}{x^{19}}, \quad x \ge 1$

We know that the sum of t independent Exponential(8) random variables is distributed as Erlang(t, 8). This leads to the following transformation based on t uniform random numbers.

The inverse transform of the Exponential(λ) distribution, for U ~ Uniform(0, 1), is

$X = -\frac{1}{\lambda} \ln(U)$

Then, to generate a random sample from the Erlang distribution, we can sum t exponential draws obtained with the inverse transform method. Finally, if X ~ Exponential(λ) and Y ~ Erlang(t, λ) are independent, then X/Y + 1 ~ Pareto(1, t).
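The step from Exponential and Erlang draws to a Pareto draw can be checked with a short calculation. This is a sketch under the assumption that X ~ Exponential(λ) and Y ~ Erlang(t, λ) are independent; the symbol w is introduced here for illustration.

$$
\begin{aligned}
P\left(\frac{X}{Y} + 1 > w\right) &= P\big(X > (w - 1)\,Y\big) = E\left[e^{-\lambda (w - 1) Y}\right] \\
&= \left(\frac{\lambda}{\lambda + \lambda (w - 1)}\right)^{t} = w^{-t}, \quad w \ge 1
\end{aligned}
$$

This is the survival function of Pareto(1, t), so t = 18 gives Pareto(1, 18) whatever the common rate λ is. The argument relies on X being independent of Y.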

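```r
library(sads)   # provides dpareto()
set.seed(361)

# parameters
n <- 10^5
t <- 18         # Erlang shape, which becomes the Pareto shape
lambda <- 8

U <- matrix(runif(n * t), nrow = t, ncol = n)
exponential <- -log(U) / lambda      # inverse transform method
erlang <- colSums(exponential)       # column sums give Erlang(t, lambda) draws

# The numerator must be independent of erlang, so draw a fresh exponential sample
x <- -log(runif(n)) / lambda
pareto <- x / erlang + 1

# Pareto(1, 18) is concentrated just above 1, so the plot is restricted to [1, 1.5]
hist(pareto, prob = TRUE, xlim = c(1, 1.5),
     breaks = 100,
     main = "Pareto(1, 18)")
y <- seq(1, 1.5, 0.001)
lines(y, dpareto(y, scale = 1, shape = t, log = FALSE), col = "red", lwd = 1.5)
```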
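The exercise also asks for a check of the Pareto(1, 18) properties. A minimal self-contained sketch follows, assuming the standard Pareto moment formulas for scale 1 and shape a, namely mean a/(a − 1) and variance a/((a − 1)²(a − 2)); the names `shape`, `expo`, `theo_mean` and `theo_var` are hypothetical.

```r
set.seed(361)                        # arbitrary seed
n <- 10^5
shape <- 18                          # Pareto shape = number of exponentials summed
lambda <- 8

# Erlang(shape, lambda): column sums of inverse-transformed uniforms
erlang <- colSums(-log(matrix(runif(n * shape), nrow = shape)) / lambda)
# Independent Exponential(lambda) draws for the numerator
expo <- -log(runif(n)) / lambda
pareto <- expo / erlang + 1

# Theoretical moments of Pareto(scale = 1, shape)
theo_mean <- shape / (shape - 1)
theo_var  <- shape / ((shape - 1)^2 * (shape - 2))

rbind(Theoretical = c(Mean = theo_mean, Variance = theo_var),
      Estimated   = c(Mean = mean(pareto), Variance = var(pareto)))
```

Every simulated value should also be at least 1, the lower bound of the Pareto(1, 18) support, which `min(pareto)` confirms.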

## Example 3)

Generate a random sample of size 10⁵ from the χ² distribution with 10 degrees of freedom by using the relation between the chi-square and standard normal distributions. Check the histogram and compare the estimated mean and variance with the theoretical mean and variance of the χ²(v) distribution (v = 10).

E(X) = v and Var(X) = 2v

We can use the fact that a chi-square random variable with v degrees of freedom is the sum of v squared independent standard normal random variables:

```r
set.seed(361)

n <- 10^5
v <- 10

Z <- matrix(rnorm(v * n, 0, 1), nrow = v, ncol = n)  # v x n standard normal draws
SquaredZ <- Z^2                                      # each entry is chi-square with 1 dof
X <- colSums(SquaredZ)                               # column sums: chi-square with v dof

hist(X, prob = TRUE, main = "Chi-Square Distribution")
y <- seq(0, 35, 0.01)
lines(y, dchisq(y, v), col = "red", lwd = 2.5)

# theoretical parameters
theoretical_mean <- v
theoretical_var <- 2 * v

# sample statistics
est_mean <- mean(X)
est_var <- var(X)

control <- matrix(round(c(theoretical_mean, theoretical_var, est_mean, est_var), 5),
                  nrow = 2, byrow = TRUE)
colnames(control) <- c("Mean", "Variance")
rownames(control) <- c("Theoretical", "Estimated")
control
```

Theoretical mean = 10.00

Theoretical variance = 20.00

Estimated mean = 9.99888

Estimated variance = 19.90146

## Example 4)

Generate a random sample of size n = 10⁵ from the Student's t distribution with 40 degrees of freedom by using the following transformation method. Compare the histogram with the Student's t density curve.

Hint: If Z ~ N(0, 1) and V ~ χ²(v) are independent, then

$T = \frac{Z}{\sqrt{V/v}}$

has the Student's t distribution with v degrees of freedom.

Generate the χ² random variables from the standard normal distribution.

```r
set.seed(361)

n <- 10^5
v <- 40

Z1 <- rnorm(n)                     # standard normal draws (numerator)
Z2 <- matrix(rnorm(n * v), v, n)   # v x n standard normal draws
Z2_sqr <- Z2^2
V <- colSums(Z2_sqr)               # chi-square draws with 40 dof

t_sample <- Z1 / sqrt(V / v)       # named t_sample to avoid masking base::t()

hist(t_sample, prob = TRUE, main = "Student-T Distribution")
y <- seq(-4, 4, 0.01)
lines(y, dt(y, v), col = "red", lwd = 2.5)
```
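Beyond the histogram, the sample moments offer a numerical check. The sketch below is self-contained and assumes the standard result that a Student's t variable with v > 2 degrees of freedom has mean 0 and variance v/(v − 2); the names `chi_sq` and `t_sample` are hypothetical.

```r
set.seed(361)                      # arbitrary seed
n <- 10^5
v <- 40

z <- rnorm(n)                                         # numerator: N(0, 1)
chi_sq <- colSums(matrix(rnorm(n * v), nrow = v)^2)   # denominator: chi-square, v dof
t_sample <- z / sqrt(chi_sq / v)

rbind(Theoretical = c(Mean = 0, Variance = v / (v - 2)),
      Estimated   = c(Mean = mean(t_sample), Variance = var(t_sample)))
```

With v = 40 the theoretical variance is 40/38, only slightly above the standard normal variance of 1, so the histogram alone cannot easily distinguish the two.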

## Example 5)

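Let X be a random variable from the Logistic(μ, s) distribution. Its PDF is

$f(x) = \frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^2}$

Generate a random sample of size n = 10⁵ from Logistic(5, 7) by using the following transformation method. Compare the histogram with the Logistic density curve.

Remember that if $X_1$ and $X_2$ are independent Exponential(λ) random variables, then

$\mu - s \ln\left(\frac{X_1}{X_2}\right) \sim \text{Logistic}(\mu, s)$

The inverse transform of the Exponential(λ) distribution, for $U_i$ ~ Uniform(0, 1), is

$X_i = -\frac{1}{\lambda} \ln(U_i), \quad i = 1, 2$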
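Why the log-ratio of two exponentials yields a logistic variable can be seen from the CDF. A derivation sketch, assuming $X_1$ and $X_2$ are independent Exponential(λ) variables and s > 0 (the symbols $X_1$, $X_2$, r and a are introduced here for illustration):

$$
\begin{aligned}
P\left(\frac{X_1}{X_2} \le r\right) &= E\left[1 - e^{-\lambda r X_2}\right] = 1 - \frac{1}{1 + r} = \frac{r}{1 + r}, \quad r > 0 \\
P\left(\mu - s \ln\frac{X_1}{X_2} \le a\right) &= P\left(\frac{X_1}{X_2} \ge e^{-(a - \mu)/s}\right) = \frac{1}{1 + e^{-(a - \mu)/s}}
\end{aligned}
$$

The last expression is the Logistic(μ, s) CDF. The rate λ cancels in the ratio, so any common rate gives the same result.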

```r
set.seed(361)

n <- 10^5
lambda <- 1
mu <- 5
s <- 7

u1 <- runif(n)
x <- -log(u1) / lambda   # inverse transform method for the first exponential variable
u2 <- runif(n)
y <- -log(u2) / lambda   # inverse transform method for the second exponential variable

logistic <- mu - s * log(x / y)

hist(logistic, prob = TRUE, main = "Logistic(5, 7)")
a <- seq(-100, 100, 0.01)
# Logistic(mu, s) density written out explicitly
lines(a, exp(-(a - mu) / s) / (s * (1 + exp(-(a - mu) / s))^2), col = "red", lwd = 2.5)
```

## Example 6)

Let X be a random variable from the F(2, 8) distribution.

Generate a random sample of size N = 10⁵ from F(2, 8) by using the following transformation method. Compare the histogram with the F density curve.

Remember that if U ~ χ²(m) and V ~ χ²(n) are independent, then

$F = \frac{U/m}{V/n} \sim F(m, n)$

```r
set.seed(361)   # added for reproducibility, as in the other examples

N <- 10^5       # sample size
m <- 2          # numerator degrees of freedom
n <- 8          # denominator degrees of freedom

z1 <- matrix(rnorm(m * N), m, N)   # standard normal matrix
z2 <- matrix(rnorm(n * N), n, N)   # standard normal matrix

squared_z1 <- z1^2                 # chi-square 1 dof matrix
squared_z2 <- z2^2                 # chi-square 1 dof matrix

U <- colSums(squared_z1)           # chi-square 2 dof
V <- colSums(squared_z2)           # chi-square 8 dof

Rv.F <- (U / m) / (V / n)          # F(2,8)

hist(Rv.F, prob = TRUE, main = "F-distribution(2,8)")
a <- seq(0, 50, 0.01)
lines(a, df(a, m, n), col = "red", lwd = 2.5)
```
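Because the F(2, 8) distribution has a heavy right tail, the histogram is dominated by a few large values, and comparing quantiles can be more informative. The following self-contained sketch compares empirical quantiles with those from R's built-in `qf()`; the names `f_sample` and `probs` and the chosen probability levels are hypothetical.

```r
set.seed(361)                      # arbitrary seed
N <- 10^5
m <- 2
n <- 8

U <- colSums(matrix(rnorm(m * N), nrow = m)^2)   # chi-square with m dof
V <- colSums(matrix(rnorm(n * N), nrow = n)^2)   # chi-square with n dof
f_sample <- (U / m) / (V / n)

# Empirical quantiles next to the theoretical F(m, n) quantiles
probs <- c(0.25, 0.5, 0.75, 0.9)
rbind(Theoretical = qf(probs, m, n),
      Estimated   = quantile(f_sample, probs))
```

Quantiles are used here rather than the sample variance because the fourth moment of F(2, 8) is not finite, so the sample variance converges slowly.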