Probability For Statistics - Concise notes
MATH50010
PDFs
Problem Sheets - Class
Table of contents
Colour Code - Definitions are green in these notes, Consequences are red and Causes are blue
Content from MATH40005 assumed to be known.
Probability Review
Definition 1.1 - Experiment
Any fixed procedure with variable outcome
Definition 1.2 - Sample space $\Omega$
Set of all possible outcomes of an experiment
Definition 1.4 - $\sigma$-algebra (Sigma-algebra)
$\mathcal{F}$ a collection of subsets of $\Omega$
$\mathcal{F}$ an algebra if
$\emptyset \in \mathcal{F}$
$A \in \mathcal{F} \implies A^{c} \in \mathcal{F}$
$A,B \in \mathcal{F} \implies A \cup B \in \mathcal{F}$ (closed under finite union)
$\mathcal{F}$ a $\sigma$-algebra if closed under countable union.
Definition 1.13 - Borel sigma algebra
Let $\mathcal{F}_{i}, i \in \mathcal{I}$, be the collection of all $\sigma$-algebras that contain all open intervals of $\mathbb{R}$
$\{\mathcal{F}_{i}\}$ clearly non-empty, since the power set of $\mathbb{R}$ is such a sigma algebra. Borel sigma algebra, $\mathcal{B} := \bigcap_{i\in\mathcal{I}}\mathcal{F}_{i}$
Remarks
$\mathcal{B}$ contains all open intervals, their complements, countable unions and countable intersections.
$\mathcal{F}$ a sigma algebra containing all intervals of the form above $\implies \mathcal{B} \subseteq \mathcal{F}$.
$\mathcal{B}$ thought of as the smallest sigma algebra containing all intervals
$B \in \mathcal{B}$ said to be a Borel set
Definition 1.16 - Kolmogorov Axioms
Given $\Omega$ and a $\sigma$-algebra $\mathcal{F}$ on $\Omega$
A Probability function/ Probability Measure is a function $Pr : \mathcal{F}\to [0,1]$ satisfying:
$Pr(A) \geq 0, \forall A \in \mathcal{F}$
$Pr(\Omega) = 1$
If $\{A_{i}\} \subseteq \mathcal{F}$ are pairwise disjoint then \(\textcolor{RoyalBlue}{Pr(\bigcup_{i=1}^{\infty}A_{i}) = \sum_{i=1}^{\infty}Pr(A_{i})}\)
Definition 1.17 - Probability Space
Defined as the triple $(\Omega,\mathcal{F},Pr(\cdot))$
Properties of $Pr(\cdot)$
$Pr(\emptyset) = 0$
$Pr(A) \leq 1$, $Pr(A^{c}) = 1 - Pr(A)$
$Pr(A\cup B) = Pr(A) + Pr(B) - Pr(A\cap B)$
$A \subset B \implies Pr(A) \leq Pr(B)$
$Pr(A) = \sum_{i=1}^{\infty}Pr(A\cap C_{i})$ for $\{C_{i}\}$ a partition of $\Omega$
Proposition 1.18 - Continuity Property
Let $(\Omega,\mathcal{F},Pr(\cdot))$, and $A_{1},A_{2},\dots \in \mathcal{F}$ an increasing sequence of events, ($A_{1} \subseteq A_{2} \subseteq \dots)$
$\mathcal{F}$ a sigma algebra $\implies A = \bigcup_{n=1}^{\infty}A_{n} \in \mathcal{F}$, and
\[\textcolor{red}{Pr(A) = \lim_{n\to \infty}Pr(A_{n}) \qquad Pr(\lim_{n\to\infty}A_{n}) = \lim_{n\to\infty}Pr(A_{n})}\]
Definition 1.20 - Conditional Probability
$A, B \in \mathcal{F}, Pr(B) > 0$, Conditional Probability of $A$ given $B$ is
\[Pr(A\rvert B) = \frac{Pr(A\cap B)}{Pr(B)}\]
Definition 1.21 - Independence
Two events $A,B$ are independent if \(Pr(A\cap B) = Pr(A)Pr(B)\)
Definition 1.22 - Mutually independent
$\{A_{i}\}\subseteq \mathcal{F}$ mutually independent if for any subcollection $\{A_{i_{j}}\}_{j=1,\dots,k}$
\[Pr\left(\bigcap_{j=1}^{k}A_{i_{j}}\right) = \prod_{j=1}^{k}Pr(A_{i_{j}})\]
Random Variables
Definitions
Definition 2.1 - Random Variable
A random variable on $(\Omega,\mathcal{F},Pr)$ a function
\[X:\Omega\to \mathbb{R}\]such that, $\forall \text{ Borel set } B \in \mathcal{B}, X^{-1}(B)\in \mathcal{F}$
Random vectors defined analogously, $X: \Omega \to \mathbb{R}^{n}$ and Complex Random Variables $X:\Omega\to \mathbb{C}$
Definition 2.3 - Distribution
$\forall \text{ Borel sets } B \in \mathcal{B}$ \(Pr_{X}(B) = Pr(X^{-1}(B)) = Pr(\{\omega \in \Omega: X(\omega) \in B\})\) $Pr_{X}$ the distribution of $X$. Written $Pr(X \in B)$
We say $X$ and $Y$ identically distributed if $Pr(X\in B) = Pr(Y\in B)\ \forall B \in \mathcal{B}$
Definition 2.10 - Cumulative Distributive Function (CDF)
CDF of a random variable $X$ a function $F_{X}:\mathbb{R}\to [0,1]$, \(F_{X}(x) = Pr(X\leq x)\)
Notation - Monotone limits
Write $x_{n} \downarrow x$ for $(x_{n})$ a seq. (weakly) monotonically decreasing to limit $x$
Write $x_{n} \uparrow x$ for $(x_{n})$ a seq. (weakly) monotonically increasing to limit $x$
Properties of the CDF
$F_{X}(x)$ is non-decreasing
$\lim_{x\to -\infty}F_{X}(x) = 0; \lim_{x\to +\infty}F_{X}(x) = 1$
$\lim_{x\downarrow x_{0}}F_{X}(x) = F_{X}(x_{0})$, F is continuous from the right.
Definition - Point mass CDF
The constant random variable the most trivial RV. For $a \in \mathbb{R}$ define point mass CDF as
\[\delta_{a}(x) = \begin{cases} 0 & x < a\\ 1 & x \geq a \end{cases}\]
Definition 2.14 - Probability Mass Function PMF
if $\exists (a_{n})_{n\geq 1}$ and $(b_{n})_{n\geq 1}$ where $b_{i} > 0$ with $\sum_{i}b_{i} = 1$ with $F_{X}(x)$ s.t
\[\textcolor{RoyalBlue}{F_{X}(x) = \sum_{i=1}^{\infty}b_{i}\delta_{a_{i}}(x)}\]Then $X$ a discrete random variable, with PMF $f_{X}(x) = Pr(X=x)$
if $F_{X}(x)$ continuous $\implies$ $X$ a continuous random variable
if $X$ a continuous random variable s.t $\exists f_{X}:\mathbb{R}\to \mathbb{R}$
\[\textcolor{RoyalBlue}{F_{X}(x) = \int_{-\infty}^{x}f_{X}(t)dt, \forall x \in \mathbb{R}}\]Then $X$ an absolutely continuous random variable with probability density function (PDF) $f_{X}(x)$
Transformations of Random Variables
Suppose $X$ an absolutely continuous random variable with pdf $f_{X}$ and $g:\mathbb{R}\to \mathbb{R}$ a strictly monotonic and differentiable
\[Y = g(X) \implies f_{Y}(y) = f_{X}(g^{-1}(y))\left\lvert\frac{dg^{-1}(y)}{dy}\right\rvert\]\(\textcolor{red}{f_{Y}(y) = f_{X}(x)\left\lvert\frac{dx}{dy}\right\rvert}\)
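As a quick numerical sanity check of the transformation formula (a Python sketch, not part of the notes; the choice $X \sim \text{Exp}(1)$, $g(x) = x^2$ is illustrative):

```python
import math
import random

random.seed(0)

# X ~ Exp(1) and g(x) = x^2, strictly increasing on the support (0, inf).
# The formula gives f_Y(y) = f_X(sqrt(y)) * |d sqrt(y)/dy| = e^{-sqrt(y)}/(2 sqrt(y)),
# hence F_Y(y) = 1 - exp(-sqrt(y)); compare with the empirical CDF of Y = X^2.
n = 200_000
ys = [random.expovariate(1.0) ** 2 for _ in range(n)]

for c in (0.5, 1.0, 4.0):
    empirical = sum(y <= c for y in ys) / n
    theoretical = 1 - math.exp(-math.sqrt(c))
    assert abs(empirical - theoretical) < 0.01
```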
Families of distributions
Scale Family
For $\sigma >0$, we have $Y = \sigma Z$ which has pdf \(f_{Y}(y) = \frac{1}{\sigma}f_{Z}\left(\frac{y}{\sigma}\right)\)
Location-Scale Family
Define $W = \mu + \sigma Z$, with pdf \(f_{W}(w) = \frac{1}{\sigma}f_{Z}\left(\frac{w-\mu}{\sigma}\right)\)
Probability Integral Transform
Let $U \sim Unif[0,1]$ with $X = F^{-1}(U)$ s.t $F$ a strictly increasing CDF $\implies X$ a random variable with CDF $F$
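A minimal simulation of the probability integral transform (assuming the Exp($\lambda$) CDF $F(x) = 1 - e^{-\lambda x}$, whose inverse is explicit):

```python
import math
import random

random.seed(1)

# F is the Exp(rate) CDF, strictly increasing on (0, inf), with
# F^{-1}(u) = -ln(1 - u)/rate.  X = F^{-1}(U) should then have CDF F.
rate = 2.0
n = 200_000
xs = [-math.log(1 - random.random()) / rate for _ in range(n)]

mean = sum(xs) / n                          # E(X) = 1/rate for Exp(rate)
assert abs(mean - 1 / rate) < 0.01
empirical = sum(x <= 1.0 for x in xs) / n   # compare with F(1) = 1 - e^{-2}
assert abs(empirical - (1 - math.exp(-rate))) < 0.01
```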
Expectation
For discrete r.v $X$
\[\textcolor{green}{E(X) = \sum_{x}xPr(X=x)}\]Similarly for continuous r.v
\[\textcolor{green}{E(X) = \int_{-\infty}^{+\infty}xf_{X}(x)dx}\]
Properties of Expectation
$E(aX+bY) = aE(X) + bE(Y), \forall a,b \in \mathbb{R}$
If $Pr(X \geq 0) = 1 \implies E(X) \geq 0$
If $A$ an event, $E(1_{A}) = Pr(A)$
Multivariate Random Variables
Definitions
Definition 3.1 - Joint Cumulative Distribution Function (Joint CDF)
Given by \(F_{XY}(x,y) = Pr(X \leq x, Y\leq y)\) Jointly absolutely continuous case: \(F_{XY}(x,y) = \int_{-\infty}^{y}\int_{-\infty}^{x}f_{XY}(s,t)dsdt\)
Definition - Marginal Density Function (MDF)
\[f_{X}(x) = \int_{-\infty}^{\infty}f_{XY}(x,y)dy\]
Definition 3.3 - Independence
Finite set of r.v $\{X_{i}\}$ said to be independent if \(Pr(X_{1} \in B_{1},\dots,X_{n}\in B_{n}) = \prod_{i=1}^{n}Pr(X_{i}\in B_{i})\) $\forall \text{ Borel sets }B_{i}$
Corollary.
Any collection $(X_{i})$ independent if every finite subcollection independent.
Definition 3.4 - Covariance
For r.v $X,Y$ with finite $E(X) = \mu_{X}, E(Y) = \mu_{Y}$ \(Cov(X,Y) = E((X-\mu_{X})(Y-\mu_{Y}))\)
Definition - Correlation
\[Cor(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}\]
Change of variables for pdfs
If $(U,V) = T(X,Y)$ a function of pair of r.v $(X,Y)$ with joint pdf $f_{XY}$, a joint pdf for $(U,V)$ given by
\[f_{UV}(u,v) = f_{XY}(x(u,v),y(u,v))\lvert J(u,v)\rvert\]
Where
\[J(u,v) = \det \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v}\\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix}\]
Remark 3.9 - (Factorisable independence)
$X,Y$ independent if $\exists g,h : \mathbb{R} \to \mathbb{R}$ such that the joint mass/density function factorises as \(f_{XY}(x,y) = g(x)h(y)\)
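The bivariate change of variables together with factorisable independence is what makes the Box-Muller construction work (a hedged Python sketch; Box-Muller itself is not covered in the notes): the Jacobian calculation factorises the joint density of $(X,Y)$ into two independent $N(0,1)$ densities.

```python
import math
import random

random.seed(2)

# Box-Muller: for (U, V) ~ Unif(0,1)^2,
#   X = sqrt(-2 ln U) cos(2 pi V),  Y = sqrt(-2 ln U) sin(2 pi V).
# The change-of-variables Jacobian shows f_XY(x, y) = phi(x) phi(y),
# i.e. X, Y are independent standard normals.
n = 100_000
xs, ys = [], []
for _ in range(n):
    u, v = random.random(), random.random()
    r = math.sqrt(-2 * math.log(1 - u))     # 1 - u avoids log(0)
    xs.append(r * math.cos(2 * math.pi * v))
    ys.append(r * math.sin(2 * math.pi * v))

mx, my = sum(xs) / n, sum(ys) / n
vx = sum(x * x for x in xs) / n - mx * mx
cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
assert abs(mx) < 0.02 and abs(vx - 1) < 0.03 and abs(cov) < 0.02
```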
Definition - Conditioning
For $X$ a r.v, conditional CDF of $X$ given $A$: \(F_{X\rvert A}(x) = Pr(X \leq x \rvert A)\)
Definition - Conditional Probability Density Function
\[f_{Y\rvert X}(y\rvert x) = \frac{d}{dy}F_{Y\rvert X}(y\rvert x) = \frac{f_{XY}(x,y)}{f_{X}(x)}\]
Bivariate Normal Distribution
Definition - Standard bivariate normal distribution
Has pdf for $-1 < \rho < 1$
\[f_{XY}(x,y) = \frac{1}{2\pi\sqrt{1-\rho^{2}}}\exp\left(-\frac{x^{2}-2\rho xy+y^{2}}{2(1-\rho^{2})}\right)\]
Properties:
- $E(X) = E(Y) = 0, E(XY) = \rho$
- $Var(X) = Var(Y) = 1, Cov(X,Y) = \rho$
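These properties can be checked by simulation; the construction $X = Z_1$, $Y = \rho Z_1 + \sqrt{1-\rho^2}\,Z_2$ with $Z_1, Z_2$ iid $N(0,1)$ is a standard way to sample this distribution (an assumption here, not stated in the notes):

```python
import math
import random

random.seed(3)

# Sample (X, Y) standard bivariate normal with correlation rho and
# check E(XY) = rho empirically.
rho = 0.6
n = 200_000
sxy = 0.0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = z1
    y = rho * z1 + math.sqrt(1 - rho * rho) * z2
    sxy += x * y

assert abs(sxy / n - rho) < 0.02   # E(XY) = rho
```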
Vector form
$\mathbf{x} = (x,y),\ \underline{\mu} = (\mu_x,\mu_y),\ \Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y & \sigma_y^2\end{pmatrix}$
Extend this to Multivariate normal distribution:
\[f_{\mathbf{X}}(\mathbf{x}\rvert\underline{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}(\det\Sigma)^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{x} - \underline{\mu})^T \Sigma^{-1}(\mathbf{x}-\underline{\mu})\right)\]Where $\mathbf{x} \in \mathbb{R}^d,\ (\Sigma_{ij}) = Cov(X_i,X_j)$
Remarks.
$\Sigma$ symmetric: $Cov(X,Y) = Cov(Y,X)$
$\text{diag}(\Sigma) = (Var(X_1),\dots,Var(X_d))$
For constant $\mathbf{a} \in \mathbb{R}^d$: $Var(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T\Sigma \mathbf{a}$
Proposition 3.16.
$\mathbf{X} \sim MVN_d(\underline{\mu},\Sigma)$, $A$ invertible $d\times d$ matrix $\implies A\mathbf{X} \sim MVN_d(A\underline{\mu}, A\Sigma A^T)$
Proposition 3.17.
Can always find linear transform $Q$ of $\mathbf{X}$
s.t entries of $\mathbf{Z} = Q\mathbf{X}$ uncorrelated and independent random variables.
Order statistic
Consider random sample $(X_1,\dots,X_n)$ with cdf $F_X$ and pdf $f_X$
Sort the sample values in increasing order, with $Y_1$ smallest, $Y_2$ next smallest, etc.
$(Y_1,\dots,Y_n)$ the vector of order statistics of $\mathbf{X}$
Convergence of Random Variables
Convergence
Definition 4.1. Sequence $(X_i)$ of random variables said to converge in probability to $X$
\[X_n \xrightarrow[]{P} X \quad \text{ if } \quad \forall \epsilon >0 \lim_{n\to \infty} Pr(\lvert X_n - X\rvert \geq \epsilon ) = 0\]
Proposition 4.4. - (Markov's inequality)
$X$ a random variable taking non-negative values only.
$a> 0$ constant $\implies$ \(\textcolor{red}{Pr(X \geq a) \leq \frac{E(X)}{a}}\)
Proposition 4.5.
Take non-negative random variable $Y = (X-\mu)^2$, $\mu = E(X)$; applying Markov's inequality to $Y$ with constant $a^2$ gives Chebyshev's inequality \(\textcolor{red}{Pr(\lvert X-\mu\rvert \geq a) \leq \frac{Var(X)}{a^2}}\)
Definition 4.6.
$X_1,X_2,\dots $
\[\underbrace{\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n}X_i}_{\text{\textcolor{green}{sample mean}}}\]
Proposition 4.7. - (Weak law of large numbers)
$X_1,X_2,\dots$ sequence of iid random variables with finite $\mu,\sigma^2$ $\implies$ \(\textcolor{red}{\bar{X}_n \xrightarrow[]{P} \mu}\)
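A small simulation illustrating the weak law (a Python sketch; the Bernoulli(0.3) example and the thresholds are chosen arbitrarily):

```python
import random

random.seed(4)

# Weak law of large numbers: sample means of iid Bernoulli(0.3) concentrate
# around mu = 0.3 as n grows.
p, eps = 0.3, 0.05

def deviation_freq(n, trials=2000):
    """Fraction of trials where |X_bar_n - p| >= eps."""
    bad = 0
    for _ in range(trials):
        xbar = sum(random.random() < p for _ in range(n)) / n
        bad += abs(xbar - p) >= eps
    return bad / trials

small_n, large_n = deviation_freq(20), deviation_freq(500)
assert large_n < small_n      # deviation probability shrinks with n
assert large_n < 0.05
```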
Definition 4.9.
$X_1,X_2,\dots$ with cdfs $F_1,F_2,\dots$
Converge in distribution to random variable $X$ with cdf $F_X$, written $X_n \xrightarrow[]{D} X$, if \(\lim_{n\to\infty}F_n(x) = F_X(x)\)
$\forall x \in \mathbb{R}$ for which $F_X$ continuous.
Proposition 4.12.
Converge in probability $\implies$ converge in distribution
Proposition 4.14.
Suppose $(X_n)_{n\geq 1}$ sequence of random variables
Limit events
Definition
$A_1,A_2,\dots$ sequence of events
$\{A_n\ \text{i.o.}\} = A_n$ infinitely often
$\{A_n\ \text{a.a.}\} = A_n$ almost always (only finitely many $A_n$ don't occur)
$\{A_n\ \text{a.a.}\} \subseteq \{A_n\ \text{i.o.}\}$
Proposition 4.15.
Sequence of sets $(A_n)$. We define
\[\underbrace{\liminf_{N\to\infty} A_N = \bigcup_{N=1}^{\infty}\bigcap_{n=N}^{\infty}A_n}_{=\{A_n\ \text{a.a.}\}} \quad \underbrace{\limsup_{N\to\infty} A_N = \bigcap_{N=1}^{\infty}\bigcup_{n=N}^{\infty}A_n}_{=\{A_n\ \text{i.o.}\}}\]
Remark 4.16
$\{A_n\ \text{i.o.}\}^{c}$ - only finitely many $A_n$ occur
Proposition 4.17 - (Borel-Cantelli lemmas)
$A_1,A_2,\dots$ sequence of events
if $\sum_{n=1}^{\infty}Pr(A_n) < \infty$ $\implies$ $Pr(\{A_n\ \text{i.o.}\}) = 0$
if $\sum_{n=1}^{\infty}Pr(A_n) = \infty$ and $\{A_i\}$ independent $\implies$ $Pr(\{A_n\ \text{i.o.}\}) = 1$
Central Limit Theorem
Moment generating functions
Definition 5.1 - (Moment generating functions MGFs)
\[M_{X}(t) = E\left[ \exp{(tX)} \right]\]
Proposition 5.2.
$Y = aX + b \implies M_Y(t) = \exp(bt)M_X(at)$
Proposition 5.3.
$X,Y$ independent random variables $\implies M_{X+Y}(t) = M_X(t)M_Y(t)$
Proposition 5.4.
Suppose $\exists t_0 > 0$ s.t $M_X(t) < \infty$ for $\lvert t \rvert < t_0$ $\implies$ all moments of $X$ finite, with $E(X^n) = M_X^{(n)}(0)$
Proposition 5.5.
(Uniqueness)
Suppose $X,Y$ random variables with common MGF finite for $\lvert t \rvert < t_0$ for some $t_0 > 0$ $\implies X,Y$ identically distributed
(Continuity)
Suppose $X$ a random variable with $M_X(t)$
$(X_{n})_{n\geq 1}$ sequence of random variables with respective MGFs $M_{X_i}(t)$
\[\text{ if } \textcolor{RoyalBlue}{M_{X_i}(t) \xrightarrow[i\to \infty]{} M_X(t) < \infty \quad (\forall \lvert t \rvert < t_0,\ t_0 > 0)}\] \[\implies \textcolor{red}{X_n \xrightarrow[]{D} X}\]
Definition 5.11
Little-o
Say $f(x) = o(g(x))$ in $\lim_{x\to \infty}$ if \(\lim_{x\to\infty}\frac{f(x)}{g(x)} = 0\)
Similarly defined for case $x \to 0$ limit.
Proposition 5.12.
$X_1,X_2,\dots$ sequence of iid random variables with common MGF; $M(t)$ exists in open interval containing $0$
\[\text{ if } E(X_i) = \mu\ \forall i \implies \bar{X}_n \xrightarrow[]{P} \mu\]
The Central Limit Theorem
Proposition 5.14.
$X_1,X_2,\dots$ sequence of iid random variables with common MGF $M(t)$ (existing in open interval containing 0)
$E(X_i) = \mu,\ Var(X_i) = \sigma^2\ \forall i$
\[\implies \textcolor{red}{\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow[]{D} Z \sim N(0,1)}\]
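An empirical check of the CLT (a sketch with Unif(0,1) summands, for which $\mu = 1/2$, $\sigma^2 = 1/12$):

```python
import math
import random

random.seed(5)

# CLT: standardised sample means of Unif(0,1) should be approximately
# N(0,1) for moderate n.
n, trials = 50, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)

zs = []
for _ in range(trials):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))

# Compare with standard normal probabilities
assert abs(sum(z <= 0 for z in zs) / trials - 0.5) < 0.02
assert abs(sum(abs(z) <= 1 for z in zs) / trials - 0.6827) < 0.02
```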
Stochastic Processes
Definition
$\mathcal{E}-$ the state space (finite or countably infinite)
Random process - sequence of $\mathcal{E}$ valued random variables $X_0,X_1,\dots$
Time Homogeneous Markov Chains
Definition 6.1
Stochastic Process on state space $\mathcal{E}$, collection of $\mathcal{E}$-valued r.v $(X_t)_{t\in T}$ indexed by set $T$, often $T = \mathbb{N}_{0}$
Definition 6.2
Discrete time stochastic process $(X_n)_{n\in \mathbb{N}_0}$ on $\mathcal{E}$ a Markov chain if \(Pr(X_{n+1} = j \rvert X_n = i_n, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = Pr(X_{n+1} = j \rvert X_n = i_n)\)
Definition 6.3
Markov chain Time homogeneous if $Pr(X_{n+1} = j \rvert X_n = i)$ does not depend on $n$
Definition 6.4
Matrix $P = (p_{ij})_{i,j \in \mathcal{E}}$ of transition probabilities $p_{ij} = Pr(X_1 = j \rvert X_0 = i)$
Called the Transition Matrix for the time homogeneous Markov chain $(X_n)$
Initial distribution
Definition.
Initial distribution and Transition Matrix specify stochastic process fully i.e.
$\lambda = (\lambda_j)_{j\in \mathcal{E}} ; \lambda_j = Pr(X_0 = j)$
Marginal distribution - $P(X_1 = j) = \sum_{i\in \mathcal{E}}P(X_1 = j \rvert X_0 = i)P(X_0 =i) = \sum_{i\in \mathcal{E}}p_{ij}\lambda_{i}$
Joint distribution - $P(X_2 = k, X_1 =j) = P(X_2 = k \rvert X_1 =j)P(X_1 = j) = p_{jk}\sum_{i\in \mathcal{E}}p_{ij}\lambda_{i}$
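The marginal computation $Pr(X_1 = j) = \sum_i p_{ij}\lambda_i$ is just a vector-matrix product; a pure-Python sketch on a made-up 2-state chain:

```python
# Marginal of X_n from an initial distribution lambda and transition matrix P
# (states indexed 0, 1; the chain is illustrative only).
P = [[0.7, 0.3],
     [0.4, 0.6]]
lam = [1.0, 0.0]            # start in state 0

def step(dist, P):
    """One application of dist -> dist P."""
    m = len(P)
    return [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]

dist1 = step(lam, P)        # Pr(X_1 = j) = sum_i p_ij lambda_i
assert dist1 == [0.7, 0.3]

dist2 = step(dist1, P)      # two-step marginal, consistent with P(2) = P^2
assert abs(dist2[0] - (0.7 * 0.7 + 0.3 * 0.4)) < 1e-12
```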
Definition 6.7
Given Markov chain $(X_n)_{n\in \mathbb{N}_0}$, define the $n$-step transition probability \(p_{ij}(n) = Pr(X_n = j \rvert X_0 = i)\)
Proposition 6.9. - (Chapman-Kolmogorov equations)
Suppose $m \geq 0$ and $n \geq 1$
\[p_{ij}(m+n) = \sum_{l \in \mathcal{E}} p_{il}(m)p_{lj}(n)\] \[P(m+n) = P(m)P(n) \quad \text{(Matrix form)} \implies P(m) = P^m\]
Class Structure
Definitions
Definition 6.11
State $j$ accessible from state $i$; $i \to j$ if $\exists n \geq 0$ s.t $p_{ij}(n) > 0$
Definition 6.13
States $i,j$ communicate; $i \longleftrightarrow j$ if $i \to j$ and $j\to i$
Proposition 6.15.
Binary relation $i \longleftrightarrow j$ an equivalence relation on $\mathcal{E}$, partitioning $\mathcal{E}$ into communicating classes.
Definition 6.17
Set of states $C$ closed if $p_{ij} = 0, \forall i \in C, j \not\in C$
Definition 6.19
Set of states $C$ irreducible if $i \longleftrightarrow j, \forall i,j \in C$
Periodicity
Definition 6.20
Period of state $i$; $d(i) = \gcd\{n>0 : p_{ii}(n) > 0\}$
$d(i) = 1$ say state is aperiodic
$d(i) > 1$ say state is periodic
Proposition 6.22.
All states in same communicating class have same periodicity
Classification of states
Definition 6.24.
$i \in \mathcal{E}$ for Markov Chain $X_n$
Recurrent if \(P(X_n = i \text{ for some } n \geq 1 \rvert X_0 = i) = P\left( \bigcup_{n=1}^{\infty} \{X_n = i\} \rvert X_0 = i \right) = 1\)
Transient if \(P\left( \bigcup_{n=1}^{\infty} \{X_n = i\} \rvert X_0 = i \right) < 1\)
Definition 6.25.
First passage time of state $j \in \mathcal{E}$ \(T_{j} = \min\{n\geq 1: X_n = j\}\) First $n$ s.t $X_n = j$
Say $T_j = \infty$ if the chain never visits state $j$; then $T_j$ is not a random variable in the earlier sense, since it is not real-valued.
Definition.
Remark 6.27
\[\begin{aligned} f_{ij}(n) &= Pr(T_j = n \rvert X_0 = i)\\ f_{ij} &= Pr(T_j < \infty \rvert X_0 =i)\\ &= Pr(\bigcup_{n=1}^{\infty}\{T_j= n\} \rvert X_0 = i)\\ &= \sum_{n=1}^{\infty}f_{ij}(n) \end{aligned}\]
Remark 6.28
\[\text{State } i: \begin{cases} \text{recurrent} & f_{ii} = 1 \iff \sum_{n=1}^{\infty} p_{ii}(n) = \infty\\ \text{transient} & f_{ii} < 1 \iff \sum_{n=1}^{\infty} p_{ii}(n) < \infty \end{cases}\]
Proposition 6.29
$i,j \in \mathcal{E},\ n \geq 1$ \(p_{ij}(n) = \sum_{l = 1}^{n}f_{ij}(l)p_{jj}(n-l)\) In particular $p_{ij} = p_{ij}(1) = f_{ij}(1)$
Proposition 6.32
$i \longleftrightarrow j$ $\implies$ either $i,j$ both recurrent or both transient
Proposition 6.33
$C$ a recurrent communicating class $\implies$ $C$ closed: $i \in C, j \not\in C$ we have $p_{ij} = 0$
Proposition 6.34
State space decomposes as $\mathcal{E} = T \cup C_1 \cup C_2 \cup \dots$, with $T$ the set of transient states and $C_1, C_2, \dots$ closed irreducible classes of recurrent states
Definition 6.36
Mean recurrence time of state $i \in \mathcal{E}$: \(\mu_i = E(T_i \rvert X_0 = i)\)
Remark 6.37
Transient States: $\mu_i = \infty$ since $P(T_i = \infty \rvert X_0 = i) > 0$
Recurrent States: $\mu_i = \sum_{n=1}^{\infty}n f_{ii}(n)$ can be finite or infinite
Definition 6.38
$i \in \mathcal{E}$
null recurrent if $\mu_i = \infty$
positive recurrent if $\mu_i < \infty$
Definition 6.39
$(X_n)$ a Markov chain on $\mathcal{E}$. Hitting time of set $A \subseteq \mathcal{E}$ a random variable \(H^A = \min\{n \geq 0 : X_n \in A\}\)
We take $\min\emptyset = \infty$
Hitting probability starting at $i \in \mathcal{E}$
\[h_{i}^{A} = Pr(H^A < \infty \rvert X_0 = i)\]in the case $A = \{j\}$ we write $h_i^j$
Proposition 6.43
$A \subseteq \mathcal{E}$: the vector $h^A = (h_i^A)_{i \in \mathcal{E}}$ is the minimal non-negative solution of the system \(h_i^A = 1 \text{ for } i \in A, \qquad h_i^A = \sum_{j \in \mathcal{E}} p_{ij}h_j^A \text{ for } i \not\in A\)
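Iterating the hitting-probability system from $h \equiv 0$ picks out the minimal non-negative solution; a sketch for a simple random walk on $\{0,\dots,N\}$ with target $A = \{N\}$ (the chain and parameters are illustrative):

```python
# Simple random walk on {0,...,N}, absorbed at both ends; target A = {N}.
# Iterate h_i <- p*h_{i+1} + (1-p)*h_{i-1} for interior i, starting from h = 0
# (iteration from zero converges to the minimal non-negative solution).
N, p = 10, 0.5
h = [0.0] * (N + 1)
h[N] = 1.0                   # h_i = 1 for i in A; h[0] stays 0 (absorbing)

for _ in range(20_000):
    for i in range(1, N):
        h[i] = p * h[i + 1] + (1 - p) * h[i - 1]

# For the symmetric walk the known answer is h_i = i / N.
for i in range(N + 1):
    assert abs(h[i] - i / N) < 1e-6
```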
Stationary Distributions
Definition 6.44
Vector $\pi = (\pi_j)_{j\in\mathcal{E}}$ a stationary distribution for $(X_n)$ if
$\pi_j \geq 0 \ \forall j \in \mathcal{E}$ and $\sum_{j\in\mathcal{E}}\pi_j = 1$ ($\pi$ a probability distribution on $\mathcal{E}$)
$\pi P = \pi$
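For a 2-state chain $\pi P = \pi$ can be solved in closed form, which gives a quick numerical check via power iteration (a sketch; the chain is made up):

```python
# Stationary distribution of a 2-state chain: for P = [[1-a, a], [b, 1-b]]
# the closed form is pi = (b/(a+b), a/(a+b)).
a, b = 0.3, 0.4
P = [[1 - a, a],
     [b, 1 - b]]

# Power iteration: repeatedly apply pi -> pi P from any starting distribution.
pi = [1.0, 0.0]
for _ in range(500):
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]

assert abs(pi[0] - b / (a + b)) < 1e-9
assert abs(pi[1] - a / (a + b)) < 1e-9
```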
Proposition 6.45
$X_n$ has distribution $\pi$ with $\pi$ stationary for $(X_n)$ $\implies$ $X_{n+1}$ has distribution $\pi$
Proposition 6.46
irreducible chain has stationary distribution
$\iff$ all states positive recurrent
$\implies \pi_j = \frac{1}{\mu_j}$ for $\mu_j$ the mean recurrence time
$\implies$ stationary distribution is unique
Proposition 6.47
$(X_n)_{n\in\mathbb{N}_0}$ irreducible aperiodic Markov Chain with stationary distribution $\pi$
$\implies \forall$ initial distribution $\lambda$, $\forall j \in \mathcal{E}$ \(\lim_{n\to\infty}Pr(X_n = j) = \pi_j\) \(\forall i \in \mathcal{E}:\ \lim_{n\to\infty}Pr(X_n = j \rvert X_0 = i) = \pi_j \quad (\text{independent of } i)\)
Proposition 6.48 - (Ergodic Theorem)
$(X_n)_{n\in\mathbb{N}_0}$ irreducible Markov Chain \(\forall i \in \mathcal{E}\text{ let } V(i) = \sum_{r=0}^{n}I(X_r = i)\) Counts the number of visits to $i$ before time $n$
$\implies \forall$ initial distributions, $i\in\mathcal{E}$ we have \(\textcolor{red}{\frac{V(i)}{n} \xrightarrow[n\to\infty]{} \frac{1}{\mu_i}}\) almost surely
Proposition
Symmetric random walk on finite graph
$i\in\mathcal{E}$ connected to $d_i$ other states \(\implies \textcolor{red}{\pi_i = \frac{d_i}{\sum_{j\in\mathcal{E}}d_j}}\)
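A numerical check of $\pi_i = d_i/\sum_j d_j$ on a small made-up graph, via power iteration on $p_{ij} = 1/d_i$:

```python
# Symmetric random walk on a 4-vertex graph (a triangle 0-1-2 with pendant 3).
# The triangle (odd cycle) makes the chain aperiodic, so power iteration converges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
nbrs = {v: [] for v in range(4)}
for u, v in edges:
    nbrs[u].append(v)
    nbrs[v].append(u)

deg = {v: len(nbrs[v]) for v in nbrs}
n = 4
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in nbrs[i]:
        P[i][j] = 1 / deg[i]          # p_ij = 1/d_i for each edge (i, j)

pi = [0.25] * n
for _ in range(2000):
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

total_deg = sum(deg.values())
for i in range(n):
    assert abs(pi[i] - deg[i] / total_deg) < 1e-8
```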