Study notes and chapter summaries
Derived from the Greek, "stochastic" means "random" or "chance"; its antonyms are "sure", "deterministic", and "certain". While a deterministic model yields a single outcome from a given set of circumstances, a stochastic model predicts the set of possible outcomes together with their respective likelihoods, or probabilities. A coin tossed in the air will surely land somewhere on the ground, but whether it lands heads or tails is random, with each outcome assigned a probability of 0.5. It is interesting to note that phenomena are not themselves stochastic or deterministic; the distinction lies in how the observer chooses to model them. For example, changes in the level of a large population are often modeled deterministically, even though we may agree that many chance events contribute to its fluctuations.
As for the definition: a stochastic process is a family of random variables \( X_t \), where \( t \) is a parameter running over a suitable index set \( T \). Usually \( t \) indexes discrete units of time, so the index set is \( T = \{0, 1, 2, \ldots\} \). Stochastic processes with \( T = [0, \infty) \) are important in many applications.
Let \( \Omega \) be a non-empty set equipped with a \( \sigma \)-algebra \( \mathcal{F} \). A random variable is a function \( X : \Omega \to \mathbb{R} \) such that for every Borel set \( A \subseteq \mathbb{R} \), we have the pre-image \( X^{-1}(A) \in \mathcal{F} \); i.e., \(X\) is \(\mathcal{F}\) - measurable.
If \(X\) is a discrete rv, its \(m\)-th moment is given by $$ E[X^m] = \sum_{i} x_i^m \, \Pr(X = x_i) $$ For a continuous rv, the \(m\)-th moment is given by $$ E[X^m] = \int_{-\infty}^{\infty} x^m \, f(x)\, dx $$ The \(m\)-th central moment is defined as the \(m\)-th moment of the random variable \( X - \mu_X \), provided that \( \mu_X \) exists and is finite.
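As a concrete illustration of the discrete formula above, here is a minimal sketch (a fair six-sided die is an assumed example, not from the notes) computing raw and central moments directly from the definition:

```python
# Moments of a discrete random variable from the definition
# E[X^m] = sum_i x_i^m * P(X = x_i). Example: a fair six-sided die.
xs = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

def moment(m):
    """m-th raw moment E[X^m]."""
    return sum(x ** m * p for x, p in zip(xs, probs))

mu = moment(1)              # mean E[X] = 3.5
var = moment(2) - mu ** 2   # 2nd central moment E[(X - mu)^2] = 35/12
```

The second central moment (the variance) is obtained from the first two raw moments via \( E[(X-\mu)^2] = E[X^2] - \mu^2 \).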
| Distribution | Notation | PMF | Characteristic Function |
|---|---|---|---|
| Bernoulli | \( X \sim \mathrm{Bern}(p) \) | \( p(x) = p^x(1-p)^{1-x},\ x \in \{0,1\} \) | \( \varphi_X(t) = 1 - p + pe^{it} \) |
| Binomial | \( X \sim \mathrm{Bin}(n, p) \) | \( p(x) = \binom{n}{x}p^x(1-p)^{n-x} \) | \( \varphi_X(t) = (1 - p + pe^{it})^n \) |
| Poisson | \( X \sim \mathrm{Pois}(\lambda) \) | \( p(x) = \frac{e^{-\lambda} \lambda^x}{x!} \) | \( \varphi_X(t) = \exp(\lambda(e^{it} - 1)) \) |
| Geometric | \( X \sim \mathrm{Geom}(p) \) | \( p(x) = p(1-p)^{x-1},\ x \in \{1,2, \ldots\} \) | \( \varphi_X(t) = \frac{pe^{it}}{1 - (1 - p)e^{it}} \) |
| Distribution | Notation | PDF | Characteristic Function |
|---|---|---|---|
| Uniform | \( X \sim \mathrm{Unif}(a,b) \) | \( f(x) = \frac{1} {b-a},\ x \in [a,b] \) | \( \varphi_X(t) = \frac{e^{itb}-e^{ita}} {it(b-a)} \) |
| Exponential | \( X \sim \mathrm{Exp}(\lambda) \) | \( f(x) = \lambda e^{-\lambda x} , \ x \geq 0 \) | \( \varphi_X(t) = \frac{\lambda}{\lambda - it} \) |
| Gamma | \( X \sim \mathrm{Gamma}(\alpha, p) \) (shape \( \alpha \), rate \( p \)) | \( f(x) = \frac{p^{\alpha}x^{\alpha -1}e^{-px}}{\Gamma(\alpha)} , \ x > 0 \) | \( \varphi_X(t) = \left(\frac{p}{p-it}\right)^{\alpha} \) |
| Beta | \( X \sim \mathrm{Beta}(a, b) \) | \( f(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}, \ x \in (0,1) \) | Not elementary in closed form |
| Cauchy | \( X \sim \mathrm{Cauchy}(x_0, \gamma) \) | \( f(x) = \frac{1}{\pi \gamma \left[1 + \left(\frac{x - x_0}{\gamma}\right)^2 \right]} \) | \( \varphi_X(t) = e^{i x_0 t - \gamma |t|} \) |
| Normal (Univariate) | \( X \sim \mathcal{N}(\mu, \sigma^2) \) | \( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{ -\frac{(x - \mu)^2}{2\sigma^2} } \) | \( \varphi_X(t) = e^{i \mu t - \frac{1}{2} \sigma^2 t^2} \) |
| Normal (Multivariate) | \( \mathbf{\vec{X}} \sim \mathcal{N}_n(\boldsymbol{\vec{\mu}}, \boldsymbol{\Sigma}) \) | \( f(\mathbf{\vec{x}}) = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{\vec{x}} - \boldsymbol{\vec{\mu}})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{\vec{x}} - \boldsymbol{\vec{\mu}}) \right) \) | \( \varphi_{\mathbf{X}}(\mathbf{t}) = \exp\left( i \mathbf{t}^\top \boldsymbol{\mu} - \frac{1}{2} \mathbf{t}^\top \boldsymbol{\Sigma} \mathbf{t} \right) \) |
| Chi-square | \( X \sim \chi^2_k \) | \( f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2},\ x > 0 \) | \( \varphi_X(t) = (1 - 2it)^{-k/2} \) |
| t-distribution | \( T \sim t_\nu \) | \( f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi} \ \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2} \) | No simple closed form; the \(k\)-th moment exists only when \( k < \nu \) (the degrees of freedom) |
| F-distribution | \( F \sim F_{d_1, d_2} \) | \( f(x) = \frac{1}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \left( \frac{d_1}{d_2} \right)^{d_1/2} \frac{x^{d_1/2 - 1}}{\left(1 + \frac{d_1}{d_2} x \right)^{(d_1 + d_2)/2}},\ x > 0 \) | No simple closed form; the \(k\)-th moment exists only if \( d_2 > 2k \) |
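Characteristic functions like those in the tables can be checked numerically against samples. A quick sanity-check sketch (the exponential with rate 2 and the evaluation point are illustrative choices, not from the notes): the empirical characteristic function \( \frac{1}{n}\sum_j e^{itx_j} \) should approach the closed form \( \lambda/(\lambda - it) \).

```python
# Compare the empirical characteristic function of Exp(lam) samples
# against the closed form lam / (lam - i*t) at a single point t.
import cmath
import random

random.seed(2)
lam, t = 2.0, 1.5
n = 200_000
samples = [random.expovariate(lam) for _ in range(n)]

emp_cf = sum(cmath.exp(1j * t * x) for x in samples) / n
closed_form = lam / (lam - 1j * t)   # = 0.64 + 0.48j for these values
err = abs(emp_cf - closed_form)      # should shrink like 1/sqrt(n)
```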
Cauchy-Schwarz inequality: Let \(X\) and \(Y\) have finite second moments. Then $$ (E[XY])^2 \leq E[X^2]E[Y^2] $$
Markov's inequality: Let \(X\) be a non-negative random variable, and \(t\) a positive number. Then $$ P(X \geq t) \leq \frac{E[X]}{t} $$
Chebyshev's inequality: Let \(X\) be a random variable with mean \( \mu \) and finite variance \( \sigma^2 \). Then for any real number \( t > 0 \), $$ P(|X- \mu| \geq t) \leq \frac{\sigma^2}{t^2} $$
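The bound above holds for any distribution with finite variance, though it is often far from tight. A quick empirical check (a standard normal is an assumed example):

```python
# Chebyshev's bound P(|X - mu| >= t) <= sigma^2 / t^2, checked empirically
# for X ~ Normal(0, 1) at t = 2. For the normal, P(|X| >= 2) is about 0.0455,
# well below the Chebyshev bound of 0.25.
import random

random.seed(0)
mu, sigma, t = 0.0, 1.0, 2.0
n = 100_000
samples = [random.gauss(mu, sigma) for _ in range(n)]

empirical = sum(abs(x - mu) >= t for x in samples) / n
chebyshev_bound = sigma ** 2 / t ** 2   # = 0.25
```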
Let \( X_1 , X_2, \ldots \) be a sequence of IID random variables, and let \(N\) be a discrete random variable, independent of all the \(X_i\), with PMF \( p_N (n) = \Pr (N=n), \ n \in \{0,1,2, \ldots \} \). We define the random sum \(S_N\) as $$ S_N = \sum_{i=1}^{N} X_i \quad (N>0), \qquad S_N = 0 \text{ when } N = 0 $$
(a) Queueing: N= number of customers arriving at a service facility in a given time period, \(X_i\) = Service time for the i-th customer. Then \(S_N\) = Total demand for service time.
(b) Risk Theory: N= total number of claims arriving at insurance company, \(X_i\)= amount of the i-th claim. Then \(S_N\) = Total liability of the insurance company.
Let's assume that \(E[X_i] = \mu, E[N] = \nu, Var[X_i] = \sigma^2 , Var[N] = \tau^2. \)
We now compute \(E[S_N]\) by conditioning on the value of \(N\): \(E[S_N] \) $$= \sum_{n=0}^{\infty} E[S_N|N=n]p_N(n) $$ $$= \sum_{n=1}^{\infty} E[X_1 + X_2 + \ldots + X_N |N=n]p_N(n) $$ $$= \sum_{n=1}^{\infty} E[X_1 + X_2 + \ldots + X_n]p_N(n) $$ $$= \sum_{n=1}^{\infty} \mu\, n\, p_N(n) $$ $$= \mu \sum_{n=1}^{\infty} n\, p_N(n) $$ $$= \mu \nu $$
We can likewise find \( Var[S_N] \) via the law of total variance: $$ Var[S_N] = E[Var(S_N \mid N)] + Var(E[S_N \mid N]) = E[N\sigma^2] + Var(N\mu) = \nu\sigma^2 + \mu^2 \tau^2 $$
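These two formulas can be verified by Monte Carlo. A sketch under assumed parameters (Poisson \(N\) with \( \lambda = 4 \), so \( \nu = \tau^2 = \lambda \), and normal summands with \( \mu = 2 \), \( \sigma = 1.5 \)):

```python
# Monte Carlo check of E[S_N] = mu*nu and Var[S_N] = nu*sigma^2 + mu^2*tau^2,
# with N ~ Poisson(lam) (so nu = tau^2 = lam) and X_i ~ Normal(mu_x, sigma_x^2).
import math
import random

random.seed(42)
lam = 4.0
mu_x, sigma_x = 2.0, 1.5

def sample_poisson(rate):
    """Poisson sample via Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

trials = 100_000
sums = []
for _ in range(trials):
    n = sample_poisson(lam)
    sums.append(sum(random.gauss(mu_x, sigma_x) for _ in range(n)))

mean_sn = sum(sums) / trials
var_sn = sum((s - mean_sn) ** 2 for s in sums) / trials

theory_mean = mu_x * lam                            # mu * nu = 8.0
theory_var = lam * sigma_x ** 2 + mu_x ** 2 * lam   # nu*sigma^2 + mu^2*tau^2 = 25.0
```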
Let \(Z\) = (closing price on the \((n+1)\)th day) minus (closing price on the \(n\)th day). Then \( Z = \sum_{i=1}^{N} X_i \), where \(X_i\) denotes the price change due to the \(i\)-th transaction, each independent with finite variance. As a result, the CLT applies, suggesting \(Z\) should be approximately normally distributed. This is the classical assumption used in many financial models. However, in real data, very small and very large changes happen more often than a normal distribution predicts, while medium-sized changes occur less often than expected. This might be because \(Z\) is really a random sum, where \(N\) is itself random. Let us assume \( N \sim \mathrm{Pois}(\lambda) \) and \( X_i \sim \mathcal{N}(0, \sigma^2) \). Then \( Z \mid N = n \sim \mathcal{N}(0, (n+1)\sigma^2) \) and \( p_N (n) = \frac{\lambda^n e^{-\lambda}}{n!} \). Combining these two equations, we get: $$ f_Z(z) = \sum_{n=0}^{\infty} \frac{\exp\!\left(-\frac{z^2}{2(n+1)\sigma^2}\right)}{\sqrt{2\pi(n+1)}\,\sigma}\, \frac{\lambda^n e^{-\lambda}}{n!}$$ The resulting distribution is not normal but still has mean 0 and finite variance. This model explains why we might see heavy tails (more extreme events) and sharper peaks in the distribution of stock price changes, without violating the independence or Gaussian assumptions per transaction. The only difference is adding randomness to how many such events happen per day. This random-sum model resembles real-world data better than the simple normal model. It is not a proof that randomness in the transaction count causes this, just a plausible explanation supported by the math.
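The heavy-tail claim can be checked in simulation. A sketch under assumed parameters (\( \lambda = 3 \), \( \sigma = 1 \)), drawing \( Z \) with \( n+1 \) Gaussian summands given \( N = n \) as in the mixture density above; the sample kurtosis \( E[Z^4]/\mathrm{Var}[Z]^2 \) should exceed the Gaussian value 3:

```python
# Simulate the random-sum model: N ~ Poisson(lam), Z | N=n ~ Normal(0, (n+1)*sigma^2).
# Exact kurtosis here is 3*(lam + (lam+1)^2) / (lam+1)^2 = 3.5625 > 3 (heavy tails).
import math
import random

random.seed(1)
lam, sigma = 3.0, 1.0

def sample_poisson(rate):
    """Poisson sample via Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

trials = 100_000
zs = []
for _ in range(trials):
    n = sample_poisson(lam)
    zs.append(sum(random.gauss(0.0, sigma) for _ in range(n + 1)))

mean_z = sum(zs) / trials
var_z = sum((z - mean_z) ** 2 for z in zs) / trials          # about (lam+1)*sigma^2 = 4
kurtosis = (sum((z - mean_z) ** 4 for z in zs) / trials) / var_z ** 2
```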
A stochastic process \( \{X_n : n = 0, 1, \ldots\} \) is said to be a martingale if, for all \( n = 0, 1, \ldots \), $$ (a)\ E[|X_n|] < \infty $$ $$ (b)\ E[X_{n+1}| X_0, X_1, \ldots , X_n ] = X_n $$
For part (b), taking expectations and using the tower property, $$ E[X_{n+1}] = E\big[E[X_{n+1}| X_0, X_1, \ldots , X_n ]\big] = E[X_n] $$ so by induction we observe that \( E[X_0] = E[X_k] \ \forall\, k = 0, 1, \ldots \)
The martingale property captures the idea of fairness in gambling: a playerβs expected fortune after the next play equals the current fortune, regardless of past outcomes. This reflects a fair game, where no betting strategy (like the classic "martingale" doubling approach) can guarantee profit in the long run. While martingale theory has roots in gambling, it now finds broad applications beyond it.
In financial markets, martingales model the notion of efficient pricing. For a security like a stock, if future prices could be predicted, buying or selling pressure would shift the current price. In an ideal, perfect market, such adjustments lead to a situation where future prices are, on average, unpredictable, making the price process a martingale.
The most basic examples of martingales are sums of independent, mean zero random variables. Let \( Y_0, Y_1, \ldots \) be a sequence of independent, identically distributed random variables such that \( E[Y_n] = 0 \). Then the sequence of partial sums
\[ X_n = \sum_{j=1}^{n} Y_j \] is a martingale relative to the sequence \( \{0, Y_1, Y_2, \ldots\} \) (that is, relative to the natural filtration generated by the variables \( Y_n \)). This is easily verified using the linearity and stability properties and the independence law for conditional expectation:
\[ \begin{align*} E[X_{n+1} \mid \mathcal{F}_n] &= E[X_n + Y_{n+1} \mid \mathcal{F}_n] \\ &= E[X_n \mid \mathcal{F}_n] + E[Y_{n+1} \mid \mathcal{F}_n] \\ &= X_n + 0 = X_n \end{align*} \]
The importance of martingales in modern probability stems at least in part from the fact that most of the essential properties of sums of independent, identically distributed random variables are inherited (with minor modification) by martingales.
The martingale difference sequence \( \{\varepsilon_n\} \), defined by \( \varepsilon_0 = X_0 \) and \( \varepsilon_n = X_n - X_{n-1} \) for \( n \geq 1 \), has the following properties:
(a) The random variable \( \varepsilon_n \) is \( \mathcal{F}_n \)-measurable;
(b) For every \( n \geq 0 \), \( E[\varepsilon_{n+1} \mid \mathcal{F}_n] = 0 \).
Proof: Assertion (b) is a three-line calculation using the properties of conditional expectation and the definition of a martingale: $$ E[\varepsilon_{n+1} \mid \mathcal{F}_n] = E[X_{n+1} - X_n \mid \mathcal{F}_n] = E[X_{n+1} \mid \mathcal{F}_n] - X_n = X_n - X_n = 0 $$
Let \( \{X_n\} \) be a martingale relative to \( \{Y_n\} \), with martingale difference sequence \( \{\varepsilon_n\} \).
Then for every \( n \geq 0 \),
\( E[X_n] = E[X_0] \).
Moreover, if \( X_0 = 0 \) and \( E[X_n^2] < \infty \) then the random variables \( \varepsilon_j \) are uncorrelated, and so
\[ E[X_n^2] = \sum_{j=1}^{n} E[\varepsilon_j^2] \]
Proof: The first property follows almost trivially from Proposition 1 and the Expectation Law for conditional expectation, as these together imply that \( E[\varepsilon_n] = 0 \) for each \( n \). Summing and using linearity of expectation gives the result.
The second property follows by observing that each \( \varepsilon_j \) has finite variance (since it is the difference of two variables with finite second moments); by Cauchy-Schwarz, the cross-products then have finite first moments. For \( j < k \leq n \), \( \varepsilon_j \) is \( \mathcal{F}_{k-1} \)-measurable, so by conditioning, \( E[\varepsilon_j \varepsilon_k] = E\big[\varepsilon_j \, E[\varepsilon_k \mid \mathcal{F}_{k-1}]\big] = 0 \).
Hence the variance of \( X_n \) is the sum of variances of the \( \varepsilon_j \), analogous to the case of i.i.d. mean-zero sums.
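A quick empirical check of this variance decomposition (simple \( \pm 1 \) fair-coin differences are an assumed example): with \( X_n = \varepsilon_1 + \cdots + \varepsilon_n \) and \( E[\varepsilon_j^2] = 1 \), we expect \( E[X_n^2] = n \).

```python
# Empirical check that E[X_n^2] = sum_j E[eps_j^2] for a martingale with
# fair +/-1 coin-flip differences: here E[eps_j^2] = 1, so E[X_n^2] = n.
import random

random.seed(5)
n, trials = 30, 50_000
second_moment = 0.0
for _ in range(trials):
    x = sum(random.choice((-1, 1)) for _ in range(n))
    second_moment += x * x
second_moment /= trials   # should be close to n = 30
```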
Let \( \{X_n\}_{n \geq 0} \) be a non-negative submartingale. Then for any \( \lambda > 0 \),
\[ \mathbb{P}\left( \max_{0 \leq k \leq n} X_k \geq \lambda \right) \leq \frac{\mathbb{E}[X_n]}{\lambda} \]
Proof:
Let \( X_0, X_1, \ldots, X_n \) be non-negative random variables forming a submartingale, i.e., \( E[X_{k+1} \mid \mathcal{F}_k] \geq X_k \) for each \( k \).
We want to prove the inequality:
\[ \mathbb{P}\left(\max_{0 \le k \le n} X_k \ge \lambda\right) \le \frac{\mathbb{E}[X_n]}{\lambda}, \quad \text{for all } \lambda > 0. \]
Define the event:
\( A := \left\{ \max_{0 \le k \le n} X_k \ge \lambda \right\} \).
For each \( k \), define the indicator:
\[ I_k := \begin{cases} 1, & \text{if } X_k \ge \lambda \text{ and } X_j < \lambda \text{ for all } j < k, \\ 0, & \text{otherwise}. \end{cases} \]
Then exactly one \( I_k = 1 \) on the event \( A \), so \( \sum_{k=0}^n I_k = \mathbf{1}_A \).
Now, using the fact that \( X_k \geq \lambda \) on the event \( \{I_k = 1\} \), that \( I_k \) is \( \mathcal{F}_k \)-measurable, and the submartingale property \( E[X_n \mid \mathcal{F}_k] \geq X_k \):
\[ \mathbb{E}[X_n] \ge \mathbb{E}[X_n \mathbf{1}_A] = \sum_{k=0}^{n} \mathbb{E}[X_n I_k] \ge \sum_{k=0}^{n} \mathbb{E}[X_k I_k] \ge \lambda \sum_{k=0}^{n} \mathbb{E}[I_k] = \lambda \cdot \mathbb{P}(A). \]
Rearranging gives the desired inequality:
\[ \mathbb{P}\left( \max_{0 \le k \le n} X_k \ge \lambda \right) \le \frac{\mathbb{E}[X_n]}{\lambda}. \]
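An empirical illustration of this inequality (the walk length and threshold are assumed parameters): if \( S_k \) is a fair \( \pm 1 \) random walk, then \( X_k = S_k^2 \) is a non-negative submartingale with \( \mathbb{E}[X_n] = n \), so the bound reads \( \mathbb{P}(\max_k S_k^2 \geq \lambda) \leq n/\lambda \) (Kolmogorov's inequality).

```python
# Check the maximal inequality for the non-negative submartingale X_k = S_k^2,
# where S_k is a fair +/-1 random walk: P(max_k S_k^2 >= lam) <= n / lam.
import random

random.seed(7)
n, lam = 50, 200.0
trials = 50_000
hits = 0
for _ in range(trials):
    s, peak = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        peak = max(peak, s * s)
    if peak >= lam:
        hits += 1

p_hat = hits / trials
bound = n / lam   # E[X_n] / lam = 0.25
```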
We start with an urn containing 1 red and 1 green ball. We play the following game repeatedly: draw a ball uniformly at random, note its color, then return it to the urn together with one new ball of the same color.
This process continues indefinitely. Let \(X_n\) be the fraction of red balls after the \(n\)th draw.
At each step, the probability of drawing a red ball is \(X_n\), and the probability of drawing a green ball is \(1 - X_n\). The key point is:
The expected value of \(X_{n+1}\) given \(X_n\) is exactly \(X_n\). This makes \(X_n\) a martingale.
Although every player's sequence of red-ball fractions will stabilize to a fixed number, that number is random and varies across different games. In fact, the final limiting value is uniformly distributed on the interval (0, 1).
Let \(R_n\) be the number of red balls after the \(n\)th draw. Then the total number of balls is \(n + 2\). The fraction of red balls is:
\(X_n = R_n / (n + 2)\)

At step \(n+1\), a red ball is drawn with probability \(X_n\), in which case a red ball is added; otherwise a green ball is added. So the expected number of red balls after the next draw is:

\(E[R_{n+1} \mid X_n] = R_n + X_n\)

Then:
\( E[X_{n+1} \mid X_n] = E[R_{n+1} / (n+3) \mid X_n] = (R_n + X_n) / (n + 3) = ((n + 2)X_n + X_n) / (n + 3) = X_n \)
This confirms that \(X_n\) is a martingale.
Let us compute the probability distribution of \(R_n\). By iteratively applying the rules, we find:
\(P(R_n = k) = 1 / (n + 1),\ \forall\, k = 1, 2, \ldots, n + 1\)

Therefore, the values \(X_n = k / (n + 2)\) are equally likely. In the limit as \(n \to \infty\), we get:

\(P(X \leq x) = x,\ \forall\, 0 < x < 1\)

So the limit \(X\) is uniformly distributed over \((0, 1)\).
This example shows how a simple probabilistic process can result in a complex and interesting distribution. It also illustrates the power of martingales in analyzing such processes, even though the randomness seems to grow over time.
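A simulation sketch of the urn scheme (run lengths and trial counts are illustrative choices): across many independent runs, the final red-ball fractions should look approximately Uniform(0, 1), so their mean should be near 0.5 and about half should fall below 0.5.

```python
# Simulate the Polya urn: start with 1 red, 1 green; each draw adds one ball
# of the drawn color. Record the red fraction after a fixed number of steps.
import random

random.seed(3)
trials, steps = 10_000, 200
finals = []
for _ in range(trials):
    red, total = 1, 2
    for _ in range(steps):
        if random.random() < red / total:   # red drawn with prob X_n
            red += 1                        # add a red ball
        total += 1                          # one ball is added either way
    finals.append(red / total)

mean_frac = sum(finals) / trials                          # Uniform(0,1) mean is 0.5
frac_below_half = sum(f < 0.5 for f in finals) / trials   # should be about 0.5
```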
Use the law of total probability for conditional expectations to show:

\( E[X_{n+2} \mid X_0, \ldots, X_n] = E\big[\, E[X_{n+2} \mid X_0, \ldots, X_{n+1}] \mid X_0, \ldots, X_n \,\big] \)

Conclude that if \( (X_n) \) is a martingale, then:

\( E[X_{n+2} \mid X_0, \ldots, X_n] = X_n \)

Step 1: By the law of total expectation (a.k.a. tower property):

\( E[X_{n+2} \mid X_0, \ldots, X_n] = E\big[\, E[X_{n+2} \mid X_0, \ldots, X_{n+1}] \mid X_0, \ldots, X_n \,\big] \)

This is a standard identity in probability theory, showing how conditioning on more and then fewer variables returns to the coarser level of conditioning.

Step 2: If \( (X_n) \) is a martingale, then:

\( E[X_{n+2} \mid X_0, \ldots, X_{n+1}] = X_{n+1} \) and \( E[X_{n+1} \mid X_0, \ldots, X_n] = X_n \)

Using the result from Step 1:

\( E[X_{n+2} \mid X_0, \ldots, X_n] = E[X_{n+1} \mid X_0, \ldots, X_n] = X_n \)

Conclusion: If \( (X_n) \) is a martingale, then:

\( E[X_{n+2} \mid X_0, \ldots, X_n] = X_n \)
Consider a stochastic process that evolves according to the following rules:
If \( X_n = 0 \), then \( X_m = 0 \) for all \( m \geq n \) (state 0 is absorbing). If \( X_n > 0 \), then
\( X_{n+1} = X_n + 1 \) with probability 1/2, and \( X_{n+1} = X_n - 1 \) with probability 1/2.
(a) Show that \( X_n \) is a nonnegative martingale.
(b) Suppose that \( X_0 = i > 0 \). Use the maximal inequality to bound
\( \Pr(X_n \geq M \text{ for some } n \leq N) \). Note: \( X_n \) represents the fortune of a player in a fair game who wagers \$1 at each bet and quits if all money is lost (\( X_n = 0 \)).
(a) \( X_n \) is a nonnegative martingale. We check the two conditions for a martingale:
Since the \( X_n \) are bounded below by 0 and change by at most 1 at each step, \( E[|X_n|] < \infty \). If \( X_n = 0 \), then \( X_{n+1} = 0 \) with probability 1, so \( E[X_{n+1} \mid X_n] = 0 = X_n \). If \( X_n > 0 \), then:
\( E[X_{n+1} \mid X_n] = (1/2)(X_n + 1) + (1/2)(X_n - 1) = X_n \)
Thus, \( X_n \) is a nonnegative martingale.
(b) We use the maximal inequality for nonnegative martingales:
\( \Pr\!\left(\max_{0 \leq k \leq N} X_k \geq M\right) \leq \frac{E[X_N]}{M} \)
Since \( X_n \) is a martingale with \( X_0 = i \), we have \( E[X_N] = i \). Therefore:
\( \Pr\!\left(\max_{0 \leq k \leq N} X_k \geq M\right) \leq \frac{i}{M} \)
Example: If \( M = 2i \), then the probability that the gambler ever reaches twice the initial fortune is at most 1/2 (the bound holds for every \( N \), hence also in the limit).
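A simulation check of part (b), with assumed parameters \( i = 10 \), \( M = 30 \), and a long horizon \( N \): the maximal inequality guarantees the hitting probability is at most \( i/M \), and for this fair game the bound is essentially attained, since by the gambler's-ruin argument \( \Pr(\text{reach } M \text{ before ruin}) = i/M \) exactly.

```python
# Simulate the gambler's fortune: fair +/-1 steps, absorbed at 0, and record
# whether the fortune reaches M within N plays. The maximal inequality gives
# P(max X_k >= M) <= i/M, and here the bound is tight in the large-N limit.
import random

random.seed(11)
i, M, N = 10, 30, 2_000
trials = 20_000
hits = 0
for _ in range(trials):
    x = i
    for _ in range(N):
        if x == 0:
            break            # ruined: fortune stays at 0
        x += random.choice((-1, 1))
        if x >= M:
            hits += 1
            break            # reached the target M

p_hat = hits / trials
bound = i / M   # = 1/3
```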
Last updated: May 15, 2025