Chapter 1: Introduction to Stochastic Processes

Study notes and chapter summaries

1. What is a Stochastic Process?

Derived from the Greek, "stochastic" means "random" or "chance"; its antonym is "sure", "deterministic", or "certain". While a deterministic model ensures a single outcome from a given set of circumstances, a stochastic model predicts a set of possible outcomes together with their respective likelihoods, or probabilities. A coin tossed in the air will surely land somewhere nearby, but whether it lands heads or tails is random, with each outcome assigned a probability of 0.5. It is interesting to note that phenomena are not themselves stochastic or deterministic; the distinction lies in how the observer chooses to model them. For example, changes in the level of a large population are often modeled deterministically, even though we may agree that many chance events contribute to their fluctuations.

As for the definition: a stochastic process is a family of random variables \( X_t \), where \( t \) is a parameter running over a suitable index set \( T \). Usually, \( t \) indexes discrete units of time, and the index set becomes \( T = \{0, 1, 2, \ldots\} \). Stochastic processes with \( T = [0, \infty) \) are important in many applications.

2. Review of Probability

2.1. Random Variables:

Let \( \Omega \) be a non-empty set equipped with a \( \sigma \)-algebra \( \mathcal{F} \). A random variable is a function \( X : \Omega \to \mathbb{R} \) such that for every Borel set \( A \subseteq \mathbb{R} \), we have the pre-image \( X^{-1}(A) \in \mathcal{F} \); i.e., \(X\) is \(\mathcal{F}\) - measurable.

2.2. Continuous and Discrete Random Variables:

  • CDF: Let \(X\) be a random variable. Then its Cumulative Distribution Function (CDF) is defined as \(F : \mathbb{R} \to [0,1] \) given by $$ F(x)= P(X \leq x), \quad x \in \mathbb{R} $$
  • Continuous Distribution: A random variable is said to be continuous if its CDF is a continuous function.
  • Discrete Distribution: A random variable is said to be discrete if its CDF is a step function, constant except for jumps at the points of its (countable) support.
  • Density function: A function \(f : \mathbb{R} \to [0, \infty) \) is said to be the density function of a random variable with CDF \(F\) iff $$ F(x) = \int_{-\infty}^{x} f(t)\, dt \quad \forall\, x \in \mathbb{R}. $$ In particular, \( \int_{-\infty}^{\infty} f(x)\, dx = 1 \).
  • Probability Mass Function (PMF): Let \(X\) be a discrete random variable taking values \( x_1, x_2, \ldots \) with probabilities \( p_1, p_2, \ldots \), respectively. The Probability Mass Function (PMF) of \( X \) is given as \( p : \mathbb{R} \to [0, 1] \), where \[ p(x) = \begin{cases} p_i & \text{if } x = x_i \text{ for some } i, \\ 0 & \text{otherwise,} \end{cases} \] with \( \sum_i p_i = 1 \).
2.3. Moments and Expectations

If \(X\) is a discrete rv, its \(m\)-th moment is given by $$ E[X^m] = \sum_{i} x_i^m \, \Pr(X = x_i). $$ For a continuous rv, the \(m\)-th moment is given by $$ E[X^m] = \int_{-\infty}^{\infty} x^m \, f(x)\, dx. $$ The \(m\)-th central moment is defined as the \(m\)-th moment of the random variable \(X - \mu_X \), provided that \(\mu_X = E[X]\) exists finitely.
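A minimal sketch of these formulas for a discrete rv; the support and probabilities below are illustrative, not from the notes.

```python
# Illustrative PMF: P(X = 0) = 0.1, P(X = 1) = 0.3, P(X = 2) = 0.4, P(X = 3) = 0.2
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

def moment(values, probs, m):
    """m-th moment: E[X^m] = sum_i x_i^m * Pr(X = x_i)."""
    return sum(x**m * p for x, p in zip(values, probs))

def central_moment(values, probs, m):
    """m-th central moment: E[(X - mu)^m], with mu = E[X]."""
    mu = moment(values, probs, 1)
    return sum((x - mu)**m * p for x, p in zip(values, probs))

mean = moment(values, probs, 1)              # E[X] = 1.7 for this PMF
variance = central_moment(values, probs, 2)  # second central moment
```

The same two functions give skewness and kurtosis numerators by passing `m = 3` or `m = 4`.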

    3. Some Discrete Distributions

| Distribution | Notation | PMF | Characteristic Function |
| --- | --- | --- | --- |
| Bernoulli | \( X \sim \mathrm{Bern}(p) \) | \( p(x) = p^x(1-p)^{1-x},\ x \in \{0,1\} \) | \( \varphi_X(t) = 1 - p + pe^{it} \) |
| Binomial | \( X \sim \mathrm{Bin}(n, p) \) | \( p(x) = \binom{n}{x}p^x(1-p)^{n-x},\ x \in \{0, 1, \ldots, n\} \) | \( \varphi_X(t) = (1 - p + pe^{it})^n \) |
| Poisson | \( X \sim \mathrm{Pois}(\lambda) \) | \( p(x) = \frac{e^{-\lambda} \lambda^x}{x!},\ x \in \{0, 1, \ldots\} \) | \( \varphi_X(t) = \exp(\lambda(e^{it} - 1)) \) |
| Geometric | \( X \sim \mathrm{Geom}(p) \) | \( p(x) = p(1-p)^{x-1},\ x \in \{1,2, \ldots\} \) | \( \varphi_X(t) = \frac{pe^{it}}{1 - (1 - p)e^{it}} \) |
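A numerical sanity check of one tabulated characteristic function (the Poisson row, with \(\lambda\) and \(t\) chosen arbitrarily): compare the closed form \( \exp(\lambda(e^{it}-1)) \) against a direct evaluation of \( E[e^{itX}] = \sum_x e^{itx} p(x) \) from the PMF.

```python
import cmath
import math

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

def cf_from_pmf(t, lam, terms=100):
    # Truncated series for E[e^{itX}]; the tail beyond 100 terms is negligible
    # for moderate lambda.
    return sum(cmath.exp(1j * t * x) * poisson_pmf(x, lam) for x in range(terms))

def cf_closed_form(t, lam):
    # Tabulated form: exp(lambda * (e^{it} - 1))
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

t, lam = 0.7, 3.0
direct = cf_from_pmf(t, lam)
closed = cf_closed_form(t, lam)
```

The same pattern (sum \(e^{itx}p(x)\) over the support) checks any of the discrete rows.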

    4. Some Continuous Distributions

| Distribution | Notation | PDF | Characteristic Function |
| --- | --- | --- | --- |
| Uniform | \( X \sim \mathrm{Unif}(a,b) \) | \( f(x) = \frac{1}{b-a},\ x \in [a,b] \) | \( \varphi_X(t) = \frac{e^{itb}-e^{ita}}{it(b-a)} \) |
| Exponential | \( X \sim \mathrm{Exp}(\lambda) \) | \( f(x) = \lambda e^{-\lambda x},\ x \geq 0 \) | \( \varphi_X(t) = \frac{\lambda}{\lambda - it} \) |
| Gamma | \( X \sim \mathrm{Gamma}(\alpha, p) \) (shape \( \alpha \), rate \( p \)) | \( f(x) = \frac{p^{\alpha}x^{\alpha -1}e^{-px}}{\Gamma(\alpha)},\ x > 0 \) | \( \varphi_X(t) = \left(\frac{p}{p-it}\right)^{\alpha} \) |
| Beta | \( X \sim \mathrm{Beta}(a, b) \) | \( f(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)},\ x \in (0,1) \) | No elementary closed form |
| Cauchy | \( X \sim \mathrm{Cauchy}(x_0, \gamma) \) | \( f(x) = \frac{1}{\pi \gamma \left[1 + \left(\frac{x - x_0}{\gamma}\right)^2 \right]} \) | \( \varphi_X(t) = e^{i x_0 t - \gamma |t|} \) |
| Normal (univariate) | \( X \sim \mathcal{N}(\mu, \sigma^2) \) | \( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{ -\frac{(x - \mu)^2}{2\sigma^2} } \) | \( \varphi_X(t) = e^{i \mu t - \frac{1}{2} \sigma^2 t^2} \) |
| Normal (multivariate) | \( \mathbf{X} \sim \mathcal{N}_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \) | \( f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \) | \( \varphi_{\mathbf{X}}(\mathbf{t}) = \exp\left( i \mathbf{t}^\top \boldsymbol{\mu} - \frac{1}{2} \mathbf{t}^\top \boldsymbol{\Sigma} \mathbf{t} \right) \) |
| Chi-square | \( X \sim \chi^2_k \) | \( f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2},\ x > 0 \) | \( \varphi_X(t) = (1 - 2it)^{-k/2} \) |
| t | \( T \sim t_\nu \) | \( f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi} \ \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2} \) | No simple closed form; the \(k\)-th moment exists only when \( \nu > k \) |
| F | \( F \sim F_{d_1, d_2} \) | \( f(x) = \frac{1}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \left( \frac{d_1}{d_2} \right)^{d_1/2} \frac{x^{d_1/2 - 1}}{\left(1 + \frac{d_1}{d_2} x \right)^{(d_1 + d_2)/2}},\ x > 0 \) | No simple closed form; the \(k\)-th moment exists only if \( d_2 > 2k \) |
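A continuous-case check in the same spirit (parameters arbitrary): the \(\mathrm{Exp}(\lambda)\) characteristic function \( \varphi_X(t) = \lambda/(\lambda - it) \) against trapezoid-rule integration of \( e^{itx}\,\lambda e^{-\lambda x} \) over a truncated half-line.

```python
import cmath
import math

lam, t = 2.0, 1.3

def integrand(x):
    # e^{itx} * f(x) for the Exp(lam) density
    return cmath.exp(1j * t * x) * lam * math.exp(-lam * x)

# Trapezoid rule on [0, upper]; the tail beyond 40 is ~e^{-80}, negligible.
h, upper = 0.0005, 40.0
n = int(upper / h)
total = 0.5 * (integrand(0.0) + integrand(upper))
for i in range(1, n):
    total += integrand(i * h)
numeric = total * h

closed = lam / (lam - 1j * t)
```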

    5. Inequalities at a glance

    5.1 The Schwarz Inequality

Let \(X\) and \(Y\) have finite second moments. Then $$ \left(E[XY]\right)^2 \leq E[X^2]\,E[Y^2] $$

    5.2 The Markov Inequality

    Let X be a non-negative random variable, and t be a positive number. Then $$ P(X \geq t) \leq \frac{E[X]}{t} $$

5.3 Chebyshev's Inequality

    Let X be a random variable with mean \( \mu \) and finite variance \( \sigma^2 \) . Then for any real number t > 0 $$ P(|X- \mu| \geq t) \leq \frac{\sigma^2}{t^2} $$
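The two bounds can be checked against a distribution whose tails are known exactly. A sketch for \( X \sim \mathrm{Exp}(1) \) (an illustrative choice, since \( P(X \geq t) = e^{-t} \) in closed form, with mean and variance both 1):

```python
import math

mean, var = 1.0, 1.0                  # Exp(1): E[X] = 1, Var[X] = 1

t = 3.0
exact_tail = math.exp(-t)             # P(X >= 3), about 0.0498
markov = mean / t                     # Markov bound: E[X]/t = 1/3

# |X - 1| >= 2 happens iff X >= 3, since X >= 0 rules out X <= -1
exact_dev = math.exp(-3.0)            # P(|X - mu| >= 2)
chebyshev = var / 2.0**2              # Chebyshev bound: sigma^2 / t^2 = 1/4
```

Both bounds hold but are loose here, which is typical: they use only one or two moments, not the whole distribution.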

    6. Random Sums

Let \( X_1, X_2, \ldots \) be a sequence of IID random variables, and let \( N \) be a discrete random variable, independent of all the \(X_i\), having PMF \( p_N(n) = \Pr(N = n), \ n \in \{0, 1, 2, \ldots\} \). We define the random sum \(S_N\) as $$ S_N = \sum_{i=1}^{N} X_i, $$ with the convention \( S_0 = 0 \) when \( N = 0 \).

    Examples

    (a) Queueing: N= number of customers arriving at a service facility in a given time period, \(X_i\) = Service time for the i-th customer. Then \(S_N\) = Total demand for service time.

    (b) Risk Theory: N= total number of claims arriving at insurance company, \(X_i\)= amount of the i-th claim. Then \(S_N\) = Total liability of the insurance company.

    7. Moments of a Random Sum

    Let's assume that \(E[X_i] = \mu, E[N] = \nu, Var[X_i] = \sigma^2 , Var[N] = \tau^2. \)

We now compute \(E[S_N]\) by conditioning on \(N\): $$ E[S_N] = \sum_{n=0}^{\infty} E[S_N \mid N=n]\, p_N(n) $$ $$ = \sum_{n=1}^{\infty} E[X_1 + X_2 + \cdots + X_n \mid N=n]\, p_N(n) $$ $$ = \sum_{n=1}^{\infty} E[X_1 + X_2 + \cdots + X_n]\, p_N(n) \qquad (\text{independence of } N \text{ and the } X_i) $$ $$ = \sum_{n=1}^{\infty} n\mu\, p_N(n) = \mu \sum_{n=1}^{\infty} n\, p_N(n) = \mu \nu. $$

    We can as well find \( Var[S_N] \) which turns out to be $$ Var[S_N] = \mu^2 \tau^2 + \nu\sigma^2 $$
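A Monte Carlo sketch of both moment formulas; the distribution choices are illustrative: \( N \sim \mathrm{Pois}(2) \) (so \( \nu = \tau^2 = 2 \)) and \( X_i \sim \mathrm{Exp} \) with mean 3 (so \( \mu = 3, \sigma^2 = 9 \)).

```python
import math
import random

random.seed(0)

lam = 2.0                 # nu = E[N] = lam and tau^2 = Var[N] = lam for Poisson
mu, sigma2 = 3.0, 9.0     # Exp with rate 1/3: mean 3, variance 9

def poisson(lam):
    # Knuth's multiplication method for sampling a Poisson variate.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def random_sum():
    n = poisson(lam)
    return sum(random.expovariate(1 / 3.0) for _ in range(n))

samples = [random_sum() for _ in range(200_000)]
mean_hat = sum(samples) / len(samples)
var_hat = sum((s - mean_hat) ** 2 for s in samples) / len(samples)

theory_mean = mu * lam                    # mu * nu = 6
theory_var = mu**2 * lam + lam * sigma2   # mu^2 tau^2 + nu sigma^2 = 36
```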

    8. The Distribution of a Random Sum

Consider the previous setup of a sum of random variables. We are now interested in the distribution of the partial sums, given that each \(X_i\) has density \(f(x)\). We know $$ f_{S_1}(x) = f_{X_1}(x) = f(x), $$ $$ f_{S_2}(x) = f_{X_1 + X_2}(x), $$ which can be found using the change of variables \( (Y_1, Y_2) = (X_1, X_1 + X_2) \); this turns out to be the convolution $$ f_{S_2}(x) = \int_{-\infty}^{\infty} f(z) f(x-z)\, dz. $$ Similarly, $$ f_{S_n}(x) = \int_{-\infty}^{\infty} f_{S_{n-1}}(x-z) f(z)\, dz. $$
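The convolution recursion can be checked numerically. A sketch for \( f = \mathrm{Exp}(1) \) (chosen because the two-fold convolution is known exactly: \( f_{S_2}(x) = x e^{-x} \), the \(\mathrm{Gamma}(2,1)\) density):

```python
import math

def f(x):
    # Exp(1) density
    return math.exp(-x) if x >= 0 else 0.0

def convolve_at(g, x, h=0.001, upper=50.0):
    """Trapezoid-rule approximation of int_0^upper g(x - z) f(z) dz."""
    n = int(upper / h)
    total = 0.5 * (g(x) * f(0.0) + g(x - upper) * f(upper))
    for i in range(1, n):
        z = i * h
        total += g(x - z) * f(z)
    return total * h

x = 1.5
numeric = convolve_at(f, x)        # approximates f_{S_2}(1.5)
exact = x * math.exp(-x)           # Gamma(2, 1) density at x
```

Iterating `convolve_at` on the result would give \( f_{S_3}, f_{S_4}, \ldots \) in the same way.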

    8.1 On Stock Price Changes:

Let \(Z\) = (closing price on the \((n+1)\)-th day) \(-\) (closing price on the \(n\)-th day). Then \( Z = \sum_{i=1}^{N} X_i \), where \(X_i\) denotes the price change due to the \(i\)-th transaction, and the \(X_i\) are independent with finite variance. If the number of transactions were large and fixed, the CLT would apply, suggesting \(Z\) should be approximately normally distributed; this is the classical assumption used in many financial models. In real data, however, very small and very large changes happen more often than a normal distribution predicts, while medium-sized changes occur less often than expected.

This might be because \(Z\) is really a random sum, where \(N\) is itself random. Let us assume \( N \sim \mathrm{Pois}(\lambda) \) and \( X_i \sim \mathcal{N}(0, \sigma^2) \). Then \( Z \mid N = n \sim \mathcal{N}(0, (n+1)\sigma^2) \) and \( p_N(n) = \frac{\lambda^n e^{-\lambda}}{n!} \). Combining these two equations, we get $$ f_Z(z) = \sum_{n=0}^{\infty} \frac{\exp\left(-\frac{z^2}{2(n+1)\sigma^2}\right)}{\sqrt{2\pi(n+1)}\,\sigma} \cdot \frac{\lambda^n e^{-\lambda}}{n!}. $$

The resulting distribution is not normal but still has mean 0 and finite variance. This model explains why we might see heavy tails (more extreme events) and sharper peaks in the distribution of stock price changes, without violating the independence or Gaussian assumptions per transaction; the only difference is that the number of such events per day is itself random. This random-sum model produces a distribution that resembles real-world data better than the simple normal model. It is not a proof that randomness in the transaction count causes the effect, just a plausible explanation supported by the math.

Figure 1: The dashed line (random sum model) is more peaked and heavier-tailed than the solid normal curve, matching real stock data more closely.

9. Martingales

    9.1 Definition

A stochastic process \( \{X_n : n = 0, 1, \ldots\} \) is said to be a martingale if \( \forall\, n = 0, 1, \ldots \) $$ \text{(a)}\ E[|X_n|] < \infty $$ $$ \text{(b)}\ E[X_{n+1}| X_0, X_1, \ldots , X_n ] = X_n $$

For part (b), taking expectations on both sides gives $$ E[E[X_{n+1}| X_0, X_1, \ldots , X_n ]] = E[X_n]; $$ on the other hand, by the tower property, $$ E[E[X_{n+1}| X_0, X_1, \ldots , X_n ]] = E[X_{n+1}]. $$ Thus \( E[X_{n+1}] = E[X_n] \) for every \(n\), and by induction \( E[X_0] = E[X_k] \ \forall\, k = 0, 1, \ldots \)

    9.2 Martingales and Fairness in Gambling & Markets

The martingale property captures the idea of fairness in gambling: a player's expected fortune after the next play equals the current fortune, regardless of past outcomes. This reflects a fair game, where no betting strategy (like the classic "martingale" doubling approach) can guarantee profit in the long run. While martingale theory has roots in gambling, it now finds broad applications beyond it.

    In financial markets, martingales model the notion of efficient pricing. For a security like a stock, if future prices could be predicted, buying or selling pressure would shift the current price. In an ideal, perfect market, such adjustments lead to a situation where future prices are, on average, unpredictable, making the price process a martingale.

    9.3 Martingales and Martingale Difference Sequences

    The most basic examples of martingales are sums of independent, mean zero random variables. Let \( Y_0, Y_1, \ldots \) be a sequence of independent, identically distributed random variables such that \( E[Y_n] = 0 \). Then the sequence of partial sums

    \[ X_n = \sum_{j=1}^{n} Y_j \] is a martingale relative to the sequence \( \{0, Y_1, Y_2, \ldots\} \) (that is, relative to the natural filtration generated by the variables \( Y_n \)). This is easily verified using the linearity and stability properties and the independence law for conditional expectation:

    \[ \begin{align*} E[X_{n+1} \mid \mathcal{F}_n] &= E[X_n + Y_{n+1} \mid \mathcal{F}_n] \\ &= E[X_n \mid \mathcal{F}_n] + E[Y_{n+1} \mid \mathcal{F}_n] \\ &= X_n + 0 = X_n \end{align*} \]
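An empirical sketch of this example with the simplest mean-zero steps, \( Y_j = \pm 1 \) each with probability 1/2 (an illustrative choice): the derivation implies \( E[X_n] = 0 \) for every \(n\), which we can check across many simulated paths.

```python
import random

random.seed(1)
n_steps, n_paths = 50, 20_000

# totals[n] accumulates X_n over all paths, so totals[n] / n_paths
# estimates E[X_n].
totals = [0.0] * (n_steps + 1)
for _ in range(n_paths):
    x = 0
    for step in range(1, n_steps + 1):
        x += random.choice((-1, 1))   # Y_j = +/-1, mean zero
        totals[step] += x

means = [t / n_paths for t in totals]  # empirical E[X_n] for each n
```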

    The importance of martingales in modern probability stems at least in part from the fact that most of the essential properties of sums of independent, identically distributed random variables are inherited (with minor modification) by martingales.

    Proposition 1

The martingale difference sequence \( \{\varepsilon_n\} \), defined by \( \varepsilon_n := X_n - X_{n-1} \) for \( n \geq 1 \) (with \( \varepsilon_0 := X_0 \)), has the following properties:
    (a) The random variable \( \varepsilon_n \) is a function of \( \mathcal{F}_n \);
    (b) For every \( n \geq 0 \), \( E[\varepsilon_{n+1} \mid \mathcal{F}_n] = 0 \).

Proof: Assertion (a) is immediate from the definition of \( \varepsilon_n \). Assertion (b) is a three-line calculation using the properties of conditional expectation and the definition of a martingale: $$ E[\varepsilon_{n+1} \mid \mathcal{F}_n] = E[X_{n+1} - X_n \mid \mathcal{F}_n] = E[X_{n+1} \mid \mathcal{F}_n] - X_n = X_n - X_n = 0. $$

    Corollary 1

    Let \( \{X_n\} \) be a martingale relative to \( \{Y_n\} \), with martingale difference sequence \( \{\varepsilon_n\} \). Then for every \( n \geq 0 \),
    \( E[X_n] = E[X_0] \).
    Moreover, if \( X_0 = 0 \) and \( E[X_n^2] < \infty \) then the random variables \( \varepsilon_j \) are uncorrelated, and so

    \[ E[X_n^2] = \sum_{j=1}^{n} E[\varepsilon_j^2] \]

Proof: The first property follows almost trivially from Proposition 1 and the Expectation Law for conditional expectation, as these together imply that \( E[\varepsilon_n] = 0 \) for each \( n \). Summing and using linearity of expectation gives the result.

The second property follows by observing that each \( \varepsilon_j \) has finite variance (since it is the difference of two variables with finite second moments). Using Cauchy-Schwarz, we can ensure the cross-products have finite first moments. Then, for \( j < k \leq n \), \( \varepsilon_j \) is \( \mathcal{F}_{k-1} \)-measurable, so by the tower property, \( E[\varepsilon_j \varepsilon_k] = E\big[\varepsilon_j\, E[\varepsilon_k \mid \mathcal{F}_{k-1}]\big] = E[\varepsilon_j \cdot 0] = 0 \).

    Hence the variance of \( X_n \) is the sum of variances of the \( \varepsilon_j \), analogous to the case of i.i.d. mean-zero sums.
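A quick numerical check of Corollary 1 for the \(\pm 1\) random walk (an illustrative case): each difference satisfies \( \varepsilon_j^2 = 1 \), so the corollary predicts \( E[X_n^2] = n \) exactly.

```python
import random

random.seed(7)
n, paths = 30, 50_000

sq_sum = 0.0
for _ in range(paths):
    # X_n after n fair +/-1 steps, starting from X_0 = 0
    x = sum(random.choice((-1, 1)) for _ in range(n))
    sq_sum += x * x

second_moment = sq_sum / paths   # should be close to n = 30
```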

    9.4 Maximal Inequality for Non-negative martingales

Let \( \{X_n\}_{n \geq 0} \) be a non-negative submartingale (in particular, any non-negative martingale qualifies). Then for any \( \lambda > 0 \),

    \[ \mathbb{P}\left( \max_{0 \leq k \leq n} X_k \geq \lambda \right) \leq \frac{\mathbb{E}[X_n]}{\lambda} \]

    Proof:

Let \( X_0, X_1, \ldots, X_n \) be non-negative random variables forming a submartingale with respect to a filtration \( \{\mathcal{F}_k\} \); that is, \( X_k \geq 0 \) and \( \mathbb{E}[X_{k+1} \mid \mathcal{F}_k] \geq X_k \) for each \( k \).

    We want to prove the inequality:

    \[ \mathbb{P}\left(\max_{0 \le k \le n} X_k \ge \lambda\right) \le \frac{\mathbb{E}[X_n]}{\lambda}, \quad \text{for all } \lambda > 0. \]

    Define the event:

    \( A := \left\{ \max_{0 \le k \le n} X_k \ge \lambda \right\} \).

    For each \( k \), define the indicator:

    \[ I_k := \begin{cases} 1, & \text{if } X_k \ge \lambda \text{ and } X_j < \lambda \text{ for all } j < k, \\ 0, & \text{otherwise}. \end{cases} \]

    Then exactly one \( I_k = 1 \) on the event \( A \), so \( \sum_{k=0}^n I_k = \mathbf{1}_A \).

Now, for each \( k \), the indicator \( I_k \) is determined by \( X_0, \ldots, X_k \), i.e. it is \( \mathcal{F}_k \)-measurable. By the submartingale property, \( \mathbb{E}[X_n \mid \mathcal{F}_k] \geq X_k \), so

\[ \mathbb{E}[X_n I_k] = \mathbb{E}\big[\mathbb{E}[X_n \mid \mathcal{F}_k]\, I_k\big] \geq \mathbb{E}[X_k I_k] \geq \lambda\, \mathbb{E}[I_k], \]

where the last step uses \( X_k \geq \lambda \) on the event \( \{I_k = 1\} \). Since \( X_n \geq 0 \) and \( \sum_{k=0}^{n} I_k = \mathbf{1}_A \),

\[ \mathbb{E}[X_n] \geq \mathbb{E}\left[ X_n \mathbf{1}_A \right] = \sum_{k=0}^{n} \mathbb{E}[X_n I_k] \geq \lambda \sum_{k=0}^{n} \mathbb{E}[I_k] = \lambda\, \mathbb{P}(A). \]

    Rearranging gives the desired inequality:

    \[ \mathbb{P}\left( \max_{0 \le k \le n} X_k \ge \lambda \right) \le \frac{\mathbb{E}[X_n]}{\lambda}. \]

    9.5 Urn Game and Martingale

    Intuitive Explanation

We start with an urn containing 1 red and 1 green ball. We play the following game repeatedly: draw a ball uniformly at random from the urn, note its colour, and return it together with one additional ball of the same colour.

    This process continues indefinitely. Let \(X_n\) be the fraction of red balls after the \(n\)th draw.

    At each step, the probability of drawing a red ball is \(X_n\), and green is \(1 - X_n\). The key point is:

    The expected value of \(X_{n+1}\) given \(X_n\) is exactly \(X_n\). This makes \(X_n\) a martingale.

    Although every player's sequence of red-ball fractions will stabilize to a fixed number, that number is random and varies across different games. In fact, the final limiting value is uniformly distributed on the interval (0, 1).

    Analytical Proof

    Let \(R_n\) be the number of red balls after the \(n\)th draw. Then the total number of balls is \(n + 2\). The fraction of red balls is:

    \(X_n = R_n / (n + 2)\)

At step \(n+1\), a red ball is drawn with probability \(X_n\), in which case \(R_{n+1} = R_n + 1\); otherwise \(R_{n+1} = R_n\).

So the expected number of red balls after the next draw is:

\(E[R_{n+1} | X_n] = (R_n + 1)X_n + R_n(1 - X_n) = R_n + X_n\)

    Then:

    \( E[X_{n+1} | X_n] = E[R_{n+1} / (n+3) | X_n] = (R_n + X_n) / (n + 3) \)

    \(= ((n + 2)X_n + X_n) / (n + 3) = X_n\)

    This confirms that \(X_n\) is a martingale.

    Limit Distribution

    Let us compute the probability distribution of \(R_n\). By iteratively applying the rules, we find:

    \(P(R_n = k) = 1 / (n + 1), ∀ k = 1, 2, ..., n + 1\)

Therefore, the values \(X_n = k / (n + 2)\), \(k = 1, \ldots, n+1\), are equally likely. In the limit as \(n → ∞\), we get:

\(P(X ≤ x) = x, ∀ \ 0 < x < 1\)

    So the limit \(X\) is uniformly distributed over \((0, 1)\).
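The claim \( P(R_n = k) = 1/(n+1) \) can be verified exactly (no simulation needed) by propagating the distribution of \(R_n\) through the urn rules with rational arithmetic; a sketch:

```python
from fractions import Fraction

def red_distribution(n):
    """Exact distribution of R_n; before any draw, R_0 = 1 red out of 2 balls."""
    dist = {1: Fraction(1)}
    for m in range(n):
        total = m + 2                      # balls in the urn before draw m+1
        new = {}
        for r, p in dist.items():
            p_red = Fraction(r, total)     # draw red: one more red ball
            new[r + 1] = new.get(r + 1, Fraction(0)) + p * p_red
            new[r] = new.get(r, Fraction(0)) + p * (1 - p_red)
        dist = new
    return dist

d10 = red_distribution(10)   # uniform on {1, ..., 11}, each with prob 1/11
```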

    Conclusion

    This example shows how a simple probabilistic process can result in a complex and interesting distribution. It also illustrates the power of martingales in analyzing such processes, even though the randomness seems to grow over time.

10. A Few Problems for Revision:

    Question 1:

    Use the law of total probability for conditional expectations to show:

E[Xₙ₊₂ | X₀, ..., Xₙ] = E[ E[Xₙ₊₂ | X₀, ..., Xₙ₊₁] | X₀, ..., Xₙ ]

Conclude that if (Xₙ) is a martingale, then:

E[Xₙ₊₂ | X₀, ..., Xₙ] = Xₙ
    
    Solution

Step 1: By the law of total expectation (a.k.a. the tower property):

E[Xₙ₊₂ | X₀, ..., Xₙ] = E[ E[Xₙ₊₂ | X₀, ..., Xₙ₊₁] | X₀, ..., Xₙ ]

This is a standard identity in probability theory, showing how conditioning on more and then fewer variables returns to the coarser level of conditioning.

Step 2: If (Xₙ) is a martingale, then:

E[Xₙ₊₂ | X₀, ..., Xₙ₊₁] = Xₙ₊₁
and
E[Xₙ₊₁ | X₀, ..., Xₙ] = Xₙ

Using the result from Step 1:

E[Xₙ₊₂ | X₀, ..., Xₙ] = E[Xₙ₊₁ | X₀, ..., Xₙ] = Xₙ

Conclusion: If (Xₙ) is a martingale, then:

E[Xₙ₊₂ | X₀, ..., Xₙ] = Xₙ
      
    Question 2:

Consider a stochastic process (Xₙ) that evolves according to the following rules: if Xₙ > 0, then Xₙ₊₁ = Xₙ + 1 or Xₙ − 1, each with probability 1/2; if Xₙ = 0, then Xₙ₊₁ = 0.

(a) Show that Xₙ is a nonnegative martingale.

(b) Suppose that X₀ = i > 0. Use the maximal inequality to bound Pr(Xₙ ≥ M for some n ≤ N). Note: Xₙ represents the fortune of a player in a fair game who wagers $1 at each bet and quits if all money is lost (Xₙ = 0).

Solution
(a) Show that Xₙ is a nonnegative martingale

We check two conditions for a martingale:

• Integrability: Since Xₙ ≥ 0 and Xₙ changes by at most 1 at each step, |Xₙ| ≤ X₀ + n, so E[|Xₙ|] < ∞.
• Martingale Property:
  • If Xₙ = 0, then Xₙ₊₁ = 0 with probability 1, so E[Xₙ₊₁ | Xₙ] = 0 = Xₙ.
  • If Xₙ > 0, then:

    E[Xₙ₊₁ | Xₙ] = (1/2)(Xₙ + 1) + (1/2)(Xₙ − 1) = Xₙ

Thus, Xₙ is a nonnegative martingale.

(b) Maximal Inequality Bound

We use the maximal inequality for nonnegative martingales:

Pr(max₀ ≤ k ≤ N Xₖ ≥ M) ≤ E[X_N] / M

Since Xₙ is a martingale with X₀ = i, we have E[X_N] = i. Therefore:

Pr(max₀ ≤ k ≤ N Xₖ ≥ M) ≤ i / M

Example: If M = 2i, the probability that the gambler's fortune reaches twice its initial value within the first N plays is at most 1/2; since the bound does not depend on N, the same holds for ever reaching 2i.
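A simulation sketch of part (b), with i = 10, M = 20, N = 200 chosen for illustration: the empirical chance of the fortune reaching M within N plays should respect the maximal-inequality bound i/M = 1/2.

```python
import random

random.seed(3)
i0, M, N, trials = 10, 20, 200, 20_000

hits = 0
for _ in range(trials):
    x = i0
    for _ in range(N):
        if x == 0:
            break                       # ruined: the player quits
        x += random.choice((-1, 1))     # fair $1 bet
        if x >= M:
            hits += 1
            break

freq = hits / trials    # empirical Pr(max_{0<=k<=N} X_k >= M)
bound = i0 / M          # maximal inequality: 0.5
```

For this process the bound is in fact tight as N grows: by the gambler's-ruin calculation, the probability of ever reaching 2i before ruin is exactly 1/2.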

    Last updated: May 15, 2025