Unlocking Random Variables: Expectation, Variance, and Higher-Order Moments Explained
This article introduces the key numerical characteristics of random variables—including expectation, variance, covariance, correlation, and higher-order moments such as skewness and kurtosis—explains their definitions, properties, and relationships, and extends these concepts to random vectors and matrix representations.
Numerical Characteristics of Random Variables
Definition: For a discrete random variable X with probability mass function p(x), its expectation is E[X] = \sum x\,p(x).
Definition: For a continuous random variable X with probability density function f(x), its expectation is E[X] = \int x\,f(x)\,dx.
The expectation operator is linear: for any constants a and b, E[aX + b] = aE[X] + b.
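The definition and the linearity property can be checked numerically. A minimal sketch, using a fair six-sided die as the discrete random variable (the die and the constants a, b are illustrative choices, not from the text):

```python
# Expectation of a discrete random variable: a fair six-sided die.
# E[X] = sum over outcomes x of x * p(x).
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
EX = sum(x * p for x in outcomes)  # E[X] = 3.5

# Linearity: E[aX + b] = a*E[X] + b, verified by direct computation.
a, b = 2, 1
E_aXb = sum((a * x + b) * p for x in outcomes)
print(EX, E_aXb, a * EX + b)  # 3.5, 8.0, 8.0
```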
Definition: The variance of a random variable X is Var(X) = E[(X - E[X])^2].
The square root of the variance is the standard deviation, denoted σ_X.
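The defining formula Var(X) = E[(X - E[X])^2] and the equivalent shortcut E[X^2] - (E[X])^2 can be compared on simulated data. A sketch, assuming NumPy and an arbitrary choice of N(5, 4) samples:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200_000)  # true Var(X) = 4

mu = x.mean()
var_def = np.mean((x - mu) ** 2)      # Var(X) = E[(X - E[X])^2]
var_alt = np.mean(x ** 2) - mu ** 2   # shortcut: E[X^2] - (E[X])^2
sigma = np.sqrt(var_def)              # standard deviation, ≈ 2 here

print(var_def, var_alt, sigma)
```

The two estimates agree to rounding error because they are algebraically identical.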
Definition: The covariance of random variables X and Y is Cov(X,Y) = E[(X - E[X])(Y - E[Y])].
If Cov(X,Y) > 0 the variables are positively correlated; if Cov(X,Y) < 0 they are negatively correlated. Cov(X,Y) = 0 implies uncorrelated, though not necessarily independent. A convenient formula is Cov(X,Y) = E[XY] - E[X]E[Y].
Covariance also satisfies linearity: Cov(aX+b, cY+d) = ac\,Cov(X,Y).
Standardizing covariance yields the correlation coefficient ρ_{XY} = Cov(X,Y)/(σ_X σ_Y).
For a random variable X, a series of numerical characteristics called moments can be defined. The k‑th raw (origin) moment is μ'_k = E[X^k]. The k‑th central moment is μ_k = E[(X-μ)^k], where μ = E[X] is the first raw moment (mean). The second central moment is the variance. The third central moment measures asymmetry (skewness) and the fourth central moment measures peakedness (kurtosis).
Definition: Skewness of X is γ_1 = E[(X-μ)^3] / σ^3. For a symmetric distribution (e.g., normal) the skewness is 0.
Definition: Kurtosis of X is γ_2 = E[(X-μ)^4] / σ^4. The normal distribution has kurtosis 3.
Definition: Excess kurtosis is γ_2 - 3. Distributions with excess kurtosis > 0 have heavier tails than the normal distribution.
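These definitions translate directly into code. A sketch, assuming NumPy; the normal and exponential samples are illustrative (the exponential distribution has skewness 2 and excess kurtosis 6, so it shows both asymmetry and heavy tails):

```python
import numpy as np

def skewness(x):
    # gamma_1 = E[(X - mu)^3] / sigma^3
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 3) / sigma ** 3

def kurtosis(x):
    # gamma_2 = E[(X - mu)^4] / sigma^4
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 4) / sigma ** 4

rng = np.random.default_rng(2)
z = rng.normal(size=500_000)       # normal: skewness 0, kurtosis 3
e = rng.exponential(size=500_000)  # exponential: skewness 2, kurtosis 9

print(skewness(z), kurtosis(z) - 3)  # both near 0
print(skewness(e), kurtosis(e) - 3)  # near 2 and 6: asymmetric, heavy right tail
```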
Sample skewness and kurtosis can therefore be compared against the normal-distribution values (0 and 3) as a test of normality. More generally, for any function g, the expectation E[g(X)] generalizes the notion of a moment: the raw and central moments correspond to the choices g(x) = x^k and g(x) = (x-μ)^k, respectively.
Definition: Conditional expectation of Y given X is E[Y\mid X] = \int y\,f_{Y|X}(y\mid X)\,dy.
Because y is integrated out while X remains, E[Y\mid X] is a function of X, and hence itself a random variable.
Definition: Conditional variance of Y given X is Var(Y\mid X) = E[(Y - E[Y\mid X])^2 \mid X].
As with conditional expectation, Var(Y\mid X) is a function of X.
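Both definitions can be checked by simulation: condition on each value of X and compute the mean and variance within that slice. A sketch, assuming NumPy and a made-up model in which X is uniform on {0, 1, 2} and Y given X = x is N(2x, 0.25):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000
x = rng.integers(0, 3, size=n)             # X uniform on {0, 1, 2}
y = 2 * x + rng.normal(scale=0.5, size=n)  # Y | X = x  ~  N(2x, 0.25)

# E[Y | X = v] and Var(Y | X = v), estimated within each slice {X = v}.
cond_mean = {v: y[x == v].mean() for v in (0, 1, 2)}
cond_var = {v: y[x == v].var() for v in (0, 1, 2)}
print(cond_mean)  # values near 2v: the conditional mean varies with X
print(cond_var)   # values near 0.25 for every v
```

Note how the estimated conditional mean changes with the conditioning value v, exactly as "a function of X" suggests.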
To introduce numerical characteristics of random vectors, we recall related matrix concepts.
Definition: A symmetric matrix A is positive semidefinite if for every column vector z, z^T A z \ge 0.
Definition: A symmetric matrix A is positive definite if for any non‑zero column vector z, z^T A z > 0.
Geometrically, a positive‑definite matrix can be diagonalized by an orthogonal change of coordinates into a diagonal matrix whose entries (its eigenvalues) are all positive. Hence its determinant is positive and it is invertible. In one dimension, a positive‑definite matrix reduces to a positive number.
Similarly, one can define negative‑definite and negative‑semidefinite matrices.
Proposition: For any matrix B, the matrix B^T B is positive semidefinite.
Proof: Since B^T B is symmetric and for any vector z, z^T (B^T B) z = (Bz)^T (Bz) = \|Bz\|^2 \ge 0, the matrix is positive semidefinite.
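The proposition can be verified numerically: the eigenvalues of B^T B are non‑negative, and the quadratic form equals the squared norm \|Bz\|^2. A sketch, assuming NumPy and an arbitrary random 5×3 matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(5, 3))
A = B.T @ B  # symmetric 3x3 matrix by construction

# Check 1: all eigenvalues of A are non-negative (up to rounding).
eigvals = np.linalg.eigvalsh(A)

# Check 2: z^T A z = ||Bz||^2 >= 0 for a random vector z.
z = rng.normal(size=3)
quad = z @ A @ z
print(eigvals, quad, np.dot(B @ z, B @ z))
```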
Definition: For an n‑dimensional random vector X with mean vector μ = E[X], its covariance matrix is the symmetric matrix Σ = Cov(X) = E[(X-μ)(X-μ)^T].
The diagonal elements of Σ are the variances of the individual components; the off‑diagonal elements are the covariances between components. If the vector is constant (non‑random), Σ = 0.
Proposition: The covariance matrix of an n‑dimensional random vector is positive semidefinite.
Proof: By definition Σ = E[(X-μ)(X-μ)^T] is symmetric, and for any vector z, z^T Σ z = E[(z^T (X-μ))^2] \ge 0, establishing positive semidefiniteness.
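A small simulation illustrates both the structure of Σ (variances on the diagonal, covariances off it) and its positive semidefiniteness. A sketch, assuming NumPy and a made-up 2-dimensional vector with Cov(X1, X2) = 0.5:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)  # Cov(X1, X2) = 0.5, Var(X2) = 1.25
X = np.stack([x1, x2])              # 2-dimensional random vector, n samples

Sigma = np.cov(X)  # sample covariance matrix (2x2)
# Diagonal: Var(X1) ≈ 1, Var(X2) ≈ 1.25; off-diagonal: Cov(X1, X2) ≈ 0.5.
print(Sigma)
print(np.linalg.eigvalsh(Sigma))  # all eigenvalues >= 0: positive semidefinite
```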
Definition: For two random vectors X (p‑dimensional) and Y (q‑dimensional), the cross‑covariance matrix is Cov(X,Y) = E[(X-μ_X)(Y-μ_Y)^T].
Iterated Expectation Law
Theorem (Law of Iterated Expectation): For random variables X and Y, E[ E[Y \mid X] ] = E[Y].
Proof sketch (continuous case): E[\,E[Y\mid X]\,] = \int \Big( \int y\, f_{Y|X}(y\mid x)\,dy \Big) f_X(x)\,dx = \iint y\, f_{X,Y}(x,y)\,dy\,dx = E[Y], where the interchange of integration order is justified by Fubini’s theorem; the discrete case is analogous. More generally, for any integrable function g, E[\,E[g(Y) \mid X]\,] = E[g(Y)].
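The theorem is easy to confirm by simulation. A sketch, assuming NumPy and a made-up two-stage experiment: roll a die to get X, then draw Y uniformly from {1, ..., X}, so E[Y | X] = (X + 1)/2 and E[Y] = E[(X + 1)/2] = 2.25:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
x = rng.integers(1, 7, size=n)  # roll a die: X in {1, ..., 6}
y = rng.integers(1, x + 1)      # then Y | X uniform on {1, ..., X}

inner = (x + 1) / 2   # E[Y | X] = (X + 1)/2, a function of X
lhs = inner.mean()    # estimate of E[ E[Y | X] ]
rhs = y.mean()        # estimate of E[Y]
print(lhs, rhs)       # both near 2.25, as the theorem predicts
```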
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".