Unlocking Random Variables: Expectation, Variance, and Higher-Order Moments Explained
This article introduces the key numerical characteristics of random variables—including expectation, variance, covariance, correlation, and higher-order moments such as skewness and kurtosis—explains their definitions, properties, and relationships, and extends these concepts to random vectors and matrix representations.
Numerical Characteristics of Random Variables
Definition: For a discrete random variable X with probability mass function p(x), its expectation is E[X] = \sum x\,p(x).
Definition: For a continuous random variable X with probability density function f(x), its expectation is E[X] = \int x\,f(x)\,dx.
The expectation operator is linear: for any constants a and b, E[aX + b] = aE[X] + b.
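The definition and the linearity property can be checked numerically. A minimal sketch, using a fair six-sided die as the discrete random variable (the die and the constants a, b are illustrative choices, not from the text):

```python
# Expectation of a discrete random variable: a fair six-sided die.
# E[X] = sum over outcomes x of x * p(x).
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
EX = sum(x * p for x in outcomes)  # E[X] = 3.5

# Linearity: E[aX + b] = a*E[X] + b, verified by direct computation.
a, b = 2, 1
E_aXb = sum((a * x + b) * p for x in outcomes)
print(EX, E_aXb, a * EX + b)  # 3.5, 8.0, 8.0
```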
Definition: The variance of a random variable X is Var(X) = E[(X - E[X])^2].
The square root of the variance is the standard deviation, denoted σ_X.
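The defining formula Var(X) = E[(X - E[X])^2] and the equivalent shortcut E[X^2] - (E[X])^2 can be compared on simulated data. A sketch, assuming NumPy and an arbitrary choice of N(5, 4) samples:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200_000)  # true Var(X) = 4

mu = x.mean()
var_def = np.mean((x - mu) ** 2)      # Var(X) = E[(X - E[X])^2]
var_alt = np.mean(x ** 2) - mu ** 2   # shortcut: E[X^2] - (E[X])^2
sigma = np.sqrt(var_def)              # standard deviation, ≈ 2 here

print(var_def, var_alt, sigma)
```

The two estimates agree to rounding error because they are algebraically identical.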
Definition: The covariance of random variables X and Y is Cov(X,Y) = E[(X - E[X])(Y - E[Y])].
If Cov(X,Y) > 0 the variables are positively correlated; if Cov(X,Y) < 0 they are negatively correlated. Cov(X,Y) = 0 implies uncorrelated, though not necessarily independent. A convenient formula is Cov(X,Y) = E[XY] - E[X]E[Y].
Covariance also satisfies linearity: Cov(aX+b, cY+d) = ac\,Cov(X,Y).
Standardizing covariance yields the correlation coefficient ρ_{XY} = Cov(X,Y)/(σ_X σ_Y).
For a random variable X, a series of numerical characteristics called moments can be defined. The k‑th raw (origin) moment is μ'_k = E[X^k]. The k‑th central moment is μ_k = E[(X-μ)^k], where μ = E[X] is the first raw moment (mean). The second central moment is the variance. The third central moment measures asymmetry (skewness) and the fourth central moment measures peakedness (kurtosis).
Definition: Skewness of X is γ_1 = E[(X-μ)^3] / σ^3. For a symmetric distribution (e.g., normal) the skewness is 0.
Definition: Kurtosis of X is γ_2 = E[(X-μ)^4] / σ^4. The normal distribution has kurtosis 3.
Definition: Excess kurtosis is γ_2 - 3. Distributions with excess kurtosis > 0 have heavier tails than the normal distribution.
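These definitions translate directly into code. A sketch, assuming NumPy; the normal and exponential samples are illustrative (the exponential distribution has skewness 2 and excess kurtosis 6, so it shows both asymmetry and heavy tails):

```python
import numpy as np

def skewness(x):
    # gamma_1 = E[(X - mu)^3] / sigma^3
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 3) / sigma ** 3

def kurtosis(x):
    # gamma_2 = E[(X - mu)^4] / sigma^4
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 4) / sigma ** 4

rng = np.random.default_rng(2)
z = rng.normal(size=500_000)       # normal: skewness 0, kurtosis 3
e = rng.exponential(size=500_000)  # exponential: skewness 2, kurtosis 9

print(skewness(z), kurtosis(z) - 3)  # both near 0
print(skewness(e), kurtosis(e) - 3)  # near 2 and 6: asymmetric, heavy right tail
```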
Sample skewness and kurtosis can therefore be compared against the normal-distribution values (0 and 3) as a test of normality. More generally, for any function g, the expectation E[g(X)] generalizes the notion of a moment: the raw and central moments correspond to the choices g(x) = x^k and g(x) = (x-μ)^k, respectively.
Definition: Conditional expectation of Y given X is E[Y\mid X] = \int y\,f_{Y|X}(y\mid X)\,dy.
Because y is integrated out while X remains, E[Y\mid X] is a function of X, and hence itself a random variable.
Definition: Conditional variance of Y given X is Var(Y\mid X) = E[(Y - E[Y\mid X])^2 \mid X].
As with conditional expectation, Var(Y\mid X) is a function of X.
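Both definitions can be checked by simulation: condition on each value of X and compute the mean and variance within that slice. A sketch, assuming NumPy and a made-up model in which X is uniform on {0, 1, 2} and Y given X = x is N(2x, 0.25):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000
x = rng.integers(0, 3, size=n)             # X uniform on {0, 1, 2}
y = 2 * x + rng.normal(scale=0.5, size=n)  # Y | X = x  ~  N(2x, 0.25)

# E[Y | X = v] and Var(Y | X = v), estimated within each slice {X = v}.
cond_mean = {v: y[x == v].mean() for v in (0, 1, 2)}
cond_var = {v: y[x == v].var() for v in (0, 1, 2)}
print(cond_mean)  # values near 2v: the conditional mean varies with X
print(cond_var)   # values near 0.25 for every v
```

Note how the estimated conditional mean changes with the conditioning value v, exactly as "a function of X" suggests.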
To introduce numerical characteristics of random vectors, we recall related matrix concepts.
Definition: A symmetric matrix A is positive semidefinite if for every column vector z, z^T A z \ge 0.
Definition: A symmetric matrix A is positive definite if for any non‑zero column vector z, z^T A z > 0.
Geometrically, a positive‑definite matrix can be diagonalized by an orthogonal change of coordinates into a diagonal matrix whose entries (its eigenvalues) are all positive. Hence its determinant is positive and it is invertible. In one dimension, a positive‑definite matrix reduces to a positive number.
Similarly, one can define negative‑definite and negative‑semidefinite matrices.
Proposition: For any matrix B, the matrix B^T B is positive semidefinite.
Proof: Since B^T B is symmetric and for any vector z, z^T (B^T B) z = (Bz)^T (Bz) = \|Bz\|^2 \ge 0, the matrix is positive semidefinite.
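The proposition can be verified numerically: the eigenvalues of B^T B are non‑negative, and the quadratic form equals the squared norm \|Bz\|^2. A sketch, assuming NumPy and an arbitrary random 5×3 matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(5, 3))
A = B.T @ B  # symmetric 3x3 matrix by construction

# Check 1: all eigenvalues of A are non-negative (up to rounding).
eigvals = np.linalg.eigvalsh(A)

# Check 2: z^T A z = ||Bz||^2 >= 0 for a random vector z.
z = rng.normal(size=3)
quad = z @ A @ z
print(eigvals, quad, np.dot(B @ z, B @ z))
```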
Definition: For an n‑dimensional random vector X with mean vector μ = E[X], its covariance matrix is the symmetric matrix Σ = Cov(X) = E[(X-μ)(X-μ)^T].
The diagonal elements of Σ are the variances of the individual components; the off‑diagonal elements are the covariances between components. If the vector is constant (non‑random), Σ = 0.
Proposition: The covariance matrix of an n‑dimensional random vector is positive semidefinite.
Proof: By definition Σ = E[(X-μ)(X-μ)^T] is symmetric, and for any vector z, z^T Σ z = E[(z^T (X-μ))^2] \ge 0, establishing positive semidefiniteness.
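A small simulation illustrates both the structure of Σ (variances on the diagonal, covariances off it) and its positive semidefiniteness. A sketch, assuming NumPy and a made-up 2-dimensional vector with Cov(X1, X2) = 0.5:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)  # Cov(X1, X2) = 0.5, Var(X2) = 1.25
X = np.stack([x1, x2])              # 2-dimensional random vector, n samples

Sigma = np.cov(X)  # sample covariance matrix (2x2)
# Diagonal: Var(X1) ≈ 1, Var(X2) ≈ 1.25; off-diagonal: Cov(X1, X2) ≈ 0.5.
print(Sigma)
print(np.linalg.eigvalsh(Sigma))  # all eigenvalues >= 0: positive semidefinite
```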
Definition: For two random vectors X (p‑dimensional) and Y (q‑dimensional), the cross‑covariance matrix is Cov(X,Y) = E[(X-μ_X)(Y-μ_Y)^T].
Iterated Expectation Law
Theorem (Law of Iterated Expectation): For random variables X and Y, E[ E[Y \mid X] ] = E[Y].
Proof sketch (continuous case): E[\,E[Y\mid X]\,] = \int \Big( \int y\, f_{Y|X}(y\mid x)\,dy \Big) f_X(x)\,dx = \iint y\, f_{X,Y}(x,y)\,dy\,dx = E[Y], where the interchange of integration order is justified by Fubini’s theorem; the discrete case is analogous. More generally, for any integrable function g, E[\,E[g(Y) \mid X]\,] = E[g(Y)].
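The theorem is easy to confirm by simulation. A sketch, assuming NumPy and a made-up two-stage experiment: roll a die to get X, then draw Y uniformly from {1, ..., X}, so E[Y | X] = (X + 1)/2 and E[Y] = E[(X + 1)/2] = 2.25:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
x = rng.integers(1, 7, size=n)  # roll a die: X in {1, ..., 6}
y = rng.integers(1, x + 1)      # then Y | X uniform on {1, ..., X}

inner = (x + 1) / 2   # E[Y | X] = (X + 1)/2, a function of X
lhs = inner.mean()    # estimate of E[ E[Y | X] ]
rhs = y.mean()        # estimate of E[Y]
print(lhs, rhs)       # both near 2.25, as the theorem predicts
```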
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".