Understanding Matrix and Vector Derivatives: Layouts and Jacobians Explained
This article introduces matrix and vector differentiation, explains the nine possible derivative cases, clarifies numerator and denominator layouts, and shows how Jacobian and gradient matrices arise, providing a concise foundation for machine‑learning calculus.
Introduction to Matrix and Vector Differentiation
In basic calculus we learned scalar‑to‑scalar differentiation, i.e., the derivative of one scalar with respect to another. When a set of scalars is differentiated with respect to a single scalar, we obtain a set of scalar results.
If we arrange this set of scalars as a vector, the derivative of an \(m\)-dimensional vector with respect to a scalar is also an \(m\)-dimensional vector. In other words, vector‑to‑scalar differentiation means differentiating each component of the vector separately and then stacking the results into a vector. Similar conclusions hold for scalar‑to‑vector, vector‑to‑vector, vector‑to‑matrix, matrix‑to‑vector, and matrix‑to‑matrix differentiation. Essentially, matrix‑vector calculus is multivariate differentiation expressed in vector or matrix form for convenience.
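As a minimal numerical sketch of this idea (the function \(f\) and all names below are illustrative, not from the source): differentiating a vector‑valued function of a scalar just means differentiating each component separately and stacking the results.

```python
import numpy as np

# f maps a scalar t to a 3-dimensional vector; its derivative with
# respect to t is obtained by differentiating each component separately.
def f(t):
    return np.array([t**2, np.sin(t), np.exp(t)])

def df(t):
    # analytic componentwise derivatives, stacked back into a vector
    return np.array([2*t, np.cos(t), np.exp(t)])

t0 = 1.0
h = 1e-6
# central finite difference, applied componentwise
numeric = (f(t0 + h) - f(t0 - h)) / (2 * h)
print(np.allclose(numeric, df(t0), atol=1e-5))  # → True
```

The numerical check agrees with the stacked analytic derivatives, confirming that the vector derivative is nothing more than componentwise scalar differentiation.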
For brevity, we will use \(x\) to denote a scalar variable, \(\mathbf{x}\) for an \(n\)-dimensional vector, and \(\mathbf{X}\) for an \(m\times n\) matrix. The dependent variable will be denoted by \(y\), \(\mathbf{y}\), and \(\mathbf{Y}\) respectively.
Definition of Matrix‑Vector Derivatives
Depending on whether the independent and dependent variables are scalars, vectors, or matrices, there are nine possible derivative definitions. The scalar‑to‑scalar case is covered in basic calculus, leaving eight cases. We first discuss five cases: scalar‑to‑vector/matrix, vector/matrix‑to‑scalar, and vector‑to‑vector differentiation.
Consider a vector \(\mathbf{v}\) of dimension \(m\) differentiated with respect to a scalar \(x\); the result is an \(m\)-dimensional vector. A common question is whether this result should be a column vector or a row vector. Both are acceptable because differentiation merely stacks scalar results, but in machine‑learning algorithms the orientation must be consistent, which leads to the concept of derivative layout.
Matrix‑Vector Derivative Layouts
To make derivative results unique, we introduce two basic layouts: numerator layout and denominator layout.
In the numerator layout, the dimensions of the result follow the numerator, i.e., the quantity being differentiated. For example, if \(\mathbf{v}\) is an \(m\)-dimensional column vector, then \(\frac{\partial \mathbf{v}}{\partial x}\) is also an \(m\)-dimensional column vector; if \(\mathbf{v}\) is a row vector, the result is an \(m\)-dimensional row vector.
In the denominator layout, the dimensions follow the denominator, i.e., the variable with respect to which we differentiate; equivalently, the result is the transpose of the numerator‑layout result. Thus, if \(\mathbf{v}\) is a column vector, \(\frac{\partial \mathbf{v}}{\partial x}\) becomes an \(m\)-dimensional row vector, and vice versa.
The two layouts differ only by a transpose operation.
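A short sketch of the transpose relationship (the example function \(y = \mathbf{x}^\top\mathbf{x}\) is illustrative, not from the source): the same derivative can be written in either layout, and the two results differ only in shape.

```python
import numpy as np

# For the scalar y = x . x, the derivative with respect to the n-vector x
# is 2x. The denominator layout writes it with the shape of x (a column
# vector); the numerator layout writes the transpose (a row vector).
x = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)   # column vector, n = 3

grad_denominator = 2 * x            # shape (3, 1), matches the denominator x
grad_numerator = grad_denominator.T # shape (1, 3), the transpose

print(grad_denominator.shape, grad_numerator.shape)
```

Both arrays hold the same \(mn\) numbers; only the arrangement changes, which is why the choice of layout is a bookkeeping convention rather than a mathematical difference.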
For example, differentiating a scalar \(y\) with respect to an \(m\times n\) matrix \(\mathbf{X}\) yields, under the denominator layout, an \(m\times n\) result whose dimensions match \(\mathbf{X}\); under the numerator layout, the result is the \(n\times m\) transpose.
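As a concrete sketch of a scalar differentiated with respect to a matrix (the choice \(f(\mathbf{X}) = \operatorname{tr}(\mathbf{A}\mathbf{X})\) and all names are illustrative assumptions): the denominator‑layout derivative has the same shape as the matrix argument, which a finite‑difference check confirms entry by entry.

```python
import numpy as np

# For f(X) = trace(A @ X) with A of shape (n, m) and X of shape (m, n),
# the denominator-layout derivative is A.T, whose shape matches X.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 4))

def f(X):
    return np.trace(A @ X)

# finite-difference gradient, one entry at a time, in denominator layout
h = 1e-6
G = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = h
        G[i, j] = (f(X + E) - f(X - E)) / (2 * h)

print(np.allclose(G, A.T, atol=1e-4))  # → True
```

Note that `G` has the shape of `X` (denominator layout); the numerator‑layout answer would simply be `G.T`.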
Vector‑to‑vector differentiation is slightly more involved. Considering only the column‑vector‑to‑column‑vector case, the derivative of an \(m\)-dimensional vector \(\mathbf{y}\) with respect to an \(n\)-dimensional vector \(\mathbf{x}\) consists of \(mn\) partial derivatives \(\frac{\partial y_i}{\partial x_j}\). Under the numerator layout they are arranged as an \(m\times n\) matrix called the Jacobian; under the denominator layout they form the \(n\times m\) gradient matrix, the Jacobian's transpose. Some references use different symbols for these matrices, but the meaning is the same.
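A small sketch of the Jacobian (the linear map \(\mathbf{y} = \mathbf{A}\mathbf{x}\) and the helper function below are illustrative assumptions): for a linear map, the numerator‑layout Jacobian is the matrix \(\mathbf{A}\) itself, and the denominator‑layout gradient matrix is its transpose.

```python
import numpy as np

# For y = A @ x with A of shape (m, n), the numerator-layout derivative
# (the Jacobian) is A itself; the denominator-layout gradient matrix is A.T.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # m = 2, n = 3

def jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian of f at x via central differences."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)  # column j: d f / d x_j
    return J

x = np.array([1.0, -1.0, 2.0])
J = jacobian(lambda v: A @ v, x)
print(np.allclose(J, A, atol=1e-4))  # → True
```

The \(2\times 3\) shape of `J` follows the numerator \(\mathbf{y}\) in rows and the denominator \(\mathbf{x}\) in columns; `J.T` would be the denominator‑layout gradient matrix.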
In practice, a mixed layout is often adopted: for vector or matrix‑to‑scalar derivatives we use the numerator layout, while for scalar‑to‑vector or scalar‑to‑matrix derivatives we use the denominator layout. For vector‑to‑vector derivatives I prefer the numerator‑layout Jacobian.
Summary of Matrix‑Vector Differentiation Basics
With the definitions and default layouts established, we can now derive common differentiation rules for the five cases and discuss the chain rule for vector differentiation.
Source
刘建平Pinard https://www.cnblogs.com/pinard/p/10750718.html
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".