Matrix-to-Matrix Derivatives: Definitions, Differential Method & Examples
This article explains the definition of matrix‑to‑matrix derivatives, introduces the vectorization‑based differential approach using Kronecker products, presents key matrix‑vectorization properties, and walks through detailed examples illustrating how to compute such derivatives, highlighting their role and limitations in machine‑learning optimization.
Definition of Matrix-to-Matrix Derivative
Assume we have a matrix A of size m×n that is a function of a matrix B of size p×q, and we want to differentiate A with respect to B. Following the elementwise definition, each element of A is differentiated with respect to each element of B, resulting in a total of m·n·p·q scalar derivatives ∂Aᵢⱼ/∂Bₖₗ. Two intuitive definitions exist, differing only in how these derivatives are arranged: (1) collect the m×n block ∂A/∂Bₖₗ for every element Bₖₗ, yielding an (mp)×(nq) block matrix; (2) collect the p×q block ∂Aᵢⱼ/∂B for every element Aᵢⱼ, yielding a matrix of the same overall size but with a different arrangement. Although both definitions are valid, they are cumbersome for practical use, especially in machine‑learning contexts.
The prevailing approach instead vectorizes both matrices column‑wise and applies vector‑to‑vector derivative rules, defining the derivative as ∂vec(A)/∂vec(B)ᵀ, a matrix of size (mn)×(pq).
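As a concrete illustration of this convention, the sketch below (a minimal NumPy example with hypothetical sizes m=2, n=3, p=q=2) shows column‑wise vectorization and the resulting Jacobian shape:

```python
import numpy as np

# Hypothetical sizes: A is m×n, a matrix-valued function of B, which is p×q.
m, n, p, q = 2, 3, 2, 2

# Column-wise vectorization: stack the columns into one long vector.
def vec(M):
    return M.reshape(-1, order="F")  # order="F" stacks columns, not rows

B = np.arange(p * q, dtype=float).reshape(p, q)
print(vec(B))  # columns of B stacked: [0. 2. 1. 3.]

# The matrix-to-matrix derivative ∂vec(A)/∂vec(B)ᵀ is then an ordinary
# Jacobian matrix of shape (m·n) × (p·q).
J = np.zeros((m * n, p * q))
print(J.shape)  # (6, 4)
```

Using `order="F"` matters: the convention throughout is column‑wise (Fortran‑order) stacking, and NumPy's default `reshape` is row‑wise.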
Differential Method for Matrix-to-Matrix Derivative
Vectorizing matrices enables the use of differential calculus. Recall the scalar‑to‑matrix differential rule df = tr((∂f/∂X)ᵀ dX). For matrix‑to‑matrix derivatives we replace the trace operator with vectorization: vec(dF) = (∂vec(F)/∂vec(X)ᵀ) vec(dX), so the matrix multiplying vec(dX) can be read off directly as the derivative.
Key matrix‑vectorization properties used in the differential method include:
Linearity.
Matrix multiplication: vec(AXB) = (Bᵀ ⊗ A) vec(X), where ⊗ denotes the Kronecker product.
Matrix transpose: vec(Aᵀ) = K vec(A), where K is the commutation matrix that converts column‑wise to row‑wise vectorization.
Element‑wise (Hadamard) multiplication: vec(A ⊙ B) = diag(vec(A)) vec(B), where diag(vec(A)) is the diagonal matrix whose diagonal holds the column‑wise vectorization of A.
Standard Kronecker product identities, such as (A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ and (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), also apply.
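These properties are easy to verify numerically. The sketch below checks the multiplication, transpose, and Hadamard identities on random matrices; the shapes and the explicit commutation‑matrix construction are illustrative assumptions, not part of the original article:

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(M):
    return M.reshape(-1, order="F")  # column-wise vectorization

# Hypothetical shapes chosen so that the product A X B is defined.
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# Property: vec(A X B) = (Bᵀ ⊗ A) vec(X)
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

# Commutation matrix K for an m×n argument: vec(Aᵀ) = K vec(A).
# Entry A[i, j] sits at position j*m + i in vec(A) and at i*n + j in vec(Aᵀ).
def commutation(m, n):
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

K = commutation(2, 3)
assert np.allclose(vec(A.T), K @ vec(A))

# Hadamard product: vec(A ⊙ C) = diag(vec(A)) vec(C)
C = rng.standard_normal((2, 3))
assert np.allclose(vec(A * C), np.diag(vec(A)) @ vec(C))
```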
Example of Matrix-to-Matrix Derivative
Consider matrices X and Y of compatible dimensions (say, both m×n) and the expression Z = X Yᵀ X. To differentiate Z with respect to X using the differential method, we first write the differential, dZ = dX (YᵀX) + (X Yᵀ) dX, then vectorize both sides using the multiplication property: vec(dZ) = ((YᵀX)ᵀ ⊗ Iₘ) vec(dX) + (Iₙ ⊗ X Yᵀ) vec(dX) = (XᵀY ⊗ Iₘ + Iₙ ⊗ X Yᵀ) vec(dX). The matrix multiplying vec(dX) is the derivative: ∂vec(Z)/∂vec(X)ᵀ = XᵀY ⊗ Iₘ + Iₙ ⊗ X Yᵀ.
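The derivation above can be checked numerically. This sketch (with hypothetical sizes m=3, n=2) builds the Jacobian XᵀY ⊗ Iₘ + Iₙ ⊗ XYᵀ and compares it against central finite differences of Z = X Yᵀ X:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))  # Y taken m×n so Z = X Yᵀ X is m×n

def vec(M):
    return M.reshape(-1, order="F")  # column-wise vectorization

# From dZ = dX (YᵀX) + (X Yᵀ) dX, vectorizing gives the Jacobian:
# ∂vec(Z)/∂vec(X)ᵀ = XᵀY ⊗ I_m + I_n ⊗ X Yᵀ
J = np.kron(X.T @ Y, np.eye(m)) + np.kron(np.eye(n), X @ Y.T)

# Central finite-difference check of each column of J.
eps = 1e-6
J_num = np.zeros_like(J)
for k in range(m * n):
    d = np.zeros(m * n)
    d[k] = eps
    Xp = X + d.reshape(m, n, order="F")
    Xm = X - d.reshape(m, n, order="F")
    J_num[:, k] = vec(Xp @ Y.T @ Xp - Xm @ Y.T @ Xm) / (2 * eps)

assert np.allclose(J, J_num, atol=1e-5)
```

Note that (YᵀX)ᵀ = XᵀY, which is why the first Kronecker factor appears transposed relative to the differential.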
A more complex example is provided, showing step‑by‑step application of the vectorization properties to arrive at the final derivative expression.
Summary
Because matrix‑to‑matrix derivatives involve Kronecker products, they differ from other matrix derivative types and are rarely used directly in machine‑learning algorithm optimization, except for qualitative analysis.
Source: Liu Jianping, Pinard Blog.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".