Matrix-to-Matrix Derivatives: Definitions, Differential Method & Examples
This article explains the definition of matrix‑to‑matrix derivatives, introduces the vectorization‑based differential approach using Kronecker products, presents key matrix‑vectorization properties, and walks through detailed examples illustrating how to compute such derivatives, highlighting their role and limitations in machine‑learning optimization.
Definition of Matrix-to-Matrix Derivative
Assume we have a matrix A of size m×n that is a function of a matrix B of size p×q, and we want to differentiate A with respect to B. Following the elementwise definition, each element of A is differentiated with respect to each element of B, resulting in a total of m·n·p·q scalar derivatives ∂Aᵢⱼ/∂Bₖₗ. Two intuitive definitions exist, differing only in how these derivatives are arranged: (1) collect the m×n block ∂A/∂Bₖₗ for every element Bₖₗ, yielding an (mp)×(nq) block matrix; (2) collect the p×q block ∂Aᵢⱼ/∂B for every element Aᵢⱼ, yielding a matrix of the same overall size but with a different arrangement. Although both definitions are valid, they are cumbersome for practical use, especially in machine‑learning contexts.
The prevailing approach instead vectorizes both matrices column‑wise and applies vector‑to‑vector derivative rules, defining the derivative as ∂vec(A)/∂vec(B)ᵀ, a matrix of size (mn)×(pq).
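As a concrete illustration of this convention, the sketch below (a minimal NumPy example with hypothetical sizes m=2, n=3, p=q=2) shows column‑wise vectorization and the resulting Jacobian shape:

```python
import numpy as np

# Hypothetical sizes: A is m×n, a matrix-valued function of B, which is p×q.
m, n, p, q = 2, 3, 2, 2

# Column-wise vectorization: stack the columns into one long vector.
def vec(M):
    return M.reshape(-1, order="F")  # order="F" stacks columns, not rows

B = np.arange(p * q, dtype=float).reshape(p, q)
print(vec(B))  # columns of B stacked: [0. 2. 1. 3.]

# The matrix-to-matrix derivative ∂vec(A)/∂vec(B)ᵀ is then an ordinary
# Jacobian matrix of shape (m·n) × (p·q).
J = np.zeros((m * n, p * q))
print(J.shape)  # (6, 4)
```

Using `order="F"` matters: the convention throughout is column‑wise (Fortran‑order) stacking, and NumPy's default `reshape` is row‑wise.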
Differential Method for Matrix-to-Matrix Derivative
Vectorizing matrices enables the use of differential calculus. Recall the scalar‑to‑matrix differential rule df = tr((∂f/∂X)ᵀ dX). For matrix‑to‑matrix derivatives we replace the trace operator with vectorization: vec(dF) = (∂vec(F)/∂vec(X)ᵀ) vec(dX), so the matrix multiplying vec(dX) can be read off directly as the derivative.
Key matrix‑vectorization properties used in the differential method include:
Linearity.
Matrix multiplication: vec(AXB) = (Bᵀ ⊗ A) vec(X), where ⊗ denotes the Kronecker product.
Matrix transpose: vec(Aᵀ) = K vec(A), where K is the commutation matrix that converts column‑wise to row‑wise vectorization.
Element‑wise (Hadamard) multiplication: vec(A ⊙ B) = diag(vec(A)) vec(B), where diag(vec(A)) is the diagonal matrix whose diagonal holds the column‑wise vectorization of A.
Standard Kronecker product identities, such as (A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ and (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), also apply.
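These properties are easy to verify numerically. The sketch below checks the multiplication, transpose, and Hadamard identities on random matrices; the shapes and the explicit commutation‑matrix construction are illustrative assumptions, not part of the original article:

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(M):
    return M.reshape(-1, order="F")  # column-wise vectorization

# Hypothetical shapes chosen so that the product A X B is defined.
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# Property: vec(A X B) = (Bᵀ ⊗ A) vec(X)
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

# Commutation matrix K for an m×n argument: vec(Aᵀ) = K vec(A).
# Entry A[i, j] sits at position j*m + i in vec(A) and at i*n + j in vec(Aᵀ).
def commutation(m, n):
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

K = commutation(2, 3)
assert np.allclose(vec(A.T), K @ vec(A))

# Hadamard product: vec(A ⊙ C) = diag(vec(A)) vec(C)
C = rng.standard_normal((2, 3))
assert np.allclose(vec(A * C), np.diag(vec(A)) @ vec(C))
```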
Example of Matrix-to-Matrix Derivative
Consider matrices X and Y of compatible dimensions (say, both m×n) and the expression Z = X Yᵀ X. To differentiate Z with respect to X using the differential method, we first write the differential, dZ = dX (YᵀX) + (X Yᵀ) dX, then vectorize both sides using the multiplication property: vec(dZ) = ((YᵀX)ᵀ ⊗ Iₘ) vec(dX) + (Iₙ ⊗ X Yᵀ) vec(dX) = (XᵀY ⊗ Iₘ + Iₙ ⊗ X Yᵀ) vec(dX). The matrix multiplying vec(dX) is the derivative: ∂vec(Z)/∂vec(X)ᵀ = XᵀY ⊗ Iₘ + Iₙ ⊗ X Yᵀ.
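The derivation above can be checked numerically. This sketch (with hypothetical sizes m=3, n=2) builds the Jacobian XᵀY ⊗ Iₘ + Iₙ ⊗ XYᵀ and compares it against central finite differences of Z = X Yᵀ X:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))  # Y taken m×n so Z = X Yᵀ X is m×n

def vec(M):
    return M.reshape(-1, order="F")  # column-wise vectorization

# From dZ = dX (YᵀX) + (X Yᵀ) dX, vectorizing gives the Jacobian:
# ∂vec(Z)/∂vec(X)ᵀ = XᵀY ⊗ I_m + I_n ⊗ X Yᵀ
J = np.kron(X.T @ Y, np.eye(m)) + np.kron(np.eye(n), X @ Y.T)

# Central finite-difference check of each column of J.
eps = 1e-6
J_num = np.zeros_like(J)
for k in range(m * n):
    d = np.zeros(m * n)
    d[k] = eps
    Xp = X + d.reshape(m, n, order="F")
    Xm = X - d.reshape(m, n, order="F")
    J_num[:, k] = vec(Xp @ Y.T @ Xp - Xm @ Y.T @ Xm) / (2 * eps)

assert np.allclose(J, J_num, atol=1e-5)
```

Note that (YᵀX)ᵀ = XᵀY, which is why the first Kronecker factor appears transposed relative to the differential.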
A more complex example is provided, showing step‑by‑step application of the vectorization properties to arrive at the final derivative expression.
Summary
Because matrix‑to‑matrix derivatives involve Kronecker products, they differ from other matrix derivative types and are rarely used directly in machine‑learning algorithm optimization, except for qualitative analysis.
Source: Liu Jianping, Pinard Blog.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".