How ConFIG Eliminates Gradient Conflicts for Faster Multi‑Task Deep Learning

The paper introduces ConFIG (Conflict‑Free Inverse Gradients), a method with provable conflict‑free updates that resolves gradient conflicts among multiple loss terms in physics‑informed neural networks, multi‑task learning, and continual learning, together with a momentum‑based variant, M‑ConFIG, that further accelerates training while maintaining accuracy.

AI Frontier Lectures

Problem Setting

Joint optimization of multiple loss terms is common in deep learning applications such as Physics‑Informed Neural Networks (PINNs), Multi‑Task Learning (MTL), and Continual Learning. The gradients of different losses often point in conflicting directions, causing optimization to stall or fail.

Limitations of Existing Approaches

Current methods mainly adjust loss weights based on numerical stiffness, convergence speed, or network initialization. Although weighting can improve accuracy, there is no consensus on an optimal weighting strategy.

ConFIG: Conflict‑Free Inverse Gradients

ConFIG constructs a unified update direction that is non‑conflicting with all individual loss gradients. For loss gradients g_i (i=1..K), the method seeks a direction d such that d·g_i > 0 for every i. The algorithm proceeds as follows:

Compute each loss gradient g_i.

Normalize to unit vectors u_i = g_i / ‖g_i‖.

Form the matrix U = [u_1 … u_K].

Solve the linear system Uᵀ d = 1 for d via the Moore–Penrose pseudo‑inverse, d = (Uᵀ)⁺ 1, where 1 is a K‑dimensional vector of ones. (Uᵀ has more columns than rows when the parameter count exceeds K, so a true inverse does not exist.)

Rescale the direction: the paper normalizes d to unit length and scales it by the sum of the gradients' projections onto it.

This inverse‑gradient construction guarantees a positive inner product with every loss gradient, ensuring simultaneous reduction of all losses. The projection length of each loss gradient onto d can be made uniform, providing equal optimization speed for all losses, and can be adaptively adjusted according to the degree of conflict.
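The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's reference implementation: it normalizes the gradients, solves Uᵀ d = 1 with the pseudo‑inverse, and rescales the unit direction by the summed projections, then checks on a toy pair of conflicting gradients that the result has a positive inner product with both.

```python
import numpy as np

def config_direction(grads):
    """Sketch of the ConFIG construction for a list of loss gradients.

    Unit-normalize each gradient, solve U^T d = 1 via the Moore-Penrose
    pseudo-inverse, and rescale the unit direction by the sum of the
    gradients' projections onto it.
    """
    # Rows of U_T are the unit gradients: a K x d matrix.
    U_T = np.stack([g / np.linalg.norm(g) for g in grads])
    # Minimum-norm solution of U^T d = 1 (pseudo-inverse).
    d = np.linalg.pinv(U_T) @ np.ones(len(grads))
    d_hat = d / np.linalg.norm(d)
    # Scale by the summed projections of all gradients onto d_hat.
    return sum(g @ d_hat for g in grads) * d_hat

# Two conflicting gradients in 3-D parameter space (g1 . g2 < 0):
g1 = np.array([1.0, 0.0, 0.0])
g2 = np.array([-0.5, 1.0, 0.0])
d = config_direction([g1, g2])
assert d @ g1 > 0 and d @ g2 > 0  # non-conflicting with both losses
```

Because d has a positive inner product with every gᵢ, a small step along d is a descent direction for every loss simultaneously.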

All updated gradients are mutually non‑conflicting.

The projection length onto the update direction is uniform, yielding equal optimization rates.

The length can be adapted based on conflict severity.

Mathematically, the pseudo‑inverse of the normalized gradient matrix is well defined whenever the parameter dimension exceeds the number of loss terms and the loss gradients are linearly independent, and the paper provides a rigorous convergence proof.

Momentum‑Based Variant (M‑ConFIG)

To reduce computational overhead, M‑ConFIG replaces raw gradients with their exponential moving averages (momentum): m_i^t = β m_i^{t-1} + (1−β) g_i^t. Only a single loss is back‑propagated each iteration, in rotation; the remaining losses reuse momentum from earlier steps. This cuts the number of gradient evaluations per step and, in practice, makes M‑ConFIG faster than standard weight‑based methods (average cost ≈ 0.56×).
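The momentum bookkeeping can be sketched as follows. This is a hypothetical, self‑contained illustration (function names and the round‑robin schedule are illustrative assumptions, not the paper's code): each step back‑propagates one loss, updates that loss's momentum, and builds the conflict‑free direction from all momenta instead of raw gradients.

```python
import numpy as np

def conflict_free_direction(vectors):
    """Pseudo-inverse construction applied to a list of vectors."""
    U_T = np.stack([v / np.linalg.norm(v) for v in vectors])
    d = np.linalg.pinv(U_T) @ np.ones(len(vectors))
    d_hat = d / np.linalg.norm(d)
    return sum(v @ d_hat for v in vectors) * d_hat

def m_config_step(momenta, loss_grad, step, beta=0.9):
    """One M-ConFIG iteration: back-propagate a single loss (round-robin),
    update its momentum, and form the direction from all momenta."""
    i = step % len(momenta)                       # rotating loss index
    momenta[i] = beta * momenta[i] + (1 - beta) * loss_grad(i)
    return conflict_free_direction(momenta)

# Toy example: two fixed, conflicting "loss gradients" in 3-D.
grads = [np.array([1.0, 0.0, 0.0]), np.array([-0.5, 1.0, 0.0])]
momenta = [0.1 * g for g in grads]                # warm-started momenta
for t in range(4):
    d = m_config_step(momenta, lambda i: grads[i], t)
assert all(d @ g > 0 for g in grads)              # still conflict-free
```

Only one backward pass happens per step, yet the update direction still accounts for every loss through its stored momentum.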

Computational Cost

The matrix inversion cost is modest because the number of losses K is typically small. By computing gradients for only a subset of losses and reusing momentum, M‑ConFIG achieves a lower overall training cost while preserving accuracy.
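The savings in gradient evaluations are easy to quantify (note this counts backward passes only, not wall‑clock time, so it differs from the ≈ 0.56× average cost reported above; K and T here are made‑up illustrative values):

```python
# Backward passes needed over T optimizer steps with K loss terms:
K, T = 3, 10_000
config_cost = K * T      # ConFIG: every loss back-propagated each step
m_config_cost = 1 * T    # M-ConFIG: one loss per step (round-robin)
ratio = m_config_cost / config_cost
print(ratio)             # 1/K of the backward passes, here one third
```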

Experimental Evaluation

Physics‑Informed Neural Networks (PINNs)

Benchmarks on several PDEs (including 2‑D Poisson and 3‑D Beltrami flow) show that ConFIG consistently outperforms a standard Adam baseline, achieving larger reductions in the PDE residual and lower boundary/initial‑condition losses, which translates into overall accuracy gains. M‑ConFIG matches or exceeds ConFIG when enough momentum updates per iteration are used, maintaining faster convergence throughout training.

Multi‑Task Learning (CelebA)

The CelebA dataset contains 40 binary facial attributes, forming a 40‑task MTL problem. ConFIG and M‑ConFIG achieve the best average F1 score and ranking among compared methods. Increasing the number of momentum updates per iteration (e.g., 20–30 updates) improves M‑ConFIG performance, eventually surpassing ConFIG while using only about 56 % of the training time.

Convergence Guarantees

The paper proves that when the parameter dimension exceeds the number of losses K and the loss gradients are linearly independent, the normalized gradient matrix U has full rank, its pseudo‑inverse exists, and the resulting update direction decreases every loss term.

Resources

Paper: https://arxiv.org/abs/2408.11104

Project page: https://tum-pbs.github.io/ConFIG/

GitHub repository: https://github.com/tum-pbs/ConFIG

Written by AI Frontier Lectures, a leading AI knowledge platform.