Conditionally Adaptive Augmented Lagrangian PINNs for Forward and Inverse PDE Solving (CMAME Open‑Source Code)
The article analyzes the multi‑objective loss imbalance in physics‑informed neural networks, introduces the CAPU algorithm that assigns independent adaptive penalty parameters via an RMSProp‑inspired update with a max‑protection rule, and demonstrates its superior accuracy on a range of forward and inverse PDE benchmarks, providing theoretical guarantees and open‑source PyTorch code.
Background
Training physics‑informed neural networks (PINNs) is a multi‑objective optimization problem: the PDE residual, boundary conditions and initial conditions each contribute a loss term. Manual weighting is unreliable and existing dynamic‑weighting schemes such as SA‑PINN lack a rigorous constrained‑optimization foundation.
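For contrast, here is a minimal sketch of the conventional weighted composite loss that this line of work replaces (the weights and residual terms are generic placeholders, not the paper's notation):

    def weighted_pinn_loss(pde_mse, bc_mse, ic_mse, w_bc=1.0, w_ic=1.0):
        # One scalar objective; the balance between terms is fixed by hand,
        # which is exactly the brittleness the constrained reformulation removes.
        return pde_mse + w_bc * bc_mse + w_ic * ic_mse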
From Unconstrained to Constrained Optimization
The authors build on the PECANN framework and reformulate PINN training as a strict augmented‑Lagrangian constrained problem. Boundary and initial conditions become equality constraints, while the PDE residual remains the sole objective, eliminating the need for manually tuned loss weights.
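In symbols, the reformulation reads roughly as follows (our notation, reconstructed from the description; r is the PDE residual at collocation points x_j and each C_i collects a boundary or initial condition):

    \min_{\theta} \; J(\theta) = \frac{1}{N_r} \sum_{j=1}^{N_r} r(x_j; \theta)^2
    \quad \text{subject to} \quad C_i(\theta) = 0, \quad i = 1, \dots, m.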
CAPU Algorithm: Core Mathematics
Independent penalty parameters are introduced for each constraint, turning the augmented‑Lagrangian loss into a Hadamard product of penalty vectors and constraint residuals. The dual update follows the standard Lagrange‑multiplier rule.
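A minimal sketch of how such a loss can be assembled (the function and argument names are ours; constraints holds the residuals C_i(θ), lam and mu the multiplier and penalty vectors):

    import torch

    def augmented_lagrangian_loss(objective, constraints, lam, mu):
        # Linear multiplier term plus quadratic penalty; both are elementwise
        # (Hadamard) products over the constraint vector, so each constraint
        # carries its own multiplier and penalty parameter.
        return objective + torch.sum(lam * constraints) \
                         + 0.5 * torch.sum(mu * constraints**2)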
An RMSProp‑inspired adaptive update treats the dual update as a gradient step and the penalty scaling factor η as a learning rate. For each constraint i, a moving average v̄_i of the squared constraint residual is maintained:
v̄_i^{(t)} = ζ · v̄_i^{(t−1)} + (1 − ζ) · C_i(θ^{(t)})²

The penalty is then updated as

μ_i^{(t)} = max( μ_i^{(t−1)}, η / √( v̄_i^{(t)} + ε ) )

The max protection ensures that penalties never decrease, resolving the conflict between RMSProp's tendency to shrink the update when the constraint violation grows and the augmented‑Lagrangian principle that penalties should increase with violation.
Convergence Conditions
Lemma 1 proves that penalty parameters remain bounded once constraints stabilize, and Theorem 1 shows convergence of the Lagrange multipliers, guaranteeing numerical stability unlike pure penalty methods that require unbounded growth.
CAPU vs. SA‑PINN
SA‑PINN is identified as an adaptive‑penalty method without Lagrange multipliers, requiring penalty parameters to tend toward infinity for exact constraint satisfaction. CAPU, by contrast, achieves precise feasibility with finite penalties thanks to the dual variables.
Constraint Aggregation and Fourier Features
Standard PECANN applies pointwise constraints, leading to thousands of Lagrange multipliers. Empirical analysis reveals that multipliers for constraints of the same type share a clear probability distribution with a well‑defined expectation. CAPU therefore aggregates constraints using a mean‑square‑residual (MSR) formulation, reducing the number of multipliers to 2‑3 and eliminating an implicit variance penalty that harms stability.
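A sketch of the MSR aggregation as described (hypothetical helper; each constraint type collapses to a single scalar before entering the augmented Lagrangian, so only one multiplier per type survives):

    import torch

    def aggregate_constraints(bc_residuals, ic_residuals):
        # Collapse thousands of pointwise residuals into one mean-square
        # residual (MSR) per constraint type.
        c_bc = torch.mean(bc_residuals**2)
        c_ic = torch.mean(ic_residuals**2)
        # Downstream, only two Lagrange multipliers and penalties are needed.
        return torch.stack([c_bc, c_ic])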
For high‑frequency, multi‑scale problems a single Fourier feature layer (weights sampled from a standard normal distribution) suffices within the CAPU framework, unlike prior work that required multiple feature groups. This simplification stems from the constrained‑optimization formulation that fully exploits the expressive power of a standard MLP.
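A minimal single Fourier‑feature layer consistent with that description (the 2π factor and the sin/cos concatenation follow common random‑Fourier‑feature practice and are our assumption, not confirmed by the article):

    import math
    import torch
    import torch.nn as nn

    class FourierFeatures(nn.Module):
        # Maps x -> [sin(2*pi*Bx), cos(2*pi*Bx)] with frequencies B drawn once
        # from a standard normal distribution and kept fixed during training.
        def __init__(self, in_dim, num_features):
            super().__init__()
            self.register_buffer("B", torch.randn(num_features, in_dim))

        def forward(self, x):
            proj = 2.0 * math.pi * (x @ self.B.t())
            return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)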
Non‑Overlapping Time‑Window Strategy
Long‑time PDE evolution suffers from causality loss when trained over the entire spatio‑temporal domain. CAPU partitions the total time interval into non‑overlapping sub‑domains, each handled by an independent neural network. The final state of one window becomes the initial‑condition constraint for the next, a mechanism naturally supported by the independent penalty parameters.
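Schematically, the window loop could look like this (all names hypothetical; train_window stands for a full CAPU‑constrained training run on one sub‑domain):

    import torch

    def train_time_windows(t0, t1, n_windows, make_model, train_window, ic_fn):
        # One independent network per non-overlapping time window, trained in sequence.
        edges = torch.linspace(t0, t1, n_windows + 1)
        models, ic = [], ic_fn  # initial condition of the first window
        for k in range(n_windows):
            model = make_model()
            train_window(model, t_start=edges[k], t_end=edges[k + 1], ic=ic)
            # The terminal state of this window becomes the initial-condition
            # constraint for the next one.
            ic = lambda x, m=model, t=edges[k + 1]: m(x, t)
            models.append(model)
        return models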
Implementation Details
The method is implemented in PyTorch and released at https://github.com/HiPerSimLab/PECANN-CAPU. The core update logic is:
    import torch

    class CAPU:
        def __init__(self, eta, zeta=0.99, omega_t=0.999, eps=1e-16):
            self.eta = eta                  # penalty scaling factor (learning-rate analogue)
            self.zeta = zeta                # moving-average coefficient
            self.omega_t = omega_t          # sub-problem convergence threshold
            self.eps = eps                  # numerical safeguard inside the square root
            self.lam = 1.0                  # Lagrange multiplier(s); broadcasts to a vector
            self.mu = 1.0                   # penalty parameter(s)
            self.v_bar = 0.0                # moving average of squared constraint residuals
            self.prev_loss = float('inf')

        def step(self, constraints, aug_loss):
            # `constraints` and `aug_loss` are expected as detached tensors.
            # Update the moving average of squared constraint residuals.
            self.v_bar = self.zeta * self.v_bar + (1 - self.zeta) * constraints**2
            # Fire the dual/penalty updates only once the sub-problem has
            # stalled, i.e. the augmented loss stops decreasing appreciably.
            if aug_loss / self.prev_loss >= self.omega_t:
                # Standard Lagrange-multiplier (dual ascent) update.
                self.lam = self.lam + self.mu * constraints
                # CAPU penalty update with max protection: penalties never decrease.
                self.mu = torch.maximum(torch.as_tensor(self.mu),
                                        self.eta / torch.sqrt(self.v_bar + self.eps))
            self.prev_loss = aug_loss

Typical training pipelines combine Adam (or L‑BFGS) updates for the network parameters with the CAPU step. Three strategies are recommended (a pipeline sketch follows the list):
Adam warm‑up (≈3000 steps) followed by L‑BFGS fine‑tuning (≈2000 steps) for problems requiring high precision.
Pure Adam with a ReduceLROnPlateau scheduler for smoother loss landscapes.
Pure L‑BFGS with strong Wolfe line search when the problem size permits.
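A hedged sketch of the first strategy, with the CAPU step interleaved between optimizer updates (loss_fn is an assumed helper returning the aggregated constraints and the augmented loss; dual updates during the L‑BFGS stage are left out for brevity):

    import torch

    def train(model, loss_fn, capu, adam_steps=3000, lbfgs_steps=2000):
        # Stage 1: Adam warm-up, one CAPU step per optimizer step.
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(adam_steps):
            opt.zero_grad()
            constraints, aug_loss = loss_fn(model, capu.lam, capu.mu)
            aug_loss.backward()
            opt.step()
            capu.step(constraints.detach(), aug_loss.detach())
        # Stage 2: L-BFGS fine-tuning with a strong Wolfe line search.
        opt = torch.optim.LBFGS(model.parameters(), max_iter=lbfgs_steps,
                                line_search_fn="strong_wolfe")
        def closure():
            opt.zero_grad()
            _, aug_loss = loss_fn(model, capu.lam, capu.mu)
            aug_loss.backward()
            return aug_loss
        opt.step(closure)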
Experimental Evaluation
Composite‑Material Heat Conduction
Six‑layer fully‑connected networks (60 neurons per layer) trained for 500 k Adam steps. CAPU’s aggregated‑constraint version matches the exact solution at material interfaces, while the MPU and CPU baselines exhibit large temperature and flux errors. Relative error is reduced by roughly a factor of two.
Supersonic Rarefaction Wave (Burgers)
Three‑layer MLP (20 neurons per layer) trained for 10 k Adam steps. CAPU maintains accurate wave profiles on both coarse (66×33) and fine (130×65) grids, achieving an order‑of‑magnitude lower error than MPU/CPU.
Helmholtz Equation
Six‑layer MLP (128 neurons per layer) trained for 500 k Adam steps. CAPU‑Adam yields mean errors two orders of magnitude lower than the baseline PINN. Remarkably, CAPU‑L‑BFGS attains comparable accuracy with only 3,441 parameters, outperforming an 82,304‑parameter cPIKAN+RBA model.
1‑D Multi‑Scale Poisson
A standard MLP solves high‑frequency problems when equipped with a single Fourier feature layer; adding the feature reduces the error from O(10⁻³) to O(10⁻⁴), refuting the claim that function‑fitting failure inevitably leads to PDE‑solving failure.
Vortex Scalar Transport (Long‑Time Evolution)
The domain is split into 40 non‑overlapping time windows, each modeled by a six‑layer 40‑neuron MLP. After an Adam warm‑up (3000 steps) and L‑BFGS refinement (2000 steps), the solution preserves scalar conservation and reproduces the expected vortex deformation and recovery.
Inverse Heat‑Source Reconstruction
Using noisy terminal‑time temperature data, a single‑hidden‑layer network (40 neurons) trained with L‑BFGS (10 k steps) recovers the spatial heat source with relative errors below 10⁻³ across all noise levels, outperforming the Hasanov numerical method, which exhibits oscillatory artifacts.
Limitations
Serial training of time windows increases total compute time; parallelization is a promising remedy.
High‑accuracy results rely heavily on L‑BFGS, whose memory consumption grows with parameter count.
Hyper‑parameter guidance (e.g., η = 0.01 for Adam, η = 1.0 for L‑BFGS) may need problem‑specific tuning.
Very large Fourier‑feature magnitudes (> 4) can cause over‑fitting and constraint violation.
Future Directions
Combine CAPU with curvature‑aware optimizers (natural gradient, SSBroyden) to accelerate convergence.
Extend constraint aggregation and CAPU to three‑dimensional irregular geometries.
Exploit domain‑decomposition parallelism already explored in PECANN.
Apply the framework to multi‑physics coupling (fluid‑structure, thermo‑mechanical).
Develop stochastic‑gradient convergence theory for CAPU and its compatibility with mini‑batch training.
Integrate emerging architectures such as KAN, DeepONet, or other operator‑learning models.
Conclusion
CAPU resolves the loss‑balancing dilemma in PINNs by assigning independent, adaptively updated penalty parameters, protecting them with a max operation, and aggregating constraints. Theoretical analysis guarantees bounded penalties and convergent multipliers. Extensive benchmarks across forward and inverse PDE problems demonstrate superior accuracy, stability, and flexibility, establishing CAPU as a robust tool for scientific machine learning.