The Forgotten Foundational Papers Behind PINNs

This article reviews the 1994 Dissanayake & Phan‑Thien and 1998 Lagaris et al. papers that first introduced feed‑forward neural networks as continuous trial functions for PDEs, contrasting their soft‑penalty and hard‑encoding boundary treatments and showing how they prefigure modern physics‑informed neural networks.


Introduction

Many researchers associate physics‑informed neural networks (PINNs) with Raissi et al. (2019), but the core ideas—using a feed‑forward network as a trial function, constructing a PDE residual, and minimizing it—were already fully formulated in two earlier papers: Dissanayake & Phan‑Thien (1994) and Lagaris, Likas & Fotiadis (1998). The former adopts a soft‑penalty loss for boundary conditions, while the latter encodes boundary conditions directly into the trial function (hard encoding). Both papers are examined in detail.

1990s Context

Hornik, Stinchcombe and White (1989) proved the universal approximation theorem, showing that a feed‑forward network with at least one hidden layer can approximate any Borel‑measurable function arbitrarily well. This raised the question of whether a neural network could directly approximate the solution of a differential equation, bypassing the mesh‑based discretisation of finite‑element or finite‑difference methods.

Early attempts such as Lee & Kang (1990) and Meade & Fernandez (1994) still relied on discretised systems. The two papers reviewed here are the first to treat the neural network as a continuous approximator for the PDE itself.

1994 Paper: Dissanayake & Phan‑Thien

From PDE to Unconstrained Optimisation

The authors propose a multilayer feed‑forward network as a universal approximator for the PDE solution and formulate a point‑collocation loss that sums the PDE residual and a boundary‑condition residual (soft penalty). The loss is the direct ancestor of the PINN loss.
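To make the structure of that loss concrete, here is a minimal sketch for a 2‑D Poisson problem with Dirichlet boundaries, written in modern PyTorch. The network size, the placeholder right‑hand side f, and the use of automatic differentiation (the 1994 paper used finite differences) are all assumptions made for brevity.

```python
import torch

# Minimal sketch of the 1994-style soft-penalty collocation loss for a
# 2-D Poisson problem u_xx + u_yy = f(x, y) with Dirichlet boundaries.
# The original paper approximated derivatives by finite differences;
# autograd is used here only to keep the example short.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 10), torch.nn.Sigmoid(),
    torch.nn.Linear(10, 10), torch.nn.Sigmoid(),
    torch.nn.Linear(10, 1),
)

def f(xy):
    # Placeholder right-hand side; the paper's test cases use specific sources.
    return torch.zeros(xy.shape[0], 1)

def laplacian(u, xy):
    # u_xx + u_yy obtained by differentiating the network output twice.
    grad = torch.autograd.grad(u.sum(), xy, create_graph=True)[0]
    u_xx = torch.autograd.grad(grad[:, 0].sum(), xy, create_graph=True)[0][:, 0]
    u_yy = torch.autograd.grad(grad[:, 1].sum(), xy, create_graph=True)[0][:, 1]
    return (u_xx + u_yy).unsqueeze(1)

def soft_penalty_loss(xy_interior, xy_boundary, g_boundary, weight=1.0):
    xy_interior = xy_interior.requires_grad_(True)
    pde_residual = laplacian(net(xy_interior), xy_interior) - f(xy_interior)
    bc_residual = net(xy_boundary) - g_boundary
    # PDE residual plus weighted boundary residual: the ancestor of the PINN loss.
    return pde_residual.pow(2).mean() + weight * bc_residual.pow(2).mean()
```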

Network and Optimisation

Architectures tested: 2‑3‑3‑1 (≈25 parameters), 2‑5‑5‑1 (≈45 parameters), 2‑10‑10‑1 (≈140 parameters).

All use sigmoid activation and a single output.

Optimiser: quasi‑Newton (BFGS) with gradients approximated by finite differences because automatic‑differentiation frameworks did not exist in 1994.
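A minimal sketch of that optimisation setup, assuming SciPy's BFGS and a stand‑in objective in place of the actual collocation loss (the function names here are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.optimize import minimize

def loss(theta):
    # Stand-in objective; in the paper this would be the collocation loss
    # evaluated at the flat parameter vector theta (weights and biases).
    return float(np.sum(theta ** 2))

def fd_gradient(fun, theta, eps=1e-6):
    # Central finite-difference gradient, as used before autodiff frameworks existed.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (fun(theta + step) - fun(theta - step)) / (2.0 * eps)
    return grad

theta0 = np.random.randn(25)  # roughly the size of the 2-3-3-1 network
result = minimize(loss, theta0, jac=lambda t: fd_gradient(loss, t), method="BFGS")
```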

Validation Cases

Case 1: 2‑D Poisson equation – error norms decrease from 3.43 × 10⁻² (5×5 grid, 2‑3‑3‑1) to 1.21 × 10⁻⁴ (20×20 grid, 2‑10‑10‑1).

Case 2: Steady heat conduction with mixed Dirichlet/Neumann boundaries – the soft‑penalty approach incorporates the boundary residual into the loss.

Key Contributions and Limitations

First explicit proposal to use a neural network as a continuous PDE approximator.

First use of a combined PDE‑plus‑BC residual loss (the direct prototype of the PINN loss).

Limitations: only PDEs (no ODEs or coupled systems), boundary conditions handled solely by soft penalties, and gradients computed by finite differences, which are costly and less accurate.

1998 Paper: Lagaris, Likas & Fotiadis

From Soft Penalty to Hard Guarantee

The authors observe that imposing boundary conditions as penalty terms merely relaxes a constrained optimisation problem to an unconstrained one, so the boundary conditions are satisfied only approximately. Instead, they construct the trial function as the sum of two terms: (1) a function with no trainable parameters that satisfies the boundary conditions, and (2) a neural‑network term multiplied by a factor that vanishes on the boundary. This guarantees exact satisfaction of the boundary conditions and removes the BC term from the loss.
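In notation close to, though not identical to, the paper's, the trial solution takes the form

$$\hat{u}(\mathbf{x}) \;=\; A(\mathbf{x}) \;+\; F(\mathbf{x})\, N(\mathbf{x}; \boldsymbol{\theta}),$$

where A satisfies the boundary conditions and contains no trainable parameters, F vanishes on the boundary, and N is the feed‑forward network.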

Systematic Trial‑Function Construction

Templates are provided for first‑order ODE IVPs, second‑order ODE IVPs, second‑order ODE BVPs, and 2‑D PDEs with Dirichlet or mixed Dirichlet/Neumann boundaries. In each case the network output is multiplied by a factor that vanishes on the boundary, ensuring the hard‑encoding property.
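As one illustration, for a second‑order ODE boundary‑value problem on [0, 1] with y(0) = y0 and y(1) = y1, a 1998‑style trial function can be sketched in PyTorch as follows; the network size and boundary values are illustrative.

```python
import torch

# Hard-encoded trial function for a Dirichlet BVP on [0, 1]:
# the factor x * (1 - x) forces the network term to vanish at both endpoints,
# so the boundary conditions hold exactly and no penalty term is needed.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 10), torch.nn.Sigmoid(), torch.nn.Linear(10, 1)
)

def trial_solution(x, y0=0.0, y1=1.0):
    return y0 * (1 - x) + y1 * x + x * (1 - x) * net(x)
```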

Analytic Gradient Derivation

For a single‑hidden‑layer sigmoid network the authors derive closed‑form expressions for all required derivatives (including higher‑order derivatives) and use BFGS quasi‑Newton optimisation. This manual “symbolic differentiation” is equivalent to modern torch.autograd.grad() but limited to the specific network architecture.
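For a single input variable x and a hidden layer of H sigmoid units, the derivatives the authors work out take the form (notation mine)

$$N(x) = \sum_{i=1}^{H} v_i\, \sigma(w_i x + b_i), \qquad \frac{d^k N}{dx^k} = \sum_{i=1}^{H} v_i\, w_i^{\,k}\, \sigma^{(k)}(w_i x + b_i),$$

where $\sigma^{(k)}$ is the k‑th derivative of the sigmoid; the paper gives the corresponding multivariate expressions, which involve products of the input weights.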

Seven Validation Problems

The paper covers ODE IVPs, ODE BVPs, coupled ODE systems, 2‑D Poisson problems (Dirichlet and mixed BC), and a nonlinear 2‑D PDE. All seven problems have known analytical solutions. Networks consist of a single hidden layer with ten sigmoid units (≈30–40 parameters). Training points are 10 for ODEs and a 10×10 grid for PDEs.

Comparison with Finite‑Element Method (FEM)

Using a Galerkin FEM (18×18 grid, 1369 unknowns) as baseline, the authors report:

FEM is more accurate at the training points, as expected.

FEM interpolation error grows by 2–4 orders of magnitude away from training points, whereas the neural network’s error remains essentially unchanged because the network output is globally smooth.

Neural‑network training time scales linearly with the number of parameters, while FEM time grows roughly quadratically.

Despite using far fewer parameters (40 vs. 1369), the neural‑network method matches or exceeds FEM interpolation accuracy.

Technical Evolution and Legacy

The 1994 soft‑penalty approach is directly inherited by modern PINNs, which add data‑fit terms and weighting coefficients to the same loss. The 1998 hard‑encoding idea resurfaces in recent methods such as PeRCNN (2023) that embed boundary and initial conditions into the network architecture.
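A common modern form of that loss (notation mine, not drawn from any single paper) is

$$\mathcal{L}(\boldsymbol{\theta}) = \frac{\lambda_r}{N_r}\sum_{i=1}^{N_r}\big|\mathcal{N}[\hat{u}](\mathbf{x}^i_r)\big|^2 + \frac{\lambda_b}{N_b}\sum_{j=1}^{N_b}\big|\mathcal{B}[\hat{u}](\mathbf{x}^j_b)\big|^2 + \frac{\lambda_d}{N_d}\sum_{k=1}^{N_d}\big|\hat{u}(\mathbf{x}^k_d) - u^k\big|^2,$$

where the PDE‑residual and boundary terms are exactly the 1994 ingredients, and the data‑fit term and the weights λ are the later additions.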

Gradient computation has progressed from finite‑difference approximations (1994) → manual analytic formulas (1998) → automatic differentiation (2019), enabling arbitrary depth, activation functions, and efficient high‑order derivatives.

Network depth also evolved: the early works used shallow networks (≤2 hidden layers, ≤140 parameters) because optimisation techniques, batch normalisation, residual connections, and adaptive learning rates were unavailable. After the deep‑learning revolution, PINNs employ thousands to tens of thousands of parameters, achieving far greater expressive power.

Why the Early Works Remained Dormant

No universal automatic‑differentiation tools; hand‑derived gradients limited the method to specific activations and shallow networks.

Optimisation algorithms were restricted to quasi‑Newton methods such as BFGS, which do not scale to large‑parameter networks; modern optimisers such as Adam (2015) and learning‑rate schedules did not yet exist.

Shallow networks could not efficiently represent complex high‑dimensional functions; depth provides hierarchical feature composition.

GPU acceleration was not yet widespread; the early experiments ran on Sun UltraSPARC workstations.

The computational‑science community already had mature FEM/FD software (ANSYS, COMSOL) that solved the low‑dimensional problems of the 1990s adequately.

When automatic‑differentiation frameworks (TensorFlow, PyTorch), GPUs, and advanced optimisers became available around 2015‑2017, the dormant ideas were revived as PINNs, now capable of handling high‑dimensional forward and inverse problems.

Reflections

The article stresses that the originality of the 1994 and 1998 papers is beyond question; the breakthrough of modern PINNs lies instead in the convergence of three technological components: automatic differentiation, deep networks, and a unified treatment of forward and inverse problems.

It also notes the ongoing debate between soft‑penalty and hard‑encoding of boundary conditions, the parameter‑efficiency advantage demonstrated in 1998, and the importance of tracing technical lineage to avoid reinventing ideas.

References

Key references include the two ancestor papers, the universal approximation theorem (Hornik et al., 1989), the modern PINN formulation (Raissi et al., 2019), and recent works on hard‑encoding (e.g., PeRCNN, 2023).

Figure 1: Soft penalty vs hard encoding of boundary conditions
Figure 2: Timeline of neural DE solvers from the 1990s to PINNs
Figure 3: Architectural comparison of 1994, 1998, and 2019 methods