Why Physics‑Informed Neural Networks (PINNs) Became a 20,000‑Citation Breakthrough
This article reviews the highly cited 2019 JCP paper that introduced Physics‑Informed Neural Networks, explains their core idea of embedding PDE residuals into the loss, compares them with contemporaneous methods, details implementation choices, showcases forward and inverse experiments, and discusses their impact, limitations, and future research directions.
1. Introduction
The 2019 Journal of Computational Physics paper by Raissi, Perdikaris, and Karniadakis introduced Physics‑Informed Neural Networks (PINNs); the paper has since received over 21,000 Google Scholar citations and sparked the field of scientific machine learning.
2. Historical Context
Deep BSDE – Han, Jentzen & E (arXiv, July 2017). Transforms parabolic PDEs into backward stochastic differential equations; targets high‑dimensional problems (100‑D and beyond). Published in PNAS 2018.
DGM – Sirignano & Spiliopoulos (arXiv, August 2017). Minimizes the PDE residual on randomly sampled points without a mesh; focuses on high‑dimensional finance PDEs. Published in JCP 2018.
Deep Ritz – E & Yu (arXiv, October 2017). Converts PDEs to variational (energy) form and uses a network as the trial function; suited to elliptic equations. Published in Commun. Math. Stat. 2018.
PINNs – Raissi, Perdikaris & Karniadakis (arXiv, November 2017). Uses automatic differentiation to build the PDE residual into the loss, handling forward and inverse problems within a single framework. Published in JCP 2019.
The earliest precursor is Lagaris et al. (1998), which used shallow networks to solve ODE/PDEs but lacked modern deep‑learning tools.
3. Why PINNs?
Traditional numerical methods (finite element, finite difference, spectral) require mesh generation, basis selection, and stability constraints, which become prohibitive for 3‑D, nonlinear, or multi‑scale problems. Deep learning needs large datasets, yet scientific simulations often provide only sparse observations. PINNs encode the known PDE as a prior, turning the residual into a regularizer that dramatically reduces the required data.
4. Forward vs. Inverse Problems
Forward (data‑driven solution): PDE, parameters, and initial/boundary conditions are known; the unknown is the solution field over the space‑time domain. The loss combines a data term (boundary/initial fitting) and a physics term (PDE residual at collocation points).
Inverse (data‑driven discovery): The PDE form is known but some parameters are unknown; scattered interior observations are available. The loss has the same structure, but the unknowns now include both the network weights and the physical parameters, which are learned jointly.
5. Core Methodology
5.1 Continuous‑time Model
A fully‑connected network approximates the solution \(u(t,x)\). Automatic differentiation computes \(u_t, u_x, u_{xx}\), etc., and the PDE residual \(f\) is added to the loss: loss = MSE_u + MSE_f.
Activation: The paper uses Tanh throughout because higher‑order derivatives of ReLU are zero.
Optimizer: Small‑scale forward problems are first trained with Adam (10 k–20 k steps) and then refined with L‑BFGS; pure Adam often stalls.
Collocation points: Latin Hypercube Sampling provides better space‑filling than pure random sampling; typical forward runs use 10 k–20 k points, inverse runs use ~5 k observations.
Input normalization: Scaling inputs to \([-1,1]\) or to zero mean and unit variance greatly stabilizes training.
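A minimal sketch of such scaling (the rectangular domain bounds lb and ub below are illustrative assumptions, not values from the paper):

import torch

# Map raw (t, x) coordinates to [-1, 1]^2 before they enter the network.
lb = torch.tensor([0.0, -1.0])   # assumed domain lower bounds (t_min, x_min)
ub = torch.tensor([1.0,  1.0])   # assumed domain upper bounds (t_max, x_max)

def scale_to_unit(tx):
    # tx has shape (N, 2) with columns (t, x); the output lies in [-1, 1]^2
    return 2.0 * (tx - lb) / (ub - lb) - 1.0

In practice this scaling is often applied inside the network's forward pass, so that raw physical coordinates can be used everywhere else in the code.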
Example PyTorch implementation (excerpt):
import math
import torch
import torch.nn as nn

class PINN(nn.Module):
    def __init__(self, layers=(2, 50, 50, 50, 50, 1)):
        super().__init__()
        modules = []
        for i in range(len(layers) - 1):
            modules.append(nn.Linear(layers[i], layers[i + 1]))
            if i < len(layers) - 2:
                modules.append(nn.Tanh())  # Tanh: smooth, with non-vanishing higher-order derivatives
        self.net = nn.Sequential(*modules)

    def forward(self, t, x):
        # The network maps the space-time coordinate (t, x) to the solution u(t, x)
        return self.net(torch.cat([t, x], dim=1))

    def compute_pde_residual(self, t, x, nu=0.01 / math.pi):
        # Burgers residual f = u_t + u*u_x - nu*u_xx, assembled with automatic differentiation
        t = t.requires_grad_(True)
        x = x.requires_grad_(True)
        u = self.forward(t, x)
        grads = lambda out, inp: torch.autograd.grad(
            out, inp, grad_outputs=torch.ones_like(out), create_graph=True)[0]
        u_t = grads(u, t)
        u_x = grads(u, x)
        u_xx = grads(u_x, x)
        return u_t + u * u_x - nu * u_xx

    def train_step(self, optimizer, t_bc, x_bc, u_bc, t_col, x_col):
        optimizer.zero_grad()
        # Data term: fit the initial/boundary observations
        u_pred = self.forward(t_bc, x_bc)
        loss_u = torch.mean((u_pred - u_bc) ** 2)
        # Physics term: PDE residual at the collocation points
        f = self.compute_pde_residual(t_col, x_col)
        loss_f = torch.mean(f ** 2)
        loss = loss_u + loss_f
        loss.backward()
        optimizer.step()
        return loss_u.item(), loss_f.item()

5.2 Discrete‑time Model
For problems that require large time steps, the authors propose a discrete‑time variant that treats the stages of an implicit Runge‑Kutta (RK) scheme as network outputs. They use the A‑stable Gauss‑Legendre implicit RK scheme, which allows, for example, a single 100‑stage RK step to jump over the stiff Allen‑Cahn dynamics with negligible error.
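In outline (a sketch in the paper's general notation, for a PDE of the form \(u_t + \mathcal{N}[u] = 0\)), a q‑stage implicit Runge‑Kutta step of size \(\Delta t\) reads

\[
u^{n+c_j} = u^{n} - \Delta t \sum_{k=1}^{q} a_{jk}\,\mathcal{N}\big[u^{n+c_k}\big], \quad j = 1,\dots,q,
\qquad
u^{n+1} = u^{n} - \Delta t \sum_{j=1}^{q} b_{j}\,\mathcal{N}\big[u^{n+c_j}\big].
\]

The network takes \(x\) as input and outputs the \(q\) stage values \(u^{n+c_1}(x),\dots,u^{n+c_q}(x)\) together with \(u^{n+1}(x)\); rewriting each relation to express \(u^{n}\) and penalizing the mismatch with the known snapshot at \(t^{n}\) gives the loss, so taking more stages only adds output neurons rather than extra time steps.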
5.3 Applying PINNs to a New PDE
Practical workflow:
Define PDE → Write residual f(t,x,…) → Choose input/output dimensions → Pick a fully‑connected network (4–8 layers, 20–100 neurons, Tanh) → Prepare boundary/initial data and collocation points (LHS) → Assemble loss = λ_u·MSE_u + λ_f·MSE_f → Train: Adam, then L‑BFGS → Inspect the residual distribution. The most error‑prone steps are correct automatic differentiation of high‑order terms and balancing the two loss components (a training sketch follows below).
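For concreteness, here is a minimal sketch of that training schedule, reusing the PINN class from the excerpt in Section 5.1. The Latin Hypercube sampler comes from SciPy; the domain bounds, loss weights, step counts, and placeholder boundary data are illustrative assumptions rather than the paper's settings.

import torch
from scipy.stats import qmc

model = PINN()                                   # the class defined in the excerpt above
lam_u, lam_f = 1.0, 1.0                          # loss weights; tune per problem

# Latin Hypercube collocation points over an assumed domain (t, x) in [0, 1] x [-1, 1]
pts = qmc.scale(qmc.LatinHypercube(d=2).random(n=20000), [0.0, -1.0], [1.0, 1.0])
t_col = torch.tensor(pts[:, :1], dtype=torch.float32)
x_col = torch.tensor(pts[:, 1:], dtype=torch.float32)

# Placeholder boundary/initial data; replace with the actual conditions of the PDE
t_bc = torch.rand(100, 1)
x_bc = torch.rand(100, 1) * 2 - 1
u_bc = torch.zeros(100, 1)

def total_loss():
    loss_u = torch.mean((model(t_bc, x_bc) - u_bc) ** 2)
    loss_f = torch.mean(model.compute_pde_residual(t_col, x_col) ** 2)
    return lam_u * loss_u + lam_f * loss_f

# Stage 1: Adam for the bulk of training
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(10000):
    adam.zero_grad()
    total_loss().backward()
    adam.step()

# Stage 2: full-batch L-BFGS refinement (requires a closure)
lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=5000, line_search_fn="strong_wolfe")
def closure():
    lbfgs.zero_grad()
    loss = total_loss()
    loss.backward()
    return loss
lbfgs.step(closure)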
6. Numerical Experiments
6.1 Forward Problems
Schrödinger equation (continuous‑time): 50 initial points + 50 boundary points, 20 k collocation points, network 5 × 100. The network reproduces the complex‑valued solution almost exactly.
Allen‑Cahn equation (discrete‑time, Gauss‑Legendre RK q=100): 200 points in a single time slice, network 4 × 200. The 100‑stage RK captures the sharp interface with negligible error.
6.2 Inverse Problems
Burgers: true parameters = (1.0, 0.01/π ≈ 0.00318); identified ≈ 0.999 and 0.003179; relative errors 0.085 % and 0.12 %.
KdV: true parameters = (1.0, 0.0025); identified ≈ (1.000, 0.0025); relative errors 0.023 % and 0.006 %.
Navier‑Stokes (Re=100): true parameters = (1.0, 0.01); identified ≈ 0.999 and 0.01047; relative errors 0.078 % and 4.67 %. Using only 5 000 scattered velocity points (≈1 % of the data), the network also reconstructs the pressure field, which never appears in the training set.
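To make the joint parameter learning described in Section 4 concrete, here is a minimal, self‑contained PyTorch sketch for the Burgers case. This is illustrative code, not the paper's: the network size, the initial guess, and the synthetic placeholder data are assumptions.

import torch
import torch.nn as nn

# Sketch: the unknown viscosity enters as one extra trainable parameter,
# updated by the same optimizer that trains the network weights.
net = nn.Sequential(nn.Linear(2, 50), nn.Tanh(),
                    nn.Linear(50, 50), nn.Tanh(),
                    nn.Linear(50, 1))
log_nu = nn.Parameter(torch.tensor(-3.0))        # log-parameterization keeps nu > 0 (initial guess ~0.05)

def residual(t, x):
    # Burgers residual u_t + u*u_x - nu*u_xx with the current estimate of nu
    t, x = t.requires_grad_(True), x.requires_grad_(True)
    u = net(torch.cat([t, x], dim=1))
    grad = lambda o, i: torch.autograd.grad(o, i, torch.ones_like(o), create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    return u_t + u * u_x - torch.exp(log_nu) * u_xx

# Synthetic placeholders for scattered interior observations and collocation points
t_obs, x_obs, u_obs = torch.rand(500, 1), torch.rand(500, 1) * 2 - 1, torch.zeros(500, 1)
t_col, x_col = torch.rand(5000, 1), torch.rand(5000, 1) * 2 - 1

opt = torch.optim.Adam(list(net.parameters()) + [log_nu], lr=1e-3)
for step in range(20000):
    opt.zero_grad()
    loss_data = torch.mean((net(torch.cat([t_obs, x_obs], dim=1)) - u_obs) ** 2)
    loss_phys = torch.mean(residual(t_col, x_col) ** 2)
    (loss_data + loss_phys).backward()           # gradients reach the weights and log_nu alike
    opt.step()

nu_identified = torch.exp(log_nu).item()         # compare with the true value, e.g. 0.01/pi ≈ 0.00318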
7. Comparison with Traditional Solvers
PINNs generally achieve lower accuracy than high‑order spectral or finite‑element methods (which can reach \(10^{-6}\) error), but they excel when data are sparse, the problem is high‑dimensional, or rapid prototyping is needed. They are best viewed as complementary tools rather than outright replacements.
8. Limitations and Future Directions
Loss convergence: Gradients of the data term and the physics term can differ by several orders of magnitude, causing oscillations; adaptive weighting (e.g., Wang et al. 2021) helps.
High‑frequency solutions: Spectral bias hampers learning of sharp features; Fourier feature embeddings or multi‑scale networks mitigate this.
Insufficient collocation points: Residual‑based adaptive refinement (RAR) adds points where the residual is large (a minimal sketch follows after this list).
Complex geometries: Extensions such as XPINN, hp‑VPINN, and neural operators (DeepONet, Fourier Neural Operator) address domain decomposition and operator learning.
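A minimal sketch of such refinement, in the spirit of the RAR strategy popularized by DeepXDE (Lu et al., 2021). The function below is illustrative, not the library's API: it assumes the PINN class with compute_pde_residual from Section 5.1 and a rectangular domain.

import torch

def rar_add_points(model, t_col, x_col, n_candidates=10000, n_add=100):
    # Draw a large random candidate pool over the assumed domain (t, x) in [0, 1] x [-1, 1]
    t_cand = torch.rand(n_candidates, 1)
    x_cand = torch.rand(n_candidates, 1) * 2 - 1
    with torch.enable_grad():                     # ensure autograd is active even under torch.no_grad()
        f = model.compute_pde_residual(t_cand, x_cand)
    # Keep the n_add candidates with the largest |residual| and append them to the collocation set
    idx = torch.topk(f.abs().squeeze(1), k=n_add).indices
    t_new = torch.cat([t_col, t_cand[idx].detach()], dim=0)
    x_new = torch.cat([x_col, x_cand[idx].detach()], dim=0)
    return t_new, x_new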
9. Impact
PINNs introduced a low‑barrier, mesh‑free framework that bridged computational mathematics and machine learning, inspiring follow‑up work in chemistry, materials science, and climate modeling. Annual citation counts grew from 106 in 2019 to over 7,400 in 2025, underscoring their broad influence.
10. References
Raissi M, Perdikaris P, Karniadakis G E. Physics‑Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. Journal of Computational Physics, 2019, 378:686‑707.
Lagaris I E, Likas A, Fotiadis D I. Artificial Neural Networks for Solving Ordinary and Partial Differential Equations. IEEE Transactions on Neural Networks, 1998, 9(5):987‑1000.
Wang S, Teng Y, Perdikaris P. Understanding and Mitigating Gradient Flow Pathologies in Physics‑Informed Neural Networks. SIAM Journal on Scientific Computing, 2021, 43(5):A3055‑A3081.
Lu L, Meng X, Mao Z, Karniadakis G E. DeepXDE: A deep learning library for solving differential equations. SIAM Review, 2021, 63(1):208‑228.
Lu L, Jin P, Pang G, Zhang Z, Karniadakis G E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 2021, 3:218‑229.
Li Z, Kovachki N, Azizzadenesheli K, et al. Fourier neural operator for parametric partial differential equations. ICLR, 2021.
Jagtap A D, Kharazmi E, Karniadakis G E. Conservative physics‑informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Computer Methods in Applied Mechanics and Engineering, 2020, 365:113028.
Wang S, Yu X, Perdikaris P. When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics, 2022, 449:110768.
E W, Yu B. The Deep Ritz method: A deep learning‑based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 2018, 6(1):1‑12.
Sirignano J, Spiliopoulos K. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 2018, 375:1339‑1364.
Han J, Jentzen A, E W. Solving high‑dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 2018, 115(34):8505‑8510.