Constrained Symbolic Regression and Weak Form Uncover Laws from Noisy Incomplete Data

By integrating universal physical symmetries, weak‑form integral transformations, and sparse symbolic regression, the authors devise a hybrid framework that extracts governing Navier‑Stokes equations from high‑dimensional, noisy, and partially observed fluid experiments, while also reconstructing hidden pressure and Lorentz force fields.

AI Agent Research Hub

Apr 2, 2026

1. The Challenge: High‑Dimensional State Space and Latent Variables

Extracting physical laws directly from experimental data typically involves building a large library of candidate terms (derivatives and their nonlinear products) and applying sparse regression, as in SINDy (sparse identification of nonlinear dynamics). For high-dimensional spatiotemporal systems such as fluid PDEs, this strong-form approach faces two fundamental obstacles.

1) Latent variables: Real experiments often cannot measure every state variable. In the shallow-electrolyte experiment, only the planar velocity field is measured, via particle image velocimetry (PIV); pressure and the Lorentz force remain unobserved.

2) Noise amplification by derivatives: High-order spatial and temporal derivatives in the candidate library, when computed by finite differences or polynomial fits, dramatically amplify measurement noise and can bury the true signal.
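The second obstacle is easy to reproduce. The sketch below differentiates a noisy sine wave with central differences (NumPy's np.gradient); the signal, the 1,000-point grid, and the 1% noise level are arbitrary illustrative assumptions, not values from the experiment.

```python
import numpy as np

# Illustrative signal (assumed): u(x) = sin(x) on 1,000 points, plus Gaussian
# noise with standard deviation 0.01 (1% of the amplitude).
rng = np.random.default_rng(0)
n = 1000
x = np.linspace(0.0, 2.0 * np.pi, n)
h = x[1] - x[0]
sigma = 0.01
u_noisy = np.sin(x) + sigma * rng.standard_normal(n)

# Second-order central differences; the second derivative applies np.gradient twice.
du = np.gradient(u_noisy, h)
d2u = np.gradient(du, h)

def rms(a):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(a ** 2)))

# Errors against the exact derivatives, skipping a few one-sided boundary points.
core = slice(5, -5)
err0 = rms(u_noisy[core] - np.sin(x[core]))
err1 = rms(du[core] - np.cos(x[core]))
err2 = rms(d2u[core] + np.sin(x[core]))

print(f"error in u:   {err0:.3g}")
print(f"error in u':  {err1:.3g}")
print(f"error in u'': {err2:.3g}")
```

Each differentiation multiplies the noise by a factor of order 1/h, so on this grid the first-derivative error is roughly a hundred times the noise level and the second-derivative error is larger again by a similar factor.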

2. First Step of the Hybrid Method: Physical Symmetry Prunes the Candidate Library

The authors embed universal physical principles (causality, locality, and smoothness) into the model by expressing the dynamics as a Volterra series. Because the fluid layer is uniform and isotropic, the model must respect Euclidean symmetry: every term must transform as a vector under rotations and translations and must carry a constant coefficient. This symmetry trimming reduces the candidate library to a few low-order terms and eliminates many spurious candidates.
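The requirement that each term "transform as a vector" can be checked numerically. The sketch below is an illustration, not the authors' procedure: it applies candidate terms to a random periodic vector field and tests whether rotating the input by 90 degrees rotates the output in the same way, T[Ru] = R T[u]. Only a discrete 90-degree rotation is tested, and the grid, finite-difference stencils, and function names are assumptions made for the example.

```python
import numpy as np

# Fields have shape (2, n, n): components (u_x, u_y) sampled at [ix, iy]
# on a periodic grid with unit spacing.

def rotate(U):
    """Rotate a vector field by 90 degrees counterclockwise.

    np.rot90 moves the sample at R^-1 x to x (up to a one-cell shift, which
    is harmless on a periodic grid); the components then rotate as
    (u_x, u_y) -> (-u_y, u_x).
    """
    ux, uy = np.rot90(U[0]), np.rot90(U[1])
    return np.stack([-uy, ux])

def dx(f):   # central difference along x (axis 0)
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / 2.0

def dy(f):   # central difference along y (axis 1)
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / 2.0

def dxx(f):  # second difference along x
    return np.roll(f, -1, axis=0) - 2.0 * f + np.roll(f, 1, axis=0)

def dyy(f):  # second difference along y
    return np.roll(f, -1, axis=1) - 2.0 * f + np.roll(f, 1, axis=1)

# Candidate terms T[u], each returning a (2, n, n) vector field.
def laplacian(U):   # viscous-diffusion-like term
    return np.stack([dxx(c) + dyy(c) for c in U])

def advection(U):   # (u . grad) u
    ux, uy = U
    return np.stack([ux * dx(c) + uy * dy(c) for c in U])

def x_only(U):      # d^2 u / dx^2 alone singles out the x direction
    return np.stack([dxx(c) for c in U])

def is_equivariant(term, U):
    """Check T[R u] == R T[u] for the 90-degree rotation R."""
    return bool(np.allclose(term(rotate(U)), rotate(term(U))))

rng = np.random.default_rng(1)
U = rng.standard_normal((2, 32, 32))
for name, term in [("laplacian", laplacian), ("advection", advection), ("x_only", x_only)]:
    print(f"{name:10s} rotation-equivariant: {is_equivariant(term, U)}")
```

The Laplacian and the advection term pass, while a lone second derivative in x fails because it picks out a preferred direction; a symmetry-constrained library would keep the first two and discard the third.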

Physical symmetry constrained candidate library diagram

3. Second Step: The Weak Formulation as a Mathematical Trick

The core innovation is to abandon the strong form, which must hold at every discrete point, and instead map the governing equations onto integrals over finite domains: the weak form.

3.1 Integration by Parts Moves Derivatives off the Noisy Data

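The governing equations are multiplied by smooth test (weight) functions and integrated over each domain. The weights are chosen to vanish on the domain boundary, together with as many derivatives as the integration by parts requires, so every differential operator can be moved from the noisy data onto the analytically known weights without leaving boundary terms. No derivatives of the data are computed, and the integration itself averages out uncorrelated noise.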
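A toy comparison shows the effect. In the sketch below, a 1D heat equation u_t = c u_xx with c = 0.5 stands in for the Navier-Stokes equations, and the data are synthetic with 1% noise. The strong form differentiates the noisy samples and fits c pointwise; the weak form multiplies by polynomial bump functions phi(x, t) = w(x) g(t) (an assumed, common choice) and integrates by parts, using -∫∫ phi_t u = c ∫∫ phi_xx u. Grid sizes, domain placement, and the bump exponent are all illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical synthetic data: u(x, t) = exp(-c t) sin(x) solves u_t = c u_xx.
c_true, sigma = 0.5, 0.01
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
t = np.linspace(0.0, 1.0, 101)
hx, ht = x[1] - x[0], t[1] - t[0]
X, T = np.meshgrid(x, t, indexing="ij")  # arrays are indexed [ix, it]
u = np.exp(-c_true * T) * np.sin(X) + sigma * rng.standard_normal(X.shape)

# Strong form: differentiate the noisy samples, then fit u_t = c u_xx pointwise.
u_t = np.gradient(u, ht, axis=1)
u_xx = np.gradient(np.gradient(u, hx, axis=0), hx, axis=0)
core = (slice(4, -4), slice(4, -4))  # drop one-sided boundary stencils
a_s, b_s = u_xx[core].ravel(), u_t[core].ravel()
c_strong = (a_s @ b_s) / (a_s @ a_s)

def bump(lo, hi, p=3):
    """((s - lo)(hi - s))^p: it and its first p-1 derivatives vanish at lo and hi."""
    return (Polynomial([-lo, 1.0]) * Polynomial([hi, -1.0])) ** p

# Weak form: one row per integration domain; derivatives act only on phi.
rows, rhs = [], []
for xa in np.linspace(0.2, 4.0, 8):
    for ta in (0.05, 0.25, 0.45):
        xb, tb = xa + 2.0, ta + 0.5
        w, g = bump(xa, xb), bump(ta, tb)
        inside = (X >= xa) & (X <= xb) & (T >= ta) & (T <= tb)
        phi_t = np.where(inside, w(X) * g.deriv()(T), 0.0)
        phi_xx = np.where(inside, w.deriv(2)(X) * g(T), 0.0)
        rows.append(np.sum(phi_xx * u) * hx * ht)   # integral of phi_xx * u
        rhs.append(-np.sum(phi_t * u) * hx * ht)    # -integral of phi_t * u
a_w, b_w = np.array(rows), np.array(rhs)
c_weak = (a_w @ b_w) / (a_w @ a_w)

print(f"true c = {c_true}, strong form: {c_strong:.3f}, weak form: {c_weak:.3f}")
```

The strong-form estimate collapses toward zero because the noise in u_xx swamps its signal (a regression-dilution effect), while the weak-form estimate stays close to the true value.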

Weak form integration and noise reduction diagram

3.2 Clever Test Functions Eliminate Latent Variables

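Specially constructed weight functions exploit vector calculus identities to remove the unobserved terms. If the weight field is divergence-free (for example, the curl of a scalar function), integration by parts turns the pressure-gradient term into an integral of pressure times the divergence of the weight, which is zero. If the weight's time dependence is odd about the center of the time window, it integrates to zero, so the time-independent Lorentz force term vanishes as well. Consequently, the unknown pressure and force fields disappear from the linear system, leaving only observable quantities.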
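A small numerical check makes both cancellations concrete. The sketch below uses assumed choices rather than the authors' test functions: it builds a divergence-free weight as the curl of a polynomial bump psi(x, y) = B(x) B(y), integrates it against the gradient of an arbitrary pressure field, and compares the result with a weight that is not divergence-free. It also checks that the time factor g(t) = B'(t), which is odd about the center of the window, integrates to zero.

```python
import numpy as np
from numpy.polynomial import Polynomial

def bump(p=4):
    """(s(1 - s))^p: it and its first p-1 derivatives vanish at s = 0 and s = 1."""
    return (Polynomial([0.0, 1.0]) * Polynomial([1.0, -1.0])) ** p

# One hypothetical integration domain: the unit square in space, [0, 1] in time.
n = 201
s = np.linspace(0.0, 1.0, n)
h = s[1] - s[0]
X, Y = np.meshgrid(s, s, indexing="ij")
B = bump()
dB = B.deriv()

# Divergence-free weight: w = (d psi/dy, -d psi/dx) with psi = B(x) B(y).
wx = B(X) * dB(Y)
wy = -dB(X) * B(Y)

# Gradient of an arbitrary "unobserved" pressure p = cos(3x) sin(2y) + x^2 y.
px = -3.0 * np.sin(3 * X) * np.sin(2 * Y) + 2.0 * X * Y
py = 2.0 * np.cos(3 * X) * np.cos(2 * Y) + X**2

# Pressure term with the divergence-free weight: integral of w . grad p.
pressure_term = np.sum(wx * px + wy * py) * h * h
# Same term with a weight that is not divergence-free, w = (psi, 0).
naive_term = np.sum(B(X) * B(Y) * px) * h * h

# g(t) = B'(t) is odd about t = 1/2, so its integral vanishes; a time-independent
# force term factorizes as (space integral) * (time integral) and drops out.
g = dB(s)
time_factor = np.sum(g) * h

print(f"divergence-free weight: {pressure_term:.1e}")
print(f"naive weight:           {naive_term:.1e}")
print(f"integral of g(t):       {time_factor:.1e}")
```

Under this sketch's assumptions, the same factorization also shows why steady data are a problem: if u does not depend on t, every library term integrates against g(t) to zero.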

4. Experimental Validation and Reverse Reconstruction of Latent Variables

With the integral-based linear system assembled, the authors apply ensemble sparse symbolic regression based on thresholded least squares: they repeatedly draw random sets of 50 integration domains, refit the model on each set, and assess both the stability of the coefficients and the consistency of the selected model structure.
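The regression step can be sketched with sequentially thresholded least squares, the sparse solver popularized by SINDy. The example below replaces the assembled weak-form integrals with a synthetic linear system; the number of rows and library terms, the true coefficients, the 0.05 threshold, and the number of ensemble runs are all assumptions. Only the idea of refitting on random subsets of 50 integration domains comes from the text.

```python
import numpy as np

def stlsq(A, b, threshold, max_iter=10):
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, refit on the surviving terms, and repeat."""
    xi = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if not big.any():
            break
        xi[big] = np.linalg.lstsq(A[:, big], b, rcond=None)[0]
    return xi

# Hypothetical weak-form system: 400 candidate integration domains (rows),
# 8 library terms (columns), of which only 3 are truly active.
rng = np.random.default_rng(0)
n_rows, n_terms = 400, 8
xi_true = np.array([0.0, -1.0, 0.0, 0.5, 0.0, 0.0, -0.2, 0.0])
A = rng.standard_normal((n_rows, n_terms))
b = A @ xi_true + 0.01 * rng.standard_normal(n_rows)

# Ensemble: repeatedly pick 50 random domains and refit.
n_runs, n_sub = 200, 50
coefs = np.empty((n_runs, n_terms))
for k in range(n_runs):
    rows = rng.choice(n_rows, size=n_sub, replace=False)
    coefs[k] = stlsq(A[rows], b[rows], threshold=0.05)

inclusion = np.mean(coefs != 0.0, axis=0)  # fraction of runs selecting each term
mean, std = coefs.mean(axis=0), coefs.std(axis=0)
for j in range(n_terms):
    print(f"term {j}: selected {inclusion[j]:.0%}, coefficient {mean[j]:+.3f} ± {std[j]:.3f}")
```

Terms that survive in nearly every run with a small coefficient spread are the trustworthy ones; a term that appears in only some runs signals an unstable model structure.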

4.1 Recovering the Navier‑Stokes Skeleton

For Reynolds numbers ranging from periodic flow to weak turbulence, the algorithm consistently selects the same three non-zero terms, with a relative residual as low as 0.02. These correspond to the convective term, horizontal viscous diffusion, and vertical viscous dissipation due to the bottom boundary. The learned coefficients match theoretical values derived from first principles and even account for a previously observed 25% discrepancy in instability thresholds.
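Written out, the selected model has the structure of a momentum balance. The form below is a sketch that assumes an incompressible planar velocity field u(x, y, t); the coefficient labels c_1, c_2, c_3 are introduced here for illustration and are not the paper's notation, and the last two terms are the latent pressure and Lorentz force contributions that the weak form removes.

```latex
\partial_t \mathbf{u} = c_1\,(\mathbf{u}\cdot\nabla)\mathbf{u} + c_2\,\nabla^2\mathbf{u} + c_3\,\mathbf{u} - \frac{1}{\rho}\nabla p + \frac{1}{\rho}\mathbf{f},
\qquad \nabla\cdot\mathbf{u} = 0
```

In a Navier-Stokes-like balance one expects c_1 < 0 (advection), c_2 > 0 (horizontal viscous diffusion, set by the kinematic viscosity), and c_3 < 0 (linear damping from dissipation at the bottom boundary); comparing such fitted values with first-principles estimates is the kind of check the text describes.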

Extracted coefficients and error analysis

4.2 Reverse Reconstruction via Helmholtz Decomposition

After the governing equation has been identified, the remaining latent fields, pressure (a scalar potential) and the Lorentz force (described by a vector potential), are reconstructed with a Helmholtz decomposition. Using the fast Fourier transform (FFT), the authors recover the full pressure field and the average Lorentz force from velocity-only data.
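On a periodic grid the decomposition takes only a few FFTs. The sketch below assumes a doubly periodic 2π-by-2π domain and uses NumPy's FFT routines; the paper's boundary handling is not reproduced. In a pipeline like this, the residual of the momentum equation after subtracting the identified terms would play the role of F: its curl-free part corresponds to the pressure gradient and its divergence-free remainder to the force, assuming any curl-free part of the force is absorbed into the pressure. A synthetic field with a known potential and stream function checks the split.

```python
import numpy as np

def helmholtz_fft(Fx, Fy, L=2 * np.pi):
    """Split a periodic 2D vector field F into grad(phi) plus a divergence-free part.

    Solves laplacian(phi) = div F in Fourier space and returns phi together
    with the divergence-free remainder F - grad(phi).
    """
    n = Fx.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)       # angular wavenumbers
    KX, KY = np.meshgrid(k, k, indexing="ij")
    K2 = KX**2 + KY**2
    K2[0, 0] = 1.0                                    # avoid 0/0 for the mean mode
    Fx_hat, Fy_hat = np.fft.fft2(Fx), np.fft.fft2(Fy)
    div_hat = 1j * KX * Fx_hat + 1j * KY * Fy_hat     # transform of div F
    phi_hat = -div_hat / K2                           # -|k|^2 phi_hat = div_hat
    phi_hat[0, 0] = 0.0                               # potential defined up to a constant
    phi = np.real(np.fft.ifft2(phi_hat))
    sol_x = Fx - np.real(np.fft.ifft2(1j * KX * phi_hat))
    sol_y = Fy - np.real(np.fft.ifft2(1j * KY * phi_hat))
    return phi, sol_x, sol_y

# Synthetic check: F = grad(phi) + (d psi/dy, -d psi/dx) with known phi and psi.
n = 64
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
phi_true = np.sin(X) * np.cos(2 * Y)                  # scalar potential
# psi = cos(x) sin(y) gives the divergence-free part (cos x cos y, sin x sin y).
Fx = np.cos(X) * np.cos(2 * Y) + np.cos(X) * np.cos(Y)
Fy = -2 * np.sin(X) * np.sin(2 * Y) + np.sin(X) * np.sin(Y)

phi, sol_x, sol_y = helmholtz_fft(Fx, Fy)
print(np.max(np.abs(phi - phi_true)), np.max(np.abs(sol_x - np.cos(X) * np.cos(Y))))
```

Both printed errors are at round-off level because every mode in the synthetic field is resolved by the grid; with measured data, noise at high wavenumbers is the practical limit.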

Reconstructed driving force field vs. experimental measurement

5. Limitations Discussed

Steady-flow dead end: The method relies on time integration. For truly steady flows, which occur at low Reynolds number, the time-integrated terms vanish and the linear system collapses to trivial identities, as observed when Re dropped to 18.

Highly customized test functions: Eliminating the latent variables presumes prior knowledge that pressure enters as a gradient field and that the driving force is time-invariant. Extending the approach to systems lacking such structure would require non-trivial test-function design.

FFT reconstruction discards high-frequency information: To suppress noise in the final Helmholtz step, the authors apply a low-pass filter in Fourier space, which also removes fine-scale pressure fluctuations and multi-scale structure.
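The trade-off in the last point is easy to reproduce. The sketch below applies a sharp isotropic cutoff in Fourier space to a synthetic periodic field; the cutoff wavenumber, the test fields, and the filter shape are assumptions, and the filter used in the paper may differ. Any structure above the cutoff, whether noise or genuine fine-scale fluctuation, is removed in the same way.

```python
import numpy as np

def lowpass_fft(field, k_cut, L=2 * np.pi):
    """Zero every Fourier mode with |k| > k_cut (sharp isotropic cutoff)."""
    n = field.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    KX, KY = np.meshgrid(k, k, indexing="ij")
    mask = np.sqrt(KX**2 + KY**2) <= k_cut
    return np.real(np.fft.ifft2(np.fft.fft2(field) * mask))

n = 128
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
large = np.sin(2 * X) * np.cos(Y)        # large-scale structure, |k| = sqrt(5)
fine = 0.3 * np.sin(20 * X + 15 * Y)     # fine-scale structure, |k| = 25
filtered = lowpass_fft(large + fine, k_cut=10)

# The fine-scale part is gone entirely, regardless of whether it was noise or physics.
print(np.max(np.abs(filtered - large)))
```

Because the filter cannot tell noise from real small-scale physics, the cutoff directly sets the smallest structure the reconstructed pressure field can contain.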

Conclusion

The Nature Communications study demonstrates that, for noisy and incomplete experimental data, embedding classical calculus identities and fluid-physics priors into a machine-learning pipeline can overcome the curse of dimensionality and over-fitting. Rather than relying on large-scale, black-box, end-to-end learning, the physically informed hybrid framework achieves accurate PDE discovery and latent-variable reconstruction, underscoring the essential role of domain knowledge in AI-driven scientific discovery.

