DeepONet Neural Operator for Fast Prediction of Non‑Smooth Discontinuities in the Sod Shock Tube
This tutorial presents a complete DeepONet workflow—two‑step separated training, Rowdy activation, SVD orthogonalisation, and a 10‑member ensemble—that predicts the density, velocity and pressure fields of the one‑dimensional Sod shock‑tube problem with an average test‑set relative error of 2.23% after only 22 minutes of training on an RTX 4090.
Introduction
The tutorial demonstrates how to use the DeepONet neural operator to learn the mapping from initial‑condition parameters to the full solution of the one‑dimensional Euler Riemann problem (Sod shock‑tube) that contains shock waves, contact discontinuities and rarefaction waves.
Riemann problem and non‑smooth challenges
The 1‑D compressible Euler equations describe conservation of density, velocity and total energy. With piecewise‑constant initial data the exact solution consists of three wave families: a rarefaction wave (continuous), a contact discontinuity (density jump) and a shock wave (strong discontinuity). These non‑smooth structures are difficult for traditional smooth‑function approximators.
Neural operator motivation
Neural operators aim to learn a mapping from functions (or parameters) to output functions. DeepONet, introduced by Lu et al. (2021), is based on the universal approximation theorem for operators and represents the operator as a bilinear combination of a Branch network (encoding input parameters) and a Trunk network (encoding spatial coordinates).
Tutorial objectives
Explain the DeepONet Branch‑Trunk architecture and its mathematical foundation.
Introduce the Rowdy activation function that adds trainable frequency and phase terms to improve approximation of discontinuities.
Derive a two‑step separated training strategy and the SVD orthogonalisation bridge that decouples Branch and Trunk optimisation.
Show an efficient JAX implementation using jax.lax.fori_loop to fuse 500 Adam steps into a single XLA call.
Provide a full experimental pipeline, error analysis and visualisation of results.
Methodology
Two‑step separated training
Direct end‑to‑end training of Branch and Trunk suffers from gradient coupling and basis degradation. The proposed strategy first trains the Trunk network together with a coefficient matrix, then orthogonalises the Trunk output via SVD, and finally freezes the Trunk while training the Branch to map parameters to the SVD‑transformed coefficients.
def fnn(X, W, b, a, c, a1, F1, c1, base_fn):
inputs = X
for i in range(len(W) - 1):
linear = jnp.dot(inputs, W[i]) + b[i]
inputs = base_fn(10.0 * a[i] * linear + c[i]) + \
10.0 * a1[i] * jnp.sin(10.0 * F1[i] * linear + c1[i])
return jnp.dot(inputs, W[-1]) + b[-1]Rowdy activation function
Standard smooth activations converge slowly on functions with jumps. Rowdy augments each hidden layer with a trainable sinusoidal term, providing high‑frequency components that can capture sharp features.
SVD orthogonalisation bridge
After Trunk training the basis matrix is generally non‑orthogonal. An SVD decomposition U_s, S_s, Vt_s = jnp.linalg.svd(phi, full_matrices=False) yields a well‑conditioned transformation R = jnp.diag(S_s) @ Vt_s. The Branch network is then trained to predict the coefficients in the orthogonal basis targets = jnp.einsum('ij,kjm->kim', R, Am_r).
U_s, S_s, Vt_s = jnp.linalg.svd(phi, full_matrices=False)
R = jnp.dot(jnp.diag(S_s), Vt_s) # R = Σ @ Vᵀ
targets = jnp.einsum('ij,kjm->kim', R, Am_r)Ensemble for uncertainty quantification
Ten independently initialised DeepONet members are trained. During inference the ensemble mean provides the prediction while the spread gives a point‑wise confidence estimate, which is naturally larger near discontinuities.
Network architecture and parameter statistics
Both Branch and Trunk are 5‑layer fully‑connected networks with hidden size 150. The Trunk takes a single spatial coordinate as input and outputs 150 basis functions; the Branch takes the single pressure parameter and outputs 450 coefficients (150 × 3 channels). Rowdy adds five trainable scalars per hidden layer. Parameter counts per member are:
Trunk weights + biases: 293 575
Branch weights + biases: 158 875
Total per member: 452 450
10‑member ensemble total: 4 524 500
Experimental results
Configuration
Training uses 400 samples and 100 test samples on a uniform 200‑point spatial grid. Adam optimiser, 400 000 epochs for each of Trunk and Branch, 500 step fused updates per outer loop, and RTX 4090 GPU. Total wall‑time is 22 minutes (≈1 317 s).
Training convergence
Loss curves show rapid initial decrease, steady mid‑training improvement, and late‑stage oscillations typical of cosine‑type activations. Best‑model tracking mitigates the risk of final performance degradation.
Prediction accuracy
Average relative errors on the test set are 0.85 % (density), 3.17 % (velocity) and 2.37 % (pressure), yielding an overall mean error of 2.23 %. The training‑set error is 1.53 %, indicating limited over‑fitting.
Visualization observations
In smooth regions the ensemble mean matches the exact Riemann solution to machine precision. Near shocks and contact discontinuities the point‑wise absolute error spikes, confirming the known difficulty of learning discontinuous jumps. Ensemble variance is highest in these regions, providing a useful uncertainty indicator.
Efficiency
Inference for 100 test parameters completes in under one second (millisecond‑level per forward pass). The jax.lax.fori_loop fusion reduces the per‑step overhead to ~16 ms, achieving a ~4.5× speed‑up over naïve Python loops.
Conclusion and outlook
Main conclusions
DeepONet with Rowdy activation and two‑step training attains sub‑3 % average error on a non‑smooth PDE benchmark.
Training completes in 22 minutes on a single RTX 4090 thanks to JAX JIT compilation.
Ensemble predictions deliver millisecond‑scale inference and meaningful uncertainty estimates.
Advantages and limitations
Fast inference and stable training via decoupled Branch‑Trunk optimisation.
Rowdy activation improves representation of discontinuities.
Current approach is purely data‑driven; physics‑informed regularisation is absent.
Accuracy near shocks remains the dominant error source.
Parameter space is limited to a single pressure variable; scalability to higher dimensions is untested.
Future directions
Extend to higher pressure ratios and stronger shock problems (e.g., Le Blanc tube).
Incorporate physics‑informed loss terms to enforce conservation laws.
Generalise to 2‑D and multi‑dimensional flow configurations.
Adopt more complex equations of state for real‑gas or multiphase flows.
Explore adaptive architectures (attention, adaptive grids) that allocate resolution near discontinuities.
References
[1] Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence , 3(3), 218‑229.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
AI Agent Research Hub
Sharing AI, intelligent agents, and cutting-edge scientific computing
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
