NeuroFlow: A Unified Visual‑Neural Bidirectional Model Presented at CVPR 2026
NeuroFlow introduces a reversible flow architecture that learns visual encoding and neural decoding jointly, closing the long‑standing split between the two tasks. On the large‑scale NSD dataset it delivers better image reconstruction, consistent bidirectional mapping, realistic synthesized fMRI signals, and efficient training.
Background and Motivation
The ultimate goal of visual brain‑machine interfaces (BMIs) is to create a two‑way channel between cortical neural activity and external visual perception, moving from one‑way "understanding" to true "bidirectional interaction" that can both read visual information from the brain and write visual information back into the cortex.
Challenges in Existing Visual‑Neural Modeling
Separate encoding and decoding pipelines: current methods treat visual encoding (writing) and neural decoding (reading) as independent problems with distinct models and latent spaces, so the two directions cannot share information and their outputs are not guaranteed to be consistent.
Cross‑modal alignment difficulty: most approaches rely on simple linear or diffusion‑based one‑way mappings, which makes reversible alignment between the visual and neural modalities hard to achieve.
Insufficient neural interpretability: generated neural signals often contain voxel‑level noise and deviate from true cortical activation patterns, limiting biological insight.
NeuroFlow Architecture
NeuroFlow unifies visual encoding and neural decoding within a single reversible flow framework, consisting of two key modules:
1. NeuroVAE – Probabilistic Variational Backbone
Human neural responses to the same visual stimulus are one‑to‑many: physiological noise, fluctuating brain states, and trial‑to‑trial variability mean that repeated presentations of one image evoke different fMRI patterns. NeuroVAE models this variability with a variational auto‑encoder that maps high‑dimensional fMRI voxel data into a compact, semantically structured latent space. Contrastive learning and cycle‑consistency constraints filter out redundant noise while preserving essential neural information, so that the neural latents align closely with visual semantics.
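The probabilistic and contrastive parts of this description can be sketched as follows. This is a minimal illustration under stated assumptions, not the NeuroVAE implementation: the linear encoder (`encode`, `w_mu`, `w_logvar`), the latent size, the temperature, and the random stand‑in data are all hypothetical, and the cycle‑consistency term is not shown. It demonstrates the reparameterisation trick (a stochastic latent per trial, matching the one‑to‑many framing), the KL term that keeps the latent space compact, and an InfoNCE‑style contrastive loss that pulls each neural latent toward its paired image embedding.

```python
import numpy as np

def encode(voxels, w_mu, w_logvar):
    """Hypothetical linear encoder: voxels (n, V) -> latent mean and log-variance (n, d)."""
    return voxels @ w_mu, voxels @ w_logvar

def reparameterize(mu, logvar, rng):
    """Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)), summed over latent dims and averaged over samples."""
    return np.mean(-0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1))

def info_nce(z_neural, z_visual, temperature=0.1):
    """One-directional InfoNCE: row i of z_neural should match row i of z_visual."""
    a = z_neural / np.linalg.norm(z_neural, axis=1, keepdims=True)
    b = z_visual / np.linalg.norm(z_visual, axis=1, keepdims=True)
    logits = a @ b.T / temperature                       # (n, n) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # positive pairs lie on the diagonal

rng = np.random.default_rng(0)
n, n_voxels, d = 8, 500, 16
voxels = rng.standard_normal((n, n_voxels))              # stand-in fMRI trials
w_mu = 0.01 * rng.standard_normal((n_voxels, d))
w_logvar = 0.01 * rng.standard_normal((n_voxels, d))
mu, logvar = encode(voxels, w_mu, w_logvar)
z = reparameterize(mu, logvar, rng)                      # stochastic neural latents
z_visual = rng.standard_normal((n, d))                   # stand-in paired image embeddings
loss = kl_to_standard_normal(mu, logvar) + info_nce(z, z_visual)
```

In a real model the encoder would be a deep network and `loss` would be minimized by gradient descent together with a reconstruction term; here it is only evaluated once to show how the pieces combine.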
2. XFM – Cross‑Modal Flow Matching
Conventional conditional diffusion starts from noise and relies on a condition from one modality to guide generation. XFM instead treats the cross‑modal transformation as a continuous ordinary differential equation (ODE) in a shared latent space. It learns a reversible flow that maps the visual latent distribution directly to the NeuroVAE neural latent distribution (and vice versa) without any external conditioning, enabling smooth forward (encoding) and backward (decoding) transformations.
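A flow‑matching objective of this kind can be sketched as below. The sketch assumes a straight‑line (rectified‑flow style) path between paired latents, z_t = (1 − t)·x_vis + t·x_neu, whose target velocity is x_neu − x_vis; the source does not state which probability path XFM actually uses. The linear vector‑field parameterisation (`features`), the closed‑form least‑squares fit, and the synthetic paired latents are hypothetical stand‑ins for a neural network trained by gradient descent on real visual and NeuroVAE latents.

```python
import numpy as np

def flow_matching_batch(x_vis, x_neu, rng):
    """Sample times and build (z_t, t, u_t) for an assumed straight-line path."""
    t = rng.uniform(size=(x_vis.shape[0], 1))
    z_t = (1.0 - t) * x_vis + t * x_neu    # point on the path at time t
    u_t = x_neu - x_vis                    # target velocity dz_t/dt along that path
    return z_t, t, u_t

def features(z, t):
    """Hypothetical linear parameterisation of the vector field: inputs [z, t, 1]."""
    return np.hstack([z, t, np.ones_like(t)])

def fm_loss(theta, z_t, t, u_t):
    """Flow-matching loss: mean squared error between predicted and target velocities."""
    return np.mean(np.sum((features(z_t, t) @ theta - u_t) ** 2, axis=1))

rng = np.random.default_rng(0)
n, d = 2048, 8
x_vis = rng.standard_normal((n, d))                      # stand-in visual latents
mix = rng.standard_normal((d, d)) / np.sqrt(d)
x_neu = x_vis @ mix + 0.1 * rng.standard_normal((n, d))  # stand-in paired neural latents

z_t, t, u_t = flow_matching_batch(x_vis, x_neu, rng)
theta_zero = np.zeros((d + 2, d))                        # untrained field predicts zero velocity
theta_fit, *_ = np.linalg.lstsq(features(z_t, t), u_t, rcond=None)
loss_before = fm_loss(theta_zero, z_t, t, u_t)
loss_after = fm_loss(theta_fit, z_t, t, u_t)
```

No noise source or conditioning input appears anywhere in the objective: the field is trained only to transport one modality's latents onto the other's, which is the property the paragraph above contrasts with conditional diffusion.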
Temporal Evolution Mechanism
The model defines a vector field over time t ∈ [0, 1]. In the forward direction (t: 0→1), visual features evolve along the flow into neural representations, which NeuroVAE then decodes into realistic fMRI signals. In the reverse direction (t: 1→0), the same ODE is solved backward, recovering visual features from neural representations, and a visual generator then reconstructs the original image. Because both directions integrate one shared vector field, decoding is the inverse of encoding up to numerical error, which is what gives encoding and decoding their strong semantic consistency.
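The forward/backward mechanism can be illustrated by integrating dz/dt = v(z, t) from t = 0 (visual latent) to t = 1 (neural latent) and then integrating the same field back from 1 to 0. This is a sketch under assumptions: `toy_field` is a hypothetical stand‑in for the learned XFM vector field, and the classical fourth‑order Runge–Kutta solver with 100 steps is an illustrative choice, since the source does not name a solver. The point it shows is that running one field in reverse recovers the starting latent up to solver error.

```python
import numpy as np

def rk4_integrate(v, z, t0, t1, steps=100):
    """Integrate dz/dt = v(z, t) from t0 to t1 with classical RK4.
    When t1 < t0 the step h is negative, so the same ODE runs backward in time."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = v(z, t)
        k2 = v(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = v(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = v(z + h * k3, t + h)
        z = z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return z

rng = np.random.default_rng(0)
d = 4
w = 0.3 * rng.standard_normal((d, d))
b = np.ones(d)

def toy_field(z, t):
    """Hypothetical stand-in for a learned, time-dependent vector field."""
    return np.tanh(z @ w) + t * b

z_visual = rng.standard_normal((3, d))                    # t = 0: visual latents
z_neural = rk4_integrate(toy_field, z_visual, 0.0, 1.0)   # encoding: t 0 -> 1
z_back = rk4_integrate(toy_field, z_neural, 1.0, 0.0)     # decoding: t 1 -> 0
round_trip_error = np.max(np.abs(z_back - z_visual))
```

With this smooth toy field the round‑trip error stays far below 1e-5, while the encoded latents differ clearly from the inputs. Note that, unlike a diffusion sampler, decoding starts from the neural latent rather than from pure noise.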
Experimental Validation
NeuroFlow was evaluated on the large‑scale Natural Scenes Dataset (NSD) and compared against state‑of‑the‑art models MindEye2, BrainDiffuser, and SynBrain.
Visual decoding performance: reconstructed images showed higher semantic and contour similarity to the original stimuli than all baselines.
Encoding‑decoding consistency: the round‑trip image→neural→image pipeline remained highly stable across trials.
Neural signal realism: synthesized neural activity suppressed early‑visual noise and emphasized higher‑order regions such as the fusiform face area (FFA), extrastriate body area (EBA), and parahippocampal place area (PPA), closely matching real cortical activation patterns.
Efficiency and lightweight design: NeuroFlow required only 25% of the parameters of the strongest decoding baseline while outperforming it, enabling fast training and easy deployment.
Ablation and Interpretability Analyses
Four complementary analyses were performed:
Ablation studies: removing any loss term or module caused noticeable degradation in image fidelity and semantic completeness, confirming the necessity of each component.
Flow trajectory visualization: during encoding, the model automatically attenuated noise in early visual areas and progressively aligned representations toward higher‑level regions (FFA, EBA). Decoding progressed from coarse outlines to high‑resolution images, unlike diffusion models, which start from pure noise.
Category activation comparison: for face stimuli, the synthetic activation maps matched the spatial distribution and intensity of measured fMRI responses, demonstrating precise semantic‑region mapping.
Quantitative evaluation: using explained variance (EV) and Spearman correlation on the NSD test set, NeuroFlow achieved higher scores in FFA, EBA, and PPA than competing methods, indicating stronger modeling of high‑level visual semantics.
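The two metrics in the quantitative evaluation can be sketched per region of interest as below. The exact formulas used by the authors are not given in the source; this sketch assumes the common definitions, EV = 1 − Var(measured − synthetic) / Var(measured) and Spearman's ρ as the Pearson correlation of ranks, each computed per voxel across test stimuli and then averaged within an ROI. The voxel index sets for FFA, EBA, and PPA and the random data are hypothetical placeholders, and the simple ranking assumes no tied values.

```python
import numpy as np

def explained_variance(y_true, y_pred):
    """Per-voxel EV = 1 - Var(y_true - y_pred) / Var(y_true), computed across stimuli (axis 0)."""
    return 1.0 - np.var(y_true - y_pred, axis=0) / np.var(y_true, axis=0)

def spearman(y_true, y_pred):
    """Per-voxel Spearman rho: Pearson correlation of ranks (assumes no tied values)."""
    def ranks(x):
        return np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    rt, rp = ranks(y_true), ranks(y_pred)
    rt -= rt.mean(axis=0)
    rp -= rp.mean(axis=0)
    return (rt * rp).sum(axis=0) / np.sqrt((rt**2).sum(axis=0) * (rp**2).sum(axis=0))

def roi_scores(y_true, y_pred, roi_voxels):
    """Average per-voxel EV and Spearman rho inside each ROI."""
    ev, rho = explained_variance(y_true, y_pred), spearman(y_true, y_pred)
    return {name: (ev[idx].mean(), rho[idx].mean()) for name, idx in roi_voxels.items()}

rng = np.random.default_rng(0)
n_stimuli, n_voxels = 200, 30
measured = rng.standard_normal((n_stimuli, n_voxels))             # stand-in test-set fMRI
synthetic = measured + 0.5 * rng.standard_normal((n_stimuli, n_voxels))  # stand-in model output
roi_voxels = {                                                     # hypothetical ROI voxel indices
    "FFA": np.arange(0, 10),
    "EBA": np.arange(10, 20),
    "PPA": np.arange(20, 30),
}
scores = roi_scores(measured, synthetic, roi_voxels)
```

EV rewards matching response amplitudes, while Spearman's ρ only checks whether stimuli are ranked in the same order, so reporting both separates "right magnitude" from "right pattern" within each region.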
Implications and Future Directions
Beyond performance gains, NeuroFlow provides a computable, verifiable tool for cognitive neuroscience, facilitating deeper investigation of visual perception, semantic processing, and higher‑order cognition. It also offers a core algorithmic foundation for next‑generation visual prosthetics and bidirectional BMIs. The authors anticipate extending the framework toward more general, robust, and biologically faithful visual‑neural modeling, accelerating the convergence of brain science and artificial intelligence.