Introducing AION-1: The First Astronomical Multimodal Foundation Model Trained on 200M Targets

AION-1, developed by a consortium including UC Berkeley, Cambridge and Oxford, is the first large‑scale multimodal foundation model for astronomy that unifies images, spectra and catalog data via an early‑fusion backbone, achieving zero‑shot and linear‑probe performance that rivals or surpasses task‑specific models across diverse scientific tasks.

Researchers from more than ten institutions, including UC Berkeley, Cambridge and Oxford, have released AION-1, the first large‑scale multimodal foundation model family for astronomy. By using a unified early‑fusion backbone, the model integrates heterogeneous observations—images, spectra and tabular catalog data—into a single representation, delivering strong zero‑shot results and linear‑probe accuracy comparable to or better than models trained for specific tasks.

AION-1 Pre‑training Foundation: MMU Dataset and Tokenization Scheme

The pre‑training data, called the Multimodal Universe (MMU) dataset, aggregates public observations from five major surveys, including galaxy images from HSC, high‑resolution spectra from DESI and SDSS, and low‑resolution Gaia spectra with precise photometry and astrometry. To handle this heterogeneity, AION‑1 introduces a universal tokenization pipeline that converts each modality into a common token format.

For multiband imaging, a flexible channel‑embedding design and a modified ResNet with quantization encode varying resolutions, channel counts and noise characteristics, while a noise‑weighted loss exploits known sensor noise. Spectroscopic data are standardized onto a shared wavelength grid and processed by a ConvNeXt‑V2‑based tokenizer with a similar quantization and noise‑aware loss. Tabular and scalar data are discretized based on distribution statistics to avoid errors from extreme value ranges. Specialized tokenizers also handle segmentation maps, property maps, and bounding‑ellipse annotations, mapping coordinates to pixel grids and ordering detections by distance from the image centre.
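To make the discretization step concrete, here is a minimal Python sketch of quantile-based binning for a scalar catalog column. The class name, bin count and midpoint decoding are illustrative assumptions, not the paper's implementation; the point is that bin edges derived from the data's own distribution keep extreme value ranges from dominating the codebook.

```python
import numpy as np

class ScalarTokenizer:
    """Hypothetical sketch: discretize a scalar catalog column into token IDs
    using quantile bins estimated from training data, so heavy tails and
    extreme value ranges do not distort the token vocabulary."""

    def __init__(self, n_bins: int = 256):
        self.n_bins = n_bins
        self.edges = None

    def fit(self, values: np.ndarray) -> "ScalarTokenizer":
        # Bin edges at evenly spaced quantiles -> roughly uniform token usage.
        qs = np.linspace(0.0, 1.0, self.n_bins + 1)
        self.edges = np.quantile(values, qs)
        return self

    def encode(self, values: np.ndarray) -> np.ndarray:
        # Map each value to the index of its quantile bin (its token ID).
        ids = np.searchsorted(self.edges, values, side="right") - 1
        return np.clip(ids, 0, self.n_bins - 1)

    def decode(self, ids: np.ndarray) -> np.ndarray:
        # Reconstruct a representative value: the midpoint of each bin.
        lo, hi = self.edges[ids], self.edges[ids + 1]
        return 0.5 * (lo + hi)

# Example: tokenize a heavy-tailed flux-like column.
fluxes = np.random.lognormal(mean=0.0, sigma=2.0, size=100_000)
tok = ScalarTokenizer(n_bins=256).fit(fluxes)
ids = tok.encode(fluxes[:5])
approx = tok.decode(ids)
```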

AION-1: Multimodal Foundation Model for Astronomy

The architecture follows the early‑fusion multimodal paradigm and adopts the scalable multimodal masked modeling (4M) scheme. After converting all inputs to tokens, a random subset is masked and the model learns to reconstruct the missing parts, encouraging cross‑modal reasoning. Each data type receives a dedicated embedding function, learnable type identifiers and positional encodings, allowing the model to distinguish sources even when the same modality originates from different instruments.
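The masking objective can be sketched in a few lines. The toy model below is a single-stack, BERT-style simplification (the actual 4M scheme uses an encoder–decoder), but it shows the ingredients named above: shared token embeddings, a learnable type identifier per modality or instrument, positional encodings, and a reconstruction loss computed only on the masked positions. All sizes are placeholder values.

```python
import torch
import torch.nn as nn

class EarlyFusionMaskedModel(nn.Module):
    """Toy sketch of 4M-style masked modeling, not the released AION-1 code.
    Tokens from all modalities share one transformer; each token carries a
    learned type embedding so the model can tell instruments apart."""

    def __init__(self, vocab_size=4096, n_types=8, d_model=256, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.type_emb = nn.Embedding(n_types, d_model)   # modality/instrument ID
        self.pos_emb = nn.Embedding(max_len, d_model)    # positional encoding
        self.mask_emb = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, types, mask):
        # tokens, types: (B, L) int64; mask: (B, L) bool, True = hidden target.
        pos = torch.arange(tokens.size(1), device=tokens.device)
        x = self.tok_emb(tokens) + self.type_emb(types) + self.pos_emb(pos)
        x = torch.where(mask.unsqueeze(-1), self.mask_emb, x)  # hide targets
        return self.head(self.backbone(x))                     # (B, L, vocab)

# One training step: reconstruct the masked tokens across all modalities.
model = EarlyFusionMaskedModel()
tokens = torch.randint(0, 4096, (2, 128))        # mixed-modality token stream
types = torch.randint(0, 8, (2, 128))            # which source each token has
mask = torch.rand(2, 128) < 0.5                  # random subset to reconstruct
logits = model(tokens, types, mask)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
```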

Training efficiency is improved by a simplified sampling strategy: a global token budget is set, a data type is chosen at random and a subset of its tokens is selected; remaining slots are filled from other types. Reconstruction targets are sampled with a bias toward smaller token groups, reducing wasted computation while matching real‑world usage patterns.
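A hypothetical sketch of that sampling loop follows, with illustrative names and a simple inverse-size weighting standing in for the paper's bias toward smaller token groups:

```python
import random

def sample_training_view(tokens_by_type: dict, budget: int):
    """Hypothetical sketch of the budgeted sampling described above;
    the function name and weighting scheme are illustrative."""
    types = list(tokens_by_type)
    # 1) Pick one data type at random and take a subset of its tokens.
    primary = random.choice(types)
    n_take = random.randint(1, min(budget, len(tokens_by_type[primary])))
    view = {primary: random.sample(tokens_by_type[primary], n_take)}
    remaining = budget - n_take
    # 2) Fill the leftover budget from the other types, in random order.
    for t in random.sample(types, k=len(types)):
        if t == primary or remaining == 0:
            continue
        n = min(remaining, len(tokens_by_type[t]))
        view[t] = random.sample(tokens_by_type[t], n)
        remaining -= n
    # 3) Bias reconstruction targets toward the smaller token groups.
    weights = [1.0 / len(tokens_by_type[t]) for t in types]
    target_type = random.choices(types, weights=weights, k=1)[0]
    return view, target_type

# Example object: an image (many tokens), a spectrum, a few catalog scalars.
obj = {"image": list(range(256)), "spectrum": list(range(64)), "scalars": [0, 1, 2]}
view, target_type = sample_training_view(obj, budget=128)
```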

Three model sizes were trained—Base (300 M parameters), Large (800 M) and XL (3 B)—using AdamW for 205 k steps with a warm‑up then decay learning‑rate schedule. Experiments also compare versions with and without Gaia data to assess its impact.
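As a concrete reference, AdamW paired with a warm-up-then-decay schedule might look like the sketch below. The paper only states warm-up followed by decay over 205 k steps; the linear warm-up into cosine decay, the 10 k-step warm-up length, the peak learning rate and the weight decay are all assumptions for illustration.

```python
import math
import torch

def warmup_then_decay(step, warmup=10_000, total=205_000):
    """Learning-rate multiplier: linear warm-up, then cosine decay.
    Warm-up length and decay shape are assumed, not from the paper."""
    if step < warmup:
        return step / warmup                              # linear warm-up
    t = (step - warmup) / (total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * t))            # cosine decay

model = torch.nn.Linear(16, 16)                  # stand-in for the backbone
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_then_decay)

for step in range(100):                          # abbreviated training loop
    optimizer.step()                             # (loss.backward() omitted)
    scheduler.step()
```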

Experimental Results: 16× Redshift Accuracy Boost and Major AI Performance Gains

In cross‑modal generation, AION‑1 can synthesize high‑resolution DESI spectra from sparse Gaia observations, accurately reproducing line centers, widths and amplitudes even though the Gaia inputs are 50–100× sparser. This enables cost‑effective, high‑fidelity analysis from widely available low‑resolution data.
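In token space, this kind of cross-modal generation amounts to conditioning on the observed Gaia tokens and filling in masked spectrum slots. The sketch below reuses the toy masked model from earlier; single-pass greedy decoding is an illustrative choice, not necessarily the paper's sampler.

```python
import torch

@torch.no_grad()
def spectrum_from_gaia(model, gaia_tokens, gaia_types, n_spec, spec_type_id):
    """Hedged sketch: condition on observed Gaia tokens, mask all spectrum
    slots, and fill them with the model's most likely token at each slot."""
    B, n_gaia = gaia_tokens.shape
    spec = torch.zeros(B, n_spec, dtype=torch.long)           # placeholder IDs
    tokens = torch.cat([gaia_tokens, spec], dim=1)
    types = torch.cat(
        [gaia_types, torch.full((B, n_spec), spec_type_id)], dim=1)
    mask = torch.cat(
        [torch.zeros(B, n_gaia, dtype=torch.bool),            # Gaia observed
         torch.ones(B, n_spec, dtype=torch.bool)], dim=1)     # spectrum hidden
    logits = model(tokens, types, mask)                       # (B, L, vocab)
    return logits[:, n_gaia:].argmax(-1)        # predicted spectrum token IDs
```

In practice the predicted token IDs would be decoded back to flux values by the spectrum tokenizer, and an iterative or sampling-based decoder would typically replace the single greedy pass.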

For redshift posterior estimation, the model’s predictions become progressively tighter as more modalities are added: photometry alone yields a broad distribution, adding multiband imaging sharpens it, and incorporating high‑resolution spectra achieves the highest precision, demonstrating effective multimodal fusion.

The authors evaluate four additional directions:

Physical Property Estimation: On 120 k galaxies and 240 k stars, AION‑1 matches or exceeds dedicated supervised models, even outperforming baselines when predicting high‑resolution stellar parameters from low‑resolution Gaia data.

Learning from Semantic Human Labels: On a galaxy morphology classification set (8 k labeled samples) and a semantic segmentation set (2.8 k samples), AION‑1 surpasses zero‑shot baselines and rivals models trained with many times more labeled data.

Performance in the Low‑Data Regime: With limited training data, AION‑1's accuracy matches or exceeds that of supervised models requiring an order of magnitude more samples.

Similarity‑Based Retrieval of Rare Targets: For rare objects such as strong gravitational lenses (≈0.1 % of the dataset), AION‑1's embedding similarity search outperforms other self‑supervised models across spiral, merging and lens candidate categories, as shown in the sketch below.
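A minimal sketch of such embedding-based retrieval, with random stand-in embeddings where the real ones would come from AION-1's encoder:

```python
import numpy as np

def retrieve_similar(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 10):
    """Rank the corpus by cosine similarity to a query embedding
    (e.g., the embedding of one known lens candidate)."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to the query
    top = np.argsort(-sims)[:k]      # indices of the k nearest objects
    return top, sims[top]

# Example with random stand-in embeddings (real ones come from the encoder).
corpus = np.random.randn(100_000, 512).astype(np.float32)
query = corpus[42]                   # pretend index 42 is a known lens
idx, scores = retrieve_similar(query, corpus, k=5)
```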

Overall, these results show that AION‑1 provides a unified, efficient solution for multimodal astronomical data analysis, especially in data‑scarce and cross‑modal inference scenarios.

Multimodal AI Empowers Astronomy – Academia and Industry Synergy

Recent years have seen a surge of multimodal AI research in astronomy, with academic groups linking AI to concrete scientific problems and industry partners productizing the technology. Examples include MIT Media Lab’s AR‑enabled lunar mission analysis system, Oxford’s deep‑learning supernova alert filter that reduces manual workload by ~85 % while keeping false‑alarm rates near 1 %, NVIDIA’s integration of TensorRT‑accelerated multimodal models into ESO’s VLT pipeline (tripling spectral classification speed), and IBM’s collaboration with ESO to improve VLT scheduling, raising variable‑star capture success by 30 %.

These collaborations illustrate a clear trend: building universal multimodal representations is becoming a cornerstone of both scientific discovery and practical deployment in astronomy.

Paper: AION-1: Omnimodal Foundation Model for Astronomical Sciences – https://openreview.net/forum?id=6gJ2ZykQ5W

Tags: multimodal AI, tokenization, foundation model, astronomy, cross‑modal generation, redshift estimation