
CNNs Help Top Universities Find 7 Rare Quasar Lenses in 810k Spectra

A multinational team of researchers from Stanford, Peking University, UCL and UC Berkeley built a data‑driven pipeline that uses convolutional neural networks to scan DESI DR1 spectra, adding seven high‑quality candidates to the small known sample of quasar lenses and demonstrating the value of AI for rare‑object astronomy.

HyperAI Super Neural

Dec 4, 2025


Background and Motivation

Strong gravitational lenses in which a quasar (QSO) acts as the foreground lens are valuable probes of black‑hole–galaxy co‑evolution, but they are extremely rare: only 12 candidates and 3 confirmed lenses were found in the SDSS catalog of ~300,000 QSOs. DESI's first data release (DR1) provides ~1.8 million QSO spectra, offering a chance to enlarge this sample.

Data Set Construction

The team selected 812,118 QSOs from DESI DR1 using Redrock outputs (TARGETID, redshift, redshift error) and strict quality cuts (OBJTYPE = TGT, ZCAT_PRIMARY = 1, ZWARN = 0, SPECTYPE = QSO). Using the FastSpec pipeline, they also built a sample of 16,500 emission‑line galaxies (ELGs) with [OII] flux > 2×10⁻¹⁷ erg cm⁻² s⁻¹, which serve as the background sources for simulated lenses.
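The four quality cuts amount to a simple row filter. The sketch below is illustrative only: it assumes each catalog row is available as a Python dict keyed by the column names quoted above (with ZCAT_PRIMARY spelled with an underscore), and the helper names and toy rows are made up for the example rather than taken from the authors' code.

```python
def passes_quality_cuts(row):
    """Return True if a catalog row satisfies the four cuts quoted in the text."""
    return (
        row["OBJTYPE"] == "TGT"        # a science target, not sky or calibration
        and row["ZCAT_PRIMARY"] == 1   # primary entry for this target
        and row["ZWARN"] == 0          # no redshift-fit warning flags
        and row["SPECTYPE"] == "QSO"   # Redrock classified the spectrum as a quasar
    )


def select_qso_targets(rows):
    """Collect the TARGETIDs of all rows that pass the cuts."""
    return [row["TARGETID"] for row in rows if passes_quality_cuts(row)]


# Toy rows (hypothetical values) showing one pass and three different failures.
toy_rows = [
    {"TARGETID": 1, "OBJTYPE": "TGT", "ZCAT_PRIMARY": 1, "ZWARN": 0, "SPECTYPE": "QSO"},
    {"TARGETID": 2, "OBJTYPE": "TGT", "ZCAT_PRIMARY": 1, "ZWARN": 4, "SPECTYPE": "QSO"},
    {"TARGETID": 3, "OBJTYPE": "SKY", "ZCAT_PRIMARY": 1, "ZWARN": 0, "SPECTYPE": "QSO"},
    {"TARGETID": 4, "OBJTYPE": "TGT", "ZCAT_PRIMARY": 1, "ZWARN": 0, "SPECTYPE": "GALAXY"},
]
selected = select_qso_targets(toy_rows)
```

On a real catalog the same four conditions would more likely be applied as a vectorized boolean mask over table columns; the logic is unchanged.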

Training Data and Two‑Phase Strategy

Because genuine QSO lenses are scarce, the authors created simulated lens spectra by overlaying real QSO spectra with high‑redshift ELG spectra. The work was organized in two phases: Phase 1 used 47 % of the QSO sample (384,873 objects) for training/validation, while the remaining 53 % formed the blind set. Phase 2 swapped the roles of the two subsets, so that each model was applied only to spectra it never saw during training.
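As a rough illustration of the overlay step, the sketch below adds an ELG spectrum to a QSO spectrum pixel by pixel. It assumes both spectra are already sampled on the same observed‑frame wavelength grid; the function name, the `elg_scale` factor and the toy fluxes are hypothetical, and the article does not describe how the authors scaled or noise‑matched the two components.

```python
def make_simulated_lens(qso_flux, elg_flux, z_qso, z_elg, elg_scale=1.0):
    """Overlay a background ELG spectrum on a foreground QSO spectrum.

    Both flux lists are assumed to share one observed-frame wavelength grid.
    """
    if len(qso_flux) != len(elg_flux):
        raise ValueError("spectra must share the same wavelength grid")
    if z_elg <= z_qso:
        # A lensed source has to lie behind the lens.
        raise ValueError("background ELG must have z_elg > z_qso")
    return [q + elg_scale * e for q, e in zip(qso_flux, elg_flux)]


# Toy four-pixel spectra (made-up numbers).
qso = [10.0, 12.0, 11.0, 9.0]
elg = [0.0, 3.0, 0.5, 0.0]
simulated = make_simulated_lens(qso, elg, z_qso=0.8, z_elg=1.2, elg_scale=2.0)
```

The redshift check encodes the one physical constraint the text does state: the ELG acts as the background source, so it must sit at a higher redshift than the quasar.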

Phase 1: 70 % of the 384,873‑QSO training subset was used for training and 30 % for validation; the blind set of 427,245 QSOs was held out for testing, with 3,170 of them turned into simulated lenses.

Phase 2: the Phase 1 blind set (427,245 QSOs) was split in the same way for training/validation, and the 384,873 QSOs from the Phase 1 training subset became the new blind set for final inference.
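The role swap between the two phases can be sketched as follows. This is a minimal illustration, assuming a random shuffle with a fixed seed and the rounded 47 % / 70 % fractions quoted above; the article does not say how the authors actually partitioned the sample, and the function name is hypothetical.

```python
import random


def two_phase_split(target_ids, first_fraction=0.47, train_fraction=0.70, seed=0):
    """Split IDs into two halves, then build both phases by swapping roles."""
    ids = list(target_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    cut = round(first_fraction * len(ids))
    half_a, half_b = ids[:cut], ids[cut:]

    phases = []
    # Phase 1 trains on half A and infers on half B; Phase 2 swaps them.
    for train_pool, blind in ((half_a, half_b), (half_b, half_a)):
        n_train = round(train_fraction * len(train_pool))
        phases.append({
            "train": train_pool[:n_train],
            "val": train_pool[n_train:],
            "blind": blind,
        })
    return phases


phase1, phase2 = two_phase_split(range(1000))
```

Because each phase's blind set is exactly the other phase's training pool, every spectrum is scored once, by a model that never trained on it.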

CNN Architecture and Training

The classifier consists of six convolutional layers (the first three with 50 filters, the last three with 100 filters) followed by two fully‑connected layers (30 and 25 nodes). The network outputs a score between 0 and 1; a threshold of 0.5 was used during training and later raised to 0.7 for blind‑set inference to maximize the F1 score. Training used the Adam optimizer in TensorFlow with exponential learning‑rate decay (the rate is multiplied by 0.95 every 500 steps); scikit‑learn handled data splitting and metric computation.
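Two of the numerical choices above, the decay schedule and the F1‑based threshold, are easy to reproduce in isolation. The sketch below is dependency‑free on purpose rather than using TensorFlow or scikit‑learn; it assumes a staircase decay (the rate drops once per 500‑step block), and the initial learning rate, scores and labels are made‑up values, since the article gives none.

```python
def decayed_learning_rate(initial_lr, step, decay_rate=0.95, decay_steps=500):
    """Staircase exponential decay: multiply by decay_rate every decay_steps."""
    return initial_lr * decay_rate ** (step // decay_steps)


def f1_at_threshold(scores, labels, threshold):
    """F1 score when every score >= threshold is called a lens (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels, candidates):
    """Pick the candidate threshold with the highest F1 score."""
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Toy classifier outputs and ground truth (hypothetical).
scores = [0.95, 0.90, 0.75, 0.65, 0.60, 0.55, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 1, 0, 0]
chosen = best_threshold(scores, labels, [0.5, 0.6, 0.7, 0.8, 0.9])
lr_after_1000_steps = decayed_learning_rate(1e-3, 1000)
```

In this toy set a stricter cut removes two false positives at the cost of one missed lens, which is the kind of trade‑off a threshold sweep is meant to expose.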

Redshift Regression

In addition to classification, a CNN‑based regression model predicts the redshift of the background ELG. The authors compared this model with the traditional Redrock PCA‑template fitting. After the CNN prediction, a local double‑Gaussian fit to the [OII] doublet refined the redshift within Δz = 0.1 of the predicted value and provided a signal‑to‑noise ratio (SNR) used to filter for high‑quality candidates.
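To make the refinement step concrete, here is a simplified stand‑in for a double‑Gaussian fit: a grid search over redshift around the CNN estimate, with the line amplitude solved by weighted least squares at each trial redshift. It is not the authors' fitter. It assumes equal‑strength doublet components, a fixed line width `sigma` in angstroms, approximate vacuum rest wavelengths for [OII], and inverse‑variance weights; all names and toy values are illustrative.

```python
import math

# Approximate rest-frame vacuum wavelengths of the [OII] doublet (angstroms).
OII_REST = (3727.09, 3729.88)


def doublet_model(wave, z, sigma):
    """Unit-amplitude sum of two Gaussians at the redshifted [OII] wavelengths."""
    c1, c2 = (w0 * (1.0 + z) for w0 in OII_REST)
    return [
        math.exp(-0.5 * ((w - c1) / sigma) ** 2)
        + math.exp(-0.5 * ((w - c2) / sigma) ** 2)
        for w in wave
    ]


def refine_redshift(wave, flux, ivar, z_cnn, dz=0.1, n_grid=401, sigma=1.5):
    """Scan z in [z_cnn - dz, z_cnn + dz]; return (best z, line SNR).

    For each trial z the best-fit amplitude is num / den and its SNR is
    num / sqrt(den); the highest SNR is also the lowest chi-square for a
    positive (emission) amplitude.
    """
    best_z, best_snr = z_cnn, float("-inf")
    for i in range(n_grid):
        z = z_cnn - dz + 2.0 * dz * i / (n_grid - 1)
        model = doublet_model(wave, z, sigma)
        num = sum(m * f * iv for m, f, iv in zip(model, flux, ivar))
        den = sum(m * m * iv for m, iv in zip(model, ivar))
        if den <= 0.0:
            continue  # the doublet falls outside the wavelength window
        snr = num / math.sqrt(den)
        if snr > best_snr:
            best_z, best_snr = z, snr
    return best_z, best_snr


# Toy noise-free spectrum with a doublet at z = 1.0 and a CNN guess of 1.03.
wave = [7150.0 + k for k in range(851)]
flux = [5.0 * m for m in doublet_model(wave, 1.0, 1.5)]
ivar = [1.0] * len(wave)
z_fit, line_snr = refine_redshift(wave, flux, ivar, z_cnn=1.03)
```

A production fit would also let the width and the doublet ratio vary, but the fixed‑shape version already shows why one local fit can yield both a refined redshift and an SNR for quality filtering.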

Performance Evaluation

Both phases achieved high F1 and AUC scores on training and validation sets (see Figure 1). For redshift estimation, the CNN outperformed Redrock across all SNR regimes:

High SNR: CNN 100 % (99.48 % after Gaussian refinement) vs. Redrock 51.04 %.

Medium SNR: CNN 99.48 % (100 % after Gaussian refinement) vs. Redrock 37.70 %.

Low SNR: CNN 100 % (96.88 % after Gaussian refinement) vs. Redrock 29.17 %.

Each figure is the percentage of true redshifts recovered within Δz = 0.1.

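These results show that the CNN‑based redshift estimator is markedly more accurate than the standard Redrock pipeline, even for noisy spectra. Gaussian refinement keeps the recovery rate at 96.88 % or higher in every regime, although it slightly lowers the rate at high and low SNR while raising it at medium SNR.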
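The recovery percentages quoted above correspond to a simple metric: the share of spectra whose predicted redshift falls within Δz = 0.1 of the true value. A minimal sketch, with a hypothetical function name and made‑up redshifts (none of these numbers are from the study):

```python
def recovery_fraction(z_pred, z_true, dz=0.1):
    """Fraction of predictions that land within dz of the true redshift."""
    if len(z_pred) != len(z_true) or not z_true:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, t in zip(z_pred, z_true) if abs(p - t) <= dz)
    return hits / len(z_true)


# Toy example: three of four predictions are within 0.1 of the truth.
z_true = [1.00, 1.20, 0.90, 1.50]
z_pred = [1.02, 1.19, 1.30, 1.45]
fraction = recovery_fraction(z_pred, z_true)
```

Computing the same fraction separately per SNR bin and per method gives a comparison of the kind listed above.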

Discovery of New Lenses

Applying the trained CNNs to the full set of 812,118 QSO spectra yielded 494 lens candidates. After visual inspection and SNR/redshift filtering, seven A‑grade strong‑lens candidates were identified. All seven exhibit strong [OII] doublet emission at redshifts higher than that of the foreground QSO; four also show Hβ and [OIII] λ4959/λ5007 lines (see Figure 2 and Figure 3).
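The automated part of this shortlisting can be pictured as three checks per candidate: classifier score, redshift ordering and [OII] SNR. The sketch below uses the 0.7 inference threshold quoted earlier; the `min_snr` value, the field names and the toy candidates are assumptions for illustration, and the visual‑inspection step has no code equivalent here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    target_id: int
    score: float    # CNN classifier output in [0, 1]
    z_qso: float    # redshift of the foreground quasar
    z_elg: float    # refined redshift of the putative background ELG
    oii_snr: float  # SNR of the fitted [OII] doublet


def keep_candidate(c, score_threshold=0.7, min_snr=5.0):
    """Apply the score, redshift-ordering and SNR checks (min_snr is assumed)."""
    return (
        c.score >= score_threshold
        and c.z_elg > c.z_qso   # the source must lie behind the lens
        and c.oii_snr >= min_snr
    )


# Toy candidates (hypothetical): only the first passes all three checks.
candidates = [
    Candidate(1, 0.92, 0.8, 1.3, 8.0),
    Candidate(2, 0.65, 0.9, 1.4, 9.0),  # score below threshold
    Candidate(3, 0.88, 1.5, 1.1, 7.5),  # ELG in front of the QSO
    Candidate(4, 0.95, 0.7, 1.0, 2.0),  # [OII] too weak
]
shortlist = [c.target_id for c in candidates if keep_candidate(c)]
```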

Implications

The study demonstrates that a data‑driven CNN pipeline can efficiently expand the sample of quasar strong lenses, providing a new statistical foundation for black‑hole–galaxy co‑evolution studies. It also illustrates the broader trend of deep learning reshaping astronomical research, where massive surveys demand automated, high‑precision analysis.


