How LLMs Are Uncovering Ultra‑Hard Carbon Allotropes in Minutes

Researchers at Xi'an Jiaotong University built a closed-loop AI framework, centered on a large language model, that generates and evaluates thousands of candidate carbon structures. Within minutes it identifies novel carbon allotropes such as C16_3, C12 and C8, some ultra-hard and others highly anisotropic.

Data Party THU

Background

Carbon exhibits three hybridization states (sp, sp², sp³) that can combine into an astronomically large structural space, ranging from one-dimensional carbyne chains to three-dimensional diamond lattices. Traditional discovery relies on enumerating candidate structures and evaluating each with first-principles calculations, which becomes computationally prohibitive as the search space grows.

LLM‑Driven Closed‑Loop Framework

The team from Xi'an Jiaotong University introduced a dual-loop active-learning system built around CrystaLLM. The first loop uses the large language model to generate candidate carbon crystals (up to 100 atoms per cell). Generated structures are then screened quickly: the PINK code provides thermal-conductivity estimates, and a machine-learning potential (MLP) combined with Phonopy checks dynamical stability.
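
The screening step described above can be pictured as a simple filter funnel. The following is a minimal sketch, not the authors' code: the Candidate class, the tolerance value, and the sample frequencies are all hypothetical. It assumes phonon frequencies are given in THz with imaginary modes stored as negative numbers, which is the convention Phonopy uses in its output.

```python
from dataclasses import dataclass
from typing import List

MAX_ATOMS_PER_CELL = 100  # cell-size limit quoted in the article
IMAG_TOL_THZ = -0.05      # assumed tolerance for numerical noise (hypothetical value)


@dataclass
class Candidate:
    """Hypothetical container for one generated structure."""
    name: str
    n_atoms: int
    phonon_freqs_thz: List[float]  # imaginary modes stored as negative values


def is_dynamically_stable(freqs: List[float], tol: float = IMAG_TOL_THZ) -> bool:
    """A structure passes if no phonon mode is more negative than the tolerance."""
    return min(freqs) >= tol


def screen(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that respect the size limit and show no imaginary modes."""
    return [
        c for c in candidates
        if c.n_atoms <= MAX_ATOMS_PER_CELL
        and is_dynamically_stable(c.phonon_freqs_thz)
    ]


# Placeholder data for illustration only.
pool = [
    Candidate("cand_a", 16, [0.0, 12.3, 38.9]),
    Candidate("cand_b", 12, [-4.1, 9.8, 35.0]),    # has an imaginary mode
    Candidate("cand_c", 128, [0.0, 10.1, 30.2]),   # exceeds the size limit
]
survivors = screen(pool)
print([c.name for c in survivors])
```

Only cand_a survives here: cand_b is rejected for its negative (imaginary) mode and cand_c for its cell size. In a real pipeline the frequency list would come from a phonon calculation rather than being typed in by hand.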

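The second loop iteratively refines a neuroevolution potential (NEP) model: structures on which the potential performs poorly are fed back into training, so that its accuracy approaches DFT level across the regions of the potential energy surface the search actually visits.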
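
The article does not say how the second loop decides which structures to send back for DFT labeling. One common heuristic in active learning is to pick the configurations on which an ensemble of models disagrees most. The sketch below illustrates that heuristic only; it is an assumption, not the authors' selection rule, and the toy models and threshold are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence

Model = Callable[[float], float]  # toy stand-in: maps one coordinate to an energy


def select_uncertain(
    structures: Sequence[float],
    ensemble: Sequence[Model],
    threshold: float,
) -> List[float]:
    """Return the structures whose ensemble predictions spread beyond the threshold."""
    picked = []
    for s in structures:
        preds = [model(s) for model in ensemble]
        if statistics.pstdev(preds) > threshold:
            picked.append(s)  # would be labeled with DFT and added to training
    return picked


# Toy ensemble: the models agree near x = 0 and diverge for larger x,
# mimicking a potential that is well trained only near equilibrium.
ensemble = [lambda x, a=a: x**2 + a * x**3 for a in (-0.1, 0.0, 0.1)]
candidates = [0.1, 0.5, 1.0, 2.0]
to_label = select_uncertain(candidates, ensemble, threshold=0.05)
print(to_label)
```

With these toy models only the two largest coordinates exceed the threshold, so only they would be queued for reference calculations. This is the point of the loop: expensive labels go where the potential is least trustworthy.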

[Figure: Workflow diagram]

Training Set and NEP Performance

To achieve broad generalization, the researchers assembled a diverse dataset covering fullerenes (C₆₀), 1-D chains, 2-D graphene, 3-D diamond, mixed hybridizations, and extreme mechanical states from –400 GPa tension to >1000 GPa compression. The trained NEP reproduces DFT energies, forces, and stresses with high fidelity, and it yields phonon spectra free of imaginary frequencies for six representative carbon allotropes, including diamond, graphene, BC8, and C₄.
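
Agreement between a machine-learning potential and DFT is usually summarized as a root-mean-square error over a held-out set. The article gives no numbers, so the sketch below uses made-up placeholder energies purely to show the calculation; the variable names are hypothetical.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error between predicted and reference values."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


# Placeholder per-atom energies in eV (not taken from the study).
dft_energy = [-9.10, -9.05, -8.72]
nep_energy = [-9.09, -9.07, -8.70]

energy_rmse_mev = 1000.0 * rmse(nep_energy, dft_energy)
print(f"energy RMSE: {energy_rmse_mev:.2f} meV/atom")
```

The same function applies to force components and stress components, each reported in its own units.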

[Figure: Training performance and dataset diversity]

New Carbon Allotropes Discovered

Ultra-hard phase C16_3: Its predicted Vickers hardness of 103.3 GPa exceeds diamond's 96 GPa, making it a candidate super-hard material.

“Acetylene-diamond” series (C12 and C8): Formed by inserting linear sp-hybridized –C≡C– chains into diamond C–C bonds. The result is extreme thermal-conductivity anisotropy, with very high conductivity along the rigid backbone and much lower conductivity in the perpendicular plane, which suits directional heat-spreading applications.

sp-sp²-sp³ hybrid C12: Its mixed bonding network hosts π-delocalized electrons, giving metallic conductivity and a rare negative Poisson's ratio. However, it is dynamically stable only below ~100 K.
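
Each of the three headline properties above reduces to a short formula once the underlying elastic or transport data are known. The sketch below is illustrative only. The article does not say which hardness model the authors used, so Chen's empirical model is an assumption here. The Poisson's-ratio expression assumes a cubic crystal loaded along [100], which may not match the symmetry of the reported phases. The diamond inputs are approximate literature values, and the other numbers are placeholders.

```python
from typing import Sequence


def chen_vickers_hardness(bulk_gpa: float, shear_gpa: float) -> float:
    """Chen's empirical model: Hv = 2 * (k^2 * G)^0.585 - 3, with k = G / B."""
    k = shear_gpa / bulk_gpa
    return 2.0 * (k**2 * shear_gpa) ** 0.585 - 3.0


def anisotropy_ratio(kappa_axes: Sequence[float]) -> float:
    """Ratio of the largest to the smallest principal thermal conductivity."""
    return max(kappa_axes) / min(kappa_axes)


def cubic_poisson_ratio(c11: float, c12: float) -> float:
    """Poisson's ratio of a cubic crystal along [100]: nu = C12 / (C11 + C12)."""
    return c12 / (c11 + c12)


# Diamond, approximate literature values in GPa (B ~ 443, G ~ 535).
hv_diamond = chen_vickers_hardness(443.0, 535.0)
print(f"diamond hardness estimate: {hv_diamond:.1f} GPa")

# Placeholder conductivities (W/m/K): one stiff axis, two soft axes.
print(anisotropy_ratio((1200.0, 300.0, 300.0)))

# A negative C12 gives a negative (auxetic) Poisson's ratio in this model.
print(cubic_poisson_ratio(1000.0, -50.0) < 0)
```

With these inputs Chen's model puts diamond in the mid-90s GPa, consistent with the 96 GPa quoted above. The formula rewards a high shear modulus relative to the bulk modulus, so it gives one plausible reading of how a phase could exceed diamond.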

[Figure: Representative structures and charge density]

Synthesis Considerations

The authors evaluated the experimental feasibility of the new phases. Their calculations suggest thermodynamic stability comparable to known synthesizable carbon forms such as fullerenes. Some structures could be assembled via stepwise chemical routes, while the densest, hardest allotropes may require extreme‑pressure synthesis from suitable precursors.
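
One simple way to frame "stability comparable to known synthesizable forms" is to compare each candidate's energy above the ground state with that of a phase that has already been made, such as C₆₀. The sketch below shows that comparison under stated assumptions: the function name is hypothetical and every energy is a placeholder, not a value from the study.

```python
def energy_above_ground(e_per_atom: float, e_ground_per_atom: float) -> float:
    """Energy per atom relative to the ground-state reference (eV/atom)."""
    return e_per_atom - e_ground_per_atom


def within_known_window(e_candidate: float, e_ground: float, e_known: float) -> bool:
    """True if the candidate lies no higher above the ground state
    than a phase that is already known to be synthesizable."""
    return (
        energy_above_ground(e_candidate, e_ground)
        <= energy_above_ground(e_known, e_ground)
    )


# Placeholder total energies in eV/atom, for illustration only.
E_GROUND = -9.22     # stand-in for the ground-state reference
E_FULLERENE = -8.84  # stand-in for a known metastable phase such as C60
E_CANDIDATE = -8.95  # stand-in for a newly generated allotrope

print(within_known_window(E_CANDIDATE, E_GROUND, E_FULLERENE))
```

An energy window like this is only a necessary-looking heuristic. It says nothing about kinetic barriers or available precursors, which is why the synthesis routes discussed above still matter.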

Conclusion

The study demonstrates that a closed‑loop pipeline—LLM‑generated candidates, rapid MLP screening, and active‑learning feedback—can dramatically accelerate materials discovery. Although showcased on carbon, the approach is readily extensible to other elemental systems.

Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
