Open-Source Reasoning Datasets: NVIDIA, OpenAI, Labs – Math, Spatial, Wiki QA

HyperAI has compiled a collection of high‑quality open‑source reasoning datasets—including Open‑RL, CHIMERA, Nemotron‑Math‑v2, OmniSpatial, FrontierScience, HotpotQA, VCR, and CIRR—covering math, multi‑step STEM problems, spatial reasoning, scientific tasks, wiki QA, and visual commonsense, all available for download or online use.


HyperAI has organized a set of high‑quality reasoning datasets spanning multiple domains and tasks, all of which support both download and online use, lowering the barrier to entry for researchers and developers.

Open‑RL Reasoning Problem Dataset

Open‑RL, released by Turing in 2026, contains multi‑step STEM reasoning problems in physics, mathematics, biology, and chemistry. Each question requires symbolic operations or numeric computation and provides an objectively verifiable answer, making it suitable for reinforcement learning fine‑tuning, reward modeling, and benchmark testing.

CHIMERA General Synthetic Reasoning Dataset

CHIMERA is a synthetic reasoning dataset covering a broad range of subjects with long chain‑of‑thought (CoT) trajectories. It includes 9,225 questions across eight disciplines: mathematics, computer science, chemistry, physics, literature, history, biology, and phonetics. All examples are generated by large language models and automatically verified without human annotation. Distribution: mathematics 4,452; computer science 1,303; chemistry 1,102; physics 742; literature 504; history 422; biology 383; phonetics 317.

Nemotron‑Math‑v2 Mathematics Reasoning Dataset

Released by NVIDIA, Nemotron‑Math‑v2 targets structured mathematical reasoning, tool‑enhanced versus pure language reasoning, and long‑context or multi‑trajectory reasoning systems. It contains approximately 347,000 high‑quality math problems and 7 million model‑generated reasoning traces. Each problem is solved under six configurations (high, medium, or low reasoning depth, each with or without Python tool‑integrated reasoning, TIR), and answers are validated by an LLM‑based judge pipeline.
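The six solving configurations described above are simply the cross product of three reasoning depths and a TIR on/off flag. A minimal sketch of that enumeration, with field names chosen here for illustration rather than taken from NVIDIA's actual config keys:

```python
# Enumerate the 3 reasoning depths x 2 TIR settings = 6 configurations
# under which each Nemotron-Math-v2 problem is solved. The dict keys
# ("reasoning_depth", "python_tir") are illustrative, not official.
from itertools import product

depths = ["high", "medium", "low"]
tir_settings = [True, False]

configs = [
    {"reasoning_depth": depth, "python_tir": tir}
    for depth, tir in product(depths, tir_settings)
]

print(len(configs))  # 6
```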

OmniSpatial Comprehensive Spatial Reasoning Benchmark

Co‑developed by Tsinghua University, the Shanghai Qi Zhi Institute, and Shanghai AI Lab in 2025, OmniSpatial provides about 1,533 image‑question pairs covering four dimensions: dynamic reasoning, complex spatial logic, spatial interaction, and perspective taking, further divided into 50 sub‑tasks. Data sources include internet images, psychological tests, and driving‑exam questions, with multi‑round review ensuring quality and diversity. It aims to fill the evaluation gap for vision‑language models in spatial understanding.

FrontierScience Scientific Task Evaluation Dataset

Published by OpenAI in 2025, FrontierScience assesses large models on expert‑level scientific reasoning and research tasks. It comprises two subsets: Olympiad (short‑answer reasoning problems at IPhO, IChO, and IBO difficulty) and Research (real‑world scientific sub‑problems scored on a fine‑grained 10‑point rubric across physics, chemistry, and biology). Problems are authored by olympiad medalists, coaches, PhDs, postdocs, and professors.

HotpotQA Question‑Answering Dataset

HotpotQA is a large‑scale English question‑answering dataset collected from Wikipedia, containing 113,000 crowd‑sourced questions, each of which requires reasoning over two supporting paragraphs. Every question ships with its gold paragraphs and sentence‑level supporting facts, enabling research on multi‑document (multi‑hop) reasoning, diverse question types, and fact‑comparison challenges.
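The sentence‑level supporting facts can be pulled out of a record with a few lines of code. The sketch below follows the field layout of the original HotpotQA JSON release (`context` as [title, sentences] pairs, `supporting_facts` as [title, sentence_index] pairs); treat the schema as an assumption if you use a re‑hosted copy, and note the toy record is invented for illustration:

```python
# Sketch: extracting the gold supporting sentences from a HotpotQA-style
# record. Schema assumed from the original HotpotQA JSON distribution.

def supporting_sentences(example):
    """Return the sentence-level supporting facts for one multi-hop question."""
    paragraphs = {title: sents for title, sents in example["context"]}
    facts = []
    for title, sent_id in example["supporting_facts"]:
        sents = paragraphs.get(title, [])
        if 0 <= sent_id < len(sents):
            facts.append(sents[sent_id])
    return facts

# Toy two-paragraph record in the same shape as a real one.
example = {
    "question": "Which country is the birthplace of the author of Book X?",
    "answer": "France",
    "context": [
        ["Book X", ["Book X is a novel.", "It was written by Jane Doe."]],
        ["Jane Doe", ["Jane Doe is an author.", "She was born in France."]],
    ],
    "supporting_facts": [["Book X", 1], ["Jane Doe", 1]],
}

print(supporting_sentences(example))
```

Because the answer lives in a different paragraph than the one the question names, a model must hop from the "Book X" paragraph to the "Jane Doe" paragraph; the two extracted sentences are exactly that reasoning chain.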

VCR Visual Commonsense Reasoning Dataset

VCR (Visual Commonsense Reasoning) presents challenging image‑based questions that require a model both to select the correct answer and to justify it with a rationale. It comprises 212K training, 26K validation, and 25K test questions, with answers and rationales drawn from over 110K distinct movie scenes.
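This two-part requirement is what makes VCR's joint metric (often written Q→AR) strict: a question only counts as correct when both the chosen answer and the chosen rationale are right. A minimal sketch of that scoring rule, with made-up prediction tuples:

```python
# Sketch of VCR's joint Q->AR metric: credit only when both the answer
# and the rationale choice match the gold labels. The index tuples below
# are illustrative, not real VCR annotations.

def q2ar_accuracy(examples):
    """examples: list of (gold_answer, gold_rationale, pred_answer, pred_rationale)."""
    if not examples:
        return 0.0
    hits = sum(1 for ga, gr, pa, pr in examples if pa == ga and pr == gr)
    return hits / len(examples)

preds = [
    (0, 2, 0, 2),  # answer and rationale both right -> counts
    (1, 3, 1, 0),  # right answer, wrong rationale   -> does not count
    (2, 1, 0, 1),  # wrong answer                    -> does not count
    (3, 0, 3, 0),  # both right                      -> counts
]
print(q2ar_accuracy(preds))  # 0.5
```

Note that the second example would score under an answer-only metric but not here, which is why joint accuracy is typically well below answer accuracy on this benchmark.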

CIRR Image Retrieval Dataset

CIRR (Compose Image Retrieval on Real‑life images) includes over 36,000 crowdsourced image‑text pairs, aiming to advance research on subtle visual‑language reasoning and iterative retrieval via dialogue, emphasizing discrimination among open‑domain visually similar images.
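Composed-retrieval benchmarks like CIRR are commonly scored with Recall@K: a query succeeds if the target image appears among the top‑K retrieved candidates. A minimal sketch of that metric, with invented image IDs (CIRR additionally reports a subset variant over curated hard negatives, not shown here):

```python
# Sketch of Recall@K for image retrieval: fraction of queries whose
# target appears in the top-K of the ranked candidate list.
# Image IDs below are made up for illustration.

def recall_at_k(ranked_lists, targets, k):
    hits = sum(
        1
        for ranking, target in zip(ranked_lists, targets)
        if target in ranking[:k]
    )
    return hits / len(targets)

rankings = [
    ["img_7", "img_3", "img_9", "img_1"],  # target img_3 ranked 2nd
    ["img_2", "img_8", "img_4", "img_6"],  # target img_6 ranked 4th
]
targets = ["img_3", "img_6"]

print(recall_at_k(rankings, targets, k=2))  # 0.5
print(recall_at_k(rankings, targets, k=4))  # 1.0
```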

All datasets can be accessed online via the provided URLs (e.g., https://go.hyper.ai/jeDjn for Open‑RL) or downloaded from https://hyper.ai/datasets .

Tags: open-source, OpenAI, NVIDIA, multimodal, spatial reasoning, reasoning datasets, visual commonsense
Written by HyperAI Super Neural

Deconstructing the sophistication and universality of technology, covering cutting-edge AI for Science case studies.
