Machine Heart
Apr 25, 2026 · Artificial Intelligence

Open‑Source Models Dominate 21 Scientific Discovery Tasks with SimpleTES

The SimpleTES framework decomposes trial‑and‑error into three scalable dimensions—Concurrency, Length, and Candidates—enabling test‑time scaling that lets open‑source models outperform closed‑source rivals across 21 diverse scientific benchmarks, from LASSO regression to quantum circuit compilation.

AI for Science · SimpleTES · evaluation-driven search
13 min read
Machine Heart
Apr 25, 2026 · Artificial Intelligence

Can Multi-Model Co-Evolution Shatter the Single-Model Ceiling? Squeeze Evolve Achieves Validator-Free SOTA Inference

The paper introduces Squeeze Evolve, a validator‑free multi‑model evolutionary framework that orchestrates diverse large language models to break the performance ceiling of any single model, delivering up to 23‑point accuracy improvements and 1.4‑3.3× cost reductions across math, vision, and scientific benchmarks.

AI research · Squeeze Evolve · inference optimization
8 min read
Old Zhang's AI Learning
Jan 27, 2026 · Artificial Intelligence

Qwen3‑Max‑Thinking Boosts Performance with Test‑Time Scaling—Why It Still Isn’t Open‑Source

Alibaba’s new Qwen3‑Max‑Thinking model adds inference‑time scaling and adaptive tool use, delivering large gains on math, coding, and agent benchmarks while remaining closed‑source, and it offers drop‑in OpenAI‑compatible API access at the cost of higher latency and token usage.

AI benchmark · Adaptive Tool Use · Large Language Model
7 min read
Data Party THU
Oct 29, 2025 · Artificial Intelligence

Can Test-Time Scaling Unlock More Reliable Vision‑Language‑Action Robots?

The paper introduces RoboMonkey, a framework that applies a generate‑and‑verify paradigm and test‑time scaling to Vision‑Language‑Action (VLA) models. It shows that increasing sampling and verification at inference substantially reduces action error across multiple VLA architectures, and presents scalable verifier training, synthetic data augmentation, and efficient deployment strategies.

AI research · Action Verification · RoboMonkey
8 min read
vivo Internet Technology
Aug 25, 2025 · Artificial Intelligence

How DiMo-GUI Boosts Multimodal LLMs for GUI Grounding Without Training

DiMo-GUI is a plug‑and‑play framework that markedly improves multimodal large language models' ability to locate GUI elements. Using a hierarchical dynamic visual reasoning loop and modality‑aware optimization, it achieves up to double the performance on high‑resolution GUI benchmarks without any additional training data.

GUI grounding · dynamic visual reasoning · modality-aware optimization
7 min read
Kuaishou Large Model
Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method, introduced by HKUST and Kuaishou's KuaLing team, leverages test‑time scaling to improve diffusion‑based image and video generation without any training. By running an evolutionary search along the denoising trajectory, it achieves state‑of‑the‑art results on SD2.1, Flux‑1‑dev, and other models.

Diffusion Models · Image Generation · Video Generation
8 min read
Kuaishou Tech
Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Diffusion Models · Image Generation
8 min read
AI Frontier Lectures
Apr 4, 2025 · Artificial Intelligence

Why Test‑Time Scaling Is Revolutionizing LLM Reasoning in 2025

This article surveys the latest research on large language model reasoning, highlighting test‑time scaling methods, chain‑of‑thought variants, and novel inference‑time techniques that boost performance while exposing trade‑offs, costs, and future directions for AI developers.

AI · LLM · research
26 min read
Architect
Feb 19, 2025 · Artificial Intelligence

Does Scaling Law Still Hold for Grok 3? A Deep Dive into LLM Training Economics

The article critically examines whether the pre‑training Scaling Law still applies to Grok 3. It compares Grok 3's compute usage and model size with DeepSeek and OpenAI models, weighs the cost‑effectiveness of pre‑training, RL, and test‑time scaling, and explores how these insights shape future large‑language‑model development strategies.

Grok-3 · Model Efficiency · Pre‑training
11 min read
Baobao Algorithm Notes
Sep 29, 2024 · Artificial Intelligence

Decoding OpenAI o1: Test‑Time Scaling, PRM Search & Inference Strategies

This article analyses the training techniques behind OpenAI's o1 model, explaining test‑time (inference‑time) scaling laws, post‑training techniques, process‑supervised reward models (PRMs), various inference‑time search methods, data‑collection pipelines, and the trade‑offs in allocating compute between pre‑training and inference.

LLM inference · OpenAI o1 · Reward Model
34 min read