Tagged articles
42 articles
Page 1 of 1
SuanNi
SuanNi
May 16, 2026 · Artificial Intelligence

Can a 4B Small Model Replace Top‑Tier Closed‑Source LLMs? Microsoft’s Terminus‑4B Cuts Token Use by 30%

Microsoft’s research shows that a 4‑billion‑parameter small model, Terminus‑4B, can act as an execution sub‑agent for terminal tasks, trimming token consumption by about 30% while preserving performance on demanding SWE‑Bench benchmarks, demonstrating a practical alternative to costly large models.

AI programmingRL trainingSWE-bench
0 likes · 7 min read
Can a 4B Small Model Replace Top‑Tier Closed‑Source LLMs? Microsoft’s Terminus‑4B Cuts Token Use by 30%
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
May 14, 2026 · Artificial Intelligence

How a Multi‑Agent Team Built an HTML Page in One Take (No More “Continue” Prompts)

The author used MiniMax’s new Mavis Agent Team to generate a complete, interactive HTML showcase in 28 minutes with a single prompt, illustrating how Leader‑Worker‑Verifier coordination and a Team Engine overcome the laziness, context anxiety, and silent‑agent problems of single‑agent workflows while discussing token costs and referencing the “Cost of Consensus” study.

AI AgentsAgent TeamPrompt Engineering
0 likes · 14 min read
How a Multi‑Agent Team Built an HTML Page in One Take (No More “Continue” Prompts)
Old Zhang's AI Learning
Old Zhang's AI Learning
May 11, 2026 · Artificial Intelligence

Ling-2.6-1T: 1T‑Parameter, Fast‑Thinking, Agent‑Ready Model After DeepSeek‑V4

Ant Group's Ling‑2.6‑1T, a 1‑trillion‑parameter LLM built for token efficiency and fast‑thinking, outperforms on elite reasoning and agentic benchmarks, offers easy local deployment via vLLM or SGLang, provides a quantized 3.6‑bit version, and includes practical usage tips for developers and knowledge workers.

Agentic ModelClaude Code IntegrationLing-2.6-1T
0 likes · 12 min read
Ling-2.6-1T: 1T‑Parameter, Fast‑Thinking, Agent‑Ready Model After DeepSeek‑V4
DataFunTalk
DataFunTalk
May 11, 2026 · Artificial Intelligence

Ultraman crowns GPT‑5.5 a “Socially Awkward Genius” as 16‑person team ditches Claude, saving $32K/month

The article analyzes GPT‑5.5’s launch, highlighting its superior token efficiency and performance that prompted a 16‑person engineering team to replace Claude with Codex + Cursor, saving over $32,000 monthly, while Codex’s downloads surged to 86 million in May, outpacing Claude by twelve‑fold and sparking widespread developer feedback on model personality and usability.

AI model comparisonClaudeCodex
0 likes · 7 min read
Ultraman crowns GPT‑5.5 a “Socially Awkward Genius” as 16‑person team ditches Claude, saving $32K/month
Machine Heart
Machine Heart
May 8, 2026 · Artificial Intelligence

How Laser Cuts Token Use by 97% with Probabilistic Superposition for Implicit Multimodal Reasoning

Laser introduces a latent‑superposition paradigm that replaces step‑by‑step token prediction with dynamic windowed alignment, achieving over 97% token‑consumption reduction, new SOTA performance on six visual benchmarks, and improved interpretability for multimodal large models.

ACL 2026Dynamic Window AlignmentLatent Superposition
0 likes · 13 min read
How Laser Cuts Token Use by 97% with Probabilistic Superposition for Implicit Multimodal Reasoning
Old Zhang's AI Learning
Old Zhang's AI Learning
May 4, 2026 · Artificial Intelligence

How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives

DeepSeek’s new paper "Thinking with Visual Primitives" tackles the reference gap in multimodal models by introducing points and boxes as reasoning units, achieving up to 8× token efficiency and leading benchmark scores in counting, spatial reasoning, and maze navigation compared with GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash.

BenchmarkDeepSeekToken efficiency
0 likes · 10 min read
How DeepSeek’s New Paper Redefines Multimodal Reasoning with Visual Primitives
DataFunTalk
DataFunTalk
Apr 30, 2026 · Artificial Intelligence

How GenericAgent Cuts Token Costs by 10× While Boosting AI Agent Performance

The technical report on GenericAgent, a self‑evolving LLM‑based agent, shows that by maximizing context information density and using a minimal atomic toolset with hierarchical memory, it achieves up to ten‑fold token savings, 100% task accuracy, and progressive efficiency gains across multiple benchmarks.

AI benchmarksGenericAgentLLM
0 likes · 15 min read
How GenericAgent Cuts Token Costs by 10× While Boosting AI Agent Performance
Lao Guo's Learning Space
Lao Guo's Learning Space
Apr 30, 2026 · Artificial Intelligence

Xiaomi Opens MiMo‑V2.5 and Gives 100 Trillion Free Tokens – A Must‑Grab

Xiaomi has open‑sourced its MiMo‑V2.5 series, including a 1.02 T‑parameter Pro model, and is giving developers up to 100 trillion free tokens for 30 days; the article details the models' token‑efficiency benchmarks, a macOS‑like demo, MIT‑license benefits, and step‑by‑step usage instructions.

AI benchmarkingMIT licenseMiMo-V2.5
0 likes · 12 min read
Xiaomi Opens MiMo‑V2.5 and Gives 100 Trillion Free Tokens – A Must‑Grab
PaperAgent
PaperAgent
Apr 29, 2026 · Artificial Intelligence

Skill‑Driven Reasoning Cuts Tokens by Up to 59% While Boosting Accuracy

The article introduces the TRS (Thinking with Reasoning Skills) framework, which distills historical LLM reasoning traces into reusable skill cards, enabling offline skill‑base construction and online retrieval that dramatically reduces token consumption (6‑59%) and often improves accuracy on math and coding tasks.

Code GenerationInference OptimizationReasoning Skills
0 likes · 13 min read
Skill‑Driven Reasoning Cuts Tokens by Up to 59% While Boosting Accuracy
Machine Heart
Machine Heart
Apr 28, 2026 · Artificial Intelligence

Can LLMs Answer More Accurately While Writing Less? Introducing SHAPE’s Reasoning Tax

The SHAPE framework (Stage‑aware Hierarchical Advantage via Potential Estimation) adds a milestone‑based “reasoning tax” to large language model inference, providing step‑wise correctness signals and penalizing verbosity, which yields an average 3% accuracy gain and a 30% reduction in token consumption across multiple math‑reasoning benchmarks.

ACL 2026LLMMathematical Reasoning
0 likes · 10 min read
Can LLMs Answer More Accurately While Writing Less? Introducing SHAPE’s Reasoning Tax
ArcThink
ArcThink
Apr 27, 2026 · Artificial Intelligence

GPT-5.5 Deep Dive: What Makes This True Generational Leap Stand Out?

GPT‑5.5, the first fully retrained base model since GPT‑4.5, delivers an 11.7‑point jump on ARC‑AGI‑2, dramatic long‑context gains, and wins 9 of 10 shared benchmarks against GPT‑5.4, while a side‑by‑side comparison with Claude Opus 4.7 shows each model excelling in different domains, heralding a multi‑polar era for frontier AI.

AgentBenchmarkClaude Opus 4.7
0 likes · 16 min read
GPT-5.5 Deep Dive: What Makes This True Generational Leap Stand Out?
SuanNi
SuanNi
Apr 26, 2026 · Artificial Intelligence

Xiaomi’s MiMo‑V2.5: Halving Cost, Doubling Efficiency with a New Multimodal LLM

Xiaomi unveiled the MiMo‑V2.5 and MiMo‑V2.5‑Pro large language models, highlighting up to 50% lower API cost, multimodal perception, token‑efficiency gains, benchmark superiority over Claude Opus 4.6 and GPT‑5.4, and real‑world demos that built a full compiler in 4.3 hours and a video‑editing web app in 11.5 hours.

AI AgentBenchmarkMiMo-V2.5
0 likes · 6 min read
Xiaomi’s MiMo‑V2.5: Halving Cost, Doubling Efficiency with a New Multimodal LLM
DataFunTalk
DataFunTalk
Apr 25, 2026 · Artificial Intelligence

DeepSeek‑V4 vs GPT‑5.5: First Real‑World Tests Reveal Surprising Results

On the day GPT‑5.5 launched, DeepSeek‑V4 followed, and a series of head‑to‑head tests—including a logic puzzle, an IMO math problem, HTML generation, game‑engine coding, token‑efficiency measurement, and a network‑security challenge—showed GPT‑5.5 generally leading while DeepSeek demonstrated notable strengths and cost advantages.

AI model benchmarkAI securityCoding Agent
0 likes · 14 min read
DeepSeek‑V4 vs GPT‑5.5: First Real‑World Tests Reveal Surprising Results
AI Insight Log
AI Insight Log
Apr 23, 2026 · Artificial Intelligence

GPT-5.5 Launches Overnight, Beats Claude Opus 4.7 in Key Programming Benchmarks

OpenAI unveiled GPT-5.5 at 2 a.m., emphasizing autonomous task execution; benchmark tables show it outperforms Claude Opus 4.7 in most programming and agentic tests while lagging on a few specialized metrics, and it also offers token‑efficiency gains, new research‑assistant capabilities, and updated pricing.

AI research assistanceAgentic CodingBenchmark
0 likes · 9 min read
GPT-5.5 Launches Overnight, Beats Claude Opus 4.7 in Key Programming Benchmarks
AntTech
AntTech
Apr 23, 2026 · Artificial Intelligence

Ling-2.6-flash: Faster Response, Stronger Execution, and Higher Token Efficiency for Agent Workloads

Ling-2.6-flash is a 104B‑parameter Instruct model that uses a mixed‑linear architecture and token‑efficiency optimizations to achieve up to 340 tokens/s inference speed, 4× higher throughput than comparable models, and ten‑fold lower token consumption on Agent benchmarks, while maintaining SOTA performance.

Agent OptimizationBenchmarkLLM
0 likes · 15 min read
Ling-2.6-flash: Faster Response, Stronger Execution, and Higher Token Efficiency for Agent Workloads
ITPUB
ITPUB
Apr 22, 2026 · Artificial Intelligence

Unveiling the ‘Elephant’: Ant’s Ling‑2.6‑flash LLM Delivers 1M Tokens for $0.10

Ant’s newly released Ling‑2.6‑flash model, hidden as the anonymous “Elephant Alpha,” combines a 104B‑parameter MoE design with only 7.4B active weights per inference, achieving ten‑fold token savings, top‑tier benchmark scores and a $0.10 per‑million‑token price that dramatically cuts inference costs for developers and enterprises.

AI inferenceBenchmarkToken efficiency
0 likes · 6 min read
Unveiling the ‘Elephant’: Ant’s Ling‑2.6‑flash LLM Delivers 1M Tokens for $0.10
Data Party THU
Data Party THU
Apr 20, 2026 · Artificial Intelligence

How MemPO Uses Reinforcement Learning to Turn Agent Memory into a Trainable Policy

MemPO introduces a self‑memory policy optimization framework that lets long‑horizon LLM agents autonomously manage and refine their memory via reinforcement learning, using global‑trajectory and informative‑memory advantage estimates, achieving up to 25.98% F1 gain and 73% token reduction on benchmark tasks.

BenchmarkLLMLong-Horizon Agents
0 likes · 8 min read
How MemPO Uses Reinforcement Learning to Turn Agent Memory into a Trainable Policy
AI Architecture Path
AI Architecture Path
Apr 16, 2026 · Artificial Intelligence

How Claude‑Mem Eliminates AI Assistant Forgetfulness and Cuts Token Costs

This article analyzes the open‑source Claude‑Mem plugin, detailing developers' pain points with AI assistants, the plugin's persistent memory architecture, core features, MCP search workflow, practical usage examples, best‑practice tips, installation methods, system requirements, and common troubleshooting advice.

AIInstallationMCP
0 likes · 15 min read
How Claude‑Mem Eliminates AI Assistant Forgetfulness and Cuts Token Costs
Machine Heart
Machine Heart
Apr 3, 2026 · Artificial Intelligence

Kimi’s ‘Option Time Machine’: Interns Gain Equity While Building Cutting‑Edge AI

Kimi, a three‑year‑old AI‑native unicorn valued over $120 billion, launches a “Time‑Machine” option program that grants interns equity while showcasing its rapid valuation growth, record‑breaking context lengths, novel Kimi Linear architecture, token‑efficiency gains, and open‑source models that rival leading LLMs.

AI Talent ProgramAgent SwarmsAttention Residuals
0 likes · 10 min read
Kimi’s ‘Option Time Machine’: Interns Gain Equity While Building Cutting‑Edge AI
Java Backend Technology
Java Backend Technology
Apr 2, 2026 · Artificial Intelligence

Avoid Common Pitfalls When Designing AGENTS.md for LLM Agents

This article analyzes frequent misunderstandings about AGENTS.md files—such as treating them as encyclopedias, over‑explaining basics, bloating with full text files, poor structure, excessive permissions, and ineffective usage patterns—and provides concrete best‑practice recommendations to keep them concise, modular, and token‑efficient.

AGENTS.mdAI AgentDocumentation Best Practices
0 likes · 10 min read
Avoid Common Pitfalls When Designing AGENTS.md for LLM Agents
ArcThink
ArcThink
Mar 29, 2026 · Artificial Intelligence

Claude Code vs Codex: Deep Technical Architecture, Performance, and Real‑World Experience

This article provides a comprehensive, data‑driven comparison of Anthropic's Claude Code and OpenAI's Codex CLI, covering their divergent architectures, token efficiency, benchmark results, pricing models, and developer community feedback to help engineers choose the tool that best fits their workflow.

AI coding agentsClaude CodeCodex CLI
0 likes · 22 min read
Claude Code vs Codex: Deep Technical Architecture, Performance, and Real‑World Experience
AgentGuide
AgentGuide
Mar 27, 2026 · Artificial Intelligence

What Are Skills in LLM Agents? How They Work and When to Use Them

The article defines Skills as structured local folders that encapsulate domain‑specific processes, knowledge, and tools for large language models, contrasts them with temporary Prompts, outlines suitable use cases, details their components, and explains their on‑demand loading mechanism that saves tokens.

On-demand LoadingPrompt EngineeringSkills
0 likes · 4 min read
What Are Skills in LLM Agents? How They Work and When to Use Them
AI Insight Log
AI Insight Log
Mar 16, 2026 · Artificial Intelligence

Cursor’s Own Large‑Model Benchmark Shakes Up SWE‑bench Rankings

Although SWE‑bench scores for top coding models now differ by only a tenth of a point, Cursor’s newly released CursorBench reveals dramatic ranking changes, highlights three fundamental flaws in public benchmarks, and introduces token‑efficiency as a crucial evaluation dimension.

AI CodingBenchmarkCursorBench
0 likes · 8 min read
Cursor’s Own Large‑Model Benchmark Shakes Up SWE‑bench Rankings
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Feb 26, 2026 · Artificial Intelligence

Why Longer Token Chains Don't Mean Better Reasoning: Google's Deep Thinking Ratio

Google’s recent study shows that the length of a model’s token chain is negatively correlated with inference accuracy, and introduces the Deep Thinking Ratio (DTR) metric to identify truly reasoning tokens, enabling the Think@n strategy to halve compute cost without sacrificing performance.

Deep Thinking RatioInferenceLLM
0 likes · 6 min read
Why Longer Token Chains Don't Mean Better Reasoning: Google's Deep Thinking Ratio
AntTech
AntTech
Feb 16, 2026 · Artificial Intelligence

Ling‑2.5‑1T: Open‑Source 1‑Trillion‑Parameter Instant LLM with 1M‑Token Context

Ling‑2.5‑1T is an open‑source instant large language model with 1 trillion total parameters, 63 B active weights, and a 1 M token context window, featuring mixed‑linear attention, a composite correctness‑plus‑process reward for token efficiency, fine‑grained alignment, and leading benchmark performance across reasoning, instruction‑following, and agentic tasks.

BenchmarkToken efficiencyagentic interaction
0 likes · 13 min read
Ling‑2.5‑1T: Open‑Source 1‑Trillion‑Parameter Instant LLM with 1M‑Token Context
ITPUB
ITPUB
Feb 4, 2026 · Product Management

Why AI Is Undermining Traditional SaaS and What 2026 Software Startups Must Do

The recent plunge in software stocks reveals that large‑model AI is eroding the core value of traditional SaaS, forcing a shift from GUI‑driven products to language‑based interfaces, prompting firms to focus on token efficiency, plugin architectures, and outcome‑based pricing to survive.

AILanguage InterfaceSaaS
0 likes · 10 min read
Why AI Is Undermining Traditional SaaS and What 2026 Software Startups Must Do
Data Party THU
Data Party THU
Jan 18, 2026 · Artificial Intelligence

OptScale: Probabilistic Optimal Stopping for Inference‑Time Scaling

OptScale introduces a probabilistic framework that determines the optimal number of inference samples needed to meet a target accuracy with a confidence guarantee, dramatically reducing token usage while maintaining or improving performance across various large‑language‑model benchmarks.

Inference ScalingOptimal StoppingToken efficiency
0 likes · 12 min read
OptScale: Probabilistic Optimal Stopping for Inference‑Time Scaling
AI Engineering
AI Engineering
Jan 15, 2026 · Artificial Intelligence

Why Anthropic Introduced Agent Skills and How They Could Transform AI Agents

The article analyzes Anthropic's new Agent Skills concept, explaining how it addresses the token‑bloat and positioning gaps of MCP, outlines its progressive‑disclosure design, shows a file‑system structure, and discusses practical usage and early platforms for skill sharing.

AI AgentsAgent SkillsAnthropic
0 likes · 12 min read
Why Anthropic Introduced Agent Skills and How They Could Transform AI Agents
Design Hub
Design Hub
Jan 8, 2026 · Artificial Intelligence

Say Goodbye to Bloated Prompts! Cursor's Dynamic Context Discovery Makes AI Coding Smarter

Cursor introduces a "dynamic context discovery" approach that lets AI coding agents fetch only the information they need, cutting token usage by 46.9% and improving response quality through five practical techniques such as file‑based tool output, history archives, Agent Skills, on‑demand MCP loading, and treating terminal sessions as files.

AI CodingAgent SkillsCursor
0 likes · 7 min read
Say Goodbye to Bloated Prompts! Cursor's Dynamic Context Discovery Makes AI Coding Smarter
Frontend AI Walk
Frontend AI Walk
Dec 30, 2025 · Artificial Intelligence

What Are Claude Skills? The Brain Plug that Turns AI into a Digital Expert

Claude Skills are lightweight, open‑format collections of instructions, scripts, and resources that act as modular knowledge packages, enabling the AI to lazily load expertise on demand, improve token efficiency by up to 90 %, support version control, and turn the model into a reusable, domain‑specific expert.

AI skillsAgent PluginsClaude
0 likes · 13 min read
What Are Claude Skills? The Brain Plug that Turns AI into a Digital Expert
Instant Consumer Technology Team
Instant Consumer Technology Team
Dec 18, 2025 · Artificial Intelligence

How a Multi‑Agent Framework Boosts Graph Chain‑of‑Thought Reasoning Efficiency

The paper introduces GLM, a multi‑agent Graph‑CoT framework with an optimized LLM serving architecture that dramatically improves accuracy, reduces token consumption, lowers latency, and increases throughput across diverse domains, as demonstrated by extensive GRBench evaluations.

LLM optimizationMulti-AgentToken efficiency
0 likes · 10 min read
How a Multi‑Agent Framework Boosts Graph Chain‑of‑Thought Reasoning Efficiency
Data STUDIO
Data STUDIO
Nov 19, 2025 · Artificial Intelligence

Why TOON Beats JSON for LLM Data Exchange: Token Savings and Accuracy Gains

The article explains how the Token‑Oriented Object Notation (TOON) format reduces token usage by 30‑60% and improves accuracy compared to JSON when feeding structured data to large language models, offering concrete syntax, benchmark results, code examples, and guidance on when to adopt it.

BenchmarkJSON alternativeLLM
0 likes · 10 min read
Why TOON Beats JSON for LLM Data Exchange: Token Savings and Accuracy Gains
DataFunTalk
DataFunTalk
Sep 14, 2025 · Artificial Intelligence

Why Modern LLMs Skip Thinking: Token Routing and Zero‑Compute Experts Explained

The article examines how large language models now use routing mechanisms and token‑level expert selection to reduce computation and cost, illustrating the trade‑offs with real‑world examples from OpenAI, LongCat, and DeepSeek while highlighting both the benefits and the pitfalls of this approach.

AIDeep LearningToken efficiency
0 likes · 8 min read
Why Modern LLMs Skip Thinking: Token Routing and Zero‑Compute Experts Explained
Architect
Architect
Jun 12, 2025 · Artificial Intelligence

Why Large Reasoning Models Collapse Under Complex Tasks: Insights from Apple’s Study

Apple’s research reveals that large reasoning models, despite sophisticated self‑reflection mechanisms, experience a complete performance collapse when problem complexity exceeds a threshold, highlighting fundamental limits in their ability to achieve generalized reasoning.

AI EvaluationToken efficiencylarge reasoning models
0 likes · 7 min read
Why Large Reasoning Models Collapse Under Complex Tasks: Insights from Apple’s Study
AI Frontier Lectures
AI Frontier Lectures
Apr 23, 2025 · Artificial Intelligence

Why Skipping the Thinking Step Makes Large Language Models More Accurate

UC Berkeley researchers found that forcing large language models to skip explicit reasoning—using a “NoThinking” mode—can achieve comparable or better accuracy with significantly fewer tokens, especially under token budget constraints, across math, coding, and theorem‑proving benchmarks.

NoThinkingToken efficiencyreasoning
0 likes · 7 min read
Why Skipping the Thinking Step Makes Large Language Models More Accurate
Baobao Algorithm Notes
Baobao Algorithm Notes
Mar 28, 2025 · Artificial Intelligence

Can Small 7B Models Beat the State‑of‑the‑Art? A Critical Analysis of R1‑Zero Training and Unbiased GRPO

This article critically examines R1‑Zero‑style training by analyzing foundation models and reinforcement learning, uncovering pre‑training and optimization biases, proposing an unbiased Dr. GRPO method, and demonstrating a minimalist 7B‑model recipe that achieves new state‑of‑the‑art performance on AIME 2024.

GRPOLLM evaluationR1-Zero
0 likes · 20 min read
Can Small 7B Models Beat the State‑of‑the‑Art? A Critical Analysis of R1‑Zero Training and Unbiased GRPO
CSS Magic
CSS Magic
May 16, 2024 · Artificial Intelligence

GPT-4o API Hands‑On Review: Blessing or Challenge for Developers?

The article evaluates GPT‑4o’s API by comparing its halved pricing, 50% higher token utilization, roughly double inference speed, and new prompt‑sensitivity quirks against GPT‑4‑Turbo and other models, then offers practical tips for integration and troubleshooting.

APIGPT-4oPrompt Engineering
0 likes · 13 min read
GPT-4o API Hands‑On Review: Blessing or Challenge for Developers?
CSS Magic
CSS Magic
Mar 13, 2024 · Artificial Intelligence

How Moonshot’s Kimi Model Beats Big‑Tech LLMs with 200k‑Token Context

The author tests Moonshot’s Kimi API, revealing its 200 k‑character context window, superior token‑to‑character ratio compared with GPT‑3.5 and Gemini, and performance that, while slower than GPT‑3.5 Turbo, rivals GPT‑4 Turbo, all while offering OpenAI‑compatible endpoints and free credit for developers.

API compatibilityKimiMoonshot
0 likes · 8 min read
How Moonshot’s Kimi Model Beats Big‑Tech LLMs with 200k‑Token Context
CSS Magic
CSS Magic
Dec 15, 2023 · Artificial Intelligence

Google Gemini Free API Launch: A Deep Dive for Developers

Google has opened its Gemini Pro large‑language model via a completely free API with a 60‑calls‑per‑minute limit, offering an online playground, straightforward key registration, efficient token usage, and streaming output, while noting it remains a technical preview rather than a consumer‑ready service.

AIAPI UsageFree API
0 likes · 3 min read
Google Gemini Free API Launch: A Deep Dive for Developers