Tagged articles

Self‑Verification

7 articles · Page 1 of 1

Jun 12, 2026 · Artificial Intelligence

Iterative Agent Skill Development: Turning Expert Knowledge into Zero‑Dependency SOPs

This article defines Agent Skill as a modular, file‑system‑driven knowledge asset, explains its three‑layer progressive‑disclosure architecture, outlines core features such as decision‑tree logic and dual verification, details suitable scenarios, and provides a step‑by‑step iterative workflow with concrete code snippets and tooling.

AI Agent · Agent Skill · Iterative Development

0 likes · 14 min read

Machine Heart

May 11, 2026 · Artificial Intelligence

How PRISM Enables Efficient Test‑Time Scaling for Discrete Diffusion Language Models

The article analyzes how the PRISM framework redesigns test‑time scaling for discrete diffusion language models by replacing costly Best‑of‑N sampling with a three‑stage hierarchical search, local branching via partial remasking, and self‑verified feedback, achieving large accuracy gains on math and code benchmarks while cutting inference compute by up to a factor of four.

Discrete Diffusion · Hierarchical Search · Self‑Verification

0 likes · 11 min read

Tech Minimalism

Mar 21, 2026 · Artificial Intelligence

Mastering Harness Engineering: The Key to AI Agent Programming

The article explains how Harness Engineering—comprising system prompts, tool integration, file systems, sandboxed execution, context management, and self‑verification loops—extends AI models into fully functional agents capable of memory, code execution, and long‑term autonomous tasks.

Context Management · Self‑Verification · agent tooling

0 likes · 16 min read

PMTalk Product Manager Community

Dec 24, 2025 · Artificial Intelligence

Why AI Hallucinates and How Product Managers Can Tame It

The article explains the internal and external causes of AI hallucinations, examines how pre‑training data flaws and fine‑tuning choices amplify them, and presents a five‑pronged technical toolbox—including RAG, prompt engineering, chain‑of‑thought, self‑verification, and safety APIs—plus risk‑based product strategies for different industries.

AI hallucination · Model reliability · Product Management

0 likes · 12 min read

Old Meng AI Explorer

Dec 7, 2025 · Artificial Intelligence

Why DeepSeek-Math-V2 Is the New Benchmark for Rigorous AI Math Reasoning

DeepSeek-Math-V2, an open‑source math reasoning model from DeepSeek, introduces a self‑verification mechanism that ensures step‑by‑step logical correctness, achieving gold‑medal scores on IMO 2025 and CMO 2024 and near‑perfect results on the Putnam 2024 competition, while offering free, extensible deployment for research, training, and scientific computation.

AI Math · DeepSeek · Mathematical Reasoning

0 likes · 13 min read

Fun with Large Models

Dec 5, 2025 · Artificial Intelligence

DeepSeek Math V2 & V3.2: A Plain‑Language Deep Dive into Core Innovations

This article provides a detailed, easy‑to‑understand analysis of DeepSeek‑Math‑V2’s self‑verification training method and DeepSeek‑V3.2’s GRPO framework, sparse‑attention DSA mechanism, massive agent data pipeline, and benchmark results that place both models among the world’s top open‑source large language models.

DeepSeek · GRPO · LLM

0 likes · 19 min read

ShiZhen AI

Nov 28, 2025 · Artificial Intelligence

DeepSeekMath‑V2 Scores 118/120 on Putnam and Achieves Gold‑Level IMO Performance

DeepSeekMath‑V2, released open‑source on 27 Nov 2025, attains gold‑level results on IMO 2025, scores 118 out of 120 on the Putnam 2024 competition, introduces a generator‑verifier self‑verification architecture, uses GRPO training, and outperforms leading closed‑source models on IMO‑ProofBench.

DeepSeekMath-V2 · GRPO · LLM

0 likes · 7 min read
