Decoding Anthropic’s Agent Evaluation Methodology: Challenges, Graders, and Best Practices
Anthropic’s engineering blog outlines a systematic approach to evaluating AI agents. It explains why agents are harder to test than traditional software, defines key concepts such as tasks, trials, transcripts, and outcomes, and details the three grader types, when to run evaluations, and the practical decisions involved in building robust eval pipelines.
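To make the vocabulary concrete, here is a minimal sketch of how those concepts could relate in code. All class and field names here are illustrative assumptions, not Anthropic's actual implementation: a task is run over multiple trials, each trial records a transcript, and a grader maps a transcript to an outcome.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List

class GraderType(Enum):
    # Hypothetical labels for the three grader families described in the post.
    CODE = "programmatic"   # deterministic checks (exact match, unit tests)
    MODEL = "llm_judge"     # an LLM scores the transcript against a rubric
    HUMAN = "human"         # manual review by a person

@dataclass
class Transcript:
    steps: List[str]  # agent actions / tool calls recorded during one trial

@dataclass
class Trial:
    transcript: Transcript
    outcome: bool  # did this run meet the task's success criteria?

@dataclass
class Task:
    prompt: str
    grader: Callable[[Transcript], bool]
    trials: List[Trial] = field(default_factory=list)

    def run_trial(self, transcript: Transcript) -> Trial:
        # Apply the grader to one transcript and record the outcome.
        trial = Trial(transcript, self.grader(transcript))
        self.trials.append(trial)
        return trial

    def pass_rate(self) -> float:
        # Agents are stochastic, so a task is run over many trials;
        # the aggregate pass rate, not any single run, is the signal.
        if not self.trials:
            return 0.0
        return sum(t.outcome for t in self.trials) / len(self.trials)
```

In this sketch the grader is just a callable, which covers the programmatic case directly; a model-based or human grader would be wrapped behind the same interface.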
