Shi's AI Notebook

An AI technology observer documenting the evolution of AI and industry news, and sharing development practices.

14 Articles · 0 Likes · 0 Views · 0 Comments

Recent Articles

Apr 23, 2026 · Artificial Intelligence

Decoding Anthropic’s Agent Evaluation Methodology: Challenges, Graders, and Best Practices

Anthropic’s engineering blog outlines a systematic approach to evaluating AI agents. It explains why agents are harder to test than traditional software, defines key concepts such as tasks, trials, transcripts, and outcomes, and details the three grader types, when to run evaluations, and the practical decisions involved in building robust eval pipelines.

AI agents · LLM-as-judge · capability eval
0 likes · 23 min read
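The vocabulary in this summary can be made concrete with a small data model. The sketch below is a hypothetical illustration, not Anthropic's code: the names `Task`, `Trial`, `exact_match_grader`, `run_trials`, and `pass_rate` are assumptions, a task's success criterion is assumed to be a single expected string, the outcome is simplified to the agent's last message, and only a deterministic code-based grader is shown (LLM-as-judge and human grading are not modeled).

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """A single test case: an input plus a success criterion."""
    task_id: str
    prompt: str
    expected: str  # assumption: success means producing this exact answer


@dataclass
class Trial:
    """One attempt at a task; a task is usually run for several trials."""
    task: Task
    transcript: list[str] = field(default_factory=list)  # every step the agent took
    outcome: str = ""  # simplified: the agent's final message


def exact_match_grader(trial: Trial) -> bool:
    """A deterministic, code-based grader that checks only the outcome."""
    return trial.outcome.strip() == trial.task.expected.strip()


def run_trials(task: Task, agent: Callable[[str], list[str]], n_trials: int) -> list[Trial]:
    """Run the agent n_trials times, recording each transcript and outcome."""
    trials = []
    for _ in range(n_trials):
        steps = agent(task.prompt)
        trials.append(Trial(task, steps, steps[-1] if steps else ""))
    return trials


def pass_rate(trials: list[Trial], grader: Callable[[Trial], bool]) -> float:
    """Fraction of trials the grader accepts (0.0 when there are no trials)."""
    if not trials:
        return 0.0
    return sum(grader(t) for t in trials) / len(trials)


# Hypothetical toy agent whose answer varies between runs,
# standing in for non-deterministic agent behavior.
answers = itertools.cycle(["4", "5"])


def toy_agent(prompt: str) -> list[str]:
    return [f"read: {prompt}", next(answers)]


task = Task("add-1", "What is 2 + 2?", "4")
trials = run_trials(task, toy_agent, n_trials=4)
print(pass_rate(trials, exact_match_grader))  # 0.5
```

Running several trials per task matters because agent behavior varies between runs: here the toy agent alternates answers, so four trials yield a pass rate of 0.5 instead of a single pass or fail.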
Apr 17, 2026 · Artificial Intelligence

Claude Opus 4.7 Enhances Long‑Task Handling & Qwen 3.6‑35B‑A3B Open‑Source Release

The roundup covers Anthropic’s Claude Opus 4.7 launch, with improved long‑task processing and higher rate limits; Alibaba’s open‑source Qwen 3.6‑35B‑A3B sparse‑MoE model; Anthropic usage tips; OpenAI Codex’s expanded plugin suite; a GLM‑5.1 tool‑call fix; Ternary Bonsai’s ternary‑weight efficiency; Tencent’s HY‑World 2.0; Sim2Reason physics learning; and the Gemini on Spot and π0.7 robot model releases.

Claude Opus 4.7 · Gemini Robotics · OpenAI Codex
0 likes · 10 min read
Apr 11, 2026 · Artificial Intelligence

Claude for Word Public Preview and the Controversial Claude Mythos: What the AI Community Is Saying

Anthropic unveiled a public preview of Claude for Word while the AI community debated the looped‑language‑model architecture of Claude Mythos. The roundup also covers OpenClaw compatibility hurdles, new neural‑computer research, warnings that memory will become the next bottleneck for agents, and questions about how soaring model prices will affect the industry.

AI Industry Analysis · AI Model Pricing · Claude Mythos
0 likes · 8 min read
Apr 11, 2026 · Artificial Intelligence

Anthropic’s Agent Harness: Six‑Hour Full‑Stack Build with Multi‑Agent Design

The article analyzes Anthropic’s “Agent harness” design, showing how separating generation and evaluation into distinct agents, an idea inspired by GANs, overcomes context‑window limits and self‑evaluation bias. The result is a three‑agent planner‑generator‑evaluator pipeline that builds a full‑stack app in six hours.

Agent Orchestration · Artificial Intelligence · Full-Stack Development
0 likes · 16 min read
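A minimal sketch of the planner‑generator‑evaluator loop described above, assuming each agent can be treated as a plain Python function. In a real harness each role would presumably be a separate LLM agent with its own context window; `run_pipeline`, `Review`, the round budget, and the toy agents are hypothetical names and choices made for illustration, not Anthropic's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    """Hypothetical evaluator verdict for one generated artifact."""
    passed: bool
    feedback: str


def run_pipeline(
    goal: str,
    planner: Callable[[str], list[str]],
    generator: Callable[[str, str], str],
    evaluator: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> dict[str, tuple[str, bool]]:
    """Split a goal into features, then build each feature in a
    generate -> evaluate loop until the evaluator accepts it or the
    round budget runs out. Returns feature -> (last artifact, accepted)."""
    results: dict[str, tuple[str, bool]] = {}
    for feature in planner(goal):
        feedback = ""
        artifact, accepted = "", False
        for _ in range(max_rounds):
            # The generator sees only the feature and the latest critique.
            artifact = generator(feature, feedback)
            # A separate evaluator judges the output, so the generator
            # never grades its own work.
            review = evaluator(feature, artifact)
            accepted = review.passed
            if accepted:
                break
            feedback = review.feedback
        results[feature] = (artifact, accepted)
    return results


# Toy stand-ins for the three agents (hypothetical).
def toy_planner(goal: str) -> list[str]:
    return ["login page", "todo API"]


def toy_generator(feature: str, feedback: str) -> str:
    return f"{feature} (fixed)" if feedback else f"{feature} (draft)"


def toy_evaluator(feature: str, artifact: str) -> Review:
    if artifact.endswith("(fixed)"):
        return Review(True, "")
    return Review(False, "missing error handling")


print(run_pipeline("todo app", toy_planner, toy_generator, toy_evaluator))
# {'login page': ('login page (fixed)', True), 'todo API': ('todo API (fixed)', True)}
```

The design choice this sketch highlights is the separation of roles: the generator never scores its own output, and in this version it carries only the latest feedback forward rather than the full history of attempts.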
Apr 6, 2026 · Industry Insights

Farzapedia Sparks Personalized AI Memory Trend; Claude API Streaming Refusal Handling Goes Live

The article reviews recent AI developments, including the low‑VRAM Gemma‑4‑21B‑REAP model, Qwen3‑Coder‑Next REAP variants, Farzapedia's file‑plus‑wiki memory system for agents, turboquant‑gpu's 5.02× KV‑cache compression, the Claude API's new streaming refusal mechanism, and logistics savings from DeepMind's AlphaEvolve.

AI model releases · AlphaEvolve · Claude API
0 likes · 6 min read
Mar 30, 2026 · Artificial Intelligence

AI Daily Digest March 30, 2026: Open‑Source Tools, Model Releases, and Research Highlights

The March 30 digest curates recent open‑source voice‑input and TypeScript libraries, new development workflows, a 30B‑parameter model that runs on 24 GB GPUs, and NVIDIA's PivotRL research, which reduces reinforcement‑learning rollouts while matching end‑to‑end performance; all items include concrete benchmarks and links.

AI tools · Reinforcement Learning · TypeScript
0 likes · 13 min read
Mar 25, 2026 · Information Security

LiteLLM Compromised in 46 Minutes: Inside the 47,000‑Download Supply‑Chain Attack

In March 2026, attackers hijacked LiteLLM's official PyPI maintainer account and published two malicious versions that were downloaded 46,996 times in 46 minutes. The malicious releases exfiltrated credentials and launched a fork bomb, and the incident shows how unpinned dependencies and .pth files can turn a simple package install into a full‑scale supply‑chain breach.

Kubernetes · LiteLLM · PyPI
0 likes · 12 min read
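The .pth mechanism is why a package install can run code without any explicit import: at every interpreter startup, Python's `site` module executes each line in a site‑packages `.pth` file that begins with `import` followed by a space or tab. A minimal defensive sketch, assuming you only want to list such lines for manual review, could look like this; the function names `executable_pth_lines` and `default_site_dirs` are hypothetical.

```python
import site
from pathlib import Path


def executable_pth_lines(dirs):
    """Return (file, line_no, line) for every .pth line that the site
    module would execute at startup: lines starting with 'import '
    or 'import\\t'. Missing directories are skipped."""
    findings = []
    for d in dirs:
        directory = Path(d)
        if not directory.is_dir():
            continue
        for pth in sorted(directory.glob("*.pth")):
            text = pth.read_text(encoding="utf-8", errors="replace")
            for no, line in enumerate(text.splitlines(), start=1):
                if line.startswith(("import ", "import\t")):
                    findings.append((str(pth), no, line))
    return findings


def default_site_dirs():
    """Global and user site-packages directories for this interpreter."""
    dirs = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
    dirs.append(site.getusersitepackages())
    return dirs


if __name__ == "__main__":
    for path, no, line in executable_pth_lines(default_site_dirs()):
        print(f"{path}:{no}: {line}")
```

Some legitimate tooling, such as setuptools' editable installs, also writes `import` lines into `.pth` files, so a hit is a prompt for review rather than proof of compromise. Pinning exact versions, ideally with pip's hash‑checking mode (`--require-hashes`), addresses the unpinned‑dependency side of the problem.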