Baobao Algorithm Notes
Mar 5, 2025 · Artificial Intelligence

Why My 0.5B LLM’s Reasoning Collapsed During RLHF on Logic Puzzles

The author experiments with reinforcement learning from human feedback (RLHF) on a 0.5B Qwen instruct model using Logic-RL and Open-R1, finds that reward misdesign and curriculum learning lead the model to produce overly short or incorrect reasoning chains on knights-and-knaves puzzles, and analyzes the underlying causes.

Artificial Intelligence · Large Language Model · Logic Reasoning
0 likes · 11 min read
Java Tech Enthusiast
Feb 26, 2025 · Artificial Intelligence

Claude 3.7 Sonnet: How It Crushes Coding, Physics Simulations, and Logic Puzzles

Claude 3.7 Sonnet demonstrates unprecedented programming speed, realistic physics simulation, advanced reasoning on misleading benchmarks, and strong productivity tools. With Anthropic also securing a $3.5 billion funding round, the model stands out in both technical capability and market impact.

AI model evaluation · Claude 3.7 · Logic Reasoning
0 likes · 11 min read