Karpathy’s 4 AI Coding Guidelines: 65‑Line Markdown to Eliminate Over‑Engineering
The article analyzes Karpathy’s three common LLM coding pitfalls, presents four concrete guidelines—Think Before Coding, Simplicity First, Surgical Changes, and Goal‑Driven Execution—implemented in a 65‑line Markdown file, and shows how to install and validate them across Claude Code and Cursor.
Background
Andrej Karpathy recently posted a tweet that identified three core problems with large‑language‑model (LLM) generated code: the model makes unchecked assumptions, it tends to over‑engineer solutions, and it often makes unsolicited changes while fixing bugs. The article explains these issues and shows how the open‑source Karpathy‑Inspired Claude Code Guidelines convert the observations into actionable rules.
Four Practical Principles
1. Think Before Coding
The LLM should not guess; it must surface assumptions and ask clarifying questions. For example, a request to “add a user data export feature” can lead the model to generate a 200‑line function (abridged below) that assumes it should export all users, write to a local file, and use a fixed set of CSV fields, all without any discussion.
```python
import csv
import json

def export_users(format='json'):
    """Export all user data (User is assumed to be an ORM model)."""
    users = User.query.all()
    if format == 'json':
        with open('users.json', 'w') as f:
            json.dump([u.to_dict() for u in users], f)
    elif format == 'csv':
        with open('users.csv', 'w') as f:
            writer = csv.DictWriter(f, fieldnames=['id', 'email', 'name'])
            writer.writeheader()
            writer.writerows([u.to_dict() for u in users])
    return f"Exported {len(users)} users"
```

Applying the guideline, the AI should first ask:
1. Export scope: all users or a filtered subset? (privacy concerns)
2. Export method: browser download, API response, or background job?
3. Fields to include: which user attributes?
4. Approximate data volume?

Only after clarification does it generate a minimal solution, such as a paginated JSON endpoint.
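As an illustrative sketch of what that minimal solution might look like once the scope is clarified (the helper name, page size, and field list are assumptions for illustration, not part of the original guidelines):

```python
import json

def export_users_page(users, page=1, per_page=100, fields=('id', 'email')):
    """Return one page of users as a JSON string, limited to the requested fields."""
    start = (page - 1) * per_page
    chunk = users[start:start + per_page]
    payload = {
        'page': page,
        'total': len(users),
        'users': [{f: u[f] for f in fields} for u in chunk],
    }
    return json.dumps(payload)
```

In a real service this function would back a paginated API response or a background job, whichever option the clarifying questions settled on.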
2. Simplicity First
LLMs often replace a five‑line arithmetic task with a full strategy‑pattern implementation. The article contrasts a 30‑line discount‑calculation hierarchy with a concise two‑line function.
```python
# Over-engineered version (30+ lines)
from abc import ABC, abstractmethod
from dataclasses import dataclass

class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, amount: float) -> float:
        pass

class PercentageDiscount(DiscountStrategy):
    def __init__(self, percentage: float):
        self.percentage = percentage

    def calculate(self, amount: float) -> float:
        return amount * (self.percentage / 100)

class FixedDiscount(DiscountStrategy):
    def __init__(self, fixed_amount: float):
        self.fixed_amount = fixed_amount

    def calculate(self, amount: float) -> float:
        return min(self.fixed_amount, amount)

@dataclass
class DiscountConfig:
    strategy: DiscountStrategy
    min_purchase: float = 0.0
    max_discount: float = float('inf')

class DiscountCalculator:
    def __init__(self, config: DiscountConfig):
        self.config = config

    def apply_discount(self, amount: float) -> float:
        if amount < self.config.min_purchase:
            return 0
        discount = self.config.strategy.calculate(amount)
        return min(discount, self.config.max_discount)
```

After applying the guideline, the same task is solved with:
```python
def calculate_discount(amount: float, percent: float) -> float:
    """Calculate discount amount; percent is 0-100."""
    return amount * (percent / 100)

# Example usage
discount = calculate_discount(100.0, 10.0)  # discount of 10 units
```

3. Surgical Changes
When fixing a bug such as “empty email causes a crash”, the LLM may also reformat comments, add type hints, and introduce unrelated validation. The guideline restricts modifications to lines directly related to the bug.
Original diff (unwanted changes highlighted):

```diff
- if not user_data.get('email'):
+ email = user_data.get('email', '')
+ if not email or not email.strip():
      raise ValueError("Email required")
- if '@' not in user_data['email']:
+ if '@' not in email:
      raise ValueError("Invalid email")
```

After applying the rule, only the essential checks remain.
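By contrast, a surgical version of the fix addresses only the crash on a missing or empty email and leaves the original checks alone. This is a minimal sketch; the function name and signature are assumptions, since the article does not show the surrounding code:

```python
def validate_email(user_data: dict) -> str:
    """Reject missing, empty, or malformed emails; nothing else is touched."""
    email = user_data.get('email', '')  # fix: avoid KeyError when 'email' is absent
    if not email:
        raise ValueError("Email required")
    if '@' not in email:
        raise ValueError("Invalid email")
    return email
```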
4. Goal‑Driven Execution
LLMs excel at looping until a concrete goal is met. Provide a success criterion instead of a vague instruction.
The article provides a table that maps vague commands to testable goals and shows a concrete example for sorting with duplicate scores.
```python
# Test for the duplicate-score sorting bug
def test_sort_with_duplicate_scores():
    scores = [
        {'name': 'Alice', 'score': 100},
        {'name': 'Bob', 'score': 100},
        {'name': 'Charlie', 'score': 90},
    ]
    result = sort_scores(scores)
    assert result[0]['score'] == 100
    assert result[1]['score'] == 100
    assert result[2]['score'] == 90
```

Only after the test fails does the AI rewrite the sorting function.
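One implementation that satisfies the test relies on the stability of Python's built-in sort; the function body below is a sketch, as the article does not show the final implementation:

```python
def sort_scores(scores):
    """Sort descending by score. sorted() is stable, so entries with
    equal scores keep their original relative order."""
    return sorted(scores, key=lambda s: s['score'], reverse=True)
```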
Installation Options
The guidelines can be injected into AI coding tools via three methods, ordered by recommendation.
Claude Code plugin (recommended)
```
# 1. Add the plugin marketplace entry
/plugin marketplace add forrestchang/andrej-karpathy-skills

# 2. Install the plugin
/plugin install andrej-karpathy-skills@karpathy-skills
```

CLAUDE.md (project‑level)
```shell
# New project
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

# Existing project – append
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
```

Cursor rule file
```
.cursor/rules/karpathy-guidelines.mdc   # contains "alwaysApply: true"
```

Copy the .cursor/rules/karpathy-guidelines.mdc file into any Cursor project to activate the rules.
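For reference, Cursor rule files pair YAML front matter with the rule text. The description field below is an assumption; `alwaysApply: true` matches what the article states the file contains:

```markdown
---
description: Karpathy-inspired AI coding guidelines
alwaysApply: true
---

Surface assumptions and ask clarifying questions before writing code.
```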
Technical Implementation
The core artifact is a 65‑line Markdown file with no executable code. It is duplicated in three formats (CLAUDE.md, .cursor/rules/…mdc, and a Skill SKILL.md) so that the same textual instructions can be consumed by Claude Code, Cursor, or other tools. The repository forrestchang/andrej-karpathy-skills includes a plugin.json and marketplace.json that enable one‑click installation as a Claude Code plugin.
Validation Metrics
The project defines four observable indicators to judge effectiveness:
- Fewer unnecessary changes in diffs: only requested modifications appear.
- Fewer rewrites caused by over‑engineering.
- Problems clarified before implementation rather than after.
- Cleaner pull requests without unrelated refactoring.
Practitioners are encouraged to compare diff histories before and after a week of using the guidelines.
Customization
Project‑specific rules can be appended to CLAUDE.md. Example:
```markdown
## Project-specific guidelines

- Enforce strict TypeScript mode
- All API endpoints must have tests
- Follow error-handling patterns in src/utils/errors.ts
```

Specific, testable constraints are more actionable for the LLM than vague statements.
Relation to Existing Workflow
The guidelines act like a pre‑commit hook for AI‑generated code: they complement, rather than replace, code review, CI/CD pipelines, and team style guides.
Trade‑offs
The cautious approach may slow trivial edits (e.g., fixing a typo) because each step requires explicit assumptions and validation. The rules are advisory; an LLM can still ignore them in long conversations, so developers may need to remind the model of the relevant guideline.
Conclusion
The four‑principle, 65‑line Markdown rule set is lightweight yet effective. By adopting “Think Before Coding” and “Simplicity First” in particular, developers see noticeably cleaner diffs and fewer accidentally over‑engineered solutions when using Claude Code or Cursor.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Shuge Unlimited