Difficulty: Beginner | Duration: ~15 minutes | You’ll learn: 4 principles + how to use karpathy-guidelines to standardize AI programming behavior
Target audience
- Developers using AI coding assistants like Claude Code, Cursor, or Copilot
- Developers with 1–5 years of experience who want AI-generated code to be more accurate, less verbose, and free of unrequested changes
- Some hands-on experience with AI coding, looking for structured collaboration guidelines
Core dependencies and environment
| Dependency | Version requirement | Purpose |
|---|---|---|
| Node.js | 18+ | Run the Claude Code CLI |
| Claude Code CLI | Latest version | Core tool; install plugins |
| curl | Any version | Download the CLAUDE.md file |
TIP
If you haven’t installed Claude Code yet, visit claude.ai/code to learn how to set it up.
Full project folder structure
andrej-karpathy-skills/
├── CLAUDE.md # Core guide file (condensed 4 principles; shared across both plugin/file approaches)
├── README.md # Installation and usage instructions
├── EXAMPLES.md # Lots of real-world examples—both good and bad
└── skills/
└── karpathy-guidelines/ # Claude Code plugin format (recommended installation method)
WARNING
This project is a behavior-spec plugin for Claude Code. It is not OpenClaw’s skills package—don’t mix them up.
1. First understand the problem: Why LLMs “do their own thing”
At the end of 2025, Andrej Karpathy posted a tweet pinpointing the core issue with current LLM-based programming:
"The models make wrong assumptions on their own behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."
In other words: LLMs like to silently assume things; when instructions are vague, they don’t ask; they write increasingly bloated code; and they may casually modify parts they don’t truly understand.
This is a problem almost every developer who deeply uses AI programming has run into: you ask the AI to fix a small bug, and it rewrites half a module; you ask it to add a simple function, and it introduces layers of abstraction.
karpathy-guidelines is designed to address these four issues:
| Symptom | Matching principle |
|---|---|
| Makes assumptions silently and starts without getting clarity | Think Before Coding |
| Writes more and more bloated code—200 lines become 2,000 | Simplicity First |
| When fixing a bug, it refactors half the project as a “bonus” | Surgical Changes |
| “I’ll fix it”—no verifiable success criteria | Goal-Driven Execution |
2. Install karpathy-guidelines
There are two installation methods; the first one is recommended.
Method 1: Plugin approach (recommended)
# Step 1: Add the plugin marketplace
/plugin marketplace add forrestchang/andrej-karpathy-skills
# Step 2: Install the plugin
/plugin install andrej-karpathy-skills@karpathy-skills
After installation, Claude Code will automatically load these 4 principles in any project.
Method 2: Download CLAUDE.md directly (works for any project)
For a new project:
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md
For an existing project (append to your current CLAUDE.md):
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
TIP
You can use both method 1 and method 2 together. The plugin provides global default behavior, while CLAUDE.md provides project-level overrides.
3. Parse the four principles one by one
Principle 1: Think Before Coding
Core idea: Don’t assume. Don’t hide uncertainty. Actively present tradeoffs.
This is the principle that cuts most directly against an LLM's instincts. The model is trained to "produce an answer," so when faced with vague instructions it tends to guess the most plausible interpretation and keep going, rather than stopping to ask, "What exactly do you mean?"
The correct way to do it looks like this:
# User says: "Make the search faster"
# ❌ Common AI behavior: silently adds cache, indexes, async processing—then writes ~200 lines
@lru_cache(maxsize=1000)
async def search(query: str) -> List[Result]:
    # ... a bunch of optimization code
    pass
# ✅ What karpathy-guidelines requires:
"""
"Make the search faster" can be understood in several ways:
1. Faster response time (< 100ms): add a database index + caching; estimate 2 hours
2. Higher concurrent throughput: async refactor + connection pooling; estimate 4 hours
3. Faster perceived speed (UX): progressive loading; estimate 3 hours
Which one do you mean?
"""
When instructions are vague, pausing to ask is far more efficient than doing everything and then reworking afterward.
At the code level, this principle requires you to:
- State your assumptions clearly; ask when something is uncertain
- When multiple interpretations are reasonable, list them all so the user can choose
- When you find a simpler approach, bring it up proactively
- When something is unclear, stop immediately and ask
Principle 2: Simplicity First
Core idea: Write the minimum code needed to solve the current problem. No predictive design.
AI has a tendency: it will dump “best practices” all at once, even when they’re unnecessary right now. If you ask it to write a discount calculation, it might give you a full strategy pattern + factory pattern + configuration class ecosystem.
A real comparison:
User request: “Add a discount calculation function”
# ❌ What AI often does: introduce unnecessary abstractions
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Protocol, Union
class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, amount: float) -> float:
        pass

class PercentageDiscount(DiscountStrategy):
    def __init__(self, percentage: float):
        self.percentage = percentage

    def calculate(self, amount: float) -> float:
        return amount * (self.percentage / 100)

@dataclass
class DiscountConfig:
    strategy: DiscountStrategy
    min_purchase: float = 0.0
    max_discount: float = float('inf')

class DiscountCalculator:
    def __init__(self, config: DiscountConfig):
        self.config = config

    def apply_discount(self, amount: float) -> float:
        ...  # ... a lot of code

# Usage requires 30+ lines of setup code

# ✅ Simplicity First approach:
def calculate_discount(amount: float, percent: float) -> float:
    """Calculate the discount amount. percent is a percentage (e.g., 10 means 10%)."""
    return amount * (percent / 100)

# Usage
discount = calculate_discount(100.0, 10.0)  # 10% discount -> 10.0
How to judge this principle:
- Don’t add functionality nobody asked for
- Don’t abstract code that’s used only once
- Don’t reserve configuration “for flexibility” if flexibility isn’t needed
- Don’t write error handling for scenarios that can’t happen
- After writing the code, ask yourself: “Would an experienced engineer call this over-engineering?” If yes, rewrite it
WARNING
The opposite of this principle isn’t “writing sloppily.” It’s “restraint.” Abstract only when complexity truly appears.
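To make the WARNING concrete, here is a hedged sketch of what "abstract only when complexity truly appears" might look like. Suppose a second discount type (a flat amount off) has genuinely shown up: the restrained response is a second small function, not a strategy hierarchy. The function names here are illustrative, not from the project.

```python
# Hypothetical follow-up to the discount example: a second discount type
# ("flat amount off") has actually appeared. A second plain function is
# enough -- still no ABCs, factories, or config classes.

def percentage_discount(amount: float, percent: float) -> float:
    """Discount as a percentage of the amount (10 means 10%)."""
    return amount * (percent / 100)

def flat_discount(amount: float, off: float) -> float:
    """Flat discount, capped so it never exceeds the purchase amount."""
    return min(amount, off)
```

Only when a third or fourth variant appears, with shared validation or stacking rules, does a common abstraction start paying for itself.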
Principle 3: Surgical Changes
Core idea: Change only what must be changed. Clean up the mess you created, but don’t touch things that were already there.
When fixing bugs, AI is especially prone to "optimization by accident": a 3-line bug fix arrives bundled with renamed variables, reformatted code, and rewritten comments. karpathy-guidelines requires surgical, precision-level changes.
A typical scenario: “Fix the crash in the validator when email is empty.”
 def validate_user(user_data):
     # Check email format
-    if not user_data.get('email'):
+    email = user_data.get('email', '')
+    if not email or not email.strip():
         raise ValueError("Email required")

     # Basic email validation
-    if '@' not in user_data['email']:
+    if '@' not in email:
         raise ValueError("Invalid email")

     # Check username            ← kept exactly as it was
     if not user_data.get('username'):
         raise ValueError("Username required")

     return True
❌ Common AI behavior (beyond the scope of the request):
# It also adds:
# - Stricter email format validation (regular expressions)
# - Username length validation
# - Username character validation
# - docstring
# - Changed quote style ('' → "")
✅ What Surgical Changes requires (only change the two lines related to empty email):
-    if not user_data.get('email'):
+    email = user_data.get('email', '')
+    if not email or not email.strip():
         raise ValueError("Email required")

-    if '@' not in user_data['email']:
+    if '@' not in email:
         raise ValueError("Invalid email")
Judgment standard: Every changed line must be traceable to the user’s specific request.
Principle 4: Goal-Driven Execution
Core idea: Translate “what to do” into “how to verify success.” Replace vague instructions with success criteria.
This is the point Andrej Karpathy emphasizes the most: “LLMs are very good at iterating until they reach a specific goal. Rather than telling it what to do, give it the success criteria and let it run.”
Compare two ways of giving instructions:
❌ Vague instruction (AI has no clear direction):
Fix a problem in the login system
✅ Goal-Driven instruction (AI has clear direction):
Specific login system problem: after a user changes their password, the old session is not invalidated.
Plan:
1. Write tests: change password → verify old session is rejected
Verification: tests fail (reproduces the bug)
2. Implement: when changing the password, clear that user’s sessions
Verification: tests pass
3. Edge cases: logins on multiple devices, concurrent password changes
Verification: all newly added tests pass
4. Regression check: all existing login-related tests pass
Verification: pnpm test
Current login module test coverage: [data]
Tell me which specific login problem you’re encountering.
Standard format for multi-step tasks:
1. [step description] → Verification: [verification method]
2. [step description] → Verification: [verification method]
3. [step description] → Verification: [verification method]
The benefit: after each step, the AI has a clear checklist to verify against, so it doesn't need to keep asking "Is this right?"
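As a concrete sketch of the test-first loop in the plan above, here is a minimal, self-contained stand-in for the login scenario. `Auth`, `login`, `change_password`, and `is_session_valid` are hypothetical names, not the API of any real project; the point is that each step has a runnable verification.

```python
import secrets

class Auth:
    """Toy in-memory auth store, just to make the verification loop runnable."""
    def __init__(self):
        self.passwords = {}   # username -> password
        self.sessions = {}    # session token -> username

    def register(self, user: str, pw: str) -> None:
        self.passwords[user] = pw

    def login(self, user: str, pw: str) -> str:
        if self.passwords.get(user) != pw:
            raise ValueError("bad credentials")
        token = secrets.token_hex(8)
        self.sessions[token] = user
        return token

    def change_password(self, user: str, old: str, new: str) -> None:
        if self.passwords.get(user) != old:
            raise ValueError("bad credentials")
        self.passwords[user] = new
        # Step 2 of the plan: invalidate every session belonging to this user.
        self.sessions = {t: u for t, u in self.sessions.items() if u != user}

    def is_session_valid(self, token: str) -> bool:
        return token in self.sessions

# Step 1's test: change the password, then verify the old session is rejected.
# Before the fix in change_password(), this assertion fails (reproducing the
# bug); after it, the test passes.
def test_old_session_invalidated():
    auth = Auth()
    auth.register("alice", "old-pw")
    session = auth.login("alice", "old-pw")
    auth.change_password("alice", "old-pw", "new-pw")
    assert not auth.is_session_valid(session)
```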
4. How to verify whether the guidelines are actually working
Once you install karpathy-guidelines, you can tell whether it’s having an effect by looking at the following dimensions:
Check diff quality:
- Are the changes all things the user asked for?
- Is there any unnecessary “bonus optimization”?
- Did it change quote style, comments, or docstrings without permission?
Check when it asks questions:
- When the AI encounters vague instructions, does it stop and ask proactively?
- Or does it just guess and start writing?
Check code complexity:
- Is the new code just enough to do the job?
- Is there over-abstraction or over-design?
If all three dimensions are improving, then the guidelines are working.
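If the project lives in git, a few standard commands make the first check (diff quality) quick. These are plain git flags, nothing specific to karpathy-guidelines:

```shell
# Scope check: which files changed, and by how much?
git diff --stat

# Noise check: hide whitespace-only churn to see the real changes
git diff -w

# Style-churn check: word-level diff exposes quote-style and comment rewrites
git diff --word-diff
```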
5. Merging with an existing project CLAUDE.md
Most projects already have their own CLAUDE.md. If you simply overwrite it, you’ll lose the original configuration. The correct approach is to append, not replace:
# First, confirm the existing CLAUDE.md content
cat CLAUDE.md
# Then merge manually in your editor, or:
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
Recommended structure after merging:
# Project description (keep the original content)
---
# karpathy-guidelines (appended content, derived from Andrej Karpathy's observations)
[Paste the content of CLAUDE.md here]
---
## Project-specific guidelines
- Use TypeScript strict mode
- Every API endpoint must have tests
- Follow the error-handling pattern in src/utils/errors.ts
TIP
karpathy-guidelines provides general behavior guidelines; project-specific guidelines have higher priority. When there’s a conflict, follow the project guidelines.
Common troubleshooting
Q1: The plugin installs but doesn’t seem to take effect?
# Confirm the plugin is loaded
/plugin list
# If not, restart Claude Code
# In Claude Code, type /exit to quit, then enter again
Q2: What if the project CLAUDE.md and the plugin conflict?
Follow the project CLAUDE.md. The plugin is a global fallback; the project file overrides the plugin settings.
Q3: It gets stuck even on simple tasks and asks for confirmation—what should I do?
That’s normal—karpathy-guidelines applies the same clarification requirement to all tasks. For clearly simple tasks (e.g., “change variable a to b”), you can explicitly add something like “this is a simple replacement, just do it,” and the AI will skip the confirmation step.
Q4: If the AI doesn’t follow Surgical Changes and still messes up formatting, what then?
Clearly restrict the scope of changes in your instruction:
Change the empty-value check inside the validateEmail function only—just this part. Don’t touch any other code or formatting.
Q5: After merging CLAUDE.md, I get a conflict—how do I resolve it?
Edit CLAUDE.md manually: keep the original content and the newly added principles, and avoid duplicate paragraphs. If the same principle is defined in two places, follow the project’s original, more specific guidelines.
Q6: Should I apply these principles retroactively to an existing project?
No. karpathy-guidelines constrains behavior for future changes; it is not a mandate to refactor existing code. Use it to guide new AI-assisted programming interactions.
Further reading and advanced directions
- Andrej Karpathy’s original tweet — the origin observation of LLM programming pitfalls
- forrestchang/andrej-karpathy-skills — this project’s GitHub repo, including the full EXAMPLES.md (many good and bad examples worth reading closely)
- Claude Code official documentation — learn the full capabilities of the CLI and configuration options
- OpenClaw integration tutorial — if you want to explore more about local practice of AI agents, OpenClaw provides deeper tool-calling capabilities
TIP
EXAMPLES.md contains many pairwise “good vs bad” code comparisons. Each principle also has 2–3 real scenarios. These are the most effective materials for understanding the principles. It’s recommended to read through everything once before you start practicing.
Advanced directions
After mastering the four principles, you can explore further in these areas:
Custom extensions: Add project-specific coding conventions into CLAUDE.md so the AI keeps both the general constraints of karpathy-guidelines and your project’s own style requirements.
Team sharing: Include karpathy-guidelines in your team’s coding standards documentation. New team members can clone the project and automatically get the AI programming behavior guidelines.
Quantify the impact: Track the number of reworks after introducing the guidelines, the scope of changes, and the number of unrelated changes in PRs. Validate the real value of this methodology with data.
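For the "quantify the impact" idea, git can already produce rough change-scope numbers without extra tooling. The `main..HEAD` range below is an example; substitute whatever range covers a given PR:

```shell
# Per-commit change footprint (files touched, insertions, deletions)
git log --shortstat --oneline main..HEAD

# Total footprint of a branch relative to main
git diff --stat main...HEAD
```

Tracked over time, these numbers show whether the guidelines are actually shrinking the scope of AI-generated changes.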