Difficulty: Beginner | Duration: ~15 minutes | You’ll learn: 4 principles + how to use karpathy-guidelines to standardize AI programming behavior
Target audience
- Developers using AI coding assistants like Claude Code, Cursor, or Copilot
- Developers with 1–5 years of experience who want AI-generated code to be more accurate, less verbose, and free of unrequested changes
- Some hands-on experience with AI coding, looking for structured collaboration guidelines
Core dependencies and environment
| Dependency | Version requirement | Purpose |
|---|---|---|
| Node.js | 18+ | Run the Claude Code CLI |
| Claude Code CLI | Latest version | Core tool; install plugins |
| curl | Any version | Download the CLAUDE.md file |
TIP
If you haven’t installed Claude Code yet, visit claude.ai/code to learn how to set it up.
Full project folder structure
andrej-karpathy-skills/
├── CLAUDE.md # Core guide file (condensed 4 principles; shared across both plugin/file approaches)
├── README.md # Installation and usage instructions
├── EXAMPLES.md # Lots of real-world examples—both good and bad
└── skills/
└── karpathy-guidelines/ # Claude Code plugin format (recommended installation method)
WARNING
This project is a behavior-spec plugin for Claude Code. It is not OpenClaw’s skills package—don’t mix them up.
1. First understand the problem: Why LLMs “do their own thing”
At the end of 2025, Andrej Karpathy posted a tweet pinpointing the core issue with current LLM-based programming:
"The models make wrong assumptions on their own behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."
In other words: LLMs like to silently assume things; when instructions are vague, they don’t ask; they write increasingly bloated code; and they may casually modify parts they don’t truly understand.
This is a problem almost every developer who deeply uses AI programming has run into: you ask the AI to fix a small bug, and it rewrites half a module; you ask it to add a simple function, and it introduces layers of abstraction.
karpathy-guidelines is designed to address these four issues:
| Symptom | Matching principle |
|---|---|
| Makes assumptions silently and starts without getting clarity | Think Before Coding |
| Writes more and more bloated code—200 lines become 2,000 | Simplicity First |
| When fixing a bug, it refactors half the project as a “bonus” | Surgical Changes |
| “I’ll fix it”—no verifiable success criteria | Goal-Driven Execution |
2. Install karpathy-guidelines
There are two installation methods; the first one is recommended.
Method 1: Plugin approach (recommended)
# Step 1: Add the plugin marketplace
/plugin marketplace add forrestchang/andrej-karpathy-skills
# Step 2: Install the plugin
/plugin install andrej-karpathy-skills@karpathy-skills
After installation, Claude Code will automatically load these 4 principles in any project.
Method 2: Download CLAUDE.md directly (works for any project)
For a new project:
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md
For an existing project (append to your current CLAUDE.md):
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
TIP
You can use both method 1 and method 2 together. The plugin provides global default behavior, while CLAUDE.md provides project-level overrides.
3. Parse the four principles one by one
Principle 1: Think Before Coding
Core idea: Don’t assume. Don’t hide uncertainty. Actively present tradeoffs.
This is the principle that cuts most directly against an LLM's instincts. The model is trained to "produce an answer," so when faced with vague instructions it tends to guess the most plausible interpretation and keep going, rather than stopping to ask, "What exactly do you mean?"
The correct way to do it looks like this:
# User says: "Make the search faster"
# ❌ Common AI behavior: silently adds cache, indexes, async processing—then writes ~200 lines
@lru_cache(maxsize=1000)
async def search(query: str) -> List[Result]:
    # ... a bunch of optimization code
    pass
# ✅ What karpathy-guidelines requires:
"""
"Make the search faster" can be understood in several ways:
1. Faster response time (< 100ms): add a database index + caching; estimate 2 hours
2. Higher concurrent throughput: async refactor + connection pooling; estimate 4 hours
3. Faster perceived speed (UX): progressive loading; estimate 3 hours
Which one do you mean?
"""
When instructions are vague, pausing to ask is far more efficient than doing everything and then reworking afterward.
At the code level, this principle requires you to:
- State your assumptions clearly; ask when something is uncertain
- When multiple interpretations are reasonable, list them all so the user can choose
- When you find a simpler approach, bring it up proactively
- When something is unclear, stop immediately and ask
Principle 2: Simplicity First
Core idea: Write the minimum code needed to solve the current problem. No predictive design.
AI has a tendency: it will dump “best practices” all at once, even when they’re unnecessary right now. If you ask it to write a discount calculation, it might give you a full strategy pattern + factory pattern + configuration class ecosystem.
A real comparison:
User request: “Add a discount calculation function”
# ❌ What AI often does: introduce unnecessary abstractions
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Protocol, Union
class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, amount: float) -> float:
        pass

class PercentageDiscount(DiscountStrategy):
    def __init__(self, percentage: float):
        self.percentage = percentage

    def calculate(self, amount: float) -> float:
        return amount * (self.percentage / 100)

@dataclass
class DiscountConfig:
    strategy: DiscountStrategy
    min_purchase: float = 0.0
    max_discount: float = float('inf')

class DiscountCalculator:
    def __init__(self, config: DiscountConfig):
        self.config = config

    def apply_discount(self, amount: float) -> float:
        ...  # ... a lot of code

# Usage requires 30+ lines of setup code

# ✅ Simplicity First approach:
def calculate_discount(amount: float, percent: float) -> float:
    """Calculate the discount amount. percent is a percentage (e.g., 10 means 10%)."""
    return amount * (percent / 100)

# Usage
discount = calculate_discount(100.0, 10.0)  # 10% discount -> 10.0
How to judge this principle:
- Don’t add functionality nobody asked for
- Don’t abstract code that’s used only once
- Don’t reserve configuration “for flexibility” if flexibility isn’t needed
- Don’t write error handling for scenarios that can’t happen
- After writing the code, ask yourself: “Would an experienced engineer call this over-engineering?” If yes, rewrite it
WARNING
The opposite of this principle isn’t “writing sloppily.” It’s “restraint.” Abstract only when complexity truly appears.
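To make the WARNING concrete, here is a hedged sketch of what "abstract only when complexity truly appears" might look like. Suppose a second discount type (a flat amount off) has genuinely shown up: the restrained response is a second small function, not a strategy hierarchy. The function names here are illustrative, not from the project.

```python
# Hypothetical follow-up to the discount example: a second discount type
# ("flat amount off") has actually appeared. A second plain function is
# enough -- still no ABCs, factories, or config classes.

def percentage_discount(amount: float, percent: float) -> float:
    """Discount as a percentage of the amount (10 means 10%)."""
    return amount * (percent / 100)

def flat_discount(amount: float, off: float) -> float:
    """Flat discount, capped so it never exceeds the purchase amount."""
    return min(amount, off)
```

Only when a third or fourth variant appears, with shared validation or stacking rules, does a common abstraction start paying for itself.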
Principle 3: Surgical Changes
Core idea: Change only what must be changed. Clean up the mess you created, but don’t touch things that were already there.
When fixing bugs, AI is especially prone to "optimization by accident": a 3-line bug fix arrives bundled with renamed variables, reformatted code, and rewritten comments. karpathy-guidelines requires surgical, precision-level changes.
A typical scenario: “Fix the crash in the validator when email is empty.”
 def validate_user(user_data):
     # Check email format
-    if not user_data.get('email'):
+    email = user_data.get('email', '')
+    if not email or not email.strip():
         raise ValueError("Email required")

     # Basic email validation
-    if '@' not in user_data['email']:
+    if '@' not in email:
         raise ValueError("Invalid email")

     # Check username            ← kept exactly as it was
     if not user_data.get('username'):
         raise ValueError("Username required")

     return True
❌ Common AI behavior (beyond the scope of the request):
# It also adds:
# - Stricter email format validation (regular expressions)
# - Username length validation
# - Username character validation
# - docstring
# - Changed quote style ('' → "")
✅ What Surgical Changes requires (only change the two lines related to empty email):
-    if not user_data.get('email'):
+    email = user_data.get('email', '')
+    if not email or not email.strip():
         raise ValueError("Email required")

-    if '@' not in user_data['email']:
+    if '@' not in email:
         raise ValueError("Invalid email")
Judgment standard: Every changed line must be traceable to the user’s specific request.
Principle 4: Goal-Driven Execution
Core idea: Translate “what to do” into “how to verify success.” Replace vague instructions with success criteria.
This is the point Andrej Karpathy emphasizes the most: “LLMs are very good at iterating until they reach a specific goal. Rather than telling it what to do, give it the success criteria and let it run.”
Compare two ways of giving instructions:
❌ Vague instruction (AI has no clear direction):
Fix a problem in the login system
✅ Goal-Driven instruction (AI has clear direction):
Specific login system problem: after a user changes their password, the old session is not invalidated.
Plan:
1. Write tests: change password → verify old session is rejected
Verification: tests fail (reproduces the bug)
2. Implement: when changing the password, clear that user’s sessions
Verification: tests pass
3. Edge cases: logins on multiple devices, concurrent password changes
Verification: all newly added tests pass
4. Regression check: all existing login-related tests pass
Verification: pnpm test
Current login module test coverage: [data]
Tell me which specific login problem you’re encountering.
Standard format for multi-step tasks:
1. [step description] → Verification: [verification method]
2. [step description] → Verification: [verification method]
3. [step description] → Verification: [verification method]
The benefit: after each step, the AI has a clear checklist to verify against, so it doesn't need to keep asking "Is this right?"
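As a concrete sketch of the test-first loop in the plan above, here is a minimal, self-contained stand-in for the login scenario. `Auth`, `login`, `change_password`, and `is_session_valid` are hypothetical names, not the API of any real project; the point is that each step has a runnable verification.

```python
import secrets

class Auth:
    """Toy in-memory auth store, just to make the verification loop runnable."""
    def __init__(self):
        self.passwords = {}   # username -> password
        self.sessions = {}    # session token -> username

    def register(self, user: str, pw: str) -> None:
        self.passwords[user] = pw

    def login(self, user: str, pw: str) -> str:
        if self.passwords.get(user) != pw:
            raise ValueError("bad credentials")
        token = secrets.token_hex(8)
        self.sessions[token] = user
        return token

    def change_password(self, user: str, old: str, new: str) -> None:
        if self.passwords.get(user) != old:
            raise ValueError("bad credentials")
        self.passwords[user] = new
        # Step 2 of the plan: invalidate every session belonging to this user.
        self.sessions = {t: u for t, u in self.sessions.items() if u != user}

    def is_session_valid(self, token: str) -> bool:
        return token in self.sessions

# Step 1's test: change the password, then verify the old session is rejected.
# Before the fix in change_password(), this assertion fails (reproducing the
# bug); after it, the test passes.
def test_old_session_invalidated():
    auth = Auth()
    auth.register("alice", "old-pw")
    session = auth.login("alice", "old-pw")
    auth.change_password("alice", "old-pw", "new-pw")
    assert not auth.is_session_valid(session)
```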
4. How to verify whether the guidelines are actually working
Once you install karpathy-guidelines, you can tell whether it’s having an effect by looking at the following dimensions:
Check diff quality:
- Are the changes all things the user asked for?
- Is there any unnecessary “bonus optimization”?
- Did it change quote style, comments, or docstrings without permission?
Check when it asks questions:
- When the AI encounters vague instructions, does it stop and ask proactively?
- Or does it just guess and start writing?
Check code complexity:
- Is the new code just enough to do the job?
- Is there over-abstraction or over-design?
If all three dimensions are improving, then the guidelines are working.
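If the project lives in git, a few standard commands make the first check (diff quality) quick. These are plain git flags, nothing specific to karpathy-guidelines:

```shell
# Scope check: which files changed, and by how much?
git diff --stat

# Noise check: hide whitespace-only churn to see the real changes
git diff -w

# Style-churn check: word-level diff exposes quote-style and comment rewrites
git diff --word-diff
```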
5. Merging with an existing project CLAUDE.md
Most projects already have their own CLAUDE.md. If you simply overwrite it, you’ll lose the original configuration. The correct approach is to append, not replace:
# First, confirm the existing CLAUDE.md content
cat CLAUDE.md
# Then merge manually in your editor, or:
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
Recommended structure after merging:
# Project description (keep the original content)
---
# karpathy-guidelines (appended content, derived from Andrej Karpathy's observations)
[Paste the content of CLAUDE.md here]
---
## Project-specific guidelines
- Use TypeScript strict mode
- Every API endpoint must have tests
- Follow the error-handling pattern in src/utils/errors.ts
TIP
karpathy-guidelines provides general behavior guidelines; project-specific guidelines have higher priority. When there’s a conflict, follow the project guidelines.
Common troubleshooting
Q1: The plugin installs but doesn’t seem to take effect?
# Confirm the plugin is loaded
/plugin list
# If not, restart Claude Code
# In Claude Code, type /exit to quit, then enter again
Q2: What if the project CLAUDE.md and the plugin conflict?
Follow the project CLAUDE.md. The plugin is a global fallback; the project file overrides the plugin settings.
Q3: It gets stuck even on simple tasks and asks for confirmation—what should I do?
That’s normal—karpathy-guidelines applies the same clarification requirement to all tasks. For clearly simple tasks (e.g., “change variable a to b”), you can explicitly add something like “this is a simple replacement, just do it,” and the AI will skip the confirmation step.
Q4: If the AI doesn’t follow Surgical Changes and still messes up formatting, what then?
Clearly restrict the scope of changes in your instruction:
Change the empty-value check inside the validateEmail function only—just this part. Don’t touch any other code or formatting.
Q5: After merging CLAUDE.md, I get a conflict—how do I resolve it?
Edit CLAUDE.md manually: keep the original content and the newly added principles, and avoid duplicate paragraphs. If the same principle is defined in two places, follow the project’s original, more specific guidelines.
Q6: Should I apply these principles retroactively to an existing project?
No. karpathy-guidelines constrains behavior for future changes; it is not a mandate to refactor existing code. Use it to guide new AI-assisted programming interactions.
Further reading and advanced directions
- Andrej Karpathy’s original tweet — the origin observation of LLM programming pitfalls
- forrestchang/andrej-karpathy-skills — this project’s GitHub repo, including the full EXAMPLES.md (many good and bad examples worth reading closely)
- Claude Code official documentation — learn the full capabilities of the CLI and configuration options
- OpenClaw integration tutorial — if you want to explore more about local practice of AI agents, OpenClaw provides deeper tool-calling capabilities
TIP
EXAMPLES.md contains many pairwise “good vs bad” code comparisons. Each principle also has 2–3 real scenarios. These are the most effective materials for understanding the principles. It’s recommended to read through everything once before you start practicing.
Advanced directions
After mastering the four principles, you can explore further in these areas:
Custom extensions: Add project-specific coding conventions into CLAUDE.md so the AI keeps both the general constraints of karpathy-guidelines and your project’s own style requirements.
Team sharing: Include karpathy-guidelines in your team’s coding standards documentation. New team members can clone the project and automatically get the AI programming behavior guidelines.
Quantify the impact: Track the number of reworks after introducing the guidelines, the scope of changes, and the number of unrelated changes in PRs. Validate the real value of this methodology with data.
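For the "quantify the impact" idea, git can already produce rough change-scope numbers without extra tooling. The `main..HEAD` range below is an example; substitute whatever range covers a given PR:

```shell
# Per-commit change footprint (files touched, insertions, deletions)
git log --shortstat --oneline main..HEAD

# Total footprint of a branch relative to main
git diff --stat main...HEAD
```

Tracked over time, these numbers show whether the guidelines are actually shrinking the scope of AI-generated changes.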