
# Karpathy Skills examples: before-and-after code patterns

> Real code examples showing what LLMs do wrong and how the four Karpathy guidelines fix it—covering assumptions, overengineering, diff bloat, and vague tasks.

These examples show concrete before-and-after code for each of the four principles. Each one captures a pattern that shows up repeatedly in real LLM-assisted coding sessions. Use them to recognize when a model is about to go off the rails, and to calibrate what "correct" behavior looks like so you can steer the model before it does.

## 1. Think before coding

When a request is ambiguous, LLMs tend to pick an interpretation silently and implement it. The result looks complete but is built on assumptions the user never validated.

### Hidden assumptions: exporting user data

**Request:** "Add a feature to export user data"

<Warning>
  **Wrong approach — silent assumptions**

  ```python theme={null}
  import csv
  import json

  # Assumes User is an ORM model (e.g. Flask-SQLAlchemy) defined elsewhere.
  def export_users(format='json'):
      """Export all users to a file."""
      users = User.query.all()
      
      if format == 'json':
          with open('users.json', 'w') as f:
              json.dump([u.to_dict() for u in users], f)
      elif format == 'csv':
          with open('users.csv', 'w') as f:
              writer = csv.DictWriter(f, fieldnames=['id', 'email', 'name'])
              writer.writeheader()
              writer.writerows([u.to_dict() for u in users])
      
      return f"Exported {len(users)} users"
  ```

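  This code silently assumes it should export *all* users (ignoring privacy and pagination), write a file to the server's working directory, and emit a particular set of fields: every key from `to_dict()` in the JSON branch, but only a hardcoded `id`, `email`, and `name` in the CSV branch. If `to_dict()` returns any other key, `csv.DictWriter` raises `ValueError`. None of this was specified in the request.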
</Warning>

<Check>
  **Correct approach — surface assumptions first**

  ```
  Before implementing, I need to clarify:

  1. **Scope**: Export all users or filtered subset? (privacy implications)
  2. **Delivery**: "Export" could mean several things. Did you mean:
     - Download file in browser?
     - Background job with email notification?
     - API endpoint returning data?
  3. **Fields**: Which user fields? (some might be sensitive)
  4. **Volume**: How many users typically? (affects approach)

  Simplest approach: Add an API endpoint that returns paginated JSON.
  Would need more info for file-based exports.

  What's your preference?
  ```
</Check>
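If the user picks the simplest option, the code that follows can stay small. Below is a minimal, framework-agnostic sketch. It assumes users arrive as a list of dicts, and that `EXPORT_FIELDS` (a hypothetical name) holds the fields the user confirmed are safe to expose. Wiring it to a real endpoint and database query is left out.

```python theme={null}
from typing import Any

# Hypothetical allow-list: only fields the user has confirmed are safe to expose.
EXPORT_FIELDS = ("id", "email", "name")


def export_users_page(
    users: list[dict[str, Any]], page: int = 1, per_page: int = 50
) -> dict[str, Any]:
    """Return one page of users as a JSON-serializable dict."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    chunk = users[start:start + per_page]
    return {
        "page": page,
        "per_page": per_page,
        "total": len(users),
        # Copy only allow-listed fields, so anything sensitive is dropped by default
        "items": [{k: u.get(k) for k in EXPORT_FIELDS} for u in chunk],
    }


# Usage with made-up sample data
users = [
    {"id": i, "email": f"u{i}@example.com", "name": f"User {i}", "password_hash": "x"}
    for i in range(1, 121)
]
print(export_users_page(users, page=3)["items"][0])
# {'id': 101, 'email': 'u101@example.com', 'name': 'User 101'}
```

An explicit allow-list means a field like `password_hash` never leaks by accident, which answers the "some might be sensitive" question directly.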

## 2. Simplicity first

LLMs frequently reach for abstractions and design patterns before any complexity is warranted. The result is code that follows best practices but is harder to read, harder to test, and solves a problem that doesn't exist yet.

### Over-abstraction: discount calculator

**Request:** "Add a function to calculate discount"

<Warning>
  **Wrong approach — overengineered for a single use case**

  ```python theme={null}
  from abc import ABC, abstractmethod
  from enum import Enum
  from typing import Protocol, Union
  from dataclasses import dataclass

  class DiscountStrategy(ABC):
      @abstractmethod
      def calculate(self, amount: float) -> float:
          pass

  class PercentageDiscount(DiscountStrategy):
      def __init__(self, percentage: float):
          self.percentage = percentage
      
      def calculate(self, amount: float) -> float:
          return amount * (self.percentage / 100)

  class FixedDiscount(DiscountStrategy):
      def __init__(self, fixed_amount: float):
          self.fixed_amount = fixed_amount
      
      def calculate(self, amount: float) -> float:
          return min(self.fixed_amount, amount)

  @dataclass
  class DiscountConfig:
      strategy: DiscountStrategy
      min_purchase: float = 0.0
      max_discount: float = float('inf')

  class DiscountCalculator:
      def __init__(self, config: DiscountConfig):
          self.config = config
      
      def apply_discount(self, amount: float) -> float:
          if amount < self.config.min_purchase:
              return 0
          discount = self.config.strategy.calculate(amount)
          return min(discount, self.config.max_discount)

  # ~40 lines of classes and config objects for what is a one-line calculation
  ```
</Warning>

<Check>
  **Correct approach — minimum code that solves the problem**

  ```python theme={null}
  def calculate_discount(amount: float, percent: float) -> float:
      """Calculate discount amount. percent should be 0-100."""
      return amount * (percent / 100)

  # Usage
  discount = calculate_discount(100.0, 10.0)  # $10 off
  ```

  Add complexity only when you actually need multiple discount types. If that requirement comes later, refactor then.
</Check>
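If a second discount type does show up later, the refactor can still be small. This is a hypothetical sketch. It assumes the new requirement is a fixed-amount discount capped at the purchase price (the same rule `FixedDiscount` uses above). `calculate_fixed_discount` is an illustrative name, not part of any existing code.

```python theme={null}
def calculate_discount(amount: float, percent: float) -> float:
    """Calculate discount amount. percent should be 0-100."""
    return amount * (percent / 100)


def calculate_fixed_discount(amount: float, fixed: float) -> float:
    """Fixed-amount discount, never larger than the purchase itself."""
    return min(fixed, amount)


# Usage
print(calculate_fixed_discount(30.0, 50.0))   # 30.0, capped at the purchase price
print(calculate_fixed_discount(100.0, 15.0))  # 15.0
```

Two plain functions are easy to call and test. A shared abstraction can wait until there are enough variants that the duplication actually hurts.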

## 3. Surgical changes

When fixing a bug, LLMs often "improve" unrelated code in the same function — reformatting, renaming, adding type hints, strengthening validation. Each individual change seems reasonable, but together they produce a diff that is hard to review and risky to ship.

### Drive-by refactoring: fixing an email validation bug

**Request:** "Fix the bug where empty emails crash the validator"

<Warning>
  **Wrong approach — changes far beyond the bug fix**

  ```diff theme={null}
    def validate_user(user_data):
  -     # Check email format
  -     if not user_data.get('email'):
  +     """Validate user data."""
  +     email = user_data.get('email', '').strip()
  +     
  +     # Validate email
  +     if not email:
            raise ValueError("Email required")
  -     
  -     # Basic email validation
  -     if '@' not in user_data['email']:
  +     if '@' not in email or '.' not in email.split('@')[1]:
            raise ValueError("Invalid email")
        
  -     # Check username
  -     if not user_data.get('username'):
  +     # Validate username
  +     username = user_data.get('username', '').strip()
  +     if not username:
            raise ValueError("Username required")
  +     if len(username) < 3:
  +         raise ValueError("Username too short")
  +     if not username.isalnum():
  +         raise ValueError("Username must be alphanumeric")
        
        return True
  ```

  Problems: tightened email validation beyond the reported bug (it now requires a dot in the domain), added username length and alphanumeric rules nobody asked for, rewrote comments, and added a docstring.
</Warning>

<Check>
  **Correct approach — only the lines that fix empty email handling**

  ```diff theme={null}
    def validate_user(user_data):
        # Check email format
  -     if not user_data.get('email'):
  +     email = user_data.get('email', '')
  +     if not email or not email.strip():
            raise ValueError("Email required")
        
        # Basic email validation
  -     if '@' not in user_data['email']:
  +     if '@' not in email:
            raise ValueError("Invalid email")
        
        # Check username
        if not user_data.get('username'):
            raise ValueError("Username required")
        
        return True
  ```

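  Every changed line traces directly to the reported bug. Blank and whitespace-only emails are now rejected as missing, and the format check reuses the same `email` variable. Comments and the username check are untouched.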
</Check>
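The same discipline applies to verification: pin the reported case with a test, and check that behavior outside the fix stays the same. This sketch uses plain `assert` statements so it runs without a test framework (the functions also work unchanged under pytest). The patched `validate_user` is copied from the diff above so the block runs on its own. `error_message` is a hypothetical helper.

```python theme={null}
def validate_user(user_data):
    # Check email format (patched version from the diff above)
    email = user_data.get('email', '')
    if not email or not email.strip():
        raise ValueError("Email required")

    # Basic email validation
    if '@' not in email:
        raise ValueError("Invalid email")

    # Check username
    if not user_data.get('username'):
        raise ValueError("Username required")

    return True


def error_message(user_data):
    """Hypothetical helper: the ValueError message validate_user raises, or None if it passes."""
    try:
        validate_user(user_data)
    except ValueError as exc:
        return str(exc)
    return None


def test_empty_email_is_rejected():
    # The reported case: empty, whitespace-only, None, or missing emails
    for email in ("", "   ", None):
        assert error_message({"email": email, "username": "ann"}) == "Email required"
    assert error_message({"username": "ann"}) == "Email required"


def test_existing_behavior_unchanged():
    # No new rules: a one-character username is still accepted
    assert error_message({"email": "ann@example.com", "username": "a"}) is None
    assert error_message({"email": "ann.example.com", "username": "ann"}) == "Invalid email"
    assert error_message({"email": "ann@example.com"}) == "Username required"


test_empty_email_is_rejected()
test_existing_behavior_unchanged()
```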

## 4. Goal-driven execution

Vague instructions produce vague plans. When a task has no explicit success criteria, the model improvises — and you have no way to verify the result is correct until something breaks in production.

### Multi-step with verification: adding rate limiting

**Request:** "Add rate limiting to the API"

<Warning>
  **Wrong approach — all at once, no verification**

  ```python theme={null}
  # Implements full rate limiting with Redis, multiple strategies,
  # configuration system, and monitoring in one 300-line commit.
  # No clear verification steps.
  ```
</Warning>

<Check>
  **Correct approach — incremental steps, each independently verifiable**

  ```
  Plan for rate limiting:

  1. Add basic in-memory rate limiting (single endpoint)
     Verify: 
     - Test: 100 requests → first 10 succeed, rest get 429
     - Manual: curl endpoint 11 times, see rate limit error

  2. Extract to middleware (apply to all endpoints)
     Verify:
     - Test: Rate limits apply to /users and /posts
     - Existing endpoint tests still pass

  3. Add Redis backend (for multi-server)
     Verify:
     - Test: Rate limit persists across app restarts
     - Test: Two app instances share rate limit counter

  4. Add configuration (rates per endpoint)
     Verify:
     - Test: /search allows 10/min, /users allows 100/min
     - Config file parsed correctly

  Each step is independently verifiable and deployable.
  Start with step 1?
  ```
</Check>
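Step 1 of that plan is small enough to sketch in full. What follows is a minimal, single-process illustration under several assumptions. It uses a fixed-window algorithm (one simple choice among several; token bucket and sliding window are common alternatives) with a limit of 10 requests per 60-second window per client key. `handle_request` is a hypothetical stand-in for a real endpoint. The injectable `clock` lets the window be tested without sleeping.

```python theme={null}
import time
from typing import Callable


class FixedWindowRateLimiter:
    """In-memory fixed window: at most `limit` requests per `window` seconds per key."""

    def __init__(
        self,
        limit: int = 10,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.limit = limit
        self.window = window
        self.clock = clock
        # key -> (window start time, requests counted in that window)
        self._counters: dict[str, tuple[float, int]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            return False  # over the limit for this window
        self._counters[key] = (start, count + 1)
        return True


def handle_request(limiter: FixedWindowRateLimiter, client_ip: str) -> int:
    """Hypothetical endpoint: returns an HTTP status code."""
    return 200 if limiter.allow(client_ip) else 429


# Mirrors step 1's check: 100 requests, first 10 succeed, the rest get 429
limiter = FixedWindowRateLimiter(limit=10, window=60.0)
statuses = [handle_request(limiter, "203.0.113.7") for _ in range(100)]
print(statuses.count(200), statuses.count(429))  # 10 90
```

The counters live in one process's memory, so they reset on restart and are not shared between instances. Step 3's Redis backend is meant to address exactly that. The dictionary is also not guarded by a lock, so a multithreaded server would need one.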

## Anti-patterns summary

| Principle             | Anti-pattern                                       | Fix                                                           |
| --------------------- | -------------------------------------------------- | ------------------------------------------------------------- |
| Think before coding   | Silently assumes file format, fields, scope        | List assumptions explicitly, ask for clarification            |
| Simplicity first      | Strategy pattern for single discount calculation   | One function until complexity is actually needed              |
| Surgical changes      | Adds docstring and new username rules to a bug fix | Only change lines that fix the reported issue                 |
| Goal-driven execution | "I'll review and improve the code"                 | "Write test for bug X → make it pass → verify no regressions" |

## Key insight

The "overcomplicated" examples are not obviously wrong: they follow design patterns and best practices. The problem is **timing**. They add complexity before it is needed. That makes the code harder to understand and test, gives bugs more places to hide, and takes longer to write.

> Good code is code that solves today's problem simply, not tomorrow's problem prematurely.

The simple versions are easier to understand, faster to implement, easier to test, and can be refactored when complexity is actually needed.
