mattpocock Skills Getting Started Guide: Teach AI TDD and Structured Debugging

May 2, 2026

TIP

mattpocock/skills is a set of AI coding Skills built by TypeScript expert Matt Pocock. GitHub repo: github.com/mattpocock/skills. If you’ve used tools like Claude Code or Codex and felt the output quality is inconsistent, these Skills are exactly what you need.

Project Overview

mattpocock/skills is an “engineering-methodology skill set” for AI programming tools. The core idea is simple: AI coding tools move fast, but that doesn’t make them reliable.

From extensive hands-on practice, Matt Pocock has identified four high-frequency failure modes in AI coding:

  • Misaligned requirements: you think the AI understands, but it's actually drifting
  • Redundant output: the AI uses 20 words to explain what one word would say clearly
  • Code that doesn't work: without a feedback loop, the AI writes blindly inside a black box
  • Architecture that degrades fast: AI speeds up development, and it speeds up software entropy just as much

These problems aren’t tool bugs; they’re the result of a missing methodology. mattpocock/skills tackles them directly by compressing decades of engineering experience into a set of composable Skills, ensuring the AI follows the same engineering discipline on any model.

Difficulty / Duration / What You’ll Gain

Difficulty: Beginner-friendly; no TDD experience needed, just comfort with the command line and AI coding tools
Duration: About 30 minutes to complete the walkthrough end to end
What You'll Gain: A full workflow from requirement alignment → test-driven development → structured debugging → architectural reflection

Target Audience

  • Backend / full-stack developers with 1–5 years of experience
  • Already using Claude Code, Codex, or similar AI coding tools
  • Feeling that AI-generated code is “usable but unstable,” and wanting a controllable development rhythm
  • Interested in TDD, but never found the right entry point

Core Dependencies and Environment

Dependency       Minimum Version                    Notes
Node.js          18+                                Skills are distributed via npx
pnpm / npm       Any stable version                 Use either one
AI coding tool   Any version that supports Skills   Claude Code, Codex, etc.

WARNING

mattpocock/skills is invoked via slash commands, so your AI tool must support the Skills mechanism. Confirm that Skills are enabled in your tool before continuing.

Full Project Structure Tree

skills/
├── CLAUDE.md                          # Skill directory specification
├── README.md                          # Overview and usage documentation
├── skills/
│   ├── engineering/
│   │   ├── diagnose/
│   │   │   └── SKILL.md               # Structured debugging workflow
│   │   ├── grill-with-docs/
│   │   │   └── SKILL.md               # Deep interview with supplementary docs
│   │   ├── improve-codebase-architecture/
│   │   │   └── SKILL.md               # Architecture improvement diagnosis
│   │   ├── setup-matt-pocock-skills/
│   │   │   └── SKILL.md               # Initialization entry point
│   │   ├── tdd/
│   │   │   ├── SKILL.md               # TDD vertical-slice workflow
│   │   │   ├── tests.md               # Good / bad test examples
│   │   │   ├── mocking.md             # Mocking best practices
│   │   │   ├── refactoring.md         # When to refactor
│   │   │   ├── deep-modules.md        # Deep module design principles
│   │   │   └── interface-design.md    # Testable interface design
│   │   ├── to-issues/
│   │   │   └── SKILL.md               # Break requirements into GitHub Issues
│   │   ├── to-prd/
│   │   │   └── SKILL.md               # PRD generation
│   │   ├── triage/
│   │   │   └── SKILL.md               # Issue triage state machine
│   │   └── zoom-out/
│   │       └── SKILL.md               # Global code perspective interpretation
│   ├── productivity/
│   │   ├── caveman/
│   │   │   └── SKILL.md               # Compressed communication pattern, saves 75% of tokens
│   │   ├── grill-me/
│   │   │   └── SKILL.md               # Requirement deep interview
│   │   └── write-a-skill/
│   │       └── SKILL.md               # Guide to writing custom Skills
│   └── misc/
│       ├── git-guardrails-claude-code/
│       │   └── SKILL.md               # Git safety guardrails
│       └── scaffold-exercises/
│           └── SKILL.md               # Exercises directory scaffolding
└── .claude-plugin/
    └── plugin.json                    # Plugin metadata

Step-by-Step

Step 1: Install mattpocock Skills

In any AI coding tool that supports Skills, run the following command to complete the installation:

npx skills@latest add mattpocock/skills

The install script will guide you to choose which Skills to activate, and which AI tool each Skill should be bound to. Key step: make sure to select /setup-matt-pocock-skills, which is the initialization entry point for all subsequent Skills.

TIP

The install process doesn’t require sudo, and you don’t need to modify your project code. Skills are stored in your AI tool’s configuration directory, fully isolated from your current workspace.

Step 2: Initialize Configuration

After installation, run in your AI coding tool:

/setup-matt-pocock-skills

This Skill will ask you, in order:

  1. Issue Tracker: Which tool you use to manage Issues (GitHub / Linear / local files)
  2. Triage Labels: Which vocabulary to use when tagging an Issue (the /triage Skill uses these)
  3. Document storage path: Where generated ADRs and documents should go

When configuration is complete, it generates a CONTEXT.md in your project root. This file becomes a "shared-language dictionary" between you and the AI: it standardizes how terms are used in the project, and all subsequent naming and comments the AI produces will be based on it.
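
For orientation, a freshly generated CONTEXT.md might look something like the sketch below. This is a hypothetical example: the actual sections depend on your answers during setup, and the terms shown are invented for illustration.

# CONTEXT.md (hypothetical sketch)

## Issue tracker
GitHub Issues. Triage labels: bug, feature, chore, needs-repro.

## Document storage
ADRs and generated docs live in docs/adr/.

## Shared terminology
- "checkout": the flow from cart review to payment confirmation
- "slice": one RED → GREEN → REFACTOR round driven by /tdd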

Step 3: Use /grill-me to Align Requirements

When you have a new requirement or a design proposal you want to hand to the AI, don’t start coding immediately. First run:

/grill-me

The AI will ask you questions one by one, enumerating every branch of the decision tree. For example, if you plan to “add refund functionality to the payments module,” the AI will follow up with:

  • Is the refund full or partial?
  • What are the refund triggers (user-initiated / risk-control automated / customer-service manual)?
  • What happens to inventory after the refund?
  • Should refund logs be persisted?

For every question, it recommends an answer; you only need to confirm or correct it. The process works like a structured interview: it ensures the AI truly understands what you want to build, instead of taking your words literally and drifting off course.

TIP

/grill-me also works for non-code scenarios. If you're working within an existing codebase, use /grill-with-docs instead: it runs the same interview, but also updates CONTEXT.md and your ADRs (Architecture Decision Records).

Step 4: Use /tdd for Vertical-Slice, Test-Driven Development

Once requirements are aligned, move into the development phase. This is the heart of mattpocock/skills: no horizontal slicing.

What is horizontal slicing? Writing all the tests first, then writing all the code. It is the most common way TDD gets misunderstood, and AI is especially prone to it, producing tests that "verify imagined behavior rather than real behavior."

The correct approach: only do one vertical slice at a time.

RED:   Write one test that describes the first behavior → test fails
GREEN: Write the minimum code to make the test pass → test passes
REFACTOR: Refactor (optional)

Repeat this loop. Each time the AI finishes a test, it has already “seen” the implementation for that slice, so it’s testing real behavior, not assumed behavior.

Let’s look at a concrete example. Suppose we need to implement a cart checkout feature:

// Round 1: Test only “the cart can add items”
import { describe, it, expect } from 'vitest';
import { Cart } from './cart';

describe('Cart', () => {
  it('allows adding an item', () => {
    const cart = new Cart();
    cart.addItem({ id: 'book-1', name: 'TypeScript Basics', price: 59 });
    expect(cart.items).toHaveLength(1);
  });
});
// cart.ts — Minimal implementation to make the test pass
export interface CartItem {
  id: string;
  name: string;
  price: number;
}

export class Cart {
  items: CartItem[] = [];

  addItem(item: CartItem) {
    this.items.push(item);
  }
}
// Round 2: Test “the cart can calculate total price”
it('calculates total price', () => {
  const cart = new Cart();
  cart.addItem({ id: 'book-1', name: 'TypeScript Basics', price: 59 });
  cart.addItem({ id: 'book-2', name: 'React in Action', price: 79 });
  expect(cart.totalPrice()).toBe(138);
});
// cart.ts — Add totalPrice method
export class Cart {
  items: CartItem[] = [];

  addItem(item: CartItem) {
    this.items.push(item);
  }

  totalPrice(): number {
    return this.items.reduce((sum, item) => sum + item.price, 0);
  }
}

Each round follows: test fails → minimum code passes → next round. That’s the core rhythm of vertical slicing.
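
To see the rhythm continue, here is one more hypothetical slice for the same cart. The removeItem behavior is invented for this guide, not taken from the repo's examples:

// Round 3: Test "an item can be removed"; the test fails first (RED)
it('removes an item by id', () => {
  const cart = new Cart();
  cart.addItem({ id: 'book-1', name: 'TypeScript Basics', price: 59 });
  cart.removeItem('book-1');
  expect(cart.items).toHaveLength(0);
});
// cart.ts: add only the minimum code to pass (GREEN)
removeItem(id: string) {
  this.items = this.items.filter((item) => item.id !== id);
}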

WARNING

/tdd forbids "writing all tests at once, then writing all code at once." Break this rule and your tests become brittle under refactoring: after a refactor they turn fully red even though the behavior hasn't actually changed.

Step 5: Use /diagnose for Structured Debugging

When code goes wrong, don’t immediately ask the AI to “help me fix the bug.” First establish a feedback loop by running:

/diagnose

/diagnose forces the AI to complete a six-stage debugging loop:

Stage                      Core action
1. Build a feedback loop   Find a repeatable failure signal
2. Reproduce               Make the bug reliably reproducible
3. Hypothesise             Generate 3–5 falsifiable hypotheses
4. Instrument              Change only one variable at a time to locate the root cause
5. Fix + regression test   Write a regression test first, then fix
6. Cleanup + post-mortem   Clean up debug logs and summarize prevention steps

Why is this process important? Because the most common AI debugging approach is “guess, change, check the result; if wrong, guess again.” Without a feedback loop, that’s like throwing darts in the dark. /diagnose turns debugging into an engineering discipline—not mysticism.

A typical scenario: suppose your payment API suddenly starts returning 500 errors:

// Stage 1: Build a feedback loop — write a failing test that targets a specific API
import { describe, it, expect } from 'vitest';
import { createHonoServer } from './server';
import fetch from 'node-fetch';

describe('Payment API', () => {
  it('returns 200 for valid checkout', async () => {
    // Assumes createHonoServer() starts the app listening on port 3000
    createHonoServer();
    const response = await fetch('http://localhost:3000/api/checkout', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        items: [{ id: 'book-1', quantity: 1 }],
        userId: 'user-123',
      }),
    });
    expect(response.status).toBe(200); // currently failing; it’s actually returning 500
  });
});

Stage 3 will generate a hypothesis list:

  1. If it’s “inventory service timing out causing an uncaught exception,” then mocking the inventory service to return success should eliminate the 500
  2. If it’s “database transaction not committed,” then checking the transaction.commit() call will reveal the issue
  3. If it’s “payment gateway response format parsing error,” then validating the gateway mock data format should reproduce the problem

The AI verifies one hypothesis at a time, rather than bundling all the changes together.
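
As a sketch of how Stage 4 isolates a single variable, the test below stubs only the inventory service to check hypothesis 1. The module path ./inventory-service and its checkStock export are hypothetical names for this example; substitute whatever your codebase actually uses.

// Stage 4 (sketch): change one variable only, the inventory dependency
import { describe, it, expect, vi } from 'vitest';

// vi.mock is hoisted by vitest, so the stub applies before './server' loads
vi.mock('./inventory-service', () => ({
  checkStock: vi.fn().mockResolvedValue({ inStock: true }),
}));

import { createHonoServer } from './server';

describe('Payment API with inventory stubbed', () => {
  it('stops returning 500 if hypothesis 1 is correct', async () => {
    // Assumes createHonoServer() starts the app listening on port 3000
    createHonoServer();
    const response = await fetch('http://localhost:3000/api/checkout', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        items: [{ id: 'book-1', quantity: 1 }],
        userId: 'user-123',
      }),
    });
    // If this passes while the unmocked test still fails,
    // the inventory timeout is confirmed as the root cause
    expect(response.status).toBe(200);
  });
});

If the stub eliminates the 500, move on to Stage 5: write a regression test around the inventory timeout first, then fix it.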

Step 6: Use /write-a-skill to Build a Custom Skill

When you’ve built a repeatable workflow for a particular project, you can package it as a Skill so the AI can invoke it on demand in that project:

/write-a-skill

The AI will ask you in sequence:

  1. Which domain does this Skill cover?
  2. What are the concrete usage scenarios?
  3. Do you need to include scripts, or just pure instructions?
  4. Do you have any reference materials?

Then it generates the standard structure:

my-custom-skill/
├── SKILL.md              # Main command file (required)
├── REFERENCE.md          # Detailed documentation (split if content exceeds 100 lines)
├── EXAMPLES.md           # Usage examples
└── scripts/
    └── helper.ts         # Helper scripts (optional)

In SKILL.md, the description field is the only entry point through which the AI discovers this Skill, and it has strict formatting requirements:

---
name: my-custom-skill
description: Generate a type-safe API client from a JSON Schema, and inject Mock data.
              Use when the user mentions "API client", "generate types", or "mock data".
---

The first sentence of the description should say what the Skill does; the second should say when it should trigger. Include trigger phrases to help the AI auto-load it.

TIP

The description is limited to 1024 characters, so be precise. Vague descriptions (e.g., “help process documentation”) won’t let the AI distinguish this Skill from other Skills.
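
For contrast, here is a hypothetical pair of descriptions; both Skills are invented for illustration:

# Too vague: the AI cannot tell when to load this Skill
description: Helps process documentation.

# Precise: first what it does, then when it triggers
description: Convert an OpenAPI spec into a typed fetch client.
              Use when the user mentions "OpenAPI", "typed client", or "generate client".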

Troubleshooting

1. AI doesn’t run tests and writes code directly

Symptom: You call /tdd, but the AI jumps straight to implementation and skips the RED phase.

Troubleshooting:

  • Confirm that CONTEXT.md already defines the project’s test framework and test conventions
  • At the start of /tdd, explicitly say: “Write only one test; don’t write any other code yet.”
  • If it still skips, manually type into the conversation: “First write a failing test,” to pull it back to RED

Root cause: The AI understands TDD only literally: it thinks "writing tests" is TDD, without grasping that a failing test is itself the signal that drives the loop.


2. Horizontal-slicing trap — writing all tests at once

Symptom: In /tdd, the AI generates 10 test files in one go, then writes all implementations in one go.

Troubleshooting:

  • Direct intervention: tell the AI “Write only the first test, then implement it.”
  • Add an anti-pattern warning to CONTEXT.md: “No horizontal slicing—do only one vertical slice at a time.”

Root cause: The AI is drawn to the illusion of finishing everything at once. Horizontal slicing feels complete, but in practice it produces fragile tests and code.


3. No feedback signal in the debugging loop

Symptom: You call /diagnose, but the AI repeatedly edits code without establishing a repeatable failure signal.

Troubleshooting:

  • If Stage 1 isn't done, don't proceed to Stage 2. Tell the AI clearly: "We don't have a repeatable failure signal yet. Keep building the feedback loop."
  • Try using node-fetch plus real HTTP requests to build an end-to-end signal, instead of mocking the entire module

Root cause: A feedback loop is the heart of debugging; without it, all subsequent actions are blind.


4. Skill description doesn’t work — Skill isn’t auto-triggered

Symptom: You wrote a custom Skill, but the AI doesn’t load it.

Troubleshooting:

  • Check whether that Skill is registered in plugin.json
  • Confirm that the description includes trigger words (the words users are actually likely to say)
  • Confirm that the SKILL.md’s name and description format meet the requirements (YAML frontmatter)

Root cause: Skills loading depends on keyword matching in description. If the description isn’t precise, it can’t trigger.


5. /grill-me interviews turn into one-way reporting

Symptom: When you run /grill-me, the AI only receives your information and doesn’t proactively ask follow-up questions.

Troubleshooting:

  • When the AI does start asking follow-ups, it has entered interview mode; keep answering without interrupting it
  • If the AI goes silent for more than 3 rounds, manually say: “Keep asking—I have more information.”
  • The underlying issue is usually that the AI believes it “already understands,” when it’s actually only understood literally

Root cause: The AI tends to converge quickly to a state that “looks understood,” rather than exhaustively exploring all branches.


6. CONTEXT.md gets too big, causing token explosions

Symptom: As the project progresses, CONTEXT.md keeps growing. The AI starts to experience context overflow or slower responses.

Troubleshooting:

  • Regularly refactor CONTEXT.md: keep “stable, general terminology,” and move “temporary decision records” into docs/adr/
  • /improve-codebase-architecture can help identify what should be extracted from CONTEXT.md

Root cause: CONTEXT.md is a file that grows continuously; without maintenance, it turns into a second monolith that bloats every prompt.
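
The records extracted into docs/adr/ can stay small. A hypothetical example, reusing the refund scenario from Step 3:

# ADR 0007: Refunds restock inventory

Status: accepted
Context: Refunds previously left inventory untouched, causing stock drift.
Decision: A successful refund increments stock for each refunded item.
Consequences: The refund handler must run in the same transaction as the stock update.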

Further Reading / Advanced Directions

Methodology extension: Matt Pocock himself is a proponent of TypeScript type gymnastics. His Total TypeScript course is an excellent complement for understanding type-driven development.

From TDD to architecture improvement: After your codebase goes through several rounds of TDD refactoring, you can use /improve-codebase-architecture for architecture diagnosis. It’s based on the domain language in CONTEXT.md and the decision records in docs/adr/, identifying “shallow modules” (interfaces with complexity but little functionality) and opportunities for refactoring.

Custom Skill ecosystem: The design philosophy of mattpocock/skills is “small and composable.” You don’t necessarily need to install all the Skills it provides—start by installing only /tdd and /diagnose, and introduce the rest progressively in day-to-day development. There are also other developers in the community who have built their own Skill sets based on this framework—worth keeping an eye on.

Updated May 2, 2026