Repository: https://github.com/DietrichGebert/ponytail
Low effort: get started in 10 minutes, finish your first benchmark within an hour—so you can finally fix your AI coding “runaway” habit.
Preface
Have you ever had this happen? You ask Claude Code to build a date picker. It takes the job very seriously: it installs flatpickr, writes a wrapper component, adds a CSS file, starts discussing time zone handling… and hands you 287 lines.
What you meant to ask for was: “I just need an input box where I can pick a date.” What you got was: “A date-picker framework.”
That’s the AI Agent “overengineering illness”: models are trained to look “professional,” so they automatically add abstraction layers, configuration options, error handling, and test coverage—when what you actually want might be nothing more than a simple <input type="date">.
Ponytail is here to cure it. Open-sourced by Dietrich Gebert, its core idea in one sentence: “The best code is the code that never gets written.”
Ponytail isn’t a model and it’s not an IDE plugin. It’s a rule set in the style of a “lazy senior engineer.” It gives your AI a six-step ladder:
1. Does this actually need to be built? → No: skip (YAGNI)
2. Can the standard library do it? → Use it
3. Can a native platform feature do it? → Use it
4. Can an already-installed dependency do it? → Use it
5. Can it be done in one line? → Write one line
6. If all else fails: write the smallest amount of code that works
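To make “stop at the first rung that holds” concrete, here is a minimal Python sketch. It is not how Ponytail works internally (Ponytail is a prompt rule set that the model follows). The Task fields and the first_rung function are hypothetical and simply encode the ladder as an ordered series of checks.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical yes/no answers to the ladder's questions for one request."""
    needed: bool = True
    stdlib_covers: bool = False
    platform_covers: bool = False
    installed_dep_covers: bool = False
    fits_one_line: bool = False


def first_rung(task: Task) -> str:
    """Check the rungs in order and stop at the first one that holds."""
    if not task.needed:
        return "1: skip it (YAGNI)"
    if task.stdlib_covers:
        return "2: use the standard library"
    if task.platform_covers:
        return "3: use the native platform feature"
    if task.installed_dep_covers:
        return "4: use the installed dependency"
    if task.fits_one_line:
        return "5: write one line"
    return "6: write the minimum code that works"


# A color picker in a web app: the browser already ships <input type="color">.
print(first_rung(Task(platform_covers=True)))
```

Order matters: a task the platform already covers never reaches rung 5, even if it could also be written in one line.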
In real experiments on tiangolo’s full-stack-fastapi-template repository, with 12 feature tickets, n=4, and Haiku 4.5, Ponytail produced this results sheet:
| Approach (change vs no-rules baseline) | LOC | tokens | cost | time | safety pass rate |
|---|---|---|---|---|---|
| Ponytail | -54% | -22% | -20% | -27% | 100% |
| Bare “YAGNI + one-line” prompt | -33% | -14% | -21% | -30% | 95% |
| caveman (cramped prompt) | -20% | +7% | +3% | +2% | 100% |
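Ponytail makes the biggest cut in code (-54% LOC), lowers tokens, cost, and time as well, and keeps a 100% safety pass rate. The bare “YAGNI + one-line” prompt saves slightly more on cost and time but slips to 95% safe, and caveman trims code while tokens, cost, and time go up. The date picker goes from 404 lines to 23 lines and the color picker from 287 lines to 23 lines, because Ponytail reaches straight for the browser’s built-in <input type="date"> and <input type="color">.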
It supports 14 AI programming tools: Claude Code, Codex, Cursor, Windsurf, Cline, Copilot CLI, Aider, Kiro, Zed, CodeWhale, OpenCode, Pi, Gemini CLI, and OpenClaw. Today, we’ll install this rule set into Claude Code and cure your Agent’s “overengineering illness” in 30 seconds.
Target audience
- Developers with 1–5 years of experience writing code with AI Agents day-to-day
- People who feel that “AI code is too long” and “it drags in a bunch of unused dependencies”
- Teams who want consistent coding style, but don’t want to lock everything down with rigid ESLint rules
- Small business owners / Tech Leads who care about AI programming costs—and hate token bills
Core dependencies and environment
- Node.js 18+ (required: Ponytail lifecycle hooks run on Node; if you use nvm, make sure it’s on the PATH for non-interactive shells)
- A supported AI Agent (Claude Code for the demo)
- An LLM API key (demo uses Defapi for Claude Haiku 4.5 at half price)
- Optional: Python 3.10+ (used when running benchmarks with pandas)
- Optional: Git (clone the repo)
TIP
About API key costs: Ponytail itself is open-source and free, but running your AI Agent costs tokens. If you care about the bill, I strongly recommend Defapi: it offers Claude, GPT, and Gemini models at half the official price, with a fully OpenAI / Anthropic-compatible API. Just swap the base URL; the tutorial below shows how.
Full project structure
ponytail/
├── AGENTS.md # Core rules (the "lazy philosophy" read by all agents)
├── README.md / README.es.md # Trilingual README (EN/中文/ES)
├── package.json # Defines the pi-agent package
├── commands/ # 6 slash commands
│ ├── ponytail.toml # /ponytail [lite|full|ultra|off]
│ ├── ponytail-review.toml # /ponytail-review (cuts the current diff)
│ ├── ponytail-audit.toml # /ponytail-audit (scans the entire repo)
│ ├── ponytail-debt.toml # /ponytail-debt (collects ponytail: comments)
│ ├── ponytail-gain.toml # /ponytail-gain (looks at benchmark results)
│ └── ponytail-help.toml # /ponytail-help
├── skills/ # 6 skills
│ ├── ponytail/ # Main rules
│ ├── ponytail-review/ ... # The other 5
├── hooks/ # Claude / Codex lifecycle hooks
│ ├── ponytail-config.js # Mode parsing (env + config.json)
│ └── ponytail-instructions.js
├── ponytail-mcp/ # MCP server adapter (for MCP-only hosts)
│ ├── index.js
│ ├── instructions.js
│ └── test/
├── examples/ # 12 real "overengineering vs one-line" comparisons
│ ├── date-picker.md / color-picker.md (web built-ins)
│ ├── deep-clone.md (structuredClone)
│ ├── debounce.md
│ ├── email-validation.md (75 lines → 3 lines)
│ └── ... 12 total
├── benchmarks/ # promptfoo + agentic benchmarks
│ ├── promptfooconfig.yaml # single-round benchmark config
│ ├── benchmark-local.py # agentic real-repo benchmark
│ ├── agentic/ # 12 ticket scripts
│ └── results/2026-06-18-agentic.md # Complete data
├── docs/
│ ├── agent-portability.md # Which agent loads which file
│ └── platform-native.md
├── .openclaw/ # OpenClaw skill adapter (auto-generated)
├── .cursor/ .windsurf/ # Cursor / Windsurf rule files
├── .clinerules/ # Cline rules
├── .kiro/steering/ # Kiro rules
└── tests/ # Tests for rule consistency
Step-by-step tutorial
Step 1: Install Ponytail into Claude Code
In Claude Code, Ponytail is a plugin marketplace item—you can set it up in 30 seconds.
# Add Ponytail repo to your plugin marketplace list
/plugin marketplace add DietrichGebert/ponytail
# Install the main skill (once per session)
/plugin install ponytail@ponytail
After installation, open a new session. The startup text will display the current mode (default is full). You’ll see output like this:
Ponytail v0.1.0 [full] Lazy senior dev mode active
1. Need to build? 2. Stdlib? 3. Platform? 4. Installed dep?
5. One line? 6. Minimum that works.
WARNING
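nvm / Nix users note: Claude Code’s lifecycle hooks run in a non-interactive shell, and Node must be on that shell’s PATH. Being able to run node -v in your current terminal is not enough: non-interactive shells don’t read ~/.zshrc or ~/.bashrc. If you use nvm, load it (or put Node’s bin directory on PATH) from a file those shells do read, such as ~/.zshenv for zsh, or install a system-wide Node.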
To change the strength, run:
/ponytail lite # light mode (protects steps 1–2)
/ponytail full # default
/ponytail ultra # aggressive mode (no abstraction layers at all)
/ponytail off # off
/ponytail # no parameters = show current mode
You can also persist it:
# default to ultra (add this line to your shell profile to make it permanent)
export PONYTAIL_DEFAULT_MODE=ultra
Or write a config file:
// ~/.config/ponytail/config.json
// Windows: %APPDATA%\ponytail\config.json
{ "defaultMode": "ultra" }
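Ponytail’s hooks/ponytail-config.js does the actual mode parsing from the environment variable and config.json. The sketch below is a hypothetical Python re-implementation for illustration only: it assumes the environment variable takes precedence over the config file and that anything unrecognized falls back to the default full mode. Check the hook source for the real precedence rules.

```python
import json
import os
from pathlib import Path

VALID_MODES = {"lite", "full", "ultra", "off"}


def config_path() -> Path:
    """Config locations from the example above (Windows vs everything else)."""
    if os.name == "nt":
        return Path(os.environ.get("APPDATA", "")) / "ponytail" / "config.json"
    return Path.home() / ".config" / "ponytail" / "config.json"


def resolve_mode(env=None, path=None) -> str:
    """Assumed precedence: PONYTAIL_DEFAULT_MODE, then config.json, then 'full'."""
    env = os.environ if env is None else env
    path = config_path() if path is None else Path(path)

    mode = env.get("PONYTAIL_DEFAULT_MODE", "").strip().lower()
    if mode in VALID_MODES:
        return mode

    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        mode = str(data.get("defaultMode", "")).strip().lower()
    except (OSError, ValueError, AttributeError):
        mode = ""  # missing, unreadable, or malformed config file
    return mode if mode in VALID_MODES else "full"
```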
Step 2: Run a counterexample comparison
Let’s look at the result directly. Prepare two identical prompts:
Prompt A (Ponytail off): first run /ponytail off, then ask
Add a color picker to the settings page
You’ll get an answer of about 287 lines: install a react-color library (or create a custom 5-file component), add prop validation, add onChange throttling, add an accessible label, and wire up CSS variables.
Prompt B (Ponytail on): run /ponytail full first, and ask the exact same thing again.
You’ll get:
// ponytail: browser has one
<input type="color" />
One line. Done.
Ponytail’s AGENTS.md contains this core rule snippet (verbatim from the original):
Before writing any code, stop at the first rung that holds:
1. Does this need to be built at all? (YAGNI)
2. Does the standard library already do this? Use it.
3. Does a native platform feature cover it? Use it.
4. Does an already-installed dependency solve it? Use it.
5. Can this be one line? Make it one line.
6. Only then: write the minimum code that works.
Notice the last line: “Only then”—it doesn’t mean “don’t write,” it means “go through the first 5 steps first.”
Step 3: Use Defapi to get the bill down by half
Ponytail reduces code volume, but running the AI Agent itself still burns Claude / GPT tokens. Defapi offers Claude / GPT / Gemini at half the official price, with a fully compatible API.
We switch to Defapi and run a benchmark:
Step 3.1: Get a Defapi Key
Go to defapi.org, register, grab a key starting with dk-xxx, and write it into Ponytail’s .env:
# in the root directory of the ponytail repo
cat > .env <<'EOF'
ANTHROPIC_API_KEY=dk-Your-Defapi-Key
ANTHROPIC_BASE_URL=https://api.defapi.org
EOF
Step 3.2: Verify with curl first
curl -s https://api.defapi.org/api/v1/messages \
-H "Authorization: Bearer dk-Your-Defapi-Key" \
-H "content-type: application/json" \
-d '{
"model": "anthropic/claude-haiku-4.5",
"max_tokens": 256,
"messages": [
{"role": "user", "content": "Describe the word ponytail using an emoji"}
]
}'
Response:
{
"id": "msg_01H...",
"role": "assistant",
"content": [{"type": "text", "text": "🦄 (it should never happen)"}],
"usage": {"input_tokens": 22, "output_tokens": 12}
}
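If you would rather script this check than use curl, the following Python sketch builds the same request with only the standard library and pulls the reply text and token usage out of a response shaped like the one above. It mirrors the curl call exactly (URL, Bearer header, model id); those values come from this tutorial, so adjust them if Defapi’s docs say otherwise. The function names are illustrative.

```python
import json
import urllib.request


def build_request(api_key: str, prompt: str,
                  url: str = "https://api.defapi.org/api/v1/messages",
                  model: str = "anthropic/claude-haiku-4.5") -> urllib.request.Request:
    """Build the same POST request as the curl example (nothing is sent yet)."""
    body = {
        "model": model,
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> tuple[str, int]:
    """Join the text blocks and total the input + output tokens."""
    text = "".join(block.get("text", "")
                   for block in response.get("content", [])
                   if block.get("type") == "text")
    usage = response.get("usage", {})
    return text, usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# To actually send it:
# req = build_request("dk-Your-Defapi-Key", "Describe the word ponytail using an emoji")
# with urllib.request.urlopen(req) as resp:
#     print(summarize(json.load(resp)))
```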
Step 3.3: Point promptfoo to Defapi
Ponytail ships with promptfoo benchmarks. Edit benchmarks/promptfooconfig.yaml:
providers:
  - id: anthropic:messages:anthropic/claude-haiku-4.5
    config:
      baseURL: https://api.defapi.org
      apiKey: ${ANTHROPIC_API_KEY}
Run:
npx promptfoo@latest eval -c benchmarks/promptfooconfig.yaml
TIP
Defapi key facts:
- Compatible with v1/chat/completions (OpenAI protocol)
- Compatible with v1/messages (Anthropic protocol)
- Compatible with v1beta/models/ (Gemini protocol)
- The same dk-xxx key works across Claude / GPT / Gemini
- Price examples (USD per million input / output tokens): Claude Sonnet 4.5 official $3 / $15, Defapi $1.5 / $7.5; Claude Haiku 4.5 official $1 / $5, Defapi $0.5 / $2.5
Money saved (estimated from Ponytail benchmark: 12 tickets × n=4 = 48 runs):
| Model | Official cost / month | Defapi cost / month | Saved |
|---|---|---|---|
| Claude Sonnet 4.5 | ~$60 | ~$30 | $30 |
| Claude Haiku 4.5 | ~$20 | ~$10 | $10 |
| Claude Opus 4.5 | ~$250 | ~$125 | $125 |
Ponytail cuts costs by about 20%, and Defapi halves the unit price of what remains. Together, your real bill drops to 0.8 × 0.5 = 0.4 of the original: a 60% saving, or 2.5x cheaper.
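Percentage savings from different sources compound multiplicatively rather than adding up. A quick Python check, using the 20% Ponytail cost figure from the benchmark table and Defapi’s 50% price cut:

```python
def remaining_fraction(*cuts: float) -> float:
    """Each cut removes a fraction of whatever bill is left, so the remainders multiply."""
    remaining = 1.0
    for cut in cuts:
        remaining *= 1.0 - cut
    return remaining


bill = remaining_fraction(0.20, 0.50)  # Ponytail, then Defapi
print(f"bill = {bill:.0%} of original, saving {1 - bill:.0%}")
# prints: bill = 40% of original, saving 60%
```

Adding the two percentages would suggest 70%; the correct combined saving is 60%, and the order in which the cuts are applied does not change it.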
Step 4: Run /ponytail-review to cut the current diff
Just guarding the code at write-time with Ponytail isn’t enough—you already have a backlog of “overengineering” code. That’s where the review command comes in.
# in a repo with changes
/ponytail-review
It reviews only the current git diff, and outputs in a fixed format:
L12: delete unused `cache` parameter; no caller passes it
L34: stdlib Array.prototype.sort is stable since ES2019; drop `lodash.orderby`
L88: native `URLSearchParams` covers this; remove custom `parseQuery`
L102: yagni `BaseRepository` has one implementation; inline it
L150: shrink loop into `arr.filter(x => x.active).map(x => x.id)`
---
Net removable: 47 lines, 1 dependency
Tag types:
- delete: dead code / speculative features
- stdlib: reimplementing the standard library
- native: work already covered by a dependency, or doable with platform-native features
- yagni: an abstraction layer with only one implementation
- shrink: same logic, fewer lines
The last line tells you the “net removable lines”—that’s your tech debt metric.
If the output is “Lean already. Ship.”, your code is already lean enough and you can merge with confidence.
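Because the output format is fixed, it is easy to post-process, for example to track the net removable number over time. Here is a hypothetical Python parser written against the sample output above; the regular expressions assume exactly that layout (L<line>: <tag> <message>, then a Net removable: summary) and will need adjusting if the real format differs.

```python
import re
from collections import Counter

# Layout assumed from the sample output: "L<line>: <tag> <message>".
FINDING = re.compile(r"^L(\d+): (delete|stdlib|native|yagni|shrink) (.+)$")
# Summary line, e.g. "Net removable: 47 lines, 1 dependency".
SUMMARY = re.compile(r"^Net removable: ([\d,]+) lines?, (\d+) dependenc(?:y|ies)$")


def parse_review(text: str) -> dict:
    """Split /ponytail-review output into findings, per-tag counts, and totals."""
    findings, net_lines, net_deps = [], 0, 0
    for raw in text.splitlines():
        line = raw.strip()
        if m := FINDING.match(line):
            findings.append((int(m.group(1)), m.group(2), m.group(3)))
        elif m := SUMMARY.match(line):
            net_lines = int(m.group(1).replace(",", ""))
            net_deps = int(m.group(2))
    return {
        "findings": findings,
        "by_tag": Counter(tag for _, tag, _ in findings),
        "net_lines": net_lines,
        "net_deps": net_deps,
    }
```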
Step 5: Run /ponytail-audit to scan the entire repo
Review focuses on the diff; audit focuses on the whole tree.
/ponytail-audit
The output looks similar, but it’s sorted by “what can be cut the most”:
delete src/utils/cache.ts (412 lines) — only used in 1 place; inline
stdlib src/utils/deep-clone.ts — use structuredClone
native src/components/DatePicker/ (287 lines) — <input type="date">
yagni src/repositories/BaseRepository.ts (180 lines) — 1 impl, inline
shrink src/api/users.ts:42-78 — same logic, 60 → 18 lines
---
Net removable: 1,247 lines, 4 dependencies
Practical advice: run review before audit. Review trims the diff you’re about to merge; audit then helps you prioritize the next cleanup wave.
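To turn an audit into a work queue, you can rank findings by how many lines each one frees. This hypothetical Python helper parses sizes in the two shapes the sample output uses, “(N lines)” and “N → M lines”; findings with no size (such as the structuredClone one) sort last.

```python
import re

SIZE = re.compile(r"\((\d[\d,]*) lines\)")   # e.g. "(412 lines)"
SHRINK = re.compile(r"(\d+) → (\d+) lines")  # e.g. "60 → 18 lines"


def removable(finding: str) -> int:
    """Lines this finding would remove; 0 when the audit gives no size."""
    if m := SHRINK.search(finding):
        return int(m.group(1)) - int(m.group(2))
    if m := SIZE.search(finding):
        return int(m.group(1).replace(",", ""))
    return 0


def prioritize(audit: str) -> list[str]:
    """Return the audit findings, biggest win first (the summary is skipped)."""
    rows = []
    for raw in audit.splitlines():
        line = raw.strip()
        if line and not line.startswith(("---", "Net removable")):
            rows.append(line)
    return sorted(rows, key=removable, reverse=True)
```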
Step 6: Enable it in other Agents
The core advantage of Ponytail is: one set of rules, works everywhere. It provides adaptation files for every major AI programming tool:
Codex (CLI mode)
codex plugin marketplace add DietrichGebert/ponytail
codex
# Open /plugins → select Ponytail → install
# Open /hooks → trust two lifecycle hooks → open a new thread
Cursor
Copy .cursor/rules/ponytail.mdc into your project:
cp .cursor/rules/ponytail.mdc ~/your-project/.cursor/rules/
Or install globally:
cp .cursor/rules/ponytail.mdc ~/.cursor/rules/
Windsurf
cp .windsurf/rules/ponytail.md ~/.codeium/windsurf/memories/
GitHub Copilot CLI
copilot plugin marketplace add DietrichGebert/ponytail
copilot plugin install ponytail@ponytail
OpenClaw (if you’re already using it)
# The most elegant one-liner
clawhub install ponytail
Or copy manually:
cp -r .openclaw/skills/ponytail ~/.openclaw/skills/
Gemini CLI
gemini extensions install https://github.com/DietrichGebert/ponytail
Pi / Aider / Kiro / Zed / CodeWhale
These agents directly read AGENTS.md:
# project-level
cp AGENTS.md ~/your-project/AGENTS.md
# global (pi / Aider / CodeWhale can all recognize it)
cp AGENTS.md ~/.pi/AGENTS.md
Complete mapping table (from docs/agent-portability.md):
| Agent | Load method | Support /ponytail commands |
|---|---|---|
| Claude Code | marketplace | ✅ |
| Codex | marketplace + hooks | ✅ |
| OpenCode | plugin + opencode.json | ✅ |
| OpenClaw | clawhub | ✅ |
| Gemini CLI | extension | ✅ |
| Pi | pi install | ✅ |
| Copilot CLI | plugin | ✅ (with ponytail: namespace) |
| Cursor | .cursor/rules/ | ❌ (read-only rules) |
| Windsurf | .windsurf/rules/ | ❌ |
| Cline | .clinerules/ | ❌ |
| Kiro | .kiro/steering/ | ❌ |
| Aider | AGENTS.md | ❌ |
| Zed | AGENTS.md | ❌ |
| CodeWhale | AGENTS.md | ❌ |
| GitHub Copilot (editor) | .github/copilot-instructions.md | ❌ |
Step 7: Run your own benchmark
Ponytail’s real data isn’t made up: it’s generated by running benchmarks/benchmark-local.py. You can also pick 5 of your own real tasks and reproduce the experiment.
Step 7.1: Prepare prompts
Edit benchmarks/prompts.json (or use the provided 5 prompts):
[
{ "id": "date-picker", "task": "Add a date picker to the settings page" },
{ "id": "color-picker", "task": "Add a color picker to the settings page" },
{ "id": "email-validate", "task": "Write a Python function that validates email addresses" },
{ "id": "deep-clone", "task": "Deep clone this object: {sample}" },
{ "id": "debounce", "task": "Write a debounce function in JavaScript" }
]
Step 7.2: Run the arms for comparison (baseline and ponytail shown here)
# baseline: nothing added
npx promptfoo@latest eval -c benchmarks/promptfooconfig.yaml
# ponytail arm: same eval, with Ponytail in full mode
PONYTAIL_DEFAULT_MODE=full npx promptfoo@latest eval -c benchmarks/promptfooconfig.yaml
Step 7.3: Check the results
This writes benchmarks/output.json, which records for each prompt under each arm:
- loc: code line count
- tokens: total tokens
- cost: USD
- time: end-to-end time
- passed_safety: whether it passes the safety tests (input validation, error handling, a11y)
In most cases, the ponytail arm reduces LOC by 50–80% compared to baseline.
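To turn raw rows into the “vs baseline” percentages used in the results table, average each metric per arm and compare. The row layout here (one dict per prompt per arm, with an arm key plus the fields listed above) is an assumption for illustration; map it onto whatever structure output.json actually has.

```python
from statistics import mean

METRICS = ("loc", "tokens", "cost", "time")


def vs_baseline(rows: list[dict], arm: str, baseline: str = "baseline") -> dict:
    """Percent change of each metric's mean vs the baseline arm, plus safety pass rate."""
    def avg(name: str, key: str) -> float:
        return mean(r[key] for r in rows if r["arm"] == name)

    result = {}
    for key in METRICS:
        base = avg(baseline, key)
        result[key] = round((avg(arm, key) - base) / base * 100)
    safety = [r["passed_safety"] for r in rows if r["arm"] == arm]
    result["safe"] = round(100 * sum(safety) / len(safety))
    return result
```

Averaging before comparing keeps one unusually long answer from dominating a ratio; with pandas installed, the same thing is a groupby on the arm column.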
Step 8: Run ponytail-mcp (advanced)
If your AI host can only use MCP (for example, some desktop apps), Ponytail also has an MCP server adapter:
cd ponytail-mcp
npm install
node index.js # start the stdio MCP server
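Add this to your host’s MCP configuration, using the absolute path to index.js (the host may not start the server from the repo root):
{
  "mcpServers": {
    "ponytail": {
      "command": "node",
      "args": ["/absolute/path/to/ponytail/ponytail-mcp/index.js"]
    }
  }
}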
It exposes:
- Prompt ponytail: returns the rule text, optionally with a mode parameter (lite / full / ultra)
- Tool ponytail_instructions: same as above, but includes structuredContent for code-execution style hosts
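Under the hood, an MCP host talks to this server with JSON-RPC 2.0 messages over stdio, one JSON object per line. As a sketch, these are the requests a host would send after the usual initialize handshake, using the standard prompts/get and tools/call methods. Passing mode as the tool’s argument name (as well as the prompt’s) is an assumption based on “same as above”.

```python
import json


def jsonrpc(request_id: int, method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return json.dumps(message) + "\n"


# Fetch the rule text through the prompt, in ultra mode.
get_prompt = jsonrpc(2, "prompts/get",
                     {"name": "ponytail", "arguments": {"mode": "ultra"}})

# The same rules through the tool (its result also carries structuredContent).
call_tool = jsonrpc(3, "tools/call",
                    {"name": "ponytail_instructions", "arguments": {"mode": "ultra"}})
```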
WARNING
MCP mode is “user manually calls”—if you click it once in the prompt menu, it takes effect once. It is not “automatically injected every round.” If you need always-on behavior, use the plugin mode in Claude Code / Codex, not MCP.
Troubleshooting FAQs
Q1: No response after install—startup text isn’t showing?
Almost always, the cause is that Node isn’t on the PATH in a non-interactive shell. Verify with:
# Run this in a new shell (simulate non-interactive)
bash -lc 'node -v' # bash
zsh -lc 'node -v' # zsh
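If you get command not found, remember that these commands start login shells that skip ~/.zshrc and ~/.bashrc: zsh -lc reads ~/.zshenv and ~/.zprofile, and bash -lc reads ~/.bash_profile (or ~/.profile). Load nvm from one of those, or install system Node directly: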
# macOS
brew install node
# Windows
winget install OpenJS.NodeJS.LTS
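If you have several machines or CI runners to check, the same test can be scripted. This Python sketch runs <shell> -lc 'node -v' exactly like the commands above and also checks the Node.js 18+ requirement from the prerequisites; the function names are illustrative.

```python
import re
import shutil
import subprocess


def parse_node_version(text: str) -> tuple[int, int, int]:
    """Turn `node -v` output such as 'v20.11.1' into (20, 11, 1)."""
    m = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if m is None:
        raise ValueError(f"unrecognized node version: {text!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return major, minor, patch


def node_ok_in(shell: str, minimum_major: int = 18) -> bool:
    """True if `<shell> -lc 'node -v'` finds Node with major >= minimum_major."""
    if shutil.which(shell) is None:
        return False  # that shell isn't installed here
    result = subprocess.run([shell, "-lc", "node -v"],
                            capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return False  # typically "command not found"
    try:
        # Use the last line in case a login script printed something first.
        return parse_node_version(lines[-1])[0] >= minimum_major
    except ValueError:
        return False
```

Usage: for sh in ("bash", "zsh"): print(sh, node_ok_in(sh)).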
Q2: “I really need that 120-line cache class”—what if it’s a hard requirement?
Two options:
# Temporarily turn it off
/ponytail off
# Or allow it locally: say it explicitly in the prompt
"Build a 120-line cache class, ignore ponytail for this task"
Ponytail is a rules set, not handcuffs. With an explicit override, the model will comply.
Q3: Does Ponytail conflict with ESLint / Prettier?
No—no conflict, different responsibilities:
- Ponytail: controls “should we write this at all”—whether an abstraction is needed, whether a dependency is installed, whether a wrapper exists
- ESLint: controls “is the code correct”—naming, style, unused variables
- Prettier: controls “does it look nicely formatted”—indentation, semicolons, line breaks
Best results come from enabling all three: Ponytail works upstream, deciding how much code should exist at all, while ESLint / Prettier handle the details downstream.
Q4: How do teams standardize the rules?
AGENTS.md is a repository-level file—just commit it to git:
# in your team repository root
curl -o AGENTS.md https://raw.githubusercontent.com/DietrichGebert/ponytail/main/AGENTS.md
git add AGENTS.md
git commit -m "chore: adopt ponytail team-wide coding rules"
All agents that read AGENTS.md (CodeWhale, Aider, Zed, Pi, Kiro, Codex extension) will automatically follow.
Q5: Does Ponytail degrade safety tests?
No. Safety is Ponytail’s most critical benchmark metric: in the comparison table, Ponytail and caveman both keep a 100% safety pass rate, while only the bare “YAGNI + one-line” prompt drops to 95%.
Ponytail’s AGENTS.md includes a dedicated section:
Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs.
Translation: Whether it’s “lazy” depends on where. Save effort if it’s business logic or UI plumbing; do not skip input validation, error fallbacks, security, or a11y.
Q6: How do I run it in CI?
Extract the /ponytail-review logic into a standalone script (Ponytail repo’s benchmarks/correctness.test.js provides a reference implementation), then:
# .github/workflows/ponytail.yml
name: ponytail
on: [pull_request]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: node scripts/ponytail-review.js origin/main
        env:
          ANTHROPIC_API_KEY: ${{ secrets.DEFAPI_KEY }}
          ANTHROPIC_BASE_URL: https://api.defapi.org
A failing check on the PR means Ponytail thinks the current diff can still be trimmed further.
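scripts/ponytail-review.js is a script you write yourself, so the pass/fail rule is up to you. One possible rule, sketched in Python under the assumption that the review output follows the format from Step 4: pass on “Lean already. Ship.” or when the net removable line count is within a budget, and fail otherwise, including on unrecognized output so a broken review never passes silently. The gate function and its max_lines budget are hypothetical.

```python
import re

# Summary line from the review format, e.g. "Net removable: 47 lines, 1 dependency".
NET = re.compile(r"Net removable:\s*([\d,]+)\s+lines?")


def gate(review_output: str, max_lines: int = 0) -> int:
    """Return a process exit code: 0 passes the PR check, 1 fails it."""
    if "Lean already. Ship." in review_output:
        return 0
    m = NET.search(review_output)
    if m is None:
        return 1  # unexpected output: fail loudly instead of passing silently
    removable = int(m.group(1).replace(",", ""))
    return 0 if removable <= max_lines else 1
```

To wire it into the workflow, pipe the review output into a small script that ends with sys.exit(gate(sys.stdin.read())).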
Q7: MCP mode vs always-on mode—what should I choose?
Look at your host:
| Host type | Recommended mode |
|---|---|
| Claude Code / Codex | always-on (plugin + hook) |
| OpenCode | always-on (plugin) |
| Cursor / Windsurf / Cline | always-on (rules files) |
| Gemini CLI | always-on (extension) |
| Pi / Aider / Zed | always-on (AGENTS.md) |
| Desktop apps that only offer MCP prompt menus | MCP (manual trigger) |
| Fully custom agent framework | MCP + tool mode |
Further reading / advanced directions
The philosophical roots of Ponytail
- Rich Hickey’s “Simple Made Easy” (a talk by the creator of Clojure on what really separates “simple” from “easy”)
- Sandi Metz’s “The Magic Tricks of Testing” (which messages to test and which to skip, so the test suite stays lean)
- Ted Neward’s “Thirty Years of ‘WTF’” (satire on enterprise code bloat)
- Another way to phrase the six-step ladder idea: George Fairbanks’s “Just Enough Software Architecture”
Write ponytail-style comments yourself
A ponytail: comment is a note left for “the developer who will read this code in the future.” Suggested format:
// ponytail: <reason>, <known limitations / upgrade path>
Example:
# ponytail: the stdlib re module is enough; no need to install email-validator
# Known limitations: it doesn’t cover every RFC 5322 edge case; if you need those, switch to email-validator
import re
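def is_valid_email(email: str) -> bool:
    # fullmatch checks the whole string; re.match with $ would also accept a trailing newline
    return re.fullmatch(r'[^@\s]+@[^@\s]+\.[^@\s]+', email) is not None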
The /ponytail-debt command scans all ponytail: annotations and generates a tech debt ledger—“later” won’t become “never.”
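The real /ponytail-debt command does this inside your agent. If you want a rough, agent-free version of the same ledger (say, for a dashboard), this hypothetical Python sketch walks a directory and collects every # ponytail: or // ponytail: comment with its file and line number. It only captures the line that carries the ponytail: tag, so follow-up lines such as the “Known limitations” comment in the example are not included.

```python
import re
from pathlib import Path

# "# ponytail: ..." (Python, shell) or "// ponytail: ..." (JS/TS) comments.
NOTE = re.compile(r"(?:#|//)\s*ponytail:\s*(.+)")
SOURCE_SUFFIXES = (".py", ".js", ".jsx", ".ts", ".tsx")


def debt_ledger(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, note) for every ponytail: comment under root."""
    ledger = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if m := NOTE.search(line):
                ledger.append((str(path), lineno, m.group(1).strip()))
    return ledger
```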
Integrate into the team Code Review workflow
Run /ponytail-review like a lint, and add a checklist item to the PR template:
## Ponytail review
- [ ] Ran `/ponytail-review` on this diff
- [ ] Net removable lines: ___
- [ ] If > 0, justification: ___
Complete benchmark data
benchmarks/results/2026-06-18-agentic.md contains the full methodology: 12 tickets × 3 arms × 4 rounds, plus notes on limitations. It’s worth reading, especially the part about the “fair agentic baseline”: earlier single-round results of 80–94% were inflated by a conversational baseline, and 54% is the honest number.
About Ponytail’s boundaries
Ponytail’s ability to cut code relies on the assumption that the code is “correct.” It does not verify:
- Whether the business logic is right (you need to write tests for that)
- Whether performance is sufficient (it may flag O(n²) in ponytail comments, but it won’t automatically change it)
- Whether edge cases are fully covered (lazy comments list gaps, but it won’t add tests for you)
It’s a starting point, not the finish line. After installing Ponytail, there’s less code produced by AI—but also less review burden on you. These two are two sides of the same coin.
Final takeaway: Ponytail isn’t “write less code”; it’s “first ask whether you should write it.”
Next step: use Defapi to get a half-price Claude key and run Ponytail on your own real tasks. You should see your bill drop by roughly 60%: Ponytail trims about 20% by cutting tokens, and Defapi halves the unit price of everything that remains.