Getting Started with gstack: Master Product Design, Code Review, and QA Deployment Solo

Difficulty: Beginner | Duration: 15 minutes | Takeaway: Master gstack core usage and understand AI programming workflow design concepts

Target Audience

You're already using Claude Code to write code, but building products solo lacks a "team feel": no one reviews your architecture, no one runs QA, and no one vets your designs. Want to upgrade AI programming from "advanced autocomplete" to a "real engineering team"?

This article is for you. We'll start from installation and walk through a complete /office-hours → /ship workflow, letting you experience firsthand how gstack turns one person into a whole army.

TIP

The author of gstack is Garry Tan, President of Y Combinator. According to him, he and his team used these tools to deliver 600,000 lines of production code in the first two months of 2026, peaking at 10k-20k lines per day. The same tools are open-source and free.

Prerequisites: Basic knowledge of Git and command line, understanding of basic Claude Code usage.

Core Dependencies and Environment

Dependency	Description	Minimum Version
Claude Code	AI programming tool (download from the official site)	Latest
Git	Version control	Any
Bun	JavaScript runtime, used to compile gstack binaries	v1.0+
Node.js	Required for Windows users only (resolves known Bun bugs on Windows)	Any

Repository Address: https://github.com/garrytan/gstack

Operating Systems: macOS, Linux, Windows 11 (WSL or Git Bash)

Complete Project Structure

After cloning locally, gstack's core directory structure is as follows:

gstack/
├── browse/                  # Persistent headless browser engine (Playwright + custom CDP layer)
│   ├── src/                 # CLI + HTTP server + command implementation
│   └── dist/                # Compiled single-file binary (~58MB)
├── office-hours/            # YC-style product dialogue, starts all design docs
├── plan-ceo-review/         # CEO-level product strategy review (scope, positioning, priority)
├── plan-eng-review/         # Engineering architecture review (data flow, state machines, test plans)
├── plan-design-review/      # Design review (UI/UX scoring, Slop detection)
├── design-consultation/     # Building design systems from scratch
├── review/                  # PR code review (auto-fix + issue grading)
├── investigate/             # Systematic root cause debugging (Iron Law: No investigation, no fix)
├── qa/                      # QA testing + automated atomic commit fixes
├── qa-only/                # QA report mode (report only, no code changes)
├── cso/                    # Chief Security Officer mode (OWASP Top 10 + STRIDE)
├── ship/                   # Release workflow (sync main → test → push → open PR)
├── land-and-deploy/        # Merge PR → deploy → production health verification
├── canary/                 # Post-launch monitoring loop (error rates, performance regression)
├── benchmark/              # Performance benchmarking (Core Web Vitals, page size)
├── document-release/       # Post-release documentation sync
├── retro/                  # Retrospective meetings (supports global cross-project mode)
├── codex/                  # OpenAI Codex second opinion (cross-model analysis)
├── setup-browser-cookies/  # Import cookies from real browsers (Chrome/Arc/Brave/Edge)
├── setup-deploy/           # Deployment configuration wizard (detects platform, URL, commands)
├── careful/                # Dangerous command warnings (rm -rf / DROP TABLE, etc.)
├── freeze/                 # Lock editing scope to prevent out-of-bounds modifications
├── guard/                  # Combines careful + freeze, equivalent to maximum security mode
├── unfreeze/               # Remove freeze
├── autoplan/               # Automated chaining: Full CEO → Design → Engineering review process
└── bin/                    # CLI toolset (gstack-config / gstack-analytics, etc.)

gstack ships 28 skills that cover the full lifecycle: product design → architecture review → development → security → QA → deployment → retrospective.

Step-by-Step Guide

Step 1 — Installing gstack

Global Installation (Recommended)

Open Claude Code and paste the following command into the chat box; Claude will automatically complete the remaining steps:

git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup

The ./setup script will:

Detect the system environment (Bun / Node / Git)
Compile the browse/dist/browse binary (bun build --compile, approx. 10 seconds)
Register all 28 skills with Claude Code
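
The environment check in the first step can be approximated by hand before running ./setup. This is a minimal sketch, not gstack's actual setup logic; the function name preflight is hypothetical.

```shell
# Hypothetical helper, not part of gstack: report which tools are on PATH.
# Returns non-zero if any listed tool is missing.
preflight() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `preflight git bun` (plus `node` on Windows) prints one line per tool, so a missing dependency is visible before the compile step starts.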

After installation, you need to tell Claude Code about the gstack configuration. Add the following content to the CLAUDE.md file in your project root:

## gstack
Use /browse from gstack for all web browsing. Never use mcp__claude-in-chrome__* tools.
Available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review,
/design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse,
/qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro,
/investigate, /document-release, /codex, /cso, /autoplan, /careful, /freeze, /guard,
/unfreeze, /gstack-upgrade.

Project-level Installation (For teammates)

If you want team members to automatically get gstack after a git clone:

cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup

WARNING

Project-level installation commits the actual gstack files to your repository (not as a git submodule), and the .git/ directory inside the copied gstack folder is deleted. Your project's own .git/ is not touched. Ensure you understand this before proceeding.

Verify Installation

# Check if binary exists
ls ~/.claude/skills/gstack/browse/dist/browse

# Check if skills are successfully registered, type in Claude Code:
/help
# You should see gstack-related skills in the list of available commands

Step 2 — Running Your First Complete Workflow

This is the core usage of gstack: 5 commands covering the full process from product design to release. Let's run it in a real local project.

2.1 Define Product with `/office-hours`

/office-hours

gstack starts YC Office Hours mode—it asks you 6 mandatory questions to reframe your product thinking. The key is to voice real pain points, not just describe features.

Example Dialogue:

You: I want a feature that generates a daily briefing based on my calendar every morning.

gstack: I'm reframing your thinking. You're talking about a "briefing generator," but the pain points you've described are actually:
1. Multiple Google Calendars with stale info
2. Preparation time is too long
3. Results aren't good enough
You're actually building a "Personal Chief of Staff AI."

Should we re-define the scope? Or tell me why the original scope was correct?

After the design dialogue, gstack generates a design document that automatically flows into subsequent review skills.

2.2 Review Product Strategy with `/plan-ceo-review`

/plan-ceo-review

gstack reads the design document from the previous step and performs a four-quadrant review from a CEO/Founder perspective:

Expand: Where to expand the scope?
Selectively Expand: Keep the core, what to cut?
Maintain Scope: Current size is just right
Contract: Focus on the narrowest wedge

It also proposes 3 implementation paths, each comparing a manual effort estimate against an AI-assisted one. Two of them might look like this:

Implementation Path	Manual Estimate	AI+gstack Estimate
MVP (Narrowest Wedge)	2 Weeks	1 Day
Full v1.0	3 Months	1 Week

2.3 Review Code with `/review`

Once the code is written, run this on your branch:

/review

gstack will:

Auto-fix obvious issues (e.g., unhandled edge cases)
Pop up an AskUserQuestion for complex issues to get your input
Output a graded report: [AUTO-FIXED] / [ASK] / [WARN]

Example Output:

[/review] Analysis complete:

[AUTO-FIXED] 2 items
  - src/api/calendar.ts:67 — Missing Promise reject handling
  - src/utils/date.ts:12 — Hardcoded timezone offset

[ASK] Race condition in refresh loop — Recommended fixes:
A) Add mutex lock (safer, +15 lines)
B) Add debounce (simpler, +5 lines)
Recommendation: A, to handle concurrency scenarios comprehensively.

2.4 Test Your App with `/qa`

# Replace with your staging URL
/qa https://staging.myapp.com

gstack’s /qa launches a persistent Chromium browser (reusing cookies and login state) to automatically:

Open the page and take screenshots
Click through critical paths (Login → Core Feature → Edge Scenarios)
Find bugs → Atomic commit fixes → Re-verify
Generate regression tests for each fix

Difference from standard automated testing: Standard tests run assertions you've already written. /qa proactively seeks scenarios where you haven't written assertions—empty states, error states, and concurrency boundaries.

2.5 Release with `/ship`

/ship

/ship executes the full release pipeline:

# Roughly equivalent to these manual steps, run from your feature branch:
git fetch origin main
git merge origin/main   # Sync main into your branch
pnpm test               # Or your test command
pnpm build
git push -u origin HEAD
gh pr create            # Opens the PR; /ship writes the change summary

Each step has a checklist and only proceeds if passed. If an issue is encountered, it stops and waits for your decision.
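
The "only proceeds if passed" behavior can be illustrated with a small gated runner. This is a hypothetical sketch of the idea, not how /ship is implemented; run_gated is an illustrative name.

```shell
# Hypothetical sketch of a gated pipeline: run each step in order and
# stop at the first one that fails. Not gstack's implementation.
run_gated() {
  for step in "$@"; do
    echo "==> $step"
    if ! sh -c "$step"; then
      echo "stopped at: $step"
      return 1
    fi
  done
  echo "all steps passed"
}
```

For example, `run_gated "pnpm test" "pnpm build" "git push"` never reaches the push if the tests fail, which is the property the checklist is meant to guarantee.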

Step 3 — Deep Dive into Core Skills by Stage

3.1 Design Phase: office-hours and plan series

The core of this phase is turning "ideas" into "design documents" that all subsequent steps can read.

/office-hours — Not a brainstorm, but a product reframing. gstack will:

Identify unspoken real needs (back-solving from described pain points)
Challenge your default assumptions
Generate an implementation path with specific milestones

/plan-ceo-review — Reads design docs to judge based on market size, competitive landscape, and priority.

/plan-eng-review — Locks down the technical architecture:

ASCII data flow diagrams
State machine design
Failure path analysis
Test matrix (at least one test per branch)

TIP

Each plan skill outputs Markdown files to the project root. These files are inputs for downstream skills—/review reads architecture decisions, and /qa reads test plans.

3.2 Development Phase: review and investigate

The core logic of /review is to "find bugs that pass CI but explode in production." It doesn't replace a linter; it performs semantic analysis for:

Logical errors (missing conditions, unbalanced branches)
Concurrency issues (race conditions, deadlock patterns)
Security risks (injection, auth bypass)

/investigate is dedicated to debugging. When a bug keeps recurring, run:

/investigate

gstack will:

Propose hypotheses
Verify them one by one (no fixing, just investigating)
Stop once the root cause is found (Iron Law: No fixing without investigation)

3.3 Release Phase: ship, land-and-deploy, canary

/ship          # After code review passes → Open PR
/land-and-deploy  # Merge PR → Deploy → Verify production health
/canary        # Monitor for 30 minutes after deployment

/canary monitors three metrics:

Console Error Rate: Are JavaScript exceptions rising?
Performance Regression: Have Core Web Vitals worsened?
Page Failure Rate: Are critical pages returning 500s?

If any metric is abnormal, it auto-alerts and can trigger a rollback.
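
To make the page failure check concrete, here is a minimal sketch that classifies an HTTP status with curl. It covers only the third metric, and the function names are hypothetical; how /canary measures this internally is not described here.

```shell
# Hypothetical sketch: classify an HTTP status code.
# 5xx and 000 (curl's code for "no response") count as failures.
page_health() {
  case "$1" in
    5??|000) echo fail ;;
    *)       echo ok ;;
  esac
}

# check_page URL: fetch only the status code, then classify it.
check_page() {
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")" || status=000
  page_health "$status"
}
```

Running `check_page https://staging.myapp.com` in a loop for the monitoring window would approximate the check by hand.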

Step 4 — Browser Automation in Practice

/browse is gstack's most distinctive capability: it runs a long-lived Chromium process that receives commands via a local HTTP API, with an average response time of ~100ms, and it supports true login state reuse.
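
The commands in this step are written as $B followed by a subcommand. The article does not define $B; the sketch below assumes it is a shell variable holding the path of the compiled browse binary from the global install in Step 1.

```shell
# Assumption: $B points at the browse binary built by ./setup (global install path).
B="$HOME/.claude/skills/gstack/browse/dist/browse"

if [ -x "$B" ]; then
  echo "browse binary found: $B"
else
  echo "browse binary not found at $B; run ./setup first"
fi
```

For a project-level installation, the path would start at .claude/skills/gstack inside the project instead.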

4.1 Basic Usage

# Open a page
$B goto https://example.com

# View interactive elements (auto-numbered @e1, @e2 ...)
$B snapshot -i

# Click the 3rd element
$B click @e3

# Fill out a form
$B fill @e1 "user@example.com"
$B fill @e2 "password123"

# Screenshot
$B screenshot /tmp/result.png

IMPORTANT

The first call auto-starts Chromium (~3 seconds), and subsequent commands take ~100-200ms. The browser auto-closes after 30 minutes of inactivity.

4.2 Verifying UI Changes

# Baseline snapshot
$B snapshot -i

# Perform action
$B click @e5

# Diff comparison (shows only what changed)
$B snapshot -D

The -D flag outputs in unified diff format, telling you which elements appeared, disappeared, or changed content after an operation.
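
The baseline, action, diff sequence can be wrapped in one function. This is a minimal sketch that uses only the subcommands shown above; the wrapper name diff_after_click is hypothetical, and $B is assumed to name the browse binary.

```shell
# Assumption: $B names the browse binary (default taken from the global install path).
B="${B:-$HOME/.claude/skills/gstack/browse/dist/browse}"

# Hypothetical wrapper: take a baseline, click one element, print the diff.
diff_after_click() {
  "$B" snapshot -i &&   # baseline snapshot
  "$B" click "$1" &&    # the action under test, e.g. @e5
  "$B" snapshot -D      # only what changed since the baseline
}
```

Calling `diff_after_click @e5` then reproduces the three commands above and stops early if any of them fails.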

4.3 Importing Cookies

Import cookies from a real browser:

# Interactive selector (macOS supports Chrome / Arc / Brave / Edge)
$B cookie-import-browser

# Specify domain directly
$B cookie-import-browser chrome --domain .github.com

WARNING

Cookie import relies on the macOS Keychain, so a system authorization dialog pops up the first time. gstack never accesses the Keychain silently: you must click "Allow" yourself. Cookie values are decrypted in memory and injected into the browser without touching the disk.

4.4 Responsive Testing

# Generate screenshots for mobile, tablet, and desktop simultaneously
$B goto https://yourapp.com
$B responsive /tmp/layout
# Output: layout-mobile.png, layout-tablet.png, layout-desktop.png

Step 5 — Team Collaboration and Advanced Usage

5.1 Vendored Installation Explained

Global installation is for personal use; Vendored installation allows team sharing:

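# Execute in project root (same command as the project-level install in Step 1)
cp -Rf ~/.claude/skills/gstack .claude/skills/gstack
rm -rf .claude/skills/gstack/.git   # Drop gstack's own history; your project's .git is untouched
cd .claude/skills/gstack
./setup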

Key points:

You are installing a snapshot, not a submodule
The .git directory is deleted, so it won't pollute your repo history
Upgrading: Each member reruns ./setup after pulling the latest

5.2 Parallel Sprints and Conductor

gstack's structure naturally supports parallelism. A single Claude Code instance is one "person," but multiple instances can run different sprint stages in parallel.

Conductor (officially recommended by gstack) can launch multiple Claude Code instances simultaneously, each running in an independent workspace:

Instance A: /office-hours re-defining the product
Instance B: Implementing a specific feature
Instance C: /review checking another branch
Instance D: /qa testing the staging environment

All instances share the same Git repository. Because each one works on its own branch in its own workspace, Git's branching model keeps their changes isolated until you merge.
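
One way to give each session its own workspace on a shared repository is Git worktrees. The sketch below is an illustration under that assumption; whether Conductor sets up workspaces this way is not stated here, and make_workspaces and the sprint/ branch prefix are hypothetical names.

```shell
# Hypothetical helper: create one Git worktree (own directory, own branch)
# per parallel session, next to the repository in <repo>-workspaces/.
make_workspaces() {
  repo="$(cd "$1" && pwd)" || return 1
  shift
  for name in "$@"; do
    git -C "$repo" worktree add -b "sprint/$name" "$repo-workspaces/$name" >/dev/null 2>&1 || return 1
  done
  git -C "$repo" worktree list
}
```

For example, `make_workspaces . review qa` creates two directories, each checked out on its own branch, so a Claude Code session started in one cannot edit files in the other.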

5.3 Safety Mode

Enable safety guardrails when handling production code:

/guard   # = /careful + /freeze

/careful: Dangerous commands (rm -rf, DROP TABLE, git push --force) will trigger confirmation pop-ups
/freeze: Limits file editing scope to specified directories to prevent accidental production code changes during debugging
/unfreeze: Lifts the freeze
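
The two guardrails boil down to a pattern check and a path check. The following is a simplified sketch of those ideas, not gstack's actual code; both function names are hypothetical, and the path check does no normalization (it will not catch `..` segments).

```shell
# Hypothetical sketches of the two checks; not gstack's actual code.

# is_dangerous CMD: succeed if CMD contains one of the patterns named above.
is_dangerous() {
  case "$1" in
    *"rm -rf"*|*"DROP TABLE"*|*"git push --force"*) return 0 ;;
    *) return 1 ;;
  esac
}

# in_scope DIR FILE: succeed if FILE lies under the allowed directory DIR.
in_scope() {
  case "$2" in
    "$1"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A guard in the spirit of /guard would ask for confirmation when `is_dangerous` succeeds and refuse an edit when `in_scope` fails.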

Troubleshooting FAQs

1. Skills Not Showing Up

# Go to the gstack directory and run setup manually
cd ~/.claude/skills/gstack && ./setup

If binary compilation fails, check your Bun version:

bun --version  # Needs v1.0+
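
The version requirement can also be checked in a script. This is a minimal sketch; meets_min_bun is a hypothetical helper, and it relies on the fact that "v1.0+" only constrains the major version number.

```shell
# Hypothetical helper: does a version string like "1.1.8" or "v1.0.0"
# satisfy the v1.0+ requirement? Only the major number matters for that.
meets_min_bun() {
  major="$(printf '%s\n' "${1#v}" | cut -d. -f1)"
  [ "$major" -ge 1 ] 2>/dev/null
}
```

A typical call is `meets_min_bun "$(bun --version)" || echo "upgrade Bun"`.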

For macOS/Linux installation issues:

# Ensure directory exists
mkdir -p ~/.claude/skills/

# Ensure execution permissions
chmod +x ~/.claude/skills/gstack/setup

2. `/browse` Fails to Start

# Manually rebuild the binary (./setup lives in the gstack root, not in browse/)
cd ~/.claude/skills/gstack
bun install
./setup

If the Bun version is too low:

# Upgrade Bun
curl -fsSL https://bun.sh/install | bash

3. Codex Error "Skipped loading skill(s) due to invalid SKILL.md"

Codex's skill descriptions are stale. Fix:

# For globally installed Codex
cd ~/.codex/skills/gstack && git pull && ./setup --host codex

# For Vendored installation of Codex
cd "$(readlink -f .agents/skills/gstack)" && git pull && ./setup --host codex

4. Windows Compatibility

gstack on Windows relies on WSL or Git Bash (not PowerShell). Ensure:

# Run in Git Bash or WSL
# Both Bun and Node must be in PATH
which bun   # Should have output
which node  # Should have output

Bun has a known bug when running Playwright on Windows (bun#4253); gstack automatically falls back to Node.js execution there.

5. Cookie Import on Linux and Windows

Cookie import is supported only on macOS (it uses the system Keychain). Linux and Windows are not currently supported.

If you need to test pages requiring login on Linux, you can manually export cookies as JSON:

[
  {"name": "session_token", "value": "...", "domain": ".example.com", "path": "/"}
]

Then use:

$B cookie-import /path/to/cookies.json

6. Version Upgrades

# Option 1: Run the upgrade skill
/gstack-upgrade

# Option 2: Manually pull latest code
cd ~/.claude/skills/gstack && git pull && ./setup

/gstack-upgrade detects whether the installation is global or vendored and upgrades it accordingly.
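
How /gstack-upgrade tells the two installation types apart is not described here. One plausible approach, shown purely as an assumption, is to look for the two directories used in Step 1; detect_install is a hypothetical name.

```shell
# Hypothetical sketch: guess the installation type from directory layout.
# A project-level copy takes precedence over the global one.
detect_install() {
  project_root="${1:-.}"
  if [ -d "$project_root/.claude/skills/gstack" ]; then
    echo vendored
  elif [ -d "$HOME/.claude/skills/gstack" ]; then
    echo global
  else
    echo none
  fi
}
```

Running `detect_install` from a project root prints `vendored`, `global`, or `none`, which tells you which directory to `git pull` and re-run `./setup` in when upgrading manually.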

Extended Reading / Advanced Directions

ETHOS.md — Core philosophy document of gstack. The "Boil the Lake" principle: AI makes the marginal cost of full implementation near zero; don't settle for "good enough." Also, "Search Before Building": A three-layer knowledge system—verified solutions (Layer 1), trending new solutions (Layer 2), and First Principles (Layer 3).

Autoplan — A single command to chain CEO Review → Design Review → Engineering Review. If you don't want to call skills one by one, use /autoplan; it runs all plans in order, popping up for confirmation at key decision points.

Conductor — gstack's official multi-session parallel management tool. Organize multiple Claude Code instances into a real AI engineering team, suitable for advancing multiple features simultaneously.

Source Code Reading Path:

browse/src/commands.ts — Registry for all browser commands, the single source of truth
browse/src/snapshot.ts — Core implementation of the Ref system (ARIA tree → Playwright Locator)
scripts/gen-skill-docs.ts — SKILL.md automatic generation pipeline
ARCHITECTURE.md — Complete architectural design document

TIP

gstack follows the SKILL.md standard and can be used with any compatible AI Agent (Codex, Cursor, etc.). If you use multiple AI programming tools, ~/.codex/skills/gstack and ~/.claude/skills/gstack can coexist without interference.