Medium difficulty | ~20 minutes | Learn the end-to-end workflow for turning any code repository into a queryable knowledge graph with graphify, replacing blind grep with structured code understanding and up to 71.5x token compression.
Project Overview
graphify is an AI coding assistant skill developed by Safi Shamsi, available on GitHub at safishamsi/graphify. Its goal is simple: compress any folder (code, notes, papers, images, even videos) into a single queryable knowledge graph, so every conversation you have with an AI assistant is grounded in structured knowledge rather than blind full-text matching.
It stands out in three ways:
- Multi-modal input: Supports code (20 languages), documents, images, PDFs, and audio/video—everything is ingested into the same graph
- Zero embedding clustering: Uses the Leiden community detection algorithm to cluster based on graph topology directly—no vector database, and no need to run separate embedding models
- Honest confidence levels: Each relationship edge is labeled EXTRACTED, INFERRED, or AMBIGUOUS, so you know which links were found directly in the source and which were inferred
graphify can be used as the /graphify command in popular AI coding assistants like Claude Code, OpenClaw, Codex, Cursor, Trae, and others. It can also be called independently via an MCP Server or the CLI.
Target Audience
You’ve worked on at least one mid-sized project (thousands of lines of code) and regularly use Claude Code, OpenClaw, Codex, or another AI coding assistant. When you open a new codebase, you’re used to having the assistant read the source directly, but as the number of files grows, token consumption and context fragmentation become increasingly painful. This article exists to solve that.
Core Dependencies & Environment
| Dependency | Minimum Version | Notes |
|---|---|---|
| Python | 3.10+ | graphify’s core language |
| AI coding assistant | Any | Claude Code / OpenClaw / Codex / Cursor, etc. |
| Git | Any | git hooks are required |
| pip | Latest | Used to install the PyPI package graphifyy |
TIP
The PyPI package name for graphify is graphifyy (with an extra y), while the CLI command is still graphify. Don’t mix them up—other packages with the graphify* prefix are unrelated to this project.
Full Project Structure
my-project/
├── graphify-out/
│ ├── graph.json # Persistent knowledge graph for cross-session querying
│ ├── GRAPH_REPORT.md # Structured analysis report (god nodes, unexpected connections, suggested questions)
│ ├── graph.html # Interactive graph (vis.js; open directly in the browser)
│ ├── cache/ # SHA256 cache directory
│ └── transcripts/ # Audio/video transcription cache (only if you install video dependencies)
├── .graphifyignore # Exclusion rules (same syntax as .gitignore)
└── .git/hooks/ # Automatically generated after graphify hook install
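Since .graphifyignore uses the same syntax as .gitignore, a minimal file might look like this (the patterns below are illustrative examples, not graphify defaults):

```
# Dependency and build output — no knowledge value
node_modules/
dist/
*.min.js
# graphify's own generated output
graphify-out/
```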
Step-by-Step
Step 1 — Install graphify
Installation has two parts: first install the Python package, then install the platform integration layer for your AI coding assistant.
# Step 1: Install the PyPI package
pip install graphifyy
# Step 2: Install the platform integration layer
# Example for Claude Code:
graphify install
# If you’re on another platform, use the corresponding command:
# graphify install --platform codex
# graphify install --platform opencode
# graphify install --platform claw # OpenClaw
# graphify install --platform aider
# graphify install --platform droid # Factory Droid
# graphify install --platform trae
# graphify install --platform cursor
# graphify install --platform gemini
# On Windows, if you run into issues, explicitly specify the platform:
graphify install --platform windows
WARNING
graphify install needs to write configuration files (e.g., Claude Code’s settings.json). Make sure the current directory or user directory is writable. If you run inside a Docker container, mount a persistent directory before installing.
After installation, your AI coding assistant will recognize the /graphify command.
Step 2 — First Run: Generate the Knowledge Graph
Go to any code directory you want to analyze and run:
# Analyze current directory
/graphify .
# Analyze a specific directory
/graphify ./src
# Analyze a directory and enable a more aggressive inference mode
/graphify ./src --mode deep
# Skip HTML visualization; generate only report and JSON (faster)
/graphify ./src --no-viz
graphify follows a three-step process:
- AST Extraction (no LLM) — uses tree-sitter to parse code files and extract classes, functions, import relationships, call graphs, docstrings, and explanatory comments
- Semantic Extraction (LLM call) — for documents, papers, images, etc., call Claude to extract concepts, relationships, and design motivations
- Graph Construction & Clustering — merge all extracted results and cluster them using the Leiden community detection algorithm (based on edge density; no embeddings needed)
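graphify’s AST stage uses tree-sitter across 20 languages. For Python files specifically, the idea can be sketched with the standard-library ast module — a simplified illustration of the extraction step, not graphify’s actual extractor:

```python
import ast

def extract_python(source: str):
    """Toy extractor: pull classes, functions, and imports from one file."""
    tree = ast.parse(source)
    nodes, edges = [], []
    for item in ast.walk(tree):
        if isinstance(item, (ast.FunctionDef, ast.ClassDef)):
            # Classes/functions become graph nodes, with docstrings attached
            nodes.append({"id": item.name, "doc": ast.get_docstring(item)})
        elif isinstance(item, ast.Import):
            # Imports become EXTRACTED-style edges
            for alias in item.names:
                edges.append({"relation": "imports", "target": alias.name})
    return nodes, edges

nodes, edges = extract_python("import json\nclass DigestAuth:\n    '''HTTP digest auth.'''\n")
```

Everything here comes straight from the parse tree, which is why this stage needs no LLM at all.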
When the run finishes, you’ll see the outputs in the graphify-out/ directory:
ls graphify-out/
# graph.json GRAPH_REPORT.md graph.html cache/
Step 3 — How to Use the Three Outputs
graphify-out/graph.json — the persisted graph itself
This is the core of the whole system. The format is a standard NetworkX JSON export:
{
"nodes": [
{
"id": "DigestAuth",
"label": "DigestAuth",
"source_file": "src/auth.py",
"source_location": "L42",
"community": "auth"
}
],
"edges": [
{
"source": "DigestAuth",
"target": "Response",
"relation": "imports",
"confidence": "EXTRACTED",
"confidence_score": 1.0
},
{
"source": "Attention",
"target": "Adam",
"relation": "semantically_similar_to",
"confidence": "INFERRED",
"confidence_score": 0.87
}
]
}
Every edge has a confidence label:
- EXTRACTED: relationships that exist directly in the source code (confidence 1.0)
- INFERRED: reasonable inferences, with a confidence_score (0.0–1.0)
- AMBIGUOUS: uncertain relationships, marked in the report for manual review
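These labels make machine triage straightforward. A stdlib-only sketch that splits edges into verified facts and a review queue (the sample edges below are hypothetical, in the edge format shown above):

```python
# Hypothetical sample in graphify's edge format; in practice,
# load graphify-out/graph.json instead.
graph = {"edges": [
    {"source": "DigestAuth", "target": "Response", "relation": "imports",
     "confidence": "EXTRACTED", "confidence_score": 1.0},
    {"source": "Attention", "target": "Adam", "relation": "semantically_similar_to",
     "confidence": "INFERRED", "confidence_score": 0.87},
]}

# Keep verified edges; queue everything else, weakest first, for review
verified = [e for e in graph["edges"] if e["confidence"] == "EXTRACTED"]
review = sorted((e for e in graph["edges"] if e["confidence"] != "EXTRACTED"),
                key=lambda e: e["confidence_score"])
```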
graphify-out/GRAPH_REPORT.md — a structured analysis report
This is the assistant-friendly summary, including:
- God Nodes: the most central core concepts (all other nodes must pass through them)
- Surprising Connections: unexpected cross-file/cross-type links, with an explanation of the “why”
- Suggested Questions: 4–5 questions that the graph can answer uniquely
- Design Rationale: design motivations extracted from docstrings and comments (# NOTE:, # WHY:, # HACK:)
graphify-out/graph.html — the interactive graph
Open it directly in your browser. It supports clicking nodes, searching, and filtering by community. Great for manual exploration of code structure.
Step 4 — Query the Graph: Three Main Commands
A graph isn’t enough—you need to be able to query it. graphify includes three query commands:
query — semantic query
# Most common: find nodes and paths related to your question
graphify query "what connects Attention to the optimizer?"
# Limit your Token budget
graphify query "show the auth flow" --budget 1500
# Use DFS to trace the specific path (not random sampling)
graphify query "what connects DigestAuth to Response?" --dfs
# Specify a non-default graph path
graphify query "..." --graph path/to/another-graph.json
The output includes node labels, edge types, confidence, source files, and source locations. You can paste this output directly to your AI assistant and ask it to answer your question:
Use this graph query output to answer the question. Prefer the graph
structure over guessing, and cite the source files when possible.
path — find a path between two nodes
# Trace the full path from node A to node B
graphify path "DigestAuth" "Response"
explain — explain a single node’s context
# Show a node’s neighbors, community, and related edges
graphify explain "SwinTransformer"
TIP
If your AI coding assistant supports MCP (Model Context Protocol), you can skip the CLI and use MCP so the AI can access the graph natively. See Step 8.
Step 5 — Integrate an Always-On Hook (Recommended)
Typing /graphify manually every time is a bit annoying. graphify supports automatically injecting graph knowledge into your AI assistant’s conversation context—so whenever you ask a question, it will prioritize the graph instead of blindly searching files.
Claude Code
graphify claude install
It does two things: writes a set of rules to CLAUDE.md (so Claude reads GRAPH_REPORT.md), and installs a PreToolUse hook in settings.json that injects graph prompts before every Glob and Grep call:
graphify: Knowledge graph exists. Read GRAPH_REPORT.md for god nodes
and community structure before searching raw files.
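For reference, a PreToolUse hook entry in Claude Code’s settings.json looks roughly like this. The field names follow Claude Code’s hook schema; the exact command graphify writes is an assumption here and may differ:

```
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Glob|Grep",
        "hooks": [
          {"type": "command", "command": "graphify-inject-prompt"}
        ]
      }
    ]
  }
}
```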
OpenClaw / Aider / Trae, etc.
These platforms don’t support tool hooks, so use AGENTS.md instead:
graphify claw install # OpenClaw
graphify aider install # Aider
graphify trae install # Trae
Cursor
graphify cursor install
Writes to .cursor/rules/graphify.mdc and sets alwaysApply: true. Cursor will automatically load it in every conversation.
To uninstall, use the corresponding uninstall commands:
graphify claude uninstall
graphify cursor uninstall
# ...
Step 6 — Incremental Updates: Stop Rebuilding Everything Every Time
Codebases change every day; rebuilding everything from scratch is too slow. graphify provides two incremental mechanisms:
--update: incremental extraction based on SHA256 cache
# Only process files that changed, merge them into the existing graph
/graphify ./src --update
The cache directory records a SHA256 content hash for each file from the last run; graphify compares hashes and reruns the AST + LLM extraction pipeline only for files that changed.
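Conceptually, the cache check is just content hashing — a minimal stdlib sketch (graphify’s actual cache layout may differ):

```python
import hashlib

def file_digest(data: bytes) -> str:
    """SHA256 of the file contents, hex-encoded."""
    return hashlib.sha256(data).hexdigest()

# Simulated previous run: path -> content hash
cache = {"src/auth.py": file_digest(b"class DigestAuth: ...")}

def changed(path: str, data: bytes) -> bool:
    """Re-extract only when the content hash differs from the cached one."""
    return cache.get(path) != file_digest(data)
```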
--watch: automatically monitor file changes
# Run in the background; when code files are saved, AST rebuild triggers immediately
/graphify ./src --watch
# After document/image changes, you’ll be prompted to manually run --update
Rebuilds triggered by code changes are pure AST, so they are fast and never call the LLM. Document and image changes still need LLM re-extraction, so --watch only notifies you to run --update manually.
Step 7 — Git Hooks: Rebuild the Graph on Every Commit
Don’t want to run it manually, and don’t want a background process either? Hand the rebuild over to git itself:
graphify hook install
This writes two hooks—post-commit and post-checkout—into .git/hooks/. Every time you commit or switch branches, git automatically triggers graph rebuilding. If the rebuild fails (AST parsing errors, LLM timeouts, etc.), the hook exits with a non-zero code to stop the git operation and prevent silent failures.
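Under the hood, a git hook is just an executable script in .git/hooks/. A simplified sketch of what an install step does — not graphify’s actual hook contents:

```python
import os
import stat
from pathlib import Path

def install_post_commit(repo: Path, command: str) -> Path:
    """Write an executable post-commit hook that exits non-zero on failure."""
    hook = repo / ".git" / "hooks" / "post-commit"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(f"#!/bin/sh\n{command} || exit 1\n")
    # Mark the script executable so git will run it
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR)
    return hook
```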
Check status and uninstall:
graphify hook status
graphify hook uninstall
Step 8 — Advanced Usage
MCP Server: Native graph access by the AI
If your AI assistant supports the MCP protocol, expose the graph as an MCP tool:
python -m graphify.serve graphify-out/graph.json
The output is an MCP stdio configuration—paste it into your AI assistant’s MCP configuration. The exposed tools include:
- query_graph: semantic query over a subgraph
- get_node: fetch node details
- get_neighbors: fetch neighbor nodes
- shortest_path: find the shortest path between two nodes
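A shortest-path lookup maps to a plain breadth-first search over graph.json’s edge list. A stdlib sketch over the edge format shown earlier (sample edges are hypothetical):

```python
from collections import deque

edges = [
    {"source": "DigestAuth", "target": "Response"},
    {"source": "Response", "target": "Session"},
]

def shortest_path(edges, start, goal):
    """BFS over the edge list, treating edges as undirected."""
    adj = {}
    for e in edges:
        adj.setdefault(e["source"], set()).add(e["target"])
        adj.setdefault(e["target"], set()).add(e["source"])
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], set()) - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no connection between the two nodes
```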
Neo4j Export
Want to import the graph into Neo4j for more advanced graph analytics?
# Generate Cypher scripts (manual import)
/graphify ./src --neo4j
# Or push directly to a running Neo4j instance
/graphify ./src --neo4j-push bolt://localhost:7687
Generate Wiki Pages Readable by Agents
# Export to a Markdown wiki so AI agents can navigate the knowledge base by reading files
/graphify ./src --wiki
Generates graphify-out/wiki/index.md (the entry point) + one Markdown article for each community and each god node.
Supported File Types at a Glance
| Type | Extensions | Extraction Method |
|---|---|---|
| Code | .py .ts .js .go .rs .java .c .cpp .rb .cs .kt .scala .php .swift .lua .zig .ps1 .ex .m .jl | tree-sitter AST |
| Documents | .md .txt .rst | Claude semantic extraction |
| Images | .png .jpg .webp | Claude Vision |
| PDFs | .pdf | Claude extraction (requires pip install graphifyy[pdf]) |
| Audio/Video | .mp4 .mp3 .wav | Local Whisper transcription (requires pip install graphifyy[video]) |
| Video URL | YouTube, etc. | yt-dlp downloads audio → Whisper transcribes |
Specify Alternative LLM Providers
graphify uses the model built into your platform by default (Claude Code uses Anthropic; Codex uses OpenAI). If you want to force a different model, specify it via environment variables (refer to the corresponding skill.md file).
Troubleshooting
1. After installation, graphify command is not found
Usually it’s a Python environment or PATH issue:
# Verify the package install location
pip show graphifyy
# Use python -m (doesn’t rely on PATH)
python -m graphify --help
# Or install with pipx in isolation (recommended)
pip install pipx && pipx install graphifyy
2. Very few or almost no graph nodes
Check whether .graphifyignore accidentally excludes your target files:
# See which files graphify actually scanned
/graphify ./src --no-viz # skip HTML generation for a faster pass
# Check .graphifyignore syntax (identical to .gitignore)
cat .graphifyignore
It could also be that your file types aren’t in the supported list—confirm that your code file extensions are included in the “Supported File Types” table above.
3. LLM extraction stage times out or returns an API error
This is usually due to network issues or missing/misconfigured API keys:
# Confirm the API key exists (Claude Code users are often auto-configured)
# For other platforms, check the corresponding platform’s API key environment variable
# Use --budget to limit Tokens and avoid overly large single requests
/graphify ./src --budget 2000
# After a timeout, graphify skips the failed file and continues with the rest (one failure won't block the whole run)
4. Incremental updates produce an inconsistent graph due to cache
Your SHA256 cache may be corrupted, or files were modified externally:
# Clear cache and force a full rebuild
rm -rf graphify-out/cache
/graphify ./src
# Or clear only the cache for specific files
rm graphify-out/cache/<corresponding-hash>.json
/graphify ./src --update
5. Claude Code PreToolUse hook doesn’t take effect
Typically due to settings.json path or formatting issues:
# Check whether hooks were written successfully
cat ~/.claude/settings.json | grep graphify
# If the format was modified by your IDE, manually add back the graphify hook section
# Refer to the PreToolUse hook format in graphify skill.md
6. --watch reports missing watchdog
watchdog is an optional dependency:
pip install graphifyy[watch]
Further Reading / Advanced Directions
1. Custom Extractor: Add support for a new language
If your codebase uses a language graphify doesn’t support yet (e.g., Racket or Nim), you can add a new extract_<lang> function in extract.py. Implement it based on tree-sitter AST parsing and follow the “Adding a new language extractor” section in ARCHITECTURE.md. The workflow is clear: write the function → register the suffix → add dependencies → write tests.
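As an illustration only (the real signature and registration mechanism are defined in ARCHITECTURE.md), an extractor is essentially a function from source text to nodes and edges. A toy regex-based sketch for a hypothetical Lua extractor — graphify’s real ones are tree-sitter based:

```python
import re

def extract_lua(source: str):
    """Toy extractor: collect function names and require() imports."""
    nodes = [{"id": name, "kind": "function"}
             for name in re.findall(r"^function\s+([\w.]+)", source, re.M)]
    edges = [{"relation": "imports", "target": mod}
             for mod in re.findall(r'require\s*\(\s*["\']([\w.]+)["\']\s*\)', source)]
    return nodes, edges
```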
2. GraphRAG Pipeline: Connect graph queries to RAG
graphify’s core value is that queries don’t require re-reading the original files: token usage drops from O(n × file size) to O(subgraph size). When integrating with a RAG pipeline, a suggested approach is:
import json

# 1. Use GRAPH_REPORT.md for high-level intent assessment
# 2. Then use graphify query to fetch the relevant subgraph
# 3. Feed the subgraph as context to the LLM
with open("graphify-out/graph.json") as f:
    graph = json.load(f)

# graph["nodes"] and graph["edges"] are the context; for example,
# keep only verified edges before handing them to the model:
context = [e for e in graph["edges"] if e["confidence"] == "EXTRACTED"]
3. Penpax: the on-device digital twin vision from graphify’s author
graphify’s long-term roadmap is Penpax, an on-device digital twin project that connects meeting notes, browser history, files, emails, and code into a continuously updated knowledge graph, with no cloud upload and no use of your data to train a model.
4. Multi-modal expansion: Turn video courses into a graph
If you have technical talks, conference videos, or podcast audio, you can install the video extra to ingest them into the knowledge graph:
pip install 'graphifyy[video]'
/graphify ./corpus --whisper-model medium
Whisper runs locally; audio never leaves your machine. Transcription results are cached in graphify-out/transcripts/, and repeated runs will read from the cache.
5. Graph collaboration: Share knowledge structure across a team
graphify outputs (graph.json, GRAPH_REPORT.md) are plain text, so they can be committed to git. After team members pull the changes, graphify hook install will automatically reuse the same graph. This means architectural knowledge becomes versionable; during code review, you can reference graph nodes, and in PR descriptions you can directly cite god nodes. From then on, your team’s understanding no longer depends on “who asks whom.”