Graphify in Practice: Turn Any Codebase into a Queryable Knowledge Graph

April 12, 2026

Medium difficulty | ~20 minutes | You’ll learn the end-to-end workflow to transform any code repository into a queryable knowledge graph with graphify—no more blind grep. Achieve 71.5x token compression with structured code understanding.

Project Overview

graphify is an AI coding assistant skill developed by Safi Shamsi, available on GitHub at safishamsi/graphify. Its core goal is very simple: compress any folder (code, notes, papers, images, even videos) into a single queryable knowledge graph, so every conversation you have with an AI assistant is grounded in structured knowledge—not blind full-text matching.

It stands out in three ways:

  • Multi-modal input: Supports code (20 languages), documents, images, PDFs, and audio/video—everything is ingested into the same graph
  • Zero embedding clustering: Uses the Leiden community detection algorithm to cluster based on graph topology directly—no vector database, and no need to run separate embedding models
  • Honest confidence levels: Each relationship edge is labeled as EXTRACTED / INFERRED / AMBIGUOUS, so you know which links are found vs. which are inferred

graphify can be used as the /graphify command in popular AI coding assistants like Claude Code, OpenClaw, Codex, Cursor, Trae, and others. It can also be called independently via an MCP Server or the CLI.

Target Audience

You’ve worked on at least one mid-sized project (thousands of lines of code) and regularly use Claude Code / OpenClaw / Codex or other AI coding assistants. When you encounter a new codebase, you’re used to having the assistant read the source directly—but as the number of files grows, token consumption and context fragmentation become increasingly obvious. This article exists to solve that.

Core Dependencies & Environment

| Dependency | Minimum Version | Notes |
| --- | --- | --- |
| Python | 3.10+ | graphify’s core language |
| AI coding assistant | Any | Claude Code / OpenClaw / Codex / Cursor, etc. |
| Git | Any | git hooks are required |
| pip | Latest | Used to install the PyPI package graphifyy |

TIP

The PyPI package name for graphify is graphifyy (with an extra y), while the CLI command is still graphify. Don’t mix them up—other packages with the graphify* prefix are unrelated to this project.

Full Project Structure

my-project/
├── graphify-out/
│   ├── graph.json          # Persistent knowledge graph for cross-session querying
│   ├── GRAPH_REPORT.md     # Structured analysis report (god nodes, unexpected connections, suggested questions)
│   ├── graph.html          # Interactive graph (vis.js; open directly in the browser)
│   ├── cache/              # SHA256 cache directory
│   └── transcripts/        # Audio/video transcription cache (only if you install video dependencies)
├── .graphifyignore         # Exclusion rules (same syntax as .gitignore)
└── .git/hooks/            # Automatically generated after graphify hook install

Step-by-Step

Step 1 — Install graphify

graphify has two parts: install the Python package, then install the platform integration layer for your AI coding assistant.

# Step 1: Install the PyPI package
pip install graphifyy

# Step 2: Install the platform integration layer
# Example for Claude Code:
graphify install

# If you’re on another platform, use the corresponding command:
# graphify install --platform codex
# graphify install --platform opencode
# graphify install --platform claw        # OpenClaw
# graphify install --platform aider
# graphify install --platform droid       # Factory Droid
# graphify install --platform trae
# graphify install --platform cursor
# graphify install --platform gemini

# On Windows, if you run into issues, explicitly specify the platform:
graphify install --platform windows

WARNING

graphify install needs to write configuration files (e.g., Claude Code’s settings.json). Make sure the current directory or user directory is writable. If you run inside a Docker container, mount a persistent directory before installing.

After installation, your AI coding assistant will recognize the /graphify command.

Step 2 — First Run: Generate the Knowledge Graph

Go to any code directory you want to analyze and run:

# Analyze current directory
/graphify .

# Analyze a specific directory
/graphify ./src

# Analyze a directory and enable a more aggressive inference mode
/graphify ./src --mode deep

# Skip HTML visualization; generate only report and JSON (faster)
/graphify ./src --no-viz

graphify follows a three-step process:

  1. AST Extraction (no LLM) — uses tree-sitter to parse code files and extract classes, functions, import relationships, call graphs, docstrings, and explanatory comments
  2. Semantic Extraction (LLM call) — for documents, papers, images, etc., call Claude to extract concepts, relationships, and design motivations
  3. Graph Construction & Clustering — merge all extracted results and cluster them using the Leiden community detection algorithm (based on edge density; no embeddings needed)
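The first stage can be sketched with Python’s built-in ast module for Python files alone (an illustration of the idea only; graphify itself uses tree-sitter and covers 20 languages):

```python
import ast

def extract_entities(source: str) -> dict:
    """Pull classes, functions, and imports from Python source,
    roughly what an AST-extraction pass collects per file."""
    tree = ast.parse(source)
    entities = {"classes": [], "functions": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            entities["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            entities["imports"].extend(a.name for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            entities["imports"].append(node.module or "")
    return entities

sample = "import json\nclass DigestAuth:\n    def build(self):\n        pass\n"
print(extract_entities(sample))
# → {'classes': ['DigestAuth'], 'functions': ['build'], 'imports': ['json']}
```

The extracted names become graph nodes; import and call relationships become the EXTRACTED edges described below.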

When the run finishes, you’ll see the three main outputs (plus a cache directory) in graphify-out/:

ls graphify-out/
# graph.json  GRAPH_REPORT.md  graph.html  cache/

Step 3 — How to Use the Three Outputs

graphify-out/graph.json — the persisted graph itself

This is the core of the whole system. The format is a standard NetworkX JSON export:

{
  "nodes": [
    {
      "id": "DigestAuth",
      "label": "DigestAuth",
      "source_file": "src/auth.py",
      "source_location": "L42",
      "community": "auth"
    }
  ],
  "edges": [
    {
      "source": "DigestAuth",
      "target": "Response",
      "relation": "imports",
      "confidence": "EXTRACTED",
      "confidence_score": 1.0
    },
    {
      "source": "Attention",
      "target": "Adam",
      "relation": "semantically_similar_to",
      "confidence": "INFERRED",
      "confidence_score": 0.87
    }
  ]
}

Every edge has a confidence label:

  • EXTRACTED: relationships that exist directly in the source code (confidence 1.0)
  • INFERRED: reasonable inferences, with a confidence_score (0.0–1.0)
  • AMBIGUOUS: uncertain relationships, marked in the report for manual review
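Since every edge carries a confidence label, you can partition a loaded graph into high-certainty and needs-review sets with plain Python (field names match the graph.json example above; the 0.9 threshold is an arbitrary choice for illustration):

```python
import json

graph = {
    "edges": [
        {"source": "DigestAuth", "target": "Response",
         "relation": "imports", "confidence": "EXTRACTED", "confidence_score": 1.0},
        {"source": "Attention", "target": "Adam",
         "relation": "semantically_similar_to", "confidence": "INFERRED",
         "confidence_score": 0.87},
    ]
}
# In practice: graph = json.load(open("graphify-out/graph.json"))

# Relationships literally present in the source code
extracted = [e for e in graph["edges"] if e["confidence"] == "EXTRACTED"]
# Inferred edges below the threshold deserve manual review
review = [e for e in graph["edges"]
          if e["confidence"] != "EXTRACTED" and e["confidence_score"] < 0.9]

print(len(extracted), len(review))  # counts per bucket
```

This kind of filtering is useful before feeding a subgraph to an LLM: you can choose to send only EXTRACTED edges when precision matters.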

graphify-out/GRAPH_REPORT.md — a structured analysis report

This is the assistant-friendly summary, including:

  • God Nodes: the most central core concepts (most paths in the graph run through them)
  • Surprising Connections: unexpected cross-file/cross-type links, with an explanation of the “why”
  • Suggested Questions: 4–5 questions that the graph can answer uniquely
  • Design Rationale: design motivations extracted from docstrings and comments (# NOTE:, # WHY:, # HACK:)
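The rationale tags above can be harvested with a simple regex (a rough sketch of the idea, not necessarily graphify’s actual parser):

```python
import re

TAG_RE = re.compile(r"#\s*(NOTE|WHY|HACK):\s*(.+)")

def rationale_comments(source: str) -> list[tuple[str, str]]:
    """Collect (tag, text) pairs from # NOTE: / # WHY: / # HACK: comments."""
    return TAG_RE.findall(source)

code = (
    "timeout = 30  # WHY: upstream LB kills idle conns at 60s\n"
    "retry()       # HACK: works around flaky DNS in CI\n"
)
print(rationale_comments(code))
```

Tagging design decisions this way in your own code makes them land in the report automatically.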

graphify-out/graph.html — the interactive graph

Open it directly in your browser. It supports clicking nodes, searching, and filtering by community. Great for manual exploration of code structure.

Step 4 — Query the Graph: Three Main Commands

A graph isn’t enough—you need to be able to query it. graphify includes three query commands:

query — semantic query

# Most common: find nodes and paths related to your question
graphify query "what connects Attention to the optimizer?"

# Limit your Token budget
graphify query "show the auth flow" --budget 1500

# Use DFS to trace the specific path (not random sampling)
graphify query "what connects DigestAuth to Response?" --dfs

# Specify a non-default graph path
graphify query "..." --graph path/to/another-graph.json

The output includes node labels, edge types, confidence, source files, and source locations. You can paste this output directly to your AI assistant and ask it to answer your question:

Use this graph query output to answer the question. Prefer the graph
structure over guessing, and cite the source files when possible.

path — find a path between two nodes

# Trace the full path from node A to node B
graphify path "DigestAuth" "Response"

explain — explain a single node’s context

# Show a node’s neighbors, community, and related edges
graphify explain "SwinTransformer"
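Conceptually, path is a graph traversal over graph.json; the same lookup can be reproduced by hand with a breadth-first search (the edge list here is illustrative, including a hypothetical HTTPAuth intermediate node):

```python
import json
from collections import deque

edges = [
    {"source": "DigestAuth", "target": "HTTPAuth", "relation": "inherits"},
    {"source": "HTTPAuth", "target": "Response", "relation": "imports"},
]
# In practice: edges = json.load(open("graphify-out/graph.json"))["edges"]

def find_path(edges, start, goal):
    """BFS over the edge list; returns node ids from start to goal, or None."""
    adj = {}
    for e in edges:
        adj.setdefault(e["source"], []).append(e["target"])
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_path(edges, "DigestAuth", "Response"))
# → ['DigestAuth', 'HTTPAuth', 'Response']
```

The CLI command does this for you and additionally annotates each hop with its relation and confidence.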

TIP

If your AI coding assistant supports MCP (Model Context Protocol), you can skip the CLI and use MCP so the AI can access the graph natively. See Step 8.

Step 5 — Auto-Inject Graph Knowledge into the Conversation

Typing /graphify manually every time is a bit annoying. graphify supports automatically injecting graph knowledge into your AI assistant’s conversation context—so whenever you ask a question, it will consult the graph first instead of blindly searching files.

Claude Code

graphify claude install

It does two things: writes a set of rules to CLAUDE.md (so Claude reads GRAPH_REPORT.md), and installs a PreToolUse hook in settings.json that injects graph prompts before every Glob and Grep call:

graphify: Knowledge graph exists. Read GRAPH_REPORT.md for god nodes
and community structure before searching raw files.

OpenClaw / Aider / Trae, etc.

These platforms don’t support tool hooks, so use AGENTS.md instead:

graphify claw install   # OpenClaw
graphify aider install  # Aider
graphify trae install   # Trae

Cursor

graphify cursor install

Writes to .cursor/rules/graphify.mdc and sets alwaysApply: true. Cursor will automatically load it in every conversation.

To uninstall, use the corresponding uninstall commands:

graphify claude uninstall
graphify cursor uninstall
# ...

Step 6 — Incremental Updates: Stop Rebuilding Everything Every Time

Codebases change every day; rebuilding everything from scratch is too slow. graphify provides two incremental mechanisms:

--update: incremental extraction based on SHA256 cache

# Only process files that changed, merge them into the existing graph
/graphify ./src --update

In the cache directory, file-content hashes record the results from the last run. graphify compares file hashes and reruns the AST + LLM extraction pipeline only for changed files.
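That change-detection loop can be modeled in a few lines of stdlib Python (a simplified model of the mechanism, not graphify’s exact on-disk cache format):

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def changed_files(files: dict[str, bytes], cache: dict[str, str]) -> list[str]:
    """files maps path -> current content; cache maps path -> last-seen hash.
    Returns paths that are new or whose content changed since the last run."""
    return [path for path, data in files.items()
            if cache.get(path) != content_hash(data)]

cache = {"auth.py": content_hash(b"old contents")}
files = {"auth.py": b"new contents", "utils.py": b"unchanged"}  # utils.py not cached yet
print(changed_files(files, cache))  # both need (re)extraction
```

Only the files this returns go back through the AST + LLM pipeline; everything else is reused from the cache.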

--watch: automatically monitor file changes

# Run in the background; when code files are saved, AST rebuild triggers immediately
/graphify ./src --watch

# After document/image changes, you’ll be prompted to manually run --update

Rebuild triggered by code changes is pure AST, so it’s very fast and does not require calling the LLM. Document and image changes still need LLM re-extraction, so watch will notify you to run --update manually.

Step 7 — Git Hooks: Rebuild the Graph on Every Commit

Don’t want to run it manually, and don’t want a background process either? Hand the rebuild over to git itself:

graphify hook install

This writes two hooks—post-commit and post-checkout—into .git/hooks/. Every time you commit or switch branches, git automatically triggers a graph rebuild. If the rebuild fails (AST parsing errors, LLM timeouts, etc.), the hook exits with a non-zero code so the failure is surfaced instead of passing silently.

Check status and uninstall:

graphify hook status
graphify hook uninstall

Step 8 — Advanced Usage

MCP Server: Native graph access by the AI

If your AI assistant supports the MCP protocol, expose the graph as an MCP tool:

python -m graphify.serve graphify-out/graph.json

The output is an MCP stdio configuration—paste it into your AI assistant’s MCP configuration. The exposed tools include:

  • query_graph: semantic query subgraph
  • get_node: fetch node details
  • get_neighbors: fetch neighbor nodes
  • shortest_path: find the shortest path between two nodes

Neo4j Export

Want to import the graph into Neo4j for more advanced graph analytics?

# Generate Cypher scripts (manual import)
/graphify ./src --neo4j

# Or push directly to a running Neo4j instance
/graphify ./src --neo4j-push bolt://localhost:7687

Generate Wiki Pages Readable by Agents

# Export to a Markdown wiki so AI agents can navigate the knowledge base by reading files
/graphify ./src --wiki

Generates graphify-out/wiki/index.md (the entry point) + one Markdown article for each community and each god node.

Supported File Types at a Glance

| Type | Extensions | Extraction Method |
| --- | --- | --- |
| Code | .py .ts .js .go .rs .java .c .cpp .rb .cs .kt .scala .php .swift .lua .zig .ps1 .ex .m .jl | tree-sitter AST |
| Documents | .md .txt .rst | Claude semantic extraction |
| Images | .png .jpg .webp | Claude Vision |
| PDFs | .pdf | Claude extraction (requires pip install graphifyy[pdf]) |
| Audio/Video | .mp4 .mp3 .wav | Local Whisper transcription (requires pip install graphifyy[video]) |
| Video URL | YouTube, etc. | yt-dlp downloads audio → Whisper transcribes |

Specify Alternative LLM Providers

graphify uses the model built into your platform by default (Claude Code uses Anthropic; Codex uses OpenAI). If you want to force a different model, specify it via environment variables (refer to the corresponding skill.md file).

Troubleshooting

1. After installation, the graphify command is not found

Usually it’s a Python environment or PATH issue:

# Verify the package install location
pip show graphifyy

# Use python -m (doesn’t rely on PATH)
python -m graphify --help

# Or install with pipx in isolation (recommended)
pip install pipx && pipx install graphifyy

2. Very few or almost no graph nodes

Check whether .graphifyignore accidentally excludes your target files:

# See which files graphify actually scanned
/graphify ./src --no-viz  # --no-viz skips the HTML visualization, so the scan finishes faster

# Check .graphifyignore syntax (identical to .gitignore)
cat .graphifyignore

It could also be that your file types aren’t in the supported list—confirm that your code file extensions are included in the “Supported File Types” table above.
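If you want to sanity-check a pattern before blaming graphify, fnmatch approximates this style of matching (a rough approximation only; real .gitignore semantics add anchoring and negation rules, and the patterns here are hypothetical):

```python
from fnmatch import fnmatch

patterns = ["*.min.js", "build/*", "docs/**"]  # hypothetical .graphifyignore lines

def is_ignored(path: str, patterns: list[str]) -> bool:
    """True if any ignore pattern matches the given path."""
    return any(fnmatch(path, p) for p in patterns)

for p in ["src/auth.py", "build/out.o", "app.min.js"]:
    print(p, is_ignored(p, patterns))
```

An overly broad pattern like `build/*` silently matching your source tree is a common cause of near-empty graphs.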

3. LLM extraction stage times out or returns an API error

This is usually due to network issues or missing/misconfigured API keys:

# Confirm the API key exists (Claude Code users are often auto-configured)
# For other platforms, check the corresponding platform’s API key environment variable

# Use --budget to limit tokens and avoid overly large single requests
/graphify ./src --budget 2000

# On timeout, graphify skips the file and continues with the others (a single failure won’t block the whole run)

4. Incremental updates produce an inconsistent graph due to cache

Your SHA256 cache may be corrupted, or files were modified externally:

# Clear cache and force a full rebuild
rm -rf graphify-out/cache
/graphify ./src

# Or clear only the cache for specific files
rm graphify-out/cache/<corresponding-hash>.json
/graphify ./src --update

5. Claude Code PreToolUse hook doesn’t take effect

Typically due to settings.json path or formatting issues:

# Check whether hooks were written successfully
cat ~/.claude/settings.json | grep graphify

# If the format was modified by your IDE, manually add back the graphify hook section
# Refer to the PreToolUse hook format in graphify skill.md

6. --watch reports missing watchdog

watchdog is an optional dependency:

pip install graphifyy[watch]

Further Reading / Advanced Directions

1. Custom Extractor: Add support for a new language

If your codebase uses a language graphify doesn’t support yet (e.g., Racket or Nim), you can add a new extract_<lang> function in extract.py. Implement it based on tree-sitter AST parsing and follow the “Adding a new language extractor” section in ARCHITECTURE.md. The workflow is clear: write the function → register the suffix → add dependencies → write tests.

2. GraphRAG Pipeline: Connect graph queries to RAG

graphify’s core value is that queries don’t require reading the original files again—token usage drops from O(n×file size) to O(subgraph size). When integrating with a RAG pipeline, a suggested approach is:

import json

# 1. Use GRAPH_REPORT.md for high-level intent assessment
# 2. Then fetch the subgraph relevant to the question
# 3. Feed the serialized subgraph to the LLM as context

with open("graphify-out/graph.json") as f:
    graph = json.load(f)

def subgraph_context(graph, node_id):
    """Collect the edges touching node_id and format them as LLM context."""
    edges = [e for e in graph["edges"]
             if node_id in (e["source"], e["target"])]
    return "\n".join(
        f'{e["source"]} -[{e["relation"]}]-> {e["target"]} ({e["confidence"]})'
        for e in edges)

context = subgraph_context(graph, "DigestAuth")  # paste into the prompt

3. Penpax: the on-device digital twin vision from graphify’s author

graphify’s long-term roadmap is Penpax—an on-device digital twin project that connects meeting notes, browser history, files, emails, and code into a continuously updated knowledge graph, with no cloud upload and no use of your data to train a model.

4. Multi-modal expansion: Turn video courses into a graph

If you have technical talks, conference videos, or podcast audio, you can use the --video dependency to ingest them into the knowledge graph:

pip install 'graphifyy[video]'
/graphify ./corpus --whisper-model medium

Whisper runs locally; audio never leaves your machine. Transcription results are cached in graphify-out/transcripts/, and repeated runs will read from the cache.

5. Graph collaboration: Share knowledge structure across a team

graphify outputs (graph.json, GRAPH_REPORT.md) are plain text, so they can be committed to git. After team members pull the changes, graphify hook install will automatically reuse the same graph. This means architectural knowledge becomes versionable; during code review, you can reference graph nodes, and in PR descriptions you can directly cite god nodes. From then on, your team’s understanding no longer depends on “who asks whom.”
