MemPalace Deployment & Hands-On: Build an AI Memory System You’ll Never Forget

April 8, 2026

TIP

MemPalace was co-developed by actress Milla Jovovich (of The Fifth Element) and her technical partner Ben Sigman. It is a fully local AI memory system: on the LongMemEval benchmark it achieves 96.6% R@5 in raw verbatim mode, with no external API calls at any point, completely free. GitHub: milla-jovovich/mempalace

Intermediate · About 20 minutes · You’ll master the full MemPalace deployment workflow, understand the Palace layer structure, and learn how to connect the memory system to any AI using MCP.


Target Audience

Developers with 1–3 years of experience who have used AI coding tools such as Claude Code, Cursor, or Copilot, and who want their AI to retain contextual memory across long-running projects instead of starting from scratch every time.


Core Dependencies & Environment

  • Python: 3.9+
  • Core dependencies: chromadb>=0.5.0,<0.7, pyyaml>=6.0
  • Install: pip install mempalace or uv pip install mempalace
  • Operating System: macOS / Linux / Windows (WSL works too)
  • Storage: ChromaDB (vector store) + SQLite (knowledge graph), all stored locally—no network required

WARNING

MemPalace is designed to run purely locally; your data never leaves your machine. The first install does fetch ChromaDB from pip, but once installed, ChromaDB itself requires no internet access. Just confirm that it imports cleanly.


Full Project Structure

mempalace/
├── mempalace/                 # Core Python package
│   ├── cli.py                 # CLI entry point, routes to mine/search/init, etc.
│   ├── mcp_server.py          # MCP server, exposes 19 tools
│   ├── knowledge_graph.py     # Temporal knowledge graph (SQLite)
│   ├── palace_graph.py        # Palace navigation graph (BFS traversal, tunnel discovery)
│   ├── convo_miner.py         # Conversation mining, split by Q+A
│   ├── miner.py               # Project file mining, split by paragraphs
│   ├── searcher.py            # Semantic search (ChromaDB)
│   ├── normalize.py           # Normalize 5 dialogue formats
│   ├── dialect.py             # AAAK compressed dialect
│   ├── layers.py              # Four-layer memory stack (L0–L3)
│   ├── onboarding.py          # Initialization onboarding
│   ├── entity_detector.py     # Automatically detects person/project names
│   └── split_mega_files.py    # Split/merge large session files
├── hooks/                     # Claude Code auto-save hooks
│   ├── mempal_save_hook.sh     # Auto-save every 15 messages
│   └── mempal_precompact_hook.sh  # Emergency save before context compression
├── benchmarks/               # Reproducible benchmarks (LongMemEval / LoCoMo)
│   ├── longmemeval_bench.py
│   ├── locomo_bench.py
│   └── BENCHMARKS.md
└── examples/
    ├── basic_mining.py
    └── mcp_setup.md

Step-by-Step

Step 1: Install

pip install mempalace

The minimum Python version is 3.9. After installation, confirm it works:

mempalace --version

TIP

If you use uv, run uv pip install mempalace; the result is identical.


Step 2: Initialize the Memory Palace

mempalace init ~/projects/myapp

The init command starts the guided flow and asks you, in order:

  • The people you often collaborate with (add them to the wing config)
  • The project you’re working on (one wing per project)
  • Your AI identity (writes to the L0 layer)

After onboarding, it generates two configuration files:

  • ~/.mempalace/config.json — global config (palace path, etc.)
  • ~/.mempalace/wing_config.json — wing and keyword mapping

The generated wing_config.json looks roughly like this:

{
  "default_wing": "wing_general",
  "wings": {
    "wing_kai": { "type": "person", "keywords": ["kai", "kai's"] },
    "wing_driftwood": { "type": "project", "keywords": ["driftwood", "analytics", "saas"] }
  }
}
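
To illustrate how this keyword mapping could drive routing, here is a sketch whose config keys mirror the wing_config.json above; route_to_wing is a hypothetical helper, not MemPalace's actual routing code.

```python
# Hypothetical helper: shows how the keyword mapping above could route a
# memory to a wing. Not MemPalace's actual routing code.
WING_CONFIG = {
    "default_wing": "wing_general",
    "wings": {
        "wing_kai": {"type": "person", "keywords": ["kai", "kai's"]},
        "wing_driftwood": {"type": "project", "keywords": ["driftwood", "analytics", "saas"]},
    },
}

def route_to_wing(text: str, config: dict) -> str:
    """Return the first wing whose keyword appears in the text."""
    lowered = text.lower()
    for wing, spec in config["wings"].items():
        if any(kw in lowered for kw in spec["keywords"]):
            return wing
    return config["default_wing"]

print(route_to_wing("Kai pushed the auth fix", WING_CONFIG))        # wing_kai
print(route_to_wing("revenue dashboard for the SaaS", WING_CONFIG)) # wing_driftwood
print(route_to_wing("an unrelated note", WING_CONFIG))              # wing_general
```

Anything that matches no wing keyword falls through to default_wing, which is why the onboarding flow asks about people and projects up front.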

Every time the AI starts, it only loads L0 + L1 (about 170 tokens), and it already knows what your world looks like.


Step 3: Mine Data

MemPalace supports two mining modes—choose based on your data source.

Mode A: Mine Project Files (code, documents, notes)

mempalace mine ~/projects/myapp

The miner recursively scans the directory, splits content by paragraphs, stores it in ChromaDB, and keeps the original content in the Drawer.
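
The paragraph split can be sketched as below; this is a minimal stand-in for the real miner, which also attaches metadata and writes each chunk into ChromaDB.

```python
# Minimal stand-in for the paragraph split described above; the real
# miner also attaches metadata and writes each chunk into ChromaDB.
def split_paragraphs(text: str, min_len: int = 20) -> list[str]:
    """Split on blank lines and drop fragments too short to be useful."""
    chunks = [p.strip() for p in text.split("\n\n")]
    return [p for p in chunks if len(p) >= min_len]

doc = (
    "First paragraph about the auth migration plan.\n\n"
    "ok\n\n"
    "Second paragraph explaining why we chose JWT over sessions."
)
print(split_paragraphs(doc))  # two chunks; the two-character fragment is dropped
```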

Mode B: Mine Conversation Exports (Claude/ChatGPT/Slack)

# Basic usage
mempalace mine ~/chats/ --mode convos

# Specify a wing to make later project filtering easier
mempalace mine ~/chats/ --mode convos --wing myapp

# Enable auto-classification (extract decision-making, preferences, milestones, questions, and emotional context)
mempalace mine ~/chats/ --mode convos --extract general

The convo_miner splits conversations into Q+A pairs and automatically detects which room each pair belongs to (via 70+ matching patterns in room_detector_local.py; no API needed).
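
The Q+A split can be sketched as follows; the message shape (role/content dicts) assumes a ChatGPT-style export, and split_qa is illustrative rather than convo_miner's actual implementation.

```python
# Illustrative Q+A pairing; assumes ChatGPT-style {"role", "content"}
# messages. Not convo_miner's actual implementation.
def split_qa(messages: list) -> list:
    """Pair each user message with the assistant reply that follows it."""
    pairs = []
    question = None
    for msg in messages:
        if msg["role"] == "user":
            question = msg["content"]
        elif msg["role"] == "assistant" and question is not None:
            pairs.append((question, msg["content"]))
            question = None
    return pairs

chat = [
    {"role": "user", "content": "Why GraphQL?"},
    {"role": "assistant", "content": "Single endpoint, typed schema."},
    {"role": "user", "content": "Downsides?"},
    {"role": "assistant", "content": "Caching is harder."},
]
print(split_qa(chat))  # two (question, answer) pairs
```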

TIP

If your ChatGPT/Claude export file contains multiple merged sessions, split it into single-session files first using mempalace split ~/chats/; mining results will be better.


Step 4: Semantic Search Validation

After mining, try searching:

mempalace search "why did we switch to GraphQL"

Add wing filtering to search only within a specific project:

mempalace search "auth decision" --wing driftwood

Make it even more precise by adding room filtering:

mempalace search "auth decision" --wing driftwood --room auth-migration

The results are the original text from the Drawer (verbatim)—no summaries, no information loss. ChromaDB performs vector search, while Closet provides structured summaries.
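
Under the hood this is nearest-neighbour search over embeddings. The toy version below uses hand-made 3-dimensional vectors to show the mechanics; real embeddings are computed and indexed by ChromaDB, and the drawer names are invented for the demo.

```python
import math

# Toy nearest-neighbour search: hand-made 3-dim vectors stand in for the
# real embeddings that ChromaDB computes and indexes.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

drawers = {
    "graphql-decision": [0.9, 0.1, 0.0],
    "ci-caching":       [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "why did we switch to GraphQL"
best = max(drawers, key=lambda d: cosine(query, drawers[d]))
print(best)  # graphql-decision
```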


Step 5: Connect the MCP Server

MCP (Model Context Protocol) makes MemPalace available as a tool to any AI. Configure once—effective permanently.

Connect Claude Code:

claude mcp add mempalace -- python -m mempalace.mcp_server

After configuration, Claude Code automatically gets 19 tools. The AI will call mempalace_search itself when needed—you don’t have to search manually.

Connect the Gemini CLI:

# See examples/gemini_cli_setup.md
gemini mcp add mempalace python -m mempalace.mcp_server

The Gemini CLI also has full MCP support, and its save hooks can be configured automatically.

MCP Tool List (19 tools):

Tool                     Purpose
mempalace_status         Return a full view of the Palace + AAAK protocol
mempalace_list_wings     List all wings and memory counts
mempalace_list_rooms     List rooms inside a given wing
mempalace_search         Semantic search with wing/room filtering
mempalace_kg_query       Query temporal relations of entities
mempalace_kg_add         Add fact triples
mempalace_kg_invalidate  Invalidate a fact
mempalace_kg_timeline    Generate a temporal story for an entity
mempalace_diary_write    Agent writes an AAAK diary
mempalace_diary_read     Agent reads an AAAK diary
mempalace_traverse       BFS-traverse a wing
mempalace_find_tunnels   Discover tunnels across wings
…plus 7 more

From the mempalace_status output, the AI automatically learns the AAAK syntax and memory protocol—no prompt configuration required.


Step 6: Configure Claude Code Auto-Save Hooks

Claude Code’s Hooks let MemPalace automatically save memories during every conversation.

Edit ~/.claude/settings.json (Claude Code global config) and add:

{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/mempalace/hooks/mempal_save_hook.sh"
          }
        ]
      }
    ],
    "PreCompact": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/mempalace/hooks/mempal_precompact_hook.sh"
          }
        ]
      }
    ]
  }
}

Difference between the two hooks:

  • Stop: Fires when a response completes; the script performs a structured save once every 15 messages. The topic, decisions, references, and code changes are all recorded, and the L1 layer (key facts) is rebuilt.
  • PreCompact: Fires before context compression. It urgently rescues any memory that hasn't been saved yet, preventing important context from being lost during compaction.

WARNING

The hook scripts are invoked by shell path. After cloning, keep them in a fixed location and write those absolute paths into the config. The scripts themselves perform no dangerous operations; they only write structured memories into ChromaDB.


Step 7: Understand the Palace Structure

MemPalace’s core abstraction is the “memory palace”—borrowing memory techniques from ancient Greek orators, using spatial structure instead of a flat search index.

  WING: kai (person)

    ┌──────────┐  ──hall──  ┌──────────┐
    │ auth-mig │            │ security │
    └────┬─────┘            └──────────┘
         │
         ▼
    ┌──────────┐      ┌──────────┐
    │  Closet  │ ───▶ │  Drawer  │  ← original text exists here
    └──────────┘      └──────────┘

  TUNNEL (cross-wing connections):
  kai/auth-mig  ←→  driftwood/auth-mig  ←→  priya/auth-mig

Wings: Either a person or a project; this is the main category for memories. Each wing can contain multiple rooms.

Rooms: Specific topics within a wing, such as auth-migration, ci-pipeline, pricing. If a room with the same name appears in different wings, tunnels are generated automatically.

Halls: Corridors for different memory types. Every wing has the same hall set:

  • hall_facts — locked-in decisions
  • hall_events — sessions, milestones, and debugging processes
  • hall_discoveries — breakthroughs and new insights
  • hall_preferences — habits, preferences, opinions
  • hall_advice — recommendations and solutions
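
Routing a memory into a hall could look like the sketch below; the hall names come from the list above, but the keyword cues are invented for the demo and are not MemPalace's actual classifier.

```python
# Hall names come from the list above; the keyword cues are invented for
# the demo, not MemPalace's actual classifier.
HALL_CUES = {
    "hall_facts": ["decided", "we will"],
    "hall_events": ["debug", "session", "milestone"],
    "hall_discoveries": ["turns out", "realized"],
    "hall_preferences": ["prefer", "always", "never"],
    "hall_advice": ["should", "recommend"],
}

def route_to_hall(text: str, default: str = "hall_events") -> str:
    """Return the first hall whose cue phrase appears in the text."""
    lowered = text.lower()
    for hall, cues in HALL_CUES.items():
        if any(cue in lowered for cue in cues):
            return hall
    return default

print(route_to_hall("We decided to keep Postgres."))    # hall_facts
print(route_to_hall("Turns out the cache was stale."))  # hall_discoveries
```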

Closets: The summary layer. Each closet points to where the original content lives (the Drawer). Original text is never lost; closets just add a navigable structure on top.

Drawers: Where the original text is stored. The raw verbatim mode in MemPalace reads original content from here for vector search, achieving 96.6% R@5.
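
The tunnel rule from the Rooms description, where the same room name in different wings gets linked, can be sketched as a hypothetical helper (not palace_graph.py itself):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical helper for the rule above: rooms with the same name in
# different wings get a tunnel. Not palace_graph.py itself.
def find_tunnels(palace: dict) -> list:
    """palace maps wing -> room names; returns (room, wing_a, wing_b) links."""
    by_room = defaultdict(list)
    for wing, rooms in palace.items():
        for room in rooms:
            by_room[room].append(wing)
    tunnels = []
    for room, wings in by_room.items():
        for a, b in combinations(sorted(wings), 2):
            tunnels.append((room, a, b))
    return tunnels

palace = {
    "kai":       ["auth-mig", "security"],
    "driftwood": ["auth-mig", "pricing"],
    "priya":     ["auth-mig"],
}
print(sorted(find_tunnels(palace)))
# [('auth-mig', 'driftwood', 'kai'), ('auth-mig', 'driftwood', 'priya'), ('auth-mig', 'kai', 'priya')]
```

This reproduces the kai/driftwood/priya example from the diagram: one shared room name yields a tunnel between every pair of wings that contain it.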


Step 8: Use Knowledge Graph Temporal Relations

ChromaDB stores vectors of the original text. The Knowledge Graph (SQLite) stores structured fact triples. They complement each other.

from mempalace.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph()

# Add facts with validity windows
kg.add_triple("Kai", "works_on", "Orion", valid_from="2025-06-01")
kg.add_triple("Maya", "assigned_to", "auth-migration", valid_from="2026-01-15")
kg.add_triple("Maya", "completed", "auth-migration", valid_from="2026-02-01")

# Query what Kai is doing now
print(kg.query_entity("Kai"))
# → [Kai → works_on → Orion (current)]

# Query the state on 2026-01-20 (when Maya has not yet completed auth-migration)
print(kg.query_entity("Maya", as_of="2026-01-20"))
# → [Maya → assigned_to → auth-migration]

# View the timeline of the Orion project
print(kg.timeline("Orion"))
# → a chain of facts ordered by time

# Maya switches projects, invalidating the old fact
kg.invalidate("Maya", "assigned_to", "auth-migration", ended="2026-02-01")
# Now query_entity("Maya") no longer returns auth-migration

The validity windows of facts (valid_from / ended) are MemPalace’s key capability. When you query historical states, it tells you “what happened then,” not “what happens now.”
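
A minimal sketch of how validity windows could back an as_of query, assuming ISO date strings (which compare correctly as plain strings); it mimics the API shown above but is not MemPalace's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Triple:
    s: str
    p: str
    o: str
    valid_from: str              # ISO date: YYYY-MM-DD
    ended: Optional[str] = None  # None = still current

def query(triples: List[Triple], entity: str, as_of: Optional[str] = None) -> List[Triple]:
    """Return facts about entity, filtered by an optional point in time."""
    out = []
    for t in triples:
        if t.s != entity:
            continue
        if as_of is None:
            if t.ended is None:          # "now": only open-ended facts
                out.append(t)
        elif t.valid_from <= as_of and (t.ended is None or as_of < t.ended):
            out.append(t)                # fact was valid at that date
    return out

facts = [
    Triple("Maya", "assigned_to", "auth-migration", "2026-01-15", "2026-02-01"),
    Triple("Maya", "works_on", "billing", "2026-02-01"),
]
print([t.o for t in query(facts, "Maya", as_of="2026-01-20")])  # ['auth-migration']
print([t.o for t in query(facts, "Maya")])                      # ['billing']
```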


Step 9: The Four-Layer Memory Stack Architecture

MemPalace’s retrieval strategy has four layers: the shallower the layer, the lighter the load; the deeper the layer, the more precise the recall:

Layer  Content                                        Size         When loaded
L0     AI identity (who you are)                      ~50 tokens   Every session
L1     Key facts (team, project, preferences)         ~120 tokens  Every session
L2     Room recall (recent sessions of the project)   On demand    When a topic hits a room
L3     Deep search (full semantic retrieval)          On demand    When explicitly asked

Each time the AI starts, it loads L0 + L1 (mempalace wake-up). With just 170 tokens, it establishes a complete context background. Only when a topic triggers specific rooms does it load L2. Only when you explicitly ask a question does it trigger L3’s full ChromaDB search.
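
The escalation logic can be sketched as follows; the layer contents and token figures come from the table above, while the functions themselves are hypothetical.

```python
# Hypothetical sketch of the layer-escalation logic; contents and token
# figures come from the table above.
def wake_up() -> dict:
    """Every session starts with just L0 + L1 (~170 tokens total)."""
    return {"L0": "AI identity (~50 tokens)", "L1": "key facts (~120 tokens)"}

def recall(topic_hits_room: bool = False, explicit_question: bool = False) -> dict:
    loaded = wake_up()
    if topic_hits_room:                  # a topic triggered a specific room
        loaded["L2"] = "room recall"
    if explicit_question:                # the user explicitly asked something
        loaded["L3"] = "full ChromaDB semantic search"
    return loaded

print(sorted(recall()))                        # ['L0', 'L1']
print(sorted(recall(explicit_question=True)))  # ['L0', 'L1', 'L3']
```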

This is also why MemPalace has extremely low cost—$10/year for search vs. $507/year for the summarization approach.


Common Troubleshooting

Q1: Search results are empty, but you’re sure the content exists

Troubleshoot in three steps:

# 1. Confirm wing and room names are correct
mempalace list-wings
mempalace list-rooms --wing myapp

# 2. Broaden the scope—search without specifying wing/room
mempalace search "keyword"   # without --wing

# 3. Check whether ChromaDB was actually written to
mempalace status            # see whether drawer total count is 0

If mempalace status shows 0 drawers, mining didn’t succeed. The conversation file format may not be supported. Currently supported formats include: Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, and plain text.

Q2: ChromaDB collection name conflicts

The default collection name is mempalace_drawers. If you run init multiple times or in different directories, conflicts may occur. Explicitly set the path in ~/.mempalace/config.json:

{
  "palace_path": "/custom/path/to/palace",
  "collection_name": "mempalace_drawers"
}

Then override with --palace <path>:

mempalace search "query" --palace /custom/path/to/palace

Q3: MCP connection fails

First manually verify the MCP service starts correctly:

python -m mempalace.mcp_server
# Normally it outputs nothing and keeps running in the foreground
# Ctrl+C to exit

If you see ModuleNotFoundError, check whether MemPalace is installed correctly:

pip show mempalace

If you’re using a virtual environment, confirm that the Python path in Claude Code’s MCP config is correct:

which python   # get the correct python path
claude mcp add mempalace -- /path/to/python -m mempalace.mcp_server

Q4: MCP tool calls work, but results aren’t as expected

When the AI calls mempalace_search, the wing/room parameters must match precisely to get the maximum benefit from the Palace structure. In your prompt, guide the AI to use the correct filters:

When searching for project-specific memories, always pass --wing <project>.
When searching for a specific topic, always pass --room <room-name>.

Q5: Hook scripts aren’t triggering

# Check whether Claude Code hooks are enabled
claude doctor

Make sure the hooks paths in settings.json are absolute paths. Relative paths may fail to resolve in Claude Code due to different working directories.
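
A small check you can run over your own settings.json to catch relative hook paths; relative_hook_paths is a hypothetical helper, but the nested structure matches the Step 6 config.

```python
import os

# Hypothetical checker: walks the Step 6 hooks structure and flags any
# command that is not an absolute path.
def relative_hook_paths(settings: dict) -> list:
    bad = []
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                cmd = hook.get("command", "")
                if cmd and not os.path.isabs(cmd):
                    bad.append(f"{event}: {cmd}")
    return bad

settings = {
    "hooks": {
        "Stop": [
            {"matcher": "", "hooks": [{"type": "command", "command": "hooks/save.sh"}]}
        ]
    }
}
print(relative_hook_paths(settings))  # ['Stop: hooks/save.sh']
```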

Q6: Knowledge graph temporal queries return unexpected results

Temporal queries depend on the as_of parameter format, which must be YYYY-MM-DD:

# Wrong format
kg.query_entity("Kai", as_of="2026/03/01")

# Correct format
kg.query_entity("Kai", as_of="2026-03-01")

Also confirm that when adding facts with add_triple, you used the correct format for valid_from; otherwise the temporal window won’t take effect.
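
A quick guard for the YYYY-MM-DD requirement; check_as_of is a hypothetical wrapper you could apply to your own as_of and valid_from values before passing them in.

```python
from datetime import datetime

# Hypothetical guard for date values; rejects slash-separated and other
# mismatched formats with a ValueError.
def check_as_of(value: str) -> str:
    datetime.strptime(value, "%Y-%m-%d")
    return value

print(check_as_of("2026-03-01"))  # 2026-03-01
try:
    check_as_of("2026/03/01")
except ValueError:
    print("rejected: use YYYY-MM-DD")
```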


Further Reading / Advanced Directions

AAAK Experimental Compression Layer

AAAK is a lossy compressed dialect that uses regex replacements to compress repeated entities into code. In large-scale scenarios (the same project mentioned hundreds of times), it can save token costs. However, the current raw verbatim mode (96.6%) still outperforms AAAK mode (84.2%). It’s best suited for long-running projects across many sessions with frequent repeated entities.
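
The core idea, regex-replacing repeated entities with short codes at the cost of round-trip fidelity, can be sketched as below; the codebook and helpers are invented for the demo.

```python
import re

# Toy AAAK-style compression: the codebook and helpers are invented for
# the demo. Note the round trip is lossy (capitalization is not restored).
CODEBOOK = {"auth-migration": "@AM", "driftwood": "@DW"}

def compress(text: str) -> str:
    for entity, code in CODEBOOK.items():
        text = re.sub(re.escape(entity), code, text, flags=re.IGNORECASE)
    return text

def expand(text: str) -> str:
    for entity, code in CODEBOOK.items():
        text = text.replace(code, entity)
    return text

note = "Driftwood's auth-migration is blocked; revisit auth-migration Friday."
short = compress(note)
print(short)          # @DW's @AM is blocked; revisit @AM Friday.
print(expand(short))  # entities come back lowercase: lossy by design
```

The savings grow with repetition: an entity mentioned hundreds of times across sessions shrinks to a three-character code each time, which is exactly the long-running-project scenario the paragraph above describes.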

Specialist Agents: Isolated Multi-Agent Memory

Each agent has its own independent wing and AAAK diary:

~/.mempalace/agents/
  ├── reviewer.json    # code quality, patterns, bugs
  ├── architect.json   # architecture decisions, trade-offs
  └── ops.json         # deployment, incidents, infrastructure

At runtime, the AI dynamically discovers agents from the palace—no need to write any configuration in CLAUDE.md.

Reproduce Benchmarks

The benchmarks/ directory contains full reproduction scripts for LongMemEval and LoCoMo:

python benchmarks/longmemeval_bench.py

No API key is needed throughout. On an M2 Ultra, it completes 500 questions within 5 minutes, verifying the reproducibility of the 96.6% result.

Comparison with Other Systems

System                       LongMemEval R@5  API requirement  Cost
MemPalace (raw)              96.6%            None             Free
MemPalace (hybrid + rerank)  100%             Optional         Free
Mem0                         ~85%             Required         $19–249/month
Zep                          ~85%             Required         $25+/month
Mastra                       94.87%           Required (GPT)   API costs

MemPalace is the only system that reaches the top score with zero API calls.

Updated April 8, 2026