One-sentence overview: Want to "escape" from ChatGPT/Claude web UI and run a complete AI workstation on your own machine? This guide walks you through Odysseus — a local-first, privacy-first self-hosted alternative to ChatGPT.
Project Overview
Odysseus is a self-hosted AI workspace (Self-hosted AI Workspace) from GitHub user pewdiepie-archdaemon, open-sourced under the MIT license. What it’s trying to do is: give you the UI experience of ChatGPT / Claude running on your own computer, but keep all data, all conversations, and all tool calls on the local machine.
Compared with traditional ChatGPT-like apps, the differences are very clear:
- Chat: Take any local or remote model (vLLM / llama.cpp / Ollama / OpenRouter / OpenAI / GitHub Copilot)
- Agent: A real agent that can actually "take action" using MCP / Web / files / Shell / Skills / memory (based on opencode)
- Cookbook: Automatically scans your machine’s hardware, recommends models that can run, then one-click download + start the serving (based on llmfit, VRAM-aware)
- Deep Research: Multi-step research—fetch + read + synthesize—producing a visualized report (adapted from Tongyi DeepResearch)
- Compare: Blind tests across multiple models to pick the one that’s truly useful—no bias
- Documents: You write text; the AI helps alongside you, not the other way around
- Memory / Skills: Long-term memory + a skills system—the more you use the agent, the more it understands you (ChromaDB + fastembed ONNX)
- Email: An IMAP/SMTP inbox where AI automatically categorizes, summarizes, drafts replies, and fights spam
- Notes & Tasks / Calendar: Notes + tasks + calendar, CalDAV sync with Radicale / Nextcloud / Apple / Fastmail
- PWA support: Mobile experience that doesn’t feel worse than desktop—you can even install it to the home screen
In one line: It’s not just a chat box. It’s your entire AI work environment—fully on your machine, not on the cloud.
Difficulty / Time / What You Get
- Difficulty: ⭐⭐⭐ (Moderate — you need basic familiarity with Docker / Python)
- Time: 30–45 minutes (Docker path); 1 hour (native install + connecting a local model)
- What you get:
- A fully functional self-hosted AI workstation
- A clear understanding of the typical architecture of local-first AI apps (FastAPI + ChromaDB + SearXNG + ntfy)
- Connect Ollama / vLLM / OpenAI-compatible APIs all into one unified interface
- Learn the "long-tail" but awesome capabilities: Cookbook’s automatic model selection, GPU pass-through, and the memory system
Target Readers
- You don’t want to keep paying for ChatGPT Plus / Claude Pro subscriptions and want your AI data to stay local
- You want to run open-source models on NUC / workstation / servers with 8G / 16G / 24G VRAM, and you want one consistent UI
- Full-stack / independent developers who want to combine chat / email / calendar / agent tasks into a single panel
- Ops/platform engineers interested in "self-hosting" who want a production-grade FastAPI app engineering template
- You’ve already paid for cloud LLMs and are looking for a way to do it that’s less painful
If you only want a lightweight chat UI, Odysseus might not be your best fit—its feature set is larger, and it’s a bit more complex to get started than Open WebUI, but the payoff is the whole suite: email, memory, Cookbook, and the agent.
Core Dependencies and Environment
Minimum Requirements:
| Project | Requirement |
|---|---|
| Python | 3.11+ |
| Memory | 2 GB (Web UI only) / 8 GB+ (running local models) |
| Disk | 5 GB (system + dependencies) / 50 GB+ (multiple GGUF models) |
| Docker | 20.10+ (Docker Compose v2 recommended) |
| OS | Linux / macOS / Windows (including WSL2) |
Optional / Enhanced Dependencies:
| Project | Use |
|---|---|
| Ollama | The simplest local model (Windows friendly) |
| vLLM / llama.cpp | High-performance local inference (requires NVIDIA/AMD GPU) |
| NVIDIA Container Toolkit | Run GPU inside Docker |
| Git for Windows | Cookbook backend download/start on Windows |
| Tailscale + mkcert | Securely expose services on LAN / HTTPS |
TIP
Don’t want to mess with GPU? Use the two-legged approach: Ollama + any OpenAI-compatible API—you can be up and running in 5 minutes. GPU is a "bonus", not a "prerequisite".
Full Project Structure Tree
odysseus/
├── app.py # FastAPI entrypoint
├── setup.py # Initialization script (create admin / database / directories)
├── requirements.txt # Core Python dependencies
├── requirements-optional.txt # Optional dependencies (PDF/Office/voice/STT)
├── docker-compose.yml # Default orchestration (CPU)
├── docker-compose.gpu-nvidia.yml # NVIDIA GPU override
├── docker-compose.gpu-amd.yml # AMD ROCm override
├── Dockerfile
├── core/ # Infrastructure layer
│ ├── auth.py # Authentication / sessions
│ ├── database.py # SQLAlchemy initialization
│ ├── middleware.py
│ ├── constants.py
│ └── atomic_io.py
├── src/ # Business logic layer
│ ├── llm_core.py # LLM abstraction
│ ├── agent_loop.py # Agent loop
│ ├── agent_tools.py # Agent tools
│ ├── chat_processor.py # Chat processing
│ ├── cookbook_serve_lifecycle.py # Cookbook model serving lifecycle
│ ├── memory_vector.py # Long-term memory (ChromaDB)
│ ├── deep_research.py # Deep research
│ └── ...
├── routes/ # FastAPI routes (40+ modules)
│ ├── chat_routes.py
│ ├── agent_routes.py
│ ├── cookbook_routes.py # Model recommendations / downloads
│ ├── memory_routes.py
│ ├── email_routes.py
│ ├── calendar_routes.py
│ └── ...
├── services/ # Background services
│ ├── docs/ # Document processing
│ ├── hwfit/ # Hardware scanning (Cookbook)
│ ├── memory/ # Memory service
│ ├── research/ # Research service
│ ├── search/ # Search
│ ├── stt/ tts/ # Speech-to-text / Text-to-speech
│ └── shell/ # Shell tools
├── static/ # Frontend (index.html + JS/CSS)
├── docs/ # Docs site + screenshots + demo
├── config/ # Config for sub-services like SearXNG
├── companion/ # Desktop companion (macOS)
└── data/ # User data (gitignored, generated at runtime)
├── app.db # SQLite
├── chroma/ # Vector database
├── uploads/
├── personal_docs/
└── huggingface/ # Model cache
Inside the repo, you only need to care about app.py / docker-compose.yml / .env. Everything else is handled by Docker.
Step-by-Step Instructions
Step 1: Clone the repository
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
WARNING
The dev branch of the repository is the latest but may be unstable. For production, it’s recommended to switch to the main branch: git checkout main. The instructions below use dev.
Step 2: Prepare .env (Optional, but recommended)
cp .env.example .env
A minimal working .env looks like this:
# === LLM connection points ===
LLM_HOST=localhost
# If you want to use a cloud API, access your local Ollama from inside the container via host.docker.internal
OLLAMA_BASE_URL=http://host.docker.internal:11434/v1
# === Ports and binding ===
APP_BIND=127.0.0.1
APP_PORT=7000
# === Security (critical!) ===
AUTH_ENABLED=true
LOCALHOST_BYPASS=false
SECURE_COOKIES=false # Change to true only when using an HTTPS reverse proxy
ODYSSEUS_ADMIN_USER=admin
# ODYSSEUS_ADMIN_PASSWORD= # If left blank, it will be randomly generated on first startup and printed
# === Search (SearXNG) ===
SEARXNG_INSTANCE=http://searxng:8080
TIP
Ollama cross-host access: When starting Ollama, it must listen on 0.0.0.0 instead of loopback:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
Otherwise, the host.docker.internal:11434 inside the container will be rejected by Ollama.
Step 3: One-command Docker start (recommended path)
docker compose up -d --build
During the build, it pulls the base images and installs Python dependencies. The first run usually takes about 3–5 minutes. When you see all four containers—odysseus, chromadb, searxng, and ntfy—are healthy / running, you’re done:
docker compose ps
Open http://localhost:7000. The first screen is the login page. The initial admin password is in the terminal logs:
docker compose logs --tail=200 odysseus | grep -i "temporary\|admin\|password"
You should see something like:
[odysseus] Created admin user 'admin' with temporary password: aB3x-9pQz-2vRt
WARNING
This temporary password is printed only once on the first startup. After you get it, log in immediately and go to Settings → Account to change it to your own password.
Step 4: Verify all services are healthy
# 1. Container status
docker compose ps
# 2. Check Odysseus logs (note: first startup may have delay because models are pulled)
docker compose logs --tail=120 odysseus
# 3. Is ChromaDB really up (not HTTP fallback)
docker compose logs odysseus | grep -E "ChromaDB|MemoryVectorStore|DEGRADED"
If you see DEGRADED, there’s a good chance chromadb-client conflicts with chromadb. Troubleshooting is described later in "Common Issues".
Step 5: Connect your first model
Go to Settings → Models / Providers, then choose a path depending on your setup:
Path A: Ollama is already running (simplest)
Base URL: http://host.docker.internal:11434/v1
API Key: (leave empty)
Model: qwen2.5:7b (or any model you already have locally)
Path B: Use Cookbook to auto-download + start
Go into Cookbook. It scans your VRAM, recommends open-source models you can run, and gives you a score (fit score). Click "Download" and it will use huggingface-cli to pull the model into ./data/huggingface/, then start serving it directly with llama.cpp / vLLM.
Path C: Connect to a cloud API — this is what we’re going to dig into today
Odysseus Provider configuration is protocol-driven. That means any service that correctly responds to either of these endpoints—v1/chat/completions or v1/messages—can be plugged in by filling in the Base URL. This is not limited to OpenAI/Anthropic official services; self-built gateways and third-party relays work the same.
In my deployment, I use cloud APIs for the "daily Agent runs code / long documents" category. Two reasons:
- Running a 70B quantized model on my local 24G GPU is already tough; running multi-step Agents often leads to OOM.
- Frontier models like Claude / GPT still outperform Qwen2.5-Coder for complex reasoning.
But then the problem comes: official APIs are too expensive. Claude Opus alone can easily exceed a hundred dollars in a week—if you’re just playing around, you’re basically working for Anyscale.
My workaround is to connect Defapi, using the endpoint compatible with v1/messages:
# Odysseus Settings → Custom Anthropic Provider
Base URL: https://api.defapi.org
API Key: sk-xxxxx # Get it from the defapi.org dashboard
Model: anthropic/claude-sonnet-4.5 # Or haiku-4.5 / opus-4.6
Example configs: Defapi Claude Sonnet 4.5 (main power for programming/long documents), Claude Haiku 4.5 (preferred for everyday Agent calls—it’s cheaper).
In practice, for the same amount of conversation/tasks, my monthly bill is less than half compared with connecting directly to Anthropic official. Model capability, Agent tool usage, and prompt formatting are no different at all—because this is a protocol-layer proxy; the model is still the same. That naturally matches Odysseus’s "protocol-driven" design.
TIP
With Defapi, you need zero code changes on Odysseus. Tool calls and context management in agent_tools.py remain untouched. In Settings, you can even use Sonnet for "chat", Opus for "Deep Research", and Haiku for "Agent tasks"—each with its own Provider and independent billing.
Path D: Connect GitHub Copilot / OpenRouter, etc.
These are also OpenAI-protocol compatible. Set the Base URL to https://api.githubcopilot.com / https://openrouter.ai/api/v1, respectively. No more details here.
Step 6: First chat + first Agent task
Back on the main UI, select a model and send a simple prompt to test connectivity:
Introduce yourself in three sentences, then give me a Linux performance troubleshooting checklist I can use today.
If that works, the LLM connection path is confirmed.
Next, try Agent mode: create a new session, switch to "Agent", and input:
List the line counts of all .py files in the current directory, then tell me which file is most worth refactoring.
The Agent plans it out:
- Call the
shelltool to runfind . -name "*.py" | xargs wc -l - Make a judgment based on the results
- Provide recommendations
This is the fundamental difference between a local-first Agent and ChatGPT web: it can truly take action, and it has access to your code, files, and terminal.
Step 7 (Optional): Enable GPU
This step is only needed if you want to use "heavyweight" local inference like vLLM / SGLang / llama.cpp CUDA. If you only use Ollama or cloud APIs, skip this.
NVIDIA:
# One-command diagnosis
scripts/check-docker-gpu.sh
# Install NVIDIA Container Toolkit (Ubuntu/Debian, needs sudo)
scripts/check-docker-gpu.sh --install-nvidia-toolkit
# Enable GPU pass-through successfully before turning on overlay
scripts/check-docker-gpu.sh --enable-nvidia-overlay
# Verify
docker compose exec odysseus nvidia-smi -L
It adds this in .env:
COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml
AMD ROCm:
scripts/check-docker-amd-gpu.sh
# Write the output RENDER_GID into .env
COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml
RENDER_GID=989
WARNING
GPU pass-through ≠ installing llama.cpp CUDA. If nvidia-smi in the container can see the card, it only proves device pass-through works. llama.cpp also needs the cudart and the CUDA Toolkit runtime—this is handled by Cookbook → Dependencies where you click to reinstall, not something Docker-layer changes can solve. If the logs say Unable to find cudart library, that’s it.
Step 8 (Optional): Enable Playwright MCP (browser agent)
Odysseus includes several MCP servers, but the browser one requires pulling it with npx first:
npx -y @playwright/mcp@latest --version
Restart the Odysseus container, and the Agent can use the browser MCP (screenshots, navigation, form filling):
Open https://news.ycombinator.com, capture the first 10 titles and links, and summarize them into 5 core topics.
Step 9 (Optional): Run natively on Windows
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1
The script automatically creates a venv → installs dependencies → runs setup → starts uvicorn. Open http://localhost:7000.
TIP
Windows local vLLM / SGLang is not supported. If you want to run models locally, install Ollama for Windows, then set the endpoint to http://localhost:11434/v1 in Settings. If you want to use Claude/GPT without local inference, you can also use any cloud API that is OpenAI-compatible.
Step 10 (Optional): Make it usable on your phone
Assuming you’ve connected to Tailscale:
ODYSSEUS_HOST=0.0.0.0 docker compose up -d
# Or set it in .env
APP_BIND=0.0.0.0
But exposing plain HTTP to your LAN / Tailscale is not safe. Strongly recommended: add mkcert HTTPS:
mkcert -install
mkcert -cert-file cert.pem -key-file key.pem 100.x.y.z # your Tailscale IP
python -m uvicorn app:app --host 0.0.0.0 --port 7000 \
--ssl-certfile=cert.pem --ssl-keyfile=key.pem
On your phone, access https://<tailscale-ip>:7000. Install the PWA to your home screen, and the experience is almost the same as a native app.
Troubleshooting FAQs
Q1: Port 7000 is already in use (common on macOS)
On macOS, AirPlay usually uses 7000 by default. Two solutions:
# Method 1: Change the port in .env
APP_PORT=7001
docker compose up -d
# Open http://localhost:7001 in your browser
# Method 2: Apple menu → System Settings → General → AirPlay Receiver → turn it off
Q2: ChromaDB starts, but logs show DEGRADED
Usually this happens when chromadb-client and chromadb are installed at the same time. It silently falls back to HTTP-only mode and the vector capability breaks. Fix:
# In the running container (or your local venv)
./venv/bin/pip uninstall chromadb-client -y
./venv/bin/pip install --force-reinstall chromadb
docker compose restart odysseus
Verify the fix:
docker compose logs odysseus | grep -E "ChromaDB|MemoryVectorStore". You should see successful initialization and no moreDEGRADED.
Q3: Cookbook can’t see my GPU—only sees iGPU or CPU
By default, Docker exposes all GPUs of the host, but if you only mount the iGPU or another card, it’s almost always because the NVIDIA Container Toolkit isn’t installed / configured, or nvidia-ctk runtime configure --runtime=docker hasn’t been run:
# Diagnose
scripts/check-docker-gpu.sh
# Check the output—it tells you which step is missing
# One-click install + config (Ubuntu/Debian)
scripts/check-docker-gpu.sh --install-nvidia-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Q4: GPU pass-through succeeds, but llama.cpp logs say Unable to find cudart
The device is fine; what’s missing is the CUDA runtime. Docker can’t fix this layer—you need to reinstall the CUDA build of llama-cpp-python via Cookbook → Dependencies:
Open Cookbook → Dependencies → find llama-cpp-python → Reinstall (CUDA build)
It will run pip install --no-cache-dir llama-cpp-python[cuda] automatically. It takes about 2–3 minutes.
Q5: Outlook / Office 365 mailbox won’t connect
Odysseus’s email module currently supports only IMAP + username/password authentication. After 2024, Microsoft basically disabled normal password logins and requires OAuth. This is a known limitation for now.
Workarounds: use Gmail / Fastmail / a self-hosted mailbox, or generate and use a Microsoft App Password (for accounts with 2FA) to simulate password login.
Q6: If I want public access, how do I keep it secure?
Directly setting APP_BIND=0.0.0.0 and exposing it to the public internet is a disaster. LOCALHOST_BYPASS bypasses authentication, and the Agent also has shell tools—Shodan can scan and find a bunch of exposed instances within 24 hours.
The right approach:
[Internet] → Cloudflare Access / Tailscale / WireGuard
→ Caddy / Nginx / Traefik (HTTPS termination)
→ http://127.0.0.1:7000 (Odysseus; keep localhost binding)
And set in .env:
AUTH_ENABLED=true
LOCALHOST_BYPASS=false
SECURE_COOKIES=true
Q7: How do you control cloud API billing?
Let’s expand on the issue in Step 5, Path C.
When you play with this as a person, there are three strategies—ranked from highest cost to lowest cost:
- Connect official APIs — best experience, but the most expensive. Claude Opus + frequent Agent calls hitting $100+ per week is normal.
- Connect a protocol-compatible relay (I use Defapi) — same Claude / GPT / Gemini models, and the monthly bill is about one tier lower than official. I didn’t compute the exact difference; roughly, $100 on a weekly basis becomes roughly $100 on a monthly basis. The advantage is full protocol compatibility: Odysseus config stays the same—no need to change any prompt / Agent code.
- Run local quantized models — let Cookbook download a
Q4_K_MGGUF; with 16G VRAM you can run around the 30B class. No API costs, but inference is slower than cloud and complex task quality is a notch lower.
My actual combination: Chat + Deep Research go through Defapi Claude Sonnet 4.5, Agent tool-calling runs Claude Haiku 4.5 (cheap), and simple classification/summarization tasks use a local Qwen2.5 quantized variant. Each Provider charges independently and doesn’t interfere with the others—my monthly total is double-digit RMB.
TIP
Key trick: Don’t use the same model for everything. Odysseus’s Settings lets you assign different Providers to different scenarios. Many people overlook this great feature.
Further Reading / Advanced Directions
- MCP ecosystem: Built-in MCP packages that Odysseus auto-registers include
playwright(browser) /filesystem/shell. You can write your own MCP server, register it undermcp_servers/, and then the Agent can use it. - Skills system:
routes/skills_routes.pymanages reusable "skill snippets", similar to OpenClaw’s Skills—write once, then any session can call them. - Tailscale + mkcert HTTPS: The previous section demonstrated the basic flow; advanced use can add automatic renewal + DNS-01 challenges.
- Reverse proxy combinations: With Caddy configured for HTTPS + a Cloudflare proxy, you can achieve near-zero operational burden for public internet access.
- CalDAV sync: Run a self-hosted Radicale (a lightweight CalDAV service). Odysseus calendar can sync bidirectionally with your phone’s system calendar.
- Email AI auto-bucketing: After connecting IMAP and letting it train for a while, it will automatically categorize into "urgent / general / spam", and drafts are pre-written—you only need to click send.
- Cookbook remote servers: Cookbook → Settings → Servers lets you configure a remote GPU machine. Model serving/pulling goes via SSH—your host can be small, your GPU can be far away, and it’s set up cleanly.
- Deep Research multi-step reports: Adapted from Alibaba Tongyi DeepResearch, using SearXNG to fetch multiple sources → summarize → synthesize. Great for competitor research / industry research.
- Multiple Provider routing: In Odysseus, assign different Providers for chat/Agent/Research. Run local Ollama for 80% of everyday conversations; route complex questions to the cloud—balancing cost and experience.
GitHub repo: pewdiepie-archdaemon/odysseus. Documentation is in the docs/ directory. Roadmap: ROADMAP.md. If you want to contribute, see CONTRIBUTING.md.