Odysseus Deployment in the Real World: Set Up a Self-Hosted AI Workstation in 30 Minutes

One-sentence overview: Want to "escape" from ChatGPT/Claude web UI and run a complete AI workstation on your own machine? This guide walks you through Odysseus — a local-first, privacy-first self-hosted alternative to ChatGPT.

Project Overview

Odysseus is a self-hosted AI workspace (Self-hosted AI Workspace) from GitHub user pewdiepie-archdaemon, open-sourced under the MIT license. What it’s trying to do is: give you the UI experience of ChatGPT / Claude running on your own computer, but keep all data, all conversations, and all tool calls on the local machine.

Compared with traditional ChatGPT-like apps, the differences are very clear:

Chat: Take any local or remote model (vLLM / llama.cpp / Ollama / OpenRouter / OpenAI / GitHub Copilot)
Agent: A real agent that can actually "take action" using MCP / Web / files / Shell / Skills / memory (based on opencode)
Cookbook: Automatically scans your machine’s hardware, recommends models that can run, then one-click download + start the serving (based on llmfit, VRAM-aware)
Deep Research: Multi-step research—fetch + read + synthesize—producing a visualized report (adapted from Tongyi DeepResearch)
Compare: Blind tests across multiple models to pick the one that’s truly useful—no bias
Documents: You write text; the AI helps alongside you, not the other way around
Memory / Skills: Long-term memory + a skills system—the more you use the agent, the more it understands you (ChromaDB + fastembed ONNX)
Email: An IMAP/SMTP inbox where AI automatically categorizes, summarizes, drafts replies, and fights spam
Notes & Tasks / Calendar: Notes + tasks + calendar, CalDAV sync with Radicale / Nextcloud / Apple / Fastmail
PWA support: Mobile experience that doesn’t feel worse than desktop—you can even install it to the home screen

In one line: It’s not just a chat box. It’s your entire AI work environment—fully on your machine, not on the cloud.

Difficulty / Time / What You Get

Difficulty: ⭐⭐⭐ (Moderate — you need basic familiarity with Docker / Python)
Time: 30–45 minutes (Docker path); 1 hour (native install + connecting a local model)
What you get:
- A fully functional self-hosted AI workstation
- A clear understanding of the typical architecture of local-first AI apps (FastAPI + ChromaDB + SearXNG + ntfy)
- Connect Ollama / vLLM / OpenAI-compatible APIs all into one unified interface
- Learn the "long-tail" but awesome capabilities: Cookbook’s automatic model selection, GPU pass-through, and the memory system

Target Readers

You don’t want to keep paying for ChatGPT Plus / Claude Pro subscriptions and want your AI data to stay local
You want to run open-source models on NUC / workstation / servers with 8G / 16G / 24G VRAM, and you want one consistent UI
Full-stack / independent developers who want to combine chat / email / calendar / agent tasks into a single panel
Ops/platform engineers interested in "self-hosting" who want a production-grade FastAPI app engineering template
You’ve already paid for cloud LLMs and are looking for a way to do it that’s less painful

If you only want a lightweight chat UI, Odysseus might not be your best fit—its feature set is larger, and it’s a bit more complex to get started than Open WebUI, but the payoff is the whole suite: email, memory, Cookbook, and the agent.

Core Dependencies and Environment

Minimum Requirements:

Project	Requirement
Python	3.11+
Memory	2 GB (Web UI only) / 8 GB+ (running local models)
Disk	5 GB (system + dependencies) / 50 GB+ (multiple GGUF models)
Docker	20.10+ (Docker Compose v2 recommended)
OS	Linux / macOS / Windows (including WSL2)

Optional / Enhanced Dependencies:

Project	Use
Ollama	The simplest local model (Windows friendly)
vLLM / llama.cpp	High-performance local inference (requires NVIDIA/AMD GPU)
NVIDIA Container Toolkit	Run GPU inside Docker
Git for Windows	Cookbook backend download/start on Windows
Tailscale + mkcert	Securely expose services on LAN / HTTPS

TIP

Don’t want to mess with GPU? Use the two-legged approach: Ollama + any OpenAI-compatible API—you can be up and running in 5 minutes. GPU is a "bonus", not a "prerequisite".

Full Project Structure Tree

odysseus/
├── app.py                    # FastAPI entrypoint
├── setup.py                  # Initialization script (create admin / database / directories)
├── requirements.txt          # Core Python dependencies
├── requirements-optional.txt # Optional dependencies (PDF/Office/voice/STT)
├── docker-compose.yml        # Default orchestration (CPU)
├── docker-compose.gpu-nvidia.yml  # NVIDIA GPU override
├── docker-compose.gpu-amd.yml     # AMD ROCm override
├── Dockerfile
├── core/                     # Infrastructure layer
│   ├── auth.py               # Authentication / sessions
│   ├── database.py           # SQLAlchemy initialization
│   ├── middleware.py
│   ├── constants.py
│   └── atomic_io.py
├── src/                      # Business logic layer
│   ├── llm_core.py           # LLM abstraction
│   ├── agent_loop.py         # Agent loop
│   ├── agent_tools.py        # Agent tools
│   ├── chat_processor.py     # Chat processing
│   ├── cookbook_serve_lifecycle.py  # Cookbook model serving lifecycle
│   ├── memory_vector.py      # Long-term memory (ChromaDB)
│   ├── deep_research.py      # Deep research
│   └── ...
├── routes/                   # FastAPI routes (40+ modules)
│   ├── chat_routes.py
│   ├── agent_routes.py
│   ├── cookbook_routes.py    # Model recommendations / downloads
│   ├── memory_routes.py
│   ├── email_routes.py
│   ├── calendar_routes.py
│   └── ...
├── services/                 # Background services
│   ├── docs/                 # Document processing
│   ├── hwfit/                # Hardware scanning (Cookbook)
│   ├── memory/               # Memory service
│   ├── research/             # Research service
│   ├── search/               # Search
│   ├── stt/ tts/             # Speech-to-text / Text-to-speech
│   └── shell/                # Shell tools
├── static/                   # Frontend (index.html + JS/CSS)
├── docs/                     # Docs site + screenshots + demo
├── config/                   # Config for sub-services like SearXNG
├── companion/                # Desktop companion (macOS)
└── data/                     # User data (gitignored, generated at runtime)
    ├── app.db                # SQLite
    ├── chroma/               # Vector database
    ├── uploads/
    ├── personal_docs/
    └── huggingface/          # Model cache

Inside the repo, you only need to care about app.py / docker-compose.yml / .env. Everything else is handled by Docker.

Step-by-Step Instructions

Step 1: Clone the repository

git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus

WARNING

The dev branch of the repository is the latest but may be unstable. For production, it’s recommended to switch to the main branch: git checkout main. The instructions below use dev.

Step 2: Prepare `.env` (Optional, but recommended)

cp .env.example .env

A minimal working .env looks like this:

# === LLM connection points ===
LLM_HOST=localhost
# If you want to use a cloud API, access your local Ollama from inside the container via host.docker.internal
OLLAMA_BASE_URL=http://host.docker.internal:11434/v1

# === Ports and binding ===
APP_BIND=127.0.0.1
APP_PORT=7000

# === Security (critical!) ===
AUTH_ENABLED=true
LOCALHOST_BYPASS=false
SECURE_COOKIES=false   # Change to true only when using an HTTPS reverse proxy
ODYSSEUS_ADMIN_USER=admin
# ODYSSEUS_ADMIN_PASSWORD=  # If left blank, it will be randomly generated on first startup and printed

# === Search (SearXNG) ===
SEARXNG_INSTANCE=http://searxng:8080

TIP

Ollama cross-host access: When starting Ollama, it must listen on 0.0.0.0 instead of loopback:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

Otherwise, the host.docker.internal:11434 inside the container will be rejected by Ollama.

Step 3: One-command Docker start (recommended path)

docker compose up -d --build

During the build, it pulls the base images and installs Python dependencies. The first run usually takes about 3–5 minutes. When you see all four containers—odysseus, chromadb, searxng, and ntfy—are healthy / running, you’re done:

docker compose ps

Open http://localhost:7000. The first screen is the login page. The initial admin password is in the terminal logs:

docker compose logs --tail=200 odysseus | grep -i "temporary\|admin\|password"

You should see something like:

[odysseus] Created admin user 'admin' with temporary password: aB3x-9pQz-2vRt

WARNING

This temporary password is printed only once on the first startup. After you get it, log in immediately and go to Settings → Account to change it to your own password.

Step 4: Verify all services are healthy

# 1. Container status
docker compose ps

# 2. Check Odysseus logs (note: first startup may have delay because models are pulled)
docker compose logs --tail=120 odysseus

# 3. Is ChromaDB really up (not HTTP fallback)
docker compose logs odysseus | grep -E "ChromaDB|MemoryVectorStore|DEGRADED"

If you see DEGRADED, there’s a good chance chromadb-client conflicts with chromadb. Troubleshooting is described later in "Common Issues".

Step 5: Connect your first model

Go to Settings → Models / Providers, then choose a path depending on your setup:

Path A: Ollama is already running (simplest)

Base URL:  http://host.docker.internal:11434/v1
API Key:   (leave empty)
Model:     qwen2.5:7b (or any model you already have locally)

Path B: Use Cookbook to auto-download + start

Go into Cookbook. It scans your VRAM, recommends open-source models you can run, and gives you a score (fit score). Click "Download" and it will use huggingface-cli to pull the model into ./data/huggingface/, then start serving it directly with llama.cpp / vLLM.

Path C: Connect to a cloud API — this is what we’re going to dig into today

Odysseus Provider configuration is protocol-driven. That means any service that correctly responds to either of these endpoints—v1/chat/completions or v1/messages—can be plugged in by filling in the Base URL. This is not limited to OpenAI/Anthropic official services; self-built gateways and third-party relays work the same.

In my deployment, I use cloud APIs for the "daily Agent runs code / long documents" category. Two reasons:

Running a 70B quantized model on my local 24G GPU is already tough; running multi-step Agents often leads to OOM.
Frontier models like Claude / GPT still outperform Qwen2.5-Coder for complex reasoning.

But then the problem comes: official APIs are too expensive. Claude Opus alone can easily exceed a hundred dollars in a week—if you’re just playing around, you’re basically working for Anyscale.

My workaround is to connect Defapi, using the endpoint compatible with v1/messages:

# Odysseus Settings → Custom Anthropic Provider
Base URL:  https://api.defapi.org
API Key:   sk-xxxxx                    # Get it from the defapi.org dashboard
Model:     anthropic/claude-sonnet-4.5 # Or haiku-4.5 / opus-4.6

Example configs: Defapi Claude Sonnet 4.5 (main power for programming/long documents), Claude Haiku 4.5 (preferred for everyday Agent calls—it’s cheaper).

In practice, for the same amount of conversation/tasks, my monthly bill is less than half compared with connecting directly to Anthropic official. Model capability, Agent tool usage, and prompt formatting are no different at all—because this is a protocol-layer proxy; the model is still the same. That naturally matches Odysseus’s "protocol-driven" design.

TIP

With Defapi, you need zero code changes on Odysseus. Tool calls and context management in agent_tools.py remain untouched. In Settings, you can even use Sonnet for "chat", Opus for "Deep Research", and Haiku for "Agent tasks"—each with its own Provider and independent billing.

Path D: Connect GitHub Copilot / OpenRouter, etc.

These are also OpenAI-protocol compatible. Set the Base URL to https://api.githubcopilot.com / https://openrouter.ai/api/v1, respectively. No more details here.

Step 6: First chat + first Agent task

Back on the main UI, select a model and send a simple prompt to test connectivity:

Introduce yourself in three sentences, then give me a Linux performance troubleshooting checklist I can use today.

If that works, the LLM connection path is confirmed.

Next, try Agent mode: create a new session, switch to "Agent", and input:

List the line counts of all .py files in the current directory, then tell me which file is most worth refactoring.

The Agent plans it out:

Call the shell tool to run find . -name "*.py" | xargs wc -l
Make a judgment based on the results
Provide recommendations

This is the fundamental difference between a local-first Agent and ChatGPT web: it can truly take action, and it has access to your code, files, and terminal.

Step 7 (Optional): Enable GPU

This step is only needed if you want to use "heavyweight" local inference like vLLM / SGLang / llama.cpp CUDA. If you only use Ollama or cloud APIs, skip this.

NVIDIA:

# One-command diagnosis
scripts/check-docker-gpu.sh

# Install NVIDIA Container Toolkit (Ubuntu/Debian, needs sudo)
scripts/check-docker-gpu.sh --install-nvidia-toolkit

# Enable GPU pass-through successfully before turning on overlay
scripts/check-docker-gpu.sh --enable-nvidia-overlay

# Verify
docker compose exec odysseus nvidia-smi -L

It adds this in .env:

COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml

AMD ROCm:

scripts/check-docker-amd-gpu.sh
# Write the output RENDER_GID into .env

COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml
RENDER_GID=989

WARNING

GPU pass-through ≠ installing llama.cpp CUDA. If nvidia-smi in the container can see the card, it only proves device pass-through works. llama.cpp also needs the cudart and the CUDA Toolkit runtime—this is handled by Cookbook → Dependencies where you click to reinstall, not something Docker-layer changes can solve. If the logs say Unable to find cudart library, that’s it.

Step 8 (Optional): Enable Playwright MCP (browser agent)

Odysseus includes several MCP servers, but the browser one requires pulling it with npx first:

npx -y @playwright/mcp@latest --version

Restart the Odysseus container, and the Agent can use the browser MCP (screenshots, navigation, form filling):

Open https://news.ycombinator.com, capture the first 10 titles and links, and summarize them into 5 core topics.

Step 9 (Optional): Run natively on Windows

git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1

The script automatically creates a venv → installs dependencies → runs setup → starts uvicorn. Open http://localhost:7000.

TIP

Windows local vLLM / SGLang is not supported. If you want to run models locally, install Ollama for Windows, then set the endpoint to http://localhost:11434/v1 in Settings. If you want to use Claude/GPT without local inference, you can also use any cloud API that is OpenAI-compatible.

Step 10 (Optional): Make it usable on your phone

Assuming you’ve connected to Tailscale:

ODYSSEUS_HOST=0.0.0.0 docker compose up -d
# Or set it in .env
APP_BIND=0.0.0.0

But exposing plain HTTP to your LAN / Tailscale is not safe. Strongly recommended: add mkcert HTTPS:

mkcert -install
mkcert -cert-file cert.pem -key-file key.pem 100.x.y.z   # your Tailscale IP

python -m uvicorn app:app --host 0.0.0.0 --port 7000 \
  --ssl-certfile=cert.pem --ssl-keyfile=key.pem

On your phone, access https://<tailscale-ip>:7000. Install the PWA to your home screen, and the experience is almost the same as a native app.

Troubleshooting FAQs

Q1: Port 7000 is already in use (common on macOS)

On macOS, AirPlay usually uses 7000 by default. Two solutions:

# Method 1: Change the port in .env
APP_PORT=7001
docker compose up -d
# Open http://localhost:7001 in your browser

# Method 2: Apple menu → System Settings → General → AirPlay Receiver → turn it off

Q2: ChromaDB starts, but logs show `DEGRADED`

Usually this happens when chromadb-client and chromadb are installed at the same time. It silently falls back to HTTP-only mode and the vector capability breaks. Fix:

# In the running container (or your local venv)
./venv/bin/pip uninstall chromadb-client -y
./venv/bin/pip install --force-reinstall chromadb
docker compose restart odysseus

Verify the fix: docker compose logs odysseus | grep -E "ChromaDB|MemoryVectorStore". You should see successful initialization and no more DEGRADED.

Q3: Cookbook can’t see my GPU—only sees iGPU or CPU

By default, Docker exposes all GPUs of the host, but if you only mount the iGPU or another card, it’s almost always because the NVIDIA Container Toolkit isn’t installed / configured, or nvidia-ctk runtime configure --runtime=docker hasn’t been run:

# Diagnose
scripts/check-docker-gpu.sh
# Check the output—it tells you which step is missing

# One-click install + config (Ubuntu/Debian)
scripts/check-docker-gpu.sh --install-nvidia-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Q4: GPU pass-through succeeds, but llama.cpp logs say `Unable to find cudart`

The device is fine; what’s missing is the CUDA runtime. Docker can’t fix this layer—you need to reinstall the CUDA build of llama-cpp-python via Cookbook → Dependencies:

Open Cookbook → Dependencies → find llama-cpp-python → Reinstall (CUDA build)

It will run pip install --no-cache-dir llama-cpp-python[cuda] automatically. It takes about 2–3 minutes.

Q5: Outlook / Office 365 mailbox won’t connect

Odysseus’s email module currently supports only IMAP + username/password authentication. After 2024, Microsoft basically disabled normal password logins and requires OAuth. This is a known limitation for now.

Workarounds: use Gmail / Fastmail / a self-hosted mailbox, or generate and use a Microsoft App Password (for accounts with 2FA) to simulate password login.

Q6: If I want public access, how do I keep it secure?

Directly setting APP_BIND=0.0.0.0 and exposing it to the public internet is a disaster. LOCALHOST_BYPASS bypasses authentication, and the Agent also has shell tools—Shodan can scan and find a bunch of exposed instances within 24 hours.

The right approach:

[Internet] → Cloudflare Access / Tailscale / WireGuard
            → Caddy / Nginx / Traefik (HTTPS termination)
            → http://127.0.0.1:7000 (Odysseus; keep localhost binding)

And set in .env:

AUTH_ENABLED=true
LOCALHOST_BYPASS=false
SECURE_COOKIES=true

Q7: How do you control cloud API billing?

Let’s expand on the issue in Step 5, Path C.

When you play with this as a person, there are three strategies—ranked from highest cost to lowest cost:

Connect official APIs — best experience, but the most expensive. Claude Opus + frequent Agent calls hitting $100+ per week is normal.
Connect a protocol-compatible relay (I use Defapi) — same Claude / GPT / Gemini models, and the monthly bill is about one tier lower than official. I didn’t compute the exact difference; roughly, $100 on a weekly basis becomes roughly $100 on a monthly basis. The advantage is full protocol compatibility: Odysseus config stays the same—no need to change any prompt / Agent code.
Run local quantized models — let Cookbook download a Q4_K_M GGUF; with 16G VRAM you can run around the 30B class. No API costs, but inference is slower than cloud and complex task quality is a notch lower.

My actual combination: Chat + Deep Research go through Defapi Claude Sonnet 4.5, Agent tool-calling runs Claude Haiku 4.5 (cheap), and simple classification/summarization tasks use a local Qwen2.5 quantized variant. Each Provider charges independently and doesn’t interfere with the others—my monthly total is double-digit RMB.

TIP

Key trick: Don’t use the same model for everything. Odysseus’s Settings lets you assign different Providers to different scenarios. Many people overlook this great feature.

Project Overview

Difficulty / Time / What You Get

Target Readers

Core Dependencies and Environment

Full Project Structure Tree

Step-by-Step Instructions

Step 1: Clone the repository

Step 2: Prepare .env (Optional, but recommended)

Step 3: One-command Docker start (recommended path)

Step 4: Verify all services are healthy

Step 5: Connect your first model

Step 6: First chat + first Agent task

Step 7 (Optional): Enable GPU

Step 8 (Optional): Enable Playwright MCP (browser agent)

Step 9 (Optional): Run natively on Windows

Step 10 (Optional): Make it usable on your phone

Troubleshooting FAQs

Q1: Port 7000 is already in use (common on macOS)

Q2: ChromaDB starts, but logs show DEGRADED

Q3: Cookbook can’t see my GPU—only sees iGPU or CPU

Q4: GPU pass-through succeeds, but llama.cpp logs say Unable to find cudart

Q5: Outlook / Office 365 mailbox won’t connect

Q6: If I want public access, how do I keep it secure?

Q7: How do you control cloud API billing?

Further Reading / Advanced Directions

Step 2: Prepare `.env` (Optional, but recommended)

Q2: ChromaDB starts, but logs show `DEGRADED`

Q4: GPU pass-through succeeds, but llama.cpp logs say `Unable to find cudart`