Introduction
OpenFang is an open-source agent operating system written in Rust and compiled to a single binary of roughly 32 MB. It supports 20 mainstream LLM providers out of the box, including Anthropic Claude, Google Gemini, OpenAI GPT, DeepSeek, Groq, and more. With a little configuration, you can quickly bring these AI capabilities into your application.
This article will detail how to configure and use various LLM APIs in OpenFang, along with some practical tips.
Method 1: Using Defapi (Recommended: Half the Official Price)
If you want to use high-quality AI models at lower cost, Defapi is an excellent choice. It offers APIs for all mainstream models at 50% of the official prices. For example:
- Gemini 2.5 Pro: official price $1.25/M tokens → Defapi only $0.625/M tokens
- Claude Sonnet 4: official price $3.00/M tokens → Defapi only $1.50/M tokens
Configuration Steps
- Obtain a Defapi API key (visit https://defapi.org)
- Set in the configuration file:
# ~/.openfang/config.toml
[default_model]
provider = "openai" # or "anthropic", "gemini"
model = "claude-sonnet-4-20250514" # or other models
base_url = "https://api.defapi.org/v1"
[env]
DEFAPI_API_KEY = "your-defapi-key"
Using Custom Endpoints
[[providers]]
name = "defapi"
base_url = "https://api.defapi.org/v1"
api_key_env = "DEFAPI_API_KEY"
[default_model]
provider = "defapi"
model = "claude-sonnet-4-20250514"
Defapi Supported Protocols
Defapi supports several API protocols, all compatible with OpenFang:
- v1/chat/completions - OpenAI compatible interface
- v1/messages - Anthropic Claude interface
- v1beta/models/*:generateContent - Google Gemini interface
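To make the OpenAI-compatible protocol concrete, here is a minimal Python sketch that builds a request against the `v1/chat/completions` endpoint listed above. The base URL and model name come from this article's config examples; the helper function itself is illustrative, not part of OpenFang or Defapi.

```python
import json
import os
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    # Standard OpenAI-style chat payload: model name plus a messages array.
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "https://api.defapi.org/v1",
    os.environ.get("DEFAPI_API_KEY", "sk-test"),
    "claude-sonnet-4-20250514",
    "Hello",
)
# Uncomment to actually send the request (requires a valid key):
# print(urllib.request.urlopen(req).read().decode())
```

The same payload shape works against any of the OpenAI-compatible endpoints discussed later in this article; only `base_url` and the key change.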
Method 2: Directly Using Official API
Quick Configuration with Environment Variables
OpenFang automatically detects multiple providers from environment variables:
# Anthropic Claude
export ANTHROPIC_API_KEY="sk-ant-..."
# OpenAI GPT
export OPENAI_API_KEY="sk-..."
# Google Gemini (also supports free quota)
export GEMINI_API_KEY="AIza..."
# DeepSeek
export DEEPSEEK_API_KEY="sk-..."
# Groq (free quota, extremely fast)
export GROQ_API_KEY="gsk_..."
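The detection logic can be pictured as a simple mapping from environment variable to provider name. This is a hypothetical sketch of the idea, not OpenFang's actual implementation; only the variable names are taken from the list above.

```python
import os

# Env var -> provider name, matching the variables listed above.
ENV_KEYS = {
    "ANTHROPIC_API_KEY": "anthropic",
    "OPENAI_API_KEY": "openai",
    "GEMINI_API_KEY": "gemini",
    "DEEPSEEK_API_KEY": "deepseek",
    "GROQ_API_KEY": "groq",
}

def detect_providers(env=os.environ):
    # A provider counts as configured when its key is set and non-empty.
    return [name for var, name in ENV_KEYS.items() if env.get(var)]
```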
Configuration via the Config File
Define the configuration in ~/.openfang/config.toml:
# Global default model
[default_model]
provider = "groq"
model = "llama-3.3-70b-versatile"
# Cost Control
[agents.defaults]
max_cost_per_hour_usd = 10.00
Quick Reference of Available Models
| Provider | Recommended Model | Context | Features |
|---|---|---|---|
| Anthropic | claude-sonnet-4-20250514 | 200K | High cost-performance |
| OpenAI | gpt-4o-mini | 128K | Fast and affordable |
| Gemini | gemini-2.5-flash | 1M | Free quota |
| DeepSeek | deepseek-chat | 64K | Strong reasoning capabilities |
| Groq | llama-3.3-70b-versatile | 128K | Extremely fast |
Method 3: Using OpenRouter Aggregation Platform
OpenRouter supports over 200 models, suitable for scenarios requiring flexible model switching:
export OPENROUTER_API_KEY="sk-or-..."
[default_model]
provider = "openrouter"
model = "openrouter/auto" # Automatically select the best model
Method 4: Integrating Local Models
Ollama (Recommended for Local Development)
# Install and start Ollama
ollama serve
ollama pull llama3.2
[default_model]
provider = "ollama"
model = "llama3.2"
vLLM (Production-grade Local Deployment)
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-70B-Instruct
[default_model]
provider = "vllm"
model = "meta-llama/Llama-3.1-70B-Instruct"
Method 5: Connecting to Any OpenAI-Compatible API
If you have a custom API endpoint, OpenFang supports fully customizable configurations:
[[providers]]
name = "custom-llm"
base_url = "https://your-api-endpoint.com/v1"
api_key_env = "CUSTOM_API_KEY"
[default_model]
provider = "custom-llm"
model = "your-model"
Verifying Functionality
1. Check Health Status
curl http://127.0.0.1:4200/api/health
2. View Available Models
curl http://127.0.0.1:4200/api/models
3. Check Provider Status
curl http://127.0.0.1:4200/api/providers
4. Send Test Message
curl -X POST http://127.0.0.1:4200/api/agents/{agent-id}/message \
-H "Content-Type: application/json" \
-d '{"message": "Hello, say hi in 3 words"}'
5. Test Using OpenAI-Compatible Interface
curl -X POST http://127.0.0.1:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "hello-world",
"messages": [{"role": "user", "content": "Hi"}]
}'
Internal Mechanism Analysis
Driver Architecture
OpenFang uses a three-layer driver architecture to support different LLMs:
- Native Drivers: Anthropic and Gemini drivers optimized for their specific API protocols
- OpenAI-Compatible Driver: supports any provider that adheres to the OpenAI API format
- Backup Drivers: chain calls across multiple providers, switching automatically when the primary provider fails
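The backup-driver idea can be sketched as a simple fallback chain: try each provider in order and move to the next one when a call fails. The function and error handling below are illustrative, not OpenFang internals.

```python
def complete_with_fallback(providers, prompt):
    # providers: ordered list of callables, primary first.
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as err:  # a real driver would catch specific error types
            last_error = err
    raise RuntimeError("all providers failed") from last_error
```

A production driver would also distinguish retryable errors (timeouts, rate limits) from permanent ones (invalid key) before falling back.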
Intelligent Model Routing
OpenFang has a built-in smart routing mechanism that automatically selects the appropriate model based on task complexity:
- Simple (Score < 100): Use Haiku or Gemini Flash
- Medium (100-500): Use Sonnet or Gemini Pro
- Complex (>= 500): Use Opus or GPT-4
Scoring is based on factors such as message length, tool count, code tags, dialogue depth, and system prompt length.
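A minimal sketch of this tiered routing rule: the score thresholds come from the article, while the scoring weights and model names below are illustrative placeholders, not OpenFang's actual values.

```python
def pick_model(score: int) -> str:
    # Thresholds as described above: <100 simple, 100-500 medium, >=500 complex.
    if score < 100:
        return "claude-haiku"    # simple: Haiku / Gemini Flash tier
    if score < 500:
        return "claude-sonnet"   # medium: Sonnet / Gemini Pro tier
    return "claude-opus"         # complex: Opus / GPT-4 tier

def task_score(message: str, tool_count: int, has_code: bool, depth: int) -> int:
    # Toy scoring over the factors the article lists; weights are made up.
    score = len(message) // 4        # rough token estimate from message length
    score += tool_count * 20         # each attached tool adds complexity
    score += 150 if has_code else 0  # code tags push toward stronger models
    score += depth * 10              # deeper dialogues cost more context
    return score
```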
Cost Tracking
After each API call, OpenFang automatically calculates the cost:
Cost: $0.0042 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514
This is made possible by the built-in model directory, which contains precise pricing information for all models.
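The arithmetic behind such a log line is straightforward: multiply token counts by per-million-token prices. The formula is standard; the prices used in the test below are illustrative placeholders, not Defapi or official rates.

```python
def call_cost(tokens_in: int, tokens_out: int,
              price_in_per_m: float, price_out_per_m: float) -> float:
    # Prices are quoted per million tokens, so scale counts down by 1e6.
    return (tokens_in / 1_000_000) * price_in_per_m + \
           (tokens_out / 1_000_000) * price_out_per_m
```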
Common Use Cases
1. Intelligent Customer Service Bot
Configure low-cost models (like gpt-4o-mini or llama-3.1-8b) to handle a large volume of simple inquiries, reducing operational costs.
2. Code Review Assistant
Use Claude Opus or GPT-4 for in-depth code analysis, or Groq's high-speed inference when rapid feedback matters more than depth.
3. Content Creation Assistant
Utilize the extensive context window (1M tokens) of Gemini 2.5 Pro for writing long documents and handling complex creative tasks.
4. Data Analysis Assistant
Utilize the online search capability of the Perplexity Sonar model to pull up-to-date data for real-time statistical analysis.
5. Multilingual Translation Service
Deploy translation models using local Ollama to ensure data privacy, suitable for internal use within enterprises.