
AI Provider Configuration

Configure language model providers to power MAESTRO's research and writing capabilities.

[Screenshot: AI Configuration Interface]

Overview

MAESTRO supports multiple AI providers and allows flexible configuration:

  • Advanced Mode: Configure separate providers and credentials for each model type
  • Custom Provider: Connect to any OpenAI-compatible API endpoint
  • Local LLMs: Connect to self-hosted models via custom endpoints

Supported Providers

Custom Provider

Connect to any OpenAI-compatible API endpoint. This is the most flexible option.

Configuration:

  • Provider: Select "Custom Provider" from dropdown
  • API Key: Enter your API key (local servers without authentication accept a blank or dummy value)
  • Base URL: Your endpoint URL

Common Endpoints:

  • OpenRouter: https://openrouter.ai/api/v1/
  • OpenAI: https://api.openai.com/v1/
  • Local Ollama: http://host.docker.internal:11434/v1/ (adjust host and port to match your Ollama configuration)
  • Local vLLM: http://host.docker.internal:8000/v1/ (adjust host and port to match your vLLM configuration)
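
Under the hood, MAESTRO's "Test" button fetches the endpoint's model list (see Testing Your Configuration below). You can run the equivalent check yourself with curl before entering a URL; BASE_URL and API_KEY here are placeholders for your own values:

    # OpenAI-compatible servers expose their model list at /models
    BASE_URL="https://openrouter.ai/api/v1"
    API_KEY="your-key-here"
    curl -s "$BASE_URL/models" -H "Authorization: Bearer $API_KEY"

A JSON list of model IDs means the endpoint is OpenAI-compatible and reachable; an error here will also show up as a failed "Test" in MAESTRO.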

OpenRouter

Access to 100+ models through a unified API.

Setup:

  1. Select "OpenRouter" as the AI Provider
  2. API Key: Get from OpenRouter Dashboard
  3. Base URL: https://openrouter.ai/api/v1/
  4. Click "Test" to verify and load models

Available Models:

  • Claude models (Anthropic)
  • GPT models (OpenAI)
  • Llama models (Meta)
  • Mistral models
  • Many more open and commercial models

Pricing: Pay-per-token, varies by model. Check OpenRouter Pricing
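
To confirm a key works end to end, not just for listing models, you can send a minimal chat completion directly to OpenRouter (the model name below is only an example):

    curl -s https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "meta-llama/llama-3.2-3b-instruct",
           "messages": [{"role": "user", "content": "Say hello"}]}'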

OpenAI

Direct access to OpenAI's GPT models.

Setup:

  1. Select "OpenAI" as the AI Provider
  2. API Key: Get from OpenAI Platform
  3. Base URL: https://api.openai.com/v1/
  4. Click "Test" to verify and load models

Available Models:

  • gpt-5-chat
  • gpt-5-mini
  • gpt-5-nano
  • gpt-4o
  • gpt-4o-mini

Pricing: Check OpenAI Pricing
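
As with OpenRouter, you can verify the key outside MAESTRO; listing models also shows which models your key can actually access:

    curl -s https://api.openai.com/v1/models \
      -H "Authorization: Bearer $OPENAI_API_KEY"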

Configuration Mode

Advanced Configuration

MAESTRO uses Advanced Mode to configure separate providers and credentials for each model type:

  1. Check "Advanced Configuration" checkbox
  2. For each model type (Fast, Mid, Intelligent, Verifier):
    • Select "Custom Provider" from dropdown
    • Enter API key (if required)
    • Enter base URL for your provider
    • Click "Test" to load available models
    • Select model from dropdown
  3. Click "Save & Close" to apply settings

Example Setup:

  • Fast Model:
    • Provider: Custom Provider
    • Base URL: https://openrouter.ai/api/v1/
    • Model: meta-llama/llama-3.2-3b-instruct
  • Mid Model:
    • Provider: Custom Provider
    • Base URL: https://openrouter.ai/api/v1/
    • Model: anthropic/claude-3.5-haiku
  • Intelligent Model:
    • Provider: Custom Provider
    • Base URL: https://api.openai.com/v1/
    • Model: gpt-5-chat-latest
  • Verifier Model:
    • Provider: Custom Provider
    • Base URL: http://host.docker.internal:11434/v1/
    • Model: llama3.2 (local)

Model Types and Agent Usage

MAESTRO uses four model categories, automatically assigned to different agents and tasks:

Fast Model

Agents using Fast Model:

  • Planning Agent - Creates research plans and outlines
  • Note Assignment Agent - Distributes information to sections
  • Query Strategy Agent - Determines search strategies
  • Router Agent - Routes tasks to appropriate agents

Use Cases: Quick decisions, simple formatting, routing logic

Recommended Models: Smaller, faster models (GPT-4o-mini, Claude Haiku, GPT-5-nano)

Mid Model (Default)

Agents using Mid Model:

  • Research Agent - Main research and information gathering
  • Writing Agent - Document composition
  • Simplified Writing Agent - Streamlined writing tasks
  • Messenger Agent - User interaction and messaging
  • Default fallback - Any undefined agent modes

Use Cases: General research, standard writing, analysis, user interaction

Recommended Models: Balanced models (GPT-5-mini, Claude-4-Sonnet, Qwen-2.5-72b-instruct)

Intelligent Model

Agents using Intelligent Model:

  • Reflection Agent - Critical analysis and feedback
  • Query Preparation Agent - Complex query reformulation
  • Research Agent (critical tasks) - When explicitly needed for complex analysis

Use Cases: Deep analysis, complex reasoning, quality assessment

Recommended Models: Most capable models (GPT-5-chat, Claude-4.1-Opus, Qwen3-235b-a22b-2507)

Verifier Model

Agents using Verifier Model:

  • Verification tasks - Fact-checking and validation
  • Quality control - Ensuring accuracy of information

Use Cases: Fact verification, consistency checking

Recommended Models: Accurate, reliable models (typically same as Intelligent)

Local LLM Setup

Using Ollama

  1. Install and run Ollama:

    # Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Pull a model (llama3.2 matches the examples below)
    ollama pull llama3.2
    
    # Start the Ollama server (skip if it already runs as a system service)
    ollama serve
    

  2. Configure in MAESTRO:

    • Provider: Custom Provider
    • API Key: (any placeholder value works unless your server requires authentication)
    • Base URL: http://host.docker.internal:11434/v1/
    • Click "Test" to load available models
    • Model: Select from dropdown (e.g., "llama3.2", "mistral")
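
Because MAESTRO runs in Docker, the endpoint must be reachable from inside the container, not just from the host. A quick check of both is shown below; the container name maestro-backend is illustrative (use the name shown by docker ps), and this assumes curl is available in the image:

    # From the host: Ollama's OpenAI-compatible route
    curl -s http://localhost:11434/v1/models
    
    # From inside the MAESTRO container (container name is illustrative)
    docker exec maestro-backend curl -s http://host.docker.internal:11434/v1/models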

Using LM Studio

  1. Start LM Studio server on port 1234
  2. Configure in MAESTRO:
    • Provider: Custom Provider
    • API Key: (any placeholder value works unless your server requires authentication)
    • Base URL: http://host.docker.internal:1234/v1/
    • Click "Test" to verify connection

Using vLLM

  1. Start vLLM server:

       python -m vllm.entrypoints.openai.api_server \
          --model "/home/user/models/Qwen_Qwen3-32B-AWQ" \
          --tensor-parallel-size 2 \
          --port 5000 \
          --host 0.0.0.0 \
          --gpu-memory-utilization 0.90 \
          --served-model-name "localmodel" \
          --disable-log-requests \
          --disable-custom-all-reduce \
          --enable-prefix-caching \
          --guided-decoding-backend "xgrammar" \
          --chat-template /home/user/vllm/qwen3_nonthinking.jinja
    

  2. Configure in MAESTRO:

    • Provider: Custom Provider
    • API Key: (any placeholder value works unless your server requires authentication)
    • Base URL: http://192.168.xxx.xxx:5000/v1/
    • Click "Test" to load available model (will show as localmodel)

Testing Your Configuration

Connection Test

  1. Enter your API credentials (API Key and Base URL)
  2. Click the "Test" button next to the API Key field
  3. Wait for verification (this fetches available models)
  4. Success will populate the model dropdowns

Model Selection

After successful connection test:

  1. Fast Model: Select from dropdown for rapid, simple tasks
  2. Mid Model: Select for balanced performance tasks
  3. Intelligent Model: Select for complex analysis (defaults to same as Mid if not set)
  4. Verifier Model: Select for fact-checking (defaults to same as Mid if not set)

Saving Configuration

  1. After selecting all models, click "Save & Close"
  2. Settings are saved per user
  3. Changes take effect immediately for new research sessions

Troubleshooting Connection Issues

"Connection failed" error:

  • Verify API key is correct and active
  • Check internet connection
  • Ensure billing is set up with provider
  • For custom endpoints, verify server is running

"Models not loading":

  • Wait a moment and try refreshing
  • Check if API key has correct permissions
  • Verify base URL format (should end with /v1/ for OpenAI-compatible APIs)
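
A frequent base-URL mistake is a missing /v1/ suffix. Comparing the two forms directly makes the difference visible (Ollama shown here, run from the host; inside the container, swap in host.docker.internal):

    # Works: the OpenAI-compatible route lives under /v1/
    curl -s http://localhost:11434/v1/models
    
    # Usually fails with a 404: /v1 is missing
    curl -s http://localhost:11434/models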

Best Practices

Model Selection

  1. Match complexity to task

    • Don't use GPT-5-chat for simple formatting
    • Don't use GPT-5-nano for complex analysis
  2. Consider cost

    • Fast models for high-volume tasks
    • Intelligent models for critical analysis
  3. Test different combinations

    • Find the optimal balance for your use case

Performance Optimization

  1. Use local models for:

    • Sensitive data
    • High-volume processing
    • Offline operation
  2. Use cloud models for:

    • Best quality
    • Latest capabilities
    • No infrastructure management

Cost Management

Monitoring Usage

  • Track usage through provider dashboards
  • Monitor job costs in the MAESTRO UI
  • Set up billing alerts
  • Monitor token consumption in MAESTRO logs

Common Configurations

Budget-Conscious Setup

  • All models: OpenRouter with open models (Mistral, Llama)
  • Cost: Very low

Quality-Focused Setup

  • Fast: GPT-5-nano or GPT-4o-mini or Claude Haiku
  • Mid: GPT-5-mini or Claude Sonnet
  • Intelligent: GPT-5-chat or Claude Opus
  • Verifier: same as Intelligent

Privacy-Focused Setup

  • All models: Local LLMs via vLLM or SGLang with structured generation
  • No external API calls

Hybrid Setup

  • Fast/Mid: Local models
  • Intelligent: Cloud model for complex tasks
  • Balance of privacy and capability

Provider Comparison

Provider     Pros                          Cons                 Best For
OpenRouter   100+ models, unified billing  Adds small overhead  Flexibility
OpenAI       Direct access, latest models  Single vendor        GPT users
Local LLMs   Privacy, no costs             Requires hardware    Sensitive data

Troubleshooting

API Key Issues

Invalid API key:

  • Double-check key from provider dashboard
  • Ensure no extra spaces
  • Verify key is active

Rate limiting:

  • Check provider rate limits
  • Implement retry logic
  • Consider upgrading plan
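
If you are scripting your own calls against a provider while testing, curl's built-in retry flags give basic backoff without extra tooling:

    # Retry up to 3 times with a 2-second pause between attempts;
    # --retry-all-errors (curl 7.71+) also retries on HTTP errors such as 429
    curl -s --retry 3 --retry-delay 2 --retry-all-errors \
      "$BASE_URL/models" -H "Authorization: Bearer $API_KEY"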

Model Selection Issues

Model not available:

  • Verify model name spelling
  • Check if model is available in your region
  • Ensure API key has access to model

Wrong model behavior:

  • Verify correct model selected
  • Check model parameters
  • Test with different model

Next Steps