# AI Model Troubleshooting
Quick fixes for AI model configuration and API issues.
## Configuration Issues

### No Models in Dropdown
Solution:
- Go to Settings → AI Config
- Select provider (OpenAI, OpenRouter, or Custom Provider)
- Enter API key
- Click "Test" button
- Models should populate automatically
If still empty:

```bash
# Check logs for errors
docker compose logs maestro-backend | grep -i "api\|model"

# Restart backend
docker compose restart maestro-backend
```
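If you want to check outside the UI whether the key can list models at all, here is a minimal Python sketch against the OpenAI-compatible `/v1/models` route (the base URL, key, and the `requests` dependency are yours to substitute; this is not MAESTRO code):

```python
import requests

BASE_URL = "https://api.openai.com/v1"   # or https://openrouter.ai/api/v1, or your custom base URL
API_KEY = "YOUR_API_KEY"

# An empty or error response here usually explains an empty model dropdown.
resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
models = resp.json().get("data", [])
print(f"{len(models)} models available")
for model in models[:10]:
    print("-", model.get("id"))
```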
### Wrong Model Being Used
Check current configuration:
- Go to Settings → AI Config
- Verify each model type is set correctly:
    - Fast Model: For quick tasks
    - Mid Model: For balanced performance
    - Intelligent Model: For complex analysis
    - Verifier Model: For verification
## API Issues

### Authentication Failed
Error: Invalid API key
Solution:
```bash
# Test OpenAI key
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Test OpenRouter key
curl https://openrouter.ai/api/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```
### Rate Limit Errors
Error: Rate limit exceeded
Solution:
```bash
# Configure retry settings in .env
MAX_RETRIES=3
RETRY_DELAY=5
MAX_CONCURRENT_REQUESTS=2

# Restart
docker compose restart maestro-backend
```
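These variables are read from `.env`; the sketch below only illustrates the retry-with-delay pattern that settings like these typically control (the helper and its behavior are hypothetical, not MAESTRO's implementation):

```python
import time

MAX_RETRIES = 3    # mirrors MAX_RETRIES in .env
RETRY_DELAY = 5    # seconds, mirrors RETRY_DELAY in .env

def call_with_retries(make_request):
    """Retry a rate-limited call a few times, waiting between attempts.

    `make_request` is any zero-argument callable that raises when the
    provider answers with a rate-limit error (HTTP 429).
    """
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return make_request()
        except Exception as exc:
            if "rate limit" not in str(exc).lower() or attempt == MAX_RETRIES:
                raise
            # Wait before retrying; real code might back off exponentially or
            # honor a Retry-After header. MAX_CONCURRENT_REQUESTS limits how
            # many such calls run in parallel in the first place.
            time.sleep(RETRY_DELAY * attempt)
```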
### Context Too Large
Error: Context length exceeded
Solution in Settings → Research:
- Reduce `writing_agent_max_context_chars` (e.g., 100,000) — see the sketch below
- Reduce `main_research_doc_results` (e.g., 3)
- Reduce `main_research_web_results` (e.g., 3)
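As a rough illustration of the first setting, `writing_agent_max_context_chars` acts as a character budget: context beyond it has to be dropped before the request is sent. The trimming helper below is hypothetical, not MAESTRO's actual code:

```python
def trim_to_budget(chunks, max_context_chars=100_000):
    """Keep whole context chunks (notes, document excerpts, web results)
    in order until the character budget is spent; drop the rest."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_context_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    return kept

# Lowering the budget (or the number of doc/web results feeding `chunks`)
# keeps the final prompt under the model's context limit.
```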
## Provider-Specific Issues

### OpenAI
Common issues:
- Wrong API key format
- Insufficient credits
- Model not available in your region
Check account: verify your remaining credits and usage in the OpenAI dashboard (platform.openai.com).
### OpenRouter
Model not found:
```bash
# Use the full model path
# Correct: anthropic/claude-3-sonnet
# Wrong:   claude-3-sonnet

# Check available models
curl https://openrouter.ai/api/v1/models \
  -H "Authorization: Bearer YOUR_KEY"
```
### Custom Provider
Connection failed:
```bash
# Test endpoint
curl YOUR_CUSTOM_BASE_URL/v1/models

# Common base URLs:
# Local vLLM:    http://localhost:5000/v1
# Local SGLang:  http://localhost:30000/v1
# Local Ollama:  http://localhost:11434/v1
# LM-Studio:     http://localhost:1234/v1
```
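If you are not sure which port your local server actually listens on, a small probe over the common base URLs above can save time (assumes `requests`; run it on the same host as the server, since `localhost` inside the MAESTRO container refers to the container itself):

```python
import requests

CANDIDATES = {
    "vLLM":      "http://localhost:5000/v1",
    "SGLang":    "http://localhost:30000/v1",
    "Ollama":    "http://localhost:11434/v1",
    "LM-Studio": "http://localhost:1234/v1",
}

for name, base in CANDIDATES.items():
    try:
        r = requests.get(f"{base}/models", timeout=5)
        print(f"{name:10s} {base}: HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{name:10s} {base}: unreachable ({exc.__class__.__name__})")
```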
## Performance Issues

### Slow Response
Quick fixes:
- Use faster models (e.g., gpt-4o-mini)
- Reduce context sizes in Research settings
- Check network latency to API
### High Costs
Reduce costs:
- Use cheaper models for Fast/Mid tiers
- Reduce research parameters:
    - `initial_research_max_questions`
    - `structured_research_rounds`
    - `writing_passes`
## Cost Tracking Discrepancies

### Why Don't My Tracked Costs Match My API Provider Dashboard?
This is a known issue with some API providers' pricing and billing. MAESTRO correctly tracks costs based on providers' advertised pricing, but actual charges often differ significantly.
#### The Problem
Some API providers, particularly aggregators/routers, have inconsistent billing:
- API aggregators route to different providers - each backend provider may have different actual costs
- Dynamic routing affects pricing - aggregators choose providers based on availability and latency, not just price
- The API `usage.cost` field may be unreliable - it sometimes returns values ~100x lower than actual charges
- Dashboard charges don't match advertised pricing - they can be 0.4x to 4x the calculated cost
#### Real Example
Here's actual data from testing with a popular model:
| Prompt Tokens | Completion Tokens | Our Calculation | API `usage.cost` | Dashboard Charge | Variance |
|---|---|---|---|---|---|
| 4,405 | 200 | $0.000601 | $0.000006 | $0.000594 | 0.99x |
| 349 | 300 | $0.000275 | $0.000003 | $0.000272 | 0.99x |
| 52 | 1,000 | $0.000805 | $0.000003 | $0.000312 | 0.39x |
| 243 | 200 | $0.000184 | $0.000003 | $0.000331 | 1.80x |
| 64 | 124 | $0.000106 | $0.000002 | $0.000157 | 1.48x |
Key Findings:
- Token counts are usually accurate across API and dashboard
- MAESTRO's pricing calculation is correct based on advertised rates
- Dashboard charges can be inconsistent with advertised rates
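The calculation behind the "Our Calculation" column is simply token counts times advertised rates; the sketch below shows the comparison with placeholder numbers (the rates and the dashboard figure are made up, not real pricing):

```python
# All numbers are placeholders for illustration, not real pricing.
prompt_tokens, completion_tokens = 4_405, 200

price_in  = 0.12 / 1_000_000   # advertised $ per prompt token (assumed)
price_out = 0.30 / 1_000_000   # advertised $ per completion token (assumed)

calculated = prompt_tokens * price_in + completion_tokens * price_out
dashboard_charge = 0.0006      # whatever your provider dashboard shows (assumed)

print(f"calculated: ${calculated:.6f}")
print(f"dashboard:  ${dashboard_charge:.6f}")
print(f"variance:   {dashboard_charge / calculated:.2f}x")
```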
#### Testing Your Own Costs
We provide a test script to verify pricing discrepancies:
```bash
# Run the pricing test (example for OpenRouter)
python scripts/test_openrouter_pricing.py --api-key YOUR_API_KEY

# Example output will show:
# - Token counts from API
# - Calculated costs based on advertised pricing
# - API reported costs (usually wrong)
# - Comparison with dashboard charges
```
#### What This Means for You
- Your tracked costs may differ from actual charges - typically 40-60% of dashboard values
- This is NOT a bug in MAESTRO - we calculate correctly based on advertised prices
- Some providers' billing is inconsistent - they may charge differently than advertised
#### Workarounds
- Apply a multiplier to displayed costs based on your provider (see the sketch after this list)
- Monitor your actual provider dashboard for true costs
- Use providers with consistent pricing if cost accuracy is critical:
    - Some providers have more predictable pricing than others
    - Local models have zero API costs
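For the first workaround, the adjustment is just a per-provider factor derived by comparing a few tracked costs against your dashboard; the factors below are made up for illustration:

```python
# Hypothetical correction factors, measured by comparing MAESTRO's tracked
# costs against actual dashboard charges over a few research sessions.
COST_MULTIPLIERS = {
    "openrouter": 1.5,   # example: dashboard tends to run ~1.5x tracked cost
    "openai": 1.0,       # example: direct providers usually match
}

def adjusted_cost(tracked_cost, provider):
    return tracked_cost * COST_MULTIPLIERS.get(provider, 1.0)

print(adjusted_cost(0.42, "openrouter"))   # 0.63
```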
#### Technical Details
The discrepancy may be caused by:
- Aggregator routing: Services like OpenRouter route to different backend providers with varying costs
- Dynamic provider selection: Aggregators optimize for availability and latency, not just price
- Hidden tokens: Providers may count system prompts or special tokens not reported in API
- Different tokenizers: Billing tokenizer may differ from API response tokenizer
- Overhead charges: Routing or processing overhead not disclosed
- Minimum charges or rounding: Some providers may have minimum charge amounts
Note: Direct providers (like OpenAI, Anthropic) typically have more consistent pricing than aggregators/routers, as they don't route between multiple backends.
For more technical details and provider-specific test scripts, see our scripts directory.
## Debugging

### Enable Debug Logging
```bash
# In .env
LOG_LEVEL=DEBUG

# Restart
docker compose restart maestro-backend

# Watch logs
docker compose logs -f maestro-backend | grep -i "model\|api"
```
### Test Models Directly
```bash
# Check configured models
docker exec maestro-backend python -c "
from ai_researcher.dynamic_config import get_fast_model_name, get_mid_model_name, get_intelligent_model_name
print('Fast:', get_fast_model_name())
print('Mid:', get_mid_model_name())
print('Intelligent:', get_intelligent_model_name())
"
```
## Structured Outputs & Provider Compatibility

### What are Structured Outputs?
Maestro uses OpenAI's structured outputs feature (the `json_schema` response format) to ensure LLM responses match exact Pydantic model schemas. However, not all providers support this advanced feature.
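Concretely, "structured outputs" means attaching a JSON Schema (here derived from a Pydantic model) to the request's `response_format`. Below is a minimal sketch of that request shape against an OpenAI-compatible endpoint; the model class and field names are illustrative, not MAESTRO's:

```python
import requests
from pydantic import BaseModel

class ResearchNote(BaseModel):   # illustrative schema, not one of MAESTRO's models
    topic: str
    summary: str
    relevance: float

payload = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Summarize transformer attention."}],
    # json_schema response format: the provider is asked to return JSON that
    # validates against the supplied schema.
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "research_note",
            "schema": ResearchNote.model_json_schema(),
        },
    },
}

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
note = ResearchNote.model_validate_json(resp.json()["choices"][0]["message"]["content"])
print(note)
```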
### Provider Support Status
Provider | json_object | json_schema (Structured) | Notes |
---|---|---|---|
OpenAI | ✅ | ✅ | Full support (gpt-4o 2024-08-06+) |
Azure OpenAI | ✅ | ⚠️ | Needs model 2024-08-06+, API 2024-10-21+ |
Anthropic Claude | ✅ | ✅ | Full support via different API |
Google Gemini | ✅ | ⚠️ | Limited support, different implementation |
DeepSeek | ✅ | ❌ | Only basic json_object mode |
Moonshot/Kimi | ⚠️ | ❌ | Own incompatible JSON schema format |
Local (Ollama) | ✅ | ❌ | Most models only support json_object |
Local (LM-Studio) | ✅ | ❌ | Basic JSON mode only |
Local (vLLM) | ✅ | ✅ | Full support with guided_json parameter (docs) |
Local (SGLang) | ✅ | ✅ | Full support with grammar constraints (docs) |
### Automatic Fallback Mechanism
Maestro automatically handles incompatible providers:
- First attempts structured outputs for maximum reliability
- Detects errors and falls back to `json_object` mode
- Enhances prompts with schema instructions in fallback mode
- Validates responses against Pydantic models regardless (see the sketch below)
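A simplified sketch of that flow (not MAESTRO's actual code; `complete` stands in for whichever OpenAI-compatible chat call you use and is assumed to return the message content as a string):

```python
import json
from pydantic import BaseModel

class Answer(BaseModel):   # illustrative target schema
    verdict: str
    confidence: float

def structured_call(complete, messages):
    try:
        # 1. Prefer strict structured outputs where the provider supports them.
        raw = complete(messages, response_format={
            "type": "json_schema",
            "json_schema": {"name": "answer", "schema": Answer.model_json_schema()},
        })
    except Exception:
        # 2. Fallback: basic JSON mode, with the schema spelled out in the prompt.
        hint = {
            "role": "system",
            "content": "Reply only with JSON matching this schema:\n"
                       + json.dumps(Answer.model_json_schema()),
        }
        raw = complete([hint, *messages], response_format={"type": "json_object"})
    # 3. Validate against the Pydantic model regardless of which path was taken.
    return Answer.model_validate_json(raw)
```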
### Common Compatibility Errors
Error Message | Provider | Meaning |
---|---|---|
"This response_format type is unavailable" | DeepSeek | No json_schema support |
"Invalid moonshot flavored json schema" | Moonshot/Kimi | Incompatible schema format |
"keyword 'default' is not allowed" | Moonshot | Schema validation error |
"structured outputs are not supported" | Azure (older) | Need newer model/API |
### Configuring Local Model Servers

#### vLLM with Structured Outputs
```bash
# Start vLLM with guided decoding
python -m vllm.entrypoints.openai.api_server \
  --model your-model \
  --guided-decoding-backend outlines \
  --port 5000

# In Maestro Settings → AI Config:
# Provider: Custom Provider
# Base URL: http://localhost:5000/v1
# Model: your-model
```
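Once the server is up, you can exercise guided decoding per request through vLLM's OpenAI-compatible API; with the official `openai` Python client the schema is passed via `extra_body` (a sketch; the model name and schema are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed")

schema = {   # placeholder JSON Schema for illustration
    "type": "object",
    "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["title", "year"],
}

resp = client.chat.completions.create(
    model="your-model",
    messages=[{"role": "user", "content": "Give one paper title and year as JSON."}],
    # vLLM's guided decoding parameter, sent outside the standard OpenAI fields.
    extra_body={"guided_json": schema},
)
print(resp.choices[0].message.content)
```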
#### SGLang with Grammar Support
```bash
# Start SGLang server
python -m sglang.launch_server \
  --model-path your-model \
  --port 30000

# In Maestro Settings → AI Config:
# Provider: Custom Provider
# Base URL: http://localhost:30000/v1
# Model: your-model
```
For detailed local LLM deployment, see Local LLM Deployment Guide.
## Common Error Messages
Error | Solution |
---|---|
"API key invalid" | Check key in Settings → AI Config |
"Model not found" | Use full model path (provider/model) |
"Rate limit exceeded" | Wait or upgrade API plan |
"Context length exceeded" | Reduce context in Research settings |
"Connection timeout" | Check network/firewall |
"Invalid json schema" | Provider doesn't support structured outputs (auto-fallback) |
"response_format unavailable" | Provider limitation (auto-fallback) |
## Quick Fixes

### Reset AI Configuration
```bash
# Clear settings and reconfigure
docker exec maestro-postgres psql -U maestro_user -d maestro_db -c "
UPDATE users SET settings = settings - 'ai_endpoints' WHERE username = 'admin';
"

# Then reconfigure in the web UI
```
### Switch to Local Models
```bash
# In Settings → AI Config
# Select "Custom Provider"
# Base URL: http://your-server:5000/v1
# Model: localmodel
```
## Still Having Issues?
- Check logs: `docker compose logs maestro-backend`
- Verify API keys are valid
- Check provider status pages
- Try different models