# Linux Installation

Complete installation guide for running MAESTRO on Linux distributions with full GPU support.

!!! warning "NVIDIA Driver Requirement for GPU Users"
    MAESTRO uses CUDA 12.9, which requires NVIDIA driver version 575 or newer. If you plan to use GPU acceleration, ensure the correct driver version is installed before starting; older drivers (such as 535) will cause container startup failures.
## Prerequisites

### System Requirements

- **Distribution:** Ubuntu 20.04+, Debian 11+, RHEL 8+, or compatible
- **RAM:** 16GB minimum (32GB recommended)
- **Storage:** 30GB free space minimum (8GB for models, 22GB for Docker and data)
- **GPU VRAM** (if using GPU):
    - 4GB minimum for a single process (research OR document processing)
    - 8GB recommended for concurrent operations (research AND document processing)
- **Network:** Internet connection for initial setup and web search
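To quickly confirm a machine meets these requirements, a few standard commands suffice (a sketch; compare the output against the thresholds above):

```bash
# Check available RAM (16GB minimum)
free -h

# Check free disk space at the install location (30GB minimum)
df -h .

# If using GPU: check VRAM per GPU (4GB minimum, 8GB recommended)
nvidia-smi --query-gpu=name,memory.total --format=csv
```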
### Required Software

- Docker Engine
- Docker Compose V2
- Git
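On Ubuntu/Debian, one common way to install these (your distribution may package them differently) is:

```bash
# Install Git
sudo apt-get update
sudo apt-get install git

# Install Docker Engine and the Compose V2 plugin via Docker's convenience script
curl -fsSL https://get.docker.com | sudo sh

# Verify versions
docker --version
docker compose version
```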
### Optional: NVIDIA GPU Support

For GPU acceleration (highly recommended):

#### NVIDIA Drivers

!!! warning "Driver Version Requirement"
    MAESTRO uses CUDA 12.9, which requires NVIDIA driver version 575 or newer. Older drivers will cause container startup failures.
```bash
# Ubuntu/Debian - install the latest driver (575+)
sudo apt-get update
sudo apt-get install nvidia-driver-575  # or the latest available

# Verify installation and check driver version
nvidia-smi
# Should show "Driver Version: 575.xx" or higher
```

If driver 575+ is not available in your repository:

```bash
# Add the NVIDIA PPA for the latest drivers
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-driver-575
```
#### NVIDIA Container Toolkit

```bash
# Add repository
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install toolkit
sudo apt-get update
sudo apt-get install nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker
sudo systemctl restart docker

# Verify GPU access with CUDA 12.9
docker run --rm --gpus all nvidia/cuda:12.9.1-base-ubuntu24.04 nvidia-smi
```
## Pre-Installation Verification

### GPU Setup Verification (Optional but Recommended)

If you plan to use GPU acceleration, verify your setup before proceeding:

```bash
# Check driver version (must be 575+)
nvidia-smi | grep "Driver Version"

# Test Docker GPU access with CUDA 12.9
docker run --rm --gpus all nvidia/cuda:12.9.1-base-ubuntu24.04 nvidia-smi

# If the above fails, your driver is too old or GPU support is not configured
```
## Installation Steps

### Step 1: Clone Repository
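A minimal sketch of this step; the URL shown assumes the project's public GitHub repository, so substitute the official repository URL if yours differs:

```bash
# Clone the repository and enter the project directory
# (URL assumed; replace with the official repository URL)
git clone https://github.com/murtaza-nasir/maestro.git
cd maestro
```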
### Step 2: Configure Environment

Use the interactive setup script for easy configuration:
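A typical invocation looks like the following; the script filename here is an assumption, so check the repository root for the actual setup script:

```bash
# Run the interactive setup script from the repository root
# (filename assumed; use the script shipped with the repo)
./setup-env.sh
```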
The script will guide you through:

- **Network Configuration**
    - Simple (localhost only) - if installing on the machine you are using
    - Network (LAN access) - if accessing MAESTRO over the network
    - Custom domain - if running MAESTRO behind a domain
- **Security Configuration**
    - Generates secure passwords automatically
    - Sets up JWT secrets
    - Configures admin credentials
- **Port Configuration**
    - Sets the main application port
### Step 3: Build and Start MAESTRO

Always build on the first run to ensure you get the latest changes:

```bash
# Build and start all services
docker compose up -d --build

# Monitor startup progress
docker compose logs -f maestro-backend
```
!!! note "First-time startup"
    The backend downloads AI models on first run (5-10 minutes). Wait for the "MAESTRO Backend Started Successfully!" message.
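If you want the shell to block until the backend reports ready, one option is to watch the logs for that message (the log text is the one quoted above):

```bash
# Follow backend logs and return once the startup message appears
docker compose logs -f maestro-backend | grep -m1 "MAESTRO Backend Started Successfully!"
```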
### Step 4: Access MAESTRO

Once startup is complete:

1. Open your browser to `http://localhost` (or the configured port)
2. Log in with the admin credentials from setup
3. Change the default password immediately
## Docker Volume Configuration

### Persistent Model Storage

To avoid re-downloading models after container restarts, MAESTRO uses persistent volumes:

```yaml
# In docker-compose.yml
volumes:
  - ./maestro_model_cache:/root/.cache/huggingface
  - ./maestro_datalab_cache:/root/.cache/datalab
```
These volumes persist:

- **BGE-M3 embedding model** (560M parameters): ~1.1GB on disk, ~1.3GB VRAM per instance
- **BGE-Reranker-v2-m3**: ~1.1GB on disk, ~1.2GB VRAM when loaded
- **Document processing models (Marker)**: ~3GB on disk
- **Total model cache**: ~6-8GB of disk space recommended
- **Processed documents and embeddings**: varies by document volume
!!! info "Multiple Model Instances"
    MAESTRO loads separate instances of the embedding models for:

    - **Document processing**: when uploading/ingesting documents
    - **Research queries**: when searching and retrieving information

    If both processes run simultaneously, VRAM usage doubles (~2.6GB for embeddings + ~1.2GB for the reranker ≈ 4GB total). Observed usage of ~7-8GB is typical once CUDA overhead, memory fragmentation, and additional model instances are accounted for.
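To see actual usage on your system, you can poll `nvidia-smi` while a research run and a document upload are in flight (standard `nvidia-smi` query flags):

```bash
# Refresh GPU memory usage every second
watch -n 1 'nvidia-smi --query-gpu=memory.used,memory.total --format=csv'
```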
## GPU Device Assignment

### Single GPU Configuration

The default configuration uses GPU 0. To change it:

```yaml
# In docker-compose.yml
services:
  backend:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']  # Change to your GPU ID
              capabilities: [gpu]
```
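To find the IDs to use in `device_ids`, list the GPUs visible to the driver:

```bash
# List GPUs with their indices and UUIDs
nvidia-smi -L
```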
### Multi-GPU Configuration

Assign different GPUs to different services:

```yaml
services:
  backend:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']  # Backend uses GPU 0
              capabilities: [gpu]
  doc-processor:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']  # Document processor uses GPU 1
              capabilities: [gpu]
```
## Resource Management

### Performance Tuning

MAESTRO uses multiple layers of concurrency control, each for a different purpose.

**Concurrency Layers Explained**

Think of these settings as different traffic control systems (a consolidated `.env` sketch follows this list):

- `MAX_WORKER_THREADS` - general-purpose workers for all background tasks
    - Controls: web scraping, file processing, general async tasks
- `GLOBAL_MAX_CONCURRENT_LLM_REQUESTS` - LLM API capacity
    - Controls: total LLM API calls across ALL users/sessions
    - Prevents: overwhelming your LLM provider
- `MAX_CONCURRENT_REQUESTS` - individual user API calls
    - Controls: LLM API calls per research session
    - Prevents: one user/session from monopolizing resources
- Web search - hardcoded to 2
    - Controls: search API calls (Tavily, Jina, etc.)
    - Prevents: rate limiting from search providers
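As a starting point, the configurable limits can be set together in `.env`; the global value below is illustrative (an assumption, not a documented default), so tune it to your provider's rate limits:

```bash
# .env - illustrative concurrency settings
MAX_WORKER_THREADS=20                  # general background worker pool (default)
GLOBAL_MAX_CONCURRENT_LLM_REQUESTS=50  # value assumed; match your LLM provider's limits
MAX_CONCURRENT_REQUESTS=10             # per-session LLM limit (default 10)
```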
### Worker Thread Configuration

The `MAX_WORKER_THREADS` environment variable controls concurrent background tasks:

```bash
# In .env file
MAX_WORKER_THREADS=20  # Default value

# Recommended values based on system:
# - Low-end (8GB RAM, 4 cores): 10
# - Mid-range (16GB RAM, 8 cores): 20
# - High-end (32GB+ RAM, 16+ cores): 30-50
```
This setting affects:

- Concurrent web fetches during research
- Parallel document processing
- Background task execution

!!! tip "Finding the Optimal Value"
    Start with the default (20) and adjust based on:

    - System responsiveness during heavy loads
    - Memory usage (monitor with `docker stats`)
    - Number of concurrent users
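For a one-shot snapshot of per-container CPU and memory while tuning:

```bash
# One-time snapshot of resource usage for all running containers
docker stats --no-stream
```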
### LLM Concurrency Configuration

There are two levels of LLM concurrency control:

- **Global limit** (`GLOBAL_MAX_CONCURRENT_LLM_REQUESTS`): caps total LLM API calls across all users and sessions
- **Per-session limit** (`MAX_CONCURRENT_REQUESTS`): caps LLM API calls within a single research session

!!! note "Settings Precedence"
    For per-session concurrency, the order of precedence is:

    1. Mission-specific settings (per research task)
    2. User settings (UI: Settings → Research → Performance → Concurrent Requests)
    3. Environment variable (`MAX_CONCURRENT_REQUESTS`)
    4. Default (10, minimum enforced to prevent deadlocks)

    Most users should configure this in the UI rather than via the environment variable.
### Memory and CPU Limits

Add resource constraints in `docker-compose.yml`:

```yaml
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G
```
### GPU Memory Management

If you encounter GPU memory issues:

```bash
# Set GPU memory growth
export TF_FORCE_GPU_ALLOW_GROWTH=true

# Limit GPU memory usage
export TF_GPU_MEMORY_LIMIT=4096  # Limit to 4GB
```
## Data Backup and Recovery

### Backup Volumes

```bash
# Stop services
docker compose down

# Backup PostgreSQL data
docker run --rm -v maestro_postgres-data:/data -v $(pwd):/backup \
    ubuntu tar czf /backup/postgres_backup.tar.gz /data

# Backup model cache
tar czf models_backup.tar.gz maestro_model_cache maestro_datalab_cache
```
### Restore from Backup

```bash
# Restore PostgreSQL
docker run --rm -v maestro_postgres-data:/data -v $(pwd):/backup \
    ubuntu tar xzf /backup/postgres_backup.tar.gz -C /

# Restore models
tar xzf models_backup.tar.gz

# Restart services
docker compose up -d
```
## Troubleshooting

### GPU Not Detected

```bash
# Verify NVIDIA driver and version
nvidia-smi
# Must show "Driver Version: 575.xx" or higher for CUDA 12.9

# Check Docker GPU access with CUDA 12.9
docker run --rm --gpus all nvidia/cuda:12.9.1-base-ubuntu24.04 nvidia-smi

# Check container toolkit
nvidia-container-cli info
```
#### Common GPU Issues

- **Driver too old for CUDA 12.9**: upgrade to driver 575+ (see the driver installation steps above), then reboot
- **Container fails with a CUDA error**: usually means a driver version mismatch
    - Ensure the driver is 575+ for CUDA 12.9
    - After a driver update, restart Docker: `sudo systemctl restart docker`
### Permission Issues

```bash
# Fix Docker permissions
sudo usermod -aG docker $USER
newgrp docker

# Fix volume permissions
sudo chown -R $USER:$USER ./maestro_model_cache ./maestro_datalab_cache
```
### Container Startup Issues

```bash
# Check logs
docker compose logs maestro-backend
docker compose logs maestro-postgres

# Rebuild if needed
docker compose down
docker compose build --no-cache
docker compose up -d
```
### Port Conflicts
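If another service already listens on the configured port, first identify it, then change MAESTRO's main application port (set during setup in `.env`; the exact key is whatever the setup script wrote there). A sketch:

```bash
# Find which process is using port 80 (or your configured port)
sudo lsof -i :80   # alternatively: sudo ss -tlnp | grep ':80 '

# Edit the application port in .env, then restart
docker compose down
docker compose up -d
```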
## Maintenance

### Update MAESTRO

```bash
# Pull latest changes
git pull

# Rebuild and restart
docker compose down
docker compose up -d --build
```
### Clean Up

```bash
# Remove stopped containers
docker container prune

# Remove unused images
docker image prune

# Remove unused volumes (careful!)
docker volume prune
```