CPU Mode Setup¶
Guide to running MAESTRO without GPU acceleration, suited to systems without NVIDIA GPUs: AMD or Intel graphics, macOS, cloud instances without a GPU, and resource-constrained environments.
When to Use CPU Mode¶
CPU mode is recommended for:
- AMD GPU Systems - No CUDA support available
- Intel Graphics - Integrated graphics only
- macOS - All Mac systems (no CUDA support)
- Cloud Instances - Without GPU allocation
- Development/Testing - When GPU isn't necessary
- Resource Constraints - Limited GPU memory
Quick Start¶
Method 1: CPU-Only Docker Compose¶
The simplest way to run in CPU mode:
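The command below uses the CPU-specific compose file (the same command shown in the platform-specific sections later in this guide):

```shell
docker compose -f docker-compose.cpu.yml up -d --build
```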
This configuration:
- Automatically configures for CPU processing
- Removes all GPU-related settings
- Optimizes for CPU performance
Method 2: Force CPU Mode via Environment¶
Add to your `.env` file:
```bash
# Force CPU mode for all operations
FORCE_CPU_MODE=true

# Optional: explicitly set device type to CPU
PREFERRED_DEVICE_TYPE=cpu
```
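These two variables could be honored in application code along the following lines. This is a hypothetical sketch, not MAESTRO's actual implementation; the function name and fallback order are illustrative:

```python
import os

def select_device() -> str:
    """Pick a compute device, honoring the CPU-mode environment overrides."""
    # FORCE_CPU_MODE takes precedence over everything else
    if os.environ.get("FORCE_CPU_MODE", "").lower() in ("1", "true", "yes"):
        return "cpu"
    # Otherwise respect an explicit device preference, if set
    preferred = os.environ.get("PREFERRED_DEVICE_TYPE", "").lower()
    if preferred:
        return preferred
    # Fall back to CUDA when available, CPU otherwise
    try:
        import torch  # optional dependency
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

os.environ["FORCE_CPU_MODE"] = "true"
print(select_device())  # cpu
```

With `FORCE_CPU_MODE=true` set, the device is always `cpu` regardless of available hardware, which is the behavior this guide relies on.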
Then start normally:
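With the overrides in `.env`, the standard compose file can be used (assuming the default `docker-compose.yml`):

```shell
docker compose up -d --build
```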
Performance Optimization¶
CPU Configuration¶
Optimize CPU performance in your `.env` file:
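A possible starting point is shown below. `MAX_WORKER_THREADS` also appears in the troubleshooting section of this guide; the thread-cap variables are standard OpenMP/MKL environment knobs, not MAESTRO-specific settings, so verify them against your deployment:

```bash
# Limit parallel document-processing workers
MAX_WORKER_THREADS=4

# Cap threads used by numerical libraries (standard OpenMP/MKL env vars)
OMP_NUM_THREADS=8
MKL_NUM_THREADS=8
```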
Recommended System Specs¶
Minimum Requirements¶
- CPU: 8 cores (16 threads)
- RAM: 16GB
- Storage: 20GB free space
Recommended Requirements¶
- CPU: 12+ cores (24+ threads)
- RAM: 32GB+
- Storage: 50GB+ free space
Platform-Specific Setup¶
Linux CPU Mode¶
```bash
# Check CPU information
lscpu

# Start with CPU mode
docker compose -f docker-compose.cpu.yml up -d --build
```
Windows CPU Mode¶
```powershell
# Check CPU information
wmic cpu get name,numberofcores,numberoflogicalprocessors

# Use CPU-only configuration
docker compose -f docker-compose.cpu.yml up -d --build
```
macOS (Always CPU Mode)¶
All Macs run in CPU mode (no CUDA support):
```bash
# Check CPU info
sysctl -n machdep.cpu.brand_string
sysctl -n hw.ncpu

# Start with CPU configuration
docker compose -f docker-compose.cpu.yml up -d --build
```
Embedding Model Discussion¶
MAESTRO uses BGE-M3 embeddings for document processing. In CPU mode:
Embedding Performance¶
The BGE-M3 model runs on CPU, but significantly slower than on a GPU:
- Document chunking: Efficient on CPU
- Embedding generation: 5-10x slower on CPU
- First-time model download: ~2GB
Optimization Tips¶
- Process documents in batches during off-hours
- Use a persistent model cache to avoid re-downloads
- Pre-process documents before peak usage
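The persistent-cache tip above can be implemented with a Docker volume mount. A sketch for `docker-compose.cpu.yml` follows; the service name and the in-container Hugging Face cache path are assumptions to check against your compose file:

```yaml
services:
  backend:
    volumes:
      # Persist downloaded embedding models across container rebuilds
      - ./model_cache:/root/.cache/huggingface
```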
Performance Expectations¶
Processing Times (Approximate)¶
| Task | GPU | CPU (8 cores) | CPU (16 cores) |
|---|---|---|---|
| PDF Processing (10 pages) | 30s | 5-15 min | 3-12 min |
| Document Embedding (1000 chunks) | 1 min | 10-20 min | 5-10 min |
| Reranking (100 docs) | 5s | 30-60s | 15-30s |
Memory Usage¶
CPU mode memory requirements:
- Idle: 2-3GB
- Processing: 4-8GB
- Peak (large docs): 8-16GB
CPU vs GPU Comparison¶
Reliability¶
- CPU: Always works, platform independent
- GPU (NVIDIA): Excellent when available, requires specific hardware
Performance¶
- Document Processing: GPU is 10-30x faster
- Embedding Generation: GPU is significantly faster
- Chat Operations: Minimal difference (uses external AI APIs)
When CPU Mode is Fine¶
- Small document libraries (<100 documents)
- Infrequent document uploads
- Development and testing
- When GPU resources are needed elsewhere
When GPU is Recommended¶
- Large document libraries (>1000 documents)
- Frequent document processing
- Regular use
Troubleshooting CPU Mode¶
High CPU Usage¶
High CPU usage is normal while documents are being processed. To reduce it:
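One lever is the worker-thread count in `.env` (the same variable used in the memory troubleshooting below):

```bash
# Fewer parallel workers lowers sustained CPU load
MAX_WORKER_THREADS=2
```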
Slow Processing¶
Expected in CPU mode. Tips:
- Process documents in smaller batches
- Schedule processing during off-hours
- Consider cloud GPU instances for bulk processing
Memory Issues¶
If running out of memory:
Set in `.env`:

```bash
# Reduce parallel processing
MAX_WORKER_THREADS=2
```

Then restart the containers:

```bash
docker compose down
docker compose -f docker-compose.cpu.yml up -d
```
Docker Resource Limits¶
Set resource limits for CPU mode:
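A sketch using Compose's `deploy.resources.limits` syntax; the service name and the specific limits are assumptions to adjust for your hardware:

```yaml
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: "8"
          memory: 16G
```

Capping CPU and memory keeps document processing from starving other workloads on the same host.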
Monitoring Performance¶
Check resource usage:
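Container-level usage can be checked with `docker stats`, and host-level usage with a standard process monitor:

```shell
# One-shot snapshot of per-container CPU and memory usage
docker stats --no-stream

# Host-level CPU and memory usage
top
```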