
Welcome to Speakr

Speakr is a powerful self-hosted transcription platform that helps you capture, transcribe, and understand your audio content. Whether you're recording meetings, interviews, lectures, or personal notes, Speakr transforms spoken words into valuable, searchable knowledge.

Main Interface

Latest Release: v0.8.15-alpha - New Transcription Connectors & Upload API

Mistral/Voxtral and VibeVoice connectors, upload API metadata, and bug fixes

  • Mistral/Voxtral Connector - Cloud transcription with built-in speaker diarization via Mistral's Voxtral models
  • VibeVoice Connector - Self-hosted transcription via vLLM with diarization and automatic chunking for long files
  • Upload API: title & meeting_date - Set recording metadata directly from integrations
  • Regenerate Title - Re-generate a recording's title with AI after transcription
  • Default Transcription Language - Auto-fills on upload and reprocess forms
  • Upload Disclaimer - Configurable pre-upload disclaimer with custom banner text
  • Complete Localization - All recent features fully localized across all six languages
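The VibeVoice connector's automatic chunking for long files can be pictured as splitting the audio timeline into consecutive windows that are transcribed separately. The sketch below assumes fixed-length windows with no overlap. The function name and the max_chunk_s parameter are hypothetical and not the connector's actual settings.

```python
from typing import List, Tuple


def chunk_boundaries(duration_s: float, max_chunk_s: float) -> List[Tuple[float, float]]:
    """Split a recording of duration_s seconds into consecutive windows of at most max_chunk_s.

    Hypothetical sketch: chunks do not overlap, and max_chunk_s is a placeholder value,
    not the chunk size the VibeVoice connector actually uses.
    """
    if max_chunk_s <= 0:
        raise ValueError("max_chunk_s must be positive")
    duration_s = float(duration_s)
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        # The last chunk is shorter when the duration is not a multiple of max_chunk_s
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


# A 25-minute file split into 10-minute chunks
print(chunk_boundaries(1500, 600))  # [(0.0, 600.0), (600.0, 1200.0), (1200.0, 1500.0)]
```

When chunk results are merged back into one transcript, each chunk's segment timestamps would need to be shifted by that chunk's start offset.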

Quick Navigation

📚

Getting Started

New to Speakr? Start here for a quick overview and setup guide.

Get Started →
🚀

Installation

Step-by-step instructions for Docker and manual installation.

Install Now →
👤

User Guide

Learn how to record, transcribe, and manage your audio content.

Learn More →
⚙️

Admin Guide

Configure users, system settings, and manage your instance.

Configure →

FAQ

Find answers to commonly asked questions about Speakr.

View FAQ →
🔧

Troubleshooting

Solutions for transcription issues and performance problems.

Get Help →

Core Features

🎙️ Smart Recording

  • Audio capture from mic or system
  • Take notes while recording
  • Generate smart summaries

🔍 Intelligent Search

🌍 International

  • Six languages supported
  • Automatic UI translation
  • Localized summaries

🔑 REST API

Interactive Audio Synchronization

Experience seamless bidirectional synchronization between your audio and transcript. Click any part of the transcript to jump directly to that moment in the audio, or watch as the system automatically highlights the currently spoken text as the audio plays. Enable auto-scroll follow mode to keep the active segment centered in view, creating an effortless reading experience for even the longest recordings.


Real-time transcript highlighting synchronized with audio playback, with auto-scroll follow mode

Learn more about audio synchronization features in the user guide.
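Both directions of the synchronization come down to mapping between a playback time and a transcript segment. The sketch below assumes each segment runs until the next one starts and uses a binary search over sorted start times. The function name and sample data are hypothetical, and Speakr's player runs in the browser rather than in Python.

```python
from bisect import bisect_right
from typing import List, Optional


def active_segment(segment_starts: List[float], current_time: float) -> Optional[int]:
    """Return the index of the segment spoken at current_time, or None before the first one.

    segment_starts must be sorted ascending. Each segment is assumed to last until the
    next one starts (a simplification; real segments also carry end times).
    """
    i = bisect_right(segment_starts, current_time) - 1
    return i if i >= 0 else None


starts = [0.0, 4.2, 9.8, 15.0]
# Audio -> transcript: highlight (and, in follow mode, scroll to) the active segment
print(active_segment(starts, 10.5))  # 2
# Transcript -> audio: clicking segment 1 seeks the player to its start time
seek_to = starts[1]
```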

Transform Your Recordings with Custom Tag Prompts

Tags aren't just for organization - they transform content. Create a "Recipe" tag to convert cooking narration into formatted recipes. Use "Study Notes" tags to turn lecture recordings into organized outlines. Stack tags like "Client Meeting" + "Legal Review" for combined analysis. Learn more in the Custom Prompts guide.
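Stacking tags means several custom prompts apply to one recording. The sketch below assumes the prompts are simply concatenated in the order the tags were applied, with tags that have no prompt skipped. The function and the prompt texts are hypothetical illustrations, not Speakr's documented merging rule.

```python
from typing import Dict, List


def combined_prompt(tags: List[str], tag_prompts: Dict[str, str]) -> str:
    """Join the custom prompts of all applied tags, in tag order (assumed merging rule)."""
    # Tags without a prompt (missing or empty) contribute nothing
    parts = [tag_prompts[t].strip() for t in tags if tag_prompts.get(t)]
    return "\n\n".join(parts)


prompts = {
    "Client Meeting": "Summarize decisions, action items, and owners.",
    "Legal Review": "Flag any statements with contractual or compliance implications.",
}
print(combined_prompt(["Client Meeting", "Legal Review"], prompts))
```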

Latest Updates

Version 0.8.15-alpha - New Transcription Connectors & Upload API

  • Mistral/Voxtral Connector - Cloud-based transcription with built-in speaker diarization via Mistral's Voxtral models
  • VibeVoice Connector - Self-hosted transcription via vLLM with speaker diarization and automatic chunking
  • Upload API: title & meeting_date - Optional metadata fields for integrations
  • Regenerate Title - Button to re-generate recording title with AI
  • Default Transcription Language - User-configurable default that auto-fills forms
  • Tag-Driven Auto-Processing - Watch folders auto-apply tags and trigger via API
  • Bug Fixes - Azure inquire crash, chat API serialization, user deletion, chunking limits
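The optional title and meeting_date fields let an integration set metadata in the same request that uploads the file to /api/v1/upload, authenticated with a bearer token. The standard-library sketch below only builds the request. The instance URL, the "file" part name, and the ISO 8601 date format are assumptions; the API Reference (Swagger UI at /api/v1/docs) is the authority on the real schema.

```python
import urllib.request
import uuid
from typing import Dict


def build_upload_request(base_url: str, token: str, filename: str, audio: bytes,
                         fields: Dict[str, str]) -> urllib.request.Request:
    """Build a multipart/form-data POST to /api/v1/upload.

    Assumptions: the file part is named "file" and meeting_date is ISO 8601.
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    # One form part per metadata field (e.g. title, meeting_date)
    for name, value in fields.items():
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += value.encode() + b"\r\n"
    # The audio file part
    body += f"--{boundary}\r\n".encode()
    body += (f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
             "Content-Type: application/octet-stream\r\n\r\n").encode()
    body += audio + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/upload",
        data=bytes(body),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = build_upload_request(
    "http://localhost:8899",  # hypothetical instance URL
    "YOUR_API_TOKEN",
    "standup.mp3",
    b"...audio bytes...",
    {"title": "Weekly Standup", "meeting_date": "2026-01-15T09:00:00"},
)
# urllib.request.urlopen(req) would send it
```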

Version 0.8.13-alpha - Video Retention Fix

  • Fixed large video files silently losing their video stream during upload when VIDEO_RETENTION=true
  • Probe timeout now scales with file size; falls back to extension-based detection if probing fails
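The fix combines two ideas: a probe timeout that grows with file size, and an extension-based fallback when probing fails. The sketch below shows both. The constants, the extension set, and the function names are hypothetical placeholders, not the values Speakr uses.

```python
import os
from typing import Optional

# Hypothetical constants; the real values are not documented here.
BASE_TIMEOUT_S = 10.0
SECONDS_PER_GB = 30.0
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm", ".avi"}


def probe_timeout(size_bytes: int) -> float:
    """Grow the probe timeout linearly with file size instead of using a fixed value."""
    return BASE_TIMEOUT_S + SECONDS_PER_GB * size_bytes / 1024**3


def has_video(path: str, probe_result: Optional[bool]) -> bool:
    """Trust the probe when it succeeded (True/False); if it failed (None), use the extension."""
    if probe_result is not None:
        return probe_result
    return os.path.splitext(path)[1].lower() in VIDEO_EXTENSIONS
```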

Version 0.8.12-alpha - Speaker Search, Shared Page Improvements & Bug Fixes

  • Speaker Name Filter - Search/filter speakers by name on the Speakers Management page with bulk operation support
  • Shared Page Auto-scroll - Follow-along mode on public shared transcript pages, matching the main app experience
  • Dynamic Footer Year - Shared page footer year no longer hardcoded
  • Duplicate Detection - SHA-256 file hashing with warning toasts and clickable copies indicator in sidebar/header
  • Volume Controls - Volume slider popups and mute visual indicators on all audio/video players
  • Speaker Enhancements - Split button UI, apply suggested names, name sanitization, JSON schema option, new speaker API endpoints
  • Localization - Complete translations for folders, API tokens, recording recovery, events, and speakers management
  • Bug Fixes - ASR_DIARIZE=false ignored, bulk delete cascade, file monitor stability, speaker snippets, null transcription, clean_llm_response too aggressive
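Duplicate detection by SHA-256 works by hashing the uploaded file and comparing the digest with hashes stored for existing recordings. The sketch below reads the file in chunks so large recordings are never loaded into memory at once. The id-to-hash mapping passed to find_copies is a hypothetical stand-in for Speakr's database.

```python
import hashlib
from typing import Dict, List


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally in chunk_size pieces and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_copies(new_hash: str, existing: Dict[str, str]) -> List[str]:
    """Return IDs of recordings whose stored hash matches (hypothetical id -> hash map)."""
    return [rid for rid, digest in existing.items() if digest == new_hash]
```

A non-empty result from find_copies is the case where the app would show a duplicate warning.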

Version 0.8.7 - Export Templates & Localization

  • Customizable Export Templates - Create markdown templates for exports with variables ({{title}}, {{summary}}, {{notes}}) and conditionals for optional sections
  • Localized Labels - Use {{label.metadata}}, {{label.summary}} etc. for automatically translated labels based on user's UI language
  • Localized Dates - Export dates formatted per user's language preference (e.g., "15. Januar 2026" for German)
  • Improvements - Opt-in ASR chunking, speaker ID remapping, ASR validation fixes

Fully backwards compatible with v0.8.x.
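Export templates substitute {{variable}} placeholders and resolve {{label.*}} placeholders to translated labels for the user's UI language. The sketch below handles only those two cases. The label table and the rule that unknown placeholders render empty are assumptions, and the template conditionals are left out because their syntax is not described here.

```python
import re
from typing import Dict

# Hypothetical label table; Speakr's real translations live in its locale files.
LABELS = {
    "en": {"metadata": "Metadata", "summary": "Summary", "notes": "Notes"},
    "de": {"metadata": "Metadaten", "summary": "Zusammenfassung", "notes": "Notizen"},
}

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_export(template: str, values: Dict[str, str], lang: str = "en") -> str:
    """Replace {{name}} with values[name] and {{label.key}} with a translated label.

    Unknown placeholders render as empty strings (an assumption of this sketch).
    """
    labels = LABELS.get(lang, LABELS["en"])

    def sub(match) -> str:
        key = match.group(1)
        if key.startswith("label."):
            return labels.get(key[len("label."):], "")
        return values.get(key, "")

    return PLACEHOLDER.sub(sub, template)


tmpl = "# {{title}}\n\n## {{label.summary}}\n{{summary}}\n"
print(render_export(tmpl, {"title": "Kickoff", "summary": "Scope agreed."}, lang="de"))
```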

Version 0.8.3 - Naming Templates

  • Custom Title Formatting - Create templates with variables ({{ai_title}}, {{filename}}, {{date}}) and regex patterns to extract data from filenames
  • Tag or User Default - Assign templates to tags or set a user-wide default; templates without {{ai_title}} skip the AI call to save tokens
  • API v1 Upload - New /api/v1/upload endpoint for programmatic recording uploads
  • Improvements - Tag drag-and-drop reordering, registration domain restriction, event delete button, WebM seeking fix

Fully backwards compatible with v0.8.x.
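The token saving in naming templates comes from checking whether a template references {{ai_title}} before calling the model. The sketch below also shows regex extraction from filenames, assuming that named capture groups become extra template variables. That mapping, the function names, and the sample filename are hypothetical.

```python
import re
from typing import Callable, Dict


def needs_ai_title(template: str) -> bool:
    """Only templates that reference {{ai_title}} require an LLM call."""
    return "{{ai_title}}" in template


def render_title(template: str, filename: str, date: str,
                 generate_ai_title: Callable[[], str], pattern: str = "") -> str:
    """Fill a naming template.

    pattern is a regex whose named groups become extra variables (assumed behavior).
    """
    values: Dict[str, str] = {"filename": filename, "date": date}
    if pattern:
        m = re.search(pattern, filename)
        if m:
            values.update({k: v for k, v in m.groupdict().items() if v is not None})
    if needs_ai_title(template):
        values["ai_title"] = generate_ai_title()  # the only step that spends tokens
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template


# No {{ai_title}} in the template, so generate_ai_title is never called
title = render_title("{{date}} {{client}} call", "acme_2026-01-15.m4a", "2026-01-15",
                     generate_ai_title=lambda: "unused", pattern=r"^(?P<client>[a-z]+)_")
```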

Version 0.8.2 - Transcription Usage Tracking

  • Transcription Budget Management - Set monthly transcription limits (in minutes) per user with 80% warnings and 100% blocking
  • Usage Statistics - Track transcription minutes and estimated costs across all connectors
  • Admin Dashboard Improvements - Redesigned stats layout with summary cards and per-user tables
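The budget rule above has three states per user: normal use, a warning at 80% of the monthly minutes, and blocking at 100%. The sketch below encodes those thresholds. Treating a missing or zero limit as unlimited is an assumption, and the function name is hypothetical.

```python
from typing import Optional

WARN_RATIO = 0.8   # warn at 80% of the monthly limit
BLOCK_RATIO = 1.0  # block new transcriptions at 100%


def budget_status(used_minutes: float, limit_minutes: Optional[float]) -> str:
    """Return "ok", "warn", or "blocked" for a user's monthly transcription budget.

    A limit of None or 0 is treated as unlimited (an assumption of this sketch).
    """
    if not limit_minutes:
        return "ok"
    ratio = used_minutes / limit_minutes
    if ratio >= BLOCK_RATIO:
        return "blocked"
    if ratio >= WARN_RATIO:
        return "warn"
    return "ok"
```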

Version 0.8.0 - Connector Architecture & REST API

  • Connector-Based Transcription - Modular architecture with auto-detection for transcription providers
  • OpenAI Diarization - Use gpt-4o-transcribe-diarize for speaker identification without self-hosting
  • REST API v1 - Complete API for automation tools with Swagger UI at /api/v1/docs

See the Migration Guide and API Reference.

PyTorch 2.6 Compatibility Issue with WhisperX ASR

If you're using the WhisperX ASR service and encounter a "Weights only load failed" error after a recent update, add this environment variable to your ASR container in docker-compose.yml:

```yaml
environment:
  - TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=true
```

This is caused by PyTorch 2.6 changing the default of torch.load to weights_only=True. See troubleshooting for details.

Version 0.6.6 - Filter & Compress

New Features - Audio compression and enhanced filtering

  • Auto Compression - Lossless uploads automatically compressed (configurable codec/bitrate)
  • Speaker Filtering - Filter recordings by speaker, starred/inbox toggles
  • Sorting Fix - Sort toggle works correctly, added Upcoming group for future dates
  • Format Support - .weba format, FFmpeg fallback for unknown formats

✅ Fully backward compatible. Optional env vars: AUDIO_COMPRESS_UPLOADS, AUDIO_CODEC, AUDIO_BITRATE
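The three optional variables control whether lossless uploads are re-encoded and with which codec and bitrate. The sketch below turns them into an ffmpeg command line (-c:a selects the audio encoder, -b:a the bitrate). The default values, the lossless extension set, and the codec-to-encoder mapping are hypothetical; the installation docs are the authority on Speakr's actual defaults.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical values for illustration only.
LOSSLESS_EXTENSIONS = {".wav", ".flac", ".aiff", ".aif"}
ENCODERS = {"mp3": ("libmp3lame", ".mp3"), "opus": ("libopus", ".opus"), "aac": ("aac", ".m4a")}


def compression_command(path: str, env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Return an ffmpeg command that re-encodes a lossless upload, or None to keep it as-is."""
    enabled = env.get("AUDIO_COMPRESS_UPLOADS", "true").lower() in {"1", "true", "yes"}
    root, ext = os.path.splitext(path)
    if not enabled or ext.lower() not in LOSSLESS_EXTENSIONS:
        return None
    codec = env.get("AUDIO_CODEC", "mp3")
    bitrate = env.get("AUDIO_BITRATE", "128k")
    if codec not in ENCODERS:
        raise ValueError(f"unsupported AUDIO_CODEC: {codec}")
    encoder, out_ext = ENCODERS[codec]
    return ["ffmpeg", "-i", path, "-c:a", encoder, "-b:a", bitrate, root + out_ext]
```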

Version 0.6.5 - Separate Chat Model Configuration

New Feature - Configure different AI models for chat vs background tasks

  • Separate Chat Model - Use different service tiers for chat and summarization (#143)
  • Custom Datetime Picker - New themed calendar and time selection modal
  • Bug Fixes - Audio chunking after refactor (#140), username display (#138)

✅ Fully backward compatible. Optional CHAT_MODEL_* environment variables.

Version 0.6.3 - API Token Authentication

New Feature - Programmatic API access for automation tools

  • API Tokens - Create personal access tokens for programmatic API access
  • Multiple Auth Methods - Bearer token, X-API-Token header, API-Token header, or query parameter
  • Token Management - Create, revoke, and track token usage from Account Settings
  • Flexible Expiration - Set custom expiration periods or create non-expiring tokens
  • Secure Storage - Tokens are hashed (SHA-256) and never stored in plaintext

✅ Fully backward compatible with v0.6.x. No configuration changes required.
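Storing only a SHA-256 hash means the server never needs the plaintext token after creation: it hashes whatever the client presents and compares digests. The sketch below also reads the three header forms listed above. The query-parameter form is omitted because its name is not given here, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Mapping, Optional, Tuple


def issue_token() -> Tuple[str, str]:
    """Create a random token; show plaintext to the user once and store only the digest."""
    plaintext = secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, digest


def token_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Accept Authorization: Bearer <token>, X-API-Token, or API-Token.

    Header lookup is case-sensitive in this sketch; web frameworks usually normalize case.
    """
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):].strip()
    return headers.get("X-API-Token") or headers.get("API-Token")


def verify(presented: str, stored_digest: str) -> bool:
    """Hash the presented token and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)
```

On the client side, sending the header Authorization: Bearer <token> with each request is enough under this scheme.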

Version 0.6.2 - UX Polish & Bug Fixes

  • Standardized modal UX with backdrop click and consistent X button placement
  • Recording disclaimer markdown support
  • IndexedDB crash recovery fixes
  • Processing queue cleanup on delete

Version 0.6.1 - Offline Ready

  • HuggingFace Model Caching - Embedding model persists across container restarts
  • Offline Deployment - Run once with internet, then works fully offline

Version 0.6.0 - Queue Control

  • Multi-User Job Queue - Fair round-robin scheduling with automatic retry for failed jobs
  • Unified Progress Tracking - Single view merging uploads and backend processing
  • Media Support - Added video format support and fixed Firefox system audio recording
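Round-robin scheduling keeps one user with many uploads from starving everyone else: the queue takes one job per user in turn. The sketch below uses one FIFO per user and rotates users, with failed jobs requeued until a maximum attempt count. The class, method names, and max_attempts default are hypothetical, not Speakr's implementation.

```python
from collections import OrderedDict, deque
from typing import Optional, Tuple


class FairQueue:
    """Round-robin job queue across users with bounded retries (hypothetical sketch)."""

    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts
        # user -> FIFO of (job, attempts_so_far); empty queues are removed immediately
        self.queues: "OrderedDict[str, deque]" = OrderedDict()

    def submit(self, user: str, job: str, attempts: int = 0) -> None:
        self.queues.setdefault(user, deque()).append((job, attempts))

    def next_job(self) -> Optional[Tuple[str, str, int]]:
        """Take one job from the user at the front, then move that user to the back."""
        if not self.queues:
            return None
        user, jobs = next(iter(self.queues.items()))
        job, attempts = jobs.popleft()
        if jobs:
            self.queues.move_to_end(user)
        else:
            del self.queues[user]
        return user, job, attempts

    def fail(self, user: str, job: str, attempts: int) -> bool:
        """Requeue a failed job unless it has used up max_attempts; return True if requeued."""
        if attempts + 1 < self.max_attempts:
            self.submit(user, job, attempts + 1)
            return True
        return False
```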

Version 0.5.9 - Major Release

⚠️ Major architectural changes - Backup data before upgrading!

  • Internal Sharing System - Share recordings with granular permissions (view/edit/reshare)
  • Group Management - Create groups with leads, group tags, custom retention policies
  • Speaker Voice Profiles - AI-powered recognition with embeddings (requires WhisperX)
  • Audio-Transcript Sync - Click-to-jump, auto-highlight, and follow mode
  • Auto-Deletion & Retention - Global and group-level policies with tag protection
  • Modular Architecture - Backend refactored into blueprints, frontend composables
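Auto-deletion combines a global retention period, group-level policies, and tag protection. The sketch below shows one way to make that decision. The precedence it uses (protected tags always win, and a group policy overrides the global one) and the Recording fields are assumptions, not documented behavior.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class Recording:
    created: datetime
    tags: Set[str] = field(default_factory=set)
    group_retention_days: Optional[int] = None  # hypothetical per-group override


def should_delete(rec: Recording, now: datetime, global_retention_days: Optional[int],
                  protected_tags: Set[str]) -> bool:
    """Return True if the recording is past its retention period and not protected."""
    if rec.tags & protected_tags:
        return False  # any protected tag exempts the recording
    days = rec.group_retention_days if rec.group_retention_days is not None else global_retention_days
    if days is None:
        return False  # no policy configured
    return now - rec.created > timedelta(days=days)
```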

Previous release (v0.5.8):

  • Inline Transcript Editing - Edit speaker assignments and text directly in the speaker identification modal
  • Add Speaker Functionality - Dynamically add new speakers during transcript review
  • Enhanced Speaker Modal - Improved UX with hover-based edit controls and real-time updates

Previous release (v0.5.7):

  • GPT-5 Support - Full support for OpenAI's GPT-5 model family with automatic parameter detection
  • Custom Summary Prompts on Reprocessing - Experiment with different prompts when regenerating summaries
  • PWA Enhancements - Service worker for wake lock to prevent screen sleep on mobile

Previous release (v0.5.6):

  • Event extraction for automatically identifying calendar-worthy events
  • Transcript templates for customizable download formats
  • Enhanced export options and improved mobile UI

Getting Help

Need assistance? We're here to help:

📖 Documentation

You're already here! Browse our comprehensive guides:

💬 Community

Connect with other users and get support:


Ready to transform your audio into actionable insights? Get started now