Compare providers, A/B test configurations, import recordings, and benchmark at scale
1 provider config ready
Upload an audio file to stream to the providers in real time
Drop an audio file or click to browse
WAV, MP3, M4A, OGG, FLAC, WebM
Backend handles decoding and resampling for each provider
Upload a file to start streaming
Data processing region
Disable to test non-optimal sample rates
Select languages in audio (leave empty for auto-detect)
Target language, in `iso639-1` format, that you want the transcription translated to
If true, apply pre-processing to the audio stream to enhance the quality.
If true, enable custom vocabulary for the transcription.
If true, enable custom spelling for the transcription.
If true, enable translation for the transcription.
Align translated utterances with the original ones
Whether to apply lipsync to the translated transcription.
Enables or disables context-aware translation features that allow the model to adapt translations based on provided context.
Forces the translation to use informal language forms when available in the target language.
If true, enable named entity recognition for the transcription.
If true, enable sentiment analysis for the transcription.
If true, enable accurate word-level timestamps for the transcription. This provides precise start and end times for each word.
If true, generates summarization for the whole transcription.
If true, generates chapters for the whole transcription.
If true, partial transcripts will be sent to the websocket.
If true, final transcripts will be sent to the websocket.
If true, begin and end speech events will be sent to the websocket.
If true, pre-processing events will be sent to the websocket.
If true, realtime processing events will be sent to the websocket.
If true, post-processing events will be sent to the websocket.
If true, acknowledgments will be sent to the websocket.
If true, errors will be sent to the websocket.
If true, lifecycle events will be sent to the websocket.
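Taken together, the websocket toggles above can be sketched as a single messages configuration object. This is a minimal illustration only; the field names below are assumptions, not a confirmed schema:

```python
# Hypothetical messages configuration: which event types the server
# pushes over the websocket. Field names are illustrative assumptions.
messages_config = {
    "receive_partial_transcripts": True,
    "receive_final_transcripts": True,
    "receive_speech_events": True,        # begin/end speech events
    "receive_pre_processing_events": False,
    "receive_realtime_processing_events": True,
    "receive_post_processing_events": False,
    "receive_acknowledgments": True,
    "receive_errors": True,
    "receive_lifecycle_events": False,
}
```

Toggling any flag to `False` simply suppresses that message type on the websocket; transcription itself is unaffected.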
The endpointing duration in seconds. Endpointing is the duration of silence that causes an utterance to be considered finished.
The maximum duration in seconds without endpointing. If no endpoint is detected within this duration, the current utterance will be considered finished.
Sensitivity configuration for Speech Threshold. A value close to 1 will apply stricter thresholds, making it less likely to detect background sounds as speech.
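The two duration rules above (silence of at least the endpointing duration closes an utterance, and the maximum duration force-closes one even without silence) can be sketched as a small decision function. The parameter names and default values here are invented for illustration:

```python
def should_close_utterance(silence_s: float, utterance_s: float,
                           endpointing: float = 0.3,
                           max_without_endpointing: float = 10.0) -> bool:
    """Return True when the current utterance should be finalized:
    either enough trailing silence has elapsed (endpointing), or the
    utterance has exceeded the maximum duration without an endpoint."""
    return silence_s >= endpointing or utterance_s >= max_without_endpointing

should_close_utterance(0.5, 2.0)   # silence past threshold -> True
should_close_utterance(0.1, 2.0)   # still speaking -> False
should_close_utterance(0.0, 12.0)  # max duration exceeded -> True
```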
Default intensity for the custom vocabulary
Custom metadata you can attach to this live transcription
The list of spellings applied to the audio transcription
Specific vocabulary list to feed the transcription model with. Each item can be a string or an object with the following properties: value, intensity, pronunciations, language.
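As an illustration of the vocabulary shape described above, where each item is either a plain string or an object with value, intensity, pronunciations, and language, here is a minimal sketch (the entries and values are invented for illustration):

```python
# Hypothetical custom vocabulary list. Each item is either a plain
# string or an object carrying per-item options; items without an
# intensity fall back to the default intensity.
default_vocabulary_intensity = 0.5  # assumed default, for illustration

custom_vocabulary = [
    "Acme",                            # plain string entry (hypothetical term)
    {
        "value": "WER",
        "intensity": 0.8,              # overrides the default intensity
        "pronunciations": ["double-u ee ar"],
        "language": "en",
    },
]
```

Per-item intensity lets you bias the model more strongly toward rare or critical terms while leaving common ones at the default.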
Waiting for audio...
Calculate WER and CER metrics against ground truth transcripts
Stream audio to multiple providers simultaneously and compare them in real time
Upload hundreds of files with annotations for bulk comparison