Compare providers, A/B test configurations, import recordings, and benchmark at scale
1 provider config ready
Upload an audio file for async transcription
Drop an audio file or click to browse
WAV, MP3, M4A, OGG, FLAC, WebM
Provide the correct transcript to calculate WER/CER accuracy
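WER and CER come from word- and character-level edit distance between the ground-truth transcript and the hypothesis. A minimal self-contained sketch of the computation (not necessarily the tool's actual implementation):

```python
def _edit_distance(ref, hyp):
    # Levenshtein distance with a single rolling row
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            # deletion, insertion, substitution/match
            d[j] = min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
            prev = cur
    return d[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits / reference word count."""
    ref = reference.split()
    return _edit_distance(ref, hypothesis.split()) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits / reference length."""
    return _edit_distance(reference, hypothesis) / max(len(reference), 1)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions relative to the reference.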
**[Beta]** Either a boolean to enable custom vocabulary for this audio, or an array of specific vocabulary terms to feed the transcription model
**[Deprecated]** Use `language_config` instead. Detect the language from the given audio
**[Deprecated]** Use `language_config` instead. Detect multiple languages in the given audio
Enable subtitles generation for this transcription
Enable speaker recognition (diarization) for this audio
**[Beta]** Enable translation for this audio
Align translated utterances with the original ones
Whether to apply lipsync to the translated transcription
Enable context-aware translation, letting the model adapt translations based on the provided context
Force informal language forms in the translation when the target language supports them
**[Beta]** Enable summarization for this audio
**[Alpha]** Enable moderation for this audio
**[Alpha]** Enable named entity recognition for this audio
**[Alpha]** Enable chapterization for this audio
**[Alpha]** Enable names consistency for this audio
**[Alpha]** Enable custom spelling for this audio
**[Alpha]** Enable structured data extraction for this audio
Enable sentiment analysis for this audio
**[Alpha]** Enable audio-to-LLM processing for this audio
Enable sentence-level output for this audio
**[Alpha]** Change the output display_mode for this audio. The output is reordered, creating new utterances where speakers overlap
**[Alpha]** Use enhanced punctuation for this audio
If true, language will be auto-detected on each utterance. Otherwise, language will be auto-detected on first utterance and then used for the rest of the transcription. If one language is set, this option will be ignored.
Default intensity for the custom vocabulary
Minimum duration of a subtitle in seconds
Maximum duration of a subtitle in seconds
Maximum number of characters per row in a subtitle
Maximum number of rows per caption
Exact number of speakers in the audio
Minimum number of speakers in the audio
Maximum number of speakers in the audio
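The speaker-count options are typically mutually exclusive: pin the exact count when it is known, or bound it with a min/max range otherwise. A hypothetical sketch, with assumed field names:

```python
# Hypothetical diarization sketch -- field names are assumptions.
# When the speaker count is known, pin it exactly:
diarization_config = {
    "number_of_speakers": 2,
}
# When it is unknown, bound the search range instead:
diarization_config_ranged = {
    "min_speakers": 1,
    "max_speakers": 4,
}
```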
The list of custom spellings applied to the audio transcription
Custom metadata you can attach to this transcription
Specific vocabulary list to feed to the transcription model. Each item can be a string or an object with the following properties: value, intensity, pronunciations, language.
Specify the languages you want to use when detecting multiple languages
Subtitle formats you want your transcription formatted to
Target language (ISO 639-1 code) to translate the transcription into
The list of classes to extract from the audio transcription
The list of prompts applied to the audio transcription
If one language is set, it will be used for the transcription. Otherwise, language will be auto-detected by the model.
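The language behavior described above (single fixed language vs. auto-detection, plus per-utterance re-detection) could be captured in a `language_config` block like this sketch; the field names are assumptions, not a documented schema:

```python
# Hypothetical language_config sketch -- field names are assumptions.
language_config = {
    # One entry forces the transcription language; leave empty (or omit)
    # to let the model auto-detect it.
    "languages": ["en"],
    # True: re-detect the language on every utterance.
    # False: detect once on the first utterance, reuse for the rest.
    # Ignored when exactly one language is set above.
    "code_switching": False,
}
```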
Waiting for audio...
Calculate WER and CER metrics against ground truth transcripts
Stream audio to multiple providers simultaneously and compare in real-time
Upload hundreds of files with annotations for bulk comparison