Compare providers, A/B test configurations, import recordings, and benchmark at scale
1 provider config ready
Upload an audio file for async transcription
Drop an audio file or click to browse
WAV, MP3, M4A, OGG, FLAC, WebM
Provide the correct transcript to calculate WER/CER accuracy
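WER and CER come from word- and character-level edit distance between the ground-truth transcript and the hypothesis. A minimal self-contained sketch of the computation (not necessarily the tool's actual implementation):

```python
def _edit_distance(ref, hyp):
    # Levenshtein distance with a single rolling row
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            # deletion, insertion, substitution/match
            d[j] = min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
            prev = cur
    return d[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits / reference word count."""
    ref = reference.split()
    return _edit_distance(ref, hypothesis.split()) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits / reference length."""
    return _edit_distance(reference, hypothesis) / max(len(reference), 1)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions relative to the reference.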
**[Beta]** Either a boolean to enable custom vocabulary for this audio, or an array of specific vocabulary terms to feed the transcription model
**[Deprecated]** Use `language_config` instead. Detect the language from the given audio
**[Deprecated]** Use `language_config` instead. Detect multiple languages in the given audio
Enable subtitles generation for this transcription
Enable speaker recognition (diarization) for this audio
**[Beta]** Enable translation for this audio
Align translated utterances with the original ones
Whether to apply lipsync to the translated transcription
Enable context-aware translation, letting the model adapt translations based on the provided context
Force informal language forms in the translation when the target language supports them
**[Beta]** Enable summarization for this audio
**[Alpha]** Enable moderation for this audio
**[Alpha]** Enable named entity recognition for this audio
**[Alpha]** Enable chapterization for this audio
**[Alpha]** Enable names consistency for this audio
**[Alpha]** Enable custom spelling for this audio
**[Alpha]** Enable structured data extraction for this audio
Enable sentiment analysis for this audio
**[Alpha]** Enable audio-to-LLM processing for this audio
Enable sentence-level output for this audio
**[Alpha]** Change the output display_mode for this audio. The output is reordered, creating new utterances where speakers overlap
**[Alpha]** Use enhanced punctuation for this audio
If true, language will be auto-detected on each utterance. Otherwise, language will be auto-detected on first utterance and then used for the rest of the transcription. If one language is set, this option will be ignored.
Default intensity for the custom vocabulary
Minimum duration of a subtitle in seconds
Maximum duration of a subtitle in seconds
Maximum number of characters per row in a subtitle
Maximum number of rows per caption
Exact number of speakers in the audio
Minimum number of speakers in the audio
Maximum number of speakers in the audio
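The speaker-count options are typically mutually exclusive: pin the exact count when it is known, or bound it with a min/max range otherwise. A hypothetical sketch, with assumed field names:

```python
# Hypothetical diarization sketch -- field names are assumptions.
# When the speaker count is known, pin it exactly:
diarization_config = {
    "number_of_speakers": 2,
}
# When it is unknown, bound the search range instead:
diarization_config_ranged = {
    "min_speakers": 1,
    "max_speakers": 4,
}
```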
The list of custom spellings applied to the audio transcription
Custom metadata you can attach to this transcription
Specific vocabulary list to feed to the transcription model. Each item can be a string or an object with the following properties: value, intensity, pronunciations, language.
Specify the languages you want to use when detecting multiple languages
Subtitle formats you want your transcription formatted to
Target language (ISO 639-1 code) to translate the transcription into
The list of classes to extract from the audio transcription
The list of prompts applied to the audio transcription
If one language is set, it will be used for the transcription. Otherwise, language will be auto-detected by the model.
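The language behavior described above (single fixed language vs. auto-detection, plus per-utterance re-detection) could be captured in a `language_config` block like this sketch; the field names are assumptions, not a documented schema:

```python
# Hypothetical language_config sketch -- field names are assumptions.
language_config = {
    # One entry forces the transcription language; leave empty (or omit)
    # to let the model auto-detect it.
    "languages": ["en"],
    # True: re-detect the language on every utterance.
    # False: detect once on the first utterance, reuse for the rest.
    # Ignored when exactly one language is set above.
    "code_switching": False,
}
```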
Waiting for audio...
Calculate WER and CER metrics against ground truth transcripts
Stream audio to multiple providers simultaneously and compare in real-time
Upload hundreds of files with annotations for bulk comparison