Compare providers, A/B test configurations, import recordings, and benchmark at scale
1 provider config ready
Upload an audio file to stream to the providers in real time
Drop an audio file or click to browse
WAV, MP3, M4A, OGG, FLAC, WebM
Backend handles decoding and resampling for each provider
Upload a file to start streaming
Data processing region
Disable to test non-optimal sample rates
Select languages in audio (leave empty for auto-detect)
Target language, in `iso639-1` format, that you want the transcription translated to
If true, apply pre-processing to the audio stream to enhance the quality.
If true, enable custom vocabulary for the transcription.
If true, enable custom spelling for the transcription.
If true, enable translation for the transcription.
Align translated utterances with the original ones
Whether to apply lipsync to the translated transcription.
Enables or disables context-aware translation features that allow the model to adapt translations based on provided context.
Forces the translation to use informal language forms when available in the target language.
If true, enable named entity recognition for the transcription.
If true, enable sentiment analysis for the transcription.
If true, enable accurate word-level timestamps for the transcription. This provides precise start and end times for each word.
If true, generates summarization for the whole transcription.
If true, generates chapters for the whole transcription.
If true, partial transcripts will be sent to the websocket.
If true, final transcripts will be sent to the websocket.
If true, begin and end speech events will be sent to the websocket.
If true, pre-processing events will be sent to the websocket.
If true, realtime processing events will be sent to the websocket.
If true, post-processing events will be sent to the websocket.
If true, acknowledgments will be sent to the websocket.
If true, errors will be sent to the websocket.
If true, lifecycle events will be sent to the websocket.
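Taken together, the websocket toggles above can be sketched as a single messages configuration object. This is a minimal illustration only; the field names below are assumptions, not a confirmed schema:

```python
# Hypothetical messages configuration: which event types the server
# pushes over the websocket. Field names are illustrative assumptions.
messages_config = {
    "receive_partial_transcripts": True,
    "receive_final_transcripts": True,
    "receive_speech_events": True,        # begin/end speech events
    "receive_pre_processing_events": False,
    "receive_realtime_processing_events": True,
    "receive_post_processing_events": False,
    "receive_acknowledgments": True,
    "receive_errors": True,
    "receive_lifecycle_events": False,
}
```

Toggling any flag to `False` simply suppresses that message type on the websocket; transcription itself is unaffected.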
The endpointing duration in seconds. Endpointing is the duration of silence that causes an utterance to be considered finished.
The maximum duration in seconds without endpointing. If no endpoint is detected within this duration, the current utterance will be considered finished.
Sensitivity configuration for Speech Threshold. A value close to 1 will apply stricter thresholds, making it less likely to detect background sounds as speech.
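The two duration rules above (silence of at least the endpointing duration closes an utterance, and the maximum duration force-closes one even without silence) can be sketched as a small decision function. The parameter names and default values here are invented for illustration:

```python
def should_close_utterance(silence_s: float, utterance_s: float,
                           endpointing: float = 0.3,
                           max_without_endpointing: float = 10.0) -> bool:
    """Return True when the current utterance should be finalized:
    either enough trailing silence has elapsed (endpointing), or the
    utterance has exceeded the maximum duration without an endpoint."""
    return silence_s >= endpointing or utterance_s >= max_without_endpointing

should_close_utterance(0.5, 2.0)   # silence past threshold -> True
should_close_utterance(0.1, 2.0)   # still speaking -> False
should_close_utterance(0.0, 12.0)  # max duration exceeded -> True
```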
Default intensity for the custom vocabulary
Custom metadata you can attach to this live transcription
The list of spellings applied to the audio transcription
Specific vocabulary list to feed the transcription model with. Each item can be a string or an object with the following properties: value, intensity, pronunciations, language.
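As an illustration of the vocabulary shape described above, where each item is either a plain string or an object with value, intensity, pronunciations, and language, here is a minimal sketch (the entries and values are invented for illustration):

```python
# Hypothetical custom vocabulary list. Each item is either a plain
# string or an object carrying per-item options; items without an
# intensity fall back to the default intensity.
default_vocabulary_intensity = 0.5  # assumed default, for illustration

custom_vocabulary = [
    "Acme",                            # plain string entry (hypothetical term)
    {
        "value": "WER",
        "intensity": 0.8,              # overrides the default intensity
        "pronunciations": ["double-u ee ar"],
        "language": "en",
    },
]
```

Per-item intensity lets you bias the model more strongly toward rare or critical terms while leaving common ones at the default.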
Waiting for audio...
Calculate WER and CER metrics against ground truth transcripts
Stream audio to multiple providers simultaneously and compare them in real time
Upload hundreds of files with annotations for bulk comparison