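# `evaluator` is assumed to be an already-initialized evaluation client;
# its construction is not shown in this section.
result = evaluator.evaluate(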
    eval_templates="audio_transcription",
    inputs={
        "audio": "https://datasets-server.huggingface.co/assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/train/0/audio/audio.wav",
        "transcription": "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"
    },
    model_name="turing_flash"
)

# Numeric accuracy score for the transcription
print(result.eval_results[0].output)
# Explanation of how the score was assessed
print(result.eval_results[0].reason)
Input
Required Input | Type   | Description
audio          | string | The file path or URL to the audio file containing the speech
transcription  | string | The text transcription to be evaluated for accuracy
Output
Field  | Description
Result | Returns a numeric score, where a higher score indicates a more accurate transcription
Reason | Provides a detailed explanation of the transcription assessment

What to Do If You Get Undesired Results

If the transcription accuracy score is lower than expected:
  • Ensure the audio is clear with minimal background noise
  • Check for proper capitalization and punctuation in the transcription
  • Include all filler words (um, uh, etc.) for verbatim accuracy if required
  • Verify correct spelling of technical terms, names, or specialized vocabulary
  • Review for word substitution errors where similar-sounding words are confused
  • Consider using professional transcription services for important content
  • For non-native speakers, ensure the transcriber is familiar with the accent
  • Use timestamps for longer audio to help identify where errors might occur
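
Several of the checks above (capitalization, punctuation, word substitutions) can be approximated locally when a trusted reference transcript is available. The sketch below computes a word error rate (WER) with a standard word-level edit distance. It is an illustration only: the source does not state that the audio_transcription eval uses WER, and the helper names `normalize` and `word_error_rate` are hypothetical, not part of any SDK.

```python
import re


def normalize(text: str) -> list:
    """Lowercase, drop punctuation (apostrophes are kept), and split into words."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is assumed to be a trusted, human-verified transcript;
    `hypothesis` is the transcription under review.
    """
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "we are starting late"
    print(word_error_rate(reference, "We are starting, late."))
    print(word_error_rate(reference, "we are starting light"))
```

Because `normalize` strips case and punctuation, a nonzero WER here points to word-level problems (substitutions, dropped or extra words) rather than formatting differences. If the eval score is low while this WER is zero, capitalization or punctuation is a likely place to look.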

Comparing Audio Transcription with Similar Evals

  • Audio Quality: While Audio Transcription evaluates the accuracy of converting speech to text, Audio Quality assesses the perceptual quality of the audio itself.
  • Context Adherence: Audio Transcription focuses on accurately capturing spoken words, while Context Adherence evaluates how well content aligns with given context or instructions.