Audio Transcription

Analyses the accuracy of a provided transcription against the content of a given audio file.

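from fi.evals import Evaluator

# Assumes your Future AGI API credentials are already configured for the SDK
evaluator = Evaluator()

result = evaluator.evaluate(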
    eval_templates="ASR/STT_accuracy",
    inputs={
        "audio": "https://datasets-server.huggingface.co/assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/train/0/audio/audio.wav",
        "generated_transcript": "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # numeric accuracy score
print(result.eval_results[0].reason)  # explanation of the assessment

import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "ASR/STT_accuracy",
  {
    audio: "https://datasets-server.huggingface.co/assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/train/0/audio/audio.wav",
    generated_transcript: "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);

Input

| Required Input | Type | Description |
| --- | --- | --- |
| audio | string | The file path or URL of the audio file containing the speech |
| generated_transcript | string | The text transcription to be evaluated for accuracy |
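
Both inputs are plain strings, so a malformed value is easy to pass by mistake. The sketch below is a hypothetical pre-flight check, not part of the ai-evaluation SDK; it assumes that audio should be either an http(s) URL or the path of an existing local file, and that an empty transcript should be rejected before any evaluation call is made.

```python
import os
from urllib.parse import urlparse

# Hypothetical helper, not part of the ai-evaluation SDK.
REQUIRED_INPUTS = ("audio", "generated_transcript")


def validate_transcription_inputs(inputs: dict) -> list:
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    for key in REQUIRED_INPUTS:
        value = inputs.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' must be a non-empty string")

    audio = inputs.get("audio")
    if isinstance(audio, str) and audio.strip():
        parsed = urlparse(audio)
        is_http_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        # Assumption: anything that is not an http(s) URL is treated as a local path.
        if not is_http_url and not os.path.isfile(audio):
            problems.append("'audio' is neither an http(s) URL nor an existing file")
    return problems


print(validate_transcription_inputs({
    "audio": "https://example.com/clip.wav",
    "generated_transcript": "hello world",
}))  # []
```

Running the check before evaluate() keeps obviously bad inputs from costing an evaluation call.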

Output

| Field | Description |
| --- | --- |
| Result | A numeric score; a higher score indicates a more accurate transcription |
| Reason | A detailed explanation of the transcription assessment |
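
This page does not describe how the score is computed, and the evaluator judges the transcript against the audio itself rather than against a reference text. If you also have a trusted reference transcript, word error rate (WER), a standard ASR metric, offers an independent local baseline to compare against. The function below is a swapped-in metric, not the evaluator's method, and its name is hypothetical. Note that lower WER is better, the opposite direction of the evaluator's score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Words are compared after whitespace splitting only; normalize case and
    punctuation beforehand if those differences should not count as errors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("a"), one substitution ("tonight" -> "to"), one insertion ("night")
print(word_error_rate("please share a few things tonight",
                      "please share few things to night"))  # 0.5
```

Splitting "tonight" into "to night" counts as two errors here, which is one reason to read the evaluator's Reason field alongside any numeric comparison.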

What to Do If You Get Undesired Results

If the transcription accuracy score is lower than expected:

  • Ensure the audio is clear with minimal background noise
  • Check for proper capitalization and punctuation in the transcription
  • Include all filler words (um, uh, etc.) for verbatim accuracy if required
  • Verify correct spelling of technical terms, names, or specialized vocabulary
  • Review for word substitution errors where similar-sounding words are confused
  • Consider using professional transcription services for important content
  • For speakers with non-native or strong regional accents, ensure the transcriber (human or ASR model) is familiar with the accent
  • Use timestamps for longer audio to help identify where errors might occur
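
Several of the checks above (capitalization, punctuation, filler words, substituted words) are easier to review when you can see exactly where a transcript diverges from a trusted reference. The following is a minimal sketch using Python's standard-library difflib; it assumes such a reference exists, and the helper names and filler-word list are hypothetical, not part of the ai-evaluation SDK. Normalizing both texts first keeps formatting differences from hiding genuine word errors.

```python
import string
from difflib import SequenceMatcher

# Hypothetical filler list; extend it to match your transcription style guide.
FILLERS = {"um", "uh", "er", "ah"}
# Strip all ASCII punctuation except apostrophes, so contractions like "i'm" survive.
PUNCTUATION = string.punctuation.replace("'", "")


def normalize(text: str, drop_fillers: bool = False) -> list:
    """Lowercase, strip punctuation, split into words, optionally drop fillers."""
    cleaned = text.lower().translate(str.maketrans("", "", PUNCTUATION))
    words = cleaned.split()
    if drop_fillers:
        words = [w for w in words if w not in FILLERS]
    return words


def word_differences(reference: str, hypothesis: str, drop_fillers: bool = False) -> list:
    """Return (operation, reference words, hypothesis words) for each non-matching span."""
    ref = normalize(reference, drop_fillers)
    hyp = normalize(hypothesis, drop_fillers)
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete", or "insert"
            diffs.append((tag, " ".join(ref[i1:i2]), " ".join(hyp[j1:j2])))
    return diffs


for op, ref_words, hyp_words in word_differences(
    "We are starting late, um, tonight.",
    "we are starting late to night",
    drop_fillers=True,
):
    print(f"{op}: {ref_words!r} -> {hyp_words!r}")  # replace: 'tonight' -> 'to night'
```

Toggling drop_fillers shows whether a low score stems from omitted fillers (reported as "delete" spans) or from genuine word substitutions.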

Comparing Audio Transcription with Similar Evals

  • Audio Quality: While Audio Transcription evaluates the accuracy of converting speech to text, Audio Quality assesses the perceptual quality of the audio itself.
  • Context Adherence: Audio Transcription focuses on accurately capturing spoken words, while Context Adherence evaluates how well content aligns with given context or instructions.