Audio Transcription Accuracy Evaluation Metric

Evaluates the accuracy of a provided transcription against the content of an audio file, checking for omissions, additions, and misrepresentations.

# Python. Assumes `evaluator` is an already-initialized Evaluator client from the SDK.
result = evaluator.evaluate(
    eval_templates="ASR/STT_accuracy",
    inputs={
        "audio": "https://datasets-server.huggingface.co/assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/train/0/audio/audio.wav",
        "generated_transcript": "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)

import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "ASR/STT_accuracy",
  {
    audio: "https://datasets-server.huggingface.co/assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/train/0/audio/audio.wav",
    generated_transcript: "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);


Required Input	Type	Description
`audio`	`string`	The file path or URL to the audio file containing the speech
`generated_transcript`	`string`	The text transcription to be evaluated for accuracy
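
Because `audio` accepts either a file path or a URL, it can help to confirm which one you are passing before running the evaluation. The following is a minimal sketch using only the Python standard library; `describe_audio_input` is a hypothetical helper written for illustration and is not part of the SDK.

```python
from pathlib import Path
from urllib.parse import urlparse


def describe_audio_input(audio: str) -> str:
    """Classify an `audio` value as "url" or "file" (hypothetical helper, not an SDK function).

    Raises ValueError when the value is neither an http(s) URL nor an existing local file.
    """
    parsed = urlparse(audio)
    # Treat only http/https with a host as a URL; anything else is checked as a local path.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if Path(audio).is_file():
        return "file"
    raise ValueError(f"audio is neither an http(s) URL nor an existing file: {audio!r}")
```

Calling `describe_audio_input(inputs["audio"])` before `evaluator.evaluate(...)` surfaces a mistyped path locally instead of as a failed evaluation.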

Output
Field	Description
Result	Returns a numeric score, where a higher score indicates a more accurate transcription
Reason	Provides a detailed explanation of the transcription assessment

What to Do If You Get Undesired Results

If the transcription accuracy score is lower than expected:

Ensure the audio is clear with minimal background noise
Check for proper capitalization and punctuation in the transcription
Include all filler words (um, uh, etc.) for verbatim accuracy if required
Verify correct spelling of technical terms, names, or specialized vocabulary
Review for word substitution errors where similar-sounding words are confused
Consider using professional transcription services for important content
For non-native speakers, ensure the transcriber is familiar with the accent
Use timestamps for longer audio to help identify where errors might occur
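
When a trusted reference transcript is available, several of the checks above (capitalization, punctuation, word substitutions, omissions, additions) can be approximated locally before re-running the evaluation. The sketch below normalizes both texts and counts word-level substitutions, omissions, and additions with a standard edit-distance alignment. It is an illustration only: the function names are hypothetical, and this is not how the ASR/STT_accuracy eval computes its score, which works from the audio itself.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation (keeping apostrophes, e.g. "i'm"), and split into words."""
    punctuation = string.punctuation.replace("'", "")
    return text.lower().translate(str.maketrans("", "", punctuation)).split()


def word_errors(reference: str, hypothesis: str) -> dict:
    """Count word-level errors in `hypothesis` relative to `reference`."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    rows, cols = len(ref) + 1, len(hyp) + 1
    # table[i][j] = (total, substitutions, omissions, additions) for ref[:i] vs hyp[:j]
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # every reference word omitted
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # every hypothesis word added
    for i in range(1, rows):
        for j in range(1, cols):
            if ref[i - 1] == hyp[j - 1]:
                table[i][j] = table[i - 1][j - 1]  # words match, no cost
                continue
            t, s, o, a = table[i - 1][j - 1]
            substitute = (t + 1, s + 1, o, a)
            t, s, o, a = table[i - 1][j]
            omit = (t + 1, s, o + 1, a)
            t, s, o, a = table[i][j - 1]
            add = (t + 1, s, o, a + 1)
            table[i][j] = min(substitute, omit, add)  # lowest total cost wins
    total, subs, omissions, additions = table[-1][-1]
    return {
        "substitutions": subs,
        "omissions": omissions,
        "additions": additions,
        # Word error rate: total edits divided by reference length
        "wer": total / len(ref) if ref else 0.0,
    }
```

For example, `word_errors("We are starting late.", "we are starting very late")` reports one addition and no other errors, since case and punctuation are normalized away before comparison. If verbatim accuracy matters, leave filler words in both texts so that dropped fillers show up as omissions.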

Comparing Audio Transcription with Similar Evals

Audio Quality: While Audio Transcription evaluates the accuracy of converting speech to text, Audio Quality assesses the perceptual quality of the audio itself.
Context Adherence: Audio Transcription focuses on accurately capturing spoken words, while Context Adherence evaluates how well content aligns with given context or instructions.
