Audio Transcription
Analyses the accuracy of a provided transcription against the content of a given audio file.
# Python (assumes `evaluator` is an already-initialized Evaluator instance)
result = evaluator.evaluate(
    eval_templates="ASR/STT_accuracy",
    inputs={
        "audio": "https://datasets-server.huggingface.co/assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/train/0/audio/audio.wav",
        "generated_transcript": "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"
    },
    model_name="turing_flash"
)
print(result.eval_results[0].output)  # numeric accuracy score
print(result.eval_results[0].reason)  # explanation of the assessment

// TypeScript
import { Evaluator, Templates } from "@future-agi/ai-evaluation";
const evaluator = new Evaluator();
const result = await evaluator.evaluate(
  "ASR/STT_accuracy",
  {
    audio: "https://datasets-server.huggingface.co/assets/MLCommons/peoples_speech/--/f10597c5d3d3a63f8b6827701297c3afdf178272/--/clean/train/0/audio/audio.wav",
    generated_transcript: "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"
  },
  {
    modelName: "turing_flash",
  }
);
console.log(result);

| Input | | |
|---|---|---|
| Required Input | Type | Description |
| audio | string | The file path or URL of the audio file containing the speech |
| generated_transcript | string | The text transcription to be evaluated for accuracy |

| Output | |
|---|---|
| Field | Description |
| Result | Returns a numeric score, where a higher score indicates a more accurate transcription |
| Reason | Provides a detailed explanation of the transcription assessment |

What to Do If You Get Undesired Results
If the transcription accuracy score is lower than expected:
- Ensure the audio is clear with minimal background noise
- Check for proper capitalization and punctuation in the transcription
- Include all filler words (um, uh, etc.) for verbatim accuracy if required
- Verify correct spelling of technical terms, names, or specialized vocabulary
- Review for word substitution errors where similar-sounding words are confused
- Consider using professional transcription services for important content
- For non-native speakers, ensure the transcriber is familiar with the accent
- Use timestamps for longer audio to help identify where errors might occur
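
Several of the checks above (capitalization and punctuation, filler words, word substitutions) can be explored locally before re-running the eval. The following is a minimal sketch, assuming you have a trusted reference transcript to compare against. It is not how the ASR/STT_accuracy template computes its score; the helper names (`normalize`, `word_error_rate`, `word_differences`) and the filler-word list are illustrative assumptions, not part of the SDK.

```python
import string
from difflib import SequenceMatcher

# Illustrative filler list (an assumption); extend it for your own data.
FILLERS = {"um", "uh", "er", "hmm"}


def normalize(text: str, drop_fillers: bool = False) -> list[str]:
    """Lowercase, delete punctuation (keeping apostrophes), and split into words."""
    # Apostrophes are kept so contractions such as "i'm" stay a single token.
    punctuation = string.punctuation.replace("'", "")
    words = text.lower().translate(str.maketrans("", "", punctuation)).split()
    if drop_fillers:
        words = [w for w in words if w not in FILLERS]
    return words


def word_error_rate(reference: str, hypothesis: str, drop_fillers: bool = False) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = normalize(reference, drop_fillers)
    hyp = normalize(hypothesis, drop_fillers)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_differences(reference: str, hypothesis: str):
    """List (tag, reference_words, hypothesis_words) for every non-matching span."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling `word_differences(reference, generated_transcript)` returns the spans tagged `replace`, `delete`, or `insert`, which helps pinpoint confused similar-sounding words. Comparing `word_error_rate` with and without `drop_fillers=True` indicates how much of the mismatch comes from filler words alone.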
Comparing Audio Transcription with Similar Evals
- Audio Quality: While Audio Transcription evaluates the accuracy of converting speech to text, Audio Quality assesses the perceptual quality of the audio itself.
- Context Adherence: Audio Transcription focuses on accurately capturing spoken words, while Context Adherence evaluates how well content aligns with given context or instructions.