Autograding Speech Assessments using Deep Learning and LLMs

The Challenge

A leading publisher of language content needed a solution for auto-grading speech assessments against a given topic, based on factors such as pronunciation accuracy, tone of voice, and relevance to the topic. The existing cloud solution did not cover all of these factors and could not be customized for low-resource languages. Tatras developed a solution that grades speech on these factors and can be adapted to other languages.

Hypothesis

  • The general strategy is to represent audio data in two ways: by frequency content and by text content
  • Frequency content is analyzed for prosody-based scoring, while text is analyzed for content-based scoring
  • The system produces multiple scores reflecting articulation quality and content quality
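
As an illustration of the prosody branch, the sketch below computes two simple signal statistics from a raw waveform: a pause ratio (a rough fluency cue) and an autocorrelation-based pitch estimate (a rough tone cue). The function names, the silence threshold, and the 50 to 400 Hz pitch range are assumptions made for this example; the case study does not state which statistics the production pipeline uses.

```python
import math

def frame_rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def pause_ratio(samples, frame_len, silence_rms=0.01):
    """Fraction of full frames whose energy is below a silence threshold.

    The threshold value is an illustrative assumption, not a tuned setting.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    silent = sum(1 for f in frames if frame_rms(f) < silence_rms)
    return silent / len(frames)

def estimate_pitch_hz(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency of a frame by autocorrelation.

    Returns None when no positive autocorrelation peak is found
    (for example, on a silent frame).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

if __name__ == "__main__":
    sr = 8000
    # Synthetic input: 0.1 s of a 200 Hz tone followed by 0.1 s of silence.
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
    silence = [0.0] * 800
    print("pitch (Hz):", estimate_pitch_hz(tone, sr))
    print("pause ratio:", pause_ratio(tone + silence, frame_len=800))
```

In practice, statistics like these would be aggregated over an utterance (for example, pitch range or the share of time spent in pauses) before being mapped to a score.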

Execution

  • Prepared an audio processing pipeline to extract frequency-based statistics for evaluating fluency, tone, etc.
  • Developed an ASR + LLM module to perform content analysis of the given speech
  • Developed a diarization module to separate speakers in overlapping conversations
  • Developed a sequence classifier to predict phonemes and measure pronunciation accuracy
  • Implemented a training workflow that can be applied to any language
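
One common way to turn predicted phonemes into a pronunciation score is to compare them with a reference phoneme sequence using edit distance. The sketch below assumes that approach; the function names and the `1 - phoneme error rate` scoring formula are illustrative choices, not a description of the deployed metric.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def pronunciation_accuracy(ref, hyp):
    """Score in [0, 1], computed as 1 minus the phoneme error rate."""
    if not ref:
        raise ValueError("reference phoneme sequence must not be empty")
    per = edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - per)

if __name__ == "__main__":
    # Reference IPA for "think" versus a learner who says "tink".
    reference = ["θ", "ɪ", "ŋ", "k"]
    predicted = ["t", "ɪ", "ŋ", "k"]
    print(pronunciation_accuracy(reference, predicted))
```

Here one substitution out of four reference phonemes gives a phoneme error rate of 0.25. The score is clamped at 0 because heavy insertion errors can push the error rate above 1.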

Outcomes

  • Speaker segmentation module
  • IPA-based phonetic evaluator for pronunciation grading
  • Frequency-based tonal quality and fluency measurement system
  • LLM-based topic relevance grading for the speech
  • Support for multiple languages (requires training)
  • Complete API deployment
  • The pipeline is integrated into the EdTech platform, providing automatic grading of speech assessments in English
  • Extension to other languages is underway

Project highlights

90%+

accuracy in language translation, resulting in cost reduction and new product features.