The Challenge
A leading publisher of language-learning content was struggling to scale spoken-assessment grading. Traditional tools produced only basic transcriptions and could not evaluate nuanced speech features such as pronunciation accuracy, tone, fluency, and semantic relevance to a prompt.
The publisher also needed support for low-resource languages, which most commercial offerings ignored. The system also had to be customizable, with granular scoring for articulation and content accuracy across diverse user segments, as the sketch below illustrates.
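To make the customization requirement concrete, a granular rubric might look something like the following sketch. The dimension names, weights, and `ScoringRubric` class are illustrative assumptions, not the publisher's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ScoringRubric:
    """Hypothetical per-segment grading rubric. Dimensions and weights
    are illustrative assumptions, not the publisher's actual schema."""
    language: str  # e.g. "en", or a low-resource language code
    weights: dict = field(default_factory=lambda: {
        "pronunciation": 0.30,  # articulation accuracy
        "fluency": 0.25,        # pacing, hesitations
        "tone": 0.15,           # intonation / prosody
        "content": 0.30,        # semantic relevance to the prompt
    })

    def aggregate(self, scores: dict) -> float:
        """Weighted overall score from per-dimension scores in [0, 1]."""
        return sum(self.weights[d] * scores[d] for d in self.weights)

# A rubric instance for one user segment, with example dimension scores:
exam_rubric = ScoringRubric(language="en")
overall = exam_rubric.aggregate(
    {"pronunciation": 0.8, "fluency": 0.7, "tone": 0.9, "content": 0.6}
)
print(f"Overall: {overall:.2f}")
```

Per-segment weight overrides are what make such a rubric adaptable across markets: an exam-prep segment might weight content accuracy more heavily, while a pronunciation course would do the opposite.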
A Day in the Life: Before Our Solution
An English learner uploads a voice submission for an assessment:
"Discuss the importance of clean energy in your country."
The system transcribes the submission but fails to evaluate pronunciation or tone. It cannot detect off-topic responses or give feedback beyond word-level accuracy. Instructors must listen manually, score fluency, and judge topic alignment, a process that is slow and inconsistent.
This process doesn’t scale for thousands of users, especially across multiple languages.
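For illustration, one common technique for the missing off-topic check is cosine similarity between sentence embeddings of the prompt and the transcript. The sketch below uses the open-source sentence-transformers library with an assumed model and threshold; it is a minimal example of the general technique, not a description of the publisher's production system.

```python
# Minimal sketch: flag off-topic responses via embedding similarity.
# Model choice and threshold are assumptions for illustration only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

prompt = "Discuss the importance of clean energy in your country."
transcript = "My favorite food is pizza and I eat it every weekend."

# Embed both texts and compare with cosine similarity.
emb_prompt, emb_answer = model.encode([prompt, transcript], convert_to_tensor=True)
similarity = util.cos_sim(emb_prompt, emb_answer).item()

# Responses below a tuned threshold (assumed 0.4 here) get flagged for review.
label = "Off-topic" if similarity < 0.4 else "On-topic"
print(f"{label} (similarity={similarity:.2f})")
```

A transcription-only pipeline has no equivalent of this step, which is why topic alignment fell entirely to instructors.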
Pain Points:
- Instructors spent hours listening to student recordings and manually scoring speech
- Commercial tools couldn’t evaluate tone, fluency, or pronunciation reliably
- No support for regional or low-resource languages
- Assessment feedback was inconsistent and lacked clarity
- Lack of custom grading models blocked product innovation in new markets