Content Transformation using Deep Learning

The Challenge

Making tutorial videos is difficult and time consuming. Moreover, because the speaker presents in only one language, it is hard to reach a wider audience without multilingual versions. Tatras was tasked by a leading EdTech company with creating a solution that transforms a tutorial video into a similar video in which the speaker speaks a different language. The original tone, sentiment and expressions of the tutor are preserved in the transformed content so that it appears natural to the audience.

Hypothesis

  • The objective is to separately transform the video, audio and text content of a video lecture from a source language into a target language.
  • The transformation algorithm can generate a synthetic video in which the same speaker speaks the target language in his/her own voice, and can also render on-screen text in the target language, synchronized with the video stream.

Execution

  • Built an ASR pipeline to transcribe the audio from the video
  • Used an LLM to translate the transcription
  • Developed a transfer-learning-based TTS pipeline to generate translated audio in the original voice
  • Developed a synthetic video generator with lip-sync matching to fit the translated audio to the original video
  • Implemented an image generator that replaces on-screen text in the original video with translated text
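The steps above can be sketched as a simple orchestration: ASR produces timed transcript segments, an LLM translates them, a voice-cloning TTS renders the translated text in the original speaker's voice, and a lip-sync generator re-renders the video to match. This is a minimal sketch with hypothetical stub functions; the actual models, function names and file paths are assumptions, not the production implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    """A timed span of speech; timings keep audio and video in sync."""
    start: float  # seconds
    end: float
    text: str

# The functions below are illustrative stubs. In the real pipeline each
# would wrap a model: an ASR model, an LLM translator, a voice-cloning
# TTS, and a lip-sync video generator.
def transcribe(audio_path: str) -> List[Segment]:
    """ASR: audio -> timed transcript segments (stubbed)."""
    return [Segment(0.0, 2.4, "Welcome to the lesson.")]

def translate(segments: List[Segment], target_lang: str) -> List[Segment]:
    """LLM translation, preserving segment timings (stubbed)."""
    return [Segment(s.start, s.end, f"[{target_lang}] {s.text}") for s in segments]

def synthesize(segments: List[Segment], voice_ref: str) -> str:
    """Voice-cloned TTS: translated segments -> audio file path (stubbed)."""
    return "translated_audio.wav"

def lip_sync(video_path: str, audio_path: str) -> str:
    """Re-render the speaker's lip movements to match new audio (stubbed)."""
    return "dubbed_video.mp4"

def dub_video(video_path: str, audio_path: str, target_lang: str) -> dict:
    """End-to-end dubbing: transcribe, translate, synthesize, lip-sync."""
    segments = transcribe(audio_path)
    translated = translate(segments, target_lang)
    new_audio = synthesize(translated, voice_ref=audio_path)
    new_video = lip_sync(video_path, new_audio)
    return {"segments": translated, "audio": new_audio, "video": new_video}
```

Keeping per-segment timestamps throughout is the key design choice: it lets the synthesized audio and the lip-synced video stay aligned with the original cut points even when translated sentences differ in length.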

Outcomes

  • Given a video tutorial, the pipeline generates translated content in a target language specified by the user
  • Generates a video similar to the original, with the translation reflected in the video, audio and on-screen text

Project highlights

  • The pipeline has been integrated into the EdTech platform and is being demonstrated as a low-cost approach to making content accessible to new multilingual audiences