Content Transformation using Deep Learning
The Challenge
Making tutorial videos is difficult and time-consuming. Moreover, because the speaker delivers the lesson in only one language, it is hard to reach a larger audience without multilingual versions. Tatras was tasked by a leading EdTech company with building a solution that transforms a tutorial video into a similar video in which the speaker speaks a different language. The tutor's original tone, sentiment, and expressions are preserved in the transformed content so that it appears natural to the audience.
Hypothesis
- The objective is to separately transform the video, audio, and text content of a video lecture from a particular source language into a target language.
- The transformation algorithm is capable of generating a synthetic video in which the same speaker speaks the target language in his/her own voice, and of rendering the on-screen text in the same target language, synchronized with the video stream.
Execution
- Built an ASR pipeline to transcribe the audio track of the video
- Used an LLM to translate the transcription
- Developed a transfer-learning-based TTS pipeline to generate translated audio in the tutor's original voice
- Developed a synthetic video generator with lip-sync matching to align the translated audio with the original video
- Implemented an image generator that replaces on-screen text in the original video with its translation
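The five steps above can be sketched as a single orchestration function. This is a minimal illustration, not the actual implementation: every function and the dataclass below are hypothetical placeholders standing in for real models (e.g. an ASR model such as Whisper for stage 1, a wav2lip-style generator for stage 4).

```python
"""Sketch of the five-stage video-translation pipeline (all names are
illustrative assumptions; each placeholder would be backed by a model)."""

from dataclasses import dataclass


@dataclass
class TranslatedLecture:
    transcript: str    # source-language transcript from ASR
    translation: str   # LLM-translated transcript
    audio_path: str    # TTS audio in the tutor's voice
    video_path: str    # lip-synced video with translated on-screen text


def transcribe(video_path: str) -> str:
    # Stage 1: ASR on the extracted audio track (placeholder).
    return f"transcript of {video_path}"


def translate(text: str, target_lang: str) -> str:
    # Stage 2: LLM translation of the transcript (placeholder).
    return f"[{target_lang}] {text}"


def synthesize_voice(text: str, speaker_ref: str) -> str:
    # Stage 3: transfer-learning TTS cloning the tutor's voice (placeholder).
    return f"{speaker_ref}.translated.wav"


def lip_sync(video_path: str, audio_path: str) -> str:
    # Stage 4: regenerate the video so lip movements match the new audio (placeholder).
    return f"{video_path}.synced.mp4"


def replace_scene_text(video_path: str, target_lang: str) -> str:
    # Stage 5: re-render on-screen text in the target language (placeholder).
    return f"{video_path}.{target_lang}.mp4"


def translate_lecture(video_path: str, target_lang: str) -> TranslatedLecture:
    """Run the full pipeline: ASR -> translation -> TTS -> lip sync -> text swap."""
    transcript = transcribe(video_path)
    translation = translate(transcript, target_lang)
    audio = synthesize_voice(translation, speaker_ref=video_path)
    synced = lip_sync(video_path, audio)
    final = replace_scene_text(synced, target_lang)
    return TranslatedLecture(transcript, translation, audio, final)
```

The point of the sketch is the staging: each stage consumes only the previous stage's artifact, so individual models can be swapped (a different ASR engine, a different TTS voice-cloning approach) without touching the rest of the pipeline.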
Outcomes
- Given a video tutorial, the pipeline generates translated content in a target language specified by the user
- Produces a video similar to the original, with the translation reflected in the video, audio, and on-screen text
Project highlights
- The pipeline has been integrated into the EdTech platform and is being demonstrated as a low-cost approach to making content accessible to new multilingual audiences