Alibaba’s Intelligent Computing research unit has unveiled EMO, a cutting-edge AI model that generates video from a still image and an audio clip.
EMO requires only a portrait photo and an audio file to create a video of that person speaking or singing along with the audio, up to a maximum length of 1 minute and 30 seconds. A key feature is that EMO synchronizes full facial expressions with the audio, not just mouth movements.
For example, EMO can make a portrait sing, adapting to the song’s language and keeping facial movements in sync even with a fast-paced rhythm. One striking cross-model demonstration animates a still image of a Japanese woman walking down a street that was originally generated by OpenAI’s Sora model.
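To make the input/output contract described above concrete, here is a minimal sketch of what invoking such an image-plus-audio model might look like. EMO’s actual code on GitHub may be organized very differently; every name below (`TalkingHeadRequest`, `generate_video`, `MAX_SECONDS`) is a hypothetical illustration, not the project’s real API.

```python
# Hypothetical sketch of EMO's input/output contract. The real repository's
# interface may differ; all names here are illustrative assumptions.
from dataclasses import dataclass

MAX_SECONDS = 90  # reported upper limit: 1 minute 30 seconds


@dataclass
class TalkingHeadRequest:
    portrait_path: str  # single reference portrait photo
    audio_path: str     # speech or singing clip that drives the video
    duration_s: int     # requested output length in seconds


def generate_video(req: TalkingHeadRequest) -> str:
    """Return the path of the generated video (stubbed here).

    A real implementation would condition a video-generation model on the
    portrait identity and the audio, producing lip motion and full facial
    expressions that track the soundtrack.
    """
    if req.duration_s > MAX_SECONDS:
        raise ValueError(f"EMO outputs are capped at {MAX_SECONDS} seconds")
    raise NotImplementedError("placeholder for the actual model call")


# Example usage (raises NotImplementedError with this stub):
# generate_video(TalkingHeadRequest("portrait.jpg", "song.mp3", duration_s=60))
```

The point of the sketch is simply that one reference image plus one audio track is the entire input; everything else (expressions, head motion, lip sync) is inferred by the model.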
For more details on EMO, see the project’s GitHub page and watch the sample video at the end of this article.
Source: Pandaily
TLDR: Alibaba’s Intelligent Computing research unit introduces EMO, an AI model that creates videos from a single image and an audio clip, synchronizing full facial expressions, not just lip movements, with the sound.