Alibaba’s Intelligent Computing research unit has unveiled EMO, a cutting-edge AI model that generates video from a still image and an audio clip.
EMO requires only a portrait photo and an audio file to create a video of that person speaking or singing along with the audio, up to a maximum length of 1 minute and 30 seconds. A key feature is that EMO synchronizes full facial expressions with the audio, not just mouth movements.
For example, EMO can make a portrait sing, adapting to the song’s language and keeping facial movements in sync even with a fast-paced rhythm. One striking cross-model demonstration animates a still image of a Japanese woman walking down a street that was originally generated by OpenAI’s Sora model.
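To make the input/output contract described above concrete, here is a minimal sketch of what invoking such an image-plus-audio model might look like. EMO’s actual code on GitHub may be organized very differently; every name below (`TalkingHeadRequest`, `generate_video`, `MAX_SECONDS`) is a hypothetical illustration, not the project’s real API.

```python
# Hypothetical sketch of EMO's input/output contract. The real repository's
# interface may differ; all names here are illustrative assumptions.
from dataclasses import dataclass

MAX_SECONDS = 90  # reported upper limit: 1 minute 30 seconds


@dataclass
class TalkingHeadRequest:
    portrait_path: str  # single reference portrait photo
    audio_path: str     # speech or singing clip that drives the video
    duration_s: int     # requested output length in seconds


def generate_video(req: TalkingHeadRequest) -> str:
    """Return the path of the generated video (stubbed here).

    A real implementation would condition a video-generation model on the
    portrait identity and the audio, producing lip motion and full facial
    expressions that track the soundtrack.
    """
    if req.duration_s > MAX_SECONDS:
        raise ValueError(f"EMO outputs are capped at {MAX_SECONDS} seconds")
    raise NotImplementedError("placeholder for the actual model call")


# Example usage (raises NotImplementedError with this stub):
# generate_video(TalkingHeadRequest("portrait.jpg", "song.mp3", duration_s=60))
```

The point of the sketch is simply that one reference image plus one audio track is the entire input; everything else (expressions, head motion, lip sync) is inferred by the model.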
For more details on EMO, see the project’s GitHub page and watch the sample video at the end of this article.
Source: Pandaily
TLDR: Alibaba’s Intelligent Computing research unit introduces EMO, an AI model that creates videos from a single image and an audio clip, synchronizing full facial expressions, not just lip movements, with the sound.