Artificial Intelligence: EMO, an AI Tool That Takes a Voice Clip and a Picture as Input and Generates a Video

All of the content below comes from https://humanaigc.github.io/emote-portrait-alive/; visit that link to watch the videos.


EMO: Emote Portrait Alive – Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Institute for Intelligent Computing, Alibaba Group

Abstract


We propose EMO, an expressive audio-driven portrait-video generation framework. Given a single reference image and vocal audio, e.g. talking or singing, our method can generate vocal avatar videos with expressive facial expressions and various head poses. Moreover, we can generate videos of any duration, depending on the length of the input audio.
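The interface the abstract describes is simple: one reference image plus a vocal audio track in, a video out, with the output duration tracking the audio length. EMO's code and weights were not released with the project page, so the sketch below is purely hypothetical; `EmoRequest`, `expected_num_frames`, and the 25 fps default are invented names for illustration, not the authors' API.

```python
# Hypothetical sketch only: EMO ships no public API, so these names and
# the 25 fps default are assumptions, not the authors' implementation.
from dataclasses import dataclass


@dataclass
class EmoRequest:
    reference_image_path: str  # a single portrait image
    audio_path: str            # vocal audio, e.g. talking or singing
    fps: int = 25              # assumed output frame rate


def expected_num_frames(audio_seconds: float, fps: int = 25) -> int:
    # Output duration tracks the input audio length, per the abstract.
    return round(audio_seconds * fps)


if __name__ == "__main__":
    req = EmoRequest("reference.png", "song.wav")
    print(expected_num_frames(12.8, req.fps))  # -> 320 frames for a 12.8 s clip
```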

Method


Overview of the proposed method. Our framework consists of two stages. In the first stage, Frames Encoding, the ReferenceNet extracts features from the reference image and motion frames. In the second stage, the Diffusion Process, a pretrained audio encoder computes the audio embedding, and the facial region mask is integrated with multi-frame noise to govern the generation of facial imagery. The Backbone Network then performs the denoising operation. Within the Backbone Network, two forms of attention are applied: Reference-Attention, which preserves the character's identity, and Audio-Attention, which modulates the character's movements. Additionally, Temporal Modules manipulate the temporal dimension and adjust the velocity of motion.
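To make the overview concrete, here is a minimal PyTorch sketch of the data flow it describes: ReferenceNet tokens carrying identity, audio tokens driving motion, a backbone block applying Reference-Attention and Audio-Attention, and a Temporal Module attending over the frame axis. Every width, depth, and layer choice below is an illustrative assumption; the actual EMO networks are Stable-Diffusion-scale models and no code has been released.

```python
# Minimal sketch of the described data flow; all dimensions and layer
# choices are assumptions, not the released EMO architecture.
import torch
import torch.nn as nn


class ReferenceNet(nn.Module):
    """Stage 1, Frames Encoding: turn the reference image (and, in the
    full system, motion frames) into identity feature tokens."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, image):                    # (B, 3, H, W)
        feat = self.encoder(image)               # (B, dim, H/4, W/4)
        return feat.flatten(2).transpose(1, 2)   # (B, L, dim) tokens


class BackboneBlock(nn.Module):
    """One denoising block: self-attention over frame latents, then
    Reference-Attention (identity) and Audio-Attention (motion)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ref_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, ref_tokens, audio_tokens):
        x = x + self.self_attn(x, x, x)[0]
        x = x + self.ref_attn(x, ref_tokens, ref_tokens)[0]        # identity
        x = x + self.audio_attn(x, audio_tokens, audio_tokens)[0]  # motion
        return self.norm(x)


class TemporalModule(nn.Module):
    """Attend across the frame axis to control motion over time."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                        # (B, F, L, dim)
        b, f, l, d = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b * l, f, d)
        x = x + self.attn(x, x, x)[0]            # each token attends over frames
        return x.reshape(b, l, f, d).permute(0, 2, 1, 3)


if __name__ == "__main__":
    B, F, D = 1, 4, 64
    ref_tokens = ReferenceNet(D)(torch.randn(B, 3, 64, 64))    # (1, 256, 64)
    audio_tokens = torch.randn(B, 10, D)   # stand-in for pretrained-encoder output
    L = ref_tokens.shape[1]
    face_mask = torch.ones(B, F, L, 1)     # facial-region mask (trivial here)
    latents = torch.randn(B, F, L, D) * face_mask  # mask meets multi-frame noise
    block, temporal = BackboneBlock(D), TemporalModule(D)
    latents = torch.stack(
        [block(latents[:, i], ref_tokens, audio_tokens) for i in range(F)], dim=1)
    latents = temporal(latents)
    print(latents.shape)                   # torch.Size([1, 4, 256, 64])
```

The design point the overview stresses survives even at this toy scale: identity and motion enter through separate cross-attention streams, so the audio can drive expression and head pose without overwriting who the character is.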

Various Generated Videos (see the original page for more video demos)
