title | venue | paper | code | dataset | keywords
---|---|---|---|---|---
CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior | CVPR(23) | paper | code | BIWI, VOCA | 3D
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation | CVPR(23) | paper | - | HDTF | Diffusion
AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction | CVPR(23) | paper | - | Multiface | 3D
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert | CVPR(23) | paper | code | LRS2 | -
LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook | CVPR(23) | paper | - | LRS2, FFHQ | -
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment | CVPR(23) | paper | - | HDTF | -
Identity-Preserving Talking Face Generation with Landmark and Appearance Priors | CVPR(23) | paper | code | LRS2, LRS3 | -
High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning | CVPR(23) | paper | - | MEAD | emotion
Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | InterSpeech(23) | paper | - | MEAD | emotion
EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation | ICCV(23) | paper | code (not yet released) | - | emotion
Emotionally Enhanced Talking Face Generation | - | paper | code | CREMA-D | emotion
DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video | AAAI(23) | paper | code | - | -
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR(23) | paper | code | - | NeRF
OPT: One-Shot Pose-Controllable Talking Head Generation | - | paper | - | - | -
LipNeRF: What is the right feature space to lip-sync a NeRF? | - | paper | - | - | NeRF
Audio-Visual Face Reenactment | WACV(23) | paper | code | - | -
Towards Generating Ultra-High Resolution Talking-Face Videos With Lip Synchronization | WACV(23) | paper | - | - | -
StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | AAAI(23) | paper | code | - | -
Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | - | paper | proj | - | Diffusion
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model | - | paper | code | - | Diffusion
TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles | - | paper | - | Text-Annotated MEAD | Text
title | venue | paper | code | dataset | keywords
---|---|---|---|---|---
Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors | - | paper | proj | - | Diffusion
SPACE: Speech-driven Portrait Animation with Controllable Expression | ICCV(23) | paper | - | - | Pose, Emotion
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation | CVPR(23) | paper | code | - | -
Compressing Video Calls using Synthetic Talking Heads | BMVC(22) | paper | - | - | application
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model | SIGGRAPH(22) | paper | - | - | emotion
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | ECCV(22) | paper | code | - | NeRF
Expressive Talking Head Generation with Granular Audio-Visual Control | CVPR(22) | paper | - | - | -
Talking Face Generation With Multilingual TTS | CVPR(22) | paper | code | - | -
Deep Learning for Visual Speech Analysis: A Survey | - | paper | - | - | survey
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN | - | paper | code | - | StyleGAN
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation | ECCV(22) | paper | code (coming soon) | - | NeRF
Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation | - | paper | - | - | -
SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory | AAAI(22) | paper (temp) | - | LRW, LRS2, BBC News | -
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering | - | paper | - | - | NeRF
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos | - | paper | - | - | -
Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions | - | paper | - | - | -
DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation | - | paper | - | - | NeRF
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio-Based on Multimodal Representation Fusion | - | paper | - | - | -
StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation | - | paper | - | - | -
AutoLV: Automatic Lecture Video Generator | - | paper | - | - | -
Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement | - | paper | - | - | -
title | venue | paper | code | dataset
---|---|---|---|---
Depth-Aware Generative Adversarial Network for Talking Head Video Generation | - | paper | code | -
Parallel and High-Fidelity Text-to-Lip Generation | - | paper | - | -
[Survey] Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis | - | paper | - | -
FaceFormer: Speech-Driven 3D Facial Animation with Transformers | CVPR(22) | paper | code | |
Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices | - | paper | code | -
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning | ICCV | paper | code | -
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | - | paper | code | -
Audio-Driven Emotional Video Portraits | CVPR | paper | code | MEAD, LRW |
LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization | CVPR | paper | ||
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation | CVPR | paper | code | VoxCeleb2, LRW |
Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset | CVPR | paper | code | HDTF |
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement | ICCV | paper | code (coming soon) | -
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | ICCV | paper | code | -
Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation | AAAI | paper | code (coming soon) | Mocap dataset
Visual Speech Enhancement Without A Real Visual Stream | - | paper | - | -
Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary | - | paper | code | -
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion | IJCAI | paper | code | VoxCeleb, GRID, LRW |
3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head | - | paper | - | -
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | - | paper | - | VoxCeleb2, Obama
title | venue | paper | code | dataset
---|---|---|---|---
[Survey] What comprises a good talking-head video generation?: A survey and benchmark | - | paper | code | -
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing | CVPR(21) | paper | code | |
Speech Driven Talking Face Generation from a Single Image and an Emotion Condition | - | paper | code | CREMA-D
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild | ACMMM | paper | code | LRS2 |
Talking-head Generation with Rhythmic Head Motion | ECCV | paper | code | CREMA-D, GRID, VoxCeleb, LRS3
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation | ECCV | paper | code | VoxCeleb2, AffectNet
Neural Voice Puppetry: Audio-driven Facial Reenactment | ECCV | paper | - | -
Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars | ECCV | paper | code | -
HeadGAN: Video-and-Audio-Driven Talking Head Synthesis | - | paper | - | VoxCeleb2
MakeItTalk: Speaker-Aware Talking Head Animation | - | paper | code, code | VoxCeleb2, VCTK
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose | - | paper | code | ImageNet, FaceWarehouse, LRW
Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks | - | paper | - | -
Speech-Driven Facial Animation Using Polynomial Fusion of Features | - | paper | - | LRW
Animating Face using Disentangled Audio Representations | WACV | paper | - | -
Everybody’s Talkin’: Let Me Talk as You Want | - | paper | - | -
Multimodal Inputs Driven Talking Face Generation With Spatial-Temporal Dependency | - | paper | - | -
title | venue | paper | code | dataset
---|---|---|---|---
Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss | CVPR | paper | code | VGG Face, LRW |
- MEAD link
- HDTF link
- CREMA-D link
- VoxCeleb link
- LRS2 link
- LRW link
- GRID link
- SAVEE link
- BIWI(3D) link
- VOCA link
- Multiface(3D) link
- PSNR (peak signal-to-noise ratio)
- SSIM (structural similarity index measure)
- LMD (landmark distance error)
- LRA (lip-reading accuracy)
- FID (Fréchet inception distance)
- LSE-D (Lip Sync Error - Distance)
- LSE-C (Lip Sync Error - Confidence)
- LPIPS (Learned Perceptual Image Patch Similarity)
- NIQE (Natural Image Quality Evaluator)
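
Two of the simplest metrics above, PSNR and LMD, can be computed directly with NumPy. This is a minimal sketch for intuition only; the function names are ours, and the listed papers typically use their own implementations (and dedicated tools such as SyncNet for LSE-D/LSE-C):

```python
import numpy as np

def psnr(ref: np.ndarray, gen: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between reference and generated frames."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def lmd(ref_lms: np.ndarray, gen_lms: np.ndarray) -> float:
    """Landmark distance: mean Euclidean distance over (N, 2) landmark arrays."""
    return float(np.mean(np.linalg.norm(ref_lms - gen_lms, axis=-1)))

# Toy example: two 8-bit frames differing by a constant offset of 10.
ref = np.zeros((64, 64), dtype=np.uint8)
gen = np.full((64, 64), 10, dtype=np.uint8)
print(round(psnr(ref, gen), 2))  # → 28.13
```

Higher PSNR and lower LMD indicate better reconstruction; note that PSNR/SSIM reward pixel fidelity rather than lip-sync quality, which is why LSE-D/LSE-C are usually reported alongside them.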