Lip sync (talking video)

Lip sync combines audio with an image of a person or character to create a ‘talking video’ in which the lips move naturally in sync with the audio.

Method 1: Image + Text

Carat AI automatically converts text to speech and completes the lip sync in one step.

Upload image

Upload (or generate) a person/character image.

Text request

Make your request in the Chat.

    Make this image say 'Hello'

    Make this photo read this script: (script content)

Carat AI automatically generates the voice (TTS) and creates a video with lip sync applied to the image.

Method 2: Image + Audio file

Use this when you already have a recorded audio file (MP3, WAV, etc.).

Upload image

Upload (or generate) a person/character image.

Upload audio file

Upload the audio file you want to use.

Request lip sync

Make your request in the Chat.

    Lip sync this image with the audio I just uploaded

Cost-saving tip

Lip sync uses a lot of Usage (credits). To save costs, first generate a silent video of the talking lip movements, then use the Add audio to video feature to add the narration audio separately.
