Lip sync combines audio with an image of a person (or character) to create a "talking video" in which the lips move naturally in sync with the audio.

Method 1: Image + Text

Carat AI automatically converts text to speech and completes the lip sync in one step.
1. Upload image

Upload (or generate) a person/character image.
2. Text request

Make your request in the Chat, for example:
    Make this image say 'Hello'
    Make this photo read this script: (script content)
Carat AI automatically generates the voice (TTS) and creates a video with lip sync applied to the image.

Method 2: Image + Audio file

Use this when you already have a recorded audio file (MP3, WAV, etc.).
1. Upload image

Upload (or generate) a person/character image.
2. Upload audio file

Upload the audio file you want to use.
3. Request lip sync

Make your request in the Chat, for example:
    Lip sync this image with the audio I just uploaded
Cost-saving tip: Lip sync uses a lot of Usage (credits). To save costs, first generate a silent video of talking lip movements, then use the Add audio to video feature to add the narration audio separately.