#which model is suitable for video / picture transformation, with audio?
Short answer:
- For turning a single picture into a talking video (image → video with your audio): use the Omnihuman 1.5 lip‑sync model. Inputs: a static image and a speech audio file; it animates the mouth to match your audio. It’s a utility lip‑sync model (not a full video generator). See Image & Video docs. (elevenlabs.io)
- For re‑voicing an existing video (video → video with new audio): use Veed LipSync. Inputs: a source video and a new speech audio file; it re‑animates mouth movements to the new audio. Also a utility lip‑sync model. See Image & Video docs. (elevenlabs.io)
If you instead need to generate a brand‑new video that already includes synced audio (rather than transforming existing media), ElevenLabs’ Image & Video area provides video models such as OpenAI Sora 2 / Sora 2 Pro and Google Veo 3.1 / Veo 3 Fast; these generate video with integrated audio by default. See the Image & Video overview and model details. Note: video generation is currently in beta and available on paid plans. (elevenlabs.io)
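As a rough summary of the choice above, here is a minimal sketch of the decision as a lookup. The function name `pick_models` and the `source` labels are hypothetical and exist only for illustration; the code does not call any ElevenLabs API, it only restates which models the answer recommends for each case.

```python
# Hypothetical helper (not part of any ElevenLabs SDK): restates the
# recommendations from the answer above as a simple decision function.

def pick_models(source: str, has_speech_audio: bool) -> list[str]:
    """Return candidate models for a task.

    source: "image" (a static picture), "video" (an existing clip),
            or "none" (nothing to transform; generate from scratch).
    has_speech_audio: True if you already have a speech audio file.
    """
    if source == "image" and has_speech_audio:
        # Image -> talking video driven by your audio (lip-sync utility).
        return ["Omnihuman 1.5"]
    if source == "video" and has_speech_audio:
        # Existing video re-voiced with new audio (lip-sync utility).
        return ["Veed LipSync"]
    if source == "none":
        # Brand-new video generated with integrated audio.
        return ["Sora 2", "Sora 2 Pro", "Veo 3.1", "Veo 3 Fast"]
    raise ValueError("lip-sync models need both a source image/video and speech audio")


if __name__ == "__main__":
    print(pick_models("image", True))
    print(pick_models("video", True))
    print(pick_models("none", False))
```

The lip-sync cases require both a source and an audio file, so the sketch raises an error when the audio is missing instead of guessing a model.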
Helpful references:
- Image & Video overview and models (includes Omnihuman 1.5 and Veed LipSync). (elevenlabs.io)
- Creative Platform Image & Video playground (shows how lip‑sync models are used). (elevenlabs.io)
- Blog: OpenAI Sora with ElevenLabs audio tools (context on video generation with audio). (elevenlabs.io)
Sources:
Image & Video | ElevenLabs Documentation
ElevenLabs — OpenAI Sora: Text to Video generation
omnihuman too expensive