The video you referenced appears to have been created with a high-performance AI video generation tool. Based on the capabilities described in the knowledge sources, OpenAI's Sora is a leading model for generating complex, story-driven videos with natural character motion and scene coherence. Sora can generate videos up to 20 seconds long (or up to 1 minute in some contexts), handling multiple characters, realistic motion, and detailed backgrounds while closely following the user's prompt. The result is a video whose story is easy to follow and whose characters avoid the "uncanny" feeling often seen in earlier AI models. Sora is available to ChatGPT Plus, Team, and Pro users, and offers aspect ratio and resolution settings, plus storyboard editing, for fine-tuning the output (source: Generating videos on Sora).
Regarding sound: the knowledge sources do not explicitly confirm whether Sora automatically generates audio or soundtracks for videos. However, OpenAI does provide text-to-speech (TTS) models (such as GPT-4o mini TTS, tts-1, and tts-1-hd) that generate natural-sounding speech from text, and these could be used to add AI-generated narration or dialogue to a video (source: Audio and speech). For more complex sound design, such as music or sound effects, the knowledge sources do not say whether Sora or other OpenAI tools handle it automatically.
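
As a rough illustration of the narration route mentioned above, here is a minimal Python sketch that requests spoken audio for a line of script from OpenAI's speech endpoint. It is only a sketch under assumptions: the helper names (`build_speech_request`, `save_narration`), the `alloy` voice, the sample text, and the output filename are illustrative choices, not something the knowledge sources prescribe. It also does not attach the audio to a video; that step would need a separate editing tool.

```python
import json
import urllib.request

# OpenAI's text-to-speech endpoint.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="tts-1", voice="alloy"):
    """Build (but do not send) a POST request asking for spoken audio of `text`.

    `model` defaults to tts-1, one of the TTS models named above;
    `voice` is an assumed choice made for this example.
    """
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_narration(text, out_path, api_key):
    """Send the request and write the returned audio bytes to `out_path`."""
    request = build_speech_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)


# Example use (needs a valid API key and network access):
# save_narration("A fox crosses the snowy field at dawn.", "narration.mp3", "YOUR_API_KEY")
```

The request is built separately from the network call so the payload can be inspected before anything is sent. The resulting audio file would then have to be combined with the generated video in a video editor or similar tool.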