Models like Seedance 2.0 already support reference video and audio inputs, but right now that functionality doesn’t seem to be accessible through OpenRouter’s Video API. That makes it harder to use these models to their full potential through OpenRouter, especially for workflows that depend on matching motion, style, timing, voice, sound, or other reference-based inputs.
This also limits your ability to provide access to motion-control models (like Kling 3.0 Motion Control), which rely on reference-style video inputs.
Adding support for video and audio references would make the Video API much more useful and competitive, especially as more video models start depending on multimodal reference inputs rather than just text prompts.