In short, we need our in-browser WebGL animation, which is overlaid on a video, to be composited with a what-you-see-is-what-you-get (WYSIWYG) approach, so the processed result matches what the browser shows. Ideally, a 10-minute video can be processed in less than a minute. DM me if you are an expert in this. We have already tried headless-gl, Remotion, ffmpeg.wasm, and headless Puppeteer.
Here is a demo of what I am talking about: https://phont-web-demo.vercel.app/ WILLING TO PAY A HIGH HOURLY RATE.
Visually Enriched Subtitles