#Work Stealing: A design pattern for distributing compute-heavy workloads

idle badger
tribal seal
idle badger
#

I used that on a fly.io container where there were more, and didn't want to update the worker to pull up to 4 jobs at once. However, I didn't do the work to autoscale fly.io, and it turns out paying per minute to run something nonstop wasn't great economics. I'd use it on your own hardware, or split the worker out and have it call an LLM on auto-scaled machines (which helps limit overwhelm, but in some ways isn't much better than hitting a hosted service using my new rate limiter helper & reserved capacity).

tribal seal
#

Ye, I'm just doing this as a learning exercise really. What I'm currently thinking is:

  1. User requests a transcription of a very large audio file (podcast).
  2. Convex loads the audio file and splits it into smaller chunks. Each chunk is turned into a separate transcription job. It also runs a scheduled function to start workers if none are running.
  3. Scheduled function runs and checks to see if there are any workers alive. If not, it uses the Fly API to spin up an auto-scaling worker machine.
  4. Worker machine starts pulling jobs. Machine should auto-scale based upon high CPU / GPU usage.
  5. Once all work is complete, it should auto-scale back to 0.
  6. Once the last chunk is transcribed, the results are combined and returned to the client.

No idea if this will work or not, but in theory it should mean that a long audio file gets transcribed very quickly.
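A minimal sketch of the job flow in the steps above, assuming an in-memory stand-in for the jobs table. The names `Job`, `JobBoard`, `claim`, `complete` and `combined` are hypothetical and are not Convex or Fly APIs; a real version would need atomic claims in the database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    chunk_index: int          # position of the chunk in the original audio
    status: str = "pending"   # pending -> claimed -> done
    worker: Optional[str] = None
    text: str = ""


@dataclass
class JobBoard:
    """Hypothetical in-memory stand-in for the jobs table."""
    jobs: list = field(default_factory=list)

    def enqueue_chunks(self, chunk_count: int) -> None:
        # One job per audio chunk, in chunk order.
        self.jobs = [Job(chunk_index=i) for i in range(chunk_count)]

    def claim(self, worker_id: str) -> Optional[Job]:
        """Hand the next pending job to a worker, or None if nothing is left."""
        for job in self.jobs:
            if job.status == "pending":
                job.status, job.worker = "claimed", worker_id
                return job
        return None

    def complete(self, chunk_index: int, text: str) -> None:
        job = self.jobs[chunk_index]
        job.status, job.text = "done", text

    def combined(self) -> Optional[str]:
        """Join results in chunk order once every job is done."""
        if any(job.status != "done" for job in self.jobs):
            return None
        return " ".join(job.text for job in self.jobs)
```

Workers can finish chunks in any order; `combined()` only returns text once every chunk is done, and always joins in chunk order.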

Potential issues I see:

  1. Where I choose to "chunk" the audio will be important. Doing it in the middle of someone speaking will likely break the transcription. So I will have to look for points of silence first before chunking, which might be tricky, particularly if the audio file has background music or something like that.
  2. Machine scaling doesn't work like I think it does, or you are unable to scale by GPU usage or something like that.
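One way the silence-based chunking could look, as a sketch only. It assumes the silent intervals have already been detected somehow (for example by parsing the output of something like ffmpeg's `silencedetect` filter, which is not shown here) and are available as `(start, end)` pairs in seconds. The function name `pick_cut_points` and its parameters are made up for illustration.

```python
def pick_cut_points(duration, silences, target_len):
    """Choose cut times near multiples of target_len, snapped to silence.

    silences: list of (start, end) tuples in seconds.
    Each cut lands at the midpoint of the silence closest to the ideal
    boundary, and cuts are strictly increasing.
    """
    midpoints = sorted((start + end) / 2 for start, end in silences)
    cuts = []
    ideal = target_len
    while ideal < duration:
        last = cuts[-1] if cuts else 0.0
        # Only consider silences after the previous cut and before the end.
        candidates = [m for m in midpoints if last < m < duration]
        if not candidates:
            break
        cut = min(candidates, key=lambda m: abs(m - ideal))
        cuts.append(cut)
        ideal = cut + target_len  # aim for the next chunk to be target_len long
    return cuts
```

If no usable silence exists (continuous background music, say), this returns fewer cuts than hoped or none at all, so a real pipeline would need a fallback.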
idle badger
#

Sounds like a cool project! Chunking sounds like an interesting challenge indeed - I wonder how much transcription accuracy depends on knowing previous context - e.g. partially guessing what word would come next. I wonder if there's a way to transcribe overlapping sections and stitch together the overlap.
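A rough sketch of the overlap idea, assuming each chunk's transcript is available as a list of words and that adjacent chunks were cut with some shared audio. The helper `stitch` is hypothetical. It relies on exact word matches, which real transcripts may not give you at chunk edges, so treat it as a starting point rather than a working solution.

```python
def stitch(left, right, max_overlap=50):
    """Merge two word lists whose end/start cover the same audio.

    Finds the longest run of words that ends `left` and also starts
    `right`, then keeps that run only once.
    """
    limit = min(max_overlap, len(left), len(right))
    for size in range(limit, 0, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    # No agreement found: fall back to plain concatenation.
    return left + right
```

For example, stitching `"the quick brown fox jumps"` with `"fox jumps over the lazy dog"` (split into words) keeps `fox jumps` once.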

The fly.io folks would probably be happy to help you work through it, esp. if it's headed towards one of your good blog posts.

tribal seal
#

Ye, I have been wondering about the context and the overlap solution too. I may reach out to them if I run into some headaches 🙂

tribal seal
lucid prairie
#

so cool!

idle badger
tribal seal
#

> I'm having coffee with a friend at Fly on Wednesday, do you mind if I show him

Ooo nice! Sometimes I wish I was in SF so I could just pop out and meet people like that! But yes, please do show him.

> when are you thinking the blog post will be out

I am probably going to hold off on the blog post until early next month. I try to release one post per month. I could do another this month, but then that would mean I would have to come up with something else next month and I'm pretty busy ATM 😛

> Also curious how everything held up under more load

The code is verrrry rickety at the moment. I will do a little bit of tidying up before I release the source, but it's never going to be something I would tell others to use in their own projects. It's just a Proof of Concept at this stage.

So that is my way of saying I'm not sure. It held together long enough for the demo video, but there are likely lots of places where this could fall down, as you can imagine when juggling so many things.

> if you had to establish multiple queues or shard or anything?

Not entirely sure what you mean here, but the chunking phase seems to work fine and then the workers just pick up jobs and work on them as they come along. If there's nothing left they go to sleep (scale to 0).
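The "pick up jobs, then go to sleep" behaviour could be sketched roughly like this. It is not the project's actual code: it uses a local `queue.Queue` in place of the real job source, and it assumes the hosting platform stops a machine once its main process exits, which is worth checking against the platform docs. `run_worker`, `idle_timeout` and `poll_interval` are illustrative names.

```python
import queue
import time


def run_worker(jobs, handle, idle_timeout=1.0, poll_interval=0.1):
    """Process jobs until the queue has been empty for idle_timeout seconds.

    Returning from this function stands in for the worker process exiting.
    """
    processed = 0
    idle_since = time.monotonic()
    while time.monotonic() - idle_since < idle_timeout:
        try:
            job = jobs.get(timeout=poll_interval)
        except queue.Empty:
            continue  # nothing yet; keep waiting until the idle timeout
        handle(job)
        processed += 1
        idle_since = time.monotonic()  # reset the idle clock after real work
    return processed
```

The idle timeout is a trade-off: too short and machines stop between bursts of chunks, too long and you pay for idle time.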

The chunking is done using ffmpeg BTW. I was thinking my next project might be to create an "ffmpeg as a service" actual paid product. It might be a nice little thing to wrap in a service powered by Convex and Fly. I haven't seen it done anywhere else; there is this (https://ffmpeg-api.com/) but it's not available yet.

idle badger
#

Yeah, once you know ffmpeg something like that would be really handy - I'd want to use it for sure. I suspect a lot of the existing products are operating at a higher level - like GCP's transcoding offering, or other CDN-type media services that transcode & serve for you. Doing the raw compute can be pretty heavy & hard to scale (especially if you own the machines!) - I used to run ~100 servers doing transcoding nonstop at Dropbox, and learned lots of hard lessons. For example, when multiple transcoding jobs run at the same time and compete for memory, ffmpeg will start producing worse results to "make do" with the available resources to keep up. That makes it hard to benchmark, since you actually have to sit down and stare at the video output (which I couldn't do with users' data, so I was scrounging for random representative videos to test).
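One common way to keep jobs from competing for memory is to cap how many run at once on a machine. The sketch below is an assumption about how that could be done, not something described in the thread: `LimitedRunner` and `run_all` are made-up names, and the jobs are plain callables standing in for real transcoding work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LimitedRunner:
    """Run callables with at most max_concurrent in flight at once."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency observed, handy when benchmarking

    def run(self, job):
        with self._slots:  # blocks until a slot is free
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return job()
            finally:
                with self._lock:
                    self._active -= 1


def run_all(jobs, max_concurrent, pool_size=8):
    """Run every job, returning results in order plus the observed peak."""
    runner = LimitedRunner(max_concurrent)
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(runner.run, jobs))
    return results, runner.peak
```

Picking `max_concurrent` would still need measurement on real media, since the right number depends on memory per job.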

tribal seal
#

Ye maybe I’m setting myself up for struggle town.

I was just surprised that it doesn’t seem to exist already.

I was thinking of just letting the user provide a set of “inputs” (images, videos, whatever) and a command string, and then just running the command and uploading all outputs.
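A sketch of how a user-supplied command string might be turned into an argument list without ever going through a shell. This is an illustration under assumptions: `build_argv` is a hypothetical helper, the checks are far from complete sandboxing, and actually running the result (for instance with `subprocess.run(argv)` in an isolated working directory) is left out.

```python
import shlex
from pathlib import PurePosixPath


def build_argv(command, allowed_inputs, binary="ffmpeg"):
    """Turn a user-supplied argument string into an argv list.

    The string is split with shlex and never passed to a shell, so
    characters like `;` or `|` stay literal arguments. Any `-i` input
    must be one of the uploaded file names.
    """
    args = shlex.split(command)
    # Every value following an -i flag has to be a known upload.
    for flag, value in zip(args, args[1:]):
        if flag == "-i" and value not in allowed_inputs:
            raise ValueError(f"unknown input: {value}")
    # Reject absolute paths so outputs stay in the working directory.
    for arg in args:
        if not arg.startswith("-") and PurePosixPath(arg).is_absolute():
            raise ValueError(f"absolute paths are not allowed: {arg}")
    return [binary, *args]
```

For example, `build_argv("-i in.mp4 -vf scale=640:-1 out.mp4", {"in.mp4"})` yields a list starting with `ffmpeg`, while an `-i` pointing at a file that was not uploaded raises `ValueError`.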

Charging would be based on usage.

Scaling handled by fly.. wouldn’t mind some advice from fly on this.. there may be something they have already that can make this reliable. Should you have the inclination to mention it to your friend 🙂

Good point about not snooping into user data! I hadn't considered those sorts of challenges you had at Dropbox.

idle badger
tribal seal
night plinth
#

Nice project @tribal seal ! Thanks for sharing, and the blog (https://mikecann.co.uk/posts/whisper-farm-parallel-transcription). I'm hoping it's an accident that the link to the github repo doesn't work anymore? 🙂 Was hoping to see how the docker / fly setup worked, and to get a feel for using fly in that way. Being able to start and stop those bigger instances efficiently makes those more expensive instances much more approachable!

tribal seal
night plinth
#

Legend - thanks for that!