#caching idea


quartz chasm
#

**TL;DR:** Salad’s high bandwidth demands limit the number of GPUs I can operate. Local and peer-to-peer caching could help chefs and reduce Salad’s operational expenses.

During State of Salad 192 (at 28:50), they mentioned that infrastructure limitations (slow bandwidth) were delaying container workloads, attributing this to the transfer of petabytes of data. Wouldn't this incur substantial operational costs?

On the user side, I’m limited to running 1 or 2 of my 4 GPUs, as I’ve found that a minimum of 30 Mbit/s per GPU is required before workloads begin to experience excessive wait times (and subsequently get dropped). This tends to result in lower earnings if I fail to properly balance bandwidth between PCs in my router’s configuration.

Generally, Salad workers fully utilize my network bandwidth, which results in slower internet speeds for everyone on my network. The data usage amounts to approximately 400-500 GB daily. I would like to allocate a single 12-18 TB SATA hard drive to each of my Salad workers to reduce this.
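
A rough sanity check on the figures above, as a back-of-the-envelope sketch. It assumes decimal units (1 GB = 8,000 Mbit) and traffic spread evenly over 24 hours, which real workloads won't do; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_mbps(gb_per_day: float) -> float:
    """Average throughput in Mbit/s for a daily transfer volume in decimal GB."""
    megabits = gb_per_day * 8_000  # 1 GB = 8,000 Mbit
    return megabits / SECONDS_PER_DAY


def gpus_supported(link_mbps: float, per_gpu_mbps: float = 30.0) -> int:
    """How many GPUs fit on a link if each needs per_gpu_mbps (30 is the figure from this post)."""
    return int(link_mbps // per_gpu_mbps)


# 400-500 GB per day works out to roughly 37-46 Mbit/s sustained, around the clock.
low, high = average_mbps(400), average_mbps(500)
```

Under these assumptions the daily volume alone accounts for more than one GPU's worth of the 30 Mbit/s minimum, before any bursts are considered.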

My investment/contribution to the Salad network would be significantly greater if it were not limited by internet speeds. A local caching solution would effectively address this issue.

In addition, it feels like the Salad team tends to opt for oversimplification at the cost of function. Development costs are a factor; however, spending less on infrastructure and bandwidth by adding P2P cache sharing amongst chefs, along with local caching, would free up funds to develop other long-neglected but highly requested features like multi-GPU support.

I would appreciate the ability to select distinct folders for segregating cache storage (on HDD) from container storage (on SSD), along with a slider to designate the maximum storage capacity.

Eventually adding P2P caching that pays chefs for sharing their cache would ease pressure on Salad’s infrastructure, ideally with a local network transfer option (to avoid redownloading for those with multiple rigs).

#

https://discord.com/channels/509419745834041355/1252988107457036328 reminded me to post this, which I had written over a month ago. Since then, I’ve experienced significantly worse performance (often 3 Mbps), with containers dropping much more frequently. I’ve given up on Salad for a while but wanted to share my idea.

I’ve considered implementing a local caching solution, such as LANCACHE, but I presume the HTTPS encryption of the data makes this unfeasible.

I was considering upgrading to the RTX 4090 due to its popularity and the chopping power it could provide for salad's customers. However, I concluded that it isn’t a wise investment for me at the moment since my current bandwidth can’t support my existing GPUs.

some takeaway questions for the community...

  • would you find local caching useful? why?

  • how much space would you be willing to allocate (GB/TB)?

  • would you like an option to choose storage location? maximum disk space limits? would you use or even dedicate a separate HDD to cache more?

and some questions for developers...

  • are there any technical (or cost prohibitive) limitations preventing development of local/p2p caching solutions?

  • what percentage of workloads are unique (cannot be cached)?

  • what is the maximum amount of cacheable data salad hosts?

  • what percentage of cacheable data has a high chance of being accessed frequently?

  • can Salad’s bandwidth sharing be used for peer-to-peer (P2P) sharing of cached data to reduce Salad’s operational costs?

finally some questions for salad's customers...

  • how preferable is it to launch "common" containers within minutes instead of hours?

boreal violet
#

Just my personal thoughts
Local caching would be pretty cool, though it might only be feasible with publicly accessible files, e.g. Stable Diffusion models hosted on Hugging Face. Things like customers' private workloads probably won't be able to be cached.
Anything above a couple dozen GB would probably require Salad to add a way to control how much space is allocated to the cache, though I doubt this cache would get too big anyways.
Drive endurance could be a pretty big concern, since Salad is already chewing away at that with its (arguably excessive) drive writes; a cache would make that an even bigger issue :(
P2P file sharing would expose users' IPs, so I doubt Salad will add that, despite the benefits it could provide.

knotty kestrel
#

I'm adding some context on this - the server that is used to send out containers, the one you download from, has never reached its top speed limit yet... Even when there's very high network demand, we've still been way in the clear in terms of server bandwidth.

iron adder
#

Just gonna add in my 2 cents. Each chef with multiple PCs on the same WAN needs to take on the role of network administrator. This means you need a router capable of QoS, rate limiting and bandwidth management. Assign static IPs to your LAN workers and set tiered QoS for the services used on your network, e.g. streaming is high priority, browsing is low priority, etc. Most streaming requires about 25 Mbps; if you have 2 Salad workers and a 300 Mbps connection, you can rate limit each to 125 Mbps and still have 50 Mbps to spare for general use.
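
The split in that example can be sketched as a simple budget calculation. This is only an illustration of the arithmetic with a hypothetical helper name; the actual QoS and rate-limit settings depend on the router.

```python
def per_worker_limit(total_mbps: float, workers: int, reserved_mbps: float) -> float:
    """Evenly split whatever is left after reserving bandwidth for general use."""
    if workers <= 0:
        raise ValueError("need at least one worker")
    available = total_mbps - reserved_mbps
    if available < 0:
        raise ValueError("reserved bandwidth exceeds the connection speed")
    return available / workers


# 300 Mbps connection, 2 workers, 50 Mbps kept back for streaming and browsing.
limit = per_worker_limit(total_mbps=300, workers=2, reserved_mbps=50)  # 125.0
```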

charred aspen
#

I read everything and I understood nothing, can someone please brief?

knotty kestrel
#

Caching is the idea that your machine would keep (at least temporarily) some heavily-requested files that it has downloaded. That way, when a container is started, instead of having to redownload the full container each time, you could get the base layers from the data that is already on your machine (in the cache) and only download the extra workload-specific data from the internet, which would reduce total internet usage.
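
To make the idea concrete, here is a toy sketch of a size-capped layer cache. It is not how Salad's client works; `LayerCache`, `fake_download` and the layer names are all made up for illustration, and least-recently-used eviction is just one possible policy.

```python
from collections import OrderedDict
from typing import Callable


class LayerCache:
    """Toy in-memory cache keyed by layer name/digest, capped at max_bytes (LRU eviction)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.downloaded_bytes = 0  # total fetched from "the internet"
        self._layers: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, digest: str, download: Callable[[str], bytes]) -> bytes:
        if digest in self._layers:
            self._layers.move_to_end(digest)  # mark as most recently used
            return self._layers[digest]
        data = download(digest)  # cache miss: fetch over the network
        self.downloaded_bytes += len(data)
        self._store(digest, data)
        return data

    def _store(self, digest: str, data: bytes) -> None:
        if len(data) > self.max_bytes:
            return  # larger than the whole cache, so don't keep it
        while self.used_bytes + len(data) > self.max_bytes:
            _, evicted = self._layers.popitem(last=False)  # drop least recently used
            self.used_bytes -= len(evicted)
        self._layers[digest] = data
        self.used_bytes += len(data)


def fake_download(digest: str) -> bytes:
    """Stand-in for a real download: every layer is 100 bytes."""
    return b"x" * 100


cache = LayerCache(max_bytes=1000)
for layer in ["base-os", "pytorch", "workload-a"]:  # first container: 3 downloads
    cache.get(layer, fake_download)
for layer in ["base-os", "pytorch", "workload-b"]:  # second container: 1 download
    cache.get(layer, fake_download)
# cache.downloaded_bytes is 400 here, versus 600 with no cache
```

The second container only pulls its workload-specific layer, which is the saving described above; the `max_bytes` cap plays the role of a user-set storage limit.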

knotty kestrel
#

I'm not sure of the overall specific details, but I'm pretty sure we already have some form of layer management and caching... Though there are perhaps more improvements that could be made

proven walrus
knotty kestrel
#

Not everything is client-related, no. The client executable runs over an OS, they usually use specific libraries like PyTorch or others, ...