# How to evaluate cloud GPU providers?


thorny garnet

Curious to learn from people with experience deploying and running AI workloads on non-hyperscaler cloud GPUs (think Voltage Park, Hyperstack, Akash, etc.).
What went into your choice of GPU cloud? What were the criteria? Have you experienced any frustrations/issues with their infra?

ruby anchor

The main challenge has sometimes been our clusters not working properly, for a myriad of reasons, so I suppose reliability is the biggest thing... It all depends on where you're at and what level of scaling you need to do.

thorny garnet

Ahhhh gotcha, is this mainly for multi-node workloads + interconnect reliability? Which cloud GPU providers have you used in the past?

spice peak

If you want reliability, I think setting up machines on two different providers (data centres) is useful. E.g. even if you use Vast.ai, just pick two providers in different data centres and route traffic between them. Even just two machines should give you good redundancy. Then if one goes down, it's usually ~10 min to load another machine with your Docker containers.
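
A minimal sketch of the "two machines, route traffic" idea, assuming each machine exposes some HTTP health URL. The function names `http_ok` and `pick_endpoint` and the example URLs are made up for illustration and are not part of any provider's API; a real setup would more likely put this logic in a load balancer or DNS failover.

```python
import http.client
import urllib.request
from typing import Callable, Optional, Sequence


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError, HTTPError, timeouts and refused connections;
        # ValueError covers malformed URLs.
        return False


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool] = http_ok,
) -> Optional[str]:
    """Return the first healthy endpoint in priority order, or None if all are down."""
    for url in endpoints:
        if is_healthy(url):
            return url
    return None
```

Calling `pick_endpoint(["https://gpu-a.example/health", "https://gpu-b.example/health"])` before each request (or on a timer) would send traffic to the first machine while it is up and fall back to the second when it is not. A `None` result is the cue to spin up a replacement machine.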