#Gemma4 is not working (serverless) via vLLM v2.14.0
Yup, the vLLM worker has not yet been updated to the latest upstream vLLM.
Can you help me with network volumes? I can't set one up because I don't know which data centers have my GPU available.
I configured it this way, but it still doesn’t behave as expected. My goal is to send a single request and have the service quickly start inference from the disk image without a long cold start.
What are your env variables?
Can I see all of your env variables?
Don't screenshot any tokens if you have them (blur or cover them).
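If you'd rather paste the variables as text than screenshot them, a small script can mask the secrets first. This is a minimal sketch, not part of the vLLM worker: the rule that any variable whose name contains TOKEN, KEY, SECRET or PASSWORD holds a secret is an assumption (it would catch HF_TOKEN, the usual Hugging Face token variable), and `redact_env` is a hypothetical helper.

```python
import os
import re

# Assumption: any variable whose NAME contains one of these words holds a secret.
# This is a hypothetical heuristic; adjust it to your own variable names.
SECRET_NAME_PATTERN = re.compile(r"TOKEN|KEY|SECRET|PASSWORD", re.IGNORECASE)
MASK = "***REDACTED***"


def redact_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with secret-looking, non-empty values masked."""
    return {
        name: MASK if value and SECRET_NAME_PATTERN.search(name) else value
        for name, value in env.items()
    }


if __name__ == "__main__":
    # Print every environment variable of the current process, secrets masked.
    for name, value in sorted(redact_env(dict(os.environ)).items()):
        print(f"{name}={value}")
```

The heuristic errs toward over-masking: a harmless setting whose name happens to contain TOKEN gets hidden too, which is the safer failure for something you're about to post publicly.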
Bigger models usually still take time to load, so that's normal.
Seems normal to me; I think it's just the loading time.
So this is the best setup I can get, even with the cold start?
Where can I see my network volumes? I mean, how much space is free and how much is used?
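From inside a worker that has the volume attached, you can check used and free space with the Python standard library. A minimal sketch, assuming the network volume is mounted at `/runpod-volume` on serverless workers (an assumption; pass another path as the first argument if yours differs, e.g. on Pods it is often `/workspace`). `shutil.disk_usage` reports whatever the mounted filesystem exposes, which may not exactly match the size you provisioned for the volume.

```python
import os
import shutil
import sys

# Assumption: serverless workers mount the network volume at /runpod-volume.
# Pass a different path as the first command-line argument if yours differs.
DEFAULT_MOUNT = "/runpod-volume"


def format_gib(num_bytes: int) -> str:
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


def volume_usage(path: str) -> dict[str, int]:
    """Return total, used and free bytes for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


if __name__ == "__main__":
    mount = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_MOUNT
    if os.path.isdir(mount):
        for label, value in volume_usage(mount).items():
            print(f"{label:>5}: {format_gib(value)}")
    else:
        # No directory at this path: no volume attached, or a different mount point.
        print(f"{mount} not found; is a network volume attached to this worker?")
```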