I'm running magic-llama-3.1 on a Threadripper 1920X and an RTX 3070 Ti with Ubuntu 24.04.1 LTS.
I have a few errors/questions:
-
With "magic run serve --huggingface-repo-id modularai/llama-3.1"
I get this right at the beginning: "INFO: MainThread: root:Estimated memory consumption:
Weights: 4693 MiB
KVCache allocation: 128 MiB
Total estimated: 4821 MiB used / unknown MiB free
Current batch size: 1
Current max sequence length: 512
Max recommended batch size for current sequence length: unknown
"
Why is the free memory in the estimate reported as unknown? The same goes for the max recommended batch size.
-
Why is only one of my NUMA nodes in use when I run without the GPU flag?
-
The GPU flag isn't working with NVIDIA driver 550.120.