#Mac M1 Air issue
Which model is this exactly? Did you try running with --verbose?
What model parameters are you using? That would be helpful too.
I didn’t, I used --threads
--threads 8, that’s all. Do you recommend something else?
How many CPU cores do you have?
The main thing is memory and cores. What kind of model are you using? Is it GGML, GGUF, GPTQ, HF, something else?
I think I found it, not the 16k one, I hope.
@viscid vine I've had rather bad luck loading models using transformers, if that's what you are loading with. I also don't know how much memory you have or if you are seeing CPU activity while it's "typing..."
I see there's a new version of this one, the 1.5 version. I'd recommend getting the GGML version; it runs really fast with the latest version of llama-cpp-python using GGUF. You would have to convert the model to GGUF format after you download it. If you download the llama.cpp repository, there is a conversion script there. That is probably your best option, unless you are using the model for fine-tuning. In that case, loading and running with ctransformers is probably your best route, and the 1.5 version is also available for that. Without knowing more, though, it is difficult to be of more assistance.
More information would help.
Here are the newer versions and the GGML version too.
https://huggingface.co/models?search=lmsys/vicuna-13b
8 cores
So I can't use the regular one and just download the model for the webui and load it?
Because I'm pretty new to it and I don't know how to use GGML.
He also has 8GB ram on this machine
@timid birch I'm guessing that's the standard config for that machine? If it is, then it's really thrashing memory. I'm surprised it's even loading, but I suppose with transformers it will use a combination of memory and disk. *shudder*
ahhhh
@viscid vine see my comment above about memory, is it true you have 8GB? You're gonna have a hard time with a 7B model, much less a 13B model. The GGML models will maybe run "better", but they will still run with lots of swapping on that. And 6 or 7 cores is probably sufficient.
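For the thread count mentioned above, here is a rough sketch of how one might pick a value to pass to `--threads`. The helper name `suggest_threads` and the "leave one core free" default are assumptions for illustration, not something any of the tools prescribe. The M1 has 4 performance and 4 efficiency cores, so capping at the performance cores is one option worth trying.

```python
import os


def suggest_threads(total_cores, performance_cores=None, reserve=1):
    """Heuristic thread count for CPU inference (hypothetical helper).

    If the number of performance cores is known, use only those;
    otherwise leave `reserve` cores free for the OS and the UI.
    """
    if performance_cores is not None:
        return max(1, min(performance_cores, total_cores))
    return max(1, total_cores - reserve)


# What this machine reports; os.cpu_count() can return None.
total = os.cpu_count() or 1
print("suggested threads here:", suggest_threads(total))

# An 8-core M1 (4 performance + 4 efficiency cores).
print("M1, leave one core free:", suggest_threads(8))
print("M1, performance cores only:", suggest_threads(8, performance_cores=4))
```

Which setting is actually fastest depends on the machine and the load, so it is worth timing both.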
I am going to hook him up with some ram
Yes, but also note that you may not get much performance increase unless it's unified memory.
It is a MacBook Air M1, I don't think he is going to have that option.
But even with 16GB RAM, it's still going to be a stretch to get the 13B models to run. I'd definitely look at using a GGML/GGUF 4-bit quant model, they give good performance/memory utilization with a moderate quality tradeoff.
I don't think there is a good solution other than using a more robust machine. I think dallelamma might run on his machine
@timid birch I'm looking at the page above with the models, and the q4_0 model takes up only 7.4GB on disk. The models run best if they load completely into RAM, so that should fit in 16GB and leave enough for overhead, which will be a fair bit. It's difficult to say how much, and the model card has no numbers for it. I can maybe try running it in GGUF format and see what the memory hit looks like, but it will only be a close approximation, depending on load on the machine.
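The arithmetic behind that estimate can be sketched roughly as follows. q4_0 stores about 4.5 bits per weight (4-bit values plus one 16-bit scale per block of 32 weights), which lands close to the 7.4GB file size for a 13B model. The 4GB overhead for the OS, context and everything else is purely an assumed placeholder, and the function names are made up for this example.

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(model_gb, ram_gb, overhead_gb=4.0):
    """True if the model plus an ASSUMED overhead fits in RAM.

    overhead_gb is a guess covering the OS, context and runtime;
    measure on the real machine before trusting it.
    """
    return model_gb + overhead_gb <= ram_gb


# 13B parameters at ~4.5 bits per weight (q4_0).
size = model_size_gb(13e9, 4.5)
print(f"estimated q4_0 size for 13B: {size:.2f} GB")

# The 7.4GB file from the model page, on 8GB and 16GB machines.
for ram in (8, 16):
    print(f"{ram}GB RAM:", "fits" if fits_in_ram(7.4, ram) else "will swap")
```

This only says whether the weights can stay resident. It says nothing about speed.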
These things are monsters that eat disk for breakfast and RAM all day long.
Understood
thank you @sleek plinth 🙂
at least he didn't try it on a commodore64
I wanted to get a 64GB MacBook Pro M2 Max, but settled for a 32GB because anything larger than 32GB has to be sent from the factory. After migrating over to M2 from Intel, I discovered I should have got the 64GB in the first place. Apple was nice enough to exchange it almost 90 days after purchase, and I went ahead and maxed the machine out to 96GB.
I just blew four grand on a new PC with a 13.9k i9, 128GB DDR5 and 12GB redundant nvme
oh, and an rtx 4090
It's crazy what the prices are, especially if you want to use a GPU. I have been working on benchmarking GPU performance in various Python library configurations. I keep hitting bumps along the way, but hopefully soon I will have some initial numbers comparing performance. Basically, the NumPy which comes with SciPy vs the one that comes with PyTorch vs one built with the Accelerate framework, along with the package regression tests and benchmarks included too.
I went with Apple. Yeah, I know you can't upgrade. Well, I did, kinda, but to upgrade RAM in these new machines means a new system board.
The Unified Memory is directly on the system board to minimize/reduce/eliminate noise from memory connectors.
I have been working on benchmarking CPU performance for some enterprise CPUs
Nice, how’s that going and what are you finding?
@timid birch have you found anything which will give you a benchmark for performance for an LLM? I’d think that would be difficult to find, at least anything which would provide consistent data and results.
I’m not at that point yet, but would like to put together a set of tests for one or two LLMs. My thinking is, though, that if you constrain the LLM with very deterministic parameters, you won’t get a good test, because the whole point is you would like to see creativity and diversity in answers in the real world.
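A minimal sketch of what a throughput test along those lines could look like. It only times generation and counts tokens, so it says nothing about answer quality or diversity. `benchmark` and `fake_generate` are hypothetical names, and the fake generator stands in for a real model call, which you would swap in yourself.

```python
import statistics
import time


def benchmark(generate, prompt, runs=3):
    """Time `generate(prompt)` a few times and report tokens per second.

    `generate` is any callable returning a list of tokens.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed if elapsed > 0 else float("inf"))
    return {
        "runs": runs,
        "mean_tps": statistics.mean(rates),
        "min_tps": min(rates),
        "max_tps": max(rates),
    }


def fake_generate(prompt):
    """Stand-in for a real model: sleeps briefly, returns dummy tokens."""
    time.sleep(0.01)
    return prompt.split() * 4


result = benchmark(fake_generate, "the quick brown fox", runs=3)
print(f"mean {result['mean_tps']:.0f} tok/s "
      f"(min {result['min_tps']:.0f}, max {result['max_tps']:.0f})")
```

Reporting min and max alongside the mean gives at least a hint of how much the numbers move with load on the machine.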
The answer is yes.
Nice answer… lol
Sorry not so specific but I am unsure what I can share in that arena.
I figured, that’s the way it goes sometimes.
We have some internally built scripts that utilize some older LLMs but are made to tax all the resources.
Oh, we are using LLMs to benchmark hardware
not benchmarking the LLMs on hardware itself
for the purpose of a predetermined output
Oh, that’s different then. Very nice and I’m sure rather proprietary. I’d be interested in locating any resources which are Open Source, if you have any, they would be most welcome.
I also figure if I asked are you having fun with that, you’d give the same answer.😀
I own a distillery and liked what the job entailed so much I decided to jump onboard. I manage my business from afar and do this full time and make cheese on the weekends and write podcasts at night.
So, it's pretty technically fulfilling.
That sounds like a lot of fun and a really niche application/use case.
And a lot of work and long hours too.
The thing about running tests that take days and hours is that there is time enough for much