#Mac M1 Air issue


viscid vine
#

Hey guys, I've got an issue: when I try to generate text with some models I installed and loaded, I get this output on the console, but the model's answer never arrives. It just stays stuck on the "is typing" text.

#

Thanks for helping 🙂

sleek plinth
#

Which model is this exactly? Did you try running with --verbose?

#

What model parameters are you using? That would be helpful too.

sleek plinth
#

How many CPU cores do you have?

#

The main thing is memory and cores. What kind of model are you using? Is it GGML, GGUF, GPTQ, HF, something else?

#

I think I found it, not the 16k one, I hope.

sleek plinth
#

@viscid vine I've had rather bad luck loading models using transformers, if that's what you are loading with. I also don't know how much memory you have or if you are seeing CPU activity while it's "typing..."

I see there's a new version of this one, the 1.5 version. I'd recommend getting the GGML version; it runs really fast with the latest version of llama-cpp-python, which uses GGUF. You would have to convert the model to GGUF format after you download it. If you download the llama.cpp repository, there is a conversion script there. That is probably your best option, unless you are using the model for fine-tuning. In that case, loading and running with ctransformers is probably your best route, and the 1.5 version is also available for that. Without knowing more, though, it is difficult to be of more assistance.
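For reference, here is a rough sketch of what loading an already converted GGUF file with llama-cpp-python could look like. The model path, the context size, and the idea of leaving one core free are all assumptions for illustration, not settings anyone in this thread confirmed. The `llama_cpp` import is done lazily so the helper can be inspected without the package installed.

```python
import os


def llama_kwargs(model_path, n_ctx=2048, reserve_cores=1):
    """Build constructor arguments for llama_cpp.Llama.

    All values are illustrative assumptions: n_ctx=2048 and leaving one
    core free for the OS are guesses, not recommendations from the thread.
    """
    cores = os.cpu_count() or 1
    return {
        "model_path": model_path,                      # hypothetical path to a .gguf file
        "n_ctx": n_ctx,                                # context window in tokens
        "n_threads": max(1, cores - reserve_cores),    # CPU threads used for inference
        "verbose": True,                               # print load and timing info to the console
    }


def generate(model_path, prompt, max_tokens=64):
    """Load the model and return the generated text for one prompt."""
    # Imported here so the rest of the file works without the package;
    # requires `pip install llama-cpp-python`.
    from llama_cpp import Llama

    llm = Llama(**llama_kwargs(model_path))
    out = llm(prompt, max_tokens=max_tokens)
    return out["choices"][0]["text"]


if __name__ == "__main__":
    # Only prints the settings; call generate() once a real model file exists.
    print(llama_kwargs("models/example-13b.q4_0.gguf"))
```

With `verbose=True` the console shows load and timing information, which helps tell a slow, swapping model apart from one that is actually hung.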

#

More information would help.

timid birch
#

He also has 8GB of RAM on this machine.

sleek plinth
#

@timid birch I'm guessing that's the standard config for that machine? If it is, then it's really thrashing memory. I'm surprised it's even loading, but I suppose with transformers it will use a combination of memory and disk. shudder

timid birch
#

ahhhh

sleek plinth
#

@viscid vine see my comment above about memory. Is it true you have 8GB? You're gonna have a hard time with a 7B model, much less a 13B model. The GGML models will maybe run "better", but they will still run with lots of swapping on that. And 6 or 7 cores is probably sufficient.

timid birch
#

I am going to hook him up with some ram

sleek plinth
#

Yes, but also note that you may not get much of a performance increase unless it's unified memory.

timid birch
#

It is a MacBook Air M1, I don't think he is going to have that option.

sleek plinth
#

But even with 16GB RAM, it's still going to be a stretch to get the 13B models to run. I'd definitely look at using a GGML/GGUF 4-bit quant model; they give good performance/memory utilization with a moderate quality tradeoff.
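To put rough numbers on that, here is a back-of-the-envelope sketch of weight size by format. It assumes the GGML block layouts as I understand them: q4_0 stores 32 weights in 18 bytes (4.5 bits per weight) and q8_0 stores 32 weights in 34 bytes (8.5 bits per weight). It ignores tensors kept at higher precision and file metadata, so treat the results as estimates only.

```python
# Effective bits per weight for a few formats (assumed block layouts, see above).
BITS_PER_WEIGHT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def weights_gb(n_params_billion, fmt):
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params_billion * BITS_PER_WEIGHT[fmt] / 8


if __name__ == "__main__":
    for size in (7, 13):
        for fmt in ("f16", "q8_0", "q4_0"):
            print(f"{size}B {fmt}: ~{weights_gb(size, fmt):.1f} GB")
```

By this estimate a 13B model is about 26 GB at f16 and about 7.3 GB at q4_0, and a 7B q4_0 model is about 3.9 GB before any runtime overhead.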

timid birch
#

I don't think there is a good solution other than using a more robust machine. I think dallelamma might run on his machine

sleek plinth
#

@timid birch I'm looking at the page above with the models, and the q4_0 model takes up only 7.4GB on disk. The models run best if they load completely into RAM, so that should fit in 16GB and leave enough for overhead. How much overhead is difficult to say, and the data card has no numbers for it. I can maybe try to run it in GGUF format and see what the memory hit looks like, but it will only be a close approximation, depending on the load on the machine.
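A quick sketch of that fit check, with the overhead made explicit. Only the 7.4GB file size comes from the model page; everything else is an assumption: the layer count and embedding width are what I'd expect for a Llama-style 13B model, the 2048-token context is a guess, and the 4GB reserved for the OS and other apps is a placeholder to adjust.

```python
def kv_cache_gb(n_layers, n_ctx, n_embd, bytes_per_elem=2):
    """Approximate key/value cache size in GB (1 GB = 1e9 bytes).

    Two tensors (keys and values) per layer, each n_ctx * n_embd elements,
    stored as 16-bit floats by default.
    """
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem / 1e9


def fits_in_ram(model_file_gb, ram_gb, kv_gb, os_reserve_gb=4.0):
    """Return (fits, needed_gb) for a model that should load fully into RAM."""
    needed = model_file_gb + kv_gb
    return needed <= ram_gb - os_reserve_gb, needed


if __name__ == "__main__":
    # Assumed architecture: 40 layers, 5120-wide embeddings, 2048-token context.
    kv = kv_cache_gb(n_layers=40, n_ctx=2048, n_embd=5120)
    for ram in (8, 16):
        ok, needed = fits_in_ram(7.4, ram, kv)
        print(f"{ram}GB RAM: need ~{needed:.1f} GB, fits={ok}")
```

Under these assumptions the model needs roughly 9 GB, which fits in 16GB with room to spare but not in 8GB. A real measurement on the target machine would still be the better number.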

#

These things are monsters that eat disk for breakfast and RAM all day long.

timid birch
#

Understood

viscid vine
#

thank you @sleek plinth 🙂

timid birch
#

At least he didn't try it on a Commodore 64.

sleek plinth
#

I wanted to get a 64GB MacBook Pro M2 Max, but settled for a 32GB because anything larger than 32GB has to be sent from the factory. After migrating over to M2 from Intel, I discovered I should have got the 64GB in the first place. Apple was nice enough to exchange it almost 90 days after purchase, and I went ahead and maxed the machine out to 96GB.

timid birch
#

I just blew four grand on a new PC with a 13.9k i9, 128GB DDR5 and 12GB redundant nvme

#

oh, and an rtx 4090

sleek plinth
#

It's crazy what the prices are, especially if you want to use a GPU. I have been working on benchmarking GPU performance in various Python library configurations. I keep hitting bumps along the way, but hopefully soon I will have some initial numbers comparing performance. Basically it compares the NumPy which comes with SciPy vs the one that comes with PyTorch vs one built with the Accelerate framework, along with the package regression tests and benchmarks included too.
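As an illustration of the kind of harness that comparison needs, here is a minimal timing sketch using only the standard library. It is not the benchmark described above: the pure-Python matrix multiply is a stand-in workload, and in practice one would pass in a callable that runs the same operation against each NumPy build.

```python
import statistics
import time


def bench(fn, repeats=5, warmup=1):
    """Time fn() several times and report median and minimum wall-clock seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not recorded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "runs": repeats,
    }


def matmul_workload(n=64):
    """Return a callable that multiplies an n x n matrix by itself (stand-in workload)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]

    def run():
        cols = list(zip(*a))
        return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

    return run


if __name__ == "__main__":
    print(bench(matmul_workload(64)))
```

Reporting the median alongside the minimum makes it easier to spot runs disturbed by other load on the machine.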

#

I went with Apple, yeah I know you can't upgrade, well, I did kinda, but to upgrade RAM in these new machines is a new system board.

#

The Unified Memory is directly on the system board to minimize/reduce/eliminate noise from memory connectors.

timid birch
#

I have been working on benchmarking CPU performance for some enterprise CPUs.

sleek plinth
#

Nice, how’s that going and what are you finding?

#

@timid birch have you found anything which will give you a benchmark for the performance of an LLM? I’d think that would be difficult to find, at least anything which would provide consistent data and results.

#

I’m not at that point yet, but would like to put together a set of tests for one or two LLMs. My thinking, though, is that if you constrain the LLM with very deterministic parameters, you won’t get a good test, because the whole point is that you would like to see creativity and diversity in answers in the real world.
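A toy sketch of that tradeoff, using made-up logits rather than a real model: with temperature 0 the pick is always the top-scoring token, so every run is identical, while a nonzero temperature with a fixed seed stays reproducible but still samples from the whole distribution. The logits, seeds, and function names here are all hypothetical.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index: greedy at temperature 0, softmax sampling otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # subtract the max for numerical stability
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def run(logits, temperature, seed, n=20):
    """Draw n tokens from one seeded generator."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.3]  # made-up scores for three candidate tokens
    print("greedy:  ", run(logits, 0, seed=1))
    print("seeded:  ", run(logits, 1.0, seed=7))
    print("reseeded:", run(logits, 1.0, seed=8))
```

Fixing the seed rather than the temperature is one way to keep a test repeatable without forcing the model into greedy output.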

timid birch
#

The answer is yes.

sleek plinth
#

Nice answer… lol

timid birch
#

Sorry not so specific but I am unsure what I can share in that arena.

sleek plinth
#

I figured, that’s the way it goes sometimes.

timid birch
#

We have some internally built scripts that utilize some older LLMs but are made to tax all the resources.

#

Oh, we are using LLMs to benchmark hardware

#

not benchmarking the LLMs on hardware itself

#

for the purpose of a predetermined output

sleek plinth
#

Oh, that’s different then. Very nice and I’m sure rather proprietary. I’d be interested in locating any resources which are Open Source, if you have any, they would be most welcome.

#

I also figure if I asked are you having fun with that, you’d give the same answer.😀

timid birch
#

I own a distillery and liked what the job entailed so much I decided to jump onboard. I manage my business from afar and do this full time and make cheese on the weekends and write podcasts at night.

#

So, it's pretty technically fulfilling.

sleek plinth
#

That sounds like a lot of fun and a really niche application/use case.

#

And a lot of work and long hours too.

timid birch
#

The thing about running tests that take days and hours is that there is time enough for much