Mac M1 Air issue | Text Generation WebUI | Page 1

viscid vine Aug 30, 2023, 10:25 PM

#

Hey guys, I've got an issue: when I try to generate text with some models I installed and loaded, I get this output on the console, but the model's answer never comes. It just stays stuck on the "is typing" text.

#

Thanks for helping 🙂

sleek plinth Aug 31, 2023, 1:54 PM

#

Which model is this exactly? Did you try running with --verbose?

#

What model parameters are you using? That would be helpful too.

viscid vine Aug 31, 2023, 3:44 PM

#

sleek plinth Which model is this exactly? Did you try running with --verbose?

I didn’t, I used --threads

viscid vine Aug 31, 2023, 3:44 PM

#

sleek plinth What model parameters are you using? That would be helpful too.

--threads 8, that’s all. Do you recommend something else?

sleek plinth Aug 31, 2023, 4:01 PM

#

How many CPU cores do you have?

#

The main thing is memory and cores. What kind of model are you using? Is it GGML, GGUF, GPTQ, HF, or something else?

#

I think I found it. Not the 16k one, I hope.

sleek plinth Sep 1, 2023, 1:44 AM

#

@viscid vine I've had rather bad luck loading models using transformers, if that's what you are loading with. I also don't know how much memory you have or if you are seeing CPU activity while it's "typing..."

I see there's a newer version of this one, the 1.5 version. I'd recommend getting the GGML version; it runs really fast with the latest version of llama-cpp-python using GGUF. You would have to convert the model to GGUF format after you download it. If you download the llama.cpp repository, there is a conversion script there. That is probably your best option, unless you are using the model for fine-tuning. In that case, loading and running with ctransformers is probably your best route, and the 1.5 version is also available for that. Without knowing more, though, it is difficult to be of more assistance.

#

More information would help.

#

Here are the newer versions and the GGML version too.
https://huggingface.co/models?search=lmsys/vicuna-13b


viscid vine Sep 1, 2023, 4:58 PM

#

sleek plinth How many CPU cores do you have?

8 cores

viscid vine Sep 1, 2023, 4:59 PM

#

sleek plinth @viscid vine I've had rather bad luck loading models using transformer...

So I cannot use the regular one and just download the model for the webui and load it?
I ask because I am pretty new to this and I don't know how to use GGML.

timid birch Sep 2, 2023, 2:11 AM

#

He also has 8GB ram on this machine

sleek plinth Sep 2, 2023, 2:14 AM

#

@timid birch I'm guessing that's the standard config for that machine? If it is, then it's really thrashing memory. I'm surprised it's even loading, but I suppose with transformers it will use a combination of memory and disk. shudder

timid birch Sep 2, 2023, 2:15 AM

#

ahhhh

sleek plinth Sep 2, 2023, 2:16 AM

#

@viscid vine see my comment above about memory. Is it true you have 8GB? You're gonna have a hard time with a 7B model, much less a 13B model. The GGML models will maybe run "better", but they will still run with lots of swapping on that. And 6 or 7 cores is probably sufficient.

timid birch Sep 2, 2023, 2:16 AM

#

I am going to hook him up with some ram

sleek plinth Sep 2, 2023, 2:17 AM

#

Yes, but also note that you may not get much of a performance increase unless it's unified memory.

timid birch Sep 2, 2023, 2:18 AM

#

It is a MacBook Air M1, I don't think he is going to have that option.

sleek plinth Sep 2, 2023, 2:18 AM

#

But even with 16GB RAM, it's still going to be a stretch to get the 13B models to run. I'd definitely look at using a GGML/GGUF 4-bit quant model; they give good performance and memory utilization with a moderate quality tradeoff.

timid birch Sep 2, 2023, 2:21 AM

#

I don't think there is a good solution other than using a more robust machine. I think dallelamma might run on his machine

sleek plinth Sep 2, 2023, 2:23 AM

#

@timid birch I'm looking at the page above with the models, and the q4_0 model takes up only 7.4GB on disk. The models run best if they load completely into RAM, so that should fit in the 16GB and leave enough for overhead, which will be a bit. How much is difficult to say, and the data card has no numbers with it. I can maybe try to run it in GGUF format and see what the memory hit looks like, but it will only be a close approximation, depending on the load on the machine.
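
For a rough sanity check of those numbers, here is a back-of-the-envelope sketch. The figures in it are assumptions, not measurements: roughly 4.5 bits per weight for a q4_0 quant (4-bit values plus a per-block scale), about 2GB of runtime overhead for context and buffers, and about 4GB kept free for the OS and other apps. The function names are just for illustration.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(model_gb: float, ram_gb: float,
                overhead_gb: float = 2.0, reserved_gb: float = 4.0) -> bool:
    """True if the model plus runtime overhead fits in what the OS leaves free.

    overhead_gb and reserved_gb are guesses; adjust them for your machine.
    """
    return model_gb + overhead_gb <= ram_gb - reserved_gb


if __name__ == "__main__":
    size_13b = quantized_size_gb(13e9, 4.5)  # assumed ~4.5 bits/weight for q4_0
    size_7b = quantized_size_gb(7e9, 4.5)
    for ram in (8, 16):
        print(f"{ram}GB RAM: 13B ({size_13b:.1f}GB) fits={fits_in_ram(size_13b, ram)}, "
              f"7B ({size_7b:.1f}GB) fits={fits_in_ram(size_7b, ram)}")
```

Under those assumptions the 13B q4_0 weights come out a little over 7GB, which lines up with the 7.4GB file, and neither the 7B nor the 13B leaves comfortable headroom on an 8GB machine.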

#

These things are monsters that eat disk for breakfast and RAM all day long.

timid birch Sep 2, 2023, 2:25 AM

#

Understood

viscid vine Sep 2, 2023, 2:25 AM

#

thank you @sleek plinth 🙂

timid birch Sep 2, 2023, 2:25 AM

#

at least he didn't try it on a commodore64

sleek plinth Sep 2, 2023, 2:28 AM

#

I wanted to get a 64GB MacBook Pro M2 Max, but settled for a 32GB because anything larger than 32GB has to be sent from the factory. After migrating over to M2 from Intel, I discovered I should have got the 64GB in the first place. Apple was nice enough to exchange it almost 90 days after purchase, and I went ahead and maxed the machine out to 96GB.

timid birch Sep 2, 2023, 2:30 AM

#

I just blew four grand on a new PC with a 13.9k i9, 128GB DDR5 and 12GB redundant nvme

#

oh, and an rtx 4090

sleek plinth Sep 2, 2023, 2:33 AM

#

It's crazy what the prices are, especially if you want to use a GPU. I have been working on benchmarking GPU performance in various Python library configurations. I keep hitting bumps along the way, but hopefully I will soon have some initial numbers comparing performance. Basically, that means comparing the NumPy which comes with SciPy vs the one that comes with PyTorch vs one built with the Accelerate framework, along with the package regression tests and benchmarks included too.

#

I went with Apple. Yeah, I know you can't upgrade (well, I did, kinda), but upgrading RAM in these new machines means a new system board.

#

The Unified Memory is directly on the system board to minimize/reduce/eliminate noise from memory connectors.

timid birch Sep 2, 2023, 2:46 AM

#

I have been working on benchmarking CPU performance for some enterprise CPUs

sleek plinth Sep 2, 2023, 2:48 AM

#

Nice, how’s that going and what are you finding?

#

@timid birch have you found anything which will give you a benchmark for performance for an LLM? I’d think that would be difficult to find, at least anything which would provide consistent data and results.

#

I’m not at that point yet, but would like to put together a set of tests for one or two LLMs. My thinking, though, is that if you constrain the LLM with very deterministic parameters, you won’t get a good test, because the whole point is you would like to see creativity and diversity in answers in the real world.
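
One possible way around that, as a rough sketch only: leave the sampling settings alone and time throughput over several runs instead of comparing the generated text. Everything here is hypothetical. `generate` stands for whatever function calls your model and returns a list of tokens, and `fake_generate` is just a stand-in that sleeps so the example runs without a model.

```python
import statistics
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[str], List[str]],
                      prompt: str, runs: int = 5) -> List[float]:
    """Time `generate` on the same prompt `runs` times; return tokens/sec per run."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        # Guard against a zero reading from a very fast call.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(len(tokens) / elapsed)
    return rates


def fake_generate(prompt: str) -> List[str]:
    """Stand-in for a real model call (hypothetical); sleeps to simulate work."""
    time.sleep(0.01)
    return prompt.split() * 4


if __name__ == "__main__":
    rates = tokens_per_second(fake_generate, "the quick brown fox", runs=3)
    print(f"mean {statistics.mean(rates):.1f} tok/s, "
          f"stdev {statistics.stdev(rates):.1f} tok/s")
```

Reporting the spread across runs, and not only the mean, gives some idea of how much the load on the machine is moving the numbers.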

timid birch Sep 2, 2023, 3:16 AM

#

The answer is yes.

sleek plinth Sep 2, 2023, 3:17 AM

#

Nice answer… lol

timid birch Sep 2, 2023, 3:19 AM

#

Sorry not so specific but I am unsure what I can share in that arena.

sleek plinth Sep 2, 2023, 3:19 AM

#

I figured, that’s the way it goes sometimes.

timid birch Sep 2, 2023, 3:20 AM

#

We have some internally built scripts that utilize some older LLMs but are made to tax all the resources.

#

Oh, we are using LLMs to benchmark hardware

#

not benchmarking the LLMs on hardware itself

#

for the purpose of a predetermined output

sleek plinth Sep 2, 2023, 3:22 AM

#

Oh, that’s different then. Very nice and I’m sure rather proprietary. I’d be interested in locating any resources which are Open Source, if you have any, they would be most welcome.

#

I also figure if I asked are you having fun with that, you’d give the same answer.😀

timid birch Sep 2, 2023, 3:25 AM

#

I own a distillery and liked what the job entailed so much I decided to jump onboard. I manage my business from afar and do this full time and make cheese on the weekends and write podcasts at night.

#

So, it's pretty technically fulfilling.

sleek plinth Sep 2, 2023, 3:26 AM

#

That sounds like a lot of fun and a really niche application/use case.

#

And a lot of work and long hours too.

timid birch Sep 2, 2023, 3:28 AM

#

The thing about running tests that take days and hours is that there is time enough for much
