# Starting out, performance woes

sterile mantle Nov 24, 2023, 12:45 AM

You can try using a GGUF model with llama.cpp or ctransformers and see if that improves anything.

Also, a 128k context might be a large RAM/VRAM burden.

Try reducing it to something like 4k when loading the model and see if that helps.
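For a rough idea of why 128k context is so heavy: the key/value (KV) cache grows linearly with context length. Below is a minimal back-of-the-envelope sketch, assuming Llama-2-7B-like dimensions (32 layers, 32 KV heads, head dim 128) and an fp16 cache. These numbers are illustrative assumptions, not measurements. Models using grouped-query attention or a quantized cache will need less, and the estimate ignores the model weights and compute buffers.

```python
# Back-of-the-envelope KV-cache size for a decoder-only transformer.
# Assumed (hypothetical) dimensions, roughly a 7B Llama-style model:
# 32 layers, 32 KV heads, head_dim 128, fp16 cache (2 bytes per value).

def kv_cache_bytes(n_ctx: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 32,
                   head_dim: int = 128,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for n_ctx tokens."""
    # Leading factor of 2: one tensor for keys and one for values, per layer.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_value


GIB = 1024 ** 3

for n_ctx in (4096, 131072):
    print(f"n_ctx={n_ctx:>6}: {kv_cache_bytes(n_ctx) / GIB:.1f} GiB")
```

Under these assumptions it prints 2.0 GiB for a 4k context and 64.0 GiB for 128k, so going from 128k to 4k shrinks the cache by a factor of 32.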

As for which GGUF quant to use, maybe go for Q4_K_M.

sterile mantle Nov 24, 2023, 1:48 AM

Did you try reducing the context?

A 128k context is very large.