Not sure what exactly you want: bigger context or bigger models.
You could try the Mistral-7B family; it has better quality than Llama2-7B, and some claim it rivals Llama2-13B.
I have mixed impressions when I compare it with 13B, but it's definitely much better than the average 7B, and it handles bigger contexts better than Llama models.
It handles about 6k of context more or less reliably with usable quality, although quality slowly drops as the context gets bigger.
GPU memory isn't an issue at all, though, even if you try the high-context versions. Even MistralLite with 16k should load perfectly fine.
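To put a rough number on the memory side, here's a back-of-the-envelope sketch of the KV cache size at 16k context. It's my own estimate, not a measurement: it assumes an fp16 cache (2 bytes per value) and the published architecture numbers (32 layers and head dim 128 for both models, 8 KV heads for Mistral-7B thanks to grouped-query attention, 32 for Llama2-7B). The function name is made up for illustration, and the model weights come on top of this.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Approximate KV cache size: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value

GIB = 1024 ** 3
CONTEXT = 16 * 1024  # 16k tokens

# Assumed architecture numbers (fp16 cache, no cache quantization).
mistral_16k = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=CONTEXT)
llama2_16k = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, n_tokens=CONTEXT)

print(f"Mistral-7B: {mistral_16k / GIB:.1f} GiB")   # 2.0 GiB
print(f"Llama2-7B:  {llama2_16k / GIB:.1f} GiB")    # 8.0 GiB
```

So under these assumptions the cache for a Mistral-style 7B at 16k is about a quarter of what a Llama2-7B layout would need, which is why it fits comfortably next to the weights.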
If you really want bigger models instead, take a look at GGUF models that run on CPU+GPU.
I have no idea what speed you'd get for 13B GGUF models.
But I can run 33B GGUF models at about 0.6 t/s on a similar setup with a much older CPU and ancient DDR3 memory.
My setup is an RTX3060/12GB, 32GB of DDR3, and an i7-3370, which lacks quite a few optimizations your CPU has.
I'd assume your setup should handle this noticeably better than mine.
And even this 0.6 t/s for 33B is better than your exllama2 speed with 13B without the 8-bit cache. So I think this option is worth testing.
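If you go the CPU+GPU route, the main knob is how many layers you offload to the GPU (in llama.cpp that's the `-ngl` / `--n-gpu-layers` option). Here's a rough sketch of how I'd guess a starting value. The helper name, the 18 GiB file size and the 2 GiB reserve are illustrative assumptions, not measurements; it also assumes all layers are about the same size, so treat the result as a first guess and adjust until it stops running out of memory.

```python
def layers_that_fit(model_bytes, n_layers, vram_bytes, reserve_bytes):
    """Estimate how many equally sized layers fit in the VRAM left after a reserve."""
    per_layer = model_bytes / n_layers
    budget = max(vram_bytes - reserve_bytes, 0)
    return min(n_layers, int(budget // per_layer))

GIB = 1024 ** 3

# Illustrative inputs (assumptions): an 18 GiB quantized 33B file with 60 layers,
# a 12 GiB card, and 2 GiB kept free for context and overhead.
n_gpu_layers = layers_that_fit(
    model_bytes=18 * GIB,
    n_layers=60,
    vram_bytes=12 * GIB,
    reserve_bytes=2 * GIB,
)
print(n_gpu_layers)  # 33
```

Whatever doesn't fit stays on the CPU, which is where the old CPU and slow RAM hurt the most.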