# I have an RTX 3070 Mobile. I had been running 7B EXL2 quants at 8 bpw with 8192 context and a 4-bit cache with no issues, but after running the updater I can no longer load them: it reports an out-of-memory error.
# I had a similar problem after the EXL2 update. Try loading a GPTQ model with Transformers; it may be slower, but it requires less VRAM.
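For a sense of how tight the setup in the question is, here is a rough back-of-the-envelope estimate in Python. It is only a sketch: the function name `estimate_vram_gib` is made up for illustration, the round 7e9 parameter count and the Llama-2-7B-style shape (32 layers, 32 KV heads, head dimension 128) are assumptions rather than details from the thread, and it ignores activation buffers, quantization scales, and whatever else the loader and desktop are holding in VRAM.

```python
GIB = 1024 ** 3


def estimate_vram_gib(n_params, weight_bpw, n_layers, n_kv_heads,
                      head_dim, context_len, cache_bits):
    """Return (weights_gib, cache_gib) for a quantized model.

    Hypothetical helper for illustration; it is not part of any loader's API.
    """
    # Weights: one value per parameter at weight_bpw bits each.
    weights_bytes = n_params * weight_bpw / 8
    # KV cache: keys and values (factor 2), per layer, per KV head, per token.
    cache_values = 2 * n_layers * n_kv_heads * head_dim * context_len
    cache_bytes = cache_values * cache_bits / 8
    return weights_bytes / GIB, cache_bytes / GIB


# Assumed shape: a Llama-2-7B-style model with full multi-head attention.
weights, cache = estimate_vram_gib(
    n_params=7e9, weight_bpw=8,
    n_layers=32, n_kv_heads=32, head_dim=128,
    context_len=8192, cache_bits=4,
)
print(f"weights: {weights:.2f} GiB, cache: {cache:.2f} GiB, "
      f"total: {weights + cache:.2f} GiB")
```

Under these assumptions the weights alone come to about 6.5 GiB and the 4-bit cache to about 1 GiB, which leaves very little headroom on an 8 GB card. That would be consistent with a small change in overhead after an update tipping the load into an out-of-memory error. A model with grouped-query attention (for example 8 KV heads instead of 32) would need a quarter of that cache, and a lower bpw quant or a shorter context would free up room as well.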