I'm not 100% new to deep learning because I did train some models back in 2017 for tasks other than NLP, but NLP is all new to me. Seeing ChatGPT inspired me to learn about it. I've been running some pretrained models from Hugging Face on my PC and even made a few little chatbots in Python using them, but they aren't really any good. I've been learning and experimenting with different pretrained models for about a month now. I also followed a tutorial on finetuning a GPT-2 model to generate Shakespeare, then finetuned that GPT-2 model on other things to see how well it worked. I made one that could generate endless Bible verses.

Then I started trying to use bigger pretrained models and it seems I hit a wall. My GPU only has 12GB of memory and it runs out instantly. The system has 64GB of RAM and the models don't crash on the CPU, but it is WAY too slow to be useful to me. I tried the GPT Neo 6 billion parameter model. Instacrash. I don't know if I could tweak any settings to make it work. Oh, and by the way, this isn't my job or anything, it's just something I've been doing for fun. I still have lots to learn.

So how should I proceed with finetuning models bigger than GPT-2 Large from Hugging Face? I thought about making the GPU overflow into system memory, but I don't know how to do that. I don't know all my options. I know I can rent a GPU with 80GB of VRAM in the cloud, but that's VERY expensive and I couldn't afford to run finetuning for any significant time. Really I'd rather get a pay-monthly option, and all I could pay is maybe up to $50 a month. I want to dive further into this and train larger models on larger datasets.

Also, as a quick rule of thumb, is there any way I can judge how much memory I'd need to finetune or even just run a pretrained model?
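
To make the rule-of-thumb question concrete, here is the rough arithmetic I've been trying to do, written as a small sketch. It assumes half-precision weights (2 bytes per parameter) for just running a model, and for full finetuning it assumes fp32 weights, fp32 gradients, and an Adam-style optimizer that keeps two extra fp32 values per parameter. The function names are just ones I made up, and it ignores activations, batch size, and framework overhead, so the numbers are a lower bound at best.

```python
# Back-of-the-envelope memory estimate for a model with n_params parameters.
# Counts weights, gradients, and optimizer state only. Activations, the CUDA
# context, and framework overhead are NOT included, so real usage is higher.

GIB = 1024 ** 3  # bytes per GiB


def inference_gib(n_params, bytes_per_param=2):
    """Memory for the weights alone (default assumes fp16/bf16 weights)."""
    return n_params * bytes_per_param / GIB


def full_finetune_gib(n_params, weight_bytes=4, grad_bytes=4, optimizer_bytes=8):
    """Weights + gradients + optimizer state.

    Defaults assume fp32 weights and gradients (4 bytes each) plus an
    Adam-style optimizer storing two fp32 values per parameter (8 bytes).
    """
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes) / GIB


if __name__ == "__main__":
    for name, n in [("GPT-2 Large (~774M params)", 774e6), ("6B-parameter model", 6e9)]:
        print(f"{name}: run ~{inference_gib(n):.1f} GiB, "
              f"full finetune ~{full_finetune_gib(n):.1f} GiB")
```

Under those assumptions a 6 billion parameter model comes out to roughly 11 GiB just for fp16 weights and roughly 89 GiB for a full finetune, which would explain the instant out-of-memory on a 12GB card if I have this right.
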
And one more thing. I saw K80s on eBay for around $80 and they have 24GB of memory. If I could use one of those alongside my RTX 4070 Ti, would that help at all?