#Should I finetune on my RTX 4070 Ti or get a cloud GPU?


sharp bolt
#

I'm not 100% new to deep learning because I did train some models back in 2017 for tasks other than NLP. But for NLP it's all new to me. Seeing ChatGPT inspired me to learn about this. I've been running some pre-trained models from Hugging Face on my PC and even made a few little chatbots in Python using them, but they aren't any good really. I've been learning and experimenting with different pretrained models for about a month now. I also followed a tutorial about finetuning a GPT-2 model that can generate Shakespeare. I then proceeded to finetune that GPT-2 model on other things to see how well it worked. I made one that could generate infinite Bible verses. Then I started trying to use bigger pretrained models and I hit a wall, it seems. My GPU only has 12GB of memory and it runs out instantly. The system has 64GB of RAM and it doesn't crash on the CPU, but it is WAY too slow to be very useful to me. Oh, and by the way, this isn't my job or anything; it's just something I've been doing for fun. I still have lots to learn.

But anyway, how should I proceed with finetuning models bigger than GPT-2 Large from Hugging Face? I thought about making the GPU overflow to the system memory but I don't know how to do that. So I really don't know all my options, but I know I can rent a GPU with 80GB of VRAM on the cloud. That's VERY expensive, though, and I could not afford to run finetuning for any significant time. Really I'd rather just get a pay-monthly option, and all I could pay is maybe up to $50 a month. I want to dive further into this and train larger models on larger datasets. I tried the GPT Neo 6 billion parameter model. Instacrash. I don't know if I could tweak any settings to make it work. Also, as a quick rule of thumb, is there any way I can judge how much memory I'd need to finetune or even just run a pretrained model?

And one more thing. I saw K80s on eBay for around $80 and they have 24GB of memory. If I could use one of those with my RTX 4070 Ti, would that help any?
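On the question of letting a model spill over from the GPU into system RAM: a minimal sketch, assuming the Hugging Face `transformers` library with `accelerate` installed. The helper names and the 10 GiB / 48 GiB caps are illustrative assumptions for a 12GB card in a 64GB machine, not tested settings. This only covers loading a model for inference; it is slow for the offloaded layers and is not a finetuning setup.

```python
def build_offload_kwargs(gpu_gib=10, cpu_gib=48, offload_dir="offload"):
    """Build from_pretrained keyword arguments that cap GPU use and spill the rest.

    The caps are assumptions: leave headroom below the card's 12GB and
    below the machine's 64GB so activations and the OS still fit.
    """
    return {
        "device_map": "auto",  # let Accelerate place layers on GPU, then CPU, then disk
        "max_memory": {0: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"},
        "offload_folder": offload_dir,  # used only if GPU and CPU caps are both exceeded
    }


def load_with_offload(model_name, gpu_gib=10, cpu_gib=48):
    """Load a causal LM in half precision with layers spread across GPU and CPU."""
    # Imported here so the helper above can be inspected without these packages.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,  # halves weight memory compared with float32
        **build_offload_kwargs(gpu_gib, cpu_gib),
    )
```

Calling `load_with_offload(...)` with a model name from the Hub would then place as many layers as fit on GPU 0 and keep the rest in system RAM.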

crisp barn
#

You're gonna want to pay for the premium Colab subscription for those larger models. Even then, they may not have what you want. Finetuning GPT-Neo is an undertaking that is going to require high-end GPUs. GPT-2 Large is 774M parameters iirc and GPT-2 XL is 1.5B parameters. GPT-Neo has 1.3B and 2.7B (as well as 125M) parameter variants available on Hugging Face. You're going to want all the RAM & GPU you can get, and you'll only find that on Colab's premium service if you MUST leverage GPUs for faster training. Just be aware you need to keep your Colab instance alive and active or you'll get timed out (I think there are some scripts around that should help you with that).

Alternatively, we've come a ways from transformer decoder stacks like GPT. ChatGPT sounds like it is a more refined version of InstructGPT; a related OpenAI model, WebGPT, uses information gathered from a web search to aid its generations. Google DeepMind put out a paper called RETRO, which is a model that retrieves data from a database and uses that to influence text generations. The idea was that they were able to get generations of similar quality to GPT-3 (175B parameters in its largest variant) while only having a model of a few billion parameters (7.5B at the largest). I think you should give it a shot and try a downsized version of RETRO on your machine given its resources (training locally also saves you the cost of Colab premium). I have a few resources you can look at if you're interested in that. I also have a Medium article from someone who finetuned GPT-Neo for various tasks (haiku, screenwriting).

#

One other thing: I saw the K80s too when I was considering building my own GPU server (still haven't, full disclosure). The feedback I got on the K80s you're seeing is that the design is years old, and it's a card with two 12GB GPUs stuck together, so it's overall still quite slow compared to something like the A or T series cards Nvidia has.

sharp bolt
#

@crisp barn I'm aware that the K80 is way slower than the Turing or Ampere GPUs. It uses Kepler, which is in the 600 and 700 series of consumer GPUs. I had one back in 2013. They were way outclassed by Maxwell and even more by the very popular Pascal architecture. I have the latest Ada Lovelace in my GPU. I knew exactly what the K in K80 meant. But anyways... I'm doing some experiments in even more RAM-limited scenarios in VMs and seeing what I can learn from that.

sharp bolt
#

I'm trying to see if there is any way the slow down could be more tolerable.

crisp barn
#

You can sort of test the performance there for no cost and see if it’s tolerable to you

sharp bolt
#

Right now I have no choice but to use the CPU or use a cloud device... which I'm a little hesitant about. The whole point of this for me anyway was to see what I could do locally. Better AI is already on the cloud. I mainly want a chatbot that is somewhat decent. After that I want to use what I learned to make all kinds of different and cool things.

crisp barn
#

Let me get you the links

sharp bolt
#

I just can't seem to get good results from the chatbots I've made.

#

If I could just do what this guy did https://www.youtube.com/watch?v=3mUnEywtdZY I'd be ok with it.

I let an artificial intelligence model loose on my comments and some weird stuff happened. If we hit 20k likes (we've done it before) I'll run the AI on this entire comment section, so make sure to like and comment something you want to get a reply.

This was made using the GPT-3 AI model (774M version) and YouTube API v3.

Disclaimer: The YouTu...

crisp barn
#

Part of the problem may be how you are training the model. If you format your data so that it's more like a chat log vs pure text generation you may see different results
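One way to read "format your data like a chat log" is sketched below. The `Speaker: text` layout and the helper name are assumptions for illustration; `<|endoftext|>` is the end-of-text token GPT-2 uses, so a different base model may need a different separator.

```python
def format_chat_example(turns, eos="<|endoftext|>"):
    """Turn a list of (speaker, text) pairs into one chat-log training example.

    Each turn becomes a "Speaker: text" line, and the example ends with the
    end-of-text token so the model learns where a conversation stops.
    """
    lines = [f"{speaker}: {text.strip()}" for speaker, text in turns]
    return "\n".join(lines) + "\n" + eos


example = format_chat_example([
    ("User", "Where is the nearest library?"),
    ("Bot", "There is one two blocks north of the station."),
])
print(example)
```

Training on many examples shaped like this, then prompting with `User: ...` followed by `Bot:`, nudges the model toward answering a turn instead of continuing free-form text.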

sharp bolt
#

I think all the guy in the video did was download the pretrained model and run it with a prompt that has examples of youtube comments in the prompt text. But I tried that same thing and all it does is give me the first part of the prompt string every time.
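A minimal sketch of that few-shot approach, assuming a `Comment:` / `Reply:` layout (both helper names and the layout are hypothetical, not what the video used). It also shows one thing worth checking when the prompt seems to come back: a causal language model's decoded output normally starts with the prompt itself, so the new text has to be sliced off after it.

```python
def build_prompt(examples, new_comment):
    """Build a few-shot prompt from (comment, reply) pairs plus one open comment."""
    blocks = [f"Comment: {comment}\nReply: {reply}" for comment, reply in examples]
    # Leave the final reply empty so the model fills it in.
    blocks.append(f"Comment: {new_comment}\nReply:")
    return "\n\n".join(blocks)


def extract_reply(generated_text, prompt):
    """Return only the first new line of text the model added after the prompt."""
    continuation = generated_text
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    # Stop at the first line break so the model's next invented comment is dropped.
    return continuation.strip().split("\n", 1)[0].strip()
```

With the Hugging Face `text-generation` pipeline, passing `return_full_text=False` is another way to get only the continuation.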

#

But now that I think about it. It might be my text processing that is causing that to happen.

#

It might be wrong.

#

I'll test that by removing my text processing step.

#

Ok now I'm seeing that it was that all along it seems.

#

Yeah, now I feel stupid. I coded the text processing totally wrong and I didn't notice until I looked at it. I kinda relied too much on Copilot to do it based on comments and it came out bad. After I actually looked at my code I realized it was totally wrong.

crisp barn
#

haha, trusting the AI kinda messed with your process. That's why I only use ChatGPT for code inspiration: when you look closer at what it writes, it's barely in line with the syntax or would barely compile

sharp bolt
#

I agree with that. The AI can give you ideas that you might not have thought to try, though, and sometimes the code does compile and work. But a lot of the time it's just hours of WHY IS THIS HAPPENING!? And when you don't know Python well anyway, it's kinda bad sometimes. But I learned from the AI how Python code works and is structured. And that helped me a lot. I just started learning it a month ago.

#

My AI is funny. I asked it, "Where is Hell?" It replied, "Los Angeles."

crisp barn
#

Not wrong

obsidian jacinth
#

finetuning will be fine on 4070ti

spring basin
#

For language models GPU memory is often the limiting factor

12GB won't be enough for modern models if you're thinking about GPT-Neo, NeoX etc
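As a rough rule of thumb for how much memory a model needs, here is a back-of-the-envelope sketch. It assumes weights only for inference, and about 16 bytes per parameter for full finetuning with Adam in 32-bit precision (4 for weights, 4 for gradients, 8 for the two optimizer states). Activations, CUDA overhead, and batch size come on top of this, so treat the results as lower bounds. The function name and mode labels are made up for illustration.

```python
def estimate_memory_gib(n_params, mode="inference", bytes_per_weight=2):
    """Lower-bound memory estimate in GiB for a model with n_params parameters.

    mode="inference": weights only, at bytes_per_weight each
        (2 for float16, 4 for float32).
    mode="full_finetune_adam_fp32": weights + gradients + Adam states,
        about 16 bytes per parameter.
    """
    if mode == "inference":
        total_bytes = n_params * bytes_per_weight
    elif mode == "full_finetune_adam_fp32":
        total_bytes = n_params * 16
    else:
        raise ValueError(f"unknown mode: {mode}")
    return total_bytes / 2**30


for name, n in [("GPT-2 Large", 774e6), ("GPT-Neo 2.7B", 2.7e9), ("6B model", 6e9)]:
    run = estimate_memory_gib(n, "inference", bytes_per_weight=4)
    tune = estimate_memory_gib(n, "full_finetune_adam_fp32")
    print(f"{name}: ~{run:.1f} GiB to load in float32, ~{tune:.1f} GiB to fully finetune")
```

By this estimate GPT-2 Large already needs roughly 11.5 GiB for a full finetune before activations, and a 6B model needs about 22 GiB just to load in float32, which matches the wall hit on a 12GB card.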

#

Cloud will probably be the way to go

#

Someone recommended Colab Pro, I think that's good too

#

Beam also looks like a good option

sharp bolt
#

@spring basin That's still too expensive for me. I can't pay $50+ per day for that.

#

Or would it just be better to opt for an RTX 4090 in my case? It has 24GB of vram.

#

I'd rather have a local option or a one time type of payment.

spring basin
#

Was that the value on beam's pricing page for 24h?

sharp bolt
#

$0.00230830 per second is the cost for the one I'd prefer. $0.00230830 × 60 × 60 × 24 = $199.44.

spring basin
#

Will you be using it for 24h everyday? 😅

#

If you were deploying models then yes, probably, but that's not the use case in your post
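To put the per-second rate in terms of actual usage, a small sketch using the $0.00230830 figure quoted above and the $50 monthly budget from the original post (the function names are just for illustration):

```python
RATE_PER_SECOND = 0.00230830  # USD per second, the rate quoted in the thread


def cost_for_hours(hours, rate_per_second=RATE_PER_SECOND):
    """Cost in USD of running the instance for the given number of hours."""
    return hours * 3600 * rate_per_second


def hours_within_budget(budget_usd, rate_per_second=RATE_PER_SECOND):
    """How many hours of runtime a budget in USD buys at this rate."""
    return budget_usd / (rate_per_second * 3600)


print(f"24 hours: ${cost_for_hours(24):.2f}")
print(f"1 hour: ${cost_for_hours(1):.2f}")
print(f"$50 buys about {hours_within_budget(50):.1f} hours")
```

At that rate an hour costs a little over $8, so a $50 budget covers about six hours of runtime per month.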

sharp bolt
#

@spring basin No probably not.