GPU Accelerated NEAT algorithm for trading environment | GPU MODE

torpid flower Jun 2, 2025, 12:55 PM

#

Hi all,

I've been working on a GPU-accelerated implementation of the NEAT (NeuroEvolution of Augmenting Topologies) algorithm, tailored for a trading simulation environment. The core architecture is mostly complete and currently uses CUDA to speed up population evaluation significantly.

The system is functional, but there are still some key areas open for contribution and improvement:

Implementing advanced crossover/mutation strategies beyond the basic NEAT spec
Handling memory-related issues during crossover and mutation (some bugs remain)
Integrating additional layer 2 market data for richer training input

I'd be glad if anyone finds this project interesting and would like to contribute or collaborate.

plain steeple Jun 3, 2025, 7:07 AM

#

What are the prerequisites for this project?

torpid flower Jun 3, 2025, 7:33 AM

#

plain steeple What are the prerequisites for this project?

Mainly good backend skills with CUDA; competence in finance and neuroevolutionary algorithms would be enough.

#

Competence in neuroevolutionary algorithms isn't that important, because they're not that hard to learn in a short time.

plain steeple Jun 3, 2025, 8:11 AM

#

Oh, but I'm a beginner in CUDA, so I'm not very proficient with the backend side.

torpid flower Jun 3, 2025, 8:28 AM

#

If you have experience with neuroevolutionary algorithms, and someone experienced with CUDA joins us, then I think we can collaborate.

plain steeple Jun 3, 2025, 9:51 AM

#

No experience with that either.

vestal magnet Jun 15, 2025, 4:32 PM

#

torpid flower Hi all, I've been working on a GPU-accelerated implementation of the NEAT (Neur...

Hey! This project seems really interesting. I have experience in FinTech and low-latency programming, and I've started dabbling in GPU programming, so this seems like a good crossover. Would love to start collaborating! Let me know if there are already any efforts under way.

grizzled hill Jun 25, 2025, 6:48 AM

#

torpid flower Mainly a good backend with CUDA, competence on finance and neuroevolutionary alg...

Hey @grim laurel, I am a trained Quant and just learning GPU programming. I am building a trading algorithm, an ML application for time series analysis, and a dynamic portfolio optimization algorithm. I have extensive experience working with the C# framework QuantConnect to backtest trading algorithms. The project sounds fun and I would love to join.

#

Yo @vestal magnet saw you on the same QC server

#

Nice to meet you man

torpid flower Jun 25, 2025, 12:08 PM

#

grizzled hill Hey @grim laurel, I am a trained Quant and just learning GPU programmi...

Hello Robert,

It's nice to hear that! @vestal magnet is learning about the NEAT algorithm for now and will probably start working on the project in about a week. Do you want me to create a group chat?

grizzled hill Jun 25, 2025, 12:09 PM

#

Sounds good

#

If you have a group, please let me in.

vivid rivet Jun 28, 2025, 12:40 PM

#

torpid flower Hi all, I've been working on a GPU-accelerated implementation of the NEAT (Neur...

Hey,

I've been working on a similar project for almost a year, but didn't start using CUDA until about 5 months ago. My method is to take a large number of public trading algorithms and calculate signals (buy, sell, or nothing) on each bar across a dataset that I've collected from the brokerage. It then calculates the PnL, win rate, etc. on that data and sends the results to a second kernel, which checks all of the values. Basically, I hoist hundreds of trading algorithms into device functions called from a single kernel, which picks among them at random and also randomizes the backtest values and the entry/exit algorithms along with their arguments. I originally used the CPU via the TA-Lib library, which worked great, but rewriting it has been very tough since it's an old library that's hard to read, so I deviated and wrote similar GPU-compatible functions that are very close to the CPU versions. I made a little money with my CPU version last year; however, it was only about 6-7 months later that I realized how much faster and better I could make it by using the GPU. So far my backtest is extremely close to actual market conditions (tested via a demo account several times), and for results I'm using linear regression to aggregate values based on how consistent the algorithm is, along with the total PnL the backtester gives at the end. Recently I've figured out that a single dataset doesn't seem to be enough to find good algorithms (some just seem to be good on that particular dataset and not on market data weeks later), so I've begun using a second dataset from weeks after the first one, and all seems to be going well so far.
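
A minimal sketch of one way the "linear regression for consistency plus total PnL" scoring could look. This is an assumption for illustration, not vivid rivet's actual code: the function names `consistency_score` and `rank_score` are hypothetical, and using R² of a straight-line fit to the equity curve as the consistency measure is a guess at what "consistent" means here.

```python
def consistency_score(equity):
    """Least-squares fit of equity[i] ~ intercept + slope * i.

    Returns (slope, r_squared). r_squared close to 1 means the equity
    curve grows (or shrinks) steadily rather than in a few big jumps.
    """
    n = len(equity)
    if n < 2:
        raise ValueError("need at least two equity points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(equity) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, equity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in equity)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, equity))
    # A perfectly flat curve has no variance to explain; treat it as consistent.
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, r_squared


def rank_score(equity):
    """Hypothetical ranking: total PnL weighted by how linear the curve is."""
    _, r_squared = consistency_score(equity)
    total_pnl = equity[-1] - equity[0]
    return total_pnl * r_squared


steady = [0, 1, 2, 3, 4]   # same final PnL, earned evenly
lumpy = [0, 0, 0, 0, 4]    # same final PnL, earned in one trade
print(rank_score(steady), rank_score(lumpy))
```

Under this scoring, two strategies with the same final PnL are separated by how evenly they earned it, which is one way to penalize strategies that only got lucky on one stretch of a dataset.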

#

I originally started with TensorFlow and LSTM models, but I figured out that it was just way too slow and it never really gave me any real results. So I moved on to simply making a strategy guesser/brute-forcer, which has worked much better and has given me results.

torpid flower Jun 28, 2025, 4:36 PM

#

vivid rivet I originally started using tensorflow and creating LSTM models but I figured out...

I also made a QR-DQN-based RL model, but it only fit the part of the data it was currently being trained on. For example, it fits the start of the data very well, then after about 500 steps it fits the new trend and forgets the earlier part (no matter how much I play with the learning rate, batch size, or other parameters). So making a general model is very hard with RL. I also tried supervised learning with the zigzag algorithm, but that just overfitted too much: it went from 10k to billions on the training data and from billions to a few thousand on the test data. The other problem is that RL training has to be sequential, so it's also very slow and cannot be parallelised.

torpid flower Jun 28, 2025, 4:39 PM

#

vivid rivet Hey, I've been working on a similar project for almost a year but didn't start...

Exactly, a single dataset is never enough; the model is either not learning or overfitting too much.

vivid rivet Jun 28, 2025, 4:57 PM

#

torpid flower I also made a QR-DQN based RL model but it was just fitting only the part that i...

Interesting, yeah, I have the same problem with sequential data. What I do is a for loop that starts at the lookback period and calculates the algorithm on the data, then increments by 1 bar each time to generate signals; once those signals have been aggregated, you can begin to do the trading logic. I've looked into optimizing it, but so far the only thing I can come up with is basic stuff: for functions like EMA, MACD, Bollinger Bands, stochastic, RSI, etc., you only calculate the last element of the data you're working on, to avoid wasting GPU time.
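
A minimal sketch of the bar-by-bar loop described above, assuming an EMA crossover as the example strategy. The function names, the `fast`/`slow` periods, and seeding both EMAs with the first close are all illustrative choices, not taken from vivid rivet's code. The point is that each bar updates the indicator from its previous value with the standard EMA recurrence instead of recomputing it over the whole history.

```python
def ema_step(prev_ema, price, period):
    """Standard EMA recurrence: only the newest value is computed."""
    alpha = 2.0 / (period + 1)
    return alpha * price + (1.0 - alpha) * prev_ema


def walk_forward_signals(closes, fast=3, slow=5):
    """Return +1 (buy), -1 (sell) or 0 (nothing) for every bar.

    Each signal uses only data up to and including its own bar.
    Bars inside the lookback period (index < slow) emit 0.
    """
    signals = []
    fast_ema = slow_ema = closes[0]  # assumption: seed with the first close
    for i, price in enumerate(closes):
        if i > 0:
            fast_ema = ema_step(fast_ema, price, fast)
            slow_ema = ema_step(slow_ema, price, slow)
        if i < slow:
            signals.append(0)      # still warming up
        elif fast_ema > slow_ema:
            signals.append(1)
        elif fast_ema < slow_ema:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


rising = [float(p) for p in range(1, 11)]
print(walk_forward_signals(rising))
```

On a GPU the same recurrence would live inside each thread's loop over bars; the sketch keeps it on the CPU only to show the data dependency.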

torpid flower Jun 28, 2025, 5:06 PM

#

In that project, it first prepares the data: adding some indicators and scaling the data between -1 and 1 without seeing future data (a normal min-max scaler uses future data, so I have to use a rolling min-max scaler). After that, it splits the data by window size so that each input chunk becomes a separate neural net input, and evaluates every input chunk on every individual in the population in parallel (using multiple datasets to avoid overfitting). It calculates a fitness score from MDD and ROI (using the Calmar ratio), chooses the elite individuals based on that score, crosses over the individuals, and then mutates the resulting networks to explore better network structures.
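
A minimal sketch of two of the steps above: a rolling min-max scaler that only looks backwards, and a Calmar-style fitness built from ROI and maximum drawdown (MDD). The function names, the trailing-window definition, the flat-window fallback of 0.0, and the `eps` guard are assumptions for illustration; the project's CUDA code may differ. Note also that the Calmar ratio is usually defined with an annualized return, while this sketch uses plain ROI over the evaluated window, as the message describes.

```python
def rolling_minmax(values, window):
    """Scale each value to [-1, 1] using the min/max of the trailing
    window that ends at that value, so no future bars are used."""
    scaled = []
    for i, v in enumerate(values):
        past = values[max(0, i - window + 1): i + 1]
        lo, hi = min(past), max(past)
        # Assumption: a flat window maps to 0.0 to avoid division by zero.
        scaled.append(0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0)
    return scaled


def max_drawdown(equity):
    """Largest peak-to-trough drop as a fraction of the peak.
    Assumes equity values are positive."""
    peak = equity[0]
    mdd = 0.0
    for e in equity:
        peak = max(peak, e)
        mdd = max(mdd, (peak - e) / peak)
    return mdd


def calmar_fitness(equity, eps=1e-9):
    """ROI divided by max drawdown; eps guards the zero-drawdown case,
    which therefore produces a very large score."""
    roi = equity[-1] / equity[0] - 1.0
    return roi / max(max_drawdown(equity), eps)


print(rolling_minmax([1, 2, 3, 2, 1], window=3))
print(calmar_fitness([100.0, 120.0, 90.0, 130.0]))
```

Because each scaled value depends only on its own trailing window, appending newer bars never changes values that were already computed, which is the property an ordinary whole-series min-max scaler lacks.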

ruby pine Jul 17, 2025, 10:31 PM

#

torpid flower If you have experience with neuroevolutionary algorithms and if someone collabor...

If this is still open, I have some basic experience with CUDA (more with MSL, but even that isn't very impressive), but a lot more with neuroevolutionary algorithms, and I've recently been dipping into finance. If you want an algo-first collaborator, I'd be interested.

ruby pine Jul 17, 2025, 10:35 PM

#

torpid flower I also made a QR-DQN based RL model but it was just fitting only the part that i...

Yeah, I don't know the best way to get GPU value from RL models... you can always batch, but... Maybe you can train many actors and have them teach each other, like bacteria, where genetic code can be shared and there is a natural-selection environment. The learning can therefore be distributed at the price of memory, which could be a good RL tradeoff... But that is neuroevolution mixed with RL.

torpid flower Jul 17, 2025, 10:37 PM

#

ruby pine Yeah, I don't know the best way to get GPU value from RL models... you can always batc...

Actually, that's quite an interesting idea, but I'm not sure if it's really logical. Can you please explain it a little bit more?

torpid flower Jul 17, 2025, 10:38 PM

#

ruby pine If this is still open I have some basic experience with CUDA (more with MSL but ...

If you are generally active and can work at least one or two hours a day with the group, then yeah, I'd like to see you in the group.

#

Generally it's important to have a good background, but it's all pointless if you don't do anything with it.

ruby pine Jul 17, 2025, 10:43 PM

#

So basic evolutionary NNs use mutation to explore the space of all possible states and responses (in RL we can call this policy space). Then we treat some parameters as genes, changing them based on different conditions (breeding, timesteps, splitting, etc.). So this algo would start with a population of, say, 1000. We would randomly pair actors (500 pairs). The better-performing one in each pair would give its 'genes' to the other (this could look like distillation, or if we have modular networks we can copy over a gene). Then each one has its fitness measured, and according to that they may survive to the next generation and reproduce (then around 1000 are left in total; we keep the population constant for GPU reasons, I think). First they swap, then they reproduce... There are bacteria that do this in the real world.

#

Called horizontal gene transfer

#

Always a crazy concept to me
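
A toy sketch of the swap-then-reproduce loop described above, under several assumptions: genomes are flat lists of floats, "giving genes" is a plain copy of one randomly chosen gene (not distillation), survivors are the top half by fitness, and children are Gaussian-mutated copies of survivors. The function name `hgt_generation` and its parameters are hypothetical, and the per-timestep RL learning is left out.

```python
import random


def hgt_generation(population, fitness, rng, mutation_scale=0.1):
    """One generation of horizontal gene transfer followed by selection.

    The population size stays constant, as suggested for GPU-friendly
    batching. The input population is not modified.
    """
    size = len(population)
    pop = [list(genome) for genome in population]

    # 1) Swap phase: random pairs, the fitter member donates one gene.
    order = list(range(size))
    rng.shuffle(order)
    for a, b in zip(order[0::2], order[1::2]):
        donor, receiver = (a, b) if fitness(pop[a]) >= fitness(pop[b]) else (b, a)
        gene = rng.randrange(len(pop[donor]))
        pop[receiver][gene] = pop[donor][gene]

    # 2) Selection: measure fitness after the swap, keep the top half.
    pop.sort(key=fitness, reverse=True)
    survivors = pop[: max(1, size // 2)]

    # 3) Reproduction: refill to the original size with mutated copies.
    children = []
    while len(survivors) + len(children) < size:
        parent = rng.choice(survivors)
        children.append([w + rng.gauss(0.0, mutation_scale) for w in parent])
    return survivors + children


def toy_fitness(genome):
    """Toy objective: genomes closer to the origin score higher."""
    return -sum(w * w for w in genome)


rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
for _ in range(20):
    population = hgt_generation(population, toy_fitness, rng)
print(max(toy_fitness(g) for g in population))
```

Donors are never modified and the top half always survives, so the best fitness in this sketch cannot decrease from one generation to the next.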

torpid flower Jul 17, 2025, 10:46 PM

#

torpid flower If you are generally active and can work at least one or two hours a day with ...

What I'm looking for is people who "actually work together". To explain it better: for example, if you asked me to work together in your free time and I kept saying I'm busy, I'd never expect you to keep working. Overall, if you don't have even a few hours to work together, I wouldn't want to accept at all.

ruby pine Jul 17, 2025, 10:50 PM

#

torpid flower If you are generally active and can work at least one or two hours a day with ...

So I have only 5-6 hours a week right now. I can do a lot in 5 hours and I'm pretty flexible as to when those are, but I really do understand if that is too little. No worries otherwise. This time is pretty busy for me and I expect it'll lighten up soon.

torpid flower Jul 17, 2025, 10:50 PM

#

ruby pine so basic evolutionary nns use mutation to explore the space of all possible stat...

This is just what an evolutionary algorithm does; I can't see where RL comes in.

torpid flower Jul 17, 2025, 10:50 PM

#

ruby pine So I have only 5-6 hours a week right now. I can do a lot in 5 hours and I'm pre...

Even if you have 1 hour a day for 5 days, it's not a problem.

#

Just not like a few messages per week

ruby pine Jul 17, 2025, 10:52 PM

#

torpid flower This is just what evolutionary algorithm does, I can't see where is the point of...

The RL is the learning in between timesteps that justifies the HGT. So each agent is learning and getting feedback between each step, i.e. in 1000 timesteps with an average pop of 100 there are 1000 RL batches, each of which can be run with 100 actors in parallel.

#

At least that's what I'm thinking... I certainly hope it trains faster than 1 agent alone. I doubt it will help in the stock market though... but definitely worth a try.

ruby pine Jul 17, 2025, 10:53 PM

#

torpid flower even if you have 1 hour for 5 days, it's not problem

Yeah! What is the project timeline?

torpid flower Jul 17, 2025, 10:57 PM

#

ruby pine The RL is the learning in between timesteps that justifies the HGT. So each agen...

Ohh, now I get your point. It was also my first main idea, but after seeing the NEAT algorithm (which is the closest approach imo) I just built an additional algorithm.

torpid flower Jul 17, 2025, 10:58 PM

#

ruby pine Yeah! What is the project timeline

The goal was about 1 month when I first created the group, but after seeing the speed of progress it became "whenever it finishes".

torpid flower Jul 17, 2025, 11:00 PM

#

torpid flower Ohh now I got your point, it was also my first main idea but after seeing NEAT a...

And I also think it's still not that efficient, in terms of GPU performance, to use RL instead of (or together with) neuroevolution.

#

It almost "wastes" the advantage of the GPU.
