#GPU Accelerated NEAT algorithm for trading environment


torpid flower
#

Hi all,

I've been working on a GPU-accelerated implementation of the NEAT (NeuroEvolution of Augmenting Topologies) algorithm, tailored for a trading simulation environment. The core architecture is mostly complete and currently uses CUDA to significantly speed up population evaluation.

The system is functional, but there are still some key areas open for contribution and improvement:

  1. Implementing advanced crossover/mutation strategies beyond the basic NEAT spec

  2. Handling memory-related issues during crossover and mutation (some bugs remain)

  3. Integrating additional Level 2 market data for richer training input

I'd be glad if anyone finds this project interesting and would like to contribute or collaborate.

plain steeple
#

What are the prerequisites for this project?

torpid flower
#

Competence in neuroevolutionary algorithms isn't that important, because they're not hard to learn in a short time

plain steeple
#

Oh, but I'm a beginner in CUDA, so I'm not very proficient with the backend

torpid flower
#

If you have experience with neuroevolutionary algorithms, and someone experienced with CUDA joins us, then I think we can collaborate

plain steeple
#

No experience with that either

vestal magnet
grizzled hill
#

Yo @vestal magnet, saw you on the same QC server

#

Nice to meet you man

torpid flower
grizzled hill
#

Sounds good

#

If you have a group, please let me in

vivid rivet
# torpid flower Hi all, I've been working on a GPU-accelerated implementation of the NEAT (Neur...

Hey,

I've been working on a similar project for almost a year, but didn't start using CUDA until about 5 months ago. My method is to take tons of public trading algorithms and calculate signals (buy, sell, or nothing) on each bar across a dataset that I've collected from the brokerage. It then calculates the PnL, win rate, etc. on that data and sends the results to a second kernel, which checks all of the values. Basically, I hoist hundreds of trading algorithms into device functions that are called from a single kernel, which picks them at random and also randomizes the backtest values and the entry/exit algorithms along with their arguments.

I originally used the CPU via the TA-Lib library, which worked great, but rewriting it has been very tough since it's an old library that's hard to read, so I deviated and wrote similar GPU-compatible functions that stay very close to the CPU functions. I made a little money with the CPU version last year; it was only about 6-7 months later that I realized how much faster and better I could make it by using the GPU.

So far my backtest is extremely close to actual market conditions (tested via a demo account several times). For results, I'm using linear regression to aggregate values based on how consistent the algorithm is, along with the total PnL the backtester gives at the end. Recently I've figured out that a single dataset doesn't seem to be enough to find good algorithms (some just seem to be good on that particular dataset and not on market data weeks later), so I've begun also using a different dataset from weeks after the first one, and all seems to be going well so far.
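
A minimal CPU-side sketch of the signal-to-PnL step described in this message. It is not the actual kernel: the function name, the signal encoding (1 = buy, -1 = sell, 0 = nothing), the one-unit position size, and the absence of fees and slippage are all assumptions made for illustration.

```python
def backtest(prices, signals):
    """Return (total_pnl, win_rate) for a series of per-bar signals.

    Assumed encoding: 1 = go long, -1 = go short, 0 = keep the current position.
    A position is closed when the opposite signal arrives or the data ends.
    """
    position = 0   # current direction: 1 long, -1 short, 0 flat
    entry = 0.0    # price at which the current position was opened
    trades = []    # realised PnL of every closed trade
    for price, sig in zip(prices, signals):
        if sig != 0 and sig != position:
            if position != 0:
                trades.append((price - entry) * position)
            position, entry = sig, price
    if position != 0:
        # Close whatever is still open on the last bar.
        trades.append((prices[-1] - entry) * position)
    total = sum(trades)
    wins = sum(1 for t in trades if t > 0)
    win_rate = wins / len(trades) if trades else 0.0
    return total, win_rate
```

On the GPU, each thread would typically run a loop like this for one randomly chosen algorithm and parameter set, which is why the per-bar logic is kept free of shared state here.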

#

I originally started out with TensorFlow, building LSTM models, but I found it was just way too slow and it never really gave me any real results. So I moved on to simply making a strategy guesser/brute-forcer, which has worked much better and has given me results.

torpid flower
# vivid rivet I originally started using tensorflow and creating LSTM models but I figured out...

I also made a QR-DQN-based RL model, but it only fit the part of the data it was currently being trained on. For example, it fits very well at the start of the data, and after about 500 steps it fits the new trend and forgets the earlier part (no matter how much I play with the learning rate, batch size, or other parameters). So making a general model is very hard with RL. I also tried supervised learning with the zigzag algorithm, but that just overfitted badly: it went from 10k to billions on the training data and from billions to a few thousand on the test data. The other problem is that RL training has to be sequential, so it's also very slow and can't be parallelised

torpid flower
vivid rivet
# torpid flower I also made a QR-DQN based RL model but it was just fitting only the part that i...

Interesting, yeah, I have the same problem with sequential data. What I do is a for loop that starts at the lookback period and calculates the algorithm on the data, then increments by 1 bar each time to generate signals; once those signals have been aggregated, you can begin to do the trading logic. I've looked into optimizing it, but so far the only thing I can come up with is basic stuff: for functions like EMA, MACD, BBands, stochastic, RSI, etc., you only calculate the last element of the data you're processing, to avoid wasting GPU time.
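
A rough sketch of that walk-forward loop, assuming a simple moving average as a stand-in indicator and a price-above/below-average rule as a placeholder signal (neither is taken from the actual project). The point is only that each step computes the indicator's last value for the current window instead of a full indicator series.

```python
def last_sma(window):
    """Only the final SMA value for this window, not the whole SMA series."""
    return sum(window) / len(window)


def walk_forward_signals(closes, lookback):
    """Generate one signal per bar, starting at the first bar with a full window."""
    signals = [0] * len(closes)
    for t in range(lookback - 1, len(closes)):
        window = closes[t - lookback + 1 : t + 1]  # bars up to and including t
        sma = last_sma(window)
        if closes[t] > sma:
            signals[t] = 1     # placeholder rule: price above average
        elif closes[t] < sma:
            signals[t] = -1    # placeholder rule: price below average
    return signals
```

Because each bar only reads its own window, the iterations are independent and could be mapped to one thread per bar; the sequential part is the trading logic that consumes the signals afterwards.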

torpid flower
#

In that project, it first prepares the data: it adds some indicators and scales the data between -1 and 1 without seeing future data (a normal min-max scaler uses future data, so I have to use a rolling min-max scaler). After that, it divides the data by window size so that each chunk becomes a separate neural net input, and evaluates every input chunk on every individual in the population in parallel (using multiple datasets to avoid overfitting). It then calculates a fitness score from MDD and ROI (using the Calmar ratio), chooses the elite individuals based on that score, and crosses them over. Finally, to explore better network structures, it mutates the resulting networks
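
Two of those steps can be sketched in plain Python, under stated assumptions: the function names are made up for illustration, the scaler looks back over a fixed number of bars, and the fitness is the simplified ROI / MDD form mentioned here rather than the annualised Calmar ratio. The `eps` guard for a zero drawdown is also an assumption, not something from the project.

```python
def rolling_minmax(values, window):
    """Scale each value to [-1, 1] using only the current and earlier bars."""
    scaled = []
    for i, v in enumerate(values):
        past = values[max(0, i - window + 1) : i + 1]  # never looks past bar i
        lo, hi = min(past), max(past)
        scaled.append(0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0)
    return scaled


def calmar_fitness(equity, eps=1e-9):
    """ROI divided by maximum drawdown, both as fractions of (positive) equity."""
    roi = equity[-1] / equity[0] - 1.0
    peak, mdd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)                # highest equity seen so far
        mdd = max(mdd, (peak - value) / peak)  # worst drop from that peak
    return roi / max(mdd, eps)                 # eps avoids division by zero
```

For example, an equity curve of 100, 120, 90, 110 has an ROI of 10% and a maximum drawdown of 25%, giving a fitness of about 0.4.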

ruby pine
ruby pine
torpid flower
torpid flower
#

Generally it's important to have a good background but everything is pointless if you are not doing anything with it

ruby pine
#

So basic evolutionary NNs use mutation to explore the space of all possible states and responses (in RL we can call this the policy space). Then we treat some parameters as genes, changing them based on different conditions (breeding, timesteps, splitting, etc.). So this algo would start with a population of, say, 1000. We would randomly pair actors (500 pairs). The better-performing one of each pair would give its 'genes' to the other one (this could look like distillation, or if we have modular networks we can copy over a gene). Then each one has its fitness measured, and according to that they may survive to the next generation and reproduce (then around 1000 are left in total; we keep the population constant for GPU reasons, I think). First they swap, then they reproduce... There are bacteria that do this in the real world
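
One way to read that loop as code, purely as a sketch: genomes are assumed to be flat lists of floats, the "gene" copied over is a single random position, survival is assumed to keep the fitter half, and reproduction is assumed to be Gaussian-mutated copies. None of those choices come from an existing implementation.

```python
import random


def hgt_generation(population, fitness, rng):
    """One generation: pairwise gene transfer, then survival and refill.

    population: list of genomes (lists of floats); assumed even size >= 2.
    fitness: function mapping a genome to a number (higher is better).
    rng: a random.Random instance, passed in for reproducibility.
    """
    size = len(population)
    rng.shuffle(population)  # random pairing (shuffles the list in place)
    # 1. Swap: the fitter member of each pair copies one gene into the other.
    for a, b in zip(population[0::2], population[1::2]):
        donor, receiver = (a, b) if fitness(a) >= fitness(b) else (b, a)
        i = rng.randrange(len(donor))
        receiver[i] = donor[i]
    # 2. Survive: re-measure fitness and keep the fitter half (assumption).
    survivors = sorted(population, key=fitness, reverse=True)[: size // 2]
    # 3. Reproduce: mutated copies refill the population to its original size.
    children = []
    while len(survivors) + len(children) < size:
        parent = rng.choice(survivors)
        children.append([g + rng.gauss(0.0, 0.1) for g in parent])
    return survivors + children
```

The population size is the same before and after each call, which matches the idea of keeping it constant so buffers on the GPU never need to be resized.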

#

Called horizontal gene transfer

#

Always a crazy concept to me

torpid flower
ruby pine
torpid flower
torpid flower
#

Just not like a few messages per week

ruby pine
#

At least that's what I'm thinking... I certainly hope it trains faster than 1 agent alone. I doubt it will help in the stock market though... but definitely worth a try

ruby pine
torpid flower
torpid flower
torpid flower
#

It almost "wastes" the advantage of GPU