Is there any way to make this faster? | Learn AI Together

late finch Apr 12, 2023, 8:12 PM

#

I am writing a gym environment for my reinforcement learning agent to train in, but it is slow as heck. I am using numpy for all my operations, and there is a single loop in my code (the while in the init that makes sure the sample is within the range). I am thinking that maybe pandas is the reason why it's slow, or maybe I'm just writing slow, inefficient code. In any case, I'd appreciate a second set of eyes.
https://paste.pythondiscord.com/isepapabuw (please ping replies thanks <3)

late finch Apr 12, 2023, 9:10 PM

#

profiled the environment using cProfile and, as I thought, it's mostly pandas

this is the code i used to profile it

from .env import TradingSimulator, Action
import cProfile
import io
import pstats
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--out', type=str, default='./envprofile.csv', nargs='?')
out = parser.parse_args().out

def test_trading_env() -> None:
    env = TradingSimulator()
    state, _ = env.reset()
    assert state.shape == (2, env.seq_length + 2)
    action = Action(0.4, 0.3, 0.3)
    state, reward, done, _,  _ = env.step(action)
    assert state.shape == (2, env.seq_length + 2)
    assert isinstance(reward, float)
    assert isinstance(done, bool)

## profiling the environment
if __name__ == '__main__':
    pr = cProfile.Profile()
    pr.enable()
    for _ in range(1000):
        test_trading_env()
    pr.disable()

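    # capture the pstats report as text
    result = io.StringIO()
    pstats.Stats(pr, stream=result).print_stats()
    result = result.getvalue()

    # drop the preamble before the header row, then join the first five
    # whitespace-separated columns of each line with commas
    result = 'ncalls' + result.split('ncalls')[-1]
    result = '\n'.join([','.join(line.rstrip().split(None, 5)) for line in result.split('\n')])
    # the with block closes the file, so no explicit close() is needed
    with open(out, 'w') as f:
        f.write(result)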
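
A report sorted by cumulative time can be easier to scan than the CSV when looking for the slowest call. The following is a minimal sketch using only the standard library; `slow_step` is a hypothetical stand-in for the environment step and `profile_report` is an illustrative helper, neither is part of the original code.

```python
import cProfile
import io
import pstats


def slow_step(n: int = 20_000) -> int:
    """Hypothetical stand-in for env.step(); just burns some CPU."""
    return sum(i * i for i in range(n))


def profile_report(func, repeats: int = 50, top: int = 10) -> str:
    """Profile `func` called `repeats` times and return a text report
    sorted by cumulative time, limited to the `top` entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeats):
        func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return buffer.getvalue()


report = profile_report(slow_step)
print(report)
```

Note that the loop in the post constructs `TradingSimulator` inside `test_trading_env` on every iteration, so construction and `reset` time are included in its numbers alongside `step`.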

#

I'm still not blaming pandas completely; it could be that I am using it wrong. My _calc_state function is where most of the time is spent, and within it, getting the data out of pandas is what slows it down. Perhaps there is a better way to use pandas there.

#

    def _calc_state(self) -> None:
        ## state is shape (assets, assets + 288)
        ## first calculate features matrix (assets, 288)
        features = np.concatenate([self.btc_price_data[self.current_step-self.seq_length:self.current_step]['close'].to_numpy(),
                                   self.eth_price_data[self.current_step-self.seq_length:self.current_step]['close'].to_numpy()]).reshape((2,self.seq_length))
        ## price relative vector (assets, 288)
        price_relative = np.zeros((2, self.seq_length))
        open_prices = np.array([self.btc_price_data['open'][self.current_step - self.seq_length], self.eth_price_data['open'][self.current_step - self.seq_length]])
        price_relative[:, 0] = features[:, 0] / open_prices
        price_relative[:, 1:] = features[:, 1:] / features[:, :-1]
        ## concatenate features and price relative covariance
        self.state = np.concatenate([features, np.cov(price_relative)], axis=1)

this is the calc state function
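
One option that is often suggested for this pattern is to pull the price columns out of the DataFrames once and slice plain NumPy arrays on every step. The sketch below assumes the frames have a default integer index and that nothing else needs the DataFrames during a step; `StateBuilder` and the synthetic prices are hypothetical and only mirror the logic of `_calc_state`, they are not the original environment.

```python
import numpy as np


class StateBuilder:
    """Hypothetical helper (not from the original code) that keeps prices as NumPy arrays."""

    def __init__(self, btc_close, eth_close, btc_open, eth_open, seq_length):
        # Convert once, up front. With DataFrames the inputs could come from
        # something like df['close'].to_numpy().
        self.close = np.stack([np.asarray(btc_close, dtype=np.float64),
                               np.asarray(eth_close, dtype=np.float64)])
        self.open = np.stack([np.asarray(btc_open, dtype=np.float64),
                              np.asarray(eth_open, dtype=np.float64)])
        self.seq_length = seq_length

    def calc_state(self, current_step):
        start = current_step - self.seq_length
        # Basic slicing returns a view of shape (2, seq_length); no DataFrame work per step.
        features = self.close[:, start:current_step]
        price_relative = np.empty_like(features)
        price_relative[:, 0] = features[:, 0] / self.open[:, start]
        price_relative[:, 1:] = features[:, 1:] / features[:, :-1]
        # Append the (2, 2) covariance of the price relatives to the features.
        return np.concatenate([features, np.cov(price_relative)], axis=1)


# Synthetic prices, purely for illustration.
rng = np.random.default_rng(0)
n_rows, seq_length = 1000, 288
btc_close, eth_close, btc_open, eth_open = rng.uniform(100.0, 200.0, size=(4, n_rows))
builder = StateBuilder(btc_close, eth_close, btc_open, eth_open, seq_length)
state = builder.calc_state(current_step=500)
print(state.shape)  # (2, 290)
```

Whether this helps in practice depends on how much of the measured time is DataFrame slicing versus the NumPy math, which the profile would show.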

runic hearth Apr 15, 2023, 10:05 AM

#

My first thought is that you create a lot of temporary variables, which involves a lot of manipulation of the arrays. Now that you have the logic sorted, can you reduce that?

late finch Apr 15, 2023, 6:48 PM

#

runic hearth My first thought is that you create a lot of temporary variables which involves ...

not really much I can do tbh. I did make it 7x faster by switching to polars and using float32s instead of float64s, though
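
The post does not say how much of the 7x came from polars and how much from the dtype change. As a rough illustration of the dtype half only, this sketch (synthetic data, NumPy only) shows that float32 halves the memory of a float64 array at the cost of precision, which is about 7 significant decimal digits.

```python
import numpy as np

# Synthetic price series; NumPy creates float64 by default.
prices64 = np.linspace(100.0, 200.0, 1_000_000)
prices32 = prices64.astype(np.float32)

print(prices64.nbytes, prices32.nbytes)  # 8000000 4000000

# Worst-case relative rounding error introduced by the down-cast.
max_rel_err = np.max(np.abs(prices32.astype(np.float64) - prices64) / prices64)
print(max_rel_err)
```

Less data to move usually helps memory-bound operations, but whether the lost precision is acceptable for price ratios is a judgment call for the specific environment.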

runic hearth Apr 15, 2023, 7:59 PM

#

Nice!

unreal osprey Apr 15, 2023, 10:05 PM

#

late finch im still not blaming pandas completely, it could be that I am using it wrong, my...

Have you tried pandas 2.0?

late finch Apr 15, 2023, 10:32 PM

#

unreal osprey Have you tried pandas 2.0?

never heard of it tbh, but I switched to polars and that is so much faster

#

is it a different package than pandas? my pandas package is up to date

unreal osprey Apr 15, 2023, 10:44 PM

#

late finch is it a different package than pandas? my pandas package is up to date

I haven’t tried it, but supposedly it can use Arrow instead of numpy as the backend, so it should work well with large data

#

It’s just the latest updates

#

For the longest time, the numpy backend was the reason why it was slow

late finch Apr 15, 2023, 11:49 PM

#

unreal osprey It’s just the latest updates

i doubt that tbh

#

because polars also uses apache arrow

#

and the speedup was obvious when i switched to it

runic hearth Apr 16, 2023, 5:27 PM

#

I keep seeing things about polars but have never tried it. I definitely will now.

late finch Apr 16, 2023, 10:27 PM

#

runic hearth I keep seeing things for polar but never tried it, definitely will now.

it's not that different from pandas

#

in my case I just had to rewrite like 3 lines of code, but the logic remained largely the same
