#Is there any way to make this faster?


late finch

I am writing a gym environment for my reinforcement learning agent to train in, but it is slow as heck. I am using numpy for all my operations, and there is only a single loop in my code (the `while` in `__init__` that makes sure the sample is within range). I'm thinking maybe pandas is the reason it's slow, or maybe I'm just writing slow, inefficient code. In any case I'd appreciate a second set of eyes.
https://paste.pythondiscord.com/isepapabuw (please ping replies thanks <3)

late finch

Profiled the environment using cProfile and, as I thought, it's mostly pandas.

This is the code I used to profile it:

```python
# Relative import: run this as a module (python -m <package>.<module>)
from .env import TradingSimulator, Action
import argparse
import cProfile
import io
import pstats

parser = argparse.ArgumentParser()
# no nargs='?': with it, a bare `--out` would set out to None
parser.add_argument('--out', type=str, default='./envprofile.csv')
out = parser.parse_args().out

def test_trading_env() -> None:
    env = TradingSimulator()
    state, _ = env.reset()
    assert state.shape == (2, env.seq_length + 2)
    action = Action(0.4, 0.3, 0.3)
    state, reward, done, _, _ = env.step(action)
    assert state.shape == (2, env.seq_length + 2)
    assert isinstance(reward, float)
    assert isinstance(done, bool)

## profiling the environment
if __name__ == '__main__':
    pr = cProfile.Profile()
    pr.enable()
    for _ in range(1000):
        test_trading_env()
    pr.disable()

    result = io.StringIO()
    pstats.Stats(pr, stream=result).print_stats()
    result = result.getvalue()

    # keep only the stats table (from the 'ncalls' header on) and turn it into CSV;
    # maxsplit=5 keeps the "filename:lineno(function)" column in one piece
    result = 'ncalls' + result.split('ncalls')[-1]
    result = '\n'.join(','.join(line.rstrip().split(None, 5)) for line in result.split('\n'))
    with open(out, 'w') as f:  # the with block closes the file itself
        f.write(result)
```
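If the goal is mainly to see where the time goes, `pstats` can sort and truncate the report itself, which avoids splitting the text output by hand. A minimal sketch; the `profile_top` helper and the `busy` workload are hypothetical stand-ins for the loop over `test_trading_env()`:

```python
import cProfile
import io
import pstats


def profile_top(func, repeats: int = 1000, limit: int = 15) -> str:
    """Run func() `repeats` times under cProfile and return the `limit`
    most expensive entries, sorted by cumulative time."""
    pr = cProfile.Profile()
    pr.enable()
    for _ in range(repeats):
        func()
    pr.disable()

    buf = io.StringIO()
    stats = pstats.Stats(pr, stream=buf)
    stats.strip_dirs()                           # drop leading directories from file names
    stats.sort_stats(pstats.SortKey.CUMULATIVE)  # time spent in a call and everything it calls
    stats.print_stats(limit)                     # only print the top `limit` rows
    return buf.getvalue()


def busy() -> int:
    # Hypothetical workload standing in for test_trading_env()
    return sum(i * i for i in range(200))


if __name__ == "__main__":
    print(profile_top(busy, repeats=100))
```

Sorting by cumulative time puts `_calc_state` (and whatever pandas calls it makes) near the top, which is usually easier to read than the full table.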

I'm still not blaming pandas completely; it could be that I'm using it wrong. My `_calc_state` function is where most of the time is spent, and within it, getting the data out of pandas is what slows it down. Perhaps there is a better way to use pandas there.

```python
    def _calc_state(self) -> None:
        ## state is shape (assets, assets + 288)
        ## first calculate features matrix (assets, 288)
        features = np.concatenate([self.btc_price_data[self.current_step-self.seq_length:self.current_step]['close'].to_numpy(),
                                   self.eth_price_data[self.current_step-self.seq_length:self.current_step]['close'].to_numpy()]).reshape((2,self.seq_length))
        ## price relative vector (assets, 288)
        price_relative = np.zeros((2, self.seq_length))
        open_prices = np.array([self.btc_price_data['open'][self.current_step - self.seq_length], self.eth_price_data['open'][self.current_step - self.seq_length]])
        price_relative[:, 0] = features[:, 0] / open_prices
        price_relative[:, 1:] = features[:, 1:] / features[:, :-1]
        ## concatenate features and price relative covariance
        self.state = np.concatenate([features, np.cov(price_relative)], axis=1)
```

This is the `_calc_state` function.
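One way to check whether the pandas indexing itself is the bottleneck is to time the same window lookup done through the DataFrame and through a NumPy array extracted once. A rough sketch, assuming a DataFrame with a default `RangeIndex` and a `close` column like the ones used in `_calc_state`; the random data and the helper names are made up for illustration:

```python
import timeit

import numpy as np
import pandas as pd

SEQ_LENGTH = 288  # window length used in _calc_state

# Hypothetical price data with a default RangeIndex
rng = np.random.default_rng(0)
df = pd.DataFrame({"close": rng.uniform(100, 200, size=10_000)})

# Converted once (e.g. in __init__ or reset), not on every step
close_np = df["close"].to_numpy()


def window_pandas(step: int) -> np.ndarray:
    # Same pattern as _calc_state: row slice -> column -> array,
    # which builds new pandas objects on every call
    return df[step - SEQ_LENGTH:step]["close"].to_numpy()


def window_numpy(step: int) -> np.ndarray:
    # Plain NumPy slice of the cached array: returns a view
    return close_np[step - SEQ_LENGTH:step]


if __name__ == "__main__":
    step = 5_000
    assert np.array_equal(window_pandas(step), window_numpy(step))
    for fn in (window_pandas, window_numpy):
        seconds = timeit.timeit(lambda: fn(step), number=10_000)
        print(f"{fn.__name__}: {seconds:.4f} s for 10k lookups")
```

Both functions return the same values; the difference is only how much pandas machinery runs per lookup, which is what the profile points at.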

runic hearth

My first thought is that you create a lot of temporary variables, which involve a lot of array manipulation. Now that you have the logic sorted, can you reduce that?
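Following that suggestion, here is a sketch of the same state computation that pulls the `open`/`close` columns into NumPy once and precomputes the close-to-close ratios, so each step only slices views, does one small division, and calls `np.cov`. The `PriceCache` class and its attribute names are hypothetical, and precomputing the ratios is a swapped-in technique rather than the original code; it assumes both assets share the same row positions, as the original does. It builds a fresh `state` array each step, so states already stored elsewhere (e.g. in a replay buffer) are not overwritten.

```python
import numpy as np


class PriceCache:
    """Hypothetical helper holding prices as (assets, n_steps) NumPy arrays."""

    def __init__(self, close: np.ndarray, open_: np.ndarray, seq_length: int):
        self.close = np.asarray(close, dtype=np.float64)  # (assets, n)
        self.open = np.asarray(open_, dtype=np.float64)   # (assets, n)
        self.seq_length = seq_length
        # ratios[:, j] = close[:, j + 1] / close[:, j], computed once up front
        self.ratios = self.close[:, 1:] / self.close[:, :-1]

    def calc_state(self, step: int) -> np.ndarray:
        L = self.seq_length
        start = step - L
        n_assets = self.close.shape[0]

        # (assets, L) window of closes: a view, no copy
        features = self.close[:, start:step]

        # price relatives: the first column uses the open of the first bar,
        # the remaining L - 1 columns are the precomputed close/close ratios
        price_relative = np.empty((n_assets, L))
        price_relative[:, 0] = self.close[:, start] / self.open[:, start]
        price_relative[:, 1:] = self.ratios[:, start:step - 1]

        # state = [features | covariance of price relatives], (assets, L + assets)
        state = np.empty((n_assets, L + n_assets))
        state[:, :L] = features
        state[:, L:] = np.cov(price_relative)
        return state
```

In the environment this would be built once, for example from `np.stack([btc['close'].to_numpy(), eth['close'].to_numpy()])` and the matching `open` columns, and `_calc_state` would then just call `calc_state(self.current_step)`.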

late finch
runic hearth

Nice!

unreal osprey
late finch

Is it a different package from pandas? My pandas package is up to date.

unreal osprey

It’s just the latest updates


For the longest time, that was the reason it was slow.

late finch

because Polars also uses Apache Arrow


and the speedup was obvious when i switched to it

runic hearth

I keep seeing things about Polars but never tried it; I definitely will now.

late finch

In my case I only had to rewrite about 3 lines of code, and the logic stayed largely the same.