#Custom model for stock prices


daring robin
#

Predicts a huge drop in price after the last known datapoint
(Green is real price, blue is predicted price)

daring robin
#

Sharp change in price, after which it just continues as if nothing has happened
The model makes all the predictions for the following days once, without knowing what its prediction for other days is.
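A minimal sketch of the "predict every following day at once" setup described above, assuming a 1-D series of daily percentage changes. The helper name `make_windows` and the toy sizes are illustrative and do not come from the original code.

```python
def make_windows(series, input_len=30, horizon=30):
    """Slice a 1-D series into (input, target) pairs for direct block prediction.

    Each target holds the `horizon` values that immediately follow its input,
    and all of them are predicted in one forward pass (no feedback loop).
    """
    pairs = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + horizon]
        pairs.append((x, y))
    return pairs


# Toy example: 10 daily values, 4-day input window, 3-day prediction block.
toy = [0.1 * i for i in range(10)]
windows = make_windows(toy, input_len=4, horizon=3)
print(len(windows))
print(windows[-1])
```

Because every target day is produced from the same input window, a bad value on the first predicted day cannot be explained by earlier predictions feeding back into the model.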

#

(Same problem happens with both Transformer and LSTM architectures)

#

===

Training and validation data split seems correct:
Green is training data, blue is evaluation input, red is expected eval output (real price)

daring robin
#

Please ping me if you know a possible reason for this issue

paper solstice
daring robin
#

Transformer model was Encoder-only, now I changed it to be Decoder-only with causal mask and RoPE.

#

Same issue persists even with LSTM implementation

#

(I've been working on this issue for a couple of days now, and each day I update my CSV dataset files, so the last known data point also moves. So it's not as if there was some huge unseen spike in the input data on that date.)

paper solstice
#

And how do your training graphs look?

#

And also I don't think that the decoder-only model with causal masks makes sense for predicting in chunks.
You are basically predicting timesteps that are 30 days away all the time

gusty hazel
#

@daring robin And how many examples did you train it on?

daring robin
#

This is supposed to be validation prediction

paper solstice
# daring robin So what could be a good choice in this situation?

I'm not sure what's best.
But I think you should do full self-attention, if you are predicting whole 30 day blocks at a time.
When you use a causal mask, it hides from each token the embeddings that lie in its future.
Normally, you would use it to predict the direct next output.
input1 sees [input1] and predicts input2
input2 sees [input1, input2] and predicts input3
...

But in your case
input1 sees [input1] and predicts input31
input2 sees [input1, input2] and predicts input32
...
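The visibility pattern above can be sketched without any framework. This is a toy illustration, assuming the same convention as the mask posted later in the thread (`True` means masked). Positions are 0-based here, so "input1" in the thread corresponds to position 0.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix where True means "masked" (hidden)."""
    return [[col > row for col in range(seq_len)] for row in range(seq_len)]


def visible_inputs(mask, position):
    """List the input indices that `position` is allowed to attend to."""
    return [col for col, hidden in enumerate(mask[position]) if not hidden]


seq_len, horizon = 5, 5  # toy sizes; the thread uses 30-day blocks
mask = causal_mask(seq_len)
for pos in range(seq_len):
    seen = visible_inputs(mask, pos)
    # With block targets, position `pos` must predict day pos + horizon
    # while seeing only days 0..pos.
    print(f"position {pos} sees {seen} and predicts day {pos + horizon}")
```

Only the last position sees the whole input block, so the early positions are asked to predict far-away days from very little context.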

paper solstice
daring robin
#

It seems like the validation data might leak into training or something, because the validation predictions are too good...

daring robin
daring robin
#

@paper solstice Fixed. This looks more like it. RMSE is % deviation from real data, but this is on one of the training batches, so it's not really a real-world scenario. Fixing that as well now.

#

It follows the real price quite well (since it was trained on it), but once it has to predict for the next day it just drops like crazy (-97.08% in this case)

#

And this is really weird, because the model just predicts all 30 days at once, so it doesn't accumulate errors.

#

You can also see "ghost" predictions, that start a little earlier, they also drop in the same exact place. This shows that the issue is not in one of the model outputs, but persists even with offset.

paper solstice
daring robin
#

2.0 loss is pretty bad

paper solstice
#

it didn't lower at all, so the model isn't learning any generalization

daring robin
#

Ah

#

Uh..

#

Why?

paper solstice
#

And what architecture do you have now?

daring robin
# paper solstice And what architecture do you have now?

import math

import torch
import torch.nn as nn


def apply_rope(x):
    # x: (batch, seq_len, dim)
    # RoPE expects even dim
    batch, seq_len, dim = x.shape
    assert dim % 2 == 0, "Model dim must be even for RoPE"
    half_dim = dim // 2
    pos = torch.arange(seq_len, device=x.device).unsqueeze(1)  # (seq_len, 1)
    freq = torch.exp(-math.log(10000) * torch.arange(0, half_dim, device=x.device) / half_dim)  # (half_dim,)
    angles = pos * freq  # (seq_len, half_dim)
    cos = torch.cos(angles)
    sin = torch.sin(angles)
    x1, x2 = x[..., :half_dim], x[..., half_dim:]
    x_rope = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return x_rope

# Transformer model
class StockTransformer(nn.Module):
    def __init__(self, input_dim, model_dim, num_heads, num_layers, dropout):
        super().__init__()
        self.input_dim = input_dim
        self.model_dim = model_dim
        self.embedding = nn.Linear(input_dim, model_dim)
        self.decoder_layers = nn.ModuleList([
            nn.TransformerDecoderLayer(
                d_model=model_dim, nhead=num_heads, dropout=dropout, batch_first=True
            ) for _ in range(num_layers)
        ])
        self.decoder_norm = nn.LayerNorm(model_dim)
        self.output = nn.Linear(model_dim, 1)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        x = self.embedding(x)  # (batch, seq_len, model_dim)
        x = apply_rope(x)      # (batch, seq_len, model_dim)
        tgt = x
        memory = torch.zeros_like(x)  # dummy memory: cross-attention still runs, but all-zero memory carries no information

        seq_len = x.size(1)
        # Causal mask: (seq_len, seq_len), True means masked
        mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        for layer in self.decoder_layers:
            tgt = layer(tgt, memory, tgt_mask=mask)
        tgt = self.decoder_norm(tgt)
        out = self.output(tgt)  # (batch, seq_len, 1)
        return out.squeeze(-1)

#

@paper solstice Is that okay?

#

Or should I use full self-attention (no causal mask), so each output position can attend to all positions in the input block?

paper solstice
#

But what are you using as inputs and targets?

daring robin
#

Not good, but just a bit of movement in the right direction

paper solstice
daring robin
# paper solstice And are you normalizing the data?

Yes:

# Fit scaler only on train, then transform both
self.scaler = StandardScaler()
feature_cols = ["Pct_Change", "High_pct", "Low_pct", "Volume_pct"] + [col for col, _ in ext_ticker_cols]
train_features = self.train_df[feature_cols]
self.scaler.fit(train_features)
self.train_df[feature_cols] = self.scaler.transform(train_features)
self.val_df[feature_cols] = self.scaler.transform(self.val_df[feature_cols])
self.test_df[feature_cols] = self.scaler.transform(self.test_df[feature_cols])
#

.
Also removed causal mask

def forward(self, x):
    # x: (batch, seq_len, input_dim)
    x = self.embedding(x)  # (batch, seq_len, model_dim)
    x = apply_rope(x)      # (batch, seq_len, model_dim)
    tgt = x
    memory = torch.zeros_like(x)  # dummy memory: cross-attention still runs, but all-zero memory carries no information

    # No causal mask for block prediction
    for layer in self.decoder_layers:
        tgt = layer(tgt, memory, tgt_mask=None)
    tgt = self.decoder_norm(tgt)
    out = self.output(tgt)  # (batch, seq_len, 1)
    return out.squeeze(-1)

And the result is somewhat better now:

paper solstice
#

but I thought even with that the model would be able to learn at least something

#

you could also try to predict just the next value

#

and leave the causal mask
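A rough sketch of what next-value prediction implies at inference time: to cover 30 days, each prediction has to be fed back in as input. The function `rollout` and the stand-in model `mean_of_last_three` are hypothetical and exist only to make the loop runnable.

```python
def rollout(history, predict_next, steps):
    """Forecast `steps` values by feeding each prediction back as input."""
    window = list(history)
    forecast = []
    for _ in range(steps):
        nxt = predict_next(window)      # one-step-ahead prediction
        forecast.append(nxt)
        window = window[1:] + [nxt]     # slide the window, keep its length
    return forecast


def mean_of_last_three(window):
    # Hypothetical stand-in for a trained model, used only to make this runnable.
    return sum(window[-3:]) / 3


forecast = rollout([1.0, 2.0, 3.0], mean_of_last_three, steps=2)
print(forecast)
```

Unlike the block approach, errors can compound here because later steps depend on earlier predictions, so the two setups fail in different ways.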

daring robin
#

What's strange is that sometimes it predicts just fine.
For example here you can see it making predictions for INTC (Intel stock which it didn't see during training at all):

#

There is still a drop, but not as big

#

Testing it on some other stocks, it also has that "drop" on many, but here is NVDA for example, where it has no crazy fluctuations:

daring robin
paper solstice
daring robin
#

It pisses me off 😂

#

Just trained a new model with different hyperparameters, and fewer inputs.
Now it goes to different price levels, so its pct_change for that day kinda depends on where it starts from. But it still happens at that one date:

#

(It could seem like it drops at different dates now, but I think it's because of the working days, and that sometimes it starts to predict from weekends, stuff like that)

paper solstice
#

and are you scaling the targets properly?

paper solstice
#

you could try encoder-decoder, but I have no idea if it'd perform better

daring robin
#

Pretty basic for now

paper solstice
#

I'm curious if it is overfitting on those features

daring robin
#

But it trains on like 25 years of data, months and weekdays repeat and overlap, so I guess that shouldn't be a thing to overfit on

#

I am curious to try tho

#

While training a new model and testing the old one, apparently it can also do this:

#

Trained. This is a rather strange-looking graph

#

Guess what

#
```
Input window (all days, all features):
Pct_Change | High_pct | Low_pct | Volume_pct |
Day 1: -0.075 | -0.768 | 0.100 | 0.243 |
...
Day 30: 0.688 | 0.736 | 0.007 | 0.317 |

Top input factors for biggest predicted price jump (Day 11):
SI=F_pct: 2.859
HG=F_pct: 2.601
GC=F_pct: 2.519
```
daring robin
#

This is so weird, I don't even know what to do

#

Maybe start completely from scratch

paper solstice
#

Aren't SI, HG and GC silver, copper and gold?

daring robin
daring robin
paper solstice
#

Did you try to train it super clean?
Like only ["Pct_Change", "High_pct", "Low_pct", "Volume_pct"]

daring robin
#

Still does The Drop

#

Some information if useful:

Input window (all days, all features):
Pct_Change | High_pct | Low_pct | Volume_pct
Day 1: 0.446 | -0.259 | 0.075 | 0.385
Day 2: 0.033 | -0.066 | 0.783 | -0.693
Day 3: -0.594 | -0.096 | -0.490 | 0.828
Day 4: 0.960 | 2.561 | 0.846 | 0.909
Day 5: 0.450 | 0.854 | 0.464 | -0.120
Day 6: -0.547 | -0.514 | 0.445 | -1.137
Day 7: 0.902 | 0.212 | 0.837 | -0.010
Day 8: -0.155 | -0.111 | 0.426 | -0.313
Day 9: -0.147 | -0.684 | -0.370 | -0.445
Day 10: -0.068 | -0.532 | -0.596 | 1.310
Day 11: -0.609 | -0.116 | 0.723 | -0.650
Day 12: -0.648 | -0.479 | 0.035 | 0.154
Day 13: 0.362 | -0.305 | 0.749 | -0.903
Day 14: -0.008 | -0.623 | -0.159 | 0.496
Day 15: 1.125 | 0.434 | 0.898 | -0.189
Day 16: 0.500 | -0.177 | 0.835 | -0.509
Day 17: 0.469 | 0.809 | 0.419 | 1.944
Day 18: -0.294 | -0.630 | -0.201 | -1.059
Day 19: -0.377 | -0.646 | 0.367 | -0.783
Day 20: -0.253 | 0.770 | 0.879 | 0.438
Day 21: 0.389 | -0.084 | 0.868 | -0.175
Day 22: -0.209 | -0.272 | 0.481 | -0.320
Day 23: -0.579 | -0.643 | -0.147 | 0.120
Day 24: -1.426 | -0.694 | -2.297 | 2.872
Day 25: -0.355 | -0.448 | -0.783 | -0.597
Day 26: 0.301 | -0.369 | 0.589 | -0.707
Day 27: 0.799 | 0.829 | 0.867 | -0.387
Day 28: 0.559 | -0.397 | -0.047 | -0.349
Day 29: 0.990 | 0.912 | 0.216 | 4.289
Day 30: -0.505 | -0.680 | -1.347 | -0.923
Top input factors for biggest predicted price jump (Day 11):
Low_pct: 0.723
Volume_pct: -0.650
Pct_Change: -0.609
paper solstice
#

I don't really know anymore

#

And is the drop there for all predictions that have 2025-07-14 in their prediction window?

#

or only last x < 30

#

I can't really tell from the picture exactly how many lines with drop are there
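One way to answer this without reading it off the picture is to compute, for every prediction line, the absolute day on which its largest one-day move falls. The sketch below uses made-up numbers and a hypothetical layout in which forecasts are keyed by the index of their first predicted day.

```python
def largest_move_day(start_index, pct_changes):
    """Return the absolute day index of the largest one-day move in a forecast.

    start_index: absolute index of the first predicted day
    pct_changes: predicted daily changes for that forecast
    """
    offset = max(range(len(pct_changes)), key=lambda i: abs(pct_changes[i]))
    return start_index + offset


# Hypothetical forecasts keyed by the absolute index of their first predicted day.
forecasts = {
    100: [0.01, -0.02, 0.00, -0.90, 0.01],
    101: [-0.01, 0.02, -0.85, 0.00, 0.01],
    104: [0.01, 0.00, -0.01, 0.02, 0.01],
}
drop_days = {start: largest_move_day(start, preds) for start, preds in forecasts.items()}
print(drop_days)
```

If every forecast that covers the day after the last training row reports that same day, the drop is tied to the dataset boundary and not to a particular output position.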

daring robin
daring robin
#

I've had this issue for quite a few days now, and each time I update my training dataset with new data and train a new model, the date where the model drops shifts by +1 day. So it's not about that specific date; it's that the model has never seen anything after it. Which is logical, but it doesn't explain the sudden instability, or why it happens only on that one date. If it were overfitting or some other kind of issue, I would expect the whole prediction line to be choppy and all over the place, but it's quite okay except for that one day.

daring robin
#

Validation loss being lower at first makes sense, since here I trained it with 0.3 dropout

#

I took a model at epoch 14 (low validation loss) and it still resulted in that drop

#

drop drop drop drop I am going crazy

#

Uh, oh. Hmm. If I start prediction after that day, it's all fine actually

#

To say that I am confused is a huge understatement

daring robin
# daring robin Uh, oh. Hmm. If I start prediction after that day, it's all fine actually

You know what @paper solstice, look there. The pattern after the drop is exactly the same as in the last most visible prediction, same movement a bit higher and then down and a bit up again. So the model works exactly the same, and predictions for each day are generalized pretty okay. If I just replace that drop with 0 pct_change for that day, it will look almost identical

#

I am thinking that this could be a scaler/normalization issue: the inverse transform (when converting predicted pct_change back to price) can do weird things with the numbers.
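A minimal sketch of the conversion being suspected here, assuming the target is a standardized daily change expressed in percent units. The statistics and predictions are illustrative values, not taken from the real scaler.

```python
def unscale(z_values, mean, std):
    """Undo standardization: z = (x - mean) / std  ->  x = z * std + mean."""
    return [z * std + mean for z in z_values]


def pct_to_prices(last_price, pct_changes):
    """Compound daily percentage changes (in percent units) into a price path."""
    prices = []
    price = last_price
    for pct in pct_changes:
        price *= 1.0 + pct / 100.0
        prices.append(price)
    return prices


# Assumed training statistics for the target column (illustrative values).
target_mean, target_std = 0.0, 2.0
scaled_preds = [0.5, -0.5, 0.0]
pct = unscale(scaled_preds, target_mean, target_std)
prices = pct_to_prices(100.0, pct)
print(pct)
print(prices)
```

Two things worth checking against a sketch like this: that the target is unscaled with its own column's mean and std (not another feature's), and that percent versus fraction units are consistent between training and plotting.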

daring robin
#

I just manually cut the last 60 days for each ticker in the training data, and went back in time to predict price changes in April. There is that drop! I caught it in 4K

#

And if I go to the current date it's fine, no drop, because now we are so much in the future
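The truncation experiment can be sketched as a small helper, assuming each ticker's history is a time-ordered list. The name `cut_last_days` and the placeholder tickers are hypothetical.

```python
def cut_last_days(series_by_ticker, n_days=60):
    """Split each ticker's series into (history, held_out) at n_days from the end."""
    if n_days <= 0:
        raise ValueError("n_days must be positive")
    history, held_out = {}, {}
    for ticker, series in series_by_ticker.items():
        if len(series) <= n_days:
            raise ValueError(f"{ticker}: not enough rows to hold out {n_days} days")
        history[ticker] = series[:-n_days]
        held_out[ticker] = series[-n_days:]
    return history, held_out


# Toy data: the ticker names are placeholders and the values are made up.
data = {"AAA": list(range(10)), "BBB": list(range(100, 108))}
train_part, future_part = cut_last_days(data, n_days=3)
print(train_part["AAA"], future_part["AAA"])
```

Training on `history` and predicting across the cut gives known real prices in `held_out` to compare against, which makes the boundary behaviour measurable and not just visible in a plot.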

#

So, this issue does not come from:

  • Scaler/Normalization
  • Some spike in the input data right before 07-17
  • Bad model architecture or hyperparameters
  • Overfitting on the data

It looks like:

  • The price drops in prediction right after exiting the training dataframe
paper solstice
paper solstice
daring robin
#

Working on it now

#

Crazy atm

#

Added +10 predictions with offset for each ticker during training, looks much more smooth:

#

Validation loss is a bit smaller at the start cuz of the dropout. But yeah, we can clearly see that it doesn't generalize at all, it only gets worse as training goes on. Overfitting?

paper solstice
#

I'd say use something like an 80-20 split

#

Especially if you're experimenting with different approaches
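A minimal sketch of an 80-20 split for time-ordered rows. It assumes the split should be chronological (no shuffling), which the suggestion above does not state explicitly. `chronological_split` is an illustrative name.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into (train, validation) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


days = list(range(10))
train_rows, val_rows = chronological_split(days)
print(train_rows, val_rows)
```

With sliding windows, any window whose input or target straddles the cut would mix the two sets, so windows are best built separately on each side of the split.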

hearty stratus
# daring robin Still does **The Drop**

I think it's because you give the model too much room to predict. Assuming that the price won't drop more than 10% in a day, you can set that as the “largest” move it can predict.

Depending on how you want to structure your code, you can do something like this, so for example if the current price is $20, it couldn’t go under $18 (you know better how to structure your code):

[l_prediction = current_price - (current_price * 0.1)]

Let me know if it worked!
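A small sketch of the clamping idea, assuming predicted changes are fractions (so -0.10 means -10%). Both function names are hypothetical, and the 10% limit is the figure suggested above.

```python
def clamp_daily_change(pct_change, limit=0.10):
    """Clip a predicted daily change (as a fraction) to the range [-limit, +limit]."""
    return max(-limit, min(limit, pct_change))


def price_floor(current_price, limit=0.10):
    """Lowest next-day price allowed under the clamp."""
    return current_price * (1.0 - limit)


print(clamp_daily_change(-0.9708))
print(price_floor(20.0))
```

A clamp like this limits how far a single prediction can move the price, but it only hides the symptom; it does not explain why the model outputs an extreme value on that one day.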

hearty stratus
#

Depending on how you structured the AI it could also be a problem with the math, if you calculate the derivative on the accuracy

daring robin
#

@paper solstice I think I might have found the issue, or at least some part of it

#

The gold data does not exist till 2001 or so

#

But I think I've tried to train a model without external tickers like gold and still got an issue
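One way to check for this kind of gap is to find, per feature column, the first row that actually has data, and trim everything before the latest such row. The sketch below assumes missing values are represented as `None`. The column name `GC=F_pct` appears in the feature printout earlier in the thread, but the values are made up.

```python
def first_valid_index(column):
    """Index of the first non-missing value, or None if the column is all missing."""
    for i, value in enumerate(column):
        if value is not None:
            return i
    return None


def common_start(columns):
    """Earliest row at which every column has started reporting data."""
    starts = [first_valid_index(col) for col in columns.values()]
    if any(s is None for s in starts):
        raise ValueError("at least one column has no data at all")
    return max(starts)


# Toy frame: None marks rows before a series existed.
features = {
    "Pct_Change": [0.1, -0.2, 0.3, 0.0, 0.1],
    "GC=F_pct": [None, None, 0.5, -0.1, 0.2],
}
start = common_start(features)
trimmed = {name: col[start:] for name, col in features.items()}
print(start, trimmed)
```

This only detects leading gaps. Holes in the middle of a series, or gaps that were silently filled with zeros before scaling, would need a separate check.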