#Custom model for stock prices
Sharp change in price, after which it just continues as if nothing has happened
The model makes all the predictions for the following days at once, without knowing what its predictions for the other days are.
(Same problem happens with both Transformer and LSTM architectures)
===
Training and validation data split seems correct:
Green is training data, blue is evaluation input, red is expected eval output (real price)
Please ping me if you know a possible reason for this issue
bro you need to provide more info...
Transformer model that takes 30 days * 7 features as input and predicts price changes for the following 30 days at once.
Transformer model was Encoder-only, now I changed it to be Decoder-only with causal mask and RoPE.
Same issue persists even with LSTM implementation
(I have been working on this issue for a couple of days now, and each day I update my CSV dataset files, so the last known data point also moves. So it's not like there was some huge unseen spike in the input data on that date)
Is there just that one spike in the validation data, or are all validation predictions bad?
And how do your training graphs look?
And also I don't think that the decoder-only model with causal masks makes sense for predicting in chunks.
You are basically predicting timesteps that are 30 days away all the time
@daring robin And how many examples did u train it on?
Loss graph looks okay
So what could be a good choice in this situation?
Data from around 2000 till today. Sometimes for a few stocks, sometimes up to 30 stocks in the same field (tech for ex.). Batches shuffled each epoch
Even validation loss? This looks like overfitting to me
This is supposed to be validation prediction
I'm not sure what's best.
But I think you should do full self-attention, if you are predicting whole 30 day blocks at a time.
When you use a causal mask, it hides the embeddings that are in the future for each token.
Normally, you would use it to predict the direct next output.
input1 sees [input1] and predicts input2
input2 sees [input1, input2] and predicts input3
...
But in your case
input1 sees [input1] and predicts input31
input2 sees [input1, input2] and predicts input32
...
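To make the masking argument concrete, here is a minimal pure-Python sketch (not the model code). It assumes the usual PyTorch convention where True means "masked", and it assumes that output position i is trained against the value 30 days later. `causal_mask` and `visible_positions` are hypothetical helper names.

```python
# Illustrative only: same True-means-masked layout that
# torch.triu(torch.ones(n, n), diagonal=1).bool() would produce.

def causal_mask(n):
    """Return an n x n matrix; mask[i][j] is True when position i may NOT attend to j."""
    return [[j > i for j in range(n)] for i in range(n)]


def visible_positions(mask):
    """For each query position, list the key positions it is allowed to see."""
    return [[j for j, masked in enumerate(row) if not masked] for row in mask]


HORIZON = 30  # assumption: block prediction, output i targets day i + HORIZON

mask = causal_mask(4)
for i, seen in enumerate(visible_positions(mask)):
    # Day indices are 0-based within the input window.
    print(f"position {i} sees days {seen} but has to predict day {i + HORIZON}")
```

With the mask on, the first output position only ever sees one day of history, even though the whole window is available at prediction time.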
do you track validation loss during training?
It seems like the validation data might leak into training or something, because the validation predictions are too good...
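One way to rule out this kind of leak is to split by time and keep every training window (inputs and targets) strictly before the validation period. A minimal sketch, assuming a single ticker with `n_days` chronologically ordered rows; `split_window_starts` and its parameters are hypothetical names, not the code used in this thread:

```python
def split_window_starts(n_days, window, horizon, val_fraction=0.2):
    """Chronological split of sample start indices.

    A sample starting at day s uses days [s, s + window) as input and
    days [s + window, s + window + horizon) as targets.
    """
    span = window + horizon
    val_days = round(n_days * val_fraction)
    split_day = n_days - val_days  # first day of the validation period
    if split_day < span or val_days < horizon:
        raise ValueError("not enough days for both splits")
    # Training samples must end (targets included) before split_day.
    train_starts = list(range(0, split_day - span + 1))
    # Validation targets must lie entirely inside the validation period;
    # the input window may reach back into already-known history.
    val_starts = list(range(split_day - window, n_days - span + 1))
    return train_starts, val_starts, split_day


train_starts, val_starts, split_day = split_window_starts(1000, window=30, horizon=30)
print(len(train_starts), "training windows,", len(val_starts), "validation windows")
```

Shuffling batches is still fine after this, as long as the shuffle only happens inside the training start indices.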
yeah, looks like it
Found the issue, fixing it rn
@paper solstice Fixed. This looks more like it. RMSE is % deviation from real data, but those are on one of the training batches, so not really a real-world scenario, fixing that as well now.
It follows the real price quite well (since it was trained on it), but once it has to predict for the next day it just drops like crazy (-97.08% in this case)
And this is really weird, because the model just predicts all 30 days at once, so it doesn't accumulate errors.
You can also see "ghost" predictions, that start a little earlier, they also drop in the same exact place. This shows that the issue is not in one of the model outputs, but persists even with offset.
Looking at your validation loss, it's not that weird
2.0 loss is pretty bad
it didn't lower at all, so the model isn't learning any generalization
And what architecture do you have now?
import math

import torch
import torch.nn as nn


def apply_rope(x):
    # x: (batch, seq_len, dim)
    # RoPE expects even dim
    batch, seq_len, dim = x.shape
    assert dim % 2 == 0, "Model dim must be even for RoPE"
    half_dim = dim // 2
    pos = torch.arange(seq_len, device=x.device).unsqueeze(1)  # (seq_len, 1)
    freq = torch.exp(-math.log(10000) * torch.arange(0, half_dim, device=x.device) / half_dim)  # (half_dim,)
    angles = pos * freq  # (seq_len, half_dim)
    cos = torch.cos(angles)
    sin = torch.sin(angles)
    # Rotate each (first-half, second-half) pair by its position-dependent angle
    x1, x2 = x[..., :half_dim], x[..., half_dim:]
    x_rope = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return x_rope
# Transformer model
class StockTransformer(nn.Module):
    def __init__(self, input_dim, model_dim, num_heads, num_layers, dropout):
        super().__init__()
        self.input_dim = input_dim
        self.model_dim = model_dim
        self.embedding = nn.Linear(input_dim, model_dim)
        self.decoder_layers = nn.ModuleList([
            nn.TransformerDecoderLayer(
                d_model=model_dim, nhead=num_heads, dropout=dropout, batch_first=True
            ) for _ in range(num_layers)
        ])
        self.decoder_norm = nn.LayerNorm(model_dim)
        self.output = nn.Linear(model_dim, 1)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        x = self.embedding(x)  # (batch, seq_len, model_dim)
        x = apply_rope(x)  # (batch, seq_len, model_dim)
        tgt = x
        # Dummy memory (all zeros); the layer's cross-attention still runs over it
        memory = torch.zeros_like(x)
        seq_len = x.size(1)
        # Causal mask: (seq_len, seq_len), True means masked
        mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        for layer in self.decoder_layers:
            tgt = layer(tgt, memory, tgt_mask=mask)
        tgt = self.decoder_norm(tgt)
        out = self.output(tgt)  # (batch, seq_len, 1)
        return out.squeeze(-1)
@paper solstice Is that okay?
Or should I use full self-attention (no causal mask), so each output position can attend to all positions in the input block?
I think the model itself looks good
But what are you using as inputs and targets?
I rerun training on only GOOGL and MSFT, and it did manage to generalize them
Not good, but just a bit of movement in a correct direction
And are you normalizing the data?
Yes:
# Fit scaler only on train, then transform both
self.scaler = StandardScaler()
feature_cols = ["Pct_Change", "High_pct", "Low_pct", "Volume_pct"] + [col for col, _ in ext_ticker_cols]
train_features = self.train_df[feature_cols]
self.scaler.fit(train_features)
self.train_df[feature_cols] = self.scaler.transform(train_features)
self.val_df[feature_cols] = self.scaler.transform(self.val_df[feature_cols])
self.test_df[feature_cols] = self.scaler.transform(self.test_df[feature_cols])
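Since `Pct_Change` is both an input feature and the target, the model's outputs live in scaled space and have to be mapped back with the training mean and standard deviation of that one column. A minimal sketch of that round trip in plain Python, assuming the same math as `StandardScaler` (population standard deviation, stored per column in `mean_` and `scale_`). The helper names and the numbers are made up for illustration:

```python
import statistics


def fit_standardizer(train_values):
    """Mean and population std, computed on the training values only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    return mean, (std if std > 0 else 1.0)


def scale(values, mean, std):
    """z = (x - mean) / std, as applied to train, val and test alike."""
    return [(v - mean) / std for v in values]


def unscale(values, mean, std):
    """Inverse of scale(): x = z * std + mean."""
    return [v * std + mean for v in values]


train_pct = [0.5, -1.2, 0.3, 2.0, -0.6]  # made-up daily % changes
mean, std = fit_standardizer(train_pct)

scaled_prediction = [-0.609, 0.723]  # what a model would emit in scaled space
print(unscale(scaled_prediction, mean, std))  # back in % change units
```

If the inverse step used the statistics of a different column, or of all columns at once, the recovered percentages would be off by a constant factor and shift on every day, not just on one.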
.
Also removed causal mask
def forward(self, x):
    # x: (batch, seq_len, input_dim)
    x = self.embedding(x)  # (batch, seq_len, model_dim)
    x = apply_rope(x)  # (batch, seq_len, model_dim)
    tgt = x
    # Dummy memory (all zeros); the layer's cross-attention still runs over it
    memory = torch.zeros_like(x)
    # No causal mask for block prediction
    for layer in self.decoder_layers:
        tgt = layer(tgt, memory, tgt_mask=None)
    tgt = self.decoder_norm(tgt)
    out = self.output(tgt)  # (batch, seq_len, 1)
    return out.squeeze(-1)
And the result is somewhat better now:
yeah, that makes sense
but I thought even with that the model would be able to learn at least something
you could also try to predict just the next value
and leave the causal mask
But it still breaks around the last data point seen during training
What's strange is that sometimes it can predict just fine.
For example here you can see it making predictions for INTC (Intel stock which it didn't see during training at all):
There is still a drop, but not as big
Testing it on some other stocks, it also has that "drop" on many, but here is NVDA for example, where it has no crazy fluctuations:
The problem with that is that I give the model many real-world inputs, which I suppose it would also need to predict in order to be autoregressive, so it could "walk" its way into the future. But that makes it more complicated
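A rough sketch of what that autoregressive "walk" would look like, and why every input feature has to be predicted: each predicted row is fed back in as the newest day of the window. `rollout` and `mean_row` are hypothetical names, and `mean_row` is only a placeholder standing in for a trained next-day model:

```python
def rollout(history, predict_next, steps):
    """Predict `steps` days ahead, one day at a time.

    history:      list of per-day feature rows, oldest first
    predict_next: callable(window) -> full feature row for the next day
    """
    window = list(history)
    predictions = []
    for _ in range(steps):
        next_row = predict_next(window)
        predictions.append(next_row)
        # Slide the window: drop the oldest day, append the prediction.
        window = window[1:] + [next_row]
    return predictions


def mean_row(window):
    """Placeholder model: column-wise mean of the window."""
    return [sum(col) / len(window) for col in zip(*window)]


history = [[1.0, 2.0], [3.0, 4.0]]  # two days, two features (made-up)
print(rollout(history, mean_row, steps=3))
```

Because the predicted row replaces real data in the window, any feature the model cannot predict (commodities, volume, and so on) would have to be dropped or forecast separately.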
it's weird
I agree
It pisses me off 😂
Just trained a new model with different hyperparameters, and fewer inputs.
Now it goes to different price levels, so its pct_change for that day kinda depends on where it starts from. But it still happens at that one date:
(It could seem like it drops at different dates now, but I think it's because of the working days, and that sometimes it starts to predict from weekends, stuff like that)
and are you scaling the targets properly?
oh right
you could try encoder-decoder, but I have no idea if it'd perform better
That could improve accuracy, but will not fix the problem we are facing...
And the blue line is the final prediction?
Yes, the blue line with dots is the last prediction
what features are you using?
Input: 30 days * ["Weekday", "Month", "Pct_Change", "High_pct", "Low_pct", "Volume_pct"]
Output: 30 days * "Pct_change"
Pretty basic for now
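For reference, a minimal sketch of how 30-day input windows and 30-day target blocks like these can be cut from one ticker's rows. This is an assumption about the data preparation, not the actual code; `make_windows` and the synthetic rows are hypothetical:

```python
def make_windows(rows, target_col, window=30, horizon=30):
    """Cut (input, target) pairs from chronologically ordered rows.

    rows:       list of per-day feature lists
    target_col: index of the column to predict (e.g. Pct_Change)
    """
    samples = []
    for start in range(len(rows) - window - horizon + 1):
        x = rows[start:start + window]  # window x n_features
        future = rows[start + window:start + window + horizon]
        y = [row[target_col] for row in future]  # horizon values
        samples.append((x, y))
    return samples


# 100 synthetic days with 6 features each, just to show the shapes.
rows = [[float(day + col) for col in range(6)] for day in range(100)]
samples = make_windows(rows, target_col=2)
x0, y0 = samples[0]
print(len(samples), len(x0), len(x0[0]), len(y0))
```

Note that the last complete sample starts `window + horizon` days before the end of the data, so the newest 30 days only ever appear as targets, never as inputs.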
can you try to train again without weekday and month?
I'm curious if it is overfitting on those features
Could be, I will try
But it trains on like 25 years of data, months and weekdays repeat and overlap, so I guess that shouldn't be a thing to overfit on
I am curious to try tho
While training the new model, I was testing the old one; apparently it can also do this:
Trained. This is rather strange looking graph
Guess what
Input window (all days, all features):
Pct_Change | High_pct | Low_pct | Volume_pct
Day 1: -0.075 | -0.768 | 0.100 | 0.243
...
Day 30: 0.688 | 0.736 | 0.007 | 0.317
Top input factors for biggest predicted price jump (Day 11):
SI=F_pct: 2.859
HG=F_pct: 2.601
GC=F_pct: 2.519
what does this mean?
Some information, like what the inputs look like, and which inputs exactly contributed the most to the spike in price
This is so weird, I don't even know what to do
Maybe start completely from scratch
But if I understand it correctly, you are also passing commodities as inputs?
aren't SI, HG and GC silver, copper and gold?
Correct
Sometimes I do, sometimes I train it simpler, without them, but it has no effect on that weird glitch
I mean it says that those values have the largest effect on the jump
Did you try to train it super clean?
Like only ["Pct_Change", "High_pct", "Low_pct", "Volume_pct"]
Will do now
Okay, so it produces quite a strange result. It clearly underperforms, and lacks accuracy. Also the validation loss behaves strangely
Still does The Drop
Some information if useful:
Input window (all days, all features):
Pct_Change | High_pct | Low_pct | Volume_pct
Day 1: 0.446 | -0.259 | 0.075 | 0.385
Day 2: 0.033 | -0.066 | 0.783 | -0.693
Day 3: -0.594 | -0.096 | -0.490 | 0.828
Day 4: 0.960 | 2.561 | 0.846 | 0.909
Day 5: 0.450 | 0.854 | 0.464 | -0.120
Day 6: -0.547 | -0.514 | 0.445 | -1.137
Day 7: 0.902 | 0.212 | 0.837 | -0.010
Day 8: -0.155 | -0.111 | 0.426 | -0.313
Day 9: -0.147 | -0.684 | -0.370 | -0.445
Day 10: -0.068 | -0.532 | -0.596 | 1.310
Day 11: -0.609 | -0.116 | 0.723 | -0.650
Day 12: -0.648 | -0.479 | 0.035 | 0.154
Day 13: 0.362 | -0.305 | 0.749 | -0.903
Day 14: -0.008 | -0.623 | -0.159 | 0.496
Day 15: 1.125 | 0.434 | 0.898 | -0.189
Day 16: 0.500 | -0.177 | 0.835 | -0.509
Day 17: 0.469 | 0.809 | 0.419 | 1.944
Day 18: -0.294 | -0.630 | -0.201 | -1.059
Day 19: -0.377 | -0.646 | 0.367 | -0.783
Day 20: -0.253 | 0.770 | 0.879 | 0.438
Day 21: 0.389 | -0.084 | 0.868 | -0.175
Day 22: -0.209 | -0.272 | 0.481 | -0.320
Day 23: -0.579 | -0.643 | -0.147 | 0.120
Day 24: -1.426 | -0.694 | -2.297 | 2.872
Day 25: -0.355 | -0.448 | -0.783 | -0.597
Day 26: 0.301 | -0.369 | 0.589 | -0.707
Day 27: 0.799 | 0.829 | 0.867 | -0.387
Day 28: 0.559 | -0.397 | -0.047 | -0.349
Day 29: 0.990 | 0.912 | 0.216 | 4.289
Day 30: -0.505 | -0.680 | -1.347 | -0.923
Top input factors for biggest predicted price jump (Day 11):
Low_pct: 0.723
Volume_pct: -0.650
Pct_Change: -0.609
that's so weird
I don't really know anymore
And is the drop there for all predictions that have 2025-07-14 in their prediction window?
or only last x < 30
I can't really tell from the picture exactly how many lines with the drop there are
So it's 10 lines with offset -1 day for each. Last day prediction being the most visible dashed blue line with dots
Basically yes, but 07-14 is close to the last known data point that the model saw during training
I have had this issue for quite a few days now, and each time I update my training dataset with new data and train a new model, the date where the model drops shifts by +1 day. So it's not about that specific date; it's that the model has never seen anything after it. Which is logical, but doesn't explain the sudden instability, or why it's only on that one date. If it was overfitting or some other kind of issue, I would expect the whole prediction line to be choppy and all over the place, but it's quite okay except for that one day.
Validation being lower at first makes sense, since here I trained it with 0.3 dropout
I took a model at epoch 14 (low validation loss) and it still resulted in that drop
drop drop drop drop I am going crazy
Uh, oh. Hmm. If I start prediction after that day, it's all fine actually
To say that I am confused is a huge understatement
You know what @paper solstice, look there. The pattern after the drop is exactly the same as in the last most visible prediction, same movement a bit higher and then down and a bit up again. So the model works exactly the same, and predictions for each day are generalized pretty okay. If I just replace that drop with 0 pct_change for that day, it will look almost identical
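This matches how a price path is rebuilt from percentage changes: each day multiplies the previous price, so one bad day rescales everything after it without changing the shape. A minimal sketch, assuming the predictions are in percent (not fractions); `prices_from_pct` and the numbers are made up:

```python
def prices_from_pct(start_price, pct_changes):
    """Compound daily percentage changes into a price path."""
    prices = []
    price = start_price
    for pct in pct_changes:
        price *= 1 + pct / 100
        prices.append(price)
    return prices


predicted = [1.0, -50.0, 2.0, -1.0]  # one absurd day, otherwise sane
patched = [1.0, 0.0, 2.0, -1.0]      # same, with the drop replaced by 0

with_drop = prices_from_pct(100.0, predicted)
without_drop = prices_from_pct(100.0, patched)

# After the bad day, the two paths differ only by a constant factor.
print([round(a / b, 6) for a, b in zip(with_drop, without_drop)])
```

So "continues as if nothing has happened" is expected once a single step is wrong; the open question is only why that one step is wrong.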
I am thinking that this could be a scaler/normalization issue: the inverse transform (when converting predicted pct_change back to price) can do such weird things with numbers.
No bro
I just manually cut the last 60 days for each ticker in the training data, and went back in time to predict price changes in April. There is that drop! I caught it in 4K
And if I go to the current date it's fine, no drop, because now we are so much in the future
So, this issue does not come from:
- Scaler/Normalization
- Some spike in the input data right before 07-17
- Bad model architecture or hyperparameters
- Overfitting on the data
It looks like:
- The price drops in prediction right after exiting the training dataframe
and how big a validation split are you using for this evaluation? It looks super unstable, so I guess it's kinda small
Maybe some error in the data preparation/prediction code? Like accidentally including padding or something like that
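One cheap way to test the data-preparation hypothesis is to look at the last few rows of the prepared frame for each ticker and flag anything non-finite or far outside the usual range. This is only a diagnostic sketch under that assumption; nothing in the thread confirms that the tail rows are the cause, and `suspicious_tail_rows` is a hypothetical helper:

```python
import math
import statistics


def suspicious_tail_rows(rows, tail=5, z_limit=6.0):
    """Flag non-finite or extreme values in the last `tail` rows.

    Returns (row_index, col_index, value) tuples. The reference mean/std
    per column come from the finite values in the rows before the tail.
    """
    body, last = rows[:-tail], rows[-tail:]
    flagged = []
    for col in range(len(rows[0])):
        ref = [row[col] for row in body if math.isfinite(row[col])]
        mean = statistics.fmean(ref)
        std = statistics.pstdev(ref) or 1.0  # avoid dividing by zero
        for offset, row in enumerate(last):
            value = row[col]
            if not math.isfinite(value) or abs(value - mean) / std > z_limit:
                flagged.append((len(body) + offset, col, value))
    return flagged


# Made-up single-feature series: normal values, then a NaN and a -97 at the end.
rows = [[0.1 * ((i % 5) - 2)] for i in range(50)] + [[float("nan")], [-97.0]]
print(suspicious_tail_rows(rows))
```

Typical culprits to look for would be a NaN from a shifted column at the boundary, a zero-filled padding row, or a percentage computed against a missing previous price.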
test_size = window_size + predict_size
So it's only one prediction per ticker. Yeah, very small
Working on it now
Crazy atm
Added +10 predictions with offset for each ticker during training, looks much more smooth:
Validation loss is a bit smaller at the start cuz of the dropout. But yeah, we can clearly see that it doesn't generalize at all, only gets worse as the training goes. Overfitting?
wait, you only have one validation sample?
I'd say use like an 80-20 split
Especially if you're experimenting with different approaches
I think it's because you give the model too much room to predict. Assuming that the price won't drop more than 10% in a day, you can set that as the "largest" drop it can predict.
Depending on how you want to structure your code you can make something like this, so for example if the current price is $20, it couldn't go under $18 (you know better how to structure your code)
[l_prediction = current_price - (current_price * 0.1)]
Let me know if it worked!
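A minimal sketch of that clipping idea, assuming predictions are daily changes in percent and a symmetric 10% limit. `clip_pct` and `price_floor` are hypothetical names, and the limit is only the example value from the message above:

```python
def clip_pct(pred_pct, limit=10.0):
    """Clamp each predicted daily % change to [-limit, +limit]."""
    return [max(-limit, min(limit, p)) for p in pred_pct]


def price_floor(current_price, limit=10.0):
    """Lowest price allowed for the next day under the same limit."""
    return current_price * (1 - limit / 100)


print(clip_pct([-97.08, 3.0, 12.5]))
print(price_floor(20.0))
```

This only bounds the size of a bad step; the step itself still happens on the same day.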
That's a way to just clip this instability, but it will still be there and could potentially ruin the prediction even with that custom clipping. Thanks for the advice tho
Now I have 11 validations on the last known data for each ticker. So let's say I train it on 10 tickers, that would result in 110 entries for validation
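A small sketch of how those validation entries could be enumerated, assuming each ticker contributes the last possible window plus 10 earlier ones shifted back by one day each. `validation_starts`, the ticker labels and the day count are made up:

```python
def validation_starts(n_days, window, horizon, n_offsets=11):
    """Start indices of the last `n_offsets` complete windows, newest first."""
    last_start = n_days - window - horizon
    return [last_start - k for k in range(n_offsets) if last_start - k >= 0]


tickers = [f"T{i}" for i in range(10)]  # placeholder ticker names
entries = [
    (ticker, start)
    for ticker in tickers
    for start in validation_starts(n_days=500, window=30, horizon=30)
]
print(len(entries))
```

These windows overlap almost completely, so they smooth the validation curve but do not add much independent evidence compared with a longer held-out period.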
Try it and see if it still drops, also how does the model predict the future prices? There might be a problem there
Logically thinking there will be the same drop on the same date, but now smaller, clipped to be only -10%.
That's why I say it won't fix the issue
Doesn’t need to be true, in your example it also doesn’t go straight to zero, you also don’t know how the AI learns the predictions
Depending on how you structured the AI it could also be a problem with the math, if you calculate the derivative on the accuracy