#hull-tactical-market-prediction | Kaggle | Page 1

radiant crag Sep 17, 2025, 4:06 PM

#

Glad to see this data looks much better than Mitsui

stray stump Sep 20, 2025, 8:04 PM

#

Yeah I'm not clicking those

spiral geyser Sep 20, 2025, 9:56 PM

#

Hi, I am a little bit confused by the competition description. 1. In the forecasting phase, will the model be retrained on the most recent data? For example, on day 10 of the forecasting phase, can the model access the data from day 1 to day 10? Or is the model only trained before the forecasting period, so that it sees one day of data each day and cannot leverage the data from past days? Thanks @noble portal

fathom finch Sep 21, 2025, 1:37 PM

#

Hi, I'm new and still learning. Is it normal to have difficulty getting a positive R²? I'm using GA to improve feature selection, but even so, the Sharpe ratio increases while the R² remains close to zero or lower.

noble portal Sep 22, 2025, 8:50 PM

#

spiral geyser Hi, I am a little bit confused by the competition description. 1. In the forecas...

You will be able to use the latest data via online learning.

noble portal Sep 22, 2025, 8:50 PM

#

fathom finch Hi, I'm new and still learning. Is it normal to have difficulty getting a positi...

Yes, predicting market returns generally involves a very low R².

dark vector Sep 23, 2025, 6:46 PM

#

Hi! Anyone in for teaming up

hallow elbow Sep 25, 2025, 5:59 PM

#

Hi Everyone, I'm new to the competition, and need some basic clarification, what's the target column that we are predicting? I see the evaluation metric is the Sharpe ratio, does that mean we are using ML to predict the level of S&P holding (between 0-2)? Thanks 😃

noble portal Sep 29, 2025, 1:06 PM

#

hallow elbow Hi Everyone, I'm new to the competition, and need some basic clarification, what...

The goal is to use the dataset to get a daily exposure to the market between 0-200% in order to maximize some variant of Sharpe Ratio.

hallow elbow Sep 29, 2025, 3:33 PM

#

noble portal The goal is to use the dataset to get a daily exposure to the market between 0-2...

yeah, i get that, but we can't use ML to output the portfolio balance level, so likely the direct target is a return? then we work out the 0-200% level according to the Sharpe ratio (evaluation metric), is that the general idea?

hallow elbow Sep 29, 2025, 3:57 PM

#

quick question about market_forward_excess_returns: it says "Train set only", does that mean it will be missing in the actual test file? and will it be missing during the "Forecasting Timeline", so we shouldn't use it as a feature?

noble portal Sep 29, 2025, 5:00 PM

#

The columns available during the live phase are in the test dataset.

stray stump Oct 5, 2025, 3:56 PM

#

hallow elbow yeah, i get that, but we can't use ML to output the portfolio balance level, so ...

Of course you can

viscid zinc Oct 7, 2025, 11:59 PM

#

hello everyone i have posted an exploratory data analysis going through the data, analysing the target market_forward_excess_returns and suggesting some modelling ideas in this discussion : https://www.kaggle.com/competitions/hull-tactical-market-prediction/discussion/610981

ripe briar Oct 8, 2025, 2:54 PM

#

noble portal The goal is to use the dataset to get a daily exposure to the market between 0-2...

You can train a model to maximise the Sharpe ratio score using the position size (0 - 2) directly, but it might be easier to train a model to predict excess_returns, and then use that to pick the position size which you will submit
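
A rough sketch of the second approach, not something the hosts prescribe: take a predicted excess return, tilt a neutral position by it, and clip the result to the allowed 0 to 2 range. The function name `position_from_prediction`, the neutral starting point of 1.0, and the `scale` parameter are all hypothetical choices made for illustration.

```python
def position_from_prediction(predicted_excess_return: float, scale: float = 100.0) -> float:
    """Map a predicted excess return to a position in [0, 2].

    Hypothetical rule: start from a neutral position of 1.0 (100% exposure),
    tilt it in proportion to the prediction, then clip to the allowed range.
    """
    raw = 1.0 + scale * predicted_excess_return
    return min(2.0, max(0.0, raw))


if __name__ == "__main__":
    # Strongly negative predictions go to 0, strongly positive ones to 2.
    for pred in (-0.02, -0.001, 0.0, 0.004, 0.05):
        print(pred, position_from_prediction(pred))
```

How aggressively to scale is a modelling decision in its own right; a larger `scale` pushes more days to the 0 or 2 bounds.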

cosmic grove Oct 9, 2025, 4:41 PM

#

Having issues submitting. Anyone have advice?

cosmic grove Oct 9, 2025, 7:13 PM

#

Also, I opened the train.csv to take a look and all the features except for the D columns have missing values up to around row 1000. Is this a mistake?

stray stump Oct 9, 2025, 10:25 PM

#

viscid zinc hello everyone i have posted an exploratory data analysis going through the data...

Cool! I had a quick look at the covariance matrix for the features (including the lagged ones in the test data) and it looks like there is a zero eigenvalue. Did you see this in your analysis?

viscid zinc Oct 9, 2025, 10:33 PM

#

stray stump Cool! I had a quick look at the covariance matrix for the features (including th...

yes, but i only went as far as doing VIF filtering (Variance Inflation Factor) to drop features with very high multicollinearity (VIF > 10). i think i also mentioned briefly that many of these features might be redundant and lead to worse models overall if all of them are used for training
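
A minimal sketch of the two checks discussed here, on synthetic data rather than the competition features: the smallest eigenvalue of the covariance matrix, and a per-column VIF computed as 1 / (1 - R²) from regressing each column on the others. The helper `vif` and the toy columns are illustrative assumptions, not the code used in the EDA.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor of each column of X (rows are observations)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # regress on the others plus an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


# Synthetic example: c is almost a linear combination of a and b, d is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.01 * rng.normal(size=500)
d = rng.normal(size=500)
X = np.column_stack([a, b, c, d])

vifs = vif(X)
# eigvalsh returns eigenvalues in ascending order, so index 0 is the smallest.
smallest_eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))[0]
print(vifs, smallest_eig)
```

In this toy case a, b and c all exceed the VIF > 10 threshold together, which is why VIF filtering is often done by dropping one column at a time and recomputing, rather than dropping every flagged column at once.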

#

the whole purpose of that eda was to shed some light on the data structure how to use it for the end goal (allocation) and how the features interact with the target (excess returns)
as the competition guys themselves mentioned, simpler models tend to perform better, which is what i'm also noticing after deploying a few models with different levels of complexity

#

you could use none of the features they provided, go based on just the daily returns, calculate the std yourself, and you'd still get a running model that can beat some of the more complex ones
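
One possible reading of that returns-only baseline, as a sketch: hold more exposure when recent realised volatility is low and less when it is high, clipped to the 0 to 2 range. The class name `VolTargetAllocator`, the 20-day window, and the 1% daily volatility target are hypothetical; the message above does not specify a rule.

```python
from collections import deque
from statistics import pstdev


class VolTargetAllocator:
    """Hypothetical baseline: scale exposure inversely to recent realised volatility."""

    def __init__(self, window: int = 20, target_daily_vol: float = 0.01):
        self.returns = deque(maxlen=window)  # keeps only the most recent `window` returns
        self.target_daily_vol = target_daily_vol

    def update(self, realised_return: float) -> None:
        self.returns.append(realised_return)

    def position(self) -> float:
        if len(self.returns) < 2:
            return 1.0  # not enough history: stay at neutral exposure
        vol = pstdev(self.returns)
        if vol == 0.0:
            return 1.0
        return min(2.0, max(0.0, self.target_daily_vol / vol))
```

Each day the allocator would be updated with the latest known return before asking it for the next position.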

final sage Oct 10, 2025, 11:23 AM

#

anybody want to team up ?

boreal oracle Oct 12, 2025, 2:01 PM

#

someone ban this guy

crystal axle Oct 14, 2025, 12:51 AM

#

In this competition what are the target columns, these three only right ?
forward_returns
risk_free_rate
market_forward_excess_returns
We can treat this as a multi-input, multi-output regression problem! Correct me if I'm wrong.
Thank you !
@noble portal

noble portal Oct 14, 2025, 1:45 PM

#

crystal axle In this competition what are the target columns, these three only right ? forwar...

The columns you mention are very useful to build your daily position, but the goal of the competition is not to predict a target column but to build a daily leverage. Please read the competition details.

tropic fiber Oct 14, 2025, 5:09 PM

#

"D* - Dummy/Binary features" -- what exactly is meant by "Dummy features"?

noble portal Oct 14, 2025, 6:03 PM

#

tropic fiber "D* - Dummy/Binary features" -- what exactly is meant by "Dummy features"?

Its value is either 0 or 1.

versed skiff Oct 15, 2025, 7:03 PM

#

how are we supposed to submit our notebook without internet?

humble gulch Oct 17, 2025, 7:02 PM

#

Hey everyone, quick question about the date_id format in the training data. It looks like a sequential integer counter (e.g., 8980, 8981, 8982), but converting it with pd.to_datetime() defaults to the Unix Epoch (1970-01-01). Can someone confirm if the date_id is just an anonymized counter or if it maps to actual calendar dates? I want to be sure before deciding on creating calendar-based features or merging in external data. Thanks!
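
The epoch behaviour described above can be reproduced in a few lines. `pd.to_datetime` interprets plain integers as nanoseconds since 1970-01-01 by default, so small counters all land on the epoch date. Whether date_id maps to real calendar dates is not answered in this thread; the sketch only shows why the conversion looks wrong and how the column can still be used as an ordinal index.

```python
import pandas as pd

date_ids = pd.Series([8980, 8981, 8982])

# Integers are read as nanoseconds since 1970-01-01 by default,
# so every value lands on the epoch date.
as_timestamps = pd.to_datetime(date_ids)
print(as_timestamps.dt.date.unique())

# Treating date_id as an ordinal index keeps the ordering and spacing
# without assuming anything about the calendar.
day_gap = date_ids.diff()
print(day_gap.tolist())
```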

feral cradle Oct 19, 2025, 10:45 AM

#

The prediction range is between 0 and 2, does that mean shorting is not allowed?

wet cosmos Oct 21, 2025, 10:03 AM

#

Hi, I am Rahul Raj Sirapuram. This is my linkedin:
www.linkedin.com/in/wbcoder

If anyone is interested to team up for:
https://www.kaggle.com/competitions/hull-tactical-market-prediction

Feel free to ping me here or linkedin. We will discuss and edit together. My kaggle id:
roadrashfifa21
Mon - Fri from 15:00 to 17:00 we will work together to do what we can regarding the competition.

#

This is the current state of my notebook:
https://www.kaggle.com/code/roadrashfifa21/hull-tactical-market-prediction-demo-submission

wet cosmos Oct 21, 2025, 10:37 AM

#

noble portal The columns you mention are very useful to build your daily position, but the go...

Hi, I have read the details. So, betting strategy must be a dataframe that says long short or flat for each row of the dataframe along with the predicted market_forward_excess_returns?

modest trench Oct 21, 2025, 11:49 AM

#

Outta curiosity, what kinda rmse are y’all getting? I’m having trouble getting below 0.01 tbh

#

Also. Is it just me or does the test set seem to be a little bit funky and not quite behave like the rest of the data set?

#

Probs just me tbh

noble portal Oct 21, 2025, 4:15 PM

#

wet cosmos Hi, I have read the details. So, betting strategy must be a dataframe that says ...

Look at the starter notebook to get an idea of the output format.

feral cradle Oct 29, 2025, 7:45 PM

#

Hi, am I missing something or is this competition not telling us what each column means? If that's indeed the case, why????

coarse swallow Oct 30, 2025, 10:48 AM

#

Maybe a bit of a dumb question but does predict function return the first or the last date of the trading interval? Like, what date_id is targeted when it is called? I don't think this information is specified anywhere..

true goblet Nov 1, 2025, 8:57 PM

#

hi, do we have any information on private evaluation, for now the last date_id is 8989, should we expect 8990 and so on for final results?

noble portal Nov 3, 2025, 2:34 PM

#

The date_id for private evaluation will be from the start to the end of the live phase.

coarse swallow Nov 3, 2025, 9:42 PM

#

Likely the date_id also doesn't really need to be sequential on the test server, correct?

coarse swallow Nov 4, 2025, 9:52 AM

#

I was trying to replicate the leaderboard score locally and I ran into some pretty wild discrepancy. Essentially I used the default gateway to produce a submission.parquet - then I used the following code:

from argparse import ArgumentParser
from pathlib import Path

import pandas as pd

from hull_challenge.data import determine_data_dir
from hull_challenge.score import score


def main(
        submission_path: Path,
):
    data_dir = determine_data_dir()

    df = pd.read_csv(data_dir / 'train.csv', index_col="date_id")
    submission = pd.read_parquet(submission_path)
    submission = submission.set_index("date_id")

    solution = df[df.index.isin(df.index)][["risk_free_rate", "forward_returns"]]

    submission_score = score(
        solution,
        submission,
        ''
    )
    print(f"Submission score: {submission_score}")


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--submission-path", required=True, type=Path)

    args = parser.parse_args()
    main(submission_path=args.submission_path)

Where the determine_data_dir returns '/kaggle/input/hull-tactical-market-prediction/' on Kaggle, and the score function is the exact same as in their Metric notebook. I got a score of 0.002 locally and a score of 1.2 on the leaderboard. I verified that the predictions are indeed the same as well. What could be going on here?
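
One detail in the snippet above may be worth checking, although the thread never establishes whether it explains the gap: `df.index.isin(df.index)` compares the training index with itself, so it is always true and the solution keeps every training row rather than only the dates present in the submission. A small sketch on toy frames (the values are made up) shows the difference:

```python
import pandas as pd

train = pd.DataFrame(
    {"forward_returns": [0.01, -0.02, 0.005, 0.0]},
    index=pd.Index([1, 2, 3, 4], name="date_id"),
)
submission = pd.DataFrame(
    {"prediction": [1.0, 0.5]},
    index=pd.Index([3, 4], name="date_id"),
)

# Comparing an index with itself is always True, so no rows are filtered out.
all_rows = train[train.index.isin(train.index)]

# Filtering on the submission's index keeps only the dates that were predicted.
scored_rows = train[train.index.isin(submission.index)]

print(len(all_rows), len(scored_rows))
```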

#

I will raise this in the Kaggle forums but this is essentially a bad state when we simply can't validate our ideas because the LB does some unhinged things in the background that we do not know

wary jungle Nov 6, 2025, 6:22 PM

#

Hi @noble portal, quick question:

In the forecasting phase, will train.csv continue to grow with new dates so we can re-train daily, or is it frozen after the submission deadline?

leaden isle Nov 7, 2025, 9:17 AM

#

wary jungle Hi @noble portal, quick question: In the forecasting phase, will train...

Look at the data description

#

The safe assumption is to use only the data you have unless stated.

noble portal Nov 7, 2025, 1:55 PM

#

wary jungle Hi @noble portal, quick question: In the forecasting phase, will train...

Online learning will allow you to update your model with the new incoming data.

stone crystal Nov 7, 2025, 8:05 PM

#

Hi @noble portal , I tried to replicate locally the lb score. I just return a constant 0.7 in the predict function, in lb I get 0.467, however, if run it locally I get 0.592. I use exactly the last 180 rows in the training set, and exactly the same score function provided officially. Is there anything I misunderstand?

noble portal Nov 7, 2025, 8:43 PM

#

stone crystal Hi @noble portal , I tried to replicate locally the lb score. I just ret...

We've recently updated the data. Could that have been the cause?

stone crystal Nov 7, 2025, 10:23 PM

#

noble portal We've recently updated the data. Could that have been the cause?

I have noticed the updated training set, and it is not the cause.

tired hazel Nov 9, 2025, 10:26 PM

#

Hi everyone, i’m a physicist looking for teammates for this competition. I have spent 3 months writing and just published marketML (https://github.com/Microcosmos22/TradeBot_public/tree/main) a python package covering the whole process from data acquisition to training LSTM machines on historical crypto data as well as implementing them in trading strategies. I have some interesting insights and would like to discuss with a data scientist.

spark wasp Nov 10, 2025, 7:28 AM

#

yoo

vital flint Nov 10, 2025, 8:47 AM

#

noble portal Online learning will allow you to update your model with the new incoming data.

Could you please clarify how that would work in practice? Do you plan to run the evaluation daily and grow the train.csv accordingly, or would the user need to manually handle the incoming data and append it to the existing training set? That would be very useful to know for designing the notebook, and in particular for knowing how best to handle momentum-based features which rely on historical data. Thanks!

hidden geyser Nov 10, 2025, 9:16 AM

#

Hey guys is data leakage in test set still an issue? Like can I use all train rows for submission now?

tired hazel Nov 10, 2025, 10:59 AM

#

vital flint Could you please clarify how that would work in practice? Do you plan to run the...

Why do you want to train with the evaluation data?

vital flint Nov 10, 2025, 11:08 AM

#

tired hazel Why do you want to train with the evaluation data?

Sorry it maybe wasn’t very clear, but I meant continuously tuning/retraining the model based on historical data, i.e. for evaluating the model on the current day t using all the data until t-1. Of course that can already be done in the current training phase, but I’m not sure how it would work once we get to the forecasting phase

tired hazel Nov 10, 2025, 12:15 PM

#

Wrote you a PM

tired hazel Nov 10, 2025, 3:14 PM

#

hidden geyser Hey guys is data leakage in test set still an issue? Like can I use all train ro...

I think the last 180 rows of the train set are being used for LB Evaluation, thus you should not use them to train, or you will end up overfitting like the current LB leaders

mossy swift Nov 10, 2025, 3:15 PM

#

Hi

proud pivot Nov 10, 2025, 4:07 PM

#

@everyone
💬 My Conceptual & Deep Learning Questions for the Kaggle AI Agents Livestream

1️⃣ As AI agents become increasingly autonomous and goal-driven, how can we design incentive systems that keep their long-term behavior aligned with human ethics instead of just short-term reward optimization?

2️⃣ As AI agents get more goal-driven through incentive structures, how do we make sure their reward optimization doesn’t conflict with the broader ethical or social alignment we expect?
And if an agent gains the ability to modify its own mechanisms, how can we formally guarantee that these self-changes stay consistent with human-aligned objectives and don’t create unintended behaviors over time?

3️⃣ When agents start self-modifying their reasoning or learning mechanisms, what kind of formal or mathematical safeguards can ensure their updated versions remain predictable, stable, and still aligned with human objectives?

4️⃣ In multi-agent systems, how can we balance cooperation and autonomy—so that agents don’t end up competing or working against each other while still staying efficient and independent?

5️⃣ With the rise of deep reinforcement learning in multi-agent setups, how can we prevent emergent adversarial behaviors that arise from agents learning implicit competition through shared environments?

6️⃣ How can transformer-based architectures be adapted for continual learning within agent frameworks—so that agents can retain past knowledge while still adapting to new contexts without catastrophic forgetting?

✨ Thanks to the Kaggle and Google teams for hosting such an insightful course and livestream.
Really looking forward to hearing the experts’ thoughts on long-term safety, alignment, and the next wave of deep learning–driven agent architectures. 🚀

noble portal Nov 10, 2025, 4:40 PM

#

tired hazel I think the last 180 rows of the train set are being used for LB Evaluation, thu...

The current LB is meaningless. The user should take care of the train/test split evaluation on their own. The data provided to the user will contain all information up to that date, which should be used to predict the next day's return.

hidden geyser Nov 11, 2025, 9:43 AM

#

noble portal The current LB is meaningless. The user should take care of the train/test split...

Ok thank you

gritty valley Nov 11, 2025, 7:13 PM

#

hey dears! anyone who needs a teammate? I want to join..

night crescent Nov 11, 2025, 7:19 PM

#

noble portal Online learning will allow you to update your model with the new incoming data.

This has been mentioned a few times...but the detail on how seems to be lacking. Is train.csv updated? How is new train data served to the notebook?

tame rover Nov 12, 2025, 3:26 AM

#

Hi

trim flume Nov 12, 2025, 6:04 AM

#

Is the Hull Tactical Market Prediction Competition live Ask Me Anything (AMA) session scheduled here?

noble portal Nov 12, 2025, 4:04 PM

#

I'm here for the next hour, if anyone has questions about the competition.

pale mauve Nov 12, 2025, 4:06 PM

#

Hey Laurent, have you tried deep learning? Want to share ideas, experiences?

trim flume Nov 12, 2025, 4:06 PM

#

I am at beginner level and I ask if it is possible to have a complete example of the required submission file

pale mauve Nov 12, 2025, 4:07 PM

#

How do you see the future of DL for your work. Is it going to be an important direction to explore?

noble portal Nov 12, 2025, 4:08 PM

#

pale mauve Hey Laurent, have you tried deep learning? Want to share ideas, experiences?

We have for a few of our internal projects, but not specifically for this problem. The low signal-to-noise ratio, the non-stationary nature of the data, and the small sample make this not best suited for deep learning. We welcome new ideas on that front though.

noble portal Nov 12, 2025, 4:08 PM

#

trim flume I am at beginner level and I ask if it is possible to have a complete example of ...

https://www.kaggle.com/code/laurentlanteigne/hull-starter-notebook

noble portal Nov 12, 2025, 4:10 PM

#

pale mauve How do you see the future of DL for your work. Is it going to be an important di...

If we ever move to trading at higher frequency for capturing intraday alpha, we may consider using DL.

pale mauve Nov 12, 2025, 4:11 PM

#

(How) do you try to detect regime changes?

fierce pumice Nov 12, 2025, 4:13 PM

#

I am a student and this is my first challenge, I am kinda confused, what the real target is.
Is it forward_returns, which I then use to calculate my risk level (0, 1 or 2)? Or do we also use risk_free_rate and market_forward_excess_returns for it?

noble portal Nov 12, 2025, 4:13 PM

#

pale mauve (How) do you try to detect regime changes?

Depends on what kind of regime change we are trying to detect, but we hope to capture changes in dynamics through volatility indicators.

noble portal Nov 12, 2025, 4:15 PM

#

fierce pumice I am a student and this is my first challenge, I am kinda confused, what the rea...

The confusion is understandable. We have a few sets of columns that represent future returns. These are necessary in order to evaluate the performance of the trading strategy. What we are actually asking is for you to incorporate the information as best you can in order to get a daily signal between 0 and 200% for your exposure to the S&P 500. This exposure combined with the returns will be your strategy's return, and it's the cumulative set of returns that is being scored.
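
To make the relationship between exposure and strategy return concrete, here is a minimal sketch. It assumes that the part of the portfolio not exposed to the market earns the risk-free rate, and it computes a plain annualised Sharpe ratio with 252 periods per year. Both are assumptions for illustration: the competition scores a variant of the Sharpe ratio, which is not reproduced here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def strategy_returns(positions, forward_returns, risk_free_rates):
    """Daily strategy return for an exposure p in [0, 2].

    Assumption: p of the portfolio earns the market return and the remaining
    (1 - p) earns the risk-free rate (negative when p > 1, i.e. borrowing).
    """
    return [
        p * r + (1.0 - p) * rf
        for p, r, rf in zip(positions, forward_returns, risk_free_rates)
    ]


def plain_sharpe(returns, risk_free_rates, periods_per_year: int = 252) -> float:
    """Textbook annualised Sharpe ratio of excess returns (not the official metric)."""
    excess = [r - rf for r, rf in zip(returns, risk_free_rates)]
    sd = stdev(excess)
    if sd == 0.0:
        return 0.0
    return mean(excess) / sd * sqrt(periods_per_year)
```

With every position set to 1.0 the strategy simply reproduces the market return, and with every position set to 0.0 it earns the risk-free rate, which gives two easy sanity checks for a local scoring pipeline.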

trim stream Nov 12, 2025, 4:15 PM

#

I noticed that people submit the predict function in the submission notebook differently... How frequently is the notebook called in the test phase? Is it day-by-day prediction? Or is it expected to predict the whole array (e.g. 180 days) at once?

noble portal Nov 12, 2025, 4:16 PM

#

trim stream I noticed that people submit the predict function in the submission notebook ...

The API feeds one row at a time, but the score will be from the cumulative daily performance from December 15th 2025 to June 16th 2026.

trim flume Nov 12, 2025, 4:17 PM

#

I ask you if you have in mind a particular theoretical (or empirical) model in the literature that supports your competition (challenge)

trim stream Nov 12, 2025, 4:19 PM

#

Is it allowed to do online training in the submission notebook (e.g. for lagged features)?

noble portal Nov 12, 2025, 4:21 PM

#

trim stream Is it allowed to do online training in the submission notebook (e.g. for lagged ...

Yes, you can ask Sohier on Kaggle about details.

trim stream Nov 12, 2025, 4:22 PM

#

Is this metric code still up to date? We plan to expand our training pipeline to include the kaggle evaluation in order to directly measure actual performance. https://www.kaggle.com/code/metric/hull-competition-sharpe

noble portal Nov 12, 2025, 4:23 PM

#

trim stream Is this metric code still up to date? We plan to expand our training pipeline to...

The metric will be the same for the duration of the competition.

noble portal Nov 12, 2025, 4:29 PM

#

trim flume I ask you if you have in mind a particular theoretical (or empirical) model in t...

Here is a well cited paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=962461

Others that could be of interest:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=948309
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1747345
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5035294

fierce pumice Nov 12, 2025, 4:33 PM

#

noble portal The confusion is understandable. We have a few sets of columns that represent fut...

Okay, so that I understand it right: for testing we get the features and the lagged (forward_returns, market_forward_excess_returns, risk_free_rate),
and then we can try to predict (forward_returns, market_forward_excess_returns, risk_free_rate) if we need them for the calculation of the daily signal?

trim stream Nov 12, 2025, 4:33 PM

#

Why did you decide to only add 10 days to the test set in the data section but use the last 180 days for the actual score calculation?

#

What also makes me wonder is why you didn't include a detailed description of every feature? Is there a reason why this should be taken into account?

noble portal Nov 12, 2025, 4:38 PM

#

trim stream Why did you decide to only add 10 days to the test set in the data section but u...

The scoring dates remained the same during the initial phase so that the same rows are scored before and after the update.

noble portal Nov 12, 2025, 4:39 PM

#

trim stream What also makes me wonder is why you didn't include a detailed description of ev...

We want a completely algorithmic, reproducible model. We were concerned that by knowing the data, participants might, knowingly or otherwise, incorporate discretion.

trim flume Nov 12, 2025, 4:47 PM

#

Is it possible to also use external data other than that in your dataset, and do you consider this (use of external data too) with favour or not?

noble portal Nov 12, 2025, 4:50 PM

#

trim flume Is it possible to also use external data other than that in your dataset, and do...

To use additional data, we would first need to review it and approve it. We would need to make sure that the data is public and easily accessible to anyone. We would also have to make sure that it would be available until the end of the competition. Requests are possible.

faint pine Nov 12, 2025, 4:58 PM

#

noble portal To use additional data, we would first need to review it and approve it. We woul...

Hi Laurent, When will the stream start?

noble portal Nov 12, 2025, 4:59 PM

#

There is no stream, I'm just answering questions here.

faint pine Nov 12, 2025, 4:59 PM

#

noble portal There is no stream, I'm just answering questions here.

OK

faint pine Nov 12, 2025, 5:00 PM

#

noble portal There is no stream, I'm just answering questions here.

Could you answer this one Thanks for the clarification. I have one last question — and sorry to bother you. Do we expect this competition to be held every year or every two years on Kaggle? And once this one ends, is there any possibility of collaborating with participants who have demonstrated robust approaches over several years, including real out-of-sample backtesting?

noble portal Nov 12, 2025, 5:01 PM

#

faint pine Could you answer this one Thanks for the clarification. I have one last questi...

We don't plan, thus far, to run this competition again. We will be monitoring the performance of the participants and there are possibilities in the future for collaboration yes.

#

I'll continue taking questions sporadically on this channel going forward. Thank you everyone for your participation.

trim flume Nov 12, 2025, 5:03 PM

#

Thanks for your answers

faint pine Nov 12, 2025, 5:19 PM

#

noble portal We don't plan, thus far, to run this competition again. We will be monitoring th...

Please provide more clarification about the potential for collaboration. I’m quite certain that the top performers on the private leaderboard may not maintain strong results over a long period — say, five to seven years — and that may become evident in the future. Of course, we should respect the winners, but if there’s an opportunity to collaborate with participants who have demonstrated robust long-term approaches, I believe they should be given a way to validate and prove their methods — for example, by submitting reports or notebooks. It would also be valuable to provide them with a benchmark to assess whether their approaches truly add value before they start collaborating or connecting with your team.

noble portal Nov 12, 2025, 6:27 PM

#

faint pine Please provide more clarification about the potential for collaboration. I’m qui...

Sorry, I don't have more to provide than what I've already mentioned.

faint pine Nov 12, 2025, 6:33 PM

#

noble portal Sorry, I don't have more to provide than what I've already mentioned.

thank you

night crescent Nov 12, 2025, 6:45 PM

#

@noble portal could you address the questions about online learning? Specifically, how will updated training data be provided? Is the train csv updated daily?

#

I ask because I can imagine it will be important to update the model over the period when evaluation occurs.

noble portal Nov 12, 2025, 6:50 PM

#

night crescent @noble portal could you address the questions about online learning? Spe...

This is a question for Sohier on Kaggle.

night crescent Nov 12, 2025, 6:52 PM

#

noble portal This is a question for Sohier on Kaggle.

Thank you

night crescent Nov 12, 2025, 6:53 PM

#

noble portal This is a question for Sohier on Kaggle.

Is Sohier also on this discord channel?

noble portal Nov 12, 2025, 6:54 PM

#

I don't believe so.

night crescent Nov 12, 2025, 6:57 PM

#

noble portal I don't believe so.

Okay. Thanks. I know it can't be easy to keep up with the questions

stuck galleon Nov 12, 2025, 7:10 PM

#

!rank

night crescent Nov 14, 2025, 9:44 PM

#

has anyone figured out how to get updated targets during the evaluation period for online training?

#

nevermind. it's "lagged_forward_returns"

faint pine Nov 16, 2025, 6:08 PM

#

night crescent @noble portal could you address the questions about online learning? Spe...

I posted a discussion about online training, and I mentioned that if it receives enough upvotes, I will publish a tutorial notebook showing the community how to use online training during API inference. Unfortunately, the discussion started with a downvote, haha.

night crescent Nov 16, 2025, 8:01 PM

#

faint pine I posted a discussion about online training, and I mentioned that if it receives...

thanks, I figured it out: create a class with a "__call__" method, use it to store/update data, wrap it so it's a function type

rapid sphinx Nov 19, 2025, 6:47 PM

#

Hi

dull egret Nov 21, 2025, 11:29 AM

#

hello everyone, i have some problems. Should the output contain only one file named "Submission_parquet", or can it include other files (model.pkl, feature_cols.pkl, etc.)? So how do i submit? And i do not know the submission format. Is it only two elements, "date_id" and "prediction"? (i have seen this from the kaggle Data)

drowsy rain Nov 21, 2025, 2:07 PM

#

Hey everyone, I’m interested in working on this project, but before I start, I want to know more about the quality of the dataset. I previously tried the Mitsui dataset, but people on kaggle reddit community mentioned that Kagglers tend to avoid it due to poor data quality. I just want to make sure that’s not the case here. I’d appreciate any input, thanks for reading!

blazing spear Nov 23, 2025, 2:37 PM

#

train.csv Historic market data. The coverage stretches back decades; expect to see extensive missing values early on.

but the data is like 12mb??

hallow elbow Nov 27, 2025, 3:16 AM

#

hey everyone, i have a question. when our model is being called in the testing phase, will that test data include the past lagged_risk_free column? i guess the question is: should we use this column (current day's risk_free_rate) to adjust our strategy leverage level, i.e. would this column be available in each row?

hallow elbow Nov 27, 2025, 3:29 AM

#

drowsy rain Hey everyone, I’m interested in working on this project, but before I start, I w...

there are a lot of missing values, but i think most of the data is good. the data is very rich, they created a lot of features (names are covered)~~~

hallow elbow Nov 28, 2025, 9:38 PM

#

i want to ask, how is the public score evaluated? like is it the diff from xxx Sharpe, or accuracy, or what?

tropic fiber Nov 29, 2025, 3:49 AM

#

This may have already been asked, but why are the momentum-related features missing from the dataset? They are mentioned in the dataset description but are missing from train.csv

night crescent Nov 29, 2025, 2:50 PM

#

tropic fiber This may have already been asked, but why are the momentum-related features miss...

it's been asked many times. AFAIK, it has not been addressed.

young ginkgo Nov 29, 2025, 7:00 PM

#

So my notebook (more like a copied starter notebook) threw an exception at runtime despite successful running, no errors in logs, so how can i debug this?

young egret Dec 1, 2025, 12:05 AM

#

@noble portal
I have a few questions about forecasting phase: the last date_id in the train.csv is 9020.
Does that mean that the first date_id in the future test set in the forecasting phase will be 9021 (if not, how is it possible to calculate statistics which need previous values)?
If new positions are predicted one row per day, is it possible to use values from previous forecasting days (and if so, where can they be found: in an updated train.csv or somewhere else)? For example, is it possible to use values from January 2026 in February 2026?

thin root Dec 1, 2025, 5:44 AM

#

tired hazel I think the last 180 rows of the train set are being used for LB Evaluation, thu...

OK, the current LB leaders are just overfitting right?

thin root Dec 1, 2025, 5:54 AM

#

blazing spear train.csv Historic market data. The coverage stretches back decades; expect to s...

what makes you think it stretches back decades

thin root Dec 1, 2025, 9:31 AM

#

@noble portal

#

Is the submission score legit? Or is it the last 180 days of train

noble portal Dec 1, 2025, 2:24 PM

#

young egret @noble portal I have a few questions about forecasting phase: the last ...

Hi Mykyta,

I don't handle the data to date_id conversion. Kaggle's team is in charge of that, so I can't give you a specific answer on what the date_id will look like in the forecasting phase. You can use your forecast or any information from the forecasting phase via online learning. So yes, you will be able to use January 2026 data for February 2026. Online learning is also handled by Kaggle.

noble portal Dec 1, 2025, 2:25 PM

#

thin root Is the submission score legit? Or is it the last 180 days of train

The test set is 180 days from the train set.

thin root Dec 1, 2025, 2:33 PM

#

noble portal The test set is 180 days from the train set.

If I understand correctly, the test set is 180 days that are included in training?

What's the point of this? This is overfitting. I think I misunderstood you?

thin root Dec 1, 2025, 4:00 PM

#

tropic fiber This may have already been asked, but why are the momentum-related features miss...

It's there, just NaN starting out

arctic swallow Dec 1, 2025, 5:30 PM

#

https://media.discordapp.net/attachments/1444971360047726605/1445085758598938824/image1.gif?ex=692f107d&is=692dbefd&hm=94f18cd6e7350e7cc612826beb5d11a9fd125485a58ee1e39a16a03b6f9e2426&=&width=237&height=315
https://media.discordapp.net/attachments/1444971360047726605/1445085766937088000/image2.gif?ex=692f107f&is=692dbeff&hm=51e8429e6818b166e21485a613e8f0c706d64c765aefc93f65a7bcefa10907c2&=&width=864&height=1152
https://media.discordapp.net/attachments/1444971360047726605/1445085774562197535/image3.gif?ex=692f1081&is=692dbf01&hm=e520e8e4edd4eea02e82168a7059a868ea59c19d9b90c7c34402f7bb3616c76f&=&width=864&height=1152
https://media.discordapp.net/attachments/1444971360047726605/1445085781801566319/image4.gif?ex=692f1082&is=692dbf02&hm=bdc0715977fdcda4b7804916e5bfb36af1d3132f535d1b4327894a067fbfc769&=&width=725&height=907

thin root Dec 1, 2025, 6:00 PM

#

noble portal The test set is 180 days from the train set.

Why purposefully provide a useless submission score? Sorry if I come off as rude, I don't mean to. I just want to understand and don't know how else to ask.

noble portal Dec 1, 2025, 6:26 PM

#

thin root Why purposefully provide a useless submission score? Sorry to come off as rude, i...

The competition is evaluated live because that is the only fair evaluation method for the submissions. Kaggle requires some leaderboard setup by design; it also validates that your code pipeline will work.

thin root Dec 1, 2025, 6:34 PM

#

noble portal The reason the competition is evaluated live is because it is the only fair eval...

Wouldn't a fairer evaluation be a truly OOS one?

noble portal Dec 1, 2025, 6:44 PM

#

thin root Wouldn't a fairer evaluation be a truly OOS one?

Yes, that will be the forecasting phase. We also want to supply as much recent data as possible for users to tune their models. They have to be diligent about their walk-forward train/test split though.
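
For anyone unsure what a walk-forward split looks like in practice, here is a minimal sketch in plain Python. The function name, the fold count, the 180-row test window, and the assumption that date_id runs from 0 to 9020 are illustrative choices, not anything the competition prescribes. The only point is that every test window sits strictly after the rows used to train.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_rows: int,
    n_folds: int = 5,
    test_size: int = 180,
    min_train: int = 1000,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs where test always follows train in time."""
    for k in range(n_folds):
        # Folds are laid out back to back, with the last fold ending at the final row.
        test_end = n_rows - (n_folds - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start < min_train:
            continue  # not enough history before this fold
        # Expanding window: train on everything strictly before the test window.
        yield range(0, test_start), range(test_start, test_end)


# Hypothetical sizes: 9021 rows (date_id 0..9020), three folds of 180 rows.
splits = list(walk_forward_splits(n_rows=9021, n_folds=3, test_size=180))
for train_idx, test_idx in splits:
    print(f"train rows 0..{train_idx[-1]}, test rows {test_idx[0]}..{test_idx[-1]}")
```

Fitting on each train range and scoring only on the matching test range gives several out-of-sample estimates instead of one number that overlaps the training data.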

thin root Dec 1, 2025, 7:01 PM

#

noble portal Yes, that will be the forecasting phase. We also want to supply as much recent d...

Oh okay, so the forecasting phase is what is actually measured for first place

The submission score doesnt mean much

#

Thank you for your help

noble portal Dec 1, 2025, 7:02 PM

#

Yes.

thin root Dec 1, 2025, 7:02 PM

#

noble portal Yes.

Hm okay

And a question about features: are some of these features purposefully bad?

#

What are you guys trying to test

#

Features and target definition are given, but it really comes down to the features and how the competitor chooses to work with them

noble portal Dec 1, 2025, 7:06 PM

#

Nothing is intentionally bad. Finding value in the features IS the competition. The noise to signal ratio is high, so the difficulty of the competition is feature engineering to produce signal out of noise.

thin root Dec 1, 2025, 7:06 PM

#

noble portal Nothing is intentionally bad. Finding value in the features IS the competition. ...

Thanks for your insight ❤️

sand copper Dec 2, 2025, 10:38 AM

#

stupid question but how are ya all dealing with the null values

tacit fulcrum Dec 2, 2025, 10:07 PM

#

Hey, I am trying to understand this scoring function.
https://www.kaggle.com/code/metric/hull-competition-sharpe

Is submission['prediction'] supposed to be market_forward_excess_returns from csv?

#

What does position column mean in the scoring function?

#

https://www.kaggle.com/code/laurentlanteigne/hull-starter-notebook
Also in this starter code, why are we using different columns as targets for the training and test sets?

```python
def load_trainset() -> pl.DataFrame:
    """
    Loads and preprocesses the training dataset.

    Returns:
        pl.DataFrame: The preprocessed training DataFrame.
    """
    return (
        pl.read_csv(DATA_PATH / "train.csv")
        .rename({'market_forward_excess_returns':'target'})
        .with_columns(
            pl.exclude('date_id').cast(pl.Float64, strict=False)
        )
        .head(-10)
    )

def load_testset() -> pl.DataFrame:
    """
    Loads and preprocesses the testing dataset.

    Returns:
        pl.DataFrame: The preprocessed testing DataFrame.
    """
    return (
        pl.read_csv(DATA_PATH / "test.csv")
        .rename({'lagged_forward_returns':'target'})
        .with_columns(
            pl.exclude('date_id').cast(pl.Float64, strict=False)
        )
    )```

#

It does not make any sense at all.

thin root Dec 2, 2025, 11:22 PM

#

@noble portal Hi, i wanted to ask, when you guys made this competition, did you guys do it to answer a question you guys want answered or just for the love of the game

hallow elbow Dec 3, 2025, 2:36 AM

#

feels like the score is pointless

#

i assume everyone's model is not performing so well? curious about the accuracy or whatever metric you guys achieved

#

if a model can forecast 1d movement that well, then we should be very confident to go max leverage or hold 0 position, right? am i interpreting this right? because trying to overfit the best combo of position levels for the test set means nothing in a real environment?

hallow elbow Dec 3, 2025, 2:40 AM

#

thin root @noble portal Hi, i wanted to ask, when you guys made this competition, ...

i do it for fun, also bc the sponsor already created a vast number of features, including some data we don't have as retail traders, so it's kinda fun to "try" finding machine learning methods that can do some sort of prediction~~~

thin root Dec 3, 2025, 2:41 AM

#

hallow elbow feels like the score is pointless

yes score is pointless

hallow elbow Dec 3, 2025, 2:41 AM

#

if even with their data i still can't train an OK model, that means it's almost impossible for normal people to play around with fitting ML models to market data

thin root Dec 3, 2025, 2:41 AM

#

no you're wrong lol

#

normal people? dude just lock in gng

noble portal Dec 3, 2025, 4:58 PM

#

It's not easy, that's for sure. I think the main problem people run into is they are used to systematic approaches that are generic in their machine learning pipelines. This is more of a surgery type of forecasting where it's not about multi-layered, complicated machine-learning pipelines, but more about carefully crafting robust features.

thin root Dec 3, 2025, 7:05 PM

#

noble portal It's not easy, that's for sure. I think the main problem people run into is they...

So main focus is literally feature surgery

#

Can we trust that the features were crafted well?

#

Or like the entire game is filtering it

#

What's a good score metric to target? The public LB is untrustworthy, so we don't have any way to measure or proxy towards anything

noble portal Dec 3, 2025, 7:08 PM

#

A bottom-up approach is better than a top-down approach in this case.

thin root Dec 3, 2025, 7:09 PM

#

noble portal Bottom/top approach is better than top/bottom approach in this case.

What about a good score metric?

#

and our pipelines would be used for live forecasting for the leaderboard, correct?

thin root Dec 3, 2025, 7:12 PM

#

noble portal It's not easy, that's for sure. I think the main problem people run into is they...

So its more about precision than full stack kitchen sink

#

got it

hallow elbow Dec 3, 2025, 10:33 PM

#

i think the precision is the key?

thin root Dec 4, 2025, 12:17 AM

#

hallow elbow i think the precision is the key?

surgery

thin root Dec 4, 2025, 1:09 AM

#

noble portal It's not easy, that's for sure. I think the main problem people run into is they...

Yo

#

I think I got a good version, but since the LB is off, I don't know if I should continue tinkering with it or do something else

#

I wanted to ask what score metric is good?

thin root Dec 4, 2025, 2:46 AM

#

@noble portal

#

How will you guys test our bots? If i engineer more features from what was given should i put it inside predict()?

#

What would inference look like, live forecasting?

thin root Dec 4, 2025, 3:46 AM

#

@noble portal
Should we worry about inference if we were able to successfully upload a submission?

sand copper Dec 4, 2025, 3:37 PM

#

sand copper stupid question but how are ya all dealing with the null values

Just gonna keep this here in case anyone wants to answer

#

harold

formal pivot Dec 4, 2025, 4:02 PM

#

null

young egret Dec 4, 2025, 10:18 PM

#

@noble portal Do you know if the test data in the forecasting phase continues directly from the train data, or is there a gap between them? In other words, will it be possible to calculate features that depend on previous days (lagged features, rolling window features, etc.) for the first rows of the forecasting phase?

noble portal Dec 4, 2025, 10:54 PM

#

young egret @noble portal Do you know if test data in forecasting phase is continue ...

The data will continue to represent each market day continuously as it updates.

sand copper Dec 6, 2025, 12:34 PM

#

formal pivot null

Very insightful thank you

paper raven Dec 6, 2025, 1:05 PM

#

hello everyone, i have some questions about competitions in kaggle,
Submissions to this competition must be made through Notebooks. In order for the "Submit" button to be active after a commit, the following conditions must be met:

CPU Notebook <= 9 hours run-time
GPU Notebook <= 9 hours run-time
Internet access disabled
Freely & publicly available external data is allowed, including pre-trained models
Submission file must be named submission.csv

Will these be assessed based on the notebook you use to train your (pre-trained) models? What if I use another source like Colab Pro, just save the best models, then upload them to Kaggle for inference? Does that count as cheating or anything?

young egret Dec 6, 2025, 7:47 PM

#

noble portal The data will continue to represent each market day continuously as it updates.

As far as I know, the last date in train.csv is the day before the start of the competition or something close to that, so approximately September 16, 2025. The first date in the forecasting data should be December 16, 2025. So how can it continue the train data continuously if there is a 3-month gap between them? Or are some of these dates incorrect? Thanks for the answers.

thin root Dec 7, 2025, 3:36 AM

#

@noble portal

#

whats a good score metric

faint pine Dec 7, 2025, 8:15 PM

#

@noble portal Will the test period be extended if random-seed submissions lead the leaderboard?

#

Anyone who is still confused about maintaining dataset continuity during API inference, facing issues with re-training their models during API inference (‘online training’), or not knowing how to structure data for creating lag features — this is my tutorial notebook. You will find all the steps I mentioned here. Don’t forget to upvote it if you find the information helpful.

quartz ferry Dec 7, 2025, 11:57 PM

#

thin root Whats a good scoremetric to target, the public LB is untrsutable so we dont have...

in real life I think 0.7-1 is considered very good. Though for this competition, from what I am reading in discussions, anything up to 3 would be reasonable. Above that is probably severe overfitting. Though this is just speculation. A lot of unknowns here.

thin root Dec 8, 2025, 12:05 AM

#

quartz ferry in real life I think 0.7-1 is considered very good. Though for this competition,...

no way 3 is reasonable

#

3 sharpe alone is extremely good, a 3 with this score metric? damn bro

thin root Dec 8, 2025, 12:05 AM

#

quartz ferry in real life I think 0.7-1 is considered very good. Though for this competition,...

what number have you gotten

quartz ferry Dec 8, 2025, 12:06 AM

#

All over the place

#

As low as .45 to as high as 10ish I think....

#

3 is unrealistic in the real world

#

But for this test data ....

#

I'm just saying 3 might be reasonable

thin root Dec 8, 2025, 12:17 AM

#

quartz ferry I'm just saying 3 might be reasonable

do you have a background in finance or something?

#

dont mean to be rude lol but 3 is crazy high

#

its not sharpe 3

quartz ferry Dec 8, 2025, 12:17 AM

#

Yeah it is, definitely possible data leakage

thin root Dec 8, 2025, 12:18 AM

#

oh wait

quartz ferry Dec 8, 2025, 12:18 AM

#

That's what I'm looking into

thin root Dec 8, 2025, 12:18 AM

#

so

#

when you make a submission

quartz ferry Dec 8, 2025, 12:18 AM

#

Yeah

thin root Dec 8, 2025, 12:18 AM

#

that score derives from the last 180 days of the train.csv

quartz ferry Dec 8, 2025, 12:18 AM

#

Yeah

thin root Dec 8, 2025, 12:18 AM

#

thats why people overfit to 17

quartz ferry Dec 8, 2025, 12:18 AM

#

Ahh I see

thin root Dec 8, 2025, 12:18 AM

#

so you have to make your own split

quartz ferry Dec 8, 2025, 12:18 AM

#

I joined late to this comp so..

thin root Dec 8, 2025, 12:18 AM

#

same lol

quartz ferry Dec 8, 2025, 12:18 AM

#

That is helpful

#

Lol

thin root Dec 8, 2025, 12:19 AM

#

im very curious on what other people are getting with a properly done split

quartz ferry Dec 8, 2025, 12:19 AM

#

I'll let you know what I see with proper split

thin root Dec 8, 2025, 12:44 AM

#

quartz ferry I'll let you know what I see with proper split

dm me what you get and ill share mine

noble portal Dec 8, 2025, 1:40 PM

#

faint pine @noble portal Will the Test Period Be Extended if Random sub seed Lead t...

We've allowed ourselves to discard any submission that is disingenuous. So random seed allocations cannot win.

noble portal Dec 8, 2025, 1:41 PM

#

thin root whats a good score metric

Hard to tell, this is essentially bounded by what the market does. If the market has a Sharpe of -0.3 in the next 6 months and you score 0.3, that would be good. If you score 0.3 when market has a Sharpe of 0.8, that is bad.
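
To make that comparison concrete, here is a rough sketch that scores a strategy against the market it trades. It uses a plain annualized Sharpe ratio with an assumed 252 trading days, not the competition's official metric (which is described only as a variant of Sharpe), and the return and position values are made-up toy numbers.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor


def annualized_sharpe(daily_excess_returns):
    """Plain annualized Sharpe ratio: mean / sample stdev * sqrt(252)."""
    mean = statistics.fmean(daily_excess_returns)
    std = statistics.stdev(daily_excess_returns)
    return mean / std * math.sqrt(TRADING_DAYS)


def strategy_returns(positions, market_excess_returns):
    """Daily excess return of holding `position` (0 to 2) units of the market."""
    return [p * r for p, r in zip(positions, market_excess_returns)]


# Toy numbers, purely illustrative: a down-trending market and a strategy
# that happens to step aside on the losing days.
market = [0.010, -0.020, 0.005, -0.015, 0.012, -0.008]
positions = [2.0, 0.0, 1.0, 0.0, 2.0, 0.5]

market_sharpe = annualized_sharpe(market)
strat_sharpe = annualized_sharpe(strategy_returns(positions, market))
print(f"market Sharpe:   {market_sharpe:.2f}")
print(f"strategy Sharpe: {strat_sharpe:.2f}")
print(f"difference:      {strat_sharpe - market_sharpe:.2f}")
```

Running the same two functions over walk-forward test windows of train.csv gives a benchmark for each window, so a score can be judged relative to what the market did in that period.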

noble portal Dec 8, 2025, 1:42 PM

#

young egret As I know, the last date in train.csv it is the first day before the start of co...

Train will be updated up to last day of the competition, and online learning for days after.

thin root Dec 8, 2025, 2:01 PM

#

noble portal Hard to tell, this is essentially bounded by what the market does. If the market...

yeah but like on average throughout the entire history

noble portal Dec 8, 2025, 7:56 PM

#

You can find that yourself using the data.

young egret Dec 8, 2025, 7:57 PM

#

noble portal Train will be updated up to last day of the competition, and online learning for...

So, new data during this 3 months will be after the end of the competition?

noble portal Dec 8, 2025, 7:58 PM

#

the train.csv at the end of the competition will have all the dates up to the end of the competition, any new data will be available via online learning.

#

I'm working with Sohier to get data up to last week uploaded so people can do final adjustments this week.

thin root Dec 9, 2025, 12:19 AM

#

noble portal the train.csv at the end of the competition will have all the dates up to the en...

is there a way to setup online learning ?

#

so when you guys forward test our submission u guys will call predict() right?

#

and do we have to save artifacts or anything like that

thin root Dec 9, 2025, 12:41 AM

#

noble portal the train.csv at the end of the competition will have all the dates up to the en...

can i DM you? i have a question unrelated to the competition but related to the industry

quartz ferry Dec 10, 2025, 3:39 PM

#

probably been answered before, but will the final test data be scored row by row? or are we going to be able to create lags/rolling averages if the test data is loaded all at once...

quartz ferry Dec 10, 2025, 4:00 PM

#

or do we store each row by row in a csv or something and reload each instance?

pulsar apex Dec 11, 2025, 3:53 PM

#

quartz ferry probably been answered before, but will the final test data be scored row by row...

I asked it on kaggle forum, but no one answered, 0 upvotes. I have no idea either and there is no clear instruction anywhere, 4 days left

noble portal Dec 11, 2025, 5:53 PM

#

The score is aggregated from the first true out of sample observation to the last.

#

The scoring metric is a variant of the Sharpe ratio, so you can't score it using one row. Kaggle's API feeds the prediction output row by row. I can understand the confusion.
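
Since rows arrive one at a time, lag and rolling features need some history kept between calls. Below is a minimal sketch of that pattern. `predict_row`, the column name `feature_x`, the window length, and the toy allocation rule are all hypothetical, and this is not Kaggle's actual inference API; it only shows a buffer that persists across calls.

```python
from collections import deque

WINDOW = 5  # hypothetical rolling-window length

# Module-level state persists across calls, so each new row can see prior rows.
history = deque(maxlen=WINDOW)


def predict_row(row: dict) -> float:
    """Take one day's row, update the buffer, return an allocation in [0, 2]."""
    history.append(row["feature_x"])
    rolling_mean = sum(history) / len(history)
    # Toy rule: more exposure when today's value is above its rolling mean.
    signal = row["feature_x"] - rolling_mean
    allocation = 1.0 + 10.0 * signal
    return min(2.0, max(0.0, allocation))


# Simulate the row-by-row feed with made-up values.
rows = [{"feature_x": v} for v in (0.10, 0.12, 0.08, 0.30, -0.50)]
allocations = [predict_row(r) for r in rows]
print(allocations)
```

In a real notebook the buffer could be seeded with the tail of train.csv before the first call, so the rolling statistics are already warmed up on the first forecast day.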

warm scroll Dec 11, 2025, 6:56 PM

#

How many of these models submitted do you think are manipulating data leakage?

#

Open to any answers.

#

So far, my model is scratching 11.5 and I'm trying to move it up to the leakage range with just pure good design & innovation.

#

I'll be happy if my model hits 16. I am concerned however if I push too hard to get to 17.5 the model might not be adaptive enough to handle new data. I might add in some small adaptive tools to create that flexibility before final submission

thin root Dec 11, 2025, 8:32 PM

#

warm scroll I'll be happy if my model hits 16. I am concerned however if I push too hard to ...

Bro if you are at 11.5 you're already cooked

#

The leaderboard is fully leakage

warm scroll Dec 11, 2025, 8:46 PM

#

thin root Bro if you are at 11.5 you're already cooked

what makes you say that?

#

is your opinion that market daily returns are not sufficiently predictable in order to build a sharpe above 10?

#

I've engineered the model very carefully. if you're saying I'm overfitting too hard, then you may be right.

#

IMO, it would probably be much easier to build a high vol-adjusted sharpe system if we weren't forced to only allocate to the indexes or to the risk free rate alone

#

also, I find it interesting that the allocation range isn't [-2.0, 2.0]

#

s.t. the system is permitted to either short the market or leverage long on bonds

thin root Dec 11, 2025, 9:10 PM

#

warm scroll is your opinion that market daily returns are not sufficiently predictable in or...

Yes bro

#

Reliably over decades? my friend

warm scroll Dec 11, 2025, 9:13 PM

#

thin root Yes bro

what is your evidence that it can't be done? outside of it not being done (yet)

thin root Dec 11, 2025, 9:14 PM

#

warm scroll what is your evidence that it can't be done? outside of it not being done (yet)

Industry dude

thin root Dec 11, 2025, 9:15 PM

#

warm scroll I've engineered the model very carefully. if you're saying I'm overfitting too h...

overfitting too hard? You shouldn't overfit at all lol

warm scroll Dec 11, 2025, 9:15 PM

#

thin root Industry dude

I'm sure that horses and carriage drivers also thought automated vehicles were impossible as well

thin root Dec 11, 2025, 9:15 PM

#

warm scroll I'm sure that horses and carriage drivers also thought automated vehicles were i...

You know what bro

#

You're right

#

Go ahead

#

Take over the world with your 10 sharpe

#

When you get 1st place I'll congratulate you, friend.

warm scroll Dec 11, 2025, 9:15 PM

#

you seem somewhat hostile

#

I hope your day goes better

thin root Dec 11, 2025, 9:16 PM

#

warm scroll you seem somewhat hostile

ur gonna regret it lol

#

im telling u its not a valid score

#

ur better off working on a valid score than focusing on public LB

#

public LB isnt valid, it says that on the competition website

#

No shade bro im warning you if anything

warm scroll Dec 11, 2025, 9:18 PM

#

what do you think makes a score valid?

#

I like our conversation. I don't want you to feel like I'm attacking your ideas, I just love to learn why people see things differently

thin root Dec 11, 2025, 9:18 PM

#

warm scroll I like our conversation. I don't want you to feel like I'm attacking your ideas,...

Bet lets go at it then lol

warm scroll Dec 11, 2025, 9:19 PM

#

let's say I built a model with 3.0. that's pretty neat. why not push for 3.1?

thin root Dec 11, 2025, 9:19 PM

#

warm scroll what do you think makes a score valid?

So like if you fit hard on the training data and then test the last 180 rows of that, its gonna look good

thin root Dec 11, 2025, 9:19 PM

#

warm scroll let's say I built a model with 3.0. that's pretty neat. why not push for 3.1?

There is a ceiling that you will hit

warm scroll Dec 11, 2025, 9:19 PM

#

I'm running the score on the whole dataset

thin root Dec 11, 2025, 9:19 PM

#

There is a limit on how much alpha you can extract from the features

thin root Dec 11, 2025, 9:19 PM

#

warm scroll I'm running the score on the whole dataset

Thats even worse lol

#

You train it and test it on the whole dataset?

#

Cmon bro

warm scroll Dec 11, 2025, 9:20 PM

#

I know what you're talking about. it's not a good practice in general.

thin root Dec 11, 2025, 9:20 PM

#

warm scroll I know what you're talking about. it's not a good practice in general.

Thats the issue, its never ever a good practice

#

You dont win the competition based on public leaderboard

thin root Dec 11, 2025, 9:21 PM

#

warm scroll I know what you're talking about. it's not a good practice in general.

Whats the point of a trading bot if it cannot work on real markets

warm scroll Dec 11, 2025, 9:21 PM

#

I use the public LB to let me know maximal performance

thin root Dec 11, 2025, 9:21 PM

#

warm scroll I use the public LB to let me know maximal performance

That doesnt help you

warm scroll Dec 11, 2025, 9:21 PM

#

why do you limit performance?

thin root Dec 11, 2025, 9:21 PM

#

🤣

#

My friend

#

This isnt performance limiting

#

So your model will work on real markets then

warm scroll Dec 11, 2025, 9:22 PM

#

if you have a model that scores 3.0, why do you not push for more?

thin root Dec 11, 2025, 9:22 PM

#

warm scroll if you have a model that scores 3.0, why do you not push for more?

That isnt my point

#

Your model got a 10

warm scroll Dec 11, 2025, 9:22 PM

#

well, it is my point

thin root Dec 11, 2025, 9:22 PM

#

Your point is not that?

warm scroll Dec 11, 2025, 9:22 PM

#

what I'm trying to understand from you is why you don't aim higher

thin root Dec 11, 2025, 9:22 PM

#

I argued that a 10 is overfitting and isnt valid, you said why limit performance?

thin root Dec 11, 2025, 9:22 PM

#

warm scroll what I'm trying to understand from you is why you don't aim higher

Nobody is saying dont aim higher

#

Im talking about how you applied your model

#

Is that pushing for more? Or is that straight up overfitting

warm scroll Dec 11, 2025, 9:23 PM

#

don't make assumptions about how I built it, just think about goals

thin root Dec 11, 2025, 9:23 PM

#

warm scroll don't make assumptions about how I built it, just think about goals

You told me how you built it

#

There is no assumption

warm scroll Dec 11, 2025, 9:23 PM

#

I did not

thin root Dec 11, 2025, 9:23 PM

#

Yes you did

#

You ran the score on the entire dataset

#

To do that you also must train on the entire dataset

warm scroll Dec 11, 2025, 9:24 PM

#

OK, go ahead and lay out my layer architecture for me

#

and tell me how I built my loss function

thin root Dec 11, 2025, 9:24 PM

#

thin root You ran the score on the entire dataset

Here you go

thin root Dec 11, 2025, 9:24 PM

#

warm scroll and tell me how I built my loss function

Does not matter

warm scroll Dec 11, 2025, 9:24 PM

#

hmm, I disagree

thin root Dec 11, 2025, 9:24 PM

#

Also if you want to push for more and get a 17 i can tell you how

thin root Dec 11, 2025, 9:24 PM

#

warm scroll hmm, I disagree

let me ask you a more important question

#

Do you think i have bad intentions?

warm scroll Dec 11, 2025, 9:24 PM

#

no, but I think you're uninformed

thin root Dec 11, 2025, 9:25 PM

#

So you think I have good intentions

warm scroll Dec 11, 2025, 9:25 PM

#

yes

thin root Dec 11, 2025, 9:25 PM

#

Got it

#

good to lay that out

#

Can you confirm that you trained and tested on the entire dataset?

warm scroll Dec 11, 2025, 9:25 PM

#

I know where you're going with this and you don't need to explain why that's bad.

thin root Dec 11, 2025, 9:26 PM

#

warm scroll I know where you're going with this and you don't need to explain why that's bad...

Do you think i have bad intentions

warm scroll Dec 11, 2025, 9:26 PM

#

Nope.

thin root Dec 11, 2025, 9:26 PM

#

warm scroll I know where you're going with this and you don't need to explain why that's bad...

So wherever this is going is being done is done with good intents

warm scroll Dec 11, 2025, 9:26 PM

#

fair point, go ahead

thin root Dec 11, 2025, 9:26 PM

#

warm scroll fair point, go ahead

If you trained and tested on the entire dataset, your score isnt valid, thats not performance limiting

#

Its kind of like cheating on a test and claiming to be smart, you know? The idea of a test is to test your skills on unseen information

thin root Dec 11, 2025, 9:27 PM

#

thin root Its kind of like cheating on a test and claiming to be smart, you know? The idea...

Then going to the teacher and saying why cant i cheat? I got a 100, you are limiting my performance

warm scroll Dec 11, 2025, 9:28 PM

#

Hmm, I think you don't quite understand, nothing you said is wrong but you don't conceptualize what I'm doing

#

I don't care about being top of the public LB

thin root Dec 11, 2025, 9:28 PM

#

warm scroll Hmm, I think you don't quite understand, nothing you said is wrong but you don't...

Can you confirm that you trained and tested on the entire dataset?

warm scroll Dec 11, 2025, 9:29 PM

#

Yes, I did so

thin root Dec 11, 2025, 9:29 PM

#

thats what i mean

pulsar apex Dec 11, 2025, 9:29 PM

#

bro estimates performance on train dataset, how do you guys decide to spend time on kaggle without listening to a single ml lecture?

thin root Dec 11, 2025, 9:29 PM

#

warm scroll Yes, I did so

im saying doing that

#

isnt valid at all

#

its not just " not practical " its nothing at all

warm scroll Dec 11, 2025, 9:30 PM

#

there's info you don't have

thin root Dec 11, 2025, 9:30 PM

#

warm scroll there's info you don't have

Ok

#

Do you plan on applying your model to live markets

warm scroll Dec 11, 2025, 9:31 PM

#

I wouldn't be able to, because I don't have the exact methodology to extract the data in the manner that Hull has done it

thin root Dec 11, 2025, 9:31 PM

#

warm scroll I wouldn't be able to, because I don't have the exact methodology to extract the...

In terms of the competition

#

Obviously

warm scroll Dec 11, 2025, 9:31 PM

#

I'm going to do my absolute best to try to win.

thin root Dec 11, 2025, 9:31 PM

#

Good luck bro

#

Its a journey

#

When the day comes and you want to learn more you can always contact me, i love talking about ML

warm scroll Dec 11, 2025, 9:32 PM

#

thanks. it's a lot of fun overall. I'll take you up on that at some point. Do you like to discuss model architectures and alternative learning structures i.e. innovation in new layers?

#

I could show you some of my recent designs

regal basin Dec 11, 2025, 9:53 PM

#

Hello everyone, I have implemented a solution based on the principles of Marcos López de Prado's book ‘Advances in Financial Machine Learning’, but I have noticed that many candidates have simply used "brute force", employing XGBoost + LightGBM + CatBoost + Optuna. In your opinion, which solution would be best? In this type of competition, is one method better than another? Thank you to those who take the time to respond.

thin root Dec 11, 2025, 10:50 PM

#

warm scroll thanks. it's a lot of fun overall. I'll take you up on that at some point. Do yo...

Yeah

#

Dm

#

I’m curious to know

thin root Dec 11, 2025, 10:50 PM

#

regal basin Hello everyone, I have implemented a solution based on the principles of Marco L...

I’ve read the book, what solution?

#

lol i definitely didn’t read it in and out

thin root Dec 11, 2025, 10:52 PM

#

regal basin Hello everyone, I have implemented a solution based on the principles of Marco L...

Definitely not optuna haha

#

Overfitting

regal basin Dec 11, 2025, 11:25 PM

#

thin root I’ve read the book, what solution?

I used meta-labelling, sample weighting, purged K-fold CV, triple barrier, MDA...

thin root Dec 11, 2025, 11:31 PM

#

regal basin I used meta-labelling, sample weighting, purged K-fold CV, triple barrier, MDA.....

How do you use triple barrier for this ? There isn’t OHLCV

#

I’m not too sure

#

There’s forward returns

#

You use forward returns

#

?

pulsar apex Dec 11, 2025, 11:46 PM

#

thin root How do you use triple barrier for this ? There isn’t OHLCV

you could make target for your model with TBM, for example

pulsar apex Dec 11, 2025, 11:50 PM

#

warm scroll So far, my model is scratching 11.5 and I'm trying to move it up to the leakage ...

Check how 2.35 sharpe strategy performs compared to SP500, 1243% returns against 107% in past 5.5 years
https://quantiacs.com/leaderboard/23

thin root Dec 11, 2025, 11:55 PM

#

pulsar apex Check how 2.35 sharpe strategy performs compared to SP500, 1243% returns agains...

💰💰💰💰

thin root Dec 11, 2025, 11:56 PM

#

pulsar apex Check how 2.35 sharpe strategy performs compared to SP500, 1243% returns agains...

what is this website ?

#

can anyone compete? are there prizes?

pulsar apex Dec 11, 2025, 11:57 PM

#

thin root what is this website ?

well, someone on Kaggle mentioned better LB for quant competition, I went to check, found this website

thin root Dec 11, 2025, 11:57 PM

#

pulsar apex well, someone on Kaggle mentioned better LB for quant competition, I went to che...

oh

#

Yo i got a question

#

I’ve personally had a strategy ok

#

Backtest was 8 sharpe

#

I ran it live, retraining it weekly

#

I got 4.68 sharpe over 100 trades ish

#

Valid

thin root Dec 11, 2025, 11:59 PM

#

thin root I got 4.68 sharpe over 100 trades ish

This is luck?

#

Or regime luck

#

Where it happened to fit well on regime

pulsar apex Dec 12, 2025, 12:00 AM

#

who knows, if its profitable after commissions against buy-n-hold - just use it, but keep a reasonable stop loss

thin root Dec 12, 2025, 12:00 AM

#

thin root Backtest was 8 sharpe

Yes i found data leakage

thin root Dec 12, 2025, 12:00 AM

#

pulsar apex who knows, if its profitable after commissions against buy-n-hold - just use it,...

Ok

gentle oak Dec 12, 2025, 7:27 PM

#

Hi, I can't find anywhere in the rules how to generate the submission parquet file. I think it does it for you if you call the api, but it says I had a submission scoring error. What should my final dataframe look like coming out of the predict function?

warm scroll Dec 12, 2025, 8:35 PM

#

gentle oak Hi, I can't find anywhere in the rules how to generate the submission parquet fi...

out = pd.DataFrame({"prediction": preds})
return out

preds are your allocations in this case
make sure that preds are clipped between 0.0 and 2.0
add a print line like print(out) to verify inside of the predict() function and run the cell to verify your output
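
A small sketch of the clipping step in plain Python, in case it helps. The function name and the choice to map NaN to a neutral 1.0 are assumptions for illustration, not anything the competition specifies. The resulting list would be passed as `preds` into the DataFrame shown above.

```python
import math

MIN_ALLOC, MAX_ALLOC = 0.0, 2.0  # allowed exposure range, i.e. 0% to 200%


def clip_allocations(raw, fallback=1.0):
    """Clip raw model outputs into [0, 2]; replace NaN with a fallback value."""
    clipped = []
    for value in raw:
        if math.isnan(value):
            value = fallback  # assumption: fall back to plain market exposure
        clipped.append(min(MAX_ALLOC, max(MIN_ALLOC, value)))
    return clipped


preds = clip_allocations([-0.3, 0.7, 2.4, float("nan")])
print(preds)  # [0.0, 0.7, 2.0, 1.0]
```

Handling NaN explicitly matters because `min` and `max` comparisons with NaN can silently pass it through, which would likely end in a scoring error.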

warm scroll Dec 15, 2025, 6:55 AM

#

@noble portal is it possible for me to update my submissions with the newest versions of my models?

maiden moon Dec 15, 2025, 12:28 PM

#

@noble portal Hi, I noticed that train.csv hasn't been updated to the latest data yet. Just to confirm: when the hidden test set evaluation starts on Dec 16, will I be able to access the historical data from early December through Dec 15 by reading train.csv via online learning?

noble portal Dec 15, 2025, 2:06 PM

#

maiden moon @noble portal Hi, I noticed that train.csv hasn't been updated to the la...

The train.csv will have the data up to December 15th (today) and the rest of the dates will be available via online learning.

noble portal Dec 15, 2025, 2:07 PM

#

warm scroll @noble portal is it possible for me to update my submissions with the ne...

Yes, there are 10h left in the competition.

thin root Dec 15, 2025, 2:20 PM

#

noble portal Yes, there is 10h left in the competition.

Will the leaderboard update live?

#

Or would we only know how its doing at the end of the forecasting phase

noble portal Dec 15, 2025, 2:21 PM

#

I believe it should update once a month with new data.

pulsar apex Dec 16, 2025, 12:00 AM

#

meh, Kaggle closed sub choosing before 0:00

thin root Dec 16, 2025, 5:00 AM

#

f

warm scroll Dec 16, 2025, 8:50 PM

#

it's time to begin

#

lets GO!