#hull-tactical-market-prediction
Yeah I'm not clicking those
Hi, I am a little confused by the competition description. In the forecasting phase, is the model retrained on the most recent data? For example, on day 10 of the forecasting phase, can the model access the data from day 1 to day 10? Or is the model only trained before the forecasting period, so that it sees one day of data at a time and cannot leverage data from past days? Thanks @noble portal
Hi, I'm new and still learning. Is it normal to have difficulty getting a positive R²? I'm using GA to improve feature selection, but even so, the Sharpe ratio increases while the R² remains close to zero or lower.
You will be able to use the latest data via online learning.
Yes, predicting market returns generally involves a very low R².
Hi! Anyone in for teaming up?
Hi everyone, I'm new to the competition and need some basic clarification: what's the target column that we are predicting? I see the evaluation metric is the Sharpe ratio. Does that mean we are using ML to predict the level of S&P holding (between 0 and 2)? Thanks 😃
The goal is to use the dataset to get a daily exposure to the market between 0-200% in order to maximize some variant of Sharpe Ratio.
yeah, i get that, but we can't use ML to output the portfolio balance level, so the direct target is likely a return? then we work out the 0-200% level according to the Sharpe ratio (the evaluation metric). is that the general idea?
quick question about market_forward_excess_returns: it says "Train set only". does that mean it will be missing in the actual test file, and missing during the "Forecasting Timeline", so we shouldn't use it as a feature?
The columns available during the live phase are in the test dataset.
Of course you can
hello everyone i have posted an exploratory data analysis going through the data, analysing the target market_forward_excess_returns and suggesting some modelling ideas in this discussion : https://www.kaggle.com/competitions/hull-tactical-market-prediction/discussion/610981
You can either train a model to maximise the sharpe ratio score using the position size (0 - 2) directly, but it might be easier to train a model to predict excess_returns, and then use that to pick a position size which you will submit
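A minimal sketch of the second route mentioned above: take a predicted excess return and squash it into an allowed position between 0 and 2. The linear mapping, the `scale` parameter, and the name `to_position` are illustrative assumptions, not anything taken from the competition materials.

```python
def to_position(predicted_excess_return: float, scale: float = 100.0) -> float:
    """Map a predicted excess return to an exposure in [0, 2].

    `scale` is a hypothetical tuning knob: it sets how strongly the
    position reacts to the size of the prediction.
    """
    raw = 1.0 + scale * predicted_excess_return  # 1.0 = fully invested baseline
    return min(2.0, max(0.0, raw))               # clip to the allowed range


print(to_position(0.0))    # 1.0 (no view, stay at market weight)
print(to_position(0.02))   # 2.0 (strong positive view, capped)
print(to_position(-0.02))  # 0.0 (strong negative view, floored)
```

How aggressive `scale` should be is exactly the kind of thing that would need to be tuned on a walk-forward split rather than on the public leaderboard.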
Having issues submitting. Anyone have advice?
Also, I opened train.csv to take a look, and all the features except the D columns have missing values up to around row 1000. Is this a mistake?
Cool! I had a quick look at the covariance matrix for the features (including the lagged ones in the test data) and it looks like there is a zero eigenvalue. Did you see this in your analysis?
yes, but i only went as far as VIF (Variance Inflation Factor) filtering to drop features with very high multicollinearity (VIF > 10). i think i also mentioned briefly that many of these features might be redundant and lead to worse models overall if they are all used for training
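For readers unfamiliar with VIF filtering, here is a rough sketch using only NumPy. It regresses each column on the others, computes VIF = 1 / (1 - R²), and drops the worst column until every VIF is at or below 10. The iterative drop-one-at-a-time loop and the helper names `vif` and `drop_high_vif` are assumptions for illustration; the poster's actual procedure is not shown in the chat. It also assumes the NaN rows have already been handled.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the others plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        ss_tot = float(((y - y.mean()) ** 2).sum())
        if ss_tot == 0.0:            # constant column: treat as fully redundant
            out[j] = np.inf
            continue
        r2 = 1.0 - float(resid @ resid) / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_high_vif(X: np.ndarray, names, threshold: float = 10.0):
    """Repeatedly drop the column with the largest VIF until all are <= threshold."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        scores = vif(X)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Synthetic demo: c is almost exactly a + b, so one of the three must go.
rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
c = a + b + 1e-3 * rng.normal(size=500)
X = np.column_stack([a, b, c])
reduced, kept = drop_high_vif(X, ["a", "b", "c"])
print(kept)
```

A near-zero eigenvalue in the covariance matrix, as mentioned in the previous message, is the same symptom seen from another angle: some column is (almost) a linear combination of the others.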
the whole purpose of that eda was to shed some light on the data structure how to use it for the end goal (allocation) and how the features interact with the target (excess returns)
as the competition guys themselves mentioned, simpler models tend to perform better, which is what i'm also noticing after deploying a few models with different levels of complexity
you could ignore every one of the features they provided, go on just the daily returns, calculate the std yourself, and you'd still get a running model that can beat some of the more complex ones
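One way to read the "returns and std only" idea is simple volatility targeting: hold more when recent realized volatility is low and less when it is high. The sketch below is an interpretation, not the poster's code; the 20-day window, the 1% daily target, and the function name are all hypothetical choices.

```python
import numpy as np


def vol_target_positions(returns, window: int = 20, target_vol: float = 0.01) -> np.ndarray:
    """Exposure for day t, computed only from returns strictly before day t."""
    returns = np.asarray(returns, dtype=float)
    positions = np.ones(len(returns))          # default exposure until enough history
    for t in range(window, len(returns)):
        sigma = returns[t - window:t].std(ddof=1)
        if sigma > 0:
            positions[t] = np.clip(target_vol / sigma, 0.0, 2.0)
    return positions


# Synthetic demo: a calm regime followed by a volatile one.
rng = np.random.default_rng(1)
calm = rng.normal(0.0, 0.005, 100)
wild = rng.normal(0.0, 0.03, 100)
positions = vol_target_positions(np.concatenate([calm, wild]))
print(positions[40:100].mean(), positions[140:].mean())
```

In the live setting the history would have to be built up from whatever lagged return column the API provides, since the same-day return is not known at prediction time.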
anybody want to team up ?
someone ban this guy
In this competition, what are the target columns? Only these three, right?
forward_returns
risk_free_rate
market_forward_excess_returns
we can treat this as a multi-input, multi-output regression problem! correct me if I'm wrong.
Thank you !
@noble portal
The columns you mention are very useful to build your daily position, but the goal of the competition is not to predict a target column but to build a daily leverage. Please read the competition details.
"D* - Dummy/Binary features" -- what exactly is meant by "Dummy features"?
Its value is either 0 or 1.
how are we supposed to submit our notebook without internet?
Hey everyone, quick question about the date_id format in the training data. It looks like a sequential integer counter (e.g., 8980, 8981, 8982), but converting it with pd.to_datetime() defaults to the Unix Epoch (1970-01-01). Can someone confirm if the date_id is just an anonymized counter or if it maps to actual calendar dates? I want to be sure before deciding on creating calendar-based features or merging in external data. Thanks!
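A quick way to reproduce the behaviour described above: `pd.to_datetime` treats bare integers as nanoseconds since the Unix epoch, so small counters all land on 1970-01-01. The sample values are the ones quoted in the message; whether `date_id` maps to real calendar dates is not answered by this snippet.

```python
import pandas as pd

date_ids = pd.Series([8980, 8981, 8982])

# Integers are interpreted as nanoseconds since 1970-01-01 by default,
# so every value ends up within the first few microseconds of the epoch.
as_dates = pd.to_datetime(date_ids)
print(as_dates.dt.date.unique())

# Treating date_id as an ordinal trading-day counter avoids that trap:
# check that it simply increases by one per row.
is_sequential = bool((date_ids.diff().dropna() == 1).all())
print(is_sequential)
```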
The prediction range is between 0 and 2. Does that mean shorting is not allowed?
Hi, I am Rahul Raj Sirapuram. This is my linkedin:
www.linkedin.com/in/wbcoder
If anyone is interested to team up for:
https://www.kaggle.com/competitions/hull-tactical-market-prediction
Feel free to ping me here or linkedin. We will discuss and edit together. My kaggle id:
roadrashfifa21
Mon - Fri from 15:00 to 17:00 we will work together to do what we can regarding the competition.
This is the current state of my notebook:
https://www.kaggle.com/code/roadrashfifa21/hull-tactical-market-prediction-demo-submission
Hi, I have read the details. So, betting strategy must be a dataframe that says long short or flat for each row of the dataframe along with the predicted market_forward_excess_returns?
Outta curiosity, what kinda rmse are y’all getting? I’m having trouble getting below 0.01 tbh
Also. Is it just me or does the test set seem to be a little bit funky and not quite behave like the rest of the data set?
Probs just me tbh
Look at the starter notebook to get an idea of the output format.
Hi, am I missing something or is this competition not telling us what each column means? If that's indeed the case, why????
Maybe a bit of a dumb question but does predict function return the first or the last date of the trading interval? Like, what date_id is targeted when it is called? I don't think this information is specified anywhere..
hi, do we have any information on private evaluation, for now the last date_id is 8989, should we expect 8990 and so on for final results?
The date_id for private evaluation will be from the start to the end of the live phase.
Likely the date_id also doesn't really need to be sequential on the test server, correct?
I was trying to replicate the leaderboard score locally and ran into a pretty wild discrepancy. Essentially, I used the default gateway to produce a submission.parquet, then I used the following code:
```python
from argparse import ArgumentParser
from pathlib import Path

import pandas as pd

from hull_challenge.data import determine_data_dir
from hull_challenge.score import score


def main(
    submission_path: Path,
):
    data_dir = determine_data_dir()
    df = pd.read_csv(data_dir / 'train.csv', index_col="date_id")
    submission = pd.read_parquet(submission_path)
    submission = submission.set_index("date_id")
    # Note: this mask compares the index with itself, so it keeps every train row.
    solution = df[df.index.isin(df.index)][["risk_free_rate", "forward_returns"]]
    submission_score = score(
        solution,
        submission,
        ''
    )
    print(f"Submission score: {submission_score}")


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--submission-path", required=True, type=Path)
    args = parser.parse_args()
    main(submission_path=args.submission_path)
```
Where the determine_data_dir returns '/kaggle/input/hull-tactical-market-prediction/' on Kaggle, and the score function is the exact same as in their Metric notebook. I got a score of 0.002 locally and a score of 1.2 on the leaderboard. I verified that the predictions are indeed the same as well. What could be going on here?
I will raise this in the Kaggle forums but this is essentially a bad state when we simply can't validate our ideas because the LB does some unhinged things in the background that we do not know
Hi @noble portal, quick question:
In the forecasting phase, will train.csv continue to grow with new dates so we can re-train daily, or is it frozen after the submission deadline?
Look for data description
The safe assumption is to use only the data you have unless stated.
Online learning will allow you to update your model with the new incoming data.
Hi @noble portal , I tried to replicate locally the lb score. I just return a constant 0.7 in the predict function, in lb I get 0.467, however, if run it locally I get 0.592. I use exactly the last 180 rows in the training set, and exactly the same score function provided officially. Is there anything I misunderstand?
We've recently updated the data. Could that have been the cause?
I have noticed the updated training set, and it is not the cause.
Hi everyone, i’m a physicist looking for teammates for this competition. I have spent 3 months writing and just published marketML (https://github.com/Microcosmos22/TradeBot_public/tree/main) a python package covering the whole process from data acquisition to training LSTM machines on historical crypto data as well as implementing them in trading strategies. I have some interesting insights and would like to discuss with a data scientist.
yoo
Could you please clarify how that would work in practice? Do you plan to run the evaluation daily and grow the train.csv accordingly, or would the user need to manually handle the incoming data and append it to the existing training set? That would be very useful to know in order to design the notebook, and in particular to know how best to handle momentum-based features, which rely on historical data. Thanks!
Hey guys is data leakage in test set still an issue? Like can I use all train rows for submission now?
Why do you want to train with the evaluation data?
Sorry it maybe wasn’t very clear, but I meant continuously tuning/retraining the model based on historical data, i.e. for evaluating the model on the current day t using all the data until t-1. Of course that can already be done in the current training phase, but I’m not sure how it would work once we get to the forecasting phase
Wrote you a PM
I think the last 180 rows of the train set are being used for LB Evaluation, thus you should not use them to train, or you will end up overfitting like the current LB leaders
Hi
@everyone
💬 My Conceptual & Deep Learning Questions for the Kaggle AI Agents Livestream
1️⃣ As AI agents become increasingly autonomous and goal-driven, how can we design incentive systems that keep their long-term behavior aligned with human ethics instead of just short-term reward optimization?
2️⃣ As AI agents get more goal-driven through incentive structures, how do we make sure their reward optimization doesn’t conflict with the broader ethical or social alignment we expect?
And if an agent gains the ability to modify its own mechanisms, how can we formally guarantee that these self-changes stay consistent with human-aligned objectives and don’t create unintended behaviors over time?
3️⃣ When agents start self-modifying their reasoning or learning mechanisms, what kind of formal or mathematical safeguards can ensure their updated versions remain predictable, stable, and still aligned with human objectives?
4️⃣ In multi-agent systems, how can we balance cooperation and autonomy—so that agents don’t end up competing or working against each other while still staying efficient and independent?
5️⃣ With the rise of deep reinforcement learning in multi-agent setups, how can we prevent emergent adversarial behaviors that arise from agents learning implicit competition through shared environments?
6️⃣ How can transformer-based architectures be adapted for continual learning within agent frameworks—so that agents can retain past knowledge while still adapting to new contexts without catastrophic forgetting?
✨ Thanks to the Kaggle and Google teams for hosting such an insightful course and livestream.
Really looking forward to hearing the experts’ thoughts on long-term safety, alignment, and the next wave of deep learning–driven agent architectures. 🚀
The current LB is meaningless. Users should take care of the train/test split evaluation on their own. The data provided to the user will contain all information up to that date, to be used to predict the next day's return.
Ok thank you
hey dears! anyone who needs a teammate? I want to join.
This has been mentioned a few times...but the detail on how seems to be lacking. Is train.csv updated? How is new train data served to the notebook?
Hi
Is the Hull Tactical Market Prediction Competition live Ask Me Anything (AMA) session scheduled here?
I'm here for the next hour, if anyone has questions about the competition.
Hey Laurent, have you tried deep learning? Want to share ideas, experiences?
I am at beginner level. Would it be possible to have a complete example of the required submission file?
How do you see the future of DL for your work? Is it going to be an important direction to explore?
We have for a few of our internal projects, but not specifically for this problem. The low signal-to-noise ratio, the non-stationary nature of the data, and the small sample make this not best suited for deep learning. We welcome new ideas on that front though.
If we ever move to trading at higher frequency for capturing intraday alpha, we may consider using DL.
(How) do you try to detect regime changes?
I am a student and this is my first challenge. I am kinda confused about what the real target is.
Is it forward_returns, which I then use to calculate my risk level of 0, 1 or 2? Or do we use risk_free_rate and market_forward_excess_return as well?
Depends on what kind of regime change we are trying to detect, but we hope to capture changes in dynamics through volatility indicators.
The confusion is understandable. We have a few sets of columns that represent future returns. These are necessary in order to evaluate the performance of the trading strategy. What we are actually asking is that you incorporate the information as best you can in order to produce a daily signal between 0 and 200% for your exposure to the S&P 500. This exposure combined with the returns will be your strategy's return, and it's the cumulative set of returns that is being scored.
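To make the relationship between exposure and strategy return concrete, here is a small sketch. It assumes the uninvested (or borrowed) part of the portfolio earns the risk-free rate, and it computes a plain annualized Sharpe ratio. The official metric is described in the chat only as "some variant" of Sharpe, so this is a stand-in for building intuition, not a replica of the scoring code; the function names are hypothetical.

```python
import math
import statistics


def strategy_returns(positions, forward_returns, risk_free_rates):
    """Daily strategy return for an exposure p in [0, 2].

    Assumption: p of the capital earns the market return and the
    remaining (1 - p) earns (or pays, if p > 1) the risk-free rate.
    """
    out = []
    for p, r, rf in zip(positions, forward_returns, risk_free_rates):
        if not 0.0 <= p <= 2.0:
            raise ValueError(f"position {p} is outside [0, 2]")
        out.append(p * r + (1.0 - p) * rf)
    return out


def annualized_sharpe(returns, risk_free_rates, periods_per_year: int = 252) -> float:
    """Plain Sharpe ratio of daily excess returns (not the competition's variant)."""
    excess = [r - rf for r, rf in zip(returns, risk_free_rates)]
    sd = statistics.stdev(excess)
    if sd == 0:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.mean(excess) / sd


# Toy numbers, made up for the example.
fwd = [0.010, -0.020, 0.015, 0.003]
rf = [0.0001] * 4
rets = strategy_returns([1.5, 0.5, 2.0, 1.0], fwd, rf)
print(annualized_sharpe(rets, rf))
```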
I noticed that people write the predict function in the submission notebook differently... How frequently is the notebook called in the test phase? Is it day-by-day prediction, or is it expected to predict the whole array (e.g. 180 days) at once?
The API feeds one row at a time, but the score will be from the cumulative daily performance from December 15th 2025 to June 16th 2026.
Do you have in mind a particular theoretical (or empirical) model from the literature that supports your competition (challenge)?
Is it allowed to do online training in the submission notebook (e.g. for lagged features)?
Yes, you can ask Sohier on Kaggle about details.
Is this metric code still up to date? We plan to expand our training pipeline to include the kaggle evaluation in order to directly measure actual performance. https://www.kaggle.com/code/metric/hull-competition-sharpe
The metric will be the same for the duration of the competition.
Here is a well cited paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=962461
Others that could be of interest:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=948309
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1747345
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5035294
Okay, so just to check I understand it right: for testing we get the features and the lagged versions of (forward_returns, market_forward_excess_return, risk_free_rate),
and then we can try to predict (forward_returns, market_forward_excess_return, risk_free_rate) if we need them for the calculation of the daily signal?
Why did you decide to only add 10 days to the test set in the data section but use the last 180 days for the actual score calculation?
What also makes me wonder is why you didn't include a detailed description of every feature? Is there a reason why this should be taken into account?
The scoring dates remained the same during the initial phase so that same rows are being scored on before and after the update.
We want a completely algorithmic, reproducible model. We were concerned that by knowing the data, participants might, knowingly or otherwise, incorporate discretion.
Is it possible to also use external data beyond what is in your dataset, and do you view this (the use of external data) favourably or not?
To use additional data, we would first need to review it and approve it. We would need to make sure that the data is public and easily accessible to anyone. We would also have to make sure that it would be available until the end of the competition. Requests are possible.
Hi Laurent, When will the stream start?
There is no stream, I'm just answering questions here.
Could you answer this one? Thanks for the clarification. I have one last question, and sorry to bother you. Do we expect this competition to be held every year or every two years on Kaggle? And once this one ends, is there any possibility of collaborating with participants who have demonstrated robust approaches over several years, including real out-of-sample backtesting?
We don't plan, thus far, to run this competition again. We will be monitoring the performance of the participants and there are possibilities in the future for collaboration yes.
I'll continue taking questions sporadically on this channel going forward. Thank you everyone for your participation.
Thanks for your answers
Please provide more clarification about the potential for collaboration. I’m quite certain that the top performers on the private leaderboard may not maintain strong results over a long period — say, five to seven years — and that may become evident in the future. Of course, we should respect the winners, but if there’s an opportunity to collaborate with participants who have demonstrated robust long-term approaches, I believe they should be given a way to validate and prove their methods — for example, by submitting reports or notebooks. It would also be valuable to provide them with a benchmark to assess whether their approaches truly add value before they start collaborating or connecting with your team.
Sorry, I don't have more to provide than what I've already mentioned.
thank you
@noble portal could you address the questions about online learning? Specifically, how will updated training data be provided? Is the train csv updated daily?
I ask because I can imagine it will be important to update the model over the period when evaluation occurs.
This is a question for Sohier on Kaggle.
Thank you
Is Sohier also on this discord channel?
I don't believe so.
Okay. Thanks. I know it can't be easy to keep up with the questions
!rank
has anyone figured out how to get updated targets during the evaluation period for online training?
nevermind. it's "lagged_forward_returns"
I posted a discussion about online training, and I mentioned that if it receives enough upvotes, I will publish a tutorial notebook showing the community how to use online training during API inference. Unfortunately, the discussion started with a downvote, haha.
thanks, I figured it out: create a class with a "call" method, use it to store/update data, wrap it so it's a function type
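The pattern described above (a class with a call method that stores and updates data, wrapped as a plain function) could look roughly like this. The row is modelled as a plain dict and the trading rule is a toy; the real gateway's argument types and the class name `OnlinePredictor` are assumptions. Only the column name `lagged_forward_returns` comes from the chat.

```python
import math
from collections import deque


class OnlinePredictor:
    """Keeps a rolling history between calls so lag/rolling features are possible."""

    def __init__(self, window: int = 20):
        self.history = deque(maxlen=window)   # most recent realized returns

    def __call__(self, row: dict) -> float:
        # The previous day's realized return arrives with today's row.
        lagged = row.get("lagged_forward_returns")
        if lagged is not None and not math.isnan(lagged):
            self.history.append(float(lagged))
        if len(self.history) < 2:
            return 1.0                         # not enough history: market weight
        mean = sum(self.history) / len(self.history)
        return 1.5 if mean > 0 else 0.5        # toy momentum rule, for illustration only


_predictor = OnlinePredictor()


def predict(row: dict) -> float:
    """Plain function wrapper, for callers that expect a function rather than an object."""
    return _predictor(row)


# Simulate the API feeding one row per trading day.
for day, lagged in enumerate([0.004, 0.002, -0.03]):
    print(day, predict({"date_id": day, "lagged_forward_returns": lagged}))
```

Because the state lives on the object, the same place could also hold a model that is refit every so often on the accumulated rows.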
Hi
hello everyone, i have some problems. should the output contain only one file named "Submission_parquet", or can it include other files (model.pkl, feature_cols.pkl, etc.)? so how do i submit? i also do not know the submission format. is it only two elements, "date_id" and "prediction"? (i have seen this on the kaggle Data page)
Hey everyone, I’m interested in working on this project, but before I start, I want to know more about the quality of the dataset. I previously tried the Mitsui dataset, but people on kaggle reddit community mentioned that Kagglers tend to avoid it due to poor data quality. I just want to make sure that’s not the case here. I’d appreciate any input, thanks for reading!
train.csv Historic market data. The coverage stretches back decades; expect to see extensive missing values early on.
but the data is like 12mb??
hey everyone, i have a question. when our model is called in the testing phase, will the test data include the lagged_risk_free column? i guess the question is whether we should use this column (current day's risk_free_rate) to adjust our strategy's leverage level, i.e. would this column be available in each row?
there are a lot of missing values, but i think most of the data is good. the data is very rich, they created a lot of features (the names are masked)~~~
i want to ask, how is the public score evaluated? is it the difference from some Sharpe value, or accuracy, or what?
This may have already been asked, but why are the momentum-related features missing from the dataset? They are mentioned in the dataset description but are missing from train.csv
it's been asked many times. AFAIK, it has not been addressed.
So my notebook (more like a copied starter notebook) threw an exception at runtime despite successful running, no errors in logs, so how can i debug this?
@noble portal
I have a few questions about forecasting phase: the last date_id in the train.csv is 9020.
Does that mean that the first date_id in the future test set in the forecasting phase will be 9021 (if not, how is it possible to calculate statistics that need previous values)?
If new positions are predicted one row per day, is it possible to use values from previous forecasting days (and if so, where can they be found: in an updated train.csv or somewhere else)? For example, is it possible to use values from January 2026 in February 2026?
OK, the current LB leaders are just overfitting right?
what makes you think it stretches decades
Hi Mykyta,
I don't handle the data to dateid conversion. That will be Kaggle's team that is in charge of that so I can't give you a specific answer of what the dateid will be like for the forecasting phase. You can use your forecast or any information from the forecasting phase via online learning. So yes, you will be able to use January 2026 data for February 2026. Online learning is also handled by Kaggle.
The test set is 180 days from the train set.
If i understand correctly, the test set is 180 days that are included in training
What's the point of this? this is overfitting, i think i misunderstood you?
It's there, just NaN starting out
https://media.discordapp.net/attachments/1444971360047726605/1445085758598938824/image1.gif?ex=692f107d&is=692dbefd&hm=94f18cd6e7350e7cc612826beb5d11a9fd125485a58ee1e39a16a03b6f9e2426&=&width=237&height=315
https://media.discordapp.net/attachments/1444971360047726605/1445085766937088000/image2.gif?ex=692f107f&is=692dbeff&hm=51e8429e6818b166e21485a613e8f0c706d64c765aefc93f65a7bcefa10907c2&=&width=864&height=1152
https://media.discordapp.net/attachments/1444971360047726605/1445085774562197535/image3.gif?ex=692f1081&is=692dbf01&hm=e520e8e4edd4eea02e82168a7059a868ea59c19d9b90c7c34402f7bb3616c76f&=&width=864&height=1152
https://media.discordapp.net/attachments/1444971360047726605/1445085781801566319/image4.gif?ex=692f1082&is=692dbf02&hm=bdc0715977fdcda4b7804916e5bfb36af1d3132f535d1b4327894a067fbfc769&=&width=725&height=907
Why purposefully provide a useless submission score? Sorry if I come off as rude, I don't mean to. I just want to understand and don't know how to ask
The reason the competition is evaluated live is that it is the only fair evaluation method for the submissions. Kaggle requires some leaderboard setup by design; it also validates that your code pipeline will work.
Wouldn't a fairer evaluation be a truly OOS one?
Yes, that will be the forecasting phase. We also want to supply as much recent data as possible for users to tune their models. They have to be diligent about their walk-forward train/test split though.
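A bare-bones sketch of a walk-forward split with an expanding training window, where every test block comes strictly after the rows used for training. Holding out the last 180 rows follows the suggestion made earlier in the chat that those rows drive the public leaderboard; the row counts, fold sizes, and function name are made up for the example.

```python
def walk_forward_splits(n_rows: int, initial_train: int, test_size: int):
    """Yield (train_indices, test_indices) with an expanding training window."""
    start = initial_train
    while start + test_size <= n_rows:
        yield range(0, start), range(start, start + test_size)
        start += test_size


n_rows = 1000          # hypothetical number of rows in the training file
holdout = 180          # rows kept aside, never used for fitting or tuning
usable = n_rows - holdout

folds = list(walk_forward_splits(usable, initial_train=400, test_size=100))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Scores averaged over folds like these give a more honest picture than a single number computed on rows the model has already seen.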
Oh okay, so the forecasting phase is what is actually measured in the first place
The submission score doesn't mean much
Thank you for your help
Yes.
Hm okay
And a question about features: are some of these features purposefully bad?
What are you guys trying to test?
Features and target definition are given, but it really comes down to the features and how the competitor chooses to work with them
Nothing is intentionally bad. Finding value in the features IS the competition. The noise to signal ratio is high, so the difficulty of the competition is feature engineering to produce signal out of noise.
Thanks for your insight ❤️
stupid question but how are ya all dealing with the null values
Hey, I am trying to understand this scoring function.
https://www.kaggle.com/code/metric/hull-competition-sharpe
Is submission['prediction'] supposed to be market_forward_excess_returns from csv?
What does position column mean in the scoring function?
https://www.kaggle.com/code/laurentlanteigne/hull-starter-notebook
Also in this starter code, why are we using different columns as targets for the training and test sets?
```python
def load_trainset() -> pl.DataFrame:
    """
    Loads and preprocesses the training dataset.

    Returns:
        pl.DataFrame: The preprocessed training DataFrame.
    """
    return (
        pl.read_csv(DATA_PATH / "train.csv")
        .rename({'market_forward_excess_returns':'target'})
        .with_columns(
            pl.exclude('date_id').cast(pl.Float64, strict=False)
        )
        .head(-10)
    )


def load_testset() -> pl.DataFrame:
    """
    Loads and preprocesses the testing dataset.

    Returns:
        pl.DataFrame: The preprocessed testing DataFrame.
    """
    return (
        pl.read_csv(DATA_PATH / "test.csv")
        .rename({'lagged_forward_returns':'target'})
        .with_columns(
            pl.exclude('date_id').cast(pl.Float64, strict=False)
        )
    )
```
It does not make any sense at all.
@noble portal Hi, i wanted to ask, when you guys made this competition, did you guys do it to answer a question you guys want answered or just for the love of the game
feels like the score is pointless
i assume everyone's model is not performing so well? curious about the accuracy or whatever metric you guys achieved
if a model can forecast the 1d movement that well, then we should be very confident to go max leverage or hold a 0 position, right? am i interpreting this right? because trying to overfit the best combination of position levels for the test set means nothing in a real environment?
i do it for fun, also bc the sponsor already created a vast number of features, including some data we don't have as retail traders, so it's kind of fun to "try" finding machine learning methods that can do some sort of prediction~~~
yes, the score is pointless
if even with their data i still can't train an OK model, that means it's almost impossible for normal people to play around with fitting ML models to market data
It's not easy, that's for sure. I think the main problem people run into is they are used to systematic approaches that are generic in their machine learning pipelines. This is more of a surgery type of forecasting, where it's not about multi-layered, complicated machine-learning pipelines but more about carefully crafting robust features.
So main focus is literally feature surgery
Can we trust that the features were crafted well?
Or like the entire game is filtering it
What's a good score metric to target? The public LB is untrustworthy, so we don't have any way to measure or proxy towards anything
Bottom/top approach is better than top/bottom approach in this case.
What about a good scoremetric?
and our pipelines would be used for live forecasting for the leaderboard, correct?
So its more about precision than full stack kitchen sink
got it
i think the precision is the key?
surgery
Yo
I think i got a good version, but since the LB is off, I dont know if i should continue tinkering at it or do something
I wanted to ask what scoremetric is good?
@noble portal
How will you guys test our bots? If i engineer more features from what was given, should i put that inside predict()?
What would inference look like, live forecasting?
@noble portal
Should we worry about inference if we were able to successfully upload a submission?
Just gonna keep it here in case anyone wants to answer
@noble portal Do you know whether the test data in the forecasting phase continues directly from the train data, or is there a gap between them? In other words, will it be possible to calculate features that depend on previous days (lagged features, rolling-window features, etc.) for the first rows of the forecasting phase?
The data will continue to represent each market day continuously as it updates.
Very insightful thank you
hello everyone, i have some questions about competitions in kaggle,
Submissions to this competition must be made through Notebooks. In order for the "Submit" button to be active after a commit, the following conditions must be met:
CPU Notebook <= 9 hours run-time
GPU Notebook <= 9 hours run-time
Internet access disabled
Freely & publicly available external data is allowed, including pre-trained models
Submission file must be named submission.csv
will these be assessed based on the notebook you use to train your (pre-trained) models? what if i use another source like colab pro, just save the best models, then upload them to kaggle for inference? does that count as cheating or anything?
As far as I know, the last date in train.csv is the day before the start of the competition or something close to that, so approximately September 16, 2025. The first date in the forecasting data should be December 16, 2025. So how can it continue the train data continuously if there is a 3-month gap between them? Or are some of these dates incorrect? Thanks for the answers
@noble portal Will the test period be extended if random-seed submissions lead the leaderboard?
Anyone who is still confused about maintaining dataset continuity during API inference, facing issues with re-training their models during API inference (‘online training’), or not knowing how to structure data for creating lag features — this is my tutorial notebook. You will find all the steps I mentioned here. Don’t forget to upvote it if you find the information helpful.
in real life I think 0.7-1 is considered very good. Though for this competition, anything up to 3 is what I am reading in discussions as being reasonable. Above that is probably severe overfitting. Though, this is just speculation. A lot of unknowns here.
no way 3 is reasonable
3 sharpe alone is extremely good, a 3 with this score metric? damn bro
what number have you gotten
All over the place
As low as .45 to as high as 10ish I think....
3 is unrealistic in the real world
But for this test data ....
I'm just saying 3 might be reasonable
do you have a background in finance or something?
dont mean to be rude lol but 3 is crazy high
its not sharpe 3
Yeah it is, definitely possible data leakage
oh wait
That's what I'm looking into
Yeah
that score derives from the last 180 days of the train.csv
Yeah
thats why people overfit to 17
Ahh I see
so you have to make your own split
I joined late to this comp so..
same lol
im very curious on what other people are getting with a properly done split
I'll let you know what I see with proper split
dm me what you get and ill share mine
We've allowed ourselves to discard any submission that is disingenuous. So random seed allocations cannot win.
Hard to tell, this is essentially bounded by what the market does. If the market has a Sharpe of -0.3 in the next 6 months and you score 0.3, that would be good. If you score 0.3 when market has a Sharpe of 0.8, that is bad.
Train will be updated up to last day of the competition, and online learning for days after.
yeah but like on average throughout the entire history
You can find that yourself using the data.
So, new data during this 3 months will be after the end of the competition?
the train.csv at the end of the competition will have all the dates up to the end of the competition, any new data will be available via online learning.
I'm working with Sohier to get data up to last week uploaded so people can do final adjustments this week.
is there a way to setup online learning ?
so when you guys forward test our submission u guys will call predict() right?
and do we have to save artifacts or anything like that
can i DM you? i have a question unrelated to the competition but related to the industry
probably been answered before, but will the final test data be scored row by row? or are we going to be able to create lags/rolling averages if the test data is loaded all at once...
or do we store each row by row in a csv or something and reload each instance?
I asked it on kaggle forum, but no one answered, 0 upvotes. I have no idea either and there is no clear instruction anywhere, 4 days left
The score is aggregated from the first true out of sample observation to the last.
The scoring metric is a variant of the Sharpe ratio, so you can't score it using one row. Kaggle's API just feeds the prediction output row by row. I can understand the confusion.
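On the lags/rolling averages question: a minimal sketch of keeping a small in-memory history across row-by-row calls. It assumes Python state persists between calls inside one notebook run, which is not confirmed anywhere in this thread. `RollingHistory` is a hypothetical helper.

```python
from collections import deque


class RollingHistory:
    """Keeps the last `window` values seen and returns their running mean."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=window)

    def update(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


# Simulate rows arriving one at a time, as they would inside predict().
history = RollingHistory(window=3)
rolling_means = [history.update(v) for v in [1.0, 2.0, 3.0, 4.0]]
```

The same pattern works for lags: read `history.values[-2]` before appending the new row, once enough rows have arrived.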
How many of the submitted models do you think are exploiting data leakage?
Open to any answers.
So far, my model is scratching 11.5 and I'm trying to move it up to the leakage range with just pure good design & innovation.
I'll be happy if my model hits 16. I am concerned, however, that if I push too hard to get to 17.5, the model might not be adaptive enough to handle new data. I might add in some small adaptive tools to create that flexibility before final submission
Bro if you are at 11.5 you're already cooked
The leaderboard is fully leakage
what makes you say that?
is it your opinion that daily market returns are not predictable enough to build a Sharpe above 10?
I've engineered the model very carefully. if you're saying I'm overfitting too hard, then you may be right.
IMO, it would probably be much easier to build a high vol-adjusted sharpe system if we weren't forced to only allocate to the indexes or to the risk free rate alone
also, I find it interesting that the allocation range isn't [-2.0, 2.0]
s.t. the system is permitted to either short the market or leverage long on bonds
Yes bro
Reliably over decades? my friend
what is your evidence that it can't be done? outside of it not being done (yet)
Industry dude
overfitting too hard? You shouldn't overfit at all lol
I'm sure that horse and carriage drivers also thought automated vehicles were impossible
You know what bro
You're right
Go ahead
Take over the world with your 10 sharpe
When you get 1st place I'll congratulate you, friend.
ur gonna regret it lol
im telling u its not a valid score
ur better off working on a valid score than focusing on public LB
public LB isnt valid, it says that on the competition website
No shade bro im warning you if anything
what do you think makes a score valid?
I like our conversation. I don't want you to feel like I'm attacking your ideas, I just love to learn why people see things differently
Bet lets go at it then lol
let's say I built a model with 3.0. that's pretty neat. why not push for 3.1?
So like if you fit hard on the training data and then test the last 180 rows of that, its gonna look good
There is a ceiling that you will hit
I'm running the score on the whole dataset
There is a limit on how much alpha you can extract from the features
Thats even worse lol
You train it and test it on the whole dataset?
Cmon bro
I know what you're talking about. it's not a good practice in general.
Thats the issue, its never ever a good practice
You dont win the competition based on public leaderboard
Whats the point of a trading bot if it cannot work on real markets
I use the public LB to let me know maximal performance
That doesnt help you
why do you limit performance?
🤣
My friend
This isnt performance limiting
So your model will work on real markets then
if you have a model that scores 3.0, why do you not push for more?
That isnt my point
Your model got a 10
well, it is my point
Your point is not that?
what I'm trying to understand from you is why you don't aim higher
I argued that a 10 is overfitting and isnt valid, you said why limit performance?
Nobody is saying dont aim higher
Im talking about how you applied your model
Is that pushing for more? Or is that straight up overfitting
don't make assumptions about how I built it, just think about goals
You told me how you built it
There is no assumption
I did not
Yes you did
You ran the score on the entire dataset
To do that you also must train on the entire dataset
OK, go ahead and lay out my layer architecture for me
and tell me how I built my loss function
Here you go
Does not matter
hmm, I disagree
Also if you want to push for more and get a 17 i can tell you how
let me ask you a more important question
Do you think i have bad intentions?
no, but I think you're uninformed
So you think I have good intentions
yes
Got it
good to lay that out
Can you confirm that you trained and tested on the entire dataset?
I know where you're going with this and you don't need to explain why that's bad.
Do you think i have bad intentions
Nope.
So wherever this is going, it's being done with good intent
fair point, go ahead
If you trained and tested on the entire dataset, your score isnt valid, thats not performance limiting
Its kind of like cheating on a test and claiming to be smart, you know? The idea of a test is to test your skills on unseen information
Then going to the teacher and saying why cant i cheat? I got a 100, you are limiting my performance
Hmm, I think you don't quite understand, nothing you said is wrong but you don't conceptualize what I'm doing
I don't care about being top of the public LB
Can you confirm that you trained and tested on the entire dataset?
Yes, I did so
thats what i mean
bro estimates performance on train dataset, how do you guys decide to spend time on kaggle without listening to a single ml lecture?
im saying doing that
isnt valid at all
its not just " not practical " its nothing at all
there's info you don't have
Ok
Do you plan on applying your model to live markets
I wouldn't be able to, because I don't have the exact methodology to extract the data in the manner that Hull has done it
In terms of the competition
Obviously
I'm going to do my absolute best to try to win.
Good luck bro
Its a journey
When the day comes and you want to learn more, you can always contact me. I love talking about ML
thanks. it's a lot of fun overall. I'll take you up on that at some point. Do you like to discuss model architectures and alternative learning structures i.e. innovation in new layers?
I could show you some of my recent designs
Hello everyone, I have implemented a solution based on the principles of Marcos López de Prado's book 'Advances in Financial Machine Learning', but I have noticed that many participants have simply used "brute force", employing XGBoost + LightGBM + CatBoost + Optuna. In your opinion, which solution would be best? In this type of competition, is one method better than another? Thank you to those who take the time to respond.
Yeah
Dm
I’m curious to know
I’ve read the book, what solution?
lol i definitely didn’t read it in and out
Definitely not optuna haha
Overfitting
I used meta-labelling, sample weighting, purged K-fold CV, triple barrier, MDA...
How do you use triple barrier for this? There isn’t OHLCV
I’m not too sure
There’s forward returns
You use forward returns
?
you could make the target for your model with TBM (the triple-barrier method), for example
Check how a 2.35 Sharpe strategy performs compared to the S&P 500: 1243% returns against 107% over the past 5.5 years
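Of the techniques listed above, purged K-fold CV is the easiest to sketch. This is a simplified version that drops a fixed gap of `purge` rows on both sides of each test fold. The book's version purges based on overlapping label windows and adds a separate embargo, so treat this as an approximation. `purged_kfold` is a hypothetical helper.

```python
def purged_kfold(n_samples, n_splits=5, purge=5):
    """Yield (train_idx, test_idx) for contiguous folds with a gap around each test fold."""
    fold_size = n_samples // n_splits
    for k in range(n_splits):
        start = k * fold_size
        # The last fold absorbs any remainder rows.
        stop = n_samples if k == n_splits - 1 else start + fold_size
        test_idx = list(range(start, stop))
        # Drop `purge` rows immediately before and after the test block.
        train_idx = [
            i for i in range(n_samples)
            if i < start - purge or i >= stop + purge
        ]
        yield train_idx, test_idx


splits = list(purged_kfold(n_samples=100, n_splits=5, purge=3))
```

The gap matters when features or targets span several days, since rows right next to the test fold would otherwise share information with it.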
https://quantiacs.com/leaderboard/23
💰💰💰💰
what is this website ?
can anyone compete? are there prizes?
well, someone on Kaggle mentioned a better LB for quant competitions, I went to check, found this website
oh
Yo i got a question
I’ve personally had a strategy ok
Backtest was 8 sharpe
I ran it live, retraining it weekly
I got 4.68 sharpe over 100 trades ish
Valid
This is luck?
Or regime luck
Where it happened to fit well on regime
who knows, if its profitable after commissions against buy-n-hold - just use it, but keep a reasonable stop loss
Yes i found data leakage
Ok
Hi, I can't find anywhere in the rules how to generate the submission parquet file. I think it does it for you if you call the api, but it says I had a submission scoring error. What should my final dataframe look like coming out of the predict function?
out = pd.DataFrame({"prediction": preds})
return out
preds are your allocations in this case
make sure that preds are clipped between 0.0 and 2.0
add a print line like print(out) to verify inside of the predict() function and run the cell to verify your output
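A small sketch of the clipping step, in case it helps. The [0.0, 2.0] bounds come from the messages above. Falling back to 0.0 for missing values is an assumption, not a competition rule, and `clip_allocation` is a hypothetical helper.

```python
import math


def clip_allocation(x, low=0.0, high=2.0, fallback=0.0):
    """Clamp one raw model output into the allowed allocation range."""
    # NaN/None would otherwise slip through min/max and break scoring.
    if x is None or math.isnan(x):
        return fallback
    return min(max(x, low), high)


raw_preds = [2.7, -0.3, 1.2, float("nan")]
preds = [clip_allocation(p) for p in raw_preds]
```

`preds` can then go into the "prediction" column exactly as in the snippet above.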
@noble portal is it possible for me to update my submissions with the newest versions of my models?
@noble portal Hi, I noticed that train.csv hasn't been updated to the latest data yet. Just to confirm: when the hidden test set evaluation starts on Dec 16, will I be able to access the historical data from early December through Dec 15 by reading train.csv via online learning?
The train.csv will have the data up to December 15th (today) and the rest of the dates will be available via online learning.
Yes, there is 10h left in the competition.
Will the leaderboard update live?
Or would we only know how its doing at the end of the forecasting phase
I believe it should update once a month with new data.
meh, Kaggle closed submission selection before 0:00
f