#hull-tactical-market-prediction


radiant crag
#

Glad to see this data looks much better than Mitsui

stray stump
#

Yeah I'm not clicking those

spiral geyser
#

Hi, I am a little confused by the competition description. In the forecasting phase, will the model be retrained on the most recent data? For example, on day 10 of the forecasting phase, can the model access the data from day 1 to day 10? Or is the model trained only before the forecasting period, seeing one day of data at a time without being able to use data from past days? Thanks @noble portal

fathom finch
#

Hi, I'm new and still learning. Is it normal to have difficulty getting a positive R²? I'm using GA to improve feature selection, but even so, the Sharpe ratio increases while the R² remains close to zero or lower.

noble portal
noble portal
dark vector
#

Hi! Anyone in for teaming up?

hallow elbow
#

Hi everyone, I'm new to the competition and need some basic clarification: what's the target column that we are predicting? I see the evaluation metric is the Sharpe ratio. Does that mean we are using ML to predict the level of S&P holding (between 0 and 2)? Thanks 😃

noble portal
hallow elbow
hallow elbow
#

Quick question about market_forward_excess_returns: it says "Train set only". Does that mean it will be missing in the actual test file? And will it be missing during the "Forecasting Timeline", so we shouldn't use it as a feature?

noble portal
#

The columns available during the live phase are in the test dataset.

viscid zinc
ripe briar
cosmic grove
#

Having issues submitting. Anyone have advice?

cosmic grove
#

Also, I opened train.csv to take a look, and all the features except for the D columns have missing values up to around row 1000. Is this a mistake?

stray stump
viscid zinc
#

the whole purpose of that eda was to shed some light on the data structure, how to use it for the end goal (allocation), and how the features interact with the target (excess returns)
as the competition guys themselves mentioned, simpler models tend to perform better, which is what i'm also noticing after deploying a few models with different levels of complexity

#

you could skip every one of the features they provided, go based on just the daily returns and calculate the std yourself, and you'd still get a running model that can beat some of the more complex ones
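
A minimal sketch of that idea, for illustration only: scale the allocation by the inverse of recent realized volatility and clip it to the allowed 0 to 2 range. The function name, the 20-day window and the target volatility are assumptions made up for this example, not values from the competition or from anyone's actual model.

```python
import numpy as np
import pandas as pd


def vol_scaled_allocation(returns: pd.Series, window: int = 20,
                          target_vol: float = 0.01) -> pd.Series:
    """Allocation in [0, 2] that shrinks when recent realized vol is high.

    `window` and `target_vol` are illustrative assumptions, not tuned values.
    """
    # Rolling std of past daily returns; shift(1) so day t only uses data up to t-1.
    realized_vol = returns.rolling(window).std().shift(1)
    raw = target_vol / realized_vol
    # Before the window fills up there is no estimate: fall back to 1.0 (fully invested).
    return raw.fillna(1.0).clip(lower=0.0, upper=2.0)


# Synthetic returns stand in for the real series.
rng = np.random.default_rng(0)
daily_returns = pd.Series(rng.normal(0.0003, 0.01, size=500))
allocation = vol_scaled_allocation(daily_returns)
print(allocation.describe())
```

The `shift(1)` is the part that matters most here: without it, the allocation for a day would already include that day's return in its volatility estimate.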

final sage
#

anybody want to team up ?

boreal oracle
#

someone ban this guy

crystal axle
#

In this competition, what are the target columns? Only these three, right?
forward_returns
risk_free_rate
market_forward_excess_returns
We can treat this as a multi-input, multi-output regression problem. Correct me if I'm wrong.
Thank you!
@noble portal

noble portal
tropic fiber
#

"D* - Dummy/Binary features" -- what exactly is meant by "Dummy features"?

noble portal
versed skiff
#

How are we supposed to submit our notebook without internet?

humble gulch
#

Hey everyone, quick question about the date_id format in the training data. It looks like a sequential integer counter (e.g., 8980, 8981, 8982), but converting it with pd.to_datetime() defaults to the Unix Epoch (1970-01-01). Can someone confirm if the date_id is just an anonymized counter or if it maps to actual calendar dates? I want to be sure before deciding on creating calendar-based features or merging in external data. Thanks!
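
For anyone hitting the same thing, a small sketch of why the epoch shows up: `pd.to_datetime` reads plain integers as nanoseconds since 1970-01-01 by default. This does not settle whether date_id maps to real calendar dates; it only shows that the conversion is not meaningful and that the column can still be used as an ordinal counter.

```python
import pandas as pd

date_ids = pd.Series([8980, 8981, 8982], name="date_id")

# Integers are read as nanoseconds since 1970-01-01, so every value lands
# within the first few microseconds of the Unix epoch.
as_timestamps = pd.to_datetime(date_ids)
print(as_timestamps.dt.year.unique())

# Treating date_id as an ordinal counter still supports lag-style features.
gaps = date_ids.diff().dropna()
print(gaps.eq(1).all())
```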

feral cradle
#

The prediction range is between 0 and 2. Does that mean shorting is not allowed?

wet cosmos
wet cosmos
modest trench
#

Outta curiosity, what kinda rmse are y’all getting? I’m having trouble getting below 0.01 tbh

#

Also. Is it just me or does the test set seem to be a little bit funky and not quite behave like the rest of the data set?

#

Probs just me tbh

noble portal
feral cradle
#

Hi, am I missing something or is this competition not telling us what each column means? If that's indeed the case, why????

coarse swallow
#

Maybe a bit of a dumb question but does predict function return the first or the last date of the trading interval? Like, what date_id is targeted when it is called? I don't think this information is specified anywhere..

true goblet
#

Hi, do we have any information on the private evaluation? For now the last date_id is 8989. Should we expect 8990 and so on for the final results?

noble portal
#

The date_id for private evaluation will be from the start to the end of the live phase.

coarse swallow
#

Likely the date_id also doesn't really need to be sequential on the test server, correct?

coarse swallow
#

I was trying to replicate the leaderboard score locally and I ran into a pretty wild discrepancy. Essentially I used the default gateway to produce a submission.parquet, then I used the following code:

from argparse import ArgumentParser
from pathlib import Path

import pandas as pd

from hull_challenge.data import determine_data_dir
from hull_challenge.score import score


def main(
        submission_path: Path,
):
    data_dir = determine_data_dir()

    df = pd.read_csv(data_dir / 'train.csv', index_col="date_id")
    submission = pd.read_parquet(submission_path)
    submission = submission.set_index("date_id")

    solution = df[df.index.isin(df.index)][["risk_free_rate", "forward_returns"]]

    submission_score = score(
        solution,
        submission,
        ''
    )
    print(f"Submission score: {submission_score}")


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--submission-path", required=True, type=Path)

    args = parser.parse_args()
    main(submission_path=args.submission_path)

Where the determine_data_dir returns '/kaggle/input/hull-tactical-market-prediction/' on Kaggle, and the score function is the exact same as in their Metric notebook. I got a score of 0.002 locally and a score of 1.2 on the leaderboard. I verified that the predictions are indeed the same as well. What could be going on here?
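
One thing worth checking in the snippet above: `df.index.isin(df.index)` is true for every row, so `solution` covers all of train.csv and not only the dates in the submission. Below is a hedged sketch on toy data of filtering by the submission's date_ids instead. The column names follow the snippet; whether this accounts for the whole gap is not something the thread establishes.

```python
import pandas as pd

# Toy stand-ins for train.csv and submission.parquet.
train = pd.DataFrame(
    {
        "date_id": [1, 2, 3, 4, 5],
        "risk_free_rate": [0.0001] * 5,
        "forward_returns": [0.01, -0.02, 0.005, 0.0, 0.003],
    }
).set_index("date_id")

submission = pd.DataFrame(
    {"date_id": [4, 5], "prediction": [1.0, 0.5]}
).set_index("date_id")

# Always-true mask: keeps every training row.
all_rows = train[train.index.isin(train.index)]

# Mask restricted to the dates that were actually predicted.
solution = train.loc[train.index.isin(submission.index),
                     ["risk_free_rate", "forward_returns"]]

print(len(all_rows), len(solution))
```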

#

I will raise this in the Kaggle forums but this is essentially a bad state when we simply can't validate our ideas because the LB does some unhinged things in the background that we do not know

wary jungle
#

Hi @noble portal, quick question:

In the forecasting phase, will train.csv continue to grow with new dates so we can re-train daily, or is it frozen after the submission deadline?

leaden isle
#

The safe assumption is to use only the data you have unless stated.

noble portal
stone crystal
#

Hi @noble portal, I tried to replicate the LB score locally. I just return a constant 0.7 in the predict function. On the LB I get 0.467, but if I run it locally I get 0.592. I use exactly the last 180 rows in the training set and exactly the same score function provided officially. Is there anything I'm misunderstanding?

noble portal
stone crystal
tired hazel
#

Hi everyone, i’m a physicist looking for teammates for this competition. I have spent 3 months writing and just published marketML (https://github.com/Microcosmos22/TradeBot_public/tree/main) a python package covering the whole process from data acquisition to training LSTM machines on historical crypto data as well as implementing them in trading strategies. I have some interesting insights and would like to discuss with a data scientist.

spark wasp
#

yoo

vital flint
# noble portal Online learning will allow you to update your model with the new incoming data.

Could you please clarify how that would work in practice? Do you plan to run the evaluation daily and grow the train.csv accordingly, or would the user need to manually handle the incoming data and append it to the existing training set? That would be very useful to know to design the notebook, and in particular to know how best to handle momentum-based features which rely on historical data. Thanks!

hidden geyser
#

Hey guys is data leakage in test set still an issue? Like can I use all train rows for submission now?

tired hazel
vital flint
# tired hazel Why do you want to train with the evaluation data?

Sorry it maybe wasn’t very clear, but I meant continuously tuning/retraining the model based on historical data, i.e. for evaluating the model on the current day t using all the data until t-1. Of course that can already be done in the current training phase, but I’m not sure how it would work once we get to the forecasting phase

tired hazel
#

Wrote you a PM

tired hazel
mossy swift
#

Hi

proud pivot
#

@everyone
💬 My Conceptual & Deep Learning Questions for the Kaggle AI Agents Livestream

1️⃣ As AI agents become increasingly autonomous and goal-driven, how can we design incentive systems that keep their long-term behavior aligned with human ethics instead of just short-term reward optimization?

2️⃣ As AI agents get more goal-driven through incentive structures, how do we make sure their reward optimization doesn’t conflict with the broader ethical or social alignment we expect?
And if an agent gains the ability to modify its own mechanisms, how can we formally guarantee that these self-changes stay consistent with human-aligned objectives and don’t create unintended behaviors over time?

3️⃣ When agents start self-modifying their reasoning or learning mechanisms, what kind of formal or mathematical safeguards can ensure their updated versions remain predictable, stable, and still aligned with human objectives?

4️⃣ In multi-agent systems, how can we balance cooperation and autonomy—so that agents don’t end up competing or working against each other while still staying efficient and independent?

5️⃣ With the rise of deep reinforcement learning in multi-agent setups, how can we prevent emergent adversarial behaviors that arise from agents learning implicit competition through shared environments?

6️⃣ How can transformer-based architectures be adapted for continual learning within agent frameworks—so that agents can retain past knowledge while still adapting to new contexts without catastrophic forgetting?

✨ Thanks to the Kaggle and Google teams for hosting such an insightful course and livestream.
Really looking forward to hearing the experts’ thoughts on long-term safety, alignment, and the next wave of deep learning–driven agent architectures. 🚀

noble portal
gritty valley
#

Hey all! Does anyone need a teammate? I want to join.

night crescent
tame rover
#

Hi

trim flume
#

Is the Hull Tactical Market Prediction Competition live Ask Me Anything (AMA) session scheduled here?

noble portal
#

I'm here for the next hour, if anyone has questions about the competition.

pale mauve
#

Hey Laurent, have you tried deep learning? Want to share ideas, experiences?

trim flume
#

I am at a beginner level. Would it be possible to get a complete example of the required submission file?

pale mauve
#

How do you see the future of DL for your work? Is it going to be an important direction to explore?

noble portal
noble portal
pale mauve
#

(How) do you try to detect regime changes?

fierce pumice
#

I am a student and this is my first challenge. I am kind of confused about what the real target is.
Is it forward_returns, which I then use to calculate my risk (0, 1 or 2)? Or do we use risk_free_rate and market_forward_excess_returns as well?

noble portal
noble portal
# fierce pumice I am a student and this is my first challange, I am kinda confused, what the rea...

The confusion is understandable. We have a few sets of columns that represent future returns. These are necessary in order to evaluate the performance of the trading strategy. What we are actually asking is for you to incorporate the information as best you can to produce a daily signal between 0 and 200% for your exposure to the S&P 500. This exposure combined with the returns will be your strategy's return, and it's the cumulative set of returns that is being scored.
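
A rough sketch of how a daily exposure could turn into a strategy return, under stated assumptions: the part of the portfolio not in the index earns the risk-free rate, and exposure above 1 borrows at that same rate. The function names are made up for illustration, and the Sharpe calculation is the plain textbook version, not the competition's official metric.

```python
import numpy as np
import pandas as pd


def strategy_returns(position: pd.Series, forward_returns: pd.Series,
                     risk_free_rate: pd.Series) -> pd.Series:
    """Daily return of holding `position` (0 to 2) in the index.

    Assumption: the remainder (1 - position) earns the risk-free rate, and a
    position above 1 borrows at that same rate.
    """
    position = position.clip(lower=0.0, upper=2.0)
    return position * forward_returns + (1.0 - position) * risk_free_rate


def annualized_sharpe(returns: pd.Series, risk_free_rate: pd.Series,
                      periods_per_year: int = 252) -> float:
    """Plain Sharpe ratio of excess returns; not the competition's variant."""
    excess = returns - risk_free_rate
    return float(excess.mean() / excess.std() * np.sqrt(periods_per_year))


# Toy numbers, not real market data.
fwd = pd.Series([0.010, -0.020, 0.005, 0.012])
rf = pd.Series([0.0001, 0.0001, 0.0001, 0.0001])
pos = pd.Series([1.0, 0.0, 2.0, 0.5])

strat = strategy_returns(pos, fwd, rf)
print(strat.tolist())
print(annualized_sharpe(strat, rf))
```

Note how the second day shows the point of the exercise: with a position of 0 the strategy sidesteps the -2% market move and just collects the risk-free rate.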

trim stream
#

I noticed that people write the predict function in the submission notebook differently... How frequently is the notebook called in the test phase? Is it a day-by-day prediction? Or is it expected to predict the whole array (e.g. 180 days) at once?

noble portal
trim flume
#

Do you have in mind a particular theoretical (or empirical) model in the literature that supports your competition (challenge)?

trim stream
#

Is it allowed to do online training in the submission notebook (e.g. for lagged features)?

noble portal
trim stream
noble portal
fierce pumice
trim stream
#

Why did you decide to only add 10 days to the test set in the data section but use the last 180 days for the actual score calculation?

#

What also makes me wonder is why you didn't include a detailed description of every feature? Is there a reason why this should be taken into account?

noble portal
noble portal
trim flume
#

Is it possible to also use external data beyond what is in your dataset, and do you look on this (the use of external data) with favour or not?

noble portal
faint pine
noble portal
#

There is no stream, I'm just answering questions here.

faint pine
# noble portal There is no stream, I'm just answering questions here.

Could you answer this one? Thanks for the clarification. I have one last question, and sorry to bother you. Do we expect this competition to be held every year or every two years on Kaggle? And once this one ends, is there any possibility of collaborating with participants who have demonstrated robust approaches over several years, including real out-of-sample backtesting?

noble portal
#

I'll continue taking questions sporadically on this channel going forward. Thank you everyone for your participation.

trim flume
#

Thanks for your answers

faint pine
# noble portal We don't plan, thus far, to run this competition again. We will be monitoring th...

Please provide more clarification about the potential for collaboration. I’m quite certain that the top performers on the private leaderboard may not maintain strong results over a long period — say, five to seven years — and that may become evident in the future. Of course, we should respect the winners, but if there’s an opportunity to collaborate with participants who have demonstrated robust long-term approaches, I believe they should be given a way to validate and prove their methods — for example, by submitting reports or notebooks. It would also be valuable to provide them with a benchmark to assess whether their approaches truly add value before they start collaborating or connecting with your team.

noble portal
night crescent
#

@noble portal could you address the questions about online learning? Specifically, how will updated training data be provided? Is the train csv updated daily?

#

I ask because I can imagine it will be important to update the model over the period when evaluation occurs.

noble portal
night crescent
night crescent
noble portal
#

I don't believe so.

night crescent
stuck galleon
#

!rank

night crescent
#

Has anyone figured out how to get updated targets during the evaluation period for online training?

#

nevermind. it's "lagged_forward_returns"
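
A small sketch of how that column could be turned into training labels. It rests on an assumption that should be checked against the data: that `lagged_forward_returns` on the row for date_id t holds the forward_returns value that belonged to date_id t - 1. The `feature_a` column and all numbers are invented for the example.

```python
import pandas as pd

# Toy rows in the order they would arrive during evaluation.
# Assumption: lagged_forward_returns on date_id t is the forward_returns
# value that belonged to date_id t - 1.
served = pd.DataFrame(
    {
        "date_id": [100, 101, 102],
        "feature_a": [0.3, -0.1, 0.7],
        "lagged_forward_returns": [0.004, -0.012, 0.009],
    }
)

# Shift the lagged column back one row so each row carries its own target.
labelled = served.assign(
    target=served["lagged_forward_returns"].shift(-1)
).dropna(subset=["target"])

print(labelled[["date_id", "feature_a", "target"]])
```

The most recent row never has a label yet, which is why it is dropped before any refit.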

faint pine
night crescent
rapid sphinx
#

Hi

dull egret
#

Hello everyone, I have some problems. Should the output contain only one file named "Submission_parquet", or can it include other files (model.pkl, feature_cols.pkl, etc.)? How do I submit? I also don't know the submission format. Is it only two elements, "date_id" and "prediction"? (I have seen this in the Kaggle Data section.)

drowsy rain
#

Hey everyone, I’m interested in working on this project, but before I start, I want to know more about the quality of the dataset. I previously tried the Mitsui dataset, but people on kaggle reddit community mentioned that Kagglers tend to avoid it due to poor data quality. I just want to make sure that’s not the case here. I’d appreciate any input, thanks for reading!

blazing spear
#

train.csv Historic market data. The coverage stretches back decades; expect to see extensive missing values early on.

but the data is like 12mb??

hallow elbow
#

Hey everyone, I have a question. When our model is called in the testing phase, will the test data include the past lagged_risk_free column? I guess the question is whether we should use this column (the current day's risk_free_rate) to adjust our strategy's leverage level, i.e. would this column be available in each row?

hallow elbow
hallow elbow
#

I want to ask, how is the public score evaluated? Is it the difference from some Sharpe value, or accuracy, or what?

tropic fiber
#

This may have already been asked, but why are the momentum-related features missing from the dataset? They are mentioned in the dataset description but are missing from train.csv

night crescent
young ginkgo
#

So my notebook (more like a copied starter notebook) threw an exception at runtime despite running successfully, with no errors in the logs. How can I debug this?

young egret
#

@noble portal
I have a few questions about the forecasting phase. The last date_id in train.csv is 9020.
Does that mean that the first date_id in the future test set in the forecasting phase will be 9021? (If not, how is it possible to calculate statistics that need previous values?)
If new positions are predicted one row per day, is it possible to use values from previous forecasting days? (If yes, where can they be found: in an updated train.csv or somewhere else?) For example, is it possible to use values from January 2026 in February 2026?

thin root
thin root
thin root
#

@noble portal

#

Is the submission score legit? Or is it the last 180 days of train

noble portal
# young egret @noble portal I have a few questions about forecasting phase: the last ...

Hi Mykyta,

I don't handle the date to date_id conversion. Kaggle's team is in charge of that, so I can't give you a specific answer about what the date_id will look like for the forecasting phase. You can use your forecast or any information from the forecasting phase via online learning. So yes, you will be able to use January 2026 data for February 2026. Online learning is also handled by Kaggle.

noble portal
thin root
thin root
noble portal
thin root
noble portal
thin root
#

Thank you for your help

noble portal
#

Yes.

thin root
# noble portal Yes.

Hm okay

And a question about the features: are some of these features purposefully bad?

#

What are you guys trying to test

#

Features and target definition are given, but it really comes down to the features and how the competitor chooses to work with them

noble portal
#

Nothing is intentionally bad. Finding value in the features IS the competition. The noise to signal ratio is high, so the difficulty of the competition is feature engineering to produce signal out of noise.

sand copper
#

stupid question but how are ya all dealing with the null values

tacit fulcrum
#

What does position column mean in the scoring function?

#

https://www.kaggle.com/code/laurentlanteigne/hull-starter-notebook
Also, in this starter code, why are we using different columns as targets for the training and test sets?

```python
    """
    Loads and preprocesses the training dataset.

    Returns:
        pl.DataFrame: The preprocessed training DataFrame.
    """
    return (
        pl.read_csv(DATA_PATH / "train.csv")
        .rename({'market_forward_excess_returns':'target'})
        .with_columns(
            pl.exclude('date_id').cast(pl.Float64, strict=False)
        )
        .head(-10)
    )

def load_testset() -> pl.DataFrame:
    """
    Loads and preprocesses the testing dataset.

    Returns:
        pl.DataFrame: The preprocessed testing DataFrame.
    """
    return (
        pl.read_csv(DATA_PATH / "test.csv")
        .rename({'lagged_forward_returns':'target'})
        .with_columns(
            pl.exclude('date_id').cast(pl.Float64, strict=False)
        )
    )
```
#

It does not make any sense at all.

thin root
#

@noble portal Hi, i wanted to ask, when you guys made this competition, did you guys do it to answer a question you guys want answered or just for the love of the game

hallow elbow
#

feels like the score is pointless

#

I assume everyone's model is not performing so well? Curious what accuracy or other metric you guys achieved.

#

If a model can forecast the 1-day movement so well, then we should be very confident to go max leverage or hold a 0 position, right? Am I interpreting this right? Because trying to overfit the best combination of position levels for the test set means nothing in a real environment?

hallow elbow
thin root
hallow elbow
#

if even with their data i still can't train an OK model, that means it's almost impossible for normal people to play around with fitting ML models to market data

thin root
#

no you're wrong lol

#

normal people? dude just lock in gng

noble portal
#

It's not easy, that's for sure. I think the main problem people run into is they are used to systematic approaches that are generic in their machine learning pipelines. This is more of a surgery type of forecasting, where it's not about multi-layered complicated machine-learning pipelines, but more about carefully crafting robust features.

thin root
#

Can we trust that the features were crafted well?

#

Or like the entire game is filtering it

#

What's a good score metric to target? The public LB is untrustworthy, so we don't have any way to measure or proxy towards anything

noble portal
#

Bottom/top approach is better than top/bottom approach in this case.

thin root
#

and our pipelines would be used for live forecasting for the leaderboard, correct?

thin root
#

got it

hallow elbow
#

i think the precision is the key?

thin root
thin root
#

I think I got a good version, but since the LB is off, I don't know if I should continue tinkering with it or do something else

#

I wanted to ask, what score metric is good?

thin root
#

@noble portal

#

How will you guys test our bots? If I engineer more features from what was given, should I put that inside predict()?

#

What would inference look like, live forecasting?

thin root
#

@noble portal
Should we worry about inference if we were able to successfully upload a submission?

sand copper
formal pivot
#

null

young egret
#

@noble portal Do you know if the test data in the forecasting phase continues directly from the train data, or is there a gap between them? In other words, will it be possible to calculate features that depend on previous days (lagged features, rolling window features, etc.) for the first rows of the forecasting phase?

noble portal
sand copper
paper raven
#

Hello everyone, I have some questions about competitions on Kaggle.
Submissions to this competition must be made through Notebooks. In order for the "Submit" button to be active after a commit, the following conditions must be met:

CPU Notebook <= 9 hours run-time
GPU Notebook <= 9 hours run-time
Internet access disabled
Freely & publicly available external data is allowed, including pre-trained models
Submission file must be named submission.csv

Will these be assessed based on the notebook you use to train your (pre-trained) models? What if I use another source like Colab Pro, save the best models, and then upload them to Kaggle for inference? Does that count as cheating or anything?

young egret
thin root
#

@noble portal

#

whats a good score metric

faint pine
#

@noble portal Will the test period be extended if random submission seeds lead the leaderboard?

#

Anyone who is still confused about maintaining dataset continuity during API inference, facing issues with re-training their models during API inference (‘online training’), or not knowing how to structure data for creating lag features — this is my tutorial notebook. You will find all the steps I mentioned here. Don’t forget to upvote it if you find the information helpful.

quartz ferry
thin root
#

3 sharpe alone is extremely good, a 3 with this score metric? damn bro

quartz ferry
#

All over the place

#

As low as .45 to as high as 10ish I think....

#

3 is unrealistic in the real world

#

But for this test data ....

#

I'm just saying 3 might be reasonable

thin root
#

dont mean to be rude lol but 3 is crazy high

#

its not sharpe 3

quartz ferry
#

Yeah it is, definitely possible data leakage

thin root
#

oh wait

quartz ferry
#

That's what I'm looking into

thin root
#

so

#

when you make a submission

quartz ferry
#

Yeah

thin root
#

that score derives from the last 180 days of the train.csv

quartz ferry
#

Yeah

thin root
#

thats why people overfit to 17

quartz ferry
#

Ahh I see

thin root
#

so you have to make your own split
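
A minimal sketch of such a split, assuming (as said above) that the public score comes from the last 180 rows of train.csv: hold those rows out in time order and never fit on them. The function name and toy frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def chronological_split(df: pd.DataFrame, holdout_rows: int = 180,
                        date_col: str = "date_id"):
    """Return (train, holdout) with the most recent rows held out."""
    ordered = df.sort_values(date_col).reset_index(drop=True)
    return ordered.iloc[:-holdout_rows], ordered.iloc[-holdout_rows:]


# Toy frame standing in for train.csv.
toy = pd.DataFrame(
    {
        "date_id": np.arange(1000),
        "x": np.random.default_rng(0).normal(size=1000),
    }
)
train_part, holdout_part = chronological_split(toy)
print(len(train_part), len(holdout_part))
print(train_part["date_id"].max() < holdout_part["date_id"].min())
```

A shuffled split would not do the same job here, since rows from after the holdout period would leak into training.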

quartz ferry
#

I joined late to this comp so..

thin root
#

same lol

quartz ferry
#

That is helpful

#

Lol

thin root
#

im very curious on what other people are getting with a properly done split

quartz ferry
#

I'll let you know what I see with proper split

thin root
noble portal
noble portal
# thin root whats a good score metric

Hard to tell, this is essentially bounded by what the market does. If the market has a Sharpe of -0.3 in the next 6 months and you score 0.3, that would be good. If you score 0.3 when market has a Sharpe of 0.8, that is bad.

noble portal
thin root
noble portal
#

You can find that yourself using the data.

young egret
noble portal
#

the train.csv at the end of the competition will have all the dates up to the end of the competition, any new data will be available via online learning.

#

I'm working with Sohier to get data up to last week uploaded so people can do final adjustments this week.

thin root
#

so when you guys forward test our submission u guys will call predict() right?

#

and do we have to save artifacts or anything like that

thin root
quartz ferry
#

probably been answered before, but will the final test data be scored row by row? or are we going to be able to create lags/rolling averages if the test data is loaded all at once...

quartz ferry
#

or do we store each row by row in a csv or something and reload each instance?

pulsar apex
noble portal
#

The score is aggregated from the first true out of sample observation to the last.

#

The scoring metric is a variant of the Sharpe ratio, you can't score that using one row. The API of Kaggle is feeding the predictions output row by row. I can understand the confusion.
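
Since rows arrive one at a time, lags and rolling averages have to come from state kept between calls. Here is a hedged sketch of that pattern. `RollingState` and `predict_one` are hypothetical names, the dict stands in for one served test row, and this does not use or reproduce the actual Kaggle evaluation API.

```python
from collections import deque

import numpy as np


class RollingState:
    """Keeps the last `maxlen` values seen so far across predict calls."""

    def __init__(self, maxlen: int = 20):
        self.values = deque(maxlen=maxlen)

    def update(self, value: float) -> None:
        self.values.append(value)

    def mean(self) -> float:
        return float(np.mean(self.values)) if self.values else 0.0


# Module-level state survives between calls within one notebook run.
state = RollingState(maxlen=3)


def predict_one(row: dict) -> float:
    """Hypothetical per-row handler; `row` stands in for one served test row."""
    state.update(row["lagged_forward_returns"])
    # Toy rule: fully invested when the recent average return is positive.
    return 1.0 if state.mean() > 0 else 0.0


rows = [{"lagged_forward_returns": r} for r in (0.01, -0.03, 0.005, 0.04)]
outputs = [predict_one(r) for r in rows]
print(outputs)
```

An in-memory buffer like this avoids writing each row to a csv and reloading it, though it would need seeding from the tail of train.csv if the first predictions should already have a full window.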

warm scroll
#

How many of these models submitted do you think are manipulating data leakage?

#

Open to any answers.

#

So far, my model is scratching 11.5 and I'm trying to move it up to the leakage range with just pure good design & innovation.

#

I'll be happy if my model hits 16. I am concerned however if I push too hard to get to 17.5 the model might not be adaptive enough to handle new data. I might add in some small adaptive tools to create that flexibility before final submission

thin root
#

The leaderboard is fully leakage

warm scroll
#

is your opinion that market daily returns are not sufficiently predictable in order to build a sharpe above 10?

#

I've engineered the model very carefully. if you're saying I'm overfitting too hard, then you may be right.

#

IMO, it would probably be much easier to build a high vol-adjusted sharpe system if we weren't forced to only allocate to the indexes or to the risk free rate alone

#

also, I find it interesting that the allocation range isn't [-2.0, 2.0]

#

s.t. the system is permitted to either short the market or leverage long on bonds

thin root
#

Reliably over decades? my friend

warm scroll
# thin root Yes bro

what is your evidence that it can't be done? outside of it not being done (yet)

thin root
warm scroll
thin root
#

You're right

#

Go ahead

#

Take over the world with your 10 sharpe

#

When you get 1st place I'll congratulate you, friend.

warm scroll
#

you seem somewhat hostile

#

I hope your day goes better

thin root
#

im telling u its not a valid score

#

ur better off working on a valid score than focusing on public LB

#

public LB isnt valid, it says that on the competition website

#

No shade bro im warning you if anything

warm scroll
#

what do you think makes a score valid?

#

I like our conversation. I don't want you to feel like I'm attacking your ideas, I just love to learn why people see things differently

warm scroll
#

let's say I built a model with 3.0. that's pretty neat. why not push for 3.1?

thin root
thin root
warm scroll
#

I'm running the score on the whole dataset

thin root
#

There is a limit on how much alpha you can extract from the features

thin root
#

You train it and test it on the whole dataset?

#

Cmon bro

warm scroll
#

I know what you're talking about. it's not a good practice in general.

thin root
#

You dont win the competition based on public leaderboard

thin root
warm scroll
#

I use the public LB to let me know maximal performance

thin root
warm scroll
#

why do you limit performance?

thin root
#

🤣

#

My friend

#

This isnt performance limiting

#

So your model will work on real markets then

warm scroll
#

if you have a model that scores 3.0, why do you not push for more?

thin root
#

Your model got a 10

warm scroll
#

well, it is my point

thin root
#

Your point is not that?

warm scroll
#

what I'm trying to understand from you is why you don't aim higher

thin root
#

I argued that a 10 is overfitting and isnt valid, you said why limit performance?

thin root
#

Im talking about how you applied your model

#

Is that pushing for more? Or is that straight up overfitting

warm scroll
#

don't make assumptions about how I built it, just think about goals

thin root
#

There is no assumption

warm scroll
#

I did not

thin root
#

Yes you did

#

You ran the score on the entire dataset

#

To do that you also must train on the entire dataset

warm scroll
#

OK, go ahead and lay out my layer architecture for me

#

and tell me how I built my loss function

thin root
thin root
warm scroll
#

hmm, I disagree

thin root
#

Also if you want to push for more and get a 17 i can tell you how

thin root
#

Do you think i have bad intentions?

warm scroll
#

no, but I think you're uninformed

thin root
#

So you think I have good intentions

warm scroll
#

yes

thin root
#

Got it

#

good to lay that out

#

Can you confirm that you trained and tested on the entire dataset?

warm scroll
#

I know where you're going with this and you don't need to explain why that's bad.

thin root
warm scroll
#

Nope.

thin root
warm scroll
#

fair point, go ahead

thin root
#

Its kind of like cheating on a test and claiming to be smart, you know? The idea of a test is to test your skills on unseen information

thin root
warm scroll
#

Hmm, I think you don't quite understand, nothing you said is wrong but you don't conceptualize what I'm doing

#

I don't care about being top of the public LB

thin root
warm scroll
#

Yes, I did so

thin root
#

thats what i mean

pulsar apex
#

bro estimates performance on train dataset, how do you guys decide to spend time on kaggle without listening to a single ml lecture?

thin root
#

isnt valid at all

#

its not just " not practical " its nothing at all

warm scroll
#

there's info you don't have

thin root
#

Do you plan on applying your model to live markets

warm scroll
#

I wouldn't be able to, because I don't have the exact methodology to extract the data in the manner that Hull has done it

thin root
#

Obviously

warm scroll
#

I'm going to do my absolute best to try to win.

thin root
#

Good luck bro

#

Its a journey

#

When the day comes and you want to learn more, you can always contact me, i love talking about ML

warm scroll
#

thanks. it's a lot of fun overall. I'll take you up on that at some point. Do you like to discuss model architectures and alternative learning structures i.e. innovation in new layers?

#

I could show you some of my recent designs

regal basin
#

Hello everyone, I have implemented a solution based on the principles of Marcos López de Prado's book ‘Advances in Financial Machine Learning’, but I have noticed that many candidates have simply used "brute force", employing XGBoost + LightGBM + CatBoost + Optuna. In your opinion, which solution would be best? In this type of competition, is one method better than another? Thank you to those who take the time to respond.

thin root
#

lol i definitely didn’t read it in and out

thin root
#

Overfitting

regal basin
thin root
#

I’m not too sure

#

There’s forward returns

#

You use forward returns

#

?

pulsar apex
pulsar apex
thin root
#

can anyone compete? are there prizes?

pulsar apex
thin root
#

Yo i got a question

#

I’ve personally had a strategy ok

#

Backtest was 8 sharpe

#

I ran it live, retraining it weekly

#

I got 4.68 sharpe over 100 trades ish

#

Valid

thin root
#

Or regime luck

#

Where it happened to fit well on regime

pulsar apex
#

who knows, if its profitable after commissions against buy-n-hold - just use it, but keep a reasonable stop loss

thin root
gentle oak
#

Hi, I can't find anywhere in the rules how to generate the submission parquet file. I think it does it for you if you call the api, but it says I had a submission scoring error. What should my final dataframe look like coming out of the predict function?

warm scroll
warm scroll
#

@noble portal is it possible for me to update my submissions with the newest versions of my models?

maiden moon
#

@noble portal Hi, I noticed that train.csv hasn't been updated to the latest data yet. Just to confirm: when the hidden test set evaluation starts on Dec 16, will I be able to access the historical data from early December through Dec 15 by reading train.csv via online learning?

noble portal
noble portal
thin root
#

Or would we only know how its doing at the end of the forecasting phase

noble portal
#

I believe it should update once a month with new data.

pulsar apex
#

meh, Kaggle closed sub choosing before 0:00

thin root
#

f

warm scroll
#

it's time to begin

#

lets GO!