#jane-street-real-time-market-data-forecasting | Kaggle | Page 1

livid vortex Oct 14, 2024, 10:21 PM

#

100

livid elbow Oct 15, 2024, 2:07 AM

#

What is the format for the submission? Does my notebook need to have an output file?

steel trench Oct 15, 2024, 2:57 PM

#

how to submit and what exactly to submit

lapis marsh Oct 15, 2024, 8:43 PM

#

Per the competition rules, "You must submit to this competition using the provided evaluation API, which ensures that models do not peek forward in time." See: https://www.kaggle.com/code/ryanholbrook/jane-street-rmf-demo-submission

Jane Street RMF Demo Submission

Explore and run machine learning code with Kaggle Notebooks | Using data from Jane Street Real-Time Market Data Forecasting

#

@steel trench

south willow Oct 15, 2024, 8:46 PM

#

Hello

brittle crown Oct 15, 2024, 9:10 PM

#

hi

ember meteor Oct 15, 2024, 9:18 PM

#

Hi Everyone. I am new to market data forecasting. I was just reading the problem statement and have a question on terminology. What do we mean by responders? I couldn't find the exact definition anywhere.

west glade Oct 16, 2024, 12:53 PM

#

@ember meteor Hi! "responder" or more specifically "responder_6" is meant to be your target or the outcome. They don't want to expose any sensitive info so they use the term responder. Let's say you take a buy trade at 7:35 AM. "responder" would be the value at 7:36 AM. That's just an example. Your goal is to predict that value at 7:36 AM.

ember meteor Oct 16, 2024, 1:50 PM

#

west glade @ember meteor Hi! "responder" or more specifically "responder_6" is mea...

Thanks a lot for the explanation. So if I got this right, each record in the training dataset is one trading opportunity. The 79 features are time series values of a particular scrip. The 9 responders are the future values of that same time series when extended. Among them, responder_6 is what we are predicting.

west glade Oct 16, 2024, 1:54 PM

#

I have not dug into the data well enough yet but that sounds right. the features are probably values of different indicators. ex: RSI, MOVING AVERAGE, OTHER SECURITIES, ETC..

gloomy spear Oct 16, 2024, 4:09 PM

#

Hello everyone this is my first Kaggle challenge, Is there an environment file we can use to set up everything that's needed for the APIs and inference servers ?

simple shoal Oct 16, 2024, 4:21 PM

#

west glade I have not dug into the data well enough yet but that sounds right. the features...

This leads me to a question: if we don't know the nature of those features, how can we do feature engineering? E.g. if I'm not sure whether a feature is already a moving average, how can I know whether I should make a moving average out of it or not?

west glade Oct 16, 2024, 4:28 PM

#

simple shoal This lead me to a question, if we don't know the nature of those features how ca...

Our task is just to find and use useful relationships between the features that they provided us, if there are any. I would also prefer to have actual meaningful category names to play with, but if we look at this purely from a time series analysis point of view, then we can think more clearly.

Let's say that we know one of them is a moving average, we might be tempted to try every moving average under the sun. We can do our own feature engineering since we know it's market data. The way they presented it leaves things more focused in my opinion. Puts everyone on the same playing field.

west glade Oct 16, 2024, 4:29 PM

#

gloomy spear Hello everyone this is my first Kaggle challenge, Is there an environment file w...

The kaggle page provides a notebook to work from. We should stick to using a Kaggle notebook for all of our work.

gloomy spear Oct 16, 2024, 4:31 PM

#

west glade The kaggle page provides a notebook to work from. We should stick to using a Kag...

Oh sorry, I meant an environment file for dependencies, versions of packages etc... that are needed for said notebook

west glade Oct 16, 2024, 4:31 PM

#

ahhh

simple shoal Oct 16, 2024, 4:32 PM

#

west glade Our task is just to find and use useful relationships between the features that ...

So what you are saying is that I should anyway try the features transformations I believe would add useful signals to the data

#

I'm a beginner in machine learning and this is my first time dealing with anonymous features, so I might fail to draw the most intuitive conclusion about the problem we have in this competition

west glade Oct 16, 2024, 4:35 PM

#

The way I like to approach is use what is given, then manipulate/add/remove as needed to get better scores.

west glade Oct 16, 2024, 4:36 PM

#

simple shoal I'm a beginner in machine learning and this my first time to deal with anonymous...

I'm pretty new too so no worries. We are all here to learn and gain experience. I only feel slightly more confident because of my experience with a few recent projects and my love for trading the market.

simple shoal Oct 16, 2024, 4:38 PM

#

What about missing values (NaNs)? I'm quite sure you didn't drop these rows, but I only want to confirm

#

I'm surprised that no one in the discussion talked about it in detail, although there's a significant number of NaNs in this dataset

#

I have the impression that the fact that a value is missing in this dataset holds a useful signal and that the testing data will include many NaNs as well so I think imputing or dropping them isn't smart

west glade Oct 16, 2024, 7:05 PM

#

Handling missing or NaN values for time series can be a bit of an art.
Is that row important?
Can I use the row before it in my model to predict the next row?
All valid questions. In the case of returns, I personally would most likely impute in some way. That may or may not be the best approach in this case but it's up to the programmer 💪 🤓 .
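
One way to combine the two views above (missingness may carry signal, but the model still needs filled values) is sketched below. This is only an illustration under assumptions: it uses pandas for brevity, the helper name `add_null_flags_and_ffill` is made up, the column names `symbol_id` and `feature_00` follow the naming mentioned in this channel, and rows are assumed to be sorted by time already.

```python
import numpy as np
import pandas as pd


def add_null_flags_and_ffill(df, feature_cols, group_col="symbol_id"):
    """Keep a 0/1 'was missing' flag per feature, then impute.

    Hypothetical helper: forward-fills within each symbol and falls back
    to 0.0 where a symbol has no earlier value to carry forward.
    """
    out = df.copy()
    for col in feature_cols:
        # Record missingness before imputing so the signal is not lost.
        out[f"{col}_isnull"] = out[col].isna().astype("int8")
    # Forward-fill per symbol (assumes rows are in time order).
    out[feature_cols] = out.groupby(group_col)[feature_cols].ffill()
    # Anything still missing (leading NaNs) becomes 0.0.
    out[feature_cols] = out[feature_cols].fillna(0.0)
    return out


example = pd.DataFrame(
    {"symbol_id": [0, 1, 0, 1], "feature_00": [1.0, np.nan, np.nan, 4.0]}
)
filled = add_null_flags_and_ffill(example, ["feature_00"])
```

Whether forward-filling is appropriate for these anonymized features is an open question; tree models such as LightGBM can also take NaNs as-is.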

narrow nest Oct 16, 2024, 7:21 PM

#

Hey, just starting in this comp so apologies if this is a silly question.

In the test set are we provided lags for each date_id, time_id pair? Or only for the first one?

So in the current example, test and lags have the same number of rows. But if only the first time_id for each date is given for the lags, test.csv might have more rows than lags - right?

west glade Oct 16, 2024, 7:26 PM

#

There are no stupid questions here. I haven't had the chance to dig into the data well enough yet. I'm sorry, I'm not sure.

crisp hawk Oct 17, 2024, 12:35 AM

#

what are tags?

warm oriole Oct 17, 2024, 1:56 AM

#

Sorry, new to this type of competition. I see that there are many submissions already. Are teams meant to be making new submissions throughout the project until the deadline?

knotty spade Oct 17, 2024, 3:25 AM

#

During the forecasting phase, is our model forced to be fixed? Can we train dynamically when new information flows in, or feed this new information into our prediction function in some way short of a full "training"? In other words, can we use information from the sequentially earlier part of the test set when predicting the sequentially later part?

modern bronze Oct 17, 2024, 8:59 AM

#

Hello there! "weight - The weighting used for calculating the scoring function" what's the scoring function here?

knotty spade Oct 17, 2024, 9:45 AM

#

modern bronze Hello there! "weight - The weighting used for calculating the scoring function" ...

I think it's in the overview -> evaluation

modern bronze Oct 17, 2024, 11:06 AM

#

thanks!

solid bobcat Oct 19, 2024, 1:05 AM

#

im a little confused about what some of the files are for. are the responders.csv and features.csv supposed to be used? or just to look at?

lofty ferry Oct 19, 2024, 1:56 AM

#

I believe they're just to look at

rugged summit Oct 19, 2024, 2:30 PM

#

guys im new i want to try a constant submission just to test but i cant import the module how can i do

#

meager garnet Oct 19, 2024, 4:51 PM

#

Has anyone tried PySpark for submission? Is it even possible? I am new to PySpark so was thinking of applying it in this contest.

thick creek Oct 20, 2024, 3:50 PM

#

rugged summit

same here.
any updates how to install kaggle_evaluation
FYI I posted this to kaggle as well
https://www.kaggle.com/competitions/jane-street-real-time-market-data-forecasting/discussion/541541

Jane Street Real-Time Market Data Forecasting

Predict financial market responders using real-world data.

abstract vale Oct 21, 2024, 10:40 PM

#

My notebook is failing scoring, but nothing appears in the logs. My guess is that this is due to the inference server being in a different thread. Is there an example of how to capture those logs, so I can figure out the source of the failure?

wise spire Oct 22, 2024, 4:51 AM

#

rugged summit

Not sure, but I don’t think that’s the correct way to import “kaggle evaluation” module.

burnt forge Oct 22, 2024, 6:18 AM

#

Hello, what is the best practice regarding installing packages in kaggle notebooks? I see that there's a default environment with some reasonable package versions (i.e. lightgbm 4.2.0). I tried updating it to 4.5.0 by connecting the notebook to the internet, but it turns out we cannot submit notebooks with connection to the internet.

Curious to hear if there are Kaggle ways to deal with this?

somber mauve Oct 22, 2024, 3:29 PM

#

So, in order to generate/create features based on the responders as example, will they be available at each date_id/time_id ? all of them not just responder_6 ? Same goes for features, for each date_id/time_id will they be available as well ?

fiery sage Oct 22, 2024, 3:46 PM

#

what is the reason for the Nulls in the data? a decent chunk of some of the features are nulls

compact sedge Oct 22, 2024, 5:19 PM

#

Hello, I am trying to make a submission through the API. I tried setting os.environ['KAGGLE_IS_COMPETITION_RERUN'] = "True" and then running the serve function, and the server kept running with no results so far. Can anyone please help me?

rich sorrel Oct 23, 2024, 2:25 AM

#

burnt forge Hello, what is the best practice regarding installing packages in kaggle noteboo...

yes. you can upload the package as a Kaggle Dataset and then import the dataset and pip install it from the file. this is an example https://www.kaggle.com/datasets/marketneutral/cvxpy-python-package
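
A minimal sketch of that approach, assuming the wheel files have been uploaded as a Kaggle Dataset and attached to the notebook. The dataset path `/kaggle/input/my-wheels` and both helper names are hypothetical; `--no-index` and `--find-links` are standard pip options that make pip resolve packages from a local directory instead of the internet.

```python
import subprocess
import sys


def build_offline_pip_command(package, wheel_dir):
    """Build a pip command that installs only from local files."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                  # never contact PyPI
        f"--find-links={wheel_dir}",   # look for wheels in this directory
        package,
    ]


def install_offline(package, wheel_dir):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(build_offline_pip_command(package, wheel_dir))


# Hypothetical usage inside an internet-disabled notebook:
# install_offline("lightgbm", "/kaggle/input/my-wheels")
command = build_offline_pip_command("lightgbm", "/kaggle/input/my-wheels")
```

The wheels in the dataset need to match the notebook's Python version and platform, and any dependencies that are not already installed have to be uploaded too.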

CVXPY Python Package

pip installable CVXPY for internet-disabled code competitions

rich sorrel Oct 23, 2024, 2:29 AM

#

solid bobcat im a little confused about what some of the files are for. are the responders.cs...

features.csv may be used to aid in feature selection (tags may indicate features belonging to the same style or group). multiple responders may be used as additional targets to help regularize the model

#

this is not correct. the inference server gives you the t-1 responder values. you therefore can train an online model that includes test data

knotty spade Oct 23, 2024, 3:22 AM

#

rich sorrel this is not correct. the inference server gives you the t-1 responder values. yo...

Yeah, I am still confused about that point. It seems that the current "submit a single prediction function" lacks the ability to do any inferences a posteriori or in general, to use any information from the sequentially last or previous datapoints.

compact sedge Oct 23, 2024, 4:21 AM

#

compact sedge Hello I am trying to make submission through API, I tried setting os.environ['KA...

Can Anyone please tell me what should I do

compact sedge Oct 23, 2024, 2:18 PM

#

Hello everyone, can anyone verify whether we need to set the KAGGLE_IS_COMPETITION_RERUN env variable before running the server, and how much time it usually takes to complete a submission?

queen trench Oct 23, 2024, 5:39 PM

#

as of my understanding of the data, we have timeseries data for training set, but only the responders (0,[...],8) from T-1 and the features (0, [...],78) from T for the prediction right? so for the prediction we do not have historical data available?

rich sorrel Oct 24, 2024, 2:12 AM

#

knotty spade Yeah, I am still confused about that point. It seems that the current "submit a ...

I haven’t worked on this part of the problem and won’t for a while. But you can cache the test data up to t-1 and on t at time 0 you are given all the historical responders. hence you could, say, retrain every 100 test days or something like that.
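
The caching idea above could look roughly like the sketch below. It is written in plain Python so it stays independent of the dataframe library; the class name `OnlineBuffer`, its method names and the default of 100 days are illustrative assumptions, and it relies on the description in this channel that the lags argument is non-empty only on the first batch of a new date.

```python
class OnlineBuffer:
    """Pair each day's cached test batches with the responders that
    arrive one day later, and signal when a retrain is due."""

    def __init__(self, retrain_every_days=100):
        self.retrain_every_days = retrain_every_days
        self.pending_batches = []   # batches of the current day, labels unknown
        self.labelled_days = []     # list of (batches, responders) pairs
        self.days_since_retrain = 0

    def on_batch(self, test_batch, lags):
        """Store the batch; return True if a retrain should happen now."""
        retrain_due = False
        if lags is not None:
            # A new date has started: lags holds yesterday's responders.
            if self.pending_batches:
                self.labelled_days.append((self.pending_batches, lags))
                self.pending_batches = []
                self.days_since_retrain += 1
            if self.days_since_retrain >= self.retrain_every_days:
                self.days_since_retrain = 0
                retrain_due = True
        self.pending_batches.append(test_batch)
        return retrain_due


# Toy run with strings standing in for dataframes, retraining every 2 days.
buffer = OnlineBuffer(retrain_every_days=2)
signals = [
    buffer.on_batch("day1_t0", "lags_for_day0"),
    buffer.on_batch("day1_t1", None),
    buffer.on_batch("day2_t0", "lags_for_day1"),
    buffer.on_batch("day2_t1", None),
    buffer.on_batch("day3_t0", "lags_for_day2"),
]
```

Any retraining triggered this way still has to fit inside whatever time limit applies to a single predict call, so a cheap incremental update may be more realistic than a full refit.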

rich sorrel Oct 24, 2024, 2:13 AM

#

fiery sage what is the reason for the Nulls in the data? a decent chunk of some of the feat...

since the data are anonymized, we don’t know.

still thunder Oct 24, 2024, 5:00 AM

#

queen trench as of my understanding of the data, we have timeseries data for training set, bu...

Yeah I actually feel the same. Let me know if it's otherwise

strange canyon Oct 24, 2024, 3:01 PM

#

i am facing memory issue while running notebook in kaggle. what's the solution? should i use less partition from train data rather than using all 10, will that impact model's accuracy?

tame urchin Oct 24, 2024, 3:40 PM

#

I'm an SJTU master's student and a Xiamen University graduate, and I'm looking for a Chinese team.

still thunder Oct 24, 2024, 5:13 PM

#

strange canyon i am facing memory issue while running notebook in kaggle. what's the solution? ...

Have you tried this - https://www.kaggle.com/code/yuanzhezhou/jane-street-baseline-lgb-xgb-and-catboost

🥇🥇Jane Street Baseline lgb, xgb and catboost🥇🥇

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

somber mauve Oct 25, 2024, 9:02 AM

#

For every time_id per date_id, will the features be available same goes for responders ? My point is if I want to generate features like rolling mean, lags as well (ignore the files for now)

worthy needle Oct 25, 2024, 1:29 PM

#

hi i want to know how long the calculation takes after a submission? i have waited for 20 mins

worthy needle Oct 25, 2024, 1:36 PM

#

worthy needle hi i want to know how long will the calculate takes after a submission? i have w...

forget it. done in 24mins

fiery sage Oct 26, 2024, 4:06 AM

#

strange canyon i am facing memory issue while running notebook in kaggle. what's the solution? ...

try running it per partition i guess
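
A rough sketch of the per-partition idea, combined with downcasting to reduce memory. It assumes pandas with a parquet engine installed; the helper names are made up, and the example path in the comment is a guess at the layout based on the `train/partition_id=0/part-0` naming mentioned in this channel.

```python
import numpy as np
import pandas as pd


def downcast_floats(df):
    """Convert float64 columns to float32, roughly halving their memory."""
    float_cols = df.select_dtypes(include="float64").columns
    return df.astype({col: "float32" for col in float_cols})


def load_partitions(paths, columns=None):
    """Read the given parquet files one at a time, downcasting each
    before concatenating, so the float64 copies never coexist."""
    frames = [downcast_floats(pd.read_parquet(p, columns=columns)) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Hypothetical usage with only the last two partitions:
# train = load_partitions([
#     ".../train.parquet/partition_id=8/part-0.parquet",
#     ".../train.parquet/partition_id=9/part-0.parquet",
# ])
demo = pd.DataFrame({"feature_00": np.random.rand(1000), "symbol_id": range(1000)})
small = downcast_floats(demo)
```

Using fewer partitions trades training data for memory, so the effect on the score has to be checked empirically.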

strange canyon Oct 26, 2024, 7:40 AM

#

this is my first competition. i have made 8 submissions but can't go above a score of 0.0043. i have tried lightgbm and xgb. should i focus on improving my existing submission OR try a different model? any suggestion?

gritty summit Oct 26, 2024, 10:44 PM

#

Hello, I am trying to submit a simple dummy submission just to get a good handle on the submission format. I am getting "Notebook threw Exception" despite passing the assertions in the given "predict" function. All I did was copy the example notebook and edit the predict response, it doesnt even use a model. Any help with navigating the API and submission format would be appreciated.

#

Example code without any model would be appreciated just to understand submission better

gritty summit Oct 26, 2024, 11:09 PM

#

As far as I can tell the demo submission doesn’t actually work as is??

strange canyon Oct 27, 2024, 4:59 AM

#

when i submit my notebook to the competition it is failing again and again, saying "Notebook Inference server error"

strange canyon Oct 27, 2024, 11:32 AM

#

not able to submit.. tried so many times..

sonic gust Oct 28, 2024, 3:22 AM

#

still thunder Have you tried this - https://www.kaggle.com/code/yuanzhezhou/jane-street-baseli...

hey, I've got past the memory issue when loading the data, however this exact notebook seems to fail on the training phase (hitting the RAM limit), specifically on train(model_dict, 'lgb'), did you solve somehow this issue if you had it?

gritty summit Oct 28, 2024, 7:30 AM

#

Using all my daily submissions just to debug the submission, it isn't even a model 😭

strange canyon Oct 28, 2024, 10:27 AM

#

sonic gust hey, I've got past the memory issue when loading the data, however this exact no...

Train on less partition

strange canyon Oct 28, 2024, 10:28 AM

#

strange canyon not able to submit.. tried so many times..

still haven't resolved the issue of inference error... if anyone faced the same. please suggest how to solve

strange canyon Oct 28, 2024, 11:54 AM

#

stuck zealot Oct 28, 2024, 2:43 PM

#

strange canyon still haven't resolved the issue of inference error... if anyone faced the same....

I am facing the same issue here

#

Do you think it could be an error with the API?

#

or the environment where the notebook is run

#

I've used 3 of my 5 submissions trying to debug this error..

strange canyon Oct 28, 2024, 3:04 PM

#

stuck zealot I am facing the same issue here

i don't think it's an error with api. cause my submissions didn't give any error. now have made some changes and trying to submit but i can't.

stuck zealot Oct 28, 2024, 3:27 PM

#

stuck zealot or the environment where the notebook is run

Then very likely to do with the environment

#

I've tried to submit the notebook versions that were submitted and scored last week without any issue, and the notebook inference error keeps popping up.... 🥹

strange canyon Oct 28, 2024, 4:54 PM

#

stuck zealot Then very likely to do with the environment

i tried the same thing... i tried to submit a notebook which was accepted last week, and that got accepted again. but the new one with complex code and higher run time is not being accepted. showing inference error

bitter tapir Oct 28, 2024, 5:09 PM

#

Hello, I was hoping someone can explain to me the exact way in which the API serves up the test data. We get batches which correspond to specific time_id's, but are consecutive batches also consecutive time_id's? That is, would it be similar to looping over date_id and time_id in the training dataset?

bitter tapir Oct 28, 2024, 5:18 PM

#

rich sorrel this is not correct. the inference server gives you the t-1 responder values. yo...

I saw this post up above which seems to imply something of that nature but I'm not entirely sure

bitter tapir Oct 28, 2024, 6:25 PM

#

I put this inside the "predict" function

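    # Module-level state is assumed to be initialised before the first call:
    # date_id = time_id = None, switched_date = dates_without_time = 0.
    global date_id
    global time_id
    global switched_date
    global dates_without_time

    batch_date = test.select("date_id").max().item()
    batch_time = test.select("time_id").max().item()

    if date_id is None:
        # First batch: just record where we are.
        date_id, time_id = batch_date, batch_time
    elif date_id != batch_date:
        # New date: restart the time_id tracking.
        switched_date += 1
        date_id, time_id = batch_date, batch_time
    else:
        # Same date: the batch should be exactly one time step later.
        assert batch_time == time_id + 1
        time_id = batch_time  # advance the tracked time step
        switched_date = 0
    if switched_date > 1:
        dates_without_time += 1
    if dates_without_time > 10:
        raise ValueError("Too many dates without time.")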

The submission says that the notebook threw an exception. So I guess that means that, even if two batches have the same date_id, two consecutive batches do not generally have consecutive time_ids. Do correct me if I'm wrong; the notebook may have thrown an exception for some other reason

stuck zealot Oct 28, 2024, 6:43 PM

#

I am guessing that there is something wrong with the scoring environment, as all the submissions are failing afaik

#

@bitter tapir you can read it above

bitter tapir Oct 28, 2024, 6:51 PM

#

I also saw that which is why I'm not confident about what caused the error - so if somebody has an insight into the time series that is served up by the API, please do share

gritty summit Oct 28, 2024, 10:37 PM

#

I can’t try it yet but I think I know why it’s breaking

#

Ran out of submissions need to wait an hourish

#

Bc it does work on the example notebook with no changes

gritty summit Oct 28, 2024, 10:54 PM

#

bitter tapir I put this inside the "predict" function ``` global date_id global time_...

you can print stuff in notebook and look at logs even if it fails

bitter tapir Oct 28, 2024, 10:54 PM

#

Oh? Good to know, I'm new to this

gritty summit Oct 28, 2024, 10:55 PM

#

yeah trying to figure out just from if it throws error sounds fucked, I thought it was like that at first too lol

gritty summit Oct 29, 2024, 12:17 AM

#

Nvmd idk it didnt work, shit just seems fukt i cant get anything that isnt the literal copy of the example to work

strange canyon Oct 29, 2024, 11:56 AM

#

stuck zealot Then very likely to do with the environment

Bro this inference error coming because there is 15 min time limit for inference to start. our code is taking longer than 15 min to execute..

#

model training should happen in less than 15 min and inference should start.

gritty summit Oct 29, 2024, 12:28 PM

#

strange canyon Bro this inference error coming because there is 15 min time limit for inference...

Read the thing, it says you can use the first predict call

stuck zealot Oct 29, 2024, 4:41 PM

#

I think I've managed to overcome the issue

#

You have 90 seconds to finish the first predict call, if not, the notebook will launch an inference error. So all the time consuming operations (e.g.: loading the model/s) need to happen before the first call

#

That solved the issue for me 🥹

thin verge Oct 30, 2024, 6:52 AM

#

Hello. On the "Overview" page, under "Code Requirements -> Training Phase", there is a point that states: "Your notebook must use time-series module to make predictions". Could somebody clarify what this means?

tame urchin Oct 30, 2024, 8:30 AM

#

I have a question. What is the relation between row_id, date_id, time_id and symbol_id? I see symbol_id go from 0 to 38, and then time_id increases by 1, and then symbol_id goes from 0 to 38 again.

tame urchin Oct 30, 2024, 11:09 AM

#

symbol_id is stocks. Is row_id stocks?

stuck zealot Oct 30, 2024, 3:42 PM

#

thin verge Hello. On the "Overview" page, under "Code Requirements -> Training Phase", ther...

I was wondering about this too. Any help would be appreciated!

coarse herald Oct 31, 2024, 12:17 AM

#

Anyone looking to collab?

harsh relic Oct 31, 2024, 11:49 PM

#

life would be easy if responders are updated intra-day

grim oriole Nov 1, 2024, 2:55 AM

#

harsh relic life would be easy if responders are updated intra-day

very true

#

i had a model that gave an r squared of 0.8 using the previous responder

#

but then i realized they only update every day

#

😢

hallow heart Nov 1, 2024, 4:55 PM

#

Hello, i'm new here, i got one team

keen crest Nov 1, 2024, 5:23 PM

#

Hi, what is the team size allowed for this competition?

harsh relic Nov 1, 2024, 6:17 PM

#

Hi there anyone knows which one is better for gradient boosting?

solid ibex Nov 2, 2024, 5:31 PM

#

Hi all, I am looking for a team. I am based in New York City

strange canyon Nov 3, 2024, 10:09 AM

#

getting a submission scoring error after submitting. but the output is the same as my earlier notebooks' outputs which got accepted.

fiery sage Nov 3, 2024, 1:03 PM

#

strange canyon geting submission scoring error after submitting. but output is same as my earli...

this happened to me as well a couple times yesterday

strange canyon Nov 3, 2024, 4:11 PM

#

bitter tapir Nov 3, 2024, 6:55 PM

#

harsh relic life would be easy if responders are updated intra-day

Nice to know, been trying to figure this out forever. Can't run the submission notebooks without receiving errors. So there is no "responder_6" field provided, except for previous days?

grim oriole Nov 3, 2024, 8:07 PM

#

i believe the test.parquet file is an example of what the test dataframes that are given to the predict method look like

#

test.parquet has the columns, "row_id", "date_id", "time_id", "symbol_id", "weight", "is_scored" and features 00-78

exotic smelt Nov 4, 2024, 3:14 PM

#

hi

#

Hi there
I need help with the Kaggle Jane Street competition. There are so many partition_ids, which one should I choose to predict responder 6? Thanks

exotic smelt Nov 5, 2024, 8:16 AM

#

Please help

bronze storm Nov 6, 2024, 8:53 AM

#

Does anyone know whether, in the prediction period, we can get the responder for the last time_id?

grim oriole Nov 6, 2024, 6:55 PM

#

the responders are given through the lag dataframe in the predict function at the first time_id and it contains all of the responders for all of the time_ids of the previous day
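
As a hedged illustration of using that lag dataframe, the sketch below summarises the previous day's responder per symbol and attaches the summary to a test batch. It uses pandas for brevity (the API hands over polars frames), the helper names are made up, and the column name `responder_6_lag_1` is an assumption about how the lagged columns are named.

```python
import pandas as pd


def previous_day_summary(lags, responder_col="responder_6_lag_1"):
    """Per-symbol mean and last value of yesterday's responder.

    'last' assumes the rows are ordered by time_id within each symbol.
    """
    summary = lags.groupby("symbol_id")[responder_col].agg(["mean", "last"])
    summary = summary.reset_index()
    return summary.rename(columns={"mean": "prev_day_mean", "last": "prev_day_last"})


def attach_summary(test, summary):
    """Left-join so symbols without history keep their rows (as NaN)."""
    return test.merge(summary, on="symbol_id", how="left")


lags_example = pd.DataFrame(
    {"symbol_id": [0, 0, 1], "responder_6_lag_1": [1.0, 3.0, 2.0]}
)
test_example = pd.DataFrame({"row_id": [10, 11, 12], "symbol_id": [0, 1, 2]})
enriched = attach_summary(test_example, previous_day_summary(lags_example))
```

Because the summary only changes once per date, it can be computed on the batch where the lag frame arrives and reused for the rest of that day.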

bronze storm Nov 7, 2024, 2:22 AM

#

grim oriole the responders are given through the lag dataframe in the predict function at th...

Thanks for sharing! Yes this is very helpful, but also wondering whether we can get one previous time_id data, like for day N+1 prediction, when time_id = 1, can we get the info of date_id=N+1, time_id=0, this may be helpful😊

bronze storm Nov 7, 2024, 2:24 AM

#

exotic smelt Hi there I need help in kaggle jane street as there are so many partition_id whi...

Each partition_id corresponds to part of the historical data, and I think they may all be useful for future prediction

abstract vale Nov 7, 2024, 3:12 AM

#

abstract vale My notebook is failing scoring, but nothing appears in the logs. My guess is tha...

Finally have some time to look at the challenge again - basic question here about catching exception. Any ideas?

errant sierra Nov 7, 2024, 6:26 AM

#

rugged summit

In that notebook, look at the right-hand section: is it showing the Jane Street competition? If not, that's the issue. Recreate the notebook, copy-paste the code from this one into it and rerun. Or you might have disabled internet.

errant sierra Nov 7, 2024, 6:27 AM

#

strange canyon Bro this inference error coming because there is 15 min time limit for inference...

Create separate notebook for submission only, upload ur models and scaler there and try again, if it is code competition then you need to optimize it

#

I'm getting negative score on leaderboard

glossy vigil Nov 7, 2024, 3:18 PM

#

@errant sierra That means your forecast didn't beat the baseline of an all-zero forecast.
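
To see why, here is a small sketch of the metric as I understand it from the Evaluation page: a sample-weighted zero-mean R², i.e. R² = 1 − Σ wᵢ(yᵢ − ŷᵢ)² / Σ wᵢyᵢ². The function name and the toy numbers are made up for illustration.

```python
def weighted_zero_mean_r2(y_true, y_pred, weights):
    """Sample-weighted R-squared measured against a zero forecast."""
    numerator = sum(w * (y - p) ** 2 for y, p, w in zip(y_true, y_pred, weights))
    denominator = sum(w * y ** 2 for y, w in zip(y_true, weights))
    return 1.0 - numerator / denominator


y = [1.0, -2.0, 0.5]
w = [1.0, 2.0, 1.0]

score_zero = weighted_zero_mean_r2(y, [0.0, 0.0, 0.0], w)    # all-zero forecast
score_perfect = weighted_zero_mean_r2(y, y, w)               # exact forecast
score_wrong = weighted_zero_mean_r2(y, [-v for v in y], w)   # wrong sign everywhere
```

With an all-zero forecast the numerator equals the denominator, so the score is exactly 0; any forecast whose weighted squared error is larger than that of predicting zero comes out negative.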

errant sierra Nov 7, 2024, 9:11 PM

#

glossy vigil @errant sierra That means your forecast didn't beat the baseline, 0 fore...

Ohh I see

cedar gyro Nov 7, 2024, 10:41 PM

#

will there be new unseen symbol id shown in the testing stages？

grim oriole Nov 8, 2024, 12:34 AM

#

cedar gyro will there be new unseen symbol id shown in the testing stages？

lofty ferry Nov 10, 2024, 3:52 AM

#

Is it possible to submit locally rather than using kaggle notebooks? I find them rather cumbersome to use.

worthy girder Nov 10, 2024, 8:36 AM

#

lofty ferry Is it possible to submit locally rather than using kaggle notebooks? I find them...

Check out the Kaggle API, you can work locally and push your changes to Kaggle.

torpid wolf Nov 10, 2024, 12:42 PM

#

hello, I cannot submit by either choosing a notebook or uploading a file. I already finished the verification though

blissful axle Nov 10, 2024, 8:34 PM

#

Hello, Could we re-train our model during Evaluation time???

lofty ferry Nov 11, 2024, 6:30 PM

#

worthy girder Check out the Kaggle API, you can work locally and push your changes to Kaggle.

Ah great! Thanks.

feral hare Nov 11, 2024, 7:55 PM

#

Did anyone fix the Notebook Inference server error? I load a pretrained model outside of predict that seems to run quickly (18 µs with the local test.parquet) but getting the server error when I try to submit

deft moat Nov 12, 2024, 3:53 PM

#

can someone help me understand the data a bit more please by replying to my message. Ive loaded all the data into a pandas data frame and im trying to plot responder_6. Now i cant do this nicely at the minute because of the duplicate date_id. I tried to fix it to one symbol_id to fix the issue but theres still duplicate date_id. Can someone just help me with my understanding of the data please. (this is my first data science project and competition). My thought was that the symbol_id represented a stock ticker lets say and responder_6 was the price lets say, and i thought by fixing the symbol there would be no duplicate date.

grim oriole Nov 12, 2024, 4:34 PM

#

deft moat can someone help me understand the data a bit more please by replying to my mess...

i think you are getting multiple rows with the same date_id because there is also a time_id variable that represents around 900 time steps each day for each symbol
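
So a single symbol still has many rows per date_id, and a clean plot needs both date_id and time_id in the ordering. A minimal pandas sketch, with a made-up helper name and a tiny toy frame in place of the real data:

```python
import pandas as pd


def single_symbol_series(df, symbol_id, target="responder_6"):
    """Return one symbol's target in time order with a running step index."""
    one = (
        df[df["symbol_id"] == symbol_id]
        .sort_values(["date_id", "time_id"])
        .reset_index(drop=True)
    )
    # A simple counter gives a unique x-axis even though date_id repeats.
    one["step"] = range(len(one))
    return one[["step", "date_id", "time_id", target]]


toy = pd.DataFrame(
    {
        "date_id": [1, 0, 0, 0],
        "time_id": [0, 1, 0, 0],
        "symbol_id": [5, 5, 5, 7],
        "responder_6": [0.3, 0.2, 0.1, 9.9],
    }
)
series = single_symbol_series(toy, symbol_id=5)
# With matplotlib installed: series.plot(x="step", y="responder_6")
```

Plotting against `step` avoids the duplicate date_id problem because every (date_id, time_id) pair gets its own position.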

tiny gate Nov 13, 2024, 12:36 AM

#

hey how do you know if you've submitted successfully? I run the code with a saved model on the inference server and get no errors, but can't see any submissions on my end. Thanks so much!

glossy vigil Nov 13, 2024, 9:27 PM

#

blissful axle Hello, Could we re-train our model during Evaluation time???

I believe we can, haven't explored it. The lag data frame should have the target as well.

bitter tapir Nov 13, 2024, 9:50 PM

#

glossy vigil I believe we can, haven't explored it. The ...

Only for the previous date_id. So you don't have a target variable to train on, during inference

glossy vigil Nov 14, 2024, 12:43 AM

#

bitter tapir Only for the previous date_id. So you don't have a target variable to train on, ...

Can't we just save previous date_id dataframes and calibrate?

bitter tapir Nov 14, 2024, 11:10 AM

#

No, there are no intraday responders provided

toxic spindle Nov 14, 2024, 6:52 PM

#

Hi, everyone. I'm looking for a team member and I'm a Kaggle novice. I have a deep understanding of RL and causal AI. Before that, I worked on multi-agent systems, high-frequency trading and market-neutral strategies. I'm also going to take part in the Jane Street competition. These days, I can put in more than 5 hours on Kaggle competitions.
If you are a Kaggle Expert or above, that's better, but it's not a necessity.
Also, I'm going to try AI research in the near future and hope for a long collaboration.
If you are passionate about AI, please hit me up.

shy cliff Nov 15, 2024, 4:30 PM

#

deft moat can someone help me understand the data a bit more please by replying to my mess...

Hey Owen, as advice, I suggest you plot responder_6 for only one symbol and only one file (for example train/partition_id=0/part-0). Just to give you a taste, here's what the distribution of responder_6 looks like 🙂

deft moat Nov 16, 2024, 2:04 PM

#

Hey im wondering if anyone can help me in making my first submission. I think im setting up the inference server correctly, and i believe my notebook is in the competition environment so im not sure what im not doing.

dusk rampart Nov 17, 2024, 4:29 PM

#

Hey all. I have a question regarding the datasets. I'm not quite sure what features.csv shows. Can anyone explain, please?

queen trench Nov 17, 2024, 6:35 PM

#

Does anyone have issues with installing packages through the Kaggle Package Manager in the submission? My submission aborts with a Server Inference Error after <60 seconds
Looks like someone had a similar issue: https://www.kaggle.com/competitions/jane-street-real-time-market-data-forecasting/discussion/543875#3039358

Submission looks fine when I run pip install and also when I use the import statement, but fails when I use the library


hardy canyon Nov 18, 2024, 8:06 PM

#

Hi all, what is the lags.parquet file and how should I use it in my model prediction? Should I do something with this data or just pass it as it is in this method?:

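import pandas as pd
import polars as pl

# Latest lagged responders, kept between calls.
lags_: pl.DataFrame | None = None

def predict(test: pl.DataFrame, lags: pl.DataFrame | None) -> pl.DataFrame | pd.DataFrame:
    global lags_
    if lags is not None:
        # lags is only passed on some calls, so cache it for later batches.
        lags_ = lags

    # Placeholder forecast: predict 0.0 for every row.
    predictions = test.select(
        'row_id',
        pl.lit(0.0).alias('responder_6'),
    )

    # Check the output has exactly the expected columns.
    if isinstance(predictions, pl.DataFrame):
        assert predictions.columns == ['row_id', 'responder_6']
    elif isinstance(predictions, pd.DataFrame):
        assert (predictions.columns == ['row_id', 'responder_6']).all()
    else:
        raise TypeError('The predict function must return a DataFrame')
    # Confirm has as many rows as the test data.
    assert len(predictions) == len(test)

    return predictions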

feral hare Nov 19, 2024, 1:20 PM

#

is there an upper limit to the total number of submission we can make?

grim oriole Nov 19, 2024, 2:23 PM

#

you can make 5 submissions per day

full sail Nov 20, 2024, 12:54 PM

#

Hey everyone,
I am pretty new to AI/ML and wanna participate in this competition so can anyone guide me on:

What should I learn(prerequisites)?
Any beginner-friendly resources to get started?
Thanks a lot!

bitter tapir Nov 26, 2024, 4:17 PM

#

So, just reading the description of the data again:

lags.parquet - Values of responder_{0...8} lagged by one date_id. The evaluation API serves the entirety of the lagged responders for a date_id on that date_id's first time_id. In other words, all of the previous date's responders will be served at the first time step of the succeeding date.

How does this even make sense? We know that the responders' values depend not just on the date_id but also on the time_id. So which of the previous day's time_ids is served up the following day in the lagged responders? It doesn't yield all of them, as evidenced by the lags.parquet file. Are we just to assume that it's the last value of the previous day?

grim oriole Nov 26, 2024, 4:54 PM

#

https://www.kaggle.com/competitions/jane-street-real-time-market-data-forecasting/discussion/543567

#

i think this might be helpful

bitter tapir Nov 26, 2024, 6:01 PM

#

grim oriole https://www.kaggle.com/competitions/jane-street-real-time-market-data-forecastin...

Maybe they should have actually updated the article to reflect this information, that's a massive difference from what's written

#

This means you can, in theory, do semi-online learning (if that were computationally feasible)

#

Thanks for the link, that completely changes my outlook on the problem

bitter tapir Nov 26, 2024, 6:07 PM

#

bitter tapir Only for the previous date_id. So you don't have a target variable to train on, ...

Also need to correct the statement I made here, then

west glade Nov 28, 2024, 3:47 PM

#

How is everyone doing with the competition?

hollow cedar Nov 29, 2024, 2:04 PM

#

Just started and very confused harold

waxen inlet Dec 2, 2024, 7:03 AM

#

Just started, is anyone running the code on Kaggle? My kernel dies on just reading the parquet files

near shuttle Dec 3, 2024, 6:35 AM

#

what are responders?

#

What is their relation to the features?

near shuttle Dec 3, 2024, 7:14 AM

#

what does responder.csv mean?

tired dagger Dec 3, 2024, 6:52 PM

#

can I close my computer while submitting? If I submitted the version but it is still computing the score, will it break if I close my computer?

strange canyon Dec 3, 2024, 7:07 PM

#

tired dagger can I close my computer while submitting? If I submitted the version but it is s...

yes, you can shut down

tired dagger Dec 3, 2024, 7:12 PM

#

thanks

glossy vigil Dec 3, 2024, 9:27 PM

#

From the post we have:

Lags only covers responders, not features.
For each time_stamp, we have responder 1 day before, same time stamp.
Am I understanding the post correctly? This is very confusing.

west glade Dec 3, 2024, 9:50 PM

#

Yep, given all features plus 1 day lagged responders to work with

lavish tusk Dec 5, 2024, 4:13 PM

#

just started, have no idea how to do this kind of competition, can anyone give some pointers on how to move forward

#

?

vocal void Dec 6, 2024, 6:33 AM

#

#

I am getting this error. This is my first competition, can someone help? Here are a few more screenshots which I think would be relevant

#

#

heady condor Dec 7, 2024, 9:55 PM

#

west glade Yep, given all features plus 1 day lagged responders to work with

I joined this discord to confirm this, thanks! Seemed like the case to me, but I was expecting a longer sequence time series challenge, but this is an interesting twist.

So in reality what we have for an observation is features_0..n and t-1 responders for a given date_id and time_id.

oak star Dec 8, 2024, 3:22 AM

#

Has anyone had success just performing daily predictions? Based on test.parquet, they've only got it at date_id =0 and time_id =0; is it reasonable just to give a daily prediction and call that enough?

fossil wharf Dec 8, 2024, 6:01 PM

#

the explanation of the problem itself, and the data used is horrible for this project.

swift harbor Dec 8, 2024, 6:07 PM

#

Hi guys, does anyone know why doing a test submission works, but the actual submission raises an unhandled error at runtime? Not sure what the exception is. Same thing if I manually check lags for None.

def predict(test: pl.DataFrame, lags: pl.DataFrame | None) -> pl.DataFrame | pd.DataFrame:
    predictions = test.select(
        'row_id',
        pl.lit(lags['responder_6_lag_1']).alias('responder_6'))
    return predictions

heady condor Dec 8, 2024, 6:19 PM

#

swift harbor Hi guys, does anyone know why doing a test submission works, but the actual subm...

I was getting that too... I don't know why but I found a certain config that fixed it.

    global lags_
    if lags is not None:
        lags_ = lags

    predictions = test.select(
        'row_id',
        pl.lit(0.0).alias('responder_6').cast(pl.Float64),
    )    
        
    if not lags is None:
        lags = lags.group_by(["date_id", "symbol_id"], maintain_order=True).last() # pick up last record of previous date
        test = test.join(lags, on=["date_id", "symbol_id"], how="left")
    else:
        test = test.with_columns(
            (pl.lit(0.0).alias(f'responder_{idx}_lag_1') for idx in range(9))
        )

    # Replace this section with your own predictions
    predictions = test.select(
        'row_id',
        pl.col('responder_6_lag_1').alias('responder_6').cast(pl.Float64),
    )

I think the cast to Float64 is important, but it's hard to debug with only 5 submissions a day. Only difference here is I merged the lags into the test dataframe.

west glade Dec 9, 2024, 12:45 AM

#

Anyone have any luck with Pandas?

#

My code runs fine but at the end the submissions section shows that it "Threw Exception".

west glade Dec 9, 2024, 12:46 AM

#

fossil wharf the explanation of the problem itself, and the data used is horrible for this pr...

Completely agree.

rocky totem Dec 9, 2024, 5:30 AM

#

Is the symbol_id representing the ticker symbol (e.g. voo, goog) or something else?

static rampart Dec 9, 2024, 8:24 AM

#

I have a question, Do I have to save the submission.parquet file myself or server code will save itself for me?

swift harbor Dec 9, 2024, 11:23 AM

#

heady condor I was getting that too... I don't know why but I found a certain config that fix...

Running this code actually gives me an error, although the error is different, now it's actually specifying data format as throwing some error

west glade Dec 9, 2024, 12:16 PM

#

rocky totem Is the symbol_id representing the ticker symbol (e.g. voo, goog) or something el...

Correct

west glade Dec 9, 2024, 12:17 PM

#

static rampart I have a question, Do I have to save the submission.parquet file myself or serve...

I think it's done on its own but someone who got their code to submit properly would need to answer.

swift harbor Dec 9, 2024, 12:18 PM

#

west glade Anyone have any luck with Pandas?

I had memory issues with pandas so just using polars

west glade Dec 9, 2024, 12:19 PM

#

Fudge. Thank you Danila.

swift harbor Dec 9, 2024, 12:20 PM

#

heady condor I was getting that too... I don't know why but I found a certain config that fix...

is this code block working on its own as a pred?

west glade Dec 9, 2024, 12:22 PM

#

As a general question, if I read in the same file with pandas vs polars, is Polars really using less memory to hold the same amount of data?

heady condor Dec 9, 2024, 2:08 PM

#

swift harbor is this code block working on its own as a pred?

It was. let me grab the whole thing.

import pandas as pd
import polars as pl

lags_: pl.DataFrame | None = None


# Replace this function with your inference code.
# You can return either a Pandas or Polars dataframe, though Polars is recommended.
# Each batch of predictions (except the very first) must be returned within 1 minute of the batch features being provided.
def predict(test: pl.DataFrame, lags: pl.DataFrame | None) -> pl.DataFrame | pd.DataFrame:
    """Make a prediction."""
    # All the responders from the previous day are passed in at time_id == 0. We save them in a global variable for access at every time_id.
    # Use them as extra features, if you like.
    global lags_
    if lags is not None:
        lags_ = lags

    predictions = test.select(
        'row_id',
        pl.lit(0.0).alias('responder_6').cast(pl.Float64),
    )    
        
    if not lags is None:
        lags = lags.group_by(["date_id", "symbol_id"], maintain_order=True).last() # pick up last record of previous date
        test = test.join(lags, on=["date_id", "symbol_id"], how="left")
    else:
        test = test.with_columns(
            (pl.lit(0.0).alias(f'responder_{idx}_lag_1') for idx in range(9))
        )

    # Replace this section with your own predictions
    predictions = test.select(
        'row_id',
        pl.col('responder_6_lag_1').alias('responder_6').cast(pl.Float64),
    )

    if isinstance(predictions, pl.DataFrame):
        assert predictions.columns == ['row_id', 'responder_6']
    elif isinstance(predictions, pd.DataFrame):
        assert (predictions.columns == ['row_id', 'responder_6']).all()
    else:
        raise TypeError('The predict function must return a DataFrame')
    # Confirm has as many rows as the test data.
    assert len(predictions) == len(test)

    return predictions

swift harbor Dec 9, 2024, 2:39 PM

#

heady condor It was. let me grab the whole thing. ```python lags_ : pl.DataFrame | None = No...

Ok I'm running this identical cell with the inference server and it's giving me a slightly frustrating "Your notebook generated a submission file with incorrect format" error

inference_server = kaggle_evaluation.jane_street_inference_server.JSInferenceServer(predict)
if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    inference_server.serve()
else:
    print("running local gateway - submitting predictions")
    inference_server.run_local_gateway(
        (
            '/kaggle/input/jane-street-real-time-market-data-forecasting/test.parquet',
            '/kaggle/input/jane-street-real-time-market-data-forecasting/lags.parquet',
        )
    )

I'll have to try again tomo, surprised mine is being thrown errors. thx for sharing

rocky totem Dec 9, 2024, 4:24 PM

#

west glade Correct

Thank you!

rocky totem Dec 9, 2024, 5:30 PM

#

Also, does anyone know if the weight column is just used for scoring the model for evaluation and unrelated to the rest of the market data?

west glade Dec 9, 2024, 5:30 PM

#

Not sure but that's something you can test in your model by removing it to see if the score goes up.

rocky totem Dec 9, 2024, 5:31 PM

#

ok thank you!

weary wolf Dec 10, 2024, 12:28 AM

#

Are responder tags [tag_0, tag_1..., tag_4] the same as the feature tags [tag_0, tag_1..., tag_4]?

west glade Dec 10, 2024, 6:41 PM

#

Anyone solve this thrown exception problem during submission? I can't understand the problem. My Predictions, Sample submission, & what gets exported for submission are identical.

#

All being read in using pandas.
Sample CSV submission:

#

My Predictions:

#

Outputted submission parquet file after attempting to submit:

#

How are others getting this to submit correctly?

weary wolf Dec 10, 2024, 11:23 PM

#

Hi, The instructions (Overview > Code Requirements > Training Phase) say, "Your notebook must use THE time-series module to make predictions". What module is this?

west glade Dec 11, 2024, 12:11 AM

#

All I did was add in my predictions to the given sample code. What you see above is an info dump of the data frame on my personal machine so I could compare.

#

The third image is from the file that gets spit out after using the server code in a kaggle notebook

#

The code runs successfully

#

The output format is the issue somehow

#

Does everyone's submission parquet have these attributes?

jolly parcel Dec 11, 2024, 2:27 PM

#

vocal void

In your screenshot, have you looked at the very bottom of the message? I have seen that to give some clue about the error in my case.

wooden gazelle Dec 12, 2024, 1:02 AM

#

Hi, does anyone know why the scoring process is so slow? I already put the model loading code before the prediction function. But just a simple lgbm would take over an hour to finish scoring.

west glade Dec 12, 2024, 10:09 AM

#

wooden gazelle Hi, does anyone know why the scoring process is so slow? I already put the mode...

Is it always slow? Or just at certain times?

wooden gazelle Dec 12, 2024, 3:07 PM

#

Always! And it's a bit difficult to install some other packages to make the inference faster in the kaggle notebook, so some new ideas involving feature engineering could be difficult to evaluate.

west glade Dec 12, 2024, 4:52 PM

#

wooden gazelle Always! And it's a bit difficult to install some other packages to make the infe...

Before doing massive feature engineering, did you try to do a simple configuration first to test speed? As mentioned above, the simplest would be responder_6 = responder_6_lag_1.
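
One way to run that kind of speed check locally is to time every call to the prediction function and compare it against the one-minute-per-batch limit quoted in the demo notebook's comments. This is a minimal sketch, not part of the Kaggle evaluation API: `timed`, `dummy_predict`, and the list-based batches are hypothetical stand-ins, and a real check would wrap your own `predict(test, lags)` before passing it to the local gateway.

```python
import time
from typing import Callable


# Hypothetical helper (not part of the Kaggle API): wraps any predict-style
# function and records how long each call takes.
def timed(predict_fn: Callable, budget_s: float = 60.0):
    durations: list[float] = []

    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = predict_fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        durations.append(elapsed)
        if elapsed > budget_s:
            print(f"batch {len(durations) - 1} took {elapsed:.1f}s, over the {budget_s:.0f}s budget")
        return result

    # Expose the timings so they can be inspected after a local run.
    wrapper.durations = durations
    return wrapper


# Stand-in for a real predict(test, lags); it only returns a constant per row.
def dummy_predict(test, lags):
    return [0.0 for _ in test]


timed_predict = timed(dummy_predict)
for batch in ([1, 2, 3], [4, 5], [6]):
    timed_predict(batch, None)

print(f"calls: {len(timed_predict.durations)}, slowest: {max(timed_predict.durations):.6f}s")
```

If the slowest batch is already close to the budget with a trivial baseline, the bottleneck is probably outside the model itself.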

steel lantern Dec 12, 2024, 11:04 PM

#

west glade Before doing massive feature engineering, did you try to do a simple configurati...

lags_: pl.DataFrame | None = None

def predict(test: pl.DataFrame, lags: pl.DataFrame | None) -> pl.DataFrame | pd.DataFrame:
    """Make a prediction."""
    global lags_
    if lags is not None:
        # Rename `responder_6_lag_1` to `responder_6` if it exists
        if "responder_6_lag_1" in lags.columns:
            lags = lags.rename({"responder_6_lag_1": "responder_6"})
        
        # Add `responder_6_2` as the square of `responder_6` if it exists
        if "responder_6" in lags.columns:
            lags = lags.with_columns(
                (pl.col("responder_6") ** 2).alias("responder_6_2")
            )
        
        # Save the processed lags globally for future use
        lags_ = lags

    # Initialize predictions with default values
    predictions = test.select('row_id', pl.lit(0.0).alias('responder_6'))

    if lags is not None and "responder_6" in lags.columns:
        # Ensure alignment between test and lags and update the responder_6 values
        pred = lags["responder_6"].to_numpy()
        predictions = predictions.with_columns(pl.Series("responder_6", pred.ravel()))

    # Return the predictions as a Polars DataFrame
    return predictions

When I try this I'm able to generate a submission file on my own, but it throws an exception when I try to submit on the competition page. Any ideas? I replace the predict function in the provided 'Jane Street RMF Demo Submission' file

#

This should just return the most recent lag of responder 6

steel lantern Dec 12, 2024, 11:27 PM

#

There are a few similar questions in the discussion section but no answers, was anyone able to debug this?

west glade Dec 13, 2024, 11:01 AM

#

steel lantern Theres a few similar questions in the discussion section but no answers, was any...

I'm in the same boat as you. Been spending way too much time on this exact issue rather than improving a prediction model.

swift harbor Dec 13, 2024, 2:45 PM

#

steel lantern ``` lags_: pl.DataFrame | None = None def predict(test: pl.DataFrame, lags: pl....

had this exact issue, someone was able to fix it by casting preds to the right data type, but that didn't work for me

steel lantern Dec 13, 2024, 9:54 PM

#

Hey @west glade @swift harbor.
I was able to solve the Thrown Exception error with the following code:

lags_: pl.DataFrame | None = None
def predict(test: pl.DataFrame, lags: pl.DataFrame | None) -> pl.DataFrame:
    if lags is not None:
        lags = lags.group_by(["date_id", "symbol_id"], maintain_order=True).last()
        test = test.join(lags, on=["date_id", "symbol_id", "time_id"], how="left")
    else:
        test = test.with_columns(
            pl.lit(0.0).alias('responder_6_lag_1')
        )

    # Use the lagged responder_6 value as the prediction
    # Assuming 'responder_6_lag_1' is the column that represents the most recent lag
    predictions = test.select(
        'row_id',
        pl.col('responder_6_lag_1').alias('responder_6')
    )
    # test = test.write_parquet("test_out.parquet")
    return predictions

However, I now get a scoring submission error. Let me know if you guys are able to make any progress

west glade Dec 14, 2024, 12:58 PM

#

steel lantern Hey @west glade @swift harbor. I was able to solve the Thrown ...

I'm testing your code out now. I'll let you know what happens on my end.

nocturne shore Dec 14, 2024, 8:12 PM

#

I believe the issue is that the lags are served on the first time step of each date. This means you need to save them outside of the predict function - presumably this is what the global lags_ object is for.

nocturne shore Dec 14, 2024, 10:02 PM

#

I'm not able to submit anymore today, but this is my code, that I believe should work: https://www.kaggle.com/code/joshlevent/js-null-hypothesis

west glade Dec 14, 2024, 11:24 PM

#

nocturne shore I'm not able to submit anymore today, but this is my code, that I believe should...

It worked!

eternal terrace Dec 15, 2024, 1:27 AM

#

Friends, I'm getting this stubborn error after a few hours or so from submission: "Notebook Inference Server Error
Your submission notebook's inference server was disconnected unexpectedly, or a request timed out. See more debugging tips". Is that the problem due to 60-sec timeout, or could that be memory issues, etc.?

west glade Dec 15, 2024, 2:28 AM

#

eternal terrace Friends, I'm getting this stubborn error after a few hours or so from submission...

I'm sorry I haven't gotten that far yet. My current notebook is scoring with my own information for the first real time.

west glade Dec 15, 2024, 2:33 PM

#

@swift harbor @steel lantern Josh's code worked all the way through. I was able to modify it to fit how my own code makes predictions, and it's functional. I thought I had done the same as this code right at the start but apparently I missed something and overcomplicated it.

swift harbor Dec 15, 2024, 3:18 PM

#

west glade @swift harbor @steel lantern Josh's code worked all the way throu...

I got it working also, it's great. Although why is the R2 score <0? Otherwise works.

west glade Dec 15, 2024, 3:19 PM

#

I found that interesting too. As our model gets better I suppose it will move into the positive.
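
For context on the negative score: the metric is not spelled out in this thread, but as the competition's evaluation page describes it, it is a sample-weighted zero-mean R², i.e. R² = 1 − Σ wᵢ(yᵢ − ŷᵢ)² / Σ wᵢyᵢ². Under that assumption, predicting 0.0 everywhere scores exactly 0, and any prediction with a larger weighted squared error than all zeros scores below 0. A small sketch with made-up numbers (`weighted_zero_mean_r2` is a hypothetical helper, not Kaggle's scoring code):

```python
def weighted_zero_mean_r2(y_true, y_pred, weights):
    """1 - sum(w * (y - p)^2) / sum(w * y^2), with no mean subtracted in the denominator."""
    num = sum(w * (y - p) ** 2 for y, p, w in zip(y_true, y_pred, weights))
    den = sum(w * y ** 2 for y, w in zip(y_true, weights))
    return 1.0 - num / den


# Made-up targets and equal weights, purely for illustration.
y_true = [0.5, -0.5, 1.0, -1.0]
weights = [1.0, 1.0, 1.0, 1.0]

all_zeros = [0.0, 0.0, 0.0, 0.0]
perfect = list(y_true)
wrong_sign = [-0.5, 0.5, -1.0, 1.0]

print(weighted_zero_mean_r2(y_true, all_zeros, weights))   # 0.0
print(weighted_zero_mean_r2(y_true, perfect, weights))     # 1.0
print(weighted_zero_mean_r2(y_true, wrong_sign, weights))  # -3.0
```

So a lagged-responder baseline with a score below 0 is simply doing worse than a constant 0.0 on the scored rows.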

tawny ledge Dec 15, 2024, 3:29 PM

#

how long does it take to score? approx?

west glade Dec 15, 2024, 3:30 PM

#

For me I think it took 2 hours.

tawny ledge Dec 15, 2024, 3:31 PM

#

ohk!

west glade Dec 15, 2024, 3:31 PM

#

I didn't do anything too fancy. I only made my prediction on what's existing, then passed that through to return it.

tawny ledge Dec 15, 2024, 3:32 PM

#

yeah ig it'll depend on model

west glade Dec 15, 2024, 3:32 PM

#

For an actual model that is making a prediction, I think 2 hours is a good baseline.

rocky totem Dec 16, 2024, 12:13 AM

#

Does anyone know what the responder values represent in the dataset?

west glade Dec 16, 2024, 1:30 AM

#

I understand them to be other securities

#

If we were talking about crypto currency:
Bitcoin
Ethereum
Dogecoin
etc...

Those would all be responders.

fossil wharf Dec 19, 2024, 4:35 PM

#

I stopped trying to figure this one out. All the data is based off assumptions they are working by. None of which is necessary. Just give me OHLCV vals

west glade Dec 19, 2024, 5:35 PM

#

Agreed. So much time spent trying to figure things out but I'm still chugging along to at least get an ok score. OHLCV for the win.

keen kite Dec 21, 2024, 4:58 AM

#

Did anyone try using sequence models? I saw Motono223's preprocessing only creates one lag of timestep

west glade Dec 21, 2024, 12:22 PM

#

keen kite Did anyone try using sequence models? I saw Motono223's preprocessing only cre...

I was planning to but I don't have enough time so I'm keeping it a bit more simple although very tuned.

ocean shadow Dec 22, 2024, 2:41 PM

#

How long does it take to submit your code?

glossy vigil Dec 22, 2024, 9:45 PM

#

fossil wharf I stopped trying to figure this one out. All the data is based off assumptions the...

OHLCV is really a mid-frequency thing. After all, what is the open price? Many people refer to the first tick (trade) of an interval. However, you won't be able to obtain it here, unlike in mid-frequency strategies. Also, for illiquid names, the middle of the day is very likely to be quiet, with no trades happening.

heady condor Dec 22, 2024, 11:47 PM

#

west glade I was planning to but I don't have enough time so I'm keeping it a bit more simp...

Glad you got it working, do you understand the lags they are providing? It sounded like we get only one lag from the previous day. And I assume within the submission test dataset we will be able to calculate the lags within a day.

glossy vigil Dec 22, 2024, 11:57 PM

#

heady condor Glad you got it working, do you understand the lags they are providing? It sound...

Seems like they will provide the responders of the same time_id yesterday in the lag field.

keen kite Dec 23, 2024, 3:12 AM

#

heady condor Glad you got it working, do you understand the lags they are providing? It sound...

What I saw from the discussion is that at the end they will provide the whole previous day's data in lag, not just one time_id

eternal lotus Dec 23, 2024, 6:12 AM

#

Hi guys, if anybody can help me: I wanted to know how I should start this Jane Street modelling challenge. I only have experience with basic models for regression and classification. I have some knowledge of complex models like XGBoost and ensemble methods, and I wanted to learn through this competition. When I tried loading the dataset, my VS Code crashed (I have an M2 Pro MacBook). Is it difficult to do this modelling on a local machine? I'd really appreciate any help. Thank you!

heady condor Dec 23, 2024, 6:42 PM

#

eternal lotus Hi guys, If anybody can help me, I wanted to know how should I start this jane ...

Yeah, the dataset is large for a small to medium laptop/desktop it's going to be a tough first one, you'll probably spend most of your time finding tricks to process the data. I used polars which has a streaming feature so I could lazy define things and then I setup some creative data generators that read only a portion of the data at any given time.

eternal lotus Dec 23, 2024, 8:11 PM

#

heady condor Yeah, the dataset is large for a small to medium laptop/desktop it's going to be...

Thank you for the response!

west glade Dec 23, 2024, 8:49 PM

#

heady condor Glad you got it working, do you understand the lags they are providing? It sound...

Like Shane had said, I assumed that they would always give us the lag so no need to calculate anything.

west glade Dec 23, 2024, 8:50 PM

#

keen kite What I saw from the discussion is that at the end they will provide the whole pr...

The entire previous day? 0_o
Did it say that in the competition information anywhere?

west glade Dec 23, 2024, 8:51 PM

#

keen kite What I saw from the discussion is that at the end they will provide the whole pr...

So if our code is running at the middle of the day, we would get the entire previous day's values plus the last time_id?
Too bad this wasn't more clear.

keen kite Dec 23, 2024, 8:52 PM

#

https://www.kaggle.com/competitions/jane-street-real-time-market-data-forecasting/discussion/541106

#

check out this thread

west glade Dec 23, 2024, 9:00 PM

#

Still so many questions 😂
Crazy that it took someone testing this out to verify it, instead of just getting a clear instruction within the competition notes.
If this is true then what they gave us as an example is crap.
Ah well, not enough time to rework my code to verify this or use it in a meaningful way if it's true.
@keen kite Thanks for pointing this out.

#

Dang just thinking about it now. That would mean that the previous time_id lag won't be given at each time_id that is not zero. Fudge.

keen kite Dec 23, 2024, 9:03 PM

#

The instructions are just confusing

#

And for the test set, only one time_id lag is given

#

I think the goal is to use lag data (the previous day or all previous days) to predict the next day t=0 responders

steel parcel Dec 23, 2024, 9:06 PM

#

I have a little pet hypothesis. I think that responder_6 is either implied volatility or is somehow related to implied volatility.

#

if I'm right, make sure to credit me later. KEKW

west glade Dec 23, 2024, 9:07 PM

#

lol

steel parcel Dec 23, 2024, 9:07 PM

#

I've been working on a very interesting solution to this problem that for now I'm gonna keep secret. Once it's over I'd be glad to share it

#

would love to discuss how others approached this problem as well

west glade Dec 23, 2024, 9:08 PM

#

How long is it taking for everyone's code to run?
I'm currently sitting at 6 hours of scoring and getting worried 0_0

steel parcel Dec 23, 2024, 9:08 PM

#

hmm if you run your own scoring system, does it take 6 hrs to run through it?

#

on your own pc, that is

west glade Dec 23, 2024, 9:09 PM

#

@steel parcel It would be great to discuss after the competition!

#

I had a little trouble getting the kaggle evaluation package to work for me so I just focused on my model on my PC and figured I'll let the kaggle notebook do the scoring.

steel parcel Dec 23, 2024, 9:11 PM

#

west glade I had a little trouble getting the kaggle evaluation package to work for me so I...

do you load the entire dataset at once?

west glade Dec 23, 2024, 9:11 PM

#

I did at the very start yes.

steel parcel Dec 23, 2024, 9:11 PM

#

ya that might be it. what if you fractionalized the dataset and ran through it procedurally?

west glade Dec 23, 2024, 9:11 PM

#

But then I soon realized that I needed to work with it in pieces probably like everyone else.

#

Do you know how the scoring portion actually works?

steel parcel Dec 23, 2024, 9:12 PM

#

they tell us how it's calculated

west glade Dec 23, 2024, 9:12 PM

#

Are they scoring on the data that they've given us to train on?

steel parcel Dec 23, 2024, 9:13 PM

#

#

but if you're talking about the data they feed in...

west glade Dec 23, 2024, 9:14 PM

#

Yeah whatever they are using to get our Public score. Are they scoring our model on the data that they've already given us that we are training off of.

steel parcel Dec 23, 2024, 9:14 PM

#


At the start of the forecasting phase, the unscored public test set will be extended up to the final day of the model training phase and the private set updated roughly every two weeks. Submissions will be rescored at the time of each update.

During the forecasting phase, the evaluation API will serve test data from the beginning of the public set to the end of the private set. You must make predictions at every timestep, but, in this phase, only predictions on the private set are scored. (You may predict 0.0 on the unscored segments, if you like.)

#

from the data tab

west glade Dec 23, 2024, 9:16 PM

#

Hmmm...Seems maybe I should have spent more time getting the evaluation package to function.

steel parcel Dec 23, 2024, 9:16 PM

#

you still got plenty of time

#

get in there, soldier

stable onyx Dec 23, 2024, 9:16 PM

#

hey! does anyone know what the forecasting window is supposed to be? ik this is a simple question but I'm still confused

#

aka 1 time step into the future or n time steps

#

the test set they provide is 38 rows long, does this mean the window is just whatever length test set they give us?

west glade Dec 23, 2024, 9:18 PM

#

It's 1 time step. The rows correspond to different symbol_ids.

stable onyx Dec 23, 2024, 9:18 PM

#

ohh makes sense

#

awesome tysm

west glade Dec 23, 2024, 9:18 PM

#

Very welcome.

#

So in regards to this lags situation, lets say we are at time_id = 25. I won't have any access to any of the lags of any responders for the past 5 steps?

steel parcel Dec 23, 2024, 9:23 PM

#

west glade So in regards to this lags situation, lets say we are at time_id = 25. I won't ...

you can store the data in memory or in file data that can be read dynamically, right?

#

@west glade did you try pre-training your model before you upload it and loading it into memory to save time?

west glade Dec 23, 2024, 9:24 PM

#

I was thinking of that but... we won't ever get any of the actual values until the next time_id = 0, which is the next day.

steel parcel Dec 23, 2024, 9:24 PM

#

just trying to think of ways to make your work easier

west glade Dec 23, 2024, 9:24 PM

#

I hope I'm wrong.

#

Thanks 🙂

steel parcel Dec 23, 2024, 9:25 PM

#

try storing useful relevant data into a readable format, unless they wipe the data clean on your system it should stay... right?

#

I think in my case I'm building several contingencies for situational chaos

west glade Dec 23, 2024, 9:26 PM

#

I completely agree with you, as long as they provide the last time_id's lag at every step rather than only giving us yesterday's lags at the beginning of a new day.

#

I am trying to confirm that there is something for me to store aside from what I was given for yesterday.

steel parcel Dec 23, 2024, 9:27 PM

#

you might be able to extract current datetime & utilize that as a tool

#

no guarantee on that, though

#

I think that for the most part, they're shopping on kaggle for potential hires

west glade Dec 23, 2024, 9:27 PM

#

ahhhhh

steel parcel Dec 23, 2024, 9:27 PM

#

I don't think they care very much about the results,

#

obviously the results are useful & any code/algorithms/models they can obtain are probably worth a few pennies

#

& 50k is definitely just a few pennies to these guys

west glade Dec 23, 2024, 9:28 PM

#

agreed.

steel parcel Dec 23, 2024, 9:29 PM

#

I think that's why they aren't being more open about what can & can't be done

#

if you really cared strongly about the results, you'd be more specific with available options

west glade Dec 23, 2024, 9:30 PM

#

They are probably enjoying watching us stress out lol

steel parcel Dec 23, 2024, 9:30 PM

#

hey I find it fun

#

anonymized column data is actually kinda interesting, it got me thinking about how I can work around that

#

which I'd be really eager to discuss later

west glade Dec 23, 2024, 9:30 PM

#

Same!

steel parcel Dec 23, 2024, 9:31 PM

#

the math olympiad competition seems pretty challenging in a not-fun way

#

I read through it and it seemed more challenging than arc 2024

#

and not nearly as "nice"

#

I could be wrong though

west glade Dec 23, 2024, 9:36 PM

#

I wish I had time to check out the other comps. I found that the spine MRI image classification was pretty cool but couldn't do it at the time.

west glade Dec 24, 2024, 12:40 AM

#

steel parcel @west glade did you try pre-training your model before you upload it a...

I just realized that I didn't answer this question. Sorry. Yes I did.

west glade Dec 24, 2024, 1:17 AM

#

And it failed at the 8th hour unfortunately.

stable onyx Dec 24, 2024, 6:01 PM

#

Another question: in the sample test parquet they gave us, the is_scored field is all true, but this is the very first date and time step (which is part of the public set). they claim only the private set predictions are scored. what am I misunderstanding?

keen kite Dec 24, 2024, 6:21 PM

#

anyone having 'too many requests' issue?

keen kite Dec 24, 2024, 7:12 PM

#

Does anyone know. How can we get previous day features? I saw that the lag dataset only contains responders

wheat pumice Dec 24, 2024, 8:03 PM

#

I believe you can store them in a global variable manually

keen kite Dec 26, 2024, 6:41 AM

#

keen kite Dec 26, 2024, 6:42 AM

#

keen kite

I am kinda confused about the data they provide. Please point out if I am wrong in this post

west glade Dec 26, 2024, 3:57 PM

#

keen kite I am kinda confused about the data they provide. Please pont me out if I am wron...

Everything seems right except what you said about test.parquet. As I understand it (take this with a grain of salt), you will get a single date_id and time_id each time your predict function is called. The batch will contain features for all symbol_ids. You are right about the lags: at time_id = 0, you get the previous date_id's lags.

What I don't know is whether at, let's say, time_id = 25 we will also get the lag for the previous time_id step (the responder value at time_id_{t-1}).

Someone can correct me if any of that is wrong.

split sphinx Dec 26, 2024, 11:46 PM

#

Has anyone had a problem called "submission format errors"? I tried many times, it still fails, and I can't find any issues in my code.

#

This is my prediction code

keen kite Dec 27, 2024, 2:15 AM

#

west glade Everything seems right except what you said about test.parquet. As I understand ...

Thanks for pointing that out. I think you are correct. We know 'what time it is now' from the test df passed in.

keen kite Dec 27, 2024, 2:17 AM

#

west glade Everything seems right except what you said about test.parquet. As I understand ...

I think you get the previous day's lagged responders all together at the next day's t0. So it takes extra work to cache the features and do a df join.
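
A rough sketch of that join, assuming the cached feature rows and the lag rows can both be keyed by (time_id, symbol_id) of the previous day. The function name, the dict-per-row layout, and the `responder_6_lag_1` column name are assumptions for illustration; with real dataframes this would be a single join call.

```python
# Hypothetical sketch: pair yesterday's cached features with the lagged
# responders that arrive at the first time step of the next day.

def join_features_with_lags(cached_features, lag_rows):
    """Inner-join two lists of dict rows on (time_id, symbol_id)."""
    lag_by_key = {(r["time_id"], r["symbol_id"]): r for r in lag_rows}
    joined = []
    for feat in cached_features:
        lag = lag_by_key.get((feat["time_id"], feat["symbol_id"]))
        if lag is not None:
            # Copy so the cached row itself is left untouched.
            merged = dict(feat)
            merged["responder_6_lag_1"] = lag["responder_6_lag_1"]
            joined.append(merged)
    return joined
```

The joined rows are (features, realized responder) pairs for the previous day, which is what you would need for any online update of a model.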

west glade Dec 27, 2024, 11:15 AM

#

split sphinx This is my prediction code

When you run your code on the test and lags parquet files that we were given, do you get something that looks like this:

split sphinx Dec 27, 2024, 8:58 PM

#

Yeah. I fixed the issue later by changing the merge logic. I needed to groupby first and then merge.
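
The original code was not posted, so the following is only a guess at the kind of fix meant: if the lag table has many rows per symbol_id, merging it directly onto the test batch multiplies rows, and the output no longer has one prediction per input row. Collapsing to one row per symbol first avoids that. All names and the `responder_6_lag_1` column are assumptions.

```python
# Hypothetical sketch: collapse the lag rows to one row per symbol_id before
# merging, so a left join cannot multiply the rows of the test batch.

def last_lag_per_symbol(lag_rows):
    """Group lag rows by symbol_id and keep the value with the largest time_id."""
    latest = {}
    for r in lag_rows:
        sym = r["symbol_id"]
        if sym not in latest or r["time_id"] > latest[sym]["time_id"]:
            latest[sym] = r
    return {sym: r["responder_6_lag_1"] for sym, r in latest.items()}


def merge_lags(test_rows, lag_rows, fill=0.0):
    """Left-merge the per-symbol lag onto the batch; output length == input length."""
    per_symbol = last_lag_per_symbol(lag_rows)
    return [dict(r, responder_6_lag_1=per_symbol.get(r["symbol_id"], fill))
            for r in test_rows]
```

Taking the last value per symbol is just one possible aggregation; a mean or other summary would follow the same groupby-then-merge pattern.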

steel parcel Dec 31, 2024, 1:19 AM

#

man, this submission process is driving me crazy

#

"Your submission notebook's inference server was disconnected unexpectedly, or a request timed out. See more debugging tips"

#

and not a single clue as to why.

#

and I'm not sure how to troubleshoot this, either. the test submission ran perfectly

zealous lance Jan 5, 2025, 1:27 AM

#

making the features anonymized and unintelligible without context is, IMO, a handicap to building the best possible model for predicting markets

#

less realistic to exclude domain expertise and knowledge, better for challenge innovation though 😉

hallow dust Jan 6, 2025, 1:01 AM

#

zealous lance making the features anonymized/indiscriminate, unintelligible without context, I...

I would also love to know, but how they generated these features and responders is probably worth a lot of money

unreal cobalt Jan 9, 2025, 1:28 PM

#

What's the frequency of this data (hourly, daily, or weekly)?
Did I miss where it was mentioned, or is it hidden with no way to figure it out?

grim oriole Jan 10, 2025, 6:12 AM

#

it says in the data section "It's important to note that the real time differences between each time_id are not guaranteed to be consistent" which doesn't exactly say much but there are around 968 time_ids per date_id

#

one theory I've seen is that it's roughly minutely and includes trading hours and after-hours trading
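
As a rough sanity check of that theory, assuming exactly one minute per time_id (which the data page explicitly does not guarantee), 968 steps would span about 16 hours:

```python
# Back-of-envelope check; the one-minute spacing is an assumption, not a fact.
TIME_IDS_PER_DAY = 968   # approximate count quoted above
MINUTES_PER_STEP = 1     # hypothetical spacing

hours_covered = TIME_IDS_PER_DAY * MINUTES_PER_STEP / 60
print(f"{hours_covered:.1f} hours")  # 968 / 60 = 16.1 hours
```

That is well beyond a regular 6.5-hour US equity session, so it would be consistent with extended-hours trading being included, but it doesn't confirm the spacing or the asset class.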

hallow dust Jan 12, 2025, 1:43 PM

#

Has anyone's notebook run successfully, only for scoring to inevitably end in "Notebook thrown exception"?

I'm at my wits' end and can't get any logs or prints to debug this. I've sprinkled defensive checks everywhere (types, shapes, bounds, anything I can think of), and the code now looks like messy spaghetti.

And "Notebook thrown exception" always laughs in my face. I'm literally pulling my hair out.
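
One pattern that sometimes helps with this kind of failure is to centralize the defensive checks in a single wrapper instead of scattering them. A minimal sketch, assuming predictions are a flat list of floats; `safe_predict`, `ERRORS`, and the constant fallback are hypothetical names and choices, not part of the competition API:

```python
# Hypothetical sketch: never let an exception escape the predict callback;
# fall back to a constant prediction and remember the error for later inspection.
import math
import traceback

ERRORS = []  # collected tracebacks (only useful where you can read them back)


def safe_predict(model_fn, test_rows, fallback=0.0):
    """Call model_fn(test_rows); on any failure or bad value, use the fallback."""
    try:
        preds = list(model_fn(test_rows))
        if len(preds) != len(test_rows):
            raise ValueError(f"expected {len(test_rows)} predictions, got {len(preds)}")
    except Exception:
        ERRORS.append(traceback.format_exc())
        preds = [fallback] * len(test_rows)

    # Replace NaN/inf and non-numeric values with the fallback.
    return [p if isinstance(p, (int, float)) and math.isfinite(p) else fallback
            for p in preds]
```

This only masks Python-level exceptions inside the model call. It would not help with time-outs or out-of-memory kills, and silently falling back can hide a real bug, so it is worth counting the fallbacks in a local run first.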

hasty musk Feb 26, 2025, 11:08 AM

#

Hello Kaggle Team, Kaggle Community, and Competition hosts,

Our team participated in the Jane Street Real-Time Market Data Forecasting competition, and we encountered a critical issue during the forecasting period where none of our submissions were properly scored on the Private Leaderboard.

During the Public Leaderboard phase, our submissions were successfully evaluated, and we had no issues. However, once the competition transitioned into the forecasting period, every submission we made—whether it was our own developed solution, a publicly available notebook solution, or a combination of both—failed to be scored correctly. This happened regardless of whether we selected those solutions as our final submissions.

The submission logs indicate that all our submissions were marked as "Succeeded," yet they were not evaluated on the Private LB. The attached image provides evidence of this issue.

We want to clarify that our team did not engage in any rule violations or unethical practices. Given that multiple solutions were affected, we believe this could be a technical issue rather than a problem specific to our team.

Could the Kaggle team and Community please investigate this matter and provide clarification on why our submissions were not evaluated on the Private LB? We would appreciate any insights or possible resolutions.

#

fossil wharf Mar 2, 2025, 6:30 PM

#

categories of assets

#

unfortunately, not much use for competition

fossil wharf Mar 2, 2025, 6:49 PM

#

Jane Street unfortunately will always suffer from how they frame the approach, and from how they framed the problem. I may contact their clients rather than seek the prize money.

worthy girder Mar 3, 2025, 8:39 PM

#

hasty musk Hello Kaggle Team, Kaggle Community, and Competition hosts, Our team participat...

You should post in the forums; there is no support through Discord.

tawny geyser Mar 4, 2025, 5:41 AM

#

Can we join this competition now?

trail wedge Mar 26, 2025, 5:39 PM

#

Says, "New entrants are currently not allowed. You will be able to accept the rules and submit late predictions after the competition completes."

fleet igloo Mar 26, 2025, 10:41 PM

#

Hi all, I am looking for Kaggle Grandmasters who have won competitions and can mentor me. I am willing to pay for mentorship. Thank you!

marsh flint Mar 27, 2025, 2:38 AM

#

trail wedge Says, "New entrants are currently not allowed. You will be able to accept the ru...

hmm its a forecasting comp ofc not anytime u wake up n think u can join

trail wedge Mar 27, 2025, 2:39 AM

#

marsh flint hmm its a forecasting comp ofc not anytime u wake up n think u can join

👩‍🦼

fading warren Mar 31, 2025, 3:45 PM

#

Machine Learning Algorithms You Never Knew Existed, But Are Quite Useful https://medium.com/pythoneers/machine-m. D

marsh flint Mar 31, 2025, 5:50 PM

#

fading warren Machine Learning Algorithms You Never Knew Existed, But Are Quite Useful https:/...

They do not exist btw

quiet fog Mar 31, 2025, 11:41 PM

#

marsh flint They do not exist btw

Now we don't know they exist fr, lol

marsh flint Apr 1, 2025, 5:02 AM

#

Yes I'm omniscient

tired lark Apr 8, 2025, 1:45 AM

#

chunshake

fossil wharf Jun 18, 2025, 9:02 AM

#

marsh flint Yes I'm omniscient

I knew you were going to say that 🤣

marsh flint Jun 18, 2025, 9:02 AM

#

No more
