#predict-energy-behavior-of-prosumers | Kaggle | Page 1

mighty rock Nov 5, 2023, 9:37 AM

#

Hey, is anybody here actively working on this project?

wanton herald Nov 5, 2023, 10:44 AM

#

☝️

dense agate Nov 5, 2023, 7:57 PM

#

yea. so stuck in the DEFCON CTF and came here to refresh a bit..

wanton herald Nov 5, 2023, 10:12 PM

#

nothing better than a good old-fashioned time series project to lift the mood

thin ibex Nov 7, 2023, 6:55 PM

#

I like tabular datasets and forecasting problems

late monolith Nov 7, 2023, 7:14 PM

#

Time series. Perfect.

autumn seal Nov 8, 2023, 6:38 PM

#

I used to predict electrical load using an LSTM, the accuracy was good

dense agate Nov 9, 2023, 5:32 AM

#

autumn seal I used to predict electrical load using an LSTM, the accuracy was good

Would love to see some good NN based solutions. gradient boosting is so dominant in the public notebooks

wanton herald Nov 9, 2023, 9:23 AM

#

I don't like NNs so much on this kind of time series. I have the feeling there is not enough data and too many subtleties to get something relevant

dense agate Nov 9, 2023, 5:56 PM

#

I usually don't like NNs for time series, but I have a feeling this one can be different. There is enough feature interaction for an NN to get an edge

quaint cedar Nov 10, 2023, 3:49 AM

#

I got a super newb question

#

I downloaded the project and got it set up in VSCode.

#

I tried running the enefit-xgboost-start notebook

#

however first line fails

#

Looking in links: /kaggle/input/xgboost-python-package/
WARNING: Location '/kaggle/input/xgboost-python-package/' is ignored: it is either a non-existing path or lacks a specific scheme.
ERROR: Could not find a version that satisfies the requirement xgboost (from versions: none)
ERROR: No matching distribution found for xgboost

#

anyone know what I am missing?

#

i tried downloading the package as a whole and placing it in that directory but still same issue

thin ibex Nov 10, 2023, 9:51 AM

#

do you have xgboost installed in your local python?

wanton herald Nov 10, 2023, 10:35 AM

#

dense agate I usually don't like NN for time series. but I got a feeling this one can be dif...

for now I am trying to focus on a clean framework to do feature engineering on the time series. I'd also like to try some corrections based on "physics" behavior, like applying a production profile or stuff like this

sacred steeple Nov 10, 2023, 1:34 PM

#

wanton herald for now I am trying to focus on a clean framework to do feature engineering on ...

What is a production profile?

wanton herald Nov 10, 2023, 1:47 PM

#

stuff like the average production for a given day

#

which is different from month to month, like the chart above, which is in the summer (production starts early in the morning and ends late in the evening; the logic: it's sunny for longer).
In winter the profile is different.
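A production profile in the sense described above can be sketched as the average production per hour of day, computed separately for each month. The data below is synthetic and the column names (`datetime`, `production`) are assumptions, not the competition schema. In a real pipeline the profile would have to be built from training data only, to avoid leaking the future.

```python
import numpy as np
import pandas as pd

# One year of synthetic hourly data (hypothetical, not the competition files).
idx = pd.date_range("2023-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1))
hours = idx.hour.to_numpy()
months = idx.month.to_numpy()

# Toy solar curve: the daylight window is wider in summer than in winter.
half_day = 4 + 4 * np.sin(np.pi * (months - 1) / 11)
production = np.clip(1 - np.abs(hours - 12) / half_day, 0, None)

df = pd.DataFrame({"datetime": idx, "production": production})
df["month"] = df["datetime"].dt.month
df["hour"] = df["datetime"].dt.hour

# Profile table: one row per month, one column per hour of day.
profile = df.groupby(["month", "hour"])["production"].mean().unstack("hour")

# The same profile attached to every row, usable as a model feature.
df["profile"] = df.groupby(["month", "hour"])["production"].transform("mean")
```

In this toy data, `profile.loc[6, 6]` (June, 6 AM) is positive while `profile.loc[1, 6]` (January, 6 AM) is zero, which is the summer/winter difference mentioned in the message.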

sacred steeple Nov 10, 2023, 2:29 PM

#

Ah I see. Thx for the explanation, makes a lot of sense

quaint cedar Nov 10, 2023, 5:56 PM

#

thin ibex do you have xgboost installed in your local python?

That is what I was missing. xgboost install. go figure.

dense agate Nov 10, 2023, 7:32 PM

#

wanton herald stuff like the average production for a given day

I am thinking about the same thing. A handy option for encoding cyclical information is the cos() and sin() encoding used in many public notebooks. I am using that too since it's convenient, but I don't feel like that's the best option.
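For readers who have not seen the sin/cos trick, here is a minimal sketch. The helper name `cyclical_encode` is hypothetical; the idea is only that hour 23 and hour 0 end up close together on a circle instead of 23 units apart.

```python
import numpy as np

def cyclical_encode(values, period):
    """Map a periodic value (hour, day of year, ...) to a point on the unit circle."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

hour_sin, hour_cos = cyclical_encode(np.arange(24), period=24)

# Distance between two hours in the encoded space.
def encoded_distance(h1, h2):
    return np.hypot(hour_sin[h1] - hour_sin[h2], hour_cos[h1] - hour_cos[h2])
```

With this encoding, `encoded_distance(23, 0)` equals `encoded_distance(0, 1)`, so midnight no longer looks like a discontinuity to the model.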

wanton herald Nov 10, 2023, 11:36 PM

#

using the Fourier transform is something else here I think, no? Using profiles seems to improve my score a bit
I'm just at the beginning of my feature engineering, I spent a bit of time building a robust framework that I can easily transpose from training to inference with the submission API

edit: Actually nvm, did not improve much for now.

dense agate Nov 11, 2023, 12:23 AM

#

IDK, I feel like Fourier is similar to sin/cos. I am thinking of something more flexible, like representation learning... That's why I mentioned NN. I haven't tried this idea though.... I feel like I am giving away some secret sauce harold .

sacred steeple Nov 11, 2023, 1:02 PM

#

Is it allowed to train a model locally and then upload it to a kaggle notebook as a "dataset"? or do I have to copy-paste my local code to a kaggle notebook and train from scratch?

wanton herald Nov 11, 2023, 2:28 PM

#

yes it's allowed, and it's even recommended

#

you can also train a model in a kaggle notebook, and access it from another kaggle notebook. When you click on "add data" in the right panel, you have an option to select the output of a notebook.
To use it, you simply save the files you want to pass along from the notebook where you train your model (for example with pickle), and save the notebook
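A minimal sketch of the save/load step with pickle. The helper names and the stand-in "model" are hypothetical. On Kaggle the training notebook would typically write under `/kaggle/working/`, and the inference notebook would read from wherever the attached notebook output is mounted under `/kaggle/input/`; a temporary directory is used here so the snippet runs anywhere.

```python
import pickle
import tempfile
from pathlib import Path

def save_model(model, path):
    """Serialize a trained model so another notebook can load it."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Load a model previously written by save_model."""
    with Path(path).open("rb") as f:
        return pickle.load(f)

# Stand-in for a trained model (any picklable object works the same way).
model = {"kind": "demo", "coef": [2.0, 6.0]}

out_dir = Path(tempfile.mkdtemp())  # stand-in for the notebook output folder
save_model(model, out_dir / "model.pkl")
restored = load_model(out_dir / "model.pkl")
```

Pickled models generally need the same library versions on both sides, so it is worth checking the versions in the training and inference environments.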

dense agate Nov 11, 2023, 9:38 PM

#

wait... is it always reading the latest version of the notebook output, or is it pinned to the version from when you added the notebook? Can you bypass the internet switch with this?

#

Damn PTSD from security hackathon..

wanton herald Nov 11, 2023, 11:12 PM

#

it's reading the latest version until you save your notebook for inference

dense agate Nov 11, 2023, 11:20 PM

#

but to evaluate with updated test data, the notebook will be rerun... the question is, which version can the notebook read when it is rerun during future inference

#

I posted the same question in the discussion, it may be a stupid question....

wanton herald Nov 11, 2023, 11:22 PM

#

ah no !
So basically, you run a model in notebook A and save it as version 1
You run the inference notebook, it runs with version 1 of notebook A.
Later you change notebook A to version 2.
As long as you don't save the inference notebook again, it will stay on version 1 of notebook A. At least that's how I understand it

#

the internet-off setting is there to prevent people from building functions that could leak the test set by sending it to a remote location. Without this restriction, you could easily save the hidden data to a remote bucket, for example. But as long as the notebook is not connected to the internet, there cannot be any exchange of data with the outside

dense agate Nov 11, 2023, 11:24 PM

#

that's my concern. The default setting of importing Kaggle dataset is "Pinned to the latest version"

#

not sure if that "latest" is as of when the notebook was saved, or as of when the notebook is run

wanton herald Nov 11, 2023, 11:26 PM

#

also, if you want to make sure there are no problems like this, you can always fork a notebook when you want to make a change

dense agate Nov 11, 2023, 11:27 PM

#

It's probably illegal to use leaked information, even if feasible. just a security concern because of the CTF PTSD....

wanton herald Nov 11, 2023, 11:28 PM

#

yeah but with the no internet policy, there is no way data is leaked anyway.
A thing that is sometimes done by kagglers, when the test dataset is small, is probing

#

or there are some data leak techniques, like training a model in the inference notebook while somehow adding the test set - for example when you fit a PCA, it can sometimes be useful.
Not here anyway, because submitting through the API guarantees no leak from the future

dense agate Nov 11, 2023, 11:29 PM

#

How about this though: "For example, in late April 2024, I can collect the real weather (much more accurate than just forecast) info throughout the test period, update the private dataset. The submitted notebook needs to be rerun to evaluate on the updated test anyway. If the submitted notebook can read the latest dataset, then it can "predict" using the leaked weather info."

wanton herald Nov 11, 2023, 11:31 PM

#

that's why once a notebook is saved, it is bound to the version it was saved with

#

but you can try to confirm with the admins

dense agate Nov 11, 2023, 11:32 PM

#

wanton herald or there is some data leak technique, like you can train a model in the inferenc...

That's exactly what I am going to do later...

wanton herald Nov 11, 2023, 11:32 PM

#

but we cannot here, because the data is given and inferred day by day, guaranteeing no leak

dense agate Nov 11, 2023, 11:33 PM

#

of course you can, you can log the revealed target

#

otherwise you can't engineer time lag features

wanton herald Nov 11, 2023, 11:33 PM

#

that's not a leak in that case, you use past data 🙂

#

I guess you are already doing it, no? otherwise you could not get such a high score, as the good features include past targets

dense agate Nov 11, 2023, 11:34 PM

#

hmm, yea. I meant I will probably retrain the model with revealed data, not leaked data

wanton herald Nov 11, 2023, 11:34 PM

#

ah

#

yeah you can do a rolling training

dense agate Nov 11, 2023, 11:35 PM

#

not right now. Currently I am focusing on things that won't prolong the development cycle too much

wanton herald Nov 11, 2023, 11:36 PM

#

on my side I prefer to focus first on a robust framework to build features without leaking the future by mistake

#

problem: it's much slower to try new stuff
advantage: very easy to plug into inference

#

i'm thinking of implementing a custom cache to calculate the rolling features faster, but i'm a bit lazy atm aha

dense agate Nov 11, 2023, 11:43 PM

#

yea, a big part of this competition is pandas data processing technique...

wanton herald Nov 11, 2023, 11:43 PM

#

getting rid of pandas can considerably speed up the preprocessing

dense agate Nov 11, 2023, 11:44 PM

#

exactly, but I am too lazy to do that now

wanton herald Nov 11, 2023, 11:44 PM

#

or using cudf from nvidia also

#

cudf+gpu is also a good way to preprocess faster

dense agate Nov 11, 2023, 11:46 PM

#

yea, just learned about that. pretty interesting

wanton herald Nov 11, 2023, 11:47 PM

#

i am using native cudf personally, did not try the new way with
%load_ext cudf.pandas
import pandas as pd

dense agate Nov 12, 2023, 12:05 AM

#

internet off, can't install/use cudf during inference. lol...

#

maybe learning the polars package is the way to go

wanton herald Nov 12, 2023, 12:08 AM

#

cudf is preinstalled on kaggle kernels

#

i am using it

#

you have two ways to solve your problem:

Make a dataset with the package that allows you to do %load_ext cudf.pandas (if nobody has done it yet). You can then load that package in your inference notebook and pip install it without internet
Use cudf without the %load_ext option. In that case you will have to use the functions from cudf directly, which are more or less the same as in pandas (you can do the rolling, merging, etc... all the same. The complex stuff comes with custom lambdas and .apply)
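A rough sketch of the second option, using cudf directly without the notebook magic. It only relies on the basic operations named above (DataFrame construction, `merge`, `rolling(...).mean()`), which cudf mirrors from pandas as far as I know. The data and column names are invented, and the fallback to pandas is an assumption added so the snippet still runs on a machine without a GPU.

```python
try:
    import cudf as xdf  # GPU dataframe library; needs an NVIDIA GPU
except Exception:
    import pandas as xdf  # CPU fallback with the same calls for this sketch

# Hypothetical daily targets and weather, keyed by a shared "day" column.
targets = xdf.DataFrame({"day": [1, 2, 3, 4, 5],
                         "target": [10.0, 12.0, 11.0, 15.0, 14.0]})
weather = xdf.DataFrame({"day": [1, 2, 3, 4, 5],
                         "temperature": [3.0, 4.5, 2.0, 6.0, 5.5]})

# Merge and rolling mean are written exactly as they would be in pandas.
merged = targets.merge(weather, on="day", how="left")
merged = merged.sort_values("day").reset_index(drop=True)
merged["target_roll3"] = merged["target"].rolling(3).mean()
```

Custom Python functions passed to `.apply` are where the two libraries tend to diverge, so keeping the pipeline to built-in dataframe operations makes the switch easier.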

dense agate Nov 12, 2023, 12:10 AM

#

ah... so I just need to use cudf directly, not %load_ext cudf.pandas magic

wanton herald Nov 12, 2023, 12:11 AM

#

if you want to use the pandas magic (might be convenient in some cases), you need to package the repo in a kaggle dataset and load it in your notebook. Then you can pip install it without internet

#

you can check whether somebody has already done it; if not, you can try it and upload the dataset as public, it would certainly get some upvotes

dense agate Nov 12, 2023, 12:13 AM

#

Damn... I am learning advanced stuff everyday

wanton herald Nov 12, 2023, 12:14 AM

#

cudf is a good one 🙂 actually i'm sad they made a magic for pandas, I liked the fact that it was not so popular before, always made a nice effect in interviews aha

dense agate Nov 12, 2023, 12:16 AM

#

kaggle GPU kernel only has 2 cpu cores harold

#

using the GPU kernel makes everything even slower for me...

wanton herald Nov 12, 2023, 12:17 AM

#

ah yeah ? i get a x10 increase on my notebook, but i'm doing 100% df operations

#

if you want to train your model on CPU but the preproc on GPU you can do like this:

dense agate Nov 12, 2023, 12:18 AM

#

maybe my time-consuming operations are mostly silly indexing that can't be sped up a lot?

wanton herald Nov 12, 2023, 12:20 AM

#

are you also doing the training in your notebook? that might also be what takes more time. in that case you can split it between building the training set (with GPU), then training the model on CPU in a second notebook (using the training set as a dataset for that second notebook). Or alternatively, the tree boosting libraries also have GPU methods

#

regarding silly indexing, that would be weird. given the dataset, i'd say that many odd operations can be done smoothly with smart df operations

dense agate Nov 12, 2023, 12:21 AM

#

yes, i need to optimize my df operations, they are really a mess

late herald Nov 12, 2023, 10:44 AM

#

Guys I have a question
I'm not sure I understand what each observation is.
Is every observation the total amount of energy consumed every hour by businesses (or individuals) in each county for every contract type (product type)?

wanton herald Nov 12, 2023, 12:01 PM

#

You need to predict, for each day and each hour, the production and the consumption of electricity of people who have solar panels, for different categories of customers (split by county, business/personal, type of product).

sacred steeple Nov 12, 2023, 1:04 PM

#

wanton herald you can also train a model in a kaggle notebook, and access it via another kaggl...

thx Jacky. I am really a noob with competitions where you submit a notebook, so these questions may be trivial. Atm I have a codebase that I developed locally on my laptop, with notebooks importing from files which themselves import from other files etc. Is there a recommended way of turning this codebase into a submission, like making a package out of it? Or do you actually write all the code in kaggle notebooks and never use a local system?

wanton herald Nov 12, 2023, 2:28 PM

#

there is no recommended way, it is really up to you and there are different trade-offs depending on what you prefer doing.
personally, I have a set of functions to simulate the API behavior and make sure the features I am building are not leaking information from the future. It makes the preprocessing longer, but more robust.
Then I have a set of functions to do the feature calculations that I can plug directly on the data from the API or on my simulated version. I have them copy/pasted into the different notebooks I use because my computer is not powerful enough to work locally with a clean package.
In terms of notebooks, I have one with my simulated API + feature creation package, which helps me save new features. When I create a new feature, I store it in an AWS bucket as a row_id / feature pair. It saves time because if I want to try different experiments I already have the features available and don't need to recompute them.
I have a second notebook for model training and feature selection. In this one, I simply load the features I am interested in, train a model, and save the model.
Finally I have an inference notebook in which I load my trained models. I can then calculate the features using my set of functions and simply do the prediction
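The "simulate the API" idea above can be sketched as a generator that replays the training data one day at a time and only exposes strictly past rows. This is an illustration under stated assumptions, not the author's framework: `simulate_api`, the column names, and the "last known target" feature are all hypothetical.

```python
import pandas as pd

def simulate_api(df, date_col="date", target_col="target"):
    """Replay a training frame one day at a time (hypothetical helper)."""
    for day in sorted(df[date_col].unique()):
        # Rows to predict today, with the target hidden.
        test = df.loc[df[date_col] == day].drop(columns=[target_col])
        # Everything strictly before today, targets included.
        history = df.loc[df[date_col] < day]
        yield day, test, history

train = pd.DataFrame({
    "date": ["2023-01-01"] * 2 + ["2023-01-02"] * 2 + ["2023-01-03"] * 2,
    "series_id": ["a", "b"] * 3,
    "target": [1.0, 10.0, 2.0, 20.0, 3.0, 30.0],
})

# The same feature function can run on the replay or on the real API data.
predictions = []
for day, test, history in simulate_api(train):
    last_known = history.groupby("series_id")["target"].last()
    predictions.append(test["series_id"].map(last_known).fillna(0.0))
predictions = pd.concat(predictions)
```

Because every feature is computed from `history` only, a feature that accidentally looks at the future cannot be built inside the loop, at the cost of slower experimentation.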

sacred steeple Nov 12, 2023, 2:34 PM

#

wanton herald there is no recommended way, it is really up to you and there is different trade...

thanks a lot for the detailed answer. I will try to come up with a setup this afternoon and will report back 😄 I think this is very useful knowledge for beginners

wanton herald Nov 12, 2023, 5:59 PM

#

i'm wondering how many people are actually really in the 80- range without copying the top ranked notebook

dapper edge Nov 12, 2023, 8:10 PM

#

I really hate people who copy public notebooks and do submissions with it

#

overinflates the scores

dense agate Nov 12, 2023, 8:27 PM

#

dapper edge I really hate people who copy public notebooks and do submissions with it

I had a look at the optiver competition. Looked at the leaderboard and got turned away for the exact same reason...

#

#

the rankings are just so crowded

dapper edge Nov 12, 2023, 8:32 PM

#

I wonder if you can just farm medals by copying the best public notebook in each competition

dense agate Nov 12, 2023, 8:32 PM

#

silver and bronze yes

#

so easy

dapper edge Nov 12, 2023, 8:33 PM

#

damn

#

seems like an easy thing to fix, bit obvious when 100 people have the same score

dense agate Nov 12, 2023, 8:38 PM

#

for a competition with this many participants, ~top150 get silver. you can copy the public notebook, make some improvements and get an easy silver

dapper edge Nov 12, 2023, 8:42 PM

#

bit boring tho

wanton herald Nov 12, 2023, 8:53 PM

#

A good and simple fix would be to remove the possibility of forking the kernels, and deactivate copy/paste.
Copying would require much more effort, while at least keeping the philosophy of sharing

dapper edge Nov 12, 2023, 9:25 PM

#

Or to force private any notebook that is in a medal spot

dense agate Nov 12, 2023, 11:14 PM

#

A strange observation: when I do GroupKFold by month, my validation mean error (signed) is clearly negatively biased....

#

That's strange because a one-sided bias is usually caused by some long-term trend. but using GroupKFold by month is already somehow using leaked information, because data from newer periods is used in training to predict older targets...... doesn't make much sense

wanton herald Nov 12, 2023, 11:21 PM

#

i don't think this is so important here, personally. In the end most of the "difficult" part to predict does not come from the variation of the time series itself but from the weather, which is what it is at a given moment in time.
The thing to avoid is having data from the same day in different train and test splits I think, apart from that it should be fine. Our models are most likely learning how weather influences the production/consumption and some very general trends (like the production profiles depending on the month)
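One way to keep all rows of a given day in the same split is scikit-learn's `GroupKFold` with the calendar day as the group key. The sketch below uses ten days of synthetic hourly timestamps and a dummy feature matrix; only the grouping logic matters here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold

# Ten days of hourly timestamps (synthetic).
ts = pd.date_range("2023-01-01", periods=24 * 10, freq=pd.Timedelta(hours=1))
groups = ts.strftime("%Y-%m-%d").to_numpy()  # one group label per calendar day
X = np.zeros((len(ts), 1))                   # placeholder feature matrix

folds = list(GroupKFold(n_splits=5).split(X, groups=groups))

# Collect the days seen in each side of every fold.
day_sets = [(set(groups[tr]), set(groups[va])) for tr, va in folds]
```

Each entry of `day_sets` holds disjoint sets of days, so no day contributes rows to both the training and validation side of a fold. Grouping by month instead only requires a different label, for example `ts.strftime("%Y-%m")`.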

dense agate Nov 12, 2023, 11:24 PM

#

indeed, the weather conditions have a lot of impact. It's just that a one-sided bias seems like low-hanging fruit, yet I can't grab it. (like, I can multiply the prediction by 1.xx and improve the result but that's so dumb). Secondly, figuring out these anomalies usually leads to some insights neglected by others

dense agate Nov 12, 2023, 11:26 PM

#

wanton herald i dont think this is so much important here personnally. At the end most of the ...

Yep, I wish lgbm supported multi-output regression. If that were available, I would do the prediction by day, not by hour..

wanton herald Nov 12, 2023, 11:27 PM

#

xgboost does

#

it's a new feature they introduced recently

dense agate Nov 12, 2023, 11:29 PM

#

I heard that too, definitely will try it later

wanton herald Nov 12, 2023, 11:29 PM

#

I made an attempt, but so far not successful on my side

dense agate Nov 12, 2023, 11:31 PM

#

Oh I miss NN so much. A transformer or RNN-like prediction head to spit out the 24 hourly predictions one by one would work so well conceptually

wanton herald Nov 12, 2023, 11:32 PM

#

there was a competition where transformers were better than FE+boosting, the Riiid one, but the TS were not of the same nature

#

and there was muuuuch more data

#

here I think it's as usual, too much variance + lots of complex seasonalities + not enough data

dense agate Nov 12, 2023, 11:35 PM

#

I think given enough development time, NN will win in most cases, even for small datasets. but the time needed to regulate different features at different layers makes it really hard

wanton herald Nov 12, 2023, 11:36 PM

#

i don't have enough experience with successful NNs to tell :p

dense agate Nov 12, 2023, 11:40 PM

#

wanton herald I made an attempt, but so far not successful on my side

the feature dimension would grow dramatically if we predict by day, it's a bit tricky to handle. Maybe that's the reason it is not performing well?

wanton herald Nov 12, 2023, 11:43 PM

#

yeah to avoid an explosion of features, i stayed generic with weather features at the day level, but it was not enough. and if you try to include all the features hour by hour, well you do risk an explosion (not sure the RAM could handle it either)

late herald Nov 13, 2023, 1:14 AM

#

can someone explain this error to me please?

#

i can't seem to predict anymore?

dense agate Nov 13, 2023, 1:35 AM

#

you've already called predict() on line 23

#

it is supposed to be a loop, where you iteratively observe -> predict

dense agate Nov 13, 2023, 2:06 AM

#

why are people using historical weather as features when you have the weather forecast? I can't wrap my head around it....

#

and several historical weather features appear at the top of the feature importance in the latest 72.87 public notebook. That drives me crazy...

wanton herald Nov 13, 2023, 9:24 AM

#

dense agate and several historical weather features appear at the top of feature importance ...

The historical weather data is more accurate. When you calculate past features, you want to use this historical data and compare it to the data of the current weather forecast. It's better than using only features derived from past weather forecasts

late herald Nov 13, 2023, 10:13 AM

#

dense agate it is supposed to be a loop, where you iteratively observe -> predict

Oohh right
Didn't see that. Thank youu

wanton herald Nov 13, 2023, 11:29 AM

#

@dense agate are you team multi-models or single models ?
I started on single model, but the more I look at the data, the more I am tempted on splitting the models

dense agate Nov 13, 2023, 12:17 PM

#

wanton herald The historical weather data is more accurate, when you calculate past features, ...

exactly, but that public notebook is just feeding everything to the model, not sure the model is smart enough to capture all those nuances.

#

I tried to compare forecast and historical features, but most of them are not even comparable, as they are measured with different methods

dense agate Nov 13, 2023, 12:19 PM

#

wanton herald @dense agate are you team multi-models or single models ? I started on...

multi-models will win. The issue is how to split and how many

wanton herald Nov 13, 2023, 12:20 PM

#

yes but it does not really matter, no? I guess a comparison is made between rolling features on the target and rolling features on the historical weather to capture the "trend", which is then compared to the value from the forecast to "balance" the "trends" obtained with the historical values and get a deviation

#

So far, I am considering 2 very distinct models (there is clearly 1 particular cluster which is really different from all the others), but I am also redoing all my feature engineering and calculating some particular coefficients for some clusters.
I don't really like the idea of using boosted trees with unbounded trends, so I'm trying to normalise everything properly

#

tree-based algos are very efficient if the data stays within the boundaries of the training set, but they are bad at extrapolating outside of those boundaries, unlike LR (if I'm correct). And here, we are exactly in the case where we will go outside of the boundaries, as the installed capacity increases over time

dense agate Nov 13, 2023, 12:29 PM

#

hmm... you are right. maybe there is some interesting feature based on historical weather that can proxy some "state"/"trend"...

dense agate Nov 13, 2023, 12:32 PM

#

wanton herald tree-based algo are very efficient if the data stays within the boundaries of th...

normalizing the target would make sense then.

wanton herald Nov 13, 2023, 12:32 PM

#

I want to check something, hold on a sec :p

dense agate Nov 13, 2023, 12:32 PM

#

normalizing targets would mess up the loss though, need to be careful.

wanton herald Nov 13, 2023, 12:35 PM

#

aaah

#

perfect

#

look at this

#

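import numpy as np

N = 100
# training set: X in [0, 10), y = 2*X + 6 + uniform noise
X = np.random.random(N) * 10
y = X * 2 + 6 + np.random.random(N)

# new data inside the training range
new_X_1 = np.random.random(N) * 10
news_y_1 = new_X_1 * 2 + 6 + np.random.random(N)

# new data outside the training range: X in [10, 12)
new_X_2 = np.random.random(N) * 2 + 10
news_y_2 = new_X_2 * 2 + 6 + np.random.random(N)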

With this code I create a simple linear relation between X and y, and my new_X_2 is outside of the train set (X)

#

from lightgbm import LGBMRegressor
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X.reshape(-1,1),y)
dt = LGBMRegressor()
dt.fit(X.reshape(-1,1),y)

# R^2 on new data inside the training range
print(lr.score(new_X_1.reshape(-1,1),news_y_1))
print(dt.score(new_X_1.reshape(-1,1),news_y_1))

# R^2 on new data outside the training range
print(lr.score(new_X_2.reshape(-1,1),news_y_2))
print(dt.score(new_X_2.reshape(-1,1),news_y_2))

What is going to happen when I score/predict with a LR and an LGBMRegressor in your opinion? :p

#

this could make a nice interview question for a junior ds actually aha

sacred steeple Nov 13, 2023, 12:43 PM

#

wont the lgbm flatline?

wanton herald Nov 13, 2023, 12:45 PM

#

if by flatline you mean "predicting always the same value outside of the training range", then yes

sacred steeple Nov 13, 2023, 12:47 PM

#

yeah i mean if you would plot the preds they would stop increasing at around 25ish

dense agate Nov 13, 2023, 12:48 PM

#

I actually have no idea how gbdt works at a low level, this competition actually introduced me to tree-based methods. My understanding is that it somehow formulates the regression problem as a classification with a lot of bins. But I also saw somewhere that the prediction can go beyond the target range seen in the train set. That's something I don't understand.

sacred steeple Nov 13, 2023, 12:49 PM

#

the kaggle time series course showed a neat trick: first fit a linear model, then fit a lgbm on the residuals. ols extrapolates, lgbm fits the nuances
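The hybrid trick mentioned above can be sketched as follows. scikit-learn's `GradientBoostingRegressor` stands in for LightGBM here, which is a substitution made for this sketch, and the data is synthetic: a linear trend plus a wiggle the linear model cannot capture.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(300, 1))
y_train = (2 * X_train[:, 0] + 6 + np.sin(3 * X_train[:, 0])
           + rng.normal(0, 0.1, 300))

# Stage 1: the linear model captures the trend and can extrapolate it.
linear = LinearRegression().fit(X_train, y_train)
residuals = y_train - linear.predict(X_train)

# Stage 2: the boosted trees model what the linear fit left over.
booster = GradientBoostingRegressor(random_state=0).fit(X_train, residuals)

def hybrid_predict(X):
    return linear.predict(X) + booster.predict(X)

# Comparison model: trees fitted directly on the target.
trees_only = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

X_out = np.array([[12.0], [14.0]])  # both outside the training range
hybrid_out = hybrid_predict(X_out)
trees_out = trees_only.predict(X_out)
```

Outside the training range the trees-only model returns the same value for both points, while the hybrid keeps following the linear trend because the tree part contributes a constant correction there.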

wanton herald Nov 13, 2023, 12:49 PM

#

a decision tree actually simply splits the training space to optimize a criterion (entropy for classification, variance for regression).
In the case of regression, it averages the target within each subspace
a boosting tree leverages multiple dt, so it is limited by the limits of the decision trees

dense agate Nov 13, 2023, 12:50 PM

#

sacred steeple the kaggle time series course showed a neat trick: first fit a linear model, the...

oh, your comment actually reminded me of the answer to my own question.

sacred steeple Nov 13, 2023, 12:50 PM

#

dense agate oh, your comment actually reminded me of the answer to my own question.

yeah i thought that could have been the answer :p

wanton herald Nov 13, 2023, 12:51 PM

#

why does the target increase over time? Because the number of clients increases over time, and the proxies for that are eic_count and installed_capacity

sacred steeple Nov 13, 2023, 12:52 PM

#

we shouldnt have to extrapolate sooo much though, given rolling training

wanton herald Nov 13, 2023, 12:53 PM

#

it all depends how you build your features I think

#

what you want is having a bounded space, it can be done in multiple ways

dense agate Nov 13, 2023, 12:55 PM

#

I think there are many ways of normalizing based on different proxies. The issue is probably how to adjust the cost once you do that.

wanton herald Nov 13, 2023, 12:55 PM

#

and you need to be able to reverse the normalisation

#

but if you divide by a coef, normally you can always multiply your prediction by the same coef 🙂

dense agate Nov 13, 2023, 12:57 PM

#

you will probably need a customized loss right? otherwise, you are minimizing a proxy of the MAE, not the real MAE

wanton herald Nov 13, 2023, 12:58 PM

#

I think it should be roughly equivalent no ?

dense agate Nov 13, 2023, 12:59 PM

#

I imagine they would be quite different, if the normalization has a meaningful impact.

#

These things are pretty easy to do with NN. don't know about boosting packages.

wanton herald Nov 13, 2023, 1:01 PM

#

you can build a custom loss in your lgb

#

that includes the coefficient vector you use to normalize

#

basically what you want to do is something like:
loss(y, pred, c) = np.abs(y * c - pred * c)
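A small numeric check of the point under discussion, with synthetic numbers. Here `c` is a hypothetical per-row normalisation coefficient (for example something like installed capacity). Plain MAE on the normalised target is not the original MAE, but weighting each row's absolute error by `c` recovers it exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
c = rng.uniform(1, 50, size=n)         # per-row normalisation coefficient
y = c * rng.uniform(0, 1, size=n)      # target on the original scale
pred = y + rng.normal(0, 1, size=n)    # some imperfect prediction

# What the model sees after dividing by c.
y_norm, pred_norm = y / c, pred / c

mae_original = np.mean(np.abs(y - pred))
mae_normalised = np.mean(np.abs(y_norm - pred_norm))
mae_reweighted = np.mean(np.abs(y_norm * c - pred_norm * c))
```

`mae_reweighted` matches `mae_original`, while `mae_normalised` differs. Since the reweighted loss is just the absolute error multiplied by `c`, passing `c` as a per-row sample weight to an L1 objective may be a simpler route than a hand-written objective, where the library supports sample weights.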

dense agate Nov 13, 2023, 1:04 PM

#

that's neat

sacred steeple Nov 13, 2023, 1:57 PM

#

sacred steeple thanks a lot for the detailed answer. I will try to come up with a setup this af...

I settled on the following development workflow (for now): write code locally in some folder, then zip and upload that folder to a Kaggle notebook. Then within the notebook, add the path of the uploaded folder to sys.path, so that all imports within the folder work and I can also import the code into the notebook. If a pickled trained model is in that folder, I can just unpickle it and it is ready for prediction. I still have to look at the sklearn version mismatch between kaggle and local, but this seems good enough.
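A minimal sketch of the import step in that workflow. A temporary folder stands in for the uploaded code (on Kaggle it would be some folder under `/kaggle/input/`), and the module name `my_enefit_features` is made up for the example.

```python
import sys
import tempfile
from pathlib import Path

# Stand-in for the uploaded code folder.
code_dir = Path(tempfile.mkdtemp()) / "my_codebase"
code_dir.mkdir()
(code_dir / "my_enefit_features.py").write_text(
    "def add_one(x):\n"
    "    return x + 1\n"
)

# Make the folder importable, then import from it like any local module.
if str(code_dir) not in sys.path:
    sys.path.insert(0, str(code_dir))

import my_enefit_features

result = my_enefit_features.add_one(41)
```

Modules inside the folder that import each other keep working as long as they use imports relative to the folder that was added to `sys.path`.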

wanton herald Nov 13, 2023, 2:49 PM

#

the more I look at the data, the more I am confused

#

some time-series don't behave at all as they should

marble trail Nov 13, 2023, 2:54 PM

#

Anyone interested in teaming up?

wanton herald Nov 13, 2023, 2:56 PM

#

on my side I prefer trying to go for a solo gold, but happy to discuss ideas and thoughts here

sacred steeple Nov 13, 2023, 2:59 PM

#

wanton herald on my side I prefere trying to go for a solo gold, but happy to discuss ideas an...

good luck!

dense agate Nov 13, 2023, 9:48 PM

#

Linear Boosting is a two stage learning process. Firstly, a linear model is trained on the initial dataset to obtain predictions. Secondly, the residuals of the previous step are modeled with a decision tree using all the available features. The tree identifies the path leading to highest error (i.e. the worst leaf). The leaf contributing to the error the most is used to generate a new binary feature to be used in the first stage. The iterations continue until a certain stopping criterion is met.

#

why the heck binary feature.... I am lost. Is it going to overfit after like 10 seconds?
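To make the quoted procedure concrete, here is a rough sketch with scikit-learn. It is one reading of the description, not the referenced library's implementation: the "worst leaf" is taken to be the leaf with the largest total squared residual, and the function name and data are invented. The binary feature is simply an indicator of "this row falls in the worst leaf", which lets the linear model apply a local offset there.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

def linear_boosting_fit(X, y, n_rounds=5, max_depth=3):
    """Return the training MSE of the linear model after each round."""
    X_aug = X.copy()
    history = []
    for _ in range(n_rounds):
        # Stage 1: linear model on original plus generated features.
        linear = LinearRegression().fit(X_aug, y)
        residuals = y - linear.predict(X_aug)
        history.append(float(np.mean(residuals ** 2)))
        # Stage 2: tree on the residuals, using the original features.
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        leaves = tree.fit(X, residuals).apply(X)
        # Worst leaf: largest total squared residual (an assumption).
        leaf_ids = np.unique(leaves)
        leaf_error = [np.sum(residuals[leaves == leaf] ** 2) for leaf in leaf_ids]
        worst = leaf_ids[int(np.argmax(leaf_error))]
        # New binary feature: 1 for rows falling in the worst leaf.
        X_aug = np.column_stack([X_aug, (leaves == worst).astype(float)])
    return history

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 1))
y = 2 * X[:, 0] + 5 * (X[:, 0] > 7) + rng.normal(0, 0.2, 400)
history = linear_boosting_fit(X, y)
```

On the training set the error can only go down as features are added, which is why a stopping criterion or a validation set is needed to address the overfitting worry raised above.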

indigo cedar Nov 13, 2023, 10:20 PM

#

Just started looking into this comp. From what I gathered there are predictions to be made for every county, product type and is_business combination for production and consumption. This would make up for a maximum (not all have to be present) of 16*2*4*2 time series.

The API is necessary to make prediction but confuses me. It provides for every iteration a new row of information of the test set. Per row_id a target should be provided to make a submission.

does the test dataframe provide the indication what to predict? which county, product type, date and hour etc
does the test dataframe follow 'after' the last row of the training set? and continues to pour new information with every iteration?
is there a way (besides using the leader board or the training data) to use the values in the test set for validation? (MAE scores)
if you would fit a time series model for an individual time series (county / product type / is_business etc) is there a direct identifier to connect this time series (and thus the prediction) to the row_id in the sample prediction of the iteration?
how would you create lag values of a single time series

I'll try to answer these myself but if someone would have some pointers, that would great.

wanton herald Nov 13, 2023, 11:02 PM

#

indigo cedar Just started looking into this comp. From what I gathered there are predictions ...

basically the API is here to make sure you make the predictions before accessing the next values, which would otherwise be a problem in terms of data leaks.
check this notebook: https://www.kaggle.com/code/sohier/enefit-basic-submission-demo it gives a simple routine to gather the test data (iteration by iteration) and submit (iteration by iteration).
At each iteration you will get multiple datasets providing the same data as in the training set (but one day at a time). You need to predict all the rows in "test". Note that the notebook crashes if you miss submitting one or several rows, if you have duplicates, etc... So among the datasets sent by the API, one is called "sample_submission" and shows you the format to respect (the row_ids with the associated target). Then it's up to you to make your predictions fit that format (I am personally using a left join and filling NaN with 0, this way I guarantee no duplicates/no missing data).
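The left-join approach described above might look like this in pandas. The frames and values are made up, and the column names `row_id` and `target` are assumed from the message. Note that the join only guarantees unique rows if the predictions are unique per `row_id`, so duplicates are dropped first in this sketch.

```python
import pandas as pd

# Template handed out for the current iteration (hypothetical values).
sample_submission = pd.DataFrame({"row_id": [100, 101, 102, 103],
                                  "target": [0.0, 0.0, 0.0, 0.0]})

# Model output with a missing row, a duplicate and an unknown row_id.
my_predictions = pd.DataFrame({"row_id": [101, 103, 103, 999],
                               "target": [1.5, 2.5, 2.7, 9.9]})

# Keep one prediction per row_id, then align on the template.
my_predictions = my_predictions.drop_duplicates(subset="row_id", keep="last")
submission = (
    sample_submission[["row_id"]]
    .merge(my_predictions, on="row_id", how="left")
    .fillna({"target": 0.0})
)
```

The result has exactly the template's rows in the template's order: missing predictions become 0 and the unknown `row_id` 999 is dropped.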


#

regarding your other questions:
I think the test set will follow the train set but will be provided step by step by the API; otherwise it would not be possible to calculate rolling features. For now there is a 2-day overlap, so you must make sure you have continuity between the train and test sets if you compute rolling features etc...
To create the lag features, everybody has their own method. I personally have a class Enefit which has each dataset from the train set as an attribute. Then at each iteration from the API, I add the rows/format the columns/etc.. of each df to its corresponding df in my class. Then, and only then, I calculate the feature vector X using all the data I have compiled with joins/rolling/etc..., and use it to make my predictions.
for the identifiers, I think the organizers provided one in the main df, I personally just use multi index/multi columns depending on what I need

wanton herald Nov 13, 2023, 11:10 PM

#

dense agate why the heck binary feature.... I am lost. Is it going to overfit after like 10 ...

no idea, I never used this kind of linear methods with boosting... Don't think I will for this comp either :p

#

For now, I am looking at each ts, one by one, and try to separate the ones with strange behaviors from the ones which are very similar

#

#

#

those are the "clean" ones

#

but there are also some weird specimens that I am considering treating separately

#

(those are ts smoothed over week + a little normalisation btw for those interested)

dense agate Nov 14, 2023, 3:34 AM

#

I think I discovered some method that may be worth a paper...

#

basically making tree-based method works better for TS regression, after thinking of the issue you raised yesterday @wanton herald

dense agate Nov 14, 2023, 3:42 AM

#

wanton herald

interesting approach. I recall that in the last energy theme regression comp, removing outliers was the key to gold

wanton herald Nov 14, 2023, 9:24 AM

#

cool i see your improved your score !
I'm curious to see where we will be landing by the end of the comp, there is so much to do

#

I was running a simple experiment this morning, that all of us should be doing, but a simple forward strategy give me already an incredibly high score

#

well.. "incredibly good" = 66 in MAE, but in 3 lines of code and without models

dense agate Nov 14, 2023, 9:36 AM

#

I am going to try some good old TS forecasting methods at some point

#

that's indeed very good for 3 lines of code

wanton herald Nov 14, 2023, 9:37 AM

#

i might put the notebook public, i want to see how much it scores first. I'm just using pivot/melt but in a smart way

dense agate Nov 14, 2023, 9:38 AM

#

After thinking about the limitation of tree-based methods, I managed to improve the LB MAE by ~2.6, at the cost of 7x the runtime though. Now it takes about 2 hours just to have the trained models. Could be a curse for me since it is so early: now everything takes a lot of time to see the full effect...

wanton herald Nov 14, 2023, 9:39 AM

#

well you are far from the 2nd for now, and many people will not even read these stuff that we are discussing (which are super important!)

dense agate Nov 14, 2023, 9:39 AM

#

wanton herald i might put the notebook public, i want to see how much it scores first. I'm jus...

I like EDA and alternative solution 👍 much better than all the generic LGBM baselines

wanton herald Nov 14, 2023, 9:42 AM

#

yeah the problem is that many people just rush into boosting without even trying to understand the underlying problem, and for time series its very important

#

if you want a bit of food for thoughts, check this time series:
df[(df.is_business==1) & (df.product_type==3) & (df.is_consumption==1) & (df.county==13)]

#

I find it very interesting

dense agate Nov 14, 2023, 9:46 AM

#

exactly. as much as I hated the defcon CTF. The PTSD forced me to read the competition details again and again..... that helped

#

hmm, county 13 is not on my radar... county 0 is a big trouble for me

wanton herald Nov 14, 2023, 9:52 AM

#

its not particularly county 13 the problem but the combo of county 13 and the other features

dense agate Nov 14, 2023, 9:53 AM

#

(df.product_type==3) & (df.is_consumption==1) is supposed to be the trouble anyway

wanton herald Nov 14, 2023, 9:54 AM

#

yes, that's true, most of my outliers are coming from that

#

in this particular one, there is a very interesting drop, and I don't think it can be explained by the added/removed capacity. I think there is some information we don't have that would be somehow missing

#

I was wondering if a client is a co-generation power plant (that can produce with gas and solar for example) or a plant that produces hydrogen from electricity. That could generate pretty big outliers in comparison to smaller players

dense agate Nov 14, 2023, 9:57 AM

#

maybe lost a big customer...

#

product_type==3 is the spot contract. I thought having the electricity cost would help explain a lot of the variance. But the elec price only helped marginally in my model.

wanton herald Nov 14, 2023, 9:57 AM

#

yeah but you would expect to see this in the capacity installed also no ?

#

ah no yeah because here the drop is on the time serie of consumption... So it might be a eater of energy alone that leave. Makes sense

dense agate Nov 14, 2023, 9:58 AM

#

production is easier, because it doesn't make sense to turn it off. that's free money

#

consumption is another story, there are so many alternative even if the facility is installed

wanton herald Nov 14, 2023, 9:59 AM

#

yeah I saw that too

#

and same observation on elec/gas price: not very useful, and no obvious correlation on filters like elec > gas or things like this

#

do we have an idea btw of the benchmark from ENEFIT ? Whats their own baseline ?

dense agate Nov 14, 2023, 10:04 AM

#

I am curious too

#

it's one of the few industries where forecasting accuracy directly translate into $$, and the host seems to have decent skills

#

damn, if this competition goes well, after it is done I may as well contact my local electricity provider and ask for a project & funding

wanton herald Nov 14, 2023, 10:16 AM

#

ahaha

dense agate Nov 14, 2023, 10:17 AM

#

And there is a guy on the forum asking to give up the chance of a prize in return for participating without handing in code. Maybe this is the reason. If a proprietary algo can improve the forecast accuracy considerably, that means a lot of $ for these big corps.

wanton herald Nov 14, 2023, 10:17 AM

#

yeah its a huge subject

dense agate Nov 14, 2023, 10:17 AM

#

the compe prize is probably nothing compared to what the big corp can save

wanton herald Nov 14, 2023, 10:17 AM

#

and god knows the amount of data scientist not doing things properly

#

and thats one of the few fields where a little saving in accuracy can save a lot

#

ok I think I managed to put up the submission pipeline with my naive method

#

time to see how it scores!

dense agate Nov 14, 2023, 10:24 AM

#

#

btw this is my favorite baseline

#

you got to do better than this.

wanton herald Nov 14, 2023, 10:24 AM

#

how much does it score ?

dense agate Nov 14, 2023, 10:25 AM

#

219

wanton herald Nov 14, 2023, 10:25 AM

#

thats what i am doing, but in more evolved :p

#

and yes, that should be how to do a baseline

#

but there is better than ffill simply last

#

you need to ffill the right value, which is not the last

#

not even the last day of a given hour

dense agate Nov 14, 2023, 10:26 AM

#

oh! interesting

wanton herald Nov 14, 2023, 10:27 AM

#

there is also a weekly periodicity

#

lot of stuff are working slower during weekend

#

not too bad including the weekly periodicity :p

dense agate Nov 14, 2023, 10:28 AM

#

👍

wanton herald Nov 14, 2023, 10:29 AM

#

I guess only combining the previous hour + last weekly hour you can get already very high scores

dense agate Nov 14, 2023, 10:30 AM

#

wanton herald not even the last day of a given hour

I might be overthinking this sentence. I was thinking of a K-means clustering with many bins, and then doing a classification based on feature distance

#

I guess it will end up similar but inferior to tree-based methods

wanton herald Nov 14, 2023, 10:32 AM

#

I was just saying that there are multiple ways of doing a baseline. The easiest is to propagate the last hour for a given ts.
another method is, for each hour, to propagate the value from the day before
a more advanced method is to propagate, for each hour and each day of the week, the last occurrence

#

actually we cannot do the first one because we don't have access to the last hour when we do the forecast
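A rough sketch of these forward-fill baselines on a synthetic hourly series, assuming the latest target available at prediction time is two days old (so a 2-day lag stands in for "the day before"). The series, the helper names lag_forecast and mae, and the 50/50 blend are illustrative assumptions, not the method actually used in the notebook being discussed.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly target for one time series: a daily shape plus a weekend dip.
idx = pd.date_range("2023-01-02", periods=24 * 28, freq=pd.Timedelta(hours=1))
weekend = (idx.dayofweek >= 5).astype(float)
y = pd.Series(10.0 + idx.hour.to_numpy() - 5.0 * weekend, index=idx)

def lag_forecast(series: pd.Series, lag_days: int) -> pd.Series:
    """Forecast each hour with the value seen at the same hour `lag_days` earlier."""
    return series.shift(24 * lag_days)

def mae(actual: pd.Series, forecast: pd.Series) -> float:
    # Rows without a forecast (start of the series) are NaN and skipped by mean().
    return float((actual - forecast).abs().mean())

lag2 = lag_forecast(y, 2)   # same hour, two days earlier
lag7 = lag_forecast(y, 7)   # same hour, same day of week, one week earlier
blend = 0.5 * lag2 + 0.5 * lag7

scores = {"lag 2 days": mae(y, lag2), "lag 7 days": mae(y, lag7), "blend": mae(y, blend)}
print(scores)
```

On this toy series the weekly lag is exact because the data is perfectly periodic by construction; on real data both lags carry error and a blend or some smoothing may help.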

wanton herald Nov 14, 2023, 10:34 AM

#

dense agate I might be overthinking this sentence. I was thinking a K-mean clustering with m...

K-means is usually much worse than boosted trees

indigo cedar Nov 14, 2023, 10:35 AM

#

wanton herald regarding your other questions: I think the test set will follow the train set b...

Thanks for the answer. I'll probably have to play around with the data a bit more to fully grasp how to compute the lagged values. But are you modelling individual time series (as there are at least 2, for production and consumption, but possibly many more distinct time series as you are discussing as well), then making a forecast for each time series, and then mapping the right prediction value to the county/product_type/business/cons_prod?
I'm struggling to see how to get from different models (which are fed row by row on training data) to the row_id target value. Wouldn't it get really messy if prediction units drop out etc?

wanton herald Nov 14, 2023, 10:42 AM

#

indigo cedar Thanks for the answer. I'll probably have to play around with the data a bit mor...

you can do it the way you want. In many notebooks, people simply predict the target given all the other information without manipulating the time series themselves (it's somehow handled by the model if you give it information like hour, dayofweek etc...)
If you want to build a forecast for each time series, you do need a good layer of pre-processing and post-processing. It's not straightforward and it might require different things depending on your approach. There is no absolute solution for this

indigo cedar Nov 14, 2023, 10:47 AM

#

No, I understand it's up to everyone to design a method to approach this. It kinda makes getting started with predictions a bit more difficult, as you need to handle the output in a very narrow and specified way.
The method used with tree algorithms can fit all the data, but if you want to try ARIMA-based models or variants that do not generalize well to exogenous features, then you need to split the models and match the output to the API's row-based setup. Probably doable but more prone to errors.

#

By the way, when you were talking about how it doesn't make sense to shut down production: some countries have mechanisms in place to reduce production through incentive schemes or penalties when there is too much of it. But that's probably an edge case and not applicable here though

dense agate Nov 14, 2023, 11:00 AM

#

my models are so bad at May, and luckily the private test ends in April

#

wondering what's special about May...

wanton herald Nov 14, 2023, 11:04 AM

#

in France we have a lot of days off in may

#

add 2-3 days off you can add already a 10% error

dense agate Nov 14, 2023, 11:06 AM

#

lol... are there some statistics on % people taking days off during the year. that would be an funny feature

wanton herald Nov 14, 2023, 11:06 AM

#

no but like national days off

#

in France for example we have 1st May, 8th May, Pentecost, Ascension Thursday...

dense agate Nov 14, 2023, 11:10 AM

#

I even studied the school schedule in Estonia, and made a feature for when schools are open and students are in class. Thought that would make a huge difference, but nothing...

#

maybe schools are not Enefit's customers, IDK...

#

Maybe I can do a public notebook of useless features...

wanton herald Nov 14, 2023, 11:16 AM

#

it can also be that may is a hard month both for consumption and production

#

mmh

#

I had a mistake in my training, I was shifting by two rows instead of 1

#

when I shift of 1 row only I get a MAE of 35

#

where is the fluck ? :p

#

ah I know i forgot a param...

#

@dense agate what is your current score on training data ? out of curiousity ?

#

i'm getting MAE 58 now with my ffill method

dense agate Nov 14, 2023, 11:36 AM

#

GroupKFold by month, MAE around 35.5

#

I don't know why GroupKFold by month works a bit better on LB... by year-month or by day should make more sense, and gives a much better local MAE, but doesn't improve LB
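For readers unfamiliar with the setup, here is a small sketch of GroupKFold with calendar month as the group, using scikit-learn. The date range, the column names and the choice of 4 splits are assumptions for illustration only; the actual features and model behind the 35.5 MAE are not shown in this chat.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold

# Hypothetical daily frame spanning several years.
dates = pd.date_range("2021-09-01", "2023-05-31", freq="D")
df = pd.DataFrame({"datetime": dates, "feature": np.arange(len(dates), dtype=float)})

# Group by calendar month: every May (whatever the year) lands in the same fold.
by_month = df["datetime"].dt.month.to_numpy()
# Alternative grouping: each year-month is its own group.
by_year_month = df["datetime"].dt.strftime("%Y-%m").to_numpy()

def make_folds(frame: pd.DataFrame, groups: np.ndarray, n_splits: int = 4):
    """Return a list of (train_idx, valid_idx) pairs that never split a group."""
    splitter = GroupKFold(n_splits=n_splits)
    return list(splitter.split(frame, groups=groups))

folds = make_folds(df, by_month)
for train_idx, valid_idx in folds:
    print("validation months:", sorted(set(by_month[valid_idx])))
```

With by_month, the validation fold never shares a calendar month with the training fold, which makes the local score harsher than with by_year_month, where the same month from another year stays in training.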

wanton herald Nov 14, 2023, 11:37 AM

#

dont know... The advantage with the ffill method is that I dont need groupKfold :p

#

now that I fixed a bug, i'm expecting better results, maybe I can sub 100

dense agate Nov 14, 2023, 11:39 AM

#

Good old seasonal forecasting methods could be interesting too https://otexts.com/fpp2/holt-winters.html

7.3 Holt-Winters’ seasonal method | Forecasting: Principles and Pra...

2nd edition

#

I was thinking of doing the H-W algo

wanton herald Nov 14, 2023, 1:43 PM

#

@dense agate do you use functions pd.melt/pd.pivot_table ?

dense agate Nov 14, 2023, 1:47 PM

#

not for this competition

wanton herald Nov 14, 2023, 1:49 PM

#

it's pretty handy for doing quick feature engineering, you might want to have a look to speed up your code

#

(worth it for all participants reading this message)

timid swan Nov 14, 2023, 1:54 PM

#

just to expand here on my message in the overfit thread (kaggle website discussions) -- I tried out to predict not the direct target, but e.g. the lagged (log) difference. It does seem to slightly improve the perf in short term (as in if I train until end of March, performance in April is better). But performance is worse long term with those transformations (e.g. May). Really wondering how to transform this problem to either make tree methods work / try out linear trees / get rid of trees completely...

wanton herald Nov 14, 2023, 1:56 PM

#

why using log diff and not just diff ?

timid swan Nov 14, 2023, 1:57 PM

#

because with normal diff a few time series are still non stationary (checked using Dickey Fuller test)

#

with log they are all stationary

wanton herald Nov 14, 2023, 1:58 PM

#

I am personally trying to regress the target on capacity_installed and use the prediction as a coefficient to normalize the time series

#

this way I get rid of the drift (the main drift coming from new clients)
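One possible reading of this normalisation, as a sketch: fit a straight line of target against installed capacity, divide the target by the fitted value, and multiply back to reverse it. The synthetic data, the use of np.polyfit and all variable names are assumptions; the exact regression used by the author is not given here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical series: installed capacity grows as new clients join,
# and the target scales with it, plus 10% multiplicative noise.
capacity = np.linspace(100.0, 300.0, 200)
target = 0.4 * capacity * (1.0 + 0.1 * rng.standard_normal(200))

# Linear regression of target on capacity (degree-1 polynomial fit).
slope, intercept = np.polyfit(capacity, target, deg=1)
scale = slope * capacity + intercept      # predicted level at each time step

normalised = target / scale               # drift from capacity growth removed
restored = normalised * scale             # the transform is easy to reverse

steps = np.arange(len(target))
raw_trend = np.polyfit(steps, target, deg=1)[0]
norm_trend = np.polyfit(steps, normalised, deg=1)[0]
print(raw_trend, norm_trend)
```

The normalised series fluctuates around 1 with almost no trend, so a model trained on it sees targets in a bounded range, and predictions are mapped back by multiplying with the scale.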

timid swan Nov 14, 2023, 2:02 PM

#

Oh okay. I did try to predict target per installed capacity. However, this didn't really work out (> 100 MAE) for me. And did not want to waste more time in that direction with these initial results^^

wanton herald Nov 14, 2023, 2:04 PM

#

yeah its not the best, but it help bounding the targets.
This is for example a time serie for a given hour for a given dayofweek without / with applying this factor:
so we see that the upward trend is corrected, and its easy to reverse back.

#

But for now, it did not improve my analysis so much because the lagged error is not really affected by the long trend of installed capa...

dense agate Nov 15, 2023, 3:33 PM

#

is data augmentation a thing for tree-based methods? I am throwing out random thoughts to solve the "out of boundary issue". It seems reasonable to synthesize a new sample by combining two samples from two days with similar features: sum the client features, and average the others

#

that would solve the out of boundary issue, but could take a toll on training time

wanton herald Nov 15, 2023, 3:35 PM

#

I never saw data augmentation working for tabular data

#

I think there is a much better shot into removing outliers

#

On my side I managed to get to MAE 55 on the train set with just smoothing + ffill, but I can't manage to infer properly for now

#

This score is important because it can help create stationary ts like Lasse was saying

dense agate Nov 15, 2023, 3:51 PM

#

isn't Dickey Fuller test a bit too harsh though. after all, we have many features other than just time

#

if there exists a test that examines stationarity with respect to features, I think the data will pass easily

wanton herald Nov 15, 2023, 3:53 PM

#

I'm not using a test, but I want stationary ts to be able to predict the delta from one date to the other using the delta of the other metrics (like temperature, clouds etc..)

dense agate Nov 15, 2023, 3:53 PM

#

wanton herald Im not using a test, but i want stationnary ts to be able to predict the delta f...

I have similar thoughts.

wanton herald Nov 15, 2023, 3:54 PM

#

For now i would like my sub with lagged target to work correctly that would be a nice first step 😅

dense agate Nov 15, 2023, 4:04 PM

#

the last line of my notebooks prints the number of NAs in the features...lol

wanton herald Nov 15, 2023, 4:41 PM

#

the problem is that we cannot see the public test set, so its tough to debug :p

dense agate Nov 15, 2023, 6:02 PM

#

"There is one new column in the test file: currently_scored. This is intended to allow you to reliably avoid spending time performing inference on unscored rows delivered by the API, such as the initial rows of train data."
the new column is interesting

#

If it is what I understand it to be, then we can probably delay our training until we hit the first row with currently_scored=True

timid swan Nov 15, 2023, 11:34 PM

#

dense agate isn't Dickey Fuller test a bit too harsh though. after all, we have many feature...

Well yes of course it's quite hard. The point is that without stationary targets, your tree will often have to predict at its bounds, where by definition it will be less well trained. Linear trees might be a huge step up with proper feature normalization. Also, if you look at the Kaggle M5 competition paper, there are some interesting augmentation strategies for tabular time series data.

#

For me, I currently do get my best scores by just predicting the actual target. But I do feel that having a stationary target should help, at least for long-term forecast quality. However, this competition seems more and more about who is re-training the most often during inference 🙃

wanton herald Nov 15, 2023, 11:44 PM

#

i dont want to believe that retraining is the key personnally... :p
We can get very good score just by playing on the periodicity of the target (55 MAE on the training set by simply using a 2/7 days lag with a 3hrs smoothing is not bad!) and I assume that weather as a large part to play for explaining the remaining error (at least on the production side).

timid swan Nov 15, 2023, 11:52 PM

#

Of course there are many things at play ! You will be able to get a great score without retraining at all ! But the very top submissions will surely do that. In the end we will have to predict 10 months of future data. No way around at least adjusting your model (e.g. by fitting past errors or just retraining the whole thing).

#

The smoothing is a nice idea. People tend to forget about post-processing 😄

marble trail Nov 16, 2023, 11:12 AM

#

wanton herald using the fourrier transform is something else I think here no? Using profiles s...

when you say that you use the profiling, do you mean that you created a new variable with the average production per month? Moreover, when using these profiles, did you create the avg production/consumption at month and hourly level?

marble trail Nov 16, 2023, 12:03 PM

#

wanton herald (those are ts smoothed over week + a little normalisation btw for those interest...

Hi @wanton herald, thanks for sharing. when you say smoothed over week, do you mean that you took the weekly avg to plot the series? by normalization do you mean that for all the points you subtracted the mean and divided by the std?

wanton herald Nov 16, 2023, 1:01 PM

#

marble trail when you say that you use the profiling, do you mean that you created a new vari...

For now i am not using the profiles but to visualise them, you can normalise each time series each day by the mean value, and average for each hour and every month.
You can build other features with this ideas or use it as preprocessing when you calculate other features. For example: if you want to smooth target by applying a rolling window, you might want to consider this type of profile before smoothing
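A small sketch of the profile computation described above: divide each day by its own mean, then average the normalised values per month and hour. The synthetic series and the names daily_mean, normalised and profile are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series covering January and February,
# with a different overall level in each month but the same daily shape.
idx = pd.date_range("2023-01-01", periods=24 * 59, freq=pd.Timedelta(hours=1))
hour = idx.hour.to_numpy()
level = np.where(idx.month == 1, 100.0, 50.0)
y = pd.Series(level * (1.0 + np.sin(np.pi * hour / 24.0)), index=idx)

# Step 1: normalise each day by its mean value.
daily_mean = y.groupby(y.index.normalize()).transform("mean")
normalised = y / daily_mean

# Step 2: average per (month, hour) to get one 24-point profile per month.
profile = (
    normalised.groupby([normalised.index.month, normalised.index.hour])
    .mean()
    .rename_axis(["month", "hour"])
)
print(profile.loc[1].round(3).head())
```

Because the level is divided out, the two monthly profiles coincide here even though February is half the size of January; on real data the differences between months show how the daily shape changes with the season.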

wanton herald Nov 16, 2023, 1:04 PM

#

marble trail Hi @wanton herald , thanks for sharing, when you say smoothed over week, ...

For the graph i did weekly average of daily max, the weekly average is used to smooth weekly seasonal effects and daily max to smooth the daily effect
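The smoothing used for the graph can be sketched with two pandas resample steps. The synthetic series is an assumption, and the "little normalisation" mentioned earlier is not reproduced because its details are not given.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series: a daily ramp plus a slow upward trend,
# starting on a Monday so the weekly bins are full weeks.
idx = pd.date_range("2023-01-02", periods=24 * 28, freq=pd.Timedelta(hours=1))
day_number = np.arange(len(idx)) // 24
y = pd.Series(idx.hour.to_numpy() + day_number.astype(float), index=idx)

daily_max = y.resample("D").max()                 # removes the within-day cycle
weekly_smooth = daily_max.resample("W").mean()    # removes the weekday/weekend cycle
print(weekly_smooth)
```

What remains after both steps is the slow trend of the series, which is what makes unusual series stand out when plotted side by side.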

marble trail Nov 16, 2023, 1:17 PM

#

wanton herald For the graph i did weekly average of daily max, the weekly average is used to s...

Coming back to this point, I also came across some weird cases when plotting the charts. I didn't smooth the series, I just calculated the daily avg of the target. There are more cases where either the consumption or the production increases or decreases, breaking all the seasonality and trend patterns. In all of these cases, how would you guys treat each TS?

charred spoke Nov 18, 2023, 4:48 AM

#

Hi guys, I don't understand the logic between the target and is_consumption in the train dataset. We have 2 observations per prediction_unit_id; is their relationship (target when is_consumption = 1) - (target when is_consumption = 0) or (target when is_consumption = 1) + (target when is_consumption = 0)? This is my first competition, I'd really appreciate your help

marble trail Nov 19, 2023, 7:23 AM

#

Actually sebas, we have two different time series per prediction_unit. One is for is_consumption = 0, which means that energy is being produced and stored, and that usually happens in the summer months. The other TS associated with the prediction unit is for is_consumption = 1, which means that energy is being consumed. Ideally that happens during the winter months

ornate elbow Nov 19, 2023, 7:50 PM

#

hello there
in the weather forecast file there are two predictions for almost every hour in all counties. my problem is that the early prediction has a data_block_id that indicates it will not be available until the next day, in other words after a lag of one day, but on the other hand we have the column hours_ahead, which suggests that the data will be available exactly when the day starts. so which one should I trust

dense agate Nov 21, 2023, 3:30 AM

#

wanton herald i dont want to believe that retraining is the key personnally... :p We can get ...

I tend to agree with you. The gain on retraining as a function of retraining times N is likely on the order of logN. Anyway, I would be curious to see what is the initial gain of retraining 1 time compared to no retraining.

#

At the end, anyone wants a gold would probably need to retrain their model a few times, but unlikely an advantage if you retrain many more times than others.

signal hollow Nov 21, 2023, 5:52 PM

#

Hi guys, I'm kinda stuck with a problem submitting via the env. I made some adjustments and got a Submission Scoring Error. Checked everything: no NaN, filling NaN with 0.0, float64 column type, but I spent 5 submissions with no results. Read the topic on the forum, but nothing changed. Maybe someone found a way to guarantee submission scoring?

wanton herald Nov 21, 2023, 7:39 PM

#

Generate your results, join on row_id with test_submission, fill nan with a method of your choice
Encapsulate the loop in a try/except submit test_submission

signal hollow Nov 21, 2023, 8:02 PM

#

hm, that will possibly work. too bad that we can't get more error traces because of the possible data leak

#

thanks!

wanton herald Nov 21, 2023, 10:59 PM

#

Yeah its very frustrating... The reason behind that is that having access to the error would allow participant to leak the test dataset

ornate elbow Nov 22, 2023, 10:24 AM

#

hello guys, are there any cases where data_block_id is misleading and does not represent the true time of availability of the data?

#

my problem is that according to data_block_id some weather forecasts will only be available after a one-day lag, like gas prices, but that does not make sense, since the forecasts are made at the start of the day and should be available before we predict the target

#

I am really struggling with this 😔

dense agate Nov 22, 2023, 12:15 PM

#

this post by the host may be helpful: https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/455833

Enefit - Predict Energy Behavior of Prosumers

Predict Prosumer Energy Patterns and Minimize Imbalance Costs.

dense agate Nov 22, 2023, 12:19 PM

#

ornate elbow hello guys, is there any cases where data_block_id is misleading and do not repr...

If I understand your question correctly. The issue is the actual business process involved. Read this description by the host: "Let’s say we are on day D at 11am. We want to predict next day D+1 net consumption from 00 to 23 for every hours."

#

This is how it works, you don't get to collect all information at 00:01am of D+1 and make prediction after that. The prediction for day D+1 is needed at 11am of day D.

ornate elbow Nov 22, 2023, 12:26 PM

#

If I understand you correctly you meant that even though the forecast of day D is available on day D we can't use it since we will predict day D at day D-1 where these data will not be available yet

dense agate Nov 22, 2023, 12:28 PM

#

yes. We make the prediction for the 24 hours of day D at 11am of day D-1

ornate elbow Nov 22, 2023, 12:29 PM

#

Thank you so much for the explanation 👍

upbeat sparrow Nov 23, 2023, 4:40 PM

#

Guys im currently looking for a team for this competition can any one pull me in? Im a undergraduate student that studies on computer science in data analyst and me myself did some predictive model and descriptive model as well.

wanton herald Nov 24, 2023, 4:27 PM

#

i took a week of break and I'm back in the comp, hard to put my head back in the notebooks!

dapper edge Nov 24, 2023, 8:07 PM

#

I havnt started yet, is it too late to get into it? 😁

wanton herald Nov 24, 2023, 11:23 PM

#

there is still plenty of time!

dense agate Nov 25, 2023, 7:27 AM

#

I took a two-week break from the other competition. It's closing soon and my LB ranking dropped from like 2nd to 200th because of some public notebook. lol....

late monolith Nov 25, 2023, 8:37 AM

#

dapper edge I havnt started yet, is it too late to get into it? 😁

It's not too late.

#

and yet, help me please.

wanton herald Nov 25, 2023, 9:36 AM

#

dense agate I took a two-week break from the other competition. It's closing soon and my LB ...

which competition was it ?

#

there is really someone that published a public notebook with a top 5 score ? 🤯

dapper edge Nov 25, 2023, 11:00 AM

#

dense agate I took a two-week break from the other competition. It's closing soon and my LB ...

Kaggle really need to start doing something about this

dense agate Nov 25, 2023, 1:25 PM

#

Open problem 2

#

It was ensembles of multiple good public solutions. People published the ensembles with scores in the Gold zone, thinking they were just overfitting the public test. And other people kept blending... But now there is speculation that there could be a slight chance some of those ensembles are not just overfitting

#

Now I have to work my a*s off to make sure my previous efforts are not flushed down the toilet

wanton herald Nov 25, 2023, 1:37 PM

#

is the test set big for that ones ? I saw in many competitions very funny shakeup for people blending high public scores

#

its actually very easy to overfit a testset

dense agate Nov 25, 2023, 1:38 PM

#

the dataset is quite small

#

and the distribution of train/public test/private test are very different from each other. Personally I think those blends are overfitting. but still... quite scary to see the shakeup just before the closing

wanton herald Nov 25, 2023, 1:46 PM

#

blending can be efficient when used correctly but most of people don't, and it often result in a big overfitting of the test set.
I remember on the MOA, which is the first competition I took seriously: at the end of the comp, a lot of people were publishing blends of blends and it shifted the silver/bronze area a lot.
At the end of the day, all the people that used those public notebook got catapulted to the end of the ladder ahah

signal hollow Nov 25, 2023, 5:09 PM

#

Is it ok, that score can variate with the same submissions? I suppose that reason - I got different test cases each time

#

Or it is just noise?

wanton herald Nov 25, 2023, 6:27 PM

#

If you train your model in the same notebook without fixing the random state, yes it is normal, although it should not change much. What is sent by the api is deterministic, so any variation is coming from your side

dull chasm Nov 26, 2023, 12:44 PM

#

Hi, I made a notebook so one can use the train data through the API. So, if someone is interested:

https://www.kaggle.com/code/ginkobalboa/run-train-data-with-the-api

Run train data with the API

Explore and run machine learning code with Kaggle Notebooks | Using data from Enefit - Predict Energy Behavior of Prosumers

marble trail Nov 26, 2023, 1:01 PM

#

I kept encountering this error: ModuleNotFoundError: No module named 'enefit.competition' when running the API, is there any modules that is missing from my part? thank you

wanton herald Nov 26, 2023, 1:23 PM

#

you need to import the data from the competition, the api is part of them

#

amber oasis Nov 26, 2023, 4:17 PM

#

the distribution of train/public test/private test are very different from each other <-- Sounds bad.

amber oasis Nov 26, 2023, 4:17 PM

#

dense agate and the distribution of train/public test/private test are very different from e...

^--

#

Why would Kaggle organizers not take more care, to ensure the training and test set are from the same distribution?

#

https://www.freecodecamp.org/news/what-to-do-when-your-training-and-testing-data-come-from-different-distributions-d89674c6ecd8/

marble trail Nov 26, 2023, 6:51 PM

#

wanton herald

hi, I imported the enefit folder from the same folder with
import enefit

but received this error msg:
ModuleNotFoundError Traceback (most recent call last)
Cell In[15], line 1
----> 1 import enefit

File ~/Dropbox/KaggleChallenges/predict-energy-behavior-of-prosumers/enefit/__init__.py:2
----> 2 from .competition import make_env
4 __all__ = ['make_env']

ModuleNotFoundError: No module named 'enefit.competition'
I wonder what could cause this kind of problem? thank you 🙂

dull chasm Nov 26, 2023, 8:28 PM

#

You should have competition.cpython-310-x86_64-linux-gnu.so and __init__.py files in your enefit folder.

marble trail Nov 26, 2023, 8:55 PM

#

my folder looks like this, so the file is there..

dense agate Nov 27, 2023, 9:52 AM

#

amber oasis Why would Kaggle organizers not take more care, to ensure the training and test ...

Because distribution shift is a real problem and needs to be solved as part of the solution

amber oasis Nov 27, 2023, 6:50 PM

#

Alright, makes sense. Winter is different from summer for solar output. Duh.

#

Any other causes of drift?

signal hollow Nov 27, 2023, 10:25 PM

#

amber oasis Any other causes of drift?

There are many causes, just look data description in competition

marble trail Nov 28, 2023, 7:08 AM

#

marble trail my folder looks like this, so the file is there..

Pardon me the file is in the folder but I still get the ModuleNotFoundError. Any ideas how to fix it? Thank you 🙂

wanton herald Nov 28, 2023, 9:45 AM

#

you have a problem with relative paths. you need to run your notebook (or your scripts) from the folder that contains the enefit folder:
working_path/
|- enefit/
|- notebook.ipynb
If your notebook is not located in the right place, you'll get errors.
Another way would be to add the folder that contains enefit/ to your Python path (sys.path or PYTHONPATH).
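A minimal sketch of the second option, using a throwaway package in a temporary folder in place of enefit/. The helper name add_parent_to_sys_path and the demo package are made up for illustration. Note that this only addresses the path; a compiled module such as competition.cpython-310-x86_64-linux-gnu.so also needs a matching platform and Python version to be importable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def add_parent_to_sys_path(package_dir) -> str:
    """Make `import <package>` work by adding the folder that CONTAINS the package."""
    parent = str(Path(package_dir).resolve().parent)
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent

# Demo: a dummy package standing in for the enefit/ folder.
workdir = Path(tempfile.mkdtemp())
package_dir = workdir / "demo_pkg_for_path_example"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("VALUE = 42\n")

added = add_parent_to_sys_path(package_dir)
importlib.invalidate_caches()
module = importlib.import_module("demo_pkg_for_path_example")
print(added, module.VALUE)
```

For the competition files, the call would take the path of the local enefit folder, wherever it was unpacked.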

Ask GPT3.5 for assistance, for this type of problems it is very efficient.

ornate elbow Nov 28, 2023, 10:28 AM

#

guys what do we know about the format of the testing data that will be used to give a score for the model
let's say that i used Autoregressive Integrated Moving Average model but when they test the model they provided day by day data this will make the model useless

#

so what do we know about the mechanism of how the data will be provided during testing phase

wanton herald Nov 28, 2023, 10:34 AM

#

at each iteration you get a sample of all the dataset for a particular given date.
For example for 2023-11-28 you would get:

the rows to predict (for 2023-11-28)
the lagged target (from 2023-11-26)
the lagged client
etc...

After that, it's up to you to keep that data in memory and use it with whatever model you want

ornate elbow Nov 28, 2023, 10:46 AM

#

my concern is that we are working with a time series problem where historical data is crucial "e.g. the weather of one day is not as informative as the weather of the whole month" but we do not know if the data format will allow us to utilize historical data or not

wanton herald Nov 28, 2023, 10:51 AM

#

you can do everything you are doing during training as long as you organise your time series yourself.
You must keep the historical values in memory, and when a new batch of data is released, have a processing step that appends the new values you received to the historical values
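
A minimal sketch of that bookkeeping, assuming each iteration hands over a pandas frame of revealed targets with `datetime` and `target` columns. The class name `TargetHistory` and the `max_days` trimming option are illustrative choices made here, not part of the competition API.

```python
import pandas as pd


class TargetHistory:
    """Rolling store of revealed targets, updated once per API iteration."""

    def __init__(self, max_days=None):
        self.frame = None
        self.max_days = max_days  # optional window, to bound memory use

    def update(self, revealed_targets):
        new = revealed_targets.copy()
        new["datetime"] = pd.to_datetime(new["datetime"])
        if self.frame is None:
            self.frame = new
        else:
            self.frame = pd.concat([self.frame, new], ignore_index=True)
        if self.max_days is not None:
            # keep only rows newer than (latest timestamp - max_days)
            cutoff = self.frame["datetime"].max() - pd.Timedelta(days=self.max_days)
            self.frame = self.frame[self.frame["datetime"] > cutoff].reset_index(drop=True)
        return self.frame


# Toy batches standing in for two consecutive API iterations.
batch_1 = pd.DataFrame({"datetime": ["2023-05-26 00:00:00", "2023-05-26 01:00:00"],
                        "target": [1.0, 2.0]})
batch_2 = pd.DataFrame({"datetime": ["2023-05-27 00:00:00", "2023-05-27 01:00:00"],
                        "target": [3.0, 4.0]})

full = TargetHistory()
trimmed = TargetHistory(max_days=1)
for batch in (batch_1, batch_2):
    full.update(batch)
    trimmed.update(batch)
```

Features such as lags or rolling means can then be computed from `full.frame` at each step, the same way they were computed on the training data.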

ornate elbow Nov 28, 2023, 11:03 AM

#

that will be a bit of a challenge since we know so little about the data nature especially that when you submit a notebook you get only a score or a very general error message

wanton herald Nov 28, 2023, 11:14 AM

#

before submitting you can check what data is served. Check the basic submission notebook and try to run the cells: https://www.kaggle.com/code/sohier/enefit-basic-submission-demo

Then you can check one by one the data served by the API
for (test, revealed_targets, client, historical_weather,
     forecast_weather, electricity_prices, gas_prices, sample_prediction) in iter_test:
    ...  # inspect each frame here

marble trail Nov 28, 2023, 12:02 PM

#

Hi! I have a question regarding the API! I am using Windows so the .so file is not working for me. Is there anyone who could use the .so file on Windows with a workaround?

wanton herald Nov 28, 2023, 12:11 PM

#

why don't you use kaggle kernels directly ?

marble trail Nov 28, 2023, 12:16 PM

#

I am quite new to kaggle so I didn't know I could to be honest!

wanton herald Nov 28, 2023, 12:17 PM

#

on the top of the competition in the code section you can click on "new notebook"

marble trail Nov 28, 2023, 12:17 PM

#

Thank you so much!:)

wanton herald Nov 28, 2023, 12:18 PM

#

good luck!

marble trail Nov 28, 2023, 4:37 PM

#

wanton herald you have problems of relative paths. you need to run your notebook (or your scri...

I did in fact ask GPT4 but it did not work better than your support :). Thank you for your help. I’ll try it!

wanton herald Nov 28, 2023, 4:53 PM

#

the issue might be due to this .so file, apparently does not work on windows according to natalia

plain plume Dec 3, 2023, 11:46 AM

#

I'd like to clarify the public and private test data periods. 2024-01-31 is the final submission deadline and 2024-04-30 is the competition end date. Does that mean the private leaderboard will be scored by MAE on actual data generated between 2024-02-01 and 2024-04-30? The competition dataset includes targets between 2021-09-01 and 2023-05-31, and env.iter_test seems to start from 2023-05-28. What is the last datetime the API will provide when I submit my code, and how will the public leaderboard be scored?

thick pecan Dec 4, 2023, 5:39 PM

#

Hello dear competition participants!
I would like to understand the data a little better, and what we should produce as output. I am new to the competition, so in short: everything you need to know about the competition. Thanks

willow swift Dec 5, 2023, 10:33 PM

#

Hello dear competition participants!
i have questions about train and other datasets concerning data_block_id
why train.csv begins with data_block_id = 0 in 2021-09-01
and the others don't begin with 0
for example in the electricity dataset we have forecast_date 2021-09-01 same as train but data_block_id = 1
Also if data_block_id = 637 in both train and electricity
we don't have the same date
in train date = 2023-05-31 and in electricity date = 2023-05-30
i need to understand to merge the data correctly

gaunt ruin Dec 6, 2023, 2:18 PM

#

Hello. I'm new to kaggle and don't know what this time-series API is. I can't install the enefit package to experiment with the provided notebook. Can someone tell me what this api does?

jade lark Dec 6, 2023, 9:11 PM

#

The main folder change ?

dense agate Dec 7, 2023, 7:04 AM

#

another data patch harold . Good news for most people, not so good news for LB top teams

ornate elbow Dec 7, 2023, 11:20 AM

#

jade lark The main folder change ?

I just checked the data and I did not find this new main folder

#

also they mentioned some edits that should be done on the data but I did not find anything new so now I do not know should keep working on the data I have or wait for some update

wanton herald Dec 7, 2023, 12:50 PM

#

dense agate another data patch harold . Good news for most people, not...

were you shifting your forecast datasets ?

#

I'm going in circles in this competition, trying to do something fancier than just "training vulgar boosting tree + feature engineering" but without much success so far

dense agate Dec 7, 2023, 1:07 PM

#

wanton herald were you shifting your forecast datasets ?

I had a processing pipeline that shifted the data correctly before the patch (through reading and some trial and error). Now the patch is out, which nullifies my pipeline. And I honestly don't know what changed and what did not. The description of the patch is brief and unclear, and the timestamps in the updated data don't look like they are in the right format. And the description in the data tab is not updated... In short, I think the patch is a mess in its current form. I would rather wait for clarification before doing anything.

#

The data was correct before the patch anyway, just required people to pay attention to the description. Now after the patch, I am not sure the data is even correct.

soft frost Dec 7, 2023, 1:12 PM

#

Thankyou @lyric bough

wanton herald Dec 7, 2023, 1:13 PM

#

yeah i had a look, did not find any changes either

dense agate Dec 7, 2023, 1:16 PM

#

I didn't download the data before the patch, so I can not compare at all...lol. All I can tell is that rerunning my notebooks give me worse local CV..

wanton herald Dec 7, 2023, 1:19 PM

#

i just rerun my current wip notebook and did not see many changes in the score... But my work is chaotic

ornate elbow Dec 7, 2023, 1:24 PM

#

I remember that the first value in the column datetime in historical weather was 2021-09-01 01:00:00
now it is 2021-09-01 00:00:00, however i am not sure about that

dense agate Dec 7, 2023, 4:47 PM

#

Based on what I saw, I am expecting a patch on the recent patch shortly... not worth working on the current data. Time to take a break.

glossy bison Dec 7, 2023, 5:55 PM

#

I see in the sample submission there is three columns: row_id, data_block_id, and target. But when I look at the discussion: Enefit Basic Submission Demo There is no data_block_id column. Is there any reason for this? Just trying to understand what my output should look like and how to leverage the API. I am working locally so not sure how to use the API

thick pecan Dec 7, 2023, 8:29 PM

#

thick pecan Hello dear competition participants! I would like to understand the data a littl...

hello, @everyone

ornate elbow Dec 7, 2023, 8:40 PM

#

glossy bison I see in the sample submission there is three columns: row_id, data_block_id, an...

I think you should omit data_block_id

#

guys, the lag of the historical target is two days, i.e. its data_block_id should start at value 2 at datetime 2021-09-01 00:00:00
am I right ?

dense agate Dec 8, 2023, 6:47 AM

#

thick pecan hello, @everyone

I recommend that you read through the competition page and then fork the submission demo. And start from there.

#

Then when you have a more specific question, the chance of getting an answer would be greater.

thick pecan Dec 8, 2023, 7:39 PM

#

dense agate Then when you have a more specific question, the chance of getting an answer wou...

wanton herald Dec 10, 2023, 2:11 PM

#

I'm really surprised by the difficulty LGBMRegressor has inferring ratios of the target.
For example:
using only the lagged_target_2days as a feature gives better results than using lagged_target smoothed over a 24hrs period (including hours and ts metadata as features to have a different ratio for each hour / each ts).
using lagged_target_7days gives better results than using lagged_target_2days + dayofweek, which means the model struggles to infer the weekend ratio from dayofweek
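
For readers unfamiliar with these features, here is one way the D-2 and D-7 lags could be built with pandas. It is a sketch on toy data: the helper name `add_lagged_target` is made up here, and the key columns `prediction_unit_id` and `is_consumption` are assumed to identify one time series.

```python
import pandas as pd


def add_lagged_target(df, days, keys=("prediction_unit_id", "is_consumption")):
    """Attach the target observed `days` earlier at the same hour, per series."""
    keys = list(keys)
    lagged = df[keys + ["datetime", "target"]].copy()
    # moving the timestamp forward makes the old value line up with the later row
    lagged["datetime"] = lagged["datetime"] + pd.Timedelta(days=days)
    lagged = lagged.rename(columns={"target": f"lagged_target_{days}days"})
    return df.merge(lagged, on=keys + ["datetime"], how="left")


# Toy series: one unit, one observation per day, target equal to the day index.
days = pd.date_range("2023-05-01", periods=10, freq="D")
train = pd.DataFrame({
    "prediction_unit_id": 0,
    "is_consumption": 1,
    "datetime": days,
    "target": [float(i) for i in range(10)],
})

features = add_lagged_target(add_lagged_target(train, 2), 7)
print(features.tail(3))
```

Rows without enough history get NaN in the lag column, which gradient boosting libraries such as LightGBM can handle natively.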

dense agate Dec 11, 2023, 9:04 AM

#

wanton herald I'm really surprised by the difficulty LGBMRegressor has inferring ratios of ...

I think your results are in line with the results I got.

wanton herald Dec 11, 2023, 11:59 AM

#

it's annoying 😅

wanton herald Dec 11, 2023, 12:58 PM

#

i was trying to do some renormalizing operations to be able to use the target value at D-2 instead of the one at D-7 for the week-periodicity cases, and while I managed to improve significantly the MAE for those cases (going from MAE 168 to 120 with D-2), I cannot beat the D-7 benchmark (MAE 95)

dense agate Dec 11, 2023, 1:28 PM

#

BTW, I think the current patch is problematic and mishandled the data....

#

you can't trust any result after the patch.

#

It's interesting that nobody has flagged it 4 days after the patch, maybe someone is capitalizing it by not revealing the issue....But that's just my opinion, what do I know.

wanton herald Dec 11, 2023, 1:36 PM

#

no idea, to be honest I did not look too much into the details yet, there is a lot of other stuff I am trying to figure out before this and that I struggle with

#

did you manage to retrieve a similar sub by shifting the hours ?

wanton herald Dec 11, 2023, 8:52 PM

#

wanton herald i was trying to do some renormalizing operations to be able to use the target va...

I actually managed to get similar scores for D-2 / D-7 for weekly-periodic time series using a custom correction for each TS

dense agate Dec 12, 2023, 3:07 AM

#

wanton herald did you manage to retrieve a similar sub by shifting the hours ?

On the way there, still figuring out what exactly have been done wrong for this patch... I have a big picture, but didn't have the time to check some small details.

errant pagoda Dec 14, 2023, 2:55 AM

#

Hello @everyone ! I need some help with the submission. I dont understand why its giving me an error.... i looked at the output and has the same rows and cols of the naive submission file... has anyone had the same problem?

dense agate Dec 14, 2023, 6:19 AM

#

errant pagoda Hello @everyone ! I need some help with the submission. I dont understand why it...

We have all been there...

#

try to isolate the problem, e.g. is the submission in the wrong format or the data processing pipeline throws error

dense agate Dec 14, 2023, 6:59 AM

#

okay, maybe not your problem. Many other people on the forum and myself are seeing the same issue now

#

ah... Another confusing patch. I want to rant so much... If it were not for these patches, we may be seeing sub 60 scores already.

wanton herald Dec 14, 2023, 9:17 AM

#

Ah a new patch ? 😅

wanton herald Dec 14, 2023, 9:22 AM

#

errant pagoda Hello @everyone ! I need some help with the submission. I dont understand why it...

A few pieces of advice:
To make it work you need two things:

Make sure you make a forecast for each row proposed by the API. This can be achieved by doing a left join on the given row_id with your prediction and fillna with something.
Make sure your code doesn't generate an error at execution time because of a forgotten edge case (for example, if you are using a dictionary to look up some historical data, a new key at inference time might generate an error). As a starting point, you can wrap your inference cell in a try: YOUR_CODE except: SUBMIT_DUMMY_FORECAST (for example all 0).

To go further:
The best way to anticipate later issue is to use the training data to replicate the API behavior on the training dataset. This way you can make sure your inference code works on a large amount of new data and you should spot early all kind of bugs.
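
The two points above can be combined in one small helper. This is a sketch under assumptions: `safe_predictions` and the two toy model functions are hypothetical names, and `sample_prediction` is assumed to carry a `row_id` column as in the messages above.

```python
import pandas as pd


def safe_predictions(sample_prediction, predict_fn, fallback=0.0):
    """Return one float target per requested row_id, whatever predict_fn does."""
    out = sample_prediction[["row_id"]].copy()
    try:
        preds = predict_fn()  # expected: DataFrame with row_id and target
        preds = preds[["row_id", "target"]].drop_duplicates("row_id")
        # left join keeps every row the API asked for, in the original order
        out = out.merge(preds, on="row_id", how="left")
    except Exception:
        # dummy forecast instead of a crashed submission
        out["target"] = fallback
    out["target"] = out["target"].fillna(fallback).astype("float64")
    return out


sample_prediction = pd.DataFrame({"row_id": [1, 2, 3], "target": [0.0, 0.0, 0.0]})


def partial_model():
    # toy model that forgets row_id 2
    return pd.DataFrame({"row_id": [1, 3], "target": [10.5, 7.25]})


def broken_model():
    # toy model hitting an unseen key at inference time
    raise KeyError("unseen key at inference time")


ok = safe_predictions(sample_prediction, partial_model)
fallback = safe_predictions(sample_prediction, broken_model)
```

Catching every exception keeps the submission alive but also hides real bugs, so it is probably worth logging the exception somewhere while developing.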

dense agate Dec 14, 2023, 2:23 PM

#

And it seems that many people reported (including myself) the submission is broken since the "new patch". Previously successful submissions now throw error. I will stop working on it until an official answer or until someone reported a successful submission

wanton herald Dec 14, 2023, 2:34 PM

#

understandable

#

i might submit something on my side by the end of the week. I am still far from your crossval MAE of 35, but i'm getting closer (i have MAE 40), with a very simple yet difficult to implement methodology

#

i'm curious to see how much it would score on the LB

dapper wigeon Dec 14, 2023, 7:15 PM

#

does anyone know how to make the online enefit api return "data_block_id" for test data?

wanton herald Dec 14, 2023, 7:30 PM

#

you would have to add the column yourself. Each iteration would be a new data_block_id.

eager wren Dec 14, 2023, 9:38 PM

#

Does anyone know the extent of historical weather information the test will be running with?

dense agate Dec 15, 2023, 1:27 PM

#

Has anyone had a successful submission lately? Kaggle staff are not responding regarding the submission errors yet. Trying to understand what's happening

wanton herald Dec 15, 2023, 1:27 PM

#

with the new new update i got my new local CV going from 40 to 37

#

i was not using any offset before, so it looks like this new update fixed indeed the times of the weather

dense agate Dec 15, 2023, 1:31 PM

#

yep, that's expected

wanton herald Dec 15, 2023, 1:32 PM

#

did you try to try/except on your sub ?

dense agate Dec 15, 2023, 1:32 PM

#

I think the patch (when done right) will boost LB score for most people

Screen_Shot_2023-12-15_at_9.32.07_PM.png

dense agate Dec 15, 2023, 1:34 PM

#

wanton herald did you try to try/except on your sub ?

no.. I am still assuming I did something wrong (or didn't consider some edge cases), and taking the opportunity to make the script more robust.

#

but no, it didn't work. Submission fails for me

wanton herald Dec 15, 2023, 1:40 PM

#

maybe a problem of parsing in a datetime ?

#

or depending how you preprocess the forecast you might be missing a particular hour ?

wanton herald Dec 15, 2023, 1:47 PM

#

dense agate no.. I am still assuming I did something wrong (or didn't consider some edge cas...

yeah making script robust has been my biggest challenge... I spent actually most of my time on that also 😭

dense agate Dec 15, 2023, 1:53 PM

#

OMG... the host says the historical weather is in EET/EEST time...

#

I have a feeling that he is mistaken...

wanton herald Dec 15, 2023, 1:59 PM

#

try offsets and see which one gives the best score, ultimately that's what i'll be doing i think

dense agate Dec 15, 2023, 1:59 PM

#

Okay, now I am waiting for a new new new patch to fix the submission error. and a new new new new patch to correct the time in historical weather...

wanton herald Dec 15, 2023, 2:00 PM

#

they should just make a rollback to the original dataset

dense agate Dec 15, 2023, 2:00 PM

#

wanton herald they should just make a rollback to the original dataset

on that I agree...

wanton herald Dec 15, 2023, 2:00 PM

#

many people who were there at the beginning of the comp but stopped looking at it later on might be penalized

#

on the other hand, thats a very good demonstration of a real world usecase where the data specs change every two weeks aha

dense agate Dec 15, 2023, 2:02 PM

#

btw, if you find a minor issue (like an inaccurate description of the data), are you allowed to keep it a secret and capitalize on it by not flagging the issue?

wanton herald Dec 15, 2023, 2:05 PM

#

no idea, i guess you could keep it a secret even if its not exactly the philosophy

#

i personally like giving this kind of hint as i'm looking for the discussion master rank and it's better than sharing a notebook that everybody will fork without even looking :p

ornate elbow Dec 15, 2023, 5:37 PM

#

hello guys
Which option do you believe is better: implementing a model by yourself, or utilizing one of the implementations available in various frameworks?
I understand that there is no definitive answer to such a question, but I am curious about your opinions.

eager wren Dec 15, 2023, 7:20 PM

#

wanton herald i personally like giving this kind of hint as i'm looking for the discussion ma...

As someone whos a student and new to Kaggle comps I'd love to read about what y'all have found on this so far. Still doing EDA on the data and trying to understand the model logic, lag variables, etc.

#

I understand that in the forecast weather df the cloud cover is in percentage at a certain altitude AND at the end of the hour not beginning, whereas historical weather is the total volume of the area (from a disc. post). Is this correct?

dapper edge Dec 16, 2023, 4:19 PM

#

ornate elbow hello guys Which option do you believe is better: implementing a model by yourse...

Utilizing frameworks

#

It's very rare that you would reinvent the wheel when working on these types of problems for real (except in research maybe). But that's just from my experience

torn musk Dec 17, 2023, 8:35 PM

#

Hey! I recently joined the competition and was wondering if anyone has tried out an encoder-decoder architecture. Specifically, I want to understand whether we will be able to use a fixed context window of encoded information at test time before starting prediction?

dense agate Dec 18, 2023, 9:27 AM

#

This is so frustrating....

I am doing this at the end of my iter loop:

sample_prediction['target'] = 0.0
env.predict(sample_prediction)

But I am still getting the submission scoring error (submission file in the wrong format), not a notebook exception error..

#

btw, does anyone know whether a submission that runs out of memory returns an OOM error or just a submission scoring error?

wanton herald Dec 18, 2023, 9:43 AM

#

Just submission scoring error

#

I made a sub, it works well on my side

#

Did you try to check the input types of the date columns? In my case I had issues while joining because of a datetime casting

#

If you suspect oom, you should trim your historical data to -X months to limit the size

#

Also use a notebook for training and one for inference

dense agate Dec 18, 2023, 9:52 AM

#

I tried these two things :

1) I explicitly convert sample_prediction['target'] to float64. This once fixed my submission scoring error a long time ago, but it has not worked since the latest patch.
2) I explicitly set sample_prediction['target'] = 0.0 right before env.predict(sample_prediction)

#

with the 2) I did, there should not be any reason for a "submission scoring error"

wanton herald Dec 18, 2023, 9:53 AM

#

The datetimes have a different casting also

dense agate Dec 18, 2023, 9:53 AM

#

I would quietly debug or accept failure if the message is "Notebook threw exception"..

wanton herald Dec 18, 2023, 9:55 AM

#

For each piece of data i do:
df["datetime"] = pd.to_datetime(df["datetime"]).astype(str) to be sure they are in a uniform format

#

And for the date ones i just add .dt.date.astype(str)

#

Add also a fillna(0) at the end just to make sure you are not accidentally sending nan values
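
The casting routine described above might look like the sketch below. It is an illustration, not the poster's exact code: the helper name and the default column names are assumptions, and it uses an explicit `strftime` format rather than `astype(str)` so the resulting strings do not depend on how pandas chooses to render a particular batch.

```python
import pandas as pd


def normalize_time_columns(df, datetime_cols=("datetime",), date_cols=("date",)):
    """Cast time columns to uniform strings so joins compare like with like."""
    out = df.copy()
    for col in datetime_cols:
        if col in out.columns:
            out[col] = pd.to_datetime(out[col]).dt.strftime("%Y-%m-%d %H:%M:%S")
    for col in date_cols:
        if col in out.columns:
            out[col] = pd.to_datetime(out[col]).dt.strftime("%Y-%m-%d")
    return out


# The same timestamps arriving once as strings and once as Timestamp objects.
from_strings = pd.DataFrame({
    "datetime": ["2023-05-28 00:00:00", "2023-05-28 01:00:00"],
    "date": ["2023-05-28", "2023-05-28"],
})
from_stamps = pd.DataFrame({
    "datetime": pd.to_datetime(["2023-05-28 00:00:00", "2023-05-28 01:00:00"]),
    "date": pd.to_datetime(["2023-05-28", "2023-05-28"]),
})

a = normalize_time_columns(from_strings)
b = normalize_time_columns(from_stamps)
```

Applying the same function to every frame, in training and in the inference loop, keeps join keys comparable regardless of how each source was parsed.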

dense agate Dec 18, 2023, 10:05 AM

#

I am doing sample_prediction['target'] = 0.0, so I am just sending 0s

#

still a submission scoring error

wanton herald Dec 18, 2023, 10:06 AM

#

You dont have any preproc in the inference loop?

dense agate Dec 18, 2023, 10:07 AM

#

I do. But if that code caused any issue, I would expect a "notebook threw exception" error, because that code has no effect on the output since I have sample_prediction['target'] = 0.0 right before calling env.predict(sample_prediction)

wanton herald Dec 18, 2023, 10:25 AM

#

so it might be an OOM ?

#

did you try to cut your datasets to limit ram use ?

#

the oom can also happen if you are performing merging and forgot to filter some duplicates

#

i had the issue earlier in the comp, while concatenating the historical data with the new ones, i was badly filtering the duplicates, resulting in merging with double times the amount of rows
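
A small sketch of the de-duplication step described above, on toy data. The helper name `append_without_duplicates` is made up here, and the choice to keep the newest row on a key clash is an assumption; the key columns must have the same dtype in both frames for duplicates to be recognised.

```python
import pandas as pd


def append_without_duplicates(history, new_rows, keys):
    """Append new rows to the history, keeping the newest row per key."""
    keys = list(keys)
    combined = pd.concat([history, new_rows], ignore_index=True)
    combined = combined.drop_duplicates(subset=keys, keep="last")
    return combined.sort_values(keys).reset_index(drop=True)


history = pd.DataFrame({
    "datetime": ["2023-05-26 00:00:00", "2023-05-26 01:00:00", "2023-05-26 02:00:00"],
    "target": [1.0, 2.0, 3.0],
})
# The new batch overlaps the history on 02:00 and adds 03:00.
new_rows = pd.DataFrame({
    "datetime": ["2023-05-26 02:00:00", "2023-05-26 03:00:00"],
    "target": [30.0, 4.0],
})

combined = append_without_duplicates(history, new_rows, keys=["datetime"])
```

Checking `len(combined)` against the number of distinct keys after each update is a cheap way to notice a silent row explosion before it reaches a merge.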

dense agate Dec 18, 2023, 12:53 PM

#

According to Kaggle's guide https://www.kaggle.com/code-competition-debugging, OOM should produce an "Notebook Exceeded Allowed Compute" error.

wanton herald Dec 18, 2023, 12:54 PM

#

ah i didnt know. If you want I can try to kill one of my sub and generate an OOM to see how looks the error

dense agate Dec 18, 2023, 12:55 PM

#

thank you for offering, but that's not necessary.

#

still trying to isolate the problem. but with the quota of 5 submission/day, that's a lot of opportunity cost.

wanton herald Dec 18, 2023, 12:56 PM

#

did you try to simulate the API behavior ?

#

def simulate_api(block_id):
    # df, clients, historical_weather, ... are the training frames loaded earlier
    sub_df = df[df.data_block_id == block_id].copy()
    sub_df["prediction_datetime"] = sub_df["datetime"]
    sub_df = sub_df.drop(["datetime", "target"], axis=1)
    # revealed targets come from two blocks earlier
    prev_targets = df[df.data_block_id == block_id - 2]
    subclient = clients[clients.data_block_id == block_id]
    subhist = historical_weather[historical_weather.data_block_id == block_id]
    subforecast = forecast_weather[forecast_weather.data_block_id == block_id]
    subgas = gas_price[gas_price.data_block_id == block_id]
    subelec = elec_price[elec_price.data_block_id == block_id]
    sample_pred = sub_df[["row_id"]].copy()
    sample_pred["target"] = 0
    return (sub_df, prev_targets, subclient, subhist, subforecast, subelec, subgas, sample_pred)
I use something like this

dense agate Dec 18, 2023, 1:12 PM

#

Alright, I just used 1 submission and isolated the problem down to the inference code... I left the training code intact and commented out the inference code, and the error is gone.

#

If I am lucky, I should be able to locate the issue in fewer than 10 submissions. lol...

#

Theoretically, I should be able to identify one line of code out of 1000 lines with 10 submissions. lol...
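
The arithmetic behind that estimate is a binary search: each submission halves the suspect range, and log2(1000) is just under 10. The sketch below simulates it, assuming the failure is deterministic and that running the first k lines fails exactly when the faulty line is among them. The function names and the faulty line number 737 are made up for the illustration.

```python
import math


def locate_bad_line(n_lines, prefix_fails):
    """Binary search for the first failing line; returns (line, probes used)."""
    lo, hi = 0, n_lines  # invariant: first lo lines pass, first hi lines fail
    probes = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1  # one probe corresponds to one submission
        if prefix_fails(mid):
            hi = mid
        else:
            lo = mid
    return hi, probes


BAD_LINE = 737  # hypothetical position of the faulty line (1-based)
line, probes = locate_bad_line(1000, lambda k: k >= BAD_LINE)
worst_case = math.ceil(math.log2(1000))
print(line, probes, worst_case)
```

With a quota of 5 submissions per day, the worst case of 10 probes would still take two days.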

wanton herald Dec 18, 2023, 1:20 PM

#

good luck, its an art to debug :p

#

1k line is a lot, there is probably refactoring to do no ?

dense agate Dec 18, 2023, 1:26 PM

#

Luckily, I don't have 1000 lines to debug, just referring to the theoretical limit.

wanton herald Dec 18, 2023, 1:29 PM

#

on my side I have a model that performs very well on paper but gives a shitty submission, and I can't find any obvious leaks 😢

noble pagoda Dec 18, 2023, 7:48 PM

#

I get the error "ModuleNotFoundError: No module named 'enefit.competition'" working with a M1 macbook, I suspect because it can't use the file "competition.cpython-310-x86_64-linux-gnu.so". is there any workaround for this?

wanton herald Dec 19, 2023, 10:59 PM

#

gosh after 3 days and almost giving up, I finally found the bug that explains why my score during inference was so bad (MAE of 90 while my crossval is around MAE 37)

#

i was merging on latitudes/longitudes on the weather dataset, but because of a rounding error on lat/lon, i was not merging new data.
And as data in the api overlap with the train data, I didnt notice the new data was not correctly added to my historical data

#

that was not introducing any breaking point, just nan values in features I was not able to see, leading to incorrect predictions.
My personal API simulator, based on the training dataset, was not seeing this, because the rounding is correct there.
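
One way to guard against this kind of silent mismatch is to join on integer keys derived from the coordinates and to count the rows that found no partner. The sketch below uses toy data; `coord_keys`, `merge_weather_with_counties` and the `county_map` frame are hypothetical names, one decimal of precision is an assumption, and coordinates are assumed to contain no missing values.

```python
import pandas as pd


def coord_keys(frame, decimals=1):
    """Add integer join keys so tiny float differences cannot break the merge."""
    scale = 10 ** decimals
    out = frame.copy()
    out["lat_key"] = (out["latitude"].astype("float64") * scale).round().astype("int64")
    out["lon_key"] = (out["longitude"].astype("float64") * scale).round().astype("int64")
    return out


def merge_weather_with_counties(weather, county_map, decimals=1):
    left = coord_keys(weather, decimals)
    right = coord_keys(county_map, decimals).drop(columns=["latitude", "longitude"])
    merged = left.merge(right, on=["lat_key", "lon_key"], how="left", indicator=True)
    # rows of `weather` that found no county are the ones that become NaN features
    unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns=["_merge", "lat_key", "lon_key"]), unmatched


# Toy data: the first station differs from the map only by float noise.
weather = pd.DataFrame({
    "latitude": [57.599998, 59.2],
    "longitude": [21.700001, 24.7],
    "temperature": [12.0, 9.5],
})
county_map = pd.DataFrame({"latitude": [57.6], "longitude": [21.7], "county": [3]})

merged, unmatched = merge_weather_with_counties(weather, county_map)
```

Logging `unmatched` at every iteration, or asserting on it while testing, would surface the problem even when no exception is raised.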

dense agate Dec 20, 2023, 5:21 AM

#

Screen_Shot_2023-12-20_at_1.21.28_PM.png

#

@wanton herald I should have made that a separate post....

#

exactly what you experienced

wanton herald Dec 20, 2023, 9:03 AM

#

ah yeah i did not see that one :p

#

the worst is that i saw that there was this discrepancy and i thought i had handled it already

#

anyway, i'm at 71 LB with my original method without any fine tuning or feature engineering

#

any progress in debugging on your side ?

dense agate Dec 20, 2023, 10:41 AM

#

yep... figured out what's wrong with the data

#

I am pretty sure there's some minor issue with the forecast or historical weather data (in the public test)

#

Depending on the pipeline, it may have no effect or break the code (in my case)

#

Will delay flagging that to the host a bit later, after I got the time to fix my pipeline.

#

So that I don't put myself at a disadvantage by wasting so much of my time on what is not supposed to be my job...

wanton herald Dec 20, 2023, 10:53 AM

#

can't you make your pipeline robust to it ?

#

another casting stuff ?

#

to avoid issues, I am now casting all columns in the format i want (string or float correctly rounded) as a preprocessing step

dense agate Dec 20, 2023, 1:01 PM

#

yes, once the issue is located it is easy to deal with.

#

It's not casting. It's something that's been flagged for the train data and fixed. Apparently not fixed for the test data.

If you don't have submission error, you are probably not affected.

wanton herald Dec 20, 2023, 1:12 PM

#

probably something I handled also, I used to have a few sub errors as well.

#

anyway now I can finally try to compete with my approach of the problem

#

this competition is extremely hard from a data processing point of view. Definitely not one I would recommend for beginners

dense agate Dec 23, 2023, 8:57 AM

#

approximately my time spent on: 10% modelling; 20% preparing data; 70% fixing bugs in preparing data harold

jovial kernel Dec 24, 2023, 9:10 AM

#

Hey guys is this a multi step forecasting or single step ?

errant pagoda Dec 24, 2023, 7:20 PM

#

dense agate It's not casting. It's something that's been flagged for the train data and fixe...

Hi 👋! I've got a file format error when submitting. The funny thing is that the submission.csv file doesn't have any issue... I'm doing a left join just to not miss any data point and fillna for target.... I can't work out the error...

#

Any guess?

jovial kernel Dec 24, 2023, 10:16 PM

#

Hello I'm new to this competition and this area and I want to participate in this competition mostly to learn and understand these type of projects . can someone explain the competition in simple terms ? what should we do and what type of task is this and what is the data
and what is the input and the expected output ?

Thanks

flint magnet Dec 25, 2023, 12:30 AM

#

I am new to this, and when doing my feature engineering it goes over the memory limit (ram) over 30gb. Is it possible to do this in jupyter notebooks outside of kaggle and then import it for submission or would that not work?

formal surge Dec 25, 2023, 12:39 AM

#

flint magnet I am new to this, and when doing my feature engineering it goes over the memory ...

Welcome to the world of data science! It's not uncommon to run into memory issues when working with large datasets. You can increase the memory limit of your Jupyter notebook by following these steps:

Generate a config file using the command jupyter notebook --generate-config.
Open the jupyter_notebook_config.py file located inside the jupyter folder and edit the following property: NotebookApp.max_buffer_size = your desired value.
Remember to remove the # before the property value.
Save and run the Jupyter notebook. It should now be able to utilize the set memory value.

Alternatively, you can run the notebook using the following command: jupyter notebook --NotebookApp.max_buffer_size=your_value.

Regarding your question about importing the feature-engineered data for submission, it is possible to do so. You can save the data as a .csv or .pkl file and then load it into your submission notebook using pandas.read_csv() or pandas.read_pickle(), respectively. This way, you can avoid having to re-run the feature engineering code every time you want to make a submission.

I hope this helps! Let me know if you have any other questions. 😊

Source: Conversation with Bing, 25/12/2023
(1) How to increase Jupyter notebook Memory limit? - Stack Overflow. https://stackoverflow.com/questions/57948003/how-to-increase-jupyter-notebook-memory-limit.
(2) Is there any way to increase memory assigned to jupyter notebook. https://stackoverflow.com/questions/51202801/is-there-any-way-to-increase-memory-assigned-to-jupyter-notebook.
(3) [FIXED] How to increase Jupyter notebook Memory limit?. https://www.pythonfixing.com/2022/03/fixed-how-to-increase-jupyter-notebook.html.

indigo cedar Dec 25, 2023, 1:58 PM

#

I'm curious if it's possible and even advisable to re-train your model during the public test set and eventually the private test set. I can imagine there is some drift and you want to re-train your model based on the new data. This means the data provided needs to be stored/collected along the way. Is this possible?

jovial kernel Dec 26, 2023, 10:24 PM

#

Hey guys is this a multi step forecasting or single step ?

eager wren Dec 27, 2023, 4:54 AM

#

@dense agate anyone noticed that the county lat lon data is missing some lat/long pairs in the weather DFs?

indigo cedar Dec 27, 2023, 1:27 PM

#

jovial kernel Hey guys is this a multi step forecasting or single step ?

the prediction interval is daily for every hour of the day. so daily batch forecasting

jovial kernel Dec 27, 2023, 1:29 PM

#

indigo cedar the prediction interval is daily for every hour of the day. so daily batch forec...

thanks

jovial kernel Dec 29, 2023, 8:41 PM

#

so like this :
The lagged target(and other features) as X1 and the target as Y1
X1p ==> ModelP ==> Y1p(pred) (for production )

#

X1c ==> ModelC ==> Y1c(pred) (for consumption)

#

?

sharp needle Dec 30, 2023, 8:47 AM

#

Why does my code for predict in enefit give an error like this, even though I am uploading sample_prediction ?

soft ingot Dec 30, 2023, 11:48 AM

#

Hi there, I am participating in this competition. Can anyone here help me out in merging the train and clients mapping datasets? Is there a significance of using data_block_id for merging? Currently I figured out to use date, county, is_business and product_type as keys for merging these two dataframes.

serene silo Jan 1, 2024, 10:19 PM

#

hi I am having problems submitting to the API. I have narrowed it down: it fails on this line of code: y_predict = gbm_upload.predict(test_sub_final.values).clip(0). The model has been instantiated before in the following format: gbm_upload = load('xxxxxxgbm_model.pkl'). My notebook runs fine in Kaggle with no errors, and the predictions are generated fine. Any ideas why the submission fails when calling the model? Thanks

serene silo Jan 1, 2024, 10:25 PM

#

errant pagoda Hi 👋! I've got a file format error when submitting. The funny thing is that th...

same for me fede, I have raised this to the organizers too but no reply. My notebook runs fine , no errors, no timeouts, and my submission format checked too. I think if the organizers have other versions of packages installed or other dependencies they should tell us otherwise it will continue failing and it does not give same chance to everyone to participate and it is not a fair competition :/ @sinful vine

dapper edge Jan 5, 2024, 8:47 PM

#

Out of curiosity, why is consumption even part of this challenge? The description only mentions the inaccuracy of energy production, not consumption.

Feels like having consumption targets adds a lot of hours to this competition even though it is a "solved" challenge for Enefit.

dense agate Jan 6, 2024, 4:00 PM

#

Not sure which part of the description you are referring to, but here's my take. The energy that Enefit needs to produce (P_enefit) needs to align with the net consumption of their prosumers (C_prosumer - P_prosumer)

#

Neither C_prosumer nor P_prosumer is "solved". The inaccuracy of energy production probably refers to P_enefit, not P_prosumer

#

If (big if) anything is considered "solved", it's probably the consumption pattern of pure consumers, not the consumption pattern of prosumers; these two are different

dull blaze Jan 7, 2024, 2:05 PM

#

dense agate Not sure which part of the description you are referring to, but here's my take....

but if they are a business they produce more than they consume

#

but we have to predict the amt produced as well as consumed for each prosumer(or active prosumer),right?

ornate elbow Jan 7, 2024, 3:21 PM

#

guys, does anyone have a clue about the period of the testing data in the API (on which day it starts and ends)?

dapper edge Jan 8, 2024, 5:18 PM

#

Hmm yeah that makes sense tbf, that it's not necessarily production we care about but the prosumer activity as a whole

timid lodge Jan 10, 2024, 11:10 AM

#

I have recently joined the competition and can't figure out why people in kaggle notebooks use this piece of code when joining the clients set to the training set
df_client.with_columns( (pl.col("date") + pl.duration(days=2)).cast(pl.Date) )
Why do they shift it by 2 days?

dense agate Jan 10, 2024, 8:01 PM

#

because during inference, the client data is available with a 2 day delay
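A minimal sketch of what that shift does, written in plain Python instead of polars so it runs anywhere. The rows, the `installed_capacity` value and the helper names `shift_client_dates` and `left_join` are hypothetical and only illustrate the idea: a client row dated D is re-dated to D + 2 so that it joins onto the training rows for the day on which it would actually be available.

```python
from datetime import date, timedelta

# Hypothetical toy rows; the real tables have more columns.
client_rows = [
    {"county": 0, "is_business": 0, "product_type": 1,
     "date": date(2023, 5, 26), "installed_capacity": 950.0},
]
train_rows = [
    {"county": 0, "is_business": 0, "product_type": 1,
     "date": date(2023, 5, 28), "target": 12.3},
]

KEYS = ("county", "is_business", "product_type", "date")
CLIENT_DELAY = timedelta(days=2)  # the 2 day delay mentioned in the chat


def shift_client_dates(rows, delay=CLIENT_DELAY):
    """Re-date each client row to the day on which it can be used."""
    return [{**row, "date": row["date"] + delay} for row in rows]


def left_join(left, right, keys=KEYS):
    """Left join two lists of dicts on `keys` (right side unique per key)."""
    index = {tuple(r[k] for k in keys): r for r in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        joined.append({**match, **row})  # columns from `left` win on conflict
    return joined


joined = left_join(train_rows, shift_client_dates(client_rows))
```

Without the shift the join on `date` finds no match for 2023-05-28, so the client columns would be missing for that row.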

cold moat Jan 10, 2024, 8:43 PM

#

Hello, I just found the discord channel. I am currently in position 27. Now, I am trying different features (like target_diff), the results locally seem to be really good but not in LB.

ornate elbow Jan 10, 2024, 10:46 PM

#

cold moat Hello, I just found the discord channel. I am currently in position 27. Now, I a...

I really wonder how representative the hidden test set is,
like do we know its size and if it covers different seasons and so on?

cold moat Jan 10, 2024, 10:51 PM

#

There was a discussion post saying it was around 90 days

#

Last training date is 2023-05-31, so the company could have up to 6 more months

ornate elbow Jan 10, 2024, 10:53 PM

#

are they from 6/2023 to 9/2023 ?

cold moat Jan 10, 2024, 10:55 PM

#

We don't know, this is the post: https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/463640

Enefit - Predict Energy Behavior of Prosumers

Predict Prosumer Energy Patterns and Minimize Imbalance Costs.

ornate elbow Jan 10, 2024, 10:56 PM

#

the issue is that the target distribution varies a lot between seasons, so if you test your model on one season you will get quite different results compared to testing it on another season. i found some cases where i get lower validation loss than training loss in the first iteration 😂

#

the reason was that i set the test set to be the last 20% of my data

cold moat Jan 10, 2024, 10:58 PM

#

Yes! In winter, production is easy to guess. Consumption is more difficult to model

ornate elbow Jan 10, 2024, 11:00 PM

#

i think the reason is that the average of the target decreases significantly, which directly lowers the mae loss regardless of your model and data quality
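A small sketch of that effect with made-up numbers (the values and the `normalized_mae` helper are illustrative assumptions, not competition data): the same 20% relative error gives a ten times smaller MAE when the target itself is ten times smaller, so comparing raw MAE across seasons can mislead.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def normalized_mae(y_true, y_pred):
    """MAE divided by the mean absolute target, to compare across seasons."""
    return mae(y_true, y_pred) / mean(abs(t) for t in y_true)


# Hypothetical targets: "summer" values are 10x the "winter" values,
# and every prediction is 20% too high in both seasons.
summer_true = [100.0, 200.0, 300.0]
summer_pred = [120.0, 240.0, 360.0]
winter_true = [10.0, 20.0, 30.0]
winter_pred = [12.0, 24.0, 36.0]

summer_mae = mae(summer_true, summer_pred)
winter_mae = mae(winter_true, winter_pred)
```

Here `summer_mae` is ten times `winter_mae` although `normalized_mae` is the same for both, which is one way a validation split covering only a low-target season can look better than training.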

ornate elbow Jan 11, 2024, 1:15 AM

#

the api and the whole submission story is harder for me than the problem itself. anyway, i have a small question: if my code succeeds in preprocessing the example data revealed by the api, should i expect it to also work when i submit my notebook, or could there be problems such as discrepancies in column names or data types compared to the example they provided?

cold moat Jan 11, 2024, 7:26 AM

#

The API also gave me several problems

#

I don't think you should find any column name discrepancies

ornate elbow Jan 11, 2024, 12:37 PM

#

to predict a point i use the previous 256 points so if i want to predict first point in the test set i will need the last 256 from the training set. my problem is that after testing the model on the first test batch there will be a gap between the training data and the second patch of testing data, this gap is the first patch of testing data so how can i use the first patch of testing data as a context to predict the second patch of testing data?

#

what i mean is that, for example, the testing data that covers the period 6/2023 to 9/2023 should be given as input when we predict the period 10/2023 to 1/2024, so is that the case?

cold moat Jan 11, 2024, 12:48 PM

#

yess, that's why you need to concat the new data (revealed targets, forecast weather, etc.) to the existing dataframes, so no nulls appear. So every time you read new data, you store it for the following test days. I don't know if this is what you mean

ornate elbow Jan 11, 2024, 12:55 PM

#

yes this is what i meant

#

but my question is how to store it.

cold moat Jan 11, 2024, 1:02 PM

#

i am using the code someone made with polars and this problem is handled.
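One possible way to keep that history without polars, as a rough sketch: a rolling buffer per series that is seeded from the tail of the training data and then extended with each newly revealed batch. The class name `TargetHistory`, the series key and the row layout are assumptions for illustration only; the 256 point window is the context length mentioned earlier in the chat.

```python
from collections import defaultdict, deque

CONTEXT = 256  # window length mentioned in the chat


class TargetHistory:
    """Keep the most recent `context` targets for every series."""

    def __init__(self, context=CONTEXT):
        # One bounded buffer per series; old values fall off automatically.
        self._series = defaultdict(lambda: deque(maxlen=context))

    def extend(self, rows):
        """Add rows of (series_key, timestamp, target), given in time order."""
        for key, _timestamp, target in rows:
            self._series[key].append(target)

    def window(self, key):
        """Return the stored targets for `key`, oldest first."""
        return list(self._series[key])


# Usage idea: seed once with the end of the training set, then call
# history.extend(...) with the revealed targets inside each API iteration.
history = TargetHistory(context=3)
history.extend([("unit_a", hour, float(hour)) for hour in range(5)])
```

With this the first test batch becomes context for the second one simply because it was appended before the second batch is predicted.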

ornate elbow Jan 11, 2024, 1:16 PM

#

i guess that the case is as follows for the api
version1: patch1
version2: patch1 + patch2
version3: patch1 + patch2+patch3
and so on.
Then, the column currently_scored guide you to know which rows are already predicted and can be used as input and which rows are new and should be predicted in the submitted file

#

can someone please confirm if this is the case or not

mighty zenith Jan 12, 2024, 3:57 PM

#

Hello there, I'm new to the challenge procedure. I don't know which place is better to ask my questions, so I link here the question that I have; it is about the dates in the example_test_files. https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/467460

ornate elbow Jan 12, 2024, 4:16 PM

#

if i got you right then the answer to your question lies in the column data_block_id, which exists in every data frame they provided

#

by checking the data_block_id for every file you will be able to figure out the delay of each feature

mighty zenith Jan 12, 2024, 4:53 PM

#

Yes, thank you, it helps me to find the following discussion that explains clearly the time availability of the data. https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/455833

#

More on that, in the discussion he specifies that he is looking at the availability of the data at 11 am, is it the time at which the prediction will be made ? In fact, if we predict after 2 pm, we then have more info on the next day

ornate elbow Jan 12, 2024, 6:01 PM

#

i think the model's purpose is to predict energy consumption/production for the next day at 11 am each day, so it is not up to you to decide when to make the predictions

ornate elbow Jan 12, 2024, 7:55 PM

#

ornate elbow i guess that the case is as follows for the api version1: patch1 version2: patc...

anyone can help me with that guys?

dense agate Jan 13, 2024, 10:57 PM

#

ornate elbow i guess that the case is as follows for the api version1: patch1 version2: patc...

"the column currently_scored guide you to know which rows are already predicted and can be used as input"
I think you get the idea, but I think your statement is not quite precise.

currently_scored=False doesn't necessarily mean it's already "predicted". It just lets you know which rows are being scored. From what we know now, June, July and August of 2023 are being used for the public test LB, and those periods will give you currently_scored=True during the public test runs. But later during the **private** test, after the submission deadline, rows in those periods will give you currently_scored=False.

ornate elbow Jan 14, 2024, 1:15 AM

#

Thanks for the clarification. maybe it would be more accurate to say these are the rows that you should make predictions for, because for now the loss will be calculated on these rows.

#

but my main concern is about if the api is going to provide the data for the period between training data 'ends at 31/5/2023' and the rows that you should predict now

#

i.e. is the case something like this
version1: patch1
version2: patch1 + patch2
version3: patch1 + patch2+patch3

dense agate Jan 14, 2024, 4:53 AM

#

yes, the data will be provided in continuity

mighty zenith Jan 14, 2024, 10:41 AM

#

Hello, when trying to submit my notebook in kaggle, I'm struggling to overwrite the submission.csv file by doing output.to_csv('/kaggle/working/submission.csv', index=False), An error is raised saying : PermissionError: [Errno 1] Operation not permitted: '/kaggle/working/submission.csv'. Is there a trick to know how to submit ?

mighty zenith Jan 14, 2024, 10:36 PM

#

It seems that the for loop over iter_test does the job, but why does it iterate 4 times, and why is the sample_prediction split into 4?

tiny light Jan 14, 2024, 10:42 PM

#

has anyone tried nn's till now?
wanna know how they are working xD
trying some nn's rn

dapper edge Jan 15, 2024, 7:43 PM

#

for sure someone has tried lstm

mighty zenith Jan 16, 2024, 11:36 PM

#

Are people still struggling with the submission scoring error? I have no NaN, all the entries are float, and the score I compute locally is 80. What happens during the scoring process that could fail?
counter = 0
for (test, revealed_targets, client, historical_weather, forecast_weather, electricity_prices, gas_prices, sample_prediction) in iter_test:
    value = prediction(test, revealed_targets, client, historical_weather, forecast_weather, electricity_prices, gas_prices, sample_prediction)
    sample_prediction['target'] = np.array(value["target"])
    env.predict(sample_prediction)
    counter += 1
The function prediction returns exactly the same dataframe that is found in the submission.csv (column- and line-wise)

marble trail Jan 18, 2024, 11:30 AM

#

Hi! It seems to me that the forecast date of the electricity price data we receive from the API is always behind a day compared to the other dataframes. Does anyone know why?

#

so for example the prediction_datetime column of the test dataframe is 2023.05.28 but in the same iteration the forecast date of the electricity price is 2023.05.27

mighty zenith Jan 18, 2024, 8:40 PM

#

Look at this, it explains how the data in test is given : https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/455833

marble trail Jan 19, 2024, 12:20 PM

#

Thank you! I have already read this and it still seems to me that the electricity price and the gas price returned by the API is wrong.😅 If the prediction date is 2023.05.28. then the forecast date for the electricity and gas should be 2023.05.28 as well and not one day earlier

ornate elbow Jan 19, 2024, 4:57 PM

#

the data will not be available at the time when you make the predictions, which is why the forecast date does not match the date on which you make the predictions

#

you can look at column data_block_id which you can find in training files to understand the delay amount and when each feature will be available and after how much delay
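A rough sketch of that check, using toy rows that mirror the dates from the example above. The `data_block_id` value, the simplified `date` column and the helper `lag_days` are hypothetical; the idea is only to pair rows that share a data_block_id and look at the date gap between the rows to predict and the feature rows available for them.

```python
from datetime import date

# Hypothetical rows: same data_block_id, dates taken from the example above.
test_rows = [{"data_block_id": 1, "date": date(2023, 5, 28)}]
price_rows = [{"data_block_id": 1, "date": date(2023, 5, 27)}]


def lag_days(target_rows, feature_rows):
    """Set of day gaps between target rows and feature rows per data_block_id."""
    target_date = {r["data_block_id"]: r["date"] for r in target_rows}
    return {
        (target_date[r["data_block_id"]] - r["date"]).days
        for r in feature_rows
        if r["data_block_id"] in target_date
    }


price_lag = lag_days(test_rows, price_rows)
```

Running the same comparison for each provided file shows how far behind each feature is, which is what the one day offset in the price data reflects.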

marble trail Jan 19, 2024, 8:32 PM

#

ah okay. I got it! Thank you!:)

ornate elbow Jan 19, 2024, 9:03 PM

#

you're welcome 🥰

ornate elbow Jan 22, 2024, 3:57 PM

#

mighty zenith Are people still struggling with the submission scoring error ? I have no NaN, a...

did you figure out what was the problem? i think i am having the same problem now

mighty zenith Jan 22, 2024, 9:45 PM

#

The solution seems to be : np.clip(...,0,np.inf)
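For anyone wondering what that call does, a short sketch with made-up values: `np.clip(x, 0, np.inf)` replaces every negative prediction with 0 and leaves the rest untouched. The array below is illustrative only, and the thread does not establish why the scorer needs this.

```python
import numpy as np

# Hypothetical raw model outputs, including a negative value.
raw = np.array([-3.2, 0.0, 1.5, 42.0])

# Lower bound 0, no upper bound: negatives become 0.0.
clipped = np.clip(raw, 0, np.inf)

# The array method with only a lower bound gives the same result.
same = raw.clip(0)
```

In the submission loop this would be applied to the values assigned to sample_prediction['target'] before calling env.predict.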

#

Do not ask me why, some magics from the authors

indigo summit Jan 24, 2024, 1:07 AM

#

ModuleNotFoundError: No module named 'holidays'
Yesterday every module was fine, but today when I run my script this shows up. Can anyone help? I cannot use the polars module either

ornate elbow Jan 24, 2024, 12:28 PM

#

the loss is now calculated on new data which is not the same data as week ago am i right?

marble trail Jan 24, 2024, 9:30 PM

#

I read somewhere that the hidden test grows over the three months but they didn't really specify. I guess there are more points in the hidden test set

#

By the way, does the scoring take a lot of time sometimes? It has been scoring for the past hour for me, which is weird because it was faster in the morning

#

ah okay the greater the MAE the longer it takes for the scoring to run haha

ornate elbow Jan 24, 2024, 9:43 PM

#

no, actually the hidden dataset has been updated

#

they added new test data and i think they also set the old hidden test data to have currently_scored == false

marble trail Jan 25, 2024, 9:34 PM

#

so you don't need to score on those rows which have the currently_scored == false? because it is not relevant anymore?

ornate elbow Jan 25, 2024, 10:13 PM

#

I can't really tell, each time I read what they said I understand something different

#

However I think that they added extra data for the period after the original data and the column currently scored for this new data is set == false so you have to consider this to avoid scoring error

#

And yet people submitted without accounting for this new data and got successful submissions
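A sketch of one way to account for it, under the assumption (not confirmed in this chat) that batches where nothing is scored still need a prediction to be submitted but its values do not matter. The function `fill_targets`, the row layout and `predict_fn` are hypothetical stand-ins for the real dataframes and model.

```python
def fill_targets(test_rows, sample_rows, predict_fn):
    """Return the sample rows with `target` filled in.

    test_rows:   list of dicts with a boolean `currently_scored` field.
    sample_rows: list of dicts with a `row_id` field, one per test row.
    predict_fn:  hypothetical model call, list of test rows -> list of floats.
    """
    if not any(row["currently_scored"] for row in test_rows):
        # Nothing in this batch counts toward the score: skip the model
        # and return placeholders so a prediction can still be submitted.
        return [{**sample, "target": 0.0} for sample in sample_rows]
    predictions = predict_fn(test_rows)
    # Floor at zero, as suggested elsewhere in the chat.
    return [
        {**sample, "target": max(0.0, value)}
        for sample, value in zip(sample_rows, predictions)
    ]


unscored = fill_targets(
    [{"currently_scored": False}], [{"row_id": 1}], lambda rows: [99.0]
)
scored = fill_targets(
    [{"currently_scored": True}], [{"row_id": 2}], lambda rows: [-5.0]
)
```

The result would then be written into sample_prediction and passed to env.predict in every iteration, scored or not.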

marble trail Jan 25, 2024, 10:27 PM

#

https://www.kaggle.com/code/gitfox/working-dummy-submission-after-2024-01-19-update/notebook

https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/469293

I have found these two links which are helpful I think, even though I am still not 100% sure I get it

Working Dummy Submission after 2024-01-19 Update

Explore and run machine learning code with Kaggle Notebooks | Using data from Enefit - Predict Energy Behavior of Prosumers

rustic osprey Jan 28, 2024, 12:30 PM

#

mighty zenith The solution seems to be : np.clip(...,0,np.inf)

where did u put this code

#

did u clip sample prediction's target?

mighty zenith Jan 28, 2024, 8:57 PM

#

Yes

rustic osprey Jan 29, 2024, 6:50 AM

#

didnt work for me tho

#

i think it is related to the hidden dataset update

#

https://www.kaggle.com/code/vitalykudelya/enefit-update-submission-logic

Enefit: Update Submission Logic

#

https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/470790

#

i guess i should sacrifice my daily submission to debug

ornate elbow Jan 30, 2024, 4:27 AM

#

guys am i the only one who is facing this problem, or has the prediction_datetime column become like this in the new data generated by the api

rustic osprey Jan 30, 2024, 9:27 AM

#

ornate elbow

in kaggle notebook options, change environment to pin to original environment

ornate elbow Jan 30, 2024, 2:02 PM

#

I think it is a problem caused by the most recent env in kaggle notebook along with other problems I noticed and i think it is also caused by the new env

#

i.e. if you make a new notebook and change environment to pin to original environment this will not solve the problem

dusk gorge Jan 30, 2024, 2:35 PM

#

Hi, I joined the competition a week ago. It is my first competition. Yesterday, when ready to complete my first submission, I realized that it will take some debugging (and submissions) to successfully submit. Being new to Kaggle and joining late in a competition with a complicated to debug "Submission Scoring Error" issue is not optimal. Something seems to happen when the hidden stuff is executed when running env.predict(sample_prediction) in the iter_test loop. I think it is unfortunate, that there is a 5-a-day limit for first time successful submissions late in the competition, something for you to consider. Thx.

turbid grove Jan 31, 2024, 1:28 PM

#

Hello all! I've just made the following question in the discussion section in the competition. I know it's a bit late, but any help from you would be great 🙂

https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/472267

dapper edge Feb 4, 2024, 8:47 PM

#

Is the competition still going? I thought it was supposed to end in February

rustic osprey Feb 5, 2024, 2:43 PM

#

submission is closed and ongoing private lb

marble trail Feb 6, 2024, 12:10 PM

#

when can we see the scores of the private leaderboard?

marble trail Feb 6, 2024, 12:35 PM

#

Also, can we still modify our models or is it only for testing the existing one on a new dataset?

ornate elbow Feb 11, 2024, 9:03 PM

#

For those who have been on kaggle for a long time, can we expect to see a new time series competition any time soon?

slow wigeon Feb 12, 2024, 3:52 PM

#

I made a summary video of this competition. Let me know if you find it at all useful... https://www.youtube.com/watch?v=rR9i9tO4BIQ

YouTube

Ben Lebovitz

Kaggle Enfit contest: Basics to grand master ML strategies for time...

Kaggle's Enfit competition review!

The goal of this competition was to predict energy produced and consumed by customers of a power company who have installed solar panels. Here you'll learn both fundamental and state of the art techniques for building a machine learning model for a time series problem.
