# Feature Neutralization Performance lower than Hello Numerai
I am not sure it is consistently worse. I have Kaggle versions of the tutorial notebooks symbolically staked, and the 1Y return of the feature neutralization model (https://numer.ai/jos_kaggle_medium_fn) is better than Hello Numerai's. Hello Numerai is also the worst in CORR (its MMC score may be helped by its uniqueness: few people stake a simple LGBM model trained on a small feature set).
The real advantage comes when you combine both techniques, feature neutralization (FN) and target ensembling (TE), as in https://numer.ai/jos_kaggle_sunshine. There the 1Y return is well above average and ranks 230 of 4039 - not bad at all...
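For readers unfamiliar with the two techniques being combined, here is a minimal sketch of the general idea. It is not the code from the linked notebooks: it assumes feature neutralization means subtracting the least-squares projection of the predictions onto the features, and that the ensemble is a plain average of rank-transformed predictions. The data is synthetic, and the names `neutralize`, `rank_pct` and `ensemble` are made up for illustration. In practice neutralization is usually applied per era, which this sketch does not do.

```python
import numpy as np

def neutralize(predictions, features, proportion=1.0):
    """Remove `proportion` of the linear exposure of predictions to features."""
    # Add an intercept column so the result is also mean-zero (an assumption of this sketch).
    exposures = np.column_stack([features, np.ones(len(features))])
    # Least-squares fit of the predictions on the features.
    coef, *_ = np.linalg.lstsq(exposures, predictions, rcond=None)
    neutral = predictions - proportion * (exposures @ coef)
    # Rescale to unit standard deviation.
    return neutral / neutral.std()

def rank_pct(x):
    """Percentile ranks in (0, 1]; assumes continuous values, so no ties."""
    return (np.argsort(np.argsort(x)) + 1) / len(x)

def ensemble(pred_list):
    """Average the rank-transformed predictions of several models."""
    return np.mean([rank_pct(p) for p in pred_list], axis=0)

# Synthetic stand-ins for two models' predictions (hypothetical data).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 5))
signal = rng.normal(size=500)
preds_a = features @ rng.normal(size=5) + signal
preds_b = 0.5 * features[:, 0] + signal + rng.normal(scale=0.1, size=500)

# Ensemble first, then neutralize the combined prediction against all features.
combined = neutralize(ensemble([preds_a, preds_b]), features)

# With proportion=1.0 the remaining linear exposure should be numerically zero.
max_exposure = np.abs(features.T @ (combined - combined.mean())).max() / len(combined)
```

Lowering `proportion` below 1.0 keeps part of the feature exposure, which trades some of the neutralization effect for more of the original signal.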
I was looking at the tutorial model 1Y scores on the benchmark models account:
https://numer.ai/~benchmark_models
Did you train with the shallow parameters or deep parameters in the tutorial notebooks?
You can check for yourself. The source code for each model is a public notebook linked from its model page, e.g. https://www.kaggle.com/code/svendaj/numerai-feature-neutralization. I would say these are the shallow parameters:
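import lightgbm as lgb

model = lgb.LGBMRegressor(
    n_estimators=2000,     # number of boosting rounds
    learning_rate=0.01,
    max_depth=5,
    num_leaves=2 ** 5,     # 32 leaves per tree
    colsample_bytree=0.1,  # each tree sees 10% of the features
    verbosity=-1,
    num_threads=4,
)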
But there might be more differences. The example model says it has been "submitting since late 2023", whereas my models are retrained whenever new data becomes available (you can see this in the notebook version history). Also, my submitted models are trained on the medium feature set (except https://numer.ai/jos_kaggle_hello).
Yeah, those are shallow. Also, I think the tutorial notebooks use a num_leaves of 31 rather than 32.
I think retraining on new data could be the difference, but I am pretty sure the tutorial models are also trained on the medium feature set.