#ai-mathematical-olympiad-progress-prize-2
Welcome to the 2nd AIMO progress prize. We have launched today!
I'm Frieder, the AIMO Prize Manager, and I'm here to answer questions (although I'll more often be found on the Kaggle forum - post there if you really need an answer quickly 😉).
Problems are harder this year (national Olympiad level) and AI-resistant (your favourite open-weight LLM will score close to zero).
The prize pot has doubled to two million dollars, and you are allowed to use the latest open-weight models.
Surprise us with some cool new math-AI models, and climb up the Kaggle leaderboard! 🪜
SO EXCITED
I am getting
GatewayRuntimeError: (<GatewayRuntimeErrorType.SERVER_RAISED_EXCEPTION: 3>, "name 'submission' is not defined"), is anyone else getting the same error?
I'm a newbie in LLMs, and sorry if this is too much of a newbie question, but... is https://huggingface.co/AI-MO/NuminaMath-7B-TIR a "decoder-only" and "autoregressive" model? I'm trying to replicate the text generation given by the predefined pipe to understand the process and I'm struggling a bit
It's a finetune of a deepseek math model.
it is indeed decoder-only and autoregressive.
Thank you very much
I've been able to roughly reproduce text generation by taking the last logits of every iteration to predict the next token (taking the mean over all of them was not working at all, I think because the model is causal and each token doesn't see any token after it, not sure about that). I've obtained quite similar text, but not the same. Does someone know where I can find the model.generate.forward implementation, or how it proceeds?
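For anyone following along, here is a minimal sketch of the greedy loop described above: at each step only the logits of the last position are used to pick the next token. `toy_model` is a hypothetical stand-in for a real causal LM forward pass, not NuminaMath itself. Also worth knowing: in Hugging Face transformers, `generate` can sample (`do_sample=True`) or use beam search (`num_beams > 1`) depending on the generation config, so a hand-written greedy loop will only match it when those are turned off. As far as I know the real implementation lives in `GenerationMixin.generate`.

```python
from typing import Callable, List, Optional

Logits = List[List[float]]  # one row of vocabulary scores per input position


def toy_model(token_ids: List[int], vocab_size: int = 5) -> Logits:
    """Hypothetical stand-in for a causal LM forward pass.

    Row i depends only on tokens 0..i, which mimics a causal attention mask.
    """
    rows = []
    running = 0
    for tok in token_ids:
        running += tok + 1
        peak = running % vocab_size
        # Deterministic scores with a single maximum at index `peak`.
        rows.append([-abs(v - peak) for v in range(vocab_size)])
    return rows


def greedy_generate(
    model: Callable[[List[int]], Logits],
    prompt: List[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding: repeatedly append the argmax of the last position."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)
        last = logits[-1]  # only the final position predicts the next token
        next_id = max(range(len(last)), key=last.__getitem__)
        tokens.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return tokens


print(greedy_generate(toy_model, [1, 2], 3))
```

Averaging the logits over all positions mixes predictions for different next tokens (position i predicts token i+1), which is why only the last row is useful here.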
Is there any guide on how to get started with the solution?
So you’re saying the DeepSeek math model is free to use? Am I correct?
How can someone use and customize it for this competition? Is there a guide available for fine-tuning it for these mathematical problems?
Yeah. The competition allows the use of open-source models.
As for finetuning models in general, there are many guides on the internet. It is important you find good math datasets. Take a look at solutions for the previous edition of this contest for some inspiration. I’m sure there’s also datasets mentioned in the discussion.
I would greatly appreciate your help; I can't find the model or the guide on how to do this.
Search huggingface, deepseek 7b math
More explanation is given there
It’s also available on Kaggle by just searching it in the models tab. since then other good math models have also been released, such as Qwen2.5 Math series
@sick grotto, the model seems costly. How can I access it for free on Kaggle?
I have no clue what you mean
It's a freely available model
What do you mean costly?
How would you run the model in Kaggle? We need APIs for that, right? But it's showing prices on their site.
You can’t make API requests when making a submission to this contest, so you don’t have to worry about that.
The model can be downloaded
yes
And can be used in kaggle by loading it with code in a notebook and running it
I recommend looking at other people’s public notebooks to see how they did that.
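A rough sketch of what "loading it with code in a notebook" can look like. It assumes the model was attached to the notebook as an input, that inputs are mounted under `/kaggle/input`, and that the model folder contains a `config.json` (typical for Hugging Face checkpoints). `find_model_dirs` is a hypothetical helper name, not a Kaggle API.

```python
from pathlib import Path
from typing import List


def find_model_dirs(root: str = "/kaggle/input") -> List[Path]:
    """Return every directory under `root` that contains a config.json file."""
    base = Path(root)
    if not base.exists():
        return []
    return sorted(p.parent for p in base.rglob("config.json"))


if __name__ == "__main__":
    for model_dir in find_model_dirs():
        print(model_dir)
    # With a directory found, loading works from the local path, no internet needed:
    #   from transformers import AutoModelForCausalLM, AutoTokenizer
    #   tokenizer = AutoTokenizer.from_pretrained(model_dir)
    #   model = AutoModelForCausalLM.from_pretrained(model_dir)
```

The key point is that `from_pretrained` accepts a local directory, so nothing has to be fetched at submission time.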
thanks @sick grotto
Perhaps it does Beam Search?
@inland wedge Is it possible to relax a rule about Googlers potentially receiving a prize?
@modest sun Googlers, as well as any other large lab, are very welcome to participate as long as they follow our rules, in particular our transparency requirements (meaning: all LLMs have to be open weight, you have to document your training process, etc.). 😉
It's about this part of the rules:
I found out that Kaggle is Alphabet's subsidiary
B. Unless otherwise stated in the Specific Competition Rules above or prohibited by internal policies of the Competition Entities, employees, interns, contractors, officers and directors of Competition Entities may enter and participate in the Competition, but are not eligible to win any Prizes. Individuals or entities who were engaged, employed or contracted by the Competition Sponsor or its affiliates to advise on the Competition are prohibited from entering the Competition. "Competition Entities" means the Competition Sponsor, Kaggle Inc., and their respective parent companies, subsidiaries and affiliates. If you are such a participant from a Competition Entity, you are subject to all applicable internal policies of your employer with respect to your participation.
Ok, I see what the issue is, let me get back to you on that
Hi !
So we are back! It's been a couple of days, and the competition is already eating my Kaggle GPU quota and my colab compute units 😅 , and I love it! 😎
Good luck everyone!
Excuse me, this is an even more newbie doubt, but do we only have the option to use it via torch, not TensorFlow or KerasNLP or spaCy? (cause I've never tried PyTorch yet)
u should be able to use them all via import statements in your python code. And scikit, numpy, etc. as well. If it is a freely available and widely used python API, you should be good. I'd personally go with TensorFlow over Pytorch, but that's just me.
And Keras is a layer built on top of TensorFlow, btw. Think of it as boilerplate code that interfaces with it (built by François Chollet for use with TensorFlow for AI tasks). If you are using it, you are using TensorFlow, in a nutshell.
SpinDoctorWalker is right, Hugging Face usually gives both choices. But for this specific model it looks like only torch is available. I'm sure it can be translated, but it won't be a trivial task.
@dense vale The only things you can't use are things such as chatGPT, or Gemini, or other LLMs u would access online
u simply meant i will need to be good at pytorch first ??
u can go with either framework (keras or pytorch). They are both for general AI exploration.
also, kinda expected you can import either in all kaggle competitions ;
sir i meant with regard to the tuned deepseek model (numina one) can we access it with frameworks other than pytorch ?
had to check it @dense vale . A definitive no, i think, because this requires you to access a resource that is online. And one of the requirements of the competition is no external internet access. So you can make use of all the APIs you want, just not ones that interact with an external source when a potential user interacts with them.
and it doesn't even count as a freely available source of information either (which is allowed in competition)
btw i m confused how will we run the kaggle notebook(s) without external internet ?
In your example, DeepSeek is an external service, so to speak. And they are telling you they don't want that as part of the solution they are seeking. Make sense?
oh ok we can use the downloaded models though right ?
but still kaggle notebook how the code can be executed without internet ?
yup. and i quote from competition page: "Freely & publicly available external data is allowed, including pre-trained models "
It's not that u don't use the internet, or using the cloud, when submitting your algo, and running it. It's more about saying you develop a program that can run offline, once done. A program/ AI that can reason mathematically: of what use is an internet connection. It can run offline. Ideally.
ok i get it now 😅 (but i m kinda new to llms just learnt the basics of nlp with spaCy could u recommend me what should i learn now kinda confused)
To clarify: Your model has to be open-weight (that you will probably download from somewhere), and it then needs to run on the Kaggle container without internet, within the given GPU budget. For AIMO1, a number of top-scoring teams actually spent significant amounts of time with engineering tricks, to get big LLMs to run on the given compute budget. That is one way to approach this competition, but not the only way ...
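To make the "GPU budget" point concrete, here is a back-of-the-envelope sketch of how much memory the weights alone need at different precisions. It assumes the 4x L4 setup mentioned later in this channel (an L4 has 24 GB) and a hypothetical 80% headroom factor to leave room for KV cache and activations. It ignores quantization overhead such as scales and zero points, so treat the numbers as a lower bound.

```python
L4_MEMORY_GB = 24  # memory of a single NVIDIA L4


def weight_memory_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for the weights only, in GB (1e9 bytes)."""
    return n_params_billion * bits_per_param / 8


def fits_in_gpus(
    n_params_billion: float,
    bits_per_param: float,
    n_gpus: int = 4,
    headroom: float = 0.8,  # hypothetical fraction of memory usable for weights
) -> bool:
    """Check whether the weights fit in the usable share of total GPU memory."""
    budget = n_gpus * L4_MEMORY_GB * headroom
    return weight_memory_gb(n_params_billion, bits_per_param) <= budget


for bits in (16, 8, 4):
    size = weight_memory_gb(72, bits)
    print(f"72B at {bits}-bit: ~{size:.0f} GB, fits on 4x L4: {fits_in_gpus(72, bits)}")
```

Under these assumptions a 72B model in 16-bit does not fit at all, while a 4-bit quantized version does, which is one reason quantization showed up so often in the engineering tricks.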
Thanks sir but can you please also elaborate other ways ?
Hi, can someone suggest a nice dataset that I could use for benchmarking or fine-tuning? Is the dataset from the first competition available?
Yep same error
The MCTS comp is eating up all my GPU quota haha, probably will only join after it ends 🤣
@dense vale It is all written in detail in the competition rules, in particular check out the section about using tools (https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/rules)
I published a notebook with the highest public score of 5 https://www.kaggle.com/code/huikang/qwen2-5-math-72b-instruct-with-tir
does it perform well on aime 2024 problems ?
hi, fine-tune the model on large datasets from huggingface and other data sources, it might help in improving the score
anyone familiar with LEAN a programming language used by deepmind's ultra model claiming good score at IMO problems ?
is using two separate models or two lora adapters on the top of the same model in line with the rules ?
could anyone explain how do i resolve this error? Thank you
/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py:230: UserWarning:
NVIDIA L4 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_60 sm_70 sm_75 compute_70 compute_75.
If you want to use the NVIDIA L4 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(
this error
afaik, it's for theorem proving
share your code, do you use vllm ?
yeah it's the mathematicians' favorite proof assistant, this book teaches it
https://hrmacbeth.github.io/math2001/
coq is also quite famous due to these nice books
https://softwarefoundations.cis.upenn.edu/
Thanks a lot for sharing these
share your code
@acoustic rampart @storm atlas Here is the screenshot of the code that led to this output
I went to the competition page, opened a new notebook from there and ran this code block, and then I get this error.
probably need a different Pytorch installation? (no idea haven't done this)
I tried it, when I make my own notebooks I get this error, even tried changing PyTorch and cuda versions
I also tried by copying someone else’s notebook from the competition and edited it; then it worked properly
@inland wedge Another gentle ping about Googlers participation 😄
@modest sun Please give me a bit of time - the competition will run long enough, so there is no need to rush. We have an internal process, and I'm working through it, but there are several parties involved whose feedback I still need to collect before we can arrive at a decision.
Ah yes, of course, sorry for that, I won't ping you anymore
It may indeed look like I've forgotten, since I haven't posted, but behind the scenes we're pretty active 🙂
I'll be sure to let you know when a decision has been reached
Check your CUDA and PyTorch versions—they need to be compatible.
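A small sketch of what "compatible" means for the warning quoted earlier in this channel. It assumes a simplified rule: a PyTorch build can run on a GPU if it ships binaries for the same major compute capability. In a real notebook the two inputs would come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`; `build_supports_device` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def build_supports_device(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Simplified check: does any compiled sm_XY arch share the GPU's major version?"""
    major, _minor = capability
    sm_majors = set()
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX-only entries such as compute_75
        digits = "".join(ch for ch in arch.split("_")[1] if ch.isdigit())
        if digits:
            sm_majors.add(int(digits) // 10)
    return major in sm_majors


# Values taken from the warning text: an L4 is sm_89, i.e. capability (8, 9).
old_build = ["sm_60", "sm_70", "sm_75", "compute_70", "compute_75"]
print(build_supports_device((8, 9), old_build))
```

With the arch list from the warning, the check fails for the L4, which matches the message; a build that lists an `sm_8x` architecture passes.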
hello, i'm new here, i get one team
lol i saw this ad (censored identifying information, I am not promoting this)
idk maybe if your annotations are bad you don't earn money
but educational to learn what are they annotating
@modest sun You should have received an email from Maggie explaining Google's policies around this. I'll send you an internal gchat to continue this rather than talking on discord.
Hello, I've just joined, are we allowed to install/use extra libraries in our notebook? I noticed that "Internet access disabled" is mentioned in the code requirements and I can't seem to pip install extra dependencies on my copy of the Demo submission
upload them as datasets and install from them
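A minimal sketch of the "install from a dataset" route, assuming the wheels were downloaded beforehand on a machine with internet (for example with `pip download`) and uploaded as a Kaggle dataset. The path `/kaggle/input/my-wheels` and the helper names are hypothetical; `--no-index` and `--find-links` are standard pip options that stop pip from contacting PyPI and point it at a local folder instead.

```python
import subprocess
import sys
from typing import List


def offline_pip_command(wheel_dir: str, packages: List[str]) -> List[str]:
    """Build a pip command that installs only from local wheel files."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                  # never contact PyPI
        f"--find-links={wheel_dir}",   # look for wheels in this folder instead
        *packages,
    ]


def install_offline(wheel_dir: str, packages: List[str]) -> None:
    """Run the offline install; raises CalledProcessError if pip fails."""
    subprocess.run(offline_pip_command(wheel_dir, packages), check=True)


# Hypothetical dataset path; replace with the folder your wheels are mounted in.
print(" ".join(offline_pip_command("/kaggle/input/my-wheels", ["vllm"])))
```

All dependencies of the package need to be in the same folder too, otherwise pip will fail to resolve them offline.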
There is this utility scripts feature but I haven't researched how to use it
This limitation does seem a bit ridiculous
Hi guys, I'm new to the competition and I'm having a great deal of trouble with vllm. Currently, I am installing vllm with the Kaggle package manager, but I always get errors when importing vllm, or when trying to start an LLM API server with vllm. The error is usually that vllm tried to call some pytorch or torchvision method that does not exist, but the exact error depends on the exact versions of pytorch, torchvision and vllm that are installed. I've tried dozens of configurations now and am still having no luck. Any suggestions to get vllm working would be greatly appreciated
Hey!! I encountered the same issue with vllm on Kaggle, and after some searching in the forums, I found that it's a common problem due to version mismatches between libraries used to run vllm.
Here’s the solution!! Coming from @pseudo remnant
Just add a Utility Script that fixes these issues. Go to your Kaggle notebook, select Add Input > Utility Script, and search for "vLLM Installation Fix" or abdullahmeda/vllm-installation-fix. Add it to your notebook, and after that, you should be able to import vllm without issues. Easy fix!
If you’re new to adding inputs or need help with it, feel free to ask!
Thank you so much, I wasn't expecting such a quick and easy fix
guys, im looking for a team for this one, im into math and i need some exp in ml
and want to chill on call while working on it
I'm doing a BS in IT with expertise and interest in machine learning. Previously I have joined two competitions. If you would like to connect with me, then DM
wrote something https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/546772
anyone in fimo ??
Hi,
Is there a channel to discuss and understand solutions of the problems?
I tried solving the "3 Airline companies" Problem by hand. In my solution, I found the greatest consecutive days could be 99 days. I checked the solution to find the correct answer is: 79 days.
For my solution, the A120 flight departs along with first A100 flight. I have attached the image to this thread.
Could someone comment: is my solution a valid alternate solution or am I violating some pre-requisite condition?
The question text here
Three airline companies operate flights from Dodola island. Each company has a different schedule of departures. The first company departs every 100 days, the second every 120 days and the third every 150 days. What is the greatest positive integer d for which it is true that there will be d consecutive days without a flight from Dodola island, regardless of the departure times of the various airlines?
If I find a configuration of departure times where the the maximum interval is less than 80, the answer cannot be 80 or above. The "original solution" has shown such a configuration.
The question asks for a minimum possible "x" over all possible configurations. Given a configuration, "x" is the maximum interval.
The question did NOT ask for a maximum possible "x" over all possible configurations.
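The min-over-configurations reading above can be checked by brute force. This sketch fixes the first company's departure at day 0 (shifting all schedules together changes nothing), tries every offset for the other two, computes the longest flight-free run over one 600-day cycle (the lcm of 100, 120, 150), and takes the minimum. The function names are made up for illustration.

```python
import math
from itertools import product
from typing import Sequence


def longest_flight_free_run(offsets: Sequence[int], periods: Sequence[int], cycle: int) -> int:
    """Longest run of consecutive days with no departure, on a cyclic calendar."""
    days = sorted({
        (off + k * p) % cycle
        for off, p in zip(offsets, periods)
        for k in range(cycle // p)
    })
    gaps = [b - a - 1 for a, b in zip(days, days[1:])]
    gaps.append(days[0] + cycle - days[-1] - 1)  # wrap-around gap
    return max(gaps)


def guaranteed_run(periods: Sequence[int] = (100, 120, 150)) -> int:
    """Minimum, over all schedules, of the longest flight-free run."""
    cycle = math.lcm(*periods)
    rest = periods[1:]
    return min(
        longest_flight_free_run((0, *offs), periods, cycle)
        for offs in product(*(range(p) for p in rest))
    )


print(longest_flight_free_run((0, 0, 0), (100, 120, 150), 600))   # all depart together
print(longest_flight_free_run((0, 0, 70), (100, 120, 150), 600))  # a better-spread schedule
print(guaranteed_run())
```

The first schedule (everyone departs on the same day) leaves a 99-day gap, but that is just one configuration. A schedule like offsets (0, 0, 70) has no gap longer than 79 days, so 80 or more cannot be guaranteed, and the search over all offsets confirms that 79 is the guaranteed value.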
that airline question is interesting, because I looked at it and guessed the answer was 79 or 80 (didn't have time to reason which) in < 10 seconds without doing any calculation. So it's not true that these new problems in progress prize 2 are all very difficult to guess
They're exactly 1/100 hard to guess
20/50 gets the prize. So close!
Hi, I'm new to kaggle, what might be the reason I can't submit?
any one tried to use RAG?
hi
Can someone clarify if it is ok to scrape data from https://artofproblemsolving.com/wiki/index.php/AIME_Problems_and_Solutions and use it for fine-tuning and validation? Are there any license issues?
I don't think you can scrape data directly during the run of the program. You can scrape beforehand, though, and save the result of that scrape as data in your Kaggle account (aka, an external model)
Is it possible to see the running time of submission?
as in this contest it might be pretty important
I investigate this question in https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/549184
Hi,
I wanted to discuss the professional aspect of fine-tuning approaches.
To clarify, I have no issue with teams fine-tuning models to achieve higher scores. However, it seems that very few participants actually have access to sufficiently powerful GPUs for such tasks.
This brings me to wonder:
What are the actual limitations of fine-tuning these models?
Apologies if I’ve missed any prior discussions on this topic. If there are relevant discussions, please feel free to point me in their direction.
Hello @everyone, I am a little bit confused about the dataset and the other files shown in the sidebar. I want to know how to start with this problem and where I can find the dataset. There are two files: one is the reference, and the test file shows only 3-5 problems. The test set shows, I think, approximately 10 problems, and one PDF contains 10 problems. Can anyone guide me on how to start working on this and where I can get the dataset? Kindly help me, thanks.
Anyone still interested in teaming up?
yes I am
Is there a vacancy?
I have the same issue!!! Can someone help??
I think use prompts
Is internet allowed during submission?
No It is not
Can anyone explain the message "Your submission file must be named submission.parquet" shown when you try to submit your notebook? I don't see anything about saving a parquet file in the provided example submission notebook.
is gemma 2 allowed?
Would anybody be willing to team up for the AI Mathematical Olympiad competition? If so, respond
I'm willing to team up
I want to write a Kaggle post detailing the public information (i.e. where they worked, what they recently published) of the team members of NemoSkills and PolyMath. I wonder if it is appropriate
Can you please elaborate more on how to use prompts! Thank you in advance
Hey, I am a beginner here. I don't know anything about Kaggle. Can anyone explain to me the link between AI and mathematics? Is the exam over?
Well they are from Nvidia they have GPUs
Anybody from NVIDIA want to team up with a poor man and bring him to his first gold medal ?
hi guys my first ever submission on kaggle, is this normal running time ?
Yes, it takes almost 5 hrs
😆 that is very normal. The first time I ran a heavy model it took me by surprise too....
what do you think would happen if the private test set has USAMO level problems. Would NemoSkills model hold or would the leaderboard be reversed??
Why no one tried rStar approach yet ?
Is symbolic reasoning(Sympy) explicitly needed in rStar maths method ?
tried it but don't have the kind of compute available to actually do it well unfortunately 😔
what do u mean ? does it require that much compute even for a single run ?
yeah if you see in the repo the first step is to generate a bunch of step wise training data using rollouts. i was running it using the 4xL4 setup and timed out after 5-6 hours and only got to like 5 rollouts
i tried smaller and smaller models, decreasing the amount of exploration in the params, didn't really make it close to being feasible
wdym by 4xL4 setup btw 🤔 😅
um the 4 L4s provided for this competition?
what L4s u mean ?
these??
oh u meant that 😅
what models did u try btw ?
i think i tried deepseek 1.5B, qwen 1.5B and 0.5B
oh did u try phi4 ?
i was thinking to try 7Bs 😅
yeah my reasoning for only trying those models was that if i couldn't get the rollouts to finish in time using the small models, no way i could with the larger models
make sense so does that mean rStar isnt feasible at all for this 😞
honestly, unfortunately almost any type of method that requires even a little bit of finetuning is out of reach for anyone without access to external GPU compute
sensible 😔
anyways did u try mathstral ?
having problem loading it smh or im not getting how to use it with transformers
nope i haven't
u r at 1334th in lb btw ?
i have no idea really, most of my work has been unsubmitted. i just submitted the early submission awq qwen 32B model for benchmark and left it at that
would've liked to have done more with this competition, but too busy this month unfortunately
with jane street one ?
Our team did not look into the problem yet, but the public nbs give a pretty good score tbh
but
apparently they are pretty unstable
nbs ???
notebooks
I am in 2 other comps concurrently
hence I seldom have gpu remaining at the end of the week
- 4xL4 worker groups consume twice as much quota(iirc)
||can u teach me rStar method sir||
||anyways lemme get u dqd using it as evidence that u use alt accounts n instead of the person u replied is evident that it was ur alt acc||
😂
😂
- I am extremely busy with non AI stuff, which I prob should be doing more than AI rn 😂
🛐
nah college stuff
jane street i barely gave any time to too
how much score did this get
i didn't submit it because i didn't have compute to actually finish the whole process
also you can just train on whitelisted models and that would be considered okay to use, right
it scores whatever it has answered right?
im not sure how the entire evaluation part works
but isnt that why smaller models are getting better scores than bigger ones like 32b
since they're answering more questions
i didn't finish generating the stepwise training data required for the SFT for rstar math bruh 💀
what
i dont think you have to train and submit in the same notebook
also rstar then sft wth
sft would just be bad for this
sft is part of rstar...
please look at the steps to implement rstar
i think a lot of people misunderstand how rstar works because of the recent RL hype
oh like coldstart training
not really no
posted some of my unexplored ideas here
https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/567419
Is sft really part of it ?
Oh right thnx
I did not read the paper, but in their abstract they specifically mentioned without finetuning right?
@mortal vault
just check out the repo yall https://github.com/microsoft/rStar
and read through this discussion https://github.com/microsoft/rStar/issues/6 it helped me understand how the thing works
thanks
I get that I am an idiot, why are you making fun of me?
Is it too late to join competition?
Its never too late to join
Each time in a featured competition top teams in lb we see newbies
Who 've never been into any previous competition
I'm new to this competition. I'm a bit confused with whitelisting of models. Do we need to use them as they are for the competition or can we finetune them with any public data offline, use the finetuned models for competition and publish them later?
@steady stirrup
im not sure, im not much into this comp, but I believe you can finetune
But that whitelisting thing ??
read the discussion page once
I've read but still in doubt do we 've to request for whitelisting our fine tuned models to use them ?
anyone having issues saving and running notebooks? when i do that, not quick save, my notebook get queued for more than an hour. never seems to start...
That's not normal ig
Yeah , never experienced it before on kaggle
Could be a possibility u r trying to load a larger llm consuming lot of vram than usual
it hasnt gotten to the stage where the model is loaded yet, it hasnt even started running the notebook. the activity log says its queued
and its just the normal 32B preview in 4bit awq
It happened to me once
it might be because of high demand of l4 worker groups
im not sure tho
Thanks.
Thnx alot sir
I have a question. When you guys train an LLM model, how do you do it? Training the entire model takes a very long time. My model is deepseek-r1-distill-qwen-7b-awq-casperhansen which is in public notebook.
It took too much time, so I cut the dataset down to 1/10. However, it still takes a long time with 4x L4 GPUs.
Don't u use bitsandbytes ?
If yes also try gradient accumulation
If yes also try instruction tuned slms around 135-200M ones
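Since gradient accumulation came up: a tiny sketch of the idea on a toy one-parameter model, in plain Python so it runs anywhere. It assumes equal-sized micro-batches and a mean-reduced loss, in which case averaging the micro-batch gradients gives the same gradient as one large batch while only a small batch sits in memory at a time. All names here are made up for illustration.

```python
from typing import List


def grad_mse(w: float, xs: List[float], ys: List[float]) -> float:
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w: float, xs: List[float], ys: List[float], micro_batch_size: int) -> float:
    """Accumulate gradients over equal-sized micro-batches before one update."""
    if len(xs) % micro_batch_size != 0:
        raise ValueError("this sketch assumes equal-sized micro-batches")
    n_micro = len(xs) // micro_batch_size
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        # Scale each micro-batch gradient by 1/n_micro, as if the loss were divided.
        total += grad_mse(w, xs[start:stop], ys[start:stop]) / n_micro
    return total


xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0], 1.5
print(grad_mse(w, xs, ys), accumulated_grad(w, xs, ys, micro_batch_size=2))
```

Accumulation trades memory for time: it does not make training faster, it only lets a larger effective batch fit on the same GPU.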
Nope, i try my own code to train.
Then, i try to change my code to using Lora or Quantization...
I'm not very experienced with LLMs, so there might be some mistakes in my approach.
Theoretically, I've read the R1 paper and implemented the RL method from that paper in my own way, and currently, I'm training the model based on it. I'm wondering if applying LoRA would be an appropriate approach in this case. Could you advise me on this?
Most ppl said they were too expensive, so I'm not considering them. I'm mostly considering ReFT anyway, though I'm still learning it first and haven't tried it yet, but it seems good enough
Thanks for the suggestion! Actually, I haven't looked deeply into ReFt yet, but based on your advice, it seems like a promising approach. I'll check it out and see if it fits better than LoRA for my use case. If you happen to try it before me, let me know how it goes!
Hey I have a question...so after my GRPO run I saved lora weights and did inference using vllm + lora weights...is this why the model was not as good...should i have saved the entire model instead?
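On the LoRA-vs-full-model question, a small sketch of the underlying math may help. It assumes the usual LoRA formulation, where the adapted layer computes W x + scale * B (A x) with scale = alpha / r. In exact arithmetic this is identical to using a merged weight W + scale * B A, so saving only the adapter is not by itself an obvious cause of worse results. Real-world differences, if any, could come from things like a quantized base model, dtype, or a mismatched scaling factor; those are possibilities, not a diagnosis. Plain Python lists are used so the example is self-contained.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(p: Matrix, q: Matrix) -> Matrix:
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, scale: float, x: Vector) -> Vector:
    """Base output plus the low-rank update applied on the fly."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base, delta)]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Fold the adapter into the base weight: W + scale * B A."""
    ba = matmul(b, a)
    return [[wi + scale * di for wi, di in zip(rw, rd)] for rw, rd in zip(w, ba)]


W = [[1.0, 2.0], [3.0, 4.0]]   # d_out x d_in base weight
A = [[1.0, 0.0]]               # r x d_in, rank r = 1
B = [[2.0], [1.0]]             # d_out x r
scale = 2.0                    # alpha / r with alpha = 2
x = [1.0, 1.0]

print(lora_forward(W, A, B, scale, x))
print(matvec(merge_lora(W, A, B, scale), x))
```

Both prints give the same vector, which is the sense in which an adapter and a merged checkpoint are interchangeable at inference time.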
From what I've seen in the discussion, there seems to be a tendency for trained models to perform worse. Please check the following link:
https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/568061
thank you this is very helpful
But now it has changed...
https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/discussion/568509#3151143
oh lord
excuse me, is fine-tuned version of models released after October 1, 2024 allowed in this contest?
I just noticed this in the rule.
Machine Learning Algorithms You Never Knew Existed, But Are Quite Useful https://medium.com/pythoneers/machine-m. D
Hi everyone, our team is new and would appreciate any guidance. Our submissions consistently fail with an "unhandled error at runtime," but we notice the Logs tab always shows "successfully ran in x seconds." We're curious if the run in the Logs is separate from the actual scoring run. Is it common for the notebook to run twice? Thanks in advance!
Just a quick question, if we submitted our notebook at the last minute before the deadline, and it was running the public test, will the result still count?
Nope no point better luck/try next time
Is it possible to modify the notebook selection at this point?
Wdym ?
Kaggle auto-selected two submissions. We would like to replace one of the auto-selected submissions with a more recent notebook that has the same public score, if it is possible.
If it's already over ofc not possible
But maybe as practice competition u might submit
will there be a aimo - progress prize 3?
Next year
Will you publish the hidden dataset at some point,
or provide some ways for us to test our model for research?
What's the level of the third contest? National team selection or IMO?
Yes in between