#ai-mathematical-olympiad-progress-prize-2

floral inlet
inland wedge
#

Welcome to the 2nd AIMO progress prize. We have launched today!
I'm Frieder, the AIMO Prize Manager, and I'm here to answer questions (although I'll more often be found on the Kaggle forum - post there if you really need an answer quickly 😉).

Problems are harder this year (national Olympiad level) and AI-resistant (your favourite open-weight LLM will score close to zero).
The prize pot has doubled to two million dollars, and you are allowed to use the latest open-weight models.

Surprise us with some cool new math-AI models, and climb up the Kaggle leaderboard! 🪜

lavish bear
#

SO EXCITED

shut mica
#

I am getting

GatewayRuntimeError: (<GatewayRuntimeErrorType.SERVER_RAISED_EXCEPTION: 3>, "name 'submission' is not defined"), is anyone else getting the same error?

outer sonnet
sick grotto
#

it is indeed decoder-only and autoregressive.

outer sonnet
#

Thank you very much

#

I've been able to roughly reproduce text generation by taking the last logits of every iteration to predict the next token (taking the mean over all of them was not working at all, I think because the model is causal and each token doesn't see any token after it, but I'm not sure about that). I've obtained quite similar text, but not the same. Does anyone know where I can find the model.generate.forward implementation, or how it proceeds?
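
A minimal sketch of the loop described above, assuming plain greedy decoding: at each step only the logits at the last position are used, the argmax token is appended, and the model is run again on the longer sequence. `toy_logits`, `VOCAB_SIZE` and `EOS_ID` are hypothetical stand-ins for a causal LM's forward pass and vocabulary, not the library's actual `generate` code. If the library call samples (for example with a temperature) instead of taking the argmax, its output can differ from a greedy loop like this one.

```python
from typing import Callable, List

VOCAB_SIZE = 5  # hypothetical tiny vocabulary
EOS_ID = 0      # hypothetical end-of-sequence token id


def toy_logits(tokens: List[int]) -> List[List[float]]:
    """Hypothetical stand-in for a causal LM: one row of logits per position.

    Row i depends only on tokens[: i + 1], which mimics causal masking.
    """
    rows = []
    for i in range(len(tokens)):
        favoured = (sum(tokens[: i + 1]) + 1) % VOCAB_SIZE
        rows.append([1.0 if v == favoured else 0.0 for v in range(VOCAB_SIZE)])
    return rows


def greedy_decode(
    model: Callable[[List[int]], List[List[float]]],
    prompt: List[int],
    max_new_tokens: int,
) -> List[int]:
    """Append the argmax of the last-position logits until EOS or the budget."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        last = model(tokens)[-1]  # only the final position predicts the next token
        next_id = max(range(len(last)), key=last.__getitem__)
        tokens.append(next_id)
        if next_id == EOS_ID:
            break
    return tokens


generated = greedy_decode(toy_logits, prompt=[1, 2], max_new_tokens=4)
print(generated)
```

Averaging the logits over all positions mixes predictions for different next tokens, which is consistent with the observation that it did not work.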

acoustic rampart
#

Is there any guide on how to get started with the solution?

acoustic rampart
sick grotto
#

As for finetuning models in general, there are many guides on the internet. It is important that you find good math datasets. Take a look at the solutions from the previous edition of this contest for some inspiration. I’m sure there are also datasets mentioned in the discussion.

acoustic rampart
sick grotto
#

Search Hugging Face for deepseek 7b math

#

More explanation is given there

#

It’s also available on Kaggle by just searching for it in the Models tab. Since then, other good math models have also been released, such as the Qwen2.5 Math series.

acoustic rampart
sick grotto
#

It's a freely available model

#

What do you mean costly?

acoustic rampart
sick grotto
#

The model can be downloaded

north island
sick grotto
#

And can be used in kaggle by loading it with code in a notebook and running it

#

I recommend looking at other people’s public notebooks to see how they did that.

outer sonnet
modest sun
#

@inland wedge Is it possible to relax a rule about Googlers potentially receiving a prize?

inland wedge
#

@modest sun Googlers, as well as any other large lab, are very welcome to participate as long as they follow our rules, in particular our transparency requirements (meaning: all LLMs have to be open weight, you have to document your training process, etc.). 😉

modest sun
# inland wedge @modest sun Googlers, as well as any other large lab, are very welcome...

It's about this part of the rules:
I found out that Kaggle is Alphabet's subsidiary
B. Unless otherwise stated in the Specific Competition Rules above or prohibited by internal policies of the Competition Entities, employees, interns, contractors, officers and directors of Competition Entities may enter and participate in the Competition, but are not eligible to win any Prizes. Individuals or entities who were engaged, employed or contracted by the Competition Sponsor or its affiliates to advise on the Competition are prohibited from entering the Competition. "Competition Entities" means the Competition Sponsor, Kaggle Inc., and their respective parent companies, subsidiaries and affiliates. If you are such a participant from a Competition Entity, you are subject to all applicable internal policies of your employer with respect to your participation.

inland wedge
#

Ok, I see what the issue is, let me get back to you on that

maiden flicker
#

Hi !
So we are back! It's been a couple of days, and the competition is already eating my Kaggle GPU quota and my colab compute units 😅 , and I love it! 😎
Good luck everyone!

dense vale
tight cobalt
#

And Keras is a layer built on top of TensorFlow, btw. Think of it as boilerplate code that interfaces with it (built by François Chollet for use with TensorFlow for AI tasks). If you are using it, you are using TensorFlow, in a nutshell.

outer sonnet
#

SpinDoctorWalker is right, Hugging Face usually gives both choices. But for this specific model it looks like only torch is available. I'm sure it can be translated, but it won't be a trivial task.

tight cobalt
#

@dense vale The only things you can't use are things such as ChatGPT, or Gemini, or other LLMs you would access online

dense vale
tight cobalt
#

also, it's kinda expected that you can import either in all Kaggle competitions

dense vale
tight cobalt
#

had to check it @dense vale. A definitive no, I think, because this requires you to access a resource that is online. And one of the requirements of the competition is no external internet access. So you can make use of all the APIs you want, just not ones that interact with an external source when a potential user interacts with it.

#

and it doesn't even count as a freely available source of information either (which is allowed in competition)

dense vale
tight cobalt
dense vale
#

but still, how can the code in a Kaggle notebook be executed without internet?

tight cobalt
tight cobalt
dense vale
inland wedge
#

To clarify: Your model has to be open-weight (you will probably download it from somewhere), and it then needs to run on the Kaggle container without internet, within the given GPU budget. For AIMO1, a number of top-scoring teams actually spent significant amounts of time on engineering tricks to get big LLMs to run on the given compute budget. That is one way to approach this competition, but not the only way ...

dense vale
thorn yacht
#

Hi, can someone suggest a nice dataset that I could use for benchmarking or fine-tuning? Is the dataset from the first competition available?

analog sparrow
#

The MCTS comp is eating up all my GPU quota haha, probably will only join after it ends 🤣

inland wedge
shut mica
dense vale
acoustic rampart
dense vale
#

Is anyone familiar with Lean, a programming language used by DeepMind's ultra model, which claims a good score on IMO problems?

storm atlas
#

Is using two separate models, or two LoRA adapters on top of the same model, in line with the rules?

solemn nymph
#

Could anyone explain how I can resolve this error? Thank you

/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py:230: UserWarning:
NVIDIA L4 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_60 sm_70 sm_75 compute_70 compute_75.
If you want to use the NVIDIA L4 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(

this error

storm atlas
storm atlas
hoary granite
solemn nymph
#

@acoustic rampart @storm atlas Here is the screenshot of the code that led to this output

I went to the competition page, opened a new notebook from there and ran this code block, and then I get this error.

shut mica
#

probably need a different PyTorch installation? (no idea, haven't done this)

solemn nymph
#

I tried it. When I make my own notebooks I get this error; I even tried changing the PyTorch and CUDA versions

#

I also tried by copying someone else’s notebook from the competition and edited it; then it worked properly

modest sun
#

@inland wedge Another gentle ping about Googlers participation 😄

inland wedge
#

@modest sun Please give me a bit of time - the competition will run long enough, so there is no need to rush. We have an internal process, and I'm working through it, but there are several parties involved whose feedback I still need to collect before we can arrive at a decision.

modest sun
#

Ah yes, of course, sorry for that, I won't ping you anymore

inland wedge
#

It may indeed look like I've forgotten, since I haven't posted, but behind the scenes we're pretty active 🙂

#

I'll be sure to let you know when a decision has been reached

acoustic rampart
inner warren
#

hello, i'm new here, i get one team

shut mica
#

lol i saw this ad (censored identifying information, I am not promoting this)

analog sparrow
#

Is this Numina ? Lol

#

But honestly it looks like a scam

shut mica
#

idk maybe if your annotations are bad you don't earn money

#

but it's educational to learn what they are annotating

tawdry vale
thin vector
#

Hello, I've just joined, are we allowed to install/use extra libraries in our notebook? I noticed that "Internet access disabled" is mentioned in the code requirements and I can't seem to pip install extra dependencies on my copy of the Demo submission

outer sonnet
#

upload them as datasets and install from them
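
A rough sketch of that workflow, assuming the wheel files were downloaded beforehand on a machine with internet access (for example with `pip download <package> -d wheels/`) and uploaded as a Kaggle dataset. The dataset path and package name below are hypothetical placeholders. `--no-index` and `--find-links` are standard pip options that make pip resolve packages only from the given local directory.

```python
import sys
from typing import List


def offline_pip_command(wheel_dir: str, packages: List[str]) -> List[str]:
    """Build a pip command that installs only from a local wheel directory."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                   # never contact PyPI
        f"--find-links={wheel_dir}",    # look for wheels in this directory instead
        *packages,
    ]


# Hypothetical dataset path and package name, for illustration only.
cmd = offline_pip_command("/kaggle/input/my-wheels", ["some-package"])
print(" ".join(cmd[1:]))

# In a notebook cell this could then be executed with:
#   import subprocess; subprocess.run(cmd, check=True)
```

The wheels need to match the Python version and platform of the Kaggle image, and dependencies of the package have to be in the same directory, otherwise the offline install fails.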

shut mica
#

There is this utility scripts feature, but I haven't researched how to use it

tight cobalt
fair shoal
#

Hi guys, I'm new to the competition and I'm having a great deal of trouble with vllm. Currently, I am installing vllm with the Kaggle package manager, but I always get errors when importing vllm, or when trying to start an LLM API server with vllm. The error is usually that vllm tried to call some pytorch or torchvision method that does not exist, but the exact error depends on the exact versions of pytorch, torchvision and vllm that are installed. I've tried dozens of configurations now and am still having no luck. Any suggestions to get vllm working would be greatly appreciated

minor inlet
# fair shoal Hi guys, I'm new to the competition and I'm having a great deal of trouble with ...

Hey!! I encountered the same issue with vllm on Kaggle, and after some searching in the forums, I found that it's a common problem due to version mismatches between libraries used to run vllm.

Here’s the solution!! Coming from @pseudo remnant
Just add a Utility Script that fixes these issues. Go to your Kaggle notebook, select Add Input > Utility Script, and search for "vLLM Installation Fix" or abdullahmeda/vllm-installation-fix. Add it to your notebook, and after that, you should be able to import vllm without issues. Easy fix!

If you’re new to adding inputs or need help with it, feel free to ask!

fair shoal
north wyvern
#

guys, im looking for a team for this one, im into math and i need some exp in ml

#

and want to chill on call while working on it

lime raptor
shut mica
dense vale
#

anyone in fimo ??

austere pecan
#

Hi,
Is there a channel to discuss and understand solutions of the problems?

I tried solving the "3 Airline companies" Problem by hand. In my solution, I found the greatest consecutive days could be 99 days. I checked the solution to find the correct answer is: 79 days.
For my solution, the A120 flight departs along with first A100 flight. I have attached the image to this thread.

Could someone comment: is my solution a valid alternate solution or am I violating some pre-requisite condition?

shut mica
#

The question text here

Three airline companies operate flights from Dodola island. Each company has a different schedule of departures. The first company departs every 100 days, the second every 120 days and the third every 150 days. What is the greatest positive integer d for which it is true that there will be d consecutive days without a flight from Dodola island, regardless of the departure times of the various airlines?

If I find a configuration of departure times where the maximum interval is less than 80, the answer cannot be 80 or above. The "original solution" has shown such a configuration.

#

The question asks for a minimum possible "x" over all possible configurations. Given a configuration, "x" is the maximum interval.

The question did NOT ask for a maximum possible "x" over all possible configurations.
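
A small brute-force check of this min-over-configurations reading, as a sketch under stated assumptions: days are integers, the first company is fixed at day 0 (shifting all schedules together changes nothing), and the whole pattern repeats every 600 days, the least common multiple of 100, 120 and 150. The function names are made up for illustration.

```python
from functools import reduce
from math import gcd

PERIODS = (100, 120, 150)
CYCLE = reduce(lambda a, b: a * b // gcd(a, b), PERIODS)  # lcm = 600


def longest_quiet_run(offsets):
    """Longest run of consecutive flight-free days for one configuration."""
    days = sorted({
        (off + k * p) % CYCLE
        for off, p in zip(offsets, PERIODS)
        for k in range(CYCLE // p)
    })
    # Differences between consecutive flight days, wrapping around the cycle.
    gaps = [b - a for a, b in zip(days, days[1:])]
    gaps.append(days[0] + CYCLE - days[-1])
    return max(gaps) - 1  # days strictly between two flights


def guaranteed_quiet_days():
    """Minimum over all configurations of the longest flight-free run."""
    return min(
        longest_quiet_run((0, b, c))
        for b in range(PERIODS[1])
        for c in range(PERIODS[2])
    )


print(longest_quiet_run((0, 100, 20)))  # one configuration that attains the minimum
print(guaranteed_quiet_days())
```

A single configuration with a 99-day gap only shows that 99 is possible for that schedule; the question asks what is guaranteed for every schedule, which is why the minimum is taken.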

stiff hill
#

that airline question is interesting, because I looked at it and guessed the answer was 79 or 80 (didn't have time to reason which) in < 10 seconds without doing any calculation. So it's not true that these new problems in progress prize 2 are all very difficult to guess

outer sonnet
#

They're exactly 1/100 hard to guess

shut mica
#

20/50 gets the prize. So close!

lethal magnet
#

Hi, I'm new to Kaggle, what might be the reason I can't submit?

runic path
#

Click save version first

#

Save/run all

dim jetty
#

Has anyone tried to use RAG?

wispy grail
#

hi

pseudo willow
tight cobalt
hot yoke
#

Is it possible to see the running time of submission?

#

as in this contest it might be pretty important

shut mica
worthy kraken
#

Hi,

I wanted to discuss the professional aspect of fine-tuning approaches.

To clarify, I have no issue with teams fine-tuning models to achieve higher scores. However, it seems that very few participants actually have access to sufficiently powerful GPUs for such tasks.

This brings me to wonder:
What are the actual limitations of fine-tuning these models?

Apologies if I’ve missed any prior discussions on this topic. If there are relevant discussions, please feel free to point me in their direction.

lime raptor
#

hello @everyone, I am a little bit confused about the dataset and the other files shown in the sidebar. I want to know how to start with this problem and where I can find the dataset. There are two: one is the reference, the test file shows only 3-5 problems, the test set shows, I think, approximately 10 problems, and one PDF contains 10 problems. Can anyone guide me on how to start working on this and where I can get the dataset and more? Kindly help me, thanks.

minor inlet
#

Anyone still interested in teaming up?

lime raptor
tardy dragon
fluid maple
lime raptor
cursive wolf
#

Is internet allowed during submission?

minor inlet
#

No, it is not

limpid ether
#

Can anyone explain the message "Your submission file must be named submission.parquet" shown when you try to submit your notebook? I don't see anything about saving a parquet file in the provided example submission notebook.

cursive wolf
#

is gemma 2 allowed?

green flint
#

Would anybody be willing to team up for the AI Mathematical Olympiad competition? If so, respond

shut mica
#

I want to write a Kaggle post detailing the public information (i.e. where they worked, what they recently published) of the team members of NemoSkills and PolyMath. I wonder if that is appropriate

fluid maple
olive lark
#

Hey, I am a beginner here. I don't know anything about Kaggle. Can anyone explain to me the link between AI and mathematics? Is the exam over?

fresh prawnBOT
#
onebelowall1218 has been warned

Reason: Bad word usage

#
onebelowall1218 has been banned

Reason: Too many infractions

frank wagon
#

is anybody seeing this?

#

@shut mica I would like your thoughts on this

shut mica
#

Well they are from Nvidia they have GPUs

frank wagon
grand palm
#

Anybody from NVIDIA want to team up with a poor man and bring him to his first gold medal?

shell cloak
#

hi guys my first ever submission on kaggle, is this normal running time ?

tardy dragon
#

Yes, it takes almost 5 hrs

sharp loom
#

what do you think would happen if the private test set has USAMO level problems. Would NemoSkills model hold or would the leaderboard be reversed??

dense vale
#

Why has no one tried the rStar approach yet?

#

Is symbolic reasoning (SymPy) explicitly needed in the rStar-Math method?

mortal vault
dense vale
mortal vault
#

i tried smaller and smaller models, decreasing the amount of exploration in the params, didn't really make it close to being feasible

dense vale
mortal vault
dense vale
mortal vault
dense vale
mortal vault
#

i think i tried deepseek 1.5B, qwen 1.5B and 0.5B

dense vale
#

i was thinking to try 7Bs 😅

mortal vault
#

yeah my reasoning for only trying those models was that if i couldn't get the rollouts to finish in time using the small models, no way i could with the larger models

dense vale
mortal vault
#

honestly, unfortunately almost any type of method that requires even a little bit of finetuning is out of reach for anyone without access to external GPU compute

dense vale
#

having a problem loading it smh, or I'm not getting how to use it with transformers

mortal vault
dense vale
mortal vault
#

i have no idea really, most of my work has been unsubmitted. i just submitted the early submission awq qwen 32B model for benchmark and left it at that

#

would've liked to have done more with this competition, but too busy this month unfortunately

steady stirrup
#

but

#

apparently they are pretty unstable

steady stirrup
#

notebooks

#

I am in 2 other comps concurrently

#

hence I seldom have GPU quota remaining at the end of the week

#
  • 4xL4 worker groups consume twice as much quota (iirc)
dense vale
dense vale
steady stirrup
steady stirrup
mortal vault
#

jane street i barely gave any time to too

lost seal
mortal vault
lost seal
#

also, you can just train on whitelisted models and that would be considered okay to use, right?

lost seal
#

im not sure how the entire evaluation part works

#

but isnt that why smaller models are getting better scores than bigger ones like 32b

#

since they're answering more questions

mortal vault
#

i didn't finish generating the stepwise training data required for the SFT for rstar math bruh 💀

lost seal
#

what

#

i dont think you have to train and submit in the same notebook

#

also rstar then sft wth

#

sft would just be bad for this

mortal vault
#

sft is part of rstar...

#

please look at the steps to implement rstar

#

i think a lot of people misunderstand how rstar works because of the recent RL hype

lost seal
#

oh like coldstart training

mortal vault
#

not really no

mortal vault
dense vale
steady stirrup
#

@mortal vault

mortal vault
#

oh dear

#

um

#

so i meant rstar math when i was saying rstar

steady stirrup
#

nvm

mortal vault
steady stirrup
#

I get that I am an idiot, why are you making fun of me?

stiff pelican
#

Is it too late to join competition?

dense vale
#

Each time in a featured competition, we see newbies among the top teams on the LB

#

who've never been in any previous competition

stiff pelican
#

Oh, that's cool

#

Thanks!

slow sonnet
#

I'm new to this competition. I'm a bit confused with whitelisting of models. Do we need to use them as they are for the competition or can we finetune them with any public data offline, use the finetuned models for competition and publish them later?

steady stirrup
dense vale
steady stirrup
dense vale
alpine coral
#

anyone having issues saving and running notebooks? when i do that (not quick save), my notebook gets queued for more than an hour and never seems to start...

alpine coral
#

Yeah , never experienced it before on kaggle

dense vale
alpine coral
#

it hasnt gotten to the stage where the model is loaded yet, it hasnt even started running the notebook. the activity log says its queued

#

and its just the normal 32B preview in 4bit awq

steady stirrup
#

it might be because of high demand of l4 worker groups

#

im not sure tho

fresh prawnBOT
#
ren_truecon has been warned

Reason: Bad word usage

#
ren_truecon has been banned

Reason: Too many infractions

stiff pelican
#

I have a question. When you guys train an LLM model, how do you do it? Training the entire model takes a very long time. My model is deepseek-r1-distill-qwen-7b-awq-casperhansen which is in public notebook.

#

It took too much time, so I made a dataset 1/10 the size. However, it still takes a long time with 4x L4 GPUs.

dense vale
#

If yes also try gradient accumulation
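
For context, a minimal sketch of what gradient accumulation does, written in plain Python with a hypothetical one-parameter linear model rather than any particular training framework. Gradients from several small micro-batches are averaged before a single optimizer step, which reproduces the gradient of one large batch while only one micro-batch has to fit in memory at a time. The data, learning rate and function names are made up for illustration, and the equality shown assumes equally sized micro-batches.

```python
def mse_grad(w, batch):
    """Gradient w.r.t. w of the mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Average per-micro-batch gradients, as done before one optimizer step."""
    micro_batches = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    total = 0.0
    for mb in micro_batches:
        # Scale each contribution so the sum is the mean over micro-batches.
        total += mse_grad(w, mb) / len(micro_batches)
    return total


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # toy (x, y) pairs
w = 0.5

full = mse_grad(w, data)                                # one batch of 4
accum = accumulated_grad(w, data, micro_batch_size=2)   # two micro-batches of 2
print(abs(full - accum) < 1e-9)

w_new = w - 0.01 * accum  # a single SGD step after accumulation
```

Accumulation reduces peak memory, not total compute, so by itself it does not shorten training time.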

#

If yes, also try instruction-tuned SLMs, around the 135-200M ones

stiff pelican
#

Nope, I use my own code to train.

#

Then I'll try changing my code to use LoRA or quantization...

#

I'm not very experienced with LLMs, so there might be some mistakes in my approach.
Theoretically, I've read the R1 paper and implemented the RL method from that paper in my own way, and currently, I'm training the model based on it. I'm wondering if applying LoRA would be an appropriate approach in this case. Could you advise me on this?

dense vale
stiff pelican
cursive wolf
#

Hey, I have a question... After my GRPO run I saved the LoRA weights and did inference using vllm + LoRA weights. Is this why the model was not as good? Should I have saved the entire model instead?
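
One part of this can be checked independently of any library: in exact arithmetic, applying a LoRA adapter at inference time and merging it into the base weights give the same output, since W x + s·B(A x) = (W + s·B A) x with s = alpha / r. The sketch below uses tiny hand-written matrices (all values hypothetical) to show that identity. It says nothing about other possible causes of a quality gap, such as a quantized base model, a mismatched scaling factor, or the adapter not actually being loaded.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    return [[s * x for x in row] for row in a]


# Hypothetical 2x2 base weight with a rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5], [-1.0]]
A = [[2.0, 1.0]]
alpha, r = 2.0, 1
s = alpha / r  # LoRA scaling factor

x = [[1.0], [-1.0]]  # input column vector

# Adapter kept separate at inference time: W x + s * B (A x)
adapter_out = add(matmul(W, x), scale(matmul(B, matmul(A, x)), s))

# Adapter merged into the weights first: (W + s * B A) x
merged_out = matmul(add(W, scale(matmul(B, A), s)), x)

print(adapter_out, merged_out)
```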

stiff pelican
stiff pelican
distant fulcrum
#

Excuse me, are fine-tuned versions of models released after October 1, 2024 allowed in this contest?

#

I just noticed this in the rules.

shrewd hemlock
past grotto
#

Hi everyone, our team is new and would appreciate any guidance. Our submissions consistently fail with an "unhandled error at runtime," but we notice the Logs tab always shows "successfully ran in x seconds." We're curious if the run in the Logs is separate from the actual scoring run. Is it common for the notebook to run twice? Thanks in advance!

frank eagle
#

Just a quick question: if we submitted our notebook at the last minute before the deadline, and it was still running the public test, will the result still count?

dense vale
#

Nope, no point. Better luck next time

warm fossil
#

Is it possible to modify the notebook selection at this point?

warm fossil
#

Kaggle auto-selected two submissions. We would like to replace one of the auto-selected submissions with a more recent notebook that has the same public score, if it is possible.

dense vale
#

If it's already over ofc not possible

#

But maybe as practice competition u might submit

alpine coral
#

Will there be an AIMO Progress Prize 3?

dense vale
#

Next year

frank eagle
#

Will you publish the hidden dataset at some point,
or provide some way for us to test our models for research?

distant fulcrum
dense vale
#

Yes in between