#nvidia-nemotron-model-reasoning-challenge
hello!
Wow
'sup everyone!
Is there anyone from support forum here ?
Any mentor here? Need help - I am not able to join the challenge
Hello everyone! I'm a senior right now, about to get my BSCS in the U.S., and am an aspiring AI Engineer looking to join a team. This will be my first Kaggle challenge, so I'm very new to all of this, but I have internship experience with LLMs and am currently doing undergrad research focusing on finetuning LLMs. Please DM me if you're interested in me joining your team.
Kaggle Staff and Hosts running competitions will not monitor Discord or be available to answer questions in Discord.
I am having a hard time getting Peft/LORA training working. I have described the issue here https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/682877 and made a public notebook here https://www.kaggle.com/code/antonkratz/nvidia-nemotron-finetune-ipynb
Several others report very similar issues triggering the "CUDA error: no kernel image is available for execution on the device". I believe causal_conv1d and mamba_ssm are not compiled correctly for the Kaggle competition environment. I think this should really be fixed in the base environment.
Am I holding it wrong? Is anyone actually successfully doing SFT with the provided RTX Pro 6000 environment? I find it super strange that this is not addressed.
yes tell me
Now I am in, actually issue was in my Kaggle account and it got fixed - thank you
great
There seems to be no pattern behind these types of problems in the training data:
7f5758a8,"In Alice's Wonderland, a secret set of transformation rules is applied to equations. Below are a few examples:
?!*{# = ?!{#
?'+?> = )>
{{-)@ = ?>
'?-#> = -?(
Now, determine the result for: )>+<#",'!!
beware the ", in the last line are added by the csv format
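The point about CSV quoting can be checked with a short sketch. It parses the row quoted above with Python's standard `csv` module. The names `row_id`, `prompt`, and `answer` are assumptions for illustration, not necessarily the competition's real column names.

```python
import csv
import io

# The training row exactly as quoted above (one CSV record spanning several lines).
raw = """7f5758a8,"In Alice's Wonderland, a secret set of transformation rules is applied to equations. Below are a few examples:
?!*{# = ?!{#
?'+?> = )>
{{-)@ = ?>
'?-#> = -?(
Now, determine the result for: )>+<#",'!!
"""

# csv.reader understands the double-quoted multi-line field,
# so the closing `",` is consumed as CSV syntax, not as puzzle text.
rows = list(csv.reader(io.StringIO(raw, newline="")))
row_id, prompt, answer = rows[0]

print(repr(prompt.splitlines()[-1]))
print(repr(answer))
```

Read this way, the query is `)>+<#` and the expected answer is `'!!`; the quote and comma belong to the file format only.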
Ya removing this actually increased my score from 0.65 to 0.66
There must be some kind of pattern that we need to understand and specify while training the model, because if there isn't one and these examples are included in training, it will make learning worse
Perhaps the heavy usage of special characters changed the composition of the problems somewhere in the generation pipeline such that they all came out with no underlying pattern
I did two identical submissions. One scored 0.50 and the other 0.51. A change of 0.01 does not mean anything because the LLM is not deterministic.
Right now you are just giving the model the examples and waiting for it to predict. That works well for other kinds of usage, but for pattern recognition or maths you need to teach the model an approach. There are 6 types of pattern in the data, each with 4 to 5 sub-parts. You need to apply all of them to the examples; when one gives the correct output, put that in the question, and mention everything in the data about how you approached it. You can write a Python script to test that and write up the approach. In AIMO the data already has reasoning on how to solve the question and what the steps are. By the same logic, you need to teach it to think, not to predict
Hiya! Does anyone know how to get access to the G4 VMs to start the training on Kaggle? How is it to be accessed?
Are they to be accessed through Kaggle, or are they Accessible through GCP?
Hey @verbal jetty, not sure if we agree or disagree. I simply wanted to point out that the LLM answer is non-deterministic and that small fluctuations in performance are normal. To really see the effect of small changes, it would be necessary to measure the performance many times and thus get a measure of confidence in the measurement. I do think that the "wonderland" preamble just wastes tokens and could potentially distract the LLM. Anyway, godspeed!
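A rough sketch of what "measure many times" could look like in numbers. The two scores 0.50 and 0.51 come from the message above. The test-set size of 1000 questions is a made-up assumption (the real size is not stated in this chat), and the binomial formula is only a yardstick for how much an accuracy measured on n questions can wobble.

```python
import math

def mean_and_stderr(scores):
    """Sample mean and standard error of the mean for repeated runs."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)  # unbiased estimate
    return mean, math.sqrt(variance / n)

def binomial_standard_error(accuracy, n_questions):
    """Standard error of an accuracy measured on n independent questions."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)

# The two identical submissions reported above.
mean, stderr = mean_and_stderr([0.50, 0.51])

# Hypothetical: a 0.65 accuracy measured on 1000 questions (assumed size).
se = binomial_standard_error(0.65, 1000)

print(f"mean={mean:.3f} stderr={stderr:.3f} binomial_se={se:.4f}")
```

Under these assumptions a difference of 0.01 sits well inside the noise, which matches the point being made.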
is it really non-deterministic? in evaluation environment, temperature = 0 so it's interesting that the eval is non-deterministic
Alice prefix is not decoration or noise: it is literally the task identifier. It tells the model to use the trained LoRA for reasoning, so do not remove it
I simply cannot get the environment to work for SFT in the official Kaggle notebooks. They claim to have now updated the environment (https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/683067#3426963) but now I run into different errors (cutlass not found). Does anyone have success with SFT in Kaggle notebooks and RTX PRO 6000 Blackwell? Or are you all using your own infrastructure?
10th place currently, I made https://nemotron.huikang.dev/, shared in https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/684212
The traces are they of the base model or your fine tuned model?
Base model
Do I understand that right, you get around 2k == 2,000 tokens / second and could go through the entire data set in less than 6 hours?
The official Kaggle environment, even with the recent patches, seems very broken. I am using Blackwell 6000, newest script pack etc... I get a little over 2 (two) tokens/sec.
How much performance gain can I expect after only SFT with high quality reasoning traces for 4 categories?
Try unsloth inference I am getting 100 tokens/sec
Can I use vllm with the Kaggle infrastructure?
AIMO 3 people has been using vLLM for inference
my issue with Kaggle is that I can't easily use Claude Code
I was wondering how to set that up, too.
I just gave up lol, I just make my code runnable on local macbook, and I upload it manually
how do you get the g4 vms?
The Kaggle environment is really a blocker. Today I spent 13 GPU hours trying to get this to run. I cannot even import vllm. I get inference below 4 tokens/sec. I find it absolutely weird that the Kaggle team does not provide a working environment. Many have put work into various workarounds... these are cool but... the speed needed to meaningfully compete is just not possible like that.
I was looking at https://nemotron.huikang.dev/
It seems base cannot get Equations right out-of-the-box at all; Numeral always right. The easiest gains should be in Bit manipulation and Cipher. One thing I would like to try is SFT on the bit manipulation and cipher examples only. Then on even more, synthetic bit manipulation and cipher examples. Maybe that's obvious but it sounds like a reasonable strategy to me.
Hey bro amazing work
Is this the trained model or base Nemotron? Is the prediction and reasoning shown here generated by your trained model, or by base Nemotron without any CoT training?
<|im_start|>system
<|im_end|>
<|im_start|>user
In Alice's Wonderland, numbers are secretly converted into a different numeral system. Some examples are given below:
12 -> XII
19 -> XIX
86 -> LXXXVI
Now, write the number 68 in the Wonderland numeral system.
Please put your final answer inside `\boxed{}`. For example: `\boxed{your answer}`<|im_end|>
<|im_start|>assistant
<think>
We have to infer the numeral system mapping from given examples. They gave:
12 -> XII
19 -> XIX
86 -> LXXXVI
We need to output 68 in that system. They seem to be using Roman numerals? Let's check:
12 in Roman is XII (X=10, I=1, I=1). Yes.
19 in Roman is XIX (X=10, I=1 before X=10? Actually 19 = XIX: X (10) + IX (9). Yes.
86 in Roman is LXXXVI (L=50, XXX=30, V=5, I=1). Yes, that's 86.
Thus the mapping is standard Roman numerals. So 68 in Roman? Let's compute: 68 = 60 (LX) + 8 (VIII). So LXVIII. Check: L=50, X=10 -> 60, then VIII=8. So LXVIII. So answer: \boxed{LXVIII}.
</think>
\(\boxed{LXVIII}\)<|im_end|>
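For reference, the conversion the base model reasons through in this trace is standard Roman numerals, which can be sketched in a few lines. The function name `to_roman` is just illustrative; it is a greedy conversion, not anything taken from the competition code.

```python
# Value/symbol pairs in descending order, including the subtractive forms.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(number):
    """Convert an integer in 1..3999 to a standard Roman numeral."""
    if not 1 <= number <= 3999:
        raise ValueError("standard Roman numerals cover 1..3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, number = divmod(number, value)  # how many times this symbol fits
        parts.append(symbol * count)
    return "".join(parts)

# The three examples from the prompt, then the query.
for n in (12, 19, 86, 68):
    print(n, "->", to_roman(n))
```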
Hi, does anyone have this error and know how to fix it? - I am working with a demo notebook on the current free Kaggle GPU T4 x2 runtime for direct model loading. Getting ERROR "ModuleNotFoundError: No module named 'cutlass'" - help me to fix it - thank you
Have you fixed the issue with "ModuleNotFoundError: No module named 'cutlass'" ?
it is the base model
I don't have a solution. As it stands, I cannot train or even predict with the Kaggle infrastructure. After 30+ GPU hours, I am very near to giving up.
How do you feel about the “Equations” problem type? Most of the others seem easy except for bit manipulation, but I personally haven’t been able to solve a single one of the equations problems (to be clear I’m talking about the problems with the equations involving lots of special characters)
I have no idea on the pattern in equation symbolic
I was able to train using the utility scripts, but since yesterday they have stopped working. I don't know what has changed - the mamba-ssm package is suddenly not being installed and I am not able to find any workaround
Yes, getting up to 3000 tokens/sec
Is your score achievable on SFT only? Or is RLVR required?
probably look at my previous work and infer what I would be doing
run this in first cell
import subprocess, sys, os
from pathlib import Path

def resolve_python_path(target_dir):
    for pth_file in Path(target_dir).glob("*.pth"):
        with pth_file.open() as fp:
            relpath = fp.read()
        rel_pack_path = (pth_file.parent/relpath)
        if rel_pack_path.exists():
            print(f"append {rel_pack_path}")
            sys.path.append(str(rel_pack_path))

offline_dir = "/kaggle/input/nvidia-nemotron-offline-packages/offline_packages"
target_dir = "/kaggle/working/packages"
os.makedirs(target_dir, exist_ok=True)
resolve_python_path("/kaggle/usr/lib/notebooks/ryanholbrook/nvidia-utility-script/")

if os.path.exists(offline_dir):
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "-q",
        "--no-index",
        "--find-links", offline_dir,
        "--target", target_dir,
        "datasets", "trl"
    ])
    print("Installed from offline packages")
else:
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "-q",
        "--target", target_dir,
        "datasets", "trl"
    ])
    print("Installed from PyPI")

# Add to Python path
sys.path.append(target_dir)
resolve_python_path(target_dir)

import datasets, trl, cutlass
print("datasets:", datasets.__version__, "trl:", trl.__version__, "cutlass:", cutlass.__version__)
with https://www.kaggle.com/datasets/dennisfong/nvidia-nemotron-offline-packages added to your session
Hello everyone, this is my first time participating in such a competition. I have some industrial and theoretical experience in deep learning and ML but not in Lora and LLM finetuning. Any tips for someone like me?
Claude browser use can upload to kaggle and review stuff by itself but yeah it's really clunky
Hello, could you please assist me in identifying the most cost-effective “Marketplace” options for creating and using a GPU VM for my project? - thank you
Anyone had any kind of success doing RL with the given GPU ?
vastai would be a nice place, but haven't tried it yet and practicing poverty by using the GPU provided by Kaggle itself.
Wow thanks! But where does this all come from? What's happening here in this code really that can't be fixed by kaggle?
In fact it doesn't work for me 😛 When I run that cell I get this a few times:
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7bbbdf796630>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /simple/datasets/
and then finally:
ERROR: Could not find a version that satisfies the requirement datasets (from versions: none)
ERROR: No matching distribution found for datasets
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
/tmp/ipykernel_161/395518196.py in <cell line: 0>()
26 print("Installed from offline packages")
27 else:
---> 28 subprocess.check_call([
29 sys.executable, "-m", "pip", "install", "-q",
30 "--target", target_dir,
/usr/lib/python3.12/subprocess.py in check_call(*popenargs, **kwargs)
411 if cmd is None:
412 cmd = popenargs[0]
--> 413 raise CalledProcessError(retcode, cmd)
414 return 0
415
CalledProcessError: Command '['/usr/bin/python3', '-m', 'pip', 'install', '-q', '--target', '/kaggle/working/packages', 'datasets', 'trl']' returned non-zero exit status 1.
It looks like those retry warnings are coming because of no internet, but I can't activate internet with the RTX Pro 6000 Accelerator anyway.
Do one thing: go to the Code section, fork a notebook, and run it. If it runs without error, write your script there. In the starting cell you will see the same code as mine; do not remove it. If you try to make a new notebook it will not run, because the Kaggle environment is not natively compatible with the RTX 6000 Pro. They are working on this; until then you need to use those workarounds
Yes I see this notebook: Submission Demo with Blackwell Patch. Thanks, I now get why this is happening!
I assume things work a lot better in GPU cloud (non-Kaggle) environments?
Do you have a link to a specific notebook like this? I just tried one (it didn't have the exact code as yours but claimed to solve the compatibility issues) and it didn't work....
Thanks @verbal jetty it looks better but I think I'll try some GPU cloud providers. This is a real pain.
What problems are you guys talking about? the new environment works just fine. My notebook which I am running didn't need any workarounds?
True. I have tried for many hours and days to use the GPU provided by Kaggle, but it keeps failing. I have decided to move to runpod.ai; in the first phase it works well, but it's not free😀
I tried out a cloud GPU instance (Jarvislabs) but that has another set of issues. Looks like these Nemotron models on Huggingface are a mess too. I kept getting: ImportError: cannot import name 'is_flash_attn_greater_or_equal_2_10' from 'transformers.utils.import_utils' (/home/myenv/lib/python3.12/site-packages/transformers/utils/import_utils.py
Now I seem to have got a forked notebook running on Kaggle. It's full of sneaky overrides and version suppressions etc. that I'm trying to figure out now.
I now get why Chris Lattner says that the AI tech stack is basically a mess.
Looks like its all Triton stuff that's messing things up here.
Are bit manipulation and symbol equation transformation even solvable through natural language??
looking for a team to join in the nvidia nemotron model reasoning challenge
Did anyone experiment with RL? I was planning to try GRPO but it seems like it is not going to be feasible with the kaggle notebook constraints and it will be too expensive to host on the cloud
So is the absolute max amount of free rtx pro 6000 hours 30 per week?
Haven't yet.
Yeah that is the limit
Cant do much with it except a couple of supervised runs
Can anyone explain to me what this error is about, and how to fix it? -
torch_dtype is deprecated! Use dtype instead!
Starting minimal model load probe...
/kaggle/usr/lib/notebooks/ryanholbrook/nvidia-utility-script/torch/compiler/init.py:148: FutureWarning: torch._dynamo.allow_in_graph is deprecated and will be removed in a future version. Use torch._dynamo.nonstrict_trace instead.
return torch._dynamo.allow_in_graph(fn)
MODEL_LOAD_FAILED
ModuleNotFoundError No module named 'cutlass'
I wrote this, would need help to check for correctness https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/687961
Training Nemotron-3-Nano-30B-A3B-BF16 with rank 32 LoRA on length 8192 sequences
https://www.kaggle.com/code/kienngx/nvidia-nemotron-training-cot-labels
I ran this notebook, but noticed that the training is happening only on a sample of 600 datapoints as compared to the original 3500 CoT + Labels; maybe because of that I got a score of 0.62 on the LB. Did anyone run training on the full dataset, and how many hours did it take? It has been 12+ hrs for me and the training is yet to complete. Was there a corresponding jump in the LB score when training with the full data?
I analysed the CoT and I'm shocked with how 0.72 score is possible with this dataset.
If you observe carefully there are no true reasoning traces for bitwise and equation transformation
Guys can someone explain what is happening instead of reacting😭
just agreeing with you. I don't think Kaggle's notebook scores are even remotely reliable, they must have used a different eval settings. and yes I can confirm that those CoTs are bad. my own methods barely got me into 0.70 range (leaderboard).
Need a team mate for this competition must be proficient in llms, fine-tuning and deep generative models
With slow generate I do not know how GRPO should be possible I only get like 2 tokens/s
His results might be less about the data and more about his LoRA settings?
supposedly this guy got GRPO working: https://www.kaggle.com/code/johnnyhyland/nvidia-nemotron-sft-grpo-colab-faster/notebook - but he didn't run it before saving, so I have no insight into whether he is achieving reasonable speeds
Can someone help me understand how prompt engineering plays a role in this competition?! The competition description mentions it a couple of times but as I see it, we submit a Lora adapter only. How does our prompt help in the submission?
We have no control over the prompt that is used at inference right?
You prompt Claude Code
You prompt the base model to explore its latent capabilities then adjust your training strategy.
Thanks! As I expected, this prompting has nothing to do with a submission per se. Was a bit confused by the description since it appear to put fine-tuning on par with prompting as a strategy.
placeholder post for my 0.85 progress prize publication https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/689915
check my bio 😁
Congratulations @fast jungle, I just want to know: did you use the Kaggle GPU quota for your submission, or did you use your own hardware? Your answer will be very helpful - thank you
I have updated my write-up, I used Tinker
@fast jungle - thank you for valuable update, as I am continuing on Kaggle (free) with the Kaggle submission format and my 1 run per week, with slow iteration due to quota limitations, as I don't have other options.
I did not find the GPU version I was looking for A100 (80GB). Only got the P100 and T4 ones. How can I run my notebook?
anyone with RL understanding wanna join us
who also has credits
@fast jungle join us? We're 66/2058
i don't know RL
Does training on Kaggle RTX 6000 work for anyone at all?!?! I've wasted a week trying to get it working and it simply doesn't work. So many folks out there are struggling and posting the error messages, but there is not a single response from the organiser addressing these issues.
Hey can I know more about your team and what you guys are doing? And yeah I have some understanding of RL, and I am going to be implementing them.
Btw congratulations @fast jungle
I'm pretty sure a lot of the people at 86% used your work+theirs to get that, they should give you credit lol.
by the math you should at least be able to train with micro-batch size of one - https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/687961
This is definitely the case. I know because I'm currently going through his post to try and understand it, lol
I wonder if he's still aiming for 87.7 accuracy
Mhm well he did was it was the tinker issues.
I'm working on DGX Spark so hopefully I can avoid that
I got some free tinker credits lying around so I'm probably gonna use that at some point
For now, unsloth + colab is doing it for me
Ah
So right now you're trying his method?
Are you adding anything yourself?
Not yet. I'm using this hackathon to learn more about finetuning. Right now, I'm still working on compiling some synthetic data
I see.
I'm saving the tinker credits for when I have a solid approach, lol
I see. Well some advice is this is a reasoning challenge. You want the model to solve the questions.
Thanks for the advice, lol
Incredible. Thank you for this insight
is that sarcastic
anyway my suggestion is that you should focus on generating high quality data first by solving the harder puzzles first.
no i didnt finish my advice
i like to talk in separate messages
That's what I'm working on, yes
Yeah what are your methods? if you dont mind me asking
Using models that are smarter than Nemotron to give me reasoning traces for certain puzzles. Starting with equations problems atm
Keep in mind, I am a complete noob at this so my approaches are also naive. My experience is mainly in training regular deep learning models and I'm trying to translate those skills into LLM finetuning
Nemotron 3 Super, yes. But I'm working on getting a data generation pipeline using DeepSeek
Yeah I tried nemotron 3 super it wasn't very good.
Slightly better than nano but not much at these questions.
Were you able to get super running on your dgx spark?
Yeah
I installed it on mine and it came in 3 parts, lol
NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00001-of-00003.gguf NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00002-of-00003.gguf NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00003-of-00003.gguf
Running one of the GGUFs takes up a 100 gb of the 128 gb on my DGX Spark. This model is meant for multiple GPUs, I'm pretty sure
no it works perfectly fine
Yeah
you need to run nvfp4
but anyways trust me Super isnt the solution for this contest
I have a training pipeline set up and hopefully I can get to 90%, I think I might have a breakthrough on solving the puzzles, but it'll take time
I guess
if it works ill be sure to share in a notebook
I mean, if you achieve that, you should probably keep it to yourself until you've won, lmao
yes yes of course i meant like after i win
Or is it common for people to share their winning strat in kaggle competitions?
no no only after they win
So Tong was either being generous or he only cared about the midpoint prize, lol
Either way, respect
yeah well it says to get a prize you need to publish a notebook
but yeah
Is the Nemotron nano model that is posted on Kaggle as part of this competition any different from the one on huggingface with the same name?
Idts given that the READMEs are the exact same
Absolutely diabolical if they were different though, lol
Why the leaderboard right now just copy paste submissions of Tong's submission.
Like the top 500 is just Tong's submission being reproduced
Lol, it's just become a baseline atp
imo, if no one can top that with a better technique, tong deserves the W
It would need more, and more expensive, computation than Kaggle provides
Yeah probably
Anyway, someone just hit 0.87 on the leaderboard
Huge
Real
What is the most important for this competition model's reasoning, or model's equations?
Hi everyone
How to increase the speed of computation?
How can I decrease the evaluation time, and what are some steps to reduce the whole training time? I am a bit of a beginner. I know I can increase the batch size and reduce the epochs, but I want to know if there are any other techniques to consider, specifically to lower the evaluation time after submission, because it is taking a lot of time and GPUs are not available to test experiments
watch me jump to 0.9
Has anyone managed to get trl and vllm running for GRPO training?
I’ve been trying to get that (also Unsloth) to work and it hasn’t really worked anywhere. Without internet on the Kaggles RTX pro 6000 that’s super challenging anyway but even the unsloth notebook for Nemotron nano doesn’t work beyond the first import on Google colab
that is what i am going to do next week once my quota resets
i dont think a lot of people have tried GRPO training because of how long it will take on a RTX 6000
it would easily take 25 hours so there is no room for error
140 hour for 1400 sample at 4048 max token and group size 2😁
Wow that’s crazy. Did you just estimate that using some tokens/sec or did you actually train that long?
Trained for 1 hour then got those number
KV cache is not working on the RTX 6000 Pro with the Nemotron architecture
Ah ok that’s some 20 tokens per second I guess
Ya maybe
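The "some 20 tokens per second" guess can be reproduced from the figures quoted above (140 hours, 1400 samples, 4048 max tokens, group size 2). This sketch assumes every completion runs to the full max token length and that all of the time goes to generation, so it is only a rough estimate; the function name is illustrative.

```python
def implied_tokens_per_second(n_samples, group_size, max_tokens, hours):
    """Generation rate implied by a wall-clock estimate, if every
    completion used the full max_tokens budget (an assumption)."""
    total_tokens = n_samples * group_size * max_tokens
    return total_tokens / (hours * 3600)

rate = implied_tokens_per_second(n_samples=1400, group_size=2,
                                 max_tokens=4048, hours=140)
print(f"{rate:.1f} tokens/sec")  # about 22.5
```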
Perhaps we could make a GRPO team and split the training costs
Ya we are doing that only
This is with TRL library only and not with vllm right?
No, vllm works perfectly. I tried it on 14k samples with max tokens 7k and it did all that in 7 hours
Wow on Kaggle?! Thats amazing! I am running into quantisation problems with vllm or OOM (without quantisation)
It was without quantisation on rtx 6000 pro
Ran perfectly
With kv cache
what score did you get
So this was with trust_remote_code=False ?
I didn't run it was too much
For vllm it was True; if you set it to False, the LoRA will have many missing modules
Has anyone experimented with enable_thinking=True in the chat template during SFT training? Curious if it makes a difference vs manually wrapping reasoning in <think> tags in the assistant content.
i am doing that right now
model is 7 hours in training
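For the manual-wrapping variant asked about above, a minimal sketch of building the assistant content by hand. The layout follows the trace shown earlier in this log (`<think>` block, then a `\boxed{}` answer); the helper name `build_assistant_content` is hypothetical, and this does not show what `enable_thinking=True` does, since that depends on the tokenizer's chat template.

```python
def build_assistant_content(reasoning, answer):
    """Wrap a reasoning trace in <think> tags and append the boxed answer.

    Hypothetical helper for SFT data preparation; the exact layout is an
    assumption modelled on the base-model trace quoted earlier.
    """
    return f"<think>\n{reasoning.strip()}\n</think>\n\\boxed{{{answer}}}"

example = build_assistant_content(
    "68 = 60 (LX) + 8 (VIII), so LXVIII.",
    "LXVIII",
)
print(example)
```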
Here's another reasoning challenge. Spot the error (if any):
nemotron-3-nano-30B-A3B-BF16 requires transformers>=5.3.0
trl requires vllm<=0.18.0
vllm<=0.18.0 requires transformers<5.0
--> It is not possible to train nemotron-3-nano-30B-A3B-BF16 via GRPO with trl and vllm.
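Taking the version constraints exactly as stated in this message (they are not verified here), the conflict can be sketched with plain tuple comparison. The candidate version strings are hypothetical examples, and the helper names are illustrative.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "<": operator.lt, ">": operator.gt}

def parse_version(text):
    """Turn '5.3.0' into (5, 3, 0), padding short versions with zeros."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

def satisfies(version, constraint):
    """Check one constraint such as '>=5.3.0' against a version string."""
    for symbol in (">=", "<=", "==", "<", ">"):  # two-character operators first
        if constraint.startswith(symbol):
            bound = parse_version(constraint[len(symbol):])
            return OPS[symbol](parse_version(version), bound)
    raise ValueError(f"unsupported constraint: {constraint}")

# Constraints on transformers as claimed in the message above.
constraints = [">=5.3.0", "<5.0"]
# Hypothetical candidate versions, chosen only to illustrate the check.
candidates = ["4.57.1", "5.0.0", "5.3.0", "5.4.0"]

compatible = [v for v in candidates
              if all(satisfies(v, c) for c in constraints)]
print(compatible)  # empty: no version can be both >=5.3.0 and <5.0
```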
whats you highest score so far?
@verbal jetty did this all work on RTX 6000 on Kaggle? Was it in collocate mode or server mode? ( up to now I have only tried collocate since its a single GPU....)
Yaa all work on rtx 6000 pro
Did your training complete, how is the score
i had a memory leak and my BM solver was stripping the leading 0, so the score ended up at 0.56
I predict that once i fix these two minor issues my score should reach 0.9 with ease.
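A common way leading zeros get lost in a bit manipulation solver is a round trip through `int`. The sketch below shows the lossy and the width-preserving forms; the 8-bit width is an assumption for illustration, not something stated about the competition data.

```python
def to_bits(value, width=8):
    """Format an integer as a fixed-width binary string (keeps leading zeros)."""
    return format(value, f"0{width}b")

original = "00000101"
value = int(original, 2)   # 5

lossy = bin(value)[2:]     # "101": leading zeros are gone
padded = to_bits(value)    # "00000101": width preserved

print(lossy, padded)
```

If answers are compared as strings, `101` and `00000101` do not match, which would be enough to sink a score.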
I cant even install vllm and use it on the RTX. How did you do that?
If I import vllm I get RuntimeError: can't find libcudart.so.13 ...
fork this notebook and run
"""Metric for NVIDIA (129716)."""
import subprocess
import sys

# Set up environment
commands = [
    "uv pip uninstall torch torchvision torchaudio",
    "tar -cf - -C /kaggle/usr/lib/notebooks/metric/nvidia_metric_utility_script . | tar -xf - -C /tmp",
    "chmod +x /tmp/triton/backends/nvidia/bin/ptxas",
    "chmod +x /tmp/triton/backends/nvidia/bin/ptxas-blackwell",
]
if RUN_EVALUATION:
    for cmd in commands:
        print(f"Running: {cmd}")
        subprocess.run(cmd, shell=True, check=True)
    sys.path.insert(0, "/tmp")
and then you will be able to use vllm; just remove the other cells according to your need
Ahh very clever! SInce the metric uses vllm!!! Thanks, I will try this
😋😋
It works! Thanks @verbal jetty
Ok I spoke too soon. Vllm works but not with trl. You used trl @verbal jetty ?
I have a question. I am participating in the NVIDIA Nemotron reasoning challenge. Can we use a publicly available dataset of generated CoT traces on the given dataset for this competition? The dataset is public and was generated by @fast jungle
you are free to use whatever in https://www.kaggle.com/datasets/huikang/huikang-nemotron-repository-snapshot, I do not restrict
Thank you @fast jungle
No, just normal vllm. It worked with both. See the notebook, it uses vllm; just tweak the code
Check my bio 😁
I sent friend request, pls check
Hey everyone looking to team up for the Nemotron Challenge. We’re currently two members, including a professor with strong expertise in NLP and dozens of publications in this area.
My background is in LLM pre-training, post-training, and data curation.
Constraint: I’m currently limited to Colab, so I can’t realistically run serious experiments on a 30B model. We’re looking to collaborate with someone who has reliable GPU access. Me and the professor will take full ownership of research direction, training strategy, data curation, pipeline design, and post-training.
If you’ve got compute and want a serious, execution-focused team, DM me.
Hi I am bit new to this kind of LLM competition. Can some one kindly help me to start the work?
Streamline to the above question can I use any LLM model ?
@fast jungle - What reasoning format are you using for equation puzzles? That category seems hardest to crack, any comments or suggestions?
What you see in my writeup is what I have, I haven’t worked on this competition since
Thanks @fast jungle, your notebook has been a goldmine. Totally understand, equation puzzles feel like a rabbit hole. We noticed the model seems to pick up the other 5 categories well from your traces, so maybe the juice isn't worth the squeeze there. One thing I'm experimenting with is repeating your corpus 3x in the index to get ~734 training steps instead of 244. Curious if you ever tried multi-epoch training on the same traces, and whether it helped or hurt in your experiments?
the winning submission repeated the data for some problems
ideally you should make your own synthesized problems and train on those instead, you can generate as many synthesized problems as you want
hey, your solution and writeup are awesome. can i use your codes you published in your github repo (https://github.com/tonghuikang/nemotron) ? and congratulations on winning the open progress prize 🙂
you are free to use my code, which is why they are published
thank you!
Hi, I just joined the competition. Could anyone kindly tell if we can use the provided NVidia GPU in the training process? Thank you so much!
Yes you can use the RTX Pro 6000 for training.
Can anyone shed light on this whole trust_remote_code=True/False thing? When does it matter and when not?
How can I reduce training time? In my case it did not finish within 12 hours 🙂
what values have you defined for LORA_RANK and LORA_ALPHA, if you have defined them?
LoRA r=32, alpha=64
Anyone struggling in Consistency and want to learn together.
DM me.
If you are using the Kaggle RTX GPU for reasoning, then set alpha to 32 as well
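For context on the two settings mentioned here: in the standard LoRA formulation the low-rank update is multiplied by alpha / r, so the choice of alpha changes how strongly the adapter acts at a given rank. A minimal sketch of that arithmetic follows; which setting scores better in this competition is not established in this chat.

```python
def lora_scaling(rank, alpha):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank

print(lora_scaling(rank=32, alpha=64))  # r=32, alpha=64
print(lora_scaling(rank=32, alpha=32))  # r=32, alpha=32
```

With r=32, alpha=64 doubles the update relative to alpha=32, which interacts with the learning rate.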
hii guys! is it too late to join this competition??
Deadline for this one is Jun 9, you have approx 13 days 👀
if true the nvidia implementation is used, if false the transformers impl is used (only if version >= 5.3.0);
and for >=5.3.0 setting it true will cause GRPO KV Cache anomalies mentioned in discussion 690161
So we should use False? When would one ever want to use True?!
🚀 Hey @everyone!
We’re building PromptGram and aiming to reach 150 GitHub stars ⭐
If you like AI, FastAPI, microservices, or developer tools, please check it out and support the project by starring the repo 🙌
GitHub Repo: https://github.com/dewangsahuji/promptgram
Every star really helps and motivates us to improve the project further 💙
Guys when I load the model I get
SafetensorError: Error while serializing: I/O error: No space left on device (os error 28)
any fixes for this ?
I was using
model_path = path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    offload_folder="/kaggle/working/offload_folder"
)
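One way to start narrowing down a `No space left on device` error like the one above is to check how much free space the offload location actually has before loading. This is only a diagnostic sketch using the standard library: `/kaggle/working` comes from the snippet above, while `/tmp` is an assumed alternative location to compare against.

```python
import shutil

def free_gib(path):
    """Free disk space at `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3

# /kaggle/working is the offload location used above; /tmp is an assumption.
for candidate in ("/kaggle/working", "/tmp"):
    try:
        print(f"{candidate}: {free_gib(candidate):.1f} GiB free")
    except OSError:
        print(f"{candidate}: not available on this machine")
```

If the offload folder's disk is smaller than the weights that get offloaded, the error would be expected regardless of the loading code.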