sullen shale Mar 18, 2026, 8:02 AM

#

Hi

static remnant Mar 18, 2026, 9:17 AM

#

hello!

verbal jetty Mar 18, 2026, 10:41 AM

#

Wow

plucky thicket Mar 18, 2026, 10:47 AM

#

'sup everyone!

sullen shale Mar 18, 2026, 11:45 AM

#

Is there anyone from the support forum here?

sullen shale Mar 18, 2026, 4:04 PM

#

Any mentor here? I need help: I am not able to join the challenge.

dire gulch Mar 18, 2026, 8:40 PM

#

Hello everyone! I'm a senior right now, about to get my BSCS in the U.S., and an aspiring AI Engineer looking to join a team. This will be my first Kaggle challenge, so I'm very new to all of this, but I have internship experience with LLMs and am currently doing undergrad research focused on fine-tuning LLMs. Please DM me if you're interested in having me join your team.

plucky thicket Mar 19, 2026, 12:47 AM

#

sullen shale Is there anyone from support forum here ?

Kaggle Staff and Hosts running competitions will not monitor Discord or be available to answer questions in Discord.

plucky thicket Mar 19, 2026, 9:05 AM

#

I am having a hard time getting PEFT/LoRA training working. I have described the issue here https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/682877 and made a public notebook here https://www.kaggle.com/code/antonkratz/nvidia-nemotron-finetune-ipynb
Several others report very similar issues triggering the "CUDA error: no kernel image is available for execution on the device". I believe causal_conv1d and mamba_ssm are not compiled correctly for the Kaggle competition environment. I think this should really be fixed in the base environment.
Am I holding it wrong? Is anyone actually doing SFT successfully with the provided RTX Pro 6000 environment? I find it super strange that this has not been addressed.

slender musk Mar 20, 2026, 3:36 AM

#

sullen shale Anyone mentor here, need help - I am not able to join challenge

yes tell me

sullen shale Mar 20, 2026, 10:32 PM

#

slender musk yes tell me

Now I am in. The issue was actually with my Kaggle account, and it got fixed, thank you.

slender musk Mar 21, 2026, 6:25 AM

#

sullen shale Now I am in, actually issue was in my Kaggle account and it got fixed - thank yo...

great

uneven kestrel Mar 22, 2026, 12:06 AM

#

There seems to be no pattern behind these types of problems in the training data:

```
7f5758a8,"In Alice's Wonderland, a secret set of transformation rules is applied to equations. Below are a few examples:
?!*{# = ?!{#
?'+?> = )>
{{-)@ = ?>
'?-#> = -?(
Now, determine the result for: )>+<#",'!!
```

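Beware: the `",` before the answer in the last line comes from the CSV format (the closing quote of the prompt field plus the field separator); the answer itself is `'!!`.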
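Parsing the file with a real CSV reader, instead of splitting on commas, keeps the multi-line prompt intact and separates the answer cleanly. A minimal Python sketch using only the standard library; the `id,prompt,answer` header and the helper names are assumptions for illustration, so adjust them to the actual training CSV:

```python
import csv
import io

# One row copied from the message above; the header line is an assumption.
RAW = '''id,prompt,answer
7f5758a8,"In Alice's Wonderland, a secret set of transformation rules is applied to equations. Below are a few examples:
?!*{# = ?!{#
?'+?> = )>
{{-)@ = ?>
'?-#> = -?(
Now, determine the result for: )>+<#",'!!
'''


def load_rows(text):
    """Parse with a CSV reader so quoted multi-line prompts stay in one field."""
    return list(csv.DictReader(io.StringIO(text)))


def split_examples(prompt):
    """Return (lhs, rhs) pairs from prompt lines of the form 'lhs = rhs'."""
    pairs = []
    for line in prompt.splitlines():
        if " = " in line:
            lhs, rhs = line.split(" = ", 1)
            pairs.append((lhs, rhs))
    return pairs


rows = load_rows(RAW)
print(repr(rows[0]["answer"]))  # the '", ' before it was only CSV quoting
print(split_examples(rows[0]["prompt"]))
```

Reading the answer this way avoids accidentally training on the stray `",` characters.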

verbal jetty Mar 22, 2026, 9:53 AM

#

uneven kestrel There seems to be no pattern behind these types of problems in the training data...

Yeah, removing these actually increased my score from 0.65 to 0.66.

#

There must be some kind of pattern that we need to understand and spell out when training the model, because if these problems are included in training without it, they will make learning worse.

uneven kestrel Mar 22, 2026, 5:31 PM

#

Perhaps the heavy usage of special characters changed the composition of the problems somewhere in the generation pipeline such that they all came out with no underlying pattern

plucky thicket Mar 22, 2026, 11:30 PM

#

verbal jetty Ya removing this actually increased my score from 0.65 to 0.66

I did two identical submissions. One scored 0.50 and the other 0.51. A change of 0.01 does not mean anything because the LLM is not deterministic.

verbal jetty Mar 23, 2026, 3:34 AM

#

plucky thicket I did two **identical** submissions. One scored 0.50 and the other 0.51. A chang...

Right now you are just giving the model the examples and waiting for it to predict. That works well for other kinds of tasks, but for pattern recognition or maths you need to teach the model an approach. For example, there are 6 types of pattern in the data, each with 4 to 5 sub-parts: you need to apply all of them to the examples, and when one gives the correct output, put that into the question. You need to spell out in the data how you approach each problem; you can write a Python script to test the rules and write down the approach. In the AIMO data, the reasoning for how to solve each question (the steps) is already there. By the same logic, you need to teach the model to think, not to predict.

leaden basin Mar 23, 2026, 4:28 AM

#

Hiya! Does anyone know how to get access to the G4 VMs to start training on Kaggle? How are they accessed?

#

Are they accessed through Kaggle, or through GCP?

plucky thicket Mar 23, 2026, 5:51 AM

#

verbal jetty Right now you are just giving model the example and waiting for it to predict an...

Hey @verbal jetty, not sure if we agree or disagree, I simply wanted to point out that the LLM's answer is non-deterministic and it is normal to expect small fluctuations in performance. To really see the effect of small changes, it would be necessary to measure the performance many times and thus get a measure of confidence in the measurement. I do think that the "wonderland" preamble just wastes tokens and could potentially distract the LLM. Anyway, godspeed!
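A minimal Python sketch of that kind of check: the spread across repeated runs of the same submission, plus the sampling error expected from the test set alone. The two scores are the ones reported above; the test-set size of 500 is a made-up number for illustration, not the competition's actual size.

```python
import math
import statistics


def mean_and_sem(scores):
    """Mean and standard error of the mean across repeated runs of one submission."""
    m = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return m, sem


def binomial_se(accuracy, n_questions):
    """Sampling standard error of an accuracy measured on n independent questions."""
    return math.sqrt(accuracy * (1 - accuracy) / n_questions)


# Two identical submissions reported in the thread.
m, sem = mean_and_sem([0.50, 0.51])
print(f"mean={m:.3f} sem={sem:.4f}")

# Hypothetical test-set size, for illustration only.
print(f"binomial SE at 0.50 with n=500: {binomial_se(0.50, 500):.4f}")
```

With only two runs the standard-error estimate is itself very rough; a handful more runs, or a larger local validation set, gives a much more trustworthy comparison.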

split venture Mar 23, 2026, 6:33 AM

#

Is it really non-deterministic? In the evaluation environment, temperature = 0, so it's interesting that the eval is non-deterministic.
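One commonly cited reason temperature 0 can still give different outputs: GPU kernels may accumulate sums in a different order from run to run (for example when the batch composition changes), and floating-point addition is not associative. Whether that is what happens in this particular evaluation is an assumption. A tiny Python illustration with a toy two-token "vocabulary":

```python
def greedy_pick(logits):
    """Temperature-0 (greedy) choice: index of the largest logit; ties go to the first."""
    return max(range(len(logits)), key=lambda i: logits[i])


# The same three terms summed in two different orders.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)  # False

# Token 0 has a fixed logit of 0.6; token 1's logit is that sum.
# A last-bit difference is enough to flip the greedy choice.
print(greedy_pick([0.6, left]))   # 1
print(greedy_pick([0.6, right]))  # 0
```

Once one token differs, every later token is conditioned on a different prefix, so a tiny numeric difference can change a whole answer.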

verbal jetty Mar 23, 2026, 6:52 AM

#

plucky thicket Hey @verbal jetty, not sure of we agree or disagree, I simply wanted to...

The Alice prefix is not decoration or noise: it is literally the task identifier. It tells the model to use the trained LoRA for reasoning, so do not remove it.

plucky thicket Mar 23, 2026, 10:55 PM

#

I simply cannot get the environment to work for SFT in the official Kaggle notebooks. They claim to have updated the environment now (https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/683067#3426963), but now I run into different errors (cutlass not found). Has anyone had success with SFT in Kaggle notebooks on the RTX PRO 6000 Blackwell? Or are you all using your own infrastructure?

fast jungle Mar 24, 2026, 3:53 AM

#

10th place currently, I made https://nemotron.huikang.dev/, shared in https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/684212

cedar sentinel Mar 24, 2026, 4:50 AM

#

fast jungle 10th place currently, I made https://nemotron.huikang.dev/, shared in https://ww...

Are the traces from the base model or your fine-tuned model?

fast jungle Mar 24, 2026, 5:56 AM

#

Base model

plucky thicket Mar 24, 2026, 8:35 AM

#

fast jungle 10th place currently, I made https://nemotron.huikang.dev/, shared in https://ww...

Do I understand that right: you get around 2k == 2,000 tokens/second and could go through the entire data set in less than 6 hours?

#

The official Kaggle environment, even with the recent patches, seems very broken. I am using the Blackwell 6000, the newest script pack, etc., and I get a little over 2 (two) tokens/sec.
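To put the two throughput figures side by side, a back-of-the-envelope Python sketch. The 2,000 and 2 tokens/sec figures come from the messages above; the workload of 1,000 problems at 4,000 generated tokens each is a hypothetical placeholder, not the real dataset size.

```python
SECONDS_PER_HOUR = 3600


def tokens_in_window(tokens_per_sec, hours):
    """How many tokens can be generated in a given wall-clock window."""
    return tokens_per_sec * hours * SECONDS_PER_HOUR


def hours_needed(n_problems, tokens_per_problem, tokens_per_sec):
    """Wall-clock hours to generate one trace per problem at a given throughput."""
    return n_problems * tokens_per_problem / tokens_per_sec / SECONDS_PER_HOUR


for tps in (2000, 2):
    print(f"{tps:>5} tok/s: {tokens_in_window(tps, 6):,} tokens in 6 h, "
          f"{hours_needed(1000, 4000, tps):.1f} h for the hypothetical workload")
```

A 1,000x throughput gap turns a job of about half an hour into one of over three weeks, which is why the environment issues matter so much for iteration speed.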

cedar sentinel Mar 24, 2026, 8:54 AM

#

How much performance gain can I expect after only SFT with high quality reasoning traces for 4 categories?

cedar sentinel Mar 24, 2026, 8:54 AM

#

plucky thicket Do I understand that right, you get around 2k == 2,000 tokens / second and could...

Try Unsloth inference, I am getting 100 tokens/sec.

plucky thicket Mar 24, 2026, 10:52 AM

#

Can I use vLLM with the Kaggle infrastructure?

fast jungle Mar 24, 2026, 11:06 AM

#

People in AIMO 3 have been using vLLM for inference.

#

my issue with Kaggle is that I can't easily use Claude Code

plucky thicket Mar 24, 2026, 11:12 AM

#

fast jungle my issue with Kaggle is that I can't easily Claude Code

I was wondering how to set that up, too.

fast jungle Mar 24, 2026, 11:16 AM

#

I just gave up lol. I just make my code runnable on my local MacBook, and I upload it manually.

leaden basin Mar 24, 2026, 11:18 AM

#

How do you get the G4 VMs?

plucky thicket Mar 24, 2026, 11:20 AM

#

The Kaggle environment is really a blocker. Today I spent 13 GPU hours trying to get this to run. I cannot even import vllm. I get inference below 4 tokens/sec. I find it absolutely weird that the Kaggle team does not provide a working environment. Many have put work into various workarounds... these are cool, but competing meaningfully at that speed is just not possible.

#

I was looking at https://nemotron.huikang.dev/
It seems the base model cannot get Equations right out of the box at all, while Numeral is always right. The easiest gains should be in Bit manipulation and Cipher. One thing I would like to try is SFT on the bit manipulation and cipher examples only, then on even more synthetic bit manipulation and cipher examples. Maybe that's obvious, but it sounds like a reasonable strategy to me.

verbal jetty Mar 24, 2026, 5:35 PM

#

fast jungle 10th place currently, I made https://nemotron.huikang.dev/, shared in https://ww...

Hey bro amazing work
Is it the trained model or the base Nemotron? Is prediction and reasoning like the following generated by your trained model, or by the base Nemotron without any CoT training?

```
<|im_start|>system
<|im_end|>
<|im_start|>user
In Alice's Wonderland, numbers are secretly converted into a different numeral system. Some examples are given below:
12 -> XII
19 -> XIX
86 -> LXXXVI
Now, write the number 68 in the Wonderland numeral system.
Please put your final answer inside `\boxed{}`. For example: `\boxed{your answer}`<|im_end|>
<|im_start|>assistant
<think>
We have to infer the numeral system mapping from given examples. They gave:

12 -> XII
19 -> XIX
86 -> LXXXVI

We need to output 68 in that system. They seem to be using Roman numerals? Let's check:

12 in Roman is XII (X=10, I=1, I=1). Yes.

19 in Roman is XIX (X=10, I=1 before X=10? Actually 19 = XIX: X (10) + IX (9). Yes.

86 in Roman is LXXXVI (L=50, XXX=30, V=5, I=1). Yes, that's 86.

Thus the mapping is standard Roman numerals. So 68 in Roman? Let's compute: 68 = 60 (LX) + 8 (VIII). So LXVIII. Check: L=50, X=10 -> 60, then VIII=8. So LXVIII. So answer: \boxed{LXVIII}.
</think>
\(\boxed{LXVIII}\)<|im_end|>
```

sullen shale Mar 24, 2026, 9:49 PM

#

Hi, does anyone have this error and know how to fix it? I am working with a demo notebook on the current free Kaggle GPU T4 x2 runtime for direct model loading, and I get the error "ModuleNotFoundError: No module named 'cutlass'". Please help me fix it, thank you.

sullen shale Mar 24, 2026, 9:51 PM

#

plucky thicket I simply cannot get the environment to work for SFT in the official Kaggle notbo...

Have you fixed the issue with "ModuleNotFoundError: No module named 'cutlass'" ?

fast jungle Mar 24, 2026, 11:01 PM

#

verbal jetty Hey bro amazing work Is it the trained model or base nemotron prediction and rea...

it is the base model

plucky thicket Mar 24, 2026, 11:24 PM

#

sullen shale Have you fixed the issue with "ModuleNotFoundError: No module named 'cutlass'" ?

I don't have the solution. As it stands, I cannot train or even predict with the Kaggle infrastructure. After 30+ GPU hours, I am very close to giving up.

uneven kestrel Mar 24, 2026, 11:41 PM

#

fast jungle it is the base model

How do you feel about the “Equations” problem type? Most of the others seem easy except for bit manipulation, but I personally haven’t been able to solve a single one of the equations problems (to be clear I’m talking about the problems with the equations involving lots of special characters)

fast jungle Mar 25, 2026, 1:15 AM

#

I have no idea what the pattern is in the symbolic equations.

distant goblet Mar 25, 2026, 5:48 AM

#

plucky thicket I have not the solution. As it stands, I cannot train or even predict with the K...

I was able to train using the utility scripts, but since yesterday they have stopped working. I don't know what has changed: the mamba-ssm package suddenly isn't being installed, and I can't find any workaround.

cedar sentinel Mar 25, 2026, 6:41 AM

#

plucky thicket Can I use vllm with the Kaggle infrastructure?

Yes, getting up to 3000 tokens/sec.

cedar sentinel Mar 25, 2026, 6:42 AM

#

fast jungle I just gave up lol, I just make my code runnable on local macbook, and I upload ...

Is your score achievable on SFT only? Or is RLVR required?

fast jungle Mar 25, 2026, 11:30 AM

#

probably look at my previous work and infer what I would be doing

verbal jetty Mar 25, 2026, 2:38 PM

#

sullen shale Have you fixed the issue with "ModuleNotFoundError: No module named 'cutlass'" ?

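Run this in the first cell:

```python
import os
import subprocess
import sys
from pathlib import Path


def resolve_python_path(target_dir):
    """Append the directories listed in any *.pth file in target_dir to sys.path."""
    for pth_file in Path(target_dir).glob("*.pth"):
        with pth_file.open() as fp:
            for line in fp:
                relpath = line.strip()  # drop the trailing newline so the path can exist
                # .pth files may also contain blank lines, comments and "import" lines
                if not relpath or relpath.startswith(("#", "import")):
                    continue
                rel_pack_path = pth_file.parent / relpath
                if rel_pack_path.exists():
                    print(f"append {rel_pack_path}")
                    sys.path.append(str(rel_pack_path))


# Wheels from the offline-packages dataset, installed into a writable directory
offline_dir = "/kaggle/input/nvidia-nemotron-offline-packages/offline_packages"
target_dir = "/kaggle/working/packages"

os.makedirs(target_dir, exist_ok=True)
# Make the packages shipped with the NVIDIA utility script importable
resolve_python_path("/kaggle/usr/lib/notebooks/ryanholbrook/nvidia-utility-script/")

if os.path.exists(offline_dir):
    # No internet needed: install only from the local wheel files
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "-q",
        "--no-index",
        "--find-links", offline_dir,
        "--target", target_dir,
        "datasets", "trl"
    ])
    print("Installed from offline packages")
else:
    # Fallback that needs internet access in the session
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "-q",
        "--target", target_dir,
        "datasets", "trl"
    ])
    print("Installed from PyPI")

# Add the freshly installed packages to the Python path
sys.path.append(target_dir)
resolve_python_path(target_dir)

import datasets, trl, cutlass
print("datasets:", datasets.__version__, "trl:", trl.__version__,
      "cutlass:", getattr(cutlass, "__version__", "unknown"))
```

This needs https://www.kaggle.com/datasets/dennisfong/nvidia-nemotron-offline-packages added to your session as an input.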

final tusk Mar 28, 2026, 10:36 AM

#

Hello everyone, this is my first time participating in a competition like this. I have some industrial and theoretical experience in deep learning and ML, but not in LoRA and LLM fine-tuning. Any tips for someone like me?

dry mulch Mar 28, 2026, 10:40 AM

#

fast jungle I just gave up lol, I just make my code runnable on local macbook, and I upload ...

Claude's browser use can upload to Kaggle and review stuff by itself, but yeah, it's really clunky.

sullen shale Mar 29, 2026, 9:17 PM

#

Hello, could you please assist me in identifying the most cost-effective “Marketplace” options for creating and using a GPU VM for my project? - thank you

deep tree Mar 31, 2026, 4:35 AM

#

Anyone had any kind of success doing RL with the given GPU ?

deep tree Mar 31, 2026, 5:32 AM

#

sullen shale Hello, could you please assist me in identifying the most cost-effective “Market...

vastai would be a nice place, but I haven't tried it yet and am practicing poverty by using the GPU provided by Kaggle itself.

cerulean gulch Mar 31, 2026, 12:45 PM

#

verbal jetty run this in first cell ``` import subprocess, sys, os from pathlib import Path d...

Wow, thanks! But where does this all come from? What is this code really doing that can't be fixed by Kaggle?

#

In fact it doesn't work for me 😛 When I run that cell, I get this a few times:

```
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7bbbdf796630>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /simple/datasets/
```

and then finally:

```
ERROR: Could not find a version that satisfies the requirement datasets (from versions: none)
ERROR: No matching distribution found for datasets
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
/tmp/ipykernel_161/395518196.py in <cell line: 0>()
     26     print("Installed from offline packages")
     27 else:
---> 28     subprocess.check_call([
     29         sys.executable, "-m", "pip", "install", "-q",
     30         "--target", target_dir,

/usr/lib/python3.12/subprocess.py in check_call(*popenargs, **kwargs)
    411         if cmd is None:
    412             cmd = popenargs[0]
--> 413         raise CalledProcessError(retcode, cmd)
    414     return 0
    415 

CalledProcessError: Command '['/usr/bin/python3', '-m', 'pip', 'install', '-q', '--target', '/kaggle/working/packages', 'datasets', 'trl']' returned non-zero exit status 1.
```
It looks like those retry warnings are coming because of no internet, but I can't activate internet with the RTX Pro 6000 Accelerator anyway.
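Since the failure comes from the PyPI fallback running without internet, one option is to check up front and fail with a clear message instead of waiting through pip's retries. A minimal Python sketch; probing `pypi.org:443` and the function names are assumptions, not part of the original workaround:

```python
import os
import socket


def has_internet(host="pypi.org", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures ("name resolution") and timeouts
        return False


def choose_install_source(offline_dir):
    """Prefer the attached offline wheels; fall back to PyPI only if it is reachable."""
    if os.path.isdir(offline_dir):
        return "offline"
    if has_internet():
        return "pypi"
    raise RuntimeError(
        f"{offline_dir} not found and no internet access: "
        "attach the offline packages dataset to the notebook as an input."
    )
```

In the install cell, calling `choose_install_source(offline_dir)` before `pip` would surface the missing dataset immediately, which is the actual fix here: attach the offline packages dataset rather than relying on PyPI.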

verbal jetty Mar 31, 2026, 1:44 PM

#

cerulean gulch Infact it doesn't work for me 😛 When I run that cell I get this a few times: ``...

Do one thing: go to the Code section, fork a notebook and run it. If it runs without errors, write your script there. In the starting cell you will see the same code as mine; do not remove it. If you try to make a new notebook, it will not run: the Kaggle environment is not natively compatible with the RTX 6000 Pro. They are working on this; until then, you need to use those workarounds.

cerulean gulch Mar 31, 2026, 1:46 PM

#

Yes I see this notebook: Submission Demo with Blackwell Patch. Thanks, I now get why this is happening!

#

I assume things work a lot better in GPU cloud (non-Kaggle) environments?

cerulean gulch Mar 31, 2026, 2:27 PM

#

verbal jetty Do one thing go in code section fock a notebook and run that if it run without e...

Do you have a link to a specific notebook like this? I just tried one (it didn't have exactly the same code as yours but claimed to solve the compatibility issues) and it didn't work...

verbal jetty Mar 31, 2026, 2:42 PM

#

cerulean gulch Do you have a link to a specific notebook like this? I just tried one (didnt hav...

https://www.kaggle.com/code/kienngx/nvidia-nemotron-training-cot-labels
try this one

cerulean gulch Mar 31, 2026, 3:18 PM

#

Thanks @verbal jetty it looks better but I think I'll try some GPU cloud providers. This is a real pain.

cedar sentinel Mar 31, 2026, 5:27 PM

#

What problems are you guys talking about? The new environment works just fine. The notebook I am running didn't need any workarounds.

sullen shale Mar 31, 2026, 5:28 PM

#

deep tree vastai would be a nice place, but haven't tried it yet and practicing poverty by...

True. I spent many hours and days trying to use the GPU provided by Kaggle, but it kept failing, so I have decided to move to runpod.ai. So far it works well, but it's not free 😀

cerulean gulch Mar 31, 2026, 7:41 PM

#

I tried out a cloud GPU instance (Jarvislabs), but that has another set of issues. Looks like these Nemotron models on Hugging Face are a mess too. I kept getting: ImportError: cannot import name 'is_flash_attn_greater_or_equal_2_10' from 'transformers.utils.import_utils' (/home/myenv/lib/python3.12/site-packages/transformers/utils/import_utils.py

#

Now I seem to have got a forked notebook running on Kaggle. It's full of sneaky overrides, version suppressions, etc. that I'm trying to figure out now.

#

I now get why Chris Lattner says that the AI tech stack is basically a mess.

#

Looks like it's all Triton stuff that's messing things up here.

cedar sentinel Apr 1, 2026, 2:10 PM

#

Are bit manipulation and symbol equation transformation even solvable through natural language??

turbid wharf Apr 2, 2026, 9:52 AM

#

looking for a team to join in the nvidia nemotron model reasoning challenge

native knoll Apr 2, 2026, 7:05 PM

#

Did anyone experiment with RL? I was planning to try GRPO but it seems like it is not going to be feasible with the kaggle notebook constraints and it will be too expensive to host on the cloud

wheat ivy Apr 2, 2026, 10:06 PM

#

So is the absolute max amount of free rtx pro 6000 hours 30 per week?

wheat ivy Apr 2, 2026, 10:06 PM

#

native knoll Did anyone experiment with RL? I was planning to try GRPO but it seems like it i...

Haven't yet.

native knoll Apr 3, 2026, 4:12 PM

#

Yeah that is the limit

Can't do much with it except a couple of supervised runs

sullen shale Apr 3, 2026, 9:14 PM

#

Can anyone explain to me what this error is about, and how to fix it?

```
torch_dtype is deprecated! Use dtype instead!
Starting minimal model load probe...
/kaggle/usr/lib/notebooks/ryanholbrook/nvidia-utility-script/torch/compiler/__init__.py:148: FutureWarning: torch._dynamo.allow_in_graph is deprecated and will be removed in a future version. Use torch._dynamo.nonstrict_trace instead.
  return torch._dynamo.allow_in_graph(fn)
MODEL_LOAD_FAILED
ModuleNotFoundError No module named 'cutlass'
```

fast jungle Apr 4, 2026, 12:14 PM

#

I wrote this, would need help to check for correctness https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/687961

Training Nemotron-3-Nano-30B-A3B-BF16 with rank 32 LoRA on length 8192 sequences

cinder depot Apr 5, 2026, 3:00 AM

#

https://www.kaggle.com/code/kienngx/nvidia-nemotron-training-cot-labels
I ran this notebook but noticed that training happens on only a sample of 600 datapoints, compared to the original 3500 CoT + labels; maybe that is why I got a score of 0.62 on the LB. Did anyone run training on the full dataset, and how many hours did it take? It has been 12+ hours for me and training has yet to complete. Was there a corresponding jump in the LB score when training on the full data?

cedar sentinel Apr 5, 2026, 6:51 AM

#

cinder depot https://www.kaggle.com/code/kienngx/nvidia-nemotron-training-cot-labels I ran th...

I analysed the CoT and I'm shocked that a 0.72 score is possible with this dataset.
If you look carefully, there are no true reasoning traces for bit manipulation and equation transformation.

cedar sentinel Apr 5, 2026, 2:01 PM

#

Guys can someone explain what is happening instead of reacting😭

wise pasture Apr 5, 2026, 2:43 PM

#

cedar sentinel Guys can someone explain what is happening instead of reacting😭

just agreeing with you. I don't think Kaggle's notebook scores are even remotely reliable; they must have used different eval settings. And yes, I can confirm that those CoTs are bad. My own methods barely got me into the 0.70 range (leaderboard).

turbid wharf Apr 5, 2026, 4:04 PM

#

Need a teammate for this competition; must be proficient in LLMs, fine-tuning and deep generative models.

magic folio Apr 6, 2026, 3:48 PM

#

native knoll Did anyone experiment with RL? I was planning to try GRPO but it seems like it i...

With generation this slow, I don't know how GRPO could be possible; I only get about 2 tokens/s.
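For context on why generation speed matters so much here: GRPO samples a group of completions per prompt and scores each one relative to its own group, so every optimizer step needs many full generations. A minimal Python sketch of the group-relative advantage and the rollout cost; the batch sizes and completion length are hypothetical, and the 3,000 tokens/sec is the vLLM figure quoted earlier in the thread.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward minus the group mean, over the group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    # If every completion gets the same reward, all advantages are 0: no learning signal.
    return [(r - mean) / (std + eps) for r in rewards]


def rollout_seconds(n_prompts, group_size, tokens_per_completion, tokens_per_sec):
    """Generation time for one step, assuming a fixed aggregate throughput."""
    return n_prompts * group_size * tokens_per_completion / tokens_per_sec


# Binary correctness rewards for 4 sampled answers to one puzzle.
print(group_relative_advantages([1, 0, 0, 1]))

# Hypothetical step: 8 prompts x 8 samples x 1,000 tokens each.
for tps in (2, 3000):
    print(f"{tps} tok/s -> {rollout_seconds(8, 8, 1000, tps) / 3600:.2f} h per step")
```

At 2 tokens/s a single step of this hypothetical size takes close to 9 hours of generation, while at 3,000 tokens/s it takes about 20 seconds.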

honest ermine Apr 6, 2026, 9:21 PM

#

cedar sentinel Guys can someone explain what is happening instead of reacting😭

His results might be less about the data and more about his LoRA settings?

coral gull Apr 7, 2026, 2:14 AM

#

magic folio With slow generate I do not know how GRPO should be possible I only get like 2 t...

supposedly this guy got GRPO working: https://www.kaggle.com/code/johnnyhyland/nvidia-nemotron-sft-grpo-colab-faster/notebook - but he didn't run it before saving, so I have no insight into whether he is achieving reasonable speeds

cerulean gulch Apr 7, 2026, 7:27 AM

#

Can someone help me understand how prompt engineering plays a role in this competition?! The competition description mentions it a couple of times, but as I see it, we submit a LoRA adapter only. How does our prompt help in the submission?

#

We have no control over the prompt that is used at inference right?

fast jungle Apr 7, 2026, 9:43 AM

#

You prompt Claude Code

wise pasture Apr 7, 2026, 1:29 PM

#

cerulean gulch Can someone help me understand how prompt engineering plays a role in this compe...

You prompt the base model to explore its latent capabilities then adjust your training strategy.

cerulean gulch Apr 7, 2026, 1:34 PM

#

Thanks! As I expected, this prompting has nothing to do with a submission per se. I was a bit confused by the description, since it appears to put fine-tuning on par with prompting as a strategy.

fast jungle Apr 10, 2026, 3:06 AM

#

placeholder post for my 0.85 progress prize publication https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/689915

silent jolt Apr 10, 2026, 10:34 PM

#

check my bio 😁

night monolith Apr 11, 2026, 2:39 PM

#

check my bio 😁

sullen shale Apr 12, 2026, 7:46 PM

#

fast jungle placeholder post for my 0.85 progress prize publication https://www.kaggle.com/c...

Congratulations @fast jungle! I just want to know whether you used Kaggle GPU quotas for your submission or your own hardware. Your answer will be very helpful, thank you.

fast jungle Apr 13, 2026, 10:34 AM

#

sullen shale Congratulation @fast jungle , just want to know are you utilized kaggle...

I have updated my write-up, I used Tinker

sullen shale Apr 13, 2026, 8:16 PM

#

fast jungle I have updated my write-up, I used Tinker

@fast jungle, thank you for the valuable update. I am continuing on Kaggle (free) with the Kaggle submission format and one run per week, iterating slowly due to quota limits, as I don't have other options.

dark cloak Apr 14, 2026, 10:53 PM

#

I did not find the GPU I was looking for, the A100 (80GB); I only got the P100 and T4 ones. How can I run my notebook?

maiden scroll Apr 15, 2026, 2:46 AM

#

anyone with RL understanding wanna join us

#

who also has credits

#

@fast jungle join us? We're 66/2058

fast jungle Apr 15, 2026, 4:03 AM

#

i don't know RL

cerulean gulch Apr 16, 2026, 8:42 PM

#

Does training on the Kaggle RTX 6000 work for anyone at all?!?! I've wasted a week trying to get it working and it simply doesn't work. So many folks out there are struggling and posting their error messages, but there hasn't been a single response from the organisers addressing these issues.

glacial salmon Apr 16, 2026, 9:00 PM

#

maiden scroll anyone with RL understanding wanna join us

Hey can I know more about your team and what you guys are doing? And yeah I have some understanding of RL, and I am going to be implementing them.

#

Btw congratulations @fast jungle

#

I'm pretty sure a lot of the people at 86% used your work+theirs to get that, they should give you credit lol.

fast jungle Apr 16, 2026, 10:30 PM

#

by the math you should at least be able to train with micro-batch size of one - https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/687961
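A rough Python sketch of that kind of memory arithmetic, under loudly stated assumptions: BF16 base weights at 2 bytes per parameter, a nominal 30B parameter count, 96 GB on the RTX PRO 6000 Blackwell, and a hypothetical 4096 x 4096 projection for the LoRA count. It ignores activations, optimizer state and framework overhead, so it only shows what is left over for them; it is not the linked post's calculation.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in x d_out weight: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


def weights_gb(n_params, bytes_per_param=2):
    """Memory for frozen weights in decimal GB (BF16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / 1e9


GPU_GB = 96         # RTX PRO 6000 Blackwell
BASE_PARAMS = 30e9  # nominal size of Nemotron-3-Nano-30B-A3B (assumption)

base = weights_gb(BASE_PARAMS)
print(f"frozen BF16 weights: {base:.0f} GB, headroom: {GPU_GB - base:.0f} GB")
print(f"rank-32 LoRA on one 4096x4096 matrix: {lora_params(4096, 4096, 32):,} params")
```

The LoRA parameters themselves are tiny; what has to fit in the headroom is mostly the activations for long (e.g. 8192-token) sequences, which is why micro-batch size one is the natural starting point.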

valid current Apr 17, 2026, 12:20 AM

#

glacial salmon I'm pretty sure a lot of the people at 86% used your work+theirs to get that, th...

This is definitely the case. I know because I'm currently going through his post to try and understand it, lol

#

I wonder if he's still aiming for 87.7 accuracy

glacial salmon Apr 17, 2026, 12:20 AM

#

Mhm, well he did, but there were the Tinker issues.

#

I'm working on DGX Spark so hopefully I can avoid that

valid current Apr 17, 2026, 12:21 AM

#

I got some free tinker credits lying around so I'm probably gonna use that at some point

#

For now, unsloth + colab is doing it for me

glacial salmon Apr 17, 2026, 12:23 AM

#

Ah

glacial salmon Apr 17, 2026, 12:23 AM

#

valid current I got some free tinker credits lying around so I'm probably gonna use that at so...

So right now you're trying his method?

#

Are you adding anything yourself?

valid current Apr 17, 2026, 12:25 AM

#

glacial salmon Are you adding anything yourself?

Not yet. I'm using this hackathon to learn more about finetuning. Right now, I'm still working on compiling some synthetic data

glacial salmon Apr 17, 2026, 12:26 AM

#

I see.

valid current Apr 17, 2026, 12:26 AM

#

I'm saving the tinker credits for when I have a solid approach, lol

glacial salmon Apr 17, 2026, 12:27 AM

#

I see. Well some advice is this is a reasoning challenge. You want the model to solve the questions.

valid current Apr 17, 2026, 12:27 AM

#

Thanks for the advice, lol

glacial salmon Apr 17, 2026, 12:27 AM

#

If the synthetic data is bad or wrong or doesn't help the model

#

It'll die.

valid current Apr 17, 2026, 12:27 AM

#

Incredible. Thank you for this insight

glacial salmon Apr 17, 2026, 12:27 AM

#

is that sarcastic

valid current Apr 17, 2026, 12:27 AM

#

Yes, lol

#

I was assuming you were too

glacial salmon Apr 17, 2026, 12:27 AM

#

anyway my suggestion is that you should focus on generating high quality data first by solving the harder puzzles first.

glacial salmon Apr 17, 2026, 12:28 AM

#

valid current I was assuming you were too

no i didn't finish my advice

#

i like to talk in separate messages

valid current Apr 17, 2026, 12:28 AM

#

That's what I'm working on, yes

glacial salmon Apr 17, 2026, 12:28 AM

#

Yeah what are your methods? if you dont mind me asking

valid current Apr 17, 2026, 12:30 AM

#

Using models that are smarter than Nemotron to give me reasoning traces for certain puzzles. Starting with equations problems atm

#

Keep in mind, I am a complete noob at this so my approaches are also naive. My experience is mainly in training regular deep learning models and I'm trying to translate those skills into LLM finetuning

glacial salmon Apr 17, 2026, 12:31 AM

#

Like what model?

#

Nemotron 3 Super?

#

Or like an even bigger one.

valid current Apr 17, 2026, 12:33 AM

#

Nemotron 3 Super, yes. But I'm working on getting a data generation pipeline using DeepSeek

glacial salmon Apr 17, 2026, 12:33 AM

#

Yeah I tried nemotron 3 super it wasn't very good.

#

Slightly better than nano but not much at these questions.

valid current Apr 17, 2026, 12:33 AM

#

Were you able to get super running on your dgx spark?

glacial salmon Apr 17, 2026, 12:33 AM

#

Yeah

valid current Apr 17, 2026, 12:33 AM

#

I installed it on mine and it came in 3 parts, lol

glacial salmon Apr 17, 2026, 12:33 AM

#

er what

#

how?

valid current Apr 17, 2026, 12:34 AM

#

NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00001-of-00003.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00002-of-00003.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00003-of-00003.gguf

glacial salmon Apr 17, 2026, 12:35 AM

#

😭 why you using gguf

#

are you on dgx spark?

valid current Apr 17, 2026, 12:35 AM

#

Running one of the GGUFs takes up 100 GB of the 128 GB on my DGX Spark. This model is meant for multiple GPUs, I'm pretty sure.

glacial salmon Apr 17, 2026, 12:35 AM

#

no it works perfectly fine

valid current Apr 17, 2026, 12:35 AM

#

Yeah

glacial salmon Apr 17, 2026, 12:35 AM

#

you need to run nvfp4

#

but anyways trust me Super isnt the solution for this contest

valid current Apr 17, 2026, 12:36 AM

#

Yeah, that may be the case

#

Still worth collecting some data for comparison tho

glacial salmon Apr 17, 2026, 12:36 AM

#

I have a training pipeline set up and hopefully I can get to 90%, I think I might have a breakthrough on solving the puzzles, but it'll take time

glacial salmon Apr 17, 2026, 12:36 AM

#

valid current Still worth collecting some data for comparision tho

I guess

glacial salmon Apr 17, 2026, 12:36 AM

#

glacial salmon I have a training pipeline set up and hopefully I can get to 90%, I think I migh...

if it works ill be sure to share in a notebook

valid current Apr 17, 2026, 12:37 AM

#

glacial salmon if it works ill be sure to share in a notebook

I mean, if you achieve that, you should probably keep it to yourself until you've won, lmao

glacial salmon Apr 17, 2026, 12:37 AM

#

yes yes of course i meant like after i win

valid current Apr 17, 2026, 12:37 AM

#

Or is it common for people to share their winning strat in kaggle competitions?

glacial salmon Apr 17, 2026, 12:37 AM

#

no no only after they win

valid current Apr 17, 2026, 12:37 AM

#

glacial salmon no no only after they win

So Tong was either being generous or he only cared about the midpoint prize, lol

#

Either way, respect

glacial salmon Apr 17, 2026, 12:38 AM

#

yeah well it says to get a prize you need to publish a notebook

glacial salmon Apr 17, 2026, 12:38 AM

#

valid current Either way, respect

but yeah

cerulean gulch Apr 17, 2026, 10:24 PM

#

Is the Nemotron nano model that is posted on Kaggle as part of this competition any different from the one on huggingface with the same name?

valid current Apr 18, 2026, 12:43 AM

#

Idts given that the READMEs are the exact same

#

Absolutely diabolical if they were different though, lol

void notch Apr 19, 2026, 7:47 PM

#

Why is the leaderboard right now just copy-pasted versions of Tong's submission?

#

Like the top 500 is just Tong's submission being reproduced

valid current Apr 19, 2026, 8:56 PM

#

void notch Like the top 500 is just Tong's submission being reproduced

Lol, it's just become a baseline atp

#

imo, if no one can top that with a better technique, tong deserves the W

verbal jetty Apr 20, 2026, 10:34 AM

#

valid current imo, if no one can top that with a better technique, tong deserves the W

It would need more (and more expensive) computation than Kaggle provides

valid current Apr 20, 2026, 3:43 PM

#

verbal jetty It would need more (and more expensive) computation than Kaggle provides

Yeah probably

#

Anyway, someone just hit 0.87 on the leaderboard

#

Huge

glacial salmon Apr 20, 2026, 7:10 PM

#

void notch Like the top 500 is just Tong's submission being reproduced

Real

sullen shale Apr 20, 2026, 8:19 PM

#

What matters most for this competition: the model's reasoning, or the model's equations?

void notch Apr 21, 2026, 3:44 AM

#

i mean

#

the equations are the result of the reasoning

void notch Apr 21, 2026, 6:56 AM

#

my insane progress this competition: 0.66 -> 0.67 -> 0.68 -> 0.6

#

im cooked

floral fiber Apr 21, 2026, 10:48 AM

#

Hi everyone

How can I increase the speed of computation?

How do I decrease the evaluation time, and what are some steps to reduce the overall training time? I am a bit of a beginner. I know I can increase the batch size and reduce the epochs, but I want to know if there are other techniques to consider, specifically to lower the evaluation time after submission, because it is taking a lot of time and the GPUs are not available for testing experiments.

void notch Apr 21, 2026, 11:32 AM

#

void notch my insane progress this competition: 0.66 -> 0.67 -> 0.68 -> 0.6

watch me jump to 0.9

cerulean gulch Apr 22, 2026, 4:11 AM

#

Has anyone managed to get trl and vllm running for GRPO training?
I’ve been trying to get that (and also Unsloth) to work and it hasn’t really worked anywhere. Without internet on Kaggle's RTX Pro 6000 that’s super challenging anyway, but even the Unsloth notebook for Nemotron Nano doesn’t get past the first import on Google Colab.

void notch Apr 22, 2026, 6:22 AM

#

cerulean gulch Has anyone managed to get trl and vllm running for GRPO training? I’ve been try...

that is what i am going to do next week once my quota resets

#

i don't think a lot of people have tried GRPO training because of how long it will take on an RTX 6000

#

it would easily take 25 hours so there is no room for error

verbal jetty Apr 22, 2026, 1:48 PM

#

void notch it would easily take 25 hours so there is no room for error

140 hours for 1400 samples at 4048 max tokens and group size 2😁

cerulean gulch Apr 22, 2026, 4:41 PM

#

verbal jetty 140 hours for 1400 samples at 4048 max tokens and group size 2😁

Wow, that’s crazy. Did you just estimate that from some tokens/sec figure, or did you actually train that long?

verbal jetty Apr 22, 2026, 5:47 PM

#

cerulean gulch Wow that’s crazy. Did you just estimate that using some tokens/sec or did you a...

Trained for 1 hour, then got those numbers

#

KV cache is not working on the RTX 6000 Pro with the Nemotron architecture

cerulean gulch Apr 22, 2026, 5:48 PM

#

Ah ok that’s some 20 tokens per second I guess

verbal jetty Apr 22, 2026, 5:48 PM

#

cerulean gulch Ah ok that’s some 20 tokens per second I guess

Ya maybe

cerulean gulch Apr 22, 2026, 5:49 PM

#

Perhaps we could make a GRPO team and split the training costs

verbal jetty Apr 22, 2026, 5:49 PM

#

cerulean gulch Perhaps we could make a GRPO team and split the training costs

Ya, that's exactly what we are doing

cerulean gulch Apr 24, 2026, 11:48 AM

#

verbal jetty Trained for 1 hour, then got those numbers

This is with TRL library only and not with vllm right?

verbal jetty Apr 24, 2026, 1:50 PM

#

cerulean gulch This is with TRL library only and not with vllm right?

No, vLLM works perfectly. I tried it on 14k samples with 7k max tokens and it finished all of that in 7 hours.
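Taking the two runs reported above at face value, a rough upper bound on throughput can be worked out from the numbers in the chat. This sketch assumes every completion runs to the full max-token limit and ignores prompt tokens; the helper name `implied_tokens_per_second` is hypothetical, and the vLLM run is assumed to produce one completion per sample.

```python
def implied_tokens_per_second(samples: int, completions_per_sample: int,
                              max_tokens: int, hours: float) -> float:
    """Upper-bound throughput, assuming every completion hits max_tokens."""
    total_tokens = samples * completions_per_sample * max_tokens
    return total_tokens / (hours * 3600)

# TRL-only GRPO estimate: 1400 samples, group size 2, 4048 max tokens, 140 hours
trl_tps = implied_tokens_per_second(1400, 2, 4048, 140)
# vLLM run: 14k samples, 7k max tokens, 7 hours (one completion per sample assumed)
vllm_tps = implied_tokens_per_second(14000, 1, 7000, 7)

print(f"TRL only: ~{trl_tps:.1f} tokens/s")
print(f"vLLM:     ~{vllm_tps:.0f} tokens/s")
print(f"ratio:    ~{vllm_tps / trl_tps:.0f}x")
```

The TRL-only figure comes out at about 22 tokens/s, in line with the "some 20 tokens per second" guess. The two runs are not strictly comparable (one includes GRPO training steps), so the ratio is only a rough indication of how much vLLM generation helps.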

cerulean gulch Apr 24, 2026, 1:56 PM

#

verbal jetty No vllm work perfectly I tried on 14k sample with max token 7k it has done all t...

Wow, on Kaggle?! That's amazing! I am running into quantisation problems with vLLM, or OOM (without quantisation)

verbal jetty Apr 24, 2026, 4:36 PM

#

cerulean gulch Wow on Kaggle?! Thats amazing! I am running into quantisation problems with vllm...

It was without quantisation on rtx 6000 pro

#

Ran perfectly

#

With kv cache

void notch Apr 25, 2026, 11:54 PM

#

verbal jetty Ran perfectly

what score did you get

cerulean gulch Apr 26, 2026, 6:06 AM

#

verbal jetty It was without quantisation on rtx 6000 pro

So this was with trust_remote_code=False ?

verbal jetty Apr 26, 2026, 6:28 AM

#

void notch what score did you get

I didn't run it; it was too much

verbal jetty Apr 26, 2026, 6:29 AM

#

cerulean gulch So this was with trust_remote_code=False ?

For vLLM it was True; if you set it to False, the LoRA will have many missing modules

sullen shale Apr 26, 2026, 6:35 AM

#

Has anyone experimented with enable_thinking=True in the chat template during SFT training? Curious if it makes a difference vs manually wrapping reasoning in <think> tags in the assistant content.
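For the "manual wrapping" side of this comparison, a minimal sketch of building an SFT example with the reasoning wrapped in `<think>` tags inside the assistant content. The helper name `build_sft_example` is hypothetical, and the exact tag layout (newlines around the tags) is an assumption that should be checked against what the model's own chat template renders.

```python
def build_sft_example(question: str, reasoning: str, answer: str) -> list[dict]:
    """Return a chat-format example with the reasoning wrapped in <think> tags.

    The newline placement around the tags is an assumption; compare it with
    the string the model's chat template produces before training on it.
    """
    assistant_content = f"<think>\n{reasoning}\n</think>\n{answer}"
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": assistant_content},
    ]
```

One way to see whether the two approaches actually differ is to render both variants with `tokenizer.apply_chat_template(messages, tokenize=False)` (once with `enable_thinking=True` on plain content, once on the manually wrapped content) and diff the resulting strings.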

void notch Apr 26, 2026, 6:56 AM

#

sullen shale Has anyone experimented with enable_thinking=True in the chat template during SF...

i am doing that right now

#

model is 7 hours in training

cerulean gulch Apr 26, 2026, 7:06 AM

#

Here's another reasoning challenge. Spot the error (if any):

nemotron-3-nano-30B-A3B-BF16 requires transformers>=5.3.0
trl requires vllm <=0.18.0
vllm<=0.18.0 requires transformers<5.0
--> It is not possible to train nemotron-3-nano-30B-A3B-BF16 via GRPO with trl and vllm.
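The chain above boils down to two bounds on the transformers version that cannot both hold. A minimal sketch of that check, using the bounds as stated in the chat (not independently verified); the helper names are hypothetical, and the parser only handles plain numeric versions (`packaging.version` is the robust choice for real use).

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn '5.3.0' into (5, 3, 0), padding to 3 parts so '5.0' == (5, 0, 0)."""
    parts = [int(p) for p in v.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)

def satisfiable(min_inclusive: str, max_exclusive: str) -> bool:
    """True if some version v satisfies min_inclusive <= v < max_exclusive."""
    return parse_version(min_inclusive) < parse_version(max_exclusive)

# transformers bounds implied by the chain above
needed_by_model = "5.3.0"  # nemotron-3-nano-30B-A3B-BF16: transformers>=5.3.0
allowed_by_vllm = "5.0"    # vllm<=0.18.0: transformers<5.0
print(satisfiable(needed_by_model, allowed_by_vllm))  # False
```

The conclusion only holds if each of the three premises is correct for the versions involved.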

void notch Apr 26, 2026, 7:12 AM

#

what's your highest score so far?

cerulean gulch Apr 26, 2026, 12:15 PM

#

@verbal jetty did this all work on the RTX 6000 on Kaggle? Was it in colocate mode or server mode? (Up to now I have only tried colocate since it's a single GPU...)

verbal jetty Apr 26, 2026, 4:05 PM

#

cerulean gulch @verbal jetty did this all work on RTX 6000 on Kaggle? Was it in colloc...

Yaa, it all works on the RTX 6000 Pro

sullen shale Apr 26, 2026, 8:02 PM

#

void notch model is 7 hours in training

Did your training complete? How is the score?

void notch Apr 26, 2026, 8:30 PM

#

i had a memory leak and my BM solver was stripping the leading 0, so the score ended up at 0.56

#

I predict that once i fix these two minor issues my score should reach 0.9 with ease.
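Stripping a leading 0 is an easy bug to hit when answers are binary strings that get round-tripped through integers. A minimal illustration, assuming (hypothetically) that "BM" refers to bit-manipulation puzzles whose answers are fixed-width 8-bit strings; nothing here is taken from the actual solver, and the helper names are made up.

```python
def to_bits_lossy(x: int) -> str:
    # bin() drops leading zeros: bin(5) == '0b101'
    return bin(x)[2:]

def to_bits(x: int, width: int = 8) -> str:
    # Zero-pad to a fixed width so leading zeros survive
    return format(x, f"0{width}b")

answer = "00000101"
value = int(answer, 2)       # 5; parsing to int already discards the zeros
print(to_bits_lossy(value))  # 101
print(to_bits(value))        # 00000101
```

Keeping answers as strings end to end, or always re-padding to the known width, avoids the problem.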

cerulean gulch Apr 27, 2026, 3:03 PM

#

verbal jetty Yaa, it all works on the RTX 6000 Pro

I can't even install vLLM and use it on the RTX. How did you do that?

#

If I import vllm I get RuntimeError: can't find libcudart.so.13 ...

verbal jetty Apr 27, 2026, 3:21 PM

#

cerulean gulch I can't even install vLLM and use it on the RTX. How did you do that?

Fork this notebook and run:

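"""Metric for NVIDIA (129716)."""

import subprocess
import sys

# Set to False to skip the environment setup
RUN_EVALUATION = True

# Set up environment: uninstall the preinstalled torch stack, then copy the
# packages bundled with the competition metric utility script into /tmp
commands = [
    "uv pip uninstall torch torchvision torchaudio",
    "tar -cf - -C /kaggle/usr/lib/notebooks/metric/nvidia_metric_utility_script . | tar -xf - -C /tmp",
    # Make the bundled ptxas binaries used by Triton executable
    "chmod +x /tmp/triton/backends/nvidia/bin/ptxas",
    "chmod +x /tmp/triton/backends/nvidia/bin/ptxas-blackwell",
]
if RUN_EVALUATION:
    for cmd in commands:
        print(f"Running: {cmd}")
        subprocess.run(cmd, shell=True, check=True)
# Put the copied packages ahead of site-packages on the import path
sys.path.insert(0, "/tmp")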

and then you will be able to use vLLM; just remove the other cells as needed

#

https://www.kaggle.com/code/huikang/adapter-validation-notebook

cerulean gulch Apr 27, 2026, 3:45 PM

#

Ahh, very clever! Since the metric uses vLLM!!! Thanks, I will try this

verbal jetty Apr 27, 2026, 4:34 PM

#

cerulean gulch Ahh, very clever! Since the metric uses vLLM!!! Thanks, I will try this

😋😋

cerulean gulch Apr 27, 2026, 6:10 PM

#

It works! Thanks @verbal jetty

cerulean gulch Apr 27, 2026, 8:58 PM

#

Ok I spoke too soon. Vllm works but not with trl. You used trl @verbal jetty ?

zealous wave Apr 27, 2026, 9:07 PM

#

I have a question: I am participating in the NVIDIA Nemotron reasoning challenge. Can we use a publicly available dataset of CoT traces generated on the given dataset for this competition? The dataset was made public by @fast jungle.

fast jungle Apr 27, 2026, 9:11 PM

#

zealous wave I have a question I am participating in NVIDIA nemotron reasoning challenge, can...

you are free to use whatever in https://www.kaggle.com/datasets/huikang/huikang-nemotron-repository-snapshot, I do not restrict

zealous wave Apr 27, 2026, 9:12 PM

#

fast jungle you are free to use whatever in https://www.kaggle.com/datasets/huikang/huikang-...

Thank you @fast jungle

verbal jetty Apr 27, 2026, 11:48 PM

#

cerulean gulch Ok I spoke too soon. Vllm works but not with trl. You used trl @verbal jetty...

No, just plain vLLM; it worked with both. See the notebook: it already uses vLLM, just tweak the code.

prisma zealot Apr 29, 2026, 12:59 AM

#

Check my bio 😁

wise pasture Apr 30, 2026, 12:20 PM

#

I sent friend request, pls check

sage willow Apr 30, 2026, 12:40 PM

#

Hey everyone looking to team up for the Nemotron Challenge. We’re currently two members, including a professor with strong expertise in NLP and dozens of publications in this area.

My background is in LLM pre-training, post-training, and data curation.
Constraint: I’m currently limited to Colab, so I can’t realistically run serious experiments on a 30B model. We’re looking to collaborate with someone who has reliable GPU access. The professor and I will take full ownership of research direction, training strategy, data curation, pipeline design, and post-training.

If you’ve got compute and want a serious, execution-focused team, DM me.

remote urchin May 5, 2026, 7:15 AM

#

Hi, I am a bit new to this kind of LLM competition. Can someone kindly help me get started?

remote urchin May 5, 2026, 10:34 AM

#

remote urchin Hi I am bit new to this kind of LLM competition. Can some one kindly help me to ...

Following up on the question above: can I use any LLM?

sullen shale May 5, 2026, 8:30 PM

#

@fast jungle - What reasoning format are you using for equation puzzles? That category seems hardest to crack, any comments or suggestions?

fast jungle May 5, 2026, 8:54 PM

#

What you see in my writeup is what I have, I haven’t worked on this competition since

sullen shale May 6, 2026, 7:21 PM

#

fast jungle What you see in my writeup is what I have, I haven’t worked on this competition ...

Thanks @fast jungle, your notebook has been a goldmine. Totally understand, equation puzzles feel like a rabbit hole. We noticed the model seems to pick up the other 5 categories well from your traces, so maybe the juice isn't worth the squeeze there. One thing I'm experimenting with is repeating your corpus 3x in the index to get ~734 training steps instead of 244. Curious if you ever tried multi-epoch training on the same traces, and whether it helped or hurt in your experiments?
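Repeating a corpus k times multiplies the number of optimizer steps by roughly k (244 × 3 = 732, close to the ~734 quoted). A minimal sketch of building a repeated index that is reshuffled on every pass; the function names, seed, and batch size are hypothetical, not taken from the notebook.

```python
import math
import random

def repeated_index(n_examples: int, repeats: int, seed: int = 0) -> list[int]:
    """Return example indices repeated `repeats` times, shuffled within each pass."""
    rng = random.Random(seed)
    index: list[int] = []
    for _ in range(repeats):
        one_pass = list(range(n_examples))
        rng.shuffle(one_pass)  # reshuffle so each pass sees a different order
        index.extend(one_pass)
    return index

def num_steps(n_items: int, batch_size: int) -> int:
    """Optimizer steps for one sweep over n_items (last partial batch kept)."""
    return math.ceil(n_items / batch_size)

idx = repeated_index(10, repeats=3)
print(len(idx), num_steps(len(idx), batch_size=4))
```

Here `batch_size` stands for the effective batch (per-device batch times gradient accumulation). Shuffling per pass, rather than concatenating three identical copies, keeps the same example from appearing in the same position of every epoch.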

fast jungle May 6, 2026, 7:58 PM

#

the winning submission repeated the data for some problems

ideally you should make your own synthesized problems and train on those instead; you can generate as many synthesized problems as you want

tulip elm May 7, 2026, 1:56 AM

#

fast jungle the winning submission repeated the data for some problems ideally you should m...

hey, your solution and writeup are awesome. can i use your codes you published in your github repo (https://github.com/tonghuikang/nemotron) ? and congratulations on winning the open progress prize 🙂

fast jungle May 7, 2026, 6:50 AM

#

you are free to use my code, which is why it is published

tulip elm May 8, 2026, 3:55 AM

#

fast jungle you are free to use my code, which is why it is published

thank you!

celest halo May 10, 2026, 1:07 AM

#

Hi, I just joined the competition. Could anyone kindly tell me if we can use the provided NVIDIA GPU in the training process? Thank you so much!

cerulean gulch May 10, 2026, 5:43 AM

#

celest halo Hi, I just joined the competition. Could anyone kindly tell if we can use the pr...

Yes you can use the RTX Pro 6000 for training.

cerulean gulch May 11, 2026, 2:09 PM

#

Can anyone shed light on this whole trust_remote_code=True/False thing? When does it matter, and when does it not?

meager niche May 12, 2026, 4:21 PM

#

How can we reduce training time? In my case it did not finish within 12 hours 🙂

sullen shale May 13, 2026, 9:01 AM

#

meager niche How can we reduce training time? In my case it did not finish within 12 hours 🙂

What values have you set for LORA_RANK and LORA_ALPHA, if you have defined them?

meager niche May 13, 2026, 5:42 PM

#

sullen shale What values have you set for LORA_RANK and LORA_ALPHA, if you have defined them?

LoRA r=32, alpha=64

stray wedge May 14, 2026, 9:30 AM

#

Anyone struggling with Consistency who wants to learn together?
DM me.

sullen shale May 16, 2026, 6:26 PM

#

meager niche LoRA r=32, alpha=64

If you are using the Kaggle RTX GPU for reasoning, set alpha to 32 as well
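In standard LoRA the adapter update is scaled by alpha / r, so with r = 32, moving alpha from 64 to 32 halves the scale from 2.0 to 1.0. The trainable parameter count depends on r, not alpha, and changing alpha does not change the compute cost per step. A worked sketch; the 4096 × 4096 layer shape is a hypothetical example, not Nemotron's actual dimensions, and variants such as rsLoRA scale by alpha / sqrt(r) instead.

```python
def lora_scale(r: int, alpha: float) -> float:
    """Scaling applied to B @ A in standard LoRA: alpha / r."""
    return alpha / r

def lora_params(r: int, d_in: int, d_out: int) -> int:
    """Trainable parameters for one adapted linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

print(lora_scale(32, 64))           # 2.0
print(lora_scale(32, 32))           # 1.0
print(lora_params(32, 4096, 4096))  # 262144
```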

steady token May 25, 2026, 5:21 PM

#

hii guys! is it too late to join this competition??

lavish kelp May 26, 2026, 1:11 AM

#

steady token hii guys! is it too late to join this competition??

Deadline for this one is Jun 9, you have approx 13 days 👀

high gyro May 27, 2026, 10:46 PM

#

cerulean gulch Can anyone shed light on this whole `trust_remote_code= True/False` thing? When ...

If True, NVIDIA's implementation is used; if False, the transformers implementation is used (only available in versions >= 5.3.0).
For >= 5.3.0, setting it to True will cause the GRPO KV cache anomalies mentioned in discussion 690161.

cerulean gulch May 28, 2026, 6:27 AM

#

high gyro if true the nvidia implementation is used, if false the transformers impl is use...

So we should use False? When would one ever want to use True?!

shut cloud May 28, 2026, 6:32 AM

#

🚀 Hey @everyone!

We’re building PromptGram and aiming to reach 150 GitHub stars ⭐

If you like AI, FastAPI, microservices, or developer tools, please check it out and support the project by starring the repo 🙌

GitHub Repo: https://github.com/dewangsahuji/promptgram

Every star really helps and motivates us to improve the project further 💙

remote urchin May 28, 2026, 11:00 AM

#

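Guys, when I load the model I get
SafetensorError: Error while serializing: I/O error: No space left on device (os error 28)
Any fixes for this?
I was using:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = path  # `path` points to the model directory

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Load in bf16, let accelerate place layers across GPU/CPU, and write any
# weights that do not fit to disk under offload_folder
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    offload_folder="/kaggle/working/offload_folder",
)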
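Error 28 (ENOSPC) means the filesystem being written to is full. Since the snippet offloads weights to `/kaggle/working/offload_folder`, one thing worth checking is how much free space the relevant mount points have before loading. A minimal diagnostic sketch using only the standard library; the helper name `free_gb` is hypothetical and the paths checked are just examples.

```python
import shutil

def free_gb(path: str) -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3

for candidate in ("/kaggle/working", "/tmp"):
    try:
        print(f"{candidate}: {free_gb(candidate):.1f} GiB free")
    except FileNotFoundError:
        print(f"{candidate}: not present on this machine")
```

If the offload target has much less free space than the weights that need offloading, pointing `offload_folder` at a larger mount (or avoiding disk offload altogether) is the direction to look in.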
