#google-tunix-hackathon


tidal spruce
#

Hi

scenic orchid
#

Hello

weak cove
#

Hello everyone.

woven parrot
#

hello

bitter turtle
#

Hello

sand elk
#

hello all

zealous dome
#

hi

junior roost
#

hello everyone

tough dawn
#

Hi

primal flame
#

Hello

odd parcel
#

Hello

marble fiber
#

Hello

calm monolith
#

Hello all

tranquil mauve
#

Hello

#

Who wants to join to team up with me ?

honest scaffold
#

hii
anyone open to collab? looking for people who have some experience in fine-tuning LLMs!

bronze crescent
#

I need 4 new people for my hackathon team. If anyone’s interested, just DM me!

autumn terrace
#

Hello :virtual_hug:

coral patrol
#

What is this hackathon about?

potent crest
#

Hey guys!!! I'm new to hackathons and this will be my first one... Does anyone have any tips, or just some points to keep in mind for me?

jagged pagoda
#

Hello😍

tawdry iris
prisma barn
graceful merlin
#

hello everyone!! i'm new here!!! feel free to team up with me!!!

patent harbor
#

hello everyone

vast wigeon
#

Also, Hi!

verbal rapids
#

Hi everyone

#

I can't figure out what we need to do in this hackathon. The problem statement says we need to post-train the LLM using Tunix so that it not only predicts the answer but also shows how it arrived at it. But when I check the starter code, it already produces reasoning along with the answer, so what are we supposed to do? Please help me, as I'm new to this type of hackathon.

unkempt harness
#

Hi

ionic elk
golden raft
#

hi everyone! will there be any office hours this week?

vivid veldt
#

Hello everyone. Is model distillation from non Gemma teachers allowed?

ionic elk
earnest isle
#

anyone had any issues running tunix today? yesterday i did a full run, trained a gemma model, got good results like:

Model state | Accuracy
Raw baseline (no GRPO) | 45–55%
After v1 | ≈98%

but today, even re-running the same notebook, i got so many issues with tunix, lora etc. that i've been unable to train
maybe some kind of update and nothing is compatible anymore, i don't know, i'm very lost :_:

tawdry iris
earnest isle
#

i literally trained it with a notebook, then ran it again and now i always get errors

earnest isle
#

without having changed anything

#

difference was from yesterday to today

#

so i thought maybe some update or something

tawdry iris
tawdry iris
tawdry iris
earnest isle
#

i was using this grpo_demo Gemma2 2B

#

i also tried running someone else's notebook, grpo_dual-stream_tunix

#

but i got the same problems

#

like with the shards and the flax version mismatch

#

and qwix lora and tunix version

#

i tried to downgrade but some wouldn't

#

so environment issues anyway

tawdry iris
#

are you pinning your Tunix version like this '!pip install "google-tunix[prod]==0.1.3"'?
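When the same notebook works one day and breaks the next, a useful first step is to record which versions actually got installed. A minimal sketch using only the standard library; the package list is an assumption (in particular, `qwix` is assumed to be the distribution name pip reports for Qwix):

```python
from importlib import metadata

# Distribution names as pip sees them. "google-tunix" matches the pip command
# above; "qwix" is an assumed distribution name for the Qwix library.
PACKAGES = ["google-tunix", "qwix", "flax", "jax", "jaxlib"]


def installed_versions(packages=PACKAGES):
    """Return {package: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:>14}: {version or 'not installed'}")
```

Pasting this output next to an error makes it much easier for others to spot a flax/tunix/qwix mismatch.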

earnest isle
#

Yes, I pinned Tunix to 0.1.3 and that version is incompatible with QWIX LoRA and that’s why I got the recursion errors.

Tunix 0.1.3 is required for the official GRPO training flow
But 0.1.3 is currently incompatible with QWIX LoRA
So the only working solution is: use 0.1.3 but disable LoRA, but that... wasn't viable either

tawdry iris
earnest isle
#

i wasnt using it

#

i disabled it

#

that's how i succeeded yesterday as well

#

but as i said the same notebook yesterday working, today no

#

so that's weird no?

#

anyway can you link me to the starter notebook u just tried

#

and i will start again from there

#

thank you, appreciated

earnest isle
#

i only have 2 h tpu left this week lol

tawdry iris
#

Always go back to the starter notebook to isolate the problem if something weird happens

earnest isle
#

oh ok, didn't think about that

#

thank you

#

anyway just as a quick question

#

i was confused because i wasn't sure how the first training went

#

but basically baseline was like around 45% all 3 results

#

but after training i got like 55% 60% and 98% accuracy

#

with the 2b

#

which is not good, because it's supposed to be a general improvement in score, right?

#

it mostly improved the format dramatically, but not the rest

tawdry iris
#

55% accuracy isn't bad; LLMs are known to be bad at math

#

98% format accuracy is normal
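The gap between ~98% format accuracy and ~55% answer accuracy makes sense once you see that the two are measured separately: a well-formed response can still hold the wrong answer. A rough sketch of the two metrics, assuming exact string match on the answer (the starter notebook's actual checks may be more forgiving):

```python
import re

# Assumed layout: <reasoning>...</reasoning> followed by <answer>...</answer>.
FORMAT_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>", re.DOTALL)


def score(outputs, targets):
    """Return (format_accuracy, answer_accuracy) over parallel lists."""
    if not outputs:
        return 0.0, 0.0
    format_hits = answer_hits = 0
    for text, target in zip(outputs, targets):
        match = FORMAT_RE.search(text)
        if match is None:
            continue  # unparseable output is wrong on both metrics
        format_hits += 1
        if match.group(1).strip() == target.strip():
            answer_hits += 1
    n = len(outputs)
    return format_hits / n, answer_hits / n


outputs = [
    "<reasoning>10-2</reasoning><answer>8</answer>",   # right format, right answer
    "<reasoning>guess</reasoning><answer>7</answer>",  # right format, wrong answer
    "8",                                               # no tags at all
]
print(score(outputs, ["8", "8", "8"]))  # (0.666..., 0.333...)
```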

earnest isle
#

ok perfect, yeah i couldn't tell if it was abnormally good or just normal

#

thank you again, i'll treasure the tips about going back to the starter notebook!

#

have a good night!

tawdry iris
#

you can see the result I got in the starter notebook as a reference

earnest isle
#

please yeah

#

where?

earnest isle
#

Found it! Oh ok, my run wasn't bad at all then

#

Hopefully I'll do better if i can make it work again lol

earnest isle
#

@tawdry iris thank you, now it's working. damn, i spent so many hours trying to fix it; starting from scratch was way easier lol

lucid bison
#

Hello everyone

paper pilot
#

Hello guys

#

Anyone doing capstone?

warm trench
#

hey everyone, i just found out about this and am really interested in joining a team, so pls let me join your team

spice tendon
#

I have no credit or debit card, so how can I access Google Cloud?

upper geyser
#

I tried to run the baseline but it did not work

earnest isle
#

Yes i had to restart from scratch and worked, couldn't find what was wrong with mine in the end but anyway working now

upper geyser
#

I just tried this version and it showed me an error telling me that the libraries are conflicting

earnest isle
#

Tried both, but with 3.1 I ran out of hours before getting any results

#

Back on monday

#

But it was working

upper geyser
#

I tried multiple times but it did not work

earnest isle
#

Atm i can't run anything

#

But when i could training was working yes

upper geyser
#

yea I will try it again

#

just made a post on Kaggle about this problem

#

anyway, thank you for your help

earnest isle
#

As the guy said, try the basic notebook or restart from there and it works

upper geyser
#

I will try the Gemma 2 2B version also

upper geyser
earnest isle
#

Yeah when i restart it from there it worked

upper geyser
#

did you change anything ?

earnest isle
#

Otherwise i had so many libraries issues and versions

upper geyser
#

I read you said that you pinned some versions

earnest isle
#

The ones from the basic notebook are the right ones

upper geyser
#

yea, I will try it again

#

thank you so much

#

have a nice day xD

#

I ran the Gemma 2 2B version and it showed the same error again

#

@earnest isle if you run it in the future, can you please let me know whether it's successful or not 🥹

earnest isle
#

Did you run the notebook from the competition?

#

Or yours?

upper geyser
earnest isle
#

And it didn't work? Weird

upper geyser
#

yes

#

both of them

hallow totem
#

@upper geyser hey there you're here wow

upper geyser
hallow totem
tawdry iris
upper geyser
hallow totem
#

@tawdry iris I've replied to you in my "discussion" reply about the 0% accuracy issue; I'd kindly ask you to check it out

earnest isle
#

i'm a bit confused i have the tpu accelerator selected, but my notebook (training now) is only using cpu?_?

earnest isle
#

oh it just... doesn't say that it's being used.. so confusing lol, but i'm not getting errors and the tpu is active, so i suppose it's working
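The resource panel doesn't always reflect TPU usage, so asking JAX directly is more reliable. A minimal check using `jax.default_backend()` and `jax.devices()`; on a TPU session the backend should report `tpu`, while `cpu` means the accelerator isn't actually being used:

```python
def describe_backend():
    """Report which JAX backend and devices are visible, or note JAX is missing."""
    try:
        import jax
    except ImportError:
        return "jax is not installed"
    # default_backend() returns a platform name such as "cpu", "gpu" or "tpu".
    return f"backend={jax.default_backend()} devices={jax.devices()}"


print(describe_backend())
```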

tawdry iris
fallen seal
#

Hi, I have a couple of questions if someone could help:

  1. What level of evaluation is done? One of the domains mentioned is code, and code reasoning and outputs are one domain where the token count is hard to control: you can't summarize or compress them. I see on the competition page that a 1K output token length is fine (which makes sense, because a longer sequence length in training just adds runtime). So my question is: for coding, can we limit ourselves to cases with smaller token usage and expect the eval to run on those? If so, I don't want to waste time (and TPU hours) trying to train on longer sequences.

  2. The 45 points for a single session and the 15 bonus points for multiple sessions were a bit confusing. Please let me know if I got this right: for the 45 points, we need to do all the training within that session (and can't load any checkpoints saved in a previous session). Basically, all the training the Gemma model gets, from scratch, has to run in one 9h session. The 15 bonus points are for an alternate model we trained (with more time and resources, across multiple sessions); we just have to save it to Kaggle and give the name of the multi-session model at the end of the 9h notebook. These are mutually exclusive models, and the multi-session one can't be loaded as a stage in the 9h runtime notebook.

Thanks in advance!!

median slate
simple aspen
#

how are you guys running the starter NB?

spare fog
#

hey! has anyone been able to spin up a vLLM + Tunix run on Colab? if so, could you please share your rl cluster config and your pip freeze?

tawdry iris
# simple aspen how are you guys running the starter NB?

It should only take 2-3 hrs for the starter notebooks. Btw, if you are looking at the Gemma3 1B starter notebook, don't be fooled by the progress bar (which says it takes 5+ hrs), it actually finishes in half of that time.

tawdry iris
spare fog
#

Sure! Thanks 🙂

rotund wing
tawdry iris
#

I am.

tawdry iris
# fallen seal Hi..I had couple of query if someone could help with 1. What level of evaluati...
  1. Verifiable tasks (math & coding) will have much lower weights because 1) the starter notebooks already cover math, and 1B or 2B models aren't very good at math in general, especially without tools; 2) Gemma is not particularly well trained on code.

  2. Correct for the single-session mode. You can only load one of the 2 stock Gemma models via official Tunix APIs and finish the training in one go (loading other checkpoints is not allowed and will be heavily penalized)

For the multi-session run (let's also call it 'unrestricted mode'), if you choose to participate, it will be a separate model. You can resume training from your single-session ckpt for it, or do whatever you want. No restriction.

We are working on a submission template and a FAQ. Please stay tuned.

median slate
fallen seal
# tawdry iris 1. Verifiable tasks (math&coding) will have much lower weights because 1) the st...

“Verifiable tasks have much lower weights”: does this mean the evaluation of the model will put lower weight on verifiable tasks, OR did you mean the base Gemma model performs poorly on verifiable tasks and needs more of its weights tuned?

If it's the former, can you please let us know what the trained model will be evaluated on, so we can design our training around those preferences?

rotund wing
#

I have a question: do we have to fine-tune the Gemma model for only one type of reasoning, like only mathematical reasoning or only logical reasoning? Or do we have to make the model handle all types of reasoning tasks, like mathematical reasoning + logical reasoning + commonsense + ...?

tawdry iris
tawdry iris
spare fog
tawdry iris

inner thorn
#

is anyone facing hf rate limit error?

hushed flower
#

Hi @tawdry iris, I'm running into this error:
TypeError: set_metadata takes either 1 argument or 1 or more keyword arguments, got args=('sharding_names', ('fsdp', None)), kwargs={}

when trying to load lora model using:
lora_model = qwix.apply_lora_to_model(
base_model, lora_provider, **model_input
)

my lib versions are:
flax: 0.12.0
jax: 0.8.0
google-tunix: 0.1.3

Can you please suggest any possible workaround to this?

#

anyone else facing same issue?

hushed flower
warped wharf
#

Hello, my team and I are working on a submission. I have two questions:

  • Token limit enforcement: Is the "<1K output token" limit a hard stop during generation? If the model is mid-sentence and hits the limit, will the output be truncated immediately?
  • Parser Robustness: If a response is truncated due to the limit or if the model fails to generate the closing </answer> tag, how is this handled? Will the parser attempt to recover the content or will the submission automatically receive a zero score for that specific prompt due to invalid formatting?

These clarifications will help us better design our training strategy. 😄
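This thread doesn't say how the official evaluator handles a missing `</answer>`, so it's safest to train as if a truncated response scores zero. A sketch contrasting a strict extractor with a hypothetical lenient fallback, to show what each would recover from an output cut off at the token limit:

```python
import re


def extract_answer(text, lenient=False):
    """Return the answer text, or None if it can't be found.

    strict (default): requires a closing </answer>, so truncated output gives None.
    lenient: hypothetical fallback that keeps everything after <answer> when the
    closing tag is missing (e.g. generation stopped at the token limit).
    """
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if match:
        return match.group(1).strip()
    if lenient:
        start = text.find("<answer>")
        if start != -1:
            return text[start + len("<answer>"):].strip() or None
    return None


truncated = "<reasoning>10 - 2 = 8</reasoning><answer>8"
print(extract_answer(truncated))                # None
print(extract_answer(truncated, lenient=True))  # 8
```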

olive ledge
tawdry iris
tawdry iris
hushed flower
inner thorn
#

hello @tawdry iris, regarding the model generation format, is text outside <reasoning>...</reasoning><answer>...</answer> allowed?

tawdry iris
#

why do you need text beyond the closing tag?

tawdry iris
hushed flower
tawdry iris
#

Did you choose the TPU image in the right side panel?

hushed flower
#

yeah.. TPU v5e-8 session

tawdry iris
#

Can you share your notebook?

#

@all, we published a submission notebook template and a FAQ. Please take a few minutes to read through them.

hushed flower
prisma crag
#

hello everyone, i have some questions about Kaggle competitions.
Submissions to this competition must be made through Notebooks. In order for the "Submit" button to be active after a commit, the following conditions must be met:

CPU Notebook <= 9 hours run-time
GPU Notebook <= 9 hours run-time
Internet access disabled
Freely & publicly available external data is allowed, including pre-trained models
Submission file must be named submission.csv

will these be assessed based on the notebook you use to train your (pre-trained) models? what if i use another source like Colab Pro and just save the best models, then upload them to Kaggle for inference? does that count as cheating or anything?

tawdry iris
rotund wing
#

I read the submission template. Is it compulsory to use LoRA?

grave socket
tawdry iris
rotund wing
main jungle
#

hi guys, i have 4 questions regarding this hackathon:

  1. It is stated that tool use is not necessary, but do i understand correctly that it is allowed?
  2. Can I use everything that's possible within a Kaggle notebook with internet access enabled (for example Google search, LLM APIs, pretrained models as judges, pretrained models for distillation)?
  3. Does the workflow have to be User Input -> Reasoning -> Answer, or can I do something like User Input -> Planning -> Tool Calling -> Reasoning -> Tool Calling -> Answer?
  4. Can I use all Gemma 3 1B variations (like instruct or quantized models)?
zealous dome
#

@tawdry iris So I can make my own dataset for the competition as long as the results are fully reproducible within my submission? Also in the "model quality across multiple kaggle sessions" evaluation description it says we can use private data. Does that mean we don't have to reveal our training data used for a run across multiple sessions, we just have to provide the checkpoints and explain what we did?

tawdry iris
tawdry iris
hallow totem
#

also I have one more question: talking about correct-answer accuracy, what's roughly the best someone could even reach?

hallow totem
#

@tawdry iris
also I'm very confused about how exactly the code is evaluated? good format + accuracy?

but then I'm confused, coz as shown in the starter notebook, gemma 2 2b clearly outperforms gemma 3 1b with just the same default parameters, reward fns and, yk, a little training

then what if I just use gemma 2 2b? i mean, I don't exactly get what the goal is (is it just about final model performance?)

#

pls answer my questions whenever you have time
Thank you

zealous dome
#

@tawdry iris Although the quality of a reasoning block is subjective, will the answers to questions for evaluation be short and objective? For example will the model only be expected to output one word/number for the final answer between <answer></answer> tags or could it be a long sentence or paragraph? And will there always be exactly one correct answer?

hallow totem
hallow totem
#

what dataset are we supposed to use for training? is it the maths dataset in the starter notebook? like, are we just supposed to train the model on that dataset and that's it? @tawdry iris

willow stump
#

@tawdry iris for some reason I always get permission denied every time I try to load the model from my Colab Notebook. I checked, I am correctly authorized, and I accepted the license agreement. Will be grateful for help in resolving this issue.

hallow totem
# willow stump we are supposed to come up with a dataset of our own

of our own? like, am I supposed to train it on another dataset picked from kaggle? (coz where else am i even gonna get a dataset from, if I'm not wrong) but which dataset? and what domain?

also if we're supposed to come up with our own, then why's everyone else using the GSM8K math dataset?

dark forge
hallow totem
dark forge
hallow totem
#

im actually pretty prepared with the notebook and my own reward system

#

just gotta get a proper dataset now

hallow totem
vivid veldt
olive ledge
# hallow totem hey i just have one more question if you don't mind answering. as we are suppose...

yeah. formatting issues are still a problem when you do text-based reasoning, even when trained on the GSM8K dataset, so you can easily reward formatting related components, even if you don't reward having "the right answer." You could also use an LLM judge to verify the answer, but I found it too slow for GRPO with the notebooks. Some people mention in the discussion board about using the Gemini API, so that's another option for having an LLM judge as your "verifier"
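Rewarding the format components separately, as suggested above, could look like this sketch. The equal 0.25 weights are arbitrary, and the batch wrapper's `(prompts, completions, **kwargs)` signature is assumed to mirror the reward functions in the GRPO starter notebook; check it against the notebook you're using:

```python
import re


def format_reward(completion):
    """Graded format reward: 0.25 per well-formed component (hypothetical weights).

    A completion that opens <answer> but gets cut off still earns partial credit,
    which gives GRPO a smoother signal than an all-or-nothing check.
    """
    checks = [
        "<reasoning>" in completion,
        "</reasoning>" in completion,
        "<answer>" in completion,
        # Closed answer block with nothing but whitespace after it.
        re.search(r"<answer>.*?</answer>\s*$", completion, re.DOTALL) is not None,
    ]
    return sum(checks) * 0.25


def format_reward_fn(prompts, completions, **kwargs):
    """Batch wrapper; signature assumed to match the starter notebook's reward fns."""
    return [format_reward(c) for c in completions]


print(format_reward("<reasoning>10-2=8</reasoning><answer>8</answer>"))  # 1.0
print(format_reward("<reasoning>10-2=8</reasoning><answer>8"))           # 0.75
```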

hallow totem
#

i'm just using 20k rows with 1 epoch, num iterations = 4, num generations = 4, train micro batch size = 4

#

do you have any idea where i'm going wrong, if u don't mind helping me @olive ledge

rotund wing
#

Hello @tawdry iris (sorry for the ping if it disturbs you).

I had a quick clarification question. The description mentions that “evaluation will cover both the reasoning trace and the final answer.”

Should the model’s output reasoning be a concise explanation that justifies the answer, or should it include a detailed, step-by-step chain-of-thought showing the model’s internal reasoning process?

I want to make sure the model’s outputs are aligned with what judges find most useful and readable.

willow stump
#

@here is anyone else also having issues downloading the model directly from kagglehub after accepting the license agreement and authenticating?

willow stump
#

also, I will be very grateful if someone @tawdry iris or @everyone could clarify:

if the answer does not adhere to the format

<reasoning>model_thinking_trace</reasoning>
<answer>model_answer</answer>

will it get partial credit, or just 0? In other words, would the 45 points awarded for this section be computed as (0 × number_of_answers_that_do_not_comply_with_format + score × number_of_answers_that_comply_with_format) / total_number_of_questions?

rotund wing
hallow totem
#

@tawdry iris @olive ledge @dark forge can we train for more than 2 sessions?

olive ledge
hallow totem
olive ledge
hallow totem
olive ledge
# hallow totem ok so let's say I have 3 notebooks notebook for session 1 notebook for session...

They're only going to run one of your notebooks all the way through. Pick that notebook as your single-session one. In your case, probably only the notebook for session 1 would qualify. And then add a cell at the end of that notebook that specifies your model id for what you produced at the end of session 3. From the competition website regarding multi-sesh: "Participants must explicitly provide a Kaggle model name/ID at the end of the notebook as the submission for this item."

hallow totem
#

also one more doubt:
let's say I have run the whole notebook and got an output file in the output section of my notebook

after that, let's say I make some changes, maybe just in the markdown, and do Quick Save -> save output for this version

even after this, why does the output file disappear from the output section?

#

@olive ledge could u pls help me with this

olive ledge
hallow totem
#

entire notebook

hallow totem
olive ledge
hallow totem
#

yesyes at the end I'll save and run obviously coz I need checkpoints

#

thanks a lott

#

you rlly help a lot honestly

olive ledge
hallow totem
blazing needle
#

Hello, I'm a professional AI engineer with knowledge of ML & RL. Looking forward to joining a team for Tunix.

willow stump
#

what is the max allowed size for the dataset that we can create?

hallow totem
hallow totem
blazing needle
hallow totem
hallow totem
#

is anyone else facing the error

module 'pyarrow' has no attribute 'PyExtensionType'

#

@olive ledge

#

it doesn't happen when running the starter notebook; otherwise it happens, and it really depends on luck for me: sometimes it runs, sometimes it gives this issue

blazing needle
hallow totem
#

ever*

blazing needle

rotund wing
lethal galleon
#

Hi, I haven't been able to use a TPU in Kaggle. I am always at least #44 in the queue. I tried going to Colab, my code runs but it runs out of RAM.

  • Is anyone else facing this issue? What have you done?
  • If you have been #40+ in the queue, how long has it taken for you to get a TPU session? Have you gotten one?
  • Any suggestion?

Thanks a lot!
P.s. this is my first Kaggle competition, any help is appreciated.

rotund wing
lethal galleon
rotund wing
dark forge
# lethal galleon Thanks for the answer! And after that I get it for a decent amount of time? If ...

you get max 9 hrs per TPU session, but if you are inactive and nothing's running in the notebook for something like 10-15 mins, you get disconnected. Restarting your session (e.g. if you install something and need a restart) doesn't boot you off the machine, but if you 'terminate' your session it does. If you're running out of VRAM on Colab, tweaking stuff like batch sizes and max sequence lengths might help (or if you mean 'normal' RAM, you need to look at your data loading strategy)

rotund wing
hallow totem
#

if u r #44 in the queue minimum 1 hrs u gotta wait

#

best case 50 mins there's no way u getting in 15-20 mins

#

if u r 44 in the queue currently, u r lucky coz I'm always at like 70-80 or even 100+ most of the time

rotund wing
hallow totem
rotund wing
hallow totem
hallow totem
#

it just gotta be as much needed

#

u also don't want ur output to get truncated

hallow totem
#

yepp np

rotund wing
# hallow totem it gotta be a good reasoning including proper step by step chain of thought

Hello 👋, again thanks for helping me out.
But I have one question regarding this "step by step CoT":
What should the output look like?
1st:-
Question: find x in 7x+15=4x+45
<reasoning>
1. 7x+15=4x+45
2. 7x-4x=45-15
3. 3x = 30
4. x = 30/3 = 10
</reasoning>
<answer>
x = 10
</answer>

2nd:-
<reasoning>
Ok, so the equation is 7x+15=4x+45.

First I need to bring 4x to the left side and +15 to the right side. So the equation becomes 7x-4x=45-15

Now, I need to subtract them. So the equation becomes 3x=30.

Now divide 30 by 3, so the value of x will be 10

I will type the solution in sequence for the user to understand
</reasoning>
<answer>
1. 7x+15=4x+45
2. 7x-4x=45-15
3. 3x = 30
4. x = 30/3 = 10
</answer>

Which of these is an example of proper step-by-step CoT?

rotund wing
hallow totem
#

the 2nd one is basically what's called chain of thought, i.e. reasoning

rotund wing
hallow totem
#

the first one is just solving the maths step by step

hallow totem
rotund wing
rotund wing
hallow totem
#

so if the question is like "mary has 10 apples, she gave 2 to her sis, how many does she have?", the model's gotta be like

<reasoning>
The person initially had 10 apples
then she gives 2 to her sis that means she now has 2 apples less which means she now has 10-2 = 8 apples
</reasoning>

<answer>8</answer>
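With a short answer like `8`, checking correctness is mostly parsing. A sketch that takes the last number inside the answer tags, so `8`, `x = 10` or `1,200` all parse; stripping commas and the tolerance value are choices made for this example, not competition rules:

```python
import re

NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def numeric_answer_matches(completion, target, tol=1e-6):
    """Compare the last number inside <answer>...</answer> with the target."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return False
    # Drop thousands separators so "1,200" parses as 1200.
    numbers = NUMBER_RE.findall(match.group(1).replace(",", ""))
    if not numbers:
        return False
    return abs(float(numbers[-1]) - float(target)) <= tol


print(numeric_answer_matches("<reasoning>10 - 2 = 8</reasoning>\n<answer>8</answer>", "8"))  # True
```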

#

@rotund wing

hallow totem
rotund wing
hallow totem
#

npp

brittle rover
#

Is anyone still looking for a team to join? I have a cool problem. Please DM if interested

rotund wing
#

Is anyone experiencing lag in the notebook while fine-tuning the model??

I'm experiencing some bugs: when I run a cell that has to use the GPU for a high-load task, the cell stays in running mode with no output. Even the CPU, RAM, GPU and memory usage stay at 0%.

If you know any method to fix this problem, plz answer

dark forge
rotund wing
hallow totem
#

@dark forge @olive ledge hey! i was actually training gemma 3 on coding dataset but what's happening is
till 1000 smth seconds model is getting trained (i know it coz I got debugging outputs of response just to keep note of training)

after that the output response stops, but session keeps running till like 9000 secs but then session gets cancelled by itself

(im doing save and run all) does anyone have a hint about what might be going wrong?

#

(sorry for the ping btw)

quick carbon
#

I fine-tuned a Gemma model with SFT, but the LoRA (rank 64, alpha 64) makes the model give an empty response to a "hi" or "hello" prompt. Is this catastrophic forgetting? I only used a 2e-5 learning rate, ~1.5 epochs, 400 steps.

hallow totem
quick carbon
hallow totem
#

i will make the notebook public by end of the day

silk patrol
#

Is it necessary to disclose the private dataset we used for training? And does the time spent generating the dataset also count toward the original 9 hours given for a single session?

hallow totem
silk patrol
#

ohk thanks!

bright scaffold
#

can't access any gemma model through an HF token

#

is it not allowed, and we have to download it?

dark forge
dark forge
hallow totem
#

it works if i make it public on 12th jan, right?

#

man, what is this, suddenly being in the queue for 4-5 hrs now

#

earlier i was 5 hrs in the queue and then notebook failed just because i typed "checkpoints" instead of "checkpoint"
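One cheap guard against losing a whole queued run to a one-character typo is to define each path once and check it at the very top of the notebook. A sketch under stated assumptions: the directory name and its location are hypothetical (on Kaggle you'd typically put it under the working directory instead of the temp dir used here for portability):

```python
import os
import tempfile


def preflight(required_paths):
    """Raise immediately if any required path is missing, instead of hours later."""
    missing = [p for p in required_paths if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f"missing paths: {missing}")


# Define the directory name once and reuse the constant in every later cell,
# so "checkpoints" vs "checkpoint" mismatches can't happen. Hypothetical path.
CKPT_DIR = os.path.join(tempfile.gettempdir(), "checkpoint")
os.makedirs(CKPT_DIR, exist_ok=True)
preflight([CKPT_DIR])
```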

hallow totem
hallow totem
hushed flower
#

Hello @tawdry iris, is it possible to do anything about the waiting queue? Even during off-peak hours the wait is #40+, and the average position in the queue is #120+

It takes 2-5 hrs of waiting, and the session disconnects if we take 15 minutes to think about the code. I'm concerned about how to experiment with the model in this situation, especially because we've all been working on this for 1 month+.

Would highly appreciate any help

hushed flower
hushed flower
hallow totem
hallow totem
hushed flower
#

Yep same, my earlier version ran for 8 hrs for 1200 batches.. but now even 500 is failing silently

hushed flower
hallow totem
#

unless they gonna evaluate it on integration and stuff (i hope not)

dark forge
# hallow totem <@315587319006756865>

in the FAQs they say

We will be using the Gemma2 2B/Gemma3 1B modelling code in Tunix to load this model up for evaluation.
so presumably if it's loadable by tunix it's fine

dark forge
hallow totem
#

also one more issue: i get an error when i do from datasets import load_dataset, and i also get some version error.

This error occurs even when i import windmaple's notebook as-is, with no changes, into a new notebook.
But when i run the same notebook from google directly, i get no such error

#

so all my notebooks are Copy & Edit versions of windmaple's notebook where i deleted all the cells and then rewrote the code my way, which doesn't cause this error

#

there are like multiple issues with no explanation pretty confusing

dark forge
# hallow totem also one more issue i get error when i do `from load_dataset import dataset` and...

in case this helps -- this has fixed a lot of dependency headaches for me (but this assumes you are not using vllm)

%pip install python-dotenv
%pip install google-metrax
%pip install "jax-ai-stack==2025.10.28" "jax[tpu]==0.8.0"
%pip install transformers datasets huggingface_hub wandb numba omegaconf sentencepiece tqdm
%pip install --no-deps git+https://github.com/google/tunix
%pip install --no-deps git+https://github.com/google/qwix

hallow totem
dark forge
hallow totem
#

unfortunately, my model's gonna perform very poorly on the coding dataset.. im literally not able to train above 600 NUM_BATCHES

#

no idea what's wrong

dark forge
hallow totem
#

with 600 NUM_BATCHES it only hits ~33% accuracy, i guess

hallow totem
#

@dark forge hey, just one last question, pls reply if you don't mind:
what license should i use for my final notebook?

#

MIT or Gemma license?

dark forge
hushed flower
hushed flower
hallow totem
willow stump
#

does anyone else here have repeating <end_of_turn> at the end of the generation? and will it somehow disqualify the response if the format
<reasoning>reasoning</reasoning>
<answer>answer</answer>

is otherwise correct, just with a lot of repeating <end_of_turn> at the end?

@dark forge @tawdry iris @everyone @here

rotund wing
#

I have some confusion regarding the reasoning trace.

Should it be written as paragraphs or step by step?

Paragraph example:-

Alright, so first the problem is ...... 
Let me think of......

Step by step example:-

1. This problem is ......
2. I need to find out .....

Which one is better format?

dark forge
# willow stump does anyone else here have repeating <end_of_turn> at the end of the generation?...

I don't think it will 'disqualify', but you will probably lose points for model quality. This sort of repetition can be an indication that something has gone a bit wrong in your training. But it might be worth checking that you have set up your inference correctly as well (e.g. if you haven't supplied the correct stop token to your inference code, it will just make the model keep generating even when it should have stopped)
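Passing the right stop token to the sampler is the real fix. As a post-processing safety net, or just to see what the output would look like without the repeats, here is a sketch that cuts decoded text at the first stop token. `<end_of_turn>` and `<eos>` are Gemma's special-token strings; which of them your sampler actually emits in decoded text is an assumption to verify:

```python
def truncate_at_stop(text, stop_tokens=("<end_of_turn>", "<eos>")):
    """Cut decoded text at the earliest stop token, dropping any repeats after it."""
    cut = len(text)
    for token in stop_tokens:
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


raw = "<reasoning>10 - 2 = 8</reasoning>\n<answer>8</answer><end_of_turn><end_of_turn><end_of_turn>"
print(truncate_at_stop(raw))
```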

hushed flower
#

Hello @tawdry iris, for a few days I've noticed that the training/session stops without any error at a random batch number.

I trained to 1200 steps once; yesterday it randomly stopped at 500 steps, but now it's not even reaching 200. I've tested this with the same configuration as well.

I think some others are also facing same.

Can you please share if you have any idea what might be happening?

tawdry iris
#

Sorry, I was out for a couple of weeks. Coming back online to address the questions here. And thx to the folks who were here to help each other 😀. Good community spirit!

tawdry iris
tawdry iris
tawdry iris
# hushed flower Hello <@1341588806737858701> , is it possible to do anything about the waiting q...

Unfortunately there is a chip shortage everywhere, for which there is nothing we can do. I generally recommend running notebooks in the background so that you don't have to wait online. If you really need an interactive session for debugging/etc, try:

  1. load model weights in bfloat16 instead of float32 and reduce sequence length, this might fit on a Colab TPU
  2. use Gemma3 270M on free-tier Colab TPU, but only do this for debugging; 270m is not an acceptable model for eval
  3. WWYMAK has a GPU notebook (https://www.kaggle.com/code/wwymak/sft-with-qlora-and-the-jax-stack-gpus-edition#LoRA-&-QLoRA-Demo) that might help a bit. But do note that we will only use TPU for eval, so again this is for debugging
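The arithmetic behind tip 1: float32 stores each weight in 4 bytes and bfloat16 in 2, so the weights alone take half the memory. A back-of-the-envelope sketch; the parameter counts are the nominal sizes from the model names (real counts differ somewhat), and this ignores optimizer state, activations and the KV cache, which is why sequence length matters too:

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the raw weights only (no optimizer state, activations or KV cache)."""
    return num_params * bytes_per_param / 1e9


# Nominal sizes taken from the model names; actual parameter counts differ.
for name, params in [("Gemma3 1B", 1e9), ("Gemma2 2B", 2e9)]:
    fp32 = weight_memory_gb(params, 4)  # float32: 4 bytes per parameter
    bf16 = weight_memory_gb(params, 2)  # bfloat16: 2 bytes per parameter
    print(f"{name}: ~{fp32:.0f} GB in float32 vs ~{bf16:.0f} GB in bfloat16")
```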
tawdry iris
hallow totem
hushed flower
#

Yeah, exactly. I just tried again: yesterday my notebook ran to 500 batches, and today it silently stopped at 132 steps with a similar configuration.

hushed flower
#

I had a question about the correctness reward function: apart from using an LLM as a judge, how do we check correctness for domains other than math/logic?
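One way to avoid an LLM judge works when each prompt comes with a short reference answer, as in QA-style datasets. You score with normalized exact match, and add token-overlap F1 for partial credit. This is the SQuAD-style metric, used here purely as an illustration; the competition does not prescribe it. It works poorly for long free-form answers, and the helper names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    """Partial credit from overlapping tokens (SQuAD-style F1)."""
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```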

dark forge
hushed flower
#

That's interesting .. thanks for sharing this!

hushed flower
#

If the model outputs a blank line between the reasoning and the answer, like:

<reasoning>
reasoning_trace
</reasoning>

<answer>
answer
</answer>

Instead of:
<reasoning>
reasoning_trace
</reasoning>
<answer>
answer
</answer>

Would that count as invalid response?

#

Or does it all have to be on the same line, like:
<reasoning>reasoning_trace</reasoning>
<answer>answer</answer>
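Only the organizers can say whether the official evaluation accepts the blank line. If you write your own format reward, though, a whitespace-tolerant pattern treats all three layouts above the same. This is a minimal sketch. The regex is an assumption about what "valid" should mean, not the competition's parser, and `parse_response` is a hypothetical helper.

```python
import re

# Allows any whitespace (including blank lines) around and between the tag pairs.
FORMAT_RE = re.compile(
    r"^\s*<reasoning>(?P<reasoning>.*?)</reasoning>\s*"
    r"<answer>(?P<answer>.*?)</answer>\s*$",
    re.DOTALL,
)


def parse_response(text: str):
    """Return (reasoning, answer) if the response matches the tag format, else None."""
    m = FORMAT_RE.match(text)
    if m is None:
        return None
    return m.group("reasoning").strip(), m.group("answer").strip()
```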

hallow totem
hushed flower
#

😂

hallow totem
#

haha fr man

hallow totem
hushed flower
#

Gotcha

lethal galleon
#

Hi, will the models be tested using tokenizer.apply_chat_template()? In other words, should we expect the eval prompts to have tags like "<bos><start_of_turn>user...

This changes the final sampler output a lot. For instance, when I don't use <bos> (I trained with it), my model doesn't output anything. It is very sensitive to that.
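When debugging this sensitivity, it helps to build the prompt string explicitly, so that training and inference use exactly the same text. Below is a sketch of the single-turn Gemma format as I understand it: `<start_of_turn>`/`<end_of_turn>` turns, with the model turn left open. Compare it with your tokenizer's `apply_chat_template(..., add_generation_prompt=True)` output before relying on it. Also watch for a doubled `<bos>` if your tokenizer adds one automatically when encoding. `gemma_prompt` is a hypothetical helper.

```python
def gemma_prompt(user_message: str, add_bos: bool = True) -> str:
    """Build a single-turn Gemma-style prompt by hand (assumed format, verify locally)."""
    prefix = "<bos>" if add_bos else ""
    return (
        f"{prefix}<start_of_turn>user\n{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
```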

modern maple
#

Could I please have some clarification? Should we be using gemma-3-1b-it or gemma-3-1b? The provided template notebook says, "Use instruction-tuned Gemma2 2B or Gemma3 1B (other models are not allowed)", yet a previous discussion post says, "The challenge of this hackathon is not around instruction tuning." Thank you!

tawdry iris
#

Use the instruction-tuned model.

tawdry iris
rotund wing
#

Is anyone facing this error:
module 'flax.nnx' has no attribute 'ModelAndOptimizer'

I'm not asking because I want someone to give me the correct code. I copied the installation and imports directly from the official documentation (https://tunix.readthedocs.io/en/latest/_collections/examples/qlora_gemma.html), but it still results in this error. So if anyone else is facing the same error, that means I need to install different versions of these libraries.

#

Please answer as soon as possible, because I haven't even started training the model due to these errors, and only 4-5 days are left.

tawdry iris
#

This is due to a Flax NNX change from flax.nnx.Optimizer to flax.nnx.ModelAndOptimizer

#

You should upgrade Flax to 0.12+
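Before rerunning a long notebook, it's worth checking the installed Flax version. This sketch uses only the standard library. The 0.12 threshold comes from the advice above; the helper names are hypothetical.

```python
import importlib.metadata


def parse_version(v: str) -> tuple:
    """Turn '0.12.1' (or '0.12.0rc1') into a comparable tuple of leading integers."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def flax_is_new_enough(minimum=(0, 12)) -> bool:
    """True if the installed Flax meets `minimum`; False if too old or not installed."""
    try:
        installed = importlib.metadata.version("flax")
    except importlib.metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= minimum
```

If this returns False, upgrade with `pip install -U "flax>=0.12"` and restart the kernel before re-importing. A direct check afterwards is `hasattr(flax.nnx, "ModelAndOptimizer")`.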

rotund wing
#

So it means the libraries are getting updated... are jax, tunix, and qwix also getting updated?

hallow totem
# tawdry iris You should upgrade Flax to 0.12+

I've actually already submitted my unrestricted model and all the notebooks, writeups, everything. So am I supposed to go through every notebook, change the code and upgrade Flax, or should I leave it and let the judges handle it?

vivid veldt
#

@tawdry iris hello! What is the expected strategy for kagglehub authentication to download the Gemma weights? I mean, I shouldn't have to share my API key just to make the download work, lol.

tawdry iris
vivid veldt
#

So it should be called KAGGLE_API_KEY?

tawdry iris
#

I use 'KAGGLE_KEY', but 'KAGGLE_API_KEY' is fine
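A common pattern is to keep the key out of the notebook source entirely and put it in environment variables, since kagglehub reads `KAGGLE_USERNAME` and `KAGGLE_KEY` from the environment. This minimal sketch assumes the values are stored as notebook secrets under those names (or `KAGGLE_API_KEY`). It reads them through Kaggle's `kaggle_secrets.UserSecretsClient`, which only exists inside Kaggle notebooks. `ensure_kaggle_credentials` is a hypothetical helper.

```python
import os


def ensure_kaggle_credentials(key_secret_names=("KAGGLE_KEY", "KAGGLE_API_KEY")) -> bool:
    """Put KAGGLE_USERNAME / KAGGLE_KEY in the environment without hardcoding them.

    Looks in the environment first, then (inside a Kaggle notebook) in user secrets.
    Returns True if both values end up set.
    """
    def from_secrets(name):
        try:
            from kaggle_secrets import UserSecretsClient  # only available on Kaggle
            return UserSecretsClient().get_secret(name)
        except Exception:  # not on Kaggle, or the secret is missing
            return None

    if not os.environ.get("KAGGLE_USERNAME"):
        user = from_secrets("KAGGLE_USERNAME")
        if user:
            os.environ["KAGGLE_USERNAME"] = user
    if not os.environ.get("KAGGLE_KEY"):
        for name in key_secret_names:
            key = os.environ.get(name) or from_secrets(name)
            if key:
                os.environ["KAGGLE_KEY"] = key
                break
    return bool(os.environ.get("KAGGLE_USERNAME") and os.environ.get("KAGGLE_KEY"))
```

Call it once before any `kagglehub.model_download(...)` call. The key then never appears in the shared notebook.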

tawdry iris
hushed flower
#

There's no limit on max_generation_steps in inference, unlike temperature, which is 1e-4, right?

tawdry iris
#

For temp, set it to None for inference. See the latest submission template.

tawdry iris
vivid veldt
#

ok I see now this setting lives in

sampler.CacheConfig(cache_size)
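This assumes the KV cache must hold every prompt token plus every generated token, which is the usual constraint for a fixed-size cache; I haven't confirmed how Tunix's sampler handles overflow. Under that assumption, the practical generation limit is `cache_size` minus the prompt length. The sketch below is a small pre-flight check with hypothetical helper names and illustrative numbers.

```python
def max_new_tokens_that_fit(prompt_len: int, cache_size: int) -> int:
    """How many tokens can still be generated before a fixed-size KV cache is full."""
    return max(cache_size - prompt_len, 0)


def check_generation_budget(prompt_len: int, max_generation_steps: int, cache_size: int) -> None:
    """Fail early instead of truncating silently when prompt + generation exceed the cache."""
    needed = prompt_len + max_generation_steps
    if needed > cache_size:
        raise ValueError(
            f"prompt ({prompt_len}) + max_generation_steps ({max_generation_steps}) "
            f"= {needed} exceeds cache_size ({cache_size})"
        )
```

For example, a 256-token prompt with `cache_size=1024` leaves room for 768 generated tokens.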
hushed flower
#

Submissions are open till 12th Jan 11:59 PM (midnight), right?

hallow totem
#

for a clear clarification

hushed flower
#

GMT+5:30 (IST)

bright scaffold
#

Is Gemma 2B acceptable, or do we have to explicitly use gemma-2-2b?

rotund wing
hushed flower
#

Alright

rotund wing
#

@tawdry iris If I am unable to submit a working model in the end but provide a good writeup, video and datasets, will I still get some points?

quick carbon
topaz marsh
#

Is it possible to extend the deadline by one more day?

hallow totem
#

but idk, it doesn't seem like a valid argument

fathom silo
#

@tawdry iris there seems to be a lot of crunch in TPU availability; is it possible to extend the deadline by a day or two? I also noticed the number of submissions is still quite low. It would be great if the deadline could be extended.

muted mantle
#

Hey guys, I recently found out about this competition and I really want to do it, but there are only 9 hours left, so I know it's not possible. I still want to keep trying if late submissions are allowed, because I want to focus on learning. Does anybody want to do it with me?

desert minnow
hushed flower
green oar
#

Can we please extend the competition deadline by a few hours (11:59 PM PST)? I had queued a TPU run overnight but didn't know that queued runs get lower priority than interactive sessions, because Kaggle doesn't say anything about it.

#

Now I'm #198 in the queue.

rotund wing
desert minnow
# green oar now im #198 in queue

"TPUs are popular right now. You are #183 in the queue. You can wait, try connecting again later, or use another accelerator." same here

#

Looking forward to any solution, please.

rotund wing
#

The solution is only with the competition host.

rotund wing
vivid veldt
#

submissions show up under "Code" correct?

#

@tawdry iris I do not think I submitted my notebook properly.

#

I have been working on this project 6-8 hours a day since November.

#

My username on Kaggle is Echo9Zulu. Can you confirm my submission went through? I will be very upset if I lose all this time because of a silly mistake.

#

I would very much appreciate anything that eases my nerves.

austere steeple
#

Hello everyone,

I hope you are doing great. Wanted to share our work. We GRPOd Gemma3-1B-it to be a good chemistry reasoning model. We had some cool results and the model generalizes well across other domains. Feel free to check our work and share your thoughts.

Writeup: https://www.kaggle.com/competitions/google-tunix-hackathon/writeups/introducing-gemmax

Notebook: https://www.kaggle.com/code/alfaxadeyembe/gemmax-1b

Inspiration: https://www.futurehouse.org/research-announcements/ether0-a-scientific-reasoning-model-for-chemistry

hallow totem
fading jungle
#

I have been skimming the notebooks... some really interesting projects! Great work everyone.

I noticed that a few use external API keys during the 9-hour training run (e.g., calling a proprietary model for LLM-as-judge). I thought any LLM-as-judge would need to be an open-source model loaded in memory.

vivid veldt
prisma barn
#

Subject: Kaggle Google Tunix Gemma 3 Unicorn-1B Writeup: SOTA Formatting Fix + No-API Reward Functions

Just finalized our submission, Unicorn-1B.

Seeing the discussion on API keys/Judges, we managed to solve the reasoning alignment 100% On-Device (TPU v5e) without calling external APIs during the loop. We engineered a 6-Signal Composite Reward Function (RegEx/String matching) to enforce strict structure without network overhead.
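For anyone curious what an API-free composite reward can look like, here is a minimal sketch of a weighted sum of regex/string checks. The six signals and their weights are hypothetical stand-ins chosen for illustration, not the ones from the Unicorn-1B writeup.

```python
import re


def composite_reward(response: str, reference: str) -> float:
    """Hypothetical multi-signal reward built only from regex/string checks (no API calls)."""
    signals = {
        "has_reasoning": bool(re.search(r"<reasoning>.*?</reasoning>", response, re.DOTALL)),
        "has_answer": bool(re.search(r"<answer>.*?</answer>", response, re.DOTALL)),
        "one_answer_block": response.count("<answer>") == 1,
        # find() returns -1 when a tag is missing, so this is False in that case.
        "reasoning_first": 0 <= response.find("<reasoning>") < response.find("<answer>"),
        "nothing_after_answer": response.rstrip().endswith("</answer>"),
    }
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    signals["correct"] = bool(m) and m.group(1).strip() == reference.strip()

    weights = {"has_reasoning": 0.5, "has_answer": 0.5, "one_answer_block": 0.25,
               "reasoning_first": 0.25, "nothing_after_answer": 0.5, "correct": 2.0}
    return sum(weights[name] for name, ok in signals.items() if ok)
```

Weighting correctness above the format signals keeps the model from being rewarded mainly for well-formed but wrong answers.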

Also, for anyone who was hitting the 100GB RAM / XLA Compilation Graph crash when training on 1B+ tokens: we found the fix was moving from dynamic iterators to Static Dataset Objects. It drops compilation time from 60m to 8m.
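The writeup has the details of their fix. One likely underlying cause, offered here as an assumption about why a static dataset helps, is that `jax.jit` recompiles for every new input shape. Variable-length batches from an iterator can therefore each trigger their own compilation, while padding every batch to one fixed length keeps a single shape. Here is a stdlib-only sketch of that idea, with hypothetical helper names.

```python
def pad_batch(seqs, max_len: int, pad_id: int = 0):
    """Right-pad (or truncate) every sequence to exactly max_len, plus a 1/0 attention mask."""
    tokens, mask = [], []
    for s in seqs:
        s = list(s[:max_len])
        tokens.append(s + [pad_id] * (max_len - len(s)))
        mask.append([1] * len(s) + [0] * (max_len - len(s)))
    return tokens, mask


def distinct_shapes(batches):
    """Each distinct (batch, length) shape would mean a separate jax.jit compilation."""
    return {(len(b), len(b[0])) for b in batches}


raw = [[[1, 2, 3], [4, 5]], [[6], [7, 8, 9, 10]]]
dynamic = [pad_batch(b, max(len(s) for s in b))[0] for b in raw]  # pad to longest in batch
static = [pad_batch(b, 8)[0] for b in raw]                        # pad to one fixed length
```

Here `dynamic` ends up with two shapes, (2, 3) and (2, 4), while `static` has only (2, 8).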

Hope the writeup helps anyone debugging JAX OOMs!
https://www.kaggle.com/competitions/google-tunix-hackathon/writeups/novel-sft-rl-pipeline

fallen seal
hallow totem
ionic elk
#

I would love to see a part 2 of this competition. I was looking forward to contributing with an educational video but alas life decided priorities needed to be shifted elsewhere. Either way had fun and looking forward to seeing what the winners have to offer. πŸ™‚

vivid veldt
hallow totem
#

Honestly, it was pretty difficult to manage this competition,
coz I'm currently a first-year engineering bachelor's student

and hence managing college + the beautiful TPU queue on Kaggle 😂

vivid veldt
#

Oh yes. I'm a full-time data engineer, which = long nights, long weekends and even longer queues lol. Very intense project.

hallow totem
#

I'm very glad that I rushed and finished 15 days before the deadline

vivid veldt
#

Lol barely. I am a data monkey

hallow totem
#

coz I saw ppl going through hell during the last few days

vivid veldt
#

Dude, I had my first stable checkpoint in PyTorch a week and a half ago and had to port it to Tunix in like two days

#

Using a custom loss and an unsupported training strategy

hallow totem
vivid veldt
#

I believe I nuked my submission this way.

hallow totem
#

I mean, I'm not winning either, but still, just for the sake of submitting

vivid veldt
#

Waiting to hear back from @tawdry iris

#

Of course!

hallow totem
#

oohh ah sorry to hear that

vivid veldt
#

Your solution was a well engineered pipeline.

#

Many of the other solutions were heavily generated. One was an obvious prompt injection attack. Curegrpo is cool

hallow totem
#

really? i struggled a lot but haha thank youu

vivid veldt
#

Yeah man you went full send

hallow totem
vivid veldt
#

Haha I am mostly obsessed with AI and learning more.

#

I have not trained models before this competition

hallow totem
#

ooh wow

#

I've been learning about AI for the last two years

#

basically since the last year of high school

vivid veldt
hallow totem
vivid veldt
#

Are you familiar with Huggingface ecosystem? Kaggle is fantastic, but HF is where your models will be most visible to the community of people who may be interested to try your 1b.

#

Plus

vivid veldt
#

At some point the evals you made, plus your model, may be useful to further evaluate shadow tokens vs. the base model

hallow totem
#

lemme share you my

#

github

#

wait

vivid veldt
#

Interestingly shadow tokens were highest in basemodel completions where the model explains itself after answering.

vivid veldt
#

For conversion between formats I used deepwiki to help. Tunix APIs have changed a few times since November.

hallow totem
vivid veldt
#

Maybe sit down before looking up

#

You may faint

hallow totem
#

no way, now I'm interested 😂

vivid veldt
hallow totem
#

lemme see

vivid veldt
#

Oh it added my github to link? Unintentional lol

hallow totem
#

repo not indexed

vivid veldt
hallow totem
#

wait a sec

#

the OpenArc

#

is ur original repo

#

?

vivid veldt
#

Yes

#

Haha

hallow totem
#

no way

#

273 stars

#

no way you're popular

vivid veldt
#

No way you happen to have arc gpu?

#

Or intel cpu

hallow totem
#

intel cpu ampere gpu

#

rtx 3050

#

man damn, I thought the OpenArc on ur profile was just a forked repo

#

woww

#

have you ever done CUDA programming?

#

wait lemme upvote ur writeup i liked it

#

done

vivid veldt
hallow totem
#

there's no unstructured phase, r u talking abt phase 2?

vivid veldt
hallow totem
#

but following the normal reasoning tags and answer tags structure

#

I mainly had to remove it because phase 2 was based on general and creative reasoning. For those tasks, following the CURE structure made the outputs too long, so they were getting truncated, and the model was collapsing because of the reward penalty.

#

But the model is well trained on the CURE structure, so if the judges find a workaround and evaluate with a higher output-token limit, it will follow the CURE structure well even on the general tasks. (I ran inference earlier and it did this perfectly, except that the output sometimes got truncated because of the higher token count.)

tawdry iris
vivid veldt
# tawdry iris I cannot find your notebook in the writeup

I added it to the "Code" page on the main Tunix competition page, which I guess was incorrect. When I went to hit "Submit to Competition", it wouldn't complete because there was no output file. I guess the notebook I developed in was missing something competition-specific, which was lost when I tried to import it.

#

Discord isn't allowing me to send an image. On the main page under "Code" if you search "Echo" it comes up

#

Is there some way to appeal? I have been working on this project almost every day since November.

tawdry iris
#

Can you add your notebook link as a comment in your writeup?

hallow totem
#

hello @tawdry iris could you please give us a rough idea of when we can expect the results to be announced?

tawdry iris
#

It depends on the bandwidth of the judges, so it's hard to say. I think at least March.

hallow totem
#

@tawdry iris sorry but were we supposed to add our PR in our writeup for our submission to be valid?

either way, I opened a PR on December 2 2025, on an issue i found during the time I was working on this competition

I think you might remember the issue, it was that I was getting 0% format accuracy as well as answer accuracy for both pre training and post training evaluation.

I figured out the issue later and hence opened a PR for that particular issue

here's the link to the PR: https://github.com/google/tunix/pull/820

Also, I hope it's fine if I add this PR in the comments of my writeup, right?

here's my writeup link as well:
https://www.kaggle.com/competitions/google-tunix-hackathon/writeups/new-writeup-1766255740138

tawdry iris
#

That is fine.

hallow totem
#

thank you

hallow totem
#

@tawdry iris is it possible for you to let me know why my PR hasn't been reviewed?

From what I found, the batch slicing isn't done as expected when the train micro batch size is smaller than the number of generations.
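Whether this is the bug in question is for the maintainers to judge in the PR review, but the general concern is easy to show in isolation. In GRPO, advantages are normalized within each prompt's group of num_generations completions. If a micro-batch smaller than the group is treated as if it were a group, those statistics come out wrong. This stdlib sketch assumes a flat, prompt-major layout of rollouts. It is not Tunix's implementation, and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, num_generations: int, eps: float = 1e-6):
    """GRPO-style advantages: normalize each reward within its own prompt group.

    `rewards` is flat: [prompt0_gen0, ..., prompt0_gen{G-1}, prompt1_gen0, ...].
    """
    assert len(rewards) % num_generations == 0, "rewards must contain whole groups"
    out = []
    for start in range(0, len(rewards), num_generations):
        group = rewards[start:start + num_generations]
        mu, sigma = mean(group), pstdev(group)
        out.extend((r - mu) / (sigma + eps) for r in group)
    return out


def micro_batches(items, micro_batch_size: int):
    """Slice already-computed per-sample values into micro-batches for the update step."""
    return [items[i:i + micro_batch_size] for i in range(0, len(items), micro_batch_size)]


rewards = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]  # 2 prompts x 4 generations
adv = group_advantages(rewards, num_generations=4)   # correct: full groups first
wrong = [x for mb in micro_batches(rewards, 2) for x in group_advantages(mb, 2)]
```

Computing advantages on whole groups first and only then slicing into micro-batches keeps the result independent of the micro batch size. In the `wrong` variant, the second prompt's advantages collapse to zero.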

eternal hill
#

Hello. Can you please let us know if the winners' announcement for the Google Tunix hackathon has been postponed? Would we get an official update if the results have been postponed?

rotund wing
eternal hill
#

@tawdry iris Thank you. 😁

astral cairn
eternal hill
hallow totem
#

any updates on when results will be out? @tawdry iris

hushed flower
#

@tawdry iris any updates when can we expect the results?