#google-tunix-hackathon
Hello
Hello everyone.
hello
Hello
hello all
hi
hello everyone
Hi
Hello
Hello
Hello
Hello all
hii
anyone open to collab? looking for people who have some experience in fine tuning llm's!
I need 4 new people for my hackathon team. If anyone's interested, just DM me!
Hello
What is this hackathon about?
Hey guys!!! I am new to hackathons and this would be my first one... Does anyone have any tips or points I should keep in mind?
Hello
You can try to join a team so that others can help you get started.
There are also public starter notebooks you can use to get an idea of the technical concepts and then expand on for your write-up etc.
hello everyone!! i'm new here!!! welcome to team up with me!!!
hello everyone
Did I miss something? When I tried to download the dataset here: https://www.kaggle.com/competitions/google-tunix-hackathon/data it was empty.
Also, Hi!
Hi everyone
I can't figure out what we need to do in this hackathon. The problem statement says we need to post-train the LLM using Tunix so that it not only predicts the answer but also shows how the answer was worked out. But when I check the starter code, it already produces reasoning along with the answer, so what are we supposed to do? Please help me, as I am new to this type of hackathon.
Hi
Hi, I tried to click on the gemma starter notebook but it 404'ed. Can someone on the admin team make an update to the correct link? https://github.com/google/tunix/blob/main/examples/grpo_demo.ipynb
hi everyone! will there be any office hours this week?
Hello everyone. Is model distillation from non Gemma teachers allowed?
I found some other example notebooks, just wondering if the demo one was different.
anyone had any issue running tunix today? yesterday i did a full run, trained a gemma model, got good results like:
Model state | Accuracy
Raw baseline (no GRPO) | 45-55%
After v1 | ~98%
but today even re-running the same notebook i got so many issues with tunix, lora etc that i've been unable to do training
some update of sort maybe and nothing is compatible anymore i don't know, i'm very lost :_:
what error did you get? Are you using the right TPU image?
i literally trained it with a notebook, then run it again and now always get errors
Use the starter notebooks here: https://www.kaggle.com/competitions/google-tunix-hackathon/data
without having changed anything
difference was from yesterday to today
so i thought maybe some update or something
That starter notebook is only for math reasoning. If you ask a non-math question, it will fail. The goal of this competition is to train a general reasoning model, not just for math.
There is no data provided. You have to come up with your own data.
Can you be specific about the error(s) you are getting? Do the starter notebooks work for you?
i was using this grpo_demo Gemma2 2B
i tried to run this one as well from someone else grpo_dual-stream_tunix
but i got the same problems
like with the shards and the flax version mismatch
and qwix lora and tunix version
i tried to downgrade but some wouldn't
so environment issues anyway
are you pinning your Tunix version like this '!pip install "google-tunix[prod]==0.1.3"'?
Yes, I pinned Tunix to 0.1.3 and that version is incompatible with QWIX LoRA and that's why I got the recursion errors.
Tunix 0.1.3 is required for the official GRPO training flow
But 0.1.3 is currently incompatible with QWIX LoRA
So the only working solution is: use 0.1.3 but disable LoRA, but that... wasn't viable either
I just ran the Gemma3 1B starter notebook and it is fine (you have to get rid of wandb code since there is a known bug https://github.com/wandb/wandb/issues/10872).
i wasnt using it
i disabled it
thats how i succeded yesterday as well
but as i said the same notebook yesterday working, today no
so that's weird no?
anyway can you link me to the starter notebook u just tried
and i will start again from there
thank you, appreciated
i only have 2 h tpu left this week lol
Always go back to the starter notebook to isolate the problem if something weird happens
oh ok didnt think about that
thank you
anyway just as a quick question
i was confused because i wasnt sure of how the first training went
but basically baseline was like around 45% all 3 results
but after training i got like 55% 60% and 98% accuracy
with the 2b
which is not good because is supposed to be a general enhancement of score right?
mostly improved dramatically the format but not the rest
55% accuracy isn't bad; LLMs are known to be bad at math
98% format accuracy is normal
ok perfect, yeah i couldn't understand if was abnormally good or just normal
thank you again, i'll treasure the tips about going back to the starter notebook!
have a good night!
you can see the result I got in the starter notebook as a reference
Found! Oh ok my run wasnt bad at all then
Hopefully I'll do better if i can make it work again lol
@tawdry iris thank you now is working, damn i spent so many hours trying to fix it, start from scratch was way easier lol
Hello everyone
hey everyone, i just found out about this, and am really interested in joining a team. so pls let me in your team
I have no credit or debit card, so how i can access Google cloud
heyy, can u run your notebook for now ?
I tried to run the baseline but it did not work
Yes i had to restart from scratch and worked, couldn't find what was wrong with mine in the end but anyway working now
oh wow, did you try the Gemma 3 1B version ?
I just tried this version and it showed me an error telling me that the libraries are conflicting
Tried both but 3.1 i run out of hours before any results
Back on monday
But it was working
can you run this cell:
# Policy model
lora_policy = get_lora_model(ref_model, mesh=mesh)
# nnx.display(lora_policy)
I tried multiple times but it did not work
yea I will try it again
just posted a post in Kaggle about this problem
anyway, thank you for your help
As the guy said, try the basic notebook or restart from there and it works
I will try the Gemma 2 2B version also
yup, I tried the notebook from the Data part in the competition
Yeah when i restart it from there it worked
did you change anything ?
Otherwise i had so many libraries issues and versions
I read you said that you pinned some versions
The one from the basic notebook are the right ones
yea, I will try it again
thank you so much
have a nice day xD
I ran the Gemma 2 2B version and it showed the same error again
@earnest isle if you run it in the future, can you please let me know if it is successful or not
I ran the one from the competition
And it didnt work? Weird
hey guys i wanted help with
https://www.kaggle.com/competitions/google-tunix-hackathon/discussion/638533
please check it out, would rlly be appreciated
@upper geyser hey there you're here wow
yea hello man
hey hello! you're a researcher? wow
You do not need to use Google Cloud. Kaggle already offers free TPUs.
oh yea :>, trying to be actually
@tawdry iris I've replied to you on my "discussion " reply for the 0% accuracy issue I kindly request check it out
i'm a bit confused i have the tpu accelerator selected, but my notebook (training now) is only using cpu?_?
oh it just... doesn't say that is being used.. so confusing lol, but i'm not getting error and tpu is active so i suppose it is working
This is a known Kaggle TPU limitation. If you are using CPU image incorrectly, it won't even run at all. There will be some error.
Hi, I had a couple of questions, if someone could help:
1. What level of evaluation is done? One of the domains mentioned is code, and code reasoning and output is a domain where the token count is hard to control; you can't summarize or compress it. The competition page says a 1K output token length is fine (which makes sense, because longer sequences in training just add runtime). So for coding, can we limit ourselves to cases with smaller token usage and expect the eval to run on those? If so, I don't want to waste time (and TPU hours) trying to train on longer sequences.
2. The 45 points for single session and the 15 bonus points for multi-session were a bit confusing. Please let me know if I got this right: for the 45 single-session points, we need to do all the training within that session (and can't load any checkpoints we saved in a previous session). Basically, all the training the Gemma model gets from scratch has to run in one 9h session. The 15 bonus points are for alternate models we trained (with more time and resources across multiple sessions); we just have to save that model to Kaggle and share its name at the end of the 9h notebook. These are mutually exclusive models, and the multi-session one can't be used as a stage load in the 9h notebook.
Thanks in advance!!
Looking for LLM Enthusiasts!
A new hackathon is live, and we are building a team to participate. If you have experience with LLMs or are interested in learning, join us to collaborate, share ideas, and work on exciting projects.
Hackathon:
Google Tunix Hack - Train a model to show its work
https://www.kaggle.com/competitions/google-tunix-hackathon/overview
My Kaggle Profile:
https://www.kaggle.com/muhammaddanyalmalik
but then how should we run it? takes 6 hours to just train, let alone test on a CPU!
how are you guys running the starter NB?
hey! has anyone been able to spin up a vLLM + Tunix run on Colab? if so, could you please share your rl cluster config and your pip freeze?
It should only take 2-3 hrs for the starter notebooks. Btw, if you are looking at the Gemma3 1B starter notebook, don't be fooled by the progress bar (which says it takes 5+ hrs), it actually finishes in half of that time.
Can you post this question on https://github.com/google/tunix? Our engineers can give you some guidance there.
Sure! Thanks
You are a judge for this competition, right?
I am.
1. Verifiable tasks (math & coding) will have much lower weights because 1) the starter notebooks already cover math, and 1B or 2B models aren't very good at math in general, especially without tools; 2) Gemma is not particularly well trained on code.
2. Correct for the single-session mode. You can only load one of the 2 stock Gemma models via official Tunix APIs and finish the training in one go (loading other checkpoints is not allowed and will be heavily penalized).
For the multi-session run (let's also call it 'unrestricted mode'), if you choose to participate, it will be a separate model. You can resume training from your single-session ckpt for it, or do whatever you want. No restriction.
We are working on a submission template and a FAQ. Please stay tuned.
"Verifiable tasks have much lower weights": does this mean the model will be evaluated with a lower weight on verifiable tasks, OR did you mean the base Gemma model performs poorly on verifiable tasks and needs more of its weights tuned?
If it's the former, can you please let us know what the trained model will be evaluated on, so we can design our training to match those preferences?
I have a question, do we have to fine-tune the gemma model only for 1 task reasoning? Like only mathematical reasoning or logical reasoning? Or we have to make the model do all type of reasoning tasks like mathematical reasoning + logical reasoning + Commonsense +.....?
The final evaluation dataset we use will cover a range of domains, not just a single domain.
It just means the final evaluation dataset will have fewer questions from verifiable domains (math+coding).
Ok thanks
A quick question on this: does multi-session mode then automatically render more points? I see they're an optional 15 additional points in the rubric
No. There is a model quality threshold; models below that won't get any point.
is anyone facing hf rate limit error?
Hi @tawdry iris , Im running into error:
TypeError: set_metadata takes either 1 argument or 1 or more keyword arguments, got args=('sharding_names', ('fsdp', None)), kwargs={}
when trying to load lora model using:
lora_model = qwix.apply_lora_to_model(
base_model, lora_provider, **model_input
)
my lib versions are:
flax: 0.12.0
jax: 0.8.0
google-tunix: 0.1.3
Can you please suggest any possible workaround to this?
anyone else facing same issue?
yes i am today, till yesterday it was fine: WARNING:datasets.utils.file_utils:Got disconnected from remote data host.
Hello, me and my team are working on a submission, I have two questions:
- Token limit enforcement: Is the "<1K output token" limit a hard stop during generation? If the model is mid-sentence and hits the limit, will the output be truncated immediately?
- Parser Robustness: If a response is truncated due to the limit or if the model fails to generate the closing </answer> tag, how is this handled? Will the parser attempt to recover the content or will the submission automatically receive a zero score for that specific prompt due to invalid formatting?
These clarifications will help us better design our training strategy.
I'm not sure if this is the exact error I got initially, but I have a fully working Save & Run w/o errors. I had to change rollback to earlier library versions: https://www.kaggle.com/code/philipkd/full-save-run-of-grpo-demo-checkpointing
Hmm, this usually happens when you use Flax 0.12.1. Can you double check?
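For anyone double-checking, a minimal sketch for printing what is actually installed in the session, so the output can be compared against a starter notebook that runs. The helper name `installed_versions` is made up for illustration, and the package list is simply the three libraries quoted above, not a set of known-good pins.

```python
from importlib import metadata

# Libraries named in this thread; extend the list as needed.
PACKAGES = ["google-tunix", "flax", "jax"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version}")
```

Running the same cell in a working copy of the starter notebook and in the failing notebook makes a version mismatch easy to spot.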
1. You can go beyond 1K. It's not a requirement at all. We put it there only because longer output sequences require more compute, which I'm sure a lot of people feel they don't have enough of.
2. We won't truncate it. The entire sequence will be used for eval.
Hello @tawdry iris , I'm using 0.12 version :/
hello @tawdry iris, regarding the model generation format, is text outside <reasoning>...</reasoning><answer>...</answer> allowed?
why do you need text beyond the closing tag?
Can you run one of the starter notebooks and see if it works? Then you can compare yours w/ the starter notebook
just did "copy & edit" on the gemma3 notebook and that ran, but when I copy-pasted the same code manually into a new notebook, that didn't seem to work.. im not sure what's the issue exactly
Did you choose the TPU image in the right side panel?
yeah.. TPU v5e-8 session
Can you share your notebook?
@all, we published a submission notebook template and a FAQ. Please take a few minutes to read through them.
yes sure, let me DM you
hello everyone, i have some questions about competitions in kaggle,
Submissions to this competition must be made through Notebooks. In order for the "Submit" button to be active after a commit, the following conditions must be met:
CPU Notebook <= 9 hours run-time
GPU Notebook <= 9 hours run-time
Internet access disabled
Freely & publicly available external data is allowed, including pre-trained models
Submission file must be named submission.csv
will these be assessed based on the notebook you use to train your (pre-trained) models? what if i use another source like colab pro and just save the best models, then upload them to kaggle for inference, does that count as cheating or anything?
Please read the FAQ(https://www.kaggle.com/competitions/google-tunix-hackathon/discussion/651560) very carefully. For the single-session mode, your notebook needs to run on Kaggle; if we cannot reproduce your model on Kaggle, you get 0 points (out of 45). For the unrestricted mode, we don't really care how you train the model.
I read the Submission Template, is it compulsory to use LoRA?
Can i get some feedback on this? https://www.kaggle.com/competitions/google-tunix-hackathon/discussion/653942
Technically you don't have to. But what else are you going to use?
Answered
Ok thx. I have some other ideas that I believe will work properly.
thanks a lot
hi guys, i have 4 questions regarding this hackathon:
- It is stated that tool use is not necessary, but do i understand correctly that it is allowed?
- Can I use everything thats possible within a kaggle notebook with internet access enabled? (for example google search, llm apis, pretrained models as judge, pretrained models for distillation)?
- Has the workflow to be User Input -> Reasoning -> Answer or can I do something like User Input -> Planning -> Tool Calling -> Reasoning -> Tool Calling -> Answer?
- Can I use all Gemma 3 1B variations (like instruct or quantized models)?
@tawdry iris So I can make my own dataset for the competition as long as the results are fully reproducible within my submission? Also in the "model quality across multiple kaggle sessions" evaluation description it says we can use private data. Does that mean we don't have to reveal our training data used for a run across multiple sessions, we just have to provide the checkpoints and explain what we did?
Yes, it's kind of expected that you come up with your own training data. And correct, in the multi-session (or unrestricted) mode, we don't really care how you train the model. Just provide the checkpoint and describe what you did at a high level.
- Correct. Tool use will not be part of our evaluation though.
- Correct. Just make sure we can reproduce your results.
- Technically you can. But 'planning' or 'tool call' output might interfere with our evaluation system, so I do not recommend making things complicated unless you really have to.
- Definitely instruct model. Quantized model might work, but there will be work needed.
also I have one more question, talking abt the correct answer accuracy what's the best someone could even reach roughly?
@tawdry iris
also I'm very confused abt how exactly is the code evaluated? good format + accuracy?
but then I'm confused coz gemma 2 2b, as shown in the starter notebook, clearly outperforms gemma 3 1b with just the same default parameters, reward fns and a little training
then wt if I just use gemma 2 2b i mean I don't exactly get what's the goal (just abt final model performance)
pls answer my questions whenever you have time
Thank you
@tawdry iris Although the quality of a reasoning block is subjective, will the answers to questions for evaluation be short and objective? For example will the model only be expected to output one word/number for the final answer between <answer></answer> tags or could it be a long sentence or paragraph? And will there always be exactly one correct answer?
only final answer between answer tags
and yeah there will always be exactly one question
what dataset are we supposed to use for training is it the maths dataset in the starter notebook? like are we just supposed to train the model on that dataset and that's it? @tawdry iris
we are supposed to come up with a dataset of our own
@tawdry iris for some reason I always get permission denied every time I try to load the model from my Colab Notebook. I checked, I am correctly authorized, and I accepted the license agreement. Will be grateful for help in resolving this issue.
of our own? like am I supposed to train it on an another dataset picked from kaggle? (coz where else am i even gonna get a dataset from, if I'm not wrong) but which dataset? and what domain?
also if we're supposed to come up with our own then why's everyone else using the math 8k dataset
there's lots on the huggingface hub, or you can generate your own, or mix and match. The main point with a custom dataset is that for your model to generalise to multiple reasoning domains it's very unlikely that math 8k is going to be sufficient
is there any dataset which you are aware of and don't mind recommending?
you will want to do your own research -- a good starting point I find is https://asta.allen.ai/ and searching for recent papers on llm reasoning
alright thanks a lot for the guidance
im actually pretty prepared with the notebook and my own reward system
just gotta get a proper dataset now
hey i just have one more question if you don't mind answering.
as we are supposed to use a dataset covering reasoning from multiple domains, there just won't be a single numeric or single word answer anymore provided by the model right and might be a text prolly 1-2 lines in the final answer tag
Olmo trace is also really awesome
yeah, I expect so
yeah. formatting issues are still a problem when you do text-based reasoning, even when trained on the GSM8K dataset, so you can easily reward formatting related components, even if you don't reward having "the right answer." You could also use an LLM judge to verify the answer, but I found it too slow for GRPO with the notebooks. Some people mention in the discussion board about using the Gemini API, so that's another option for having an LLM judge as your "verifier"
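A rough sketch of what "rewarding formatting related components" could look like, using the tag format discussed in this channel. The partial-credit weights are arbitrary, and the batch wrapper's `(prompts, completions, **kwargs)` signature is an assumption about how a GRPO trainer calls reward functions, so check it against the starter notebook before reusing it.

```python
import re

TAGS = ("<reasoning>", "</reasoning>", "<answer>", "</answer>")

# One reasoning block followed by one answer block, with nothing else around them.
FULL_FORMAT = re.compile(
    r"\s*<reasoning>.+?</reasoning>\s*<answer>.+?</answer>\s*",
    re.DOTALL,
)

def format_reward(completion: str) -> float:
    """Return 1.0 for a fully well-formed completion, else partial credit up to 0.5."""
    each_tag_once = all(completion.count(tag) == 1 for tag in TAGS)
    if each_tag_once and FULL_FORMAT.fullmatch(completion):
        return 1.0
    # Partial credit: 0.125 for every tag that appears exactly once.
    return 0.125 * sum(completion.count(tag) == 1 for tag in TAGS)

def format_rewards(prompts, completions, **kwargs):
    """Batch wrapper (assumed signature): one reward per completion."""
    return [format_reward(c) for c in completions]

print(format_rewards(
    ["q1", "q2", "q3"],
    ["<reasoning>10-2=8</reasoning><answer>8</answer>", "<reasoning>10-2=8</reasoning> 8", "8"],
))
```

This only scores structure, so it works on datasets without a single checkable answer; a correctness reward or an LLM judge would be a separate function.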
i prepared my own dataset but the trainer isnt running at all it's just loading and loading
i just am using 20k rows with 1 epoch num iteration as 4 num generation as 4 train micro batch size as 4
do you have any idea where im going wrong, if u dont mind helping me @olive ledge
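A back-of-the-envelope sketch of how much work those settings imply, which may explain why the trainer looks stuck. It assumes one policy update per micro-batch per iteration and no gradient accumulation; how Tunix actually counts steps is not confirmed here, and `grpo_workload` is a made-up helper.

```python
import math

def grpo_workload(num_rows, num_epochs, micro_batch_size, num_generations, num_iterations):
    """Estimate prompt batches, sampled completions, and policy updates for a GRPO run."""
    prompt_batches = math.ceil(num_rows / micro_batch_size) * num_epochs
    rollouts = num_rows * num_epochs * num_generations
    policy_updates = prompt_batches * num_iterations
    return {
        "prompt_batches": prompt_batches,
        "rollouts": rollouts,
        "policy_updates": policy_updates,
    }

# Settings quoted above: 20k rows, 1 epoch, micro batch 4, 4 generations, 4 iterations.
print(grpo_workload(20_000, 1, 4, 4, 4))
# {'prompt_batches': 5000, 'rollouts': 80000, 'policy_updates': 20000}
```

Under these assumptions that is 80,000 generated completions, so trying a few hundred rows first is a cheap way to confirm the loop runs at all.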
Hello @tawdry iris (sorry for the ping if it disturbs you).
I had a quick clarification question. The description mentions that "evaluation will cover both the reasoning trace and the final answer."
Should the model's output reasoning be a concise explanation that justifies the answer, or should it include a detailed, step-by-step chain-of-thought showing the model's internal reasoning process?
I want to make sure the model's outputs are aligned with what judges find most useful and readable.
@here is anyone else also having issues downloading the model directly from kagglehub after accepting the license agreement and authenticating?
also, I will be very grateful if someone @tawdry iris or @everyone could clarify:
if the answer does not adhere to the format
<reasoning>model_thinking_trace</reasoning>
<answer>model_answer</answer>
will it get a partial credit, or just 0 , and thus in the 45 points that are awarded for this section will contribute as (0* number_of_answers_that_do_not_comply_with_format + score*number_of_answers_that_comply_with_format)/ total_number_of_questions ?
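The averaging described in this question can be sketched as below. It only illustrates the asker's hypothesis (answers that break the format count as 0); it is not the judges' confirmed scoring, and `extract_answer` and `hypothesized_average` are made-up names.

```python
import re

# Captures the text between the answer tags when a reasoning block precedes it.
ANSWER_RE = re.compile(
    r"<reasoning>.+?</reasoning>\s*<answer>(.+?)</answer>",
    re.DOTALL,
)

def extract_answer(output: str):
    """Return the stripped answer text, or None if the output does not follow the format."""
    match = ANSWER_RE.search(output)
    return match.group(1).strip() if match else None

def hypothesized_average(outputs, scores):
    """Average per-question scores, counting non-compliant outputs as 0."""
    total = 0.0
    for output, score in zip(outputs, scores):
        if extract_answer(output) is not None:
            total += score
    return total / len(outputs)

outputs = ["<reasoning>10-2=8</reasoning>\n<answer>8</answer>", "The answer is 8"]
print(hypothesized_average(outputs, [1.0, 1.0]))  # 0.5 under this hypothesis
```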
I think that you should follow the given format.
@tawdry iris @olive ledge @dark forge can we train for more than 2 sessions?
yes, but it only counts for the multi-session bonus points, not the single-session one.
if I'm doing multi session then I'll be getting
45 (single session) + 15 (multi session) points right?
yeah, as long as you have one notebook that goes from vanilla Gemma (2 or 3) to something useful, that's your single session. then you add a line at the end with your multi-session trained model ID for the bonus points.
ok so let's say I have
3 notebooks
notebook for session 1
notebook for session 2
notebook for session 3
so then I gotta upload the model on kaggle and add model id at the end of session 3 notebook??
They're only going to run one of your notebooks all the way through. Pick that notebook as your single-session one. In your case, probably only the notebook for session 1 would qualify. And then add a cell at the end of that notebook that specifies your model id for what you produced at the end of session 3. From the competition website regarding multi-sesh: "Participants must explicitly provide a Kaggle model name/ID at the end of the notebook as the submission for this item."
ooh so I can just provide model id at the end of all session notebooks then that works perfectly right?
also one more doubt
let's say I have run the whole notebook and got an output file in the output section of my notebook
now after getting that, let's say I make some changes, maybe in the markdown, and do quick save -> save an output for this version
why does the output file disappear from the output section after this?
@olive ledge could u pls help me with this
that's just the way kaggle works. quick saves don't save the output files. but those output files are stored in the old versions. my pattern is to click the "..." next to the output and click "create new dataset," and then use that in another notebook. There may be a better way, though.
should I be bothered abt those staying in output section or should I not be bothered coz judges would re run the code
entire note bhot
notebook*
yeah yeah that's wt i do but I was only worried coz it disappears from the output section
if you're 100% confident your changes don't affect your code (like markdown), then yeah, quick save is fine. but the way to be 100% sure is that your final version is a S&R (save and run)
yesyes at the end I'll save and run obviously coz I need checkpoints
thanks a lott
you rlly help a lot honestly
sure thing. thanks for sharing your CUREgrpo notebook so early in the comp.!
ay u checked it out? wow I'm glad thank you! im still actively working on it
Hello, I am AI Engineer Professional with knowledge of ML & RL Looking forward to join any team for Tunix.
what is the max allowed size for the dataset that we can create?
I don't think there's any max size, just that you gotta complete training inside the session time, which is ≤ 9 hrs
isn't it a bit too late for finding a team now? ig?
I know, but I can either regret it or try to join a team; those are the only options I have, so I chose the second one.
not rlly the only option, you still got 15 days lock urself in and start grinding alone you can do it alone as well don't need a team
is anyone else facing the error
module pyarrow has no attribute PyExtensionType
@olive ledge
it doesn't happen when running the starter notebook; otherwise it really depends on luck for me: sometimes it runs, sometimes it gives this error
one of best pieces of advice someone has given me. I will. Thank you.
npp honestly i never even felt the need of a team eve
ever*
because u r already capable of that but okay I understood your meaning to do it alone.
Can someone please answer this question? Idk why windmaple_87628 is not coming online.
Hi, I haven't been able to use a TPU in Kaggle. I am always at least #44 in the queue. I tried going to Colab, my code runs but it runs out of RAM.
- Is anyone else facing this issue? What have you done?
- If you have been at #40 or higher, how long has it taken for you to get a TPU session? Have you gotten one?
- Any suggestion?
Thanks a lot!
P.s. this is my first Kaggle competition, any help is appreciated.
Yes, I also got the same thing when I started using TPU. it will take you around 15-20 minutes
Thanks for the answer! And after that I get it for a decent amount of time? If I have to restart the kernel for some reason (around the same time), will I have to wait in the queue again?
Maybe you will have to wait again.
you get max 9hrs per tpu session, but if you are inactive and nothing's running on the notebook for something like 10/15mins you get disconnected. Restarting your session (e.g. if you install something and need a restart) doesn't boot you off the machine, but if you 'terminate' your session it does. If you're running out of VRAM on colab, tweaking stuff like batch sizes and max sequence lengths might help (or if you mean 'normal' RAM you need to look at your data loading strategy)
Can someone please answer this question ASAP.
wt r u on man?
no way
if u r #44 in the queue minimum 1 hrs u gotta wait
best case 50 mins there's no way u getting in 15-20 mins
if u r 44 in the queue currently u r lucky coz I always be in like 70-80 or even 100+ most of the time
I was on #34 and it took me about 20 min to run the TPU. IDK if it took you 1 hr.
it gotta be a good reasoning including proper step by step chain of thought
Ok, thanks
NAH man I'm always 70-80 or even 100+ in the queue takes me 2 hrs or smth
just don't push the model to generate too much extra
it just gotta be as much needed
u also don't want ur output to get truncated
ok, thx
yepp np
Hello, again thanks for helping me out.
But I have one question regarding this "step by step CoT":
What should the output look like?
1st:-
Question: find x in 7x+15=4x+45
<reasoning>
- 7x+15=4x+45
- 7x-4x=45-15
- 3x = 30
- x = 30/3 = 10
</reasoning>
<answer>
X = 10
</answer>
2nd:-
<reasoning>
Ok, so the equation is 7x+15=4x+45.
First I need to bring 4x to left side and +15 to right side. So the equation becomes 7x-4x=45-15
Now, I need to subtract them. So the equation becomes 3x=30.
Now divide 30 by 3 so the value of x will be 10
I will type the solution in sequence for the user to understand
</reasoning>
<answer>
- 7x+15=4x+45
- 7x-4x=45-15
- 3x = 30
- x = 30/3 = 10
</answer>
Which of these is an example of proper step-by-step CoT?
I am asking this because it is written "train to show its work", and I asked this same question to different AIs. All are giving their own theory: some say the 1st one is better because it is clear, some say the 2nd one is better because it is written in a good format. And I am confused about what I should keep, steps or CoT.
the reasoning should be like 2nd one but the answer tag gotta be like:
<answer>10</answer>
2nd reading is what's called chain of thought basically and reasoning
Only the answer in answer tag?
the first one just solving maths step by step
yep
I also thought the same. But just to confirm I asked that
Ok thx again
so if question is like mary has 10 apples she gave 2 to her sis how many she has so model gotta be like
<reasoning>
The person initially had 10 apples
then she gives 2 to her sis that means she now has 2 apples less which means she now has 10-2 = 8 apples
</reasoning>
<answer>8</answer>
@rotund wing
npp
Thanks! I understood that
npp
Is anyone still looking for a team to join? I have a cool problem. Please DM if interested
Is anyone experiencing lag in the notebook while fine-tuning the model??
I am experiencing some bugs: when I run a cell that has to use the GPU for a high-load task, the cell stays in running mode with no output. Even the CPU, RAM, GPU and memory usage remain at 0%.
If you know any method to fix this problem, plz answer
I find that adding
import nest_asyncio
nest_asyncio.apply()
helps (for the more generic cases), but if it's very GPU specific, you need to turn off CUDA async mode (set CUDA_LAUNCH_BLOCKING=1 in env vars). If it's TPU specific, I have no idea...
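A minimal sketch of the environment-variable part, assuming it goes in the very first notebook cell. The variable has to be set before the GPU framework is imported, because it is read when the CUDA runtime starts up; whether it helps with this particular hang is not confirmed.

```python
import os

# Set this before importing any GPU framework in the notebook.
# With blocking launches, a failing kernel surfaces at the call that caused it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Blocking launches slow training down, so this is a debugging setting rather than something to leave on for a full run.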
I will try it. Thanks
@dark forge @olive ledge hey! i was actually training gemma 3 on coding dataset but what's happening is
till 1000 smth seconds model is getting trained (i know it coz I got debugging outputs of response just to keep note of training)
after that the output response stops, but session keeps running till like 9000 secs but then session gets cancelled by itself
(im doing save and run all) does anyone got a hint here that what might be going wrong?
(sorry for the ping btw)
I fine tuned a Gemma model with sft but the lora(rank64,alpha64) makes the model answer an empty response on "hi" or "hello" prompt, is this a catastrophic forgetting? I only use 2e-5 learning rate, ~1.5 epoch, 400 steps.
Hello, please mind checking out this question posted by me in Discussion.
It's regarding Google Tunix Hack
Thank you
https://www.kaggle.com/competitions/google-tunix-hackathon/discussiobn/665386nbb
i think it's a good idea to share the notebook codes tho, someone probably knows what's the problem
actually lowering the NUM_BATCHES worked without losing accuracy so it's fine now
i will make the notebook public by end of the day
is it necessary to disclose our private dataset used for training. And does the generation time for the dataset also count in the original 9 hrs of given time for a single session?
as far as I know, we don't have to disclose our private dataset (feel free to correct me if I'm wrong)
and about dataset generation, the 9 hours only counts actual notebook runtime.
you can just generate your dataset offline or outside kaggle and then save the file, import it as dataset in /kaggle/input and use it in ur notebook
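A minimal sketch of locating an attached dataset from inside the notebook. The helper `find_data_files` and the list of file suffixes are assumptions for illustration; only the `/kaggle/input` mount point comes from the message above.

```python
from pathlib import Path

def find_data_files(root="/kaggle/input", suffixes=(".jsonl", ".csv", ".parquet")):
    """List data files under the mounted input directory, sorted for a stable order."""
    base = Path(root)
    if not base.exists():  # e.g. when running outside Kaggle
        return []
    return sorted(p for p in base.rglob("*") if p.is_file() and p.suffix in suffixes)

for path in find_data_files():
    print(path)
```

Sorting the paths keeps the load order identical between runs, which matters when the notebook has to be reproduced by someone else.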
ohk thanks!
cant access any gemma model through HF token
is it not allowed and we have to download it ?
you still need to make the dataset public though, even if it's just as a dataset in your public notebook-- in the FAQs
(45 pts) ...We will be running your notebook in a single TPU session (9hrs) and reproduce the model before sending it over to eval. If you use private data or tools that we cannot access, our reproduction training cannot finish and you get 0 pt.
make sure you got the correct read permissions on your HF token and that you have agreed to the gemma T&Cs on huggingface
ooh thanks for pointing that out
works if i public it on 12th jan right
man wt is this suddenly now being in the queue for 4-5 hrs
earlier i was 5 hrs in the queue and then notebook failed just because i typed "checkpoints" instead of "checkpoint"
Hello, could somebody please mind checking out my doubt in discussion regarding Google Tunix Hack
Thank you
https://www.kaggle.com/competitions/google-tunix-hackathon/discussion/665685
sorry but wt about unrestricted mode (multi-session) are we supposed to public our dataset for that version too? or only final model matters?
Hello @tawdry iris , is it possible to do anything about the waiting queue? Even during off-peak timings waiting is #40+ and the average waiting queue is #120+
It takes 2-5 hrs of waiting and the session gets discontinued if we take 15 minutes to think about code.. I'm concerned about how to experiment with the model given this situation, especially because we've all been working for 1 month+..
Would highly appreciate any help
Hey are you sure the <answer> tag should only include one numeric answer?
Since this is general/open-ended fine-tuning for any domain, I thought it could contain text/paras too
+1
The training stops at step 350 for example, but notebook will keep running
This only started happening recently.. anyone else facing this?
That was just about math domain.
Talking abt other domains such as general reasoning, it can definitely contain final answer text / para between <answer> </answer>
earlier i fixed this by reducing my NUM_BATCHES to 800 but when i re-ran my notebook yesterday, it stopped working even at 800 and now just working with 600
Yep same, my earlier version ran for 8 hrs for 1200 batches.. but now even 500 is failing silently
I think even for math it would work if it had text + numeric answer?
idk what's the issue
well yeah but i'd rather let it be only numeric answer for math, seems cleaner to me
unless they gonna evaluate it on integration and stuff (i hope not)
@olive ledge
@dark forge
in the FAQs they say
We will be using the Gemma2 2B/Gemma3 1B modelling code in Tunix to load this model up for evaluation.
so presumably if it's loadable by tunix it's fine
if you open the console/terminal does it say anything? My guess is either there's some silent error that's not fed back to the notebook (or jax got stuck in some async process...). You might also want to make sure that your dataset is deterministic -- it could be some weird row in the data that's causing issues, which shuffling is hiding
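One way to make the data order deterministic, as a minimal sketch: shuffle with a fixed seed and log which rows go into each batch, so the last logged line before a stall points at the rows to inspect. The function `deterministic_batches` is hypothetical and independent of the Tunix data pipeline.

```python
import random

def deterministic_batches(rows, batch_size, seed=0):
    """Yield (batch_index, row_indices, batch) in an order fixed by `seed`."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # local RNG, so global random state is untouched
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield start // batch_size, idx, [rows[i] for i in idx]

rows = [f"example-{i}" for i in range(10)]
for step, idx, batch in deterministic_batches(rows, batch_size=4, seed=42):
    print(step, idx)  # log this before each training step to see where a run stalls
```

If the run always stops on the same batch index with the same seed, the data is the likely cause; if the stopping point moves around, it probably is not.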
okay thank you
no no i verify that this is an actual issue and there's no issue with dataset or anything.
some days back I ran on 800 NUM_BATCHES, and when I went to run the same code and same dataset with no changes again on 800 batches, it didn't run; it ran only when I reduced to 600 batches
also one more issue: I get an error when I do from datasets import load_dataset, and I also get some version error.
This error occurs even when i import windmaple's notebook as it is no changes made in a new notebook.
But when i run the same google's notebook i get no error such that
because of this, all my notebooks are copy-edits of windmaple's notebook: I deleted all the cells and then added my own code, which doesn't cause this error
there are like multiple issues with no explanation pretty confusing
in case this helps -- this has fixed a lot of dependency headaches for me (but this assumes you are not using vllm)
%pip install python-dotenv
%pip install google-metrax
%pip install "jax-ai-stack==2025.10.28" "jax[tpu]==0.8.0"
%pip install transformers datasets huggingface_hub wandb numba omegaconf sentencepiece tqdm
%pip install --no-deps git+https://github.com/google/tunix
%pip install --no-deps git+https://github.com/google/qwix
wow thx a lot but imma now pass on this coz im already done with running all my notebook and everything. I don't think it's an issue right if the notebook r copy edit of windmaple's coz the code reward fns etc are of my own and most of the things are different from initial notebook apart from core notebook stuff
I guess the main thing is to make sure that the notebook can run all the way through without random crashes due to dependency errors coming from installing tunix/jax stack (since the kaggle notebook preinstalled packages can change under you unless you have pinned the env)
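A minimal sketch of capturing the versions that worked, so they can be pasted back into the install cell as pins. The helper `pin_lines` is hypothetical, and the package names in the example call are assumptions; substitute whatever your notebook actually installs.

```python
from importlib import metadata

def pin_lines(packages):
    """Return 'name==version' for installed packages and a comment for missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name} is not installed")
    return lines

# Example package list (an assumption, adjust to your own stack).
print("\n".join(pin_lines(["jax", "flax", "transformers", "datasets"])))
```

Run this at the end of a session that trained successfully and keep the output with the notebook.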
okk alright makes sense thank you!
unfortunately, my model gonna perform very poorly on coding dataset..im literally not able to train above 600 NUM_BATCHES
no idea what's wrong
it doesn't affect too much (again from faqs)
One thing we could mention is that verifiable tasks (math&coding) will have much lower weights because 1) the starter notebooks already cover math and 1B or 2B models aren't very good with math in general, especially without tools 2) Gemma is not particularly well trained with code.
yeah that's why i let it be but yk sad part is just with 800 NUM_BATCHES i was able to push the code accuracy from 16% (base model) to 66.75%
with 600 num batches it just hits ig 33% accuracy
@dark forge hey just one last question, please mind replying to it
what license should i keep of my final notebook
MIT or Gemma license?
probably MIT (and just state that your models are distributed under the gemma license)
okayy thank you
Thanks for this! It could be possibly the dataset, I'll have to see how to isolate this thing
Yep +1111
Can you please share if you figure out the issue? Now it's not even going above 400 batches, but one time it ran till 500.. now I'm in #200+ waiting queue
about queue, you can't do anything you gotta be in it.
about batches, i gave up on it coz it's clearly not an issue of the code or dataset so i don't think there's much you or me could do on this.
i gave up as my model was trained perfectly on other domains and on coding anyways the weightage of coding is less so i chose to not bother much
does anyone else here have repeating <end_of_turn> at the end of the generation? and will it somehow disqualify if the format <reasoning>reasoning </reasoning>
<answer>answer</answer>
Is otherwise correct, just a lot of repeating <end_of_turn> in the end?
@dark forge @tawdry iris @everyone @here
The model format should be:-
<reasoning>
REASONING
</reasoning>
<answer>
ANSWER
</answer>
And this is only written in https://www.kaggle.com/competitions/google-tunix-hackathon/overview/evaluation
I am confused regarding the reasoning trace.
Should it contain Paragraphs or Step by step?
Paragraph example:-
Alright, so first the problem is ......
Let me think of......
Step by step example:-
1. This problem is ......
2. I need to find out .....
Which one is better format?
I don't think it will 'disqualify', but you will probably lose points for model quality. This sort of repeating can be an indication that something has gone a bit wrong in your training. But it might be worth checking that you have set up your inference correctly as well (e.g. if you haven't supplied the correct stop token to your inference code it will just make the model keep generating even when it should have stopped)
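To see whether the output is fine apart from the trailing tokens, a minimal sketch that cuts the decoded text at the first stop marker. The helper `trim_at_stop` is hypothetical and works on strings only; it is a diagnostic, and it does not replace passing the right stop token to the sampler.

```python
STOP = "<end_of_turn>"

def trim_at_stop(text, stop=STOP):
    """Keep everything before the first stop marker; return text unchanged if absent."""
    head, _, _ = text.partition(stop)
    return head.rstrip()

raw = "<reasoning>r</reasoning>\n<answer>a</answer><end_of_turn><end_of_turn><end_of_turn>"
print(trim_at_stop(raw))
```

If the trimmed text is well formed, the inference setup (missing stop token) is the more likely culprit than the training itself.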
Hello @tawdry iris , since a few days I've noticed the training/session stops without any error at random batch number.
I've trained till 1200 steps once, yesterday randomly it finished till 500 steps but now it's not going till even 200, I've tested this on same configuration as well.
I think some others are also facing same.
Can you please share if you have any idea what might be happening?
thank you!
Sorry, I was out for a couple of weeks. Coming back online to address the questions here. And thx to the folks who were here to help each other. Good community spirit!
Either is fine. Both are reasonable reasoning traces as far as I can tell. There is no format requirement on the reasoning trace/answer text, as long as they are coherent.
The trainer will terminate when EITHER the data iterator runs out of data OR it reaches the max step#. I'm guessing you set a small max step# in your notebook.
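The rule described here can be sketched with a toy loop. This is not the Tunix trainer, just an illustration of "whichever limit is hit first": the number of steps actually run is the smaller of the batch count and the max step setting.

```python
def run_training(batches, max_steps):
    """Toy loop: stop when the data runs out OR max_steps is reached, whichever is first."""
    steps = 0
    for _batch in batches:
        if steps >= max_steps:
            break
        steps += 1  # a real trainer would do a gradient update here
    return steps

print(run_training(range(600), max_steps=800))   # data runs out first
print(run_training(range(1200), max_steps=800))  # max_steps is hit first
```

Under this rule a fixed configuration always stops at the same step, so it does not by itself explain a run that stops at a different step each day.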
Unfortunately there is a chip shortage everywhere, for which there is nothing we can do. I generally recommend running notebooks in the background so that you don't have to wait online. If you really need an interactive session for debugging/etc, try:
- load model weights in bfloat16 instead of float32 and reduce sequence length, this might fit on a Colab TPU
- use Gemma3 270M on free-tier Colab TPU, but only do this for debugging; 270m is not an acceptable model for eval
- WWYMAK has a GPU notebook (https://www.kaggle.com/code/wwymak/sft-with-qlora-and-the-jax-stack-gpus-edition#LoRA-&-QLoRA-Demo) that might help a bit. But do note that we will only use TPU for eval, so again this is for debugging
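A rough back-of-the-envelope for the first suggestion, counting weights only. The parameter counts below are round nominal figures (an assumption, not the exact Gemma sizes), and activations, optimizer state and KV cache are ignored.

```python
def weight_gib(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

# Nominal sizes for illustration only.
for label, n in [("1B (nominal)", 1_000_000_000), ("2B (nominal)", 2_000_000_000)]:
    print(label,
          f"float32: {weight_gib(n, 4):.2f} GiB,",
          f"bfloat16: {weight_gib(n, 2):.2f} GiB")
```

bfloat16 stores each weight in 2 bytes instead of 4, so the weight footprint is halved.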
Ideally the output wrapped in the <reasoning> and </reasoning> tag will be the step-by-step CoT, similar to how humans work out the questions.
no this is definitely not the case if this was the case then it wouldn't happen that today the notebook ran 800 batches and then the next day the same notebook is not running at all for more than 600 batches.
Yeah Exactly, I just tried again, yesterday my notebook ran till 500 batches, today it silently stopped at 132 steps on similar configuration
I had a question regarding correctness reward function.. apart from using llm as judge, how do we check for correctness for domains other than math/logic?
I have two ideas that I want to test at some point: using semantic similarity (ie embeddings), or you can try models that are trained specifically for this task e.g. https://huggingface.co/TIGER-Lab/general-verifier (ofc, you can also argue that the 2nd options is also a llm as a judge...)
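A minimal sketch of the similarity idea. To stay dependency-free it uses bag-of-words cosine similarity as a stand-in for real sentence embeddings, which is a much cruder signal; the function `similarity_reward` is hypothetical, and an embedding model would replace `bow` in practice.

```python
import math
import re
from collections import Counter

def bow(text):
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity_reward(prediction, reference):
    """Reward in [0, 1]: higher when the prediction shares more wording with the reference."""
    return cosine(bow(prediction), bow(reference))

print(similarity_reward("Paris is the capital of France", "The capital of France is Paris"))
```

Word overlap rewards paraphrase poorly and can be gamed by copying reference vocabulary, which is one reason to move to embeddings or a verifier model.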
That's interesting .. thanks for sharing this!
If the model outputs new line between reasoning and answer like:
<reasoning>
reasoning_trace
</reasoning>

<answer>
answer
</answer>
Instead of:
<reasoning>
reasoning_trace
</reasoning>
<answer>
answer
</answer>
Would that count as invalid response?
Or does it have to be all in same line like:
<reasoning>reasoning_trace</reasoning>
<answer>answer</answer>
you definitely gotta stop overthinking
haha fr man
chill it's not counted as an invalid response
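If you write your own format reward, a whitespace-tolerant check avoids penalising these layout differences. A minimal sketch follows; the pattern and the name `has_valid_layout` are assumptions, not the judges' actual check.

```python
import re

# Hypothetical lenient pattern: any whitespace is allowed around and between the two blocks.
LENIENT = re.compile(
    r"\s*<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*", re.DOTALL
)

def has_valid_layout(text):
    """True when the whole text is one reasoning block followed by one answer block."""
    return LENIENT.fullmatch(text) is not None

variants = [
    "<reasoning>\nr\n</reasoning>\n\n<answer>\na\n</answer>",  # blank line between blocks
    "<reasoning>\nr\n</reasoning>\n<answer>\na\n</answer>",    # single newline
    "<reasoning>r</reasoning>\n<answer>a</answer>",            # tags on one line each
]
print([has_valid_layout(v) for v in variants])  # prints [True, True, True]
```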
Gotcha
THANKS DUDEE
Hi, will the models be tested using tokenizer.apply_chat_template()? Meaning we should expect the evals to have tags like "<bos><start_of_turn>user...
This changes the final sampler output a lot. For instance, when I don't use <bos> (I trained with it), my model doesn't output anything. It is very sensitive to that.
Could I please have some clarification? Should we be using gemma-3-1b-it or gemma-3-1b? The provided template notebook reads, "Use instruction-tuned Gemma2 2B or Gemma3 1B (other models are not allowed)", yet a previous discussion post reads, "The challenge of this hackathon is not around instruction tuning." Thank you!
use instruct-tuned model
We are not using HF lib for eval.
is anyone facing this error:
module 'flax.nnx' has no attribute 'ModelAndOptimizer'
I'm asking this not because I want the correct code: I copied the installation and imports directly from the official documentation (https://tunix.readthedocs.io/en/latest/_collections/examples/qlora_gemma.html), but it still results in this error. So if anyone else is facing the same error, that means I need to install other versions of these libraries
please answer as soon as possible because I haven't even started training the model because of these errors. And only 4-5 days are left.
This is due to a Flax NNX change from flax.nnx.Optimizer to flax.nnx.ModelAndOptimizer
You should upgrade Flax to 0.12+
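A minimal sketch of checking the installed Flax version at the top of a notebook, so the problem shows up as a clear message instead of an AttributeError later. The helpers `version_tuple` and `flax_is_recent` are hypothetical; the 0.12 threshold is taken from the reply above.

```python
import re
from importlib import metadata

def version_tuple(version):
    """Turn '0.12.1' into (0, 12, 1), using only the leading digits of each component."""
    parts = []
    for piece in version.split(".")[:3]:
        m = re.match(r"\d+", piece)
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)

def flax_is_recent(minimum=(0, 12)):
    """True if Flax is installed and at least `minimum`; False otherwise."""
    try:
        installed = metadata.version("flax")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= minimum

print("flax >= 0.12:", flax_is_recent())
```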
so it means that libraries are getting updated... are jax, tunix and qwix also getting updated?
ive actually already submitted my unrestricted model and all the notebooks, writeups everything so now am i supposed to go to every notebook and change the code and upgrade flax or should I let it be and it would be handled by the judges?
Maybe use pip cell logs to pin versions?
@tawdry iris hello! What is the expected strategy for kagglehub authentication to download gemma weights? As in, I shouldn't share an api key to ensure download lol.
DO NOT share your key. We will use our own.
So it should be called KAGGLE_API_KEY?
I use 'KAGGLE_KEY', but 'KAGGLE_API_KEY' is fine
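A minimal sketch of reading the key from the environment rather than writing it into the notebook. The helper `kaggle_key_from_env` is hypothetical; the two variable names are the ones mentioned above, and how the key is then handed to the download library is left out because it is not specified here.

```python
import os

def kaggle_key_from_env(names=("KAGGLE_KEY", "KAGGLE_API_KEY")):
    """Return the first non-empty key found in the environment; never hard-code it."""
    for name in names:
        value = os.environ.get(name)
        if value:
            return value
    raise RuntimeError("Set one of: " + ", ".join(names))

# Usage (raises RuntimeError if neither variable is set):
# key = kaggle_key_from_env()
```

Because the key only ever lives in the environment, whoever re-runs the notebook can supply their own without touching the code.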
It's prob. a good idea to pin your lib versions in the notebook. You do not need to do a full run of it. Just do a quick save.
There's no limit on max_generation_steps in inference, unlike for temperature - which is 1e-4, right?
For temp, set it None for inference. See the latest submission template.
Correct. No restriction on max_generation_steps.
hey! have you noticed a cap on max_generation_steps?
ok I see now this setting lives in
sampler.CacheConfig(cache_size)
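A minimal sketch of the budget check this implies, under the assumption that the cache has to hold the prompt tokens plus every generated token. The function `fits_in_cache` is hypothetical and is not a Tunix API; check the sampler source for the exact rule.

```python
def fits_in_cache(prompt_tokens, max_generation_steps, cache_size):
    """Assumed rule: prompt plus generated tokens must not exceed the cache size."""
    return prompt_tokens + max_generation_steps <= cache_size

print(fits_in_cache(prompt_tokens=256, max_generation_steps=768, cache_size=1024))
print(fits_in_cache(prompt_tokens=256, max_generation_steps=1024, cache_size=1024))
```

So raising max_generation_steps may also require raising cache_size, which in turn costs memory.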
Submissions are open till 12th Jan 11.59 PM (midnight) right?
mention your timezone
for a clear clarification
GMT+5:30 (IST)
Using gemma 2b is acceptable or we have to explicitly use 2-2b ?
Yes, in UTC
In IST that's Jan 13, 5:29 AM
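The conversion can be checked in a couple of lines. IST is UTC+5:30, so 11:59 PM UTC lands at 5:29 AM the next day. The year 2026 is an assumption; it is not stated in this thread.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")  # fixed offset, no DST in India

# Year is an assumption for illustration.
deadline_utc = datetime(2026, 1, 12, 23, 59, tzinfo=timezone.utc)
deadline_ist = deadline_utc.astimezone(IST)
print(deadline_ist.strftime("%Y-%m-%d %H:%M %Z"))  # prints 2026-01-13 05:29 IST
```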
Alright
@tawdry iris If I am unable to submit the working model at last, but give a good writeup, video and datasets so will I get some points?
It is either Gemma 3 (instruct, 1B) or Gemma 2 (instruct, 2B)
is it possible to extend the deadline by 1 day more ?
depends on hosts
but idk, it doesn't seem like a valid argument
@tawdry iris there seems to be lot of crunch in availability of TPU ; is it possible to extend the deadline by a day or two?. Also noticed the number of submission are still quite low. It would be great if the deadline could be extended.
Hey guys, I recently found out about this competition and I really want to do it, but there are only 9 hours left so I know it's not possible. I still want to keep trying if late submissions are possible because I want to focus on learning. Does anybody want to do it with me?
I agree, since the TPU availability is the real concern for us right now.
I agree
This would help! I just corrected an error, wanted to make a last end to end run
can we please extend the competition deadline by a few hours (11:59pm pst)? i had queued a TPU run overnight but didn't know that queued runs were prioritized less over interactive sessions because kaggle doesn't say anything
now im #198 in queue
Maybe, at the end moment they will not.
I agree. I am also facing the same problem. We have to wait for hours for a single TPU session each time.
That will take you 2-4 hours. Only 5 hours are left.
"TPUs are popular right now. You are #183 in the queue. You can wait, try connecting again later, or use another accelerator." same here
Looking forward for any solution please.
No solution, you have to wait for hours
The solution is only with the competition host.
upvote this might help, I guess
Done. I hope that the deadline will extend. But only 4 hrs are leftover.
submissions show up under "Code" correct?
@tawdry iris I do not think I submitted my notebook properly.
I have been working on this project 6-8 hours a day since November.
My username on kaggle is Echo9Zulu, can you confirm my submission was submitted? I will be very upset if I lose all this time because I made a silly mistake.
Would appreciate easing my nerves very much.
Hello everyone,
I hope you are doing great. Wanted to share our work. We GRPOd Gemma3-1B-it to be a good chemistry reasoning model. We had some cool results and the model generalizes well across other domains. Feel free to check our work and share your thoughts.
Writeup: https://www.kaggle.com/competitions/google-tunix-hackathon/writeups/introducing-gemmax
Notebook: https://www.kaggle.com/code/alfaxadeyembe/gemmax-1b
Inspiration: https://www.futurehouse.org/research-announcements/ether0-a-scientific-reasoning-model-for-chemistry
Hey everyone,
I just published a writeup on my CURE-GRPO method for the Google Tunix Hackathon.
It explores using self-critique + GRPO to push better reasoning in LLMs, with practical insights from building and experimenting with Tunix + Gemma models.
Would love for you to check it out and drop an upvote if you find it useful
Thank you
https://www.kaggle.com/competitions/google-tunix-hackathon/writeups/new-writeup-1766255740138
I have been skimming the notebooks... some really interesting projects! Great work everyone.
I noticed that a few use external API keys during the 9-hour training run (e.g., calling a proprietary model for LLM-as-judge). I thought any LLM-as-judge would need to be an open-source model loaded in memory.
Fantastic writeup! My first thought, is CURE grpo trained gemma interesting in chat use? Does it generalize well outside of training tasks?
Subject: Kaggle Google Tunix Gemma 3 Unicorn-1B Writeup: SOTA Formatting Fix + No-API Reward Functions
Just finalized our submission, Unicorn-1B.
Seeing the discussion on API keys/Judges: we managed to solve the reasoning alignment 100% On-Device (TPU v5e) without calling external APIs during the loop. We engineered a 6-Signal Composite Reward Function (RegEx/String matching) to enforce strict structure without network overhead.
Also, for anyone who was hitting the 100GB RAM / XLA Compilation Graph crash when training on 1B+ tokens: we found the fix was moving from dynamic iterators to Static Dataset Objects. It drops compilation time from 60m to 8m.
Hope the writeup helps anyone debugging JAX OOMs!
https://www.kaggle.com/competitions/google-tunix-hackathon/writeups/novel-sft-rl-pipeline
YouTube Demo of Unicorn Gemma 3 1b: If anyone wants to check out Unicorn Gemma 3 1b running live and the SFT to RL GRPO process creating Unicorn Gemma 3 1b: https://youtu.be/z7BGR2XGksI?si=enx9OazBfw2kxCc8
https://www.kaggle.com/competitions/google-tunix-hackathon/discussion/651560 It's mentioned here and in a few other discussions that we can use LLMs as long as they're accessible to judges (probably Gemini based)
thanks a lot!
despite the packed training, it actually does well I've also trained it for general reasoning and have done the inference on it.
I would love to see a part 2 of this competition. I was looking forward to contributing with an educational video but alas life decided priorities needed to be shifted elsewhere. Either way had fun and looking forward to seeing what the winners have to offer.
You should consider porting to safetensors so people can more easily try in llama.cpp/other projects. My model Shadows-Gemma is much more experimental, but yours seems like it could be useful in practical ways! Name it and post it
Thanks! That makes sense. I'll look into converting it to safetensors and making it easier to try on their projects. Appreciate the encouragement
Honestly it was pretty difficult to manage with this competition.
Coz I'm currently a first year bachelor's engineering student
and hence managing college + the beautiful TPU queue on kaggle
Oh yes. I am full time data engineer which = long nights, long weekends and even longer queues lol. Very intense project
you mind sharing yours? writeup link?
ooh a real engineer haha interesting
yeah indeed very intense project
im very glad that I rushed and completed 15 days before the deadline
Lol barely. I am data monkey
coz I saw ppl seeing hell during last days
Dude i had first stable checkpoint in pytorch week and half ago and had to port to tunix in like two days
Using custom loss and unsupported training strategy
hey looks interesting but u haven't attached your notebook?
I believe I nuked my submission this way.
wow man I'm glad I wasn't in this situation
i mean I'm not winning either but still just for the sake of submitting
oohh ah sorry to hear that
Your solution was a well engineered pipeline.
Many of the other solutions were heavily generated. One was an obvious prompt injection attack. Curegrpo is cool
really? i struggled a lot but haha thank youu
Yeah man you went full send
heyy that rlly made me glad hearing that from a real engineer
full send??
Haha I am mostly obsessed with Ai and learning more.
I have not trained models before this competition
ooh wow
I've been learning abt ai since last two years
basically since the last year of my highschool
It is clear you made effective use of compute and made a complicated pipeline which worked lol
hey thanks a lot for the appreciation
Are you familiar with Huggingface ecosystem? Kaggle is fantastic, but HF is where your models will be most visible to the community of people who may be interested to try your 1b.
Plus
yesyes definitely
At some point the evals you made, plus your model, may be useful to further evaluate shadow tokens vs base model
Interestingly shadow tokens were highest in basemodel completions where the model explains itself after answering.
For conversion between formats I used deepwiki to help. Tunix APIs have changed a few times since November.
haven't posted much of my work here but yeah
yeah i saw ur writeup a bit it seemed interesting also the name to me tbh
deepwiki? never heard of it tbh
no way, now im interested
lemme see
Oh it added my github to link? Unintentional lol
all of ur work says
repo not indexed
intel cpu ampere gpu
rtx 3050
man damn i thought the OpenArc on ur profile is just a forked repo
woww
have you ever done CUDA programming?
wait lemme upvote ur writeup i liked it
done
No I have not. Tbh there is slim chance of me getting cuda capable hardware anyway
Your pipeline has the unstructured phase. What inspired this?
ooh no way
it's cheap
unstructured phase??
there's no unstructured phase, r u talking abt phase 2?
I believe so. I will give your writeup another read tonight est
okay well if you're talking abt phase 2 then it's not unstructured it's just not following the CURE structure
but following the normal reasoning tags and answer tags structure
i mainly had to remove it coz phase 2 was based on general reasoning and also creative reasoning, and for such tasks the output tokens vary too much while following the cure structure, due to which the output was getting truncated and the model was collapsing coz of the reward penalty
but the model is well trained on CURE structure so if the judges got a workaround and eval on higher output tokens limit then it'll follow CURE structure even on the general task well (I've done the inference earlier and yeah it did it perfectly except for sometimes output getting truncated due to higher tokens)
I cannot find your notebook in the writeup
I added it to the "code" page on the main tunix competition page, which I guess was incorrect. When I went to hit submit to competition it wouldn't complete because there was no output file. The notebook I developed in was, I guess, missing something related to the competition, which was lost when I tried to import.
Discord isn't allowing me to send an image. On the main page under "Code" if you search "Echo" it comes up
Is there some way to appeal? I have been working on project almost everyday since November
Can you add your notebook link as a comment in your writeup?
Done!
hello @tawdry iris could u pls give us a rough idea about when we can expect the results to be announced
It depends on the bandwidth of the judges, so it's hard to say. I think at least March.
alright thank you
@tawdry iris sorry but were we supposed to add our PR in our writeup for our submission to be valid?
either way, I opened a PR on December 2 2025, on an issue i found during the time I was working on this competition
I think you might remember the issue, it was that I was getting 0% format accuracy as well as answer accuracy for both pre training and post training evaluation.
I figured out the issue later and hence opened a PR for that particular issue
here's the link to the PR: https://github.com/google/tunix/pull/820
also I hope it's perfect if I add this PR in the comments of my writeup right?
here's my writeup link as well:
https://www.kaggle.com/competitions/google-tunix-hackathon/writeups/new-writeup-1766255740138
That is fine.
thank you
@tawdry iris is it possible for you to let me know why my PR isn't reviewed
according to me, wt I found was the batch slicing isn't done as expected when train micro batch size is less than num generations
Hello. Can you please let us know if the winners declaration for the google tunix hack has been postponed? Would we get an official update regarding the postponement in case the results have been postponed?
Results will be announced in March
As mentioned here
@tawdry iris Thank you.
Hi,
Are you still looking for a project?
Hello @tawdry iris I accidentally saved my model as a dataset. Would it affect in grading my submission for the inference section of the hackathon. My writeup is - https://www.kaggle.com/competitions/google-tunix-hackathon/writeups/new-writeup-1763935485708
any updates on when results will be out? @tawdry iris
@tawdry iris any updates when can we expect the results?