#nvidia-nemotron-model-reasoning-challenge


sullen shale
#

Hi

static remnant
#

hello!

verbal jetty
#

Wow

plucky thicket
#

'sup everyone!

sullen shale
#

Is there anyone from support forum here ?

sullen shale
#

Is any mentor here? I need help - I am not able to join the challenge

dire gulch
#

Hello everyone! I'm a senior right now, about to get my BSCS in the U.S. I'm an aspiring AI Engineer and am looking to join a team. This will be my first Kaggle challenge, so I'm very new to all of this, but I have internship experience with LLMs and am currently doing undergrad research focusing on finetuning LLMs. Please DM me if you're interested in me joining your team.

plucky thicket
#

I am having a hard time getting PEFT/LoRA training working. I have described the issue here https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge/discussion/682877 and made a public notebook here https://www.kaggle.com/code/antonkratz/nvidia-nemotron-finetune-ipynb
Several others report very similar issues triggering the "CUDA error: no kernel image is available for execution on the device". I believe causal_conv1d and mamba_ssm are not compiled correctly for the Kaggle competition environment. I think this should really be fixed in the base environment.
Am I holding it wrong? Is anyone actually successfully doing SFT with the provided RTX Pro 6000 environment? I find it super strange that this is not addressed.

sullen shale
uneven kestrel
#

There seems to be no pattern behind these types of problems in the training data:

7f5758a8,"In Alice's Wonderland, a secret set of transformation rules is applied to equations. Below are a few examples:
?!*{# = ?!{#
?'+?> = )>
{{-)@ = ?>
'?-#> = -?(
Now, determine the result for: )>+<#",'!!

beware: the ", in the last line is added by the CSV format

verbal jetty
#

There must be some kind of pattern that we need to understand and specify while training the model, because if there is none and these problems are included in training, it will make learning worse

uneven kestrel
#

Perhaps the heavy usage of special characters changed the composition of the problems somewhere in the generation pipeline such that they all came out with no underlying pattern

plucky thicket
verbal jetty
# plucky thicket I did two **identical** submissions. One scored 0.50 and the other 0.51. A chang...

Right now you are just giving the model the examples and waiting for it to predict. That works well for other kinds of usage, but for pattern recognition or maths you need to teach the model an approach. For example, there are 6 types of pattern in the data, each with 4 to 5 sub-types. You need to apply all of them to the examples, and when one gives the correct output, put that in the question and spell out in the data how you approached it. You can write a Python script to test that and write out the approach. The AIMO data already includes reasoning on how to solve each question and what the steps are. By the same logic, you need to teach it to think, not to predict
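A minimal sketch of the "test every candidate rule, then write out the approach" idea, using the numeral examples quoted in this channel (12 -> XII, 19 -> XIX, 86 -> LXXXVI). The rule names and the `CANDIDATE_RULES` table are hypothetical stand-ins; a real script would need many more rule families than these three.

```python
def to_roman(n: int) -> str:
    """Standard Roman numeral conversion."""
    symbols = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, sym in symbols:
        count, n = divmod(n, value)
        out.append(sym * count)
    return "".join(out)


# Hypothetical candidate rules, for illustration only.
CANDIDATE_RULES = {
    "binary": lambda n: format(n, "b"),
    "reverse_digits": lambda n: str(n)[::-1],
    "roman": to_roman,
}


def find_consistent_rules(examples):
    """Return the names of all rules that reproduce every example."""
    return [
        name
        for name, rule in CANDIDATE_RULES.items()
        if all(rule(x) == y for x, y in examples)
    ]


def build_trace(examples, query):
    """Write the approach as text, so it can be used as a reasoning trace."""
    names = find_consistent_rules(examples)
    if not names:
        return None
    name = names[0]
    rule = CANDIDATE_RULES[name]
    lines = [f"Try rule '{name}' on every example:"]
    for x, y in examples:
        lines.append(f"  {x} -> {rule(x)} (expected {y}): match")
    answer = rule(query)
    lines.append(f"All examples match, so apply '{name}' to {query}: {answer}")
    return "\n".join(lines), answer


examples = [(12, "XII"), (19, "XIX"), (86, "LXXXVI")]
trace, answer = build_trace(examples, 68)
print(trace)
print(answer)
```

Only rules that reproduce all given examples are kept, so the generated trace never teaches the model a step that the script itself could not verify.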

leaden basin
#

Hiya! Does anyone know how to get access to the G4 VMs to start the training on Kaggle? How is it to be accessed?

#

Are they to be accessed through Kaggle, or are they Accessible through GCP?

plucky thicket
# verbal jetty Right now you are just giving model the example and waiting for it to predict an...

Hey @verbal jetty, not sure if we agree or disagree. I simply wanted to point out that the LLM answer is non-deterministic and it is normal to expect small fluctuations in performance. To really see the effect of small changes, it would be necessary to measure the performance many times and thus get a measure of confidence in the measurement. I do think that the "wonderland" preamble just wastes tokens and could potentially distract the LLM. Anyway, godspeed!
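A rough sketch of what "measure many times" could look like in practice: compute the mean and standard error of repeated scores, and only treat two setups as different when the gap is large compared with the noise. The score lists below are hypothetical, and the two-standard-error threshold is a simple rule of thumb, not a formal test.

```python
import math
import statistics


def summarize_scores(scores):
    """Return mean, sample standard deviation and standard error."""
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    stderr = stdev / math.sqrt(len(scores))
    return mean, stdev, stderr


def differ_clearly(scores_a, scores_b, z=2.0):
    """True if the means differ by more than z combined standard errors."""
    mean_a, _, se_a = summarize_scores(scores_a)
    mean_b, _, se_b = summarize_scores(scores_b)
    return abs(mean_a - mean_b) > z * math.sqrt(se_a**2 + se_b**2)


# Hypothetical repeated scores for two prompt variants.
baseline = [0.50, 0.51, 0.50, 0.51]
variant = [0.51, 0.50, 0.52, 0.51]

print(summarize_scores(baseline))
print(summarize_scores(variant))
print(differ_clearly(baseline, variant))
```

With fluctuations of about 0.01 between identical submissions, a difference of 0.005 in the means is well inside the noise for these made-up numbers.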

split venture
#

is it really non-deterministic? in evaluation environment, temperature = 0 so it's interesting that the eval is non-deterministic

fast jungle
#

Base model

plucky thicket
#

The official Kaggle environment, even with the recent patches, seems very broken. I am using Blackwell 6000, newest script pack etc... I get a little over 2 (two) tokens/sec.

cedar sentinel
#

How much performance gain can I expect after only SFT with high quality reasoning traces for 4 categories?

cedar sentinel
plucky thicket
#

Can I use vllm with the Kaggle infrastructure?

fast jungle
#

AIMO 3 people have been using vLLM for inference

#

my issue with Kaggle is that I can't easily use Claude Code

plucky thicket
fast jungle
#

I just gave up lol, I just make my code runnable on local macbook, and I upload it manually

leaden basin
#

how do you get the g4 vms?

plucky thicket
#

The Kaggle environment is really a blocker. Today I spent 13 GPU hours trying to get this to run. I cannot even import vllm. I get inference below 4 tokens/sec. I find it absolutely weird that the Kaggle team does not provide a working environment. Many have put work into various workarounds... these are cool but... competing meaningfully is just not possible at that speed.

#

I was looking at https://nemotron.huikang.dev/
It seems base cannot get Equations right out-of-the-box at all; Numeral always right. The easiest gains should be in Bit manipulation and Cipher. One thing I would like to try is SFT on the bit manipulation and cipher examples only. Then on even more, synthetic bit manipulation and cipher examples. Maybe that's obvious but it sounds like a reasonable strategy to me.

verbal jetty
# fast jungle 10th place currently, I made https://nemotron.huikang.dev/, shared in https://ww...

Hey bro amazing work
Is this from the trained model or from base Nemotron? Is reasoning like this generated by your trained model, or by base Nemotron without any CoT training?

<|im_start|>system
<|im_end|>
<|im_start|>user
In Alice's Wonderland, numbers are secretly converted into a different numeral system. Some examples are given below:
12 -> XII
19 -> XIX
86 -> LXXXVI
Now, write the number 68 in the Wonderland numeral system.
Please put your final answer inside `\boxed{}`. For example: `\boxed{your answer}`<|im_end|>
<|im_start|>assistant
<think>
We have to infer the numeral system mapping from given examples. They gave:

12 -> XII
19 -> XIX
86 -> LXXXVI

We need to output 68 in that system. They seem to be using Roman numerals? Let's check:

12 in Roman is XII (X=10, I=1, I=1). Yes.

19 in Roman is XIX (X=10, I=1 before X=10? Actually 19 = XIX: X (10) + IX (9). Yes.

86 in Roman is LXXXVI (L=50, XXX=30, V=5, I=1). Yes, that's 86.

Thus the mapping is standard Roman numerals. So 68 in Roman? Let's compute: 68 = 60 (LX) + 8 (VIII). So LXVIII. Check: L=50, X=10 -> 60, then VIII=8. So LXVIII. So answer: \boxed{LXVIII}.
</think>
\(\boxed{LXVIII}\)<|im_end|>
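Since the prompt asks for the final answer inside `\boxed{}`, a local check of a completion needs to pull that value out. A minimal sketch, assuming a simple regex is good enough; the competition metric's actual parsing is not shown in this channel, and this version does not handle nested braces.

```python
import re


def extract_boxed(text):
    """Return the content of the last \\boxed{...} in text, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


# Tail of the transcript above, written as a Python string.
completion = (
    "So answer: \\boxed{LXVIII}.\n"
    "</think>\n"
    "\\(\\boxed{LXVIII}\\)<|im_end|>"
)

print(extract_boxed(completion))
```

Taking the last match means a boxed value inside the `<think>` section is ignored whenever the model repeats the answer after `</think>`.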
sullen shale
#

Hi, does anyone have this error and know how to fix it? - I am working with a demo notebook on the current free Kaggle GPU T4 x2 runtime for direct model loading. Getting ERROR "ModuleNotFoundError: No module named 'cutlass'" - help me to fix it - thank you

sullen shale
plucky thicket
uneven kestrel
# fast jungle it is the base model

How do you feel about the “Equations” problem type? Most of the others seem easy except for bit manipulation, but I personally haven’t been able to solve a single one of the equations problems (to be clear I’m talking about the problems with the equations involving lots of special characters)

fast jungle
#

I have no idea on the pattern in equation symbolic

fast jungle
#

probably look at my previous work and infer what I would be doing

verbal jetty
# sullen shale Have you fixed the issue with "ModuleNotFoundError: No module named 'cutlass'" ?

Run this in the first cell:

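import subprocess, sys, os
from pathlib import Path

def resolve_python_path(target_dir):
    """Append the directories listed in any *.pth file under target_dir to sys.path."""
    for pth_file in Path(target_dir).glob("*.pth"):
        with pth_file.open() as fp:
            for line in fp:
                # Strip the trailing newline, otherwise the path never exists
                relpath = line.strip()
                # Skip blank lines, comments and executable "import ..." lines
                if not relpath or relpath.startswith(("#", "import ")):
                    continue
                rel_pack_path = pth_file.parent / relpath
                if rel_pack_path.exists():
                    print(f"append {rel_pack_path}")
                    sys.path.append(str(rel_pack_path))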



offline_dir = "/kaggle/input/nvidia-nemotron-offline-packages/offline_packages"
target_dir = "/kaggle/working/packages"

os.makedirs(target_dir, exist_ok=True)
resolve_python_path("/kaggle/usr/lib/notebooks/ryanholbrook/nvidia-utility-script/")

if os.path.exists(offline_dir):
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "-q",
        "--no-index",
        "--find-links", offline_dir,
        "--target", target_dir,
        "datasets", "trl"
    ])
    print("Installed from offline packages")
else:
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "-q",
        "--target", target_dir,
        "datasets", "trl"
    ])
    print("Installed from PyPI")

# Add to Python path
sys.path.append(target_dir)
resolve_python_path(target_dir)

import datasets, trl, cutlass
print("datasets:", datasets.__version__, "trl:", trl.__version__, "cutlass:", cutlass.__version__)

with https://www.kaggle.com/datasets/dennisfong/nvidia-nemotron-offline-packages added to your session

final tusk
#

Hello everyone, this is my first time participating in such a competition. I have some industrial and theoretical experience in deep learning and ML but not in LoRA and LLM finetuning. Any tips for someone like me?

dry mulch
sullen shale
#

Hello, could you please assist me in identifying the most cost-effective “Marketplace” options for creating and using a GPU VM for my project? - thank you

deep tree
#

Anyone had any kind of success doing RL with the given GPU ?

deep tree
cerulean gulch
#

In fact it doesn't work for me 😛 When I run that cell I get this a few times:

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7bbbdf796630>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /simple/datasets/

and then finally:

ERROR: Could not find a version that satisfies the requirement datasets (from versions: none)
ERROR: No matching distribution found for datasets
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
/tmp/ipykernel_161/395518196.py in <cell line: 0>()
     26     print("Installed from offline packages")
     27 else:
---> 28     subprocess.check_call([
     29         sys.executable, "-m", "pip", "install", "-q",
     30         "--target", target_dir,

/usr/lib/python3.12/subprocess.py in check_call(*popenargs, **kwargs)
    411         if cmd is None:
    412             cmd = popenargs[0]
--> 413         raise CalledProcessError(retcode, cmd)
    414     return 0
    415 

CalledProcessError: Command '['/usr/bin/python3', '-m', 'pip', 'install', '-q', '--target', '/kaggle/working/packages', 'datasets', 'trl']' returned non-zero exit status 1.
It looks like those retry warnings are coming because of no internet, but I can't activate internet with the RTX Pro 6000 Accelerator anyway.
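The traceback shows the cell took the PyPI branch (no `--no-index` in the failing command), which means the offline package directory was not found, and the fallback then failed because the session has no internet. A hedged sketch of a guard that fails early with a clearer message; `has_internet` and `build_pip_command` are hypothetical helpers, not part of the shared cell.

```python
import os
import socket
import sys


def has_internet(host="pypi.org", port=443, timeout=3.0):
    """Best-effort connectivity probe; False on any network error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_pip_command(packages, target_dir, offline_dir, online=None):
    """Build the pip command, preferring the offline wheel directory."""
    cmd = [sys.executable, "-m", "pip", "install", "-q", "--target", target_dir]
    if os.path.isdir(offline_dir):
        return cmd + ["--no-index", "--find-links", offline_dir] + list(packages)
    if online is None:
        online = has_internet()
    if not online:
        raise RuntimeError(
            f"{offline_dir} is missing and there is no internet access; "
            "attach the offline packages dataset to the session first."
        )
    return cmd + list(packages)


# Simulate the failing situation: no offline directory, no internet.
try:
    build_pip_command(
        ["datasets", "trl"],
        "/kaggle/working/packages",
        "/path/that/does/not/exist",
        online=False,
    )
except RuntimeError as err:
    print(err)
```

The returned list can be passed to `subprocess.check_call` exactly as in the original cell.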
verbal jetty
cerulean gulch
#

Yes I see this notebook: Submission Demo with Blackwell Patch. Thanks, I now get why this is happening!

#

I assume things work a lot better in GPU cloud (non-Kaggle) environments?

cerulean gulch
cerulean gulch
#

Thanks @verbal jetty it looks better but I think I'll try some GPU cloud providers. This is a real pain.

cedar sentinel
#

What problems are you guys talking about? the new environment works just fine. My notebook which I am running didn't need any workarounds?

sullen shale
cerulean gulch
#

I tried out a cloud GPU instance (Jarvislabs) but that has another set of issues. Looks like these Nemotron models on Huggingface are a mess too. I kept getting: ImportError: cannot import name 'is_flash_attn_greater_or_equal_2_10' from 'transformers.utils.import_utils' (/home/myenv/lib/python3.12/site-packages/transformers/utils/import_utils.py)

#

Now I seem to have got a forked notebook running on Kaggle. It's full of sneaky overrides and version suppressions etc. that I'm trying to figure out now.

#

I now get why Chris Lattner says that the AI tech stack is basically a mess.

#

Looks like it's all Triton stuff that's messing things up here.

cedar sentinel
#

Are bit manipulation and symbol equation transformation even solvable through natural language??

turbid wharf
#

looking for a team to join in the NVIDIA Nemotron model reasoning challenge

native knoll
#

Did anyone experiment with RL? I was planning to try GRPO but it seems like it is not going to be feasible with the kaggle notebook constraints and it will be too expensive to host on the cloud

wheat ivy
#

So is the absolute max amount of free rtx pro 6000 hours 30 per week?

native knoll
#

Yeah that is the limit

Can't do much with it except a couple of supervised runs

sullen shale
#

Can anyone explain to me what this error is about, and how to fix it? -
torch_dtype is deprecated! Use dtype instead!
Starting minimal model load probe...
/kaggle/usr/lib/notebooks/ryanholbrook/nvidia-utility-script/torch/compiler/init.py:148: FutureWarning: torch._dynamo.allow_in_graph is deprecated and will be removed in a future version. Use torch._dynamo.nonstrict_trace instead.
return torch._dynamo.allow_in_graph(fn)
MODEL_LOAD_FAILED
ModuleNotFoundError No module named 'cutlass'

fast jungle
cinder depot
#

https://www.kaggle.com/code/kienngx/nvidia-nemotron-training-cot-labels
I ran this notebook, but noticed that the training is happening only on a sample of 600 datapoints as compared to the original 3500 CoT + Labels; maybe because of that I got a score of 0.62 on the LB. Did anyone run training on the full dataset, and how many hours did it take? It has been 12+ hours for me and the training has yet to complete. Was there a corresponding jump in the LB score when training with the full data?

cedar sentinel
cedar sentinel
#

Guys can someone explain what is happening instead of reacting😭

wise pasture
turbid wharf
#

Need a teammate for this competition; must be proficient in LLMs, fine-tuning and deep generative models

cerulean gulch
#

Can someone help me understand how prompt engineering plays a role in this competition?! The competition description mentions it a couple of times but as I see it, we submit a LoRA adapter only. How does our prompt help in the submission?

#

We have no control over the prompt that is used at inference right?

fast jungle
#

You prompt Claude Code

wise pasture
cerulean gulch
#

Thanks! As I expected, this prompting has nothing to do with a submission per se. I was a bit confused by the description since it appears to put fine-tuning on par with prompting as a strategy.

fast jungle
silent jolt
#

check my bio 😁

night monolith
#

check my bio 😁

dark cloak
#

I did not find the GPU version I was looking for A100 (80GB). Only got the P100 and T4 ones. How can I run my notebook?

maiden scroll
#

anyone with RL understanding wanna join us

#

who also has credits

#

@fast jungle join us? We're 66/2058

fast jungle
#

i don't know RL

cerulean gulch
#

Does training on Kaggle RTX 6000 work for anyone at all?!?! I've wasted a week trying to get it working and it simply doesn't work. So many folks out there are struggling and posting the error messages, but there is not a single response from the organiser addressing these issues.

glacial salmon
#

Btw congratulations @fast jungle

#

I'm pretty sure a lot of the people at 86% used your work+theirs to get that, they should give you credit lol.

fast jungle
valid current
#

I wonder if he's still aiming for 87.7 accuracy

glacial salmon
#

Mhm well he did was it was the tinker issues.

#

I'm working on DGX Spark so hopefully I can avoid that

valid current
#

I got some free tinker credits lying around so I'm probably gonna use that at some point

#

For now, unsloth + colab is doing it for me

glacial salmon
#

Ah

glacial salmon
#

Are you adding anything yourself?

valid current
glacial salmon
#

I see.

valid current
#

I'm saving the tinker credits for when I have a solid approach, lol

glacial salmon
#

I see. Well some advice is this is a reasoning challenge. You want the model to solve the questions.

valid current
#

Thanks for the advice, lol

glacial salmon
#

If the synthetic data is bad or wrong or doesn't help the model

#

It'll die.

valid current
#

Incredible. Thank you for this insight

glacial salmon
#

is that sarcastic

valid current
#

Yes, lol

#

I was assuming you were too

glacial salmon
#

anyway my suggestion is that you should focus on generating high quality data first by solving the harder puzzles first.

glacial salmon
#

i like to talk in separate messages

valid current
#

That's what I'm working on, yes

glacial salmon
#

Yeah what are your methods? if you don't mind me asking

valid current
#

Using models that are smarter than Nemotron to give me reasoning traces for certain puzzles. Starting with equations problems atm

#

Keep in mind, I am a complete noob at this so my approaches are also naive. My experience is mainly in training regular deep learning models and I'm trying to translate those skills into LLM finetuning

glacial salmon
#

Like what model?

#

Nemotron 3 Super?

#

Or like an even bigger one.

valid current
#

Nemotron 3 Super, yes. But I'm working on getting a data generation pipeline using DeepSeek

glacial salmon
#

Yeah I tried nemotron 3 super it wasn't very good.

#

Slightly better than nano but not much at these questions.

valid current
#

Were you able to get super running on your dgx spark?

glacial salmon
#

Yeah

valid current
#

I installed it on mine and it came in 3 parts, lol

glacial salmon
#

er what

#

how?

valid current
#

NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00001-of-00003.gguf NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00002-of-00003.gguf NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q4_K_XL-00003-of-00003.gguf

glacial salmon
#

😭 why you using gguf

#

are you on dgx spark?

valid current
#

Running one of the GGUFs takes up 100 GB of the 128 GB on my DGX Spark. This model is meant for multiple GPUs, I'm pretty sure

glacial salmon
#

no it works perfectly fine

valid current
#

Yeah

glacial salmon
#

you need to run nvfp4

#

but anyways trust me Super isn't the solution for this contest

valid current
#

Yeah, that may be the case

#

Still worth collecting some data for comparison tho

glacial salmon
#

I have a training pipeline set up and hopefully I can get to 90%, I think I might have a breakthrough on solving the puzzles, but it'll take time

glacial salmon
#

yes yes of course i meant like after i win

valid current
#

Or is it common for people to share their winning strat in kaggle competitions?

glacial salmon
#

no no only after they win

valid current
#

Either way, respect

glacial salmon
#

yeah well it says to get a prize you need to publish a notebook

glacial salmon
cerulean gulch
#

Is the Nemotron nano model that is posted on Kaggle as part of this competition any different from the one on huggingface with the same name?

valid current
#

Idts given that the READMEs are the exact same

#

Absolutely diabolical if they were different though, lol

void notch
#

Why is the leaderboard right now just copy-pasted versions of Tong's submission?

#

Like the top 500 is just Tong's submission being reproduced

valid current
#

imo, if no one can top that with a better technique, tong deserves the W

verbal jetty
valid current
#

Anyway, someone just hit 0.87 on the leaderboard

#

Huge

sullen shale
#

What is most important for this competition: the model's reasoning, or the model's equations?

void notch
#

i mean

#

the equations are the result of the reasoning

void notch
#

my insane progress this competition: 0.66 -> 0.67 -> 0.68 -> 0.6

#

im cooked

floral fiber
#

Hi everyone

How can I increase the speed of computation?

How can I decrease the evaluation time, and what are some steps to reduce the overall training time? I am a bit of a beginner. I know I can increase the batch size and reduce the epochs, but I want to know if there are any other techniques to consider, specifically to lower the evaluation time after submission, because it is taking a lot of time and GPUs are not available to test experiments

void notch
cerulean gulch
#

Has anyone managed to get trl and vllm running for GRPO training?
I’ve been trying to get that (also Unsloth) to work and it hasn’t really worked anywhere. Without internet on Kaggle's RTX Pro 6000 that’s super challenging anyway, but even the Unsloth notebook for Nemotron nano doesn’t work beyond the first import on Google Colab

void notch
#

i don't think a lot of people have tried GRPO training because of how long it will take on an RTX 6000

#

it would easily take 25 hours so there is no room for error

verbal jetty
cerulean gulch
verbal jetty
#

KV cache is not working on RTX 6000 Pro with the Nemotron architecture

cerulean gulch
#

Ah ok that’s some 20 tokens per second I guess

verbal jetty
cerulean gulch
#

Perhaps we could make a GRPO team and split the training costs

verbal jetty
#

Ran perfectly

#

With kv cache

sullen shale
#

Has anyone experimented with enable_thinking=True in the chat template during SFT training? Curious if it makes a difference vs manually wrapping reasoning in <think> tags in the assistant content.
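For comparison, the "manual" option could look like the sketch below: the training text is assembled by hand in the same layout as the base-model transcript quoted earlier in this channel. Whether the tokenizer's chat template with `enable_thinking=True` produces exactly the same string is not verified here; printing both and comparing them would be the way to find out. The function name is hypothetical.

```python
def format_sft_example(question: str, reasoning: str, answer: str) -> str:
    """Assemble one training string with the reasoning wrapped in <think> tags."""
    return (
        "<|im_start|>system\n<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"<think>\n{reasoning}\n</think>\n"
        f"\\(\\boxed{{{answer}}}\\)<|im_end|>"
    )


text = format_sft_example(
    question="Now, write the number 68 in the Wonderland numeral system.",
    reasoning="The examples are standard Roman numerals, so 68 = LX + VIII.",
    answer="LXVIII",
)
print(text)
```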

void notch
#

model is 7 hours in training

cerulean gulch
#

Here's another reasoning challenge. Spot the error (if any):

  • nemotron-3-nano-30B-A3B-BF16 requires transformers>=5.3.0
  • trl requires vllm <=0.18.0
  • vllm<=0.18.0 requires transformers<5.0
    --> It is not possible to train nemotron-3-nano-30B-A3B-BF16 via GRPO with trl and vllm.
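The chain above can be checked mechanically: the first bullet needs transformers at or above 5.3.0, the third needs it below 5.0, and those two ranges do not intersect. A small sketch using plain tuple comparison; the version bounds are taken from the message as quoted and have not been verified against the actual package metadata.

```python
def parse(version: str, width: int = 3):
    """Turn '5.3.0' into (5, 3, 0), padding short versions with zeros."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (width - len(parts))


def ranges_overlap(min_inclusive: str, max_exclusive: str) -> bool:
    """True if some version v satisfies min_inclusive <= v < max_exclusive."""
    return parse(min_inclusive) < parse(max_exclusive)


# transformers>=5.3.0 (model) versus transformers<5.0 (vllm<=0.18.0)
print(ranges_overlap("5.3.0", "5.0"))  # -> False
```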
void notch
#

what's your highest score so far?

cerulean gulch
#

@verbal jetty did this all work on RTX 6000 on Kaggle? Was it in colocate mode or server mode? (up to now I have only tried colocate since it's a single GPU....)

sullen shale
void notch
#

i had a memory leak and my BM solver was stripping the beginning 0 so the score ended up at 0.56

#

I predict that once i fix these two minor issues my score should reach 0.9 with ease.
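Stripped leading zeros are a common failure when a bit string goes through `int` and back. A minimal sketch of the bug and the fix, using bitwise NOT as a hypothetical transformation; the actual solver is not shown in this channel.

```python
def flip_bits_buggy(bits: str) -> str:
    """Bitwise NOT, but formatting without a width drops leading zeros."""
    width = len(bits)
    flipped = ~int(bits, 2) & ((1 << width) - 1)
    return format(flipped, "b")


def flip_bits(bits: str) -> str:
    """Bitwise NOT that keeps the original width, including leading zeros."""
    width = len(bits)
    flipped = ~int(bits, 2) & ((1 << width) - 1)
    return format(flipped, f"0{width}b")


print(flip_bits_buggy("11010110"))  # -> 101001
print(flip_bits("11010110"))        # -> 00101001
```

An exact-match metric would treat `101001` and `00101001` as different answers, so the width has to be carried through every step.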

cerulean gulch
#

If I import vllm I get a RuntimeError: can't find libcudart.so.13 ...

verbal jetty
# cerulean gulch I cant even install vllm and use it on the RTX. How did you do that?

fork this notebook and run

"""Metric for NVIDIA (129716)."""

import subprocess
import sys

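# Set up environment
# Assumption: RUN_EVALUATION is defined elsewhere in the original metric notebook;
# it is set here only so that this cell runs on its own.
RUN_EVALUATION = True

commands = [
    "uv pip uninstall torch torchvision torchaudio",
    "tar -cf - -C /kaggle/usr/lib/notebooks/metric/nvidia_metric_utility_script . | tar -xf - -C /tmp",
    "chmod +x /tmp/triton/backends/nvidia/bin/ptxas",
    "chmod +x /tmp/triton/backends/nvidia/bin/ptxas-blackwell",
]
if RUN_EVALUATION:
    for cmd in commands:
        print(f"Running: {cmd}")
        subprocess.run(cmd, shell=True, check=True)
# Make the unpacked utility-script packages importable first
sys.path.insert(0, "/tmp")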

and then you will be able to use vllm; just remove the other cells according to your needs

cerulean gulch
#

Ahh very clever! Since the metric uses vllm!!! Thanks, I will try this

cerulean gulch
#

It works! Thanks @verbal jetty

cerulean gulch
#

Ok I spoke too soon. Vllm works but not with trl. You used trl @verbal jetty ?

zealous wave
#

I have a question: I am participating in the NVIDIA Nemotron reasoning challenge. Can we use a publicly available dataset of CoT traces generated on the given dataset for this competition? The dataset is public and was generated by @fast jungle

verbal jetty
prisma zealot
#

Check my bio 😁

wise pasture
#

I sent friend request, pls check

sage willow
#

Hey everyone looking to team up for the Nemotron Challenge. We’re currently two members, including a professor with strong expertise in NLP and dozens of publications in this area.

My background is in LLM pre-training, post-training, and data curation.
Constraint: I’m currently limited to Colab, so I can’t realistically run serious experiments on a 30B model. We’re looking to collaborate with someone who has reliable GPU access. Me and the professor will take full ownership of research direction, training strategy, data curation, pipeline design, and post-training.

If you’ve got compute and want a serious, execution-focused team, DM me.

remote urchin
#

Hi, I am a bit new to this kind of LLM competition. Can someone kindly help me get started?

remote urchin
sullen shale
#

@fast jungle - What reasoning format are you using for equation puzzles? That category seems hardest to crack, any comments or suggestions?

fast jungle
#

What you see in my writeup is what I have, I haven’t worked on this competition since

sullen shale
# fast jungle What you see in my writeup is what I have, I haven’t worked on this competition ...

Thanks @fast jungle, your notebook has been a goldmine. Totally understand, equation puzzles feel like a rabbit hole. We noticed the model seems to pick up the other 5 categories well from your traces, so maybe the juice isn't worth the squeeze there. One thing I'm experimenting with is repeating your corpus 3x in the index to get ~734 training steps instead of 244. Curious if you ever tried multi-epoch training on the same traces, and whether it helped or hurt in your experiments?
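The step counts follow from simple arithmetic: steps are roughly (examples x repeats) / effective batch size, so tripling the corpus roughly triples 244 to about 732. The exact figure depends on how the last partial batch is handled. The corpus size and batch size below are hypothetical, chosen only so that the arithmetic reproduces the two quoted counts; the real values are not given in this channel.

```python
import math


def optimizer_steps(n_examples, effective_batch, repeats=1, drop_last=True):
    """Number of optimizer steps for one pass over a repeated corpus."""
    total = n_examples * repeats
    if drop_last:
        # The final partial batch is discarded
        return total // effective_batch
    # The final partial batch counts as a step
    return math.ceil(total / effective_batch)


N_EXAMPLES = 7830      # hypothetical corpus size
EFFECTIVE_BATCH = 32   # hypothetical batch size x gradient accumulation

print(optimizer_steps(N_EXAMPLES, EFFECTIVE_BATCH))             # -> 244
print(optimizer_steps(N_EXAMPLES, EFFECTIVE_BATCH, repeats=3))  # -> 734
```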

fast jungle
#

the winning submission repeated the data for some problems

ideally you should make your own synthesized problems and train on those instead; you can generate as many synthesized problems as you want

tulip elm
fast jungle
#

you are free to use my code, which is why it is published

celest halo
#

Hi, I just joined the competition. Could anyone kindly tell if we can use the provided NVidia GPU in the training process? Thank you so much!

cerulean gulch
cerulean gulch
#

Can anyone shed light on this whole trust_remote_code=True/False thing? When does it matter, and when not?

meager niche
#

How can we reduce training time? In my case it did not finish within 12 hours 🙂

sullen shale
stray wedge
#

Anyone struggling in Consistency and want to learn together.
DM me.

sullen shale
steady token
#

hii guys! is it too late to join this competition??

shut cloud
#

🚀 Hey @everyone!

We’re building PromptGram and aiming to reach 150 GitHub stars

If you like AI, FastAPI, microservices, or developer tools, please check it out and support the project by starring the repo 🙌

GitHub Repo: https://github.com/dewangsahuji/promptgram

Every star really helps and motivates us to improve the project further 💙

remote urchin
#

Guys when I load the model I get
SafetensorError: Error while serializing: I/O error: No space left on device (os error 28)
any fixes for this ?
I was using

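import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = path  # path to the model directory, defined earlier

tokenizer = AutoTokenizer.from_pretrained(model_path)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    offload_folder="/kaggle/working/offload_folder",  # offloaded weights are written here
)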
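One possible cause, offered as an assumption rather than a diagnosis: with `device_map="auto"` and an `offload_folder`, weights that do not fit in GPU or CPU memory are written to disk, and a 30B-parameter model in BF16 is roughly 60 GB at 2 bytes per parameter. A quick sketch for checking free space before loading; the 60 GB figure is a rough estimate and `check_space` is a hypothetical helper.

```python
import shutil


def free_gb(path: str = ".") -> float:
    """Free space on the filesystem that contains path, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def check_space(path: str, needed_gb: float):
    """Return (enough, available_gb) for the given directory."""
    available = free_gb(path)
    return available >= needed_gb, available


NEEDED_GB = 60  # rough estimate: 30e9 parameters x 2 bytes

enough, available = check_space(".", NEEDED_GB)
print(f"free: {available:.1f} GiB, enough for offload: {enough}")
```

If the working directory is too small, options include pointing `offload_folder` at a larger disk or avoiding offload altogether by loading on a GPU with enough memory.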