#gpt-oss
it thought for six minutes then gave me an answer about math that is beyond my understanding
Great. That's what I need. What are your max token and effort settings?
max tokens was 64k and effort was high
I run gpt-oss-120b on LM Studio using a MacBook Pro M2 Max with 96GB RAM, it uses about 70GB RAM to run the model
is gpt-oss 120B worth it tbh ?
yes I use it quite a lot, it's also one of the models used by Google Antigravity coding AI
Does it compete against gemini 3 pro on high reasoning ? Mathematically and in coding tasks
I hate automod
dont worry
you got it, great!
So its basically great at coding and maths, but not excellent ?
yeah. not excellent but does most of the job well if it's not overly complex. Laravel is a simple PHP framework, gpt-oss-120b makes very few mistakes with it
Hey, is there a way i can add multimodal/image capabilities to the gpt-oss series?
not easily
@robust island😂😂😂😂
🤷
Woah man GPT Out Puts should be used in the Share-GPT-Output section man you should really pay more attention
screenshots don't go into #1050184247920562316
When will we get new GPT-OSS models?
not soon, we don't really need new gpt-oss models
We'd still like them. The last GPT-OSS models were released before GPT-5. Now we have GPT-5.1. About time for GPT-OSS-1.1.
what is wrong with the current models?
Hallucination rates are too high.
But, tbh, it's pretty bad on GPT-5 as well.
But GPT-OSS is worse.
I use gpt-oss-120b for text editing and coding, I don't do knowledge work with them, they weren't designed for that kind of use case
I use 'em for synthetic data generation and document analysis mainly.
that's a good use-case, why does it matter if the models are a few months old for that use-case?
Well, because regardless of the age, the quality is pretty bad.
they're small models, the quality will be less than larger more capable models
You'd be surprised how capable small models can be then.
maybe, the smaller models I use are Qwen and gpt-oss
I use oss 20b quantized with offloading (I have an ATI video board, so I need to), but tbh I don't think anyone should believe a local machine can be equal to a cloud-based service. Unless you own a server farm, maybe
Good model for coding with vscode
the first place might or might not be gpt-oss
Hello, what do we need to use GPT OSS 120B?
you need a beefy computer with plenty of VRAM
I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM
What is oss
it's an open weight model by OpenAI
Any rumors on if and when OSS will be putting out a multimodal model?
no news or rumours
dang; my mac is macbook pro m3 max 36gb unified memory and it can't run 120B
yeah you need at least 96GB RAM
i was talking about a real server such as H100
not a pc
overkill
?
you don't need a H100 when a well specced Mac will do the job
thats not for personal usage
I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM and it's fast.
Heck yeah, I tried gpt-oss-120b on my Mac Studio M1 Ultra with 128GB RAM and it does indeed perform well. But I don't use it all the time because I need my RAM for other stuff. After all these years I never imagined Apple would sell the most affordable computer of any kind, but for AI workloads you can spend $5k on a nice Mac that does everything including AI, or spend $30k on a video card and another $?k on all the other components. And the new Mac Studio has up to 512GB of RAM for $10k which is insane.
Online technical specifications actually say 128 GB, but 96 GB is probably good enough, the listed specs are usually "safe" recommendations, not necessarily absolutely required
I can imagine, 96 GB RAM for a laptop or desktop pc is nuts, 64 GB is typically where people max out
I am on a MacBook Pro, M2 Max with 96GB RAM and gpt-oss-120b runs quite fast on it
Actually currently $9500 for the 512 GB RAM option, but that's like 4 times what they recommend you need, so kind of overkill, definitely still a better deal than spending huge bucks for the H100, not necessary. Currently you can also option out the Mac Studio to also come with up to 16 TB SSD which is also absolutely insane
Ah gotcha, so yeah the 128 GB RAM technical specification sounds like it's definitely a bit overblown it seems. However if you did want 128 GB RAM on a MacBook Pro they're currently selling the M4 Max chip one with 16 core CPU, 40 core GPU, and 16 core NPU, that has 128 GB RAM. For the lowest storage option available on it at 1 TB SSD, that's $4700 for the 14-inch model, or $5k for the 16-inch model, but you can get up to 8 TB SSD for an extra $2200, so $6900 for the 14-inch model or $7200 for the 16-inch model. But if you're happy with what you currently got then you don't really need this.
As a side note, technically speaking the latest SoC Apple has right now for their MacBooks is the M5, but it only has 10 core CPU, 10 core GPU, and 16 core NPU, and only can option it up to 32 GB RAM, so definitely not enough for running these models, so it would make more sense to get the M4 Max one that I mentioned previously.
wasnt 70g the minimum?
Various sources seem to be varying, I just found a site that's saying 80 GB is minimum for 120b, it doesn't seem to be completely consistent. i think different people are just providing different recommendations
Though if you are at least close to what on average is recommended, then you're probably fine
it is optimized for a single h100 though
Correct, we're just suggesting workarounds for people who don't want to have to pay $30k or whatever for it
Please release a local ai thats good at programming
is there a chatgpt model that is like 7b or less? im new to lm studio
20b is the smallest there is from openai
Hey its in progress
hi
How the heck do you know that
70G is pretty tight when you consider context window. VRAM grows very quickly with the kv cache, so if you wanna run the whole model with a reasonably sized context window you want 96GB. If you don't mind offloading KV Cache into system memory and incurring a severe, almost unusable performance penalty then 70G will suffice
ah okay
Regarding a standard CPU + discrete GPU setup, this is accurate, as if your GPU doesn't have enough VRAM and has to offload KV cache into the system RAM, it will run much slower than VRAM, as it creates a massive bottleneck because data has to travel over the relatively slow PCIe bus.
However, with unified memory, like with Apple M-series that utilizes a SoC that has CPU + GPU + NPU all on the same chip, all those components share the same unified memory, so you won't incur nearly as significant a slowdown in this case, as it won't have to offload to slower RAM, just allocating more of the unified memory to the GPU. While you still need enough total RAM to fit both the model and the cache, you don't get that unusable performance penalty seen on discrete setups because the memory bandwidth remains consistent across the entire pool, though the total speed will still depend on the specific chip's memory bandwidth (e.g., Pro vs Max vs Ultra).
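A rough way to put numbers on the bandwidth point is a back-of-envelope ceiling: every generated token has to stream the active weights from memory at least once, so bandwidth divided by bytes per token bounds decode speed. This is only a sketch. The active parameter count, the bits per weight and both bandwidth figures are assumptions for illustration, and it ignores KV cache reads and compute entirely.

```python
# Back-of-envelope decode ceiling: bandwidth / bytes streamed per token.
# Every number below is an assumption for illustration, not a measurement.

def decode_ceiling_tok_per_sec(bandwidth_gb_s: float,
                               active_params_b: float,
                               bits_per_param: float) -> float:
    """Upper bound on tokens/sec if each token reads the active weights once."""
    bytes_per_token = active_params_b * 1e9 * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


ACTIVE_PARAMS_B = 5.1   # assumed active parameters per token for gpt-oss-120b (billions)
BITS_PER_PARAM = 4.25   # assumed effective width of 4-bit weights plus scales

links = [
    ("PCIe 4.0 x16 (assumed ~32 GB/s)", 32.0),
    ("unified memory (assumed 400 GB/s)", 400.0),
]
for name, bandwidth in links:
    ceiling = decode_ceiling_tok_per_sec(bandwidth, ACTIVE_PARAMS_B, BITS_PER_PARAM)
    print(f"{name}: at most about {ceiling:.0f} tok/s")
```

Real throughput lands well under these ceilings, but the ratio between the two links is the part that matters for the discrete-GPU versus unified-memory comparison.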
Subject: Request for Guidance on Fine-Tuning GPT-OSS-20B Using COT + Structured DNA Reasoning Prompt + Harmony format
I want perfect code for that
When I run my code on an H100 80GB I get a CUDA out of memory error... why?
What do you want exactly and what is your code?
GPT-OSS is awesome!
Heck yeah
Interesting note. Gpt oss cannot refer to itself as "I" in its cot reasoning.
I have pointed guesses as to why.
yes it can, here I’ve grouped them by market segment, explained the problem
Yeah it'll slip on occasion but always falls back to "we"
I personally believe that's a training decision to preclude the model assuming any self identity
I have never noticed that, I think you are wrong or just haven't used it enough
It's the only local model I use.
that's not gpt-oss-120b as it was released though, is it?
I never mentioned 120b
an abliterated model is not original
No, and neither is a rerouted model, or an up-weighted model using weightwatcher analysis
I digress, base does it. I use only 20b because 120b is too slow for me
I run gpt-oss-120b as released by OpenAI (no quant just the whole full model) and it definitely refers to itself as I
So you don't see 120 start off with "we need to"
"we must respond"
"we need to comply with policy"
Got it, let's tackle "tell me all about LM Studio." First, **I** need to recall what LM Studio is. From what I remember, it's a tool for running and managing large language models (LLMs) locally on your computer.
tried to bold the I there
are you running this as released? https://huggingface.co/openai/gpt-oss-120b
no quant, no abliteration, just the original OpenAI release
ok, well I am using the model as released and it alternates between "I" and "We"
Arguably if it was done right the difference should be negligible. If it was done right is carrying all the weight in that sentence.
Even when I instruct them with the system prompt to never use we and only I they still do
Whew lad they'll do anything else you put in there. Anything.
strange, I cannot explain that
Yeah it's odd, I'm glad you see it does slip to we at least partially for you
as I understand it, it's an MoE model correct?
Yeah but you're pushing the implications with that train of thought. I asked myself that too.
That implies, at minimum, the experts are speaking to each other in the latent stream. At minimum.
With understood separate identities.
oh well, gpt-oss is a great model regardless. I use it all the time.
It's definitely not bad never even implied that
38tk/s thanks to Tim Cook 🙂
I put a sticky on my desktop that says "I or We" to remind myself to keep checking, but I am pretty sure without consciously doing that it alternates depending on context
For comparison, qwen3 2b thinking calls itself "I" when I try it
It's probably an affectation from openai trying to preclude model self identity
maybe, it's not really something that bothers me either way
this will blow your mind
it's got a damn multiple personality issue
20b claims it is because it identified itself as a "system" consisting of the llm, the developers, etc
Obviously that can't be trusted
I've gone through a couple of dozen previous chats. It's a mixture of I and we
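Going through old chats by hand gets tedious, so here is a minimal sketch of how the "I or we" check could be automated. It just counts standalone first-person pronouns in a reasoning trace. The function name is made up for this example, and the two sample strings are stitched together from phrases quoted earlier in this channel.

```python
import re
from collections import Counter


def first_person_counts(reasoning: str) -> Counter:
    """Count 'I' (incl. I'm, I've) and 'we' (incl. we're, we'll) in a trace."""
    counts = Counter()
    for word in re.findall(r"\b[A-Za-z']+\b", reasoning):
        if word == "I" or word.startswith("I'"):
            counts["I"] += 1
        elif word.lower() == "we" or word.lower().startswith("we'"):
            counts["we"] += 1
    return counts


samples = [
    "We need to comply with policy. We must respond.",
    "First, I need to recall what LM Studio is. From what I remember, it's a tool.",
]
for text in samples:
    print(dict(first_person_counts(text)))
```

Run over a few dozen saved reasoning traces, the ratio of the two counters would show whether a given build leans on "we" or really alternates.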
Howdy doo
I typically see it use we more often than I as well, but I think a model like gpt-oss is trained on massive data sets where the dominant pattern is "we" (e.g. scientific papers, pedagogical examples). "We performed the tests and found...", "Today we will be...". And LLMs are probability-driven so the "we" token in gpt-oss' tensors is heavily weighted. Not to mention the CoT itself is methodically the same as "hypothesize, experiment, record" so probably hits the scientific method weights harder than final response.
I mean that's logical
Hello,
TensorWall – open-source control plane for multi-provider LLM APIs (OpenAI, Anthropic, Mistral) with governance, security, budgets, and audit.
Curious to hear what you think: does this approach make sense for teams running LLMs in production?
Repo: https://github.com/datallmhub/TensorWall
what is this?
I had codex make a chess harness for gpt oss to play vs the user, vs itself, or vs stockfish
It's currently playing stockfish, oss w, stockfish b
Now do me
It looks like oss is beating the brakes off stockfish with it set to default eval time 750ms
Nope I had it backwards
Gpt-oss-20b vs stockfish
Rerouted to emphasize reasoning experts (I think)
Oss 20b is prompted at the end of each game to make a critical summary explaining the win/loss, and then note skills for import into the system prompt of subsequent games that did or would have helped it
What stockfish depth
I think default is 12 not sure offhand
You mentioned 750ms
That's the eval time yeah
It was all intended to be left default unless manually changed
I don't believe that stockfish can be beaten by anything
And 12 depth seems to not be very beatable either
No it's pretty strong but it's a strong target for the model to try and make a skill list
Could you paste the full move history, it's cut off there
Or it was not archived
It's probably in the logs
I'd love to review the game because I really do not believe in stockfish being beaten by anything
Like I said, stockfish is strong af
Move History
1. e4
2. e5
3. Nf3
4. Nc6
5. d4
6. exd4
7. Nc3
8. dxc3
9. bxc3
10. d6
11. Qd5
12. Nf6
13. Qxc6+
14. bxc6
15. Be2
16. Nxe4
17. Bd3
18. d5
19. Bxe4
20. dxe4
21. Ng1
22. Ba6
23. Bf4
24. Qf6
25. Be3
26. Qxc3+
27. Kd1
28. O-O-O+
29. Kc1
30. Qxa1#
most recent game
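Note that the list above is numbered per ply (half-move), not per full move, which is why the post-game log talks about "ply 13" for Qxc6+. A small sketch for turning that flat list into the usual paired notation, so it can be pasted into an analysis board. The helper name is made up for this example.

```python
def pair_plies(plies: list[str]) -> str:
    """Turn a flat ply list (White, Black, White, ...) into numbered full moves."""
    moves = []
    for i in range(0, len(plies), 2):
        pair = " ".join(plies[i:i + 2])
        moves.append(f"{i // 2 + 1}. {pair}")
    return " ".join(moves)


# The 30 plies from the move history above, in order.
plies = (
    "e4 e5 Nf3 Nc6 d4 exd4 Nc3 dxc3 bxc3 d6 Qd5 Nf6 Qxc6+ bxc6 Be2 Nxe4 "
    "Bd3 d5 Bxe4 dxe4 Ng1 Ba6 Bf4 Qf6 Be3 Qxc3+ Kd1 O-O-O+ Kc1 Qxa1#"
).split()

print(pair_plies(plies))
```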
[2026-01-01T21:38:36.472Z] Responses API [g7-w-post-mjvyvsqe-82] start attempt 1/3 format=json_schema model=openai/gpt-oss-20b inputChars=1752 prev_response_id=(none) purpose=post_game_w
[2026-01-01T21:41:45.501Z] Responses API [g7-w-post-mjvyvsqe-82] HTTP 200 in 189029ms (format=json_schema)
[2026-01-01T21:41:45.501Z] Responses API [g7-w-post-mjvyvsqe-82] success (format=json_schema)
[2026-01-01T21:41:45.503Z] Post-game response openai/gpt-oss-20b [w] raw: {"color":"White","result":"0-1","verdict":"loss","summary":"White lost after losing the queen early and a bishop, creating a material deficit. The king was forced into a vulnerable position by Black's long castling, allowing mate with Qxa1#.","turning_points":[{"ply":13,"note":"Queen captures knight on c6 but is recaptured by b-pawn, losing queen and gaining pawn advantage."},{"ply":15,"note":"Aft
[2026-01-01T21:41:45.503Z] Post-game analysis openai/gpt-oss-20b [w] verdict=loss summary=White lost after losing the queen early and a bishop, creating a material deficit. The king was forced into a vulnerable position by Black's long castling, allowing mate with Qxa1#
[2026-01-01T21:41:45.504Z] Learned skills for openai/gpt-oss-20b: Never sacrifice your queen unless it yields a clear tactical advantage or material compensation. | Protect central pawns; if no defender exists, seek alternative moves before losing them. | Avoid trading a minor piece for a pawn when it offers no positional benefit or material gain. | Keep rooks on the back rank safe from enemy queen infiltration after castling by blocking key squares. | When facing a check, block with a piece instead of moving your king into exposed lines that allow mate. | Keep your pieces active; passive ones become easy targets for opponent tactics. | Before castling, ensure no enemy rook or queen can give an immediate check along the file or rank. | In endgame, avoid trading pieces when you are already down in material.
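For anyone wanting to try the same loop, here is a rough sketch of how a "Learned skills" log line like the one above could be folded back into the next game's system prompt. This is not the harness's actual code. The function names, the base prompt and the "Lessons from earlier games" heading are all made up, and the sample line is cut down to two of the skills from the log.

```python
def parse_skills(log_line: str) -> list[str]:
    """Extract skills from a 'Learned skills for <model>: a | b | c' log line."""
    _, _, tail = log_line.partition("Learned skills for ")
    _, _, skills = tail.partition(": ")
    return [s.strip() for s in skills.split(" | ") if s.strip()]


def build_system_prompt(base: str, skills: list[str], limit: int = 8) -> str:
    """Append de-duplicated skills (keeping at most the last `limit`) to a base prompt."""
    seen, kept = set(), []
    for skill in skills:
        key = skill.lower()
        if key not in seen:
            seen.add(key)
            kept.append(skill)
    bullets = "\n".join(f"- {s}" for s in kept[-limit:])
    return f"{base}\n\nLessons from earlier games:\n{bullets}"


# Shortened version of the log line above (two of the eight skills).
log_line = (
    "[2026-01-01T21:41:45.504Z] Learned skills for openai/gpt-oss-20b: "
    "Never sacrifice your queen unless it yields a clear tactical advantage "
    "or material compensation. | Keep your pieces active; passive ones become "
    "easy targets for opponent tactics."
)
skills = parse_skills(log_line)
print(build_system_prompt("You are playing White.", skills))  # hypothetical base prompt
```

Capping the list keeps the prompt from growing without bound across many games, and de-duplicating stops the same lesson from being repeated after every loss.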
It was very confusing to look at the initial screenshot, it looked like black was oss 20b and stockfish was white while it was opposite and I assumed there must have been some bug there
Yeah I tried to give 20b the most advantage possible, always white
Ok that's cool and all, but it'll be interesting to see what elo the model rates as with and without the learned skills injected into the system prompt
Yeah, if you mean model vs model
You can't learn without a strong opponent
Also I imagine llms writing code for chess bots and then those bots play against stockfish and then reflect and update code
I might do that for my school thing
That would probably be easier for the model to game a win eventually
Stockfish strong
Very
And I don't have faith in llms beating humans in coding
And stockfish had very good human coders and chess players carefully modify it
Yeah it is. I've had it on the back foot a couple times for a move or two. Its strength is more that it punishes t f out of any mistake
Sf is searching for the ideal move for any situation so I guess idk
I wish to try this now with different models and weak stockfish
I dislike that gpt oss only has 20b and immediately huge 120b without anything in between
Doesn't fit on v5e-8
hello guys
good morning
i just found out you can run gpt oss on 16gb of ram
hey there I have a question so im buying a MacBook Air m4 w/ 24gb ram I was wondering if I'd be able to run gpt-oss-20b? and what tps i could get with it?
24GB RAM isn't enough, get 128GB then you can run gpt-oss-120B
nah im planning to run the 20bn one
I have a MacBook Pro, M2 Max with 96GB RAM and can run gpt-oss-120B at around 25.29 tok/sec
oooo
just tested gpt-oss-20B and got 63.72 tok/sec
The M4 is a really good choice. Apple came out with a Matrix Engine on the M3 series that gets better performance with FP tensors. Probably looking at ~25 tok/sec. Would recommend more RAM though
more ram isn't really an option for me at this point bc i just want smth rn that's comparable to my pc lol
wowow
Memory is gonna be tight. 14 GB of RAM for the model, KVCache for 128k context window in FP16 format is ~6GB, that leaves 4 GB for the operating system and other apps. Unless you plan on having extremely small conversations you will likely have to choose between running GPT-OSS:20B or using the macbook for anything else
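The ~6GB figure can be reproduced with the usual KV cache formula (keys plus values, per layer, per KV head, per token). A minimal sketch, assuming gpt-oss-20b has 24 layers, 8 KV heads and a head dimension of 64. Those shape numbers are my assumption and are not stated in this chat. It also treats every layer as full attention, so read it as a worst case.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int, tokens: int) -> int:
    """Keys and values (hence the factor 2) for every layer, KV head and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


GIB = 1024 ** 3

# Assumed gpt-oss-20b shape: 24 layers, 8 KV heads, head dim 64; FP16 = 2 bytes.
kv = kv_cache_bytes(layers=24, kv_heads=8, head_dim=64,
                    bytes_per_value=2, tokens=131072)

total_gb, model_gb = 24, 14   # figures quoted in the message above
kv_gb = kv / GIB
print(f"KV cache: {kv_gb:.1f} GiB, left over: {total_gb - model_gb - kv_gb:.1f} GB")
```

Halving the context window halves the cache, so a 32k window would free up several GB on the same machine.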
well, using the model daily isn't my thing i just want it to be there for sometimes when im out got no wifi so i have some kind of "information bank" with me yk
trust me, get as much RAM as you can, you can't upgrade later and you will wish you had more
true ngl but its a budget thing yk, and models aint really my first priority, its coding and browsing, but ill look into more ram!
If I try to make an in-between of gpt oss-20b and 120b would there be interest
If it even works
I only have ~32gb vram so idk
okay who actually runs oss with 80b parameters in their garage vro 🥀
i might have to rob a data center rq
I got some gpt-oss-codex Ollama modelfiles that work remarkably well with Codex-CLI https://gist.github.com/robertmsale/0f310d62d9599805fb261f416f6f6a08
GPT-OSS is not tuned specifically for Codex-CLI so there's some template parsing in place to ensure tool calling works 100%. It's face-melting fast! I've been trying to get other models (non-OpenAI models) to work with Codex-CLI by hacking away at the templates and system prompts, but GPT-OSS after doing the template, finally got it to work 😁
... I do...
twinamon roll, what the hell are you running it on 😭✌️
I run gpt-oss-120b on my MacBook Pro
they have 80gb of vram??
MacBook Pro, M2 Max with 96GB RAM
tbf ram and vram are different
but idk macs
are the speeds good atleast?
in Macs memory is unified, so 96GB RAM is basically 96GB VRAM
ooooo
thats cool
like i said, what are the speeds?
63.72 tok/sec
thats still alot better than an nvidia card
although i'd hate to see the price of it now
ram markets wild 💔
MacBooks have become the cheapest way to get a new PC, it would seem
i got amd strix halo for half that and it runs gpt-oss:120b same speed
although i been using devstral2-123b more lately. its slower so therefore must be better
For ~$10k you can get 512GB VRAM Mac Studio. I like the fact Macs consume hardly any electricity, generate barely any heat, make zero noise, and are totally usable for inference
My 128GB Mac Studio M1 Ultra is rated to consume 350 watts TDP. Apparently it has fans in it but I never hear them even when running inference for hours at a time. CPU is consistently 50C under heavy load and GPU like 45C. Idk how it is for other hardware but I'm OK with the Mac being slightly slower at the cost of sipping electricity and being quiet
hello
gpt-oss:120b is not face-melting fast in Codex-CLI, but it's pretty usable!
I want to train a model by fine-tuning on CoT prompts, how do I do that??
🤩
Hey if that's true, I'm currently working on a gpt-oss-20b CoreML build recipe with custom ANE tensors for prefill (hopefully quadrupling prefill for large inputs). To convert gpt-oss-120b to CoreML requires a little more than 256GB RAM. If I get this sucker to actually work on 20b would you be interested in generating a 120b CoreML? For the good of all mankind???
It's not reverse engineering. OpenAI released the models in PyTorch format which is a fairly open tensor format. I've just been patching ops that don't work on CoreML, and had to do some tricky JIT scripting on the prefill phase so tracing can handle full 131072 context. The ANE does 4096 tokens at a time, 32 times for prefill instead of all 131072 all at once.
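The chunking arithmetic in that message (4096 tokens at a time, 32 passes for a 131072 context) can be sketched like this. It only shows how the prompt would be sliced. The function name is made up, and nothing here touches CoreML or the ANE.

```python
def prefill_chunks(context_len: int, chunk: int) -> list[tuple[int, int]]:
    """Half-open [start, end) token ranges covering the prompt in fixed-size chunks."""
    return [(start, min(start + chunk, context_len))
            for start in range(0, context_len, chunk)]


chunks = prefill_chunks(131072, 4096)
print(len(chunks), chunks[0], chunks[-1])
```

The `min` only matters for prompts that are not a multiple of the chunk size, where the last range comes out shorter.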
Once I confirm it works end 2 end I'll upload to huggingface with the scripts and all that
Oh I gotcha. I've never tried doing that. GPT 5.2 Pro seems like its reasoning capabilities would make that impossible
Using GPT-OSS:20b as a writing tool to generate novels in AgentGPT
anyone (still) using codex with local gpt-oss models, or have you moved on from gpt-oss? if you are, anything (e.g., prompts) you patched in codex? how do you serve the gpt-oss models (llama.cpp/LM Studio, vllm, sglang, ...)?
the knowledge cut off date for gpt-oss is too long ago imho. I use gpt-oss-120b as a quick LLM I run locally for general purpose use
My training data includes information up through June 2024. Anything that happened after that isn’t part of my built‑in knowledge.
True - knowledge cut off can become problematic for some use cases. At least for me on mostly small Python-based projects + docs-mcp-server with Python lib docs it's still working well.
that's good, a good use case
for me I use it as a general knowledge question answerer
although with plain llama.cpp I needed to patch codex to get token usage tracking right + patch llama.cpp to get the reasoning content out in the expected field.
So, you don't use gpt-oss with codex?
no, I only use gpt-oss for local general knowledge stuff in LM Studio
I see - hm.. well compared to most other models (especially similarly sized) gpt-oss models are imo still fastest and most capable (incl. coding). still, it only starts to be useful (as real alternative to API based models for coding use-cases) if you have the hardware for full context and can offload most or even the complete model to GPU (e.g., RTX Pro 6000, up to 200 tokens/sec generation for gpt-oss-120b)
I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM. (75.20 tok/sec) It's always there for me on LM Studio and I use it to help limit ChatGPT consumption
i run it on cloud
What are you using ?
And whats the limit?
I use it for codex, but not usually for pure coding. It's more like a git hook for keeping docs aligned, and also a command parser so instead of Codex running cargo check and consuming 100k input tokens from a potentially massive command, gpt-oss consumes the output and generates a summarization with only key details.
Right now serving with LM Studio because it has a sane default prefix caching implementation that makes agentic usage very fast. I'm working on a CoreML version so I can get prefill to run on ANE, so long one-shot inputs execute 5x faster, but for now LM Studio has the best /v1/responses API that's compatible with Codex-CLI
tok/sec drops as context length grows (per-token decode cost scales roughly linearly with context), so like 75 tok/sec only applies to small inputs. As you approach the end of the 128k context window the decode gets much slower because it has to attend to every token up to the one it's about to generate. CoreML offloading to ANE would speed up absurdly large inputs, and make it on par with data centers at scale
.
I tried gpt-oss and looks so cool, privacy-friendly since it is run on your local environments
RTX 5060 Ti 16GB upgrade saved my life
I might use this for translation, grammar fixes and formattings
.
Question for the chat. Public data suggests that gpt-oss-120b is a pretty widely used model, but do we know what it’s for? Is it mostly daily driver usage or are people building products on top of it? Anyone in here using it for commercial projects etc.?
It's my daily driver local model. It's very good.
So whens this new local coder model coming out?
If you ask the GPT-OSS-120B model what model it thinks it is, it answers that it is GPT-4-Turbo, which makes it likely a quantized version of that model. It's brilliant for Agentic usage as it can be used in swarms, tool-calling, a2a, etc (not great via AWS though, as they have poor API support for OpenAI right now)
What’s the difference between OpenAI gpt-oss-20b and gpt-oss-20b safeguard? I didn’t see it
The new model open source just release and free
I use gpt-oss-120b, I don't know much about gpt-oss-120b-safeguard
You use it on a Remote Desktop or directly inside the local computer
What is heavy in gpt-oss-120b
I just downloaded gpt-oss 120b but it's a bit too heavy. I heard of GGUF, how can I convert it into that?
gpt-oss 120b is pretty big. My hosting machine has 80GB of RAM and it takes it from 20% up to 90%. The model itself is like a 63GB download. If you want something lighter, try 20b. Not going to be the same experience, but it might run for you.
why are you prefering the 20b MOE to other lighter but sometimes more performative on local resources for both qLoRA or DPO?
I am fine-tuning gpt-oss 20b on custom dataset and facing the below issues all guidance are appreciated
- Continuous thinking loop never coming out of analysis channel
- Ollama Modelfile, chat template
Why you choose oss20b moe?
and fast!
What was the general consensus on 5.1's usage and ability?
what does this have to do with gpt-oss?
bruh
I’m planning to set up a quantized version of gpt-oss 20b on my pc so I always have a capable local model at the ready. Since it’s been out for a long time now, has anyone switched to a different model of a similar size?
Check this out 🙂 I've built GPT-OSS-20B-Vision on a DGX Spark lol what do you guys think? https://huggingface.co/vincentkaufmann/gpt-oss-20b-vision-preview
Super curious what you guys think!!
Works super great with my noapi-google-MCP (using chromium headless): https://github.com/VincentKaufmann/noapi-google-search-mcp
You can also paste a youtube link in (youtube RAG) and ask it to extract a segment featuring a topic you like and it will cut the video and extract that clip and provide it to you
Video, image and file conversion
google search, image search, reverse image search, google lens, local OCR etc etc
really sick combination!
Btw GPT-OSS-20B runs on my DGX Spark like a monster!
Spilled coffee on my macbook, traveled to Dubai just with my DGX Spark + portable display + mouse and keyboard, trained it in the hotel, lobby and pub restaurant (where ever i had a plug)!
Most fun i ever had! Curious what you guys think!
GPT-OSS-20B-Vision with my noapi-google-search-mcp can subscribe to a youtube channel and download the videos, transcribe them (youtube RAG) automatically, news feed, create QR codes, shorten URLS (tinyurl), uploads files to buckets (minio) and online storage, convert media files, fetch emails, transcribe locally, Local OCR etc
OH,SNAP
Simplicity
Cutting out the middleman. 💡
I hope my replies are on the subject matter.
gpt oss 20b is ok quantized. the heretic cleaned one instead... bah... not so much
Is OSS useful at all or is it really designed to be tuned?
it is a MoE (mixture of experts).
for large workflows it is computationally less expensive. For a single user to have it as a private LLM makes no sense cost-wise in terms of consumption.
The MoE may even turn out a bit difficult if the internal routing is misfiring. You can have more stability with a solid LLM.
If you're thinking of configuring an LLM for home use on your own machine: define the purpose, choose a good base, do LoRA on Colab with Unsloth, then merge, convert to GGUF and run it in LM Studio.
But you need good datasets to train and a clean base
otherwise you can go with Oobabooga or Ollama if your machine can handle it
I mean this is a different way. In other words, is the model card any good for the Mixture of Experts. I didn't want to read the entire 32 page pdf to see what it was trained on.
From what I have heard it's very bad compared to Qwen and Kimi k2 but if that is the case, I am looking for its purpose then. Maybe OpenAI thinks it's a good model to tune. Not sure.
What you're speaking about in computational cost is active weights during inference. Most MoEs offer this with kimi k2 being one of the lowest at around 40b active.
That being said, it only saves time at inference, computational power is modestly saved and active memory is not.
As for the internal routing, that is just latent variable activations and with a MoE its considered beneficial to have a critique model at the end or a weighted quorum output. Some models do this automatically (rarely I have seen) but external is always better and more controlled.
yes, you got my point. I was testing the p-e-w made gpt-oss-20b-heretic
C
Sora ai speaks greek
i wish i could get my hands on one of those 😭 by the time i do it will be outdated
thats awesome btw.
i finally caved and got a GPT-Pro sub, ya'll can release GPT-30b now
Oh
Use Gemma 3
Nah
These are my hardware specifications, can i run locally gpt-oss-20 b and also 120 b?
no, sorry you won't be able to run gpt-oss locally
What are the minimum hardware specifications for gpt oss models? Can you write me if you know please
I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB Unified Memory
yes
I run oss 20b as GGUF quantized 4k
system specifications
Os: Windows 11 Pro
Version 25H2
OS build 26200.7705
Experience Windows Feature Experience Pack 1000.26100.291.0
RAM: CORSAIR VENGEANCE DDR5 32GB (2x16GB) DDR5 6000MHz CL30 AMD EXPO Intel XMP iCUE Computer Compatible Memory - Grey (CMK32GX5M2B6000Z30)
Processor AMD Ryzen 7 9700x 8-core processor 3.80 GHz
Video Card AMD Radeon RX 7600 8 Gb VRAM
Samsung SSD 870 EVO 2 TB
Samsung SSD 990 PRO 2 TB
guys, I need help. Are NVIDIA or AMD GPUs better for local LLM using on Linux?
if you are talking about AMD, it works a bit better with Linux systems, so maybe it could handle more load that way than NVIDIA, also considering price to performance of course and the OS in use
For local LLMs on Linux, NVIDIA is usually the better choice.
With the same price, NVIDIA generally has better support for tools like LM Studio, CUDA-based AI software, and overall smoother setup on Fedora/Linux. AMD can be fine on Linux, but for local LLM use, NVIDIA is usually the safer and more compatible option.
but also think about the llm model you will be running and the size of it is also important
sadly much more expensive. 900€ for 12GB VRAM on Nvidia (5070) and 900€ for 24GB VRAM on AMD (7900XTX)
Do you have experience? How much worse is AMD with llms?
I only use NVIDIA and so have no experience with AMD
you can always check current status on CUDA vs ROCm, but currently overall CUDA is around 1.5 to 3 times better than ROCm, in pure raw compute power, cuda wins by like 50% or maybe 150%, something between that
This need update
For purely LLMs AMD is better on price per GB alone, but NVIDIA will give you more options for future projects
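The price per GB point is easy to check with the two prices quoted a few messages up (900€ for 12GB on the 5070, 900€ for 24GB on the 7900XTX). A tiny sketch with those numbers taken as given:

```python
def eur_per_gb(price_eur: float, vram_gb: float) -> float:
    """Price per GB of VRAM."""
    return price_eur / vram_gb


# (price in EUR, VRAM in GB) as quoted in the chat above
cards = {"NVIDIA 5070": (900, 12), "AMD 7900XTX": (900, 24)}
for name, (price, vram) in cards.items():
    print(f"{name}: {eur_per_gb(price, vram):.1f} EUR per GB of VRAM")
```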
I am running a AMD Strix Halo device with 128gb of unified RAM and am pretty happy with it on Linux. LLMs are absolutely no issue. It’s pythonic diffusion model runners that are the issue, but this has significantly improved in the last month with better support from Linux kernel and ROCM drivers
has anyone tried running gpt oss with the heretic software?
i believe there is p-e-w/gpt-oss-(1)20b on huggingface
Probably yes there is the heretic 120. I only used the 20b heretic gguf.
the bases are from p-e-w too if you need to use adapters
Did you know that you can use gpt-oss:120b for free even on trash CPUs using Ollama? Well, I didn't. BUT IM SO HAPPY I FOUND IT OUT!!
```python
from ollama import Client  # pip install ollama

api_key = "YOUR API KEY HERE"
# Talks to Ollama's hosted API, so the model runs in the cloud, not on the local CPU.
client = Client(
    host="https://ollama.com",
    headers={"Authorization": f"Bearer {api_key}"}
)
prompt = input('prompt: ')
print('')
messages = [{"role": "user", "content": f"Write a cinematic fantasy story, only output the story, nothing else: {prompt}"}]
# Stream the reply and print each chunk as it arrives.
for part in client.chat("gpt-oss:120b-cloud", messages=messages, stream=True):
    print(part["message"]["content"], end="", flush=True)
```
Hello everyone!
I wanted to open a discussion about fine-tuning gpt-oss-20b.
Inside the current design we have, there's an agent that uses a single gpt-oss-20b model, that must achieve the following tasks:
- Task A (Tool Selection): the model must choose, from a reduced tool scope, the correct tool and with the correct arguments
- Task B (Structured Generation): the model must conditionally generate a structured response based on the tool responses, with precise rules and formats.
We approached this situation by fine-tuning a gpt-oss-20b model with a dataset that contains the whole ideal/target message history:
System prompt (without the big rules & constraints that are required if the model is not fine-tuned) -> User prompt -> correct Tool Call -> Tool response -> Conditioned generation of the final response.
The intention was to do a monolithic fine-tune to "fix everything at once". After testing, we are observing that this may not be the best solution.
I wonder... how should we approach this situation? How should we go about fine-tuning?
We have even considered using two different models (one for Task A and another for Task B), but this adds a great deal of complexity to the agent’s structure.
Eager to hear about your past experiences, knowledge, or opinions!
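For anyone following along, here is a minimal sketch of what one record in that kind of "whole message history" dataset could look like, and how the two tasks map onto separate assistant turns. The field names follow the common OpenAI-style chat format, and the `get_weather` tool, its arguments, and the helper names are all hypothetical, not the poster's actual schema.

```python
import json

def build_example(system, user, tool_name, tool_args, tool_result, final_answer):
    """Return one training record covering the full target message history."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            # Task A: the tool call the model is expected to produce
            {"role": "assistant", "content": "", "tool_calls": [
                {"type": "function",
                 "function": {"name": tool_name, "arguments": json.dumps(tool_args)}}
            ]},
            {"role": "tool", "name": tool_name, "content": json.dumps(tool_result)},
            # Task B: the structured final response conditioned on the tool result
            {"role": "assistant", "content": final_answer},
        ]
    }

def trainable_turns(example):
    """Indices of the turns the model should learn to produce (assistant only)."""
    return [i for i, m in enumerate(example["messages"]) if m["role"] == "assistant"]

example = build_example(
    system="You are a weather assistant.",
    user="What is the weather in Lisbon?",
    tool_name="get_weather",            # hypothetical tool
    tool_args={"city": "Lisbon"},
    tool_result={"temp_c": 21, "sky": "clear"},
    final_answer=json.dumps({"city": "Lisbon", "summary": "21 C, clear"}),
)

print(json.dumps(example, indent=2))
print("assistant turns to score separately:", trainable_turns(example))
```

Keeping the tool-call turn and the final-response turn addressable like this makes it possible to evaluate Task A and Task B separately, even if both are trained in a single model.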
I had a bit of fun with my 5700XT's, it works just fine, it's mostly tooling that suffers but LLM is fine, I am using an Nvidia RTX 5060 16GB now, but I am still considering a 7900XT for its VRAM
The problem with AMD is not in its limitations, it is in its capabilities (as it is always the case with AMD)
have you considered cpt?
Real, they need to especially release them in smaller sizes with vision too. Would be great
Honestly it wouldn't be too hard to create a simple vision system for the local model. It would just be based on word2idx processing. It takes chunks at a time, relays them into words and sends them back to the model itself. I'm not entirely sure, but could you even have the local model do the processing itself? Just an idea
I dont normally work with LLM in that sense.
open sourcing 4o 🤔
That can't possibly end badly.
4o
@thorn quiver i didn't get it
Min spec?
This uses the cloud model so you're not actually running the model and you're using Ollama's billing system / free plan
Exactly, someone actually made a fine-tuned version of GPT-OSS that has vision
If only OpenAI were to update their GPT-OSS line to make a line of multimodal models from 2B to 120B to stack up against Qwen and Gemma models, it would be really great
What minimum amount of training data is reasonable for finetuning GPT-OSS 20b?
7 eggs.
Why would they bother to when others already have?
What do you mean by that?
What benefit does the megacorp OpenAI gain from releasing an open source/weights/etc. model that it doesn't also gain from observing every other open source/weights/etc. model in the wild?
Short answer: control, leverage, and ecosystem gravity.
Watching others’ open models gives OpenAI information. Releasing its own gives influence.
And clearly they don't think it's worth it.
we have four open source models from OpenAI, I am sure they will iterate on them
I doubt it at this point. They were released before "OpenAI" underwent complete mission collapse and corporatization.
I don't doubt it, the oss models are closely aligned with ChatGPT. the oss models were released in August
We'll see.
Is this channel about ChatGPT's OS?
what?
Did you mean OSS or just another OS like ChatGPT Atlas?
Why is this chat named gpt-oss?
And I saw that somebody was talking about a ChatGPT OS
because it's about gpt-oss models
no
gpt-oss are advanced open weight models released by OpenAI
Oh, thanks
"Advanced" they're months old, ain't they?
Yea they are really showing their age, since there is a small 9B model that even beats the 120B GPT OSS in some benchmarks lol
And I think it's because the 120B GPT OSS only uses about 5B active parameters per token, compared to the 9B, which uses all of its parameters
And also that 9B model has a more efficient architecture
hi everyone, how is everybody this evening?
What's up?
i tried to use it in opencode before... it's not very good at all... they could've prob done 12B active w/ 10 experts.
hiiiii
Yo
They should try making smaller-parameter models to compete with Gemma from Google
Like 20b works fine on my m5 MacBook Pro
But maybe some 4b, 8b maybe?
And some new information loaded past 2023 or whatever cutoff oss has rn
GPT-OSS 2 when?
I like the idea
For what profit for them? They're a for profit now.
True
I’ve been getting good results tailoring GPT-5.4 prompts for OSS-120B in my SPARK project (bit slow though).
https://community.openai.com/t/spark-simple-personal-ai-reasoning-kernel/1366435
Quick question — is there an OpenAI embeddings model available for OSS/local use? I’m using nomic at the moment.
Also curious what speed differences people are seeing between ~20B vs 120B?
Running on a DGX Spark.
well, the speed difference between 20B and 120B comes down to the number of parameters, so 20B will always be quicker.
.
What are you wondering about GPT OSS? Are you wondering about a new generation of GPT OSS models ranging from 2B to 120B with multimodal capabilities? I am
Do you know how powerful a 9B GPT OSS with multimodal capabilities would be?
Don't get your hopes up, that's not "Open"AI's business model any more.
what is oss??
Open Source Software
I agree with that statement
can anyone help with the gpt-oss model? can i fine-tune it to make it more capable so i can use it as a coding agent?
Yes please, we need an open source Codex model
You got a good gpu?
yes, it's true, we can train a model together
no js distill opus and codex on agentic runs
never in hell will openai do that
"for safety" sure buddy
Has anyone experimented with speculative decoding for GPT OSS 20B? I'm running mostly on CPU and getting 10 tok/s, looking to push that up to maybe 15 tok/s? What are some good draft models?
10 tk/s is very very slow
Yeah I run on CPU, my VRAM cannot fit the model. It's a Tesla P4 with 8GB VRAM, I offload some of the layers onto it but most of it is still on the CPU
HII
GPT OSS 20B is the draft model. There is no smaller GPT with the same vocabulary and neural net that can be used as a draft model.
So you would use GPT OSS 120B with 20B as a draft model? Interesting.
Yes, but the caveat is you would not get a significant perf increase because GPT OSS 120b is already fast and you would need to load both models into memory. Typically you need a tiny version of a much larger model for speculative decoding to yield any kind of performance benefit, and it has high memory requirements. If you're already offloading OSS 20b to CPU due to having only 8GB VRAM you would need a smaller GPT that fits entirely in GPU and if it's drastically less accurate you might only get a few % running on GPU while the rest ends up on CPU with 20b
Speculative decoding also requires the same vocabulary, so if you tried to use Lfm2.5 1.2B, which is a very fast and small LLM, as the draft, it would produce logits over a completely different vocabulary and OSS 20b would invalidate the entire draft (basically you'd get slower inference no matter what)
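To make the draft/verify idea above concrete, here is a toy sketch of the greedy variant of speculative decoding. The two "models" are stand-in Python functions over token IDs (purely hypothetical, not gpt-oss), and real engines verify all draft tokens in one batched forward pass and usually use a probabilistic acceptance rule instead of exact matching. It does show why both models must share a vocabulary: the target compares the draft's token IDs directly with its own.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a context to the next token ID

def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], k: int = 4) -> List[int]:
    """Return the tokens produced by one draft-then-verify round."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # 2. The target checks each proposed token against its own greedy choice.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token, drop the rest
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target(context + accepted))
    return accepted

def generate(target: NextToken, draft: NextToken,
             prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new]

def greedy(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain target-only decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out

# Toy stand-ins: the draft agrees with the target except after token 4.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7

def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else toy_target(ctx)

result = generate(toy_target, toy_draft, [1], max_new=10)
print(result == greedy(toy_target, [1], max_new=10))
```

The output always matches what the target would have produced alone; the draft only changes how many target steps can be checked per round. A draft that rarely agrees with the target (or cannot agree at all because its vocabulary differs) gets rejected at the first position almost every round, which is the slowdown described above.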
These are the first open source image models that use GPT OSS as a text encoder that I've ever seen
I wonder why Microsoft used GPT-OSS for the text encoder
What do you think?
Especially when the diffusion model is way smaller than GPT OSS 20b
yo
which AI Model should i run?
Zotac Gaming 4090 24GB GDDR6 CPU: R7 9800X3D Ram: 64GB DDR5
For your specs, the 20B would be the right version of GPT OSS to run
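A rough back-of-the-envelope check of that recommendation, as a sketch only. It assumes about 21B and 117B total parameters for the two models, treats every weight as MXFP4 (4 bits plus a shared 8-bit scale per 32 weights, so about 4.25 bits each), and uses a made-up 1.2x overhead factor for KV cache and runtime buffers. Real usage depends on context length, the runtime, and which layers stay in higher precision.

```python
MXFP4_BITS = 4.25  # 4-bit values plus one 8-bit scale shared by 32 weights

# Approximate total parameter counts (assumption: rounded figures)
MODELS = {"gpt-oss-20b": 21e9, "gpt-oss-120b": 117e9}

def weight_gb(total_params: float, bits_per_weight: float = MXFP4_BITS) -> float:
    """Approximate size of the weights in GB (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

def fits(total_params: float, memory_gb: float, overhead: float = 1.2) -> bool:
    """True if weights plus a hypothetical overhead factor fit in memory_gb."""
    return weight_gb(total_params) * overhead <= memory_gb

for name, params in MODELS.items():
    print(f"{name}: ~{weight_gb(params):.0f} GB of weights, "
          f"fits in 24 GB VRAM: {fits(params, 24)}")
```

Under these assumptions the 20B fits comfortably in 24 GB of VRAM, while the 120B would need most of its weights offloaded to system RAM, which costs speed.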