#gpt-oss | OpenAI | Page 3

fathom agate Nov 22, 2025, 11:00 AM

#

More than 5 minutes?

tepid garnet Nov 22, 2025, 11:03 AM

#

fathom agate More than 5 minutes?

it thought for six minutes then gave me an answer about math that is beyond my understanding

tepid garnet Nov 22, 2025, 11:04 AM

#

fathom agate More than 5 minutes?

fathom agate Nov 22, 2025, 11:05 AM

#

tepid garnet it thought for six minutes then gave me an answer about math that is beyond my u...

Great. That's what I need. What are your max token and effort settings?

tepid garnet Nov 22, 2025, 11:06 AM

#

fathom agate Great. That's what I need. What are your max token and effort settings?

max tokens was 64k and effort was high

fathom agate Nov 22, 2025, 11:06 AM

#

I will contact OR and let them fix it

#

Thanks again!

tepid garnet Nov 22, 2025, 11:07 AM

#

fathom agate I will contact OR and let them fix it

I run gpt-oss-120b on LM Studio using a MacBook Pro M2 Max with 96GB RAM, it uses about 70GB RAM to run the model

rough iron Nov 23, 2025, 8:31 AM

#

tepid garnet I run gpt-oss-120b on LM Studio using a MacBook Pro M2 Max with 96GB RAM, it use...

is gpt-oss 120B worth it tbh ?

tepid garnet Nov 23, 2025, 8:32 AM

#

rough iron is gpt-oss 120B worth it tbh ?

yes I use it quite a lot, it's also one of the models used by Google Antigravity coding AI

rough iron Nov 23, 2025, 8:33 AM

#

tepid garnet yes I use it quite a lot, it's also one of the models used by Google Antigravity...

Does it compete against gemini 3 pro on high reasoning ? Mathematically and in coding tasks

tepid garnet Nov 23, 2025, 8:35 AM

#

I hate automod

rough iron Nov 23, 2025, 8:36 AM

#

dont worry

tepid garnet Nov 23, 2025, 8:36 AM

#

rough iron dont worry

you got it, great!

rough iron Nov 23, 2025, 8:36 AM

#

So it's basically great at coding and maths, but not excellent?

tepid garnet Nov 23, 2025, 8:38 AM

#

rough iron So it's basically great at coding and maths, but not excellent?

yeah. not excellent but does most of the job well if it's not overly complex. Laravel is a simple PHP framework, gpt-oss-120b makes very few mistakes with it

split trout Nov 24, 2025, 10:18 AM

#

Hey, is there a way i can add multimodal/image capabilities to the gpt-oss series?

tepid garnet Nov 24, 2025, 10:46 AM

#

split trout Hey, is there a way i can add multimodal/image capabilities to the gpt-oss serie...

not easily

fast nebula Nov 24, 2025, 6:23 PM

#

@robust island😂😂😂😂

spice canyon Nov 24, 2025, 10:59 PM

#

🤷

orchid anchor Nov 25, 2025, 3:44 AM

#

tepid garnet

Woah man, GPT outputs should be posted in the Share-GPT-Output section, man, you should really pay more attention

tepid garnet Nov 25, 2025, 4:38 AM

#

orchid anchor Woah man, GPT outputs should be posted in the Share-GPT-Output section, man, you sho...

screenshots don't go into #1050184247920562316

left wadi Nov 25, 2025, 12:46 PM

#

When will we get new GPT-OSS models?

tepid garnet Nov 25, 2025, 2:05 PM

#

left wadi When will we get new GPT-OSS models?

not soon, we don't really need new gpt-oss models

left wadi Nov 25, 2025, 2:06 PM

#

tepid garnet not soon, we don't really ***need*** new gpt-oss models

We'd still like them. The last GPT-OSS models were released before GPT-5. Now we have GPT-5.1. About time for GPT-OSS-1.1.

tepid garnet Nov 25, 2025, 2:06 PM

#

left wadi We'd still *like* them. The last GPT-OSS models were released before GPT-5. Now ...

what is wrong with the current models?

left wadi Nov 25, 2025, 2:07 PM

#

Hallucination rates are too high.

#

But, tbh, it's pretty bad on GPT-5 as well.

#

But GPT-OSS is worse.

tepid garnet Nov 25, 2025, 2:08 PM

#

left wadi But GPT-OSS is worse.

I use gpt-oss-120b for text editing and coding, I don't do knowledge work with them, they weren't designed for that kind of use case

left wadi Nov 25, 2025, 2:09 PM

#

tepid garnet I use gpt-oss-120b for text editing and coding, I don't do knowledge work with t...

I use 'em for synthetic data generation and document analysis mainly.

tepid garnet Nov 25, 2025, 2:10 PM

#

left wadi I use 'em for synthetic data generation and document analysis mainly.

that's a good use-case, why does it matter if the models are a few months old for that use-case?

left wadi Nov 25, 2025, 2:11 PM

#

tepid garnet that's a good use-case, why does it matter if the models are a few months old fo...

Well, because regardless of the age, the quality is pretty bad.

tepid garnet Nov 25, 2025, 2:11 PM

#

left wadi Well, because regardless of the age, the quality is pretty bad.

they're small models, the quality will be less than larger more capable models

left wadi Nov 25, 2025, 2:12 PM

#

tepid garnet they're small models, the quality will be less than larger more capable models

You'd be surprised how capable small models can be then.

tepid garnet Nov 25, 2025, 2:12 PM

#

left wadi You'd be surprised how capable small models *can* be then.

maybe, the smaller models I use are Qwen and gpt-oss

livid sky Nov 27, 2025, 3:10 PM

#

I use oss 20b quantized with offloading (I have an ATI vboard so I need to), but tbh I don't think anyone should believe a local machine can equal a cloud-based service. Unless you own a server farm, maybe

cerulean nebula Nov 27, 2025, 8:26 PM

#

Good model for coding with vscode

iron thicket Nov 29, 2025, 8:57 AM

#

the first place might or might not be gpt-oss

tardy cloud Nov 30, 2025, 1:02 PM

#

Hello, what do we need to use GPT OSS 120B?

tepid garnet Nov 30, 2025, 1:29 PM

#

tardy cloud Hello, what do we need to use GPT OSS 120B?

you need a beefy computer with plenty of VRAM

#

I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM

jade ferry Dec 2, 2025, 8:10 AM

#

What is oss

tepid garnet Dec 2, 2025, 8:18 AM

#

jade ferry What is oss

it's an open weight model by OpenAI

torpid dune Dec 2, 2025, 3:36 PM

#

Any rumors on if and when OSS will be putting out a multimodal model?

tepid garnet Dec 2, 2025, 3:46 PM

#

torpid dune Any rumors on if and when OSS will be putting out a multimodal model?

no news or rumours

native creek Dec 4, 2025, 2:47 AM

#

tepid garnet you need a beefy computer with plenty of VRAM

dang; my mac is macbook pro m3 max 36gb unified memory and it can't run 120B

tepid garnet Dec 4, 2025, 4:11 AM

#

native creek dang; my mac is macbook pro m3 max 36gb unified memory and it can't run 120B

yeah you need at least 96GB RAM

tardy cloud Dec 5, 2025, 7:52 AM

#

tepid garnet you need a beefy computer with plenty of VRAM

i was talking about a real server such as H100

#

not a pc

tepid garnet Dec 5, 2025, 10:36 AM

#

tardy cloud i was talking about a real server such as H100

overkill

tardy cloud Dec 5, 2025, 10:39 AM

#

tepid garnet overkill

?

tepid garnet Dec 5, 2025, 10:39 AM

#

tardy cloud ?

you don't need an H100 when a well specced Mac will do the job

tardy cloud Dec 5, 2025, 10:41 AM

#

tepid garnet you don't need an H100 when a well specced Mac will do the job

thats not for personal usage

tepid garnet Dec 5, 2025, 10:43 AM

#

I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM and it's fast.

feral escarp Dec 6, 2025, 1:30 PM

#

tepid garnet I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM and it's fast.

Heck yeah, I tried gpt-oss-120b on my Mac Studio M1 Ultra with 128GB RAM and it does indeed perform well. But I don't use it all the time because I need my RAM for other stuff. After all these years I never imagined Apple would sell the most affordable computer of any kind, but for AI workloads you can spend $5k on a nice Mac that does everything including AI, or spend $30k on a video card and another $?k on all the other components. And the new Mac Studio has up to 512GB of RAM for $10k which is insane.

solid sinew Dec 13, 2025, 2:09 AM

#

tepid garnet yeah you need at least 96GB RAM

Online technical specifications actually say 128 GB, but 96 GB is probably good enough, the listed specs are usually "safe" recommendations, not necessarily absolutely required

solid sinew Dec 13, 2025, 2:10 AM

#

tepid garnet I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM and it's fast.

I can imagine, 96 GB RAM for a laptop or desktop pc is nuts, 64 GB is typically where people max out

tepid garnet Dec 13, 2025, 2:10 AM

#

solid sinew Online technical specifications actually say 128 GB, but 96 GB is probably good ...

I am on a MacBook Pro, M2 Max with 96GB RAM and gpt-oss-120b runs quite fast on it

solid sinew Dec 13, 2025, 2:13 AM

#

feral escarp Heck yeah, I tried gpt-oss-120b on my Mac Studio M1 Ultra with 128GB RAM and it ...

Actually it's currently $9500 for the 512 GB RAM option, but that's like 4 times what they recommend you need, so kind of overkill, though definitely still a better deal than spending huge bucks on an H100, which isn't necessary. You can also configure the Mac Studio with up to 16 TB SSD, which is also absolutely insane

solid sinew Dec 13, 2025, 2:28 AM

#

tepid garnet I am on a MacBook Pro, M2 Max with 96GB RAM and gpt-oss-120b runs quite fast on ...

Ah gotcha, so yeah, the 128 GB RAM technical specification sounds a bit overblown. However, if you did want 128 GB RAM on a MacBook Pro, they're currently selling the M4 Max version with a 16-core CPU, 40-core GPU, and 16-core NPU that has 128 GB RAM. With the lowest storage option available on it, 1 TB SSD, that's $4700 for the 14-inch model or $5k for the 16-inch model, but you can get up to 8 TB SSD for an extra $2200, so $6900 for the 14-inch model or $7200 for the 16-inch model. But if you're happy with what you currently have, you don't really need this.

As a side note, technically the latest SoC Apple has right now for their MacBooks is the M5, but it only has a 10-core CPU, 10-core GPU, and 16-core NPU, and can only be configured with up to 32 GB RAM, which is definitely not enough for running these models, so it would make more sense to get the M4 Max one I mentioned previously.

split trout Dec 13, 2025, 7:36 PM

#

tepid garnet yeah you need at least 96GB RAM

wasnt 70g the minimum?

solid sinew Dec 13, 2025, 9:15 PM

#

split trout wasnt 70g the minimum?

Sources seem to vary; I just found a site saying 80 GB is the minimum for 120b, so it doesn't seem to be completely consistent. I think different people are just providing different recommendations

#

Though if you are at least close to what on average is recommended, then you're probably fine
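A rough way to see why the quoted minimums differ: weight memory is about parameter count × bits per parameter ÷ 8, and KV cache, runtime and OS overhead all come on top. This sketch assumes ~117B total parameters for gpt-oss-120b. It treats every weight as MXFP4 at 4.25 bits (4-bit values plus one shared 8-bit scale per 32-element block). The released checkpoint reportedly keeps some non-MoE tensors at higher precision, so read the result as a lower bound, not a spec.

```python
def weight_gigabytes(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# MXFP4: 4-bit elements plus one 8-bit shared scale per 32-element block.
MXFP4_BITS = 4 + 8 / 32  # 4.25 bits per parameter

# Assumed total parameter count for gpt-oss-120b (~117B); check the model card.
ASSUMED_PARAMS_120B = 117e9

if __name__ == "__main__":
    est = weight_gigabytes(ASSUMED_PARAMS_120B, MXFP4_BITS)
    print(f"~{est:.1f} GB of weights before KV cache, runtime and OS overhead")
```

Under these assumptions the weights alone come to roughly 62 GB. That is in the same ballpark as the ~70GB usage reported at the top of this page once context and runtime are added. The 80, 96 and 128 GB figures are effectively different amounts of headroom.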

split trout Dec 13, 2025, 9:16 PM

#

it is optimized for a single h100 though

solid sinew Dec 13, 2025, 9:16 PM

#

split trout it is optimized for a single h100 though

Correct, we're just suggesting workarounds for people who don't want to have to pay $30k or whatever for it

silent cradle Dec 16, 2025, 6:16 PM

#

Please release a local ai thats good at programming

patent iron Dec 16, 2025, 6:50 PM

#

is there a chatgpt model that is like 7b or less? im new to lm studio

remote meteor Dec 17, 2025, 5:44 AM

#

patent iron is there a chatgpt model that is like 7b or less? im new to lm studio

20b is the smallest there is from openai

woven girder Dec 17, 2025, 11:40 AM

#

silent cradle Please release a local ai thats good at programming

Hey its in progress

winter swan Dec 18, 2025, 11:47 AM

#

hi

split trout Dec 18, 2025, 10:28 PM

#

woven girder Hey its in progress

How the heck do you know that

feral escarp Dec 19, 2025, 9:10 PM

#

split trout wasnt 70g the minimum?

70G is pretty tight when you consider context window. VRAM grows very quickly with the kv cache, so if you wanna run the whole model with a reasonably sized context window you want 96GB. If you don't mind offloading KV Cache into system memory and incurring a severe, almost unusable performance penalty then 70G will suffice
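To put numbers on "VRAM grows very quickly with the KV cache": for each token, every layer stores one key and one value vector per KV head. The cache is therefore 2 × layers × kv_heads × head_dim × bytes_per_element × tokens. The config values below (24 layers, 8 KV heads, head dim 64, FP16) are assumed gpt-oss-20b-like numbers for illustration only; read the real ones from the model's config.json. The sketch treats every layer as full attention. Models that use sliding-window attention on some layers, as gpt-oss reportedly does, need less.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    num_tokens: int,
    bytes_per_elem: int = 2,  # FP16/BF16
) -> int:
    """Full-attention KV cache size: a K and a V vector per layer, per KV head, per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * num_tokens


# Assumed gpt-oss-20b-like config, for illustration only (check config.json).
ASSUMED_20B = dict(num_layers=24, num_kv_heads=8, head_dim=64)

if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(num_tokens=ctx, **ASSUMED_20B) / 2**30
        print(f"{ctx:>7} tokens -> {gib:.2f} GiB of KV cache")
```

The cost is linear in context length. With these assumptions a full 131,072-token window needs 6 GiB on top of the weights.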

split trout Dec 19, 2025, 9:16 PM

#

feral escarp 70G is pretty tight when you consider context window. VRAM grows very quickly wi...

ah okay

solid sinew Dec 20, 2025, 2:27 AM

#

feral escarp 70G is pretty tight when you consider context window. VRAM grows very quickly wi...

Regarding a standard CPU + discrete GPU setup, this is accurate: if your GPU doesn't have enough VRAM and has to offload the KV cache into system RAM, it will run much slower, because that data has to travel over the relatively slow PCIe bus, which creates a massive bottleneck.

However, with unified memory, like on Apple M-series chips, where a single SoC puts the CPU, GPU, and NPU on the same chip, all those components share one memory pool. You won't see nearly as significant a slowdown in this case, because nothing has to be offloaded to slower RAM; more of the unified memory is simply allocated to the GPU. You still need enough total RAM to fit both the model and the cache, but you avoid the near-unusable penalty seen on discrete setups because memory bandwidth is the same across the entire pool. Overall speed still depends on the specific chip's memory bandwidth (e.g., Pro vs Max vs Ultra).
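The bandwidth argument can be made concrete with a back-of-envelope bound. When decoding is memory-bound, each new token has to stream the weights it uses through memory once, so tokens/sec is at most bandwidth ÷ bytes read per token. All numbers below are illustrative assumptions, not measurements:

- ~400 GB/s for an M2 Max-class unified memory pool
- ~32 GB/s for a PCIe 4.0 x16 link
- a hypothetical 3 GB read per token

```python
def max_tokens_per_sec(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when every token must stream `gb_read_per_token` GB."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# Illustrative assumptions only.
UNIFIED_MEMORY_GB_S = 400.0  # M2 Max-class unified memory
PCIE4_X16_GB_S = 32.0        # data crossing a PCIe 4.0 x16 link
GB_PER_TOKEN = 3.0           # hypothetical bytes read per generated token

if __name__ == "__main__":
    print(f"unified memory bound: {max_tokens_per_sec(UNIFIED_MEMORY_GB_S, GB_PER_TOKEN):.0f} tok/s")
    print(f"over PCIe bound:      {max_tokens_per_sec(PCIE4_X16_GB_S, GB_PER_TOKEN):.1f} tok/s")
```

Real throughput sits well below either bound, because compute and kernel overhead also cost time. Still, the roughly order-of-magnitude gap between the two bandwidths is the core of the slowdown described above.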

small igloo Dec 20, 2025, 5:22 PM

#

Subject: Request for Guidance on Fine-Tuning GPT-OSS-20B Using COT + Structured DNA Reasoning Prompt + Harmony format
I want perfect code for that.
When I run my code on an H100 80GB, it fails with a CUDA out of memory error... why?

split trout Dec 21, 2025, 10:19 PM

#

small igloo Subject: Request for Guidance on Fine-Tuning GPT-OSS-20B Using COT + Structured ...

What do you want exactly and what is your code?

feral escarp Dec 24, 2025, 11:48 PM

#

GPT-OSS is awesome!

solid sinew Dec 25, 2025, 12:53 AM

#

feral escarp GPT-OSS is awesome!

Heck yeah

scarlet stream Dec 29, 2025, 5:19 AM

#

Interesting note. Gpt oss cannot refer to itself as "I" in its cot reasoning.

#

I have pointed guesses as to why.

tepid garnet Dec 29, 2025, 5:19 AM

#

scarlet stream Interesting note. Gpt oss cannot refer to itself as "I" in its cot reasoning.

yes it can, here I’ve grouped them by market segment, explained the problem

scarlet stream Dec 29, 2025, 5:20 AM

#

Yeah it'll slip on occasion but always falls back to "we"

#

I personally believe that's a training decision to preclude the model assuming any self identity

tepid garnet Dec 29, 2025, 5:20 AM

#

scarlet stream Yeah it'll slip on occasion but always falls back to "we"

I have never noticed that, I think you are wrong or just haven't used it enough

scarlet stream Dec 29, 2025, 5:22 AM

#

It's the only local model I use.

tepid garnet Dec 29, 2025, 5:22 AM

#

scarlet stream It's the only local model I use.

that's not gpt-oss-120b as it was released though, is it?

scarlet stream Dec 29, 2025, 5:22 AM

#

I never mentioned 120b

tepid garnet Dec 29, 2025, 5:23 AM

#

an abliterated model is not original

scarlet stream Dec 29, 2025, 5:24 AM

#

No, and neither is a rerouted model, or an upweighted model using WeightWatcher analysis

#

I digress, base does it. I use only 20b because 120b is too slow for me

tepid garnet Dec 29, 2025, 5:25 AM

#

I run gpt-oss-120b as released by OpenAI (no quant just the whole full model) and it definitely refers to itself as I

scarlet stream Dec 29, 2025, 5:26 AM

#

So you don't see 120 start off with "we need to"

#

"we must respond"

#

"we need to comply with policy"

tepid garnet Dec 29, 2025, 5:27 AM

#

Got it, let's tackle "tell me all about LM Studio." First, **I** need to recall what LM Studio is. From what I remember, it's a tool for running and managing large language models (LLMs) locally on your computer.

#

tried to bold the I there

#

scarlet stream Dec 29, 2025, 5:29 AM

#

Stock 120b mxfp4 gguf refers to itself as we

#

"we must produce ten jokes"

tepid garnet Dec 29, 2025, 5:30 AM

#

are you running this as released? https://huggingface.co/openai/gpt-oss-120b

#

no quant, no abliteration, just the original OpenAI release

scarlet stream Dec 29, 2025, 5:32 AM

#

No I am not using the safetensors

#

Yes I know.

tepid garnet Dec 29, 2025, 5:33 AM

#

scarlet stream No I am not using the safetensors

ok, well I am using the model as released and it alternates between "I" and "We"

scarlet stream Dec 29, 2025, 5:33 AM

#

Arguably, if it was done right, the difference should be negligible. "If it was done right" is carrying all the weight in that sentence.

#

Even when I instruct them with the system prompt to never use we and only I they still do

#

Whew lad they'll do anything else you put in there. Anything.

tepid garnet Dec 29, 2025, 5:35 AM

#

strange, I cannot explain that

scarlet stream Dec 29, 2025, 5:35 AM

#

Yeah it's odd, I'm glad you see it does slip to we at least partially for you

tepid garnet Dec 29, 2025, 5:36 AM

#

scarlet stream Dec 29, 2025, 5:36 AM

#

Expand the thoughts sir

#

Nobody cares about <final>

tepid garnet Dec 29, 2025, 5:37 AM

#

scarlet stream Dec 29, 2025, 5:37 AM

#

Trite little thing innit.

#

Needs a more difficult prompt to force longer cot

tepid garnet Dec 29, 2025, 5:38 AM

#

as I understand it, it's an MoE model correct?

scarlet stream Dec 29, 2025, 5:39 AM

#

Yeah but you're pushing the implications with that train of thought. I asked myself that too.

#

That implies, at minimum, the experts are speaking to each other in the latent stream. At minimum.

#

With understood separate identities.
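For reference, here is what "MoE" means mechanically. In a typical mixture-of-experts layer, a small router scores every expert for each token and only the top-k experts run. In the variant sketched here, their outputs are combined with softmax weights over those k scores, and each token is routed independently. This is a generic toy router in plain Python, not gpt-oss's actual code. The 32-expert / top-4 sizes are assumed to resemble gpt-oss-20b and are for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple


def route_top_k(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their logits."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(
    x: List[float],
    router_logits: Sequence[float],
    experts: Sequence[Callable[[List[float]], List[float]]],
    k: int,
) -> List[float]:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


if __name__ == "__main__":
    # Toy setup: 32 experts (assumed, 20b-like), top-4; expert s just scales its input by s.
    experts = [lambda v, s=s: [s * t for t in v] for s in range(32)]
    logits = [0.0] * 32
    logits[3], logits[7], logits[11], logits[30] = 2.0, 1.5, 1.0, 0.5
    print(route_top_k(logits, k=4))
    print(moe_forward([1.0, 2.0], logits, experts, k=4))
```

In this generic form, the only interaction between experts is the weighted sum the router picks for each token.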

tepid garnet Dec 29, 2025, 5:40 AM

#

oh well, gpt-oss is a great model regardless. I use it all the time.

scarlet stream Dec 29, 2025, 5:41 AM

#

It's definitely not bad never even implied that

tepid garnet Dec 29, 2025, 5:42 AM

#

38tk/s thanks to Tim Cook 🙂

#

I put a sticky on my desktop that says "I or We" to remind myself to keep checking, but I am pretty sure without consciously doing that it alternates depending on context

scarlet stream Dec 29, 2025, 5:46 AM

#

For comparison, qwen3 2b thinking calls itself "I" when I try it

#

It's probably an affectation from openai trying to preclude model self identity

tepid garnet Dec 29, 2025, 5:47 AM

#

maybe, it's not really something that bothers me either way

tepid garnet Dec 29, 2025, 5:50 AM

#

scarlet stream It's probably an affectation from openai trying to preclude model self identity

this will blow your mind

#

it's got a damn multiple personality issue

scarlet stream Dec 29, 2025, 5:56 AM

#

20b claims it is because it identified itself as a "system" consisting of the llm, the developers, etc

#

Obviously that can't be trusted

tepid garnet Dec 29, 2025, 5:58 AM

#

I've gone through a couple of dozen previous chats. It's a mixture of I and we
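A quick way to settle the "I or We" question across saved chats, rather than eyeballing them, is to count first-person pronouns in each reasoning trace. This sketch assumes the CoT text has been exported as plain strings. It counts standalone "I" and "we" case-insensitively, so contractions like "I'm" and "we'll" also count. The sample strings are taken from the snippets quoted in this thread.

```python
import re
from typing import Dict, Iterable

# Whole words only: \b keeps "I" out of "It" and "we" out of "were" or "went".
PRONOUNS = {
    "I": re.compile(r"\bI\b", re.IGNORECASE),
    "we": re.compile(r"\bwe\b", re.IGNORECASE),
}


def count_pronouns(text: str) -> Dict[str, int]:
    return {name: len(p.findall(text)) for name, p in PRONOUNS.items()}


def summarize(traces: Iterable[str]) -> Dict[str, int]:
    """Total pronoun counts over many reasoning traces."""
    totals = {name: 0 for name in PRONOUNS}
    for trace in traces:
        for name, n in count_pronouns(trace).items():
            totals[name] += n
    return totals


if __name__ == "__main__":
    sample = [
        "We need to comply with policy. We must respond.",
        "First, I need to recall what LM Studio is. From what I remember...",
    ]
    print(summarize(sample))
```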

verbal bay Dec 29, 2025, 5:59 AM

#

Hi

#

@feral escarp

feral escarp Dec 29, 2025, 7:11 PM

#

verbal bay @feral escarp

Howdy doo

feral escarp Dec 29, 2025, 7:25 PM

#

scarlet stream Interesting note. Gpt oss cannot refer to itself as "I" in its cot reasoning.

I typically see it use we more often than I as well, but I think a model like gpt-oss is trained on massive data sets where the dominant pattern is "we" (e.g. scientific papers, pedagogical examples). "We performed the tests and found...", "Today we will be...". And LLMs are probability-driven so the "we" token in gpt-oss' tensors is heavily weighted. Not to mention the CoT itself is methodically the same as "hypothesize, experiment, record" so probably hits the scientific method weights harder than final response.

scarlet stream Dec 29, 2025, 7:48 PM

#

I mean that's logical

untold jungle Dec 29, 2025, 8:00 PM

#

Hello,
TensorWall – open-source control plane for multi-provider LLM APIs (OpenAI, Anthropic, Mistral) with governance, security, budgets, and audit.
Curious to hear what you think: does this approach make sense for teams running LLMs in production?
Repo: https://github.com/datallmhub/TensorWall

tepid garnet Dec 30, 2025, 4:56 AM

#

what is this?

scarlet stream Dec 30, 2025, 5:05 AM

#

I had codex make a chess harness for gpt oss to play vs the user, vs itself, or vs stockfish

#

It's currently playing stockfish, oss w, stockfish b

#

Now do me

#

It looks like oss is beating the brakes off stockfish with it set to default eval time 750ms

scarlet stream Dec 30, 2025, 6:40 AM

#

Nope I had it backwards

solemn willow Dec 30, 2025, 12:24 PM

#

possible hacker

#

mom

scarlet stream Jan 1, 2026, 12:44 AM

#

Gpt-oss-20b vs stockfish

#

Rerouted to emphasize reasoning experts (I think)

#

Oss 20b is prompted at the end of each game to make a critical summary explaining the win/loss, and then note skills for import into the system prompt of subsequent games that did or would have helped it
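The skill loop described here can be sketched roughly as follows:

1. After each game, split the model's learned-skill list.
2. Merge it with earlier skills, skipping duplicates.
3. Rebuild the system prompt for the next game.

The " | " separator mirrors the "Learned skills" log line posted later in this thread. The base prompt text, the cap of 20 skills, and all function names are hypothetical, not the harness's actual code.

```python
from typing import List

# Hypothetical base prompt for the harness; not the author's actual prompt.
BASE_PROMPT = "You are playing chess as White. Reply with one legal move in SAN."


def parse_skills(raw: str, sep: str = " | ") -> List[str]:
    """Split a 'skill | skill | ...' string into trimmed, non-empty skills."""
    return [s.strip() for s in raw.split(sep) if s.strip()]


def merge_skills(known: List[str], new: List[str], max_skills: int = 20) -> List[str]:
    """Append new skills, skipping case-insensitive duplicates; keep the most recent max_skills."""
    seen = {s.lower() for s in known}
    merged = list(known)
    for s in new:
        if s.lower() not in seen:
            merged.append(s)
            seen.add(s.lower())
    return merged[-max_skills:]


def build_system_prompt(skills: List[str]) -> str:
    if not skills:
        return BASE_PROMPT
    lines = "\n".join(f"- {s}" for s in skills)
    return f"{BASE_PROMPT}\n\nLessons from previous games:\n{lines}"


if __name__ == "__main__":
    skills: List[str] = []
    after_game_1 = "Keep your pieces active. | Protect central pawns."
    skills = merge_skills(skills, parse_skills(after_game_1))
    print(build_system_prompt(skills))
```

Capping the list keeps the system prompt from growing without bound as games accumulate.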

robust swallow Jan 1, 2026, 9:51 PM

#

scarlet stream Gpt-oss-20b vs stockfish

What stockfish depth

scarlet stream Jan 1, 2026, 10:51 PM

#

robust swallow What stockfish depth

I think default is 12 not sure offhand

robust swallow Jan 1, 2026, 10:51 PM

#

scarlet stream It looks like oss is beating the brakes off stockfish with it set to default eva...

You mentioned 750ms

scarlet stream Jan 1, 2026, 10:51 PM

#

robust swallow You mentioned 750ms

That's the eval time yeah

#

It was all intended to be left default unless manually changed

robust swallow Jan 1, 2026, 10:52 PM

#

I don't believe that stockfish can be beaten by anything

#

And 12 depth seems to not be very beatable either

scarlet stream Jan 1, 2026, 10:52 PM

#

No it's pretty strong but it's a strong target for the model to try and make a skill list

robust swallow Jan 1, 2026, 10:53 PM

#

scarlet stream Gpt-oss-20b vs stockfish

Could you paste the full move history? It's cut off there

#

Or was it not archived?

scarlet stream Jan 1, 2026, 10:54 PM

#

It's probably in the logs

robust swallow Jan 1, 2026, 10:55 PM

#

I'd love to review the game because I really do not believe in stockfish being beaten by anything

scarlet stream Jan 1, 2026, 11:23 PM

#

Like I said, stockfish is strong af

#

Move History

    1. e4
    2. e5
    3. Nf3
    4. Nc6
    5. d4
    6. exd4
    7. Nc3
    8. dxc3
    9. bxc3
    10. d6
    11. Qd5
    12. Nf6
    13. Qxc6+
    14. bxc6
    15. Be2
    16. Nxe4
    17. Bd3
    18. d5
    19. Bxe4
    20. dxe4
    21. Ng1
    22. Ba6
    23. Bf4
    24. Qf6
    25. Be3
    26. Qxc3+
    27. Kd1
    28. O-O-O+
    29. Kc1
    30. Qxa1#
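The list above numbers plies (half-moves), not full moves. The "ply" field in the post-game log counts the same way: ply 13 is Qxc6+. For standard PGN-style numbering, a small helper like this pairs them up. It is a hypothetical helper, not part of the harness.

```python
from typing import List


def plies_to_movetext(plies: List[str]) -> str:
    """Turn ['e4', 'e5', 'Nf3', ...] into '1. e4 e5 2. Nf3 ...'."""
    parts = []
    for i, san in enumerate(plies):
        if i % 2 == 0:  # White's ply starts a new move number
            parts.append(f"{i // 2 + 1}.")
        parts.append(san)
    return " ".join(parts)


if __name__ == "__main__":
    game = ["e4", "e5", "Nf3", "Nc6", "d4", "exd4", "Nc3", "dxc3", "bxc3", "d6",
            "Qd5", "Nf6", "Qxc6+", "bxc6", "Be2", "Nxe4", "Bd3", "d5", "Bxe4", "dxe4",
            "Ng1", "Ba6", "Bf4", "Qf6", "Be3", "Qxc3+", "Kd1", "O-O-O+", "Kc1", "Qxa1#"]
    print(plies_to_movetext(game) + " 0-1")
```

For this game the output ends with "15. Kc1 Qxa1# 0-1", so White was mated on move 15.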

#

most recent game

#

#

[2026-01-01T21:38:36.472Z] Responses API [g7-w-post-mjvyvsqe-82] start attempt 1/3 format=json_schema model=openai/gpt-oss-20b inputChars=1752 prev_response_id=(none) purpose=post_game_w
[2026-01-01T21:41:45.501Z] Responses API [g7-w-post-mjvyvsqe-82] HTTP 200 in 189029ms (format=json_schema)
[2026-01-01T21:41:45.501Z] Responses API [g7-w-post-mjvyvsqe-82] success (format=json_schema)
[2026-01-01T21:41:45.503Z] Post-game response openai/gpt-oss-20b [w] raw: {"color":"White","result":"0-1","verdict":"loss","summary":"White lost after losing the queen early and a bishop, creating a material deficit. The king was forced into a vulnerable position by Black's long castling, allowing mate with Qxa1#.","turning_points":[{"ply":13,"note":"Queen captures knight on c6 but is recaptured by b-pawn, losing queen and gaining pawn advantage."},{"ply":15,"note":"Aft
[2026-01-01T21:41:45.503Z] Post-game analysis openai/gpt-oss-20b [w] verdict=loss summary=White lost after losing the queen early and a bishop, creating a material deficit. The king was forced into a vulnerable position by Black's long castling, allowing mate with Qxa1#

#

[2026-01-01T21:41:45.504Z] Learned skills for openai/gpt-oss-20b: Never sacrifice your queen unless it yields a clear tactical advantage or material compensation. | Protect central pawns; if no defender exists, seek alternative moves before losing them. | Avoid trading a minor piece for a pawn when it offers no positional benefit or material gain. | Keep rooks on the back rank safe from enemy queen infiltration after castling by blocking key squares. | When facing a check, block with a piece instead of moving your king into exposed lines that allow mate. | Keep your pieces active; passive ones become easy targets for opponent tactics. | Before castling, ensure no enemy rook or queen can give an immediate check along the file or rank. | In endgame, avoid trading pieces when you are already down in material.
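Since the post-game call uses a json_schema response format, the harness can validate the reply before trusting it. The field names below (color, result, verdict, summary, turning_points with ply and note) come from the visible part of the truncated raw log line above. The dataclasses, the validation rule, and the result-to-verdict mapping are assumptions about how one might check it, not the harness's actual code.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Assumed mapping: (side the model played, PGN result) -> verdict from that side's view.
_EXPECTED = {
    ("White", "1-0"): "win", ("White", "0-1"): "loss", ("White", "1/2-1/2"): "draw",
    ("Black", "0-1"): "win", ("Black", "1-0"): "loss", ("Black", "1/2-1/2"): "draw",
}


@dataclass
class TurningPoint:
    ply: int  # half-move index, as in the numbered move list
    note: str


@dataclass
class PostGame:
    color: str
    result: str
    verdict: str
    summary: str
    turning_points: List[TurningPoint]


def verdict_matches(color: str, result: str, verdict: str) -> bool:
    """True if the verdict agrees with the result; unknown combinations are not judged."""
    expected: Optional[str] = _EXPECTED.get((color, result))
    return expected is None or expected == verdict


def parse_post_game(raw: str) -> PostGame:
    """Parse the model's JSON reply; raises ValueError (or KeyError) if it is unusable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    game = PostGame(
        color=data["color"],
        result=data["result"],
        verdict=data["verdict"],
        summary=data["summary"],
        turning_points=[TurningPoint(int(tp["ply"]), str(tp["note"]))
                        for tp in data.get("turning_points", [])],
    )
    if not verdict_matches(game.color, game.result, game.verdict):
        raise ValueError(f"verdict {game.verdict!r} contradicts result {game.result} for {game.color}")
    return game
```

Rejecting a reply whose verdict contradicts the game result is cheap. It also keeps that kind of model mistake out of the learned-skills list.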

robust swallow Jan 1, 2026, 11:36 PM

#

It was very confusing to look at the initial screenshot, it looked like black was oss 20b and stockfish was white while it was opposite and I assumed there must have been some bug there

scarlet stream Jan 1, 2026, 11:36 PM

#

Yeah I tried to give 20b the most advantage possible, always white

robust swallow Jan 1, 2026, 11:36 PM

#

Llms won't beat stockfish

#

Something that was dedicated to chess and nothing else

scarlet stream Jan 1, 2026, 11:37 PM

#

Ok that's cool and all, but it'll be interesting to see what elo the model rates as with and without the learned skills injected into the system prompt
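For the Elo comparison, the standard Elo formulas are enough:

- The expected score against an opponent is 1 / (1 + 10^((R_opp - R)/400)).
- After each game the rating moves by K × (actual - expected).

This is a minimal sketch. The K-factor of 32 and the starting ratings are arbitrary assumptions, and ratings against a fixed-strength engine only mean something relative to whatever anchor you pick.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score (0..1) for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_rating(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    # Hypothetical: the model with and without injected skills, both starting at 1200.
    with_skills, without_skills = 1200.0, 1200.0
    results = [1.0, 0.5, 1.0, 0.0]  # from the with-skills side's perspective
    for s in results:
        new_with = update_rating(with_skills, without_skills, s)
        new_without = update_rating(without_skills, with_skills, 1.0 - s)
        with_skills, without_skills = new_with, new_without
    print(round(with_skills), round(without_skills))
```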

robust swallow Jan 1, 2026, 11:37 PM

#

Yeah, if you mean model vs model

scarlet stream Jan 1, 2026, 11:37 PM

#

You can't learn without a strong opponent

robust swallow Jan 1, 2026, 11:38 PM

#

Also I imagine llms writing code for chess bots and then those bots play against stockfish and then reflect and update code

#

I might do that for my school thing

scarlet stream Jan 1, 2026, 11:38 PM

#

That would probably be easier for the model to game a win eventually

robust swallow Jan 1, 2026, 11:39 PM

#

Stockfish strong

#

Very

#

And I don't have faith in llms beating humans in coding

#

And stockfish had very good human coders and chess players carefully modify it

scarlet stream Jan 1, 2026, 11:39 PM

#

Yeah it is. I've had it on the back foot a couple times for a move or two. Its strength is more that it punishes t f out of any mistake

robust swallow Jan 1, 2026, 11:40 PM

#

Chess is just mistakes

#

No mistakes is a draw it seems

scarlet stream Jan 1, 2026, 11:40 PM

#

Sf is searching for the ideal move for any situation so I guess idk

robust swallow Jan 1, 2026, 11:41 PM

#

The... I forgot its name

#

Alpha go but chess

#

Alpha zero?

#

It surpassed stockfish

robust swallow Jan 1, 2026, 11:42 PM

#

robust swallow Also I imagine llms writing code for chess bots and then those bots play against...

I wish to try this now with different models and weak stockfish

#

I dislike that gpt oss only has 20b and immediately huge 120b without anything in between

#

Doesn't fit on v5e-8

modern shuttle Jan 2, 2026, 6:11 PM

#

hello guys

tepid garnet Jan 2, 2026, 6:14 PM

#

modern shuttle hello guys

good morning

hallow moss Jan 3, 2026, 2:54 PM

#

i just found out you can run gpt oss on 16gb of ram

smoky fern Jan 4, 2026, 2:16 PM

#

hey there I have a question so im buying a MacBook Air m4 w/ 24gb ram I was wondering if I'd be able to run gpt-oss-20b? and what tps i could get with it?

tepid garnet Jan 4, 2026, 2:37 PM

#

smoky fern hey there I have a question so im buying a MacBook Air m4 w/ 24gb ram I was won...

24GB RAM isn't enough, get 128GB then you can run gpt-oss-120B

smoky fern Jan 4, 2026, 2:45 PM

#

tepid garnet 24GB RAM isn't enough, get 128GB then you can run gpt-oss-120B

nah im planning to run the 20bn one

tepid garnet Jan 4, 2026, 2:47 PM

#

smoky fern nah im planning to run the 20bn one

I have a MacBook Pro, M2 Max with 96GB RAM and can run gpt-oss-120B at around 25.29 tok/sec

smoky fern Jan 4, 2026, 2:48 PM

#

tepid garnet I have a MacBook Pro, M2 Max with 96GB RAM and can run gpt-oss-120B at around 25...

oooo

tepid garnet Jan 4, 2026, 2:51 PM

#

smoky fern oooo

just tested gpt-oss-20B and got 63.72 tok/sec

feral escarp Jan 4, 2026, 4:00 PM

#

smoky fern hey there I have a question so im buying a MacBook Air m4 w/ 24gb ram I was won...

The M4 is a really good choice. Apple came out with a Matrix Engine on the M3 series that gets better performance with FP tensors. Probably looking at ~25 tok/sec. Would recommend more RAM though

smoky fern Jan 4, 2026, 4:04 PM

#

feral escarp The M4 is a really good choice. Apple came out with a Matrix Engine on the M3 se...

more ram isn't really an option for me at this point bc i just want smth rn that's comparable to my pc lol

smoky fern Jan 4, 2026, 4:05 PM

#

tepid garnet just tested gpt-oss-20B and got 63.72 tok/sec

wowow

feral escarp Jan 4, 2026, 4:11 PM

#

smoky fern more ram isn't really an option for me at this point bc i just want smth...

Memory is gonna be tight. 14 GB of RAM for the model, KVCache for 128k context window in FP16 format is ~6GB, that leaves 4 GB for the operating system and other apps. Unless you plan on having extremely short conversations, you will likely have to choose between running GPT-OSS:20B and using the MacBook for anything else
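The same budget can be turned around to ask how much context actually fits. This sketch uses the figures from the message above: 24 GB total, ~14 GB of weights, and ~4 GB kept free for macOS and apps, all treated as GiB. It assumes a KV cost of 49,152 bytes per token, which is what a 24-layer, 8-KV-head, head-dim-64 FP16 cache works out to and is consistent with the ~6GB-at-128k estimate. Treat all of it as illustrative; macOS may also cap how much unified memory the GPU can use.

```python
GIB = 2**30


def max_context_tokens(total_gib: float, weights_gib: float, reserve_gib: float,
                       kv_bytes_per_token: int) -> int:
    """How many tokens of KV cache fit after weights and an OS/app reserve."""
    free_bytes = (total_gib - weights_gib - reserve_gib) * GIB
    if free_bytes <= 0:
        return 0
    return int(free_bytes // kv_bytes_per_token)


# Assumed: 2 (K and V) * 24 layers * 8 KV heads * 64 head dim * 2 bytes (FP16).
ASSUMED_KV_BYTES_PER_TOKEN = 2 * 24 * 8 * 64 * 2  # 49,152

if __name__ == "__main__":
    for reserve in (4, 6, 8):
        n = max_context_tokens(24, 14, reserve, ASSUMED_KV_BYTES_PER_TOKEN)
        print(f"reserve {reserve} GiB for macOS/apps -> ~{n:,} tokens of context")
```

Under these assumptions, every GiB kept free for other apps costs roughly 21,800 tokens of context.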

smoky fern Jan 4, 2026, 4:13 PM

#

feral escarp Memory is gonna be tight. 14 GB of RAM for the model, KVCache for 128k context w...

well, using the model daily isn't my thing, i just want it to be there for times when im out and got no wifi, so i have some kind of "information bank" with me yk

tepid garnet Jan 4, 2026, 4:14 PM

#

smoky fern well, using the model daily isn't my thing, i just want it to be there for times...

trust me, get as much RAM as you can, you can't upgrade later and you will wish you had more

smoky fern Jan 4, 2026, 4:15 PM

#

tepid garnet trust me, get as much RAM as you can, you can't upgrade later and you will wish ...

true ngl but it's a budget thing yk, and models ain't really my first priority, I mostly use it for coding and browsing, but ill look into more ram!

scarlet stream Jan 4, 2026, 11:58 PM

#

If I try to make an in-between of gpt oss-20b and 120b would there be interest

#

If it even works

#

I only have ~32gb vram so idk

tender vale Jan 5, 2026, 5:16 PM

#

okay who actually runs oss with 80b parameters in their garage vro 🥀

#

i might have to rob a data center rq

feral escarp Jan 5, 2026, 7:07 PM

#

I got some gpt-oss-codex Ollama modelfiles that work remarkably well with Codex-CLI https://gist.github.com/robertmsale/0f310d62d9599805fb261f416f6f6a08

GPT-OSS is not tuned specifically for Codex-CLI, so there's some template parsing in place to ensure tool calling works 100%. It's face-melting fast! I've been trying to get other models (non-OpenAI models) to work with Codex-CLI by hacking away at the templates and system prompts, but with GPT-OSS, after doing the template, I finally got it to work 😁


obsidian moth Jan 5, 2026, 8:00 PM

#

tender vale okay who actually runs oss with 80b parameters in their garage vro 🥀

... I do...

tender vale Jan 5, 2026, 9:43 PM

#

obsidian moth ... I do...

twinamon roll, what the hell are you running it on 😭✌️

tepid garnet Jan 5, 2026, 9:51 PM

#

tender vale okay who actually runs oss with 80b parameters in their garage vro 🥀

I run gpt-oss-120b on my MacBook Pro

tender vale Jan 5, 2026, 9:52 PM

#

tepid garnet I run gpt-oss-120b on my MacBook Pro

they have 80gb of vram??

tepid garnet Jan 5, 2026, 9:52 PM

#

tender vale they have 80gb of vram??

MacBook Pro, M2 Max with 96GB RAM

tender vale Jan 5, 2026, 9:52 PM

#

tepid garnet MacBook Pro, M2 Max with 96GB RAM

tbf ram and vram are different

#

but idk macs

#

are the speeds good at least?

tepid garnet Jan 5, 2026, 9:53 PM

#

tender vale tbf ram and vram are different

in Macs memory is unified, so 96GB RAM is basically 96GB VRAM

tender vale Jan 5, 2026, 9:53 PM

#

tepid garnet in Macs memory is unified, so 96GB RAM is basically 96GB VRAM

ooooo

#

thats cool

#

like i said, what are the speeds?

tepid garnet Jan 5, 2026, 9:53 PM

#

tender vale are the speeds good at least?

63.72 tok/sec

tender vale Jan 5, 2026, 9:53 PM

#

tepid garnet 63.72 tok/sec

gaw damn

#

how much is the mac?

tepid garnet Jan 5, 2026, 9:54 PM

#

tender vale how much is the mac?

it cost me $6500 AUD

#

2 years ago

tender vale Jan 5, 2026, 9:54 PM

#

tepid garnet it cost me $6500 AUD

that's still a lot better than an nvidia card

#

although i'd hate to see the price of it now

#

ram markets wild 💔

mighty thicket Jan 5, 2026, 10:07 PM

#

MacBooks have become the cheapest way to get a new PC, it would seem

steel vine Jan 5, 2026, 10:56 PM

#

i got amd strix halo for half that and it runs gpt-oss:120b same speed

#

although i've been using devstral2-123b more lately. it's slower so therefore must be better

feral escarp Jan 7, 2026, 3:23 PM

#

tender vale thats still a lot better than an nvidia card

For ~$10k you can get 512GB VRAM Mac Studio. I like the fact Macs consume hardly any electricity, generate barely any heat, make zero noise, and are totally usable for inference

#

My 128GB Mac Studio M1 Ultra is rated to consume 350 watts TDP. Apparently it has fans in it but I never hear them even when running inference for hours at a time. CPU is consistently 50C under heavy load and GPU like 45C. Idk how it is for other hardware but I'm OK with the Mac being slightly slower at the cost of sipping electricity and being quiet

heavy panther Jan 7, 2026, 9:14 PM

#

hello

feral escarp Jan 8, 2026, 10:00 PM

#

gpt-oss:120b is not face-melting fast in Codex-CLI, but it's pretty usable!

small igloo Jan 10, 2026, 12:14 PM

#

split trout What do you want exactly and what is your code?

I want to train a model by fine-tuning on CoT prompts, how do I do that??

split trout Jan 10, 2026, 12:18 PM

#

small igloo I want to train a model by fine-tuning on CoT prompts, how do I do that??

https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune

feral escarp Jan 12, 2026, 1:44 PM

#

🤩

Hey if that's true, I'm currently working on a gpt-oss-20b CoreML build recipe with custom ANE tensors for prefill (hopefully quadrupling prefill for large inputs). To convert gpt-oss-120b to CoreML requires a little more than 256GB RAM. If I get this sucker to actually work on 20b would you be interested in generating a 120b CoreML? For the good of all mankind???

#

It's not reverse engineering. OpenAI released the models in PyTorch format which is a fairly open tensor format. I've just been patching ops that don't work on CoreML, and had to do some tricky JIT scripting on the prefill phase so tracing can handle full 131072 context. The ANE does 4096 tokens at a time, 32 times for prefill instead of all 131072 all at once.

Once I confirm it works end to end I'll upload it to Hugging Face with the scripts and all that
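The chunking described above (4096-token ANE passes over a 131072-token context) works out to 131072 / 4096 = 32 prefill passes. A minimal Python sketch of that arithmetic, assuming a causal mask and a KV cache so each chunk sees everything cached before it; the function names are hypothetical and nothing here touches CoreML itself:

```python
def prefill_chunks(n_tokens: int, chunk: int = 4096) -> list[tuple[int, int]]:
    """Split a prompt of n_tokens into [start, end) windows of at most `chunk` tokens."""
    if n_tokens < 0 or chunk <= 0:
        raise ValueError("n_tokens must be >= 0 and chunk must be > 0")
    return [(start, min(start + chunk, n_tokens)) for start in range(0, n_tokens, chunk)]


def visible_keys_per_chunk(n_tokens: int, chunk: int = 4096) -> list[int]:
    """Key positions each chunk attends to: with a causal mask and a KV cache,
    chunk i sees everything cached before it plus itself, i.e. its end index."""
    return [end for _, end in prefill_chunks(n_tokens, chunk)]


def padding_needed(n_tokens: int, chunk: int = 4096) -> int:
    """Pad tokens for the last pass if every pass must be exactly `chunk` long."""
    return (-n_tokens) % chunk


if __name__ == "__main__":
    print(len(prefill_chunks(131072)))           # 32
    print(visible_keys_per_chunk(131072)[:3])    # [4096, 8192, 12288]
```

The padding helper reflects an assumption that a fixed-shape accelerator graph needs every pass to be exactly `chunk` tokens long.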

#

Oh I gotcha. I've never tried doing that. GPT 5.2 Pro seems like its reasoning capabilities would make that impossible

small stag Jan 13, 2026, 5:19 PM

#

Using GPT-OSS:20b as a writing tool to generate novels in AgentGPT

odd granite Jan 16, 2026, 10:40 AM

#

anyone (still) using codex with local gpt-oss models, or have you moved on from gpt-oss? if you are, anything (e.g., prompts) you patched in codex? how do you serve the gpt-oss models (llama.cpp/LM Studio, vllm, sglang, ...)?

tepid garnet Jan 16, 2026, 10:44 AM

#

odd granite anyone (still) using codex with local gpt-oss models, or have you moved on from ...

the knowledge cut off date for gpt-oss is too long ago imho. I use gpt-oss-120b as a quick LLM I run locally for general purpose use

#

My training data includes information up through June 2024. Anything that happened after that isn’t part of my built‑in knowledge.

odd granite Jan 16, 2026, 10:49 AM

#

True - knowledge cut off can become problematic for some use cases. At least for me on mostly small Python-based projects + docs-mcp-server with Python lib docs it's still working well.

tepid garnet Jan 16, 2026, 10:51 AM

#

odd granite True - knowledge cut off can become problematic for some use cases. At least for...

that's good, a good use case

#

for me I use it as a general knowledge question answerer

odd granite Jan 16, 2026, 10:52 AM

#

although with plain llama.cpp I needed to patch codex to get token usage tracking right + patch llama.cpp to get the reasoning content out in the expected field.

#

So, you don't use gpt-oss with codex?

tepid garnet Jan 16, 2026, 10:54 AM

#

odd granite So, you don't use gpt-oss with codex?

no, I only use gpt-oss for local general knowledge stuff in LM Studio

odd granite Jan 16, 2026, 11:03 AM

#

I see - hm.. well compared to most other models (especially similarly sized) gpt-oss models are imo still fastest and most capable (incl. coding). still, it only starts to be useful (as real alternative to API based models for coding use-cases) if you have the hardware for full context and can offload most or even the complete model to GPU (e.g., RTX Pro 6000, up to 200 tokens/sec generation for gpt-oss-120b)

tepid garnet Jan 16, 2026, 11:03 AM

#

I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM. (75.20 tok/sec) It's always there for me on LM Studio and I use it to help limit ChatGPT consumption

timber drum Jan 16, 2026, 4:47 PM

#

tepid garnet I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB RAM. (75.20 tok/sec) It's ...

i run it on cloud

rough iron Jan 16, 2026, 6:10 PM

#

timber drum i run it on cloud

What are you using ?

timber drum Jan 16, 2026, 6:10 PM

#

ollama

#

it has free monthly cloud usage

rough iron Jan 16, 2026, 6:12 PM

#

timber drum it has free monthly cloud usage

And whats the limit?

timber drum Jan 16, 2026, 6:12 PM

#

idk

#

i dont use it a lot tho

#

i used like 2 prompts

#

and it barely got to 0.01%

feral escarp Jan 16, 2026, 6:16 PM

#

odd granite anyone (still) using codex with local gpt-oss models, or have you moved on from ...

I use it for codex, but not usually for pure coding. It's more like a git hook for keeping docs aligned, and also a command parser so instead of Codex running cargo check and consuming 100k input tokens from a potentially massive command, gpt-oss consumes the output and generates a summarization with only key details.

Right now serving with LM Studio because it has a sane default prefix caching implementation that makes agentic usage very fast. I'm working on a CoreML version so I can get prefill to run on ANE, so long one-shot inputs execute 5x faster, but for now LM Studio has the best /v1/responses API that's compatible with Codex-CLI

#

Decode cost per token scales linearly with context length, so ~75 tok/sec only applies to small inputs. As you approach the end of the 128k context window the decode gets much slower because it has to attend to every token up to the one it's about to generate. CoreML offloading to ANE would speed up absurdly large inputs, and make it on par with data centers at scale
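To illustrate why decode slows down as context grows, here is a rough count of how many cached key/value positions one new token reads. It assumes 36 layers alternating between full attention and a 128-token sliding window, which is our reading of the gpt-oss-120b architecture; treat the layer counts as illustrative. Only the full-attention term grows with context, and it grows linearly:

```python
def kv_positions_read(context_len: int, n_full_layers: int, n_window_layers: int,
                      window: int = 128) -> int:
    """Cached key/value positions one new token must read across all layers.

    Full-attention layers read the whole cache; sliding-window layers read at most
    `window` positions, so only the full layers grow with context length.
    """
    return n_full_layers * context_len + n_window_layers * min(context_len, window)


if __name__ == "__main__":
    # Assumed layout: 36 layers alternating full and 128-token sliding-window attention.
    for ctx in (1_000, 32_000, 128_000):
        print(ctx, kv_positions_read(ctx, n_full_layers=18, n_window_layers=18))
```

The per-token weight reads stay constant regardless of context, and at short context they dominate, so measured tok/sec declines gradually rather than in proportion to these counts.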

tranquil wren Jan 16, 2026, 9:26 PM

#

.

steel magnet Jan 20, 2026, 9:12 AM

#

I tried gpt-oss and it looks so cool, privacy-friendly since it runs in your local environment

#

RTX 5060 Ti 16GB upgrade saved my life

#

I might use this for translation, grammar fixes and formatting

solemn willow Jan 20, 2026, 3:16 PM

#

.

mild blade Jan 27, 2026, 7:01 PM

#

Question for the chat. Public data suggests that gpt-oss-120b is a pretty widely used model, but do we know what it’s for? Is it mostly daily driver usage or are people building products on top of it? Anyone in here using it for commercial projects etc.?

tepid garnet Jan 27, 2026, 7:20 PM

#

mild blade Question for the chat. Public data suggests that gpt-oss-120b is a pretty widely...

It's my daily driver local model. It's very good.

silent cradle Jan 27, 2026, 11:05 PM

#

So when's this new local coder model coming out?

silent wolf Jan 30, 2026, 8:58 AM

#

mild blade Question for the chat. Public data suggests that gpt-oss-120b is a pretty widely...

If you ask the GPT-OSS-120B model what model it thinks it is, it answers that it is GPT-4-Turbo, which makes it likely a quantized version of that model. It's brilliant for Agentic usage as it can be used in swarms, tool-calling, a2a, etc (not great via AWS though, as they have poor API support for OpenAI right now)

lapis plinth Jan 31, 2026, 9:20 AM

#

What’s the difference between OpenAI gpt-oss-20b and gpt-oss-20b safeguard? I didn’t see it

#

The new open source model was just released and it's free

tepid garnet Jan 31, 2026, 9:22 AM

#

I use gpt-oss-120b, I don't know much about gpt-oss-120b-safeguard

lapis plinth Jan 31, 2026, 4:59 PM

#

tepid garnet I use gpt-oss-120b, I don't know much about gpt-oss-120b-safeguard

Do you use it on a remote desktop or directly on the local computer?

#

What is heavy about gpt-oss-120b?

lapis plinth Jan 31, 2026, 10:21 PM

#

I just downloaded gpt-oss 120b but it's a bit too heavy. I heard of GGUF, how can I convert it?

leaden hornet Feb 2, 2026, 1:26 AM

#

gpt-oss 120b is pretty big. My hosting machine has 80GB of RAM and it takes it from 20% up to 90%. The model itself is like a 63GB download. If you want something lighter, try 20b. Not going to be the same experience, but it might run for you.
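The ~63GB download lines up with a back-of-envelope estimate of parameters times bits per weight. A sketch, assuming roughly 117B total parameters for 120b and 21B for 20b, all stored as MXFP4 (4-bit values sharing one 8-bit scale per 32-value block); real checkpoints keep some tensors in higher precision, and the KV cache and runtime need memory on top:

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# MXFP4 stores 32 four-bit values plus one shared 8-bit scale: 4 + 8/32 bits each.
MXFP4_BITS = 4 + 8 / 32

if __name__ == "__main__":
    # Assumed totals: ~117e9 params (120b) and ~21e9 params (20b), all at MXFP4.
    print(f"120b: ~{weight_gb(117e9, MXFP4_BITS):.0f} GB")
    print(f"20b:  ~{weight_gb(21e9, MXFP4_BITS):.0f} GB")
    print(f"120b at bf16: ~{weight_gb(117e9, 16):.0f} GB")
```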

livid sky Feb 11, 2026, 3:15 PM

#

why are you preferring the 20b MoE over other lighter models that are sometimes more performant on local resources, for both QLoRA and DPO?

thin sundial Feb 12, 2026, 10:16 AM

#

I am fine-tuning gpt-oss 20b on a custom dataset and facing the issues below; all guidance is appreciated:

Continuous thinking loop, never coming out of the analysis channel
Ollama Modelfile, chat template
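For the analysis-channel loop, one thing worth checking is whether every training example actually closes the reasoning and then emits a final-channel answer. gpt-oss uses OpenAI's harmony format, where the analysis message ends with `<|end|>` and the final answer ends with `<|return|>`. A simplified sketch (real harmony system messages carry more fields, and in practice it is safer to render with the model's own chat template or OpenAI's `openai-harmony` package than by hand); `check_example` is a hypothetical sanity test, not part of any library:

```python
def render_harmony_example(system: str, user: str, analysis: str, final: str) -> str:
    """Render one training conversation in a simplified harmony layout."""
    return (
        f"<|start|>system<|message|>{system}<|end|>"
        f"<|start|>user<|message|>{user}<|end|>"
        # Reasoning goes in the analysis channel and is closed with <|end|> ...
        f"<|start|>assistant<|channel|>analysis<|message|>{analysis}<|end|>"
        # ... and the answer goes in the final channel, closed with <|return|>.
        f"<|start|>assistant<|channel|>final<|message|>{final}<|return|>"
    )


def check_example(text: str) -> list[str]:
    """List problems that could teach the model never to leave 'analysis'."""
    problems = []
    if "<|channel|>final<|message|>" not in text:
        problems.append("no final-channel message")
    if not text.endswith("<|return|>"):
        problems.append("does not end with <|return|>")
    # Every message opened with <|start|> should be closed by <|end|> or <|return|>.
    if text.count("<|start|>") != text.count("<|end|>") + text.count("<|return|>"):
        problems.append("unbalanced <|start|> / <|end|> markers")
    return problems


if __name__ == "__main__":
    ex = render_harmony_example("You are a helpful assistant.", "What is 2 + 2?",
                                "Simple arithmetic.", "4")
    print(check_example(ex))
```

The Ollama Modelfile template needs to emit the same markers at inference time; a mismatch between the training format and the serving template is another plausible cause of the loop.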

livid sky Feb 12, 2026, 6:56 PM

#

Why did you choose the oss 20b MoE?

sleek fiber Feb 13, 2026, 2:24 AM

#

livid sky Why did you choose the oss 20b MoE?

why not

#

its good

deft yarrow Feb 14, 2026, 12:58 PM

#

and fast!

coarse spear Feb 15, 2026, 9:34 PM

#

What was the general consensus on 5.1's usage and ability?

tepid garnet Feb 15, 2026, 9:45 PM

#

coarse spear What was the general consensus on 5.1's usage and ability?

what does this have to do with gpt-oss?

coarse spear Feb 15, 2026, 9:46 PM

#

Excuse me, wrong channel.

#

Apologies.

little cedar Feb 16, 2026, 6:26 AM

#

bruh

robust nymph Feb 19, 2026, 7:13 PM

#

I’m planning to set up a quantized version of gpt-oss 20b on my pc so I always have a capable local model at the ready. Since it’s been out for a long time now, has anyone switched to a different model of a similar size?

ornate abyss Feb 20, 2026, 1:25 AM

#

Check this out 🙂 I've built GPT-OSS-20B-Vision on a DGX Spark lol, what do you guys think? https://huggingface.co/vincentkaufmann/gpt-oss-20b-vision-preview

#

Super curious what you guys think!!

#

Works super great with my noapi-google-MCP (using chromium headless): https://github.com/VincentKaufmann/noapi-google-search-mcp

#

You can also paste a youtube link in (youtube RAG) and ask it to extract a segment featuring a topic you like and it will cut the video and extract that clip and provide it to you

#

Video, image and file conversion

#

google search, image search, reverse image search, google lens, local OCR etc etc

#

really sick combination!

#

Btw GPT-OSS-20B runs on my DGX Spark like a monster!

#

Spilled coffee on my macbook, traveled to Dubai with just my DGX Spark + portable display + mouse and keyboard, trained it in the hotel, lobby and pub restaurant (wherever i had a plug)!

#

Most fun i ever had! Curious what you guys think!

#

GPT-OSS-20B-Vision with my noapi-google-search-mcp can subscribe to a youtube channel and download the videos, transcribe them (youtube RAG) automatically, news feed, create QR codes, shorten URLs (tinyurl), upload files to buckets (minio) and online storage, convert media files, fetch emails, transcribe locally, Local OCR etc

coarse spear Feb 20, 2026, 8:54 PM

#

OH,SNAP

#

Simplicity

#

Cutting out the middleman. 💡

#

I hope my replies are on the subject matter.

livid sky Feb 20, 2026, 10:48 PM

#

gpt oss 20b is ok quantized. the heretic cleaned one instead... bah... not so much

gray sparrow Feb 22, 2026, 11:03 PM

#

Is OSS useful at all or is it really designed to be tuned?

livid sky Feb 23, 2026, 3:17 PM

#

gray sparrow Is OSS useful at all or is it really designed to be tuned?

it is a MoE (mixture of experts).
For large workflows it is computationally less expensive. For a single user running it as a private LLM, it makes no sense cost-wise in terms of consumption.
The MoE may even be a bit difficult if the internal routing misfires. You can get more stability with a solid LLM.
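The "routing" mentioned above is a small gating step that picks which experts process each token. A generic top-k gating sketch in plain Python; gpt-oss is reported to route each token to 4 experts, but whether the softmax is taken before or after the top-k selection varies between MoE designs, so treat this as illustrative:

```python
import math
from typing import Callable, Sequence


def top_k_route(router_logits: Sequence[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                router_logits: Sequence[float], k: int) -> float:
    """Only the k routed experts run; the rest stay idle for this token."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]  # toy experts
    logits = [0.1, 2.0, -1.0, 1.5]
    print(top_k_route(logits, k=2))  # experts 1 and 3 are chosen
    print(moe_forward(1.0, experts, logits, k=2))
```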

#

If you're thinking of configuring an LLM for home use on your machine: define the purpose, choose a good base, do LoRA on Colab with Unsloth, then consolidate (merge) it, turn it into a GGUF and run it on LM Studio.
But you need good datasets to train on and a clean base
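The "consolidate" step in that pipeline means folding the trained LoRA adapter back into the base weights before converting to GGUF. A minimal sketch of the merge arithmetic on plain Python lists, using the standard LoRA update W' = W + (alpha / r) B A; real tools do this per layer on tensors:

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W: list[list[float]], A: list[list[float]], B: list[list[float]],
               alpha: float, r: int) -> list[list[float]]:
    """Fold a LoRA adapter into the base weight: W' = W + (alpha / r) * B @ A.

    W is d_out x d_in, B is d_out x r, A is r x d_in.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    print(merge_lora([[1, 0], [0, 1]], A=[[0, 2]], B=[[1], [0]], alpha=2, r=1))
```

After merging, the model is a plain set of weights again, which is why conversion to GGUF no longer needs the adapter.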

#

Otherwise you can go with Oobabooga or Ollama if your machine can handle it

gray sparrow Feb 23, 2026, 4:10 PM

#

livid sky it is a MoE multitude of experts. for large workflows it is computationally less...

I mean this in a different way. In other words, is the model card any good for the Mixture of Experts? I didn't want to read the entire 32-page PDF to see what it was trained on.

From what I have heard it's very bad compared to Qwen and Kimi K2, but if that is the case, I am looking for its purpose. Maybe OpenAI thinks it's a good model to tune. Not sure.

What you're speaking about in computational cost is active weights during inference. Most MoEs offer this, with Kimi K2 being one of the lowest at around 40b active.

That being said, it only saves time at inference; computational power is modestly saved and active memory is not.

As for the internal routing, that is just latent variable activations, and with a MoE it's considered beneficial to have a critique model at the end or a weighted quorum output. Some models do this automatically (rarely, from what I have seen) but external is always better and more controlled.
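The point about active weights can be put in numbers: per-token compute scales with active parameters, while memory still has to hold every expert. A rough sketch, assuming about 117B total and 5.1B active parameters for gpt-oss-120b and the usual ~2 FLOPs per weight per token; the "dense" figure is a hypothetical model of the same total size:

```python
def flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: about 2 FLOPs (one multiply, one add) per active weight."""
    return 2.0 * active_params


if __name__ == "__main__":
    total, active = 117e9, 5.1e9  # assumed gpt-oss-120b figures
    dense = flops_per_token(total)  # hypothetical dense model of the same total size
    moe = flops_per_token(active)
    print(f"compute per token: MoE is ~{dense / moe:.0f}x cheaper than dense")
    print(f"weights that must stay resident: {total:.3g} params in both cases")
```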

livid sky Feb 23, 2026, 4:13 PM

#

gray sparrow I mean this is a different way. In other words, is the model card any good for t...

yes, you got my point. I was testing the p-e-w made gpt-oss-20b-heretic

#

https://huggingface.co/p-e-w/gpt-oss-20b-heretic

empty talon Feb 23, 2026, 6:32 PM

#

C
Sora ai speaks greek

open moon Feb 23, 2026, 7:53 PM

#

ornate abyss Spilled coffee on my macbook, traveled to Dubai just with my DGX Spark + portabl...

i wish i could get my hands on one of those 😭 by the time i do it will be outdated
thats awesome btw.

vale flicker Feb 24, 2026, 1:00 AM

#

i finally caved and got a GPT-Pro sub, y'all can release GPT-30b now

empty talon Feb 25, 2026, 4:16 PM

#

Oh

rare viper Mar 1, 2026, 2:06 AM

#

Use Gemma 3

split trout Mar 1, 2026, 10:41 PM

#

rare viper Use Gemma 3

Nah

velvet raptor Mar 4, 2026, 8:30 AM

#

These are my hardware specifications, can i run locally gpt-oss-20 b and also 120 b?

rn_image_picker_lib_temp_db3065f1-cfae-40a5-b2a6-49c476461cc6.jpg

tepid garnet Mar 4, 2026, 9:52 AM

#

velvet raptor These are my hardware specifications, can i run locally gpt-oss-20 b and also 12...

no, sorry you won't be able to run gpt-oss locally

velvet raptor Mar 4, 2026, 10:34 AM

#

tepid garnet no, sorry you won't be able to run gpt-oss locally

What are the minimum hardware specifications for the gpt-oss models? Can you write them for me if you know, please?

tepid garnet Mar 4, 2026, 10:40 AM

#

velvet raptor What are the minimum hardware specifications for gpt oss models? Can you write m...

I run gpt-oss-120b on a MacBook Pro, M2 Max with 96GB Unified Memory

livid sky Mar 4, 2026, 9:00 PM

#

velvet raptor These are my hardware specifications, can i run locally gpt-oss-20 b and also 12...

yes

#

I run oss 20b as a GGUF, quantized 4k
System specifications:
OS: Windows 11 Pro
Version: 25H2
OS build: 26200.7705
Experience: Windows Feature Experience Pack 1000.26100.291.0
RAM: CORSAIR VENGEANCE DDR5 32GB (2x16GB) DDR5 6000MHz CL30 AMD EXPO Intel XMP iCUE Computer-Compatible Memory - Grey (CMK32GX5M2B6000Z30)
Processor: AMD Ryzen 7 9700X 8-core processor 3.80 GHz
Video card: AMD Radeon RX 7600 8 GB VRAM
Storage: Samsung SSD 870 EVO 2 TB
Samsung SSD 990 PRO 2 TB

knotty ivy Mar 9, 2026, 11:21 AM

#

guys, I need help. Are NVIDIA or AMD GPUs better for running local LLMs on Linux?

frosty trench Mar 9, 2026, 11:23 AM

#

if you are talking about AMD, it works a bit better with Linux systems, so it could maybe handle more load that way with an AMD GPU rather than Nvidia, also considering price to performance of course, and the OS in use

knotty ivy Mar 9, 2026, 11:24 AM

#

Same price

#

Usage with like LM Studio on Fedora

frosty trench Mar 9, 2026, 11:26 AM

#

For local LLMs on Linux, NVIDIA is usually the better choice.
With the same price, NVIDIA generally has better support for tools like LM Studio, CUDA-based AI software, and overall smoother setup on Fedora/Linux. AMD can be fine on Linux, but for local LLM use, NVIDIA is usually the safer and more compatible option.

#

but also think about the LLM you will be running; its size is also important

knotty ivy Mar 9, 2026, 11:46 AM

#

frosty trench For local LLMs on Linux, NVIDIA is usually the better choice. With the same pric...

sadly much more expensive. 900€ for 12GB VRAM on Nvidia (5070) and 900€ for 24GB VRAM on AMD (7900XTX)

Do you have experience? How much worse is AMD with llms?

frosty trench Mar 9, 2026, 11:47 AM

#

knotty ivy sadly much more expensive. 900€ for 12GB VRAM on Nvidia (5070) and 900€ for 24GB...

I only use Nvidia, so I have no experience with AMD

river surge Mar 9, 2026, 8:20 PM

#

knotty ivy sadly much more expensive. 900€ for 12GB VRAM on Nvidia (5070) and 900€ for 24GB...

you can always check the current status of CUDA vs ROCm, but currently CUDA is overall around 1.5 to 3 times better than ROCm; in pure raw compute power, CUDA wins by somewhere between 50% and 150%

chrome night Mar 9, 2026, 10:02 PM

#

This needs an update

olive rapids Mar 11, 2026, 11:47 PM

#

knotty ivy guys, I need help. Are NVIDIA or AMD GPUs better for running local LLMs on Linux?

For purely LLMs AMD is better on price per GB alone, but NVIDIA will give you more options for future projects

peak heron Mar 12, 2026, 1:48 AM

#

I am running an AMD Strix Halo device with 128GB of unified RAM and am pretty happy with it on Linux. LLMs are absolutely no issue. It’s the Python-based diffusion model runners that are the issue, but this has significantly improved in the last month with better support from the Linux kernel and ROCm drivers

fading linden Mar 12, 2026, 9:07 AM

#

has anyone tried running gpt oss with the heretic software?

split trout Mar 14, 2026, 9:51 AM

#

fading linden has anyone tried running gpt oss with the heretic software?

i believe there is p-e-w/gpt-oss-(1)20b on huggingface

livid sky Mar 15, 2026, 4:16 PM

#

split trout i believe there is p-e-w/gpt-oss-(1)20b on huggingface

Probably, yes, there is the heretic 120b. I only used the 20b heretic GGUF.
The bases are from p-e-w too if you need to use adapters

fading linden Mar 15, 2026, 8:24 PM

#

https://huggingface.co/bartowski/kldzj_gpt-oss-120b-heretic-v2-GGUF

thorn quiver Mar 16, 2026, 7:02 AM

#

Did *you* know that you can use gpt-oss:120b for free even on trash CPUs using Ollama? Well, I didn't. BUT I'M SO HAPPY I FOUND IT OUT!!


```python
from ollama import Client

api_key = "YOUR API KEY HERE"

# Point the client at Ollama's hosted API instead of a local server
client = Client(
    host="https://ollama.com",
    headers={"Authorization": f"Bearer {api_key}"}
)

prompt = input('prompt: ')
print('')
messages = [{"role": "user", "content": f"Write a cinematic fantasy story, only output the story, nothing else: {prompt}"}]

# Stream the reply, printing each chunk as it arrives
for part in client.chat("gpt-oss:120b-cloud", messages=messages, stream=True):
    print(part["message"]["content"], end="", flush=True)
```

short root Mar 17, 2026, 10:30 AM

#

Hello everyone!

I wanted to open a discussion about fine-tuning gpt-oss-20b.
In our current design, there's an agent that uses a single gpt-oss-20b model that must achieve the following tasks:

Task A (Tool Selection): the model must choose, from a reduced tool scope, the correct tool and with the correct arguments
Task B (Structured Generation): the model must conditionally generate a structured response based on the tool responses, with precise rules and formats.

We approached this situation by fine-tuning a gpt-oss-20b model on a dataset that contains the whole ideal/target message history:
System prompt (without the big rules & constraints that are required if the model is not fine-tuned) -> User prompt -> correct Tool Call -> Tool response -> Conditioned generation of the final response.

The intention was to do a monolithic fine-tuning to "fix everything at once". After testing, we are observing that this may not be the best solution.

I wonder... how should we approach this situation? How should we go about fine-tuning?
We have even considered using two different models (one for Task A and another for Task B), but this adds a great deal of complexity to the agent’s structure.

Eager to know about your past experiences, knowledge or your opinion!
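For reference, one record of the monolithic dataset described above might look like this in the OpenAI-style chat message schema that many fine-tuning tools accept. The field names are an assumption (exact keys such as `tool_call_id` depend on your trainer), and the tool, arguments, and answers are hypothetical:

```python
import json


def build_training_record(system: str, user: str, tool_name: str, tool_args: dict,
                          tool_result: str, final_answer: str) -> dict:
    """One full trajectory: system -> user -> tool call -> tool result -> final answer."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {   # Task A target: the right tool with the right arguments.
                "role": "assistant",
                "tool_calls": [{
                    "type": "function",
                    "function": {"name": tool_name, "arguments": json.dumps(tool_args)},
                }],
            },
            {"role": "tool", "name": tool_name, "content": tool_result},
            # Task B target: the structured answer conditioned on the tool output.
            {"role": "assistant", "content": final_answer},
        ]
    }


if __name__ == "__main__":
    record = build_training_record(
        system="You are an order-support agent.",
        user="Where is order 123?",
        tool_name="get_order_status",  # hypothetical tool
        tool_args={"order_id": "123"},
        tool_result='{"status": "shipped"}',
        final_answer='{"order_id": "123", "status": "shipped"}',
    )
    print(json.dumps(record, indent=2))
```

Because both assistant turns sit in one record, any loss computed on assistant turns covers Task A and Task B together.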

broken stone Mar 18, 2026, 12:11 AM

#

OpenAI give us more open source models and my soul is yours

#

https://tenor.com/view/kratos-ares-destroy-my-enemies-and-my-life-is-yours-gif-7965707800203237625

shadow flint Mar 19, 2026, 9:16 AM

#

knotty ivy sadly much more expensive. 900€ for 12GB VRAM on Nvidia (5070) and 900€ for 24GB...

I had a bit of fun with my 5700 XTs; it works just fine. It's mostly tooling that suffers, but LLMs are fine. I am using an Nvidia RTX 5060 16GB now, but I am still considering a 7900 XT for its VRAM
The problem with AMD is not in its limitations, it is in its capabilities (as it is always the case with AMD)

livid sky Apr 4, 2026, 2:00 PM

#

short root Hello everyone! I wanted to open a discussion about fine-tuning gpt-oss-20b. In...

have you considered CPT?

shy jacinth Apr 4, 2026, 2:31 PM

#

broken stone OpenAI give us more open source models and my soul is yours

Real, they especially need to release them in smaller sizes with vision too. Would be great

full crest Apr 6, 2026, 9:45 AM

#

Honestly it wouldn't be too hard to create a simple vision system for the local model. It would just be based on word2idx processing. It takes chunks at a time, relays them into words and sends them back to the model itself. I'm not entirely sure, but you could even have the local model do the processing itself? Just an idea

#

I don't normally work with LLMs in that sense.

split trout Apr 7, 2026, 10:48 AM

#

shy jacinth Real, they need to especially release them in smaller sizes with vision too. Wou...

open sourcing 4o 🤔

spark knoll Apr 7, 2026, 10:54 AM

#

That can't possibly end badly.

viscid void Apr 7, 2026, 9:56 PM

#

4o

tawny glacier Apr 8, 2026, 4:07 PM

#

thorn quiver Did *you* know that you can use gpt-oss:120b for free even on trash CPUs using O...

@thorn quiver i didn't get it

#

Min spec?

tacit ridge Apr 8, 2026, 9:17 PM

#

thorn quiver Did *you* know that you can use gpt-oss:120b for free even on trash CPUs using O...

This uses the cloud model so you're not actually running the model and you're using Ollama's billing system / free plan

shy jacinth Apr 10, 2026, 12:04 PM

#

full crest Honestly it wouldnt be too hard to create a simple vision system for the local m...

Exactly, someone actually made a fine-tuned version of GPT-OSS that has vision

#

If only OpenAI were to update their GPT-OSS line with a range of multimodal models from 2B to 120B to stack up against Qwen and Gemma models, it would be really great

tardy lynx Apr 10, 2026, 9:49 PM

#

What minimum amount of training data is reasonable for finetuning GPT-OSS 20b?

cyan anchor Apr 10, 2026, 10:47 PM

#

tardy lynx What minimum amount of training data is reasonable for finetuning GPT-OSS 20b?

7 eggs.

spark knoll Apr 11, 2026, 6:09 AM

#

shy jacinth If only OpenAI were to update their GPT-OSS line to make a line multimodel model...

Why would they bother to when others already have?

shy jacinth Apr 11, 2026, 7:16 AM

#

spark knoll Why would they bother to when others already have?

What do you mean by that?

spark knoll Apr 11, 2026, 7:22 AM

#

What benefit does the megacorp OpenAI gain from releasing an open source/weights/etc. model that it doesn't also gain from observing every other open source/weights/etc. model in the wild?

tepid garnet Apr 11, 2026, 8:38 AM

#

spark knoll What benefit does the megacorp OpenAI gain from releasing an open source/weights...

Short answer: control, leverage, and ecosystem gravity.
Watching others’ open models gives OpenAI information. Releasing its own gives influence.

spark knoll Apr 11, 2026, 8:50 AM

#

And clearly they don't think it's worth it.

tepid garnet Apr 11, 2026, 9:24 AM

#

spark knoll And clearly they don't think it's worth it.

we have four open source models from OpenAI, I am sure they will iterate on them

spark knoll Apr 11, 2026, 9:25 AM

#

I doubt it at this point. They were released before "OpenAI" underwent complete mission collapse and corporatization.

tepid garnet Apr 11, 2026, 9:27 AM

#

spark knoll I doubt it at this point. They were released before "OpenAI" underwent complete ...

I don't doubt it, the oss models are closely aligned with ChatGPT. the oss models were released in August

spark knoll Apr 11, 2026, 9:27 AM

#

We'll see.

left kiln Apr 11, 2026, 7:40 PM

#

Is this channel about ChatGPT's OS?

tepid garnet Apr 11, 2026, 8:20 PM

#

left kiln Is this channel about ChatGPT's OS?

what?

shy jacinth Apr 11, 2026, 8:51 PM

#

left kiln Is this channel about ChatGPT's OS?

Did you mean OSS or just another OS like ChatGPT Atlas?

left kiln Apr 12, 2026, 8:05 AM

#

tepid garnet what?

Why is this chat named gpt-oss?

#

And I saw somebody talking about ChatGPT OS

tepid garnet Apr 12, 2026, 8:05 AM

#

left kiln Why is this chat named gpt-oss?

because it's about gpt-oss models

left kiln Apr 12, 2026, 8:06 AM

#

tepid garnet because it's about gpt-oss models

It's like GPT-3 etc.

#

right?

tepid garnet Apr 12, 2026, 8:06 AM

#

left kiln It's like GPT-3 etc.

no

#

gpt-oss are advanced open weight models released by OpenAI

left kiln Apr 12, 2026, 8:06 AM

#

tepid garnet gpt-oss are advanced open weight models released by OpenAI

Oh, thanks

left kiln Apr 12, 2026, 8:07 AM

#

shy jacinth Did you mean OSS or just another OS like ChatGPT Atlas?

OSS

spark knoll Apr 15, 2026, 8:57 AM

#

"Advanced" they're months old, ain't they?

shy jacinth Apr 15, 2026, 3:17 PM

#

spark knoll "Advanced" they're months old, ain't they?

Yea they are really showing their age, since there is even a small 9B model that beats the 120B GPT OSS in some benchmarks lol

#

And I think it's because the 120B GPT OSS only uses like 3B active parameters compared to the 9B which uses all the parameters

shy jacinth Apr 15, 2026, 3:34 PM

#

shy jacinth And I think it's because the 120B GPT OSS only uses like 3B active parameters co...

And also that 9B model has a more efficient architecture

rotund musk Apr 15, 2026, 7:12 PM

#

hi everyone, how is everybody this evening?

spark knoll Apr 16, 2026, 12:21 PM

#

What's up?

river loom Apr 16, 2026, 4:33 PM

#

is chatgpt down???

#

what is a geoffries kurikure kurikesu

#

like please bruh

thick notch Apr 17, 2026, 1:42 PM

#

shy jacinth And I think it's because the 120B GPT OSS only uses like 3B active parameters co...

i tried to use it in opencode before... it's not very good at all... they could've probably done 12B active w/ 10 experts.

elfin leaf Apr 18, 2026, 11:05 AM

#

hiiiii

desert garden Apr 21, 2026, 7:26 PM

#

hi

#

how's everybody doing

quiet fractal Apr 22, 2026, 1:59 AM

#

Yo

#

They should try making smaller models to compete with Gemma from Google

#

Like 20b works fine on my m5 MacBook Pro

#

But maybe some 4b, 8b maybe?

#

And some new information loaded past 2023 or whatever cutoff oss has rn

twin lark Apr 22, 2026, 3:15 AM

#

GPT-OSS 2 when?

twin lark Apr 22, 2026, 3:15 AM

#

quiet fractal But maybe some 4b, 8b maybe?

I like the idea

spark knoll Apr 22, 2026, 4:50 AM

#

For what profit for them? They're a for profit now.

viscid void Apr 22, 2026, 8:45 PM

#

twin lark GPT-OSS 2 when?

True

polar thorn Apr 24, 2026, 10:59 AM

#

I’ve been getting good results tailoring GPT-5.4 prompts for OSS-120B in my SPARK project (bit slow though).

https://community.openai.com/t/spark-simple-personal-ai-reasoning-kernel/1366435

Quick question — is there an OpenAI embeddings model available for OSS/local use? I’m using nomic at the moment.

Also curious what speed differences people are seeing between ~20B vs 120B?

Running on a DGX Spark.

quiet fractal Apr 28, 2026, 1:58 AM

#

polar thorn I’ve been getting good results tailoring GPT-5.4 prompts for OSS-120B in my SPAR...

well, the speed difference between 20B and 120B comes down to parameter count, so 20B will always be quicker.
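Decode speed mostly tracks the *active* parameters streamed from memory per token rather than total size. OpenAI's model card lists about 3.6B active for 20b and 5.1B for 120b, so in a memory-bandwidth-bound regime 20b should be faster, but only by roughly 5.1 / 3.6 ≈ 1.4×. A rough ceiling calculation, assuming ~4.25 bits per active weight and 273 GB/s of bandwidth for a DGX Spark; it ignores KV-cache reads, higher-precision attention weights and runtime overhead, so real numbers will be lower:

```python
def decode_tok_s_upper_bound(bandwidth_gb_s: float, active_params: float,
                             bits_per_param: float = 4.25) -> float:
    """Memory-bound ceiling: each new token must stream every active weight once."""
    bytes_per_token = active_params * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    bw = 273  # GB/s, assumed DGX Spark memory bandwidth
    for name, active in (("20b", 3.6e9), ("120b", 5.1e9)):
        print(name, round(decode_tok_s_upper_bound(bw, active)), "tok/s ceiling")
```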

hexed latch May 3, 2026, 5:17 AM

#

.

shy jacinth May 3, 2026, 4:10 PM

#

hexed latch .

What are you wondering about GPT OSS? Are you wondering about a new generation of GPT OSS models ranging from 2B to 120B with multimodal capabilities? I am

shy jacinth May 3, 2026, 9:47 PM

#

You know how powerful a 9B GPT OSS with multimodal capabilities would be?

spark knoll May 3, 2026, 11:20 PM

#

Don't get your hopes up, that's not "Open"AI's business model any more.

silk grove May 6, 2026, 6:50 AM

#

what is oss??

tepid garnet May 6, 2026, 7:56 AM

#

silk grove what is oss??

Open Source Software

silent cradle May 7, 2026, 5:25 PM

#

make new gpt oss

#

especially coder models

shy jacinth May 7, 2026, 5:57 PM

#

silent cradle make new gpt oss

I agree with that statement

flat citrus May 9, 2026, 10:23 AM

#

can anyone help with the gpt-oss model? can i fine-tune it to make it capable enough to use as a coding agent?

royal lava May 12, 2026, 10:05 AM

#

Yes please, we need an open source Codex model

royal lava May 12, 2026, 10:05 AM

#

flat citrus can any help with the gpt -oss model can i fine tune to make it multi capable of...

You got a good gpu?

flat citrus May 12, 2026, 4:45 PM

#

yes its true we can train model together

split trout May 12, 2026, 8:51 PM

#

royal lava Yes please, we need an open source Codex model

no, just distill Opus and Codex on agentic runs

#

never in hell will openai do that

#

"for safety" sure buddy

twin seal May 19, 2026, 6:00 PM

#

Has anyone experimented with speculative decoding for GPT OSS 20B? I'm running mostly on CPU and getting 10 tok/s, looking to push that up to maybe 15 tok/s? What are some good draft models?

tepid garnet May 19, 2026, 9:13 PM

#

twin seal Has anyone experimented with speculative decoding for GPT OSS 20B? I'm running m...

10 tk/s is very very slow

twin seal May 20, 2026, 3:23 AM

#

tepid garnet 10 tk/s is very very slow

Yeah I run on CPU, my VRAM cannot fit the model. It's a Tesla P4 with 8GB VRAM, I offload some of the layers onto it but most of it is still on the CPU

crimson ember May 20, 2026, 1:58 PM

#

HII

feral escarp May 20, 2026, 4:48 PM

#

twin seal Has anyone experimented with speculative decoding for GPT OSS 20B? I'm running m...

GPT OSS 20B *is* the draft model. There is no smaller GPT with the same vocabulary and neural net that can be used as a draft model.

twin seal May 20, 2026, 4:48 PM

#

feral escarp GPT OSS 20B *is* the draft model. There is no smaller GPT with the same vocabula...

So you would use GPT OSS 120B with 20B as a draft model? Interesting.

feral escarp May 20, 2026, 4:53 PM

#

twin seal So you would use GPT OSS 120B with 20B as a draft model? Interesting.

Yes, but the caveat is you would not get a significant perf increase because GPT OSS 120b is already fast and you would need to load both models into memory. Typically you need a tiny version of a much larger model for speculative decoding to yield any kind of performance benefit, and it has high memory requirements. If you're already offloading OSS 20b to CPU due to having only 8GB VRAM you would need a smaller GPT that fits entirely in GPU and if it's drastically less accurate you might only get a few % running on GPU while the rest ends up on CPU with 20b
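The trade-off described above has a standard closed form. Leviathan et al. (2023) give the expected walltime speedup as (1 − α^(γ+1)) / ((1 − α)(γc + 1)), where α is the chance a draft token is accepted, γ the number of draft tokens per step, and c the draft's cost relative to the target. A small sketch; the α, γ and c values used below are illustrative assumptions:

```python
def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Expected walltime speedup of speculative decoding.

    alpha: probability a draft token is accepted (assumed i.i.d. per token)
    gamma: draft tokens proposed per verification step
    c:     cost of one draft step relative to one target step
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1)")
    tokens_per_step = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    cost_per_step = gamma * c + 1
    return tokens_per_step / cost_per_step
```

With α = 0.8 and γ = 4, a draft costing 5% of the target gives about 2.8× (`expected_speedup(0.8, 4, 0.05)`), while a draft costing 70% of it (roughly 3.6B vs 5.1B active parameters) gives about 0.88, i.e. a slowdown.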

#

Speculative decoding also requires the same vocabulary. So if you tried to use LFM2.5 1.2B, which is a very fast and small LLM, it would produce logits over a completely different vocabulary and OSS 20b would reject the entire draft (basically you'd get slower inference no matter what)
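The vocabulary requirement follows from how verification works: the target checks the draft's token IDs position by position, so those IDs must mean the same thing to both models. A minimal greedy variant in Python, with models stubbed as plain next-token functions (an assumption for illustration; sampling-based speculative decoding uses a probabilistic acceptance rule instead). Its output is identical to plain greedy decoding with the target alone:

```python
from typing import Callable, Sequence

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(target: NextToken, prompt: Sequence[int], n_new: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens


def speculative_greedy(target: NextToken, draft: NextToken, prompt: Sequence[int],
                       n_new: int, gamma: int = 4,
                       target_vocab=None, draft_vocab=None) -> list[int]:
    # Draft token IDs are only meaningful to the target if both share one vocabulary.
    if target_vocab != draft_vocab:
        raise ValueError("draft and target models must share the same vocabulary")
    tokens = list(prompt)
    goal = len(tokens) + n_new
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to `gamma` tokens.
        ctx = list(tokens)
        proposal = []
        for _ in range(gamma):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # 2. The target checks each position (one batched pass in a real engine).
        all_accepted = True
        for proposed in proposal:
            correct = target(tokens)
            tokens.append(correct)  # the accepted token, or the target's correction
            if correct != proposed or len(tokens) >= goal:
                all_accepted = False
                break
        # 3. If every draft token matched, the verify pass also yields one bonus token.
        if all_accepted and len(tokens) < goal:
            tokens.append(target(tokens))
    return tokens
```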

shy jacinth May 23, 2026, 12:52 AM

#

These are the first open source image models I've ever seen that use GPT OSS as a text encoder

#

I wonder why Microsoft used GPT-OSS for the text encoder

#

What do you think?

shy jacinth May 23, 2026, 11:34 PM

#

Especially when the diffusion model is way smaller than GPT OSS 20b

granite pumice May 26, 2026, 7:54 PM

#

yo

#

which AI Model should i run?

#

Zotac Gaming 4090 24GB GDDR6 CPU: R7 9800X3D Ram: 64GB DDR5

shy jacinth May 26, 2026, 7:58 PM

#

granite pumice Zotac Gaming 4090 24GB GDDR6 CPU: R7 9800X3D Ram: 64GB DDR5

For your specs, the 20B would be the right version of GPT OSS to run