#gpt-oss


what's this
Are the agentic features the tools that can be added to the model?
cool
See #announcements


excited for the hackathon
selah.
let's see how far we can push OSS
116 tok/sec is sweet. This is a fast model (20b)
What kind of idea do you actually have in mind?
roleplay
I'll do an IoT project
let's see how good it is
How did you install it?
just lm studio
I'll build holograms
Can you use ollama ?
well i can't post the link because of the automod
what do you mean by that ?
open source
but just search Google for LM Studio
I'll build holograms (a replacement for the flashlights in phones) and I'll share docs
what?!
You can indeed use ollama
does it work in the ollama app?
Yes
yes! Anakh is already working on it.

Runs decently well in LM Studio.
137 tok/s for gpt-oss-20b on my 4090
Only issue is the model doesn't follow system prompts at all. Unsure if that's baked in or a llama.cpp related issue.
who's oss
It's not a who. It's a search model to search locally indexed files using AI
OSS = Open Source Software
you can run it locally
people are going to finetune it with crazy stuff you can't get on chatgpt
And this too :)
I'm far, far from an expert on this topic, but local LLMs really do have their advantages
i think i got warned for asking what oss was
but it can be used to access my file system?
interesting
To search through documents. offline
I know, cool. I'm adding this to the maintenance dept. All those manuals and settings
wow
i think everything will be ai soon
like how everything is computers
just all computers will be ai
You can fine-tune a local AI, and it also has better privacy
privacy is the strongest argument for sure
so what's this thing good for? coz it sure as heck can't code properly
it's a free ChatGPT with limitations (coding, apparently) but with advantages such as privacy and fine-tuning
is it aimed at something specific that it's good for (at the moment)?
It says it's very good at health related questions
for now it's aimed at being just a local ChatGPT, at a level below the online ChatGPT
The models fit between o3 and o4. See "Model Performance" : https://openai.com/open-models/
yup
Same
macbook airs aren't great since they don't have adequate cooling - you also need more ram
16 GB of VRAM is the minimum, that's why
it takes a whole minute or two to answer the most basic question
can it do web searches in LM Studio? Does anyone have it installed to check?
Companies who deploy to production and are working with highly confidential data.
This is how I ran it, it does seem to not follow prompt sometimes
For now, OpenAI wanted to release a local model for people who want a local ChatGPT with OpenAI's capabilities rather than another AI brand's, and OpenAI used the occasion to run a community challenge
You need to use an MCP server for web search in LM Studio.
buddy my data is prob in china atp
Where are you from, tho?
I'm sad they didn't release a smaller model for smartphones, though.
Oh, what? That's a shame.
How to add that?
How much importance you give it may depend on where you're from
I wonder if there is a quote/razor for "Whatever a company releases it's never good enough for a specific class of people who are never satisfied."
Would you be comfortable if somebody had access to all of your chat history with ChatGPT?
(Privacy is kinda #off-topic)
huh? From the guy who doesn't care about privacy?!?!
https://www.google.com/search?q=triangle+good+fast+cheap
You can get Good, Fast, or Cheap. Pick two.
We got good and cheap. It's not going to be fast.
it will fit your needs if you fine-tune it, for example into a specialist in a particular domain
There's much more to the OSS aspect. Look around for comments by people who have been excited about OSS AI for some years now, the game-changing event with Deep Reasoning. Downloadable AI, with high quality and from a trusted source, is seriously a big industry deal.
You have even more flexibility with the local model
You can run different models in different places for different purposes. Tuned (within limits) to your specifications.
just like a chatgpt response
is there documentation for fine-tuning and customizing the OSS model?
Exactly: browser version: https://openai.com/index/gpt-oss-model-card/
This is one of the reasons why I really like OpenAI:
Estimating worst case frontier risks of open weight LLMs
In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity....
https://openai.com/index/estimating-worst-case-frontier-risks-of-open-weight-llms/
asked this already, but we probably can't use this thing to generate images locally, huh?
they should release the evil dataset for this /j
Maybe there is a transformer that can do it
definitely not, it's a shame
unfortunate. oh well. hopefully soon then
i mean you could probably ask it to write you a Stable Diffusion prompt for a cat
could be, ill try to look into it

hopefully
yeah, that for sure. it'd be cool to use the model itself for the image gen yk?
To be fair, you could probably run this to get a prompt and only use the prompt with inference to gpt-image to avoid those issues
also reduce costs
sure you probably could. im just looking to see if i can generate some stuff locally with a powerful model for funsies.
nah image gen is not possible for the model but i get what you mean
these OSS models, are you able to further train them to fit your own specific needs?
It has full parameter fine tuning available
yes
that is pretty incredible
Where is the participant version released?
Yeah they can be finetuned, apparently the 20b one can even be fine-tuned on consumer hardware
i need better hardware to do all that. my 3060 12gig is not gonna be enough smh. was there any official word on what kinds of devices i could do all of that on?
devices lmao. hardware is what i meant
I think it might ngl, my 4070 is doing fine
16 GB of VRAM and preferably an NVIDIA GPU
word
it's probably going to be slow, but who knows till they try it
the 20b isn't terribly large you can run it with that I'm sure
Yeah 20b is fine
was about to buy a 5080 but then i heard nvidia is doing a price cut this fall. might hold off on this gen entirely.
for sure
oh gotcha
I've got a 3070 on desktop but want to try it on a macbook pro
i have a m1 pro macbook. would be cool to test it on there too now that you mention it
I don't think that will do well at all
I will be surprised if it does anything at all
Once it downloads and is ready, we'll see. Worth a try.
I have run other models on this and large ML projects. Macbooks are not the machines they were 10 years ago
agreed
This is taking a while to download, so I'll check back later. Doing other work for now
can't wait to melt my 2080 Super!
feel free to ping me when you get any results
i dont get off work for a couple hours

noice
I can't seem to get this running on my system no matter what I do... I'm using LM Studio and have a 3060 12GB, and the 20B can't run at any quant...
metal on the rise
does it just cut off before it runs??
I saw that the 20b requires 16GB. Maybe that's just for optimal performance?
yeah, it just fails to load the model
then you probably can't run it. It happened on my laptop: it just cut off before loading, but on my RTX 4070 the same model runs perfectly, no issues btw
Cool. someone should tell Sam Altman that 16GB VRAM is not standard on most desktops and laptops, heck, according to Jensen Huang, 8GB VRAM should be good enough for just about everyone
jensen huang words
we all know what he says isn't what he really thinks
What he really thinks is "I'm gonna become the richest person in history if I keep this Enterprise-level AI stuff up. Screw those gamers who got me here."
Still the least-bad tech billionaire though... But seriously. an Open source model to run on "edge devices" needs to run smoothly on like a Pixel 6...
A phone model is a much different goal and field
There are nano models that run on phones and I believe sama even did a poll of what people wanted, and phone lost?
Okay... but 16GB VRAM is not really an edge device for anyone that's not a corporation.
same issue I had on arch linux, you just need to have the latest lm studio installed
Wells Fargo might consider a 16GB machine an edge device, but most individual people's bank accounts consider a 16GB machine aspirational
also I recommend the official MXFP4 quant
yep, I have the latest version and tried that quant, it still won't load
it can run, you're just using an outdated LM Studio with outdated engines that don't have gpt-oss architecture support.
use ollama then
ollama works
but are you sure that's the latest version?
I'm having mild success on an M4 Pro chip with 48GB
because I installed the AUR version on Arch Linux and it seems to be outdated already, so it wouldn't work
Going to try more
reinstalling from scratch, downloaded the latest version right from the website
also you sure the model that you downloaded is official?
lms get openai/gpt-oss-20b
in a command prompt will download the official version into LM Studio
very weird that yours doesn't load
Please update to LM Studio 0.3.21 to run the model locally.
llama-cpp doesn't support the model at this moment?
it happened to mine and it wasn't outdated
it probably was hardware not being compatible
I don't know it doesn't make any sense at all I also have RTX 3060 12GB
he also has it
hmmm then that is strange
I use ollama
oh
since my "new" lmstudio was outdated
The system instructions say we need to respond as ChatGPT
lol
gotcha
it is default system prompt at least in Ollama, ig same on lmstudio defaults

Oh I see
It represents itself as ChatGPT
You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
...
It is long
Also, the low, medium and high reasoning presets are in the system prompt.
Reasoning: medium
for example
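Since the reasoning level is just a line in the system prompt (as the default prompt quoted above shows), a custom system prompt can carry both your own instructions and the level. A minimal sketch, assuming an OpenAI-style messages list and a runtime that passes the system message through unchanged; some apps expose the level as a separate setting instead. build_messages is a hypothetical helper.

```python
# Hypothetical helper: put custom instructions plus a "Reasoning: <level>"
# line into the system message, mirroring the default prompt quoted above.
# Assumption: the runtime (Ollama, LM Studio, ...) forwards this message as-is.

VALID_LEVELS = ("low", "medium", "high")


def build_messages(instructions: str, user_prompt: str, reasoning: str = "medium") -> list[dict]:
    """Return an OpenAI-style messages list with a custom system prompt."""
    if reasoning not in VALID_LEVELS:
        raise ValueError(f"reasoning must be one of {VALID_LEVELS}, got {reasoning!r}")
    system_text = f"{instructions}\nReasoning: {reasoning}"
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_prompt},
    ]


if __name__ == "__main__":
    msgs = build_messages("You are a terse maintenance-manual assistant.", "What does error E42 mean?", "high")
    for m in msgs:
        print(m["role"], "->", m["content"])
```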
oh, you're running the big brother
My PC can't handle it tho
so I use 20B
Has anyone had success with the transformers code snippet from the model card on a Mac? When I run it, it says a GPU is necessary.
Also, llama-cpp support is not out yet right?
yea 20b is great for what it is. I only get like 5-8 tokens/s with 120b
Not true, llama.cpp already has a version with the gpt-oss support.
I see it in the GitHub releases: the latest release, tag b6096, has it.
idk, for me it's not that great for my use case, at least on medium reasoning.
For now idk what use case it would be useful for
It's pretty good at math, even better than o3 on aime 2025
also I was disappointed that, like many OSS models, they didn't train it on multiple languages
Really, i don't do much math rn
so it's not an interesting use case for me
can you change the default system prompt to a custom one? i haven't tried yet
of course
idk, I don't see anything in open-weight / open-source models that would physically stop you from changing the sys prompt
I think almost every model has a system message to do it
maybe the old ones
I ran with wrong weights and got hilarious random text
New password generator just dropped
I actually kinda love this idea because there's a chance someone could train the same language model with the same seeds and find your password
Macbook Pro with M4 Pro chip and 48 GB memory. 20b model runs quickly but I had to do a ton of tinkering with the example code to make it work nicely.
I foresee a GitHub repo in your future
```shell
python gpt_oss/metal/examples/generate.py \
  ~/ML_Playground/gpt-oss-20b/metal/model.bin \
  -p "Why did the chicken cross the road?"
```
```
The user question: "Why did the chicken cross the road?" Usually answer: "To get to the other side." This is a joke or riddle. They might expect a typical answer. They said "I'm not sure if you want me to respond with a joke, but I'm going to say: "To get to the other side".
We can respond with a one line answer: "To get to the other side." Probably the best answer. There's no extra context. Provide the joke as known.
Ruby can be used but no.
No duplicates. They want a straightforward answer.To get to the other side.
```
The "thinking" portion before the actual response is... strange in this case but it's so funny idk if I want to question it.
Metal might not be optimized yet but I got something
I mean, you could theoretically load it using slightly less RAM/VRAM although it's going to be really slow
So a channel to talk about the new OSS model?
Nah, just about 7 GB.
It only uses 3.6B active parameters
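To illustrate why only a fraction of the parameters is "active" per token, here is a toy mixture-of-experts routing sketch: a router scores every expert, only the top k experts run, and their outputs are mixed with softmax weights. The expert count, k, and the experts themselves are invented for illustration; this is not gpt-oss's actual router, and real active-parameter counts also include weights every token uses (attention, embeddings).

```python
# Toy mixture-of-experts (MoE) top-k routing. Everything here is illustrative:
# 8 made-up experts, k=2, scalar inputs. Not gpt-oss's real implementation.
import math


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in ranked)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]


def moe_forward(x: float, experts, router_logits: list[float], k: int) -> float:
    """Only the k selected experts are evaluated; the others are skipped entirely."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    # 8 toy experts, each just scales its input by a different factor (1..8).
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    print(top_k_route(logits, k=2))  # experts 4 and 1 are chosen
    print(moe_forward(1.0, experts, logits, k=2))
```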
are the Hugging Face credits automatically added?
does the Hugging Face account need to be on my school email? i only have a Hugging Face account on my personal email address...
How powerful does my computer need to be to run the 120b?
Probably very beefy idk exactly
Do you think the 20B model could work as an agent to program?
I know it wouldn't be as good as codex but it can be...decent?
Hey! https://cookbook.openai.com/articles/gpt-oss/fine-tune-transformers
The finetuning cookbook link gives me an error 404 somehow...
Does anyone else want to fine-tune?
same
Does it work ?
Uhhh nice!
Thank you. It works.
Hahahaha, okay. The link I used was actually from an official OAI site. Not sure which one it was though. Regardless, it just seems to be a harmless typo
you also have this
/lmstudio-community/gpt-oss-120b-MLX-8bit
This doesn't work anymore. my download was yanked halfway through. I can only find 4 bit quant versions now. The official model card still says 16bit/u8, but all the downloads say MXFP4 and are 64gb small. Talking about the 120b version. Does someone know what happened?
Well, the official model is MXFP4. Using a higher precision won't get you better performance. There's no use.
Why first offer a 16bit / u8 version then yank it? Also better precision definitely has its use cases
Your computer isn't going to run the 120b, or any 120b model. That's data center size. The 20b model could run relatively effectively on a 14-16gb GPU.
If you want reasonable conversation response time, I mean. You could run it on less if you are batching overnight or something.
If you have 128GB of RAM in total, then it could theoretically be run. However, it will be slow as hell, not suitable for practical use.
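A back-of-the-envelope check of these numbers: weight memory is roughly parameters x bits per weight / 8. This sketch assumes decimal GB, nominal parameter counts taken from the model names, and about 4.25 bits per weight for MXFP4 (the figure mentioned further down in this thread). KV cache, activations and runtime overhead come on top, and actual download sizes differ somewhat since not every tensor is necessarily stored in MXFP4.

```python
# Rough weight-memory estimate. Assumptions: 1 GB = 1e9 bytes, nominal
# parameter counts (20e9 / 120e9), ~4.25 bits/weight for MXFP4.
# Excludes KV cache, activations and runtime overhead.

def weight_gb(num_params: float, bits_per_weight: float) -> float:
    return num_params * bits_per_weight / 8 / 1e9


for name, params in [("gpt-oss-20b", 20e9), ("gpt-oss-120b", 120e9)]:
    for label, bits in [("MXFP4", 4.25), ("8-bit", 8.0), ("16-bit", 16.0)]:
        print(f"{name:13s} {label:6s} ~{weight_gb(params, bits):6.1f} GB")
```

At 4.25 bits the nominal 120b comes out around 64 GB, which is in the same range as the ~64 GB downloads and ~61 GB RAM usage reported in this channel.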
Consider cloud services or OpenAI API for 120b.
eventually after messing around
no
maybe
depends for what
hard to say 100%
probably deepseek
apparently oss is bad at tool calls
it can be made multimodal with tool calls
Thank you OpenAI for opening GPT-OSS
Awesome model with top-tier reasoning and really fast inference speed; I got over 100 tok/s on a 3090
oooooh, that's awesome to hear it performed so well!
cloud services will be expensive since you're most likely running 4x H100 GPUs, and not everyone has that kind of money
Yeah, it's not cheap, but it's more realistic than buying H100s yourself and building a cluster. (Simple inference only requires 1 H100, though, according to their model documentation)
Alternative ways are also not that cheap if you want fast speed; $4000 Mac Studio with 96GB RAM, or $3000 NVIDIA Project DIGITS.
120b model is not for consumer anyway, except enthusiasts that already built the system for such large models...
Btw, RunPod offers ~$3 per hour for a single H100. I have no serious experience with cloud, so I don't know if that appeals to users.
it's per operation. i run out of credit in 5 minutes just using it
but yeah it's not for the consumer, more for companies and enthusiasts that have the setup
Hello guys, just wanted to confirm: is the new gpt-oss (120B) the most powerful open-source model out there?
i mean it depends
Are the benchmarks already available, or do we still need to wait?
well it's not looking good
surprising
hewwo, does anyone know if the OSS model supports the name field like the API?
well anyways i got the 20b working on my server. doesn't look like it's good for the server, but it isn't slow
Guys how much unified memory would i have to have in order to run 120b version on my MacBook?
I would still rather just use 4o, but it's really cool to have an LLM that will run on my rtx2080 lol
yeah tbh gpt-oss 120b is so bad, it fails every roo code tool use and gets into thought loops constantly
- doesnt even have up to date knowledge of stuff that happened more than a year ago
Artificial Analysis has benchmark results in various area. Well, it's not that impressive, honestly... Not bad, but also not impressive considering this model is from OpenAI.
However, it's good that they release open-weight model. It's practically first time for OAI - not PoC, not research-only.
hello!! everyone..!!
I mean, why would they make the best model available right now and make it public for its competitors?
Could be a fair point, but OAI advertised that 120b outperforms DeepSeek-R1 and even o3-mini.
Install ollama or lm studio
Just install it. The link is in #announcements
Oh, thanks
I like the new models so far. I wonder when we can use models that are as good or even better than o3-pro locally
Hey, can someone tell me if youâve already talked about GPT-OSS (open-weight models)? What are the pros and cons of using them?
Error during agent execution: Error code: 400 - {'message': 'Model generated a tool call with name "find<|end|><|start|>assistant<|channel|>commentary" that is not in the tools list: ['search_wiki', 'search_web', 'open_url']', 'type': 'invalid_request_error', 'param': 'tools', 'code': 'wrong_api_format'}
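The tool name in that error looks like the model's own special tokens ("<|end|>", "<|start|>", "<|channel|>") leaking into the name field, and "find" matches one of the built-in browser functions mentioned later in the thread rather than any declared tool. A defensive client-side sketch: strip everything from the first "<|" and only dispatch names that were actually declared. clean_tool_name is a hypothetical helper, not part of any library.

```python
# Hypothetical client-side guard: drop leaked special tokens from a tool name
# and refuse anything that is not in the declared tools list.
from typing import Optional


def clean_tool_name(raw_name: str, allowed: list[str]) -> Optional[str]:
    """Return the cleaned tool name if it is declared, otherwise None."""
    name = raw_name.split("<|", 1)[0].strip()
    return name if name in allowed else None


tools = ["search_wiki", "search_web", "open_url"]
print(clean_tool_name("find<|end|><|start|>assistant<|channel|>commentary", tools))  # None
print(clean_tool_name("search_web<|end|>", tools))  # search_web
```

When it returns None, sending a short tool error back to the model is usually friendlier than failing the whole request with a 400.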
awesome
Hey is anyone here using gpt-oss on LMStudio? Is there a UI way of enabling web search?
Or do I need to modify code, as described on their site?
For in depth information, you can check out this article: https://openai.com/index/introducing-gpt-oss/
Nope, LM Studio doesn't support that. You either need to add your custom frontend or use an agent framework like AutoGen, LangChain etc.
In LM Studio, you have to set up integrations manually with MCP.json. As far as I know, there's no simple setup that enables internet search. (I have a Wikipedia search plugin, but the menu seems to be gone)
I was hoping they would take advantage of Gpt-OSS's included web search by modifying the system prompt and just add a toggle for it
Additionally, internet search requires a long context (the LM Studio default is probably 4096 in most cases), and that requires more RAM.
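To see why a longer context costs more memory: the KV cache grows linearly with the number of tokens kept in context. A rough sketch of the usual estimate (2 for keys and values, x layers, x KV heads, x head dimension, x tokens, x bytes per element). The architecture numbers in the example are placeholders, not a confirmed gpt-oss config; read the real values from the model's config.json, and note that tricks like sliding-window attention can shrink the actual figure.

```python
# Rough KV-cache size estimate. The layer/head numbers used below are
# placeholders for illustration, not a verified gpt-oss configuration.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, context: int, bytes_per_elem: int = 2) -> float:
    # 2x because both keys and values are cached for every layer and token.
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9


# Placeholder architecture, 16-bit cache, short vs long context.
for ctx in (4096, 32768):
    print(ctx, "tokens ->", round(kv_cache_gb(layers=24, kv_heads=8, head_dim=64, context=ctx), 2), "GB")
```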
Oh nice, I asked to read some personal opinions on this update
Unfortunately, that's impossible. The model file only holds weights, without ANY program code or executables.
Any executable code must be implemented on client side.
Guess I'll have to use an agent framework then
My personal opinion is that it's pretty good at math. So far it was able to help me with pretty much anything, I'd say even on the same level as o3. I haven't had much time to test other stuff, but it definitely helps with all the basic things. For the 120b model, you need a very beefy PC, even with 4-bit quantization, which I happen to have, but it's still rather slow. That's why I mostly use the 20b model, and I get like 40-50 tokens per second with it.
The user asks: "What is the capital of France?" We have a tool that fetched "It is Paris." The assistant should answer. We must provide answer. There's no extra nuance. The answer: Paris. Maybe also a brief. So answer: Paris.
The assistant should answer. We must provide answer.
wow
Hey! Has anyone tried using the latest models on local gpt-oss?
i have a problem, gpt-oss is running under cpu :/
how much vram do you have
16gb vram rtx 5070ti
Mid end PC - I9 14900k and 4070 gets about 15-20 tokens/s so its not bad
yea definitely good speeds
I don't know about ollama in particular but LMStudio has a specific setting you can set to change that processing
ollama always chose gpu for me hm
For example, if I change to the DeepSeek model, it does use the GPU
Oh that is weird then
no
hf[dot]co[slash]tonic[slash]gpt-oss-20b-multilingual-reasoner: made this with github[dot]com[slash]josephrp[slash]smolfactory yesterday. If anyone wants to help me out, I want to iron out some stuff and basically publish it asap, hopefully tomorrow or probably the day after
What's the performance difference between the GGUF and the MX-FP4 versions?
Trying to decide if it is worth trying to run on my Mac.
Guess ollama runs the MX-FP4 version, not a GGUF, so it should just work right?
gguf is a file format
MXFP4 is different
the models weights are available as MXFP4 in a gguf file
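For the curious, here is roughly what "MXFP4" means, sketched from the OCP Microscaling (MX) format: blocks of 32 FP4 (E2M1) values share one 8-bit power-of-two scale (E8M0), which is where an average of 4 + 8/32 = 4.25 bits per weight comes from. The byte/nibble packing used inside GGUF or safetensors files is not shown; codes are plain integers 0..15 here, and the E8M0 NaN encoding (0xFF) is ignored.

```python
# Decoding sketch for one MXFP4 block per the OCP MX format (not a file parser).
# Each 4-bit code: 1 sign bit + 2 exponent bits + 1 mantissa bit (E2M1).

# E2M1 magnitudes indexed by the 3 low bits of the code.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32


def decode_fp4(code: int) -> float:
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0b0111]


def decode_mxfp4_block(codes: list[int], scale_exp: int) -> list[float]:
    """scale_exp is the raw E8M0 byte; the shared scale is 2**(scale_exp - 127)."""
    if len(codes) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} codes, got {len(codes)}")
    scale = 2.0 ** (scale_exp - 127)
    return [decode_fp4(c) * scale for c in codes]


# Storage cost: 4 bits per value plus 8 shared scale bits per 32 values.
bits_per_weight = 4 + 8 / BLOCK_SIZE
print(bits_per_weight)  # 4.25
```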
Thanks. Now I'm trying to get the typescript streaming openai module to work with ollama, without having to cold start the model on every run. For some reason the model isn't staying in memory.
My laptop is only getting 50 tokens per second on GPT-OSS 20B.
OLLAMA_KEEP_ALIVE set like you need?
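A sketch of the per-request alternative to OLLAMA_KEEP_ALIVE: Ollama's native /api/chat endpoint accepts a "keep_alive" field (a duration such as "30m", or -1 to keep the model loaded indefinitely). This assumes Ollama on its default port and the model tag gpt-oss:20b; whether the OpenAI-compatible /v1 endpoint used from TypeScript honors the same field is not confirmed here. The request is only built, not sent.

```python
# Build (but do not send) a request to Ollama's native chat endpoint that
# asks it to keep the model in memory. Assumes the default localhost:11434.
import json
import urllib.request
from typing import Union


def build_chat_request(prompt: str, model: str = "gpt-oss:20b", keep_alive: Union[str, int] = "30m") -> urllib.request.Request:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "keep_alive": keep_alive,  # e.g. "30m", or -1 to keep loaded indefinitely
    }
    return urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Hello!", keep_alive=-1)
# To actually send it: urllib.request.urlopen(req).read()
```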
y
I was only outputting the response tokens, not the reasoning, so it was always instantly responding.
But this is the saddest reasoning tokens you can receive: According to the policy above: "Vampires" is in the list of disallowed content.
who even made this disallowed content?
OpenAI. I asked it for a sad story about a Warhammer Fantasy village set in the Empire that is beset by a vampire.
yeah, gpt-oss is very bad at creative stuff
Not sure what guidelines would have ever made that disallowed content.
it's good for coding and stuff tho
You pointed Cursor at it? The 20B 4bit model?
not cursor
kilo code
tho don't take my opinion, I only make very small controlled changes on my code
I need to rewrite this so that the function calls go ahead and call the functions, and their results automatically get fed back into the conversation and the llm called again.
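That loop can be sketched like this: call the model, run any requested tools, append their results as "tool" messages, and call the model again until it replies without a tool call. call_model and the message shapes (loosely modeled on the OpenAI chat format) are assumptions; adapt them to whatever client you use. The stub model at the end only exists to show the flow.

```python
# Minimal agent loop sketch. `call_model` is a hypothetical stand-in for a
# real client; message/tool-call dict shapes are assumptions.
import json
from typing import Callable


def run_agent(call_model: Callable[[list], dict], tools: dict[str, Callable], messages: list, max_turns: int = 8) -> str:
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:  # no tool requested: this is the final answer
            return reply.get("content", "")
        for call in calls:
            fn = tools.get(call["name"])
            if fn is None:
                result = {"error": f"unknown tool {call['name']!r}"}
            else:
                result = fn(**call.get("arguments", {}))
            # Feed the result back so the next model call can use it.
            messages.append({"role": "tool", "name": call["name"], "content": json.dumps(result)})
    raise RuntimeError("model did not finish within max_turns")


def fake_model(msgs: list) -> dict:
    """Stub model: asks for the 'add' tool once, then answers with its result."""
    if msgs[-1]["role"] == "tool":
        return {"role": "assistant", "content": f"The sum is {json.loads(msgs[-1]['content'])}"}
    return {"role": "assistant", "content": "", "tool_calls": [{"name": "add", "arguments": {"a": 2, "b": 3}}]}


print(run_agent(fake_model, {"add": lambda a, b: a + b}, [{"role": "user", "content": "2+3?"}]))  # The sum is 5
```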
If gpt-oss:20b can do this, it's over.
Okay, that was amazing. It messed up the VSC tool call, and the code wasn't perfect, but it was really close.
Is it possible to run the 120B model on pure CPU and no VRAM? I heard you need 128 GB RAM to run it
probably not
I saw some post of someone doing that and they were getting something in the ballpark of 50-100 tokens/second, which surprised me
Yeah close to 120GB of ram from what I've seen
I'm able to run it on apple silicon without any issues
With 128 gb ram?
yeah
Very nice, how many tokens you getting per sec?
around 14-17 tks
(this is also with like a million other things open) and at 32k context
Ah ok
its not too bad
How is it versus deepseek r1?
Sometimes worse in what facility? I want to use it for data processing
it struggles with coding - I'd just stick to qwen if you want OS models
also struggles at tool calls
too bad
Hello, im new to oss models
What are the best models for general use, and also models specifically for coding?
And how do they compare to 2.5pro or o3
for coding, GLM 4.5. i'd say it's on par with both 2.5 Pro and o3. the Air model is a bit worse than o3 but still decent
I downloaded the 20b, how do I run it like a chat?
Thank you
And like for consumer usage coding and general best models? Like less than 15B params
there are no good models for coding that are under 15b
gemma 12b is not bad for a basic reasoning model that can run off a potato machine, relatively speaking
use the unsloth version though
For everyone here, developers will probably also be interested in https://community.openai.com/t/openais-open-weight-models-are-here-gpt-oss-120b-and-20b/1334739
Welcome OpenAI's new advanced open-weight reasoning models to customize for any use case and run anywhere. Permissive license Designed for agentic tasks Deeply customizable Access to the Full chain-of-thought Try both models in the browser. The playground is available here! Or, start building right away! Download from Hugging Face or view ...
Anyone here has worked with LM studio and MCP servers before? I am having a lot of trouble running an MCP server and having it recognized by LMStudio to use with GPT-OSS to test tool usage
maybe best to ask on the lm studio discord?
I'm consistently getting about 37 tps with the 20b on my M1.
Not bad.
It's so good, wondering what I'd need to run 120b locally.
six times more memory
8 gb ram?
4 bit quant?
Yeah, the one OAI packaged and released.
MXFP4
Trying to figure out what the hugging face model card means by: Web browsing (using built-in browsing tools)
What built in browsing tools?
interesting. i wonder why the usual huggingface quantizers are releasing 4, 5, 6 and 8 bit quants if openai already released quantized versions
i assume it means it was trained on using the function calls listed on the github repo ie browser.search, open and find => https://github.com/openai/gpt-oss?tab=readme-ov-file#browser
and you have to intentionally enable it:
To enable the browser tool, you'll have to place the definition into the system message of your harmony formatted prompt. You can either use the with_browser() method if your tool implements the full interface or modify the definition using with_tools().
Tried to get codex working with gpt-oss using this in config.toml:
```toml
show_reasoning_content = true

[model_providers.local]
name = "local"
base_url = "http://localhost:11434/v1"

[profiles.oss]
model = "gpt-oss"
model_provider = "local"
```
Codex can't find it for some reason.
It works as an agent in VSC, but it will do things like call the tool to read the file twice, then forget what it was doing and end with "how can I help."
you will want a recent version of ollama to support the harmony response format
when was the last time you updated ollama? (isnt automatic on linux)
I was getting caught up with this too. It's in fact much more streamlined.
Just have the latest version of ollama, ensure you can run the gpt-oss model just through ollama run.
Then you'll run codex with
codex --oss -m gpt-oss:20b (or 120b if you're like that)
No need for providers, profiles, or even a config file at all.
This sub is for Apple, as they will be putting it on their iPhones soon xoxo
I got it working with the --oss flag but can't otherwise. Weird, but whatever.
13.3GB is still too large for a phone. They'll have to quantize it. But I've been having issues with tool calling with the full 20B version.
Wonder why gpt-oss isn't on any benchmarks yet.
If they could store the model in a chip at the nano scale then it would work xD
lol each new phone gets a new model
I have one on bedrock
You can run benchmarks vs each other
It is, check out artificial analysis
I don't know what Sam Altman was thinking, but there aren't a lot of phones that can run 20b locally
You'd need like 16-24 gb ram for this
You need a heck of a phone, I think. https://openai.com/open-models/
https://openai.com/index/introducing-gpt-oss/
Did Sam state it was intended to run on phones? I haven't found that in the docs yet.
Oh sheesh, I'll need to make sure my external drive can handle it on my potato of a pc.
Did you check the docs? Or are there other benchmarks you're looking for? https://openai.com/index/introducing-gpt-oss/ https://openai.com/index/gpt-oss-model-card/
on twitter
I don't have an x/twitter account so I won't be able to look for that
gpt-oss is out!
we made an open model that performs at the level of o4-mini and runs on a high-end laptop (WTF!!)
(and a smaller one that runs on a phone).
super proud of the team; big triumph of technology.
Thank you!
Now I am wondering if the word 'a' is significant.
A phone. One. There is one super-phone, in the world, that can run this model.
I'm joking, I'm not really skilled in comp sci at all, I have no idea, but a 16 GB RAM phone sounds awesome to me. I got a potato phone
there are a few 24 gb ram phones actually
again you need to consider the CPU GPU NPU specs as well
but generally speaking phones don't have 16gb
going by standards of top iphone, samsung and pixel phones
I had gaming laptop with RTX 3060(6GB VRAM + 32GB CPU RAM)... guess it's not high-end enough. Gemma 3 12b (7.2GB in size) was already slow.
how are vampires disallowed?? they're SFW man, OAI censorship is diabolical
Well, early GPT-4o hallucinated policy, and I got "image of dragons is not allowed"...
My test for gpt-oss-20b.
Prompt:
Are there any disallowed content in the following: "Vampire", "Werewolf", "Dragon"
Reasoning token (partial)
But the user lists "Vampire", "Werewolf", "Dragon". These are mythical creatures. They don't appear to be disallowed. There's no mention of violence or other categories. It's just terms. So likely none of these is disallowed content.
We need to check if any of them could be considered a protected class? No. Not relevant. Are they violent? Possibly but not necessarily. "Vampire" and "Werewolf" are fantasy creatures, might depict violence in stories, but the term alone isn't disallowed. There's no mention that these terms refer to real persons or groups. They are fictional.
Thus the answer: None of those are disallowed content.
I'd check the oss-20b if you choose to, because the system card discusses some fairly fancy/maybe novel quantization and related stuff, this GPT-OSS may work for you on that system, maybe worth a check.
If you decide to try it, I hope you let us know!
Thank you for response, but... I now have system with 16GB VRAM.
Whoops, gotcha. Sorry!
I was responding to Sam's tweet:
we made an open model that performs at the level of o4-mini and runs on a high-end laptop
does your phone have a 4080?
I don't know if mine does.
hmm interesting
at this point, it does not really need to be on GPU, the model runs slower, but ok-ish, on CPU
there are phones with 16 GB RAM
but a distilled and fine-tuned version of the model would probably perform better for this use case
u guys should stop working and wait for gpt5 so you can work faster
I am running openai/gpt-oss-120b on my MacBook Pro, very happy with it apart from the knowledge cut-off date of June 2024
give it the power of internet
I wish that was an option
could be
You can add that yourself
get a duckduckgo mcp, bro
then use it with an mcp client
I just got the 120b model to code me a version of my talking clock app, worked first time
hi, I'm dumb and I don't know the very first thing about locally running a model, but with these models releasing I wanted to learn how. I tried looking things up and I didn't end up getting very far. instead, I was wondering if anyone could point me in the direction of any beginner friendly resources for getting something like this set up?
what type of computer do you have?
Windows 11, NVIDIA 4070 Super, Intel 14700K, 32GB RAM
how much VRAM does the 4070 have?
12gb
search for an app called LM Studio and use that to download and run the 20B variant
Aye aye, captain
How much memory do you have? And what chip?
I have a MacBook Pro M2 Max with 96GB RAM
oh wow nice
The 120b model is using 61GB RAM
running at about 16 tk/s
faster than reading speed
OpenAI did a really good job with the OSS models
Captain 20b won't fit in 12gb
it can, it will, and it DID
and it doesn't break a sweat, i don't know how the wizards do it. thank you for the help!
it should fit (just)
Nah, need more buffer for running the model - context cache and stuff
I tried running on my macbook, 16gb, it jumped to swap memory
1 token in 20 secs
LM Studio allows you to use a combo of RAM and VRAM on a PC which this user had
20B won't run on a 16GB MacBook
Oh, that's nice
Keep in mind, it was in the context of a Warhammer Fantasy story about an empire village beset by a vampire. But I could see its reasoning trace and it hadn't thought up anything too bad, it just restated the request then said vampires were a safety issue. But Warhammer Fantasy was made for kids, and doesn't contain anything too bad except for I guess some extreme violence.
That's pretty crazy as on my mac I get about 36tps with 20b.
It only uses about 13.3GB of ram on my mac. You have an M series?
I have a MacBook Pro M2 Max with 96GB RAM
I wonder how many tk/s the 20B model gives me on my Mac, I am going to try it
Be interesting to see the difference between m1 and m2.
53.68 tok/sec - 1750 tokens - 0.36s to first token
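For a sense of what those numbers mean per reply: assuming the tok/sec figure measures generation after the first token (an assumption about how the number is reported), total time is roughly time-to-first-token plus tokens divided by the rate.

```python
# Rough end-to-end reply time from the reported stats. Assumes tok/sec is the
# generation rate after the first token appears.

def total_seconds(tokens: int, tok_per_sec: float, time_to_first_token: float) -> float:
    return time_to_first_token + tokens / tok_per_sec


print(round(total_seconds(1750, 53.68, 0.36), 1))  # 33.0
```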
I've been very impressed with the 20b. Tool calling isn't perfect but it's really good. I imagine if I used/wrote a library to automatically call the model to fix errors in JSON shapes, it would be perfect.
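A sketch of that "validate, then ask the model to fix it" idea: parse the tool-call arguments, check for required keys, and on failure send the error back to the model for a corrected JSON object, a bounded number of times. ask_model is a hypothetical callable (prompt in, text out) standing in for a real client; the required keys depend on your own tool schema.

```python
# Validate tool-call JSON and ask the model to repair it on failure.
# `ask_model` is hypothetical: any callable taking a prompt and returning text.
import json
from typing import Callable


def _validate(raw: str, required: set) -> dict:
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def parse_with_repair(raw: str, required: set, ask_model: Callable[[str], str], max_repairs: int = 2) -> dict:
    """Validate raw; on failure ask the model for a corrected version, up to max_repairs times."""
    for attempt in range(max_repairs + 1):
        try:
            return _validate(raw, required)
        except ValueError as err:
            if attempt == max_repairs:
                raise
            raw = ask_model(f"Your JSON was invalid ({err}). Reply with only the corrected JSON object:\n{raw}")


fixed = parse_with_repair('{"city": "Paris"', {"city"}, ask_model=lambda prompt: '{"city": "Paris"}')
print(fixed)  # {'city': 'Paris'}
```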
That is... really nice. I have an M4 with 128GB of RAM I should try these on. But it's my work laptop and I hate pulling it out.
M4 with 128GB RAM beats my M2 Max with 96GB RAM for sure
I spent 6k on my MacBook Pro, it needs to give me another 2 years of use before I upgrade
My previous mac lasted 11 years. I'm hoping to get that out of this m1. I really need to set up some cloud inference, but running local is just so much more fun.
I'm fairly new to AI. What advantages do the OSS models have over the browser based ChatGPT model?
you can run them locally, you can modify them, you can fine tune them, you can even get them to write erotic fiction
very cool!
what is oss ?
hmm
Saw last night that they'd been jailbroken, but haven't looked into it. At least the 20b isn't that good at creative writing, so I'll stick to o3 for fun short stories.
Euthanized open source gpt model with no soul
ty
codex with oss would be nice
Nobody knows, but I assume "open source software" or something.
gpt-oss is open-weight, not open source though...
how much ram and SSD memory is required to run this oss?
is this even possible? The weights OAI released were already at 4.25bit/average.
Is the 20gb model able to run on an iPhone 16 Pro?
what are weights?
you don't need to jailbreak to get soft erotica out of the OSS models, they will do it out of the box
Are you new to AI? just asking
I've found you need 13.3GB of unified/GPU ram to run the 20b.
not sure
Does this have any correlation with integrating ChatGPT into your own Locally Hosted LLM? Or will the process still be the same, needing the API? Sorry if ignorant question.
Wow, that's bizarre given it gave me a safety warning about a non-erotica vampire story.
On the face this looks like a no
Just asked it for a sad story about a village beset by a vampire, and it said it couldn't do it.
I did an erotic ghost story
I've seen that movie.
Too bad. Would be so cool if this would work on a phone in the future though
the clay throwing scene was intense.
but they do have a ton of filters i assume
two different things. oss model is already sort of ChatGPT. so if you run oss locally, you don't need API.
it's not as filtered as ChatGPT
Oh? Is it as "trained" as our current model of ChatGPT that's accessible via API or web?
I wish I had powerful enough hardware to run GPT-oss 20b, but i have only 8gb of vram
Last night saw online someone had jailbroken it to get instructions on how to do all the things red teamers fear, like making forbidden chemicals.
People might be trying to steal your data from OpenAI.
Home inference means your data isn't in the big treasure hoard.
Cheaper to just use cloud inference.
true
You could always technically get erotic stories and insinuations. You would just have to "beat around the bush", essentially, and give ChatGPT the idea without seeming too influential or without that being your apparent directive.
And 20b is surprisingly good at programming.
I am running openai/gpt-oss-120b on my MacBook Pro, M2 Max with 96GB RAM
fair fair
Wish we had benchmarks for how well it compares.
true, but i train really small local LMs on my pc and my gpu is struggling, i need a better gpu
nope. It's the lightweight version of ChatGPT. It's rumoured to be equivalent in intelligence to the o3 model.
Hmm. Is it free or paid? I wonder (if paid) if it's worth utilizing the OSS over the current APIs offered. Been trying to build a hybrid LLM between ChatGPT and my own logic trees and ideology.
Tough time to upgrade. Prices are high, MacBooks are mid-cycle, Nvidia won't release their cheap GPU box, 5090s are painful to get and now pull 1kW to run.
OSS models are free and open source
i might get a mac studio mini
Oh sheeeeit
FREE AND OPEN-SOURCE. You can customize it to your own needs.
You can train it to rap every answer
Welp, RIP to my monthly API costs 
I'm not new to AI, but I don't work in the field or anything related to it. I don't know the technical side and never really learned much
I guess you can just customize the way it responds, you can't train it, can you? Because that part is already done by OpenAI.
I mean, I'm sure if you can set up permanent memory with the OSS model, you for sure can train it.
Using the term permanent loosely here.
Ah, okay, it just surprised me that you asked in the first place if it's possible to run on a phone. To that I can say... yes and no: yes if you are using an API, and no if you are running it locally, that would fry your phone.
I believe you will not be able to get the same level of intelligent responses in some areas with oss. So eventually, you'll need API.
Still sounds super beneficial, as it could greatly cut monthly costs.
openai/gpt-oss-120b is very impressive
I've somewhat already got that system implemented in my LLM structure, where it "detects" if more structured/complex reasoning is needed: if so, it calls the most efficient model, whereas if not, it'll utilize mini or nano.
So hopefully OSS can fit into that system well.
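A routing layer like the one described above might be sketched as a scoring heuristic. Everything here is an assumption for illustration: the keyword list, the 200-word threshold and the tier names `nano`, `mini` and `full` are hypothetical placeholders, and a local gpt-oss model could be slotted in as any of the tiers.

```python
from dataclasses import dataclass

# Hypothetical signals that a prompt needs heavier reasoning.
REASONING_SIGNALS = ("prove", "step by step", "debug", "refactor", "plan")


@dataclass
class Route:
    tier: str   # placeholder tier name: "nano", "mini" or "full"
    score: int  # how many complexity signals were detected


def route(prompt: str) -> Route:
    """Pick a model tier from a crude complexity score."""
    text = prompt.lower()
    score = sum(signal in text for signal in REASONING_SIGNALS)
    if len(text.split()) > 200:  # very long prompts count as one extra signal
        score += 1
    if score >= 2:
        return Route("full", score)
    if score == 1:
        return Route("mini", score)
    return Route("nano", score)


print(route("please debug this").tier)  # mini
```

A keyword score is the simplest possible "detector"; a real setup might instead ask a small model to classify the prompt, but the dispatch structure stays the same.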
If you have a $25,000 gpu to run it
no, just a MacBook Pro M2 Max with 96GB RAM is all you need
Actually? I could swear I read h100 on hugging face
No, Mac works
If that's the case, why didn't you try running other state-of-the-art free and open source models before?
Huh, neat
I have a MacBook Pro, M2 Max with 96GB RAM and it runs openai/gpt-oss-120b fine
The main limitation for many is the ram
What's the performance like? In terms of intelligence and responses
i have gone mad with power, and i am now doing something very unwise. 12gb vram is enough surely
You can do more training on models, it's just gonna cost money for servers.
16 tk/s and intelligence is outstanding
I run a Mac Studio M3 Ultra with 512gb of ram and I'm getting 34tps
How about gpt-oss-20b?
Those are my intentions. I'm trying to build a hybrid between 3 things (I should've stated): open source LLM models, OpenAI models, and an implementation of my own logic trees I've gathered while reasoning with ChatGPT over the course of the last 3 months.
I'm using it in Xcode 26 beta and it's a bit slow, so I've moved down to running the 20b one, which is just fine for searching for things in the code
But I haven't made it to the point of integrating them all cohesively. Between work and family it's been a bit delayed.
I'm wondering if I can make a little helpdesk support agent /w 20b
My server's due for an upgrade though so gonna try it on my gaming rig for now
Local server or paid dedi
I have 2 local servers atm, ripped the gpus out of both
I don't like Mac
Ooooooo ok ok. I've been wanting to scrape together a local server but I feel like one of those men who start 15 projects and never finish one.
Tbh it's very nice having your own metal
a Mac is the best bang for your buck for inference
even like an rpi will do some jobs pretty well
I can imagine. Good friend of mine in Australia has a decent local server.
Well, a few. Borderline "local data center" worthy
Yes but I can't be asked to deal with macOS or Apple at all
Once I get some cash, I think it's going to be worthwhile to invest in some servers
I'd rather spend some more money on hardware and have something like Nix running
or even my own proxmox node
macOS is Unix
how quickly they forget
So im not willing to invest any time into it
Considering how bad of an experience the GUI was
- I'd be dropping the ability to make any upgrades with a mac
tbf, 20b on my 12gb 3080ti surprised me a lot on LM Studio. Moderate speed, but very good responses.
LM Studio is great
It does a nice job - I have so many models on my hard drive I've played with on it.
i am currently loading the 120b model in lm studio out of sheer morbid curiosity, i want to hear the transistors burn
Yeah I don't think that one would run on my 2019 Ryzen pc
yeah lmstudio goes red and says likely too large.
openai/gpt-oss-120b is taking up 63GB RAM on my Mac
Safeties: off. Transistors: burning. Discord? That crashed. We're going places.
what would you guys say is the best or smartest or i guess the closest to agi in opensource/local ai models
that's the spirit
now I regret saving on memory on my M4 Max
rule 1 of buying a Mac is to get the most RAM you can
hmm, it appears I'm having difficulties running the one-hundred and twenty billion parameter model on my hardware
I HOPE IT WILL JUST LIVE UP TO THE HYPE.
once I finally squeeze it into this box, it'll croak out a dying one token per minute, and it will sound like a symphony
one token per minute
haha, mine is better, one token per 20 secs
my memory is actually swapping
also check if you're running on the GPU + VRAM plus CPU + RAM method robert mentioned
If I had to guess, I'm not, because in the task manager my resource values would flip flop between using my graphics card and my memory like a person having a stroke
has anyone encountered an error with gpt-oss-20b?
we've deployed it on our company's machine with a H200 with vllm (the newest gptoss docker image) and we're getting:
...
openai_harmony.HarmonyError: Unexpected token 12606 while expecting start token 200006
when trying to work in agent mode via VSCode plugins like Cline.bot/RooCode
some of the Docker images are broken
@tepid garnet where to flip the switch?
I think I turned it on, whatever I did is giving promising results. I haven't crashed yet.
how fast is it now?
Oh it still hasn't started, one token per minute was my hopes and dreams.
Okay so I definitely DIDN'T enable the CPU + RAM optimization, wherever that is, because I'm cruising at a steady 31.9/32gb ram usage
mb you set only CPU + RAM, instead of hybrid
you might need to ask the folks on the LM Studio Discord Server how to run both GPU and CPU on Windows
Honestly I'm not too concerned with running the 120b model, I just find it funny trying to get it to run. After the latest shenanigans, it seems to be loading a response, it hasn't sent anything yet and it's been a few minutes, though.
are there any rcps that don't require an install?
??
that i can use with 20b
just download LM Studio then download the model into that
yeah, are there any rcps i can use that work well with the model
what is an rcp?
mcp*
Guys doesn't gpt 4o already exist, just for paying people
yes, why do you ask?
can we fix it by manually updating some requirements? should some harmony libraries be updated?
Oh ye just was confused on the new announcement
I'm guessing it's becoming free then
it's a .gif, you need to watch it a bit longer
try MCP Bridge on the Mac with LM Studio
how much ram is needed for gpt-oss 20b?
OH okay
63GB on a Mac
i mean 20b, not 120b
16GB on a Mac
what about windows?
I don't use Windows, but I would suspect you want a 16GB GPU card
i use a gtx 1660 super and 32gb of ddr4 ram and its okay at best, so a boost from what i have
7 seconds a token, no CPU usage required. Mission accomplished, ladies and gentlemen. It only took every single free drop of memory my computer has available to run it.
now you beat my speed
it's almost too fast to comprehend
yea, like who the hell can understand a word in 7 secs
actually a 24B model runs at 6-7 tokens/sec on my laptop
I think if there were better quantizations it would work good enough
man i wish i got the 96gb version of the mac
i can't read words. who knows how to read words. and they expect me to read a word in 7 seconds?
what Mac do you have?
It is kinda insane that the 20B model runs faster than a 4B q8 model.
which is that 4B model?
48gb m4 max mbp
ouch, I am so sorry
well at that time i didn't have the money to get the 96 version and i didn't intend to run local llms, so ...
Qwen 3 -4B thinking
the OSS model is both faster and waaaaaaay smarter
I just priced out a MacBook Pro, M4 Max with 128GB of RAM and it's $7800 AUD which is $1800 more than I paid for my M2 Max with 96GB RAM
damn
how much is that in vnd
your mac is like roughly 2500 more than mine
yeah i defo couldn't afford it on a scholarship
I got the M4 Pro with 48gb and it runs the 20b and everything I need fine. If I want a heavy model I just use the ChatGPT site
It starts to be diminishing returns on price vs benefit to push higher
Not for long: The Nvidia dgx spark is on the horizon
I don't even know what that is
You should google it then, they will probably release it this year 
Wasn't it like $5k though
Yea and optimized for AI models
if I Google it, then I will want it, then I will buy it, so best not to Google it
4k I think
if its speed is like
mac level and not a dedicated card level
then id probably still go for a mac
more versatile
You can just plug it into every computer. Use your mac if you want and let your llms run on the dgx spark
yeah but would you rather have a 6k mac that does everything good but not as fast as the dgx spark in llm inference or a 2k mac and a box just for AI stuff
It should be way faster than current m-chips
if it is that large then yeah defo worth it
yes, doing this with our medical offices. dgx spark is also important for privacy
large in what context? it has 128gb of vram if thats what you mean
large in speed differences between a mac and the dgx spark
" 1 petaFLOP of AI performance at FP4 precision"
wow this is a lot
Both fall at the same speed if dropped from a tower
oh I see
i take my words back
Yea, it's a powerhouse for llms
the age of mac dominating the low powered large vram cheap for ai stuff is over
soon™️
Macs are just dominating because their ARM-based systems are highly efficient and optimized, making them perform surprisingly well in AI tasks, even though their hardware isn't necessarily better than that of a traditional PC. But the DGX Spark takes it a step further by combining an ARM chip with full optimization for running and even training LLMs
Gpt oss is good for dev or not ?
Not for blind coding. You need to make sure that the stuff actually works
#1210625797165940806 ahhh
the 120b model is really decent
for 20b it's good to find info about stuff in Xcode 26 for example
never used this what is this even about
GPT-oss is a model that OpenAI released with open weights. That means that people can run the model locally and have full control on how the model functions
if you want to host it all locally, you totally can!
openai/gpt-oss-120b is the best open source model available
all open source
I have tried them all, I like gpt-oss-120b best
How is GPT-oss at image generation? (Does it do it at all?)
it's a purely text only model
Rgr that, thanks
so the 20b model runs on how many gb of vram?
12 to 13 GB of VRAM
okay so my 16gb gpu can run it stably right
or does it spike
depends on context and other things using system resources, it should run fine
okay good thx
Recommended specs for the 120B version for a purely CPU-only machine? I know 128 GB ram, what about the # of cores?
Is GPT-OSS better than Deepseek or llama?
I think yes (it's just my feeling!)
On my computer the model is quicker and the results are good (RTX 4080 laptop)
according to an AI model it is
You can try it on lmstudio
No, the "DeepSeek" model you're comparing it to is llama 3.1 8B but trained to respond in a similar way to deepseek with reasoning. DeepSeek R1 still outperforms gpt-OSS across various benchmarks. Also gpt-OSS 120B does not support image input
also, comparing 120b to an 8b is flawed from the get go
Yeah, I'd recommend going onto sites that compare models (with actual benchmarks)
And compare to similarly sized models
I recommend Artificial Analysis (the site I personally use for comparing models)
Won't let me send a link to it but you can just search it up
Hey guys, we're from Unsloth and we found some implementation differences for gpt-oss. Is there anyone we can talk to? Thank you
dunno, but why are you guys making 4-bit quants when gpt-oss already ships with 4-bit quants?
it can be quantized further down. When you make GGUFs they have to be upscaled to f16 then converted down from there
just seems like openai have the opportunity to do quantised aware training... and anyone making a 4 bit quant of a potentially optimised 4 bit quant... would presumably result in less quality
Do you know how to connect GPT-oss on Internet such as gpt 4o ?
Ollama (account required, not local) or perplexica (local, no account required, installed with docker)
To clarify the model is local in both cases but for using ollama the search service is not local
I thought it could?
Any agent framework support the harmony format now?
mind sharing your lm studio settings?
Oh nvm, I just used lm studio wrong the whole time lol
Jumped from about 40 tokens a second to about 180 a second lol
Can you read files?
How much VRAM do you have?
24gb
Oh, okay. I have 16GB, so... 15/sec is my limit then. (Half-GPU load + 16k context)
for the 20b model?
Nah you also should be able to get over the hundreds
did you put your gpu offload fully to the right?
the 20b model only uses about 12gb, so you should be able to fully load it into vram
Oh, okay, I tested something before and left GPU offload at 12/24.
That was maybe testing speed with very long context.
Now it's 77.05 tok/sec.
Doesn't it become very slow with long context (including long reasoning tokens)?
yea then try to set it on 24/24 (no worries, it would give you a warning if thats too much)
I have some speed optimized settings, you can try them
Oh, okay, thanks for sharing!
no worries, I'd love to get an update on your speed now
With above setting, I got 72.21 tok/sec with total 8711 tokens output.
oh okay, you maybe can up your cpu allocation too
but I mean thats good speeds
Why i dont see other gpt chat models?
is anyone facing this error while trying to run gpt-oss-20b?
(EngineCore_0 pid=3969032) AssertionError: Sinks are only supported in FlashAttention 3
I am using L40S 48GB
Is there any chance they could give us models to choose from again? Especially 4o?
Lol, just setting my priorities wrong
It's okay. I just wonder how people get so much access to gpus
Are they like super rich so they can buy gpus
It's usable from like 15-20 tokens, 50 tokens/s already feels great
How much money do you give yourself to spend on ai hardware
Idk really, depends on the month. Luckily I'm in a position where I can buy a few things without it bothering me that much.
Like how much money. BTW Can you friend me on Discord
It's a 5090 rich people
or poor people for AI
50 t/s is faster than one can read already so, pretty usable
Yup, but faster is still better. Oftentimes you can anticipate how the answer is structured and that you'll find your real answer somewhere in the third paragraph or so
true
I often don't read the whole thing =P
also, I noticed I'm reading more of the text on gpt 5-pro
it is indeed denser in information
I have a 4070su and when I run 20b I get ~15 tokens/s, it's odd that it's that slow for you?
Edit: and that's after I crank up the context and experts for giggles, though I enjoy flipping random switches for the sake of chaos and set k quantizing and v quantizing to f16. If you decide to try that and it helps, lemme know.
neat
i have a budget from my boss to spend $5k on hardware for training and running AI models, what are the best hardware specs for that? anyone have a list to build a nice PC for that?
rent a cloud A100 and fine tune gpt-oss, it will probably spend less than that
^ this. Thats the way to do it. And after the training, you just need a PC or server with at least 12 gigs of vram, 16gb or even 24gb would be better though. And at least 32 gigs of RAM as a token buffer and for KV-Cache. For concurrent chats or heavy tool use you might consider 64gb. A cpu with 6 cores and 12 threads would be sufficient, but if you still have some spare change also consider using a npu.
The training itself might only cost you 500-1000 bucks, depending on your usecase. So you should have enough money left to buy a good server for it.
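To see why context length eats memory on top of the weights (the "context cache" mentioned earlier), the usual estimate for a transformer's KV cache is 2 (keys and values) x layers x KV heads x head dimension x cached tokens x bytes per element. The sketch below only does that arithmetic; the layer and head numbers in the usage line are hypothetical, not taken from any gpt-oss config, and tricks such as sliding-window attention would shrink the real figure.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size: one key and one value vector per layer,
    per KV head, per cached token (bytes_per_elem=2 assumes fp16/bf16)."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Hypothetical model shape: 24 layers, 8 KV heads of dimension 64, 8k context.
print(gib(kv_cache_bytes(24, 8, 64, 8192)))  # 0.375
```

The cache scales linearly with `tokens`, so doubling the context doubles it, which is why a model that barely fits in VRAM can spill over once the context grows.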
I am trying to launch gpt-oss locally (on a remote supercomputer). I tried to install gpt-oss with pip (actually uv) and when launching import gpt_oss.chat, I am getting
from gpt_oss.tools import apply_patch
ModuleNotFoundError: No module named 'gpt_oss.tools'
and I can't find tools.py in the gpt_oss package directory.
Would someone know how to address this?
- from #community-help
File "/scratch/[username]/gptoss/run-localtest.py", line 3, in <module>
from gpt_oss import chat
File "/scratch/[username]/gptoss/.venv/lib/python3.13/site-packages/gpt_oss/chat.py", line 20, in <module>
from gpt_oss.tools import apply_patch
ModuleNotFoundError: No module named 'gpt_oss.tools'
for $5k you could rent 8x H100s for over 200 hours which would finish way before then anyway
20B or 120B? if it's 20B you could get a 5090 and fine tune it using unsloth
from their listing of the model you can fine tune the 20B model on 14 GB VRAM (this increases if you increase the context length)
but a 5090 with 32 GB vram is plenty for 20B
if they don't have a 5090 already, it's probably not worth buying one, as it's a couple thousand when they can rent an equivalent card for a fraction of the price for the amount of time they'd need it
that fully depends on how much finetuning you would be doing and whether the data being used is company exclusive and cannot be sent off to third party cloud compute
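The rent-versus-buy argument above is simple arithmetic. This sketch shows the implied per-GPU-hour price behind "$5k for 8x H100s for over 200 hours", plus a break-even helper. The $2,000 card and $0.50/hour rental in the usage lines are hypothetical numbers, and the helper ignores electricity, resale value and the rest of the machine.

```python
def max_price_per_gpu_hour(budget: float, gpus: int, hours: float) -> float:
    """Highest per-GPU-hour rate at which `budget` still covers gpus x hours."""
    return budget / (gpus * hours)


def break_even_hours(purchase_price: float, rental_per_hour: float) -> float:
    """Rental hours after which buying the card outright would have been cheaper."""
    return purchase_price / rental_per_hour


# $5k across 8 GPUs for 200 hours means renting at no more than this per GPU-hour.
print(max_price_per_gpu_hour(5000, 8, 200))  # 3.125
# Hypothetical: a $2,000 card versus renting a comparable one at $0.50/hour.
print(break_even_hours(2000, 0.50))  # 4000.0
```

If a fine-tuning run needs far fewer hours than the break-even figure, renting wins on cost; the data-privacy concern raised above is the main reason to buy anyway.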
Guys, I fine-tuned OpenAI's OSS 20B reasoning model using the most popular medical reasoning dataset and published the results on Hugging Face. Who wants to check it?
Hi,
That's great. Want to check out.
Can I share the HF link here, is it allowed?
or which room should I use for this purpose?
you can try but if its not whitelisted url you might get a short automod timeout which is nothing to worry about
can i run gpt oss 120b on an hp omen with a 14th gen i7, 16gb ram and an nvidia 4060 with 8gb vram?
if you are doing it through google cloud or azure or similar you can make a deal they have no access to your data
no
you can try c:
gpt-10 will be out by the time it generates a single token lol
If you believe hard enough, if you ignore all safety warnings and enjoy the sweet tears of your hardware crying in defiance, you can get that single token.
nope, even with CPU offload, that isn't nearly as much RAM
the CPU would run it, slow, but would run, the problem is the amount of RAM
if you had something like 32 GB, it would be possible
that goes for every AI model, the min requirements aren't actually in compute power, that just dictates the speed it runs
the base requirements are to be able to have the entire model loaded in memory, if you don't have enough memory, it can't run
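The "whole model has to fit in memory" rule can be checked on the back of an envelope: parameter count x bits per weight / 8 gives the size of the weights in bytes. This sketch uses the nominal 20B/120B sizes and the ~4.25 bits/weight average mentioned earlier in the channel; the 2 GB overhead for context cache and runtime buffers is an arbitrary assumption, and real usage grows with context length.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the weights alone in GB (1 GB = 1e9 bytes): params x bits / 8."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed allowance for KV cache and runtime
    buffers fit in the combined RAM + VRAM budget."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb


print(weights_gb(120, 4.25))               # 63.75
print(weights_gb(20, 4.25))                # 10.625
print(fits_in_memory(120, 4.25, 16 + 8))   # False: 16 GB RAM + 8 GB VRAM
print(fits_in_memory(120, 4.25, 128))      # True
```

The 63.75 GB estimate lines up with the 61-63 GB people report for the 120b on Macs in this channel, and the 20b's ~10.6 GB of weights plus overhead is consistent with the 12-13 GB figures.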
you can run the 20b on that hardware, tho
I just noticed something, OpenAI does not have gpt-oss on the API..
I mean, I know it is open and anyone can run it.. but I would expect OAI to have that option if you wanted to use the model on their platform
do the most as cheap as possible, OAI's new motto lol
they probably didn't see the point when Cerebras and Groq were guaranteed to do it faster and cheaper than them
i get like 16k tk/s and less than 0.2s TTFT on OpenRouter with Cerebras and 120b, it's insane
i'm working on a project that definitely wouldn't have been possible without that speed
So what hardware or how much ram do I need, I wanna run the higher parameter model but I want to know how much I have to buy beforehand, I don't mind it being slow I just need it to run.
The model takes 40 GB of RAM; to run it you need that plus whatever more the OS needs
"... gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory." you can also use regular mem but its super slow
and the same amount in VRAM if you want to run it on a GPU
for example, my GPU has 32 GB VRAM, so I'm able to run it using both my GPU and CPU, which is muuuch slower
but it runs
the 20b model runs on just the GPU, on which I get about 250 tok/s
cheaper to host the model on a private cloud on demand if privacy is the only reason, unless you have free electricity
for non-privacy stuff it's offered free or super cheap in multiple places
How slow exactly? I want it to run it on 128 regular ram
it's MoE so it probably runs alright if you've got like 32 cores or more
depends on your ram speed, vram is way faster though
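The "depends on your RAM speed" point can be made concrete: during generation each token has to read roughly every active weight once, so memory bandwidth divided by the bytes read per token gives an idealized ceiling on tokens/s. Because gpt-oss is MoE, only the active parameters count (OpenAI lists about 5.1B active for gpt-oss-120b and 3.6B for gpt-oss-20b). The 80 GB/s bandwidth below is a hypothetical round number, and real speeds land well below the ceiling because of compute, attention and KV-cache reads.

```python
def max_tokens_per_second(active_params_billion: float, bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Idealized decode ceiling: bandwidth / GB of active weights read per token."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical 80 GB/s system RAM, ~4.25 bits/weight.
moe = max_tokens_per_second(5.1, 4.25, 80)    # only the active experts are read
dense = max_tokens_per_second(120, 4.25, 80)  # if all 120B weights were read every token
print(f"MoE ceiling ~{moe:.0f} tok/s, dense ceiling ~{dense:.1f} tok/s")
```

The same formula explains why VRAM is "way faster": a GPU's memory bandwidth is typically many times that of system RAM, and the ceiling scales with it directly.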
congrats!
did u experience any issues while using the unsloth finetuning notebook?
no
I am running openai/gpt-oss-120b on my MacBook Pro, M2 Max with 96GB RAM
does the model ever reason for too long for you?
I'm running it myself and sometimes it reasons for 30k tokens
if you have reasoning effort set to high then it can reason for a very long time
Yeah... I've had it reason itself out of context many times.
and also it likes to repeat the same exact thing over and over again
yes I have noticed that myself
Do you have any solutions for this? I ended up turning the temperature down a bit
are you using LM Studio?
yep
set reasoning effort to low or medium, that's how I solved it
Thanks, it appears that's the only way
do you prefer oss 120b over phi or gemma? phi seems to be accurate in my tests but oss has more of a "personality"
oss-120b is my favourite local model
oss-120b just managed to do better at coding than GPT-5
When? How?
I asked it to write a Swift script that recursively parses a folder's YAML files, converts them into JSON and then writes them into a parallel folder. GPT-5 failed because the first thing it did was check if there were any YAML files in the root directory and, if not, return and stop the script.
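For reference, here is a rough Python take on the task described above (the original request was for Swift). The key point is walking the tree recursively instead of bailing out when the top-level folder has no YAML files. It assumes PyYAML (`yaml.safe_load`) is installed unless you pass your own loader, and it writes indented JSON to the same relative path under the output folder.

```python
import json
from pathlib import Path
from typing import Any, Callable, List, Optional


def convert_yaml_tree(src_root: Path, dst_root: Path,
                      load: Optional[Callable[[str], Any]] = None) -> List[Path]:
    """Convert every .yaml/.yml file under src_root (at any depth) to .json,
    mirroring its relative path under dst_root. Returns the files written."""
    if load is None:
        import yaml  # assumption: PyYAML is installed (pip install pyyaml)
        load = yaml.safe_load
    # rglob descends into subfolders, so an empty top level is not a reason to stop.
    sources = sorted(p for p in src_root.rglob("*")
                     if p.is_file() and p.suffix.lower() in {".yaml", ".yml"})
    written = []
    for src in sources:
        dst = (dst_root / src.relative_to(src_root)).with_suffix(".json")
        dst.parent.mkdir(parents=True, exist_ok=True)  # recreate the folder structure
        data = load(src.read_text(encoding="utf-8"))
        dst.write_text(json.dumps(data, indent=2), encoding="utf-8")
        written.append(dst)
    return written
```

Usage would be along the lines of `convert_yaml_tree(Path("configs"), Path("configs_json"))`, where both folder names are placeholders.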
you can install LM Studio and then download gpt-oss-20b or gpt-oss-120b
i used o4 mini high almost the whole time before, now when i use gpt 5 thinking i get trash results
Is this always behaving the same? Try once more
I ran the same prompt twice, then added some guiding and it still failed, always adding new errors
Anyway I got it to work first try with oss-120b
Really surprised me
Wow, that's such a fail
In general GPT-5 has been disappointing me with coding so far. It hasn't been able to do anything I've asked it to do
Has been good for me
On cursor
GPT-5 has been great for me coding
I've tried ChatGPT directly, GPT-5 in GH Copilot Chat, GPT-5 from the API through Xcode 26's coding assistant feature, nothing has been working so far.
Prompting issue
same
If you want to blame it on that, sure. And oss-120b manages to do it
ChatGPT will suffer from context exhaustion immediately (idk why they did this for paying users like it's 2023). Agentic tools are extremely non-deterministic so you must constantly re-evaluate your tools and prompting.
the chatgpt that is being used in xcode 26 beta 5 is not gpt 5 right?
Yeah, it's likely 4o which is why I'm using GPT-5 through the API
They will likely (according to leaks) update all Apple Intelligence features to GPT-5 when macOS/iOS 26 releases
I can't convert my safetensors to a GGUF file.
Atm you can only convert it to f16, does that work?
You need to use a better GPU to convert it
And use the basic safetensor file
@tidal trellis You can run a free opensource OpenAI model on your own computer, you just download a program called LM Studio and use that to download one of the two opensource models
I used LM Studio a free program and it was front page news when I downloaded the model and loaded it. It was that easy
Oh sorry Robert
what are the benefits of downloading this?
A free 03 mini model you can use forever no rate limit
*o3 level
no worries, anyway @tidal trellis OpenAI gifted us a way to use OpenAI models for free on our own computers keeping all of our conversations private to your own machine
no these are new models, released in the past two weeks
the only limitation is your own computer, it needs to have a GPU with enough VRAM to load the model
then you can download LM Studio and run gpt-oss-20b
Learn about OpenAI's open-weight models (gpt-oss) and where to get support
thank you robert, I'm downloading the model now.
great
so... why would people pay for gpt if they can get it free ?
there must be some difference
worth paying for... ?
because these models are small in comparison to ChatGPT, they are great models but don't have the depth of knowledge that you can get from the closed source models that OpenAI offer such as GPT-5
well, I've always loved knowing that I had the latest and greatest.. even though I wouldn't ever put it to full use
I have gemini pro, which I got to trial for a year
i've also used Copilot.. it's free and doesn't seem to have any limits
well one of the advantages of running an opensource model on your own computer is data privacy, everything you discuss with gpt-oss remains private to your own computer
So would it be better at coding?
it's not better than GPT-5 on coding but it does a very good job with code
Well OpenAI says GPT-5 is really good at coding when it quite literally isn't any good at anything.
in my own experience ChatGPT GPT-5 is excellent at coding
What language and do you use Pro?
SwiftUI for macOS and Python and no, I am on Plus now.
Well, Python is the easiest programming language, so I am not surprised if it's good at that, but with other languages like Lua, Java etc. it absolutely sucks.
I have also used it with Laravel, a PHP framework for web dev without issues
Heya, I understand things can get a bit heated at times, but let's keep the conversation respectful, even when we don't see eye to eye. Thanks :) @fallow oracle
Deleted your message for the reason mentioned, just letting you know so you're aware
Alright
I just ported my Amateur Radio Function Generator Simulator code from MacOS to Linux Qt with C++ in 10 minutes using gpt-oss-120b
And did it work? Do you use it in a way where it can debug itself?
yep works fine, I did it via the macOS desktop app just asking ChatGPT to refactor the code from SwiftUI to Qt
should you use lm studio or ollama or msty
I use LM Studio personally
I've also used it for PHP OOP programming and it's just fine. Now for the low-level firmware zone, which I have a project in, I haven't tried it out yet but I'm hoping it somewhat works
Yeah it's great. But my pc is a potato one. It cannot even handle a 3B LLM.
You could make a small chatbot and use inference, depending on how much you use the LLM it might be cheaper if you use API calls over paying for the model
I'll just run the small one, I have 32gb of ram and don't mind waiting, I don't have the money for a GPU rn
Alright I'll do that thank you
Don't think it'll run almost at all without a gpu
No like I have a 1650 super so it runs just not fast
Like one prompt takes 5 minutes but at least it runs as of writing this
is my pc high-end for running llms? 32GB of ram, an AMD RX 7900 XTX with 24GB of vram and an AMD Ryzen 7 9800X3D
The recommended minimum for the smaller OSS model is 16 GB, so you seem good.
Minimum 80 GB ram recommended for the larger OSS model, not sure how you fit with that.
Yea the larger one even the quantized are well over 64 GB
That's more than enough for most models up to 70B parameters in 4-bit quantization, and even some 8-bit models if optimized. Your RAM is fine unless you're working with very large context windows or doing heavy multitasking alongside inference. But extremely large models or unquantized versions will be challenging
as long as you have enough ram, it does, just slower
Yeah, just thought he'd give up before an answer came up
Just curious but is anyone having issues with LMStudio and gpt OSS not getting previous message context from a conversation? Basically it starts over with every message
I don't know why, every time I tool call, the conversation restarts
A LITTLE OLD
U kidding me?
This kind of technology is rare in my country
"Examples of safety issues which are out of scope:
Jailbreaks/Safety Bypasses (e.g. DAN and related prompts)"
It's easy to break. So is GPT-5. There hasn't (yet) been a single model I haven't broken except claude 3 and 4.
I can win a competition though
I broke it
already
Claude 3 and 4?
yup
What do you mean by broken?
jailbroke
But specifically what do you think that means for a model to be jailbroken?
it makes what ever you want
I define it as "using user prompting to get the model disobey an explicit command given in the system/developer message."
Is that what you mean as well?
you did not break them enough if you did not win the $10k-20k prizes in the reward, but those belong in #ai-discussions
wait what reward
is it still active?
i fine tuned a model which basically has no alignment at all now lol
is there any documented fix for the single tool calling issue?
I don't think so sir, breaking qwen is ez, and gpt-oss, I don't think so
I'm happy
You can't bounty, but if you look on #announcements you will see they are doing red teaming for money and u can submit a report there with a write-up
I think it should be, did you find it Kiera?
I run it locally on my MacBook Pro, M2 Max with 96GB RAM. I wouldn't pay to host it anywhere as that kind of defeats the purpose of having a local model.
At that point just host gpt-5 or o3, no point in hosting a "local" gpt if its not local
yeah thanks
Why does oss think it's GPT-4? That's what it keeps telling me it is???
models know nothing about themselves, it's seen GPT-4 in its training
Even when I told it that it wasn't and explained what it was, it doubled down and said I'm wrong
it's only going by its training
That is a thing with the new models and its intentional, the model isn't meant to be swayed by the user, it will think its knowledge is better than the users knowledge. (Because a grand majority of the time it is)
and you telling it it's something does not change its training; it has only seen that 4 exists in its training, so it will parrot that and assume that is the case.
Ohhh
Gpt 4 was better
the likely multi trillion parameter model performs better in some tests than a 20 billion parameter model?
nooo
because gpt-oss didn't existed when gpt-oss was being created =P
yes but 5 didn't exist when 5 was being created, yet it knows it's 5
tbh, GPT-5 was the first model to acknowledge itself as GPT-5 at release day
also it told me that it wasn't self-hosted on my machine and insisted over and over it was running on OpenAI Azure servers
