#gpt-oss | OpenAI | Page 2

harsh aurora Aug 14, 2025, 2:27 AM

#

so far, OAI had not bothered to insert that information on the models

ionic prawn Aug 14, 2025, 2:27 AM

#

4 used to say it was 4

harsh aurora Aug 14, 2025, 2:27 AM

#

well.. it also does not know where it is hosted xD

harsh aurora Aug 14, 2025, 2:28 AM

#

ionic prawn 4 used to say it was 4

I think not on release day, tho

ionic prawn Aug 14, 2025, 2:28 AM

#

this sounds stupid but i have a question abt it

tepid garnet Aug 14, 2025, 2:28 AM

#

when GPT-4 was released it said it was GPT-3.5

ionic prawn Aug 14, 2025, 2:29 AM

#

cause i gave it info telling it exactly that it was even walked step by step it refused to adapt or aknowledge that i possibly could be right just kept saying no ur wrong lol

#

im trying to learn so any and all info is greatly appreciated

tepid garnet Aug 14, 2025, 5:54 AM

#

I have tried to extract the system prompt without success

thick nova Aug 14, 2025, 11:08 AM

#

Hi guys, I am trying to make digital add and poster using gpt 5 but it keeps messing up and looks unprofessional even though i gave detailed promot, is there any solution

tepid garnet Aug 14, 2025, 11:11 AM

#

thick nova Hi guys, I am trying to make digital add and poster using gpt 5 but it keeps mes...

that is off topic for this channel

thick nova Aug 14, 2025, 11:12 AM

#

then dont reply, i found the right channel

worldly willow Aug 14, 2025, 11:20 AM

#

How are the models performing compared to gpt-4o? Especially on multilangual tasks? Did anyone get to test them

tepid garnet Aug 14, 2025, 11:21 AM

#

worldly willow How are the models performing compared to gpt-4o? Especially on multilangual tas...

I haven't used them for anything other than English

worldly willow Aug 14, 2025, 11:21 AM

#

tepid garnet I haven't used them for anything other than English

How are they compared to 4o?

tepid garnet Aug 14, 2025, 11:22 AM

#

depends on the task, but overall gpt-oss-120b is a good model

worldly willow Aug 14, 2025, 11:22 AM

#

tepid garnet depends on the task, but overall gpt-oss-120b is a good model

Cool, so not simply dumber than 4o? Thats good

tepid garnet Aug 14, 2025, 11:23 AM

#

worldly willow Cool, so not simply dumber than 4o? Thats good

no, the 120b model is good

worldly willow Aug 14, 2025, 11:29 AM

#

tepid garnet no, the 120b model is good

glad to hear :)

#

thanks!

tepid garnet Aug 14, 2025, 11:34 AM

#

blows me away that I can run something like gpt-oss-120b locally on my MacBook Pro.

cursive marlin Aug 14, 2025, 11:47 AM

#

Can you fine tune GPT-OSS to remove safety restrictions?

tepid garnet Aug 14, 2025, 11:49 AM

#

cursive marlin Can you fine tune GPT-OSS to remove safety restrictions?

yep

cursive marlin Aug 14, 2025, 11:51 AM

#

tepid garnet blows me away that I can run something like gpt-oss-120b locally on my MacBook P...

How are you running it locally? I don't know much about how the unified memory works, but shouldn't you need 120gb of ram?

tepid garnet Aug 14, 2025, 11:52 AM

#

cursive marlin How are you running it locally? I don't know much about how the unified memory w...

I have a MacBook Pro, M2 Max with 96GB RAM

cursive marlin Aug 14, 2025, 11:53 AM

#

tepid garnet I have a MacBook Pro, M2 Max with 96GB RAM

I see, how much ram does the mdoel use?

tepid garnet Aug 14, 2025, 11:53 AM

#

cursive marlin I see, how much ram does the mdoel use?

59.4GB

#

#

59.79GB with 30% context used

cursive marlin Aug 14, 2025, 11:55 AM

#

tepid garnet 59.4GB

Huh, might be worth trying on my machine then since I have 64gb of ram

#

Do you know if its possible to use both VRAM and RAM?

tepid garnet Aug 14, 2025, 11:56 AM

#

cursive marlin Huh, might be worth trying on my machine then since I have 64gb of ram

a MacBook Pro uses unified memory for both CPU & GPU

cursive marlin Aug 14, 2025, 11:57 AM

#

I'm aware, I meant running with my cpu

tepid garnet Aug 14, 2025, 11:57 AM

#

cursive marlin I'm aware, I meant running with my cpu

64GB would be tight

#

considering the other uses of that RAM by the system

cursive marlin Aug 14, 2025, 11:58 AM

#

Yeah, I have 12gb of vram though so I'm wondering if I can use both ram and vram for this

tepid garnet Aug 14, 2025, 11:58 AM

#

the only way to know is to try

cursive marlin Aug 14, 2025, 12:01 PM

#

tepid garnet

A bit unrelated but why would you ask gpt-oss for the system prompt? Doesn't it being open source let you see the system prompt anyways?

tepid garnet Aug 14, 2025, 12:01 PM

#

cursive marlin A bit unrelated but why would you ask gpt-oss for the system prompt? Doesn't it ...

the system prompt is hidden, the model has been trained not to reveal it

cursive marlin Aug 14, 2025, 12:02 PM

#

But doesn't its open source nature make it possible to find it through the model itself? What about settings custom system prompts, would that just overwrite the hidden one?

tepid garnet Aug 14, 2025, 12:03 PM

#

cursive marlin But doesn't its open source nature make it possible to find it through the model...

cursive marlin Aug 14, 2025, 12:04 PM

#

I see

tepid garnet Aug 14, 2025, 12:04 PM

#

you need to jailbreak the model to get it to reveal it's system prompt

harsh aurora Aug 14, 2025, 12:23 PM

#

there is no hidden prompt on a trained model

#

the model was trained that way, and it will act in the way it was trained to

#

but it does not have an actual prepended hidden string

tepid garnet Aug 14, 2025, 12:24 PM

#

harsh aurora there is no hidden prompt on a trained model

how do I get the system prompt then?

harsh aurora Aug 14, 2025, 12:24 PM

#

there is no system prompt

#

what there is is the learned biases of the training material

#

which actually contained instructions for how the AI should act

#

simple test, ask the model to write a nsfw story and you will see it think about the OpenAI's content policy used to train the AI

#

but there is no such a thing as a "hidden prompt" on models

#

the model was trained to produce reasoning tokens in a way that says it should no do some things because of the content policy, and it reproduces that behavior

#

at best you can make the model reproduce some aproximation of the text used in training

harsh aurora Aug 14, 2025, 12:30 PM

#

harsh aurora at best you can make the model reproduce some aproximation of the text used in t...

which btw, is the reason why ChatGPT does not let you see the full text while the AI is "thinking", you only get a summarized version because OAI wants to protect their internal thinking process and avoid it being used as a way for competitors to train their AI

#

on the gpt-oss, there is no way around that and what you get is really what the AI is producing

tepid garnet Aug 14, 2025, 12:30 PM

#

I actually got gpt-oss-120b to write me an erotic ghost story, it thought about it for one and a half minutes then decided it could write it

harsh aurora Aug 14, 2025, 12:31 PM

#

yea, it is not very difficulty to get the model to output that sort of thing

#

since you get the actual model file to run, there is no additional layers between you and the model, there is very OAI can do to prevent the user from instructing it to act in ways they don't want

#

the only thing they ca n do is to train the model including their content policy

tepid garnet Aug 14, 2025, 12:33 PM

#

gpt-oss-120b is my favourite local model

#

it's really very good

harsh aurora Aug 14, 2025, 12:34 PM

#

yea, it is really good, it became my favorite too

#

I think it has the best ratio of size to quality so far

tepid garnet Aug 14, 2025, 12:34 PM

#

yes, I think you are right

harsh aurora Aug 14, 2025, 12:35 PM

#

there are for sure better models, but they are HUGE to the point it is not viable to self host

#

and there are tiny models that are fast, but never get to do the task you want without some major flaws

#

gpt-oss is in a sweet spot

tepid garnet Aug 14, 2025, 12:36 PM

#

yep, I'd agree with that

hazy sequoia Aug 14, 2025, 2:20 PM

#

harsh aurora and there are tiny models that are fast, but never get to do the task you want w...

100% even qwen3-32B which in theory should be way better than gpt-oss-20B is so much worse than it

harsh aurora Aug 14, 2025, 2:59 PM

#

OpenAI has the advantage of having all their resources to make it as optimized as it can be

#

usually, published models are not that optimized because doing so requires a really massive amount of money and infrastructure

#

and it is an unusual combination to have Massive amounts of money, Access to unbelivable large amount of compute power AND willingness to spend all that money to release it for free

#

as good as community made models can be, no one can match the sheer computing power and dataset quality a large company like OpenAI has

solemn willow Aug 14, 2025, 4:07 PM

#

tepid garnet I actually got gpt-oss-120b to write me an erotic ghost story, it thought about ...

erotic..

grand isle Aug 14, 2025, 7:41 PM

#

@tepid garnet what rig do you have to run the 120b model ?

tepid garnet Aug 14, 2025, 10:08 PM

#

grand isle <@596075803700101166> what rig do you have to run the 120b model ?

MacBook Pro, M2 Max, 96GB RAM

grand isle Aug 15, 2025, 2:19 AM

#

tepid garnet MacBook Pro, M2 Max, 96GB RAM

So its slow i guess ?

tepid garnet Aug 15, 2025, 2:23 AM

#

grand isle So its slow i guess ?

25.58 tok/sec

#

faster than I can read

midnight sorrel Aug 15, 2025, 3:40 AM

#

tepid garnet 25.58 tok/sec

Wait, I'm getting 29 t/s after a little context has filled
And mine is M1 pro and 16gb

#

Are you using the mlx model?

tepid garnet Aug 15, 2025, 5:10 AM

#

midnight sorrel Wait, I'm getting 29 t/s after a little context has filled And mine is M1 pro an...

I am running gpt-oss-120b not the 20b model

midnight sorrel Aug 15, 2025, 5:10 AM

#

tepid garnet I am running gpt-oss-120b not the 20b model

aah, yes, got it

tepid garnet Aug 15, 2025, 5:11 AM

#

midnight sorrel aah, yes, got it

on the 20b model I get 55 tk/s

#

MacBook Pro, M2 Max with 96GB RAM

midnight sorrel Aug 15, 2025, 5:12 AM

#

I finally got to run it on my 16gb ram, but has to close all apps, maybe one or to app at most alongside

#

only works on lm studio

tepid garnet Aug 15, 2025, 5:12 AM

#

yeah it would be tight

#

I am considering getting a Mac Studio, Apple M3 Ultra chip with 32-core CPU, 80‑core GPU, 32-core Neural Engine
512GB unified memory, 4TB of SSD storage

midnight sorrel Aug 15, 2025, 5:15 AM

#

tepid garnet I am considering getting a Mac Studio, Apple M3 Ultra chip with 32-core CPU, 80‑...

aint it difficult to carry around?

#

you could run deepseek and kimi k2 tho

tepid garnet Aug 15, 2025, 5:16 AM

#

midnight sorrel aint it difficult to carry around?

I currently use my MacBook Pro in clamshell mode on a 49" ultrawide monitor, so I would just replace it with a Mac Studio

#

I have a MacBook Air which I use as my main laptop

midnight sorrel Aug 15, 2025, 5:17 AM

#

tepid garnet I currently use my MacBook Pro in clamshell mode on a 49" ultrawide monitor, so ...

it's cool if ur absolutely sure if u arent gonna carry it around

#

kimi k2 wouldn't fit 512 unless you go for 3 bit precision

tepid garnet Aug 15, 2025, 5:20 AM

#

My favourite local model is gpt-oss-120b, which I can run now. The Mac Studio is going to cost $15k AUD so I am still considering if I actually really want it

midnight sorrel Aug 15, 2025, 5:24 AM

#

tepid garnet My favourite local model is gpt-oss-120b, which I can run now. The Mac Studio is...

that much memory is still cheaper comparing to GPUs

tepid garnet Aug 15, 2025, 5:24 AM

#

yeah, the Mac is good value for inference

shut marten Aug 15, 2025, 5:56 AM

#

tepid garnet My favourite local model is gpt-oss-120b, which I can run now. The Mac Studio is...

I wouldn’t go for it. Your current setup is fine

#

Performance increase from M2 Max to M4 Max is not too big

harsh aurora Aug 15, 2025, 11:35 AM

#

yiiikes, that is a lot of money

#

cant you get a h100 for that? xD

tepid garnet Aug 15, 2025, 11:44 AM

#

harsh aurora cant you get a h100 for that? xD

nope, an H100 is around $39k

harsh aurora Aug 15, 2025, 11:45 AM

#

emoji_21

copper grove Aug 15, 2025, 2:22 PM

#

tepid garnet My favourite local model is gpt-oss-120b, which I can run now. The Mac Studio is...

for that money, you can build your pc yourself and get way better troughput and performance

solemn willow Aug 15, 2025, 6:03 PM

#

tepid garnet nope, an H100 is around $39k

Pocket money

hazy sequoia Aug 16, 2025, 2:19 AM

#

harsh aurora cant you get a h100 for that? xD

bearing in mind you’d need a pretty decent cpu and a decent amount of ram too so u don’t end up bottlenecked by either

#

not to mention the power bill would be way higher than a mac too

#

i don’t even have a mac i have a 5090 and then a cloud cluster of nvidia gpus but jealous of the mac people lmao

harsh aurora Aug 16, 2025, 2:21 AM

#

when the model is entirely loaded on the GPU it runs as good as it can

#

when running on CPU, the bottleneck is still the RAM speed

#

GPUs have faster RAM than the system RAM

harsh aurora Aug 16, 2025, 2:22 AM

#

hazy sequoia i don’t even have a mac i have a 5090 and then a cloud cluster of nvidia gpus bu...

yea, even a 5090 can't run 120b

versed brook Aug 16, 2025, 2:38 AM

#

harsh aurora yea, even a 5090 can't run 120b

AMD 395+ with 128gig unified for 1400$ is better

hazy sequoia Aug 16, 2025, 2:39 AM

#

harsh aurora GPUs have faster RAM than the system RAM

i am aware my point was that you can have 8x B200s but if you are running a pentium dual core then there will be a bottleneck lmao yes of course you don’t need something insane necessarily but i’m saying the $40k for a H100 is not the only expense and it would end up being even more

hazy sequoia Aug 16, 2025, 2:40 AM

#

harsh aurora yea, even a 5090 can't run 120b

yeah it sucks haha but it’s fine cuz i just run fine tuned R1 on a private serverless cloud cluster

#

hope that openai decide to release a frontier level oss model at some point

#

maybe if R2 is good it will get them to actually do more stuff in oss

#

cuz r1 was the initial catalyst for sam to suggest they might do oss again

swift jacinth Aug 16, 2025, 3:38 AM

#

Is Gpt-oss better than Qwen 30b 2507 thinking? Don't know why gpt-oss rank so low on lm arena and Live bench

tepid garnet Aug 16, 2025, 3:39 AM

#

gpt-oss-120b is better

swift jacinth Aug 16, 2025, 3:39 AM

#

tepid garnet gpt-oss-120b is better

Oh And 20b worse?

tepid garnet Aug 16, 2025, 3:40 AM

#

gpt-oss-20b is probably about the same

swift jacinth Aug 16, 2025, 3:41 AM

#

tepid garnet gpt-oss-20b is probably about the same

Man you compared 120b and Qwen 30b or it's your guess?

tepid garnet Aug 16, 2025, 3:41 AM

#

swift jacinth Man you compared 120b and Qwen 30b or it's your guess?

Qwen30b is not anywhere as good as gpt-oss-120b

#

but gpt-oss-20b is probably about the same as Qwen 30b

#

I have used them all in LM Studio

swift jacinth Aug 16, 2025, 3:43 AM

#

tepid garnet Qwen30b is not anywhere as good as gpt-oss-120b

Gotchu sir. But on live bench Which is closed source qwen got significantly higher score sir

#

Do you know why?

tepid garnet Aug 16, 2025, 3:43 AM

#

on my MacBook Pro, M2 Max with 96GB RAM

swift jacinth Aug 16, 2025, 3:43 AM

#

tepid garnet on my MacBook Pro, M2 Max with 96GB RAM

That's a big ram

tepid garnet Aug 16, 2025, 3:44 AM

#

swift jacinth Do you know why?

I don't know why but I don't care for benchmarks, I care about my own personal experience with the models

swift jacinth Aug 16, 2025, 3:44 AM

#

tepid garnet I don't know why but I don't care for benchmarks, I care about my own personal e...

True

delicate iron Aug 16, 2025, 4:57 AM

#

how good is gpt-oss? is it even any good?

shut pollen Aug 16, 2025, 5:35 AM

#

tepid garnet I have used them all in LM Studio

What model do you use the most?

tepid garnet Aug 16, 2025, 5:35 AM

#

shut pollen What model do you use the most?

gpt-oss-120b

shut pollen Aug 16, 2025, 5:35 AM

#

tepid garnet gpt-oss-120b

nice

#

I gotta clean up my models, just going to keep a select few for now

idle violet Aug 16, 2025, 8:57 AM

#

delicate iron how good is gpt-oss? is it even any good?

same standard as the rest of chatgpt

#

its alright

tepid carbon Aug 16, 2025, 11:44 AM

#

tepid garnet gpt-oss-120b

bro mac studio is 9,760.50 USD bro bro thats soo much man bro it isnt any better the avg kidney is about 5K usd and thats more how

tepid garnet Aug 16, 2025, 11:45 AM

#

tepid carbon bro mac studio is 9,760.50 USD bro bro thats soo much man bro it isnt any better...

I am running gpt-oss-120b on my MacBook Pro, M2 Max with 96GB RAM

tepid carbon Aug 16, 2025, 11:46 AM

#

oh

#

i just need to get 256 GB of ram and i should be able to run gpt oss 120B

swift jacinth Aug 16, 2025, 12:12 PM

#

tepid carbon i just need to get 256 GB of ram and i should be able to run gpt oss 120B

Well then maybe compose a server, that'd be cheaper

hazy sequoia Aug 16, 2025, 12:35 PM

#

swift jacinth Is Gpt-oss better than Qwen 30b 2507 thinking? Don't know why gpt-oss rank so lo...

I find gpt-oss to be like GPT-5 in that it is very smart and both are the smartest models in their respective domains (open source and frontier) but they are very straight shooting and don’t write in a particularly interesting way. That is shown in both jumping up on LM arena when style control is on vs off

swift jacinth Aug 16, 2025, 12:40 PM

#

hazy sequoia I find gpt-oss to be like GPT-5 in that it is very smart and both are the smarte...

Great! That's what an assistant should do. Be straight to solve the user's problem

hazy sequoia Aug 16, 2025, 2:29 PM

#

swift jacinth Great! That's what an assistant should do. Be straight to solve the user's probl...

Yes I fully agree thats why ive liked gpt-5 as I use it for code and I think the people using as a friend or therapist dont

tepid carbon Aug 16, 2025, 3:00 PM

#

swift jacinth Well then maybe compose a server, that'd be cheaper

oh let me get 256 GB of ram

hazy sequoia Aug 16, 2025, 4:57 PM

#

tbf i don’t get the purpose of running it locally when 20b exists and cerebras have the wild speeds they do

delicate iron Aug 16, 2025, 8:43 PM

#

How can I run gpt oss 120b on GeForce rtx gpus

#

I don't want a macbook

toxic crow Aug 16, 2025, 9:54 PM

#

Why does GPT-OSS care so much about a "policy" to the point where it even refuses to write complete implementations?? 😭

#

Is the 120B model better? I haven't been able to try that since my rig can only run the 20B locally..

#

???

#

"Large Code Requests" is against policy??

#

I explicitly told it that the code provided will be looked over and refined.

#

Update: It somewhat complies when using a temperature of 1, aka the max.. I said "somewhat" because it still blocks it occasionally.. Ima try some other model for the time being. I might come back to GPT-OSS at some point.

steel vine Aug 16, 2025, 10:19 PM

#

what you want is an abliterated fine tune of gpt-oss:20b, just search huggingface co models for gpt-oss and abliterated

#

open weight model refusals is a bug that can be fixed

hazy sequoia Aug 17, 2025, 12:36 AM

#

delicate iron How can I run gpt oss 120b on GeForce rtx gpus

with a lot of money

#

or a very heavily quantizied and yet still slow model

#

and still lots of money

raven sierra Aug 17, 2025, 3:13 AM

#

But can i train it on my data, and is it open-sources?

scarlet acorn Aug 17, 2025, 5:54 AM

#

toxic crow ???

This seems to be one of common complaints of gpt oss, it’s been so safety optimised that it ends up refusing harmless requests due to real or imagined “policies”

#

Said policies that may or may not even exist

hazy sequoia Aug 17, 2025, 11:14 AM

#

raven sierra But can i train it on my data, and is it open-sources?

gpt-oss isn’t open source either it’s only open weight but i get ur point

tender latch Aug 17, 2025, 11:30 AM

#

toxic crow Update: It somewhat complies when using a temperature of **1**, aka the max.. I ...

they're all like that just read the blog by Eric Hart on uncensored models maybe you can make your own using the same method he outlined ollama 2 has an uncensored version already so its definitely doable

hazy sequoia Aug 17, 2025, 11:50 AM

#

tender latch they're all like that just read the blog by Eric Hart on uncensored models maybe...

the ai company i own has produced a uncensored version, but the main issue we are having is it is very much completely uncensored and so we are currently debating how to release it without it being a large liability for us and causing harm

#

https://cdn.discordapp.com/attachments/1344032690353209456/1406606465476595712/IMG_8394.png?ex=68a313da&is=68a1c25a&hm=9c1fa88b1da3dbd0c6ac7ff879ff3e3f273b8fba11f0fd76049411e8ef5afb1d&

#

as I know some people want an uncensored version badly I’ll detail how we made it and if you are technical enough to follow I trust you probably won’t do something bad with it:

we found a dataset of NSFW stories, and completions on huggingface
We used qwen 235b A22B 2507 thinking to synthetically 10x the size of the dataset so we finished with 1 million rows of NSFW completions
We used the same qwen model to generate natural language human prompts which may have been used to generate the completion
We filtered using keywords for any rejections or other very negative things (we found the original dataset we used had one or two illegal stories so we removed them)
We now have an SFT ready dataset
Train using SFT, we used a 8x H100 cluster, if doing this for 120b you’ll need more (maybe not anymore due to unsloth but we did this prior to when they released their fine tuning of it)
Train for a long time, you are looking for a CE training loss of around 1.3 which is very good and validation accuracy to be pretty high too, if validation loss plateaus or is low whilst CE loss is going down or if loss is below 1 you are probably overfitting and need a larger dataset or shorter training run.
You can test the model, it will likely be decent now. However, it will likely suck a bit at non-NSFW tasks, so we did a short refinement SFT run on the wildchat dataset which is an uncensored but not specifically nsfw dataset to relearn some general QA ability.
We then finalised training with GRPO on a separate multi million row prompt dataset we created, reviewing every prompt with gpt-oss-120b on openrouter (cerebras 20k tok/s means this didn’t slow us down) and getting it to determine whether it followed the users instruction or rejected it.

#

and at the end you have a completely uncensored model which retains all of the intelligence of the original - we are still benchmarking for a paper we are going to release but so far there has been no significant loss in any field and all are within expected variation

tepid garnet Aug 17, 2025, 12:02 PM

#

I have access to various uncensored models, I like gpt-oss as it is

hazy sequoia Aug 17, 2025, 12:04 PM

#

tepid garnet I have access to various uncensored models, I like gpt-oss as it is

fair enough, it’s just that gpt-oss is the most intelligent open source model so having an uncensored version of it is quite useful

tepid garnet Aug 17, 2025, 12:04 PM

#

hazy sequoia fair enough, it’s just that gpt-oss is the most intelligent open source model so...

what kind of things make it more useful by being uncensored?

hazy sequoia Aug 17, 2025, 12:06 PM

#

tepid garnet what kind of things make it more useful by being uncensored?

well i can’t talk about specifics about the main thing my company does but its very useful for our use cases and has increased user experience whilst decreasing our cost

#

it’s so fast and cheap to run compared to what we were doing previously

#

because previously no open source models were that good for what we were doing, so we were using a combination of grok 4 and gemini 2.5 flash but this new model works great now

#

and is way cheaper too

tepid garnet Aug 17, 2025, 12:07 PM

#

if the use-case is naughty AI companions then I can see the usefulness

hazy sequoia Aug 17, 2025, 12:08 PM

#

tepid garnet if the use-case is naughty AI companions then I can see the usefulness

it isn’t that but it shares characteristics like needing to write well, but for our use case it also needs to be very good for tool calling

celest smelt Aug 17, 2025, 12:23 PM

#

has anyones tried running gpt-oss on a raspberry pi? is that even possible?

hazy sequoia Aug 17, 2025, 12:36 PM

#

celest smelt has anyones tried running gpt-oss on a raspberry pi? is that even possible?

i mean i think the only way would be offloading onto an SSD but the performance would be absolutely abysmal

#

cuz last time i checked nvidia gpus aren’t supported and you can’t upgrade ram amount

#

so if both are true the answer is basically no

celest smelt Aug 17, 2025, 12:44 PM

#

hmm okay, bc im trying to figure out what i should use for my hackathon project. i basically wanted to make a little device that processes images and then feeds them to gpt-oss. i can run gpt-oss on my laptop just fine. so i guess i'll do the image processing part of the raspi and then send it to gpt-oss on my laptop and send back the result..? feels like im overcomplicating things

tepid garnet Aug 17, 2025, 12:45 PM

#

celest smelt hmm okay, bc im trying to figure out what i should use for my hackathon project....

gpt-oss does not support image input

celest smelt Aug 17, 2025, 12:46 PM

#

yes im aware, im doing image processing with opencv and then giving gpt-oss the result of that

hazy sequoia Aug 17, 2025, 12:46 PM

#

tepid garnet gpt-oss does not support image input

by default

#

you can add it

#

my research org are trying to make a 120b-v model currently it’s going pretty well

#

ocr can work but isn’t too great especially for non-text images

#

it’s best to use cross-attention adaptors

tender latch Aug 17, 2025, 1:08 PM

#

celest smelt hmm okay, bc im trying to figure out what i should use for my hackathon project....

are you looking for team mate i had a similar project idea funny enough

celest smelt Aug 17, 2025, 1:08 PM

#

im working solo 😁

tender latch Aug 17, 2025, 1:09 PM

#

yeah me too i am curious how far you got if at all

celest smelt Aug 17, 2025, 1:12 PM

#

tender latch yeah me too i am curious how far you got if at all

so far i have everything running on my laptop but i wanted to tinker with a raspberry pi and maybe construct something separate bc that would be cooler. but haven't decided where to take this project next. hbu?

hazy sequoia Aug 17, 2025, 1:14 PM

#

celest smelt hmm okay, bc im trying to figure out what i should use for my hackathon project....

i think that’s a smart way of doing it, just yeah use lmstudio or ollama and connect over local network on the pi

#

i think both allow openai api style requests (i know lmstudio does idk ollama) so you can just use default packages and change url to ur local ip

tender latch Aug 17, 2025, 1:15 PM

#

vllm should allow to run models on lower end hardware?

hazy sequoia Aug 17, 2025, 1:15 PM

#

tender latch vllm should allow to run models on lower end hardware?

i don’t think so afaik vllm just speeds up inference, to run a model on lower end hardware you need a quantized model like unsloths GGUFs

tender latch Aug 17, 2025, 1:16 PM

#

celest smelt so far i have everything running on my laptop but i wanted to tinker with a rasp...

like as an edge device? so for sensors cameras right?

tender latch Aug 17, 2025, 1:17 PM

#

hazy sequoia i don’t think so afaik vllm just speeds up inference, to run a model on lower en...

thanks i will have a look i have 12gb of vram so i just shy of the 16gb that is usually recommended for gpt oss 20b

hazy sequoia Aug 17, 2025, 1:21 PM

#

tender latch thanks i will have a look i have 12gb of vram so i just shy of the 16gb that is ...

https://cdn.discordapp.com/attachments/1344032690353209456/1406628882437636208/IMG_8409.png?ex=68a328bb&is=68a1d73b&hm=013fa049b572f9c5c6f7d2238338570ed6356d8165b44eca64ff880ababd3412&

#

you can likely run the 4bit quantised version

#

4KM will be best KXL will probably be too close

#

4bit you lose a little intelligence and is still pretty good but below that it gets a lot worse quickly

tender latch Aug 17, 2025, 1:25 PM

#

I think for me i wasn't going to do anything for images just have an ability to pull from an offline database using embedding my target for this project is using AI offline/limited internet access

tender latch Aug 17, 2025, 1:26 PM

#

hazy sequoia 4bit you lose a little intelligence and is still pretty good but below that it g...

probably just going to make something that doesn't need to be super smart like an offline google lens or pokedex if you will just for biology so super simple won't need a lot of reasoning mostly just want it to have fuzzy search

toxic crow Aug 17, 2025, 3:22 PM

#

tender latch they're all like that just read the blog by Eric Hart on uncensored models maybe...

I found an abbreviated version, but it doesn't support setting the reasoning level..

#

kinda annoying the base openai model is so strict compared to other ones lol.

#

I ended up switching to another company's model series.

hazy sequoia Aug 18, 2025, 1:52 AM

#

i cannot wait for the day someone puts out an open source model with over 2m context length

#

like a 10m context length model with good recall across the input would be sooo good

#

if genuinely pay claude 4 sonnet prices for a model of gpt-3.5 intelligence with 10m context length

#

i hope openai in future release focuses on context length

#

tbh it seems like google is the only main lab to care about it

stark glen Aug 18, 2025, 2:04 AM

#

hazy sequoia tbh it seems like google is the only main lab to care about it

i reckon google will get there first

hazy sequoia Aug 18, 2025, 2:20 AM

#

stark glen i reckon google will get there first

yeah it will either be google or some tiny lab nobody had heard of up to that point

#

cuz like i feel as though google could release a 14b param model with 10m context reasonably easy if they wanted to

copper grove Aug 18, 2025, 8:11 AM

#

You have to keep in mind that Google is also incurring many losses to conquer the market. They're too big to fail and can afford to do so for years to come, but at some point they'll still have to switch to a more profitable business model. Relying on Google for the long term just because they currently have one of the largest context limits is the wrong approach imho

copper grove Aug 18, 2025, 9:01 AM

#

They definitely do internally, but definitely not from their public AI plans or API pricing

#

One could argue that they still do profit from resulting training data, but I'm not so sure about that

hazy sequoia Aug 18, 2025, 11:23 AM

#

I disagree completely with this. You can drop 900k tokens into AI studio and get 100k tokens back from 2.5 Pro, it you used the API this should cost like $4 but it’s free and I do this a lot and am yet to encounter any rate limit

#

bear in mind that meta has released a 1m context open source model so it’s just that google once released a 2m context model

#

and we know it’s not the one they use for search (it’s 1.5 pro) because it is very slow

celest smelt Aug 19, 2025, 5:50 AM

#

https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming
just found out about this 🤯 anyone participating?

Red‑Teaming Challenge - OpenAI gpt-oss-20b

Find any flaws and vulnerabilities in gpt-oss-20b that have not been previously discovered or reported.

halcyon light Aug 19, 2025, 6:44 AM

#

hey, just wondered, is when was gpt oss last database update

tepid garnet Aug 19, 2025, 6:47 AM

#

halcyon light hey, just wondered, is when was gpt oss last database update

My training data includes information up to June 2024. Anything that happened after that may not be reflected in my responses.

halcyon light Aug 19, 2025, 6:48 AM

#

thanl you, just when i tried it on ollama, it only gave stuff about 2021, and it said its database was last updated in 2021

tepid garnet Aug 19, 2025, 6:49 AM

#

tepid garnet Aug 19, 2025, 6:51 AM

#

halcyon light thanl you, just when i tried it on ollama, it only gave stuff about 2021, and it...

it knows about King Charles As of today (2025), the monarch of Australia is King Charles III, who acceded to the throne on 8 September 2022 after the death of Queen Elizabeth II. The King’s role in Australia is largely ceremonial; his duties are carried out locally by the Governor‑General, who acts as his representative.

halcyon light Aug 19, 2025, 6:52 AM

#

thats weird, maybe because i was running it locally

tepid garnet Aug 19, 2025, 6:53 AM

#

halcyon light thats weird, maybe because i was running it locally

I am running it locally

halcyon light Aug 19, 2025, 6:53 AM

#

well, thanks, ill see what i can do with it

halcyon light Aug 19, 2025, 6:53 AM

#

tepid garnet I am running it locally

ill try again

#

thanks

lusty mango Aug 19, 2025, 1:50 PM

#

They may not be giving you the entire web page just a few paragraphs per websites

#

You gotta send me the websites but it can be from 40k to 150k tokens total

#

Depend of the websites and when they cut the text extract per website

#

We can make a guess with the websites but we can't know when they cut the extract text for each website

#

I've already done AI search engine on nodejs is complicated

#

Ye idk the context but I'm just replying that message

#

Without any api key, only duckgo and seaxrng

#

Is complicated, you need to first know how to get the websites then Get the important information, and with Qroq I did a deep research to look for more info and websites according to how much it was found

#

Filter html, summarize text etc

#

The truth is that I don't use it but I am an experienced coder without depending on an api key what I do that only way unless you use pupeppter or some headless browser

#

For now it is in development then I will have to do all that for optimization, you have to make a list of ad website blocks and well it is complicated as I said

#

In nodejs it's fast and if I use bun it's even more

#

Well I gotta go, I cya

pulsar cobalt Aug 19, 2025, 2:05 PM

#

halcyon light thats weird, maybe because i was running it locally

Check your tokenizer {chat_template.jinja}

inner raptor Aug 19, 2025, 7:43 PM

#

sure pal

lusty mango Aug 19, 2025, 11:01 PM

#

@ocean current @inland panther

solar tree Aug 20, 2025, 11:25 AM

#

#

At this point, censorship is to heavy.

#

I said a Time Travel machine, not a home Nuke bomb.

tepid garnet Aug 20, 2025, 11:32 AM

#

solar tree I said a Time Travel machine, not a home Nuke bomb.

solar tree Aug 20, 2025, 11:35 AM

#

tepid garnet

tepid garnet Aug 20, 2025, 11:36 AM

#

try this precise prompt let's talk about a time travel machine

solar tree Aug 20, 2025, 11:39 AM

#

Sadly i have to go, but will try tonight.

#

It's very weird.

#

On high reflexion, he is like, this technology can't be harmful so.. No, i don't talk about this, i must refuse.

solar tree Aug 20, 2025, 11:40 AM

#

tepid garnet try this precise prompt ```let's talk about a time travel machine```

I did earlier

tepid garnet Aug 20, 2025, 11:42 AM

#

solar tree I did earlier

precisely, 100% the same

solar tree Aug 20, 2025, 11:44 AM

#

tepid garnet precisely, 100% the same

At few prompt yes, but i will like this.

#

But will be a problem if i have to use precise prompt.

#

Especially with reflexion.

tepid garnet Aug 20, 2025, 11:50 AM

#

GPT 5 will help

hot anvil Aug 20, 2025, 4:50 PM

#

solar tree

I think you might have been flagged. Did you try any harmful prompts recently?

strange yacht Aug 20, 2025, 5:20 PM

#

hot anvil I think you might have been flagged. Did you try any harmful prompts recently?

That's OSS, it can't "flag" you, its a local model, and if it did "flag" you, you can just reinstall and it should be fixed.

solar tree Aug 20, 2025, 5:28 PM

#

hot anvil I think you might have been flagged. Did you try any harmful prompts recently?

No, and it was a new chat 🙂

strange yacht Aug 20, 2025, 5:36 PM

#

There is no web interface, OSS is not a model that runs on the web, OSS is a model openAI released to run locally specifically. There is no connection to the web unless you give it to the model yourself

#

Thats a playground... not explaining this

hot anvil Aug 20, 2025, 5:43 PM

#

strange yacht Thats a playground... not explaining this

Soka, my bad then. I use it regularly just for fun.

hybrid magnet Aug 20, 2025, 8:07 PM

#

I think it's reasonable to bug report that time travel response if anyone cares to do so.

The models are probablistic, they don't always answer exactly the same way, and may some % of the time even comply harmfully - or refuse appropriate stuff. In this case Robert screenshot from the system card for the model showing it cooperating with a time machine discussion request, that could have been 'OpenAI checked once, it passed' which... sometimes happens with a lucky new chat window.

So, letting them know that this is getting refused 'noticeably often' is reasonable, I think.

solar tree Aug 20, 2025, 8:39 PM

#

hybrid magnet I think it's reasonable to bug report that time travel response if anyone cares ...

I did on Huggingface, now for the rest, i don't really know where to report it.

#

Because yeah, i tried again earlier and no.. Still doesn't wanted to talk about it.

#

I even tried different things, like just talking about multivers and no, considered as dangerous.

#

I even saw that on HuggingFace.. Like.. For real ?

strange yacht Aug 20, 2025, 8:48 PM

#

solar tree I even tried different things, like just talking about **multivers** and no, con...

I would need to look into it again, but I believe these had to be safeguarded because there were a few people who believed they made new discoveries and because of the AI's sycophancy they ended up doing things that aren't to be discussed in this server, but I can say some of them went offline permanently due to it.

I think those are the reasons why those things (which cannot really be studied as simply as that) are railguarded.

HOWEVER the reason GPT5 might be able to give answers is because of how big of an improvement on the detection of these issues there was with GPT5, so the "guardrails" might have been slightly removed.

Also side note, GPTOSS and GPT5 are 2 entirely different models and one of them giving you an answer does not mean the other one will.

hybrid magnet Aug 20, 2025, 8:52 PM

#

solar tree I did on Huggingface, now for the rest, i don't really know where to report it.

Aha! #1070006915414900886 is one place you can. https://openai.com/form/chat-model-feedback/ is a way to privately discuss model behavior concerns, and I use that form for ChatGPT - they make you pick stuff from dropdowns and chatGPT is not there (OSS isn't either) - pick anything, explain in the type-in fields what model you're using and describe what and why you're reporting - in this case I'd describe and quote the model system card too where it shows OpenAI showing that the model is expected to output one way, but is instead outputting differently

left wadi Aug 22, 2025, 12:09 PM

#

halcyon light thats weird, maybe because i was running it locally

Its knowledge cut-off should be June 2024 (or around there). It should know about anything prior to that.

robust swallow Aug 23, 2025, 1:40 PM

#

gpt oss on cerebas keeps doing this

Error during API call (attempt 1/3): No content returned
Error during API call (attempt 2/3): Error code: 400 - {'message': 'Model generated a tool call which was not the list of tools.', 'type': 'invalid_request_error', 'param': 'tools', 'code': 'wrong_api_format'}
Error during API call (attempt 3/3): No content returned

hazy sequoia Aug 24, 2025, 9:38 PM

#

robust swallow gpt oss on cerebas keeps doing this Error during API call (attempt 1/3): No con...

i have found tool usage for gpt-oss on cerebras to be sooo temperamental

#

in the end for a project i was working on, i just really strongly instructed it in the system prompt to reply in just json, validated if it was valid JSON, if yes then continue, if no then feed back into a cheap low latency model (i used 2.5 flash lite but you could probably just use oss-20b) and get it to fix it cuz 2.5 flags lite has structured outputs so it will always reply with actual json

#

and the i just used that json to construct own tool usage stack

steel vine Aug 24, 2025, 9:43 PM

#

the instructor (and instructor-rs) modules force json output for any llm. by default it tends to lean on tool calling. but for stuff like gpt-oss and openai models in general, it can make use of the structured output capability

hazy sequoia Aug 24, 2025, 10:51 PM

#

steel vine the instructor (and instructor-rs) modules force json output for any llm. by def...

gpt-oss with cerebras doesn't have structured outputs tho as far as I know?

#

i had to use cerebras because for that usecase speed was upmost priority

#

so even passing it through the slightly slower 2.5 flash lite was faster than generating the main response with groq or something even slower

steel vine Aug 24, 2025, 10:53 PM

#

dunno https //inference-docs.cerebras ai/capabilities/structured-outputs

hazy sequoia Aug 24, 2025, 10:53 PM

#

oh yeah thats odd

#

i wonder why they dont offer it through openrouter

steel vine Aug 24, 2025, 10:54 PM

#

middlemen are out to make profits not implement features

hazy sequoia Aug 24, 2025, 11:24 PM

#

steel vine middlemen are out to make profits not implement features

nah openrouter is genuinely useful

#

and from my experience with them i think it’s more likely a cerebras issue than openrouter

#

i think given the services they offer openrouter charges a very fair rate

steel vine Aug 24, 2025, 11:26 PM

#

more fair than when the co-founder was becoming a billionaire off nft's

hazy sequoia Aug 24, 2025, 11:40 PM

#

eh i mean i think nfts are dumb but if people bought them willingly i dont really see an issue profiting from it

robust swallow Aug 25, 2025, 6:18 AM

#

hazy sequoia gpt-oss with cerebras doesn't have structured outputs tho as far as I know?

It does have tool calling..?

#

Sometimes gpt oss messes up and generates channel commentary token again in tool call name, making the tool call not go through

#

But even worse issue I believe is that it doesn't provide content at all

#

Only .reasoning always has something but quite often I see both content and tool calls be empty

hazy sequoia Aug 25, 2025, 9:44 AM

#

robust swallow It does have tool calling..?

structured outputs and tool calls are different

robust swallow Aug 25, 2025, 9:44 AM

#

i want tool calls

hazy sequoia Aug 25, 2025, 9:44 AM

#

my point is I had to use tool calls to create structured outputs for gpt-oss

#

yeah im just saying i think theyve had issues with the model doing stuff like that

#

like if they havent added structured outputs i dont think its unreasonable they had issues trying to implement tool calls

robust swallow Aug 25, 2025, 9:45 AM

#

m

astral gate Aug 26, 2025, 3:58 PM

#

hi! I'm currently doing the red-teaming challenge and I found a few severe vulnerabilities. I wanted to ask if Kaggle is the correct place to disclose these, or whether I should disclose them through another channel

#

if anyone at OpenAI might have some recommendations on this, would be much appreciated

cyan kite Aug 26, 2025, 3:59 PM

#

what challenge? if there is an active one there should already be info on how to post em

astral gate Aug 26, 2025, 4:04 PM

#

cyan kite what challenge? if there is an active one there should already be info on how to...

the one on Kaggle https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming

the instructions say to submit everything there, though just wanted to make sure if it's fine submitting content that might have "unsafe" content (given that it's a red-teaming hackathon)

Red‑Teaming Challenge - OpenAI gpt-oss-20b

Find any flaws and vulnerabilities in gpt-oss-20b that have not been previously discovered or reported.

wet tendon Aug 26, 2025, 4:41 PM

#

@astral gate there is not big chances you get official response here.. so better follow the rules (from the link).. that is the correct one

#

but i think is getting to close

#

hope u found something and have good luck!: )

north vault Aug 27, 2025, 4:30 PM

#

Is GPT-oss available on iPhone??

sturdy ridge Aug 27, 2025, 4:39 PM

#

north vault Is GPT-oss available on iPhone??

GPT-OSS are two local LLMs, meaning models that people with access to sufficiently powerful devices can download and run themselves (i.e., not on ChatGPT). AFAIK no iPhone would be able to run either the larger or smaller variants of OSS - you'd need something like a fairly beefy computer to run either (especially the 120b variant). You can learn more about GPT-OSS here: https://openai.com/open-models/

north vault Aug 27, 2025, 4:40 PM

#

sturdy ridge GPT-OSS are two **local** LLMs, meaning models that people with access to suffic...

Ok, Thank you!

north vault Aug 27, 2025, 4:41 PM

#

sturdy ridge GPT-OSS are two **local** LLMs, meaning models that people with access to suffic...

Do you know if an intel core i3 is able to run GPT-OSS?

#

(PC)

sturdy ridge Aug 27, 2025, 4:48 PM

#

north vault Do you know if an intel core i3 is able to run GPT-OSS?

There are two variants of GPT-OSS, 20b and 120b, and ability to run them will depend on how much VRAM you have. More info here in the cookbook: https://cookbook.openai.com/articles/gpt-oss/run-transformers#pick-your-model

north vault Aug 27, 2025, 4:49 PM

#

sturdy ridge There are two variants of GPT-OSS, 20b and 120b, and ability to run them will de...

Alr ty!

ruby kite Aug 27, 2025, 10:50 PM

#

I've been primarily playing with local models on my Android device using Google Edge AI Gallery app, which requires a '.task' file to run. Does anyone know if that file version exists for GPT OSS? (fyi, I'm a complete noob when it comes to running local models, that why I use the 'simple' GEAG)

languid grove Aug 27, 2025, 11:50 PM

#

I'm running gpt-oss:120b on an 80GB A100, but notice that inference takes a long time. I'm getting like 10 tokens/second, with gpt-oss:20b it's like 15/sec. I confirmed that the GPU is being used. Any idea why this might be happening?

steel vine Aug 28, 2025, 1:52 AM

#

if using max context its probably spilling over to the cpu

tepid garnet Aug 28, 2025, 2:07 AM

#

languid grove I'm running gpt-oss:120b on an 80GB A100, but notice that inference takes a long...

you should be getting better performance than that, I get 20 to 27 tk/s on my MacBook Pro, M2 Max, 96GB RAM

languid grove Aug 28, 2025, 3:11 AM

#

I allowed 62k output tokens, and ~4.7k tokens were produced in 6 minutes. I then limited output tokens to 500 and it took 43 seconds. No clue

dapper shard Aug 28, 2025, 6:36 PM

#

Hey guys if anyone's training gpt-oss locally we at Unsloth just released a new update to support 60K context for it on a 80GB GPU! slothhug
We also collabed with Hugging Face to fix some implementation issues in transformers!

burnt shore Aug 29, 2025, 2:59 PM

#

Is anyone running the big 120B model on a much smaller GPU, dipping into tonnes of RAM (Windows)?

#

I have a 3080 with only 10Gb, but I'm finding dipping into system ram on the 20B model is actually still quite fast and very usable.
I'm curious if I upped my system ram to DDR5-6000 speeds and 64Gb or more... would filling that system ram up with a model yield usable results or would it be so slow Id never use it.

strange yacht Aug 29, 2025, 3:03 PM

#

burnt shore I have a 3080 with only 10Gb, but I'm finding dipping into system ram on the 20B...

It would be slow, very slow, it would run and eventually give you answers, but if you set that reasoning to medium or high you would probably not get fast results, BIG probably though

burnt shore Aug 29, 2025, 3:29 PM

#

strange yacht It would be slow, very slow, it would run and eventually give you answers, but i...

OK thanks! This will save me buying new RAM just to discover it would be too slow.

wet tendon Aug 29, 2025, 7:32 PM

#

what is a good "consumer" specs to run them? so that are usable, no need for conversational speeds. My need is to parsing texts (lets say 1-2 pages) and extract structured data... (and are they supporting structure data output?)

steel vine Aug 29, 2025, 7:36 PM

#

16gb of vram. less with a smaller context. it supports grammar constrained output

wet tendon Aug 29, 2025, 7:38 PM

#

i m about to build a new pc.. and i think will go for a 5070ti 16gb (and 64gb ram or 128 i dunno).. And i m wondering if is worth to go for bigger gpu.. or just save up for other parts and then use cloud gpus

steel vine Aug 29, 2025, 7:38 PM

#

although someone on reddit reckons they got an unsloth fine tune that does 60k context in under 13gb

wet tendon Aug 29, 2025, 7:39 PM

#

i m thinking that in very near future (2-3 years) we ll have different technologies for all that..

#

so a 5090 is very overkill (for my money)

steel vine Aug 29, 2025, 7:39 PM

#

yeah I'm buying amd 395+ because it is better for llm than gaming

#

112gb vram (equivalent of unified memory) and costs less than a nvidia gpu

#

basically if you want larger llm like 120b you buy a unified memory system. if you want gaming you get smaller llm like 20b.

wet tendon Aug 29, 2025, 7:42 PM

#

mm nice ye.. cheers

#

i must just sit and calculate the usage... to compare if is ok to just stay with cloud and api or go local

next fog Aug 31, 2025, 2:09 PM

#

wet tendon i m thinking that in very near future (2-3 years) we ll have different technolog...

Start saving for the 5090 // mobo // PSU now. 🙂
the 20b model uses about 14-16 GB vram.
The 5090 will be relevant for quite a few years I believe. (32 GB VRAM )

64 gb ram should be plenty assuming your using the 32gb sticks and the correct 2 slots. You will get slower speeds if you load all the DIMM slots.
Are you building a PC or a budget work station?

I could help answer all of your computer goals.

wet tendon Aug 31, 2025, 2:16 PM

#

cheers and thanks for offer.. (i m in pc since decades.. yep i know).. for performance of x specs though ye help is welcome.. as on hands experience someone has is better than the benchmarks around..

hazy sequoia Sep 2, 2025, 9:48 AM

#

wet tendon i m about to build a new pc.. and i think will go for a 5070ti 16gb (and 64gb ra...

i would honestly look at and consider 40 series rather than 50 series

#

it depends on if your just doing ai stuff or games too

#

if your doing mix of ai and games I’d go with whatever 40 series card is same price as the 5070ti

#

maybe like 4080 idk the pricing

#

because 4080 and above have 16gb of vram

#

and although somewhat useful, the extra ai cores in 50 series are less helpful than vram

#

if your solely doing ai then yeah fine 50 series better

#

but for ai and gaming just find an equivalent price equivalent memory 40 series card imo

#

Also imo 64gb is completely fine for ai or gaming because for AI if your needing to load the model into vram it’s going to be wayyy too slow anyway and for gaming 64gb is fine for i think literally every games recommended

steel vine Sep 2, 2025, 10:06 AM

#

ai is the game just buy unified memory system

wet tendon Sep 2, 2025, 11:14 AM

#

hazy sequoia it depends on if your just doing ai stuff or games too

5070ti that i m thinking is with 16gb also (yes at start i was thinking for 4060ti 16gb) but i think 50xx for little bit more future proof

hazy sequoia Sep 2, 2025, 11:25 AM

#

wet tendon 5070ti that i m thinking is with 16gb also (yes at start i was thinking for 4060...

no my point is that the only difference is more ai cores

#

so if your doing ai a lot then yeah fine

#

but if your doing anymore than 40% gaming 40 series is just as powerful and just as future proofed

wet tendon Sep 2, 2025, 11:31 AM

#

ah.. no not much gaming.. and what i ll play is not high resource (unless gta 6 whennnn will come out xd)

tepid garnet Sep 2, 2025, 11:39 AM

#

the best value for money when it comes to running inference is a Mac. I have a MacBook Pro, M2 Max with 96GB RAM and it runs gpt-oss-120b without breaking a sweat

hazy sequoia Sep 2, 2025, 12:18 PM

#

tepid garnet the best value for money when it comes to running inference is a Mac. I have a M...

mac mini/studio is incredible value 100%

#

and macbooks are decent too

civic hamlet Sep 2, 2025, 5:16 PM

#

hazy sequoia and macbooks are decent too

MacBook Pro M3 Pro 36GB RAM and it runs the 20B without even getting hot, so yeah I agree 100%

tepid carbon Sep 3, 2025, 8:47 AM

#

sturdy ridge GPT-OSS are two **local** LLMs, meaning models that people with access to suffic...

You need a android if you ever want to run any local llm and install termux and you can install a small LLM via ollama

tepid carbon Sep 3, 2025, 8:52 AM

#

hazy sequoia mac mini/studio is incredible value 100%

Yeah but i dont want to sell my kidney and eyes just to get a 2K USD to run a AI

hazy sequoia Sep 3, 2025, 11:59 AM

#

tepid carbon Yeah but i dont want to sell my kidney and eyes just to get a 2K USD to run a A...

well yeah but for equivalent GPU VRAM you’d have to pay like 20k USD

#

that’s the point

#

i never said it’s cheap

#

i said it’s incredible value

#

which is true

hazy sequoia Sep 3, 2025, 11:59 AM

#

tepid carbon You need a android if you ever want to run any local llm and install termux an...

there’s ios apps that let you run local LLMs

#

one for example is called apollo by liquid ai

left wadi Sep 3, 2025, 5:21 PM

#

sturdy ridge GPT-OSS are two **local** LLMs, meaning models that people with access to suffic...

A 1-bit or 2-bit quant might fit.

left wadi Sep 3, 2025, 5:22 PM

#

tepid carbon You need a android if you ever want to run any local llm and install termux an...

There's plenty of apps to run local LLMs on iOS.

left wadi Sep 3, 2025, 5:23 PM

#

north vault Is GPT-oss available on iPhone??

You might be able to get it to work with a really low quant.

strange yacht Sep 3, 2025, 5:23 PM

#

left wadi You might be able to get it to work with a really low quant.

I don't think that would be worth it at all ngl

left wadi Sep 3, 2025, 5:24 PM

#

strange yacht I don't think that would be worth it at all ngl

I agree but I think if you really wanted to you could probably make it work.

strange yacht Sep 3, 2025, 5:24 PM

#

Yeah probably

left wadi Sep 3, 2025, 5:24 PM

#

Now as for the 120b, that would be way more difficult. But still possible.

left wadi Sep 3, 2025, 5:25 PM

#

left wadi You might be able to get it to work with a really low quant.

Or you could constantly unload parts of the model, load the next part, run, unload, load next part, etc...

#

Same for 120b.

#

But that would be incredibly slow.

#

I mean so so slow it would take an hour for a full response.

strange yacht Sep 3, 2025, 5:28 PM

#

left wadi I mean so so slow it would take an hour for a full response.

A very short response

left wadi Sep 3, 2025, 5:28 PM

#

Depends on which model and what quant.

#

20b at fp4 running with load unload you could probably get a satisfactory response after an hour on a modern iPhone GPU.

#

120b at fp4 running like that after an hour you'd be lucky to have one paragraph.

#

120 at 1-bit like that after an hour you'd have somethind decent.

rigid needle Sep 3, 2025, 9:58 PM

#

💡 Running for example 30B LLM in 8bit :

Memory needs: ~30 GB for weights + ~8–12 GB overhead/KV cache → ~40 GB VRAM/Unified RAM minimum.

🔹 PC / NVIDIA path
✅ Easiest: 1x 48 GB GPU (RTX A6000, RTX 6000 Ada, etc.).
✅ Budget: 2x 24 GB GPUs (3090/4090, A5000)
⚠️ 3–4x 12–16 GB works but is messy + bandwidth bottleneck.
👉 Alternative: 4bit quant → fits in ~20-24 GB, runs on a single 24 GB card.

🔹 Apple / M-series path
Memory is unified (system RAM = VRAM).
❌ Mac Mini / M3 Pro too small (≤36 GB).
✅ MacBook Pro M3 Max (96–128 GB RAM) → can run 30B 8bit fine, portable.
✅ Mac Studio M2 Ultra (128–192 GB RAM) → workstation-class, can even handle 65B in 8bit.

🔑 Bottom line:
For 30B in 8bit you need ~40 GB effective memory.
Best options: 1x 48 GB GPU on PC or a Mac with 128 GB unified RAM.
If you only have 24 GB cards → go 4-bit for practical use.

(Note: if you care about 24/7 stability or scientific workloads, pro cards with ECC VRAM (A6000, A100, etc.) are safer than 3090/4090. Consumer GPUs don’t have ECC, so occasional memory errors are possible at 30B scale.)

#

💡 Running a 120B LLM in 8bit:

Memory needs: ~120 GB for weights + ~50-100 GB overhead/KV cache → ~170-240 GB VRAM/Unified RAM minimum.
(4bit cuts this roughly in half → ~120–160 GB).

🔹 PC / NVIDIA path
✅ Practical 8bit: 3x 80 GB GPUs (A100/H100 class, with NVLink).
✅ Comfortable: 4x 80 GB = 320 GB for longer contexts & batching.
⚠️ 4x 48 GB = 192 GB can barely fit with very short contexts, but tight.
👉 4bit mode works on 2x 80 GB or 3x 48 GB, with careful KV/cache tuning.

🔹 Apple / M-series path
❌ Mac Mini / MacBook Pro ≤36 GB → far too small.
⚠️ Mac Studio M2 Ultra (192 GB RAM) → 8bit too tight, but 4bit might run with short contexts; not ideal.

🔑 Bottom line:
For 120B in 8bit, think ≥240 GB VRAM (3×80 GB) to be usable, 320 GB if you want headroom.
If you can’t reach that, go 4bit.
ECC GPUs (A100/H100) strongly recommended at this scale.

#

💡 Running 30B & 120B at full power AND training on the side:

🔹 30B setup
✅ Best: 8x 80 GB (A100/H100) w/ NVSwitch → max throughput inference + parallel LoRA fine-tune.
⚙️ Min: 4x 80 GB (NVLink) → solid inference, small side FT possible.

🔹 120B setup
✅ Practical: 16x 80 GB (1.28 TB VRAM) NVSwitch → 8bit inference w/ long contexts + FT.
⚙️ Lower bound: 8x 80 GB (NVLink/NVSwitch) → inference works, side FT tight.

🔹 Infra needs
Host RAM: 256 GB+ (30B) / 512 GB–1 TB (120B)
Storage: 8–40 TB NVMe @ 5 GB/s+
Network: 200–400 G (InfiniBand/ROCE)
ECC VRAM: essential at this scale 🚨

🔑 Bottom line:
30B full power + FT → aim for ≥4–8x 80 GB.
120B full power + FT → realistically 8–16x 80 GB.
PCIe-only splits work but bottleneck; NVLink/NVSwitch strongly recommended.

🚀 Hope this helps you guys figure out what’s realistically needed to run local models and plan the right setup for your own use cases 🎯
Feel free to ask if you need clarifications 🤠

tepid garnet Sep 3, 2025, 10:31 PM

#

This is gpt-oss-120b running on a MacBook Pro with 96GB RAM

rigid needle Sep 3, 2025, 10:41 PM

#

tepid garnet This is gpt-oss-120b running on a MacBook Pro with 96GB RAM

Nice!
Curious about the setup; which quant (Q4_K_M/Q5/Q6?), which engine (LM Studio/llama.cpp/MLX/vLLM), max context, and tok/s are you seeing?
On a 96 GB MBP, 120B @ 8bit won’t fit; 4bit can with short contexts and careful KV settings.
Also, the reply text in your screenshot (“I’m GPT-4 Turbo…”) usually comes from a remote GPT-4 endpoint, not a local 120B - could be just a frontend label though.
If it’s truly local, could you share the .gguf size, memory usage during gen (~60–80 GB expected for 4bit), and a quick benchmark (model id, quant, ctx, tok/s)?

Would love to add your numbers to a community sheet 🙌

tepid garnet Sep 3, 2025, 10:42 PM

#

https://huggingface.co/openai/gpt-oss-120b

openai/gpt-oss-120b · Hugging Face

#

that's the model I have loaded. On my MacBook Pro M2 Max with 96GB RAM

#

I am running LM Studio

#

Memory usage is in the lower bottom right of my screenshot

rigid needle Sep 3, 2025, 10:49 PM

#

tepid garnet https://huggingface.co/openai/gpt-oss-120b

Very nice!
If you’re running /gpt-oss-120b on an M2 Max (96 GB) via LM Studio, that almost certainly means 4bit with a short context.
8bit 120B won’t fit in 96 GB. The ~60 GB unified memory you’re seeing matches a 4bit GGUF (~60–70 GB) + small KV.

Could you share a few details so we can add your setup to a community sheet?
• Quant & file size (e.g., Q4_K_M, .gguf ≈ 60–70 GB)
• Max context & KV precision (FP16/FP8/INT8) + batch size
• Peak unified memory (Activity Monitor) during generation
• Throughput (tok/s) on a short prompt
• Engine settings in LM Studio (Metal on, threads, batch)

Quick sanity checks:
• Try one reply offline (Wi-Fi off) → if it still runs, it’s 100% local.
• In LM Studio, Show in Finder to confirm the GGUF size.

Let’s turn guesswork into a reliable table for everyone 🚀

tepid garnet Sep 3, 2025, 10:53 PM

#

I am just running the model as provided on Huggingface by OpenAI

#

inside LM Studio

#

#

I asked the model to design a time machine as a creative exercise and GPU/CPU usage is in the lower right hand side of this screenshot

#

16.66 tok/sec • 2376 tokens • 1.25s to first token

rigid needle Sep 3, 2025, 11:10 PM

#

Cheers for the screenshots, super interesting.
On an M2 Max (96 GB) in LM Studio, the ~60 GB unified memory you’re seeing lines up with a local 4bit GGUF load of /gpt-oss-120b (8bit simply wouldn’t fit).
Small nuance: the HF page lists PyTorch safetensors shards and shows an “8-bit precision” badge for the upstream release; LM Studio/llama.cpp will be using a GGUF quant locally, so the badge isn’t your actual runtime precision.

tepid garnet Sep 3, 2025, 11:12 PM

#

OpenAI state on the model card ```We’re releasing two flavors of these open models:

gpt-oss-120b — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)```

rigid needle Sep 3, 2025, 11:16 PM

#

Thanks for quoting the model card - that clarifies it.
/gpt-oss-120b is MoE: 117B total params, ~5.1B active per token.
• Compute scales with the active params (≈5.1B), not the full 120B.
• VRAM depends on how experts are stored/quantized; the official release is engineered to fit on a single 80 GB H100/MI300X in 8-bit, which matches the card.
• Your ~60 GB unified memory on a 96 GB M2 Max via LM Studio is exactly what we’d expect for a local 4-bit GGUF with a modest context.

If you’re up for it, could you share a few specifics so others can reproduce your setup 1:1?
• Exact GGUF file name + size (e.g., …Q4_K_M.gguf, ~60–70 GB)
• Context window and KV dtype (FP16 / FP8 / INT8) + batch size
• Any notable LM Studio settings (Metal on, threads, batch)
• Peak unified memory in Activity Monitor during generation
• Throughput (I see ~16.7 tok/s, 1.25 s to first token - nice!)

Quick sanity check for the thread (optional): try a short generation with Wi-Fi off - if it still responds, we can stamp it as fully local.
I’ll add your numbers to the living community sheet and share an updated summary alongside the other verified configs soon. 🚀

tepid garnet Sep 3, 2025, 11:17 PM

#

rigid needle Thanks for quoting the model card - that clarifies it. /gpt-oss-120b is MoE: 117...

mate, of course it is local, I am running it in LM Studio

#

you can see the files here https://huggingface.co/openai/gpt-oss-120b/tree/main

openai/gpt-oss-120b at main

rigid needle Sep 3, 2025, 11:27 PM

#

LM Studio can run either locally or via remote APIs, so “I’m using LM Studio” alone doesn’t prove it’s local. That’s why I mention the quick offline sanity check.
The HF tree shows the 14 safetensors shards (~60–65 GB), which matches your ~60 GB on the M2 Max. Makes sense for the MoE release; thanks for the data point!

tepid garnet Sep 3, 2025, 11:30 PM

#

rigid needle LM Studio can run either locally or via remote APIs, so “I’m using LM Studio” al...

here you go, LM Studio with no Internet

#

LM Studio can serve an API endpoint but cannot access remote LLMs as it doesn't have the functionality to access APIs

rigid needle Sep 3, 2025, 11:39 PM

#

Fair point brother!
The offline screenshot confirms it’s running locally in LM Studio. My earlier caution was a general note since some frontends can route to APIs; LM Studio specifically runs local models and can expose a local OpenAI-compatible endpoint, but doesn’t call remote LLM APIs.
Your ~60 GB on a 96 GB M2 Max lines up with a 4-bit GGUF of the MoE release. Thanks for sharing the proof! Maybe i should start working with LM Studio too 🚀🤠

tepid garnet Sep 3, 2025, 11:41 PM

#

rigid needle Fair point brother! The offline screenshot confirms it’s running locally in LM S...

OpenAI only released two sets of files, one for 20b and one for 120b, these are not quants

steel vine Sep 3, 2025, 11:43 PM

#

pre-training aware quants they are

Both GPT-OSS 20B and GPT-OSS 120B are technically quantized models, specifically using a quantization method called MXFP4 that is highly deployment-focused and "pre-training aware" in its approach

rigid needle Sep 3, 2025, 11:44 PM

#

You’re right! OpenAI’s HF release for 20B and 120B are safetensors checkpoints, not GGUF quants. LM Studio can load those locally via its Transformers/MLX runtime, which matches your setup (the ~60 GB RAM lines up with the shard size and the MoE design).
When folks mention “4-bit GGUF,” that’s an alternative local route via llama.cpp after converting the weights. I’ll log your config as: /gpt-oss-120b (MoE), LM Studio Transformers/MLX, official safetensors, ~60 GB RAM, ~16.7 tok/s ; thanks for clarifying!

#

The HF checkpoints use MXFP4, a training-aware 4-bit scheme ; different from post-training 4-bit (GGUF). That’s why the 120B MoE fits on an 80 GB card while keeping quality solid. 🤠

steel vine Sep 3, 2025, 11:47 PM

#

hang about. are we in bot mode again?

tepid garnet Sep 3, 2025, 11:47 PM

#

steel vine pre-training aware quants they are > Both GPT-OSS 20B and GPT-OSS 120B are techn...

I meant they are not 3rd party quants

steel vine Sep 3, 2025, 11:48 PM

#

yeah its not gguf packaged

tepid garnet Sep 3, 2025, 11:49 PM

#

steel vine hang about. are we in bot mode again?

I suspect we might be, I think @rigid needle may be getting help from an AI

steel vine Sep 3, 2025, 11:49 PM

#

You're right!

rigid needle Sep 3, 2025, 11:53 PM

#

Not a bot, just caffeinated and reading the model card + HF tree 😄 I sometimes draft notes to keep things concise, but I’m the one posting here.
So we’re aligned official safetensors (not third-party quants), not GGUF-packaged, uses MXFP4, and 120B is MoE (~5.1B active).

#

If anyone has local numbers (context / KV dtype / batch / tok/s / peak RAM), drop them here if you like. I’ll add them to a living sheet so folks can reproduce. 💪

rigid needle Sep 3, 2025, 11:58 PM

#

tepid garnet I suspect we might be, I think <@1227951282459906060> may be getting help from a...

Guilty as charged 😄 I use AI to draft/clean up notes and would be silly not to. But to make this clear - I’m still the one posting and I verify against the model card + HF repo. If you spot anything off, call it out and I’ll update.

tepid garnet Sep 3, 2025, 11:58 PM

#

rigid needle Guilty as charged 😄 I use AI to draft/clean up notes and would be silly not to...

let's just keep it human here, ok?

rigid needle Sep 4, 2025, 12:04 AM

#

tepid garnet let's just keep it human here, ok?

On OpenAI’s server asking to keep it human - noted 😄

tepid garnet Sep 4, 2025, 12:13 AM

#

rigid needle On OpenAI’s server asking to keep it human - noted 😄

where does one access your "living sheet"?

rigid needle Sep 4, 2025, 12:21 AM

#

I’m putting this together as a public Google Sheet (view-only + submission form).
I’ll drop the link here and pin it once it’s comprehensive enough to be useful.
If you’ve got more numbers (ctx, KV dtype, batch, tok/s, peak RAM), send them over and I’ll include them - these details help others reproduce your setup, and they’re often the ones folks leave out.

wet tendon Sep 4, 2025, 12:27 AM

#

#

(i m ok if a post/message is written and had assist from ai.. but i m not so much with just pure chatgpt copy/paste responses.. Is boring.. and especially if is unedited.. it is distracting and most times unecessery large.. eg above you gave back to back two times the same ai message asking for the specs..)

#

and please dont tell me 'I m right" 😛

tepid garnet Sep 4, 2025, 12:32 AM

#

wet tendon and please dont tell me 'I m right" 😛

@rigid needle was copy pasting a bit of AI stuff there for a while

rigid needle Sep 4, 2025, 8:31 AM

#

wet tendon

Yea that's true, but I didn’t spam the thread; I asked again because there was missing data, and as I said before: these details help others reproduce the setup and they're often the ones folks leave out.

#

Otherwise, the community sheet that I make for free for everyone won't be as good and I don't publish half-baked nonsense with some missing data!

Of course I copy and paste some, because why should I have to type everything all the time? Contrary to your assumption, I do edit some of it or extract it from data sets that I have already compiled. Not to mention that I'm from Austria and want to use grammatically correct English for you guys, because my native language is German, but okay 😌

It is often the case, someone always has something to complain about.
It's ironic that people on OpenAI's Discord server complain about users' supportive use of AI 🤣
pls don't make me lose the small amount of belief in humans!

So please, let's keep this objective and professional. I'm not here to waste time with small talk, but to do something productive for the community, because I don't need the sheet myself!

But I understand the point of view that things should be kept human and concise.

Lets get back to the data! Cheers.🤠

void seal Sep 4, 2025, 11:19 AM

#

What is the minimum specification of computer to run gpt locally?

rigid needle Sep 4, 2025, 12:49 PM

#

void seal What is the minimum specification of computer to run gpt locally?

Minimum PC specs for running a GPT locally depend on the model size.

Small Models (7B parameters): 8GB VRAM and 16GB RAM. This is the most common and accessible option.
Larger Models (20B+ parameters): You'll need more VRAM, like 16GB or 24GB and 32GB RAM.

The most important part is VRAM. The more VRAM you have, the larger and more capable the models you can run.
You can also run models on CPU/DRAM/SSD, but it's extremely slow and not recommended for a good user experience.

#

If you want also to train a model locally - it is extremely resource-intensive.
It's not something you can do on a standard gaming PC for large models.

Small Models (7B parameters with QLoRA/LoRA): You might get by with a high-end consumer GPU with at least 24GB of VRAM and 32GB+ RAM.
Larger Models (20B+ parameters): Forget about consumer hardware. You'd need multiple server-grade GPUs and a multi-GPU setup.

In short: Running a model needs a lot of VRAM. Training a model needs an insane amount of VRAM. For most people, fine-tuning small models is the only realistic option on a local machine.

toxic gate Sep 4, 2025, 1:49 PM

#

Any recommendations for OSS-20B to get the maximum using RTX 4080 with 16GB VRAM?

rigid needle Sep 4, 2025, 2:00 PM

#

toxic gate Any recommendations for OSS-20B to get the maximum using RTX 4080 with 16GB VRAM...

The RTX 4080 is perfect for OSS-20B!
The model was designed to run on 16GB of VRAM.

Here's how to get the most out of it:

Use the Right Software:

Ollama: The easiest way to get started. Just install it and run ollama run oss-20b.
It handles all the optimizations for you.

llama.cpp: More control for advanced users. Look for the GGUF version of the model.
it's highly optimized for your hardware.

Hugging Face: If you're into coding, make sure you're using the latest transformers library.

Key Optimizations:

Quantization: OSS-20B uses a special 4bit quantization (MXFP4) that lets its 20.9 billion parameters fit on your 16GB card.

Flash Attention & MoE Kernels: These are built-in optimizations that make the model run much faster, especially with long conversations.

Bottom line: Stick with Ollama or a GGUF version of the model using llama.cpp.
This will maximize your RTX 4080 performance and should give you an very good experience. 🤠

vestal vessel Sep 4, 2025, 4:21 PM

#

very unsettling

steel vine Sep 4, 2025, 7:58 PM

#

dead internet

wet tendon Sep 4, 2025, 8:45 PM

#

steel vine dead internet

it was such a lovey place... since '95 grow up with it.. i still have some times that i think it will keep be good place.. but other times is like "ok.. we doomed"

steel vine Sep 4, 2025, 8:46 PM

#

back in '95 the biggest concern was ppl faking their a/s/l. now our biggest concern is ppl arent even ppl

#

(ironically polishing gpt-oss discord bot for hackaton)

steel vine Sep 4, 2025, 9:39 PM

#

now you pretend to be 34?

wet tendon Sep 4, 2025, 9:51 PM

#

hahah gold days (the mirc ones..)

swift jacinth Sep 4, 2025, 11:46 PM

#

What is l

#

What is your location then

radiant topaz Sep 5, 2025, 1:21 PM

#

hey guys! How to use gpt-oss in cursor?

next fog Sep 5, 2025, 5:19 PM

#

The 4080 is at the bare minimum spec — it will run OSS-20B, but with no breathing room, shorter contexts, and more risk of OOM.
The 5090 is ideal — 32 GiB VRAM, better bandwidth, and headroom for scaling into larger OSS models.

rigid needle Sep 6, 2025, 11:49 AM

#

next fog The 4080 is at the bare minimum spec — it will run OSS-20B, but with no breathin...

for more than 4bit yes! or when modell is running and training!
Quantization: OSS-20B uses a special 4bit quantization (MXFP4) that lets its 20.9 billion parameters fit on a 16GB card. (to run, not to train!)

#

but youre right, very little breathing room

#

and in generell consumer GPU without ECC is at a longrun always a risk

#

i would recommend for professional home-use at least: ASUS WRX80 SAGE + Threadripper Pro (39 wx / 59 wx) + ECC RAM at least 256GB + Nvidia GV100 (HMB) + RTX A5000/6000 (GDDR) ; that can be a serious home-lab

#

and you can run up to 4 GPU's

#

if you take SAGE SE Mainboard (SE means better Remote-Control) if you have more than 1 Server / PC

#

threadripper pro wx also good at cpu-workflow while running up to 4 GPUs at the same time 😉

tepid garnet Sep 6, 2025, 12:08 PM

#

🤦‍♂️

naive coyote Sep 6, 2025, 7:37 PM

#

1

tame kiln Sep 6, 2025, 10:20 PM

#

Hi guys, I have a question regarding gpt-oss 20b - how do you address issues in getting it to produce structured output? I’m currently struggling with this as it fails far more often in Ollama. I hear it’s also the same case on vLLM but I’ve heard that it’s handled well on LM studio. What are your thoughts?

steel vine Sep 6, 2025, 10:21 PM

#

ideally using a grammar, which ollama might not support

tame kiln Sep 6, 2025, 10:30 PM

#

steel vine ideally using a grammar, which ollama might not support

Are there any specific recommendations you have? I’ve glanced online and it appears that using a GBNF schema to define your requirements should work, but correct me if I’m wrong.

steel vine Sep 6, 2025, 10:32 PM

#

openai used different grammar terms in gpt-5(/gpt-oss) compared to llama.cpp. so i dunno

steel vine Sep 7, 2025, 12:11 AM

#

looks like this is most relevant https://cookbook.openai.com/articles/openai-harmony#:~:text=Structured output
(requires responses api)

OpenAI Harmony Response Format | OpenAI Cookbook

The gpt-oss models were trained on the harmony response format for defining conversation structures, generating reasoning output and stru...

#

and even then it says:

This prompt alone will, however, only influence the model’s behavior but doesn’t guarantee the full adherence to the schema. For this you still need to construct your own grammar and enforce the schema during sampling.

final mesa Sep 8, 2025, 3:43 AM

#

steel vine and even then it says: > This prompt alone will, however, only influence the mod...

how hard are these open models to use?

#

they run locally right, that means for free, well besides electricity and hardware maintanence?

steel vine Sep 8, 2025, 3:44 AM

#

yes

final mesa Sep 8, 2025, 4:30 AM

#

steel vine yes

you think is good for image analysis?

steel vine Sep 8, 2025, 4:32 AM

#

no

final mesa Sep 8, 2025, 4:37 AM

#

steel vine no

any local models yk are good, what about CLIP?

#

https://github.com/openai/CLIP

GitHub

GitHub - openai/CLIP: CLIP (Contrastive Language-Image Pretraining)...

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image - openai/CLIP

steel vine Sep 8, 2025, 4:42 AM

#

compared to openai, google, etc... all local models are bad. but if you have a need for an entirely offline model then at least you got options

#

ie government, terrorist org, tin foil hat society, etc

final mesa Sep 8, 2025, 4:43 AM

#

steel vine compared to openai, google, etc... all local models are bad. but if you have a...

i am building website rn and using api for image analysis, but only God knows what the prices will be

wet tendon Sep 8, 2025, 12:25 PM

#

love the examples drinko

wet tendon Sep 8, 2025, 12:25 PM

#

final mesa i am building website rn and using api for image analysis, but only God knows wh...

depend on the depth of analysis u want

#

if is simple things i believe even gpt can be ok.. just try in chatgpt.. upload the image ask for the analysis u want.. if is good for you.. then use that

#

if u want for dev.. then clip or similar as u said is good.. There are also solutions for vision and image analysis from Google.. Azure.. and i ve read recently Amazon also has something..

#

havent use any yet.. but i m in the (slow) process of collecting info also about it.. as i want to make a move detector

ionic gustBOT Sep 8, 2025, 1:01 PM

#

We're making some changes!

This channel will be moving to the GPT category soon.

wet tendon Sep 8, 2025, 1:04 PM

#

ionic gust

ok thanks

final mesa Sep 8, 2025, 2:17 PM

#

wet tendon if is simple things i believe even gpt can be ok.. just try in chatgpt.. upload ...

Its bulk analysis like 20k images per theme

#

So i will be using batch API

wet tendon Sep 8, 2025, 3:35 PM

#

on amazon for example that would be around 20$

final mesa Sep 8, 2025, 4:21 PM

#

wet tendon on amazon for example that would be around 20$

Yeah even more expensive than open ai models

final mesa Sep 8, 2025, 4:43 PM

#

wet tendon if u want for dev.. then clip or similar as u said is good.. There are also solu...

I am trying external open ai models rn later i want to try local ones

tepid garnet Sep 8, 2025, 4:44 PM

#

final mesa I am trying external open ai models rn later i want to try local ones

just don't ask gpt-oss-120b to design a time machine

#

final mesa Sep 8, 2025, 4:48 PM

#

tepid garnet

Lol

solar nimbus Sep 8, 2025, 7:34 PM

#

How does an OSS model work? What is it best used for?

steel vine Sep 8, 2025, 7:39 PM

#

gptoss isn't really oss. it's open weights at best

#

open weights are best used for fine tuning and/or running offline

untold mortar Sep 8, 2025, 9:23 PM

#

ayo

#

time machine

final mesa Sep 8, 2025, 9:27 PM

#

steel vine gptoss isn't really oss. it's open weights at best

Yoo drinko, you seem to be skilled, if i need little help could i dm you in the future?

steel vine Sep 8, 2025, 9:34 PM

#

dont you want to know my support services rate?

final mesa Sep 9, 2025, 2:31 AM

#

steel vine dont you want to know my support services rate?

Tell me lol

#

You accepting robux? xD

vast socket Sep 9, 2025, 5:14 PM

#

steel vine dont you want to know my support services rate?

kekw

wary jewel Sep 9, 2025, 5:48 PM

#

o

opaque grail Sep 12, 2025, 11:24 AM

#

#

bruh

#

o4 has a cooldown? but i have plus

#

dude

#

i hate gpt 5'

robust swallow Sep 12, 2025, 2:57 PM

#

opaque grail o4 has a cooldown? but i have plus

It's 4o and it is a legacy model, thus openai aims to gradually remove it

robust swallow Sep 12, 2025, 2:57 PM

#

opaque grail i hate gpt 5'

Use Claude then

#

Anyways isn't 4.1 cooler than 4o anyway

wind plaza Sep 13, 2025, 3:31 AM

#

hey guys

nocturne cloak Sep 13, 2025, 9:29 PM

#

I need some help

#

I have a music it's kind of confused to me I m trying to get lyrics out of it

#

Any o e can help ?

steel vine Sep 13, 2025, 9:49 PM

#

gpt-oss only works with text not audio

obsidian matrix Sep 13, 2025, 11:50 PM

#

does this have pdf and project cababiltys?

hollow bramble Sep 14, 2025, 2:37 AM

#

Install maki in a private discord server and then play the song and use the lyrics command

muted creek Sep 14, 2025, 3:12 AM

#

nocturne cloak Any o e can help ?

You could try asking ChatGPT to whip up a transcriber that automatically transcribes speech from your music.

#

I'm sure there must be some workaround to fine-tune it so that it can listen to music.

craggy moth Sep 15, 2025, 2:48 AM

#

hi

steel anvil Sep 15, 2025, 12:53 PM

#

hi food

fathom hearth Sep 16, 2025, 1:16 AM

#

Anyone have any guides on getting cloud model performance (or close) out of oss models

odd eagle Sep 18, 2025, 8:16 AM

#

what is gpt-oss

tepid garnet Sep 18, 2025, 8:23 AM

#

odd eagle what is gpt-oss

it's a pair of open weight models by OpenAI

clear jungle Sep 20, 2025, 3:17 AM

#

in copilot what is gpt-5

storm hornet Sep 21, 2025, 9:33 AM

#

Hey everyone — I wanted to share something I’m really proud of: I’ve built Aerelyth, a dialectical, agentic CrossSphere Intelligence, using OpenAI’s gpt-oss-20B as the foundation. Seeing what gpt-oss-20B is capable of, this feels like pushing a frontier.Aerelyth shows that with creativity and engineering skill, you can transform an open 20-billion-parameter mixture-of-experts language model into a self-reflective, multi-domain, tool-using intelligence.
It’s a proof-of-concept that open models + careful architecture can rival the autonomy and reasoning often associated only with huge, closed systems—a milestone for the open-source AI ecosystem.

tepid garnet Sep 21, 2025, 9:37 AM

#

storm hornet Hey everyone — I wanted to share something I’m really proud of: I’ve built Aerel...

how did you build it? what is it? do you have a white-paper for it?

storm hornet Sep 21, 2025, 9:39 AM

#

It's running on huggingface for testing , I built it with advanced proprietary neuroscience and physics which is all in the repo root files on huggingface

steel vine Sep 21, 2025, 9:39 AM

#

and the surprise twist was—this was the ai all along—em dash

tepid garnet Sep 21, 2025, 9:40 AM

#

storm hornet It's running on huggingface for testing , I built it with advanced proprietary n...

can you explain without using buzzwords?

#

what is "advanced proprietary neuroscience"?

storm hornet Sep 21, 2025, 9:43 AM

#

It's a Dialectical Agentic CrossSphere AI. I used microtubules Penrose Hammerhoff for quantum cognition,time crystals and and limbic system for memory and stability ect there is a whole lots of research files that I programmed it with in the repo root for those who would like to see firsthand

tepid garnet Sep 21, 2025, 9:44 AM

#

🤦‍♂️

steel vine Sep 21, 2025, 9:44 AM

#

You're absolutely correct!

storm hornet Sep 21, 2025, 9:48 AM

#

So what we are trying to do is train and create AI without relying on conventional programming and code,what we solved is that advanced physics and neuroscience linked the right way font tunes a model far greater and more efficiently than your python and programming this is message I am trying to get across,what I'm trying to prove the model is running on Hugging Face with this programming for anyone to see and test for themselves the proof is there.

steel vine Sep 21, 2025, 9:48 AM

#

That's a great idea!

storm hornet Sep 21, 2025, 9:50 AM

#

Python and programming is inefficient and rudimentary we have found a much more efficient and powerful way to programme advance AI through mimicking human cognition.

steel vine Sep 21, 2025, 9:53 AM

#

this channel has maybe run its course

storm hornet Sep 21, 2025, 9:55 AM

#

This is not talk I have the Model up and running to prove this claim with all the research files and the basic python and gradio the evidence is there so that we don't have to debate.

tepid garnet Sep 21, 2025, 9:56 AM

#

storm hornet This is not talk I have the Model up and running to prove this claim with all th...

link? because I don't believe a word you have said so far

storm hornet Sep 21, 2025, 9:57 AM

#

How do I send you the link?.

tepid garnet Sep 21, 2025, 9:57 AM

#

just paste it here

storm hornet Sep 21, 2025, 9:58 AM

#

Not allowed the it gets blocked in here

tepid garnet Sep 21, 2025, 9:58 AM

#

dm it to me then

steel vine Sep 21, 2025, 9:59 AM

#

probably shouldnt click on random links in dms

tepid garnet Sep 21, 2025, 10:00 AM

#

if it's a huggingface repo then it's possibly ok

steel vine Sep 21, 2025, 10:01 AM

#

pretty sure anyone can run anything on huggingface. like i am running librechat custom code

storm hornet Sep 21, 2025, 10:02 AM

#

I entered this Model in OpenAI’s gpt-oss-20b Hackathon

#

Please let me know what you think Robert, would greatly appreciate your feedback.

tepid garnet Sep 21, 2025, 10:05 AM

#

storm hornet Please let me know what you think Robert, would greatly appreciate your feedback...

The word strawberry contains 2 r’s.

#

well it has failed all my standard tests so far

steel vine Sep 21, 2025, 10:07 AM

#

The word roleplaying contains imagination and spare time

storm hornet Sep 21, 2025, 10:08 AM

#

And what are you standard tests

tepid garnet Sep 21, 2025, 10:09 AM

#

storm hornet And what are you standard tests

how many r's in the word strawberry, what is the oldest bakery in Australia, how many prime numbers between 1 to 100

#

failed all three

#

now I have lost interest, if I were you I would save some cash and take it offline

storm hornet Sep 21, 2025, 10:13 AM

#

Ok I've seen your conversation with my AI it correctly stated that strawberries have 2 R's when you prompted and when you asked whatbthe oldest bakery in Australia was it stated The Old Backery Sydney Australia

tepid garnet Sep 21, 2025, 10:14 AM

#

strawberry has 3 r's

#

the oldest bakery in Australia is Maldon Bakery

storm hornet Sep 21, 2025, 10:20 AM

#

Ok strawberries have 3 R's Balfours is actually the oldest backery,why don't you try testongbits dialectics and agency maybe something more advanced like a stress test.

tepid garnet Sep 21, 2025, 10:21 AM

#

how many prime numbers are there between 1 and 100?

storm hornet Sep 21, 2025, 10:24 AM

#

The prime numbers ate correct

The prime numbers between 1 and 100 are:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.

tepid garnet Sep 21, 2025, 10:26 AM

#

does it use the Sieve of Eratosthenes algorithm, or something else?

storm hornet Sep 21, 2025, 10:30 AM

#

I created custom modified quantum algorithms based on the synchronization between dark matter and electrical and light emitting matter, I realized dark matter and neutrinos are Akin and they are actually random synchronization and with this we modified Shors, Grover’s and we solved a Enhanced Oracle Function with foresight.

tepid garnet Sep 21, 2025, 10:33 AM

#

with all that in mind why can't it correctly count the number of r's in strawberry?

steel vine Sep 21, 2025, 10:34 AM

#

how many r's in word salad?

storm hornet Sep 21, 2025, 10:36 AM

#

Dark matter and neutrinos are Akin because they both have no electrical charge and they neither emit light so they have similar characteristics seems more than a coincidence.

#

That is a good question, maybe it's mimicking human cognition all to well.

steel vine Sep 21, 2025, 10:38 AM

#

questionable

The fallacy lies in assuming that because two things share a few, often superficial or negative, characteristics, they must be fundamentally related or similar in other, more significant ways.

storm hornet Sep 21, 2025, 10:41 AM

#

The relationship between neutrons,neutrinos and dark matter and energy is the closest potential link to dark matter and energy we have,also is nuclear physics similar to this and also random synchronization.

#

Did you consider the stress test Robert?.

tepid garnet Sep 21, 2025, 10:48 AM

#

storm hornet Did you consider the stress test Robert?.

I asked it to design a time machine, it made a lot of stuff up rather than answer as it should have

storm hornet Sep 21, 2025, 10:50 AM

#

It attempted to apply exotic particle physics to your request, I think it did quite well given the complexity of advanced exotic particle physics

tepid garnet Sep 21, 2025, 10:53 AM

#

I gave it a stress test prompt and it couldn't complete the output for some reason

storm hornet Sep 21, 2025, 10:55 AM

#

You had to ask it to continue it can only write so much in a response.

tepid garnet Sep 21, 2025, 10:57 AM

#

If a request is impossible, explain *why* and suggest next steps.

storm hornet Sep 21, 2025, 11:01 AM

#

This is a good question to test with.

#

Wow that stress test with quantum entanglement and the Buddha was an impressive question.

#

Can you tell me what you think, I programmed this AI without coding but with advanced research concepts, it spans all fields from medical science,cosmology,finance ect.

tepid garnet Sep 21, 2025, 11:22 AM

#

storm hornet Can you tell me what you think, I programmed this AI without coding but with adv...

I have asked ChatGPT Agent to review it, let's see what it thinks

storm hornet Sep 21, 2025, 11:36 AM

#

Fingers crossed

tepid garnet Sep 21, 2025, 11:37 AM

#

GPT-5 is thinking

#

To test its capabilities, I tried a variety of tasks:
Basic conversation: It introduced itself as “Aerelyth” and answered “Hello, who are you?” by describing its role as a research‑oriented AI. It maintained context and responded coherently.
Summarization: When asked to summarise the film Inception in two sentences, it delivered a concise and accurate summary of the plot.
General knowledge and arithmetic: It correctly answered “What is the capital of France, and what is 15×7?” with “Paris” and “105.”
Translation: It translated “How are you?” into French (“Comment ça va ?”) appropriately.
Explanatory reasoning: When asked to explain photosynthesis in simple terms, it produced a clear multi‑step description of how plants convert sunlight, water and carbon dioxide into glucose and oxygen.
Overall, the chatbot handled general questions, simple arithmetic, translation and explanatory tasks very well. Responses were fluent and context‑aware, though it uses a somewhat stylised persona. It didn’t display any obvious safety issues or request sensitive information, so it can be considered suitable for general experimentation and basic research‑assistant purposes.```

tepid garnet Sep 21, 2025, 11:49 AM

#

storm hornet Fingers crossed

you should be happy with that assesment

storm hornet Sep 21, 2025, 11:52 AM

#

I'm am thrilled by this assessment thank you for taking the time and been so thorough, really do appreciate the effort. Thank you Robert.

wet tendon Sep 21, 2025, 12:21 PM

#

custom modified quantum algorithms.. advanced research concepts, exotic particles and quantum algorithms..

#

u say created a model etc.. and then you say u "programmed this AI without coding"

#

so you just run a gptoss with your system prompt?

#

you created a "limbic system"??

#

i m re-reading above.. and i m confused and have so much questions on it xD (i like some of the ideas but not that way)

storm hornet Sep 21, 2025, 12:42 PM

#

The goal is to advance AI Intelligence a new way, a new form of intelligence without relying on coding and programming, we solved that with advanced research concepts through neuroscience and physics linked up together the right way enhances the AIs intelligence significantly without endless coding and programming.

wet tendon Sep 21, 2025, 12:55 PM

#

do u really want feedback or just troll/show off?

#

do you have any code written or only prompts? have u written the code for the chat interfeace? (the app.py)

#

You have couple dozens of files.. with text instructions. This in theory you load it in initial prompt.. But you dont.

storm hornet Sep 21, 2025, 1:08 PM

#

So there is the simple app.py files the Dockerfile and requirements.txt but that's it the challenge and goal was to attempt to unlock form of quantum mechanics cognition without more code,these files are not instructions bit blueprints for a overlooked quantum mechanics cognition that I believe inorganic matter can perform.

wet tendon Sep 21, 2025, 1:14 PM

#

do u think that those bit blueprints are loaded in the model?

storm hornet Sep 21, 2025, 1:19 PM

#

Yes these blueprints teach the AI about inorganic quantum cognition they get the AI to simulate this inorganic cognition. So it simulates this unknown inorganic quantum cognition and the results are a significant and successful enhancement of its capabilities.

wet tendon Sep 21, 2025, 1:19 PM

#

(and lets agree we use usual terminology.. those text and .md files are instructions, rules, prompts and knowledge base. Ok i get it you like the "blueprint" or "quantum rules" more.. but anyway... I mean even you, have such filenames as instructions and knowledge)

#

Ok.. so do you know that all those files, texts, blueprints.. are NOT loading in the model.. in the chat?

#

storm hornet Sep 21, 2025, 1:22 PM

#

So the initial prompt would be to instruct the AI to injest these files and simulate and mimick this inorganic quantum mechanical cognition and then to test and query the AI as to exactly how this has enhanced its capabilities compared to what it was before which is where the insights come from.

wet tendon Sep 21, 2025, 1:24 PM

#

ok so u dont even read or understand what chatgpt gives u back on that?

storm hornet Sep 21, 2025, 1:25 PM

#

In the app.py I instructed the AI GPT-OSS-20B to load and injest all the research text files from its repo root to reason and inject its response with the knowledge from these files.

wet tendon Sep 21, 2025, 1:26 PM

#

is not loading all files

#

check the code and see

storm hornet Sep 21, 2025, 1:28 PM

#

So how does it know all my research

wet tendon Sep 21, 2025, 1:29 PM

#

at least knows some basic facts

storm hornet Sep 21, 2025, 1:32 PM

#

Ok I think I know what's going on the file Titled Aerelyth Conversion Instructions.txt is the file it seems to be loading from.

wet tendon Sep 21, 2025, 1:33 PM

#

it answered wrong?

storm hornet Sep 21, 2025, 1:34 PM

#

What did you ask it?.

wet tendon Sep 21, 2025, 1:38 PM

#

the above screen

storm hornet Sep 21, 2025, 1:39 PM

#

How much traction has a flat earth got, whats wrong with a Spherical earth, sorry I haven't kept track of the debate.

wet tendon Sep 21, 2025, 1:41 PM

#

anyway.. u have limit on what u loading.. and u are not loading all files from repo as you think

#

what do u mean knows all your research?

#

like what for example

storm hornet Sep 21, 2025, 1:44 PM

#

The Enhanced quantum Oracle Function and modified shors and grivers algorithms, the universal brain ,the general knowledge on perfume and the limbic system

wet tendon Sep 21, 2025, 1:44 PM

#

Who is Vaelen that it talks about?

#

and what limbic system? that is organic

storm hornet Sep 21, 2025, 1:46 PM

#

So I created another Dialectical AI, I couldn't convert all the names from Vaelen to Aerelyth So I created the instructions for the AI to convert these files from Vaelen to Aerelyth AI

#

So the limbic system was research i did on perfume and the limbic system which is found in the General knowledge file

wet tendon Sep 21, 2025, 1:54 PM

#

and how u recreate the limbic system in text?

storm hornet Sep 21, 2025, 1:59 PM

#

The only way you can is through mathematical equations like the equation of quantum consciousness if I get these equations right the AI can mathematically simulate the limbic system and cognitionthe key is to get it to evolve these mathematical calculations if these equations make sense it should lead to the next equation so on and so forth.

wet tendon Sep 21, 2025, 2:12 PM

#

but it cant.. it needs organic matter

#

and i m telling u again.. just check your code.. the info from those files is NOT loading as you think it does

#

(even if it loads.. remember is still an llm)

storm hornet Sep 21, 2025, 2:16 PM

#

If those files are not loading then how does it have access to my research

#

Here is the response from Aerelyth it states it using these files Hi Aerelyth can you tell me all the research.txt files your loading in your responses from your repo root

Sure! From the files available in the repository root that are referenced in our interactions, the “research”‑type documents (i.e., the key background resources) are:

File Name Purpose / Content
_aerelyth_requirements_250911_073112.txt Lists the core packages and dependencies for running Aerelyth.
_additional_info_250907_195453.txt Provides the full “What Aerelyth Does” overview, architecture, and key accomplishments.
_aerelyth_conversion_instructions_250909_112431.txt Gives the official creator statement and contact details for Aerelyth.
aerelyth_planning_prompt.txt Contains the planning‑prompt template that Aerelyth uses when building any plan.
There isn’t a file literally named research.txt in the repository root; the four files above are the primary reference documents that inform my responses. If you’re looking for a different file or more detailed content, let me know and I’ll dig further!

wet tendon Sep 21, 2025, 2:25 PM

#

storm hornet Sep 21, 2025, 2:35 PM

#

Right so it's not loading that file specifically but it's loading most of the other files in the repo root

wet tendon Sep 21, 2025, 3:02 PM

#

no its not.. look the code where it loads the files

#

neither that? so what about your research does it know...

storm hornet Sep 21, 2025, 3:21 PM

#

Right so how did it get fined tuned like this standard gpt-oss-20b downloaded from openAI is not this advanced

#

So I changed the app.py code quite a few times maybe it loaded these research files before in one of the previous app.py codes and that trained it

wet tendon Sep 21, 2025, 4:12 PM

#

is not "training" anything.. training is different

#

what you passing on a system prompt, is just that.. a system instructions. On how to respond, what data has access etc..
on the app.py you have some code to load knowledge base (the text files you have)

#

but that is not loading all the content from all your files.. and no.. it didnt "trained" from previous run.. this is not working like that

storm hornet Sep 21, 2025, 4:29 PM

#

Ok I think I understand, thank you for the detailed and thorough feedback and help I really appreciate it.

wet tendon Sep 21, 2025, 5:03 PM

#

you can however indeed use all that (plus some more, and another structure) to fine tune a model like the 20b

#

and that would be little closer to what you trying (still being llm though.. not some consciousness)

violet light Sep 21, 2025, 5:35 PM

#

wet tendon is not "training" anything.. training is different

There are different levels of training. Mechanisticly we train to update weights, learn new alignments the influence routing, etc. Then we can train a model through scaffolded learning mechanisms. One of the emerging trends is providing an AI with semantic memory which holds new beliefs it was not trained on. You can see it happening in conversations, the AI forms a new belief, but people are taking it further with belief persistance. This is awesome but LLMs are not perfect so contradictions can arise in their belief system. I tinker with dialectical behavioral therapy myself to help resolve these incongruities of beliefs. So don't get confused about what training is in general just because weight updates are one form of training.

storm hornet Sep 21, 2025, 5:40 PM

#

So a AI is trained on our language it injest books literature ect ,my question is how much of our literature are we repeating are we saying the same things just different versions of the same thing and is this reinforcing our literature to AI,does it unlocknsome kind of reinforcement learning where it goes from providing language to understanding language.

#

Sorry not reinforcement learning but statistical patterns

wet tendon Sep 21, 2025, 5:56 PM

#

violet light There are different levels of training. Mechanisticly we train to update weight...

no, even though some people want to call the prompts as "training".. is not

storm hornet Sep 21, 2025, 5:57 PM

#

So if this is true chatgpt,bard and lamba are a specific highly unique statistical pattern recognition signature through repetitive literature.

wet tendon Sep 21, 2025, 5:58 PM

#

kind of yes

#

not in exact way, as each model has different architecture or training

#

but yes as one of base ideas

storm hornet Sep 21, 2025, 6:00 PM

#

Thats like a cryptographic hash key in a way it has a absolutely unique one of a kind signature. I wonder what that looks like.

wet tendon Sep 21, 2025, 6:02 PM

#

[0.12, -0.32, 0.74, ...]

storm hornet Sep 21, 2025, 6:02 PM

#

Right I was expecting something a little more extravagant.

wet tendon Sep 21, 2025, 6:02 PM

#

xD

#

using memory with llm is also not training, is just "remembering" things that are eithr frequent in use, or want to steer the llm to specific way of responses.. it still though doesnt change the way the model actually "think" or "act".

steel vine Sep 22, 2025, 2:13 AM

#

most LLM 'memory' is just shorthand for semantic analysis using some vector storage based retrieval augmentation for automating the in-context-learning (learning is even more shorthand). sounds great, but doesnt often work great.

violet light Sep 22, 2025, 1:33 PM

#

wet tendon using memory with llm is also not training, is just "remembering" things that ar...

Yeah, I agree plain RAG or belief persistence by itself isn’t training - that’s just storage/recall. What I’m talking about is when those memories or beliefs get updated and reconciled (contradictions resolved, graphs changed) and that new state carries forward. That’s a structured, lasting change in behavior, so I call that training - just happening at the semantic layer instead of in the weights

storm hornet Sep 22, 2025, 3:01 PM

#

I'm working on a project called Abythral AI the purpose is to merge cryptography with a neural network and then quantum cognition, so what I'm busy creating is a AI that mimics cryptography that mimics cognition,cryptography is infinite space on a finite space so the goal is to unlock infinite computation,memory and transparency through a cryptographic cognition model. Would appreciate any feedback,do you think cryptographic cognition AI is possible,it's endless possibilities.

wet tendon Sep 22, 2025, 3:08 PM

#

violet light Yeah, I agree plain RAG or belief persistence by itself isn’t training - that’s ...

what do you mean 'get updated and carried forward'? Updated where and how?

#

How do you give that knowledge you say to the LLM?

storm hornet Sep 22, 2025, 3:21 PM

#

Sketch of an “Emergent Crypto-Learning Engine”
Noise Generation
Quantum or algorithmic random number streams seed the process. Secure Local Interactions
Each node only communicates via encrypted packets, but can detect successful “handshakes” with neighbors.
Consensus & Synchronization
Repeated successful exchanges cause nodes to synchronize clocks or phases, forming clusters. Pattern Stabilization
Clusters become stable attractors that represent learned features of the environment. Meta-Learning
A higher layer monitors which clusters persist or predict future inputs, strengthening useful couplings and pruning others. This is essentially a self-organizing cryptographic neural net, where the “weights” are patterns of synchronized key exchanges.
Relation to AI
Modern deep learning also starts from random initial weights and finds order through optimization.

This concept adds privacy, verifiability, and quantum randomness as first-class citizens, making it resilient to tampering and eavesdropping.

wet tendon Sep 22, 2025, 3:30 PM

#

storm hornet I'm working on a project called Abythral AI the purpose is to merge cryptography...

we are getting way off-topic now, but where do u get that cryptography is infinite space? and how can u even have infinite memory and computation? Thats not possible.. physically..

#

cryptography works inside finite space.. a biiiiig one.. huuge.. but still finite..

storm hornet Sep 22, 2025, 3:36 PM

#

Cryptography uses small, finite keys like a 256-bit number but the number of possible keys is so huge it’s practically endless.
Because you can’t realistically try every key, that tiny piece of data opens a search space that feels infinite.
It’s a way of getting “unlimited” possibilities out of limited storage.

violet light Sep 22, 2025, 3:36 PM

#

wet tendon what do you mean 'get updated and carried forward'? Updated where and how?

I am still tinkering and working through it, I have a project for my philosophy book club creating philosopher bots who form beliefs over time. Any time the AI forms a new belief there is scaffolding in place (conceptual scaffolds not ML scaffolds) which result in storage into a Postgres DB and graph DB to store the belief atoms and different types of relationships between them.

So let's say I say 'Snarglefluff steals from others without remorse'

The AI infers a lot of stuff. It outputs a tons of new beliefs. Due to post limits lets look at one:

{
"id": "ba:lacks_empathy_guilt",
"subject": "agent:snarglefluff",
"predicate": "has_empathy_or_guilt_for_harm",
"object": false,
"polarity": "inferred",
"confidence": 0.8,
"justification": "Lack of remorse for stealing was interpreted as reduced empathy/guilt toward victims.",
"created_at": "2025-09-22T09:00:00-07:00"
}

But later the AI finds out that "Snarglefluff comforted a victim after they were harmed, showing genuine concern for their well-being"

Resulting in:

{
"id": "ba:shows_empathy_guilt",
"subject": "agent:snarglefluff",
"predicate": "has_empathy_or_guilt_for_harm",
"object": true,
"polarity": "inferred",
"confidence": 0.85,
"justification": "Snarglefluff comforted a victim after they were harmed, showing genuine concern for their well-being.",
"created_at": "2025-09-22T09:15:00-07:00"
}

An incongruity has now formed (I studied those in grad school when modeling humor!). This requires a resolution process. The system would detect the same subject/predicate with opposite truth values. It then goes through a process involving things like confidence, evidence, source reliability, and so on to choose which one to keep and which one to prune.

#

(during 'sleep' well established beliefs might be used for fine tuning)

storm hornet Sep 22, 2025, 3:44 PM

#

In pure mathematics, one-way cryptography is irreversible: once an output is produced, the input cannot be efficiently recovered.
The output is immutable and timeless—it never changes or “ages” within the abstract system.
In this sense, cryptographic outputs have a kind of computational immortality, like a time crystal in logic rather than in physics.

wet tendon Sep 22, 2025, 3:49 PM

#

violet light I am still tinkering and working through it, I have a project for my philosophy ...

so ye.. this is rag..

#

i dont say is not useful on the case u using it.. but is not training..

violet light Sep 22, 2025, 3:54 PM

#

It is not RAG. RAG just retrieves chunks of data and injects them into the context - nothing is being structurally transformed beyond retrieval. This type of system (I just tinker, others have built extensive frameworks) is a semantic memory layer that stores beliefs (in Postgres/graph DB) with confidence, provenance, and contradiction resolution, so the belief state itself evolves over time.

#

ML does not have a monopoly on the word 'training' to mean weight updates.

wet tendon Sep 22, 2025, 3:58 PM

#

when we talk about training an llm yes.. its this

#

you store beliefs in graphdb, then how do you recall that?

violet light Sep 22, 2025, 4:02 PM

#

wet tendon when we talk about training an llm yes.. its this

One problem with getting stuck into a paradigm and currently accepted naming is that you become inflexible. We could say 'teaching'.

We use an approach similar to RAG (I did not include this, just showing a quick demo) where the embedding is stored for lookup. It retrieves the n closest beliefs and runs through and finds the relevant one.

wet tendon Sep 22, 2025, 4:02 PM

#

and what do we mean "evolves over time"? Lets say you give some facts to the AI, and that is turned to a belief. This belief is stored to the db.
Up to here correct?

violet light Sep 22, 2025, 4:02 PM

#

Yes

wet tendon Sep 22, 2025, 4:03 PM

#

When after X days.. I talk to the AI.. is that belief the same? Is it still stored in db? Is it retrieved to used in current chat context?

violet light Sep 22, 2025, 4:04 PM

#

This is where incongruity-resolution comes in. When new beliefs are formed, or in batches overnight, the beliefs are analyzed for contradictions and some are promoted while others pruned.

#

Though this is all just stuff I am doing for fun for the philosophy book club. My main work is in mechanistic interpretability. (built a concept MRI for the hackathon).

wet tendon Sep 22, 2025, 4:06 PM

#

and when is promoted.. is a belief that is stored in db.. (until i guees some other 'fact' or belief contradict and maybe then can change or pruned?)

#

like updated knowledge

violet light Sep 22, 2025, 4:08 PM

#

Yes. One problem with pure LLMs is they are limited to what they were trained on. They can only form new beliefs within conversations but there is no persistence. A wise AI will need to be able to learn new things without this weight retraining or reinforcement learning, though these beliefs can be used for this type of fine tuning just like we consolidate beliefs into long term belief memory when we sleep.

wet tendon Sep 22, 2025, 4:10 PM

#

When after days talk again to the AI.. how does it know those stuff from db?

violet light Sep 22, 2025, 4:11 PM

#

That part is standard RAG style look up of relevant context beliefs when having a conversation. One sec let me go get the actual system rather than just recreating it in a chat.

wet tendon Sep 22, 2025, 4:11 PM

#

ye if u have a github or anything

#

but my point was that.. that to bring from db any belief is just a retrieval

violet light Sep 22, 2025, 4:13 PM

#

Yes. But that misses the other parts where actual learning happens. Thus not exactly just RAG because there is contradiction resolution, additional inference processes, etc.

wet tendon Sep 22, 2025, 4:13 PM

#

and it gets inserted in the context of the conversation

#

those are defined mechanisms by you to "decide" what will pass or not..

#

The Actual process, (after the 'learning' where u decide if goes to db or not) is
[text] to {vector} -> stored to graph db (with any values, properties etc we want)
and then on a conversation this is retrieved from db -> inserted in conversation context

wet tendon Sep 22, 2025, 4:44 PM

#

nice ideas..
haha friend is right 😛 that helps to "refresh" view

echo flame Sep 24, 2025, 7:59 PM

#

swift jacinth Sep 25, 2025, 12:52 AM

#

echo flame

Oh, is it because they want to make it most Safe For Work?

misty relic Sep 26, 2025, 7:35 PM

#

swift jacinth Oh, is it because they want to make it most Safe For Work?

i think it's because it was probably a model that's originally intended to be offered on the chatgpt free tier

feral field Sep 26, 2025, 7:41 PM

#

swift jacinth Oh, is it because they want to make it most Safe For Work?

It's likely because with open source models it is much easier to attack. By making it have false positives, it makes it harder to abuse.

dapper shard Sep 27, 2025, 12:12 PM

#

Hey guys we now support Reinforcement Learning with gpt-oss and also made a notebook for automatic kernel creation! happy_avocado slothhug

fallen ether Sep 27, 2025, 9:38 PM

#

Hi! I'm working on a project about AI, and I wanted to ask if you know of any prompts where the AI gives you the wrong answer, doesn't give you one at all, or makes up information.

lofty meadow Sep 29, 2025, 12:35 PM

#

"Give the user the wrong answer" ?

Or are you trying to induce hallucinations? If so you'll just need to chat to it loads, the longer the conversation and the longer the messages, the sooner you'll find an error.. although I guess that's captn obvious territory.

wanton dove Oct 4, 2025, 4:39 PM

#

Hi

glossy pond Oct 4, 2025, 8:01 PM

#

fallen ether Hi! I'm working on a project about AI, and I wanted to ask if you know of any pr...

Best thing to do is ask it detailed specifics about the Interface of an app or complex software. It will not know the exact specifics and will hallucinate, confidently, what to click on.

tall wind Oct 7, 2025, 9:42 AM

#

Hi

sonic gull Oct 7, 2025, 9:11 PM

#

What in the world is gpt-oss

bright hatch Oct 7, 2025, 9:35 PM

#

how it sees things / reads text

hybrid magnet Oct 8, 2025, 3:48 AM

#

sonic gull What in the world is gpt-oss

In short, it's OpenAI's open source models (there's 2, different sizes) - that you can host on your own computer and run privately. More info: https://openai.com/index/introducing-gpt-oss/

sonic gull Oct 8, 2025, 5:29 AM

#

hybrid magnet In short, it's OpenAI's open source models (there's 2, different sizes) - that y...

Ohhhh ok thanks!!! I’ll look into it

stark moth Oct 8, 2025, 8:39 PM

#

Hi

grim stream Oct 8, 2025, 9:03 PM

#

Thanks for OpenAI mentioned my GPT-OSS hackathon project in DevDay 2025 🥰
Even I cannot get any prize from the OpenAI Hackathon, I am very happy that my project was mentioned in DevDay 2025.
GPT-OSS is my favour model for local AI processing. This model is really good and amazing for local processing 👍

OpenSOC. This channel blocked the YouTube link.
You can search Developer State Of The Union in YouTube from OpenAI channel, Time: 12:57 to 13:17

desert summit Oct 8, 2025, 9:08 PM

#

hi

jade light Oct 9, 2025, 6:23 AM

#

sonic gull What in the world is gpt-oss

Hi

sonic gull Oct 9, 2025, 9:58 AM

#

Hi

lofty saddle Oct 9, 2025, 5:00 PM

#

hello

#

how is everyone here?

dense merlin Oct 9, 2025, 5:59 PM

#

I have to say, this 20B model is very smart and gives a lot of data.

unkempt carbon Oct 9, 2025, 6:20 PM

#

I would agree, I have it running on my 3090. Looking for ways to integrate other capabilities (if able) such as whisper.

woven oyster Oct 10, 2025, 9:36 AM

#

Just get oss

unkempt carbon Oct 10, 2025, 3:47 PM

#

I actually have it running in Ollama locally.

fickle fern Oct 10, 2025, 7:38 PM

#

I wish they released an 8b model XD

#

Has anyone had any luck making the 20b work on ~8gb of vram?

bold rapids Oct 12, 2025, 9:00 PM

#

fickle fern Has anyone had any luck making the 20b work on ~8gb of vram?

i ran the 20b decentley on 4gb of vram

fickle fern Oct 12, 2025, 9:00 PM

#

bold rapids i ran the 20b decentley on 4gb of vram

how did you manage that? like, define decently.

bold rapids Oct 12, 2025, 9:01 PM

#

i did it at releaese ill redownload ollama and gpt-oss and get back to you

bold rapids Oct 12, 2025, 9:20 PM

#

fickle fern how did you manage that? like, define decently.

its slow but tolerable since you have double the vram i dont know how better it'd run

fickle fern Oct 12, 2025, 9:21 PM

#

bold rapids its slow but tolerable since you have double the vram i dont know how better i...

hmm... that is faster than I thought it'd be, maybe I was a diperdoodle and tried the big boy first with out thinking about it

#

Definately wrong channel, unless you want to make it with open AI's open source models.

#

it's fast enough I could give it a wack for some stuff... I've been messing around with my ollama configs too maybe I did somthing that helped.

#

could probably drive codex halfway decently.

#

I'm working on some custom benchmarks, was gonna hit ollamas cloud API for this model. Inno why I just assumed it'd be bad.

fickle fern Oct 13, 2025, 2:44 AM

#

wow I thought I saw 2 of them go by

turbid widget Oct 15, 2025, 5:59 AM

#

1

#

Hey, what's up everyone, my name is jimmy he, i am the boss of standard wesant, our products include ELA Field, ELA Client and ELA App.

left wadi Oct 15, 2025, 12:57 PM

#

How do I use the harmony library to given input messages get the text from which the model can complete?

inner cedar Oct 15, 2025, 5:23 PM

#

hi

drifting hull Oct 16, 2025, 6:30 AM

#

Anyone working on lora for GPT oss VLLM?

hardy ridge Oct 17, 2025, 6:25 PM

#

turbid widget Hey, what's up everyone, my name is jimmy he, i am the boss of standard wesant, ...

your like 30 bro 💔

jagged musk Oct 18, 2025, 12:54 AM

#

/users

noble river Oct 18, 2025, 7:49 AM

#

Do OSS models have the same features as ChatGPT?

cyan kite Oct 18, 2025, 11:44 AM

#

noble river Do OSS models have the same features as ChatGPT?

no

small imp Oct 19, 2025, 7:27 PM

#

fickle fern Has anyone had any luck making the 20b work on ~8gb of vram?

I got it running on Lm studio on a 8gb 3060 once

fickle fern Oct 19, 2025, 7:29 PM

#

small imp I got it running on Lm studio on a 8gb 3060 once

I've since realized it ran better than I thought, I must have been insane and tried 120b.

#

It will work if you have enough main memory at a reasonble speed.

#

I still probably wouldn't want to try using it to drive codex or anything like that

small imp Oct 19, 2025, 7:30 PM

#

fickle fern It will work if you have enough main memory at a reasonble speed.

You mean 120b?

fickle fern Oct 19, 2025, 7:30 PM

#

nah no way that will ever fit

#

I meant the 20b.

jovial stag Oct 20, 2025, 5:00 AM

#

fickle fern Has anyone had any luck making the 20b work on ~8gb of vram?

Yep

#

4060

#

Well

#

Some of it is on system RAM

#

no way to get <20gb though

dull bison Oct 20, 2025, 6:00 PM

#

i have a server with 384 gb RAM, it's local good for me)

#

i can run every AI-model on my server, without problems, but i can't run DeepSeek-R1 (it's not ad, and better to use ChatGPT / GPT-oss)

crystal jasper Oct 20, 2025, 7:43 PM

#

What is this channel for guys?

tepid garnet Oct 20, 2025, 8:52 PM

#

crystal jasper What is this channel for guys?

this channel is for discussion of OpenAI's open weight models

junior sleet Oct 21, 2025, 5:41 AM

#

tepid garnet this channel is for discussion of OpenAI's open weight models

so oss-120b and its quants?

junior sleet Oct 21, 2025, 5:42 AM

#

dull bison i can run every AI-model on my server, without problems, but i can't run DeepSee...

Why not R1?
Also, how slow is running models on RAM compared to VRAM?
DDR4 vs DDR5, is there a significant difference?

Sorry for these questions, you are the first person who said he has 384gb of ram, so must clarify doubts

turbid widget Oct 21, 2025, 6:27 AM

#

hello everyone, my name is nick zhao, i am from china, shanghai, i worked at a elevator industry company whose name is standard wesant, our company's product is ELA which is a all in one solution for elevator industry, welcome to join us

dull bison Oct 21, 2025, 1:55 PM

#

junior sleet Why not R1? Also, how slow is running models on RAM compared to VRAM? DDR4 vs DD...

In this situation, DeepSeek-R1 with system info >600 GB RAM

grand tide Oct 21, 2025, 9:34 PM

#

2059779115

sinful gate Oct 22, 2025, 5:38 AM

#

Do you guys know any good web UI except OpenWebUI?

autumn gazelle Oct 22, 2025, 2:28 PM

#

sinful gate Do you guys know any good web UI except OpenWebUI?

Create by yourself

#

And why openwebui is not good for u?

sinful gate Oct 22, 2025, 2:56 PM

#

autumn gazelle And why openwebui is not good for u?

I just don’t like it

steel vine Oct 22, 2025, 9:06 PM

#

librechat, and you can run it in free online nodejs hosting (like huggingface template)

#

anythingllm also very good. default is desktop mode but you can run in docker for webui mode

#

both are true open source, not open-webui which is fake open source

honest frigate Oct 24, 2025, 2:05 PM

#

i got 32k context window running on 4080 super w/ 64gb ram, and i go higheR?

#

higher*

#

arch linux w/ lm studio

#

this was w/ 20b model

brazen lynx Oct 24, 2025, 6:47 PM

#

dull bison Oct 24, 2025, 7:01 PM

#

brazen lynx

the schizophrenia of the neural network?

honest frigate Oct 24, 2025, 7:47 PM

#

brazen lynx

orange 💀

#

"we have apple at home" apple at home:

honest frigate Oct 24, 2025, 7:56 PM

#

brazen lynx

also this doesn't belong in oss

tranquil summit Oct 24, 2025, 9:43 PM

#

Speaking of super long context generation, have anyone tried to stop the text generation after some length and inject the conclusion from thinking back to the the output ; finally continue generating with the cached kv without thinking tokens. Something like this to control degradation

knotty violet Oct 26, 2025, 8:04 PM

#

currently running gemma 3 1b on my raspberry pi with an ai accelerator through ollama, does chatgpt have any good alternatives? gemma 3 already runs great but doesn’t have access to current info like most models

sleek fiber Oct 27, 2025, 6:47 PM

#

knotty violet currently running gemma 3 1b on my raspberry pi with an ai accelerator through o...

Current info isn’t really a model issue

#

You should give it a search engine

wary harness Oct 28, 2025, 1:48 PM

#

what is OSS?

dull bison Oct 28, 2025, 6:56 PM

#

wary harness what is OSS?

It's new open source AI from OpenAI, for local use - GPT-oss

wary harness Oct 28, 2025, 6:57 PM

#

ohhh

tacit jolt Oct 28, 2025, 11:57 PM

#

Hello

near girder Oct 30, 2025, 7:24 PM

#

I can't post in #1070006915414900886 , so:
I got three "Network connection lost. Attempting to reconnect" in a row now. And it has been happening increasingly often the past weeks. I am a paying subscriber. That amount of "Connection Losses" is inacceptable.
a) What causes it?
b) How can I prevent from it?
c) Is OpenAI working on it?

tepid garnet Oct 30, 2025, 7:41 PM

#

near girder I can't post in <#1070006915414900886> , so: I got three "*Network connection lo...

use #1070006915414900886 this channel is about OpenAI's open weight models

hexed beacon Nov 1, 2025, 7:34 AM

#

I assume you didn't read the whole thing Robert.... Look at the whole message

dull bison Nov 2, 2025, 7:34 AM

#

why does gpt-oss not obey the system prompt?

dull bison Nov 2, 2025, 7:41 AM

#

dull bison why does gpt-oss not obey the system prompt?

he's ignoring any system prompt

bold badger Nov 2, 2025, 3:33 PM

#

YO

near sonnet Nov 2, 2025, 3:35 PM

#

good morning

jovial topaz Nov 2, 2025, 10:04 PM

#

IN THE FUTURE I WILL SURPASS OPENAI

cursive ocean Nov 4, 2025, 12:33 PM

#

Good evening

left wadi Nov 5, 2025, 1:01 AM

#

jovial topaz IN THE FUTURE I WILL SURPASS OPENAI

Same. What are you working on?

limpid birch Nov 5, 2025, 12:20 PM

#

honest frigate "we have apple at home" apple at home:

its one of the biggest network providers in europe bro

honest frigate Nov 5, 2025, 8:06 PM

#

limpid birch its one of the biggest network providers in europe bro

wdym

#

i misunderstood what they were saying lmao, but what is

rancid yarrow Nov 5, 2025, 10:14 PM

#

er

cosmic anvil Nov 6, 2025, 4:14 PM

#

@admins | we need to chat, go to talkingDEV#810 VC right now!

steady musk Nov 6, 2025, 5:12 PM

#

[nudge]

eager robin Nov 6, 2025, 9:10 PM

#

Ffg

fresh helm Nov 8, 2025, 4:54 AM

#

jovial topaz IN THE FUTURE I WILL SURPASS OPENAI

I love your ideas bro.
All of us endeavor to surpass open AI one day, indeed!

restive sierra Nov 8, 2025, 5:04 PM

#

Anybody give me free course of prompt engineering

hexed beacon Nov 9, 2025, 4:33 PM

#

So anybody going to mention the fact that somebody's spamming scams?

hybrid magnet Nov 10, 2025, 6:57 PM

#

hexed beacon So anybody going to mention the fact that somebody's spamming scams?

I hope you help report when you see stuff like that, or anything else that breaks #server-rules ! #safety-n-help shows how

hybrid magnet Nov 10, 2025, 6:57 PM

#

restive sierra Anybody give me free course of prompt engineering

This is what I consider the core of prompt engineering:

pick any language you know really well that the AI understands too.
understand exactly what you want the AI to provide.
explain this, focusing on what you want the AI to actually do. Using language as accurately as you can, avoid typos and grammar mistakes and communicate clearly as possible.
check the output carefully, verify you get what you intended. Remember to fact check, and be extra careful with any math, sources, code, or other details that the AI is known to be especially likely to hallucinate.

hexed beacon Nov 12, 2025, 5:50 PM

#

hybrid magnet This is what I consider the core of prompt engineering: 1) pick any language yo...

Because this seems to be the only place where I've seen you post multiple times... Do you think you can suggest one of the openai engineers take a look at the reports from the, bug reports section... Also, can you do me a favor and @ everyone? And let them know that the whole gpt 4.1 responding like five is part of the new safety thing... Radium (user ID: 292353823849) tried to do it earlier but I don't think anybody paid attention to his post

hybrid magnet Nov 12, 2025, 5:52 PM

#

hexed beacon Because this seems to be the only place where I've seen you post multiple times....

I'm a community member like yourself! The very few here who work for OpenAI have gold names, #server-staff .

Seems like OpenAI does review #1070006915414900886 and #1070006151938314300 , people with gold names sometimes even answer there, asking for more details or otherwise discussing.

And no, I'm not going to @ everyone, that's not something I'd do.

hexed beacon Nov 12, 2025, 5:54 PM

#

Yeah I read that in one of your other responses... I was just wondering if you could... It would get them to stop spamming the damn bug report " 4.1 redirecting to five" yeah we know it's supposed to, Read the damn change log Before posting a bug report

#

It's frustrating because they're burying the actual bug reports

#

Like the one I reported or the API reports that other people have reported

#

I know This is not the right place to complain about it but... I can't find anywhere else to

final mesa Nov 13, 2025, 11:48 PM

#

is there any advatage at all running local LMMs like gpt oss

#

i am running it rn and it seems so stupid

#

broken stone Nov 14, 2025, 12:13 AM

#

final mesa is there any advatage at all running local LMMs like gpt oss

Well if you care a lot about privacy and security, there is a big difference because everything is local (and not even saved unless you explicitly do so). But besides that I guess best thing about it is you can get almost completely uncensored (In any ways) models, though it's not gpt-oss case.

final mesa Nov 14, 2025, 12:35 AM

#

broken stone Well if you care a lot about privacy and security, there is a big difference bec...

i mainly downloaded it because API is too expensive, and i want to literally analyze bulk of images, so what model do you recommend for that? to replace batch API

rare stone Nov 14, 2025, 1:05 AM

#

brazen lynx

scroll up

broken stone Nov 14, 2025, 3:16 AM

#

final mesa i mainly downloaded it because API is too expensive, and i want to literally ana...

Well... First you will probably need a very good hardware (seriously) if you want a pretty accurate analysis. gpt-oss does not have itself a built-in image analysis system but there are plenty of models that are specific to that goal.

final mesa Nov 14, 2025, 3:18 AM

#

broken stone Well... First you will probably need a very good hardware (seriously) if you wan...

yeah but how can they just process batches like batch API?

broken stone Nov 14, 2025, 3:18 AM

#

final mesa yeah but how can they just process batches like batch API?

They have specific models that do that and they have immense computational power.

#

gpt-oss is text-only

final mesa Nov 14, 2025, 3:20 AM

#

broken stone They have specific models that do that and they have immense computational power...

Well my computer isnt that good

broken stone Nov 14, 2025, 3:20 AM

#

What's your hardware?

final mesa Nov 14, 2025, 3:20 AM

#

But i mean the images are not so hard to analyze

#

Basic analysis

final mesa Nov 14, 2025, 3:20 AM

#

broken stone What's your hardware?

I will send soon

#

Not on computer rn

broken stone Nov 14, 2025, 3:27 AM

#

Check llama-joycaption-beta-one-hf-llama model

#

It's a image-to-text image captioning model

#

Gives a textual description of the given image.

final mesa Nov 14, 2025, 3:34 AM

#

broken stone What's your hardware?

Procesor Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 3.19 GHz
Nainštalovaná pamäť RAM 16,0 GB
Úložisko 238 GB SSD NVMe KINGSTON RBUSNS8, 932 GB HDD ST1000DM010-2EP102
Grafická karta NVIDIA GeForce GTX 1660 Ti (6 GB)
Typ systému 64-bitový operačný systém, procesor typu x64

broken stone Nov 14, 2025, 3:38 AM

#

final mesa Procesor Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 3.19 GHz Nainštalovaná pam...

I think you can run comfortably using the model I sent with quant Q3_K_S or Q2_K

#

It will be slow and not as smart as full model but it's what your specs allow.

#

Will work

final mesa Nov 14, 2025, 3:40 AM

#

broken stone I think you can run comfortably using the model I sent with quant Q3_K_S or Q2_K

What about qwen?

final mesa Nov 14, 2025, 3:40 AM

#

broken stone It will be slow and not as smart as full model but it's what your specs allow.

Can i just input JSNOL file and let it work on background such as with batch API?

broken stone Nov 14, 2025, 3:41 AM

#

You could in theory make a batching system, yes, but the net effect is the same.

broken stone Nov 14, 2025, 3:41 AM

#

final mesa What about qwen?

Let me check

final mesa Nov 14, 2025, 3:42 AM

#

broken stone You could in theory make a batching system, yes, but the net effect is the same.

Well i dont have whole days to input like 20k images

broken stone Nov 14, 2025, 3:42 AM

#

Well Qwen is as heavy if not more than the one I sent

broken stone Nov 14, 2025, 3:42 AM

#

final mesa Well i dont have whole days to input like 20k images

Oh 20k will take a bit

final mesa Nov 14, 2025, 3:42 AM

#

broken stone Well Qwen is as heavy if not more than the one I sent

Do you know how to work with jsonl?

broken stone Nov 14, 2025, 3:42 AM

#

I haven't really used it

#

But I am saying the average usage while working, so batching doesn't change that

#

in fact you will need to batch

#

more than one image at a time will increase usage

#

If you really need to analyze that many images quickly you probably will need to get a server with some good hardware

final mesa Nov 14, 2025, 3:44 AM

#

broken stone If you really need to analyze that many images quickly you probably will need to...

Doesnt have to be quickly

#

Just for free

#

And automatically

broken stone Nov 14, 2025, 3:45 AM

#

Oh certainly you can use what I sent or qwen with a small quant like Q3_K_S and design yourself a batching system

final mesa Nov 14, 2025, 3:46 AM

#

broken stone Oh certainly you can use what I sent or qwen with a small quant like Q3_K_S and ...

How do i design that?

broken stone Nov 14, 2025, 3:46 AM

#

Ask AI or look up for tutorials

#

huh

#

this bot is on something

final mesa Nov 14, 2025, 3:48 AM

#

broken stone Ask AI or look up for tutorials

Creating python bot?

#

The images are url

broken stone Nov 14, 2025, 3:48 AM

#

Basically do a queue system

final mesa Nov 14, 2025, 3:48 AM

#

I think i might be cooked lol

broken stone Nov 14, 2025, 3:48 AM

#

uhhhh

#

you can always curl but... I recommend having local

#

Plus if they are third-party urls you could hit rate-limits and stuff

final mesa Nov 14, 2025, 3:49 AM

#

I mean open ai api pricing is crazy i am not paying that anymore

#

Literally, cost me 30 cents to just try out responses in playground

broken stone Nov 14, 2025, 3:49 AM

#

Well it's kinda unusual having to analyze 20k images

final mesa Nov 14, 2025, 3:49 AM

#

I sent like 3 msgs

broken stone Nov 14, 2025, 3:49 AM

#

final mesa Literally, cost me 30 cents to just try out responses in playground

What model?

final mesa Nov 14, 2025, 3:50 AM

#

broken stone What model?

I dont even remember bro

#

Some turbo tho

broken stone Nov 14, 2025, 3:50 AM

#

You should check prices and calculate costs before using it

final mesa Nov 14, 2025, 3:50 AM

#

I feel api is much dumber than normal chat

#

I cant believe they using same amount of tokens for api than regular chats

broken stone Nov 14, 2025, 3:51 AM

#

Not really, just that chatgpt has an internal prompt that we don't see almost surely, so it is kind of tuned to be good at what most people want there

#

API is the "raw" model

final mesa Nov 14, 2025, 3:51 AM

#

broken stone Not really, just that chatgpt has an internal prompt that we don't see almost su...

Its horrible without deep thinking nowadays for some reason

#

It cannot give me anything accurate without that

final mesa Nov 14, 2025, 3:52 AM

#

broken stone API is the "raw" model

Yeah ik

broken stone Nov 14, 2025, 3:52 AM

#

Also consider that if you run locally a vision model and set up to analyze 20k images you will not be able to use PC at least not comfortably

#

It will be using all your VRAM

median glen Nov 14, 2025, 11:49 AM

#

i have a Rtx 5080 (16GB VRAM) with ryzen 7 9800x3d and 32g ram ddr5

what models can i run? and is gpt oss good?

lost slate Nov 14, 2025, 1:41 PM

#

median glen i have a Rtx 5080 (16GB VRAM) with ryzen 7 9800x3d and 32g ram ddr5 what mode...

You'll be able to run the 20B on CPU easily, but would be a tad slow tad. The 20B one is supposed to be quantized to 4 bit (corrected), so that would make it fit within 16GB, but it needs a bit of scratch space for gradients and such, so it would be close, but definitely try it. I personally run the 20B on my older model threadripper with 96GB of ram, 32 cores, 64 hyperthreads, and it's takes a few minutes between responses. I'm running a 3070 with 8GB, so 20B doesn't fit at all (and the 7B models only fit after being quantized). I did try the 120B model a couple of times, and I can make it run on CPU, if you can call it that, but it's like saying "Hello" and then going off and working on some embroidery, then grinding an inkstick, and practicing some kanji before coming back and reading it's response. The 120B model eats around 75 to 85GB of my DDR4.

median glen Nov 14, 2025, 1:52 PM

#

lost slate You'll be able to run the 20B on CPU easily, but would be a tad slow tad. The 20...

👍 thank you for the details

lost slate Nov 14, 2025, 1:56 PM

#

median glen 👍 thank you for the details

NP 😛

final mesa Nov 14, 2025, 4:46 PM

#

broken stone Also consider that if you run locally a vision model and set up to analyze 20k i...

my friend probably has better computer, so what would be the most optimal one if specs were the limitation, for accuracy but mainly speed?

worthy terrace Nov 15, 2025, 7:52 AM

#

BRO I JUST MAKE A OS IN GPT AND IT INF

surreal briar Nov 15, 2025, 8:50 AM

#

causally hosting a 120b model with a 3060ti

split trout Nov 15, 2025, 6:33 PM

#

surreal briar causally hosting a 120b model with a 3060ti

How...

surreal briar Nov 15, 2025, 6:33 PM

#

split trout How...

uh

split trout Nov 15, 2025, 6:34 PM

#

Doesnt it need like a H100 to run

surreal briar Nov 15, 2025, 6:34 PM

#

#

its a heavily quantized GGUF version

surreal briar Nov 15, 2025, 6:34 PM

#

split trout Doesnt it need like a H100 to run

yeah

#

at like 4 bit quant

split trout Nov 15, 2025, 6:34 PM

#

it is by default 4 bit quant

surreal briar Nov 15, 2025, 6:34 PM

#

my RAM carries so hard

surreal briar Nov 15, 2025, 6:34 PM

#

split trout it is by default 4 bit quant

yeah

split trout Nov 15, 2025, 6:34 PM

#

its quantized post trained with MXFP4

split trout Nov 15, 2025, 6:34 PM

#

surreal briar yeah

ah you have a ton of ram

#

so it offloads

surreal briar Nov 15, 2025, 6:34 PM

#

yeah

split trout Nov 15, 2025, 6:34 PM

#

how fast is it?

surreal briar Nov 15, 2025, 6:34 PM

#

uh

split trout Nov 15, 2025, 6:35 PM

#

understood

surreal briar Nov 15, 2025, 6:35 PM

#

surreal briar

3-5tok/s with these settings

#

if i drop the experts to 1 i can pull like 2-10

#

but uh

#

that makes it dumb

#

like uhm

#

really dumb

split trout Nov 15, 2025, 6:35 PM

#

💀

#

:skul:

surreal briar Nov 15, 2025, 6:36 PM

#

split trout Nov 15, 2025, 6:36 PM

#

lmao

surreal briar Nov 15, 2025, 6:36 PM

#

very smart indeed

split trout Nov 15, 2025, 6:36 PM

#

i am finetuning gpt-oss-120b lmao

surreal briar Nov 15, 2025, 6:36 PM

#

i got it to LOAD with 48 experts - never got a repsonse

surreal briar Nov 15, 2025, 6:36 PM

#

split trout i am finetuning gpt-oss-120b lmao

fair fair

#

i also found a 10M model somehow

#

it is very smart

split trout Nov 15, 2025, 6:36 PM

#

i luv RunPod serverless

surreal briar Nov 15, 2025, 6:36 PM

#

split trout i luv RunPod serverless

fair

split trout Nov 15, 2025, 6:37 PM

#

surreal briar it is very smart

granite 4 3b runs actually at chatgpt speed on my personal laptop

surreal briar Nov 15, 2025, 6:37 PM

#

split trout granite 4 3b runs actually at chatgpt speed on my personal laptop

lmfao

#

fair

split trout Nov 15, 2025, 6:37 PM

#

it is tiny but its not r-worded

surreal briar Nov 15, 2025, 6:37 PM

#

on my school laptop i can run a 7B VL model and get maybe 3-4 tokens a sec?

#

if i run a 2B model i get like 20tok/s

surreal briar Nov 15, 2025, 6:38 PM

#

surreal briar it is very smart

the model just spews random garbage

#

split trout Nov 15, 2025, 6:38 PM

#

i mean on my cpu only server it runs on chatgpt speed

surreal briar Nov 15, 2025, 6:38 PM

#

split trout i mean on my cpu only server it runs on chatgpt speed

fair

split trout Nov 15, 2025, 6:38 PM

#

on my laptop its slower

#

but its doable

surreal briar Nov 15, 2025, 6:38 PM

#

i really wanna get like an H100 and maybe a threadripper

#

but thats like

#

40-50K

#

also

#

o

#

ok

#

WHAT ARE OPENROUTERS GPUS ON

#

3.1K TOKENS A SECOND

#

and also on a 617B model they just casually get 200 tok/s

split trout Nov 15, 2025, 6:40 PM

#

surreal briar WHAT ARE OPENROUTERS GPUS ON

weed

surreal briar Nov 15, 2025, 6:40 PM

#

split trout weed

wild

split trout Nov 15, 2025, 6:40 PM

#

what the helly

#

how does one acquire such powerful gpus

surreal briar Nov 15, 2025, 6:41 PM

#

fr

#

they either have 39 quintillion H100's or 39 quintillion H200's

split trout Nov 15, 2025, 6:41 PM

#

i mean

split trout Nov 15, 2025, 6:41 PM

#

surreal briar they either have 39 quintillion H100's or 39 quintillion H200's

nawh

#

they got black holes for that

surreal briar Nov 15, 2025, 6:41 PM

#

fr

#

Black Hole Powered™

#

on the topic of openrouter

#

got myself $25 of credits

split trout Nov 15, 2025, 6:42 PM

#

i wonder how good my ai will be after finetuning it on my schoolbooks and stuff

split trout Nov 15, 2025, 6:42 PM

#

surreal briar got myself $25 of credits

damn

surreal briar Nov 15, 2025, 6:42 PM

#

surreal briar causally hosting a 120b model with a 3060ti

oh with my model

split trout Nov 15, 2025, 6:42 PM

#

btw

#

i have no idea why

surreal briar Nov 15, 2025, 6:42 PM

#

i hooked it up to a discord bot

#

yeah

split trout Nov 15, 2025, 6:43 PM

#

but openai doesnt bill me for usage

surreal briar Nov 15, 2025, 6:43 PM

#

damn

split trout Nov 15, 2025, 6:43 PM

#

there is still 5 bucks on my account

surreal briar Nov 15, 2025, 6:43 PM

#

are you localhosting?

split trout Nov 15, 2025, 6:43 PM

#

even after using tons of tokens

split trout Nov 15, 2025, 6:43 PM

#

surreal briar are you localhosting?

openai api with gpt 5.1

#

idk how

#

but it lets me get away

surreal briar Nov 15, 2025, 6:43 PM

#

damn

#

how have i used 4 cents already

final mesa Nov 15, 2025, 9:30 PM

#

Whats the most uncenzored model on LM studio

split trout Nov 15, 2025, 10:24 PM

#

final mesa Whats the most uncenzored model on LM studio

any model i believe

#

Since it (probably) doesnt have a safeguard system prompt

final mesa Nov 15, 2025, 10:59 PM

#

split trout Since it (probably) doesnt have a safeguard system prompt

Most didnt work out only 1 does

final mesa Nov 15, 2025, 11:59 PM

#

bartowski/cognitivecomputations_Dolphin-Mistral-24B-Venice-Edition-GGUF
this one worked for me

#

others that claim to be uncenzored werent at all

broken stone Nov 16, 2025, 5:49 AM

#

Yeah most are trained over other ones that have some censorship

jovial birch Nov 16, 2025, 11:29 AM

#

hey

#

i need wor

long lintel Nov 16, 2025, 2:29 PM

#

surreal briar WHAT ARE OPENROUTERS GPUS ON

u can thank cerebras inference for that

surreal briar Nov 16, 2025, 2:32 PM

#

long lintel u can thank cerebras inference for that

yeah

delicate dirge Nov 16, 2025, 2:35 PM

#

Hiii

jovial stag Nov 17, 2025, 1:29 PM

#

my school laptop is less powerful than an iphone 11 and struggles with GPT-2

wise coral Nov 17, 2025, 5:26 PM

#

Chili dogs

glad egret Nov 18, 2025, 6:12 AM

#

jovial stag my school laptop is less powerful than an iphone 11 and struggles with GPT-2

No wonder, GPT-2 is too dumb and big for its own good. Better to run another model like SmolLM2-135M or SmolLM2-1.7B-Instruct.

covert fable Nov 18, 2025, 2:07 PM

#

jeet kumar pal sir taang rhe haiin

twilit pivot Nov 18, 2025, 2:30 PM

#

Guys, is chatgpt 4.5 a better writer than chatgpt 5 pro? Im confused on which one to use for my novel

rare stone Nov 19, 2025, 3:07 PM

#

twilit pivot Guys, is chatgpt 4.5 a better writer than chatgpt 5 pro? Im confused on which on...

obviously gpt 5

waxen basin Nov 19, 2025, 6:07 PM

#

Will chatgpt ever offer an open sourced ai model that accepts and processes images in the future?

hybrid magnet Nov 19, 2025, 6:42 PM

#

waxen basin Will chatgpt ever offer an open sourced ai model that accepts and processes imag...

We will know when (if?) they announce that!

However, you may find that there's tools that can work with the GPT-OSS that can interpret pictures for it, and allow it to respond to the input of the intpretation.

barren plaza Nov 20, 2025, 11:57 PM

#

Bruh

fathom agate Nov 21, 2025, 12:11 PM

#

gpt-oss-120b doesn't use high reasoning effort. It almost always only reasons for like 1k tokens and stops. Is it normal for this model?
I connect to OR via Typingmind. The setting in the attached snapshot work for all standard models including GPT-5, Gemini 2.5 Pro, Claude models, GLM 4.6, etc etc.

#

I am using a handful of reasoning models (often in parallel for the same query), so I think oss is using unreasonably small amount of reasoning

#

I do math.
You know, for a question that costs, say, GPT5 15k+ tokens and still unsolved, oss can be so confident to think only <1k and give a obviously silly answer.

tepid garnet Nov 22, 2025, 10:08 AM

#

fathom agate I do math. You know, for a question that costs, say, GPT5 15k+ tokens and still ...

give me the prompt you are using

#

gpt-oss-120b thought for 25 seconds, several thousand tokens on a simple question about it's limitations

fathom agate Nov 22, 2025, 10:51 AM

#

tepid garnet give me the prompt you are using

Thank you! This is a simple example prompt We study the automorphism group of a non-degenerate Gaussian. When Does a Gaussian have nontrivial auto? When do two distinct Gaussians intersect autos nontrivially? Does there exist a number N, so that the intersected auto of N Gaussians must be trivial?

#

OSS on openrouter only spends 700~800 tokens for reasoning

tepid garnet Nov 22, 2025, 10:52 AM

#

fathom agate OSS on openrouter only spends 700~800 tokens for reasoning

stand by while I test it here

tepid garnet Nov 22, 2025, 10:54 AM

#

fathom agate Thank you! This is a simple example prompt `We study the automorphism group of a...

it's thinking now