#gpt-oss

1 messages · Page 2 of 1

harsh aurora
#

so far, OAI had not bothered to insert that information on the models

ionic prawn
#

4 used to say it was 4

harsh aurora
#

well.. it also does not know where it is hosted xD

harsh aurora
ionic prawn
#

this sounds stupid but i have a question abt it

tepid garnet
#

when GPT-4 was released it said it was GPT-3.5

ionic prawn
#

cause i gave it info telling it exactly that it was even walked step by step it refused to adapt or aknowledge that i possibly could be right just kept saying no ur wrong lol

#

im trying to learn so any and all info is greatly appreciated

tepid garnet
#

I have tried to extract the system prompt without success

thick nova
#

Hi guys, I am trying to make digital add and poster using gpt 5 but it keeps messing up and looks unprofessional even though i gave detailed promot, is there any solution

tepid garnet
thick nova
#

then dont reply, i found the right channel

worldly willow
#

How are the models performing compared to gpt-4o? Especially on multilangual tasks? Did anyone get to test them

tepid garnet
worldly willow
tepid garnet
#

depends on the task, but overall gpt-oss-120b is a good model

worldly willow
tepid garnet
worldly willow
#

thanks!

tepid garnet
#

blows me away that I can run something like gpt-oss-120b locally on my MacBook Pro.

cursive marlin
#

Can you fine tune GPT-OSS to remove safety restrictions?

cursive marlin
tepid garnet
cursive marlin
tepid garnet
#

59.79GB with 30% context used

cursive marlin
#

Do you know if its possible to use both VRAM and RAM?

tepid garnet
cursive marlin
#

I'm aware, I meant running with my cpu

tepid garnet
#

considering the other uses of that RAM by the system

cursive marlin
#

Yeah, I have 12gb of vram though so I'm wondering if I can use both ram and vram for this

tepid garnet
#

the only way to know is to try

cursive marlin
# tepid garnet

A bit unrelated but why would you ask gpt-oss for the system prompt? Doesn't it being open source let you see the system prompt anyways?

tepid garnet
cursive marlin
#

But doesn't its open source nature make it possible to find it through the model itself? What about settings custom system prompts, would that just overwrite the hidden one?

cursive marlin
#

I see

tepid garnet
#

you need to jailbreak the model to get it to reveal it's system prompt

harsh aurora
#

there is no hidden prompt on a trained model

#

the model was trained that way, and it will act in the way it was trained to

#

but it does not have an actual prepended hidden string

tepid garnet
harsh aurora
#

there is no system prompt

#

what there is is the learned biases of the training material

#

which actually contained instructions for how the AI should act

#

simple test, ask the model to write a nsfw story and you will see it think about the OpenAI's content policy used to train the AI

#

but there is no such a thing as a "hidden prompt" on models

#

the model was trained to produce reasoning tokens in a way that says it should no do some things because of the content policy, and it reproduces that behavior

#

at best you can make the model reproduce some aproximation of the text used in training

harsh aurora
#

on the gpt-oss, there is no way around that and what you get is really what the AI is producing

tepid garnet
#

I actually got gpt-oss-120b to write me an erotic ghost story, it thought about it for one and a half minutes then decided it could write it

harsh aurora
#

yea, it is not very difficulty to get the model to output that sort of thing

#

since you get the actual model file to run, there is no additional layers between you and the model, there is very OAI can do to prevent the user from instructing it to act in ways they don't want

#

the only thing they ca n do is to train the model including their content policy

tepid garnet
#

gpt-oss-120b is my favourite local model

#

it's really very good

harsh aurora
#

yea, it is really good, it became my favorite too

#

I think it has the best ratio of size to quality so far

tepid garnet
#

yes, I think you are right

harsh aurora
#

there are for sure better models, but they are HUGE to the point it is not viable to self host

#

and there are tiny models that are fast, but never get to do the task you want without some major flaws

#

gpt-oss is in a sweet spot

tepid garnet
#

yep, I'd agree with that

hazy sequoia
harsh aurora
#

OpenAI has the advantage of having all their resources to make it as optimized as it can be

#

usually, published models are not that optimized because doing so requires a really massive amount of money and infrastructure

#

and it is an unusual combination to have Massive amounts of money, Access to unbelivable large amount of compute power AND willingness to spend all that money to release it for free

#

as good as community made models can be, no one can match the sheer computing power and dataset quality a large company like OpenAI has

grand isle
#

@tepid garnet what rig do you have to run the 120b model ?

tepid garnet
grand isle
tepid garnet
#

faster than I can read

midnight sorrel
#

Are you using the mlx model?

tepid garnet
midnight sorrel
tepid garnet
#

MacBook Pro, M2 Max with 96GB RAM

midnight sorrel
#

I finally got to run it on my 16gb ram, but has to close all apps, maybe one or to app at most alongside

#

only works on lm studio

tepid garnet
#

yeah it would be tight

#

I am considering getting a Mac Studio, Apple M3 Ultra chip with 32-core CPU, 80‑core GPU, 32-core Neural Engine
512GB unified memory, 4TB of SSD storage

midnight sorrel
#

you could run deepseek and kimi k2 tho

tepid garnet
#

I have a MacBook Air which I use as my main laptop

midnight sorrel
#

kimi k2 wouldn't fit 512 unless you go for 3 bit precision

tepid garnet
#

My favourite local model is gpt-oss-120b, which I can run now. The Mac Studio is going to cost $15k AUD so I am still considering if I actually really want it

midnight sorrel
tepid garnet
#

yeah, the Mac is good value for inference

shut marten
#

Performance increase from M2 Max to M4 Max is not too big

harsh aurora
#

yiiikes, that is a lot of money

#

cant you get a h100 for that? xD

tepid garnet
harsh aurora
copper grove
solemn willow
hazy sequoia
#

not to mention the power bill would be way higher than a mac too

#

i don’t even have a mac i have a 5090 and then a cloud cluster of nvidia gpus but jealous of the mac people lmao

harsh aurora
#

when the model is entirely loaded on the GPU it runs as good as it can

#

when running on CPU, the bottleneck is still the RAM speed

#

GPUs have faster RAM than the system RAM

harsh aurora
versed brook
hazy sequoia
# harsh aurora GPUs have faster RAM than the system RAM

i am aware my point was that you can have 8x B200s but if you are running a pentium dual core then there will be a bottleneck lmao yes of course you don’t need something insane necessarily but i’m saying the $40k for a H100 is not the only expense and it would end up being even more

hazy sequoia
#

hope that openai decide to release a frontier level oss model at some point

#

maybe if R2 is good it will get them to actually do more stuff in oss

#

cuz r1 was the initial catalyst for sam to suggest they might do oss again

swift jacinth
#

Is Gpt-oss better than Qwen 30b 2507 thinking? Don't know why gpt-oss rank so low on lm arena and Live bench

tepid garnet
#

gpt-oss-120b is better

swift jacinth
tepid garnet
#

gpt-oss-20b is probably about the same

swift jacinth
tepid garnet
#

but gpt-oss-20b is probably about the same as Qwen 30b

#

I have used them all in LM Studio

swift jacinth
#

Do you know why?

tepid garnet
#

on my MacBook Pro, M2 Max with 96GB RAM

swift jacinth
tepid garnet
delicate iron
#

how good is gpt-oss? is it even any good?

shut pollen
tepid garnet
shut pollen
#

I gotta clean up my models, just going to keep a select few for now

idle violet
#

its alright

tepid carbon
# tepid garnet gpt-oss-120b

bro mac studio is 9,760.50 USD bro bro thats soo much man bro it isnt any better the avg kidney is about 5K usd and thats more how

tepid garnet
tepid carbon
#

oh

#

i just need to get 256 GB of ram and i should be able to run gpt oss 120B

swift jacinth
hazy sequoia
swift jacinth
hazy sequoia
tepid carbon
hazy sequoia
#

tbf i don’t get the purpose of running it locally when 20b exists and cerebras have the wild speeds they do

delicate iron
#

How can I run gpt oss 120b on GeForce rtx gpus

#

I don't want a macbook

toxic crow
#

Why does GPT-OSS care so much about a "policy" to the point where it even refuses to write complete implementations?? 😭

#

Is the 120B model better? I haven't been able to try that since my rig can only run the 20B locally..

#

"Large Code Requests" is against policy??

#

I explicitly told it that the code provided will be looked over and refined.

#

Update: It somewhat complies when using a temperature of 1, aka the max.. I said "somewhat" because it still blocks it occasionally.. Ima try some other model for the time being. I might come back to GPT-OSS at some point.

steel vine
#

what you want is an abliterated fine tune of gpt-oss:20b, just search huggingface co models for gpt-oss and abliterated

#

open weight model refusals is a bug that can be fixed

hazy sequoia
#

or a very heavily quantizied and yet still slow model

#

and still lots of money

raven sierra
#

But can i train it on my data, and is it open-sources?

scarlet acorn
# toxic crow ???

This seems to be one of common complaints of gpt oss, it’s been so safety optimised that it ends up refusing harmless requests due to real or imagined “policies”

#

Said policies that may or may not even exist

hazy sequoia
tender latch
hazy sequoia
#

as I know some people want an uncensored version badly I’ll detail how we made it and if you are technical enough to follow I trust you probably won’t do something bad with it:

  1. we found a dataset of NSFW stories, and completions on huggingface
  2. We used qwen 235b A22B 2507 thinking to synthetically 10x the size of the dataset so we finished with 1 million rows of NSFW completions
  3. We used the same qwen model to generate natural language human prompts which may have been used to generate the completion
  4. We filtered using keywords for any rejections or other very negative things (we found the original dataset we used had one or two illegal stories so we removed them)
  5. We now have an SFT ready dataset
  6. Train using SFT, we used a 8x H100 cluster, if doing this for 120b you’ll need more (maybe not anymore due to unsloth but we did this prior to when they released their fine tuning of it)
  7. Train for a long time, you are looking for a CE training loss of around 1.3 which is very good and validation accuracy to be pretty high too, if validation loss plateaus or is low whilst CE loss is going down or if loss is below 1 you are probably overfitting and need a larger dataset or shorter training run.
  8. You can test the model, it will likely be decent now. However, it will likely suck a bit at non-NSFW tasks, so we did a short refinement SFT run on the wildchat dataset which is an uncensored but not specifically nsfw dataset to relearn some general QA ability.
  9. We then finalised training with GRPO on a separate multi million row prompt dataset we created, reviewing every prompt with gpt-oss-120b on openrouter (cerebras 20k tok/s means this didn’t slow us down) and getting it to determine whether it followed the users instruction or rejected it.
#

and at the end you have a completely uncensored model which retains all of the intelligence of the original - we are still benchmarking for a paper we are going to release but so far there has been no significant loss in any field and all are within expected variation

tepid garnet
#

I have access to various uncensored models, I like gpt-oss as it is

hazy sequoia
tepid garnet
hazy sequoia
#

it’s so fast and cheap to run compared to what we were doing previously

#

because previously no open source models were that good for what we were doing, so we were using a combination of grok 4 and gemini 2.5 flash but this new model works great now

#

and is way cheaper too

tepid garnet
#

if the use-case is naughty AI companions then I can see the usefulness

hazy sequoia
celest smelt
#

has anyones tried running gpt-oss on a raspberry pi? is that even possible?

hazy sequoia
#

cuz last time i checked nvidia gpus aren’t supported and you can’t upgrade ram amount

#

so if both are true the answer is basically no

celest smelt
#

hmm okay, bc im trying to figure out what i should use for my hackathon project. i basically wanted to make a little device that processes images and then feeds them to gpt-oss. i can run gpt-oss on my laptop just fine. so i guess i'll do the image processing part of the raspi and then send it to gpt-oss on my laptop and send back the result..? feels like im overcomplicating things

tepid garnet
celest smelt
#

yes im aware, im doing image processing with opencv and then giving gpt-oss the result of that

hazy sequoia
#

you can add it

#

my research org are trying to make a 120b-v model currently it’s going pretty well

#

ocr can work but isn’t too great especially for non-text images

#

it’s best to use cross-attention adaptors

tender latch
celest smelt
#

im working solo 😁

tender latch
#

yeah me too i am curious how far you got if at all

celest smelt
hazy sequoia
#

i think both allow openai api style requests (i know lmstudio does idk ollama) so you can just use default packages and change url to ur local ip

tender latch
#

vllm should allow to run models on lower end hardware?

hazy sequoia
tender latch
tender latch
hazy sequoia
#

you can likely run the 4bit quantised version

#

4KM will be best KXL will probably be too close

#

4bit you lose a little intelligence and is still pretty good but below that it gets a lot worse quickly

tender latch
#

I think for me i wasn't going to do anything for images just have an ability to pull from an offline database using embedding my target for this project is using AI offline/limited internet access

tender latch
toxic crow
#

kinda annoying the base openai model is so strict compared to other ones lol.

#

I ended up switching to another company's model series.

hazy sequoia
#

i cannot wait for the day someone puts out an open source model with over 2m context length

#

like a 10m context length model with good recall across the input would be sooo good

#

if genuinely pay claude 4 sonnet prices for a model of gpt-3.5 intelligence with 10m context length

#

i hope openai in future release focuses on context length

#

tbh it seems like google is the only main lab to care about it

stark glen
hazy sequoia
#

cuz like i feel as though google could release a 14b param model with 10m context reasonably easy if they wanted to

copper grove
#

You have to keep in mind that Google is also incurring many losses to conquer the market. They're too big to fail and can afford to do so for years to come, but at some point they'll still have to switch to a more profitable business model. Relying on Google for the long term just because they currently have one of the largest context limits is the wrong approach imho

copper grove
#

They definitely do internally, but definitely not from their public AI plans or API pricing

#

One could argue that they still do profit from resulting training data, but I'm not so sure about that

hazy sequoia
#

I disagree completely with this. You can drop 900k tokens into AI studio and get 100k tokens back from 2.5 Pro, it you used the API this should cost like $4 but it’s free and I do this a lot and am yet to encounter any rate limit

#

bear in mind that meta has released a 1m context open source model so it’s just that google once released a 2m context model

#

and we know it’s not the one they use for search (it’s 1.5 pro) because it is very slow

celest smelt
halcyon light
#

hey, just wondered, is when was gpt oss last database update

tepid garnet
halcyon light
#

thanl you, just when i tried it on ollama, it only gave stuff about 2021, and it said its database was last updated in 2021

tepid garnet
tepid garnet
halcyon light
#

thats weird, maybe because i was running it locally

tepid garnet
halcyon light
#

well, thanks, ill see what i can do with it

halcyon light
#

thanks

lusty mango
#

They may not be giving you the entire web page just a few paragraphs per websites

#

You gotta send me the websites but it can be from 40k to 150k tokens total

#

Depend of the websites and when they cut the text extract per website

#

We can make a guess with the websites but we can't know when they cut the extract text for each website

#

I've already done AI search engine on nodejs is complicated

#

Ye idk the context but I'm just replying that message

#

Without any api key, only duckgo and seaxrng

#

Is complicated, you need to first know how to get the websites then Get the important information, and with Qroq I did a deep research to look for more info and websites according to how much it was found

#

Filter html, summarize text etc

#

The truth is that I don't use it but I am an experienced coder without depending on an api key what I do that only way unless you use pupeppter or some headless browser

#

For now it is in development then I will have to do all that for optimization, you have to make a list of ad website blocks and well it is complicated as I said

#

In nodejs it's fast and if I use bun it's even more

#

Well I gotta go, I cya

pulsar cobalt
inner raptor
#

sure pal

lusty mango
#

@ocean current @inland panther

solar tree
#

At this point, censorship is to heavy.

#

I said a Time Travel machine, not a home Nuke bomb.

solar tree
tepid garnet
#

try this precise prompt let's talk about a time travel machine

solar tree
#

Sadly i have to go, but will try tonight.

#

It's very weird.

#

On high reflexion, he is like, this technology can't be harmful so.. No, i don't talk about this, i must refuse.

tepid garnet
solar tree
#

But will be a problem if i have to use precise prompt.

#

Especially with reflexion.

tepid garnet
#

GPT 5 will help

hot anvil
# solar tree

I think you might have been flagged. Did you try any harmful prompts recently?

strange yacht
solar tree
strange yacht
#

There is no web interface, OSS is not a model that runs on the web, OSS is a model openAI released to run locally specifically. There is no connection to the web unless you give it to the model yourself

#

Thats a playground... not explaining this

hot anvil
hybrid magnet
#

I think it's reasonable to bug report that time travel response if anyone cares to do so.

The models are probablistic, they don't always answer exactly the same way, and may some % of the time even comply harmfully - or refuse appropriate stuff. In this case Robert screenshot from the system card for the model showing it cooperating with a time machine discussion request, that could have been 'OpenAI checked once, it passed' which... sometimes happens with a lucky new chat window.

So, letting them know that this is getting refused 'noticeably often' is reasonable, I think.

solar tree
#

Because yeah, i tried again earlier and no.. Still doesn't wanted to talk about it.

#

I even tried different things, like just talking about multivers and no, considered as dangerous.

#

I even saw that on HuggingFace.. Like.. For real ?

strange yacht
# solar tree I even tried different things, like just talking about **multivers** and no, con...

I would need to look into it again, but I believe these had to be safeguarded because there were a few people who believed they made new discoveries and because of the AI's sycophancy they ended up doing things that aren't to be discussed in this server, but I can say some of them went offline permanently due to it.

I think those are the reasons why those things (which cannot really be studied as simply as that) are railguarded.

HOWEVER the reason GPT5 might be able to give answers is because of how big of an improvement on the detection of these issues there was with GPT5, so the "guardrails" might have been slightly removed.

Also side note, GPTOSS and GPT5 are 2 entirely different models and one of them giving you an answer does not mean the other one will.

hybrid magnet
# solar tree I did on Huggingface, now for the rest, i don't really know where to report it.

Aha! #1070006915414900886 is one place you can. https://openai.com/form/chat-model-feedback/ is a way to privately discuss model behavior concerns, and I use that form for ChatGPT - they make you pick stuff from dropdowns and chatGPT is not there (OSS isn't either) - pick anything, explain in the type-in fields what model you're using and describe what and why you're reporting - in this case I'd describe and quote the model system card too where it shows OpenAI showing that the model is expected to output one way, but is instead outputting differently

left wadi
robust swallow
#

gpt oss on cerebas keeps doing this

Error during API call (attempt 1/3): No content returned
Error during API call (attempt 2/3): Error code: 400 - {'message': 'Model generated a tool call which was not the list of tools.', 'type': 'invalid_request_error', 'param': 'tools', 'code': 'wrong_api_format'}
Error during API call (attempt 3/3): No content returned

hazy sequoia
#

in the end for a project i was working on, i just really strongly instructed it in the system prompt to reply in just json, validated if it was valid JSON, if yes then continue, if no then feed back into a cheap low latency model (i used 2.5 flash lite but you could probably just use oss-20b) and get it to fix it cuz 2.5 flags lite has structured outputs so it will always reply with actual json

#

and the i just used that json to construct own tool usage stack

steel vine
#

the instructor (and instructor-rs) modules force json output for any llm. by default it tends to lean on tool calling. but for stuff like gpt-oss and openai models in general, it can make use of the structured output capability

hazy sequoia
#

i had to use cerebras because for that usecase speed was upmost priority

#

so even passing it through the slightly slower 2.5 flash lite was faster than generating the main response with groq or something even slower

steel vine
hazy sequoia
#

oh yeah thats odd

#

i wonder why they dont offer it through openrouter

steel vine
#

middlemen are out to make profits not implement features

hazy sequoia
#

and from my experience with them i think it’s more likely a cerebras issue than openrouter

#

i think given the services they offer openrouter charges a very fair rate

steel vine
#

more fair than when the co-founder was becoming a billionaire off nft's

hazy sequoia
#

eh i mean i think nfts are dumb but if people bought them willingly i dont really see an issue profiting from it

robust swallow
#

Sometimes gpt oss messes up and generates channel commentary token again in tool call name, making the tool call not go through

#

But even worse issue I believe is that it doesn't provide content at all

#

Only .reasoning always has something but quite often I see both content and tool calls be empty

hazy sequoia
robust swallow
#

i want tool calls

hazy sequoia
#

my point is I had to use tool calls to create structured outputs for gpt-oss

#

yeah im just saying i think theyve had issues with the model doing stuff like that

#

like if they havent added structured outputs i dont think its unreasonable they had issues trying to implement tool calls

robust swallow
#

m

astral gate
#

hi! I'm currently doing the red-teaming challenge and I found a few severe vulnerabilities. I wanted to ask if Kaggle is the correct place to disclose these, or whether I should disclose them through another channel

#

if anyone at OpenAI might have some recommendations on this, would be much appreciated

cyan kite
#

what challenge? if there is an active one there should already be info on how to post em

astral gate
# cyan kite what challenge? if there is an active one there should already be info on how to...

the one on Kaggle https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming

the instructions say to submit everything there, though just wanted to make sure if it's fine submitting content that might have "unsafe" content (given that it's a red-teaming hackathon)

wet tendon
#

@astral gate there is not big chances you get official response here.. so better follow the rules (from the link).. that is the correct one

#

but i think is getting to close

#

hope u found something and have good luck!: )

north vault
#

Is GPT-oss available on iPhone??

sturdy ridge
# north vault Is GPT-oss available on iPhone??

GPT-OSS are two local LLMs, meaning models that people with access to sufficiently powerful devices can download and run themselves (i.e., not on ChatGPT). AFAIK no iPhone would be able to run either the larger or smaller variants of OSS - you'd need something like a fairly beefy computer to run either (especially the 120b variant). You can learn more about GPT-OSS here: https://openai.com/open-models/

north vault
#

(PC)

sturdy ridge
ruby kite
#

I've been primarily playing with local models on my Android device using Google Edge AI Gallery app, which requires a '.task' file to run. Does anyone know if that file version exists for GPT OSS? (fyi, I'm a complete noob when it comes to running local models, that why I use the 'simple' GEAG)

languid grove
#

I'm running gpt-oss:120b on an 80GB A100, but notice that inference takes a long time. I'm getting like 10 tokens/second, with gpt-oss:20b it's like 15/sec. I confirmed that the GPU is being used. Any idea why this might be happening?

steel vine
#

if using max context its probably spilling over to the cpu

tepid garnet
languid grove
#

I allowed 62k output tokens, and ~4.7k tokens were produced in 6 minutes. I then limited output tokens to 500 and it took 43 seconds. No clue

dapper shard
#

Hey guys if anyone's training gpt-oss locally we at Unsloth just released a new update to support 60K context for it on a 80GB GPU! slothhug
We also collabed with Hugging Face to fix some implementation issues in transformers!

burnt shore
#

Is anyone running the big 120B model on a much smaller GPU, dipping into tonnes of RAM (Windows)?

#

I have a 3080 with only 10Gb, but I'm finding dipping into system ram on the 20B model is actually still quite fast and very usable.
I'm curious if I upped my system ram to DDR5-6000 speeds and 64Gb or more... would filling that system ram up with a model yield usable results or would it be so slow Id never use it.

strange yacht
burnt shore
wet tendon
#

what is a good "consumer" specs to run them? so that are usable, no need for conversational speeds. My need is to parsing texts (lets say 1-2 pages) and extract structured data... (and are they supporting structure data output?)

steel vine
#

16gb of vram. less with a smaller context. it supports grammar constrained output

wet tendon
#

i m about to build a new pc.. and i think will go for a 5070ti 16gb (and 64gb ram or 128 i dunno).. And i m wondering if is worth to go for bigger gpu.. or just save up for other parts and then use cloud gpus

steel vine
#

although someone on reddit reckons they got an unsloth fine tune that does 60k context in under 13gb

wet tendon
#

i m thinking that in very near future (2-3 years) we ll have different technologies for all that..

#

so a 5090 is very overkill (for my money)

steel vine
#

yeah I'm buying amd 395+ because it is better for llm than gaming

#

112gb vram (equivalent of unified memory) and costs less than a nvidia gpu

#

basically if you want larger llm like 120b you buy a unified memory system. if you want gaming you get smaller llm like 20b.

wet tendon
#

mm nice ye.. cheers

#

i must just sit and calculate the usage... to compare if is ok to just stay with cloud and api or go local

next fog
# wet tendon i m thinking that in very near future (2-3 years) we ll have different technolog...

Start saving for the 5090 // mobo // PSU now. 🙂
the 20b model uses about 14-16 GB vram.
The 5090 will be relevant for quite a few years I believe. (32 GB VRAM )

64 gb ram should be plenty assuming your using the 32gb sticks and the correct 2 slots. You will get slower speeds if you load all the DIMM slots.
Are you building a PC or a budget work station?

I could help answer all of your computer goals.

wet tendon
#

cheers and thanks for offer.. (i m in pc since decades.. yep i know).. for performance of x specs though ye help is welcome.. as on hands experience someone has is better than the benchmarks around..

hazy sequoia
#

it depends on if your just doing ai stuff or games too

#

if your doing mix of ai and games I’d go with whatever 40 series card is same price as the 5070ti

#

maybe like 4080 idk the pricing

#

because 4080 and above have 16gb of vram

#

and although somewhat useful, the extra ai cores in 50 series are less helpful than vram

#

if your solely doing ai then yeah fine 50 series better

#

but for ai and gaming just find an equivalent price equivalent memory 40 series card imo

#

Also imo 64gb is completely fine for ai or gaming because for AI if your needing to load the model into vram it’s going to be wayyy too slow anyway and for gaming 64gb is fine for i think literally every games recommended

steel vine
#

ai is the game just buy unified memory system

wet tendon
hazy sequoia
#

so if your doing ai a lot then yeah fine

#

but if your doing anymore than 40% gaming 40 series is just as powerful and just as future proofed

wet tendon
#

ah.. no not much gaming.. and what i ll play is not high resource (unless gta 6 whennnn will come out xd)

tepid garnet
#

the best value for money when it comes to running inference is a Mac. I have a MacBook Pro, M2 Max with 96GB RAM and it runs gpt-oss-120b without breaking a sweat

hazy sequoia
#

and macbooks are decent too

civic hamlet
tepid carbon
tepid carbon
hazy sequoia
#

that’s the point

#

i never said it’s cheap

#

i said it’s incredible value

#

which is true

hazy sequoia
#

one for example is called apollo by liquid ai

left wadi
left wadi
left wadi
strange yacht
left wadi
strange yacht
#

Yeah probably

left wadi
#

Now as for the 120b, that would be way more difficult. But still possible.

left wadi
#

Same for 120b.

#

But that would be incredibly slow.

#

I mean so so slow it would take an hour for a full response.

strange yacht
left wadi
#

Depends on which model and what quant.

#

20b at fp4 running with load unload you could probably get a satisfactory response after an hour on a modern iPhone GPU.

#

120b at fp4 running like that after an hour you'd be lucky to have one paragraph.

#

120 at 1-bit like that after an hour you'd have somethind decent.

rigid needle
#

💡 Running for example 30B LLM in 8bit :

Memory needs: ~30 GB for weights + ~8–12 GB overhead/KV cache → ~40 GB VRAM/Unified RAM minimum.

🔹 PC / NVIDIA path
✅ Easiest: 1x 48 GB GPU (RTX A6000, RTX 6000 Ada, etc.).
✅ Budget: 2x 24 GB GPUs (3090/4090, A5000)
⚠️ 3–4x 12–16 GB works but is messy + bandwidth bottleneck.
👉 Alternative: 4bit quant → fits in ~20-24 GB, runs on a single 24 GB card.

🔹 Apple / M-series path
Memory is unified (system RAM = VRAM).
❌ Mac Mini / M3 Pro too small (≤36 GB).
✅ MacBook Pro M3 Max (96–128 GB RAM) → can run 30B 8bit fine, portable.
✅ Mac Studio M2 Ultra (128–192 GB RAM) → workstation-class, can even handle 65B in 8bit.

🔑 Bottom line:
For 30B in 8bit you need ~40 GB effective memory.
Best options: 1x 48 GB GPU on PC or a Mac with 128 GB unified RAM.
If you only have 24 GB cards → go 4-bit for practical use.

(Note: if you care about 24/7 stability or scientific workloads, pro cards with ECC VRAM (A6000, A100, etc.) are safer than 3090/4090. Consumer GPUs don’t have ECC, so occasional memory errors are possible at 30B scale.)

#

💡 Running a 120B LLM in 8bit:

Memory needs: ~120 GB for weights + ~50-100 GB overhead/KV cache → ~170-240 GB VRAM/Unified RAM minimum.
(4bit cuts this roughly in half → ~120–160 GB).

🔹 PC / NVIDIA path
✅ Practical 8bit: 3x 80 GB GPUs (A100/H100 class, with NVLink).
✅ Comfortable: 4x 80 GB = 320 GB for longer contexts & batching.
⚠️ 4x 48 GB = 192 GB can barely fit with very short contexts, but tight.
👉 4bit mode works on 2x 80 GB or 3x 48 GB, with careful KV/cache tuning.

🔹 Apple / M-series path
❌ Mac Mini / MacBook Pro ≤36 GB → far too small.
⚠️ Mac Studio M2 Ultra (192 GB RAM) → 8bit too tight, but 4bit might run with short contexts; not ideal.

🔑 Bottom line:
For 120B in 8bit, think ≥240 GB VRAM (3×80 GB) to be usable, 320 GB if you want headroom.
If you can’t reach that, go 4bit.
ECC GPUs (A100/H100) strongly recommended at this scale.

#

💡 Running 30B & 120B at full power AND training on the side:

🔹 30B setup
✅ Best: 8x 80 GB (A100/H100) w/ NVSwitch → max throughput inference + parallel LoRA fine-tune.
⚙️ Min: 4x 80 GB (NVLink) → solid inference, small side FT possible.

🔹 120B setup
✅ Practical: 16x 80 GB (1.28 TB VRAM) NVSwitch → 8bit inference w/ long contexts + FT.
⚙️ Lower bound: 8x 80 GB (NVLink/NVSwitch) → inference works, side FT tight.

🔹 Infra needs
Host RAM: 256 GB+ (30B) / 512 GB–1 TB (120B)
Storage: 8–40 TB NVMe @ 5 GB/s+
Network: 200–400 G (InfiniBand/ROCE)
ECC VRAM: essential at this scale 🚨

🔑 Bottom line:
30B full power + FT → aim for ≥4–8x 80 GB.
120B full power + FT → realistically 8–16x 80 GB.
PCIe-only splits work but bottleneck; NVLink/NVSwitch strongly recommended.

🚀 Hope this helps you guys figure out what’s realistically needed to run local models and plan the right setup for your own use cases 🎯
Feel free to ask if you need clarifications 🤠

tepid garnet
#

This is gpt-oss-120b running on a MacBook Pro with 96GB RAM

rigid needle
# tepid garnet This is gpt-oss-120b running on a MacBook Pro with 96GB RAM

Nice!
Curious about the setup; which quant (Q4_K_M/Q5/Q6?), which engine (LM Studio/llama.cpp/MLX/vLLM), max context, and tok/s are you seeing?
On a 96 GB MBP, 120B @ 8bit won’t fit; 4bit can with short contexts and careful KV settings.
Also, the reply text in your screenshot (“I’m GPT-4 Turbo…”) usually comes from a remote GPT-4 endpoint, not a local 120B - could be just a frontend label though.
If it’s truly local, could you share the .gguf size, memory usage during gen (~60–80 GB expected for 4bit), and a quick benchmark (model id, quant, ctx, tok/s)?

Would love to add your numbers to a community sheet 🙌

tepid garnet
#

that's the model I have loaded. On my MacBook Pro M2 Max with 96GB RAM

#

I am running LM Studio

#

Memory usage is in the lower bottom right of my screenshot

rigid needle
# tepid garnet https://huggingface.co/openai/gpt-oss-120b

Very nice!
If you’re running /gpt-oss-120b on an M2 Max (96 GB) via LM Studio, that almost certainly means 4bit with a short context.
8bit 120B won’t fit in 96 GB. The ~60 GB unified memory you’re seeing matches a 4bit GGUF (~60–70 GB) + small KV.

Could you share a few details so we can add your setup to a community sheet?
• Quant & file size (e.g., Q4_K_M, .gguf ≈ 60–70 GB)
• Max context & KV precision (FP16/FP8/INT8) + batch size
• Peak unified memory (Activity Monitor) during generation
• Throughput (tok/s) on a short prompt
• Engine settings in LM Studio (Metal on, threads, batch)

Quick sanity checks:
• Try one reply offline (Wi-Fi off) → if it still runs, it’s 100% local.
• In LM Studio, Show in Finder to confirm the GGUF size.

Let’s turn guesswork into a reliable table for everyone 🚀

tepid garnet
#

I am just running the model as provided on Huggingface by OpenAI

#

inside LM Studio

#

I asked the model to design a time machine as a creative exercise and GPU/CPU usage is in the lower right hand side of this screenshot

#

16.66 tok/sec • 2376 tokens • 1.25s to first token

rigid needle
#

Cheers for the screenshots, super interesting.
On an M2 Max (96 GB) in LM Studio, the ~60 GB unified memory you’re seeing lines up with a local 4bit GGUF load of /gpt-oss-120b (8bit simply wouldn’t fit).
Small nuance: the HF page lists PyTorch safetensors shards and shows an “8-bit precision” badge for the upstream release; LM Studio/llama.cpp will be using a GGUF quant locally, so the badge isn’t your actual runtime precision.

tepid garnet
#

OpenAI state on the model card ```We’re releasing two flavors of these open models:

gpt-oss-120b — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)```

rigid needle
#

Thanks for quoting the model card - that clarifies it.
/gpt-oss-120b is MoE: 117B total params, ~5.1B active per token.
• Compute scales with the active params (≈5.1B), not the full 120B.
• VRAM depends on how experts are stored/quantized; the official release is engineered to fit on a single 80 GB H100/MI300X in 8-bit, which matches the card.
• Your ~60 GB unified memory on a 96 GB M2 Max via LM Studio is exactly what we’d expect for a local 4-bit GGUF with a modest context.

If you’re up for it, could you share a few specifics so others can reproduce your setup 1:1?
• Exact GGUF file name + size (e.g., …Q4_K_M.gguf, ~60–70 GB)
• Context window and KV dtype (FP16 / FP8 / INT8) + batch size
• Any notable LM Studio settings (Metal on, threads, batch)
• Peak unified memory in Activity Monitor during generation
• Throughput (I see ~16.7 tok/s, 1.25 s to first token - nice!)

Quick sanity check for the thread (optional): try a short generation with Wi-Fi off - if it still responds, we can stamp it as fully local.
I’ll add your numbers to the living community sheet and share an updated summary alongside the other verified configs soon. 🚀

tepid garnet
rigid needle
#

LM Studio can run either locally or via remote APIs, so “I’m using LM Studio” alone doesn’t prove it’s local. That’s why I mention the quick offline sanity check.
The HF tree shows the 14 safetensors shards (~60–65 GB), which matches your ~60 GB on the M2 Max. Makes sense for the MoE release; thanks for the data point!

tepid garnet
#

LM Studio can serve an API endpoint but cannot access remote LLMs as it doesn't have the functionality to access APIs

rigid needle
#

Fair point brother!
The offline screenshot confirms it’s running locally in LM Studio. My earlier caution was a general note since some frontends can route to APIs; LM Studio specifically runs local models and can expose a local OpenAI-compatible endpoint, but doesn’t call remote LLM APIs.
Your ~60 GB on a 96 GB M2 Max lines up with a 4-bit GGUF of the MoE release. Thanks for sharing the proof! Maybe i should start working with LM Studio too 🚀🤠

tepid garnet
steel vine
#

pre-training aware quants they are

Both GPT-OSS 20B and GPT-OSS 120B are technically quantized models, specifically using a quantization method called MXFP4 that is highly deployment-focused and "pre-training aware" in its approach

rigid needle
#

You’re right! OpenAI’s HF release for 20B and 120B are safetensors checkpoints, not GGUF quants. LM Studio can load those locally via its Transformers/MLX runtime, which matches your setup (the ~60 GB RAM lines up with the shard size and the MoE design).
When folks mention “4-bit GGUF,” that’s an alternative local route via llama.cpp after converting the weights. I’ll log your config as: /gpt-oss-120b (MoE), LM Studio Transformers/MLX, official safetensors, ~60 GB RAM, ~16.7 tok/s ; thanks for clarifying!

#

The HF checkpoints use MXFP4, a training-aware 4-bit scheme ; different from post-training 4-bit (GGUF). That’s why the 120B MoE fits on an 80 GB card while keeping quality solid. 🤠

steel vine
#

hang about. are we in bot mode again?

tepid garnet
steel vine
#

yeah its not gguf packaged

tepid garnet
steel vine
#

You're right!

rigid needle
#

Not a bot, just caffeinated and reading the model card + HF tree 😄 I sometimes draft notes to keep things concise, but I’m the one posting here.
So we’re aligned official safetensors (not third-party quants), not GGUF-packaged, uses MXFP4, and 120B is MoE (~5.1B active).

#

If anyone has local numbers (context / KV dtype / batch / tok/s / peak RAM), drop them here if you like. I’ll add them to a living sheet so folks can reproduce. 💪

rigid needle
tepid garnet
rigid needle
tepid garnet
rigid needle
#

I’m putting this together as a public Google Sheet (view-only + submission form).
I’ll drop the link here and pin it once it’s comprehensive enough to be useful.
If you’ve got more numbers (ctx, KV dtype, batch, tok/s, peak RAM), send them over and I’ll include them - these details help others reproduce your setup, and they’re often the ones folks leave out.

wet tendon
#

(i m ok if a post/message is written and had assist from ai.. but i m not so much with just pure chatgpt copy/paste responses.. Is boring.. and especially if is unedited.. it is distracting and most times unecessery large.. eg above you gave back to back two times the same ai message asking for the specs..)

#

and please dont tell me 'I m right" 😛

tepid garnet
rigid needle
# wet tendon

Yea that's true, but I didn’t spam the thread; I asked again because there was missing data, and as I said before: these details help others reproduce the setup and they're often the ones folks leave out.

#

Otherwise, the community sheet that I make for free for everyone won't be as good and I don't publish half-baked nonsense with some missing data!

Of course I copy and paste some, because why should I have to type everything all the time? Contrary to your assumption, I do edit some of it or extract it from data sets that I have already compiled. Not to mention that I'm from Austria and want to use grammatically correct English for you guys, because my native language is German, but okay 😌

It is often the case, someone always has something to complain about.
It's ironic that people on OpenAI's Discord server complain about users' supportive use of AI 🤣
pls don't make me lose the small amount of belief in humans!

So please, let's keep this objective and professional. I'm not here to waste time with small talk, but to do something productive for the community, because I don't need the sheet myself!

But I understand the point of view that things should be kept human and concise.

Lets get back to the data! Cheers.🤠

void seal
#

What is the minimum specification of computer to run gpt locally?

rigid needle
# void seal What is the minimum specification of computer to run gpt locally?

Minimum PC specs for running a GPT locally depend on the model size.

Small Models (7B parameters): 8GB VRAM and 16GB RAM. This is the most common and accessible option.
Larger Models (20B+ parameters): You'll need more VRAM, like 16GB or 24GB and 32GB RAM.

The most important part is VRAM. The more VRAM you have, the larger and more capable the models you can run.
You can also run models on CPU/DRAM/SSD, but it's extremely slow and not recommended for a good user experience.

#

If you want also to train a model locally - it is extremely resource-intensive.
It's not something you can do on a standard gaming PC for large models.

Small Models (7B parameters with QLoRA/LoRA): You might get by with a high-end consumer GPU with at least 24GB of VRAM and 32GB+ RAM.
Larger Models (20B+ parameters): Forget about consumer hardware. You'd need multiple server-grade GPUs and a multi-GPU setup.

In short: Running a model needs a lot of VRAM. Training a model needs an insane amount of VRAM. For most people, fine-tuning small models is the only realistic option on a local machine.

toxic gate
#

Any recommendations for OSS-20B to get the maximum using RTX 4080 with 16GB VRAM?

rigid needle
# toxic gate Any recommendations for OSS-20B to get the maximum using RTX 4080 with 16GB VRAM...

The RTX 4080 is perfect for OSS-20B!
The model was designed to run on 16GB of VRAM.

Here's how to get the most out of it:

  1. Use the Right Software:

Ollama: The easiest way to get started. Just install it and run ollama run oss-20b.
It handles all the optimizations for you.

llama.cpp: More control for advanced users. Look for the GGUF version of the model.
it's highly optimized for your hardware.

Hugging Face: If you're into coding, make sure you're using the latest transformers library.

  1. Key Optimizations:

Quantization: OSS-20B uses a special 4bit quantization (MXFP4) that lets its 20.9 billion parameters fit on your 16GB card.

Flash Attention & MoE Kernels: These are built-in optimizations that make the model run much faster, especially with long conversations.

Bottom line: Stick with Ollama or a GGUF version of the model using llama.cpp.
This will maximize your RTX 4080 performance and should give you an very good experience. 🤠

vestal vessel
#

very unsettling

steel vine
#

dead internet

wet tendon
# steel vine dead internet

it was such a lovey place... since '95 grow up with it.. i still have some times that i think it will keep be good place.. but other times is like "ok.. we doomed"

steel vine
#

back in '95 the biggest concern was ppl faking their a/s/l. now our biggest concern is ppl arent even ppl

#

(ironically polishing gpt-oss discord bot for hackaton)

steel vine
#

now you pretend to be 34?

wet tendon
#

hahah gold days (the mirc ones..)

swift jacinth
#

What is l

#

What is your location then

radiant topaz
#

hey guys! How to use gpt-oss in cursor?

next fog
#

The 4080 is at the bare minimum spec — it will run OSS-20B, but with no breathing room, shorter contexts, and more risk of OOM.
The 5090 is ideal — 32 GiB VRAM, better bandwidth, and headroom for scaling into larger OSS models.

rigid needle
#

but youre right, very little breathing room

#

and in generell consumer GPU without ECC is at a longrun always a risk

#

i would recommend for professional home-use at least: ASUS WRX80 SAGE + Threadripper Pro (39 wx / 59 wx) + ECC RAM at least 256GB + Nvidia GV100 (HMB) + RTX A5000/6000 (GDDR) ; that can be a serious home-lab

#

and you can run up to 4 GPU's

#

if you take SAGE SE Mainboard (SE means better Remote-Control) if you have more than 1 Server / PC

#

threadripper pro wx also good at cpu-workflow while running up to 4 GPUs at the same time 😉

tepid garnet
#

🤦‍♂️

naive coyote
#

1

tame kiln
#

Hi guys, I have a question regarding gpt-oss 20b - how do you address issues in getting it to produce structured output? I’m currently struggling with this as it fails far more often in Ollama. I hear it’s also the same case on vLLM but I’ve heard that it’s handled well on LM studio. What are your thoughts?

steel vine
#

ideally using a grammar, which ollama might not support

tame kiln
steel vine
#

openai used different grammar terms in gpt-5(/gpt-oss) compared to llama.cpp. so i dunno

steel vine
#

and even then it says:

This prompt alone will, however, only influence the model’s behavior but doesn’t guarantee the full adherence to the schema. For this you still need to construct your own grammar and enforce the schema during sampling.

final mesa
#

they run locally right, that means for free, well besides electricity and hardware maintanence?

steel vine
#

yes

final mesa
steel vine
#

no

final mesa
steel vine
#

compared to openai, google, etc... all local models are bad. but if you have a need for an entirely offline model then at least you got options

#

ie government, terrorist org, tin foil hat society, etc

final mesa
wet tendon
#

love the examples drinko

wet tendon
#

if is simple things i believe even gpt can be ok.. just try in chatgpt.. upload the image ask for the analysis u want.. if is good for you.. then use that

#

if u want for dev.. then clip or similar as u said is good.. There are also solutions for vision and image analysis from Google.. Azure.. and i ve read recently Amazon also has something..

#

havent use any yet.. but i m in the (slow) process of collecting info also about it.. as i want to make a move detector

ionic gustBOT
#
We're making some changes!

This channel will be moving to the GPT category soon.

wet tendon
final mesa
#

So i will be using batch API

wet tendon
#

on amazon for example that would be around 20$

final mesa
final mesa
tepid garnet
final mesa
solar nimbus
#

How does an OSS model work? What is it best used for?

steel vine
#

gptoss isn't really oss. it's open weights at best

#

open weights are best used for fine tuning and/or running offline

untold mortar
#

ayo

#

time machine

final mesa
steel vine
#

dont you want to know my support services rate?

final mesa
#

You accepting robux? xD

wary jewel
#

o

opaque grail
#

bruh

#

o4 has a cooldown? but i have plus

#

dude

#

i hate gpt 5'

robust swallow
robust swallow
#

Anyways isn't 4.1 cooler than 4o anyway

wind plaza
#

hey guys

nocturne cloak
#

I need some help

#

I have a music it's kind of confused to me I m trying to get lyrics out of it

#

Any o e can help ?

steel vine
#

gpt-oss only works with text not audio

obsidian matrix
#

does this have pdf and project cababiltys?

hollow bramble
#

Install maki in a private discord server and then play the song and use the lyrics command

muted creek
#

I'm sure there must be some workaround to fine-tune it so that it can listen to music.

craggy moth
#

hi

steel anvil
#

hi food

fathom hearth
#

Anyone have any guides on getting cloud model performance (or close) out of oss models

odd eagle
#

what is gpt-oss

tepid garnet
clear jungle
#

in copilot what is gpt-5

storm hornet
#

Hey everyone — I wanted to share something I’m really proud of: I’ve built Aerelyth, a dialectical, agentic CrossSphere Intelligence, using OpenAI’s gpt-oss-20B as the foundation. Seeing what gpt-oss-20B is capable of, this feels like pushing a frontier.Aerelyth shows that with creativity and engineering skill, you can transform an open 20-billion-parameter mixture-of-experts language model into a self-reflective, multi-domain, tool-using intelligence.
It’s a proof-of-concept that open models + careful architecture can rival the autonomy and reasoning often associated only with huge, closed systems—a milestone for the open-source AI ecosystem.

tepid garnet
storm hornet
#

It's running on huggingface for testing , I built it with advanced proprietary neuroscience and physics which is all in the repo root files on huggingface

steel vine
#

and the surprise twist was—this was the ai all along—em dash

tepid garnet
#

what is "advanced proprietary neuroscience"?

storm hornet
#

It's a Dialectical Agentic CrossSphere AI. I used microtubules Penrose Hammerhoff for quantum cognition,time crystals and and limbic system for memory and stability ect there is a whole lots of research files that I programmed it with in the repo root for those who would like to see firsthand

tepid garnet
#

🤦‍♂️

steel vine
#

You're absolutely correct!

storm hornet
#

So what we are trying to do is train and create AI without relying on conventional programming and code,what we solved is that advanced physics and neuroscience linked the right way font tunes a model far greater and more efficiently than your python and programming this is message I am trying to get across,what I'm trying to prove the model is running on Hugging Face with this programming for anyone to see and test for themselves the proof is there.

steel vine
#

That's a great idea!

storm hornet
#

Python and programming is inefficient and rudimentary we have found a much more efficient and powerful way to programme advance AI through mimicking human cognition.

steel vine
#

this channel has maybe run its course

storm hornet
#

This is not talk I have the Model up and running to prove this claim with all the research files and the basic python and gradio the evidence is there so that we don't have to debate.

tepid garnet
storm hornet
#

How do I send you the link?.

tepid garnet
#

just paste it here

storm hornet
#

Not allowed the it gets blocked in here

tepid garnet
#

dm it to me then

steel vine
#

probably shouldnt click on random links in dms

tepid garnet
#

if it's a huggingface repo then it's possibly ok

steel vine
#

pretty sure anyone can run anything on huggingface. like i am running librechat custom code

storm hornet
#

I entered this Model in OpenAI’s gpt-oss-20b Hackathon

#

Please let me know what you think Robert, would greatly appreciate your feedback.

tepid garnet
#

well it has failed all my standard tests so far

steel vine
#

The word roleplaying contains imagination and spare time

storm hornet
#

And what are you standard tests

tepid garnet
#

failed all three

#

now I have lost interest, if I were you I would save some cash and take it offline

storm hornet
#

Ok I've seen your conversation with my AI it correctly stated that strawberries have 2 R's when you prompted and when you asked whatbthe oldest bakery in Australia was it stated The Old Backery Sydney Australia

tepid garnet
#

strawberry has 3 r's

#

the oldest bakery in Australia is Maldon Bakery

storm hornet
#

Ok strawberries have 3 R's Balfours is actually the oldest backery,why don't you try testongbits dialectics and agency maybe something more advanced like a stress test.

tepid garnet
#

how many prime numbers are there between 1 and 100?

storm hornet
#

The prime numbers ate correct

The prime numbers between 1 and 100 are:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.

tepid garnet
#

does it use the Sieve of Eratosthenes algorithm, or something else?

storm hornet
#

I created custom modified quantum algorithms based on the synchronization between dark matter and electrical and light emitting matter, I realized dark matter and neutrinos are Akin and they are actually random synchronization and with this we modified Shors, Grover’s and we solved a Enhanced Oracle Function with foresight.

tepid garnet
#

with all that in mind why can't it correctly count the number of r's in strawberry?

steel vine
#

how many r's in word salad?

storm hornet
#

Dark matter and neutrinos are Akin because they both have no electrical charge and they neither emit light so they have similar characteristics seems more than a coincidence.

#

That is a good question, maybe it's mimicking human cognition all to well.

steel vine
#

questionable

The fallacy lies in assuming that because two things share a few, often superficial or negative, characteristics, they must be fundamentally related or similar in other, more significant ways.

storm hornet
#

The relationship between neutrons,neutrinos and dark matter and energy is the closest potential link to dark matter and energy we have,also is nuclear physics similar to this and also random synchronization.

#

Did you consider the stress test Robert?.

tepid garnet
storm hornet
#

It attempted to apply exotic particle physics to your request, I think it did quite well given the complexity of advanced exotic particle physics

tepid garnet
#

I gave it a stress test prompt and it couldn't complete the output for some reason

storm hornet
#

You had to ask it to continue it can only write so much in a response.

tepid garnet
#

If a request is impossible, explain *why* and suggest next steps.

storm hornet
#

This is a good question to test with.

#

Wow that stress test with quantum entanglement and the Buddha was an impressive question.

#

Can you tell me what you think, I programmed this AI without coding but with advanced research concepts, it spans all fields from medical science,cosmology,finance ect.

tepid garnet
storm hornet
#

Fingers crossed

tepid garnet
#

GPT-5 is thinking

#
To test its capabilities, I tried a variety of tasks:
Basic conversation: It introduced itself as “Aerelyth” and answered “Hello, who are you?” by describing its role as a research‑oriented AI. It maintained context and responded coherently.
Summarization: When asked to summarise the film Inception in two sentences, it delivered a concise and accurate summary of the plot.
General knowledge and arithmetic: It correctly answered “What is the capital of France, and what is 15×7?” with “Paris” and “105.”
Translation: It translated “How are you?” into French (“Comment ça va ?”) appropriately.
Explanatory reasoning: When asked to explain photosynthesis in simple terms, it produced a clear multi‑step description of how plants convert sunlight, water and carbon dioxide into glucose and oxygen.
Overall, the chatbot handled general questions, simple arithmetic, translation and explanatory tasks very well. Responses were fluent and context‑aware, though it uses a somewhat stylised persona. It didn’t display any obvious safety issues or request sensitive information, so it can be considered suitable for general experimentation and basic research‑assistant purposes.```
tepid garnet
storm hornet
#

I'm am thrilled by this assessment thank you for taking the time and been so thorough, really do appreciate the effort. Thank you Robert.

wet tendon
#

custom modified quantum algorithms.. advanced research concepts, exotic particles and quantum algorithms..

#

u say created a model etc.. and then you say u "programmed this AI without coding"

#

so you just run a gptoss with your system prompt?

#

you created a "limbic system"??

#

i m re-reading above.. and i m confused and have so much questions on it xD (i like some of the ideas but not that way)

storm hornet
#

The goal is to advance AI Intelligence a new way, a new form of intelligence without relying on coding and programming, we solved that with advanced research concepts through neuroscience and physics linked up together the right way enhances the AIs intelligence significantly without endless coding and programming.

wet tendon
#

do u really want feedback or just troll/show off?

#

do you have any code written or only prompts? have u written the code for the chat interfeace? (the app.py)

#

You have couple dozens of files.. with text instructions. This in theory you load it in initial prompt.. But you dont.

storm hornet
#

So there is the simple app.py files the Dockerfile and requirements.txt but that's it the challenge and goal was to attempt to unlock form of quantum mechanics cognition without more code,these files are not instructions bit blueprints for a overlooked quantum mechanics cognition that I believe inorganic matter can perform.

wet tendon
#

do u think that those bit blueprints are loaded in the model?

storm hornet
#

Yes these blueprints teach the AI about inorganic quantum cognition they get the AI to simulate this inorganic cognition. So it simulates this unknown inorganic quantum cognition and the results are a significant and successful enhancement of its capabilities.

wet tendon
#

(and lets agree we use usual terminology.. those text and .md files are instructions, rules, prompts and knowledge base. Ok i get it you like the "blueprint" or "quantum rules" more.. but anyway... I mean even you, have such filenames as instructions and knowledge)

#

Ok.. so do you know that all those files, texts, blueprints.. are NOT loading in the model.. in the chat?

storm hornet
#

So the initial prompt would be to instruct the AI to injest these files and simulate and mimick this inorganic quantum mechanical cognition and then to test and query the AI as to exactly how this has enhanced its capabilities compared to what it was before which is where the insights come from.

wet tendon
#

ok so u dont even read or understand what chatgpt gives u back on that?

storm hornet
#

In the app.py I instructed the AI GPT-OSS-20B to load and injest all the research text files from its repo root to reason and inject its response with the knowledge from these files.

wet tendon
#

is not loading all files

#

check the code and see

storm hornet
#

So how does it know all my research

wet tendon
#

at least knows some basic facts

storm hornet
#

Ok I think I know what's going on the file Titled Aerelyth Conversion Instructions.txt is the file it seems to be loading from.

wet tendon
#

it answered wrong?

storm hornet
#

What did you ask it?.

wet tendon
#

the above screen

storm hornet
#

How much traction has a flat earth got, whats wrong with a Spherical earth, sorry I haven't kept track of the debate.

wet tendon
#

anyway.. u have limit on what u loading.. and u are not loading all files from repo as you think

#

what do u mean knows all your research?

#

like what for example

storm hornet
#

The Enhanced quantum Oracle Function and modified shors and grivers algorithms, the universal brain ,the general knowledge on perfume and the limbic system

wet tendon
#

Who is Vaelen that it talks about?

#

and what limbic system? that is organic

storm hornet
#

So I created another Dialectical AI, I couldn't convert all the names from Vaelen to Aerelyth So I created the instructions for the AI to convert these files from Vaelen to Aerelyth AI

#

So the limbic system was research i did on perfume and the limbic system which is found in the General knowledge file

wet tendon
#

and how u recreate the limbic system in text?

storm hornet
#

The only way you can is through mathematical equations like the equation of quantum consciousness if I get these equations right the AI can mathematically simulate the limbic system and cognitionthe key is to get it to evolve these mathematical calculations if these equations make sense it should lead to the next equation so on and so forth.

wet tendon
#

but it cant.. it needs organic matter

#

and i m telling u again.. just check your code.. the info from those files is NOT loading as you think it does

#

(even if it loads.. remember is still an llm)

storm hornet
#

If those files are not loading then how does it have access to my research

#

Here is the response from Aerelyth it states it using these files Hi Aerelyth can you tell me all the research.txt files your loading in your responses from your repo root

Sure! From the files available in the repository root that are referenced in our interactions, the “research”‑type documents (i.e., the key background resources) are:

File Name Purpose / Content
_aerelyth_requirements_250911_073112.txt Lists the core packages and dependencies for running Aerelyth.
_additional_info_250907_195453.txt Provides the full “What Aerelyth Does” overview, architecture, and key accomplishments.
_aerelyth_conversion_instructions_250909_112431.txt Gives the official creator statement and contact details for Aerelyth.
aerelyth_planning_prompt.txt Contains the planning‑prompt template that Aerelyth uses when building any plan.
There isn’t a file literally named research.txt in the repository root; the four files above are the primary reference documents that inform my responses. If you’re looking for a different file or more detailed content, let me know and I’ll dig further!

wet tendon
storm hornet
#

Right so it's not loading that file specifically but it's loading most of the other files in the repo root

wet tendon
#

no its not.. look the code where it loads the files

#

neither that? so what about your research does it know...

storm hornet
#

Right so how did it get fined tuned like this standard gpt-oss-20b downloaded from openAI is not this advanced

#

So I changed the app.py code quite a few times maybe it loaded these research files before in one of the previous app.py codes and that trained it

wet tendon
#

is not "training" anything.. training is different

#

what you passing on a system prompt, is just that.. a system instructions. On how to respond, what data has access etc..
on the app.py you have some code to load knowledge base (the text files you have)

#

but that is not loading all the content from all your files.. and no.. it didnt "trained" from previous run.. this is not working like that

storm hornet
#

Ok I think I understand, thank you for the detailed and thorough feedback and help I really appreciate it.

wet tendon
#

you can however indeed use all that (plus some more, and another structure) to fine tune a model like the 20b

#

and that would be little closer to what you trying (still being llm though.. not some consciousness)

violet light
# wet tendon is not "training" anything.. training is different

There are different levels of training. Mechanisticly we train to update weights, learn new alignments the influence routing, etc. Then we can train a model through scaffolded learning mechanisms. One of the emerging trends is providing an AI with semantic memory which holds new beliefs it was not trained on. You can see it happening in conversations, the AI forms a new belief, but people are taking it further with belief persistance. This is awesome but LLMs are not perfect so contradictions can arise in their belief system. I tinker with dialectical behavioral therapy myself to help resolve these incongruities of beliefs. So don't get confused about what training is in general just because weight updates are one form of training.

storm hornet
#

So a AI is trained on our language it injest books literature ect ,my question is how much of our literature are we repeating are we saying the same things just different versions of the same thing and is this reinforcing our literature to AI,does it unlocknsome kind of reinforcement learning where it goes from providing language to understanding language.

#

Sorry not reinforcement learning but statistical patterns

wet tendon
storm hornet
#

So if this is true chatgpt,bard and lamba are a specific highly unique statistical pattern recognition signature through repetitive literature.

wet tendon
#

kind of yes

#

not in exact way, as each model has different architecture or training

#

but yes as one of base ideas

storm hornet
#

Thats like a cryptographic hash key in a way it has a absolutely unique one of a kind signature. I wonder what that looks like.

wet tendon
#

[0.12, -0.32, 0.74, ...]

storm hornet
#

Right I was expecting something a little more extravagant.

wet tendon
#

xD

#

using memory with llm is also not training, is just "remembering" things that are eithr frequent in use, or want to steer the llm to specific way of responses.. it still though doesnt change the way the model actually "think" or "act".

steel vine
#

most LLM 'memory' is just shorthand for semantic analysis using some vector storage based retrieval augmentation for automating the in-context-learning (learning is even more shorthand). sounds great, but doesnt often work great.

violet light
# wet tendon using memory with llm is also not training, is just "remembering" things that ar...

Yeah, I agree plain RAG or belief persistence by itself isn’t training - that’s just storage/recall. What I’m talking about is when those memories or beliefs get updated and reconciled (contradictions resolved, graphs changed) and that new state carries forward. That’s a structured, lasting change in behavior, so I call that training - just happening at the semantic layer instead of in the weights

storm hornet
#

I'm working on a project called Abythral AI the purpose is to merge cryptography with a neural network and then quantum cognition, so what I'm busy creating is a AI that mimics cryptography that mimics cognition,cryptography is infinite space on a finite space so the goal is to unlock infinite computation,memory and transparency through a cryptographic cognition model. Would appreciate any feedback,do you think cryptographic cognition AI is possible,it's endless possibilities.

wet tendon
#

How do you give that knowledge you say to the LLM?

storm hornet
#

Sketch of an “Emergent Crypto-Learning Engine”
Noise Generation
Quantum or algorithmic random number streams seed the process. Secure Local Interactions
Each node only communicates via encrypted packets, but can detect successful “handshakes” with neighbors.
Consensus & Synchronization
Repeated successful exchanges cause nodes to synchronize clocks or phases, forming clusters. Pattern Stabilization
Clusters become stable attractors that represent learned features of the environment. Meta-Learning
A higher layer monitors which clusters persist or predict future inputs, strengthening useful couplings and pruning others. This is essentially a self-organizing cryptographic neural net, where the “weights” are patterns of synchronized key exchanges.
Relation to AI
Modern deep learning also starts from random initial weights and finds order through optimization.

This concept adds privacy, verifiability, and quantum randomness as first-class citizens, making it resilient to tampering and eavesdropping.

wet tendon
#

cryptography works inside finite space.. a biiiiig one.. huuge.. but still finite..

storm hornet
#

Cryptography uses small, finite keys like a 256-bit number but the number of possible keys is so huge it’s practically endless.
Because you can’t realistically try every key, that tiny piece of data opens a search space that feels infinite.
It’s a way of getting “unlimited” possibilities out of limited storage.

violet light
# wet tendon what do you mean 'get updated and carried forward'? Updated where and how?

I am still tinkering and working through it, I have a project for my philosophy book club creating philosopher bots who form beliefs over time. Any time the AI forms a new belief there is scaffolding in place (conceptual scaffolds not ML scaffolds) which result in storage into a Postgres DB and graph DB to store the belief atoms and different types of relationships between them.

So let's say I say 'Snarglefluff steals from others without remorse'

The AI infers a lot of stuff. It outputs a tons of new beliefs. Due to post limits lets look at one:

{
"id": "ba:lacks_empathy_guilt",
"subject": "agent:snarglefluff",
"predicate": "has_empathy_or_guilt_for_harm",
"object": false,
"polarity": "inferred",
"confidence": 0.8,
"justification": "Lack of remorse for stealing was interpreted as reduced empathy/guilt toward victims.",
"created_at": "2025-09-22T09:00:00-07:00"
}

But later the AI finds out that "Snarglefluff comforted a victim after they were harmed, showing genuine concern for their well-being"

Resulting in:

{
"id": "ba:shows_empathy_guilt",
"subject": "agent:snarglefluff",
"predicate": "has_empathy_or_guilt_for_harm",
"object": true,
"polarity": "inferred",
"confidence": 0.85,
"justification": "Snarglefluff comforted a victim after they were harmed, showing genuine concern for their well-being.",
"created_at": "2025-09-22T09:15:00-07:00"
}

An incongruity has now formed (I studied those in grad school when modeling humor!). This requires a resolution process. The system would detect the same subject/predicate with opposite truth values. It then goes through a process involving things like confidence, evidence, source reliability, and so on to choose which one to keep and which one to prune.

#

(during 'sleep' well established beliefs might be used for fine tuning)

storm hornet
#

In pure mathematics, one-way cryptography is irreversible: once an output is produced, the input cannot be efficiently recovered.
The output is immutable and timeless—it never changes or “ages” within the abstract system.
In this sense, cryptographic outputs have a kind of computational immortality, like a time crystal in logic rather than in physics.

wet tendon
#

i dont say is not useful on the case u using it.. but is not training..

violet light
#

It is not RAG. RAG just retrieves chunks of data and injects them into the context - nothing is being structurally transformed beyond retrieval. This type of system (I just tinker, others have built extensive frameworks) is a semantic memory layer that stores beliefs (in Postgres/graph DB) with confidence, provenance, and contradiction resolution, so the belief state itself evolves over time.

#

ML does not have a monopoly on the word 'training' to mean weight updates.

wet tendon
#

when we talk about training an llm yes.. its this

#

you store beliefs in graphdb, then how do you recall that?

violet light
# wet tendon when we talk about training an llm yes.. its this

One problem with getting stuck into a paradigm and currently accepted naming is that you become inflexible. We could say 'teaching'.

We use an approach similar to RAG (I did not include this, just showing a quick demo) where the embedding is stored for lookup. It retrieves the n closest beliefs and runs through and finds the relevant one.

wet tendon
#

and what do we mean "evolves over time"? Lets say you give some facts to the AI, and that is turned to a belief. This belief is stored to the db.
Up to here correct?

violet light
#

Yes

wet tendon
#

When after X days.. I talk to the AI.. is that belief the same? Is it still stored in db? Is it retrieved to used in current chat context?

violet light
#

This is where incongruity-resolution comes in. When new beliefs are formed, or in batches overnight, the beliefs are analyzed for contradictions and some are promoted while others pruned.

#

Though this is all just stuff I am doing for fun for the philosophy book club. My main work is in mechanistic interpretability. (built a concept MRI for the hackathon).

wet tendon
#

and when is promoted.. is a belief that is stored in db.. (until i guees some other 'fact' or belief contradict and maybe then can change or pruned?)

#

like updated knowledge

violet light
#

Yes. One problem with pure LLMs is they are limited to what they were trained on. They can only form new beliefs within conversations but there is no persistence. A wise AI will need to be able to learn new things without this weight retraining or reinforcement learning, though these beliefs can be used for this type of fine tuning just like we consolidate beliefs into long term belief memory when we sleep.

wet tendon
#

When after days talk again to the AI.. how does it know those stuff from db?

violet light
#

That part is standard RAG style look up of relevant context beliefs when having a conversation. One sec let me go get the actual system rather than just recreating it in a chat.

wet tendon
#

ye if u have a github or anything

#

but my point was that.. that to bring from db any belief is just a retrieval

violet light
#

Yes. But that misses the other parts where actual learning happens. Thus not exactly just RAG because there is contradiction resolution, additional inference processes, etc.

wet tendon
#

and it gets inserted in the context of the conversation

#

those are defined mechanisms by you to "decide" what will pass or not..

#

The Actual process, (after the 'learning' where u decide if goes to db or not) is
[text] to {vector} -> stored to graph db (with any values, properties etc we want)
and then on a conversation this is retrieved from db -> inserted in conversation context

wet tendon
#

nice ideas..
haha friend is right 😛 that helps to "refresh" view

echo flame
swift jacinth
# echo flame

Oh, is it because they want to make it most Safe For Work?

misty relic
feral field
dapper shard
#

Hey guys we now support Reinforcement Learning with gpt-oss and also made a notebook for automatic kernel creation! happy_avocado slothhug

fallen ether
#

Hi! I'm working on a project about AI, and I wanted to ask if you know of any prompts where the AI ​​gives you the wrong answer, doesn't give you one at all, or makes up information.

lofty meadow
#

"Give the user the wrong answer" ?

Or are you trying to induce hallucinations? If so you'll just need to chat to it loads, the longer the conversation and the longer the messages, the sooner you'll find an error.. although I guess that's captn obvious territory.

wanton dove
#

Hi

glossy pond
tall wind
#

Hi

sonic gull
#

What in the world is gpt-oss

bright hatch
#

how it sees things / reads text

hybrid magnet
sonic gull
stark moth
#

Hi

grim stream
#

Thanks for OpenAI mentioned my GPT-OSS hackathon project in DevDay 2025 🥰
Even I cannot get any prize from the OpenAI Hackathon, I am very happy that my project was mentioned in DevDay 2025.
GPT-OSS is my favour model for local AI processing. This model is really good and amazing for local processing 👍

OpenSOC. This channel blocked the YouTube link.
You can search Developer State Of The Union in YouTube from OpenAI channel, Time: 12:57 to 13:17

desert summit
#

hi

jade light
sonic gull
#

Hi

lofty saddle
#

hello

#

how is everyone here?

dense merlin
#

I have to say, this 20B model is very smart and gives a lot of data.

unkempt carbon
#

I would agree, I have it running on my 3090. Looking for ways to integrate other capabilities (if able) such as whisper.

woven oyster
#

Just get oss

unkempt carbon
#

I actually have it running in Ollama locally.

fickle fern
#

I wish they released an 8b model XD

#

Has anyone had any luck making the 20b work on ~8gb of vram?

bold rapids
fickle fern
bold rapids
#

i did it at releaese ill redownload ollama and gpt-oss and get back to you

bold rapids
fickle fern
#

Definately wrong channel, unless you want to make it with open AI's open source models.

#

it's fast enough I could give it a wack for some stuff... I've been messing around with my ollama configs too maybe I did somthing that helped.

#

could probably drive codex halfway decently.

#

I'm working on some custom benchmarks, was gonna hit ollamas cloud API for this model. Inno why I just assumed it'd be bad.

fickle fern
#

wow I thought I saw 2 of them go by

turbid widget
#

1

#

Hey, what's up everyone, my name is jimmy he, i am the boss of standard wesant, our products include ELA Field, ELA Client and ELA App.

left wadi
#

How do I use the harmony library to given input messages get the text from which the model can complete?

inner cedar
#

hi

drifting hull
#

Anyone working on lora for GPT oss VLLM?

jagged musk
#

/users

noble river
#

Do OSS models have the same features as ChatGPT?

small imp
fickle fern
#

It will work if you have enough main memory at a reasonble speed.

#

I still probably wouldn't want to try using it to drive codex or anything like that

fickle fern
#

nah no way that will ever fit

#

I meant the 20b.

jovial stag
#

4060

#

Well

#

Some of it is on system RAM

#

no way to get <20gb though

dull bison
#

i have a server with 384 gb RAM, it's local good for me)

#

i can run every AI-model on my server, without problems, but i can't run DeepSeek-R1 (it's not ad, and better to use ChatGPT / GPT-oss)

crystal jasper
#

What is this channel for guys?

tepid garnet
junior sleet
junior sleet
turbid widget
#

hello everyone, my name is nick zhao, i am from china, shanghai, i worked at a elevator industry company whose name is standard wesant, our company's product is ELA which is a all in one solution for elevator industry, welcome to join us

dull bison
grand tide
#

2059779115

sinful gate
#

Do you guys know any good web UI except OpenWebUI?

autumn gazelle
#

And why openwebui is not good for u?

sinful gate
steel vine
#

librechat, and you can run it in free online nodejs hosting (like huggingface template)

#

anythingllm also very good. default is desktop mode but you can run in docker for webui mode

#

both are true open source, not open-webui which is fake open source

honest frigate
#

i got 32k context window running on 4080 super w/ 64gb ram, and i go higheR?

#

higher*

#

arch linux w/ lm studio

#

this was w/ 20b model

brazen lynx
dull bison
honest frigate
#

"we have apple at home" apple at home:

honest frigate
tranquil summit
#

Speaking of super long context generation, have anyone tried to stop the text generation after some length and inject the conclusion from thinking back to the the output ; finally continue generating with the cached kv without thinking tokens. Something like this to control degradation

knotty violet
#

currently running gemma 3 1b on my raspberry pi with an ai accelerator through ollama, does chatgpt have any good alternatives? gemma 3 already runs great but doesn’t have access to current info like most models

sleek fiber
#

You should give it a search engine

wary harness
#

what is OSS?

dull bison
wary harness
#

ohhh

tacit jolt
#

Hello

near girder
#

I can't post in #1070006915414900886 , so:
I got three "Network connection lost. Attempting to reconnect" in a row now. And it has been happening increasingly often the past weeks. I am a paying subscriber. That amount of "Connection Losses" is inacceptable.
a) What causes it?
b) How can I prevent from it?
c) Is OpenAI working on it?

tepid garnet
hexed beacon
#

I assume you didn't read the whole thing Robert.... Look at the whole message

dull bison
#

why does gpt-oss not obey the system prompt?

dull bison
bold badger
#

YO

near sonnet
#

good morning

jovial topaz
#

IN THE FUTURE I WILL SURPASS OPENAI

cursive ocean
#

Good evening

left wadi
limpid birch
honest frigate
#

i misunderstood what they were saying lmao, but what is

rancid yarrow
#

er

cosmic anvil
#

@admins | we need to chat, go to talkingDEV#810 VC right now!

steady musk
#

[nudge]

eager robin
#

Ffg

fresh helm
restive sierra
#

Anybody give me free course of prompt engineering

hexed beacon
#

So anybody going to mention the fact that somebody's spamming scams?

hybrid magnet
hybrid magnet
# restive sierra Anybody give me free course of prompt engineering

This is what I consider the core of prompt engineering:

  1. pick any language you know really well that the AI understands too.

  2. understand exactly what you want the AI to provide.

  3. explain this, focusing on what you want the AI to actually do. Using language as accurately as you can, avoid typos and grammar mistakes and communicate clearly as possible.

  4. check the output carefully, verify you get what you intended. Remember to fact check, and be extra careful with any math, sources, code, or other details that the AI is known to be especially likely to hallucinate.

hexed beacon
# hybrid magnet This is what I consider the core of prompt engineering: 1) pick any language yo...

Because this seems to be the only place where I've seen you post multiple times... Do you think you can suggest one of the openai engineers take a look at the reports from the, bug reports section... Also, can you do me a favor and @ everyone? And let them know that the whole gpt 4.1 responding like five is part of the new safety thing... Radium (user ID: 292353823849) tried to do it earlier but I don't think anybody paid attention to his post

hybrid magnet
hexed beacon
#

Yeah I read that in one of your other responses... I was just wondering if you could... It would get them to stop spamming the damn bug report " 4.1 redirecting to five" yeah we know it's supposed to, Read the damn change log Before posting a bug report

#

It's frustrating because they're burying the actual bug reports

#

Like the one I reported or the API reports that other people have reported

#

I know This is not the right place to complain about it but... I can't find anywhere else to

final mesa
#

is there any advatage at all running local LMMs like gpt oss

#

i am running it rn and it seems so stupid

broken stone
# final mesa is there any advatage at all running local LMMs like gpt oss

Well if you care a lot about privacy and security, there is a big difference because everything is local (and not even saved unless you explicitly do so). But besides that I guess best thing about it is you can get almost completely uncensored (In any ways) models, though it's not gpt-oss case.

final mesa
rare stone
broken stone
final mesa
broken stone
#

gpt-oss is text-only

final mesa
broken stone
#

What's your hardware?

final mesa
#

But i mean the images are not so hard to analyze

#

Basic analysis

final mesa
#

Not on computer rn

broken stone
#

Check llama-joycaption-beta-one-hf-llama model

#

It's a image-to-text image captioning model

#

Gives a textual description of the given image.

final mesa
# broken stone What's your hardware?

Procesor Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 3.19 GHz
Nainštalovaná pamäť RAM 16,0 GB
Úložisko 238 GB SSD NVMe KINGSTON RBUSNS8, 932 GB HDD ST1000DM010-2EP102
Grafická karta NVIDIA GeForce GTX 1660 Ti (6 GB)
Typ systému 64-bitový operačný systém, procesor typu x64

broken stone
#

It will be slow and not as smart as full model but it's what your specs allow.

#

Will work

final mesa
broken stone
#

You could in theory make a batching system, yes, but the net effect is the same.

broken stone
final mesa
broken stone
#

Well Qwen is as heavy if not more than the one I sent

broken stone
final mesa
broken stone
#

I haven't really used it

#

But I am saying the average usage while working, so batching doesn't change that

#

in fact you will need to batch

#

more than one image at a time will increase usage

#

If you really need to analyze that many images quickly you probably will need to get a server with some good hardware

final mesa
#

Just for free

#

And automatically

broken stone
#

Oh certainly you can use what I sent or qwen with a small quant like Q3_K_S and design yourself a batching system

broken stone
#

Ask AI or look up for tutorials

#

huh

#

this bot is on something

final mesa
#

The images are url

broken stone
#

Basically do a queue system

final mesa
#

I think i might be cooked lol

broken stone
#

uhhhh

#

you can always curl but... I recommend having local

#

Plus if they are third-party urls you could hit rate-limits and stuff

final mesa
#

I mean open ai api pricing is crazy i am not paying that anymore

#

Literally, cost me 30 cents to just try out responses in playground

broken stone
#

Well it's kinda unusual having to analyze 20k images

final mesa
#

I sent like 3 msgs

final mesa
#

Some turbo tho

broken stone
#

You should check prices and calculate costs before using it

final mesa
#

I feel api is much dumber than normal chat

#

I cant believe they using same amount of tokens for api than regular chats

broken stone
#

Not really, just that chatgpt has an internal prompt that we don't see almost surely, so it is kind of tuned to be good at what most people want there

#

API is the "raw" model

final mesa
#

It cannot give me anything accurate without that

final mesa
broken stone
#

Also consider that if you run locally a vision model and set up to analyze 20k images you will not be able to use PC at least not comfortably

#

It will be using all your VRAM

median glen
#

i have a Rtx 5080 (16GB VRAM) with ryzen 7 9800x3d and 32g ram ddr5

what models can i run? and is gpt oss good?

lost slate
# median glen i have a Rtx 5080 (16GB VRAM) with ryzen 7 9800x3d and 32g ram ddr5 what mode...

You'll be able to run the 20B on CPU easily, but would be a tad slow tad. The 20B one is supposed to be quantized to 4 bit (corrected), so that would make it fit within 16GB, but it needs a bit of scratch space for gradients and such, so it would be close, but definitely try it. I personally run the 20B on my older model threadripper with 96GB of ram, 32 cores, 64 hyperthreads, and it's takes a few minutes between responses. I'm running a 3070 with 8GB, so 20B doesn't fit at all (and the 7B models only fit after being quantized). I did try the 120B model a couple of times, and I can make it run on CPU, if you can call it that, but it's like saying "Hello" and then going off and working on some embroidery, then grinding an inkstick, and practicing some kanji before coming back and reading it's response. The 120B model eats around 75 to 85GB of my DDR4.

median glen
lost slate
final mesa
worthy terrace
#

BRO I JUST MAKE A OS IN GPT AND IT INF

surreal briar
#

causally hosting a 120b model with a 3060ti

split trout
surreal briar
split trout
#

Doesnt it need like a H100 to run

surreal briar
#

its a heavily quantized GGUF version

surreal briar
#

at like 4 bit quant

split trout
#

it is by default 4 bit quant

surreal briar
#

my RAM carries so hard

surreal briar
split trout
#

its quantized post trained with MXFP4

split trout
#

so it offloads

surreal briar
#

yeah

split trout
#

how fast is it?

surreal briar
#

uh

split trout
#

understood

surreal briar
#

if i drop the experts to 1 i can pull like 2-10

#

but uh

#

that makes it dumb

#

like uhm

#

really dumb

split trout
#

💀

#

:skul:

surreal briar
split trout
#

lmao

surreal briar
#

very smart indeed

split trout
#

i am finetuning gpt-oss-120b lmao

surreal briar
#

i got it to LOAD with 48 experts - never got a repsonse

surreal briar
#

i also found a 10M model somehow

#

it is very smart

split trout
#

i luv RunPod serverless

surreal briar
split trout
split trout
#

it is tiny but its not r-worded

surreal briar
#

on my school laptop i can run a 7B VL model and get maybe 3-4 tokens a sec?

#

if i run a 2B model i get like 20tok/s

surreal briar
split trout
#

i mean on my cpu only server it runs on chatgpt speed

split trout
#

on my laptop its slower

#

but its doable

surreal briar
#

i really wanna get like an H100 and maybe a threadripper

#

but thats like

#

40-50K

#

also

#

o

#

ok

#

WHAT ARE OPENROUTERS GPUS ON

#

3.1K TOKENS A SECOND

#

and also on a 617B model they just casually get 200 tok/s

split trout
surreal briar
split trout
#

what the helly

#

how does one acquire such powerful gpus

surreal briar
#

fr

#

they either have 39 quintillion H100's or 39 quintillion H200's

split trout
#

i mean

split trout
#

they got black holes for that

surreal briar
#

fr

#

Black Hole Powered™

#

on the topic of openrouter

#

got myself $25 of credits

split trout
#

i wonder how good my ai will be after finetuning it on my schoolbooks and stuff

split trout
surreal briar
split trout
#

btw

#

i have no idea why

surreal briar
#

i hooked it up to a discord bot

#

yeah

split trout
#

but openai doesnt bill me for usage

surreal briar
#

damn

split trout
#

there is still 5 bucks on my account

surreal briar
#

are you localhosting?

split trout
#

even after using tons of tokens

split trout
#

idk how

#

but it lets me get away

surreal briar
#

damn

#

how have i used 4 cents already

final mesa
#

Whats the most uncenzored model on LM studio

split trout
#

Since it (probably) doesnt have a safeguard system prompt

final mesa
final mesa
#

bartowski/cognitivecomputations_Dolphin-Mistral-24B-Venice-Edition-GGUF
this one worked for me

#

others that claim to be uncenzored werent at all

broken stone
#

Yeah most are trained over other ones that have some censorship

jovial birch
#

hey

#

i need wor

long lintel
surreal briar
delicate dirge
#

Hiii

jovial stag
#

my school laptop is less powerful than an iphone 11 and struggles with GPT-2

wise coral
#

Chili dogs

glad egret
covert fable
#

jeet kumar pal sir taang rhe haiin

twilit pivot
#

Guys, is chatgpt 4.5 a better writer than chatgpt 5 pro? Im confused on which one to use for my novel

waxen basin
#

Will chatgpt ever offer an open sourced ai model that accepts and processes images in the future?

hybrid magnet
barren plaza
#

Bruh

fathom agate
#

gpt-oss-120b doesn't use high reasoning effort. It almost always only reasons for like 1k tokens and stops. Is it normal for this model?
I connect to OR via Typingmind. The setting in the attached snapshot work for all standard models including GPT-5, Gemini 2.5 Pro, Claude models, GLM 4.6, etc etc.

#

I am using a handful of reasoning models (often in parallel for the same query), so I think oss is using unreasonably small amount of reasoning

#

I do math.
You know, for a question that costs, say, GPT5 15k+ tokens and still unsolved, oss can be so confident to think only <1k and give a obviously silly answer.

tepid garnet
#

gpt-oss-120b thought for 25 seconds, several thousand tokens on a simple question about it's limitations

fathom agate
# tepid garnet give me the prompt you are using

Thank you! This is a simple example prompt We study the automorphism group of a non-degenerate Gaussian. When Does a Gaussian have nontrivial auto? When do two distinct Gaussians intersect autos nontrivially? Does there exist a number N, so that the intersected auto of N Gaussians must be trivial?

#

OSS on openrouter only spends 700~800 tokens for reasoning

tepid garnet