#programming | Neuro-sama Headquarters | Page 523

olive sable May 16, 2026, 3:55 PM

#

evilNya

nocturne olive May 16, 2026, 3:55 PM

#

neuroThinkSmug Yes but that costs more money

#

I don't have that kinda spare money for non-GPU things

#

It's basically 100€ with shipping

stark needle May 16, 2026, 3:55 PM

#

fair enough

native surge May 16, 2026, 3:56 PM

#

Yes

stark needle May 16, 2026, 3:56 PM

#

i mean just cause u seemed interested in vr

nocturne olive May 16, 2026, 3:56 PM

#

neuroThinkSmug Well I just saw you doing silliness in VRC and it seemed silly and interesting

sage crag May 16, 2026, 3:56 PM

#

native surge Yes

yes

native surge May 16, 2026, 3:56 PM

#

sage crag yes

this loads NeurOhISee

#

Me not

#

Meow

#

Or femboy

nocturne olive May 16, 2026, 3:56 PM

#

yes

native surge May 16, 2026, 3:56 PM

#

Ish

#

Nerd

stark needle May 16, 2026, 3:57 PM

#

sage crag yes

@☆♡《♥fox🦊ying•idol~★》♡☆ this u? 🪞

sage crag May 16, 2026, 3:57 PM

#

https://media.discordapp.net/attachments/963967500171763843/1073733493479194624/attachment.gif

stark needle May 16, 2026, 3:58 PM

#

MA_GoodbyeBalls

#

@nocturne olive is the training time that bad or why u need so much gpu compute

sage crag May 16, 2026, 3:59 PM

#

stark needle @☆♡《♥fox🦊ying•idol~★》♡☆ this u? 🪞

nrr

#

https://tenor.com/view/fox-foxes-happy-cute-snow-gif-15010450582016741990

Tenor

nocturne olive May 16, 2026, 3:59 PM

#

stark needle @nocturne olive is the training time that bad or why u need so much gpu c...

neuroThinkSmug I like going big batch at reasonable speeds, right now it takes one day for a full model at 512 batch size

#

The more GPU power I have, the bigger and better I can go faster

stark needle May 16, 2026, 4:00 PM

#

are u pretraining the model from scratch?

nocturne olive May 16, 2026, 4:00 PM

#

I could cut training time around in half with a second 3090 by running in parallel like Wispers

nocturne olive May 16, 2026, 4:00 PM

#

stark needle are u pretraining the model from scratch?

yes This arch doesn't benefit from finetuning

stark needle May 16, 2026, 4:01 PM

#

nocturne olive yes This arch doesn't benefit from finetuning

hmm rip

nocturne olive May 16, 2026, 4:01 PM

#

neuroConfused

#

It's only 100M parameters

midnight sigil May 16, 2026, 4:02 PM

#

neuroKufufu I should start my cybersec journey

#

tired of playing games all day and having nothing to do

nocturne olive May 16, 2026, 4:03 PM

#

neuroThinkSmug Either way I'd certainly appreciate being able to reach higher or equal quality in less time

midnight sigil May 16, 2026, 4:05 PM

#

smh why neuroBucket me

#

neuroLookDown

sage crag May 16, 2026, 4:12 PM

#

https://media.discordapp.net/attachments/963967500171763843/1073733493479194624/attachment.gif

midnight sigil May 16, 2026, 4:13 PM

#

neuroBucket

obsidian mantle May 16, 2026, 4:14 PM

#

sage crag https://media.discordapp.net/attachments/963967500171763843/1073733493479194624/...

olive sable May 16, 2026, 4:52 PM

#

enub

fast pagoda May 16, 2026, 4:56 PM

#

sage crag May 16, 2026, 4:56 PM

#

neuroBucket

#

pluu

frozen igloo May 16, 2026, 4:56 PM

#

olive sable May 16, 2026, 4:56 PM

#

newliv

fast pagoda May 16, 2026, 4:57 PM

#

https://tenor.com/view/homura-akemi-anime-madoka-magica-dancing-anime-dance-gif-15513013343261194666

Tenor

fast pagoda May 16, 2026, 4:58 PM

#

frozen igloo

hook the pressure washer on it

opaque wharf May 16, 2026, 4:58 PM

#

frozen igloo May 16, 2026, 4:58 PM

#

fast pagoda hook the pressure washer on it

neurOMEGALUL
This is a screenshot I think from the 400 car battery styropyro video

fast pagoda May 16, 2026, 4:59 PM

#

ohh
that copper plate does give styro impression

#

i still havent watched that

sage crag May 16, 2026, 5:01 PM

#

fast pagoda https://tenor.com/view/homura-akemi-anime-madoka-magica-dancing-anime-dance-gif-...

https://tenor.com/view/evil-vtuber-3d-head-turn-concerned-gif-15495186653811856266

Tenor

#

dont think this puts us on speaking terms

#

find the birth certificate or im not buying you food

fast pagoda May 16, 2026, 5:01 PM

#

https://tenor.com/view/family-guy-meg-yuri-yuri-chow-dog-food-gif-14764504896860940861

Tenor

#

https://tenor.com/view/teto-mitosis-mitosis-teto-gif-11683636566672045121

Tenor

#

they didnt give me one because of this

sage crag May 16, 2026, 5:02 PM

#

neuroD

opaque wharf May 16, 2026, 5:03 PM

#

I do wonder, is mitosis basically clone?

fast pagoda May 16, 2026, 5:03 PM

#

yes

#

genetically identical

#

unless cancer

opaque wharf May 16, 2026, 5:04 PM

#

It does make me wonder how we manage the administrative process for lab specimens after mitosis

fast pagoda May 16, 2026, 5:05 PM

#

https://en.wikipedia.org/wiki/HeLa

HeLa

HeLa () is an immortalized cell line used in scientific research. It is the oldest human cell line and one of the most commonly used. HeLa cells are durable and prolific, allowing for extensive applications in scientific study. The line is derived from cervical cancer cells taken on February 8, 1951, from Henrietta Lacks, a 31-year-old African-A...

sage crag May 16, 2026, 5:05 PM

#

freezing

#

enub

fast pagoda May 16, 2026, 5:05 PM

#

look into cell lines and how they manage those, shit is wild

opaque sigil May 16, 2026, 5:08 PM

#

fast pagoda https://en.wikipedia.org/wiki/HeLa

better hela enub

stark needle May 16, 2026, 5:08 PM

#

h

opaque wharf May 16, 2026, 5:08 PM

#

Ketchup curry? catdespair

fast pagoda May 16, 2026, 5:08 PM

#

mushroom ketchurp

opaque wharf May 16, 2026, 5:08 PM

#

Still cursed

stark needle May 16, 2026, 5:08 PM

#

t

fast pagoda May 16, 2026, 5:09 PM

#

t

#

p

#

s

#

:

opaque wharf May 16, 2026, 5:09 PM

#

tpsbeat me to it

rigid snow May 16, 2026, 5:09 PM

#

slash slash

fast pagoda May 16, 2026, 5:09 PM

#

n

rigid snow May 16, 2026, 5:09 PM

#

e

fast pagoda May 16, 2026, 5:09 PM

#

u

opaque sigil May 16, 2026, 5:10 PM

#

opaque wharf Ketchup curry? catdespair

curry ketchup NOPE

frozen igloo May 16, 2026, 5:10 PM

#

opaque wharf Ketchup curry? catdespair

I read that as ketchup carry and was confused neurOMEGALUL

olive sable May 16, 2026, 5:11 PM

#

opaque wharf Ketchup curry? catdespair

You wouldn't understand

#

It's genuinely good

fast pagoda May 16, 2026, 5:11 PM

#

europe is speaking

be careful everyone

olive sable May 16, 2026, 5:11 PM

#

newliv

opaque sigil May 16, 2026, 5:11 PM

#

sam europe has spoken mhm

fast pagoda May 16, 2026, 5:12 PM

#

stay frosty

kind nimbus May 16, 2026, 5:13 PM

#

opaque sigil better hela enub

Hela Gewürz Ketchup neuroHypers

sage crag May 16, 2026, 5:13 PM

#

no child of mine will be bad at osu

#

5 digit or disowned

rigid snow May 16, 2026, 5:17 PM

#

had to scroll almost to the bottom, you'll never guess what AX stands for

sage crag May 16, 2026, 5:17 PM

#

agentic experience

#

enub

rigid snow May 16, 2026, 5:17 PM

#

smart

#

am dumb

sage crag May 16, 2026, 5:18 PM

#

https://cdn.discordapp.com/emojis/1495756232047329291.webp?size=48&animated=true&name=NekoCatPetting&lossless=true

#

who are these default people

#

are you real

rigid snow May 16, 2026, 5:18 PM

#

as far as i'm aware yeah

sage crag May 16, 2026, 5:18 PM

#

ive seen this guy in like 90 places

#

on the street

rigid snow May 16, 2026, 5:19 PM

#

sage crag May 16, 2026, 5:19 PM

#

NOT real

rigid snow May 16, 2026, 5:19 PM

#

i think he used to be a typescript nerd youtuber now everything is ai

sage crag May 16, 2026, 5:19 PM

#

vibecoding has become such a negative buzzword that people are calling themselves "REAL agentic engineers" instead

#

ai hero

rigid snow May 16, 2026, 5:21 PM

#

what did enums do neuroD

sage crag May 16, 2026, 5:21 PM

#

neuroCatUuh w ai

sage crag May 16, 2026, 5:21 PM

#

rigid snow what did enums do neuroD

typescript enum NOT js enum

#

neuroSad2

rigid snow May 16, 2026, 5:21 PM

#

respectfully

#

what the fuck is a js enum

sage crag May 16, 2026, 5:21 PM

#

string

#

enub

rigid snow May 16, 2026, 5:21 PM

#

NeurOhISee

#

dsfksdfksf

#

i think i came up with a really cool idea, you know how harnesses pin editable stuff, e.g. a todo a model edits with tool calls? what if we do that but with mood/tone, can use that to prompt tts tone and change the model expression instead of semantic analysis and should reflect the intent of the model better

#

i don't think for example sarcasm is easily picked up by semantic analysis systems
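
The pinned mood idea could be sketched roughly like this. It is a minimal sketch and not anyone's actual harness: `PinnedMood`, `set_tone`, `render_pin`, `tts_request`, and the tone labels are all hypothetical names, and it assumes a TTS backend that accepts a tone hint.

```python
from dataclasses import dataclass, field

# Hypothetical tone labels; a real TTS engine would define its own set.
ALLOWED_TONES = {"neutral", "cheerful", "sarcastic", "annoyed", "sad"}


@dataclass
class PinnedMood:
    """Editable state that stays pinned in the prompt, like a harness todo list."""
    tone: str = "neutral"
    history: list = field(default_factory=list)

    def set_tone(self, tone: str) -> str:
        """Tool-call handler: the model calls this to declare its intent."""
        if tone not in ALLOWED_TONES:
            return f"error: unknown tone {tone!r}; allowed: {sorted(ALLOWED_TONES)}"
        self.history.append(self.tone)  # remember the previous tone
        self.tone = tone
        return f"ok: tone set to {tone}"

    def render_pin(self) -> str:
        """Text re-injected into the context on every turn."""
        return f"[current tone: {self.tone}]"


def tts_request(text: str, mood: PinnedMood) -> dict:
    """Build a request for a hypothetical TTS backend that accepts a tone hint."""
    return {"text": text, "tone": mood.tone}


mood = PinnedMood()
mood.set_tone("sarcastic")
request = tts_request("I just love how easily this website lost all my data.", mood)
print(mood.render_pin(), request)
```

Restricting the tool to a fixed label set keeps the TTS side simple and makes invalid calls easy to reject, at the cost of expressiveness.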

azure lynx May 16, 2026, 5:34 PM

#

depends on how obviously sarcastic it is. a transformer based system might be able to represent something similar, assuming there is enough context and it's something obvious like "I just love how easy this website made it to lose all my data." because people usually don't associate loving with losing things the transformer might actually end up representing it in a way which implies sarcasm. but sarcasm is hard without verbal and/or visual cues.

rigid snow May 16, 2026, 5:38 PM

#

yeah that's what i'm trying to address, what if the model just tells its intent in a way that doesn't require tuning it to do so? obviously increases latency since the model has to do a tool call once in a while to change its tone but it shouldn't be by that much if you're running the model locally. i'm gonna have to test this of course since i have a suspicion they're gonna be using the system wrong/not at all but i probably can fix this with a good enough prompt

azure lynx May 16, 2026, 5:39 PM

#

i have my agent do a tool call to set the emote they want to do, reminding them if they haven't set it in a while.
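
A rough sketch of the "remind them if they haven't set it in a while" part, assuming staleness is counted in agent turns. `EmoteReminder`, `end_turn`, and the threshold of 5 turns are hypothetical choices for illustration, not the setup described above.

```python
class EmoteReminder:
    """Tracks turns since the agent last set an emote and nudges it when stale."""

    def __init__(self, max_stale_turns: int = 5):  # threshold is an assumption
        self.max_stale_turns = max_stale_turns
        self.turns_since_set = 0
        self.emote = None

    def set_emote(self, emote: str) -> None:
        """Tool-call handler: record the emote and reset the staleness counter."""
        self.emote = emote
        self.turns_since_set = 0

    def end_turn(self):
        """Call once per agent turn; returns a reminder string when one is due."""
        self.turns_since_set += 1
        if self.turns_since_set >= self.max_stale_turns:
            return "Reminder: you have not set an emote in a while; call set_emote."
        return None


reminder = EmoteReminder(max_stale_turns=2)
first = reminder.end_turn()    # not stale yet
second = reminder.end_turn()   # stale, reminder text returned
reminder.set_emote("smile")
third = reminder.end_turn()    # counter was reset by set_emote
```

The returned string would be appended to the next prompt by whatever harness drives the agent.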

rigid snow May 16, 2026, 5:39 PM

#

yeah or that

azure lynx May 16, 2026, 5:40 PM

#

i think it probably helps with keeping them "feeling" a consistent way. usually they're just "vibing" though.

stark needle May 16, 2026, 5:48 PM

#

sage crag neuroCatUuh w ai

@mouse is this true

sage crag May 16, 2026, 5:48 PM

#

https://tenor.com/view/cute-kitten-alone-sleep-tucked-in-gif-14814077

Tenor

rancid ore May 16, 2026, 6:01 PM

#

did you know gifs can probably be used in computercraft?

#

they're very compact.

kind nimbus May 16, 2026, 6:12 PM

#

tiny edge May 16, 2026, 6:16 PM

#

guess what program i was using

#

that made my system OOM

olive sable May 16, 2026, 6:18 PM

#

You have 16gb, i can think of 150 different programs that could

opaque sigil May 16, 2026, 6:19 PM

#

list them

olive sable May 16, 2026, 6:19 PM

#

neuroNo

rigid snow May 16, 2026, 6:20 PM

#

blender and ue are the first things that come to mind

olive sable May 16, 2026, 6:20 PM

#

Yep

#

Unity too

rigid snow May 16, 2026, 6:20 PM

#

haven't really used it so

#

surely can't be worse than ue

#

glueless

opaque wharf May 16, 2026, 6:21 PM

#

Tab hoarder enub

olive sable May 16, 2026, 6:22 PM

#

Could also be an ai thingy or a modern game

#

Considering this is neurocord

#

Altho the cpu is too bad to run an llm tbh

kind fable May 16, 2026, 6:23 PM

#

Good code twitch good code

opaque wharf May 16, 2026, 6:23 PM

#

olive sable Altho the cpu is too bad to run an llm tbh

It doesn't stop me neuroTroll

olive sable May 16, 2026, 6:23 PM

#

neuro3

bitter phoenix May 16, 2026, 6:23 PM

#

kind nimbus

I don’t like how most distros preinstall the Nvidia OSS driver which for some people including me is completely broken, or make the choice of using the official Nvidia driver complicated as hell. It’s tarnished people’s belief that Linux can work with Nvidia to the point they think they need a special distro for it; you don’t need to reinstall your OS. When I think of the steps you have to go through on Debian to install it I’d just say it’s impossible to do correctly due to how much they hate closed source

opaque wharf May 16, 2026, 6:24 PM

#

Cachy enub

#

Honestly people are overestimating the difficulty of bastardized Arch

olive sable May 16, 2026, 6:25 PM

#

Nvidia linux is pretty easy no? It worked for me almost perfectly on all cards i tried on both OS i tried

kind nimbus May 16, 2026, 6:25 PM

#

bitter phoenix I don’t like how most distros preinstall the Nvidia OSS driver which for some pe...

I had to do quite a bunch to get my Nvidia Hybrid Setup correctly working on NixOS but now it works

opaque wharf May 16, 2026, 6:25 PM

#

Like look at steamos. Those are Arch based too

bitter phoenix May 16, 2026, 6:25 PM

#

olive sable Nvidia linux is pretty easy no? It worked for me almost perfectly on all cards i...

It’s the myriad of implementations that don’t work and are recommended to everyone

rough bloom May 16, 2026, 6:25 PM

#

olive sable Nvidia linux is pretty easy no? It worked for me almost perfectly on all cards i...

YES can cause some compatibility issues but the install itself is easy

#

pretty much every distro has either a package or an installer thing for it

marsh coral May 16, 2026, 6:26 PM

#

rough bloom YES can cause some compatibility issues but the install ...

isnt nvidia notoriously hard to go install on linux?

#

i meann i run fedora

opaque wharf May 16, 2026, 6:26 PM

#

rough bloom pretty much every distro has either a package or an installer thing for it

Even gentoo? neuroTroll

marsh coral May 16, 2026, 6:26 PM

#

i have to do akmods or smth

#

and i have to like

#

manually sign the keys

#

since i have secure boot

olive sable May 16, 2026, 6:26 PM

#

You just need to set the DRM thing to nvidia thingy in the boot params and then everything works completely fine

rough bloom May 16, 2026, 6:26 PM

#

marsh coral isnt nvidia notoriously hard to go install on linux?

vedalNeuroHUH no

#

maybe decades ago

bitter phoenix May 16, 2026, 6:26 PM

#

kind nimbus I had to do quite a bunch to get my Nvidia Hybrid Setup correcly working on NixO...

It could also be due to the desktop environment as well… I’m just remembering that hyprland doesn’t work with Nvidia. But besides that just using the official driver & gnome has been fine

#

With better performance than windows on everything

marsh coral May 16, 2026, 6:27 PM

#

rough bloom maybe decades ago

i meann im a newgen in linux and all this tech stuff

#

so idk abt the past

#

all ik is that i switched to fedora 3 months ago

rough bloom May 16, 2026, 6:27 PM

#

idk how it is on Fedora but pretty much every distro is forced to have decent NVIDIA support due to how common the cards are

olive sable May 16, 2026, 6:27 PM

#

bitter phoenix It could also be due to the desktop environment as well… I’m just remembering th...

Hyprland works perfectly fine with my sold 3090 and my 4090

tiny edge May 16, 2026, 6:27 PM

#

olive sable You have 16gb, i can think of 150 different programs that could

ye but look on the program list, all of them seem normal

marsh coral May 16, 2026, 6:28 PM

#

i think my system is boutta go OOM

#

all my swap memory is full

olive sable May 16, 2026, 6:28 PM

#

tiny edge ye but look on the program list, all of them seem normal

You're sorting by cpu usage, we can't see the one using most ram
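
For reference, sorting by memory instead of CPU can be sketched by reading the `VmRSS` field from `/proc/<pid>/status`. This assumes a Linux system; `parse_status` and `top_by_rss` are hypothetical helper names, and the sketch lists individual processes without grouping multiple instances of the same program.

```python
import os


def parse_status(text: str):
    """Extract (name, rss_kib) from the contents of a /proc/<pid>/status file."""
    name, rss_kib = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kib = int(line.split()[1])  # the kernel reports this value in kB
    return name, rss_kib


def top_by_rss(limit: int = 10, proc_root: str = "/proc"):
    """Return up to `limit` (rss_kib, pid, name) tuples, largest resident set first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc, nothing to report
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kib = parse_status(fh.read())
        except OSError:
            continue  # process exited or is not readable
        rows.append((rss_kib, int(entry), name))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for rss_kib, pid, name in top_by_rss():
        print(f"{rss_kib / 1024:8.1f} MiB  {pid:>7}  {name}")
```

Kernel threads have no `VmRSS` line, so they show up with 0 and sort to the bottom.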

marsh coral May 16, 2026, 6:28 PM

#

i shouldnt have let my model run 24/7

tiny edge May 16, 2026, 6:28 PM

#

im using 100% zram

bitter phoenix May 16, 2026, 6:28 PM

#

olive sable Hyprland works perfectly fine with my sold 3090 and my 4090

Last I remember it was suggested to use amd cards. That could have changed

bitter phoenix May 16, 2026, 6:28 PM

#

tiny edge im using 100% zram

Zram my beloved

olive sable May 16, 2026, 6:29 PM

#

Its still suggested, but nvidia works fine too now

rough bloom May 16, 2026, 6:29 PM

#

rough bloom idk how it is on Fedora but pretty much every distro is forced to have decent NV...

I can see secure boot being an issue depending on how your system is configured
very much optional though colonthree

tiny edge May 16, 2026, 6:29 PM

#

olive sable You're sorting by cpu usage, we can't see the one using most ram

true i guess, but if you look on the list you can clearly see which one is the problem

#

it cant get more obvious

rough bloom May 16, 2026, 6:29 PM

#

tiny edge true i guess, but if you look on the list you can clearly see which one is the p...

that damn kswapd

#

you should kill it glueless

tiny edge May 16, 2026, 6:29 PM

#

its not that bad

#

the other program tho

olive sable May 16, 2026, 6:30 PM

#

Idk what heroic is.

tiny edge May 16, 2026, 6:30 PM

#

the most unoptimized program ever created

olive sable May 16, 2026, 6:30 PM

#

Is that the league launcher?

tiny edge May 16, 2026, 6:30 PM

#

olive sable Idk what heroic is.

epic games+gog+ some other launchers

olive sable May 16, 2026, 6:30 PM

#

NeurOhISee

bitter phoenix May 16, 2026, 6:30 PM

#

olive sable Idk what heroic is.

Epic games launcher, they didn’t want to put Fortnite on steam so they made a launcher

tiny edge May 16, 2026, 6:30 PM

#

its medal

#

if you look there's a bunch of 600mb instances

#

it has a terrible optimization which causes

#

terrible memory leak

marsh coral May 16, 2026, 6:31 PM

#

omg my system just went OOM

#

how is 90% of my cpu being used

bitter phoenix May 16, 2026, 6:32 PM

#

marsh coral how is 90% of my cpu being used

Do you use live filesystem compression or something, open htop

opaque wharf May 16, 2026, 6:32 PM

#

marsh coral how is 90% of my cpu being used

marsh coral May 16, 2026, 6:32 PM

#

bitter phoenix Do you use live filesystem compression or something, open htop

idrk

#

might be bcz im running my agent model, brave browser, running a fedora kernel update, then an akmods for rebuilding nvidia and vbox, and also running feishu

#

that mightve overloaded the system

opaque wharf May 16, 2026, 6:33 PM

#

So it is AI enub

marsh coral May 16, 2026, 6:33 PM

#

uh oh its happening again

opaque wharf May 16, 2026, 6:33 PM

#

opaque wharf

I was joking when I sent this

marsh coral May 16, 2026, 6:33 PM

#

neuroSob

opaque wharf May 16, 2026, 6:34 PM

#

But it probably is the AI

marsh coral May 16, 2026, 6:34 PM

#

opaque wharf But it probably is the AI

usuallyy it doesnt jump this high

#

my ai inside a podman container on a separate account

bitter phoenix May 16, 2026, 6:34 PM

#

marsh coral uh oh its happening again

I just think of what could be causing the OOM and do pkill thing

opaque wharf May 16, 2026, 6:34 PM

#

By building, did you mean compile?

marsh coral May 16, 2026, 6:34 PM

#

tho it is running as a quadlet systemd service

minor crag May 16, 2026, 6:34 PM

#

I was thinking that training a 12B transformer on CPU would use all my 16GB of ram and set my CPU on fire

But I've done the math and using my backprop and forward pass optimisations it should use less than 3.5GB of memory (3GB for actual weights then maybe 500MB for the CNN) and maybe 60 or 70 percent of my laptop CPU

So this seems quite promising

marsh coral May 16, 2026, 6:34 PM

#

bitter phoenix I just think of what could be causing the OOM and do `pkill thing`

it mightt be bcz im using akmods on nvidia and vbox

opaque wharf May 16, 2026, 6:34 PM

#

Because compiling will make CPU go brrr too

marsh coral May 16, 2026, 6:34 PM

#

since the kernel update js finished

#

tho like sometimes the system jumps to 99% CPU being used all of a sudden

#

and the memory/RAM use spiking

#

then returns down to only 20% of CPU used

minor crag May 16, 2026, 6:35 PM

#

minor crag I was thinking that training a 12B transformer on CPU would use all my 16GB of r...

Plus however much ram I need for the KV cache, context window, and all that junk

marsh coral May 16, 2026, 6:35 PM

#

ok i think ik why cpu is jumping

#

cuz of akmods

umbral wigeon May 16, 2026, 6:36 PM

#

What will you do if this guy(me) PR you 80000 LOC

marsh coral May 16, 2026, 6:36 PM

#

rebuilding the modules

umbral wigeon May 16, 2026, 6:36 PM

#

umbral wigeon What will you do if this guy(me) PR you 80000 LOC

70 files changed

#

-10000 LOC

bitter phoenix May 16, 2026, 6:37 PM

#

minor crag I was thinking that training a 12B transformer on CPU would use all my 16GB of r...

What I did for free was put my model on google drive; open google colab and clone my repo and train using their free Nvidia tesla gpus. When the account was out of hours I would push the result to Google drive, sign into colab with an alt, and pull it back for more training, again and again, then pull it back to my computer when done. Also make sure you’re using the best optimizer like Muon instead of AdamW and that there are no software bottlenecks

#

Also I’ve done my own testing: GQA > MHA, decoder-only > encoder-decoder > encoder only

#

Was pretty much already known but I wanted to see if there were any cases where it wasn’t true, but no

#

Just use GQA decoder only bf16 Muon

minor crag May 16, 2026, 6:39 PM

#

bitter phoenix What I did for free was put my model on google drive; open google colab and clon...

That seems like a lot of effort when I just said I can train a 12B model on my laptop

And I'm gonna have to test different optimisers to find what works best with my architecture

bitter phoenix May 16, 2026, 6:40 PM

#

minor crag That seems like a lot of effort when I just said I can train a 12B model on my l...

seems like a lot of effort
It was faster

bitter phoenix May 16, 2026, 6:41 PM

#

minor crag That seems like a lot of effort when I just said I can train a 12B model on my l...

I’ve been working out the best transformer setup for a while

minor crag May 16, 2026, 6:41 PM

#

bitter phoenix > seems like a lot of effort It was faster neuroSensei

But it requires google of all things

bitter phoenix May 16, 2026, 6:42 PM

#

Once you have a good architecture you’re confident in you can just pretrain it and re-use that same model for multiple different things

#

I have an archive of 90m 500m and 1b models I made myself

#

The top property is data quality, recently it’s a lot of post-training research to get it just right

bitter phoenix May 16, 2026, 6:43 PM

#

minor crag But it requires google of all things

They're.. alright

stray dragon May 16, 2026, 6:43 PM

#

stark needle t

hi phrrrrr fox rectangle flag

stray dragon May 16, 2026, 6:43 PM

#

fast pagoda t

hi afunyun

stark needle May 16, 2026, 6:44 PM

#

minor crag I was thinking that training a 12B transformer on CPU would use all my 16GB of r...

what transformer

minor crag May 16, 2026, 6:44 PM

#

stark needle what transformer

?

stark needle May 16, 2026, 6:44 PM

#

architecture

minor crag May 16, 2026, 6:44 PM

#

bitter phoenix They're.. *alright*

Nope I have ethics

minor crag May 16, 2026, 6:44 PM

#

stark needle architecture

Custom :3

stark needle May 16, 2026, 6:44 PM

#

i doubt it would run even with heavy gradient checkpointing

bitter phoenix May 16, 2026, 6:45 PM

#

12b is huge

stark needle May 16, 2026, 6:45 PM

#

bf16 12b is like 24gb

#

add the two momentums of adamw

#

and ur dead

#

and we didnt even consider activations
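
The arithmetic behind these figures can be checked with a short sketch. The 12B parameter count is from the chat; fp32 optimizer moments and the 2-bit packing for ternary weights are assumptions made here for illustration, and the function names are hypothetical.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes needed just to store the weights at a given precision."""
    return n_params * bits_per_weight / 8


def adamw_state_bytes(n_params: int, bytes_per_moment: int = 4) -> float:
    """AdamW keeps two moment estimates per parameter (fp32 assumed here)."""
    return 2 * n_params * bytes_per_moment


N = 12_000_000_000  # the 12B parameter count from the discussion

bf16_weights = weight_bytes(N, 16)     # 2 bytes per weight
ternary_ideal = weight_bytes(N, 1.58)  # lower bound at 1.58 bits per weight
ternary_2bit = weight_bytes(N, 2)      # simple packing at 2 bits per weight
adamw_states = adamw_state_bytes(N)    # two fp32 moments per weight

for label, value in [
    ("bf16 weights", bf16_weights),
    ("ternary, 1.58 bits/weight", ternary_ideal),
    ("ternary, packed at 2 bits", ternary_2bit),
    ("AdamW moments (fp32)", adamw_states),
]:
    print(f"{label:<28}{value / 1e9:7.2f} GB")
```

Gradients and activations come on top of all of these, so the totals here are floors rather than full training footprints.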

bitter phoenix May 16, 2026, 6:45 PM

#

Might want to think about the evals to see if that’s really necessary

stark needle May 16, 2026, 6:46 PM

#

and flash attention doesnt exist on cpu in pytorch

minor crag May 16, 2026, 6:46 PM

#

I'm using a mix of optimisations with one being 1.58 bit weights so it takes up just about 3GB for the entire model weights

And I'm making the model bigger because I can just afford the compute and it'll offset the loss of using 1.58bit weights and some of the other optimisations

stark needle May 16, 2026, 6:47 PM

#

pretraining in 1.58 bits seems like a very bad idea

bitter phoenix May 16, 2026, 6:47 PM

#

minor crag I'm using a mix of optimisations with one being 1.58 bit weights so it takes up ...

A 1 bit model? That’s super cool

minor crag May 16, 2026, 6:48 PM

#

bitter phoenix A 1 bit model? That’s super cool

1.58 so Instead of 1 or 0 it is -1, 0, or +1 but the matrix multiply is still a lot cheaper since it only uses add, noop, or sub instead of multiply
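
The add / no-op / subtract point can be illustrated with a tiny matrix-vector product. The "1.58" comes from log2(3) ≈ 1.585 bits for a three-valued weight. This is a plain-Python sketch with a hypothetical function name, not the custom implementation being discussed.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, +1}) by a vector.

    No multiplications are needed: +1 adds the input, -1 subtracts it,
    and 0 skips it.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0: no-op
        out.append(acc)
    return out


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, 3.0]
y = ternary_matvec(W, x)
print(y)  # [-2.5, 4.5]
```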

stark needle May 16, 2026, 6:49 PM

#

fp4 training in the nvidia paper was already very unstable hence they had to do random hadamard rotations and keep the amax history and invent nvfp4, otherwise it would just NAN

minor crag May 16, 2026, 6:49 PM

#

stark needle pretraining in 1.58 bits seems like a very bad idea

Haven't gotten far enough to see but I'm using a custom method instead of backprop so it will work (in theory)

bitter phoenix May 16, 2026, 6:50 PM

#

minor crag 1.58 so Instead of 1 or 0 it is -1, 0, or +1 but the matrix multiply is still a ...

And you’re writing the transformer and training logic yourself? I would love to see it to learn from as reference, if that’s alright. I heard Microsoft was already working on something similar as a prototype

bitter phoenix May 16, 2026, 6:51 PM

#

stark needle fp4 training in the nvidia paper was already very unstable hence they had to do ...

I don’t appreciate enough how my training is always stable, except when I was pushing a 100k model too hard and broke a bunch of stuff and it NAN’ed

stark needle May 16, 2026, 6:51 PM

#

bitter phoenix I don’t appreciate enough how my training is always stable, except when I was pu...

more like cause we use a lot of stuff that is proven to work

#

switch the friggin activation function to something random and ur dead

minor crag May 16, 2026, 6:52 PM

#

bitter phoenix And you’re writing the transformer and training logic yourself? I would love to ...

I'm writing from scratch in luau using no ml libs or gpu accelerations but sadly a lot of my methods are custom and I don't want my methods to be public since I spent a lot of time working on them

bitter phoenix May 16, 2026, 6:52 PM

#

minor crag I'm writing from scratch in luau using no ml libs or gpu accelerations but sadly...

Same here

stark needle May 16, 2026, 6:52 PM

#

does luau support simd intrinsics

bitter phoenix May 16, 2026, 6:53 PM

#

I don’t think what I’m looking for is possible with existing tools like ollama; everyone I’ve seen who leaned on ollama to ‘make their own model’ got nowhere

minor crag May 16, 2026, 6:53 PM

#

stark needle does luau support simd intrinsics

I don't think so

bitter phoenix May 16, 2026, 6:53 PM

#

So I write everything so that there can’t be any blockers

stark needle May 16, 2026, 6:54 PM

#

minor crag I don't think so

training is gonna take a very long while in that case

#

especially since the ops are quadratic with standard attn

#

and no simd

#

for a 12b model

minor crag May 16, 2026, 6:54 PM

#

stark needle training is gonna take a very long while in that case

That's why I'm using custom methods that are a lot cheaper than backprop

bitter phoenix May 16, 2026, 6:55 PM

#

stark needle training is gonna take a very long while in that case

It would probably use a 64k tokenizer, which is even slower than my standard 8K

stark needle May 16, 2026, 6:55 PM

#

minor crag That's why I'm using custom methods that are a lot cheaper than backprop

right but it would still have to traverse each single param

fast pagoda May 16, 2026, 6:55 PM

#

stray dragon hi afunyun

hi

bitter phoenix May 16, 2026, 6:55 PM

#

bitter phoenix It would probably use a 64k tokenizer, which is even slower than my standard 8K

For a hero run

minor crag May 16, 2026, 6:55 PM

#

stark needle right but it would still have to traverse each single param

Yea?

bitter phoenix May 16, 2026, 6:56 PM

#

You could probably record the loss, batches & hours and make a chart

#

To know how long it will take
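
A minimal sketch of the time half of that suggestion: extrapolating remaining hours from logged progress. It assumes throughput stays roughly constant and ignores the loss curve, which would need a separate fit. The function name and the log values are hypothetical.

```python
def estimate_remaining_hours(log, total_batches):
    """Extrapolate remaining training time from (batches_done, hours_elapsed) samples.

    Uses the average throughput between the first and last sample.
    """
    (b0, h0), (b1, h1) = log[0], log[-1]
    if b1 <= b0 or h1 <= h0:
        raise ValueError("need at least two samples with increasing progress")
    batches_per_hour = (b1 - b0) / (h1 - h0)
    return (total_batches - b1) / batches_per_hour


# Hypothetical log: (batches completed, hours since start)
log = [(0, 0.0), (1_000, 2.0), (2_000, 4.0)]
remaining = estimate_remaining_hours(log, total_batches=10_000)
print(f"about {remaining:.1f} hours left")
```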

true hemlock May 16, 2026, 6:57 PM

#

bruh gonna train a 12b model on basic fp int units with no simd 💀

fast pagoda May 16, 2026, 6:57 PM

#

see you in 203

#

0

true hemlock May 16, 2026, 6:57 PM

#

ipc going to hell

#

no

#

see you in 3020

fast pagoda May 16, 2026, 6:58 PM

#

lmao_deepfried

stark needle May 16, 2026, 6:58 PM

#

also if u wanna reach reasonable loss faster u should prob use a smaller transformer due to how scaling laws function

amber fractal May 16, 2026, 6:58 PM

#

true hemlock bruh gonna train a 12b model on basic fp int units with no simd 💀

skull

stark needle May 16, 2026, 6:59 PM

#

bitter phoenix May 16, 2026, 6:59 PM

#

stark needle

I have this image saved already lol

stark needle May 16, 2026, 6:59 PM

#

bitter phoenix I have this image saved already lol

chinchilla my beloved

fast pagoda May 16, 2026, 7:00 PM

#

stark needle

i got 2 flip flops available

rough bloom May 16, 2026, 7:00 PM

#

minor crag That's why I'm using custom methods that are a lot cheaper than backprop

wouldn't it converge a lot slower without backprop?

true hemlock May 16, 2026, 7:01 PM

#

bruh lmao even at the absolute best 4 IPC apparently it would still take you 40K years to train with reasonable sized dataset

#

assuming 16 cores

bitter phoenix May 16, 2026, 7:02 PM

#

Maybe trim out some irrelevant training sources if possible?

true hemlock May 16, 2026, 7:02 PM

#

true hemlock bruh lmao even at the absolute best 4 IPC apparently it would still take you 40K...

(assuming forward prop alone)
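
A back-of-the-envelope version of this kind of estimate can be sketched with the common approximations of about 2·N·D FLOPs for a forward pass and about 6·N·D including the backward pass, plus the Chinchilla rule of thumb of roughly 20 tokens per parameter. The 4 GHz clock and the token count are assumptions made here, and the function names are hypothetical. The output depends heavily on the assumed token count, clock speed, and how far an interpreted runtime falls short of peak throughput, so it is not expected to reproduce any specific figure quoted in the chat.

```python
def training_flops(n_params: float, n_tokens: float, forward_only: bool = False) -> float:
    """Common approximation: ~2*N*D FLOPs forward only, ~6*N*D with the backward pass."""
    return (2 if forward_only else 6) * n_params * n_tokens


def cpu_years(flops: float, cores: int, ops_per_cycle: float, clock_hz: float) -> float:
    """Wall-clock years at a sustained rate of cores * ops_per_cycle * clock_hz."""
    seconds = flops / (cores * ops_per_cycle * clock_hz)
    return seconds / (365 * 24 * 3600)


N = 12e9    # parameters, from the discussion
D = 20 * N  # tokens; ~20 per parameter is a rule of thumb, assumed here
flops = training_flops(N, D, forward_only=True)
# 16 cores and 4 ops per cycle come from the chat; 4 GHz is an assumption.
years = cpu_years(flops, cores=16, ops_per_cycle=4, clock_hz=4e9)
print(f"{flops:.3e} FLOPs, roughly {years:,.0f} years at peak throughput")
```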

minor crag May 16, 2026, 7:02 PM

#

I'm not training the entire 12B at once

I'm training 500M first then once the loss stabilises I'm gonna freeze the first 500M, add 500M more, train it for 1/3 of the total time, unfreeze the original 500M 1/3 of the way through that run, and train both together for the remaining 2/3 of the total time

Then repeat multiple times over a few months until the model hits 12B

This combined with the many optimisations and replacements compared to a standard transformer will train the entire model in a few months on CPU
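
One reading of that schedule, written out as a list of phases. The 500M block size and 12B target are from the message; the fixed `stage_hours` value is purely hypothetical, since the first stage is described as running until the loss stabilises rather than for a fixed time.

```python
def growth_schedule(block=500_000_000, target=12_000_000_000, stage_hours=90.0):
    """Return a list of phases as (total_params, trainable_params, hours).

    Stage 0 trains the first block alone. Each later stage adds one block,
    trains only the new block for 1/3 of the stage, then trains everything
    for the remaining 2/3.
    """
    phases = [(block, block, stage_hours)]
    total = block
    while total < target:
        total += block
        phases.append((total, block, stage_hours / 3))      # older weights frozen
        phases.append((total, total, stage_hours * 2 / 3))  # everything unfrozen
    return phases


phases = growth_schedule()
for total, trainable, hours in phases[:5]:
    print(f"{total / 1e9:5.1f}B total, {trainable / 1e9:5.1f}B trainable, {hours:5.1f} h")
print(f"{len(phases)} phases, {sum(h for _, _, h in phases):.0f} hours in total")
```

Going from 500M to 12B in 500M steps means 23 growth stages after the initial one, so the per-stage time budget dominates the overall estimate.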

true hemlock May 16, 2026, 7:03 PM

#

what

#

uh

#

okay, what size of dataset are you aiming for

rough bloom May 16, 2026, 7:03 PM

#

minor crag I'm not training the entire 12B at once I'm training 500M first then once the l...

many optimisations and replacements compared to a standard transformer will train the entire model in a few months on CPU

minor crag May 16, 2026, 7:03 PM

#

rough bloom wouldn't it converge a lot slower without backprop?

It will be a tad slower but my replacement should converge

rough bloom May 16, 2026, 7:04 PM

#

tad slower is an understatement usually

#

there's a reason why everyone uses backprop kek

#

(for normal ANNs, that is)

true hemlock May 16, 2026, 7:05 PM

#

because training each transformer layer would not only fuck up generalization a bit, it would still take you years, possibly hundreds, AND assuming forward prop alone.

bitter phoenix May 16, 2026, 7:05 PM

#

minor crag I'm not training the entire 12B at once I'm training 500M first then once the l...

So just expand the model as you go instead of starting at full size? I’ve experimented with that, don’t remember the exact sizes but after quadrupling the size it began training at 4 instead of 10. Not sure if it helps to start small or if it's just convenient to start with the original size

minor crag May 16, 2026, 7:05 PM

#

true hemlock okay, what size of dataset are you aiming for

Not too sure what size it will be since I haven't finalised the dataset yet

kind nimbus May 16, 2026, 7:06 PM

#

bitter phoenix It could also be due to the desktop environment as well… I’m just remembering th...

no i use KDE that should not be an issue

stark needle May 16, 2026, 7:06 PM

#

@bitter phoenix idk if it interests u but here google leaks gemma scaling laws https://arxiv.org/abs/2501.18914

arXiv.org

Scaling Laws for Differentially Private Language Models

Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs also rely on large, high-quality training datasets, like those sourced from (sometimes sensitive) user data...

#

page 6

true hemlock May 16, 2026, 7:06 PM

#

rough bloom there's a reason why everyone uses backprop kek

i don't know why avoid backprop even, with batch feed and grad acc its like pretty much almost negligible compared to the forward pass

rough bloom May 16, 2026, 7:07 PM

#

true hemlock i don't know why avoid backprop even, with batch feed and grad acc its like pret...

that + good luck optimizing without gradients

bitter phoenix May 16, 2026, 7:07 PM

#

rough bloom > many optimisations and replacements compared to a standard transformer will tr...

A few months is still a long time, I would prototype for a while before committing because it might turn into your own legacy model with one realization or improvement. I also bake my tokenizers into the model files

bitter phoenix May 16, 2026, 7:07 PM

#

stark needle page 6

Ty

stark needle May 16, 2026, 7:07 PM

#

true hemlock i don't know why avoid backprop even, with batch feed and grad acc its like pret...

use SNN bro

minor crag May 16, 2026, 7:07 PM

#

bitter phoenix So just expand the model as you go instead of starting at full size? I’ve experi...

Yep I'm just expanding the size by 500m each time and idk if it'll help but in theory it should work

rough bloom May 16, 2026, 7:07 PM

#

rough bloom that + good luck optimizing without gradients

your training steps will be faster but you will need 10x as many to converge, if it converges at all, so you end up spending way more compute

opaque sigil May 16, 2026, 7:08 PM

#

SNN SCHIZO

true hemlock May 16, 2026, 7:08 PM

#

rough bloom your training steps will be faster but you will need 10x as many to converge, if...

what gets me more

#

is the fact that they wanted to train on cpu

fast pagoda May 16, 2026, 7:08 PM

#

bitnet works because it has gradients on the backwards pass, there's a latent full precision copy of every single weight which is actually what is being optimized
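
A minimal sketch of that idea, assuming absmean ternary quantization and a straight-through estimator on a toy linear model. The function names (`ternarize`, `train_step`) and the squared-error loss are made up for illustration and are not taken from any BitNet codebase:

```python
def ternarize(w, scale):
    """Round a latent float weight to {-1, 0, +1} after dividing by the scale."""
    return max(-1, min(1, round(w / scale)))

def train_step(latent, x, target, lr=0.05):
    """One SGD step on latent full-precision weights of a toy linear model."""
    # Absmean scale: the mean absolute value of the latent weights.
    scale = sum(abs(w) for w in latent) / len(latent) + 1e-8
    # The forward pass only ever sees the ternary weights (times the scale).
    q = [ternarize(w, scale) * scale for w in latent]
    pred = sum(qi * xi for qi, xi in zip(q, x))
    err = pred - target
    # Straight-through estimator: pretend d(q)/d(latent) = 1, so the gradient
    # computed for the quantized weight is applied to the latent copy.
    grads = [2.0 * err * xi for xi in x]
    new_latent = [w - lr * g for w, g in zip(latent, grads)]
    return new_latent, err * err

latent = [0.4, -0.3]
latent, loss = train_step(latent, x=[1.0, 2.0], target=0.0)
print(latent, loss)
```

The latent weights move by small float amounts each step even when the ternary values they round to stay the same, which is why the optimizer still needs gradients and a full-precision copy.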

true hemlock May 16, 2026, 7:08 PM

#

with no simd

ebon basin May 16, 2026, 7:08 PM

#

uuuh .NET 11 has MCP directly integrated, on one side I like it but on the other side MCP is bloated

rough bloom May 16, 2026, 7:09 PM

#

bitter phoenix It could also be due to the desktop environment as well… I’m just remembering th...

vedalNeuroHUH Hyprland works fine with NVIDIA
used to be an issue because NVIDIA insisted on EGLStream but that also just hasn't been the case for years now

minor crag May 16, 2026, 7:09 PM

#

true hemlock is the fact that they wanted to train on cpu

Can't afford a good gpu in this economy so I'm optimising until what I have works

stark needle May 16, 2026, 7:09 PM

#

minor crag Can't afford a good gpu in this economy so I'm optimising until what I have work...

use kaggle tpu

true hemlock May 16, 2026, 7:09 PM

#

minor crag Can't afford a good gpu in this economy so I'm optimising until what I have work...

are you sure that not using the simd unit on the cpu is "optimizing"

rough bloom May 16, 2026, 7:10 PM

#

minor crag Can't afford a good gpu in this economy so I'm optimising until what I have work...

gonna be optimizing for a few decades neurowheeze

amber fractal May 16, 2026, 7:10 PM

#

true hemlock are you sure that not using the simd unit on the cpu is "optimizing"

evilDentge

fast pagoda May 16, 2026, 7:10 PM

#

minor crag Yep I'm just expanding the size by 500m each time and idk if it'll help but in t...

gradient-estimate variance grows with parameter count so 12b is going to have like 0 signal
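
A rough illustration of why, assuming a perturbation-based (zeroth-order) gradient estimate, which is a guess at what a backprop-free method would use and not something confirmed in the chat. For a linear toy objective the one-sample estimate is `(g·eps) * eps`, and its alignment with the true gradient `g` shrinks roughly like `1/sqrt(dim)`:

```python
import math
import random

def mean_cosine(dim, trials=200, seed=0):
    """Average |cos| between a one-sample perturbation estimate and the true gradient."""
    rng = random.Random(seed)
    g_norm = math.sqrt(dim)              # true gradient of f(theta) = sum(theta) is all ones
    total = 0.0
    for _ in range(trials):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj = sum(eps)                  # g . eps
        eps_norm = math.sqrt(sum(e * e for e in eps))
        # estimate = proj * eps, so estimate . g = proj**2 and |estimate| = |proj| * |eps|
        total += (proj * proj) / (abs(proj) * eps_norm * g_norm)
    return total / trials

for dim in (10, 100, 1000):
    print(dim, round(mean_cosine(dim), 3))
```

Averaging more perturbations per step recovers the signal, but the number needed grows with the parameter count.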

true hemlock May 16, 2026, 7:10 PM

#

bare non-vector arithmetic units alone are slow as hell there's literally a reason AVX exists

ebon basin May 16, 2026, 7:10 PM

#

another uii JS directly in MAUI

opaque sigil May 16, 2026, 7:10 PM

#

has anyone checked whether the add/sub only "matmul" in bitnet is even cheaper than using tensor cores

stark needle May 16, 2026, 7:11 PM

#

opaque sigil has anyone checked whether the add/sub only "matmul" in bitnet is even cheaper t...

yes on gpu when packing it in tensor cores

#

with triton kernel
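
For reference, the add/sub-only "matmul" in question can be sketched in plain Python. This only shows the arithmetic for ternary weights; it says nothing about speed relative to tensor cores, and `ternary_matvec` is a made-up name for illustration:

```python
def ternary_matvec(W, x):
    """Compute y = W @ x for W with entries in {-1, 0, +1} using only add and subtract."""
    y = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi        # +1 weight: add the activation
            elif w == -1:
                acc -= xi        # -1 weight: subtract the activation
            # 0 weight: skip entirely, no multiply anywhere
        y.append(acc)
    return y

print(ternary_matvec([[1, 0, -1], [-1, -1, 1]], [2.0, 3.0, 5.0]))  # prints [-3.0, 0.0]
```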

ebon basin May 16, 2026, 7:11 PM

#

and AOT for android

minor crag May 16, 2026, 7:11 PM

#

true hemlock are you sure that not using the simd unit on the cpu is "optimizing"

I've not even gotten to the point of optimising the code and I assume the compiler either uses simd or I can mod the compiler to allow for simd intrinsics

opaque sigil May 16, 2026, 7:11 PM

#

is that a yes for it being cheaper

amber fractal May 16, 2026, 7:11 PM

#

minor crag I've not even gotten to the point of optimising the code and I assume the compil...

AINTNOWAY

true hemlock May 16, 2026, 7:11 PM

#

aint no way bro

fast pagoda May 16, 2026, 7:12 PM

#

you would need a population size that exceeds yo momma's weight

#

that's impossible

stark needle May 16, 2026, 7:12 PM

#

https://huggingface.co/blog/1_58_llm_extreme_quantization

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

bitter phoenix May 16, 2026, 7:12 PM

#

minor crag Can't afford a good gpu in this economy so I'm optimising until what I have work...

I’ve made a 256k model with a loss of 4.9 just considering every way I can push it towards the right answers. There’s tons of things you can do with smaller models to reach decent results instead of going larger

amber fractal May 16, 2026, 7:13 PM

#

amber fractal AINTNOWAY

This is an insult to my efforts, and my stuff doesn't even properly learn yet. But at least it is fast despite the operating logic being in python

nocturne olive May 16, 2026, 7:13 PM

#

neuroConfused What be going on here?
veryNeuro Seems kinda entertaining

true hemlock May 16, 2026, 7:13 PM

#

also i still don't get the reason for avoiding backprop

olive sable May 16, 2026, 7:13 PM

#

evilNya

fast pagoda May 16, 2026, 7:13 PM

#

catEat

minor crag May 16, 2026, 7:13 PM

#

bitter phoenix I’ve made a 256k model with a loss of 4.9 just considering every way I can push ...

I could use a 3B model or whatever but with my current code I can train a 12B model before my deadline is due

nocturne olive May 16, 2026, 7:14 PM

#

neuroConfused Who told you that?

amber fractal May 16, 2026, 7:14 PM

#

I'm ignoring backprop because I have to, I can't say anyone else should at all do so

nocturne olive May 16, 2026, 7:14 PM

#

neuroThinkSmug You'd be better off training on free tier Google Colab, at least it has a GPU

stark needle May 16, 2026, 7:14 PM

#

just use ngram model

#

atp

bitter phoenix May 16, 2026, 7:15 PM

#

minor crag I could use a 3B model or whatever but with my current code I can train a 12B mo...

I would say there isn’t much reason to go farther than 3b if you aren’t coding or mastering multiple domains. Depends on what the inference is

amber fractal May 16, 2026, 7:15 PM

#

nocturne olive neuroThinkSmug You'd be better off training on free tier ...

not GPU optimized (real)

stark needle May 16, 2026, 7:15 PM

#

bitter phoenix I would say there isn’t much reason to go farther than 3b if you aren’t coding o...

google and meta usually run ablations on 50-100m parameter models

#

on ~100B tokens

minor crag May 16, 2026, 7:15 PM

#

bitter phoenix I would say there isn’t much reason to go farther than 3b if you aren’t coding o...

I get there isn't a point but I can do it so I don't see why not

bitter phoenix May 16, 2026, 7:16 PM

#

stark needle google and meta usually run ablations on 50-100m parameter models

I prototype on 90m

rough bloom May 16, 2026, 7:16 PM

#

amber fractal I'm ignoring backprop because I have to, I can't say anyone else should at all d...

if your architecture is somehow incompatible with backprop or has a better way to train it then ye that's reasonable evilNya
that's just not the case for the average ANN YES

fast pagoda May 16, 2026, 7:16 PM

#

i dont think even hinton has gotten forward forward prop to work in anything transformer shaped at all let alone a 12b

sage crag May 16, 2026, 7:16 PM

#

https://media.discordapp.net/attachments/963967500171763843/1073733493479194624/attachment.gif

bitter phoenix May 16, 2026, 7:17 PM

#

minor crag I get there isn't a point but I can do it so I don't see why not

If you’re confident neuroHypers

rough bloom May 16, 2026, 7:17 PM

#

minor crag I get there isn't a point but I can do it so I don't see why not

the reason is that you can't do it kek

fast pagoda May 16, 2026, 7:17 PM

#

neuroBucket

opaque sigil May 16, 2026, 7:17 PM

#

fast pagoda i dont think even hinton has gotten forward forward prop to work in anything tra...

sounds like a noob evilSMH

nocturne olive May 16, 2026, 7:17 PM

#

veryNeuro Highly entertaining

amber fractal May 16, 2026, 7:17 PM

#

rough bloom if your architecture is somehow incompatible with backprop or has a better way t...

Yeah, I assume not a lot of people have the smarts to uphold the drain damaging takes evilDentge

nocturne olive May 16, 2026, 7:17 PM

#

Keep it up

minor crag May 16, 2026, 7:18 PM

#

bitter phoenix If you’re confident neuroHypers

If there is an issue causing it to take more time then ill still have a working model but it just won't hit that 12B goal in time

fast pagoda May 16, 2026, 7:18 PM

#

tribiall brain dablage

sage crag May 16, 2026, 7:18 PM

#

written in luau

#

neuroBucket

fast pagoda May 16, 2026, 7:18 PM

#

https://tenor.com/view/disney-stitch-lilo-surf-fun-gif-12286198

Tenor

amber fractal May 16, 2026, 7:18 PM

#

amber fractal Yeah, I assume not a lot of people have the smarts to uphold the drain damaging ...

To this day I'm surprised the NN series has actually produced something that learns at all

rough bloom May 16, 2026, 7:19 PM

#

amber fractal To this day I'm surprised the NN series has actually produced something that lea...

is apparently

sage crag May 16, 2026, 7:19 PM

#

chat can i train 1t model on my athlon xp

rough bloom May 16, 2026, 7:19 PM

#

surprisingly easy

amber fractal May 16, 2026, 7:19 PM

#

True, the real battle is making it useful

nocturne olive May 16, 2026, 7:19 PM

#

sage crag chat can i train 1t model on my athlon xp

yes

stark needle May 16, 2026, 7:19 PM

#

sage crag chat can i train 1t model on my athlon xp

train a 1 trillion parameter bigram model

#

1 trillion bigrams

#

✅

nocturne olive May 16, 2026, 7:20 PM

#

neuroThinkSmug It's only gonna take 1 trillion years

opaque sigil May 16, 2026, 7:20 PM

#

rough bloom if your architecture is somehow incompatible with backprop or has a better way t...

auto differentiation one of the best things people have come up with FOCUS

sage crag May 16, 2026, 7:20 PM

#

512mb ddr2 ram

#

enub

fast pagoda May 16, 2026, 7:20 PM

#

i think

nocturne olive May 16, 2026, 7:20 PM

#

Clueless Just keep all the weights on disk

fast pagoda May 16, 2026, 7:20 PM

#

llms currently

#

being so bloated

#

has distorted what a "large model" is

#

so badly that people think a 12b is a small network

sage crag May 16, 2026, 7:21 PM

#

1b params

#

big number

amber fractal May 16, 2026, 7:21 PM

#

✅

stark needle May 16, 2026, 7:21 PM

#

fr who remember 500m param vit/cnn being huge

bitter phoenix May 16, 2026, 7:21 PM

#

nocturne olive Clueless Just keep all the weights on disk

Then you bypass the finite dram limit GETHIM

stark needle May 16, 2026, 7:21 PM

#

everyone was like holy shit

rough bloom May 16, 2026, 7:21 PM

#

fast pagoda so badly that people think a 12b is a small network

ye

fast pagoda May 16, 2026, 7:21 PM

#

GO BEYOND

nocturne olive May 16, 2026, 7:21 PM

#

neuroThinkSmug Is a 100M parameter NeuroSynth big?

rough bloom May 16, 2026, 7:21 PM

#

GPT-2 LLM

sage crag May 16, 2026, 7:21 PM

#

1b params in your mnist number recognition toy

rough bloom May 16, 2026, 7:21 PM

#

117M params kek

sage crag May 16, 2026, 7:21 PM

#

neuroPogHD

fast pagoda May 16, 2026, 7:21 PM

#

https://tenor.com/view/go-beyond-plus-ultra-all-might-mha-bnha-memes-gif-19247796

Tenor

sage crag May 16, 2026, 7:22 PM

#

3 parameter

#

llm

#

vs

nocturne olive May 16, 2026, 7:22 PM

#

nocturne olive neuroThinkSmug Is a 100M parameter NeuroSynth big?

neuroThinkSmug Surely it's not that big

fast pagoda May 16, 2026, 7:22 PM

#

yes/no/maybe

sage crag May 16, 2026, 7:22 PM

#

my child

fast pagoda May 16, 2026, 7:22 PM

#

2

sage crag May 16, 2026, 7:22 PM

#

youre right

#

2 param llm vs my child

bitter phoenix May 16, 2026, 7:22 PM

#

Is the baby coughing

sage crag May 16, 2026, 7:22 PM

#

@fast pagoda are you coughing

fast pagoda May 16, 2026, 7:23 PM

#

FlowerCatJAM

sage crag May 16, 2026, 7:23 PM

#

the answer is maybe

rough bloom May 16, 2026, 7:23 PM

#

I bet on the LLM

#

it is immortal

fast pagoda May 16, 2026, 7:23 PM

#

https://tenor.com/view/magic-8-ball-not-good-outlook-robot-dreams-gif-13153136483637167804

Tenor

rough bloom May 16, 2026, 7:23 PM

#

so it will win

#

by default

bitter phoenix May 16, 2026, 7:23 PM

#

I bet on the baby

sage crag May 16, 2026, 7:23 PM

#

which one has higher battle iq

#

neuroBucket

amber fractal May 16, 2026, 7:23 PM

#

I'm betting on the baby

fast pagoda May 16, 2026, 7:23 PM

#

always bet on DaBaby

sage crag May 16, 2026, 7:23 PM

#

what if its an english exam

sage crag May 16, 2026, 7:24 PM

#

https://tenor.com/view/uma-musume-umamusume-silly-urara-haru-urara-gif-5941918971645523248

Tenor

#

animal farm

rough bloom May 16, 2026, 7:33 PM

#

https://tenor.com/view/agnes-anvil-agnes-tachyon-pain-invincible-gif-2562559485872649862

Tenor

stark needle May 16, 2026, 7:35 PM

#

https://klipy.com/gifs/megumin-konosuba-10--k01KRS4J3TFZ3PV4CHA0XHZ2Y9M

Klipy

Megumin from Konosuba Eating

▶ Play video

#

uranium rods

#

mrrrrrp

opaque wharf May 16, 2026, 7:42 PM

#

I need to learn chinese enub

fickle rain May 16, 2026, 7:46 PM

#

Oops, forgor that I need to add it back :D

#

Yay, no more bugcheck

sage crag May 16, 2026, 8:11 PM

#

https://tenor.com/view/evil-vtuber-3d-head-turn-concerned-gif-15495186653811856266

Tenor

#

ye

#

restore stack

kind nimbus May 16, 2026, 8:24 PM

#

NeuroPoggers

#

if anyone wants this as well you can submit here:
https://claude.com/form/cyber-use-case

obsidian mantle May 16, 2026, 8:52 PM

#

what does it allow you to do?

#

write exploits? neuroMonkaOMEGA

glass flower May 16, 2026, 8:53 PM

#

obsidian mantle write exploits? neuroMonkaOMEGA

technically YES

obsidian mantle May 16, 2026, 8:55 PM

#

surely they will only give it to the good guys and there will be no issues with this system glueless

glass flower May 16, 2026, 8:56 PM

#

catAsk anthropic give me access to mythos...

obsidian mantle May 16, 2026, 8:56 PM

#

neurOMEGALUL

#

use claude to get an exploit to access mythos to make ultimate exploits neuro5head

glass flower May 16, 2026, 8:57 PM

#

LULE i'll be honest i wouldn't be able to afford to run a single query on mythos anyway... all the claude models are so damn expensive

obsidian mantle May 16, 2026, 8:57 PM

#

well

#

is there anything that can stop them from making this thing stronger and stronger

#

they can just grow it more and more cant they

glass flower May 16, 2026, 8:59 PM

#

i mean.. thats the goal of all frontier labs LULE

#

google has deepmind

#

deepseek exists

obsidian mantle May 16, 2026, 8:59 PM

#

"mythos, destroy this country please" glueless

glass flower May 16, 2026, 9:00 PM

#

openai now has gpt 5.5-cyber that is like mythos

obsidian mantle May 16, 2026, 9:00 PM

#

fine tuned to do cybersecurity stuff?

#

or smth

glass flower May 16, 2026, 9:01 PM

#

yeah finding exploits. security stuff

obsidian mantle May 16, 2026, 9:01 PM

#

or is it like

#

it just becomes so strong it can do it without any specific tuning

#

and nobody knows how it does it

glass flower May 16, 2026, 9:01 PM

#

its for their glasswing thing

#

which is what anthropic does with mythos

obsidian mantle May 16, 2026, 9:02 PM

#

is idea of clashing glasswing vs mythos being worked on

glass flower May 16, 2026, 9:02 PM

#

shrug

obsidian mantle May 16, 2026, 9:02 PM

#

can it potentially make them godlike while they fight

#

because they train in the process or something

#

or since they are llms it wont work

glass flower May 16, 2026, 9:03 PM

#

openai and anthropic haven't had any new stuff they developed in a long time

obsidian mantle May 16, 2026, 9:03 PM

#

just stacking more islands?

glass flower May 16, 2026, 9:03 PM

#

any innovation in the AI space comes from Google or deepseek. and anthropic/openai just adapt the research into new models

#

they are currently more interested in turning a profit rather than developing new ideas

obsidian mantle May 16, 2026, 9:04 PM

#

hmm

#

makes sense

#

so for google this thing is like a side job they are not feeding off it

#

if i got it right

glass flower May 16, 2026, 9:04 PM

#

google is the reason gpt exists LULE they wrote the original paper on how to do it. but just didn't release a model

#

they have been doing AI for like 3 decades at this point

obsidian mantle May 16, 2026, 9:05 PM

#

but anthropic/openai are focused on this thing and have no other income basically

glass flower May 16, 2026, 9:05 PM

#

google is easily the safest bet in the AI race

#

anthropic and openai are just the shiny things... but true AI will come from google

obsidian mantle May 16, 2026, 9:05 PM

#

NeurOhISee cool

glass flower May 16, 2026, 9:05 PM

#

if they ever decide to actually take it serious LUL

#

deepmind has been doing AI research for a long long time

#

it also helps that google has the whole thing under their belt. they do inference, models and the hardware to run the inference

#

they do the whole thing. so they are under no risk of external forces. like if nvidia goes down they take the whole AI bubble with them.. but google would be unaffected

obsidian mantle May 16, 2026, 9:07 PM

#

they have their own hardware? NeurOhISee

glass flower May 16, 2026, 9:07 PM

#

YES

obsidian mantle May 16, 2026, 9:08 PM

#

is it like fully their own thing

#

ai focused

#

cant buy it on amazon and shit

#

secret google chips

glass flower May 16, 2026, 9:08 PM

#

pretty sure ye

obsidian mantle May 16, 2026, 9:08 PM

#

damn

glass flower May 16, 2026, 9:09 PM

#

i mean.. even their open source models have gotten a lot better lately. gemma 4 is a beast overall

#

google is just not focusing right now on benchmaxxing so they look bad and might not be the best in agentic tasks... but they focus on things that people might use in their products.. like the AI search thingy, the video summary on youtube. and on-device small AIs

frozen igloo May 16, 2026, 9:10 PM

#

Classic

rough bloom May 16, 2026, 9:10 PM

#

obsidian mantle secret google chips

YES TPUs

#

can rent on Google Cloud but can't buy

glass flower May 16, 2026, 9:13 PM

#

if someone gave me $10k dollars and i had to bet it all on a company to win the AI race. i would put all the money in google tbh Minamhm

#

they might be lagging behind the best and greatest models... but they always catch up

true hemlock May 16, 2026, 9:14 PM

#

i'd say they're winning in terms of real breakthroughs

obsidian mantle May 16, 2026, 9:14 PM

#

what about deepseek

true hemlock May 16, 2026, 9:15 PM

#

i dont consider llm benchmark scores any significant.

glass flower May 16, 2026, 9:15 PM

#

deepseek isn't self-sufficient. they rent gpus, thats the main reason i wouldn't... but yes they would be second to google

true hemlock May 16, 2026, 9:16 PM

#

true hemlock i'd say they're winning in terms of real breakthroughs

im talking about the deepmind team coming up with actually scientifically useful models like alphafold 2.

glass flower May 16, 2026, 9:16 PM

#

YES

true hemlock May 16, 2026, 9:16 PM

#

screw the llm part that's unimportant as hell

obsidian mantle May 16, 2026, 9:17 PM

#

is alphafold that chemistry protein thing or what is it

#

i think i saw some youtube thumbnail about it

glad path May 16, 2026, 9:17 PM

#

glass flower if someone gave me $10k dollars and i had to bet it all on a company to win the ...

yeah probably

true hemlock May 16, 2026, 9:17 PM

#

protein folding prediction

tropic spindle May 16, 2026, 9:18 PM

#

frozen igloo Classic

I had to read this 7 times to notice the difference

glad path May 16, 2026, 9:18 PM

#

true hemlock protein folding prediction

so it'd be good for designing drugs that work at a molecular level

true hemlock May 16, 2026, 9:18 PM

#

idk man. google deepmind is the ONLY one making real academic contributions in the ML field.

glad path May 16, 2026, 9:20 PM

#

it seems like it'd be pretty useful for medical research

true hemlock May 16, 2026, 9:20 PM

#

the rest are cashgrabbing with llm shit and most they do is just modifying the architecture a bit

opaque sigil May 16, 2026, 9:22 PM

#

glad path so it'd be good for designing drugs that work at a molecular level

https://www.isomorphiclabs.com/ they spun this off into its own company a while back YES

Reimagining Drug Discovery Process with AI - Isomorphic Labs

Isomorphic Labs is building a future where frontier AI can help to unlock deeper scientific insights, faster breakthroughs, and life-changing medicines.

mighty thorn May 16, 2026, 9:36 PM

#

true hemlock idk man. google deepmind is the ONLY one making real academic contributions in t...

“real” == unusable by the masses btw

#

Don’t let him fool you

#

SNN always stays closed and unpractical at scale

true hemlock May 16, 2026, 9:42 PM

#

mighty thorn “real” == unusable by the masses btw

papers out there

#

if you can't implement the backend, skill issue honestly

#

ThumbsUpCat

mighty thorn May 16, 2026, 9:44 PM

#

https://cdn.discordapp.com/attachments/1202474609518055424/1503566336679346277/IMG_9909.gif

opaque sigil May 16, 2026, 9:44 PM

#

SCHIZO

mighty thorn May 16, 2026, 9:45 PM

#

He’s just cranky that his field has been irrelevant for 4 years

rough bloom May 16, 2026, 9:45 PM

#

it's fine

#

next year is the year of SNNs glueless

opaque sigil May 16, 2026, 9:45 PM

#

mhm

obsidian mantle May 16, 2026, 9:45 PM

#

Snn is science neural networks?

mighty thorn May 16, 2026, 9:45 PM

#

rough bloom next year is the year of SNNs glueless

And Linux and communism and free hugs

#

Minamhm

fast pagoda May 16, 2026, 9:46 PM

#

stinky neuro network, it's what neuro uses, ask the green fog

opaque sigil May 16, 2026, 9:47 PM

#

obsidian mantle Snn is science neural networks?

very scientific YES

obsidian mantle May 16, 2026, 9:48 PM

#

I am getting caveman vibes again vedalDespair

stark needle May 16, 2026, 9:50 PM

#

bro u dont know

#

mixture of snns

#

✅

stark needle May 16, 2026, 9:51 PM

#

mighty thorn He’s just cranky that his field has been irrelevant for 4 years

bro

#

tensor networks bro

#

next gen shit

true hemlock May 16, 2026, 9:51 PM

#

bro

#

quantum neural network bro

#

next level

#

the future!!11!11!

stark needle May 16, 2026, 9:52 PM

#

true hemlock quantum neural network bro

actually tensor network is fucking insane

#

tensor networks is based in quantum computing or some shit

#

https://arxiv.org/pdf/2401.14109

mighty thorn May 16, 2026, 9:53 PM

#

stark needle tensor networks is based in quantum computing or some shit

x is crazy. Shit is the future
doesn’t know what it is

#

Shadow is an ai bro now

stark needle May 16, 2026, 9:54 PM

#

u can represent bigger models with less params

#

via tensor networks

#

its fucking insane actually

#

literally

#

less params to backprop

#

and theres a fucking pytorch impl
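
A back-of-the-envelope sketch of the "fewer params" claim, assuming a tensor-train (TT) factorization of one weight matrix with a single uniform internal rank. It only counts parameters; whether a trained layer actually tolerates such a low rank is a separate question, and the helper names are made up for illustration:

```python
from math import prod

def dense_params(row_factors, col_factors):
    """Parameter count of the ordinary dense matrix."""
    return prod(row_factors) * prod(col_factors)

def tt_params(row_factors, col_factors, rank):
    """Parameter count of a TT-matrix whose k-th core has shape (r_prev, m_k, n_k, r_next)."""
    k = len(row_factors)
    ranks = [1] + [rank] * (k - 1) + [1]     # boundary ranks are 1
    return sum(ranks[i] * row_factors[i] * col_factors[i] * ranks[i + 1] for i in range(k))

m = n = [4, 4, 4, 4, 4]                      # 4**5 = 1024, standing in for a 1024 x 1024 layer
print(dense_params(m, n), tt_params(m, n, rank=8))  # prints 1048576 3328
```

The count scales with the rank squared, so the savings shrink quickly if the layer needs a higher rank to keep its accuracy.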

obsidian mantle May 16, 2026, 9:55 PM

#

Why is training speed an issue at all

stark needle May 16, 2026, 9:55 PM

#

https://github.com/rballester/tntorch

GitHub

GitHub - rballester/tntorch: Tensor Network Learning with PyTorch

Tensor Network Learning with PyTorch. Contribute to rballester/tntorch development by creating an account on GitHub.

obsidian mantle May 16, 2026, 9:55 PM

#

Do some neural networks take like super long to train

#

And its viable because can shrink 10 years training to 1 year for example

stark needle May 16, 2026, 9:56 PM

#

mighty thorn x is crazy. Shit is the future doesn’t know what it is

bro u can decompose a tensor into a graph

obsidian mantle May 16, 2026, 9:56 PM

#

Or is it because every train is trial and error

#

And you can do 1000 attempts instead of 10 attempts for the same period of time

#

I am caveman please clarify vedalNeuroYay

stark needle May 16, 2026, 9:57 PM

#

https://tensornetwork.org/

Tensor Network

The Tensor Network

Resources for tensor network algorithms, theory, and software

#

schizo site

mighty thorn May 16, 2026, 9:58 PM

#

obsidian mantle Do some neural networks take like super long to train

Grokking Minamhm

obsidian mantle May 16, 2026, 9:58 PM

#

stark needle u can represent bigger models with less params

Why is there no 200b tensor llm that you can run on 16gb vram yet

#

Or is it wip

stark needle May 16, 2026, 9:59 PM

#

obsidian mantle Why is there no 200b tensor llm that you can run on 16gb vram yet

u can in theory

#

u can convert all tensors

obsidian mantle May 16, 2026, 9:59 PM

#

Why arent they making it

#

I mean

stark needle May 16, 2026, 9:59 PM

#

a company does this shit

obsidian mantle May 16, 2026, 9:59 PM

#

You need x10 times less islands

stark needle May 16, 2026, 9:59 PM

#

Model as a service

obsidian mantle May 16, 2026, 9:59 PM

#

obsidian mantle You need x10 times less islands

Why nobody uses it

stark needle May 16, 2026, 9:59 PM

#

https://multiversecomputing.com/

Multiverse Computing

Multiverse Computing - Pioneering the Era of Efficient and Sovereig...

We empower organizations to run secure, production-ready AI with tailored solutions — reducing compute costs and retaining full control across cloud, data centers and edge environments

obsidian mantle May 16, 2026, 9:59 PM

#

Or is it new and wip and they are doing it rn

stark needle May 16, 2026, 9:59 PM

#

obsidian mantle Or is it new and wip and they are doing it rn

this is shit from 2024

#

but the idea of tensor networks exists since like forever

obsidian mantle May 16, 2026, 10:00 PM

#

Soo

#

Where models

stark needle May 16, 2026, 10:00 PM

#

idfk

#

ive tried this shit tho myself

obsidian mantle May 16, 2026, 10:00 PM

#

obsidian mantle You need x10 times less islands

This justifies economy

stark needle May 16, 2026, 10:00 PM

#

and it rly works

#

ive ran like 100M param pretraining

#

with tensor network uses like 10M params

obsidian mantle May 16, 2026, 10:01 PM

#

Do you need some quantum computer to run it

stark needle May 16, 2026, 10:01 PM

#

no

true hemlock May 16, 2026, 10:01 PM

#

obsidian mantle Do some neural networks take like super long to train

imagine having to iterate 30 billion weights times the highest context window, times the dataset length in tokens which is also in the billions to trillions of tokens
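
To put rough numbers on that scale, here is a sketch using the common rule of thumb of about 6 FLOPs per parameter per training token for a dense transformer. The 1T token count and both throughput figures are assumptions picked for illustration, not measurements:

```python
def train_flops(params, tokens):
    """Rule-of-thumb dense-transformer training cost: ~6 FLOPs per parameter per token."""
    return 6.0 * params * tokens

def years_at(flops, sustained_flops_per_s):
    """Wall-clock years needed at a given sustained throughput."""
    seconds = flops / sustained_flops_per_s
    return seconds / (365.25 * 24 * 3600)

N = 30e9                          # 30B weights, the figure from the message above
D = 1e12                          # 1T training tokens (assumed)
C = train_flops(N, D)
gpu_years = years_at(C, 1e14)     # assumed 100 TFLOP/s sustained on one accelerator
cpu_years = years_at(C, 1e12)     # assumed 1 TFLOP/s sustained on one CPU
print(f"{C:.2e} FLOPs, ~{gpu_years:.0f} accelerator-years, ~{cpu_years:.0f} CPU-years")
```

Under these assumptions a single accelerator would need decades, which is why such runs are spread across thousands of devices.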

stark needle May 16, 2026, 10:01 PM

#

gpu is fine

obsidian mantle May 16, 2026, 10:01 PM

#

No i mean

#

I get that its super good

#

And overpowered

#

I ask why for example

#

Openai wont use it rn

stark needle May 16, 2026, 10:02 PM

#

glueless

#

who knows whos using it

obsidian mantle May 16, 2026, 10:02 PM

#

To make 10000000b model that fits into previously 1000b model size

opaque sigil May 16, 2026, 10:02 PM

#

if it works at scale and doesn't fuck up everything they're probably using it already

obsidian mantle May 16, 2026, 10:02 PM

#

And save 90% of money on building islands

mighty thorn May 16, 2026, 10:02 PM

#

true hemlock imagine having to iterate 30 billion weights times the highest context window, t...

Imagine having to actually progress the field instead of gatekeeping and bragging with unverified claims

opaque sigil May 16, 2026, 10:02 PM

#

granted that's a big if

rough bloom May 16, 2026, 10:03 PM

#

apparently super good
really old
has PyTorch impl
somehow nobody is using it

stark needle May 16, 2026, 10:03 PM

#

just gonna say that google had also a library

#

https://github.com/google/TensorNetwork

GitHub

GitHub - google/TensorNetwork: A library for easy and efficient man...

A library for easy and efficient manipulation of tensor networks. - google/TensorNetwork

#

glueless

opaque sigil May 16, 2026, 10:03 PM

#

archived neuroPogHD

obsidian mantle May 16, 2026, 10:03 PM

#

true hemlock imagine having to iterate 30 billion weights times the highest context window, t...

Wait are you saying its bad
I didnt quite get it
I felt like what you said was like.. benefit for it

stark needle May 16, 2026, 10:03 PM

#

deepmind has alr schizoed into this shit

true hemlock May 16, 2026, 10:04 PM

#

true hemlock imagine having to iterate 30 billion weights times the highest context window, t...

that's how much is going on with training llm. they had to optimize any caching related techniques on the attention heads just to make it viable

#

im just giving some examples for scale

obsidian mantle May 16, 2026, 10:04 PM

#

So you saying that it works and everyone using it

mighty thorn May 16, 2026, 10:04 PM

#

obsidian mantle Wait are you saying its bad I didnt quite get it I felt like what you said was l...

Alright so

#

Quack is an SNNbro

#

He hates ai industry

obsidian mantle May 16, 2026, 10:05 PM

#

Why my gemma4 26b takes 17gb vram then

mighty thorn May 16, 2026, 10:05 PM

#

But has agi internally but it’s just too dangerous to release

true hemlock May 16, 2026, 10:05 PM

#

idk wtf is this dog on about lmao

mighty thorn May 16, 2026, 10:05 PM

#

So basically

amber fractal May 16, 2026, 10:05 PM

#

Weaksauce hating

mighty thorn May 16, 2026, 10:05 PM

#

You know how OpenAI refused to release gpt 2 cause it was too dangerous

#

That’s where they are rn

#

They are gpt 2

opaque sigil May 16, 2026, 10:05 PM

#

you're talking about two completely separate things FOCUS

mighty thorn May 16, 2026, 10:05 PM

#

They are still catching up

amber fractal May 16, 2026, 10:06 PM

#

Kaine is yapping because they want to spread information, never said it's good or informative tho

true hemlock May 16, 2026, 10:06 PM

#

opaque sigil you're talking about two completely separate things FOCUS

yeah im just giving scales on what's going on with pretraining a whole llm

#

maybe im leaving out details, like moe feed forward being much more compute efficient

stark needle May 16, 2026, 10:07 PM

#

use RWKV or some shit bro SCHIZO

#

gated deltanet

true hemlock May 16, 2026, 10:07 PM

#

idk, kaine is talking about something else

mighty thorn May 16, 2026, 10:08 PM

#

mighty thorn They are gpt 2

Not capability wise. SOTA SNN can barely spell. But I mean they are still in the phase where they all think they are savants who just created sentient machines and need to gatekeep it for the protection of the human race

true hemlock May 16, 2026, 10:08 PM

#

and being somewhat obnoxious for absolutely no reason whatsoever

obsidian mantle May 16, 2026, 10:08 PM

#

So quack is talking about different thing completely did i get it right
Not talking about tensor model shrinking

#

Or what

rough bloom May 16, 2026, 10:08 PM

#

obsidian mantle So quack is talking about different thing completely did i get it right Not tal...

yes

true hemlock May 16, 2026, 10:08 PM

#

yes

opaque sigil May 16, 2026, 10:08 PM

#

quack was talking about why training can take tens if not hundreds of millions of gpu hours YES

obsidian mantle May 16, 2026, 10:08 PM

#

Oh

#

I see

true hemlock May 16, 2026, 10:08 PM

#

yeah

obsidian mantle May 16, 2026, 10:09 PM

#

Then this tensor shit is mega overpowered? No?

true hemlock May 16, 2026, 10:09 PM

#

there's a reason we laughed off a guy who thought he could train a 12b 1.58bit model on a cpu

mighty thorn May 16, 2026, 10:09 PM

#

true hemlock and being somewhat obnoxious for absolutely no reason whatsoever

Quack would never be obnoxious about another person’s interests, projects, or field of research glueless

true hemlock May 16, 2026, 10:09 PM

#

within few months

rough bloom May 16, 2026, 10:09 PM

#

obsidian mantle Then this tensor shit is mega overpowered? No?

supposedly

#

but somehow never deployed anywhere neuromegadance

#

not widely known for LLMs at least

opaque sigil May 16, 2026, 10:10 PM

#

obsidian mantle Then this tensor shit is mega overpowered? No?

hard to tell, most optimisations end up being quite specific and not applicable to everything so it's easier to just not bother

rough bloom May 16, 2026, 10:10 PM

#

despite existing for ages already

true hemlock May 16, 2026, 10:10 PM

#

opaque sigil hard to tell, most optimisations end up being quite specific and not applicable ...

this tbh

opaque sigil May 16, 2026, 10:10 PM

#

i don't remember much about tensor networks to give an actual answer FOCUS

obsidian mantle May 16, 2026, 10:10 PM

#

Okay and then my gemma4 takes so much vram because gemma4 200b fitting in 4090 would be so strong i could exploit windows on it or some shit

opaque sigil May 16, 2026, 10:10 PM

#

no-brainer optimisations are obviously used everywhere

obsidian mantle May 16, 2026, 10:10 PM

#

While mythos needs only 500gb vram while being 500000b model

stark needle May 16, 2026, 10:10 PM

#

it would still use a lot of vram for activations

true hemlock May 16, 2026, 10:10 PM

#

god, im late for my fishing trip brb need to call an uber

stark needle May 16, 2026, 10:10 PM

#

activations dont get reduced

stark needle May 16, 2026, 10:11 PM

#

true hemlock god, im late for my fishing trip brb need to call an uber

have fun quack

obsidian mantle May 16, 2026, 10:11 PM

#

Oh so this thing is purely training related

#

And no x10 benefits on inference

rough bloom May 16, 2026, 10:11 PM

#

no, it only affects the parameters I think

stark needle May 16, 2026, 10:11 PM

#

only params are affected

rough bloom May 16, 2026, 10:12 PM

#

shrinks the model itself, but not the space required for any computations with that model
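
A minimal sketch of that point, assuming a plain low-rank factorization as the simplest stand-in for a tensor-network decomposition (the function names and layer sizes are made up for illustration): the weight count drops sharply, while the layer's output activations keep the same shape.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Parameters in a full d_in x d_out weight matrix."""
    return d_in * d_out


def factored_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters when W is approximated as A @ B,
    with A of shape (d_in, rank) and B of shape (rank, d_out)."""
    return d_in * rank + rank * d_out


def output_activation_elems(batch: int, seq_len: int, d_out: int) -> int:
    """Elements in the layer output. This does not depend on how W is stored."""
    return batch * seq_len * d_out


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
d_in, d_out, rank = 4096, 4096, 64
dense = dense_params(d_in, d_out)
factored = factored_params(d_in, d_out, rank)
compression = dense / factored
acts = output_activation_elems(batch=8, seq_len=2048, d_out=d_out)

print(f"dense weights:    {dense}")
print(f"factored weights: {factored} ({compression:.0f}x smaller)")
print(f"output activations (either way): {acts}")
```

With these made-up sizes the weights shrink 32x, but the activation tensor is the same size in both cases, which is why parameter compression alone does not cut the memory needed to run the model by the same factor.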

stark needle May 16, 2026, 10:12 PM

#

Tensor networks +deepseek compressed sparse attention SCHIZO

mighty thorn May 16, 2026, 10:12 PM

#

obsidian mantle While mythos needs only 500gb vram while being 500000b model

That is absolutely not true

#

I want a source so I can ridicule it

obsidian mantle May 16, 2026, 10:12 PM

#

Numbers are random

stark needle May 16, 2026, 10:13 PM

#

mighty thorn That is absolutely not true

bro mythos is just a wider moe

mighty thorn May 16, 2026, 10:13 PM

#

obsidian mantle Numbers are random

HAHA RANDOM HAHA

obsidian mantle May 16, 2026, 10:13 PM

#

But i already got answered

obsidian mantle May 16, 2026, 10:13 PM

#

mighty thorn HAHA RANDOM HAHA

I mean i took these numbers out of my ass to pinpoint the scale of shrink

mighty thorn May 16, 2026, 10:13 PM

#

stark needle bro mythos is just a wider moe

That’s why it’s a fraud

stark needle May 16, 2026, 10:13 PM

#

@mighty thorn u keep saying moe is shit but gemini and chatgpt and shit are all moe

mighty thorn May 16, 2026, 10:13 PM

#

stark needle @mighty thorn u keep saying moe is shit but gemini and chatgpt and shit ...

They would be better if the hardware used for them was used for not MoE

stark needle May 16, 2026, 10:13 PM

#

that's just poor copium

obsidian mantle May 16, 2026, 10:13 PM

#

obsidian mantle Oh so this thing is purely training related

So is this basically correct

stark needle May 16, 2026, 10:14 PM

#

mighty thorn They would be better if the hardware used for them was used for not MoE

not economical at scale

obsidian mantle May 16, 2026, 10:14 PM

#

I still need 20gb for 26b model

#

But its trained much faster

#

Trying to understand

opaque sigil May 16, 2026, 10:14 PM

#

moe is good for what it's made for, which is allowing you to cram far more knowledge into the model for relatively cheap FOCUS

rough bloom May 16, 2026, 10:14 PM

#

MoE is awesome for big models

opaque sigil May 16, 2026, 10:14 PM

#

no need to pretend it's worthless

mighty thorn May 16, 2026, 10:14 PM

#

You either die dense or live long enough to see yourself become sparse

rough bloom May 16, 2026, 10:14 PM

#

rough bloom MoE is awesome for big models

faster and easier to deploy EssexHyperNod

obsidian mantle May 16, 2026, 10:14 PM

#

Moe is just mega fast inference

stark needle May 16, 2026, 10:14 PM

#

rough bloom MoE is awesome for big models

moe is for expert parallelism on interconnected systems and shit

obsidian mantle May 16, 2026, 10:15 PM

#

Which is good

rough bloom May 16, 2026, 10:15 PM

#

stark needle moe is for expert parallelism on interconnected systems and shit

yee

mighty thorn May 16, 2026, 10:15 PM

#

opaque sigil no need to pretend it's worthless

MoE is worthless for LLMs in most contexts, but very useful in the contexts where it’s actually intelligent to use

stark needle May 16, 2026, 10:15 PM

#

mighty thorn MoE is worthless for LLMs in most contexts, but very useful in the contexts wher...

???????????????????

fast pagoda May 16, 2026, 10:15 PM

#

but it's used in all sota contexts currently

stark needle May 16, 2026, 10:16 PM

#

bro moe makes llm infinitely cheaper to train

fast pagoda May 16, 2026, 10:16 PM

#

doesnt seem particularly useless

stark needle May 16, 2026, 10:16 PM

#

u just need more vram

#

but compute is the same

#

activations dont grow

#

u can use higher context length

true hemlock May 16, 2026, 10:17 PM

#

i don't think kaine knows what he's talking about

#

missed the point of moe entirely

mighty thorn May 16, 2026, 10:17 PM

#

stark needle ???????????????????

MoE is worthless for LLMs in most contexts, but very useful in the contexts where it’s actually intelligent to use

obsidian mantle May 16, 2026, 10:17 PM

#

I tried moe vs non moe and all i noticed was x5 faster inference

fast pagoda May 16, 2026, 10:17 PM

#

what is .... most contexts?

stark needle May 16, 2026, 10:17 PM

#

mighty thorn * MoE is worthless for LLMs in most contexts, but very useful in the contexts wh...

most contexts

#

bro all sota llms are moe

mighty thorn May 16, 2026, 10:18 PM

#

stark needle bro all sota llms are moe

Because cheaper for improving benchmark scores versus actually making better models

stark needle May 16, 2026, 10:18 PM

#

mighty thorn Because cheaper for improving benchmark scores versus actually making better mod...

❓

mighty thorn May 16, 2026, 10:18 PM

#

stark needle ❓

Because cheaper for improving benchmark scores versus actually making better models

stark needle May 16, 2026, 10:19 PM

#

mighty thorn * Because cheaper for improving benchmark scores versus actually making better m...

bro

#

u gotta be fr

rough bloom May 16, 2026, 10:19 PM

#

fast pagoda what is .... most contexts?

anything except tiny local models where you need as much density as possible because you have little VRAM neuromegadance
everywhere else MoEs win I think

true hemlock May 16, 2026, 10:19 PM

#

in the enterprise space where they basically have endless hbm memory, moe is highly beneficial

obsidian mantle May 16, 2026, 10:19 PM

#

But moe doesnt shrink required vram

true hemlock May 16, 2026, 10:19 PM

#

less compute being used

stark needle May 16, 2026, 10:19 PM

#

dense llms is just poor people copium

true hemlock May 16, 2026, 10:19 PM

#

obsidian mantle But moe doesnt shrink required vram

which is why i said endless memory

#

also

#

moe is basically

mighty thorn May 16, 2026, 10:20 PM

#

obsidian mantle But moe doesnt shrink required vram

It explodes it while having almost all of it being idle and unused

fast pagoda May 16, 2026, 10:20 PM

#

the weakness of moe isnt a problem at sufficient active parameter counts
Something like Qwen3 30B-A3B knows things like a 30B model but is limited by the width of the active slice in any given forward pass

scale this to frontier size and the active parameters are in the tens to hundreds of billions again and it scales incredibly well when total params end up in the trillions
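
The total-versus-active distinction can be put in rough numbers. This is only a sketch: the config below is hypothetical (not the real Qwen3 layout), it counts only the expert feed-forward weights, and it assumes a plain two-matrix MLP per expert.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    d_model: int    # hidden size
    d_ff: int       # per-expert feed-forward width
    n_experts: int  # experts per MoE layer
    top_k: int      # experts actually used per token
    n_layers: int   # number of MoE layers


def expert_params(cfg: MoEConfig) -> int:
    """Weights in one expert: an up projection plus a down projection."""
    return 2 * cfg.d_model * cfg.d_ff


def total_ffn_params(cfg: MoEConfig) -> int:
    """What has to be stored: every expert in every layer."""
    return cfg.n_layers * cfg.n_experts * expert_params(cfg)


def active_ffn_params(cfg: MoEConfig) -> int:
    """What one token touches: only the top_k experts in each layer."""
    return cfg.n_layers * cfg.top_k * expert_params(cfg)


# Hypothetical config, for illustration only.
cfg = MoEConfig(d_model=2048, d_ff=1024, n_experts=64, top_k=8, n_layers=24)
total = total_ffn_params(cfg)
active = active_ffn_params(cfg)
print(f"total expert params:  {total / 1e9:.2f}B")
print(f"active expert params: {active / 1e9:.2f}B")
print(f"ratio: {total // active}x")
```

The ratio is just `n_experts / top_k`, so widening the expert pool grows stored knowledge without growing the per-token slice.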

mighty thorn May 16, 2026, 10:20 PM

#

Least efficient possible idea

obsidian mantle May 16, 2026, 10:20 PM

#

mighty thorn It explodes it while having almost all of it being idle and unused

Huh

rough bloom May 16, 2026, 10:20 PM

#

rough bloom anything except tiny local models where you need as much density as possible bec...

and even locally MoEs are pretty good
it's dumb anyway so might as well make it fast

true hemlock May 16, 2026, 10:20 PM

#

yeah

obsidian mantle May 16, 2026, 10:20 PM

#

Didnt notice any explodes in 26a3

#

It just took like 17gb all the time
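
One way to read that number, as a back-of-the-envelope sketch: every expert has to sit in memory, so resident weight size tracks total parameters, while per-token compute tracks active parameters. The 26B/3B split is taken from the "26a3" shorthand; the bit widths are assumptions, since the quantization isn't stated, and the estimate ignores KV cache and runtime overhead.

```python
def weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate resident weight memory in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active weight."""
    return 2 * active_params


TOTAL_PARAMS = 26e9   # all experts must be loaded
ACTIVE_PARAMS = 3e9   # only these are used for any given token

for bits in (4, 5, 8, 16):  # assumed quantization levels
    print(f"{bits:>2}-bit weights: ~{weight_gb(TOTAL_PARAMS, bits):.2f} GB resident")

print(f"compute per token: ~{flops_per_token(ACTIVE_PARAMS):.1e} FLOPs")
```

Under these assumptions, memory use stays flat regardless of which experts fire, and only the compute per token reflects the smaller active slice.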

true hemlock May 16, 2026, 10:21 PM

#

basically

mighty thorn May 16, 2026, 10:21 PM

#

obsidian mantle It just took like 17gb all the time

17gb for 3b parameter while the rest do nothing 😭

obsidian mantle May 16, 2026, 10:21 PM

#

I understand that it assigns some "experts" so it only uses the relevant params instead of all params every time
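
A toy sketch of that routing step, assuming the common softmax-then-top-k scheme (real routers differ in details, and the logits here are invented): a small router scores every expert per token, and only the highest-scoring few are run.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def route_top_k(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Return (expert_index, mixing_weight) for the k highest-scoring experts.

    Mixing weights are renormalized over the selected experts so they sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Invented router scores for one token over four experts.
picks = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)
for expert, weight in picks:
    print(f"expert {expert}: weight {weight:.3f}")
```

The token's output is then the weighted sum of just those experts' outputs; the unselected experts are skipped for this token but may well be picked for the next one.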

true hemlock May 16, 2026, 10:22 PM

#

you get a model that runs inference on just a tiny part of itself but knows as much as the total params
though the drawback is that it's somewhat dumber, but when you're at hundreds of billions of params and doing enterprise scale inference, does it even matter

fast pagoda May 16, 2026, 10:22 PM

#

"nothing" is a misunderstanding of moe

stark needle May 16, 2026, 10:22 PM

#

mighty thorn It explodes it while having almost all of it being idle and unused

wrong, when serving many people u can in fact get higher utilization

#

due to expert load balancing

obsidian mantle May 16, 2026, 10:22 PM

#

mighty thorn 17gb for 3b parameter while the rest do nothing 😭

It only takes 3 but it selects it carefully

#

Whats the point of calculating impact of tokens that say 2+2=4 if it needs to write a poem

#

Its not a bad idea

true hemlock May 16, 2026, 10:22 PM

#

stark needle wrong when serving to many people u can get in fact **higher** utilization

ah great, he has never done batch inference before

mighty thorn May 16, 2026, 10:22 PM

#

stark needle wrong when serving to many people u can get in fact **higher** utilization

Assuming that they aren’t trying to use the same experts

true hemlock May 16, 2026, 10:22 PM

#

ThumbsUpCat

mighty thorn May 16, 2026, 10:23 PM

#

Which they will be

true hemlock May 16, 2026, 10:23 PM

#

mighty thorn Which they will be

glueless

mighty thorn May 16, 2026, 10:23 PM

#

Since only 3 are generalists while all the others are overfitted to Turkish history

true hemlock May 16, 2026, 10:23 PM

#

did bro fail statistics class

rough bloom May 16, 2026, 10:23 PM

#

mighty thorn Which they will be

NOPERSCat

true hemlock May 16, 2026, 10:23 PM

#

crazy

stark needle May 16, 2026, 10:23 PM

#

mighty thorn Which they will be

they arent bro

rough bloom May 16, 2026, 10:23 PM

#

this is why you have load balancing in MoEs

stark needle May 16, 2026, 10:24 PM

#

mighty thorn Since only 3 are generalists while all the others are overfitted to Turkish hist...

bro router z loss

#

load balancing loss
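
Both losses are short enough to write out. This is a sketch following the forms usually attributed to the Switch Transformer (load balancing) and ST-MoE (router z-loss) papers; the scaling coefficients are left out, and the inputs are toy values.

```python
import math


def load_balancing_loss(router_probs, assignments, n_experts):
    """n_experts * sum_i f_i * P_i.

    f_i: fraction of tokens dispatched to expert i.
    P_i: mean router probability assigned to expert i.
    Equals 1.0 under perfectly uniform routing and grows as routing collapses.
    """
    n_tokens = len(router_probs)
    f = [0.0] * n_experts
    for expert in assignments:
        f[expert] += 1.0 / n_tokens
    p = [sum(tok[i] for tok in router_probs) / n_tokens for i in range(n_experts)]
    return n_experts * sum(fi * pi for fi, pi in zip(f, p))


def router_z_loss(router_logits):
    """Mean squared log-sum-exp of the router logits, which discourages
    very large logits and keeps the router numerically stable."""
    total = 0.0
    for logits in router_logits:
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse ** 2
    return total / len(router_logits)


uniform = [[0.25] * 4 for _ in range(4)]
balanced = load_balancing_loss(uniform, [0, 1, 2, 3], n_experts=4)

one_hot = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
collapsed = load_balancing_loss(one_hot, [0, 0, 0, 0], n_experts=4)

print(balanced, collapsed)
print(router_z_loss([[0.0, 0.0]]))
```

The collapsed case, where every token goes to one expert, scores four times worse than the balanced one, so training with this term pushes the router away from the "three generalists and everything else idle" outcome.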

true hemlock May 16, 2026, 10:24 PM

#

also when you get multiple people using the same experts

#

that's not bad either

#

you batch the damn matrix

fast pagoda May 16, 2026, 10:24 PM

#

Multiple users hitting the same weights is the thing you want, it's the entire economic basis of batched serving. Weights are read-only, and reads don't contend
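
A small sketch of why sharing an expert helps rather than hurts, assuming a simplified cost model where the expensive part is reading an expert's weights once per matmul (the token-to-expert assignments are invented):

```python
from collections import defaultdict


def group_by_expert(token_expert_ids: list[int]) -> dict[int, list[int]]:
    """Map each expert to the indices of the tokens routed to it."""
    groups: dict[int, list[int]] = defaultdict(list)
    for token_idx, expert in enumerate(token_expert_ids):
        groups[expert].append(token_idx)
    return dict(groups)


def weight_reads(token_expert_ids: list[int], batched: bool) -> int:
    """Number of times expert weights are read.

    Batched: one read per distinct expert, shared by all its tokens.
    Unbatched: one read per token.
    """
    if batched:
        return len(set(token_expert_ids))
    return len(token_expert_ids)


# Six tokens from different users, routed to three experts (made-up routing).
routing = [2, 0, 2, 2, 1, 0]
groups = group_by_expert(routing)
print(groups)
print("reads, batched:  ", weight_reads(routing, batched=True))
print("reads, unbatched:", weight_reads(routing, batched=False))
```

Tokens that land on the same expert are stacked into one matrix and multiplied against that expert's weights in a single pass, so overlap between users reduces the work per token.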

true hemlock May 16, 2026, 10:24 PM

#

exactly

fast pagoda May 16, 2026, 10:24 PM

#

you can batch serve a dense model all on the same weights

stark needle May 16, 2026, 10:25 PM

#

fast pagoda you can batch serve a dense model all on the same weights

ye but then theres a ton of duplication

#

since u would be doing DDP

#

or TP if model too big

warped narwhal May 16, 2026, 10:25 PM

#

Does anyone here use both nixos and jetbrains ides?

true hemlock May 16, 2026, 10:26 PM

#

i don't use jetbrains

#

i use nixos occasionally though

#

shuni is proud

mighty thorn May 16, 2026, 10:26 PM

#

LLMs have been on the wrong track since GPT-o1
Now everything is MoE CoT benchmaxxed distilled int8 marketingslop

true hemlock May 16, 2026, 10:26 PM

#

well

obsidian mantle May 16, 2026, 10:26 PM

#

linux vedalEwNo

rough bloom May 16, 2026, 10:26 PM

#

true hemlock shuni is proud

neuroKufufu

fast pagoda May 16, 2026, 10:27 PM

#

throughput can get fugged by bad routing, with one hot expert doing everything or the tokens all scattering like crazy. both can be an issue, but the single expert getting smashed is specifically an issue of imbalance
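
The hot-expert case can be measured with a simple ratio. A minimal sketch, with invented assignments and a hypothetical `imbalance_ratio` helper: compare the busiest expert's load against the load each expert would carry under an even split.

```python
from collections import Counter


def expert_load(assignments: list[int], n_experts: int) -> list[int]:
    """Tokens received by each expert, including experts that got none."""
    counts = Counter(assignments)
    return [counts.get(e, 0) for e in range(n_experts)]


def imbalance_ratio(assignments: list[int], n_experts: int) -> float:
    """Busiest expert's load divided by the even-split load.

    1.0 means perfectly balanced. Larger values mean one expert is a
    bottleneck while others sit idle.
    """
    load = expert_load(assignments, n_experts)
    even_share = len(assignments) / n_experts
    return max(load) / even_share


balanced = [0, 1, 2, 3, 0, 1, 2, 3]
hot = [0, 0, 0, 0, 0, 0, 0, 1]

print(expert_load(balanced, 4), imbalance_ratio(balanced, 4))
print(expert_load(hot, 4), imbalance_ratio(hot, 4))
```

In the hot case the step time is set by expert 0, which handles 3.5 times its fair share while two experts receive nothing.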

true hemlock May 16, 2026, 10:27 PM

#

god im still waiting for sleep deprivation to hit me im sleepy as hell i need the sudden surge of energy

warped narwhal May 16, 2026, 10:27 PM

#

I'm pulling my hair out trying to get flakes working with them, cause every time I try using rust rover etc, it complains that it can't find the rust install because rustup isn't available (even though it is installed), and it fails to run it because it's a dynamic exe

rough bloom May 16, 2026, 10:29 PM

#

mighty thorn LLMs have been on the wrong track since GPT-o1 Now everything is MoE CoT benchma...

Qwen kinda is that kek

#

all the big LLM benchmarks are close enough that I do not care

opaque sigil May 16, 2026, 10:29 PM

#

surely there's a way to tell them where to find rust outside of rustup right

#

hmmm

true hemlock May 16, 2026, 10:31 PM

#

rough bloom all the big LLM benchmarks are close enough that I do not care

the benchmarks don't tell shit anymore, i prefer to measure models based on how enjoyable they are to converse with + how reliable they are at doing tasks a user requested

warped narwhal May 16, 2026, 10:31 PM

#

opaque sigil surely there's a way to tell them where to find rust outside of rustup right

It will find rustc, but not the standard library

stark needle May 16, 2026, 10:31 PM

#

mighty thorn LLMs have been on the wrong track since GPT-o1 Now everything is MoE CoT benchma...

bro they can do more shit than earlier

#

earlier llms were unusable

#

actual braindead shit

opaque sigil May 16, 2026, 10:31 PM

#

are you sure you even have the stdlib

#

it's its own toolchain component

warped narwhal May 16, 2026, 10:32 PM

#

Yes, I can compile rust apps outside of the ide just fine, it's the ide itself that shits the bed when it tries anything

opaque sigil May 16, 2026, 10:32 PM

#

NeurOhISee

rough bloom May 16, 2026, 10:33 PM

#

true hemlock the benchmarks don't tell shit anymore i prefer to measure models based on how e...

ye basically that
they're still useful to know very roughly where an LLM is in terms of performance but you also usually don't need benchmarks for that mario

fast pagoda May 16, 2026, 10:33 PM

#

yea i mean the indicator is like a black and white line

#

it's either in step with peers vaguely on useful tasks or it's total dogshit

#

that's about the max judgement to be had there

rough bloom May 16, 2026, 10:34 PM

#

YES

mighty thorn May 16, 2026, 10:34 PM

#

stark needle earlier llms were unusable

I was having 4o do python and was fine with it

#

Then they forced CoT on us

#

Then MoE

#

And now we are here

stark needle May 16, 2026, 10:34 PM

#

mighty thorn Then MoE

glueless

#

bro 4o was moe

fast pagoda May 16, 2026, 10:34 PM

#

KEKW

mighty thorn May 16, 2026, 10:35 PM

#

5T frontier model with 400k active parameters

rough bloom May 16, 2026, 10:35 PM

#

Mixtral Oldge

stark needle May 16, 2026, 10:35 PM

#

mixtral was fucking god tier at that time

#

47b params for gpt3.5 perf

#

would beat the fuck out of llama 2 70b

fast pagoda May 16, 2026, 10:36 PM

#

god

#

i archived the fuck out of mixtral as my "if the world ends" model to have lmao

#

at the time

#

you just reminded me

mighty thorn May 16, 2026, 10:37 PM

#

fast pagoda i archived the fuck out of mixtral as my "if the world ends" model to have lmao

Yeah

#

I have a few of those

stark needle May 16, 2026, 10:37 PM

#

i had llms locally cause huggingface was down constantly

#

back in those days

#

in a fucking gitlab lfs

opaque sigil May 16, 2026, 10:38 PM

#

warped narwhal Yes, I can compile rust apps outside of the ide just fine, it's the ide itself t...

i just tried and rust-rover seems to find my toolchain just fine FOCUS
so idk what to tell you other than use rust-overlay I guess

mighty thorn May 16, 2026, 10:38 PM

#

Stranded on island with solar powered laptop and 8b local llm, the creation of the ultimate schizo

rough bloom May 16, 2026, 10:39 PM

#

NeuroChatting OpenClaw, get me off this island

opaque sigil May 16, 2026, 10:39 PM

#

opaque sigil i just tried and rust-rover seems to find my toolchain just fine FOCUS...

# rust-toolchain.toml
[toolchain]
channel = "nightly-2026-04-03"
components = ["rust-src", "rustc-dev", "rust-analyzer", "clippy"]

# Nix side, with rust-overlay applied to pkgs
rust-toolchain = pkgs.rust-bin.fromRustupToolchainFile ./rust-toolchain.toml;

mighty thorn May 16, 2026, 10:41 PM

#

rough bloom NeuroChatting OpenClaw, get me off this island

Make no mistakes

stark needle May 16, 2026, 10:42 PM

#

mighty thorn Stranded on island with solar powered laptop and 8b local llm, the creation of t...

bro just

#

fuck the laptop

#

get 2x gb10

#

similar watts

#

run deepseek v4

#

moe

mighty thorn May 16, 2026, 10:42 PM

#

stark needle get 2x gb10

Ur actually delusional

#

Price check that for me

stark needle May 16, 2026, 10:43 PM

#

poor people cope

mighty thorn May 16, 2026, 10:43 PM

#

See how laptop like the price is

stark needle May 16, 2026, 10:43 PM

#

mighty thorn See how laptop like the price is

my laptop was 5000$

#

or some shit

mighty thorn May 16, 2026, 10:43 PM

#

stark needle poor people cope

Mods ban this guy for discrimination

stark needle May 16, 2026, 10:44 PM

#

mighty thorn Mods ban this guy for discrimination

just get a job bro

#

i was earning slave wages

#

for 4 years

mighty thorn May 16, 2026, 10:45 PM

#

stark needle just get a job bro

Suspiciously works 20 hours a week shaped

stark needle May 16, 2026, 10:45 PM

#

while working at L4 technical level

#

as a fucking kid

mighty thorn May 16, 2026, 10:46 PM

#

stark needle while working at L4 technical level

Shadow should do the thing where the rich person gets rid of all of their assets and qualifications and tries to rebuild from scratch to prove it’s easy.
He’d last a few hours before breaking down upon having to go to chick fil a instead of eating 50lb of caviar every meal

stark needle May 16, 2026, 10:47 PM

#

mighty thorn Shadow should do the thing where the rich person gets rid of all of their assets...

bro

#

ive had

#

cold dms from random silicon valley cofounders and shit

#

on fucking discord

#

i didnt even need to show my cv lmao

#

just show technical competence

mighty thorn May 16, 2026, 10:49 PM

#

Let’s see what the people say

#

Dirty 1%er

obsidian mantle May 16, 2026, 10:49 PM

#

Age 20

#

I was living on 100$/mo neurOMEGALUL

fast pagoda May 16, 2026, 10:50 PM

#

me when i doxx

mighty thorn May 16, 2026, 10:50 PM

#

obsidian mantle I was living on 100$/mo neurOMEGALUL

Shadow spent 40x that on a watch

mighty thorn May 16, 2026, 10:50 PM

#

fast pagoda me when i doxx

No he’s said before

#

Probably

obsidian mantle May 16, 2026, 10:50 PM

#

Well right now i am close to that

#

But im 28

#

Its like

#

Super huge difference

umbral wigeon May 16, 2026, 10:51 PM

#

Let's vibe code

#

https://tenor.com/view/jaded-disappointed-down-let-down-computer-gif-143123934597902682

obsidian mantle May 16, 2026, 10:51 PM

#

Its basically luck

#

Time can counter luck

#

Effort too of course

#

Cant do shit without it

fast pagoda May 16, 2026, 10:53 PM

#

me, age 20

stark needle May 16, 2026, 10:53 PM

#

"luck" evilStare

obsidian mantle May 16, 2026, 10:54 PM

#

stark needle "luck" evilStare

I discovered what programming is at 25 what do you want 💀

#

Yes luck

umbral wigeon May 16, 2026, 10:54 PM

#

obsidian mantle Its basically luck

Shit
https://github.com/weenachuangkud/FastCast2/pull/39

GitHub: Introduce serial and parallel caster modes with SoA simulation arch...

Summary by CodeRabbit
New Features: Added serial and parallel caster modes, Motor6D transform movement with pooling, and client-side serial/parallel FPS benchmarks.
Bug Fixes: Fixed high-fidel...

mighty thorn May 16, 2026, 10:55 PM

#

mighty thorn

This says a lot about society

#

The richer Australian with multiple A100s and a collection of exotic GPUs and CPUs defends the less rich guy

#

0.1% and 1% coming together to gaslight the poors 😭

umbral wigeon May 16, 2026, 10:57 PM

#

umbral wigeon Shit https://github.com/weenachuangkud/FastCast2/pull/39

215 commits now

obsidian mantle May 16, 2026, 10:57 PM

#

Wtf is he essaying there

#

neurOMEGALUL

mighty thorn May 16, 2026, 10:58 PM

#

obsidian mantle Wtf is he essaying there

It’s gonna be a long form defense of his $4000 watch

#

Which is worth more than everything I own combined

obsidian mantle May 16, 2026, 10:58 PM

#

I mean if you have 10k per month income why not

#

Literally my current job but in US gives exactly that

mighty thorn May 16, 2026, 10:59 PM

#

obsidian mantle I mean if you have 10k per month income why not

12 days for a watch 😭

obsidian mantle May 16, 2026, 10:59 PM

#

Yeah

true hemlock May 16, 2026, 10:59 PM

#

mighty thorn 0.1% and 1% coming together to gaslight the poors 😭

which one is which

mighty thorn May 16, 2026, 10:59 PM

#

true hemlock which one is which

How many A100 do you have?

obsidian mantle May 16, 2026, 11:00 PM

#

obsidian mantle Literally my current job but in US gives exactly that

But due to low luck im not in US so i get 1.2k

#

Yes luck x2

obsidian mantle May 16, 2026, 11:00 PM

#

Did you apply to google at 15

stark needle May 16, 2026, 11:00 PM

#

obsidian mantle Did you apply to google at 15

Ye

mighty thorn May 16, 2026, 11:01 PM

#

Child labor is the answer to poverty, ladies and gentlemen

obsidian mantle May 16, 2026, 11:01 PM

#

Did you speak english at 15

fast pagoda May 16, 2026, 11:01 PM

#

i don't think a combined spend of like 12k-15k on tools that are used in your line of work + less than 5k hobby shit constitutes rich

rich is a whole different ball game than anything described

umbral wigeon May 16, 2026, 11:01 PM

#

Is it legal to let a 15 y/o work in big tech? Like in your country

obsidian mantle May 16, 2026, 11:01 PM

#

Did you know what programming is at 15

#

Yes luck

umbral wigeon May 16, 2026, 11:01 PM

#

umbral wigeon Is it legal to let a 15 y/o work in big tech? Like in your country

My country can't (laws, school system)

mighty thorn May 16, 2026, 11:01 PM

#

fast pagoda i don't think a combined spend of like 12k-15k on tools that are used in your li...

Ur also rich but I let you slide cause no $4k watch

stark needle May 16, 2026, 11:01 PM

#

obsidian mantle Did you speak english at 15

I could easily have passed c1 cambridge in middle school

umbral wigeon May 16, 2026, 11:02 PM

#

stark needle I could easily have passed c1 cambridge in middle school

What's your country?

stark needle May 16, 2026, 11:02 PM

#

obsidian mantle Did you know what programming is at 15

How do u think i got in

obsidian mantle May 16, 2026, 11:02 PM

#

stark needle I could easily have passed c1 cambridge in middle school

There was no human in my town who would pass C1

#

I am trying to explain that you got lucky to be there where you were

#

With your skills

stark needle May 16, 2026, 11:03 PM

#

umbral wigeon What's your country?

Swiss

umbral wigeon May 16, 2026, 11:03 PM

#

stark needle Swiss

Do you have github, I wanna explore

rough bloom May 16, 2026, 11:03 PM

#

obsidian mantle I am trying to explain that you got lucky to be there where you were

the power of having different circumstances

stark needle May 16, 2026, 11:03 PM

#

No my gh is empty

obsidian mantle May 16, 2026, 11:03 PM

#

Good education too

rough bloom May 16, 2026, 11:04 PM

#

rough bloom the power of having different circumstances

and putting in the effort to actually use those circumstances

#

neuroPogHD

fast pagoda May 16, 2026, 11:04 PM

#

spawn luck

mighty thorn May 16, 2026, 11:04 PM

#

rough bloom and putting in the effort to actually use those circumstances

No room for said effort without the prior luck

#

Can’t build skyscraper on dirt

#

Need foundation

obsidian mantle May 16, 2026, 11:04 PM

#

He's right

#

I wasn't supposed to speak English at all for example

#

I just randomly stumbled upon forsen stream and understood 10%

umbral wigeon May 16, 2026, 11:05 PM

#

obsidian mantle Good education too

How does each education system in different countries even work? I mean how was someone at 15 able to get a phd, do they just max stats and beat it to phd?

fast pagoda May 16, 2026, 11:06 PM

#

the ability to be active in this discord to the extent that the majority of people in this conversation are is already indicative of a similar type of luck if not the same extent
there are entire swathes of the world who have no inkling of the wonder we interact with every day

obsidian mantle May 16, 2026, 11:06 PM

#

umbral wigeon How does each education system in different countries even work? I mean how was som...

Idk about phd at 15

#

You need to be noticed

stark needle May 16, 2026, 11:06 PM

#

I dont have phd idk

obsidian mantle May 16, 2026, 11:06 PM

#

What if there's nobody to notice you

#

Will you apply to google

#

"hello i can make calculator im 15"

mighty thorn May 16, 2026, 11:07 PM

#

fast pagoda the ability to be active in this discord to the extent that the majority of peop...

https://tenor.com/view/socrates-spinning-gif-2769471696194500629

#

Thank you Socrates

#

Very cool

stark needle May 16, 2026, 11:07 PM

#

obsidian mantle Will you apply to google

Yeah i would have

fast pagoda May 16, 2026, 11:08 PM

#

you're welcome, i won't charge for that one since you're on the brink it seems

#

further will require an active credit card on file, however

umbral wigeon May 16, 2026, 11:09 PM

#

fast pagoda the ability to be active in this discord to the extent that the majority of peop...

I'm active on discord cus I just wanna yap and share or talk, I wanna doomscroll tiktok, discord, facebook

frozen igloo May 16, 2026, 11:09 PM

#

mighty thorn https://tenor.com/view/socrates-spinning-gif-2769471696194500629

fast pagoda May 16, 2026, 11:09 PM

#

frozen igloo

oi

#

he's stuck like that

#

dont make fun

#

birth defect

frozen igloo May 16, 2026, 11:10 PM

#

fast pagoda birth defect

It’s a rare disease but not unheard of

umbral wigeon May 16, 2026, 11:11 PM

#

umbral wigeon I'm active on discord, cus I just wanna yap and share or talk, I wanna doom scro...

And to stop doom scrolling, I'll shut down myself(sleep)

#

https://tenor.com/view/donald-duck-sleep-gif-12073235593935901891

#

https://tenor.com/view/threddy-threddyrex-windows-updates-gif-20412365

obsidian mantle May 16, 2026, 11:13 PM

#

I read it fully now

#

Holy shit

#

neurOMEGALUL

umbral wigeon May 16, 2026, 11:14 PM

#

I think programming is easy

#

If autistic person can do it

obsidian mantle May 16, 2026, 11:15 PM

#

It's like "i bought 10 lottery tickets and won them all"

#

Congrats

umbral wigeon May 16, 2026, 11:16 PM

#

umbral wigeon If autistic person can do it

Because I'm autistic

#

https://tenor.com/view/bosnov-67-bosnov-67-67-meme-gif-16727368109953357722

mighty thorn May 16, 2026, 11:17 PM

#

umbral wigeon https://tenor.com/view/bosnov-67-bosnov-67-67-meme-gif-16727368109953357722

Get out

obsidian mantle May 16, 2026, 11:18 PM

#

I also fixed previously unknown bugs and got +100$ raise at my job
I guess i played my cards alright

#

Now company is dying and idk if i should flee the country or go to capital

umbral wigeon May 16, 2026, 11:20 PM

#

If I was a hiring manager, I would rather take a look at github than a CV or resume honestly

obsidian mantle May 16, 2026, 11:21 PM

#

My managers dont know what github is glueless

umbral wigeon May 16, 2026, 11:21 PM

#

Or any profile that shows you have passion in the tech

#

If someone has actually done something impressive, they should have it all somewhere, whether github, facebook, or X, as a portfolio to show it's actually fact

#

Because how can I believe it if someone came up to me and said, I've graduated from cambridge and got a job at google at 15 years old? I mean it's not bad
But I would like to see a github profile slammed in my face or any official papers showing that you've actually done it

obsidian mantle May 16, 2026, 11:27 PM

#

Its super rare but not impossible

#

I guess

#

Not like i ever witnessed anything like that myself

umbral wigeon May 16, 2026, 11:29 PM

#

umbral wigeon Because how can I believe it if someone came up to me and said, I've graduated from ...

I'm sorry if my opinions hurt your feelings

umbral wigeon May 16, 2026, 11:32 PM

#

umbral wigeon Because how can I believe it if someone came up to me and said, I've graduated from ...

(Also there are interview rounds, coding problems, screen share, idk about that cus I've not experienced it)

obsidian mantle May 16, 2026, 11:32 PM

#

However i find it concerning that you have to work in google at 15 to afford that shit

#

In 5 years

stark needle May 16, 2026, 11:33 PM

#

umbral wigeon Because how can I believe it if someone came up to me and said, I've graduated from ...

I mean are u asking for proof or wym

umbral wigeon May 16, 2026, 11:34 PM

#

stark needle I mean are u asking for proof or wym

No, I'm just sharing my opinions

#

Not everyone have proof

#

I've heard that companies hiring are using AI to review CVs/resumes, is it that bad?

rough bloom May 16, 2026, 11:37 PM

#

umbral wigeon Because how can I believe it if someone came up to me and said, I've graduated from ...

newerolookupheadempty getting a job at Google that early is probably not impossible if you can show technical competence and find the right opportunity

young plover May 16, 2026, 11:37 PM

#

glue I did have cool stuff on GitHub but Nintendo nuked the main project that I contributed to.

rough bloom May 16, 2026, 11:37 PM

#

not unbelievable at least

stark needle May 16, 2026, 11:39 PM

#

young plover glue I did have cool stuff on GitHub but Nintendo nuked t...

make a switch 2 emulator neuroPogHD ✅

#

SCHIZO