#programming

olive sable
nocturne olive
#

neuroThinkSmug Yes but that costs more money

#

I don't have that kinda spare money for non-GPU things

#

It's basically 100€ with shipping

stark needle
#

fair enough

native surge
stark needle
#

i mean just cause u seemed interested in vr

nocturne olive
#

neuroThinkSmug Well I just saw you doing silliness in VRC and it seemed silly and interesting

sage crag
native surge
#

Me not

#

Meow

#

Or femboy

nocturne olive
native surge
#

Ish

#

Nerd

stark needle
stark needle
#

@nocturne olive is the training time that bad or why u need so much gpu compute

nocturne olive
#

The more GPU power I have, the bigger and better I can go faster

stark needle
#

are u pretraining the model from scratch?

nocturne olive
#

I could cut training time around in half with a second 3090 by running in parallel like Wispers

nocturne olive
nocturne olive
#

It's only 100M parameters

midnight sigil
#

neuroKufufu I should start my cybersec journey

#

tired of playing games all day and having nothing to do

nocturne olive
#

neuroThinkSmug Either way I'd certainly appreciate being able to reach higher or equal quality in less time

midnight sigil
#

smh why neuroBucket me

midnight sigil
olive sable
fast pagoda
sage crag
#

pluu

frozen igloo
olive sable
fast pagoda
opaque wharf
frozen igloo
fast pagoda
#

ohh
that copper plate does give styro impression

#

i still havent watched that

sage crag
#

dont think this puts us on speaking terms

#

find the birth certificate or im not buying you food

sage crag
opaque wharf
#

I do wonder, is mitosis basically clone?

fast pagoda
#

yes

#

genetically identical

#

unless cancer

opaque wharf
#

It does make me wonder how we manage the administrative process for lab specimens after mitosis

fast pagoda
#

HeLa is an immortalized cell line used in scientific research. It is the oldest human cell line and one of the most commonly used. HeLa cells are durable and prolific, allowing for extensive applications in scientific study. The line is derived from cervical cancer cells taken on February 8, 1951, from Henrietta Lacks, a 31-year-old African-A...

sage crag
#

freezing

fast pagoda
#

look into cell lines and how they manage those, shit is wild

opaque sigil
stark needle
#

h

opaque wharf
#

Ketchup curry? catdespair

fast pagoda
#

mushroom ketchurp

opaque wharf
#

Still cursed

stark needle
#

t

fast pagoda
#

t

#

p

#

s

#

:

opaque wharf
#

tpsbeat me to it

rigid snow
#

slash slash

fast pagoda
#

n

rigid snow
#

e

fast pagoda
#

u

opaque sigil
frozen igloo
olive sable
#

It's genuinely good

fast pagoda
#

europe is speaking

be careful everyone

olive sable
opaque sigil
#

sam europe has spoken mhm

fast pagoda
#

stay frosty

kind nimbus
sage crag
#

no child of mine will be bad at osu

#

5 digit or disowned

rigid snow
#

had to scroll almost to the bottom, you'll never guess what AX stands for

sage crag
#

agentic experience

rigid snow
#

smart

#

am dumb

rigid snow
#

as far as i'm aware yeah

sage crag
#

ive seen this guy in like 90 places

#

on the street

rigid snow
sage crag
#

NOT real

rigid snow
#

i think he used to be a typescript nerd youtuber, now everything is ai

sage crag
#

vibecoding has become such a negative buzzword that people are calling themselves "REAL agentic engineers" instead

#

ai hero

rigid snow
#

what did enums do neuroD

sage crag
sage crag
rigid snow
#

respectfully

#

what the fuck is a js enum

sage crag
#

string

rigid snow
#

dsfksdfksf

#

i think i came up with a really cool idea. you know how harnesses pin editable stuff, e.g. a todo list the model edits with tool calls? what if we do that but with mood/tone: use it to prompt the tts tone and change the model's expression instead of relying on semantic analysis. it should reflect the model's intent better

#

i don't think for example sarcasm is easily picked up by semantic analysis systems

azure lynx
#

depends on how obviously sarcastic it is. a transformer based system might be able to represent something similar, assuming there is enough context and it's something obvious like "I just love how easily this website made it to lose all my data." because people usually don't associate loving with losing things the transformer might actually end up representing in a way which implies sarcasm. but sarcasm is hard without verbal and/or visual cues.

rigid snow
#

yeah that's what i'm trying to address, what if the model just tells its intent in a way that doesn't require tuning it to do so? obviously increases latency since the model has to do a tool call once in a while to change its tone but it shouldn't be by that much if you're running the model locally. i'm gonna have to test this of course since i have a suspicion they're gonna be using the system wrong/not at all but i probably can fix this with a good enough prompt

azure lynx
#

i have my agent do a tool call to set the emote they want to do, reminding them if they haven't set it in a while.
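A minimal sketch of the pinned mood/tone idea with a reminder, in plain Python. Everything named here (`PinnedTone`, `set_tone`, the tone labels, the reminder threshold) is hypothetical and not taken from either person's actual harness; it only shows the shape of "model declares its tone via a tool call, harness nags if it goes stale".

```python
from dataclasses import dataclass

# Hypothetical label set; a real harness would pick labels its TTS understands.
ALLOWED_TONES = {"neutral", "cheerful", "sarcastic", "annoyed"}


@dataclass
class PinnedTone:
    """Editable state the harness pins into the prompt and forwards to TTS."""
    tone: str = "neutral"
    turns_since_update: int = 0
    remind_after: int = 5  # assumed threshold, tune to taste

    def set_tone(self, tone: str) -> str:
        """Tool-call handler: the model calls this to declare its intent."""
        if tone not in ALLOWED_TONES:
            return f"error: unknown tone {tone!r}, pick one of {sorted(ALLOWED_TONES)}"
        self.tone = tone
        self.turns_since_update = 0
        return f"tone set to {tone}"

    def end_turn(self) -> None:
        """Call once per model turn so staleness can be tracked."""
        self.turns_since_update += 1

    def prompt_block(self) -> str:
        """Text pinned into the context; adds a reminder when the tone is stale."""
        text = f"[pinned] current tone: {self.tone}"
        if self.turns_since_update >= self.remind_after:
            text += " (reminder: call set_tone if your tone changed)"
        return text
```

The TTS side would read `state.tone` directly instead of guessing sarcasm from the text.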

rigid snow
#

yeah or that

azure lynx
#

i think it probably helps with keeping them "feeling" a consistent way. usually they're just "vibing" though.

stark needle
rancid ore
#

did you know gifs can probably be used in computercraft?

#

they're very compact.

kind nimbus
tiny edge
#

guess what program i was using

#

that made my system OOM

olive sable
#

You have 16gb, i can think of 150 different programs that could

opaque sigil
#

list them

olive sable
rigid snow
#

blender and ue are the first things that come to mind

olive sable
#

Yep

#

Unity too

rigid snow
#

haven't really used it so

#

surely can't be worse than ue

opaque wharf
#

Tab hoarder enub

olive sable
#

Could also be an ai thingy or a modern game

#

Considering this is neurocord

#

Altho the cpu is too bad to run an llm tbh

kind fable
#

Good code twitch good code

opaque wharf
olive sable
bitter phoenix
# kind nimbus

I don’t like how most distros preinstall the Nvidia OSS driver which for some people including me is completely broken, or make the choice of using the official Nvidia complicated as hell. It’s tarnished people’s belief that Linux can work with Nvidia to the point they think they need a special distro for it, you don’t need to reinstall your os. When I think of the steps you have to go through on Debian to install it I’d just say it’s impossible to do correctly due to how much they hate closed source

opaque wharf
#

Cachy enub

#

Honestly people are overestimating the difficulty of bastardized Arch

olive sable
#

Nvidia linux is pretty easy no? It worked for me almost perfectly on all cards i tried on both OS i tried

kind nimbus
opaque wharf
#

Like look at steamos. Those are Arch based too

bitter phoenix
rough bloom
#

pretty much every distro has either a package or an installer thing for it

marsh coral
#

i mean i run fedora

marsh coral
#

i have to do akmods or smth

#

and i have to like

#

manually sign the keys

#

since i have secure boot

olive sable
#

You just need to set the DRM thing to nvidia thingy in the boot params and then everything works completely fine

rough bloom
#

maybe decades ago

bitter phoenix
#

With better performance than windows on everything

marsh coral
#

so idk abt the past

#

all ik is that i switched to fedora 3 months ago

rough bloom
#

idk how it is on Fedora but pretty much every distro is forced to have decent NVIDIA support due to how common the cards are

olive sable
tiny edge
marsh coral
#

i think my system is boutta go OOM

#

all my swap memory is full

olive sable
marsh coral
#

i shouldnt have let my model run 24/7

tiny edge
#

im using 100% zram

bitter phoenix
bitter phoenix
olive sable
#

Its still suggested, but nvidia works fine too now

rough bloom
tiny edge
#

it cant get more obvious

rough bloom
#

you should kill it glueless

tiny edge
#

its not that bad

#

the other program tho

olive sable
#

Idk what heroic is.

tiny edge
#

the most unoptimized program ever created

olive sable
#

Is that the league launcher?

tiny edge
olive sable
bitter phoenix
tiny edge
#

its medal

#

if you look there's a bunch of 600mb instances

#

it has a terrible optimization which causes

#

terrible memory leak

marsh coral
#

omg my system just went OOM

#

how is 90% of my cpu being used

bitter phoenix
marsh coral
#

might be bcz im running my agent model, brave browser, running a fedora kernel update, then an akmods for rebuilding nvidia and vbox, and also running feishu

#

that mightve overloaded the system

opaque wharf
#

So it is AI enub

marsh coral
#

uh oh its happening again

opaque wharf
marsh coral
opaque wharf
#

But its probably is the AI

marsh coral
#

my ai inside a podman container on a separate account

bitter phoenix
opaque wharf
#

By building, did you mean compile?

marsh coral
#

tho it is running as a quadlet systemd service

minor crag
#

I was thinking that training a 12B transformer on CPU would use all my 16GB of ram and set my CPU on fire

But I've done the math and using my backprop and forward pass optimisations it should use less than 3.5GB of memory (3GB for actual weights then maybe 500MB for the CNN) and maybe 60 or 70 percent of my laptop CPU

So this seems quite promising

marsh coral
opaque wharf
#

Because compiling will make CPU go brrr too

marsh coral
#

since the kernel update js finished

#

tho like sometimes the system jumps to 99% CPU being used all of a sudden

#

and the memory/RAM use spiking

#

then returns down to only 20% of CPU used

minor crag
marsh coral
#

ok i think ik why cpu is jumping

#

cuz of akmods

umbral wigeon
#

What will you do if this guy(me) PR you 80000 LOC

marsh coral
#

rebuilding the modules

umbral wigeon
#

-10000 LOC

bitter phoenix
# minor crag I was thinking that training a 12B transformer on CPU would use all my 16GB of r...

What I did for free: put my model on Google Drive, open Google Colab, clone my repo and train on their free Nvidia Tesla GPUs. When the account ran out of hours I would push the result to Google Drive, sign into Colab with an alt, pull it back for more training, and repeat, then pull it back to my computer when done. Also make sure you’re using the best optimizer like Muon instead of AdamW and that there are no software bottlenecks
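The resume-across-sessions part of that workflow boils down to "load checkpoint if it exists, train, save atomically". A rough stdlib-only sketch; the JSON state, `train_session`, and the fake "optimizer step" are placeholders for illustration, not the actual Colab setup, and the path would point at wherever the shared storage is mounted.

```python
import json
import os


def load_state(path):
    """Resume from a checkpoint if one exists, otherwise start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weights": [0.0]}


def save_state(path, state):
    """Write to a temp file first so an interrupted session can't corrupt the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def train_session(path, steps_this_session):
    """One session: pull the checkpoint, train for a while, push it back."""
    state = load_state(path)
    for _ in range(steps_this_session):
        # Placeholder for a real optimizer step.
        state["weights"] = [w + 0.1 for w in state["weights"]]
        state["step"] += 1
    save_state(path, state)
    return state
```

For a real model the optimizer state has to be checkpointed as well, otherwise every new session restarts the momentum from zero.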

#

Also I’ve done my own testing: GQA > MHA, decoder-only > encoder-decoder > encoder only

#

Was pretty much already known but I wanted to see if there were any cases where it wasn’t true, but no

#

Just use GQA decoder only bf16 Muon

minor crag
bitter phoenix
bitter phoenix
minor crag
bitter phoenix
#

Once you have a good architecture you’re confident in you can just pretrain it and re-use that same model for multiple different things

#

I have an archive of 90m 500m and 1b models I made myself

#

The top property is data quality, recently it’s a lot of post-training research to get it just right

bitter phoenix
stray dragon
stray dragon
minor crag
stark needle
#

architecture

minor crag
minor crag
stark needle
#

i doubt it would run even with heavy gradient checkpointing

bitter phoenix
#

12b is huge

stark needle
#

bf16 12b is like 24gb

#

add the two momentums of adamw

#

and ur dead

#

and we didnt even consider activations
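The arithmetic behind that, as a quick sketch. It assumes bf16 weights and gradients (2 bytes each) and two fp32 AdamW moments (4 bytes each); those byte sizes are my assumption of a common setup, and activations are left out, same as above.

```python
GB = 1e9  # decimal gigabytes, matching the "24gb" figure above


def training_memory_gb(params, weight_bytes=2, grad_bytes=2, moment_bytes=4, n_moments=2):
    """Rough memory for weights + gradients + AdamW moments, ignoring activations."""
    weights = params * weight_bytes
    grads = params * grad_bytes
    moments = params * moment_bytes * n_moments
    return {
        "weights": weights / GB,
        "grads": grads / GB,
        "adamw_moments": moments / GB,
        "total": (weights + grads + moments) / GB,
    }


est = training_memory_gb(12e9)
for name, gb in est.items():
    print(f"{name:>14}: {gb:.0f} GB")
```

Under those assumptions the weights alone are 24 GB and the total lands far above a 16 GB laptop before a single activation is stored.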

bitter phoenix
#

Might want to think about the evals to see if that’s really necessary

stark needle
#

and flash attention doesnt exist on cpu in pytorch

minor crag
#

I'm using a mix of optimisations with one being 1.58 bit weights so it takes up just about 3GB for the entire model weights

And I'm making the model bigger because I can just afford the compute and it'll offset the loss of using 1.58bit weights and some of the other optimisations

stark needle
#

pretraining in 1.58 bits seems like a very bad idea

bitter phoenix
minor crag
stark needle
#

fp4 training in the nvidia paper was already very unstable hence they had to do random hadamard rotations and keep the amax history and invent nvfp4, otherwise it would just NAN

minor crag
bitter phoenix
bitter phoenix
stark needle
#

switch the friggin activation function to something random and ur dead

minor crag
stark needle
#

does luau support simd intrinsics

bitter phoenix
#

I don’t think what I’m looking for is possible with existing tools like ollama. Everyone I’ve seen who leaned on ollama to ‘make their own model’ got nowhere

minor crag
bitter phoenix
#

So I write everything so that there can’t be any blockers

stark needle
#

especially since the ops are quadratic with standard attn

#

and no simd

#

for a 12b model

minor crag
bitter phoenix
stark needle
fast pagoda
bitter phoenix
#

You could probably record the loss, batches & hours and make a chart

#

To know how long it will take
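A tiny sketch of that idea: log (hours, batches, loss) rows and extrapolate the throughput. The log format and `estimate_remaining_hours` are made up for illustration, and this only estimates time to finish a fixed number of batches, not time to reach a target loss.

```python
def estimate_remaining_hours(log, total_batches):
    """log: list of (hours_elapsed, batches_done, loss) rows in increasing order."""
    h0, b0, _ = log[0]
    h1, b1, _ = log[-1]
    rate = (b1 - b0) / (h1 - h0)  # batches per hour over the logged window
    return (total_batches - b1) / rate


log = [
    (0.0, 0, 10.8),
    (2.0, 1000, 7.1),
    (4.0, 2000, 6.3),
]
print(estimate_remaining_hours(log, total_batches=10_000))
```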

true hemlock
#

bruh gonna train a 12b model on basic fp int units with no simd 💀

fast pagoda
#

see you in 203

#

0

true hemlock
#

ipc going to hell

#

no

#

see you in 3020

fast pagoda
stark needle
#

also if u wanna reach reasonable loss faster u should prob use a smaller transformer due to how scaling laws function

stark needle
bitter phoenix
stark needle
fast pagoda
rough bloom
true hemlock
#

bruh lmao even at the absolute best 4 IPC apparently it would still take you 40K years to train with reasonable sized dataset

#

assuming 16 cores
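A back-of-envelope version of this kind of estimate. Assumptions that are mine and not from the chat: the usual ~6 FLOPs per parameter per token rule of thumb for dense transformers, a 3 GHz clock, one scalar op per instruction, and a dataset of about 20 tokens per parameter. The exact figure swings a lot with the dataset size, so this is not an attempt to reproduce the 40K number.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def cpu_training_years(params, tokens, cores=16, ipc=4, clock_hz=3.0e9):
    """Years to train on scalar units only (no SIMD), under the assumptions above."""
    flops_needed = 6 * params * tokens         # rule-of-thumb training cost
    flops_per_second = cores * ipc * clock_hz  # best case: every instruction is useful math
    return flops_needed / flops_per_second / SECONDS_PER_YEAR


years = cpu_training_years(params=12e9, tokens=240e9)
print(f"{years:,.0f} years")
```

Even with these generous assumptions it comes out in the thousands of years, and a larger token budget scales it up linearly.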

bitter phoenix
#

Maybe trim out some irrelevant training sources if possible?

minor crag
#

I'm not training the entire 12B at once

I'm training 500M first, then once the loss stabilises I'm gonna freeze the first 500M and add 500M more. I train the new block alone for the first third of that run, then unfreeze the original 500M and train both together for the remaining two thirds

Then repeat multiple times over a few months until the model hits 12B

This combined with the many optimisations and replacements compared to a standard transformer will train the entire model in a few months on CPU
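For reference, the schedule described above written out as data. This is only my reading of the plan (phase names and the equal 500M steps are assumptions), and it says nothing about whether the plan is feasible on a CPU.

```python
def growth_schedule(start=500_000_000, step=500_000_000, target=12_000_000_000):
    """List the growth stages: each new block trains alone for 1/3 of its run,
    then everything is unfrozen and trained together for the remaining 2/3."""
    stages = [{"params": start, "phases": [("train_all", 1.0)]}]
    size = start
    while size < target:
        size += step
        stages.append({
            "params": size,
            "phases": [
                ("train_new_block_only", 1 / 3),
                ("train_all_unfrozen", 2 / 3),
            ],
        })
    return stages


stages = growth_schedule()
print(len(stages), "stages, final size", stages[-1]["params"])
```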

true hemlock
#

what

#

uh

#

okay, what size of dataset are you aiming for

rough bloom
minor crag
rough bloom
#

tad slower is an understatement usually

#

there's a reason why everyone uses backprop kek

#

(for normal ANNs, that is)

true hemlock
#

because training each transformer layer would not only fuck up generalization a bit, it would still take you years, possibly hundreds, AND that's assuming forward prop alone.

bitter phoenix
minor crag
kind nimbus
stark needle
#

@bitter phoenix idk if it interests u but here google leaks gemma scaling laws https://arxiv.org/abs/2501.18914

#

page 6

true hemlock
stark needle
rough bloom
bitter phoenix
bitter phoenix
minor crag
true hemlock
#

<insert SNN mention copypasta>

rough bloom
opaque sigil
#

SNN SCHIZO

true hemlock
#

is the fact that they wanted to train on cpu

fast pagoda
#

bitnet works because it has gradients on the backwards pass, there's a latent full precision copy of every single weight which is actually what is being optimized
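A toy sketch of that mechanism: the forward pass uses ternary weights, but the gradient is applied to the full-precision latent copy (straight-through estimator). This is a single linear neuron in plain Python with absmean ternarization; it illustrates the idea only and is not BitNet's actual training code.

```python
def ternarize(weights):
    """Absmean ternary quantization: values in {-1, 0, 1} plus a shared scale."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1e-8
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def train_step(latent, x, target, lr=0.05):
    """One SGD step on a single linear neuron with ternary forward weights."""
    q, scale = ternarize(latent)
    y = scale * sum(qi * xi for qi, xi in zip(q, x))  # forward uses ternary weights
    err = y - target
    # Straight-through estimator: pretend d(scale * q)/d(latent) == 1,
    # so the gradient w.r.t. the quantized weight lands on the latent weight.
    grads = [err * xi for xi in x]
    new_latent = [w - lr * g for w, g in zip(latent, grads)]
    return new_latent, 0.5 * err * err


latent = [0.3, -0.8, 0.05]
for _ in range(3):
    latent, loss = train_step(latent, x=[1.0, 2.0, -1.0], target=-1.5)
    print(ternarize(latent)[0], round(loss, 4))
```

The memory consequence: 12e9 weights packed at 1.58 bits is about 2.4 GB, but the latent copy being optimized is 24 GB at bf16, so the packed size only applies once training is over.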

true hemlock
#

with no simd

amber fractal
ebon basin
#

uuuh .NET 11 has MCP directly integrated. on one side I like it but on the other side MCP is bloated

rough bloom
minor crag
true hemlock
rough bloom
fast pagoda
true hemlock
#

bare non-vector arithmetic units alone are slow as hell there's literally a reason AVX exist

ebon basin
#

another uii JS directly in MAUI

opaque sigil
#

has anyone checked whether the add/sub only "matmul" in bitnet is even cheaper than using tensor cores

stark needle
#

with triton kernel

ebon basin
#

and AOT for android

minor crag
opaque sigil
#

is that a yes for it being cheaper

true hemlock
#

aint no way bro

fast pagoda
#

you would need a population size that exceeds yo momma's weight

#

that's impossible

stark needle
bitter phoenix
amber fractal
nocturne olive
#

neuroConfused What be going on here?
veryNeuro Seems kinda entertaining

true hemlock
#

also i still don't get the reason with avoiding backprop

olive sable
fast pagoda
minor crag
nocturne olive
#

neuroConfused Who told you that?

amber fractal
#

I'm ignoring backprop because I have to, I can't say anyone else should at all do so

nocturne olive
#

neuroThinkSmug You'd be better off training on free tier Google Colab, at least it has a GPU

stark needle
#

just use ngram model

#

atp

bitter phoenix
stark needle
#

on ~100B tokens

minor crag
bitter phoenix
rough bloom
fast pagoda
#

i dont think even hinton has gotten forward forward prop to work in anything transformer shaped at all let alone a 12b

bitter phoenix
rough bloom
fast pagoda
nocturne olive
#

veryNeuro Highly entertaining

amber fractal
nocturne olive
#

Keep it up

minor crag
fast pagoda
#

tribiall brain dablage

sage crag
#

written in luau

amber fractal
sage crag
#

chat can i train 1t model on my athlon xp

rough bloom
#

surprisingly easy

amber fractal
#

True, the real battle is making it useful

nocturne olive
stark needle
#

1 trillion bigrams

#

nocturne olive
#

neuroThinkSmug It's only gonna take 1 trillion years

opaque sigil
sage crag
#

512mb ddr2 ram

fast pagoda
#

i think

nocturne olive
#

Clueless Just keep all the weights on disk

fast pagoda
#

llms currently

#

being so bloated

#

has distorted what a "large model" is

#

so badly that people think a 12b is a small network

sage crag
#

1b params

#

big number

amber fractal
#

stark needle
#

fr who remember 500m param vit/cnn being huge

bitter phoenix
stark needle
#

everyone was like holy shit

fast pagoda
#

GO BEYOND

nocturne olive
#

neuroThinkSmug Is a 100M parameter NeuroSynth big?

rough bloom
#

GPT-2 LLM

sage crag
#

1b params in your mnist number recognition toy

rough bloom
#

117M params kek

sage crag
sage crag
#

3 parameter

#

llm

#

vs

nocturne olive
fast pagoda
#

yes/no/maybe

sage crag
#

my child

fast pagoda
#

2

sage crag
#

youre right

#

2 param llm vs my child

bitter phoenix
#

Is the baby coughing

sage crag
#

@fast pagoda are you coughing

fast pagoda
sage crag
#

the answer is maybe

rough bloom
#

I bet on the LLM

#

it is immortal

rough bloom
#

so it will win

#

by default

bitter phoenix
#

I bet on the baby

sage crag
#

which one has higher battle iq

amber fractal
#

I'm betting on the baby

fast pagoda
#

always bet on DaBaby

sage crag
#

what if its an english exam

fast pagoda
opaque wharf
#

I need to learn chinese enub

fickle rain
#

Oops, forgor that I need to add it back :D

#

Yay, no more bugcheck

kind nimbus
obsidian mantle
#

what does it allow you to do?

#

write exploits? neuroMonkaOMEGA

obsidian mantle
#

surely they will only give it to the good guys and there will be no issues with this system glueless

glass flower
#

catAsk anthropic give me access to mythos...

obsidian mantle
#

use claude to get an exploit to access mythos to make ultimate exploits neuro5head

glass flower
#

LULE i'll be honest i wouldn't be able to afford to run a single query on mythos anyway... all the claude models are so damn expensive

obsidian mantle
#

well

#

is there anything that can stop them from making this thing stronger and stronger

#

they can just grow it more and more cant they

glass flower
#

i mean.. thats the goal of all frontier labs LULE

#

google has deepmind

#

deepseek exists

obsidian mantle
#

"mythos, destroy this country please" glueless

glass flower
#

openai now has gpt 5.5-cyber that is like mythos

obsidian mantle
#

fine tuned to do cybersecurity stuff?

#

or smth

glass flower
#

yeah finding exploits. security stuff

obsidian mantle
#

or is it like

#

it just becomes so strong it can do it without any specific tuning

#

and nobody knows how it does it

glass flower
#

its for their glasswing thing

#

which is what anthropic does with mythos

obsidian mantle
#

is idea of clashing glasswing vs mythos being worked on

glass flower
obsidian mantle
#

can it potentially make them godlike while they fight

#

because they train in the process or something

#

or since they are llms it wont work

glass flower
#

openai and anthropic haven't had any new stuff they developed in a long time

obsidian mantle
#

just stacking more islands?

glass flower
#

any innovation in the AI space comes from Google or deepseek. and anthropic/openai just adapt the research into new models

#

they are currently more interested in turning a profit rather than developing new ideas

obsidian mantle
#

hmm

#

makes sense

#

so for google this thing is like a side job they are not feeding off it

#

if i got it right

glass flower
#

google is the reason gpt exists LULE they wrote the original paper on how to do it. but just didn't release a model

#

they have been doing AI for like 3 decades at this point

obsidian mantle
#

but anthropic/openai are focused on this thing and have no other income basically

glass flower
#

google is easily the safest bet in the AI race

#

anthropic and openai are just the shiny things... but true AI will come from google

obsidian mantle
#

NeurOhISee cool

glass flower
#

if they ever decide to actually take it serious LUL

#

deepmind has been doing AI research for a long long time

#

it also helps that google has the whole thing under their belt. they do inference, models and the hardware to run the inference

#

they do the whole thing. so they are under no risk of external forces. like if nvidia goes down they take the whole AI bubble with them.. but google would be unaffected

obsidian mantle
#

they have their own hardware? NeurOhISee

glass flower
obsidian mantle
#

is it like fully their own thing

#

ai focused

#

cant buy it on amazon and shit

#

secret google chips

glass flower
#

pretty sure ye

obsidian mantle
#

damn

glass flower
#

i mean.. even their open source models have gotten a lot better lately. gemma 4 is a beast overall

#

google is just not focusing right now on benchmaxxing so they look bad and might not be the best in agentic tasks... but they focus on things that people might use in their products.. like the AI search thingy, the video summary on youtube, and on-device small AIs

frozen igloo
#

Classic

rough bloom
#

can rent on Google Cloud but can't buy

glass flower
#

if someone gave me $10k dollars and i had to bet it all on a company to win the AI race. i would put all the money in google tbh Minamhm

#

they might be lagging behind the best and greatest models... but they always catch up

true hemlock
#

i'd say they're winning in terms of real breakthroughs

obsidian mantle
#

what about deepseek

true hemlock
#

i dont consider llm benchmark scores any significant.

glass flower
#

deepseek isn't self-sufficient. they rent GPUs, that's the main reason i wouldn't... but yes they would be second to google

true hemlock
glass flower
true hemlock
#

screw the llm part that's unimportant as hell

obsidian mantle
#

is alphafold that chemistry protein thing or what is it

#

i think i saw some youtube thumbnail about it

true hemlock
#

protein folding prediction

tropic spindle
glad path
true hemlock
#

idk man. google deepmind is the ONLY one making real academic contributions in the ML field.

glad path
#

it seems like it'd be pretty useful for medical research

true hemlock
#

the rest are cashgrabbing with llm shit and most they do is just modifying the architecture a bit

opaque sigil
mighty thorn
#

Don’t let him fool you

#

SNN always stays closed and unpractical at scale

true hemlock
#

if you can't implement the backend, skill issue honestly

fast pagoda
opaque sigil
mighty thorn
#

He’s just cranky that his field has been irrelevant for 4 years

rough bloom
#

it's fine

#

next year is the year of SNNs glueless

opaque sigil
obsidian mantle
#

Snn is science neural networks?

mighty thorn
fast pagoda
#

stinky neuro network, it's what neuro uses, ask the green fog

opaque sigil
obsidian mantle
#

I am getting caveman vibes again vedalDespair

stark needle
#

bro u dont know

#

mixture of snns

#

stark needle
#

tensor networks bro

#

next gen shit

true hemlock
#

bro

#

quantum neural network bro

#

next level

#

the future!!11!11!

stark needle
#

tensor networks is based in quantum computing or some shit

mighty thorn
#

Shadow is an ai bro now

stark needle
#

u can represent bigger models with less params

#

via tensor networks

#

its fucking insane actually

#

literally

#

less params to backprop

#

and theres a fucking pytorch impl

obsidian mantle
#

Why is training speed an issue at all

stark needle
obsidian mantle
#

Do some neural networks take like super long to train

#

And its viable because can shrink 10 years training to 1 year for example

stark needle
obsidian mantle
#

Or is it because every train is trial and error

#

And you can do 1000 attempts instead of 10 attempts for the same period of time

#

I am caveman please clarify vedalNeuroYay

stark needle
#

schizo site

obsidian mantle
#

Or is it wip

stark needle
#

u can convert all tensors

obsidian mantle
#

Why arent they making it

#

I mean

stark needle
#

a company does this shit

obsidian mantle
#

You need x10 times less islands

stark needle
#

Model as a service

obsidian mantle
stark needle
obsidian mantle
#

Or is it new and wip and they are doing it rn

stark needle
#

but the idea of tensor networks exists since like forever

obsidian mantle
#

Soo

#

Where models

stark needle
#

idfk

#

ive tried this shit tho myself

obsidian mantle
stark needle
#

and it rly works

#

ive ran like 100M param pretraining

#

with tensor network uses like 10M params

obsidian mantle
#

Do you need some quantum computer to run it

stark needle
#

no

true hemlock
stark needle
#

gpu is fine

obsidian mantle
#

No i mean

#

I get that its super good

#

And overpowered

#

I ask why for example

#

Openai wont use it rn

stark needle
#

who knows whos using it

obsidian mantle
#

To make 10000000b model that fits into previously 1000b model size

opaque sigil
#

if it works at scale and doesn't fuck up everything they're probably using it already

obsidian mantle
#

And save 90% of money on building islands

mighty thorn
opaque sigil
#

granted that's a big if

rough bloom
#

apparently super good
really old
has PyTorch impl
somehow nobody is using it
neuroSussy

stark needle
#

just gonna say that google had also a library

opaque sigil
#

archived neuroPogHD

obsidian mantle
stark needle
#

deepmind has alr schizoed into this shit

true hemlock
#

im just giving some examples for scale

obsidian mantle
#

So you saying that it works and everyone using it

mighty thorn
#

Quack is an SNNbro

#

He hates ai industry

obsidian mantle
#

Why my gemma4 26b takes 17gb vram then

mighty thorn
#

But has agi internally but it’s just too dangerous to release

true hemlock
#

idk wtf is this dog on about lmao

mighty thorn
#

So basically

amber fractal
#

Weaksauce hating

mighty thorn
#

You know how OpenAI refused to release gpt 2 cause it was too dangerous

#

That’s where they are rn

#

They are gpt 2

opaque sigil
#

you're talking about two completely separate things FOCUS

mighty thorn
#

They are still catching up

amber fractal
#

Kaine is yapping because they want to spread information, never said it's good or informative tho

true hemlock
#

maybe im leaving out details, like moe feed forward is much more compute efficient

stark needle
#

use RWKV or some shit broSCHIZO

#

gated deltanet

true hemlock
#

idk, kaine is talking about something else

mighty thorn
# mighty thorn They are gpt 2

Not capability wise. SOTA SNN can barely spell. But I mean they are still in the phase where they all think they are savants who just created sentient machines and need to gatekeep it for the protection of the human race

true hemlock
#

and being somewhat obnoxious for absolutely no reason whatsoever

obsidian mantle
#

So quack is talking about different thing completely did i get it right
Not talking about tensor model shrinking

#

Or what

true hemlock
#

yes

opaque sigil
#

quack was talking about why training can take tens if not hundreds of millions of gpu hours YES

obsidian mantle
#

Oh

#

I see

true hemlock
#

yeah

obsidian mantle
#

Then this tensor shit is mega overpowered? No?

true hemlock
#

there's a reason we laughed off of a guy who thought he could train a 12b 1.58bit model on a cpu

mighty thorn
true hemlock
#

within few months

rough bloom
#

but somehow never deployed anywhere neuromegadance

#

not widely known for LLMs at least

opaque sigil
rough bloom
#

despite existing for ages already

opaque sigil
#

i don't remember much about tensor networks to give an actual answer FOCUS

obsidian mantle
#

Okay and then my gemma4 takes so much vram because gemma4 200b fitting in 4090 would be so strong i could exploit windows on it or some shit

opaque sigil
#

no-brainer optimisations are obviously used everywhere

obsidian mantle
#

While mythos needs only 500gb vram while being 500000b model

stark needle
#

it would still use a lot of vram for activations

true hemlock
#

god, im late for my fishing trip brb need to call an uber

stark needle
#

activations dont get reduced

stark needle
obsidian mantle
#

Oh so this thing is purely training related

#

And no x10 benefits on inference

rough bloom
#

no, it only affects the parameters I think

stark needle
#

only params are affected

rough bloom
#

shrinks the model itself, but not the space required for any computations with that model
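To put numbers on "shrinks the params but not the activations", here is a parameter count for a tensor-train (MPO) factorization of one weight matrix. Tensor-train is just one kind of tensor network, and the shapes and rank are picked for illustration; nothing here is the setup from the 100M/10M run mentioned above.

```python
def dense_params(out_dim, in_dim):
    """Parameter count of an ordinary dense weight matrix."""
    return out_dim * in_dim


def tt_matrix_params(in_factors, out_factors, rank):
    """Parameters in a tensor-train factorization of a matrix of shape
    (prod(out_factors), prod(in_factors)) with a uniform internal rank."""
    d = len(in_factors)
    total = 0
    for k, (i, o) in enumerate(zip(in_factors, out_factors)):
        r_left = 1 if k == 0 else rank
        r_right = 1 if k == d - 1 else rank
        total += r_left * i * o * r_right  # one core of shape (r_left, i, o, r_right)
    return total


def activation_elems(batch, seq_len, hidden):
    """Activation tensor size for one layer output; independent of how weights are stored."""
    return batch * seq_len * hidden


dense = dense_params(1024, 1024)
tt = tt_matrix_params(in_factors=(32, 32), out_factors=(32, 32), rank=8)
print(dense, tt, dense // tt)
print(activation_elems(batch=8, seq_len=2048, hidden=1024))
```

The layer still maps 1024 inputs to 1024 outputs, so the activation tensor has the same shape either way; only the weight storage changes, and a low rank also limits what the layer can represent.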

stark needle
#

Tensor networks +deepseek compressed sparse attention SCHIZO

mighty thorn
#

I want a source so I can ridicule it

obsidian mantle
#

Numbers are random

stark needle
mighty thorn
obsidian mantle
#

But i already got answered

obsidian mantle
mighty thorn
stark needle
#

@mighty thorn u keep saying moe is shit but gemini and chatgpt and shit are all moe

mighty thorn
stark needle
#

that's just poor copium

obsidian mantle
stark needle
obsidian mantle
#

I still need 20gb for 26b model

#

But its trained much faster

#

Trying to understand

opaque sigil
#

moe is good for what it's made for, which is allowing you to cram far more knowledge into the model for relatively cheap FOCUS

rough bloom
#

MoE is awesome for big models

opaque sigil
#

no need to pretend it's worthless

mighty thorn
#

You either die dense or live long enough to see yourself become sparse

rough bloom
obsidian mantle
#

Moe is just mega fast inference

stark needle
obsidian mantle
#

Which is good

mighty thorn
fast pagoda
#

but it's used in all sota contexts currently

stark needle
#

bro moe makes llm infinitely cheaper to train

fast pagoda
#

doesnt seem particularly useless

stark needle
#

u just need more vram

#

but compute is the same

#

activations dont grow

#

u can use higher context length

true hemlock
#

i don't think kaine knows what he's talking about

#

missed the point of moe entirely

mighty thorn
obsidian mantle
#

I tried moe vs non moe and all i noticed was x5 faster inference

fast pagoda
#

what is .... most contexts?

stark needle
#

bro all sota llms are moe

mighty thorn
mighty thorn
# stark needle
  • Because cheaper for improving benchmark scores versus actually making better models
rough bloom
true hemlock
#

in the enterprise space where they basically have endless hbm memory, moe is highly beneficial

obsidian mantle
#

But moe doesnt shrink required vram

true hemlock
#

less compute being used

stark needle
#

dense llms is just poor people copium

true hemlock
#

also

#

moe is basically

mighty thorn
fast pagoda
#

the weakness of moe isnt a problem at sufficient active parameter counts
Something like Qwen3 30B-A3B knows things like a 30B model but is limited by the width of the active slice in any given forward pass

scale this to frontier size and the active parameters are in the tens to hundreds of billions again and it scales incredibly well when total params end up in the trillions
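Rough numbers for the total-vs-active split. The 30e9 / 3e9 figures are round numbers read off the model name, bf16 storage and ~2 FLOPs per active parameter per token are my assumptions, and attention/KV-cache costs are ignored.

```python
def moe_costs(total_params, active_params, bytes_per_weight=2):
    """Weight memory follows total params; per-token compute follows active params."""
    return {
        "weight_memory_gb": total_params * bytes_per_weight / 1e9,
        "forward_flops_per_token": 2 * active_params,
    }


moe = moe_costs(total_params=30e9, active_params=3e9)
dense = moe_costs(total_params=30e9, active_params=30e9)
print(moe)
print(dense)
```

Same VRAM for the weights, roughly a tenth of the per-token compute, which matches the "still takes the same memory but inference is way faster" observation elsewhere in the chat.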

mighty thorn
#

Least efficient possible idea

rough bloom
true hemlock
#

yeah

obsidian mantle
#

Didnt notice any explodes in 26a3

#

It just took like 17gb all the time

true hemlock
#

basically

mighty thorn
obsidian mantle
#

I understand that it assigns some "experts" that make it use only relevant params instead of all params every time
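That's roughly it. A minimal sketch of top-k routing for a single token, assuming a softmax over the selected experts' router scores (real models differ in details); `top_k_route` and `moe_layer` are hypothetical names and the experts are plain Python callables.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax their scores into weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)             # subtract max for stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))
```

Only `k` experts are evaluated per token; the rest stay in memory untouched for that forward pass.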

true hemlock
#

you got a model that runs inference on just a tiny part of itself but knows as much as the total params
though the drawback is that it's somewhat dumber, but when you have like hundreds of billions of params and work with enterprise scale inferencing does it even matter

fast pagoda
#

"nothing" is a misunderstanding of moe

stark needle
#

due to expert load balancing

obsidian mantle
#

Whats the point of calculating impact of tokens that say 2+2=4 if it needs to write a poem

#

Its not a bad idea

true hemlock
mighty thorn
true hemlock
mighty thorn
#

Which they will be

true hemlock
mighty thorn
#

Since only 3 are generalists while all the others are overfitted to Turkish history

true hemlock
#

did bro fail statistics class

rough bloom
true hemlock
#

crazy

stark needle
rough bloom
#

this is why you have load balancing in MoEs

stark needle
#

load balancing loss
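For reference, a sketch of one common form of this, the Switch-Transformer-style auxiliary loss `N * sum(f_i * P_i)`, where `f_i` is the fraction of tokens sent to expert `i` and `P_i` is the mean router probability for it. Whether any model mentioned here uses exactly this form is an assumption; `load_balancing_loss` is a hypothetical name.

```python
def load_balancing_loss(router_probs, assignments, num_experts):
    """Auxiliary loss that is smallest when tokens spread evenly over experts.

    router_probs -- per-token list of router probabilities, one per expert
    assignments  -- per-token index of the expert the token was sent to
    """
    n_tokens = len(assignments)
    # f[i]: fraction of tokens dispatched to expert i
    f = [0.0] * num_experts
    for expert in assignments:
        f[expert] += 1.0 / n_tokens
    # p[i]: mean router probability assigned to expert i
    p = [sum(row[i] for row in router_probs) / n_tokens
         for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))
```

With perfectly even routing this comes out to 1.0, and it climbs toward `num_experts` as everything collapses onto one expert, so adding it to the training loss pushes the router away from the "3 generalists" scenario.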

true hemlock
#

also when you get multiple people using the same experts

#

that's not bad either

#

you batch the damn matrix

fast pagoda
#

Multiple users hitting the same weights is the thing you want, it's the entire economic basis of batched serving. Weights are read-only, and reads don't contend
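A toy sketch of that batching, assuming each token has already been routed to one expert: tokens from different requests get grouped per expert so each expert's weights are read once for the whole group. `group_tokens_by_expert` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def group_tokens_by_expert(token_routes):
    """Group routed tokens by expert so each expert runs one batched matmul.

    token_routes -- iterable of (request_id, token_index, expert_id)
    Returns {expert_id: [(request_id, token_index), ...]}.
    """
    batches = defaultdict(list)
    for request_id, token_index, expert_id in token_routes:
        batches[expert_id].append((request_id, token_index))
    return dict(batches)
```

Two users landing on the same expert just means a bigger batch for that expert, not a conflict.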

true hemlock
#

exactly

fast pagoda
#

you can batch serve a dense model all on the same weights

stark needle
#

since u would be doing DDP

#

or TP if model too big

warped narwhal
#

Does anyone here use both nixos and jetbrains ides?

true hemlock
#

i don't use jetbrains

#

i use nixos occasionally though

#

shuni is proud

mighty thorn
#

LLMs have been on the wrong track since o1
Now everything is MoE CoT benchmaxxed distilled int8 marketingslop

true hemlock
#

well

obsidian mantle
#

linux vedalEwNo

rough bloom
fast pagoda
#

throughput can get fugged by bad routing, with one hot expert doing everything, or if the tokens all scatter like crazy that can be an issue too, but especially the single expert getting smashed is an issue of imbalance

true hemlock
#

god im still waiting for sleep deprivation to hit me im sleepy as hell i need the sudden surge of energy

warped narwhal
#

I'm pulling my hair out trying to get flakes working with them, cause every time I try using RustRover etc, it complains that it can't find the rust install because rustup isn't available (even though it is installed) and it fails to run it because it's a dynamically linked executable

rough bloom
#

all the big LLM benchmarks are close enough that I do not care

opaque sigil
#

surely there's a way to tell them where to find rust outside of rustup right

true hemlock
warped narwhal
stark needle
#

earlier llms were unusable

#

actual braindead shit

opaque sigil
#

are you sure you even have the stdlib

#

it's its own toolchain component

warped narwhal
#

Yes, I can compile rust apps outside of the ide just fine, it's the ide itself that shits the bed when it tries anything

opaque sigil
rough bloom
fast pagoda
#

yea i mean the indicator is like a black and white line

#

it's either in step with peers vaguely on useful tasks or it's total dogshit

#

that's about the max judgement to be had there

rough bloom
mighty thorn
#

Then they forced CoT on us

#

Then MoE

#

And now we are here

stark needle
#

bro 4o was moe

fast pagoda
mighty thorn
#

5T frontier model with 400k active parameters

rough bloom
#

Mixtral Oldge

stark needle
#

mixtral was fucking god tier at that time

#

47b params for gpt3.5 perf

#

would beat the fuck out of llama 2 70b

fast pagoda
#

god

#

i archived the fuck out of mixtral as my "if the world ends" model to have lmao

#

at the time

#

you just reminded me

mighty thorn
#

I have a few of those

stark needle
#

i had llms locally cause huggingface was down constantly

#

back in those days

#

in a fucking gitlab lfs

opaque sigil
mighty thorn
#

Stranded on island with solar powered laptop and 8b local llm, the creation of the ultimate schizo

rough bloom
#

NeuroChatting OpenClaw, get me off this island

opaque sigil
stark needle
#

fuck the laptop

#

get 2x gb10

#

similar watts

#

run deepseek v4

#

moe

mighty thorn
#

Price check that for me

stark needle
#

poor people cope

mighty thorn
#

See how laptop like the price is

stark needle
#

or some shit

mighty thorn
stark needle
#

i was earning slave wages

#

for 4 years

mighty thorn
stark needle
#

while working at L4 technical level

#

as a fucking kid

mighty thorn
# stark needle while working at L4 technical level

Shadow should do the thing where the rich person gets rid of all of their assets and qualifications and tries to rebuild from scratch to prove it’s easy.
He’d last a few hours before breaking down upon having to go to chick fil a instead of eating 50lb of caviar every meal

stark needle
#

ive had

#

cold dms from random silicon valley cofounders and shit

#

on fucking discord

#

i didnt even need to show my cv lmao

#

just show technical competence

mighty thorn
#

Let’s see what the people say

#

Dirty 1%er

obsidian mantle
#

Age 20

#

I was living on 100$/mo neurOMEGALUL

fast pagoda
#

me when i doxx

mighty thorn
mighty thorn
#

Probably

obsidian mantle
#

Well right now i am close to that

#

But im 28

#

Its like

#

Super huge difference

obsidian mantle
#

Its basically luck

#

Time can counter luck

#

Effort too of course

#

Cant do shit without it

fast pagoda
#

me, age 20

stark needle
#

"luck" evilStare

obsidian mantle
#

Yes luck

umbral wigeon
mighty thorn
#

The richer Australian with multiple a100 and a collection of exotic GPU and cpu defends the less rich guy

#

0.1% and 1% coming together to gaslight the poors 😭

umbral wigeon
obsidian mantle
#

Wtf is he essaying there

mighty thorn
#

Which is worth more than everything I own combined

obsidian mantle
#

I mean if you have 10k per month income why not

#

Literally my current job but in US gives exactly that

mighty thorn
obsidian mantle
#

Yeah

true hemlock
mighty thorn
obsidian mantle
#

Yes luck x2

quasi sundial
obsidian mantle
#

Did you apply to google at 15

stark needle
mighty thorn
#

Child labor is the answer to poverty, ladies and gentlemen

obsidian mantle
#

Did you speak english at 15

fast pagoda
#

i don't think a combined spend of like 12k-15k on tools that are used in your line of work + less than 5k hobby shit constitutes rich

rich is a whole different ball game than anything described

umbral wigeon
#

Is it legal to let a 15 y/o work in big tech? Like in your country

obsidian mantle
#

Did you know what programming is at 15

#

Yes luck

umbral wigeon
mighty thorn
stark needle
fast pagoda
umbral wigeon
stark needle
obsidian mantle
#

I am trying to explain that you got lucky to be there where you were

#

With your skills

stark needle
umbral wigeon
rough bloom
stark needle
#

No my gh is empty

obsidian mantle
#

Good education too

rough bloom
fast pagoda
#

spawn luck

mighty thorn
#

Can’t build skyscraper on dirt

#

Need foundation

obsidian mantle
#

He's right

#

I wasn't supposed to speak English at all for example

#

I just randomly stumbled upon forsen stream and understood 10%

umbral wigeon
# obsidian mantle Good education too

How does the education system in different countries even work? I mean how is someone at 15 able to get a phd, do they just max stats and beat it to phd?

fast pagoda
#

the ability to be active in this discord to the extent that the majority of people in this conversation are is already indicative of a similar type of luck if not the same extent
there are entire swathes of the world who have no inkling of the wonder we interact with every day

obsidian mantle
#

You need to be noticed

stark needle
#

I dont have phd idk

obsidian mantle
#

What if there's nobody to notice you

#

Will you apply to google

#

"hello i can make calculator im 15"

stark needle
fast pagoda
#

you're welcome, i won't charge for that one since you're on the brink it seems

#

further will require an active credit card on file, however

umbral wigeon
fast pagoda
#

he's stuck like that

#

dont make fun

#

birth defect

frozen igloo
obsidian mantle
#

I read it fully now

#

Holy shit

umbral wigeon
#

I think programming is easy

#

If autistic person can do it

obsidian mantle
#

Is like "i bought 10 lottery tickets and won them all"

#

Congrats

obsidian mantle
#

I also fixed previously unknown bugs and got +100$ raise at my job
I guess i played my cards alright

#

Now company is dying and idk if i should flee the country or go to capital

umbral wigeon
#

If I was a hiring manager, I would rather take a look at github than a CV or resume honestly

obsidian mantle
#

My managers dont know what github is glueless

umbral wigeon
#

Or any profile that shows you have passion in the tech

#

If someone has actually done something impressive, they should have it all, either github, facebook, or X as a portfolio to show it's actually fact

#

Because how can I believe it if someone came up to me and said, I've graduated from cambridge and got a job at google at 15 years old, I mean it's not bad
But I would like to see a github profile slammed in my face or any official papers showing that you've actually done it

obsidian mantle
#

Its super rare but not impossible

#

I guess

#

Not like i ever witnessed anything like that myself

umbral wigeon
umbral wigeon
obsidian mantle
#

However i find it concerning that you have to work in google at 15 to afford that shit

#

In 5 years

stark needle
umbral wigeon
#

Not everyone has proof

#

I've heard that hiring teams are using AI to review CVs/resumes, is it that bad?

rough bloom
young plover
#

glue I did have cool stuff on GitHub but Nintendo nuked the main project that I contributed to.

rough bloom
#

not unbelievable at least

stark needle