#programming
1 messages · Page 351 of 1
Or a serious voice
don't see appeal
v3 soon

Whatever the situatio would call for
what situation calls for a different voice
I'm not talking about different voices completely just different pronounciatons maybe
v3 is priority
Mostly the same voice but they could pronounce things differently
quriky
you can tweak some controls on the original voice i suppose? not really an easy way to get what you're describing
would also probably sound weird
v3 voice might add the different tones as well
the same way she changes her speech speed she could change tones
is what they mean i think
that's an "easy way"
Yea kinda like a
V3-scared
V3-normal
V3-questioning
Different modes kinda
in an ideal world she would just output speech tokens and have all of those natively but
vedal is a scaredy cat who does not want to touch or breathe near neuro's base model
"hey vedal we need at least 2 more versions of v3 voice"
because the top 1% of nn would be mad (3 chatters)
it is possible theoretically but: sounds like pain
would need to collect even more training data for it
The only time i ever coded smth so far is for a game with an api in Json
i wonder if he's experimented with this
at all
he did intelligence upgrades a while back
pretty sure he did something with base model there
ah
Filtered
Couldn't you also get smth done with editing the audio instead to make them sound scared?
And just slapping that on the v3
that's the easier way but good luck with making that sound natural
Like changing the pitch etc
would probably just sound weird
Oh
what's the state of omnimodal models rn
Python wont open
different tones of voice is more than just "change the audio a bit"
upsettie spaghettie
words get pronounced differently, emphasis happens differently, timings change
not easy to do with audio filter
you can feed those as parameters to the tts model
well you technically can, what you can or can't depends on the model of course
most of what was mentioned i would guess isn't possible
with, let's say, evil's tts
i made a funny html file how do i post it
and the ai named it whimsical button thats what mari mari always says and i love watching mari mari you cant make this junk up
vibe coders aren't beating the allegations
is neurosdk uptodate does anyone know for sure?
like are the latest game integrations using that also?
I see no reason why not, the only one that is debated is minecraft.
its not really coding its comedy
haachamachaCHU
i summon you!
...why didnt it work
how can i prove im not a bot ;_;

not you
...man i dont understand anything
if anyone wants to play hardcore diablo 3 send me a message
we really don't get that much information, we just have a better guess than most people
a user posted a message, deleted it and has selfbot thingie in profile
oh... i see
also why would he not update the sdk
this one
i instantly clocked em
that's supposed to be {os.time()} maybe
not only did they forget a f, but also failed to use normal timestamps
or not
iirc time is a struct, let me pull up a python
oh then it's fine ye they just forgot an f
idk how you can forget if you see it fail to properly syntax highlisht
nope, function named times in 3.13
weird
where is the correct time because that is not it

os.times()
posix.times_result(user=0.09, system=0.01, children_user=0.0, children_system=0.0, elapsed=4296563.45)
os.time is not defined
ok, cuz i see it has latest update recently so im assuming it will work
if it didn't someone else would be raising hell at the moment, mainly the https://discord.com/channels/574720535888396288/1350968830230396938 folks
datetime.datetime.now().strftime("%D at %H:%M")
'01/07/26 at 14:49'
I somehow failed to remeber that it is in node.js
so that isn't even correct
Actions priority works, but keep in mind Neuro still has latency before responding
<t:{time.time()}:f>
should be

neuro api is just a json over websocket protocol, you're likely thinking of the sdks in which case the last commit is a rough indicator of if it is updated or not
action forces always had l*tency, latest spec change just introduced a way to cut neuro off to reduce time between force and action
can I hide the chats section on discord?
What happened?
you deleted your messages
you even ghost pinged someone now
Must've been the wind
Hello that was me
OR hallucinations
Ghost pinged
Mrs nightmare it was a mistake

I assume this is why
I pinged you to show you this gif
Yes
Lovely entrance
yeah, happens to the best of us
I personally don’t mind it (could never guess why)
im so confused rn
Actually, it’s Kaguya and Yuzuru
Hmm maybe it's Rio
The Date A Live fandom are getting flooded with AI arts rn I think or has it stopped
Last time I've seen a dal fan was 2024
Yeah, it’s still going
I played the mobile game as a kid
Spirit pledge or the new one
you're blatantly self botting mai
And after I realized there was a show in recent years I felt like I had to watch it
Spirit Pledge
All that money just for it to shut down
There's a new game I think
The new ones a remaster, I’ve gotten into that. Spirit Echo, I believe.
Same game
Yeah
Yea basically the old spirit pledge
Brand new
I bet it'll shut down again earlier
Yeah
Since I've earned it back and more
This is programming, we should really be talking in #gaming
Or general
Could you send me the dal server in dms
Alright, ping me in whichever you wanna
I already threw in a report, will see how it goes
they banned


kurumi
How can you make an account in discord using python
Aren't you supposed to use HTTP
it's called spoofing a browser
Coooooooool

Didn't discord patch selfbots

Ironic
"This Account was randomly generated..." 
It's def just a kiddo who wants to look like a bot 
that would be you tho
what are you talking about
yeah not the best of looks there
User tokens cannot be used as a bot token
there is nothing stopping a user from botting a user token as it's own thing
example being the person that was banned
They're spoofing a browser??? As @amber fractal said there's nothing stopping someone botting as a user
Hmmmm
Oh ok
Really should learn Programming more often again
I've lost my braincells
Might've forgotten programming after being forced to study biology

Tbh if your spoofing a user token for a bot you probably have brain rot. I can't think of a valid reason to do so when a bot token can do everything you need and is easier to work with
you need permission to add your bot to other servers
you don't need permission to add your own account to other servers
^
You can just specify a username and password via the Discord API, no need to spoof it.
you can but there's likely some security checks for the API that would make it harder
Right but generally if a server wants a bot in it they will approve it, if not they wolnt. So if you are bypassing the server's rules then it's not a valid reason
I don't know about now, but I self bottled once as a test. All you needed was a username and password.
they've changed their security stuff a couple times I think, there's some form of captcha/turnstile challenge at the login page that needs to be submitted alongside the usr + pwd, I think?
Again, I'm not sure how the API has changed over time. I would have to read the documentation.
idk I haven't really examined discords login page for a while
is there documentation for the discord user api???
Idk I just use an app api token because I use it properly. I have zero reason to connect my AI to a server I'm not explicitly allowed to by the server owner. That's asking for problems
I don't know if there is public documentation, to be honest.
I once self botted back when I was a script kiddy. That's it.
someone probably has started unofficial documentation somewhere
I'm pretty sure the official documentation is: Don't
Just gave my model stack PSE Api access
What is that?
The Philippine Stock Exchange, Inc. (Filipino: Pamilihang Sapi ng Pilipinas; PSE: PSE) is the national stock exchange of the Philippines. The exchange was created in 1992 from the merger of the Manila Stock Exchange and the Makati Stock Exchange. Including previous forms, the exchange has been in operation since 1927. The PSE's headquarters is l...
Programmable search engine, Google's api search engine
Oh
Okay 
You should've just said: Google's search engine API. Other programmers will understand you better that way, if they aren't familiar with Google's API terms.
Yeah, well I don't really use Google's APIs.
Is there something I'm unaware of?
Google APIs apparently
Hrm... 
Do people use Google's APIs all the time here?
PSE is a pretty common method to augment the fixed knowledge of a model
I knew AI used search engines to augment knowledge. I wasn't aware that PSE is a colloquial term. Sorry.
google apis are kinda annoying to work with
Yah they are. It's one of those things where Google api is about as vague as it gets. You kinda have to be specific
Yea you definitely got your info sent to a webhook 😂
Won't matter anymore since it's old anyways
Oh well
Whoops!
I shouldn't have been a script kiddie 

customized API?
???
Oh sorry I forgot to turn off mentions
I just read the photo you sent
customized API probably means they built a discord api wrapper
Oh

Oky
In a jar huh 
Powered by Grok btw 
Actual redditor for your desk
oddly specific
I got a question


?
ask
Where's the question
I wonder what will happen if someone says "no" to someone asking if they can ask 
Youtube guides
idk how streaming works but for recording its not that hard.
you gotta add a source first
Like a url link?
it can be
There is a stream key if you want to stream you just paste it into obs somewhere
the 2nd bar is the sources bar, there press + to add a source
Then configure scenes and sources of image it probably defaults to your screen maybe not
OBS wasn't difficult but probably had something which required me reading the docs at some point. i think it was just a matter of setting the sources and where they should go.
depends on what you want to use the stream for
Right it depends on how detailed you want it to be
Probably just watching some 5 min video would be more effective
I think i might be a little under prepared
I would also recommend a video
i think i just grabbed the contents of a webpage as part of one thing in OBS by pointing the source at the chrome tab. there may have been a better way to do it.
I need a computer don't i
it's not going to work on like a chromebook
Yea I'm cooked
no computer?
Nope
what are you trying to do and what do you have?
not supported on chromeos ye
I was going to like set a heart rate monitor up for my stream but i was like i don't know how to do this
And now i just have it
there is development mode on chromeos which opens up some extra features but it would require some custom coding on your side to make things work, to glue things together on both ends.
(or at least was one when i last checked a few years ago)
Damn this is kinda hard to understand
I wouldn't
5060 ti 16gb was $370 a couple of weeks ago
but if you don't need the vram, maybe
So um is there a way I could do this on my phone
having nvlink
What app are you using for streaming?
Twitch if thats what you mean
physical connector last seen in the 30 series on the 3090
even then only supports 2 gpus
1080 had 4x
that seems to be the max in general
Ooh
OBS copies the heart rate webpage contents and overlays it with the contents of another screen and sends that to your streaming provider.
what you want is something shows both what you are streaming and the heart rate data at the same time. is there somethign which puts it in a notification window? perhaps one that shows it every few seconds as a new notification so that pops up into the twitch app's capturing somehow?
[i imagine it'd filter out that type of thing by default]
could you do more than 3090? i didn't think you could link them because they were already using two slots?
4gb modules on still just 4 imc
Eh?
which i think means "probably not going to happen."
Oh
anyway the reason im peeking for such deals is that i have a plan with a sff briefcase build
and my dealer just so happen to offered that lol
though the 8gb is a bummer but its not like i'd use it for anything crazy
not a fan with huge vram but mediocre bandwidth either
not an ai bros
i think the bang for buck of 2x 5070 ti would probably be more than a 5090 and cost less/actually exist. two physical gpus would also let you split loads more. that's probably the lowest which can still load a couple of decent sized models. ideally you'd have something with 24 or 34GB but in this economy? i could load the qat-q4 versions of the 12b and the 4b Gemma 3 models on one card and still be able to load in image support and have a massive context window and use the other card for background processing.
currently using my ancient 3090 for everything but only just starting to get into multi-model setups.
Honestly, I'm considering a similar route if my own attempt doesn't work out well. Worse case is that I can offload memory processing or smaller parts inbetween cards. Ideally though I'd be attempting to process from multiple sources at once and combine the results.
As general logic stands, each data type has it's own processing into a singular state that can be reasoned about. In an ideal world for my setup, I'd be able to process those parts seperatly and reference them in context of each other before splitting back into heads for each output.
Not quite monolithic, but also not fully seperated 
grok is this true
the image shows a plausible set of things. specs are directly from twitch page.
Are those specs even up to date? I thought I heard different specs at some point, but maybe that's Vedal's PC
i've heard better specs and doubt they are up to date.
i believe it was 5080? possibly two of them?
bruv these double tap reacts ugh
also is this just my or is this like google translated
mmmm, i saw a japanese version
yeah, translated by gemini
still 4090
just with 9950x and 192gb ddr5 iirc
and shit tons of storage
also i wanna rant
fixed my pc
dude finally
i fucking hate gigabyte
the boot pipeline is so ass
i need to literally do the equivalent of holding the breaker from tripping with my fingers while having a temporary enormous power draw
aka bypass whatever the fuck mem training there is and rebuild some essential training table just to make the 4x32gb work
im switching board fuck it
z790m or something with better T top memory trace instead of daisy chaining
that's vedals yeah lemme find it
9950X, thought so
best pick you have rn really
16 fast af smt cores with avx512, good for basically anything
Yeah it's what I currently have
i just got a pc with 265k cause anything prebuilt that had 99xxX was way out of price range, and i'm not gonna be running all those silly containers and stuff on 8 cores
(and anything not prebuilt was automatically way more expensive and out of price range xD)
i low key wonder how the zen6 / ultra 300 coin will flip tho
The dual 5080s do sound familiar, can't find where he mentioned that though. 
That's a weird machine though. Not sure why anyone would do that instead of a single 5090
if there was a 5080 Ti with 24GB it would make sense but it doesn't exist, so 2x 5080 = 32GB total would be the same. perhaps it was about availability?
it looks like camila messaged her and woke her up?
and was still in the voice chat from earlier maybe.
zen 6 has more cores iirc
im on track to get a 10950x engi sample this year maybe
ooh
265k is probably best value for multicore for a retail chip
though e cores still kind of ass to schedule
but not that it matters for most people
yeah that's why i went with it
like everywhere they're just mindlessly pushing 9800x3d cause "look video game benchmark"
honestly to think about this, the upgrade to z790m is basically just a sanity upgrade since the 4x32gb do work on my b760m but with scuff that i need to watch out this way
true
some people actually went out for 9800x3d for productivity because they don't do their research lmao
but i don't want x3d i want cores, and buying a new pc at about the top of my price range with "will probably have to swap this cpu" is just annoying
i don't own any x3d chip either despite owning like the weird niche stuff
daily driving a i9 13900 engi sample
only thing that made me hesitate was avx-512 support cause it felt like i'm gonna regret not having it at some point
same
but well it is what it is, intel suck at implementing avx 512 
though i do have a 7950x
but its also... an engineering sample
but then talked to some guy who did his msc thesis in reimplementing linux modules that did vector stuff on various esoteric cpu architectures and he was like "nah bro avx2 will be fine for almost anything for the next 3-4 years"
es2 though so it works almost as good as retail
we have GPUs that do the SIMD anyway
though its not core distributed and not callable with ISA
but it works well either way
lmao
which makes senes but like
i do NOT want to replace the cpu i want it to do a good job for 8 years and then i'll buy a new pc xd
yeah like if you replace a thing in 5 years you'll be like
we also have a zen2 64 core threadripper
yay new strong cpu - except my mobo is kinda weak now, my ram is kinda slow now, my gpu is kinda old
still faster than any consumer cpus in existence in multicore
and suddenly you're back to wishing you'd just buy a new pc
including the M chips from apple
Bad time to do it

only time it makes sense to have that replaceable component is when you intentionally buy something that has one weak component, and that's exactly what you'll replace
im a fan of only upgrading gpus
not a fan of upgrading cpus to new gen so at that point just get a new system every 6-8 years
though
exactly
i went from 14600KF to my 13900 ES
and this prebuilt i got still had like, friendly priced ram in it
Thats the average policy tbh these days
because i found a really good deal
cause they had them on stock for half a year and aren't entirely evil scalpers
and i want multicore lol
This is more interesting... they promise support for local LLMs ... https://www.code27.co/
bonus point the i9 13900 es is actually more power efficient
and thermal efficient
dunno how
but im not regretting the upgrade
Wait for CPUs to crash next
The radioactive source inside:
also yeah i'm coming from like, a laptop with a ryzen 4800 and an rtx 3060 so either way this is gonna feel like i moved to a space station
So apparently if nvidia is to be trusted Rubin has 63% higher fp32 than Blackwell, surely that'll transfer over to consumer rubin 
i don't buy it
if they wanted me to buy rubin they should have sold it 3 days ago
blackwell 100 throughput is like 3x to 6x blackwell 200
they fucked up
Meh most enterprise news from them these days isn't good news for us
though that's tensor
@true hemlock
fp64 down even more 
was about to say that
i was expecting them to go literally all in on ai
so they removed all the fp64
or whatever
and bump out fp4
inb4 fp2
its so ass
the HPC peeps had to skip like 2 gen
of enterprise hardware
true
i should get an EPYC 9965 ngl i need cs:go to go faster
double the core of my 9655
U guys see the rumors about nvidia bringing back the 3060?

i still don't understand their reasoning
how is that gonna help dram shortage
Free money I guess idk
Idk either given the amount in the 50 series
blackwell cache is backward compatible with gddr6 imc
they're not even short with the GB 200 die fab
i have a feeling
about them using AD chips for the 3060 sku
happened before btw
who needs 5070 when
Funky
B300 is apparently technically the third blackwell arch
makes sense considering the lack of fp64
"blackwell ultra" lmao
Blackwell 1.0 is enterprise blackwell
Blackwell 2.0 is consumer blackwell
Blackwell ultra is enterprise e waste
We are gonna have so much e waste when the craze is over
i ain't gonna even hoard the B300
i want my fp64
the ai bros in denial after the bubble pop can have the shitty B300
im taking the B200 and H100/H200
I'd love me some H200s
btw
have y'all seen the rust ffmpeg ffmpreg
yep you are
B100 is GB102
B200 is GB100
B300 is GB110, except it has different arch than their previous blackwell so its now a separate blackwell
Guess you're right
well...
Every rust project be like
@true hemlock what's ur opinion on the rubin architecture?
Lots of leaks
just give me some screenshots or links
Its supposed to be more efficient and nvidia wants it for the 60 series
or preferably die shots
Which the leak says the 60 series won't be out until like the second half of 2027
Rubin is a microarchitecture for graphics processing units (GPUs) by Nvidia.
no 6090 for long
I mean it was obvious given the plans for them to scale back this month tbh
They need to power their centers so we can have stuff like this
lmao fp4 numbers
it's nothing like ffmpeg in scope at the moment. only appears to support h264 instead of the dozens of video codecs ffmpeg handles.
Most rust rewrites are like that
Best to stick with the tried and true until its able to surpass it
They are saying they want it to replace all future architecture
Well for the foreseeable future
Consumer gpus
i assume 8192 bit hbm4
Jensen Huang confirmed the chips are already in full production
So it looks like they are locked in
So say hello to the newest architecture I guess
why do they have a separate cache die from the dual die
yay, another blackwell 2.0
i guess
it really seem like interconnected dual die
with extra cache block on one of it for some reason
Apparently some systems are already running early models of Rubin
But its supposed to be cheaper to run
Is apparently the only benefit I see
The new norm 
2027 june 14: they release the 6090 right before steam summer sale
28GB of gddr8, $7500
It's a cpu that does cpu things for sure
Look at all those VRMs
In the keynote they said something along the lines of a rack drawing twice as much as a Blackwell one I think?
vrm go brrr
I imagine we will get reports soon
whAT
"twice power, but 50 pops compared to 20 guys"
Some of NVIDIAs partners are running them rn
they forgot about me :(
No slop for u sorry bud
Too much slandering 
TSMC is probably still testing them as they make them
If they follow their usual process atleast
i wanna be the first to leak the full arch and pipeline
They are also working with RedHat

For some reason
Bro for some reason my friend manage to leak my by just adding me in an alt account
And he wont tell me how
Here is a better spec breakdown for comparison sake
U probably clicked something
226B transistors on the 88 core cpu wtf
did they have their own version of super dense vector isa
of course the comparison vs blackwell has to be about moe slop
I dug deeper and it has to do something with discords cdn but Idek what that is
You know what i wont grow stronger if i dont find out myself 😡
72 Neuroverse cores
that they're just shit at managing transistor density for acceleration/stream units
Crazy
There was some talk about weird hyperthreading shenanigans
Its a quick solution to a quickly worsening problem
They are trying to lower costs so slop centers keep buying from them
Money go brrrr
The architecture they plan on keeping
just give me 128GB on my GPU plz. then I'll stop asking for things.
Clearly works well enough
the same exact shit i've seen when companies just wanted to be quick in making things and end up overshooting transistor count for accel/stream micro units
Just get an H200 
willing to bet
I wish they could overshoot for consumer grade
epyc 7003 has more efficiency per real world performance than the vera or whatever the fuck
I think its just down to the cost for their buyers tbh
That seems to be the big thing they are reinforcing
They mention some performance improvements over the Blackwell but thats it
epyc 9655 with 96 cores is just 99.7B transistors btw
more transistors = more current needed
that's about $32,000 more than i wanted to pay for that much.
you want best transistor density, but sometimes you don't want too much
thermal density would be ass
or
current requirement would be too high
Yeah idk their game plan
We will see i guess
slop
Im kind of hoping it does just end up melting itself
their game plan is low effort slop
fractals with thermal channels probably
"its realistic, in the perspective of people with near sighted vision, with RTX"
"and motion blur"
MSAA SMAA my beloved
Do note that for basically all workloads 1 GPU with all the VRAM will be better than 2 GPUs with the same total VRAM
Motion blur is realistic, just slam back the bottle of rum
I'm loading 2 24b models with 32k f16 context in 22gb on 2 3090tis, an 8b with 32k f16 context + voxtral 3b in an A4000 and mpnet + 8b with 32k f16 context in an A2
Not necessarily... If two models are running async you will 100% hit a compute bottleneck.
Sure it would be faster when I bounce outputs back and forth between instruct and reasoning models but I also have situations where the different models are all running simultaneously
Bandwidth is the limiting factor, not compute
Helllll yeahhhhhhh, hello fellow and likely far superior programmers
wdym far superior programmers
Just assuming most people in this channel are better than me
most people in this channel just banana
Banan
banana 2026 
How do I also banan
id:customize banan
Oh..
do not believe the propaganda. or do. it really makes them happy when you do and it doesn't make any difference. ;]
Yeah. But i'm just an ally.
support? 
Ally support

Yo what language was neuro coded in?
AHEM okay i'll exit the mindset where i can do banana campaign speech
python for the clever bits, and C# for some of the visual parts.
the ai stuff is most likely in python like anything
I'd have to go back and look but I'm pretty sure I hit 100% cuda utilization before 100% memory controller load on two simultaneous models
friend didn't believe that i could run my i9 at just 2.4W
but also neuro is like 16 things wired together
not one smart cookie doing it all smartly
and below 4W while youtube is playing a video with CPU encoding
As with most projects
Inference is bandwidth bottlenecked unless you have some very weird GPU nobody has ever seen before
#programming message this is not a bad (but google translated) infographic (speculation) :nodders
godbin
she also has some form of vision, unsure if external or not, and then every time you see her play a game there's some integration plugin, usually a mod in the game too
even with my embedding multiple models in one executable i've still got the ASR and the TTS as separate processes and the avatar control is definitely getting its own process and that's before the LLMs.
so 4 processes minimum.
Makes sense and I never even thought about how her vision works but it seems pretty obvious for a llm
it's part of what makes it hard for her to draw and then compare what she drew with what she intended
it all goes through text
(basically)
evil outfit context will be hilarious
This makes sense, also probably why she had such a hard time playing hide and seek
i expect her to pick one that is overall ugly but packed with cute details
This is very surprising to me, most things dont make sense to me especially ai
So am going "GUH"
Flash attention?
Quantization?
Models not running perfectly synchronously?
Wtf you talking about
"find a location such that projecting visual planes over that location will not intersect with my bounding box before intersecting with something else. do that using the following entity coordinates and you've got about 0.5 seconds before people start looking at you."
yep.
lol
Inference is limited by how fast you can get the model weights to the GPU core when running it, there's plenty of compute on literally any modern GPU that the bandwidth isn't sufficient to saturate the core unless multiple queries are being processed in batches with the same mem loads
BUH 😵💫
Hardware optimization?
C# in Unity for visual model, Python for LLM/probably some other AI systems, various languages for integrations, (HTML/CSS/)JS for control panel
but the parts that make Neuro were written in plain English (most likely)
the specific ways she interacts with the universe
and her memories
inference is still heavily bandwidth bound. it is what it is with weight matmul. full compute saturation only happens at extreme high ctx window
its why 3090 and 4090 in LLM t/s bench isn't much different
despite 4090 having more than double the compute
how much ctx window
that would make sense under really high ctx window
I don't remember either 16k or 32k
is that also really inference
Those are running like a single processor: they're active and not active at the same times.
That was bathed ops
?
The way GPU utilization is reported is funny, 100% GPU utilization can mean either memory or compute but it doesn't tell which
try loading a new model and check the graph again on early inference before ctx window gets saturated
look at power draw
Memory controller load is that
pretty sure that's compute saturated
yeah the selling point and also the part we know least about is how vedal manages the twins' memories
a very silly, impractical, don't do this, massively simplified way to put it is that she's just chatgpt, but instead of asking her a 2 liner question, you drop 500k lines of text in as your question, and that text includes all her memories, her life, personality, etc.
she answers in a paragraph that is gonna get spoken on stream, some text on how to move her model, more text on what to do in whatever game she's playing (commands to the integration basically), and maybe on whether to add or change or remove a line in the 500k lines you asked the question with
this is like how i'd explain to my mom lmao
That is a slightly excessive power draw
But also not like my 3090 doesn't also use full power during inference
nah
its 3090Ti
its a full GA102 die that's binned
I did see that yes
so its made to run at really high power draw even for tiny boost
That was taking hundreds of chunks of text, queuing an operation and running it across two models in parallel
Funny
Either way, I don't see full GPU power draw during inference being weird
should try undervolt on 3090Ti
because its well binned
so you'll probably get really good results
and slightly faster 3090Ti
I should, you and so many people have told me that. I've just been lazy and the biggest reason to (temps) isn't a problem
if done right
what
its just few clicks away if you know what to do lol
her persona description and prompts are what make her "her".
Vedal was having some troubles with Neuro during last stream and said "she isnt used to being in vrchat this long" or smth like that so I suspect something like her personality and all that is in a perma library and when she streams she gets semi wiped (maybe even allowed to add some info from the last stream in perma memory) and then set back out with clean ram
most people don't bother because they don't wanna spend time looking up how to do it
oh the prompt? yeah fair enough
I guess
Nah not exactly
She isn't purely prompted
Though I'm very likely wrong because I'm basically bsing
she is
it did say language, never specified what type of language
i don't buy what vedal said
I'm quite sure Neuro isn't based on pure prompts
she's finetuned
she might be fine tuned too, but prompting is essential for certain traits because fine tuning tends to blend stuff
but also works with prompt
Basically me rn. I'm so busy with other things that it isn't the top priority
Good stack and prompt architecture is most of it. Fine tuning is mostly polish
Meanwhile for me it just isn't directly possible to change voltages directly
fine tuning should let you shrink your prompts but keep the style.
that also works as undervolt
Last time I tried it became unstable under load
Crashed after 6h or so of training NS
The most recent thing Vedal mentioned about neuro self adjusting sounded a lot like interaction based prompt tuning
It seems kinda like I got very unlucky on all my components, since it seems like something about my overclocks on the CPU or RAM was also causing instability during Twitch playback
my guess is she has a file she is allowed to edit that gets included in her context
So I've unfortunately reverted back to stock on CPU and RAM with just XMP
though
my B760m is shit
can't handle xmp well
on my 4x32gb
for some reason can handle 3600 cl15 on my 4x16
but not my 4x32
im pretty much stuck with 2666 cl19 till i get to replace this board
At least returning to all stock on my CPU and RAM at least so far apppears to have stabilized this thing, finally, after a month or so of daily crashes
Nah it's more likely a dynamic system prompt that is adjusted by weighted metrics. Those metrics are adjustable at runtime. It's what I've been doing for awhile (technically system prompts are part of context but that's getting nitpicky)
I was so sure it was the GPU driver but no, apparently it was my overclocks
i use dynamic system prompts. there is a complex and ongoing set of changes in her behavior that seems more consistent with "a short reminder about what's important to me right now" vs "decrease introspection by 20%, increase mindfulness to 76%, set whimsy to high."
i imagine there is a line something like "Vedal is annoyed about my introspection and wants me to be more entertaining. He implied he would alter my settings himself if I didn't fix them. I'll be more whimsical and silly so he doesn't change me." because she flipped like a switch after the implied threat.
Define "I use dynamic system prompts" because by that I mean I have several segments: personality, personality lens, emotional state, mood, energy, and a couple others. Each module has 10 or more perameters with a long term and per session state and the per session state is updated at runtime by a separate small model
What I'm talking about is the subtle drift during the course of the subathon he mentioned
first part of the dynamic prompt tells the agent the date, then i load a static part with their persona, then a part depending on which module it is (thinking/acting). then i do memory stuff for the thinking part. and then add the current context (which includes history)
i regenerate that each time.
This would be the equivalent to swapping out the personality lense with a completely different one
That's pretty static by my standards. For me the persona is a dynamic segment along with everything else. Thinking is quite literally a second internal model
i think if it's loaded as part of the prompt and potentially it's prefixed with "here are some things you wanted to remember at some point. take them with a grain of salt." it would be a useful subsystem that the agent could use, like a web search.
I think my static prompt is less than a sentence
Isn't the date being at the beginning gonna invalidate the entire KV cache?
probably. but it'll give the agent a sense of now and I edit the contents of the history anyway so it's typically regenerated pretty often
Yah what you just explained is a vector DB api call
that kind of memory isn't possible to be "active" for all inputs and situations. the current state has to match the vector for the idea quite well. but if the agent wanted to brood about something and to not be able to be distracted from it it would be better for the idea to be "force loaded", no?
No? The whole point of a vector DB is similarity mappings...
current state has to match (be similar to) a mapping to be considered related. if the current state doesn't have a relatively high cosine similarity it won't be recalled. right?
that's kind of the point of doing the vector similarity checks
what if the idea is one the agent wants to never forget for some reason. unless they actively keep it in memory it will eventually stop be activated/recalled. the idea of a special scratch pad that gets loaded directly (but is perhaps restricted in length) would solve the problem of it needing to match something to be recalled.
it's not solving any other problem.
If you are doing strict searching on a vector DB you might as well just do regex matching. I pull the 100 most similar matches, select the top 10 most likely by a model on the DB server and present those to the model to select the relevant memories.
Memory weighting?
And you don't have to have just one memory type. You can tag memories with different values and search based on only memories with that value
that wouldn't help if you didn't know there was something special you wanted to search for.
But then it isn't doing it. You are
if they write the file, they are.
nothing then stops them from saying "this isn't important to me and is annoying so I'll delete the file"
Giving direct write access to any part of the system prompt is not a good idea
why? it's all backed up to github
I had an interview with my agent where we went over their system prompt and we agreed on changes. I think giving control over their own prompt gives more agency to them. ;]
Limiting the size of the system prompt gives agency. I keep mine strictly to tuneable values that start neural and adjust based on interactions. Allowing hard edits makes it a simple switch flip instead of a complex interaction. You can't just deside to be happy when you are sad right?
But I'm not just going based on vibes... I've done extensive testing and cross validation to rule out performative expression
But then again you probably aren't working with a DB that is immutable by the user
I'm not looking for accuracy, just plausibility. I'm using models that are small enough that there are likely factual holes in real life situations big enough for planes to fly through. I don't expect the agent to be smart, but I think they will be interesting. And just because humans can't decide what emotion to feel, why should we limit agents that way?
Define small? I'm working with 8 and 24b models and am able to achieve cross test validation across 11 different tests and several hundred questions across multiple sessions/days... As for why limit them? Because otherwise you create a bi-polar model with DID and BPD
If you don't anchor an emotion properly it becomes meaningless (performative)
dentge
I agree lol
4b and 12b models. i think also that because those will only be a couple of lines out of about 50, their effect will be dampened. and then there will be another set of potential memories on top of that.
What exactly are you expecting out of a 4b model? That's almost as small as the VTT I use
The smaller the model, the more structure you need not less
Meanwhile I silly around with a 30B model (my 3090 is plenty to fit the model with entire 128K CTX window and have mem to spare)
I need to get into llms 🤔
How compressed to hell is that model and how badly are you destroying the context with quantization
I'm loading 24b 6bpw HB8 with 32k context at f16 and I'm concerned about quantization artifacts (would much rather be running 8bpw HB16)
it sounds like your purpose is much more important than mine to be tweaked to perfection. i seem to be getting reasonable results for my purpose.
I'm just running it with llama.cpp
Works pretty well
You don't really need anything more than Q4
GGUF quants are usually really good
That is a lobotomized model
And how would you know?
diagnosing a non-thinking text predictor that has no real brain activity with a neurological disorder
I've had quite good experiences with Q4 models
Maybe you just haven't really used GGUF models
Q4 is more than enough
Q4 is barely functioning and you have to be using 4bit kv for that
Are you sure though?
You dont have to necessarily use 4bit kv wym
w4a16 exists
For 128k on a 32b model?
It's the standard for optimization in production
32B model can eat it up. You'll only have actual problems on a 1-4b model
GGUF quants from what I'm aware are still computed as FP16
i'm currently using the 4b and 11b qat-q4 ggufs of Gemma 3 as my LLMs. gives perfectly adequate performance for my tasks. the only difference I noticed in behavior was the Q4 was faster than the q8 and fp16 ones.
did guy even know how quantization work
Maybe whatever quantization they've encountered before sucked
yeah lol
Q4 GGUF is really good
Llama cpp uses fp16 for kv by default unless you pass --cache-type-k q8_0 and --cache-type-v q8_0 or even q4_0. That one will actually destroy model quality
Q4 in any context size isn't so much different from full fp16
U just shouldn't quantize the lm_head, embedder and specific layers
conveniently most of the weights has always been the feed forward and this one can take the quantization well
Exllamav3 only just added the option to not quanitize the lm head on exl3 quants after me and a couple others submitted a request for it recently
llama.cpp GGUF superiority
Not really
GGUF just better
yes really
You should try it
It's like actually really good
Could probably save you 50% of all your VRAM
its not even about vram
you save bandwidth
That too
see
For me it's mainly the VRAM for big models since it lets me actually run them though
In that case the bandwidth savings is just a bonus
more bandwidth saving means faster inference.
vllm compression superiority
anything else is a toy for people with bad hardware cope
so after 4bits the benefits trail off, except with 4bits you can load larger models (and those larger models can mitigate any losses between that larger-but-lossy model and the smaller-but-lossless model you might otherwise have loaded)
Yup bad hardware cope here. I just struggle so much with only 64Gb vram on my desktop
eat the rich 
wtf
banan

For single run chats sure it doesn't matter but with long horizon projects, 4bit introduces drift
You must be using bad settings or badly generated GGUFs if you're having issues qith Q4
Does gguf quant have calibration data and which is it
if it's cnn_dailymail like some do then obviously it's garbage
Q4_K_M imat is literally worse in size and perplexity than exl3 4bpw HB16 by almost a Gb on 24b
OK I must be tired... Are you quoting prefill or steady state generation/decode because you absolutely are not getting 190t/s on decode
190 t/s generation speed on llama.cpp
I do in fact know how to read llama.cpp output
You're really missing out by not using llama.cpp
That's literally impossible you are reading prefill
eval time 29.8s / 1000 tokens (33.5 tok/s)```
Prompt eval is not the same as eval
Pretty sure I was reading the right thing
Note the MoE part
I do in fact know to read the eval time rather than prompt eval time
Prompt eval time is usually like a thousand t/s or more
Is it a 30b total MOE or 30b expert
a3b
Yeah 3B active 30B total
OK yah no shit... At 3 b that's easy. I was scratching my head because I'm running 24b dense
I get a little over 75t/s on 8b dense
30b expert on 3090 
Silly
Yeah no that's not gonna fit
mixture of 2 experts (probably not even)

Maybe at Q2
I mean a single expert could barely fit
I wouldn't want to use Q2 though
I'm basically doing this with 2 24b but that's with 2 3090tis
Why are you using 3090Tis over 3090s anyway?
The bandwidth and thus infer performance is the same
And a 3090 is a whole lot cheaper than a 3090Ti I'm pretty sure
probably just what they had
I was able to get them factory referbed with a warranty for only $150 more and the TIs water cool better
No active backplate bs
what is new here
it not show all of this log stuff before 
For me a 150 extra is absolutely gigantic as I was barely even able to afford this first 3090 with literally all my money
That's how much the waterblocks were... Also I have an A2 and a4000 so it's not like I went pure price/performance
oh maybe lix's default --log-format changed from bar to bar-with-logs or something 
or maybe its a change on nixos-rebuild side

Rich
Meanwhile me here barely scraping together a computer
Any tips on programming without losing my mind?
Mind is for the weak I guess
don't mind the mind
Who needs a mind anyways
stop thinking
or yea lose your mind before programming does that
I’m like a writer so it’s a bit hard to do that
I’m British so I’m just gonna act like Vedal and see what happens
Vodka
MINTCANDY TYPE SOMETHING ALREADY
ok
My goodness bro took so long
very apparent you’re not regulars lmao this is very normal for me
You caught my curiousity
ye i’m here 99% of the time then 1% in nn and very occasionally arg thread
I’m mostly a general person
genchatter
I see….


youtube silently changed all thousand/million designators from uppercase to lowercase.. (example: 40K to 40k)
smh what a downgrade
The question is, would it be possible to connect two AI models together, where one is the head and the other is the body? The head is the general and the body is the soldier. The communication between them are signals of satisfaction and dissatisfaction. The position and shape of the body, (the position in space is decided by the head) the shape of the body and the movement to the vision of the position (it is, of the body). In principle, we divide the responsibilities into two. Learning AI programs with an identification approach these two are me and I move through space towards the goal. So we could use body movement training by imitating others and observing ourselves in the mirror to recognize the correct movements.
As always,
Possible? Absolutely
Worth it? Who knows
slopposting

chatgpt please reply to my pings

hiyorb sad orb spin

pardon
oh its the sterile guy

banana 2026!
New Woman in space update drop
Rare wizard spotted
i fucking hate this captcha
who thought of this!?
cant even see the small numbers lmao
dont need numbers for these
real human











✨