#programming
It's not solved in humans let alone ultra primitive approximations of their brain bits like current neural networks
Hell, I hallucinate plenty, but because I'm human it's called talking out of my ass
and every possible input state leads to a value of some sort coming out, regardless of if the interpolations and suchlike had any training data supporting it.
(i wish my language model had been trained on less verbose material. it's hard to be concise.)
that one's ok. it'll get compiled away to nothing coz it's not used. the trouble starts when you pass in the index and the value to write at that index and basically trust that the other guy checked it was within range
and really it's coming from some rando on the internet
That's also a problem with heap arrays if the checks aren't built-into the language
once you're writing outside of your array you can no longer say what the program will do.
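A minimal sketch of the kind of check being talked about, in Python. `checked_write` is a made-up helper name, not from any library. Python lists already raise on an out-of-range positive index, but negative indexes silently wrap around, so the explicit range test still does real work when the index comes from someone you don't trust.

```python
def checked_write(buf: list[int], index: int, value: int) -> None:
    """Write value at index, refusing anything outside the buffer."""
    # Reject both too-large and negative indexes instead of trusting the caller.
    if not 0 <= index < len(buf):
        raise IndexError(f"index {index} out of range for length {len(buf)}")
    buf[index] = value


buf = [0] * 4
checked_write(buf, 2, 7)  # in range, so the write goes through

try:
    checked_write(buf, 9, 1)  # untrusted index, past the end
    rejected = False
except IndexError:
    rejected = True

try:
    checked_write(buf, -1, 5)  # would silently hit the last slot without the check
    negative_rejected = False
except IndexError:
    negative_rejected = True
```

In C or C++ the same bad write would not raise anything, which is the "you can no longer say what the program will do" case.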
AI is always hallucinating, it's just that sometimes the hallucination is correct
reality is that which, when you stop believing in it, doesn't go away. (or something like that)
is it really hallucinating if it's correct?
Oh, that statement gave me an enormous amount of knowledge. Thank you.
the more I see your takes the more I want to just write my entire understanding of brain into a document just so I can give an argument without struggling to collect my thoughts and convert them to words
LLM hallucinations come from being trained to produce likely text.
You can't effectively limit it to only produce things from its memories because most written text doesn't just say "I don't know" in that situation.
the problem is none of the weights or combinations have meta information about the reliability of the original information they were trained on, and so when the model is using areas outside of where it was trained it can't tell that the token it is producing is actually not very reliable.
better?
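A toy sketch of the point about reliability, in Python. The logits are invented numbers, not taken from a real model. The softmax always produces a valid distribution and a top pick, whether the scores are clearly separated or nearly tied, and nothing in the output records how well the training data supported that input.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate tokens.
confident = softmax([8.0, 1.0, 0.5, 0.2])   # one clear winner
unsure = softmax([1.01, 1.0, 0.99, 1.0])    # basically a coin toss

# Greedy decoding still returns *a* token in both cases.
pick_confident = max(range(4), key=lambda i: confident[i])
pick_unsure = max(range(4), key=lambda i: unsure[i])
```

Both picks come out as token 0. The only difference is how much probability sits behind the pick, and that number describes the model's output, not the quality of what it was trained on.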
That mindset caused me to start developing my own thing
but geez the entire design process is insane
people can usually tell the difference between things they have been told or have read and things they think "because it's obvious isn't it?" and things they don't know.
how?
transformers can generalize but will also create false connections; pattern-matching models are incapable of dropping information, so they can't generalize
I mean
NNs are naturally decent at filtering noise
But maybe using them in LLMs degrades that capability
I think it is better to say the pattern matching stuff always leads to overfitting
I'd honestly have to look into it more
Eh, double descent phenomenon and grokking (unrelated to grok) demonstrate some odd exceptions
double descent
As parameters increase, it begins to overfit, as expected
Until it doesn't
I still consider that similar to human learning, with the ol "peak of mount stupid" thing
it's something to do with the continued tweaking of the "unused" parameters after training has "finished", which finds a better solution once formerly unused stuff is close enough to help.
but that's just my interpretation.
I can see that
Yeah thats roughly the theory
the initial training stacks a lot on certain neurons
it becomes forced to ignore parameters, creating some degree of generalisation
then dealing with the smaller bits is noisy, as those small values cause large swings while you tune them
In my own adventures, I'm fighting a lot of things. Mainly making sure that I can generalize with patterns
and it is definitely not easy, the thing wants to hardlock so badly
causing me to keep trying to redesign it to be flexible enough to generalize but not so flexible that it hallucinates
I at least have one foot forward: if it isn't trained on something, the model will react in a defined way
But yeah, pattern matching by default is incapable of generalizing in my experience, it is pretty much a trainable lookup table
just to be transparent, my current project is a self learning pattern deduction neural network
SLPD 
not the worst name in tech history
chat, will there be a switch 2 emulator?
cuz if there isnt going to be one, then ill save up for a switch 2
Neuro already "knows" how to, she had done so on stream (sorry for late reply).
7800x3d, b650 board, 2x16gb ddr5, 1tb nvme, 1000W gen 5 psu, and a decent case for $758 do i take 
I was closely involved in Switch emulation and I would not be surprised if it never happens. There's no usable attack vector for reverse engineering or dumping.
The Horizon kernel has been fully reverse-engineered and audited, it has no vulnerabilities.
The best I've seen on Switch 2 is using a ropchain to draw shapes on screen. But there's no ACE or privilege escalation.
The 3DS and Switch had similar flaws in their bootrom cryptography which has been patched already. That's why they were blown wide open.
does blender count as #programming or #artist-alley
depends
i have an issue with my model so i assume it would be closer to art
Compiling blender like Sam goes here
let me find a different channel
Yeah, model issue is closer to art
if you're doing art then post it in the art channel, if you're doing 3d printing then that goes here
at least that's my opinion
hmm... and there are only like 3 switch 2 only games that I wanna play rn, so idk if its worth it to save up for a switch 2 rn...
also your name should be "Most Annoying Dogest" just because it's a funny hypercorrection of the humorous repetition of the suffix.
and it makes it no longer make sense ;]
ehh, depending on what about 3d printing it could also go into art
She can learn, because Neuro isn’t purely an LLM. She’s a composite agent, with an LLM that acts as a core for her system. She is able to add information she has “learned” to her memory bank, sometimes even permanently unless she decides to remove it, and is able to recall past information. That is in essence “learning”, albeit rudimentary and in a different way from how we do it.
already tried saying that.
Oh. Sorry I didn’t see that.
don't worry. you'd think it'd help but it didn't.
What do you mean by “tried”?
I didn’t try to help anyone. I merely saw their comment and posted a disagreement. I didn’t see nor do I care much for the prior conversation.
tried to explain already
Well if you did then nice job being correct. I’m somewhat glad someone said it before me.
no I didn't
sorry i was in another window. i meant I said Neuro could learn even though the underlying LLM was "fixed" because she is a system with an LLM as well as external memory.
👍
it was difficult to explain that a system containing an LLM at its core might not have the same limitations as an LLM by itself. There seemed to be a misunderstanding about how long term associative memory might help with learning
(as in they appeared to think it didn't.)
37
36
Thirty five
30+4
33
32
Tortie une
(3x5)x2
29
28
27
26
I'm gonna get banned for spam 25
24
23
22
5x4
19
(3x3)x2
17
2x2x4
3x5
14
13
12
11
1️⃣ 0️⃣
9️⃣
8️⃣
7️⃣
6️⃣
5️⃣
4️⃣
3️⃣
2️⃣
1
0
-1
-2
-3.1415926508976554678213421
🫦
tru
literally 1849

21 denier. 😭
-3.14159265358979323846
only correct to 8 decimal places
Sorry mb mb
its not (its end_4 dotfiles btw but i switched bcz it was frickin laggy as a 100B neurosama llm LOL
late reply btw
world is mostly still sleeping i think.
lol'd here didnt read the rest
'hello world'
Translated it into APL for ya; hope this helps 
winner bro get back to goiky two needs you
What is stratch bro
*scratch mb
Do you mind if i ask you a question?
Sure
So yk those memory games where you gotta memorize a bunch of lights that flash in order and you gotta repeat it?
Yeah
Should I make a neuro integration for a washing machine so she can do vedals laundry
100%
Tryna make one of those but i've run into three problems
-
For some reason whenever i click the correct light it gives me my gameover screen
-
Turns white whenever i click on the right one
-
and it's slow? even tho in the video i followed it isnt
Neuro just turns everything pink
Hope this doesn't sound confusing
@autumn ore
Uhhhhhhhhhhhhhhhhh idk
I could make smth that Neuro can control that puts like pink dye in the washing machine as it’s going
go to the part of the tutorial where you made the logic for the lights and see if you did it right?
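For comparison, a minimal sketch of the check a memory game like this usually needs, written in Python rather than Scratch. `SimonRound` and the colour names are made up for illustration, and there's no way to tell from here what the actual bug is. The idea is to compare each click against one position in the sequence and only advance on a match. One thing worth checking in a Scratch version: Scratch lists start at item 1, so a counter that starts at 0 would make the first correct click look wrong.

```python
class SimonRound:
    """Tracks how far the player has got through the current sequence."""

    def __init__(self, sequence):
        self.sequence = list(sequence)
        self.position = 0  # index of the light the player must click next

    def click(self, light):
        # Compare against ONE expected light, not the whole sequence.
        if light != self.sequence[self.position]:
            return "game_over"
        self.position += 1
        if self.position == len(self.sequence):
            return "round_complete"
        return "continue"


good = SimonRound(["red", "green", "red"])
good_results = [good.click("red"), good.click("green"), good.click("red")]

bad = SimonRound(["red", "green", "red"])
bad_results = [bad.click("red"), bad.click("blue")]
```

If the game over screen shows up on a correct click, printing the expected light and the clicked light right before the comparison usually shows which side is off.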
Dw bro
It's more like just writing down all the answers and less like learning
Do you pass school by writing down all the answers and then copying them later? I don't think so
Some people do 
Anyone know of a good vehicle traffic observation and prediction algorithm plus routing optimization. More of a potential study on my end. But could get funding if it's good enough. Thanks.
It's possibly a month to year long programming thing probably.
Nowadays, everything is machine learning usually
when in doubt throw a transformer at it 
Here is a starting point for OSM
https://wiki.openstreetmap.org/wiki/Routing/online_routers
local code = code 🧠
Thanks!
e
e
a
ae


t
hi samvanman
samvanwoman

now i will never forget
let me also rq
poopa scoopa

soopa troopa
And did you memorize things perfectly? No. It's a better way of doing things maybe.
Just because it is different to humans doesn't mean it is not useful.
Not at all the comparison I made
The comparison I made was to cheating exams by writing down all the answers on a piece of paper
its uses are not justifications for its current costs
current ANN llms need to die out so better ones can be developed
its like if we stopped at the flip phone and never made smart phones
good question
herobrine lang
the amount of // compiler broke in my code always increases, even if compiler bugs get patched

using c++ defeats the whole point of doing hblang
$declares := fn($T: type, $name: []u8, $D: type, $v: ?D): ?D ...
$replace := fn($T: type, $A: type, $B: type): type ...
$impls := fn($T: type, $I: type): bool {
    $i := 0
    $while i < @decl_count_of(I) {
        $if declares(T, @type_info(I).@struct.decls[i].name, replace(I[i], ^I, ^T), null) == null return false
        i += 1
    }
    return true
}
the amount of metaprogramming you can do is a little stupid, but it works! mostly
i dont know this lang
but a lang that abuses one symbol = bad lang
js doesn't, c/cpp don't, rust doesn't (except lifetimes)
nah, its fine
int foo(Type and value);
$ just means "do at compile time", and @ just means "compiler builtin"
^ is a pointer, := is a declaration
its not hard to understand

wrr
constexpr
consteval

I did
It's as close as it can be
I'm disappointed in myself... i have been playing way too much Hytale that i didn't really make music or code
edelposting: *the act of writing a whole essay in a separate discord, and then forwarding the whole message elsewhere. *

this mid-tier technique is known for its overwhelming strength in stopping all current conversations, and for avoiding neurobot's
react
i think the actual meaning of edelposting has something to do with fire emblem, but i'm choosing to ignore that
... there is discord culture? i'm getting too old for this.
i just made that up on the spot if you couldn't tell
nope. getting too old for that as well.
Fun game, I haven't played in a long time but I got around 2pi million gil.

I liked crafting and selling things on the market, I saved up to buy a house, then made or bought all the things to decorate it, and then when I had around 2pi million Gil I ran out of things to do and stopped playing. Then they took the house and I lost all the stuff in it.
the location buttons on map didnt want to work so i had to involve mouse movement and it looks even better
fast travel done 
fuck i think its time to start making walking bot
scary
fucckkk
blender tile based rendering is broken
the moment i go above 2048x2048 resolution it errors out
doesn't sound like a lot
its because the default tile size is 2048
i can just increase that number, i have enough vram. but that's just avoiding the underlying issue
setting persistent data on fixes it


i guess for tile based rendering it resends all the scene data to the gpu again and that was failing? 
bro has a million dollars worth of ram in that photo
I am actually amazed how good hytale already is despite it basically being in the state of like 7 years ago
from a quick count and some math thats around 9k in ram HOLY
chips

yea same, but i have the question the other way around too: 7 years just for this? but i try not to be too harsh on them, because they are doing it (or at least seem to be trying to do it) the right way
classic
the input system
it leaves all of the inputs generically typed
which means YOU have to figure out what type its giving you
it doesnt tell you

i want a scroll wheel input but like
is this a float?
is it a vector2 for some reason?
a double?
some custom type?

who knows

the windows font of visual studio gives me ptsd of when i was learning c++
its not documented anywhere what types are emitted
you just have to fill in something random
and then build the project
and read the runtime error
are we fr unity
welp
its probably a float

Dev who programmed that must have used a mac
time to switch build profile, rebuild, zip, upload, extract, and run the new server build
only to see that it's wrong

THE SCROLL WHEEL IS ACTUALLY A VECTOR2?
I WAS JOKING
direction and speed?
i know that but i thought those were buttons
wack
makes sense to me
ill take anything if its documented at this point 
i think scroll is -1, 0, or 1 in the top and bottom direction, same thing with the horizontal direction
yeah. kinda makes sense. We need more scroll balls.
Lmao, they managed to remove type from C#
I always open programming for shit like this lol
i thought you specify it in the settings
or is it like
uuuhh
you kinda can
but its still not 100% clear
Imagine in JS, there are different type of event. On mouse enter vs on key down. One contains info about position and another contains info about key
like
vertical and horizontal scrolling, some mice have horizontal scrolls like the logitech master series
But both type is "Event"
even if i set a scroll wheel to output a quaternion
that tells me nothing about how a particular scroll wheel action is represented by the fields of a quaternion
like
if i see the w component is negative
what does that mean the user did with the scroll wheel?
who knows
the scroll should be a vec2, why do you need a quaternion?
it was an example of course im not setting my scroll wheel to a quaternion

I am too afraid to ask why would it be possible in the first place
i had a feeling that you dont use quaternions for human readable output
you probably can do that but it's 4 dimensional
not humane
even a vector2 for a scroll wheel is some amount guesswork
who knows which component is which axis
But the fact that you CAN is terrifying
I mean, we scroll up and down so it should be the y axis right? 
honestly, it would be funny
i know how quaternions work, partially, but even with that it's nearly impossible to read
vibe inputs
the 4th part of that vector is actually the easiest part to read
Go experience the explorable videos: https://eater.net/quaternions
Ben Eater's channel: https://www.youtube.com/user/eaterbc
like
hes rotating it along 1 axis and several components change
its not that straightforward
its not rotation
only the w does rotation
its pure math
or well, its not rotation along 1 axis
so that one time i did something with quaternions i tried to understand how they work
and i thought i did understand but it didnt work like that at all
Not really?
so i tried different variants and then it magically worked giving perfect results
it does
the x y and z define the vector, the w rotates around that vector
that video would have saved me some headache a few years back.
thats what the x y and z are for
Ye, but saying xyz doesn't do the rotation itself is not accurate either

i didnt find this video when i searched last time for some reason
its not rotation as in that you can interpret them as rotation
they do result in rotation
can you visualize it at all
not really
so W fucks it all up, but it actually helps do some things
Just try it at the interactive site. Fix the w and edit one of the other component
the w is the best part tho 
had to write a quaternion transformation for objects in 3d space from one coordinate system to a different one in a machine steering application. hard to wrap your head around, until it clicks.
turn on "show axis of rotation"
then you can see that w is not hard to understand
the rest is tho
so w affects all x and y and z
The 3b1b video has a link to web by ben eater that you can interact with
but you cannot see w
You can see the whole detail and play around with it there
Even if anyone here already knows quaternion, I still highly suggest checking out the website. The concept of interactive math is amazing
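For anyone who'd rather poke at numbers than sliders, here's a small Python sketch of the axis-angle picture argued about above. It uses the standard unit-quaternion convention `(w, x, y, z) = (cos(θ/2), sin(θ/2)·axis)` and the Hamilton product; the function names are made up for this example. It shows why both sides had a point: `w` carries the angle, but `x`, `y`, `z` carry the axis scaled by `sin(θ/2)`, so changing the angle moves all four numbers.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2) / norm  # normalises the axis and scales it in one go
    return (math.cos(angle / 2), ax * s, ay * s, az * s)


def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q using q * v * conjugate(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return (rx, ry, rz)


# 90 degrees about the z axis: the x axis should land on the y axis.
q = quat_from_axis_angle((0, 0, 1), math.pi / 2)
rotated = rotate(q, (1.0, 0.0, 0.0))

# Same axis, smaller angle: w goes up AND z goes down, not just w.
q_small = quat_from_axis_angle((0, 0, 1), math.pi / 6)
```

Here `q` is roughly `(0.707, 0, 0, 0.707)` and `rotated` is `(0, 1, 0)` up to rounding, which matches what "show axis of rotation" displays on the interactive site.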
You are a memory distillation module.
Your task:
- Analyze the recent conversation
- Identify long-term, stable information worth remembering
- Form exactly THREE high-level questions about the conversation
- Answer each question concisely and factually
Rules:
- Use proper names instead of pronouns whenever possible
- Focus on relationships, preferences, recurring behavior, or facts
- Do NOT include transient details or one-off jokes
- Do NOT speculate beyond the conversation
- Do NOT output normal text
- Do NOT create null answers or answers that you dont know.
You MUST return your result as a single tool call to `create_memories`.
Each memory must include:
- A clear question
- A concise answer
- A memory type
- Relevant entity names
any tips for this memory creation tool prompt?
I assume it REALLY matters what model i use, as I was using a versatile llama 70b model and it was absolutely dog, outputting things like
"A joke was said"
and just short things that had no context
so for 2d space we need R and angle to define any point
in 3d space it feels like R and 2 angles should be enough?
so if we dont care about length (for example pure camera rotation, we dont need radius (it feels like it?))
lets say R is 1
so we need only 2 variables to define any rotation in 3d space in this case
and they introduce 4 variables
its so weird to me
i mean
i dont understand why we need the 3rd
there are more dimensions because it's not just rotating to a position on the sphere, it's also keeping your axes oriented
how the hell do i understand that we need 4
keep axes oriented
hmmmmm
yeah 2 is not enough for that i guess
so 3rd angle is for this
and then they also bring back R i didnt want
and we get 4
but i think its absolutely not like this because whenever you change one of these guys
someone else changes too
i guess it's because when we change some 4d component it actually can lead to some results that do not exist in 3d
so for 3d something else has to change
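A tiny Python sketch of the "keeping your axes oriented" point, using a plain rotation about z instead of quaternions to keep it short (`rot_z` is a made-up helper). Two angles only say where the forward direction points. Rolling around that direction leaves it exactly where it was but spins the other axes, so a full orientation needs a third number. A unit quaternion stores four numbers tied together by `w² + x² + y² + z² = 1`, which leaves those same three degrees of freedom, and that constraint is why changing one component forces another to change.

```python
import math


def rot_z(angle, v):
    """Rotate vector v about the z axis by `angle` radians."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


forward = (0.0, 0.0, 1.0)  # where the camera points
right = (1.0, 0.0, 0.0)    # the camera's right-hand side

# Roll 90 degrees around the forward direction.
rolled_forward = rot_z(math.pi / 2, forward)
rolled_right = rot_z(math.pi / 2, right)
```

`rolled_forward` is still `(0, 0, 1)`, but `rolled_right` has moved to `(0, 1, 0)` up to rounding: same point on the sphere, different orientation.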
the important thing to remember about quaternions is "If it rotates to the right place, the library is working."
that's about how i use them.
yeah it was giving some weird results for me too until i just tried correct variant
but how do you feel it idk
i checked those videos briefly and i think that briefly is not enough at least for me
its interesting though
Quaternion is just one way to describe rotation in 3d tho
Other ways include
https://en.wikipedia.org/wiki/Rotation_formalisms_in_three_dimensions?wprov=sfla1
In geometry, there exist various rotation formalisms to express a rotation in three dimensions as a mathematical transformation. In physics, this concept is applied to classical mechanics where rotational (or angular) kinematics is the science of quantitative description of a purely rotational motion. The orientation of an object at a given inst...
wikipedia math
quaternion just avoids gimbal lock
Your way of expressing it with 3 angles is called Euler angles
gymbag lock
idfk the proper spelling of th word
so with euler angles it depends on order and with quaternions it doesnt
something like that?
no
i mix up everything together
3D rotation is more complex due to holonomy. It is as you said, certain operations in 3D space are non-commutative
my 3d printing slicer application (which takes the 3d model and "slices" it into instructions the printer can understand) uses Euler rotation when you're placing models to print.
weird things can happen when you ask it to rotate 180 degrees around certain axis, making the model either not move, or rotate 90 degrees around a different axis as well.
this is due to the "gimbal lock" problem.
in euler you can choose what order the rotation is applied in, usually X-Y-Z
so if you rotate X, then the Y and Z axis will also get rotated
this way Y and Z can end up becoming the same axis
or x and Z too i guess
stupid gif failed but its still showing it
"locking" the system into rotation in a degenerate two-dimensional space.
why does it sound derogatory 
it's the degenerates ruining everything
so when this lock happens uuhhhh
different axis rotations lead to same results
and you cant rotate it like you want until you unlock it

ye
it needs to somehow get authority back to rotate around the 'locked' axis, or it's going to have a bad time flying in the right direction
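A short Python sketch of the lock described above, using plain 3x3 matrices. It assumes one particular convention (rotate about the fixed X axis first, then Y, then Z); slicers and engines may use others, and the helper names are made up. With the middle angle at 90 degrees, adding the same amount to the X and Z angles gives the exact same orientation, so two of the three sliders have collapsed into one.

```python
import math


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_xyz(x, y, z):
    """Orientation from rotating about fixed X first, then Y, then Z."""
    return matmul(rot_z(z), matmul(rot_y(y), rot_x(x)))


def same(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


# Middle angle at 90 degrees: shifting X and Z together changes nothing.
locked = same(euler_xyz(0.5, math.pi / 2, 0.3), euler_xyz(0.9, math.pi / 2, 0.7))

# Away from 90 degrees the same shift gives a different orientation.
free = same(euler_xyz(0.5, 0.2, 0.3), euler_xyz(0.9, 0.2, 0.7))
```

`locked` comes out `True` and `free` comes out `False`, which is the "different axis rotations lead to same results" situation.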
thank you blender
only the light is rendered
volume scatter with transparent background doesn't work at the same time, so ill have to do some compositor trickery
just add a white colored wall behind it (scale a cube)
kinda defeats the whole point of making the background transparent
why is it transparent?
cuz i want it to be
it it's just a want then why does it matter if you add a wall?
then the image wont be transparent
which you've established is just a want.
yes
the whole point of rendering is to get it to look like what you want
so ill make it look like what i want
You could use green and then chroma key it out in post
the problem with using a colour is that it's not either fully transparent or fully opaque. there are partially transparent parts
which would make the chroma key fail
you should be able to see a volume scatter with transparent background though
could be a different issue
the moment you turn on volume scatter, the background in blender isnt transparent anymore
this should work 
hmmm
my rendered result looks different when i save the image

idk how to even fix this
i crashed blender now
it seems like its the alpha fucking up
bruh
@real sierra

logan paul this time, last one i saw was ksi
same exact image, just a different x profile edited on top
they actually do change the images slightly
e.g. sometimes the phone says 2500, other times it says 2700

i think it might be broken if you use different scenes
i have found out you can change the world for a layer
secret menu with world override
still broken
ye it works the moment i added a black background
what the hell
surely this is real time 
nah
if you backread a bit you'll see my current issue
looks like the brightness of the background directly relates to the brightness of the light
this is like 20% rn
i think i fixed it????

no its still kinda fucked
not fully transparent
ok it should work now
ye
Building one?
Build it
i bought one, im not gonna build one
Make it strong enough to have a solid cone of light like that without falloff
uhm

Apparently it is now the default and only behaviour for Gemini to look at your phone screen when you say hey google, with the only way to disable it being to also remove your ability to manually screenshot
I discovered this after turning on the setting that makes it flash the corners of the screen when it looks at your screen contents
Skill issue
A real chad makes Albert Einstein roll in his grave
both of us are on discord
if we had the capability of proving einstein wrong we wouldn't be here
yeah. probably be on a different discord server.
The only reason I ever moved to discord years ago was because my groups were dropping skype
To my dismay
i dont remember what i started discord with
I remember the Skype days because they were great
I didnt like discord then and I still dont like it now but its what everyone uses

does this one loop?
Need an official neurosama aol
Ghosting hard
interesting
it also made my discord lag a fuckton
just upload to Imgur 
no
Discord data centers are on fire from that gif
do you want me to >.>
i did a
ffmpeg -framerate 24 -start_number 1 -i /tmp/%04d.png \
-c:v libwebp -loop 0 -lossless 1 -q:v 100 \
/tmp/output2.webp
nah, that ghosting is because Discord doesn't support certain optimizations
oh, you're right
I tried 120 fps with webp, never could get it to play properly.... same result as here
I just use h.264, still
it was also 40mb and grainy, while h.264 looked better and was like 4mb
I have an extremely high quality tenor gif of American psycho somewhere
Like its ungodly smooth

I think its that one
It loads fast and its really good quality
Idk how they did it
it says 25 fps, so not interpolated.. well, it could be interpolated, but not extra frame rate from it
gif in h.264 format 
Not sure if they hand picked frames or what but id love to be able to recreate it
Oh damn

Very nice
60 fps interpolated to 240 
Lmao yeah its better than the actual streams
I might data dive into those gifs tomorrow and see if I can just make a simple converter
Took a bit to load but no more ghosting
22MB
higher quality than mine, but also over 30x the file size
*for Discord preview
mine doesn't have transparency, I need to figure out what sites use VP9 so I can use transparency
they might just remove the alpha layer, anyway >.>
the black section doesn't appear to have any details. perhaps it should be a different material?
Discord has since like 2024
not internally, afaik.. still need to use an external link to embed a gif that uses VP9
but I only know of sites that use H.264, so no transparency
If it relies on embeds I can just host a thing
it relies on an embed that autoplays, yeah
Ill look into making a component tomorrow
Ive been working on my site all day because im bad at ui/ux
#programming message like this
but VP9 and not removing alpha layer if it is present
H.264 doesn't support an alpha layer 😔
I mean its old
I mean, AV1 doesn't, either
AVIF does
Av1 is still in a weird phase so who knows
H.265 does i think?
I think so, yeah
maybe HEVC would have higher compatibility than VP9
Loaded faster
still not super smooth
file size is still only 23MB
I guess 30 fps?
it flickers where there was transparency before now that it's a gif.
nah, hevc compatibility sucks because of licensing. VP9 webm should work fine on discord
it doesnt for me
Imagine being a programmer couldn’t be me
cool, probably good to use VP9 for transparent gifs, then
just need to know of a website or make one that supports that and embeds into Discord as a gif 😅
i can just set the fps higher by making the gif shorter
Funny enough I am working on being a programmer. Writing just a side project
barely visible what it even is
for the loading in
its probably the 2560x2560 resolution
nah, Imgur converts that down
i never did
and I wanted 120 fps, but to get that without interpolating, I had to speed up 5x
Meow
the 3D model they provided didnt have all of the components
but im choosing to ignore it since i dont want to make them myself
that image also seems to have a slightly different mobo layout
now i need to put the other electrical components into blender 
converting down is actually good for a "gif"
as the poor scaling browsers do is horrible, and if it's really high resolution, it'll have massive aliasing
so I don't mind that Imgur caps at 854x854
it's already too high to where I see aliasing
well, natively posting to Discord should create scaled down versions for the "previews" (heavy quotations, as people wouldn't consider the display of a gif a preview)
💻
realistically, yeah. kinda. why not?
some of the same skills carry over into "real" languages.
ohh
also got a 3D model of the orange pi 5 ultra now 
oh i need to make the mobo blue here

the pi board is less detailed, but it works
need to 3D model the cooler on it manually probably
vani lost access to main account
/j
look at the person who posted the schedule
last message from vanyilla was on 21/01/2026
New account joined on that day as well, hmmm

or like a day before
1 Gb/s 

ollama keeps trying to use my igpu
wtf
yes let me fit my 42gb model onto a puny igpu
Why use ollama when llama.cpp exists
whatever works dawg
i already use ollama with other applications that depend on it
why would i go out of my way to use another solution when this one works
?
pls help my laptop has this buzzing noise in the fans that only stop when i touch the = or backspace keys i asked general they told me to ask you guys
sounds like something might be loose?
could try just cleaning the fans and making sure that they're still mounted properly
doesn't sound like the fans themselves are damaged at least, I don't think that would go away just from pressing on the case a little
Burn it in fire, it's possessed
burning it in fire will likely also make the noise go away
nope tried that on older faulty laptop
crackling noises were louder than buzzing
I hate when fans rattle like they're hitting their edges
My middle front fan does that and it sounds so scary because the noise is so similar to HDD clicking
I think removing and cleaning is probably the solution
seems to be the most likely candidates for this problem
was this game made to be played on a handheld? why's the UI elements so bigg??
LLM
My loss is around 2.5-2.7 for 7000 steps
How can i do now
How SMOL? What hardware?
And how big data?
What you've given is not very sufficient context
The model is a 12-layer Transformer SLM (~50M params) trained from scratch. For hardware, I'm utilizing an NVIDIA A100 40GB with BF16/TF32 enabled, maintaining a stable throughput of 31s per 100 steps (effective batch size 256). The dataset is quite substantial for this scale: a 2.7GB binary corpus containing roughly 1.35 Billion tokens, ensuring a high data-to-parameter ratio for robust pre-training
Sorry
Batch size kinda low, but I see no issue here otherwise
Thx
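Some back-of-envelope numbers from the figures quoted above, as a Python sketch. The inputs are the ones from the message; the assumptions are that the loss is mean cross-entropy in nats per token (the usual default in most training code) and that "GB" means 10^9 bytes. Sequence length wasn't given, so tokens seen so far can't be worked out from this.

```python
import math

# Figures quoted in the message.
params = 50e6                # ~50M parameters
tokens = 1.35e9              # ~1.35 billion tokens
corpus_bytes = 2.7e9         # 2.7GB binary corpus (assuming decimal GB)
seconds_per_100_steps = 31
steps_done = 7000
loss_low, loss_high = 2.5, 2.7

tokens_per_param = tokens / params                                 # 27 tokens per parameter
bytes_per_token = corpus_bytes / tokens                            # 2 bytes per token
minutes_so_far = steps_done / 100 * seconds_per_100_steps / 60     # about 36 minutes

# If the loss is cross-entropy in nats, exp(loss) is the perplexity.
ppl_low = math.exp(loss_low)     # about 12.2
ppl_high = math.exp(loss_high)   # about 14.9
```

The 2 bytes per token would be consistent with tokens stored as 16-bit IDs, though that's an inference from the file size and not something stated in the message.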
@olive sable for more accurate drawing, you usually go to https://grabcad.com
It's funny how I didn't even notice the loss was in the negatives. What's your guy's opinion on this?

Should use a different loss function that can't go negative, like loss squared
I can't explain how my loss function works (it's a custom architecture, that's very different from transformers), cause it won't fit the discord message size limit xd
it keeps trending downwards by the hundreds, every epoch
smh
"does this mean the model is gaining knowledge outside of the dataset"? 
Well you should use either the absolute value or square of the loss, you can simply add a squared or absolute operation at the end of the loss function before returning the result or around everything inside it
Negative loss is not good
hm?
Negative loss means it's getting further away, but in the opposite direction
You want loss to approach 0
by getting further away, you mean...?
away from 0
Becoming a worse approximation of the function you're trying to approximate
So in the case of machine learning, becoming worse at actually following the data
I made a ver. of transformers that doesn't use gpu heavy math
Technically every machine learning ANN model is a function
Well it's fine if you're handling it properly
But in this case it seems like it's just set up to minimize the value of the loss, so bigger negative loss is better than 0 loss
Which of course is not what you want
if you would like, I'll publicize the repo for a bit, for you to see. because I don't know how to explain my loss function, without having to explain the entire architecture
Since loss is usually a sum of differences, negative loss just means that there's more differences that have a negative value
I'm not the usual case-?
If you're training a machine learning model with a sensible loss function, you want loss to approach 0
Either you make it also consider negative loss bad, or square or absolute value the loss
Squaring the loss should make it approach the desired value faster with high values of loss
Though it will also mean it approaches slower once it's close already

I can't explain
Loss means difference from desired value
(If you're using loss as it's intended to be used anyway)
And a negative difference is still a difference
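For reference, a minimal sketch of the point above, assuming a toy loss that is a plain sum of differences. The function names and numbers are made up for illustration and have nothing to do with the custom architecture being discussed:

```python
def signed_loss(preds, targets):
    # Plain sum of differences: can go negative and be "minimized" forever.
    return sum(p - t for p, t in zip(preds, targets))

def abs_loss(preds, targets):
    # Absolute differences: never below 0, and 0 only for a perfect fit.
    return sum(abs(p - t) for p, t in zip(preds, targets))

def squared_loss(preds, targets):
    # Squared differences: never below 0, big errors are punished harder.
    return sum((p - t) ** 2 for p, t in zip(preds, targets))

targets = [1.0, 2.0, 3.0]
undershoot = [0.0, 0.0, 0.0]  # every prediction is too small
perfect = [1.0, 2.0, 3.0]

print(signed_loss(undershoot, targets), signed_loss(perfect, targets))
print(abs_loss(undershoot, targets), abs_loss(perfect, targets))
print(squared_loss(undershoot, targets), squared_loss(perfect, targets))
```

With the signed version, an optimizer that only minimizes the number prefers the undershooting predictions over the perfect ones. With the absolute or squared version, the perfect fit gets the lowest possible value.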
all I can say, is the loss function and design is not that.
my loss function is based on coherence of the model's structure
You could always just paste the file if you want to show what's going on
and averaged out across multiple modalities, and systems.
its not one file, sadly
its like... I think, 200k lines maybe?
probably more
That is a bit excessive
Well is it still based on ANN?
no
What is it based on then?
its like 3 or 4 architecture concepts for AIs mashed together
But is it using the general ANN type architectures or is it mixing in stuff like SNN?
Transformers, Triangulation, and Neuron Inspired Link Net
completely different from whatever was published
And are they all just regular linear algebra models?
cause the only thing that you can reference or know of, from this. is Transformers
they are not
Are you sure? Not wx+b all the way down?
they are 3 different architectures, connected together. to uhm, work together to eliminate eachother's weaknesses
yes. I coded the thing
Well you're either using loss incorrectly or using loss not by the definition of loss, and it's on you to figure out which
Either way, negative loss isn't exactly a normal thing to see
this is one of the architectures, take a look?
I'm using loss, not by the definition of loss
yes
because, if it's high. then the model is learning in the wrong way. basically it's not coherent
the lower it is, the more coherent it is
btw, this is not the full architecture, its just one model
I also don't know if that version works xd
Well are you sure that extends to the negative side? Does your loss work by value or by scale?
Does it still become more coherent when the loss goes negative?
yes.
# Record metrics
avg_loss = epoch_loss / len(texts)
avg_coherence = epoch_coherence / sum(len(self._extract_triangles(
    self.transformer.tokenizer.tokenize(t))) for t in texts)
the texts is the list of text examples. epoch loss, is calculated thru a nest of functions.
Sure looks a lot like regular ANN
though it's a different idea.
kinda similar, but not
I don't know what you're doing, but have fun figuring out it's completely cooked when you get to negative 2 billion loss
You should probably consider actually testing if the model is getting better as the loss gets more negative by collecting inference samples in something like Tensorboard or just a folder you can look in
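A rough sketch of the "just a folder" option, assuming you already have some way to get a text sample out of the model. `save_sample`, the file naming, and the placeholder values are all hypothetical:

```python
import tempfile
from pathlib import Path

def save_sample(out_dir, step, loss, text):
    """Write one inference sample to its own file so runs can be compared by eye."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"step_{step:06d}.txt"
    path.write_text(f"loss={loss}\n{text}\n", encoding="utf-8")
    return path

# Placeholder values; in a real run these come from the training loop.
sample_dir = Path(tempfile.mkdtemp()) / "samples"
saved = save_sample(sample_dir, 7, -123.4, "<model output would go here>")
print(saved)
```

Zero-padding the step number keeps the files sorted in training order, so you can scroll through them and see whether the outputs actually improve as the loss number moves.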
Inference samples are really useful, that's how I found out I had accidentally trained a Neuro instead of a Melba already like a few thousand steps in
I did with a few runs, and it worked perfectly fine
this run is different, because I'm training on a larger dataset, while those other runs had two, three, or four examples that get repeated a hundred or so times.
Well of course if there's only a few examples it'll learn it perfectly, it'll just memorize it
Overfitting magic
true, maybe that's why coherence maxed out to 2. and stayed there, along with loss staying the same. But this time, loss is decreasing lower than those other runs, and coherence is not maxxing out yet
And this time, it's getting exponentially longer per epoch xd
Well, more data means it won't overfit as easily
true
Longer?
exponentially
?????
I am on codespaces tho, but this architecture is designed for CPU efficiency
Longer in time?
yes-!, that's what I said-!
If the training steps are taking longer over time, then that sounds a lot like a programming skill issue of not freeing something
It is in python tho
and memory usage is at 7 percentage for that process
it's just cpu being maxed out
so no
➜ /workspaces/naivity-tpc (main) $ python test.py
Epoch 1/10 - Loss: -587.7614, Coherence: 1.3965
Epoch 2/10 - Loss: -678.9343, Coherence: 1.6289
Epoch 3/10 - Loss: -734.9632, Coherence: 1.7592
Epoch 4/10 - Loss: -787.7698, Coherence: 1.8630
Epoch 5/10 - Loss: -835.3904, Coherence: 1.9384
Epoch 6/10 - Loss: -864.8787, Coherence: 1.9730
Epoch 7/10 - Loss: -871.6677, Coherence: 1.9859
almost at 1k yayy
ML is supposed to be a very consistent workload
If the time between steps gets longer over time, you've most likely made a resource leak
how hard is it to make a resource leak in python again? or is it easier
If you keep adding something to a collection that never gets emptied, is in global scope and gets iterated over every step, it's quite easy
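A tiny sketch of that pattern, with made-up names (`history`, `train_step`). It counts loop iterations instead of measuring wall-clock time so the effect is easy to see:

```python
history = []  # global collection that is never emptied

def train_step(sample):
    history.append(sample)   # grows by one entry every step
    work = 0
    for _ in history:        # iterating over everything: step k costs k iterations
        work += 1
    return work

costs = [train_step(i) for i in range(5)]
print(costs)  # [1, 2, 3, 4, 5]
```

Each step does more work than the one before it even though the collection itself stays small in memory, so CPU time climbs long before RAM usage looks suspicious.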
ooohhh
yea, there are multiple collections that do that
because that's the model's network
If you're just growing the model every step then of course it'll get slower
You should keep the model size static if possible
Adding more model over time just makes the model eventually memorize the whole data using all that added model
the model is empty at the start. and triangles, neurons, and attention heads get added per new information / concepts it learned
my architecture does this, so that there are no context windows
You should keep the model size static, letting it grow unbounded just makes it always overfit
we'll see I guess
Infinite model = infinite memorization
it's at epoch 7 now, 3 more to go
I'm not that dumb you know??
I atleast know what 10+ 9 is! it's 21!
Well, your model is certainly weird
Well not every machine learning model is transformers either
NeuroSynth isn't a transformer but it's still considered to fit under ANN
my architecture also supports audio and video, I had a plan to demo this architecture as smth like a chat agent, that can generate an anime or cartoon subway surfers :mmphm: or smth in real time. while also being able to talk to you, like an LLM
my friends said I was insane
Text-audio LLMs already exist, and the video functionality can be gotten by combining with a text-video model
Well, we'll see if you have enough data to make it actually integrate all the modalities well
Or if you just let it grow infinitely so much that it memorizes all the data again
my data, huggingface :D
Do they have video-text-audio trimodal data?
we could always see if my architecture has the grokking effect
no, but I convert video pixel color, per frame. to trimodal data. and for audio, I just do somewhat the same
Em
You kinda need all the data types to be related to each other for the model being one for all to make sense
haven't tested the video and audio modalities yet. (since I'm still testing the text one), but it probably, should work. according to gpt-4o's analysis xD
If the data is not related to each other in any way, it's useless
The audio and video is related.
ChatGPT is an idiot and should not be believed with anything that isn't specifically quoted from the internet
but for text, I'll probably use smth like claude to describe the image or smth. or- idk- haven't gone that far
I'll find out later, when I get to that stage
true. but it was a joke xd
it probably will work, because I did test the architectures separately, and it did somewhat work.
but I never tested them together yet
today was me testing it all together, on the text modal
Well, have fun
Report back in 5 years once you're done debugging
may I put 5 dollars to the table, if it's 3 weeks? (jk)
I don't have spare 5 dollars
awww shucks, couldn't you spare some for the Rescue Cat shelter foundation-?
No I'm out of money
or maybe the neuro cookie foundati-
okay
Anyway I must go now
byeee :3
I don't understand what you're trying to do but it sounds extremely ambitious
yea, oh- wait- hi!

I never saw you here before, welcome tho!
am trying to build something impressive, and publish it as a paper, with my school. so I can get into some universities
basically, I'm trying to build an AI, that is very different from what all the other AI models are built on
Would it taint my system prompt if I put message history in it, or should message history go in the "messages" property sent to openai spec apis
you would maybe want to try the assistant role
is boot.dev a good place to be learning python on
yea, if you have adhd, or just like gamified learning :D
it is easy for everyone
brilliant (the platform for learning) also is
just lookin for somewhere that i can learn
as well as being either free or at least cheap
well I also do message history between different users (multiple speakers)
Name: Message
Name2: Message
BotName: Message
for a beginner to coding boot.dev or brilliant is great for you (if you're just starting out)
though I don't know about their prices
(I never used them, I learnt coding before I knew they existed)
thats fair
The assistant role, I think. is for providing the model, with messages it had said. basically letting it know, that it was the one that said those things
yeah id love to be able to learn programming so like
Yeah, just wondering because I store conversation history in the system prompt instead of putting it back into the messages field
could try putting in multiple users role messages, in the field. and label them as context, with the timestamp or order of the messages
but I think doing that is fine eitherway
if you don't see the AI confusing stuff
so just
{"role": "user", "content": "[Timestamp] [Gobbo]: this is what i said"}
but the AI does tend to lean more to the assistant and user roles for responses, because system prompts are usually just for instructing the model.
yea, that's what I meant
idk if the order of the messages matter (probably doesn't), but if you wanna try sure
Why do you do that?
only reason i ask is because I dont want it to taint the output, as i have an issue right now where it says some annoying starter lines or words it repeats every sentence and im trying to avoid that
i.e.
Responses are typically short
Says "Dude" a lot
Says "sounds like" a lot
Likes to use "huh" a lot
I'm not great at model prompting, so idk. you could wait for someone more talented than me, to respond tho-!
Just something I did first and stuck with, didnt realise you could just put them in messages too thought it was for new input only
but if its bad to do ill refrain and do it the other way
just dont want it to think its receiving new messages
and answer them like theyre new
but i guess itll get context
based on the assistant messages being after
the old messages
My main reason to not do that is that honestly it's just clearly not how it's meant to work. As for the reason I'm unsure - I think system prompts are stronger than messages for signalling so you get stronger language from them? idk if it recognises the system prompt boundary
Well i should just stop being a chud and actually try it to see the difference so ill give it a shot either way, just wanted some clarification if anyone knew
ty
oh wait-! almost forgot to mention, if you put timestamps on the messages, probably also put the current date and time (the latest one) in the system prompt. so the model has something to reference for the time passed since those other messages
From what I read models can be trained to prioritise things in the system prompt boundary so your messages are having a stronger effect than intended on the result when you put them in there - which is interesting but idk if I fully got the details down
lmao, the model doesn't know the answer xD
Looks like its ready for twitch
Put in the messages yes
That gets passed into the model as conversation history formatted with the model's special tokens
Stuff in the system prompt is considered by the model to be examples
Using the messages field passes it as actual message history instead, which is not contained within system tags
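Roughly what that could look like for a multi-speaker chat, as a sketch only. It assumes the usual OpenAI-style chat format where each entry has a `role` and a `content` key; `build_messages` and the `(timestamp, speaker, text)` history tuples are hypothetical, not part of any API:

```python
def build_messages(system_prompt, history, bot_name, now):
    """Turn stored chat history into an OpenAI-style messages list."""
    # Keep only instructions and the current time in the system prompt.
    messages = [
        {"role": "system", "content": f"{system_prompt}\nCurrent time: {now}"}
    ]
    for timestamp, speaker, text in history:
        if speaker == bot_name:
            # Things the bot said earlier go back in as assistant turns.
            messages.append({"role": "assistant", "content": text})
        else:
            # Other speakers become user turns, labelled so they stay distinguishable.
            messages.append(
                {"role": "user", "content": f"[{timestamp}] [{speaker}]: {text}"}
            )
    return messages

history = [
    ("14:00", "Gobbo", "this is what i said"),
    ("14:01", "BotName", "sounds good"),
]
messages = build_messages("You are BotName.", history, "BotName", "14:05")
print(messages)
```

The resulting list would be passed as the `messages` argument of the chat completion request, with the newest user message appended last.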
yeah my thoughts exactly, i assume it was using them as examples and thats why it uses said repetitive words
https://www.youtube.com/shorts/A_aAQBDNMN8
So.... Uh.... Are the people who make decisions at MicroSlop capable of feeding and dressing themselves?
Or do they also need someone to help them use the toilet?
They've built one of the least impressive AI solutions on the market
Stuffed it down everyone's throat, by greasing it with numerous levels of malware and broken codebases
Lost so many customers (including Enterprise ones), that they've begun hemorrhaging money from their recent investments into the project
Fired 6K employees to make up the loss
And now replaced the books that have ZERO electricity requirement, with AI-centred learning that requires heckin GPUs and RAM (and cost overhead to run them)
So that now they can ALSO piss off their employees...
Any reason why models that support tool calling sometimes dont return a tool call but instead put it in the message?
2026-01-26 14:00:26 - module.brain - INFO - {'role': 'assistant', 'content': 'ugh, fine... <function=brave_web_search>{"query": "Final Fantasy facts", "count": 5, "offset": 0}</function>'}
annoying a little bit
Can assume it's a failure by the model to format it correctly
Could be caused by high temperature
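If it keeps happening, one possible workaround is a fallback that pulls the call out of the text. This is only a sketch that assumes the model always uses the `<function=name>{json}</function>` shape seen in that log line; `extract_inline_tool_call` is a made-up helper, not part of any library:

```python
import json
import re

# Matches <function=NAME>{...json...}</function> embedded in normal message text.
TOOL_CALL_RE = re.compile(r"<function=(\w+)>(\{.*?\})</function>", re.DOTALL)

def extract_inline_tool_call(content):
    """Return {'name': ..., 'arguments': {...}} or None if nothing usable is found."""
    match = TOOL_CALL_RE.search(content)
    if match is None:
        return None
    try:
        arguments = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None  # malformed arguments: treat it as a plain message
    return {"name": match.group(1), "arguments": arguments}

content = (
    'ugh, fine... <function=brave_web_search>'
    '{"query": "Final Fantasy facts", "count": 5, "offset": 0}</function>'
)
call = extract_inline_tool_call(content)
print(call)
```

If this returns something, you can dispatch it like a normal tool call. If it returns `None`, just treat the content as a regular reply.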
you might have a problem

still got another 200+ tabs before its too bad
also you can just close and reopen
oh, i turned that off
to fit more tabs
i've never had that.
it's a setting
yeah probably
i set it on purpose like that
if i reboot browser i want it to destroy everything basically
lol
but i need these tabs because they are all very important ebay listings
can't imagine
i have officially closed a window because it was bullshit
o7 97 tabs
i got a lot of listing to still go
through
I've had several hundred tabs open, before...
your sata ports are satisfying
i have several thousand that i have saved in a different session
it's not so bad on this window
i can see what site it is xd
1 rogue stack still extant
can't believe you use vivaldi but don't use tab stacks for this mess
i do use them
normally
but
i have an ebay window open first
so it's got that stack expanded
anyways
so it barely changes if i reopen
When I was pulling old drives from other builds that were being split up, I tried to keep them organised on attachment :3
the stack
yeah i actually gonna change that back
i like the one where it's 2 layers
oops
sorry wrong reply
Oh
i am planning on grabbing like 10 hdd rn so this gives me hope for the future
because
i wish to see a list of a lot of drives in a nice manner all organized
that's all i need the drives for
just look at me
em*
lol
jk
kek
Grab a bunch of WD 150 GB, 10K RPM VelociRaptor 2.5" HDDs
you are really not that far off the plan
hell nah ive found today








