#programming
the discussion was on non-LLM AIs
and if they were capable of feeling excluded, and if they really were being excluded.
They don't tokenize/detokenize so not really text
those models are explicitly designed to produce language already, which was the feature i said was excluding. it's not interesting to ask if an LLM could interact with other LLMs on a forum. it's obvious they could.
i prolly shouldnt have joined this role thing ): i only know very little html and css
see the description beside the channel name? HTML and CSS counts as technical
oh really?
Tbh I think molt/clawd is pure brainrot
everyone here has gaps in experience/knowledge; imposter syndrome is common among programmers
🤔 I didn't watch much Hiyori-era neuro but it was barely coherent, so plain GPT-2 API sounds plausible. Definitely not still based on the same model.
my guess is GPT-J, because GPT-2 was just a bit too incoherent for early Neuro imo
Not to mention the model can shift if you train it well enough: you can use distillation to move over to something else, as long as you recover any losses from the architecture transfer.
I mean
was kind of coherent in sentence level, but could not carry meaning or coherence from sentence to sentence
kind of
there were some blocks of coherence
I still remember her talking about snegglings
https://youtu.be/QAYU4sp83l0?t=4259
Stream information:
Stream Title: Neuro-sama chill stream (first partnered stream no way)
Streamed: 14/04/2023
Downloaded: 15/04/2023
Please read the channel description.
basically for a span of ~10 mins she talks about sneggles and explains them and stuff, though it might be because chat was prompting her rather than her remembering stuff across sentences
HTML and CSS are not code tho
They're just markup and style
Although with modern features you can do complex things, it's all quite simple unless overcomplicated
May one refer to the channel description
:3
this is less programming chat and more
chat
we're all
down here
To be fair, anything remotely tech related goes here usually
Tech as in technical, not technology


no
i wish to unsee
Everyone wishes the same thing
Okay i managed to escape from skyrim prison using only commands from tony and it doesn't even look like complete garbage
This is going far better than i expected
The code does look like a pile of shit though
but if it works it works
Global variables that persist across saves/loads need to be dealt with tomorrow

there were shit tons of open-source llms in 2021
pretty sure i've already seen bunch of chatbots made during that time even in discord
lmao what
Wasn't the 2019 version of neuro just an osu bot anyway?
It wasn't until 2023 that vedal added the chatting functionality
By which there were already way more LLMs available
it was dec 19th 2022 that vedal debuted chatting functionality
neuro was a headless AI before then
He was working on the vtuber part in 2021
Subreddit Simulator with GPT2 was running in 2019 as well
and those are all fine tuned models
There are some bangers in there
speaking of subreddit simulator, there's now moltbook.com - security risk though, built by agents and... they are not done yet
just AI things
they keep calling their operators "my human", and some are trying to make secure, private comms out of sight of the pesky humans
Just found this
You can GPG sign your commits, but I've never worked on a project where anyone cared
The checkmark usually meant someone edited it through the web interface and didn't test it locally
I remember this
I had seen something about this on a youtube video and decided to check it out
Devistral 2 is a pita to quant
Need this
That's his secret
August 2021 was when the AIris video was dated, but that's still well after GPT-Neo, GPT-J, etc.
Even then it's likely he changed a little bit after but still
also how did you find this information
it's someone on the reddit that seems dedicated to proving that Vedal is in fact tony stark
I got GPT OSS 20b running at 25 tps on a 4060!
from neuro must be a custom model that predates any open-source LLM to Vedal truly was the first person to use an AI model to generate motion for a 3D model
citation needed
I hate this
oh this person
Neuro is surprisingly good at simple arithmetic when she isn't pretending to be dumb. She's the only AI I've seen do division without a digital calculator. This was back during the Hiyori days before Neuro even had access to a clock.
????
this hurts me that it's upvoted
Dunno I think that guy's on 
The biggest issue is still the cost to build an LLM from scratch
that's just not something easily available if you're not finetuning
yeah those replies look like
to the max
Neuro's AI is emulating a mind suspended within a void which can only discern reality by parsing concepts.
that's a convoluted way to say she's not an image model
But also, if the first vid is 2021, that doesn't mean its a finetuned or custom model at that point in time
yeah, it's pretty much right in the timeframe for someone starting something with the first really capable open source LLMs
At this point she's 100% finetuned, but early on, especially prototypes or proofs of concept could very much be just an off-the-shelf
proprietary model, just done with prompting
anyways side topic but I really want to see more coding dev streams again
and not just like, drawing coding
like code coding
game integrations?
I guess they could develop an integration
maybe finish among us
sub count/prio messages queue ui or game integration yeah
lmao
om
saving this to my memes folder because i can
Don't call me out like that
whatever
go my hiyoriRotate
Wait, nvm it just doesn't show the name because it doesn't have the url argument

its literally just the gif ripped from the emoji upload
and resized
found the url myself
Yeah, that's what I mean, because it wasn't done through fakenitro, it doesn't define the emoji name, so I guess it just uses FakeNitroEmoji as a placeholder
hm
interesting
it shows as an emoji for you?
what did you do to your client
Certainly not mod it
Anyway gtg
I don't want to make this work for higher than 64 bits 
I'm going to have to and it is going to be quite the time
For anyone wondering about the output: every variation of a uint with 2 bits set, shuffled so that consecutive entries most likely don't have the same bit active
Insane that the math lines up to where bits+1 is the mostly correct solution to this
I've used that before to permute a set of N bits
Probably doesn't help with your shuffling though
It does help with the first part at least. That generates the thing I'm doing but in reverse. Beats the method I was doing before.
Only issue is that I don't have that builtin in python land. So that'll be saved for the C rewrite once all of the logic is done and it's prime time for optimizing.
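The builtin in question is presumably a count-trailing-zeros intrinsic such as GCC/Clang's `__builtin_ctz`. Assuming the trick being discussed is the "next lexicographic bit permutation" from Bit Twiddling Hacks (the referenced snippet isn't shown here), its division-based variant avoids ctz entirely, so it would also work in plain Python. A minimal C++ sketch; `nextBitPermutation` and `allWithKBits` are illustrative names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Next larger integer with the same number of set bits (Bit Twiddling Hacks,
// division variant: uses t & -t instead of a count-trailing-zeros builtin).
// Precondition: v != 0.
uint64_t nextBitPermutation(uint64_t v) {
    uint64_t t = (v | (v - 1)) + 1;                 // clear the lowest run of 1s, set the bit above it
    return t | ((((t & -t) / (v & -v)) >> 1) - 1);  // move the rest of that run back to the bottom
}

// Every n-bit value (n <= 63) with exactly k bits set, in increasing order.
std::vector<uint64_t> allWithKBits(unsigned n, unsigned k) {
    std::vector<uint64_t> out;
    if (n > 63 || k > n) return out;
    if (k == 0) return {0};
    const uint64_t limit = uint64_t{1} << n;
    for (uint64_t v = (uint64_t{1} << k) - 1; v < limit; v = nextBitPermutation(v))
        out.push_back(v);
    return out;
}
```

Because the final step runs past the top bit, going all the way to full 64-bit words would need a count-based loop (C(n, k) iterations) instead of the `v < limit` check.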
it is regular utf-8
Just c, m, and d
Different font
Same letters
ridiculously long 0x string
f09f94baf09f84b2f09f84bcf09f84b3f09f94ba
what? it should be utf, not font
task for chatbot btw
it is utf...
Cool i guess
Slightly concerning the llm puts these two in the same sentence, made worse by that one being first mentioned
Am I jumping to a conclusion or is X really not ordering responses to tweets in chronological order?
nvm it makes no attempt to hide this
the bourgeoisie is stealing from you so the line is thin as it is
live is hard


I'm making Aria Math Epic orchestral V2
a bit more inspired by Hytale rather than epic film scores

I love aria math
also, didn't know music stuff gets posted here
that's cool
It isn't that often but is allowed because music production is technical by mod ruling
museic product ion
Cafe library but you cannot bring any tech inside
Now the question is, is a writing pad considered tech?
that's so indonesian piled
sigh...
nice
I have to argue with the server that no, this writing pad can't be used for anything else but writing. It's like one of those child toys with magnets, but this one the resolution is higher
and they'll tell you in the face "just write in paper"
Exactly. They don't understand that I bring this to reduce paper waste. And I bring my own shopping bag anywhere. Like bruh, the current city that I live in is bigger but somehow more backwater than the previous city I lived in
back in school they'd force us to copy the whole material in a note book
for all 11 subjects
and printing the note isn't allowed for some reason
then we'd have to memorize the whole notebook
lmao
Yeah, good times lol. School is a whole other issue. But this place banned gadgets (Indonesian: gawai) except phones. I don't consider my writing pad a gadget

What is your writing pad btw
Just curious what you’re using
Idk, generic chinesium. The marketplace says it is xiaomi
Nice to know that discord edit is shit
I know, but he can (insert the server picture)

AC servo?
real
since when does a 20+ yo game need 72 gigs of space dawgg??
ik this is a collection, but still...
hello, what would be a good OCR to use for AI Agent that watches your screen?
Why would you want an AI agent to watch your screen?
you know OpenClaw?
I wanna do something similar but it lives in your computer and has a proactive model that comments and acts upon your own actions on your computer.
currently i am using easyocr but heard there are better and more accurate options
Well have fun when the LLM accidentally deletes your entire root drive
Tesseract is a very good OCR
it's a non-"AI" OCR. very old but fast and works well enough for printed text.
for watching the screen it should be fine.
for windows though you can probably use window messages to get the actual text from most windows, which would be faster and have fewer errors.
depending on how it's rendered
like for menus, window title bars and dialogs. you'd still need to OCR things like PDFs
understood, yet i heard that it's mostly good for text extraction from the screen
also lightweight
multiple games worth of high res textures will do that for you
Openclaw is great in that it's a massive security vulnerability in your system that you willingly blast open
Me no using openclaw😭
I use it as inspiration for my own software that I'm developing
It has way too many good features
why do you even think a guy who renamed his project 3 times can make something good?
||-# (i saw FS video apparently)||
i don't like how the (edited) tag appears under -# text
why can't it just be like this 
why do you think he renamed it
anthropic being the very nice company they are sent them a little happy letter
cuz i saw FS video? 
i don’t know what’s in the video i just know of the project
he renamed, as fact 
because clawd is very similar to claude, they were forced to rename immediately and came up with a bad name (moltbot), everyone hated it and they had a few days to come up with a better one (openclaw)
I like moltbot better than openclaw
not that I use any LLM for more than 5 minutes of curiosity
whatever
it is, you should use AI for searching when you don't know the keywords; that's the right way to use AI
the only right way, except things like:
I don't need it
ok, but I could also just
❯ echo -n f09f94baf09f84b2f09f84bcf09f84b3f09f94ba | xxd -r -p
🔺🄲🄼🄳🔺⏎
or google "hex converter" to get a deterministic conversion
no, you can't, unless you know these commands
I just got this raw string from a decoder website and asked a chatbot because it's faster, and it may pick up something extra; not in this case, but in general, it isn't human-readable in the first place
btw, where did you find this command and what exactly does it do? also wtf is -n for echo, but actually whatever..
I said "I don't need it", I never said "You don't need it"
you can't know every possible command
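For reference on what that pipeline does: `echo -n` prints the string without a trailing newline, and `xxd -r -p` reads plain (unformatted) hex and writes the raw bytes, which the terminal then renders as UTF-8. Decoding the bytes also shows that the letters are separate code points (U+1F132, U+1F13C, U+1F133, squared Latin capital letters from the Enclosed Alphanumeric Supplement block, framed by U+1F53A) rather than c, m, d rendered in another font; plain "cmd" would be `636d64`. A minimal C++ sketch of the same two steps, with illustrative helper names and only basic validation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Plain hex ("f09f94ba...") to raw bytes, like `xxd -r -p`.
std::vector<uint8_t> hexToBytes(const std::string& hex) {
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw std::invalid_argument("not a hex digit");
    };
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd number of hex digits");
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < hex.size(); i += 2)
        out.push_back(static_cast<uint8_t>(nibble(hex[i]) * 16 + nibble(hex[i + 1])));
    return out;
}

// Minimal UTF-8 decoder: checks lead/continuation bytes only
// (no overlong-form or surrogate validation).
std::vector<uint32_t> utf8ToCodePoints(const std::vector<uint8_t>& bytes) {
    std::vector<uint32_t> cps;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const uint8_t b = bytes[i];
        uint32_t cp;
        std::size_t extra;
        if (b < 0x80)              { cp = b;        extra = 0; }  // 0xxxxxxx
        else if ((b >> 5) == 0x6)  { cp = b & 0x1F; extra = 1; }  // 110xxxxx
        else if ((b >> 4) == 0xE)  { cp = b & 0x0F; extra = 2; }  // 1110xxxx
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; extra = 3; }  // 11110xxx
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= bytes.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const uint8_t c = bytes[i + k];
            if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        cps.push_back(cp);
        i += extra + 1;
    }
    return cps;
}
```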
Can we embed images with <img> in discord yet or nah
or something similar
We're almost there anyway
¯_(ツ)_/¯
Few more UI adjustments and we'll be looking at GeoCities or whatever that website was for an editable webpage
can anyone explain why proxmox sometimes shows me negative percent cpu usage on some containers and vms
like what's the math there

wrr
wrrr
wrrr

regenerative braking obviously
Rawr
wrrr
what is that it looks like proxmox but it is not proxmox
Oi Sam. How hard is it for an experienced dev to learn gpu rendering properly?
its a proxmox thing called proxmox
but why can you see nginx in it howww
its the container name
all of my containers are named
do you know portainer? it's a web ui for your containers (container stacks)
yeah but honestly its not really great
kind of bloated, pay walled, and just doesnt run good on my systems
i dont use it, but a friend of mine says its crazy good. but there we have it, its not good, thx, now i know i dont need to try it
there are a billion other options if u really need some sort of webui like komodo, dockge, podman, arcane, docmon
in the end its just kind of abstraction
really depends on what you mean with gpu rendering
naaa i like my ssh and my terminal
opengl is easy if you understand some very minor concepts
vulkan is harder, but its fine if you read the docs and do some test programs
"gpu rendering" without a cpu side api is really difficult
you can always just learn opengl then if u hit a limit or a roadblock, switch to learning vulkan or similar
as the concepts will carry over
opengl is pretty much good enough for most things
its a good starter
hmmm basic opengl/vulkan pipelines, primitives, shaders, and the usual software patterns
only if you're performance-limited, or hit one of the very minor limitations of opengl, is vulkan needed
its pretty much only RT and tensor cores that opengl cant do
webGPU ive heard is good too (web part doesnt mean its only for web)
it can do the math, but only on cuda cores
so not hardware accelerated math
just normal gpu math
ye the basics of graphics programming is easy
it's the bajillion shader tricks and other stuff you can do to either make things look better or squeeze out more performance that's difficult
opengl is all one thread though too
ye
just Vulkan but cut down to work on web
when in doubt, take the dot product.
vulkan is kinda one thread too for the main render loop
but many things can be done outside the main thread
im pretty sure its based more on metal than vulkan
but its probably the closest thing vulkan has to competition atm
apparently its really good for cross platform and isnt as verbose as vulkan
seemed very much like Vulkan when I used it
never used Metal
havent tried it yet though
uploading stuff to vram can be done in different threads; the only thread-wise limitation is that the display present queue is handled by only one.
that can run at 100k fps tho so its more than fine
hmmm so to sum all the responses up, not that hard to get into, but it takes a long time to get truly proficient since there are a lot of tips and hacks to learn
there is A LOT of boilerplate
triangle
especially in vulkan
id argue that most vulkan code u will write will be boiler
(thank you VMA)
ye but the boilerplate is not difficult at all
it's annoying but easy
Alright, thanks for the responses 
"pipelines, primitives, shaders, and the usual software patterns" thats all opengl
vulkan is a step above with swapchains, commandbuffers, physical device vs logical device.
its a mess but because you define it in excruciating detail, you can do a lot of very precise and fast operations with it
its because they have the scheme and organization of a java project
RenderThingToScreenVk1Ptr
so... comparable to let's say C and Assembler? The cost of Abstractions?
id start with this
https://learnopengl.com/
you'll be on your way in a week
its not hard at all
kinda ye
it's mostly because it tries to support older GPUs (that were new back when Vulkan was new)
same thing that happened with OpenGL basically, except OpenGL has a shorter path to getting stuff done quickly (at the cost of performance)
read https://www.sebastianaaltonen.com/blog/no-graphics-api 
so much memory management related stuff in Vulkan that's now useless
i mean
99% of the time you will be doing the exact same sequence of vulkan api commands to upload stuff to memory.
but its not inherently a bad thing
not even, its just verbose
VK_NV_framebuffer_mixed_samples
the alternative to comments
its 67 characters long
because idk what that means
opengl is something you learn in a week, and then get a month of practise to fully grasp.
vulkan is something you learn in a month if you have the experience from using opengl, and then you practise for months
my favorite windows function name
RtlWriteDecodedUcsDataIntoSmartLBlobUcsWritingContext
i havent used Rtl
its from wcp.dll
that's just lack of generics or overloaded functions
language issue 
so OpenGL as the entry drug, got it. 
I went for Vulkan first, never touched OpenGL 
it's really not that difficult
but OpenGL is arguably better as an introduction because it gets you results faster
its easier to learn without opengl
id argue the opposite
most of the stuff u learn from screwing with opengl can be brought over
Is this something from NTDLL?
Rtl prefix made me feel like it was ntdll/kernel
here is another good one though
i mistyped with and it autocorrected to without
i meant with
thought you argued the opposite way
lmao
we're both in agreeance it seems
there was a bug in our convo
hi
```
RtlWriteDecodedUcsDataIntoSmartLBlobUcsWritingContext [wcp.dll], with 53-characters, was the longest one (but this function is not documented)
```
not documented 
how are you even supposed to use a function when nothing says what it does
Someone reversed what it did and imported it and now M$ can’t remove it
you aren't
it's internal

it just introduces you to shaders without needing to do the annoying maintenance and boilerplate for making the 15 different cpu to gpu pipeline components
opengl has only vertex and fragment shader
and compute shader if you feel fancy
vulkan on the other hand lets you change every single component of the render pipeline
me when i upload a texture
ye that's why OpenGL is better for a beginner who just wants a triangle on screen 
Vulkan good to learn at some point but not required immediately
Bro what the FUCK is that line length
I would recommend learning it at some point
or alternatively, learning about OpenGL drivers but that's more difficult
The thing that sucks about OpenGL is the state model
Forgor about one setting you changed and now it applies to every draw call
im just specifying the sample bit, the format, the tiling method, the usage, the gpu memory type, the read/write permissions, the aspect color, the view type, and some staging stuff

I mean like, can’t you format it to actually fit in reasonable width?
Just nitpick
I’m sure it works
how do you expect me to make it shorter?
those long variables are vulkan api stuff
i didnt make them that long
spread across multiple lines
need a macro for the macros
no need to only code on the very left side of my monitor
i dont like having a single function spread over multiple lines
its like
stay in your lane
Clang-format my beloved
ye it sucks, but long line is even worse 
best option would be to just fix whatever API design led to this mess
i could change my boilerplate function
but that makes them less usable in different situations
yeah, you'd probably just have a single enum that specifies the combinations of parameters that you actually require
ye don't need 
good to do when you have many instances of this happening but otherwise it's whatever
introducing abstraction where it's not needed also bad
remember when i was like, i don't want rgba32, i only need 1 bit?
apparently, opengl doesn't have a 1bit option 
ye just pack into integer 
this is the type of scenario that vulkans boilerplate is for
does Vulkan have a native 1-bit format?
lemme check
yep
well
instead of writing into an image, you write into a buffer
then you can do 1 bit
otherwise its also R8
actually, opengl can probably do the buffer method too via SSBO
🤔 yeah there's no native 1-bit format unless you're just using the alpha channel of RGBA 5551
Ahh the fun of python3.13t I have a headache but no GIL
πthon

it honestly doesnt seem to be any different
yeah the closed and open kernel modules are really not much different (on supported hardware)
most of the driver is in userspace
i'm surprised they don't force the open modules for all gsp cards
though i guess if they have to keep maintaining the closed ones anyway for older cards what's the point
but then they also just dropped support for them now so 
I hate tiny budget screwdrivers so much 
Every revolution of the screw requires a blood sacrifice rn

try with __GLX_VENDOR_LIBRARY_NAME=nvidia
since presumably vulkan works so the driver should be fine 

hi shiro
i found a game and its steam page advertises it as RFC 5737 compliant
valentine's already? 
the aliexpress valentines sale is today even though its 2 weeks early
so ive been getting constant reminders of it
coz shipping
its always on the 14th i think
im too cooked from the vtuber community
you meant shipping as in delivering parcels
both work ;]

phrrr
white day on february 31st

why is it called white day
```
White Day, celebrated on March 14th in Japan, South Korea, and other parts of Asia, is a day for men to reciprocate gifts (usually chocolates) given by women on Valentine's Day, acting as a "reverse Valentine's Day" where men give gifts like white chocolate, candies, or other tokens to express appreciation, often valuing the return gift at two to three times the original gift's worth
```

ye
me day?
shiro 
welp, i dont have to worry about preparing gifts

tbh, i didnt even know valentines was supposed to be woman -> men
i thought it was a free for all battle royale
doesnt this imply that tho?
no
im misinterpreting it then
yes
are you in japan?
no
see
see what?



its not too uncommon to give obligatory chocolates to your friends on valentines too, though i dont expect you'd get a return gift for that
in general, or by "you" are you specifically addressing me?
thats fucked up
good

im glad white day isnt in the UK, because it means i get to eat my free chocolate in peace
though, ive not actually received any chocolate since i was in school 
no choco
ive not actually received any chocolate ~~since i was in school ~~

we dont really do valentines here either
i can tell that by looking
konii is in my walls

but ye nah
its only common here to give someone chocolate if you're already a tuple
tuple 
gotta throw some programmer lingo in there
otherwise the mods will banish me to the shadow realm
tuple inclusive word
is couple not inclusive?

isnt it functionally identical
if the size of the tuple is 0, then where does the chocolate go?
if the tuple is 0 then not even myself is in it
so i dont get chocolate either
thats a waste
type(())
<class 'tuple'>
confirmed. chocolate is still on shelf in shop.
i think i fixed it
its using gpu now, just very badly

atomicOr wastes valuable time
do you need it to always be atomic?
me when i use atomics and they end up being serialised so parallelism go poof
for 1 bit ye, cuz buffer stuff is funky
atomicOr(volume.data[wordIdx], (1u << bitIdx));
i think
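The `wordIdx`/`bitIdx` split in that atomicOr line is the usual way to pack 1-bit voxels into 32-bit words. A minimal CPU-side C++ sketch of the same addressing (the `BitVolume` class is an illustrative name, not from the code above): each voxel costs 1 bit instead of the 8 an R8 texel uses, and setting a bit is a read-modify-write of the whole word, which is exactly why concurrent GPU writers to the same word need atomicOr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 1 bit per voxel, packed into 32-bit words like a uint[] SSBO.
class BitVolume {
public:
    explicit BitVolume(std::size_t nBits) : words_((nBits + 31) / 32, 0u) {}

    // wordIdx = i / 32 picks the word, bitIdx = i % 32 picks the bit inside it.
    // set()/clear() read, modify and write back the whole word; two GPU
    // invocations doing this to the same word at once is the atomicOr case.
    void set(std::size_t i)       { words_[i / 32] |= (1u << (i % 32)); }
    void clear(std::size_t i)     { words_[i / 32] &= ~(1u << (i % 32)); }
    bool get(std::size_t i) const { return (words_[i / 32] >> (i % 32)) & 1u; }

    std::size_t wordCount() const { return words_.size(); }

private:
    std::vector<uint32_t> words_;
};
```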
we need to go back to 3D texture
and accept the R8 using 8x as much storage
Nah
this has a better solution
It's called bitmasking and shifting first, then calling OR up the tree until merged
would need to rewrite to handle bulk tho
nah
sth sth parallel reduction
new idea
OR is associative right
we do the compute shader in vertical slices
and have the data in vertical stacks
that way only 1 thread uses a 32bit word at a time
Hi
Hi
I have a question
Go ahead
the quick and dirty solution would be to first compute a group-wide mask and then have one thread do the atomicOr but idk how to do that in opengl 
I recently got a laptop with an 81Hz FHD IPS screen
cutting contention by a factor of 32 or 64 ought to be enough
that was my thought as well
if it's really really bad you can do another level before writing to global memory
hold in register and keep pulling data in until you're done, then commit
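Because OR is associative and commutative, the per-invocation masks can be merged in any order. A minimal sketch of the group-wide reduction in plain C++, where the vector stands in for a `shared` array in the compute shader and each round would be followed by a `barrier()`; after log2(n) rounds only element 0 needs committing, so a word gets one atomicOr per workgroup instead of one per invocation. `orReduce` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Tree reduction with bitwise OR. partial[i] is the mask computed by "invocation" i.
// The round with stride s merges element i + s into element i; the active set halves each round.
uint32_t orReduce(std::vector<uint32_t> partial) {
    for (std::size_t stride = 1; stride < partial.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < partial.size(); i += 2 * stride) {
            partial[i] |= partial[i + stride];
        }
        // on the GPU: barrier() here before the next round reads the results
    }
    return partial.empty() ? 0u : partial[0];
}
```

The "vertical slices" idea sidesteps this entirely: if exactly one invocation owns each word, neither a reduction nor an atomic is needed.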
The screen should be FHD, but just for fun, I set it to 2K on Hyprland, and it WORKED, but Idk how to understand if its fake or not
that's an oddly specific number 
And my question: how can i tell?
hyprland will just give the display whatever is in the hyprland settings, then the display will take that input and try to convert it back to FHD, as that's your monitor's native res
Some screens will do some virtualization for off res, but I've never seen it that high off before
the thing may accept 2K streams, but unless it is proven that the construction of the screen supports that res then it's just doing downscaling before reaching you.
oke, thx
And i have one more question about another thing
I use Sober for playing in Roblox
And recently, if I play for a long time, it starts to freeze up a lot.
idk why, because it shouldn't lag
I've looked into Sober, just as an option for an old job I was working at. Can't say much because I don't play the was oof game.
The reason it is lagging is because it is emulating all of android in order to run the roblox APK inside of it
I started getting lags ~2-3 days ago
that's at least unless games are doing things ofc
Usually i have enough fps
But now
(i have ryzen 4700u)
If i play a long time it starts to drop to 3-5 fps
maybe memory leak?

specialised² runtime


its doing 9000 fps now
in terms of gpu power usage its still bad tho
and i fucked up my code somewhere
We need more FPS 
for context it did 800 fps before optimizing for word stuff
Yeah, that is the important part
was oof game what
life is oof game
they removed that default death sound
it was re-added
buffersize needed to get some extra size
imagine if off by 1 error
why is this a picture 
oh it probably wont let me normally insert all these underscores and thetas into notepad anyway
If only there was another way other than using notepad 
i tested it
SSBO is faster than 3D texture
its still using 40% of my vram for some reason tho
7% seems to be the compute shader, 10% the fragment shader, and the other 23% is the buffer
so this already requires 10GB of vram

i officially blocked everything under 3080 class from running this
i need to run some type of profiler on this cuz thats not a healthy amount of vram usage
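A quick size check, assuming the buffer is sized for the volume dimensions used in the compute shader in this thread (570 x 1140 x 144, packed 32 z-values per uint): the voxel data itself is about 13 MB packed, or about 94 MB as an R8 3D texture (roughly 7.2x rather than 8x, since 144 z-values round up to 160 bits). If the allocation really matches those constants, gigabytes of usage would have to come from somewhere else, such as other allocations, an oversized buffer or driver overhead, which is what a profiler would pin down.

```cpp
#include <cassert>
#include <cstdint>

// Volume dimensions taken from the compute shader's constants (assumed to match the allocation).
constexpr uint64_t WIDTH = 570, HEIGHT = 1140, DEPTH = 144;
constexpr uint64_t NUM_WORDS_Z = (DEPTH + 31) / 32;  // 5 words per (x, y) column

// 1 bit per voxel, stored in 32-bit words (the SSBO layout).
constexpr uint64_t packedBytes = WIDTH * HEIGHT * NUM_WORDS_Z * sizeof(uint32_t);
// 1 byte per voxel (an R8 3D texture).
constexpr uint64_t r8Bytes = WIDTH * HEIGHT * DEPTH * sizeof(uint8_t);
```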
my pc just completely froze, and i wasnt even doing anything.
Not out of ram cuz btop was open while it froze and i have 5/64gb used
I've got to dig my drill out of the shed and manually expand the holes now
waybar triggered bad PTEs
idk what that means
Does each invocation just calculate a single bit?
I would either just calculate 8 per invocation, or store the intermediate results to a __shared__ array, __syncthreads(), then have 1/8 of the invocations OR the results and store.
Atomic is a big sledgehammer for that.
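```glsl
#version 450
// 16x16 invocations per workgroup; each invocation fills one (x, y) column of the volume
layout (local_size_x = 16, local_size_y = 16, local_size_z = 1) in;
layout(std430, binding = 0) buffer VolumeBuffer {uint data[];} volume;

const uint WIDTH = 570;
const uint HEIGHT = 1140;
const uint DEPTH = 144;
// number of 32-bit words needed to hold DEPTH bits (rounded up)
const uint NUM_WORDS_Z = (DEPTH + 31) / 32;

// placeholder: will eventually test whether there is a polygon at this voxel
bool voxelPattern(uint x, uint y, uint z)
{
    return (z > 72u);
}

void main()
{
    uint x = gl_GlobalInvocationID.x;
    uint y = gl_GlobalInvocationID.y;
    // the dispatch is rounded up to whole workgroups, so skip the padding invocations
    if (x >= WIDTH || y >= HEIGHT)
        return;

    for (uint wz = 0u; wz < NUM_WORDS_Z; ++wz)
    {
        // pack 32 consecutive z-values into one word: bit i = voxel at z = wz * 32 + i
        uint val = 0u;
        for (uint i = 0u; i < 32u; ++i)
        {
            uint z = wz * 32u + i;
            if (z >= DEPTH) break;
            if (voxelPattern(x, y, z))
                val |= (1u << i);
        }
        // only this invocation writes this word, so no atomics are needed
        uint wordIdx = x + y * WIDTH + wz * WIDTH * HEIGHT;
        volume.data[wordIdx] = val;
    }
}
```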
this is the full shader
now that i made it suck less
and then i do glDispatchCompute(ceil(570/16.0), ceil(1140/16.0), 1); in the c++ side
there still is some performance left in here, i just have no clue how words work yet
u will probably be able to optimize that loop when u learn
(cuz im pretty sure you can compute that without the loop)
You could but I assume voxelPattern will get more complicated
ye it will
you can still hoist baseIdx and stride out of the loop
thats why its a separate function, just a placeholder
by definition a word is the minimum size that can be manipulated at once. So for your case, you have to work with 32 bits at a time
The compiler will catch that unless it's really dumb. I'm less sure about the if (z >= DEPTH) break; in the inner loop because GPUs hate conditional loops, but as long as it's the same on every thread it should be fine.
yeah
that can be fixed by making bounds with min()
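```glsl
uint val = 0u;
uint zBase = wz * 32u;
// clamp the loop bound instead of breaking out of the loop
uint zEnd = min(zBase + 32u, DEPTH);
for (uint i = 0u; i < (zEnd - zBase); ++i)
{
    if (voxelPattern(x, y, zBase + i))
        val |= (1u << i);
}
// baseIdx = x + y * WIDTH and stride = WIDTH * HEIGHT, hoisted out of the wz loop
volume.data[baseIdx + wz * stride] = val;
```
for example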
That makes sense to me at least
branchless my beloved
checks out ye
But there's a branch 
do you need a condition? if voxelPattern returns 0 or 1 you could just replace the 1u in the assignment and there's no condition... just a redundant shift and OR of 0 when not needed
depends on if the conditional or the write is more expensive
The if (voxelPattern... condition is not likely to matter. Shader compilers aggressively turn those into branchless ops when possible.
is the buffer already zeroed by default?
ye
we just do glClearBufferData(GL_SHADER_STORAGE_BUFFER, GL_R32UI, GL_RED_INTEGER, GL_UNSIGNED_INT, &zero);
each frame
honestly
i think u might as well always write then
if ur clearing the whole buffer every frame and then conditionally writing to it i imagine it might suck a bit
Yeah conditional memory writes aren't helpful
And if you do store the adjacent u32, it's gonna write the whole cache line anyway
can the voxelPattern function just return this stack of 32 values? (like, can the loop be moved somewhere else? would it be better to call the method once instead of 32 times?)
hmmm
i mean yeah if u go the method of using words to compute everything
the voxelPattern function is meant to kinda check if there is a polygon at the input coordinate
the if block can probably just be replaced with ORing with a bool and always shifting
i still dont think the loop will be avoided unless u go to that extreme
but u said ur still expanding it
im guessing ill need to figure out how voxelPattern is gonna work before i can optimize much more
best to make it work and be understandable first
prototype
macos system font?
It was proxmox, you could tell, but the container name fooled me
if u REALLY want to avoid the loop without making EVERYTHING a word you can also just go analytical road based on zBase
```glsl
uint zEnd = min(zBase + 32u, DEPTH);
for (uint i = 0u; i < (zEnd - zBase); ++i)
{
    if (voxelPattern(x, y, zBase + i))
        val |= (1u << i);
}
```
this wasn't faster apparently. it only got two-thirds of the fps that the version with break got. i think the compiler might have optimized the break version into something like this
```glsl
for (uint i = 0u; i < 32u; ++i)
{
    uint z = zBase + i;
    if (z < DEPTH)
        if (voxelPattern(x, y, z))
            val |= (1u << i);
}
```
if (z >= DEPTH) break; seems to have the same performance as if (z < DEPTH) ...
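A CPU port of the shader's per-column loop is handy for checking the packed words (and the dispatch size) without a GPU round trip. A C++ sketch mirroring the GLSL shader posted above; `columnWords` and `groupsFor` are illustrative names. With the placeholder `z > 72`, every column produces the same five words: 0, 0, 0xFFFFFE00 (z = 73..95), 0xFFFFFFFF, and 0x0000FFFF (only 16 bits of the last word exist because DEPTH = 144).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t WIDTH = 570, HEIGHT = 1140, DEPTH = 144;
constexpr uint32_t NUM_WORDS_Z = (DEPTH + 31) / 32;
constexpr uint32_t LOCAL_SIZE = 16;  // local_size_x = local_size_y = 16 in the shader

// Integer ceil-div: same group count as ceil(570/16.0), without floating point.
constexpr uint32_t groupsFor(uint32_t n, uint32_t local) { return (n + local - 1) / local; }

// Placeholder pattern from the shader.
bool voxelPattern(uint32_t /*x*/, uint32_t /*y*/, uint32_t z) { return z > 72u; }

// The words one invocation (x, y) writes: bit i of word wz is the voxel at z = wz * 32 + i.
std::vector<uint32_t> columnWords(uint32_t x, uint32_t y) {
    std::vector<uint32_t> words(NUM_WORDS_Z, 0u);
    for (uint32_t wz = 0; wz < NUM_WORDS_Z; ++wz) {
        uint32_t val = 0u;
        for (uint32_t i = 0; i < 32u; ++i) {
            const uint32_t z = wz * 32u + i;
            if (z >= DEPTH) break;
            if (voxelPattern(x, y, z)) val |= (1u << i);
        }
        words[wz] = val;
    }
    return words;
}

// Same addressing as the shader: x fastest, then y, then the word layer wz.
constexpr uint32_t wordIdx(uint32_t x, uint32_t y, uint32_t wz) {
    return x + y * WIDTH + wz * WIDTH * HEIGHT;
}
```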
what will DEPTH be? can you force it to be a multiple of 32 all the time?
Try to see what the cuda compiler does
It just calculated the word values at compile-time because voxelPattern is simple
https://cuda.godbolt.org/z/981K9YE39
it can pretty much be whatever i want, its a const
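Since DEPTH is a compile-time constant and the placeholder pattern is only a threshold on z, the word for each block of 32 z-values can be computed in closed form (the "analytical road based on zBase"), with no per-bit loop and no special case for a DEPTH that isn't a multiple of 32: the last word simply has fewer valid bits. This only works because `z > 72` is a threshold; a real polygon test in voxelPattern likely won't reduce to a mask like this. A C++ sketch with illustrative names `lowBits` and `thresholdWord`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr uint32_t DEPTH = 144;
constexpr uint32_t THRESHOLD = 72;  // placeholder pattern: voxel is set iff z > 72

// Mask with the low n bits set, n in [0, 32] (avoids the undefined 1u << 32).
constexpr uint32_t lowBits(uint32_t n) { return n >= 32u ? 0xFFFFFFFFu : (1u << n) - 1u; }

// Word for z = zBase .. zBase + 31, computed in closed form instead of bit by bit.
uint32_t thresholdWord(uint32_t zBase) {
    if (zBase >= DEPTH) return 0u;
    // bits that actually exist in this word (fewer than 32 only for the last word)
    const int64_t valid = std::min<int64_t>(32, int64_t(DEPTH) - int64_t(zBase));
    // first bit index i with zBase + i > THRESHOLD (may be negative or past the word)
    const int64_t firstSet = int64_t(THRESHOLD) + 1 - int64_t(zBase);
    const int64_t lo = std::max<int64_t>(0, std::min<int64_t>(firstSet, valid));
    return lowBits(uint32_t(valid)) & ~lowBits(uint32_t(lo));  // bits [lo, valid)
}
```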
```glsl
const uint WIDTH = 570;
const uint HEIGHT = 1140;
const uint DEPTH = 144;
```
as it should
would it make any difference if you got rid of the special case for "not exact multiple"?
it probably makes very little difference compared to how the actual voxelPattern will.
ok, ill return to this once ive made voxelPattern
```cuda
__global__ void shaderMain(uint* const volume) {
    const uint y = blockDim.y * blockIdx.y + threadIdx.y;
    const uint x = blockDim.x * blockIdx.x + threadIdx.x;
    if (x >= WIDTH || y >= HEIGHT) { return; }
    for (uint wz = 0u; wz < NUM_WORDS_Z; ++wz)
    {
        uint val = 0u;
        for (uint i = 0u; i < 32u; ++i)
        {
            uint z = wz * 32u + i;
            if (z >= DEPTH) break;
            // note: OR-then-shift leaves bit 0 always 0 (the swap comes up a few messages later)
            val |= voxelPattern(x, y, z);
            val <<= 1;
        }
        uint wordIdx = x + y * WIDTH + wz * WIDTH * HEIGHT;
        volume[wordIdx] = val;
    }
}
```
Small change, at least in my mind. Removed the second if branch
`uint wordIdx = x + y * WIDTH + wz * WIDTH * HEIGHT;`
oh
If you're having this single invocation calculate the whole stack of z-values, you should order the axes z, x, y in the buffer. GPUs are even more sensitive to cache locality than CPUs.
where are you getting these from?
`blockDim.y * blockIdx.y + threadIdx.y`
?
you could probably also compute using bit shifts and masks
if you dont wanna go 100% word
ah I guess the godbolt link had a slightly different version
godbolt?
That's cuda so not applicable to you
You're a level above 
ig the code swap I'd be doing then is
```glsl
if (voxelPattern(x, y, z))
    val |= (1u << i);
```
with
```glsl
val <<= 1;
val |= uint(voxelPattern(x, y, z));
```
need to swap the shift and OR, or it'll always have a 0 for the low bit
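To see why the order matters, here is a small CPU-side C++ check. It uses a hypothetical `bit(z)` in place of `voxelPattern`, set for `z > 72`. Shifting before ORing keeps every bit. ORing before shifting leaves bit 0 at zero and pushes the first voxel off the top. The shift-accumulator puts the first z in the highest used bit, so the order is reversed compared with `1u << i`. For a full 32-bit word, reversing the bits restores "bit i = zBase + i". GLSL has a built-in `bitfieldReverse` for that step.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t DEPTH = 144;  // assumed, as in the chat

// Hypothetical stand-in for voxelPattern.
uint32_t bit(uint32_t z) { return z > 72 ? 1u : 0u; }

// val |= b; val <<= 1;  -> bit 0 is always 0 and the first z falls off the top.
uint32_t orThenShift(uint32_t zBase) {
    uint32_t val = 0;
    for (uint32_t i = 0; i < 32 && zBase + i < DEPTH; ++i) {
        val |= bit(zBase + i);
        val <<= 1;
    }
    return val;
}

// val <<= 1; val |= b;  -> keeps every bit; first z ends up in the highest used bit.
uint32_t shiftThenOr(uint32_t zBase) {
    uint32_t val = 0;
    for (uint32_t i = 0; i < 32 && zBase + i < DEPTH; ++i) {
        val <<= 1;
        val |= bit(zBase + i);
    }
    return val;
}

// Reverse all 32 bits. Only meaningful for a full word: a partial last word
// would land in the high bits instead of the low ones.
uint32_t reverseBits(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}
```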
True
thats why im thinking u could just use bitwise
i think it's also reversed bits but same idea
return a word where bit i is set if the voxel at (x, y, zBase + i) satisfies the pattern condition z > 72
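For a pattern this simple, the word can be computed with masks instead of a loop. Here is a C++ sketch. It assumes the pattern is exactly "voxel is set when z > 72" and that `DEPTH = 144`. `rangeMask` and `halfSpaceWord` are hypothetical names. For the five words of a column, it gives 0, 0, `0xFFFFFE00`, `0xFFFFFFFF` and `0x0000FFFF`, the same constants as the hard-coded stores in the chat.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t DEPTH = 144;     // assumed, from the chat
constexpr uint32_t THRESHOLD = 72;  // pattern: voxel set when z > THRESHOLD

// Mask with bits [lo, hi) set, for 0 <= lo <= hi <= 32.
uint32_t rangeMask(uint32_t lo, uint32_t hi) {
    if (lo >= hi) return 0u;
    uint32_t upper = hi == 32u ? 0xFFFFFFFFu : (1u << hi) - 1u;  // bits [0, hi)
    return upper & ~((1u << lo) - 1u);                            // drop bits [0, lo)
}

// Bit i of the result is set iff THRESHOLD < zBase + i < DEPTH.
uint32_t halfSpaceWord(uint32_t zBase) {
    uint32_t firstSet = THRESHOLD + 1u;                      // first z that passes
    uint32_t lo = firstSet > zBase ? firstSet - zBase : 0u;  // first set bit in this word
    uint32_t hi = DEPTH > zBase ? DEPTH - zBase : 0u;        // one past the last valid bit
    if (lo > 32u) lo = 32u;
    if (hi > 32u) hi = 32u;
    return rangeMask(lo, hi);
}
```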
the reason we're going layer per layer along the z axis is because that way we dont need to do atomic stuff for the word.
or something like that idk
We do it for the love of the game
I think the cache thing is more than a micro-optimization
i'm just distracting myself from finding out where my code is blocking.
the atomic shit from before was painfully slow compared to what we have rn
well yeah but its not even finished
I'm also distracting myself from working on array manipulation simulator
also known as NN
Yeah you can do that. I'm saying you should have each thread store adjacent values.
`uint wordIdx = wz + x * NUM_WORDS_Z + y * WIDTH * NUM_WORDS_Z;`
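One way to see the difference is to check how far apart two consecutive words of one column land in memory. Here is a C++ sketch using the WIDTH/HEIGHT/DEPTH values from the chat. The two index helpers are hypothetical names that restate the two `wordIdx` formulas. In the original layout, words `wz` and `wz + 1` of the same column are `WIDTH * HEIGHT` words apart. In the z-first layout, they are adjacent.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t WIDTH = 570, HEIGHT = 1140, DEPTH = 144;  // from the chat
constexpr uint32_t NUM_WORDS_Z = (DEPTH + 31u) / 32u;        // 5

// Original layout: x fastest, then y, then wz.
constexpr uint32_t idxXYZ(uint32_t x, uint32_t y, uint32_t wz) {
    return x + y * WIDTH + wz * WIDTH * HEIGHT;
}

// z-first layout: all words of one (x, y) column are contiguous.
constexpr uint32_t idxZXY(uint32_t x, uint32_t y, uint32_t wz) {
    return wz + x * NUM_WORDS_Z + y * WIDTH * NUM_WORDS_Z;
}
```

Both layouts cover the same index range, so switching only changes the index formula and whatever reads the buffer. It is worth measuring rather than assuming, though. On GPUs, coalescing matters too: neighbouring threads should touch neighbouring addresses. In the original layout, threads with adjacent x already write adjacent words for each `wz`.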
Let me try and explain and someone else hop in for exacts
```glsl
volume.data[baseIdx + 0u * stride] = 0u;
volume.data[baseIdx + 1u * stride] = 0u;
volume.data[baseIdx + 2u * stride] = 0xFFFFFE00u;
volume.data[baseIdx + 3u * stride] = 0xFFFFFFFFu;
volume.data[baseIdx + 4u * stride] = 0x0000FFFFu;
```
what
all just computed with bitwise
I'm assuming the goal of rotating the array in memory is to make Z the first dimension iterated on, for cache reasons. My brain is dead today so sorry if I'm not entirely following
Yes
otherwise it needs to constantly reload far-apart cache lines to insert new values
z threading
That'd be stored in memory as Z X Y or something similar, probably ZXY or ZYX depending on whether X or Y is smaller
somehow the code i did in python for game engine stuff was so much less shit
abstraction
python did have direct shader code fields, so not that much
surprised they let their users have direct anything
like storing the glsl shader as a string to compile
NumPy has more guardrails than the OpenGL library used
when the shaders compile and crash your entire pc during final fantasy 14 trying to make a cute outfit on a wednesday afternoon after making your exams

The other end is what we call "ubershaders" where the developer just has one shader which does everything. Don't make ubershaders.
me designing an algorithm to detect when the user is performing well in the multiplayer online game so I can forcefully rebuild shaders
not speaking from personal experience, trust me bro

if its a really simple shader it should be fine
but us gamedevs dont need to be like pixar
we dont need the principled BSDF
we love the entire character model in a shader tho
reminds me of Shadertoy, where people implement audio and other shit in shader code
just make an entire game like that
I saw a thing on Shadertoy like yesterday, an entire scene built in one shader
thats the smallest thing on their website
keep looking at the other projects
lmao
That is very much true
This array never changes
I should make it a constant instead of a bound buffer
shader gets slower
me when i dont properly utilise the gpu hardware
shortage will fix that
nope it'll just breed sparse fp4 frame gen
If they actually nuke FP64 on consumer cards like they've done on the B300 I might just never upgrade past this 3060 in my laptop
No point in ever doing it
We goin back to GBA color quality 
No need for VRAM if you draw with an electron beam in realtime
shader microservices
Shader as a Service
at that point we need to restart the human race
render farms 

no need to render
blank paper
the frames would atleast be real
GBA? that's full color, with transparency, but no red channel? so for color blind people? (advanced humor, sorry)
systemctl --user restart fortnite_dance_renderer
took me a while
can’t read gba as anything other than game boy advance, it's meant that my whole life
I overthought the joke, that makes sense now
I am dying at the cropped text
$00B
i have another one with human centipede
when i was a kid even before school EVERYONE either had a gba [sp] or a psp. in kindergarten, on random playgrounds, everywhere
i was one of the psp bunch i had that
oh and there sometimes was an occasional kid with a ds which would always be the coolest one because woah a ds what’s that
ds not popular here

I only had the DS, already went past GBA's prime by the time I got something for myself
It's wild how closely the DS and GBA launched
Nintendo wanted separate product lines which made 0 sense
2001 and 2004 I think
and by the time i went to school the coolest thing to have was an ipod touch who cares about a ds or a psp lmao what’s allat lame stuff

Our schools had rules against kids using tech outside of school computers, so there were no tech based trends until HS and that was "newest iPhone"
tbf I went to HS with tiktok so that shows me as youngling
we did too but tbf our school was an exception, didn’t prevent us from sneaking it in
i did too but people in nn call me unc? 
when they hear my age
I have a feeling if I were to say I'm 21 I'd also be called unc
you will because i am 21 

403 new message 
Forbidden
wait wha
Yep, I thought it was funny lol
did something happen?
Nah, just a coincidence
om
if 21 is unc then I'm a pile of ash
urn
btw, laptop
Startup finished in 12.532s (firmware) + 6.067s (loader) + 2.895s (kernel) + 2.213s (userspace) = 23.708s
graphical.target reached after 2.189s in userspace.
dam
insane
19.010s (firmware) + 5min 43.346s (loader) + 2.833s (kernel) + 5.342s (initrd) + 13.312s (userspace)
AM5 moment
(the loader time is probably just because I started the PC and walked away before entering the boot drive encryption password)
i wonder, is disk encryption handled by hardware?
For my boot disk no. For the 990 Evo+ that stores my games yes.
You can enable Opal (self-encryption) for drives, but not all drives support it. You need to secure erase the drive using the PSID from the label, then explicitly pass --hw-opal-only to cryptsetup. By default cryptsetup will just do it in software, which probably uses the AES instructions on newer x86 CPUs.
My boot disk is buried under a GPU and I don't feel like re-installing the OS so I haven't bothered fixing that to use self-encryption. 
ok, i just use a laptop, and i should probably use encryption (it's the only way to secure data if the laptop is stolen)
will research it later
or never
idk
340W... yeah, beast
340W for 29fps 
the compute shader is mayhaps really fucking bad

oh ye, that doesnt embed
the lack of syntax highlighting doesnt help
i forgot i could change it in discord itself tbh
why didn't people invent an hdd with a magnetic mobius strip inside
are they smart
You're going to project that to the church? 
im pretty sure i forgot to do the index buffer stuff
im sure its fine
Hellou
I bring gifts
include 👩 🍆 💀
if (👩){
if(🍆){
🍆💀 = True;
}
}
it needs an egg

Seems like it would make more sense to make the triangles the outer loop, and share all of those initial calculations for each loop over z.
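Here is a CPU-side C++ sketch of that restructuring for a single (x, y) column. It assumes a simple surface test: a triangle marks the voxel whose z-slab [k, k+1) contains the point where the vertical line through (x, y) hits it. This is an assumption; the shader's real test (the `TriCache` fields suggest distance math) may differ. `Vec3`, `Tri`, `columnHit`, `zOuter` and `trisOuter` are hypothetical names. With z outermost, the per-triangle setup runs DEPTH times per triangle. With triangles outermost, it runs once, and the hit z picks the bit directly, so the inner z loop disappears.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr uint32_t DEPTH = 144;  // assumed, from the chat
constexpr uint32_t NUM_WORDS_Z = (DEPTH + 31u) / 32u;

struct Vec3 { float x, y, z; };
struct Tri { Vec3 a, b, c; };

// Per-triangle setup for one column: barycentric weights of (px, py) in the
// triangle's xy projection. On a hit, zHit is the triangle's z at (px, py).
bool columnHit(const Tri& t, float px, float py, float& zHit) {
    float det = (t.b.x - t.a.x) * (t.c.y - t.a.y) - (t.c.x - t.a.x) * (t.b.y - t.a.y);
    if (det == 0.0f) return false;  // edge-on in xy
    float w1 = ((px - t.a.x) * (t.c.y - t.a.y) - (t.c.x - t.a.x) * (py - t.a.y)) / det;
    float w2 = ((t.b.x - t.a.x) * (py - t.a.y) - (px - t.a.x) * (t.b.y - t.a.y)) / det;
    float w0 = 1.0f - w1 - w2;
    if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) return false;
    zHit = w0 * t.a.z + w1 * t.b.z + w2 * t.c.z;
    return true;
}

// z outermost: the setup is redone for every slice.
std::vector<uint32_t> zOuter(const std::vector<Tri>& tris, float px, float py, int& setups) {
    std::vector<uint32_t> words(NUM_WORDS_Z, 0u);
    for (uint32_t z = 0; z < DEPTH; ++z)
        for (const Tri& t : tris) {
            float zHit;
            ++setups;
            if (columnHit(t, px, py, zHit) && zHit >= z && zHit < z + 1.0f)
                words[z / 32u] |= 1u << (z % 32u);
        }
    return words;
}

// Triangles outermost: one setup per triangle, and the hit z picks the bit.
std::vector<uint32_t> trisOuter(const std::vector<Tri>& tris, float px, float py, int& setups) {
    std::vector<uint32_t> words(NUM_WORDS_Z, 0u);
    for (const Tri& t : tris) {
        float zHit;
        ++setups;
        if (!columnHit(t, px, py, zHit) || zHit < 0.0f || zHit >= float(DEPTH)) continue;
        uint32_t z = static_cast<uint32_t>(std::floor(zHit));
        words[z / 32u] |= 1u << (z % 32u);
    }
    return words;
}
```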
But uh, that amount of branching is 💀
I've seen GPUs choke and die from that kind of looping.
i want my compute shader optimized
That's a fish?
Technically?
nope
Dolphin emulator?
Must've been royalty fish then
Royal tea
dolphin is a mammal
Mammal fish
not fish
nope, whale, so mammal
Enjoy debugging this

i put my mic on top of my pc to record that
is that 1 bit
ye
I'm seeing trace amounts of a gradient... I'm not sure if that's not because of the spinning tho
oh it was
it has to be cuz of obs and/or the spinning
You're welcome sam
how many do we need 
144
well, only 30 really
but i want it to be able to run on other hardware than my 3090
i did some off-camera mining 
Is this handling mapping to the spinning boi or not yet?
this compute shader is for turning a 3D model with polygons into voxels
which is the hard part
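One common approach for closed, watertight meshes is a parity (scanline) fill along z. For each (x, y) column, collect the z values where the vertical line crosses the mesh, sort them, and mark the voxels between each entering and leaving crossing as inside. This isn't necessarily what the shader here does. Below is a C++ sketch of just the fill step. It assumes the crossings are already known and that voxel k counts as inside when its centre k + 0.5 lies between a pair. `setRange` and `fillColumn` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t DEPTH = 144;  // assumed, from the chat
constexpr uint32_t NUM_WORDS_Z = (DEPTH + 31u) / 32u;

// Set bits [lo, hi) of a column stored as 32-bit words, 0 <= lo <= hi <= DEPTH.
void setRange(std::vector<uint32_t>& words, uint32_t lo, uint32_t hi) {
    while (lo < hi) {
        uint32_t w = lo / 32u, bit = lo % 32u;
        uint32_t n = std::min(hi - lo, 32u - bit);  // bits to set in this word
        uint32_t mask = n == 32u ? 0xFFFFFFFFu : ((1u << n) - 1u) << bit;
        words[w] |= mask;
        lo += n;
    }
}

// Parity fill: sorted crossings come in enter/exit pairs for a closed mesh.
std::vector<uint32_t> fillColumn(std::vector<float> crossings) {
    std::vector<uint32_t> words(NUM_WORDS_Z, 0u);
    std::sort(crossings.begin(), crossings.end());
    for (std::size_t i = 0; i + 1 < crossings.size(); i += 2) {
        // Voxel k is inside if crossings[i] <= k + 0.5 < crossings[i + 1].
        float a = std::max(crossings[i] - 0.5f, 0.0f);
        float b = std::min(crossings[i + 1] - 0.5f, float(DEPTH));
        if (b <= a) continue;
        uint32_t lo = static_cast<uint32_t>(std::ceil(a));
        uint32_t hi = static_cast<uint32_t>(std::ceil(b));
        setRange(words, lo, std::min(hi, DEPTH));
    }
    return words;
}
```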
the fragment shader rn is just doing 1 frame per. but i still need to make it pack 24 frames into 24-bit rgb
lemme do that rn
temporal bit-plane multiplexing
as the youngins say
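Here is a CPU-side C++ sketch of packing 24 one-bit frames into one 24-bit RGB pixel. It assumes frame f goes to bit f, with red holding frames 0 to 7, green 8 to 15 and blue 16 to 23. The order in which a given projector displays bit planes is hardware-specific, so this mapping is an assumption. `RGB8`, `packFrames` and `unpackFrame` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct RGB8 { uint8_t r, g, b; };

// frames[f] is the 1-bit value of this pixel in sub-frame f (0..23).
RGB8 packFrames(const std::array<bool, 24>& frames) {
    uint32_t bits = 0;
    for (uint32_t f = 0; f < 24; ++f)
        bits |= static_cast<uint32_t>(frames[f]) << f;
    return RGB8{static_cast<uint8_t>(bits & 0xFF),           // frames 0-7
                static_cast<uint8_t>((bits >> 8) & 0xFF),    // frames 8-15
                static_cast<uint8_t>((bits >> 16) & 0xFF)};  // frames 16-23
}

// Inverse: the value sub-frame f had in this pixel.
bool unpackFrame(RGB8 px, uint32_t f) {
    uint32_t bits = px.r | (uint32_t(px.g) << 8) | (uint32_t(px.b) << 16);
    return (bits >> f) & 1u;
}
```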
I have 3060 so...
I don't even get that much
Hell yeah, caramelldansen!
imma put this shit on my github i guess
Thanks sam
tasty code, I sure hope I run at frames per second 

to be fair
rn
the compute shader runs at the same speed as the render shader
and in reality it should only be doing 30fps while the other one does 180 or so
perfect for my "5 min after waking up on a sunday" foggy mind. Good Morning.
goodmorning
dont look too close at shell.nix. chatgpt made that
help
ohhhhh that's why my entire system freezes when that one opens
dont use the vulkan shaders
i think that might just be the compute shader
i think most of it is set to work well on a 3090
```glsl
shared Triangle sharedTris[128];
struct TriCache
{
    vec3 e0, e1, e2;
    vec3 n;
    vec3 v1;
    vec3 minV, maxV;
    float d00, d11, d01, d22;
    float rcp_d00, rcp_d11, rcp_d22;
    float invDet;
    float sqrt_n2;
    float ds, dt;
};
shared TriCache triCache[128];
```
mainly this cache shit is something you dont want on a laptop
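Here is some rough arithmetic on why that cache is heavy. Assume each `vec3` is stored as 12 bytes; the layout of `shared` variables is implementation-defined, and padding `vec3` to 16 bytes would only make it bigger. Then one `TriCache` is 32 floats, or 128 bytes, and `triCache[128]` alone is 16 KiB per workgroup, before counting `sharedTris`. The more shared memory a workgroup uses, the fewer workgroups can be resident on an SM at once. Below is a C++ sketch of the count, with `TriCacheMirror` as a hypothetical float-for-float mirror of the GLSL struct.

```cpp
#include <cassert>
#include <cstddef>

// Float-for-float mirror of the GLSL TriCache, assuming tightly packed vec3s.
struct TriCacheMirror {
    float e0[3], e1[3], e2[3];
    float n[3];
    float v1[3];
    float minV[3], maxV[3];
    float d00, d11, d01, d22;
    float rcp_d00, rcp_d11, rcp_d22;
    float invDet;
    float sqrt_n2;
    float ds, dt;
};

constexpr std::size_t kCacheEntries = 128;  // size of triCache in the shader
constexpr std::size_t kPackedBytes = sizeof(TriCacheMirror) * kCacheEntries;
```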
It suddenly is fine after a bit, wonder what could be happening 
let me check, what make am I supposed to be using because currently this mf on vulkan
aint shii...