#Rosy
yeah, as an indie game dev
what kind of games are you making
good luck, that's awesome
nothing unfortunately, I got many ||burnout sessions|| and NIHd a lot since 2024. had unsuccessful team experience. at this point I'm just too deep and prideful to give up and just trying to scramble something
due to where I live my expenses are not that high, so I am able to take the risk. I wouldn't risk it if I were in europe mainland or hell, us
but currently since I abandoned my last NIH project a week ago, a horror game using godot 😂 but I suck at game design and been trying to figure out how I can make something actually fun
horror is a tough one too lol
the atmosphere has to be just right
what genre of game do you enjoy playing the most
unfortunately coop experiences
coop like online multiplayer stuff, or story games that are coop
i am making a horror game too, still building the basic engine tho 
2-4 player games, like those coop job simulation games, or coop horror games like lethal company, phasmophobia, survival games like project zomboid, valheim. there isn't a specific genre that I can pinpoint
you got this
oh, hey 👋 the failed team initiative I mentioned was also a horror game, mostly a phasmophobia clone, or a phasmophobia variant we wanted. then I bullied myself into not using an engine, did some rust NIH stuff with wgpu. then zig & odin vulkan stuff recently. there were some breakaway attempts at using engines but NIH was strong. I mostly lacked a clear project goal though, so I randomly chose technical issues to pursue instead
like "this is going to be a 3d game that I don't know yet", that vague
I love psychological, dark and ominous horror movies, I haven't ever played a horror game although Metro Exodus and HL2 kind of have horror elements, I played Alien Isolation for a bit, I guess that's kind of a horror game, but probably none of those are what people think of when they think horror game
real
but recently, after the talk with bjorn a week ago, around the same time I saw something on twitter about a local indie/publisher, I decided to go all in on engines. yeah I still think it sucks and I hate it but I don't have much choice atm. it is what it is, time to accept it
that's what I would do
psychological horror is cool
if I ever want to make a game in the future, I would use Unreal Engine and I would find 2 talented and committed artists to partner with who also wanted to make a small game. If I can't find that I'm not making a game.
alien isolation was so scary, I watched a friend playing it
it's really fun, but I'm not good at video games
and so I cannot do well
I am really grateful for bindless in vk, because I'm currently writing my descriptor set initializer and I had forgotten how horrible the descriptor set API is
same
push constants working
now I have to write a bunch of memory allocators and math code before the pixels can change in an interesting way
my videos always demo a full rebuild to launch as cl doesn't do incremental builds and it's a unity build anyway :P
not that there's a lot going on anyway
it's probably around 1.5K loc of mostly create info struct spam
there's probably a raddbg cli arg to just start the program that would make it faster, since a lot of the initial delay is just me staring at the screen waiting to press F5 on the keyboard
did some research today for how to blit to a win32 window handle, I think it will take me a few tries before I understand how to even draw anything
also I guess I will start on a bit of math first before I start on an allocator, because I can push an orthographic projection via a push constant, and know that my math is working
then resizing the window won't skew the triangle
and that math will help me as a reference for the math in the software rasterizer when I get to that
so I have two GP projects now, my C vulkan renderer and my win32 software rasterizer
only the software rasterizer is just restarted every day I do it
and a third is the computer enhance course sorta
have you ever tried that minimal windowing lib for writing diy software rasterizers
blanking on the name even though I've used it in the past
nanokatze has a gist pinned in their gists about resizing vulkan stuff
I haven't ever done a software rasterizer
I have an idea for how it should work based on my understanding of the math and the graphics pipeline
my window resizing works fine, it's just there's no projection of any kind so the NDC coordinates are just naively sent as screen coordinates and that's why I got the tall boi triangle up there I shared recently
I just need to write some math code to do projection and send it via push constant until I have an allocator and can actually create buffers on the GPU
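A minimal sketch of the projection fix described above, assuming a column-major 4x4 matrix and Vulkan's clip conventions (y pointing down, depth in 0..1). The names `mat4`, `ortho_vk` and `ortho_for_window` are made up for illustration and are not from the project. Widening the x range by the window's aspect ratio is what should stop the triangle from stretching on resize.

```c
#include <assert.h>

/* Column-major storage: element (row, col) lives at m[col * 4 + row]. */
typedef struct { float m[16]; } mat4;

/* Orthographic projection for Vulkan clip space.
   x: [l, r] -> [-1, 1], y: [t, b] -> [-1, 1] (top maps to -1), z: [n, f] -> [0, 1]. */
mat4 ortho_vk(float l, float r, float t, float b, float n, float f) {
    assert(r != l && b != t && f != n);
    mat4 o = {{0}};
    o.m[0]  = 2.0f / (r - l);
    o.m[5]  = 2.0f / (b - t);
    o.m[10] = 1.0f / (f - n);
    o.m[12] = -(r + l) / (r - l);
    o.m[13] = -(b + t) / (b - t);
    o.m[14] = -n / (f - n);
    o.m[15] = 1.0f;
    return o;
}

/* Keeps y in [-1, 1] and widens x by the aspect ratio, so a unit square
   stays square no matter how the window is resized. */
mat4 ortho_for_window(int width, int height) {
    assert(width > 0 && height > 0);
    float aspect = (float)width / (float)height;
    return ortho_vk(-aspect, aspect, -1.0f, 1.0f, 0.0f, 1.0f);
}
```

The result is 64 bytes, which fits inside the 128 bytes of push constant space Vulkan guarantees, so it can be recomputed on every resize and pushed as-is.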
I'm going to fill in all the host allocator stuff in the vulkan api too though, not just for the gpu allocation
those optional allocator parameters when creating things during init etc
I always forget the point of those allocator callbacks in Vulkan
I think it's for debugging drivers
I just want to do it for fun
and to be responsible for all of the memory my process allocates
as the spec says, none of that allocation is happening in the critical paths
It would be interesting to see when the allocator is invoked in the driver
hrm true, I'm curious as to when it wants to realloc
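For seeing when the driver allocates, reallocates and frees, a counting allocator is one option. This is a rough sketch and not a drop-in: `alloc_stats_t`, `tracked_alloc`, `tracked_realloc` and `tracked_free` are hypothetical names, and the `int scope` parameter stands in for `VkSystemAllocationScope` so the block compiles without `vulkan.h`. The parameter order follows `PFN_vkAllocationFunction`, `PFN_vkReallocationFunction` and `PFN_vkFreeFunction`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Counters reached through pUserData, so you can see what gets requested. */
typedef struct {
    size_t allocs, reallocs, frees;
    size_t live_bytes;
} alloc_stats_t;

/* Bookkeeping stored directly in front of every block handed out. */
typedef struct {
    void  *base;  /* pointer malloc actually returned */
    size_t size;  /* size the caller asked for */
} block_header_t;

static block_header_t *header_of(void *p) { return (block_header_t *)p - 1; }

/* Over-allocates, then rounds up to the requested power-of-two alignment. */
static void *raw_alloc(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (alignment < _Alignof(block_header_t))
        alignment = _Alignof(block_header_t);
    unsigned char *base = malloc(size + alignment + sizeof(block_header_t));
    if (!base) return NULL;
    uintptr_t start   = (uintptr_t)base + sizeof(block_header_t);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    block_header_t *h = (block_header_t *)aligned - 1;
    h->base = base;
    h->size = size;
    return (void *)aligned;
}

void *tracked_alloc(void *user, size_t size, size_t alignment, int scope) {
    alloc_stats_t *st = user;
    (void)scope;  /* a real version could bucket the counters per scope */
    void *p = raw_alloc(size, alignment);
    if (p) { st->allocs++; st->live_bytes += size; }
    return p;
}

void tracked_free(void *user, void *p) {
    alloc_stats_t *st = user;
    if (!p) return;  /* freeing NULL must be a no-op */
    st->frees++;
    st->live_bytes -= header_of(p)->size;
    free(header_of(p)->base);
}

void *tracked_realloc(void *user, void *orig, size_t size, size_t alignment, int scope) {
    alloc_stats_t *st = user;
    if (!orig) return tracked_alloc(user, size, alignment, scope);
    if (size == 0) { tracked_free(user, orig); return NULL; }
    void *p = raw_alloc(size, alignment);
    if (!p) return NULL;  /* on failure the original block stays valid */
    size_t old = header_of(orig)->size;
    memcpy(p, orig, old < size ? old : size);
    st->reallocs++;
    st->live_bytes = st->live_bytes - old + size;
    free(header_of(orig)->base);
    return p;
}
```

To wire this into Vulkan, the three functions would need the real `VKAPI_PTR` signatures and would go into a `VkAllocationCallbacks` struct (`pUserData`, `pfnAllocation`, `pfnReallocation`, `pfnFree`) that is passed to the create and destroy calls.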
I don't know how hard the memory work that VMA usually does will be
Rosy used Volk, VMA, Dear Imgui, fastgltf, the fbx sdk, and SDL3, and a json library for the level state, and the C++ std lib, which offers a lot
I don't have any of those things anymore
oh and I used the nvidia image library that did all the compression
and mikktspace
hrmm
I got a lot of work to do
so you are writing your own vulkan loader too?
the work for VMA is only as hard as you need your allocator to be
I prefer not having to think about it personally, it's just not that big of a deal to learn about tbh
yes I am writing everything from scratch
I can't write the OS or C runtime or Vulkan stuff from scratch, but everything else
we'll see how it goes with images,
I'd like to load the khronos sample gltfs and render them and their images correctly, but that's not for a long time
Skill issue. 
making ur own OS 
The GPU driver stack generally makes that infeasible unfortunately if you want to use the GPU
Maybe some insane mf has written a bare metal driver for AMD or something
you order it and you get a GPU covered in fur
pull a jonathan blow
yeah I'm not a jonathan blow, I didn't know what SRGB was like a year ago
I nearly got a blit to my window working in today’s video. After the recording I wanted to see how close I was and got my bytes to color the window within a few minutes. Huge progress from yesterday
What is the fopen referring to?
Is that how you blit to the window? Open the window buffer as a file?
ah looks like a video card thing
dri is linux' way to talk to the gpu
and on lunix everything is a file
Stream from September 16, 2024 at https://twitch.tv/sphaerophoria
Join on youtube for happy hour vods https://www.youtube.com/channel/UCXzL31BCxf8En1KT34gSK6g/join
Or on patreon https://patreon.com/sphaerophoria
00:00 Intro
17:33 Hello world in VM
32:00 Find currently active connector
01:26:15 Find preferred resolution
01:36:40 Draw stuff on th...
I had to play with that at work once. Kinda funky how you can just sorta jack whatever's on screen.
to the gpu driver, right?
I was thinking the build-a-rasterizer-from-scratch-in-one-hour thing was going to be the side project to my main project, the C vulkan renderer, but instead it has become the main project, and I think I will only have time to work on my vulkan renderer on the weekend, which is fine. I'm having a lot of fun with the daily task of making more progress. It's challenging too, and feels like real learning.
There's a reason most good graphics courses are built around software raster
At least ones focused on real time foundations
I haven't even gotten to the raster yet lol, I'm still having trouble with basic C and the win32 api, but I am thinking through that problem
one of my favorite projects of learning vulkan was a software rasterizer that evaluated in quads like a GPU and interpreted spirv
I really have to condense everything down and get faster on things as I make progress, I'm going to add time stamps to my video for each milestone to measure how long it took me to get through each step
really learned a lot about spirv there
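A small sketch of what "evaluating in quads" can look like in a software rasterizer: the screen is walked in 2x2 blocks, each of the four lanes runs the same edge-function test at its pixel center, and a quad is launched if any lane is covered. Everything here (`vec2`, `edge`, `rasterize_quads`, the 8x8 framebuffer) is an illustrative assumption, not the project mentioned above. It also skips the top-left fill rule, so pixels exactly on a shared edge could be drawn twice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 8
#define H 8

typedef struct { float x, y; } vec2;
typedef struct { int quads_launched; int pixels_covered; } raster_stats;

/* Edge function: >= 0 when p lies on the inner side of edge a->b
   for the vertex order used by the caller. */
static float edge(vec2 a, vec2 b, vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

raster_stats rasterize_quads(vec2 v0, vec2 v1, vec2 v2, uint8_t fb[H][W]) {
    raster_stats st = {0, 0};
    for (int qy = 0; qy < H; qy += 2) {
        for (int qx = 0; qx < W; qx += 2) {
            /* Coverage test for the four lanes of this 2x2 quad. */
            int mask = 0;
            for (int lane = 0; lane < 4; ++lane) {
                vec2 p = { (float)(qx + (lane & 1)) + 0.5f,
                           (float)(qy + (lane >> 1)) + 0.5f };
                if (edge(v0, v1, p) >= 0.0f &&
                    edge(v1, v2, p) >= 0.0f &&
                    edge(v2, v0, p) >= 0.0f)
                    mask |= 1 << lane;
            }
            if (mask == 0) continue;  /* nothing covered, skip the quad */

            /* A GPU would run all four lanes here (uncovered ones as helper
               lanes for derivatives); only covered lanes write their pixel. */
            st.quads_launched++;
            for (int lane = 0; lane < 4; ++lane) {
                if (mask & (1 << lane)) {
                    fb[qy + (lane >> 1)][qx + (lane & 1)] = 1;
                    st.pixels_covered++;
                }
            }
        }
    }
    return st;
}
```

The gap between `quads_launched * 4` and `pixels_covered` is the helper-lane overhead, which is why thin triangles are expensive on real hardware.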
like just opening up the window took up the full first video
and on the second one it took 30 minutes
and then that'll just keep shrinking
it's also pretty cringe to watch the videos myself, I see all the mistakes I made after I made them, seems obvious afterwards
but with previous projects any new thing I did was always hard as I did it, and then once I moved on to other things I would actually forget how I solved those problems
having to do it again day after day is a really great way to imprint things onto my brain
it's fun though
I don't care if anyone ever watches the videos it's kind of exactly what I want to do right now
but a part of me thinks if I get good and fast enough that the dramatic difference between the first video and the video when I'm actually making good progress will be interesting to people
went from struggling to open a window to building something that does something cool
I am not sure if I explained the thing
Every day I start with a completely empty project and the goal is to get it done in an hour
@cloud rivet I somehow doubt your crashes are due to validation fails
usually the VVL's complaints are irrelevant to the drivers
I see, hrm. I will see if I can investigate the crash more. We can assume it’s not vcc related since I don’t have any direct evidence.
I bet there’s a bug in my arena allocation
It’s an access violation and the size of the vcc spv is much larger than what slang generates
If that’s not it I will learn the spirv spec and try to find a minimal way to reproduce
Correlation is not causation moment
I have learned from my C shenanigans that C++ shielded me from how horribly you can break not just your application but also your computer with bad bugs. Just absolutely bizarre behavior after the application exits.
I think yesterday I put Windows into a bad state with UB
It’s pretty cool
I did that during recording. So there’s video evidence
I figured out after what code of mine was causing all that, not how, but it was obviously UB and fixing it fixed the mess it was making
You can do this with C++ too if you try 
I love computers
the size of the generated code is not a useful metric
also it should have gone way down with the latest commits
it might very well be a Vcc issue, or an nvidia bug, or some weird edge case where both meet
the fact you're using shader objects might have something to do with it
i haven't tried that codepath at all
I will focus on reproducing this weekend
thanks!
no problem, will be fun
I could eliminate SO by using the alternative and see if that works as a first step
I will try that
one of my videos got a like from someone and the views 10x'ed, then the next did not and it went back to next to none.
effectively no views, which is fine
just interesting, I think I will try to make the videos better though, they are cringe
like why make a video at all if they're going to be bad
I'm going to build like a premiere template or something to make video editing fast, I can't spend a lot of time on that as I hate it, not editing the videos at all right now
I can spend a little bit of effort though and it will probably produce a nice outcome
I really like making the videos and doing this exercise, but it does kind of make my skills and problem solving performance, or lack of, very public
nobody's watching them though, so it's fine
it's cool to see how I made progress from one video to the next
which is why I am doing that
Wait did you link the videos here?
no I haven't shared them, they're very bad right now
they're just going into the YT algo without me marketing anything
well, I don't want you to, but if you want to, I link my YT in my profile, but they are super cringe atm, and will get better
maybe you have some feedback about what I could be doing better. One problem is that I live in a very small place, I do not have my own office, so my family can sometimes be heard in the background
also the first video has no audio
because I didn't know what I was doing
the audio on the second video is also very bad, I bought a new headset
Noise gate
are you using obs?
yes
it blocks voice below certain freq
Not exactly
👀
Obs has a noise gate in the audio filters
oh nice
It lowers the volume when you are not speaking
I saw premiere had audio options too, so I was going to look into those too
like on the weekend figure out a nice template
It's like the opposite of a compressor
ah
Another tip. Get as close as possible to your mic
oh this one
I was using nvidia's noise reduction thingy
similar to discord's one in function I think
I'll check out your vids in a little bit and see if I have more suggestions
thank you
I was thinking a mic would pick up even more of the surrounding sounds than a headset mic
is that incorrect?
the headset already picks up a lot of ambient noise
maybe there's an attachment or something for that
I don't really want to invest too much into this until I know I want to keep doing it too
this kinda sounds like speedrun training more than anything too useful for programming tbh
yeah kinda, but I learn from it
but it is definitely geared towards taking shortcuts
I'm not trying to be an educator, nobody should learn from me lol, I don't know what I am doing
you can't get good results without being effective, even if it is speedrunning
as long as you are having fun, that's the only thing that matters unless it is your job
yeah
it is fun
it has some moments of stress when things go unexpected
I'm just going to keep doing it until I want to do something else idk
Not necessarily. Mics have different polar patterns, the one you want is cardioid. With headsets it's usually a tiny electret mic, which are omnidirectional unless the headset designer carefully designed the housing to make it cardioid, which is hard to do in a tiny enclosure
Having a cardioid mic close to your face also gives a natural bass boost which gives people the "radio voice" which might be desired
@cloud rivet background noise isn't too bad, but your headset mic is off axis, so it sounds very tinny
the mic should be basically right in front of your mouth
it could also just be a mediocre headset
Are you willing to buy a new mic? You don't have to spend all that much to get a big improvement
if you have a recommendation for a good mic I would appreciate it! 🙏
how much do you wanna spend?
yeah you don't even need that much
I don't spend money ever, my biggest expense is coffee
the AT2020 is a good workhorse and a solid mic
ok, I will look at that one
It comes in two versions a USB version and an XLR version
I don't know what XLR is, does that require hardware?
if you get the XLR one you also need an audio interface
It gives you more control, and more options if you wanna upgrade later
ah the good ol' red box
I use a focusrite 18i20, which is like a bigger version of that. and an AT3035 which is like an older and slightly higher end version of the AT2020
so if you get the focusrite and the AT2020, it will be pretty close to my setup
idk about mic stuff but you'll definitely want that audio interface if you ever decide to get into electric guitar
Foldable Microphone Boom Arm This adjustable mic boom arm is easy to fold and adjust for the perfect recording angle and height. Just loosen the adjustment knob and adjust. The boom arm offers a 360° rotation and 135° tilt. The angle between the two arms is 180° adjustable. Ideal for stage/studio...
this is the boom arm I use
most electric guitar you hear on recordings is actually just a mic on the guitar cabinet
usually a shure sm57
yeah for studio stuff, for hobbyists who just plug their guitar into their PC they use an audio interface
Sigh, future generations will never know the hearing damage I suffered doing recordings in the Toronto punk scene 
thanks, bought it
yes
no worries, feel free to hit me up if you run into any issues
Oh yeah one more bonus for using the interface instead of the usb version is you get zero latency monitoring
so you can hear your voice coming in through the mic before it goes into the computer
nice
finally making some progress beyond the win32 C stuff
LegendsOfTotalwar is a YT creator who makes videos solving other people’s failing Total War Warhammer games.
Anyway, his thumbnails are amazing. You have to know the game really well to get them, but they are often hilarious.
Being able to stick a joke into a thumbnail that your target audience gets is probably top tier thumbnail game. Would be cool to see if I can figure that out for my thing
I am not going to daily spam this channel with my YT videos btw. Maybe sometimes when I achieve a new milestone
Going to work on trying to get to the bottom of my vcc issues this weekend
bjorn be living up to his name like a viking metalhead XD
cool
:P
it's even worse when I have a beard
people can probably guess my name is Bjorn without me telling them
ok that's enough PFPs from me lol
I see why you're called Bjorn since you look like that
Nominative determinism?
goal for this weekend is to create pipeline shaders to see if that gets my vcc shaders to run, build out some math, build a premiere template of some kind and write some neovim code to make building bat files easier and faster
the new mic, scarlett and stand work very nicely thank you!
Awesome, can't wait to hear them in action
anyway
awesome, I will shout out ASO :P
for all two viewers, one of which is you
It sounds so much more professional now
yeah it is good
unrelated, I envy you being ~15 years older than me and still having that perfect hairline
I too have long hair but I almost lost half of it
I turn 50 in 17 months, maybe it will all fall out then
I don't even know how to create shaders without SO, I have to like go look that up
I think I sort of know
the only reason I switched to Vk was because some actual game developer convinced me that SO meant VK was now just "fancy opengl"
turns out AMD is just emulating SO and there's a new shader objects v2 draft that's not public yet
I asked on the AMD discord
Current situation is: RADV (Linux) has a "native" implementation of EXT_shader_object. AMDVLK (Windows) just emulates it on top of EXT_graphics_pipeline_library for now.
There is a more fleshed-out shader_object v2 spec in the works, but that won't be ready to talk about for a while yet.
I don't really have a lot going on in palinode yet so it's fine
I think maybe though I don't know if I need a GPU to do graphics and have fun tbh
maybe GPUs were a mistake
you know what's interesting is
yeah I think I'm gonna let vulkan core spec mature a bit more before I go back to using it 
maybe I can use vcc code in my software rasterizer and in a shader
isn't Vulkan as old now as gl was when it hit its last version?
I don't know
and released it on June 30, 1992.
4.6, 31 July 2017
not even close
vulkan: 10 years
opengl first release: 1992
4.6: 2017
eh in 15 years it will be
yeah
so 2/3 of the way into opengl's lifespan
I think by the time vulkan is 15 AI will just on the fly generate all the frames
so no more graphics APIs :/
sorry
the vulkan core spec is slowly converging to be closer to opengl's level of dynamicness
which is nice for me, considering I am sorta heavily used to dynamicness
you might like Metal
me with my exclusively windows computers 
same
Me with my diverse cast of computers none of which are macs
Me with two computers that run Linux
I can't own more than one PC and one laptop. I could maybe, maybe have like a little mac mini or tiny little Nuc or something with linux on it
but I don't like having a server farm under my desk or in a closet or whatever like all my coworkers
man I get so nervous before I start recording and I have no idea why, it's just a recording
Are you doing any editing at all
it's a timed thing so no I don't want any cuts
I think?
I'm not sure
I might
I have premiere
I figured out how to cut with it
I am going to play with it this weekend see if I can make a nice template
I was just thinking if you don't mind doing a tiny bit of editing I'd just start rolling but plan on not starting the video right away, give yourself a minute to get situated with the recording on and then begin
just to beat the nerves of the start
that's a good idea
I guess on the other hand just starting immediately into the video does sort of force you to just accept what you get rather than giving yourself room to agonize over the perfect start
ripping the bandaid off
yeah, I just try to get to it and deal with it, I kind of just want to not hide anything about how hard this is for me, or any struggle I have
also sometimes when there's like something not working right I get some anxiety about being able to figure it out, but I force myself to focus
Shoutout to ASO ended up at the end, I forgot until the end, I'll remember to do it first next time. I will add it to my notes. Ended up having the mic too close to my face or the input too high or something and the video has unfortunate and annoying popping, I will get it right next time. :(
High quality testing.
Yeah, when you don't have a pop filter you usually have to make sure you're talking somewhat past the microphone, instead of directly at it.
vcc shader triangle 🎉
I also now have both shader objects and a graphics pipeline
#define push_constant __attribute__((annotate("shady::io::392")))
going to try push constants in a minute and then if that works I can default to vcc instead of slang
and then I can share my math.c code both application and in shader 
maybe I just do rasterization in compute and not use a graphics pipeline, tbh
it's kind of fun
then both projects are very similar and I can learn from both
vcc push constant
my shader:
#include <shady.h>
#include <stdint.h>
float floorf(float) __asm__("shady::prim_op::floor");
float fmodf(float, float) __asm__("shady::prim_op::mod");
float sinf(float) __asm__("shady::prim_op::sin");
location(0) output native_vec4 out_color;
typedef struct {
    float t;
} pc_t;
push_constant pc_t pc;
// A naive triangle
vertex_shader void main()
{
    float pi = 3.1415926535897932385f;
    native_vec4 pos;
    int i = 0;
    switch (gl_VertexIndex) {
    case 0:
    {
        pos = (native_vec4){-0.5f, 0.5f, 0.f, 1.f};
        out_color = (native_vec4){1.f, 0.f, 0.f, 1.f};
        break;
    }
    case 1:
    {
        pos = (native_vec4){0.f, -0.5f, 0.f, 1.f};
        out_color = (native_vec4){0.f, 1.f, 0.f, 1.f};
        break;
    }
    case 2:
    {
        pos = (native_vec4){0.5f, 0.5f, 0.f, 1.f};
        out_color = (native_vec4){0.f, 0.f, 1.f, 1.f};
        break;
    }
    default:
    {
        // Garbage data detectable with Renderdoc
        pos = (native_vec4){7.77f, 7.77f, 7.77f, 7.77f};
        out_color = (native_vec4){1.f, 0.f, 1.f, 1.f};
        break;
    }
    }
    float x_move_interval_ms = 10000.0f;
    float ms = floorf(pc.t * 1000.f);
    float test_1 = fmodf(ms, x_move_interval_ms); // value in [0, 10000)
    float test_2 = (test_1 / x_move_interval_ms); // normalized to [0, 1)
    float test_3 = test_2 * 2.f * (float)pi; // phase in radians, [0, 2*pi)
    float s_x_pos_mod = sinf(test_3);
    pos.x *= (s_x_pos_mod/2.f);
    gl_Position = pos;
}
my config in my gfx.c
// #define USE_GRAPHICS_PIPELINE
// #define NSIGHT
// #define RENDERDOC
// #define VERBOSE_VALIDATION
#define FIF 3
#define MIN_IMAGES 3
// #define USE_SLANG
#ifdef USE_SLANG
const char *vert_path = "..\\slang\\output\\vert.spv";
const char *frag_path = "..\\slang\\output\\frag.spv";
#else
const char *vert_path = "..\\vcc\\output\\vert.spv";
const char *frag_path = "..\\vcc\\output\\frag.spv";
#endif
I think it's time to make the switch to clang
I saw gobium point it out in a shady issue thread
the matrix types
Currently, the element type of a matrix is only permitted to be one of the following types:
an integer type (as in C23 6.2.5p22), but excluding enumerated types, bool, and _BitInt types whose width is not a power of 2;
the standard floating types float or double;
a half-precision floating point type, if one is supported on the target.
also
// Matrix-vector multiplication (builtin operation)
vec3_t result = m * v;
I won't miss operator overloading in C now
:|
I need to be using this in both projects
I was thinking maybe dumping the from scratch in one hour thing, since I think it's getting in the way of the interesting parts
it was an interesting idea, but in practice, I'm not sure
also I learned more about vulkan today, specifically the difference between the semaphore used in vkAcquireNextImageKHR and the one used for presenting, and how I only need frames in flight count semaphores for the former and swapchain images count for the latter and why
plus all the graphics pipeline stuff I had never done before
which is not really all that much different than SO tbh
just a bit more struct spam
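The semaphore split mentioned above can be written down as two indexing rules. This is only a sketch with made-up names (`frame_sync_t`, `acquire_slot`, `present_slot`, `end_frame`) and no real Vulkan handles. The reasoning, as I understand it: once a frame slot's fence has been waited on, the acquire semaphore for that slot has been consumed by the submit and is safe to reuse, so one per frame in flight is enough. Present has no fence, so the only proof that a present-wait semaphore is free again is that its image came back from acquire, which is why that one is indexed by swapchain image.

```c
#include <assert.h>
#include <stdint.h>

#define FIF 3  /* frames in flight, same value as the config above */

typedef struct {
    uint32_t image_count;   /* number of swapchain images actually created */
    uint64_t frame_number;  /* incremented once per presented frame */
} frame_sync_t;

/* Slot of the semaphore handed to vkAcquireNextImageKHR for this frame.
   There are FIF of these, cycled together with the per-frame fence. */
uint32_t acquire_slot(const frame_sync_t *s) {
    return (uint32_t)(s->frame_number % FIF);
}

/* Slot of the semaphore signaled by the queue submit and waited on by
   vkQueuePresentKHR. One per swapchain image, picked by acquired index. */
uint32_t present_slot(const frame_sync_t *s, uint32_t image_index) {
    assert(image_index < s->image_count);
    return image_index;
}

void end_frame(frame_sync_t *s) {
    s->frame_number++;
}
```

With `FIF` and the image count both at 3 the two arrays happen to be the same size, but the driver may create more images than the minimum requested, so sizing the present array from the real image count is the safer choice.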
I think I'll add the software renderer to palinode as an alternative renderer
and then I can reuse the same C code in the shader
and dump the silly from scratch in one hour, but just keep doing the video recordings
hrm I wonder how using clang will impact raddbg though
hopefully not at all
I'm on Ryan Fleury's discord he's complained about clang debug info a little bit 😅
I was using raddbg with zig though with no problem and that was at the time using a llvm backend
the clang version that ships with MS is a little older than what the clang docs have
I'm going to see if I can download that instead
4GB 😅
clang builds noticeably slower
msvc is instant
clang hangs a second
idk maybe not!
you could turn it into a software rasteriser series
Yes
I think I will just take the compile time hit and go with clang since that’s what the shaders use. I wonder if I can tell vcc where my clang is so it pulls in the same version and not the Visual Studio version. I will take a look later
I get matrices for free now
I wonder if I can get things like the adjugate of a matrix with it
Adjoint or whatever
I will probably still need some manual matrix math
It also has vectors which I am already using in the shader
vcc wraps clang, I need to understand how vcc works a bit better
Exposing vulkan and spirv as language extensions is brilliant and feels nice
That’s how I can likely get non uniform indexing
definitely share your findings
playing with vcc is one of those things that have been in my backlog for One Day™ forever
it's already been amazing for learning
I didn't know anything about clang or language extensions, so looking into matrices and adding the math operations like fmodf to my shader was an aha moment
that makes the code not portable but also awesome at the same time
I just typed s/less/not/ and Discord updated my previous message 
wdym not portable
yup, that has been a feature for a while
I think clang's language extensions are not portable to other compilers
like I'm stuck with compiling my application with clang if I use them
but I get the benefit of sharing code across my shaders and all the features
is it the kind of stuff you can't write a native equivalent of?
like write some kind of header where you block off the vcc version and the cpu version with preprocessors
yes they are language extensions, they add syntax that is not valid standard C
A * B for example for a mat4_t type
there's no way to write that in C without the language extension
oh
I see what you're saying
that feels rough
it's pretty much standard procedure when sharing CPU-GPU code
hell it's pretty standard for C/C++ everywhere
how do you think libraries are cross platform
well I don't have to do any of that if I use clang
the non-clang version of the math would explode the LOC where it could all be on one line with clang
idk
ok I will correct: still possibly portable, if I am not lazy 😅
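A sketch of the "block it off with the preprocessor" idea from this exchange, assuming a `mat4_t` type and helper names (`mat4_mul`, `mat4_identity`, `MAT4_AT`) that are invented here. Under Clang with `-fenable-matrix` the product is the builtin `a * b`; everywhere else it falls back to a struct and a triple loop. The feature-test name `matrix_types` is my reading of Clang's feature list, so check it against your Clang version.

```c
#include <assert.h>

#ifndef __has_extension
#define __has_extension(x) 0  /* compilers without the feature-test macro */
#endif

#if defined(__clang__) && __has_extension(matrix_types)
/* Clang with -fenable-matrix: builtin matrix type, operator* just works. */
typedef float mat4_t __attribute__((matrix_type(4, 4)));
#define MAT4_AT(m, r, c) ((m)[r][c])
static inline mat4_t mat4_mul(mat4_t a, mat4_t b) {
    return a * b;
}
#else
/* Any other C compiler: plain struct, explicit loops. */
typedef struct { float e[4][4]; } mat4_t;  /* e[row][col] */
#define MAT4_AT(m, r, c) ((m).e[r][c])
static inline mat4_t mat4_mul(mat4_t a, mat4_t b) {
    mat4_t out;
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.e[r][k] * b.e[k][c];
            out.e[r][c] = sum;
        }
    }
    return out;
}
#endif

/* Shared code below only touches matrices through MAT4_AT and mat4_mul,
   so it compiles unchanged on both paths. */
static inline mat4_t mat4_identity(void) {
    mat4_t m;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            MAT4_AT(m, r, c) = (r == c) ? 1.0f : 0.0f;
    return m;
}
```

The cost is that shared code has to call `mat4_mul(a, b)` instead of writing `a * b`, which is the extra LOC being weighed in the discussion.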
gcc does have the vector extension at least
do you use a swizzle ext? that's the main thing I can think of that might be a definite no go
it's part of Clang's opencl support stuff
but you can also use nasl instead, or glm or whatever you want
👃 NASL is not a shading language 👃. Contribute to shady-gang/nasl development by creating an account on GitHub.
nasl has swizzles implemented with C++ fuckery
the native types are just there to do i/o
for our path tracer sample application, we use nasl on the host too, it's just plain C++
oh neat, looking at the swizzling code now
I never really thought it was possible without a compiler ext
that was a one-day C++ hacking adventure 🐸
from a guy who doesn't really care about C++
I was like can I do .x ?, and then it worked! vcc ❤️
btw while you're here, what are your thoughts about the builtin cuda target for clang
i have no thoughts ig
you'll get it with the Clang language extensions and the native vectors, but you'll have to use clang on the host too
rip, I was thinking you might've investigated it as part of your research, I only found out it existed recently so I was wondering how it held up considering that hypothetically it'd encounter the same issues as you did
yes, I will use clang
what ? no, afaiu it only targets PTX
just an exercise in nvidia mainlining their needs
again afaiu
I noticed those cuda extensions yesterday after looking through the language extensions, was surprised, I got the sense of clang scope creep, but I don't know
yeah ptx is what I meant, but didn't you have issues with needing to represent structured control flow to the compiler
my main curiosity is the codegen quality
nvidia doesn't use structured control flow
they have explicit masks used in their intrinsics
I see, so they shouldn't have any issues generating equivalent quality code to vcc
or vice versa
it's such a different model a comparison is hardly useful
and these things aren't really to do with code quality, they're more correctness at this point
being able to carefully construct programs that take advantage of insight into GPU topology
but yeah there is an open problem here with vcc, and i'm waiting for clang to ship their new reconvergence tokens
shady is ready for maximal reconvergence otherwise
yeah that's what I meant by quality, initially I was thinking there was a chance they'd generate "CPU-y" code, which in my mind is low quality, assuming it produces the correct result otherwise
those are in 21?
idk
what is cpu-y code
code generated with the assumption that it's running single threaded on a CPU
there's no real difference when looking at IR, it's all scalar at the levels I or CUDA is working at
with some intrinsics thrown in
scheduling branches, vector/scalar regalloc and masking codegen is the responsibility of device-specific JIT compilers in drivers
I see, so you don't really do any major transformations to the code to make it more "GPU aware", you just make all those annotations available to the user and make sure everything's plumbed through?
e.g. what's uniform/non uniform, where barriers are inserted, etc.
I'm just trying to get a rough feel for what I should expect a shader/kernel compiler to be responsible for that runs the risk of surprise shitty codegen
but it might not be meaningful, because I don't think glslang or dxc do anything fancy to your shader code beyond translating your intention to spirv, right?
well if there is some way to write some poorly performing code in the context of this project and vcc I am sure to find it
hm yeah kinda
some of the most important performance aspects are decided by the downstream compiler, so if you upset it by generating something it doesn't deal with well, you can hurt 🅱️erf
e.g. register pressure it can't optimize away
neither does vcc, conceptually
shady is complicated for completely annoying, bad legacy reasons
that's part of what made publishing difficult
a lot of the work that went in it is overcoming "artificial" obstacles we as an industry made for ourselves
by separating out gfx/compoot and never really moving away from the crappy self-limiting mindset of early SLs
large parts of SPIR-V for Vulkan just restate glsl limitations in an ssa ir formalism
that makes sense, good to hear that it's "only" legacy crust and not something that requires a bunch of additional compiler optimization work just to hit par with existing tools
definitely makes adoption way less risky
hrm, probably some risks since you can do things that aren't native to the ir like function pointers and generic pointers
but if you don't use those things then you can hit par, but then that is kind of defeating the purpose of vcc and using a real language
there's way more reasons to not use glsl/hlsl even without those features
full power C++ TMP for one
slang's generics feels nicer than C++ templates fwiw https://shader-slang.org/slang/user-guide/interfaces-generics.html
I haven't used templates much in C++, only a little though
probably not comparable to "full power C++ TMP"
Didn’t do a speedrun training thing today and felt much better and fun.
I tried my hand at the Clang language extensions ext_vector_type and matrix_type and I had unrealistic expectations of how all that would work, but I like them.
I expected to be able to have some kind of nice way to write a matrix literal, but the initialization for a matrix type is a TODO in the draft, so I made a helper function for readable matrix literals and that's no problem. I think maybe the compiler will be kind with that and optimize it out so there's no run time cost. I could check godbolt, but I don't really care right now.
I expected to be able to multiply a matrix with a vector (lol) and that's a no go, but it's trivial to convert a vector to a column matrix with __builtin_matrix_column_major_load.
the matrix type is definitely a WIP, but this is a for fun project, crazy stuff goes, I'm gonna make it work. I really need to read all the fine details of the language extension. I was like learning on the go while recording and that was hard.
I'll have to write an inverse function and start building out a math library. I can't actually assume anything about internal memory layout since the draft says that is implementation defined, so I am not sure what I can assume about how this would work when I send data from host to the GPU and use the matrix_type in both the application and in my vcc compiled shaders.
I still have to carefully read the language extension docs to get a complete picture and avoid mistakes and problems.
being able to do A * B is really nice and A * toMat(v) is also ok, and being able to swizzle vectors is gold
if I wrote my own math library without these language extension types I think it would be a worse experience
this is one for the pipeline
I'm just trying to have fun and share what I'm doing but it comes across as quasi educational and nobody should take any words of mine as something to learn from
I think I need this https://github.com/stevearc/conform.nvim
"learning software rasterising day1"
yeah I hedged and provided lots of disclaimers, but anyway I feel nothing but relief now
that's good
this focusrite thing has been amazing though, my audio out, not just my mic is so much better
it's good ye
i have it set to format on save
yes, that's what I will do too
having to indent everything manually felt like I was using notepad.exe
or xcode 
I mean, I know how to do block selection
yea but not just indents, honestly i just type complete garbage and let the autoformatter handle it
same
a cpu software rasterized triangle blitted to the HWND device context using the clang opencl vector language extension types
ok going to figure out how to do that on the gpu via vulkan compute dispatch
I need to figure out how to write vulkan gpu memory allocation code that I previously just used VMA for
I think I should be able to reuse some of this code with vcc
I was thinking after I do the gpu raster, I would do a RT version too
then I'd have a triangle via cpu rasterization, gpu compute, graphics pipeline and RT, and then maybe I just work my way up in parallel across those, idk
RT via vulkan apis, actually don't know how good my gpu is for that
it's kind of dated
NVIDIA GeForce RTX 3070
added interpolation to my software renderer
I love clang's vector type
I didn't even have to look it up, just used the ol brain and my understanding of barycentric coordinates
and of course I knew to convert to linear and back to srgb
it's actually a mix, not interpolation
well, I mean I have a lerp I wrote, and a mix and I used the mix, anyway
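For anyone following along, here is a minimal sketch of the barycentric color blend described above, in plain C rather than the clang vector types. The names (vec2, rgb, edge, mix3, shade_pixel) are made up for illustration and are not the actual renderer code. It assumes the vertex colors are already in linear space; the sRGB to linear conversion and back is left out.

```c
#include <assert.h>

typedef struct { float x, y; } vec2;
typedef struct { float r, g, b; } rgb;

/* Twice the signed area of triangle (a, b, p); the sign encodes winding. */
static float edge(vec2 a, vec2 b, vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Weighted blend of three linear-space colors; weights should sum to 1. */
static rgb mix3(rgb c0, rgb c1, rgb c2, float w0, float w1, float w2) {
    rgb out;
    out.r = w0 * c0.r + w1 * c1.r + w2 * c2.r;
    out.g = w0 * c0.g + w1 * c1.g + w2 * c2.g;
    out.b = w0 * c0.b + w1 * c1.b + w2 * c2.b;
    return out;
}

/* Returns 1 and writes the blended color when p is inside (or on an edge of)
 * the triangle, 0 otherwise. Dividing by the signed area makes the weights
 * positive inside the triangle for either winding order. */
static int shade_pixel(vec2 v0, vec2 v1, vec2 v2,
                       rgb c0, rgb c1, rgb c2,
                       vec2 p, rgb *out) {
    float area = edge(v0, v1, v2);
    assert(area != 0.0f); /* degenerate triangles are not handled here */
    float w0 = edge(v1, v2, p) / area; /* weight of v0 */
    float w1 = edge(v2, v0, p) / area; /* weight of v1 */
    float w2 = edge(v0, v1, p) / area; /* weight of v2 */
    if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) {
        return 0;
    }
    *out = mix3(c0, c1, c2, w0, w1, w2);
    return 1;
}
```

A rasterizer would call shade_pixel for every pixel center in the triangle's bounding box and convert the result to the display encoding before writing it to the bitmap.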
fun
it's kind of cool, this is all me. this isn't me passing vertices through a vertex shader and it doing the math for me, this is all my code, rasterizing and rendering. I wrote all of those pixels
the background colors oscillate that's why they look different from before
that's kind of an interesting artifact along the edges
I have no idea what that is
it's the video
it's not in the actual render
hrm it is
looks like aliasing
Once I have all the triangles I will start on an imgui
Just some text, text input and a slider to start
ASCII only
Once I have image creation working with vk I will blit my software raster bitmap to vk instead of the win32 HWND
then I can build the imgui once and it will work regardless of how anything rendered
honestly yeah it's pretty nice to write
except when it isn't, there are times i do miss having operator overloading and generics
but the simplicity is good
the clang vector & matrix types let me use math operators at least
but they are draft stage and clang specific language extensions
there's a vector type language extension that is compatible with a gcc extension though, I'm not using that one
alright time to start on creating images and buffers in vk, I have always referred to VMA docs I don't think I ever read much of the vk spec on memory I will start with that
just two chapters, 11 & 12
ah there's an appendix for memory also
references to Google's Fuchsia OS in the vk spec heh
I'm just going to start by creating a staging buffer and a device buffer pushing a small value to it and trying to read it from my shader and see what happens
using renderdoc to debug what's on the device and visible to the shader
the original vk 1.1 vulkan tutorial has some example code too
looks like you create a staging and device buffer, allocate some memory for a staging buffer and a device local buffer, bind the memory for each of the buffers, and then map memory to staging memory and then copy memory from staging buffer to device buffer
it doesn't look like much but it's just a proof of concept I got working, Pushed this triangle to the right with a float value I put into a buffer which is made known to the shader via push constant bda
so now I can allocate gpu memory, create vk buffers, map data to host visible buffers, copy data from my staging buffer to the gpu memory, bind memory, create a BDA and use it in a vcc compiled shader
only thing left is creating an image, I will work on that tomorrow
then I'll blit software rasterized bits to vk, and then I'll work on the compute software rasteriser
it's kind of lame how my render is just a triangle, and will be for a bit as I put in more scaffolding. It'll start looking better soon enough though
I want to figure out the compute rasterization and the RT stuff
then I'll work on more interesting looking problems
I have a hard time understanding some aspects of vulkan sync, and haven’t gotten to the point where I intuitively understand when there’s a hazard
I have this execution dependency I resolved with an extra barrier that exists only because I get VVL sync errors telling me I need it. The message is bizarre
It said I needed a VK_PIPELINE_STAGE_2_NONE as a destination stage
I resolved it just with an extra barrier that just matched the stages I was actually using and I already had a barrier for
It seemed to say that the memory barrier that transitioned the swap image after acquiring it was a write after read hazard
why would another barrier solve that?
I guess maybe because it is just an execution barrier it ensures the write doesn’t start until the read from acquire is done. I should read about acquire more
Well
That explains it
TIL
I guess I just accidentally got this right in my previous project
Vulkan sync 
The vulkan tutorial doesn’t have this barrier
It just has two
Hrm
Oh that makes sense though
It just presents
Ok ok
I can't stand it when I don't understand why something in my code works, I should be able to understand why I need every barrier I have, and now I do and the world makes sense
I just had to write it out
it's not reading from the acquire though, it's the presentation engine that is doing the reading
I guess I assumed that the acquire handled all of this, like the image is officially "available" but I guess it isn't, not really. Acquire next available image that you still additionally have to make sure is actually available
got it
ahh, I like put the diff of the before and after with the sync and the operations I am making side by side in two text editors and I get it now
what I need is a way to print this out as debug info
so I can see it just like this
clearly I was missing a barrier
vkAcquireNextImageKHR
vkBeginCommandBuffer
image_barrier(cf.vk_cmd_buffer,
swapchain_image,
VK_IMAGE_LAYOUT_UNDEFINED,
VK_IMAGE_LAYOUT_GENERAL,
VK_IMAGE_ASPECT_COLOR_BIT,
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_NONE,
VK_PIPELINE_STAGE_2_CLEAR_BIT,
VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT,
VK_ACCESS_2_TRANSFER_WRITE_BIT);
{
vkCmdClearColorImage
vkCmdBeginRendering
// bind SO, configure graphics pipeline, push constant, draw triangle
vkCmdEndRendering
image_barrier(cf.vk_cmd_buffer,
swapchain_image,
VK_IMAGE_LAYOUT_GENERAL,
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
VK_IMAGE_ASPECT_COLOR_BIT,
VK_PIPELINE_STAGE_2_CLEAR_BIT,
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT,
VK_ACCESS_2_TRANSFER_WRITE_BIT,
VK_ACCESS_2_NONE);
vkEndCommandBuffer
vkQueueSubmit2
vkQueuePresentKHR
and now it's
vkAcquireNextImageKHR
vkBeginCommandBuffer
image_barrier(cf.vk_cmd_buffer,
swapchain_image,
VK_IMAGE_LAYOUT_UNDEFINED,
VK_IMAGE_LAYOUT_GENERAL,
VK_IMAGE_ASPECT_COLOR_BIT,
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT,
VK_PIPELINE_STAGE_2_CLEAR_BIT,
0, // src access
VK_ACCESS_2_TRANSFER_WRITE_BIT // dest access
);
vkCmdClearColorImage
image_barrier(cf.vk_cmd_buffer,
swapchain_image,
VK_IMAGE_LAYOUT_GENERAL,
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
VK_IMAGE_ASPECT_COLOR_BIT,
VK_PIPELINE_STAGE_2_CLEAR_BIT, // src stage
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT, // dst stage
VK_ACCESS_2_TRANSFER_WRITE_BIT, // src access
VK_ACCESS_2_COLOR_ATTACHMENT_READ_BIT // dest access
);
vkCmdClearColorImage
vkCmdBeginRendering
// bind SO, configure graphics pipeline, push constant, draw triangle
vkCmdEndRendering
image_barrier(cf.vk_cmd_buffer,
swapchain_image,
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
VK_IMAGE_ASPECT_COLOR_BIT,
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT, // src stage
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT,
VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT, // src access
0);
vkEndCommandBuffer
vkQueueSubmit2
vkQueuePresentKHR
I just need a way to print this ^^
I'm going to add a way to debug print my render graph
just so obvious it was wrong 
I thought I had cleaned up my sync, maybe I accidentally deleted a barrier when I added the graphics pipeline code
the or with 0 is also funny
| VK_PIPELINE_STAGE_NONE lol
jfc
I'm just going to use a linked list for my first iteration of my render graph tbh
using my per frame arena, just recreate it every frame for now, just to get something going
and then iterate
this doesn't actually allocate any new memory, since the arena allocation happens at application start and I just reset it every frame
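A rough sketch of that idea, assuming a simple bump arena that is allocated once and reset every frame. Everything here (arena_t, render_node_t, add_node, print_graph) is a hypothetical stand-in for illustration and not the actual render graph code; print_graph covers the "print this out as debug info" wish from earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Bump allocator: one malloc at startup, reset per frame. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} arena_t;

static arena_t arena_create(size_t capacity) {
    arena_t a = { malloc(capacity), capacity, 0 };
    assert(a.base != NULL);
    return a;
}

static void *arena_push(arena_t *a, size_t size) {
    size_t aligned = (a->used + 15u) & ~(size_t)15u; /* 16-byte alignment */
    assert(aligned + size <= a->capacity);
    a->used = aligned + size;
    return a->base + aligned;
}

/* Resetting just rewinds the offset; no memory is freed or allocated. */
static void arena_reset(arena_t *a) { a->used = 0; }

typedef enum {
    node_image_barrier,
    node_clear_color,
    node_begin_rendering,
    node_end_rendering
} node_type_t;

typedef struct render_node {
    const char *label;
    node_type_t type;
    struct render_node *next;
} render_node_t;

/* Appends a node after prev (or starts a new list when prev is NULL). */
static render_node_t *add_node(arena_t *a, render_node_t *prev,
                               const char *label, node_type_t type) {
    render_node_t *n = arena_push(a, sizeof *n);
    n->label = label;
    n->type = type;
    n->next = NULL;
    if (prev) prev->next = n;
    return n;
}

/* Debug print of the whole graph in execution order; returns the node count. */
static size_t print_graph(const render_node_t *head, FILE *out) {
    size_t count = 0;
    for (const render_node_t *n = head; n; n = n->next) {
        fprintf(out, "%zu: %s\n", count, n->label);
        count++;
    }
    return count;
}
```

Each frame would call arena_reset, rebuild the list with add_node, then walk it and dispatch on node->type to record the matching commands.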
this is better because it is declarative, I don't have all the vk struct spam and api calls to scroll through to see what I'm doing
I'm not sure, will see
just sort of outlining how it might work in code this isn't functional yet
it works :D
no sync vvls or anything
now extending my render to do more complex things is just extending the graph
and adding additional graph node types
and handlers for them
way better than a giant render function where image barriers are 5 pages of code apart 😅
adding a blit for my software rasterizer tomorrow, then add a draw image and then add a software rasterizer in compute, then an RT triangle and then start on 3D stuff
I am going to move the graphics pipeline to use a mesh shader
I am on the ultimate single triangle render bike shedding journey
This is an enterprise triangle
fake, not written in java
Enterprise java makes me want to drive a boat off a bridge
True
Time to rewrite in Java jk
I will use uniscribe for text
Shameless win32 only
Wine supports uniscribe
Maybe I try and get my project to work on steamdeck with proton at some point
Now I am no longer ascii only
I wish I had known that you can put a break point in the validation debug handler a year ago
that is so incredibly convenient
the debug handler is synchronously fired where the error happens and it is in the call stack
so you immediately know what is wrong
I really feel like I hit a stride with my vulkan sync now. I added a per fif draw image and clear and render to it now and then blit it to the swapchain image, wrote out how the image barriers worked and got it mostly right. I did get the read and then write access masks wrong for the barriers that sandwich the dynamic rendering but I understood the issue, I never felt lost, was easy to fix
it's so easy to see what's going on now too
// Wait for swapchain image present
render_node_cfg_t acquire_barrier_cfg = {0};
acquire_barrier_cfg.image_barrier_cfg = (image_barrier_cfg_t){0};
acquire_barrier_cfg.image_barrier_cfg.old_layout = VK_IMAGE_LAYOUT_UNDEFINED;
acquire_barrier_cfg.image_barrier_cfg.new_layout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
acquire_barrier_cfg.image_barrier_cfg.aspect_mask = VK_IMAGE_ASPECT_COLOR_BIT;
acquire_barrier_cfg.image_barrier_cfg.src_stage_mask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT;
acquire_barrier_cfg.image_barrier_cfg.src_access_mask = VK_ACCESS_2_NONE;
acquire_barrier_cfg.image_barrier_cfg.dst_stage_mask = VK_PIPELINE_STAGE_2_BLIT_BIT;
acquire_barrier_cfg.image_barrier_cfg.dst_access_mask = VK_ACCESS_2_TRANSFER_WRITE_BIT;
acquire_barrier_cfg.image_barrier_cfg.image_type = image_barrier_type_swapchain_image;
current_node = add_render_node(actx,
arena,
current_node,
"wait for present read on swapchain image before blitting to it",
render_node_type_image_barrier,
&acquire_barrier_cfg);
// Prepare draw image
render_node_cfg_t reset_draw_barrier_cfg = {0};
reset_draw_barrier_cfg.image_barrier_cfg = (image_barrier_cfg_t){0};
reset_draw_barrier_cfg.image_barrier_cfg.old_layout = VK_IMAGE_LAYOUT_UNDEFINED;
reset_draw_barrier_cfg.image_barrier_cfg.new_layout = VK_IMAGE_LAYOUT_GENERAL;
reset_draw_barrier_cfg.image_barrier_cfg.aspect_mask = VK_IMAGE_ASPECT_COLOR_BIT;
reset_draw_barrier_cfg.image_barrier_cfg.src_stage_mask = VK_PIPELINE_STAGE_2_BLIT_BIT;
reset_draw_barrier_cfg.image_barrier_cfg.src_access_mask = VK_ACCESS_2_NONE;
reset_draw_barrier_cfg.image_barrier_cfg.dst_stage_mask = VK_PIPELINE_STAGE_2_CLEAR_BIT;
reset_draw_barrier_cfg.image_barrier_cfg.dst_access_mask = VK_ACCESS_2_TRANSFER_WRITE_BIT;
reset_draw_barrier_cfg.image_barrier_cfg.image_type = image_barrier_type_draw_image;
current_node = add_render_node(actx,
arena,
current_node,
"reset draw image before clearing the color",
render_node_type_image_barrier,
&reset_draw_barrier_cfg);
current_node = add_render_node(actx, arena, current_node, "clear color", render_node_type_clear_color, NULL);
render_node_cfg_t clear_barrier_cfg = {0};
clear_barrier_cfg.image_barrier_cfg = (image_barrier_cfg_t){0};
clear_barrier_cfg.image_barrier_cfg.old_layout = VK_IMAGE_LAYOUT_GENERAL;
clear_barrier_cfg.image_barrier_cfg.new_layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
clear_barrier_cfg.image_barrier_cfg.aspect_mask = VK_IMAGE_ASPECT_COLOR_BIT;
clear_barrier_cfg.image_barrier_cfg.src_stage_mask = VK_PIPELINE_STAGE_2_CLEAR_BIT;
clear_barrier_cfg.image_barrier_cfg.src_access_mask = VK_ACCESS_2_TRANSFER_WRITE_BIT;
clear_barrier_cfg.image_barrier_cfg.dst_stage_mask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT;
clear_barrier_cfg.image_barrier_cfg.dst_access_mask = VK_ACCESS_2_COLOR_ATTACHMENT_READ_BIT;
clear_barrier_cfg.image_barrier_cfg.image_type = image_barrier_type_draw_image;
current_node = add_render_node(actx,
arena,
current_node,
"wait for clear color write before rendering to draw image",
render_node_type_image_barrier,
&clear_barrier_cfg);
current_node
= add_render_node(actx, arena, current_node, "begin rendering triangle", render_node_type_begin_rendering, NULL);
now that I have a draw image per fif I can take my software raster bits and upload them via my staging buffer and then copy the bytes to the draw image
and after that I'll start on the gpu raster which will also write to the draw image when it's being used
I'm not sure how RT will work I don't know anything about RT yet
it may seem a bit silly but what I basically end up with is a way to draw from all sorts of places, I could combine them all into one present, I'm not going to be limited to just drawing from the graphics pipeline anymore
and it just ends up being all about how I set up and configured the render graph
Claude says that copying my bitmap to my staging buffer to copy to my draw image which is blitted to the swapchain image would be more efficient than using StretchDIBits because I am bypassing the window stack. I guess I will find out
I will keep StretchDIBits around to measure
That entire bitmap has to go to the GPU every frame anyway since that’s what my monitor is plugged into. Why would me manually copying to the GPU be slower
It’s gotta be comparable
If it’s faster that’s even better
Just gotta measure
I will need per fif staging buffers
Or regions
I could rasterize in threads and always grab the latest finished one and copy it over
sounds like you got some exceptionally weird advice
StretchDIBits is GDI software blitting, as in on your CPU
I think you either got a hallucination or a regurgitation of weird reddit misinfo
No I didn’t provide the full context here. I asked about a comparison of the entire process
A comparison of using StretchDIBits to the HWND device context to just copying that same bitmap via vk to my swapchain
Like end to end either way it has to get to my screen
Through the GPU
Right
also don't forget there's a compositing step
I expect both to take roughly the same amount of time
What is the compositing step?
where your OS blits the swapchain image onto the actual image of your entire monitor
alongside all the other windows
The bottleneck has to be the massive data transfer though right
Like any of this other stuff probably doesn’t matter?
PCIe is wicked fast
the problem I am seeing here is what timeline you're burning
if you're using StretchDIBits you're burning CPU time on a ram-to-ram write
that will have to be uploaded anyway
I just don’t want my vk process to be slower than using StretchDIBits to the dc and whatever that does
It sounds like you’re agreeing with claude
yeah I thought it told you the opposite
So me doing this through vk is actually a good plan
also if you really want to needlessly minmax this you can look into stuff like your GPU's transfer queue
and issuing a copy command vs writing into host visible memory
Hemm
not faster, they keep it off your main gpu timeline
Oh right
so you can transfer while doing other work
Exciting
also to fill your head with more pointless things to try to optimize something that barely matters - vblanco reported that you can get surprisingly faster results reading from device-visible host memory in a compute shader and doing your "copy" that way
albeit that was more in the context of buffers
for images I don't think it's necessarily the same story
Nice
I plan on also adding a GPU rasterizer so I can just repurpose it
For the cpu software rasterized bits
It’s just rendering a triangle but this renderer is already so much better than what I had in Rosy for so many reasons
The only thing that one has that this doesn’t is MSAA, which is trivial to add
Also it has mipmaps, I need to add that
No point to do that until I write a png decoder though
The png,jpeg and JSON code (for GLTF) will be the least fun code I will write I hope
Maybe I need exr and compression code also though
I would just only support uncompressed tiff personally if I were forcing myself to go without libs
and good ol' bmp
Nah, Id just use Windows Imaging Component if I wasn’t doing everything from scratch
I wrote my own readers for ktx and dds now that I think about it
it's just headers + binary data
ktx is even easier because the headers are slightly less dumb
and working from khronos specs is nice
I am going to learn a lot from doing all this from scratch
I am looking forward to the serotonin I get once I see a texture render that I decoded myself
It’s just the process will be the least fun since it all has to work before I can see anything
work end to end
The only not from scratch is the text, it’s just way too hard
I will call out to windows for it
yeah that's probably fair, cleartype is very nice as well
Even handmade hero code uses windows for fonts
man my structs are a crime against computers, the laws of nature, and good programming
yes team fat struct
I am not alone
I see similar structs in hmh code
oh I figured out some new C stuff
I have a common.h
which had a bunch of my majorly big structs
I wanted to separate out their definitions into files because they were sort of piling up in common.h
and I am using C style typedefs since I am writing C
and wasn't sure how to forward declare those
but I figured it out, I just typedef them where I forward declare them and then actually define the struct elsewhere
struct DescriptorSetAllocator;
typedef struct DescriptorSetAllocator DescriptorSetAllocator;
struct GraphicsContext;
typedef struct GraphicsContext GraphicsContext;
struct AppContext;
typedef struct AppContext AppContext;
struct CPURasterContext;
typedef struct CPURasterContext CPURasterContext;
so I reduced the struct spam in my common.h
I love C, it is fun
I haven't missed the std lib at all or raii, I just have a couple of arenas and I never worry about memory ever
I allocate 512MB of memory in two arenas, way more than I currently need, and nobody came and arrested me and my app is fine
I wonder why they have various aliases for static
I guess it communicates intent, but I didn't know what it meant at first
it's just to make it more readable yeah
sorry I thought you were referring to internal
it puts it into rodata
that and global
and local_persist
I think these are Rad Game Tools idioms
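For reference, aliases like these are usually nothing more than macros over static. A minimal sketch, assuming the three names mentioned above; the exact spellings vary between codebases, and next_frame_slot and g_frames_started are made-up examples:

```c
#include <assert.h>

#define internal      static /* function private to this translation unit */
#define global        static /* file-scope variable private to this translation unit */
#define local_persist static /* local variable that keeps its value across calls */

#define FRAMES_IN_FLIGHT 2

global int g_frames_started = 0;

/* Cycles 0, 1, 0, 1, ... and counts how many frames have been started. */
internal int next_frame_slot(void) {
    local_persist int slot = FRAMES_IN_FLIGHT - 1;
    slot = (slot + 1) % FRAMES_IN_FLIGHT;
    assert(slot >= 0 && slot < FRAMES_IN_FLIGHT);
    g_frames_started++;
    return slot;
}
```

All three expand to the same keyword, so the only difference is the intent they communicate to the reader.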
all the people who have worked at Rad Game Tools write very similar C
and now they're all like super influencers in low level and game dev programming communities
so it has spread
programming influencers 😩
they are
but in a good way?
they're on the opposite end of the spectrum as those hardcore OOPers like uncle bob
yes
❯ Measure-Command { cmd /c 'build.bat' }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 850
Ticks : 8506065
TotalDays : 9.84498263888889E-06
TotalHours : 0.000236279583333333
TotalMinutes : 0.014176775
TotalSeconds : 0.8506065
TotalMilliseconds : 850.6065
my current build time from scratch
just 4k lines of code though
I think I can just have it print out the milliseconds
❯ Measure-Command { cmd /c 'build.bat' } | Format-List -Property Milliseconds
Milliseconds : 790
idk
I actually don't know who uncle bob is
"clean code" guy
I have two copies of that book
one for me and one I wanted to give to someone else so they would learn what was in it
I have learned since then it was all really not good advice
but when I first read I was all into it, I don't know like 10 or 15 years ago
ah
I didn't know he had that nickname
he's not really my uncle in case you were wondering
I got a jury summons
I haven't ever been called before
but I gotta go monday
:\
I hope I can get out of it
maybe they see my long hair and go, nope not that guy
there will be a bunch of candidates and they (the lawyers) will ask you all questions to eliminate jurors that might be biased
so there's that, or you can just whisper "jury nullification" into one of the attorney's ears
lmao
Judges hand out contempt of court charges like giving out candy on halloween. I am going to behave I think
but then you could brag to everyone about your rap sheet
just keep the bragging to your close friends then
I'm jk. Good luck out there
I like to frame jury duty as a novel diversion rather than a boring thing you're being forced to do
Reframing is a powerful tool
yes
Your odds of actually being chosen are very low
Generally you just go sit in a waiting room and then they tell everyone the defendant pled guilty once they saw the jury candidates coming in and the reality of the situation hits them
Either that or they select their jury and everyone else goes home
And in the worst case if you actually get picked then you have paid leave to step out of the workforce and do your civic duty which is neat
Also by living in Alameda county you have the distinct advantage of being close to your courthouse, for some reason in Santa Clara county the courthouse me and all my family members have been summoned to is in Morgan Hill 💀
the lawyers give you really easy outs
the question I got was "would you take a cop's testimony over a civilian's" and I just said yes and that was that
you can go online and submit a request to change the location
Is it weird that I kinda want to do jury duty?
I actually haven't lol
But I just think its like actually pretty cool that the average person has the opportunity to participate in legal proceedings
Like if they don't pick you, someone else is gonna end up on the jury, do you trust that person?
I mean granted maybe things are a little different since I live in a country where judges still occasionally wear powdered wigs 
Yes I agree, unfortunately my work will suffer and I will come back to a bad mess. But I agree jury duty service is a good thing overall
I would like to track images and buffers as stateful objects in my render graph and have their uses update the stages and masks automatically instead of still doing it declaratively
Hrm
I am not going to do that yet
Or maybe I should before I add more stuff
Should not be hard
there is a saying that in a jury trial, your life depends on 12 guys who were too dumb to get out of jury duty
I wouldn't wanna do it personally because it's pure tedium, but yeah it's absolutely wonderful you can have your fate decided by normal people and not a handful of lying sophists
but in practice the odds you'll be some kind of rare hero of justice in a blazing hot trial are extremely slim
I also just prefer not being around people, especially people I don’t know
Me too generally but this is a rare opportunity to participate in government with a random assortment of your fellow citizens
Getting on an actual jury is a once in a lifetime experience
Yeah I was kinda disappointed when they dismissed us all
I got a letter telling me I was on the rotation but I haven't been summoned yet
I'm going to get this software rasterizer bitmap to vk thing going, and then I'll mess around with tracking image/buffer sync state. My brain wants to believe that I should be able to read the spec, learn which stage and access masks each operation requires, and, by tracking the previous stage and access, know what the next barrier should be based on the operation. That should just work, and sync should be a mostly solved problem
and I won't even need to declare these things manually anymore
hrm
with buffers I think the offset or memory range of the buffer accessed is part of the resource that needs to be tracked, not just the buffer
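A small sketch of what tracking by range could look like, assuming each recorded access stores an offset, a size and whether it was a write. buffer_access_t and both helpers are hypothetical, and a real tracker would also need the stage and access masks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;
    uint64_t size;
    bool is_write;
} buffer_access_t;

/* Half-open ranges [offset, offset + size) overlap when each starts before the other ends. */
static bool ranges_overlap(buffer_access_t a, buffer_access_t b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* Two accesses to the same buffer only conflict when their ranges overlap
 * and at least one of them is a write (read-after-read is never a hazard). */
static bool needs_barrier(buffer_access_t prev, buffer_access_t next) {
    assert(prev.size > 0 && next.size > 0);
    return ranges_overlap(prev, next) && (prev.is_write || next.is_write);
}
```

With this, a write to bytes 0..63 followed by a read of bytes 64..127 of the same buffer would not ask for a barrier, while a read of bytes 32..95 would.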
anyway, gonna get this software raster copy thing working now
a lot of engine bikeshed & fixed investment taking place before I get past just a triangle
the fuck
I use photoshop rarely, and yet somehow creative cloud uses more memory than anything else
how do I even exit this program
what's it even doing
ok
that did it
just disabled all of those
so, I think maybe the first image format I implement is uncompressed exr
I was looking at OpenEXR and built the library and the tools and am able to take a polyhaven downloaded EXR and remove the compression from it with the openexr tools
C:\Users\Bjorn\Downloads>exrinfo charolettenbrunn_park_4k.exr
File 'charolettenbrunn_park_4k.exr':
compression: 'piz'
displayWindow: [ 0, 0 - 4095 2047 ] 4096 x 2048
dataWindow: [ 0, 0 - 4095 2047 ] 4096 x 2048
channels: 4 channels
'A': float samp 1 1
'B': float samp 1 1
'G': float samp 1 1
'R': float samp 1 1
C:\Users\Bjorn\Downloads>exrmaketiled.exe --help
C:\Users\Bjorn\Downloads>
C:\Users\Bjorn\Downloads>exrmaketiled.exe -z none "C:\Users\Bjorn\Downloads\charolettenbrunn_park_4k.exr" "C:\Users\Bjorn\Downloads\charolettenbrunn_park_4k_uncompressed.exr"
C:\Users\Bjorn\Downloads>exrinfo charolettenbrunn_park_4k_uncompressed.exr
File 'charolettenbrunn_park_4k_uncompressed.exr':
type: 'tiledimage' compression: 'none'
tiles: size 64 x 64 level 0 (single image) round 0 (down)
displayWindow: [ 0, 0 - 4095 2047 ] 4096 x 2048
dataWindow: [ 0, 0 - 4095 2047 ] 4096 x 2048
channels: 4 channels
'A': float samp 1 1
'B': float samp 1 1
'G': float samp 1 1
'R': float samp 1 1
tiled image has levels: x 1 y 1
x tile count: 64 (sz 4096)
y tile count: 32 (sz 2048)
oh there's a cubemap option instead of tiled
I can actually just work my way from uncompressed to zip (which I can reuse for png) to piz (which uses huffman which I can reuse for jpeg)
exr is really well documented
or maybe I go with png since there's working examples in hmh and stb_image, and stb_image also does jpg
and then I try exr, exr looks trivial though
especially with no compression
idk I have other things to do before then
OpenEXR's decompression code is all in C
and well documented
they have huffman and zip code too just as readable and documented
noice
that's cpu raster to vk
that was more work than I anticipated, and, I got a bunch of validation errors, but I got it working, now have to clean it up
it's cool how blitting an image does format conversion
I actually need to stop making my bitmap ARGB since I no longer have to play by win32 rules
a benefit I hadn't anticipated when I planned all this
I have changed my mind on all this image & json stuff
I spent months on rsy asset format and that asset pipeline
I am just going to use it
It generates mipmaps and BC compression
kind of cool being able to see my software raster in renderdoc now
obviously not helpful with debugging the render, but I can now see all the color data
there were a lot of unforeseen benefits to changing it to sending it to screen via vk
my bitmap is now using f16 half floats like 1.f16
typedef _Float16 f16;
typedef float f32;
typedef double f64;
typedef struct {
f16 r;
f16 g;
f16 b;
f16 a;
} vk_R16G16B16A16_sfloat;
u64 *pixel = (u64 *)actx->crctx->bitmap_bits;
{
int quad_row = 0;
for (int y = 0; y < actx->crctx->bitmap_height; y++) {
for (int x = 0; x < actx->crctx->bitmap_width; x++) {
vk_R16G16B16A16_sfloat color = {1.f16, 0.f16, 1.f16, 1.f16};
memcpy(pixel, &color, sizeof(vk_R16G16B16A16_sfloat));
pixel++;
}
}
}
totally works
am I spamming this thread too much 
ok, I honestly don't know why I post so much, when I wasn't on discord I was trying to put all my thoughts in my notes app
and I never really did
although I have switched to obsidian
and it's much better
I view posting on discord as basically interactive notetaking
it's more painful to go back and read the "notes", but usually you don't need to do that anyway
yeah, it's sort of like rubberduck debugging
the act of posting and discussing solidifies the ideas in your brain
I'm going to work on tracking sync resource state so I don't have to declare it
then I will finally start on gpu software rasterization
last week was a lot of scaffolding work
just because it's interesting, and I think valuable maybe? because of vcc I should just be able to reuse the same code I think? or write the code to be reusable?
could be an excuse to implement some stuff you can't normally get with ff rasterization
but I'm out of ideas rn
what's ff?
oh I see
yeah
I hope to find all the edge cases in vcc and annoy gob as much as possible with github issues
jk
I think gob would like that actually
nah I just really am looking forward to writing a bunch of C shader code
you will be a valuable test subject
maybe at some point I can contribute a PR once I understand it better who knows
for vichichi
the code is clear enough that even I, someone with no background in compilers, could hack some changes into the glsl backend
yeah it's all been pretty readable
for the spir-v backend you'd have to be familiar with spir-v though
I tried writing a little toy compiler once with yacc/bison
that was a long time ago
nice
I think this will work for tracking image sync state
typedef enum {
    image_barrier_type_no_image,
    image_barrier_type_swapchain_image,
    image_barrier_type_draw_image,
} image_barrier_type_t;
typedef struct {
    VkImageLayout layout;
    VkImageAspectFlags aspect_mask;
    VkPipelineStageFlags2 stage_mask;
    VkAccessFlags2 access_mask;
    image_barrier_type_t image_type;
    size_t draw_image_index;
    size_t swapchain_image_index;
} vk_image_sync_state_t;
going to try this
I don't want to stick the actual VkImage on it
I'm not sure
just going to try this
hrm
actually I should stick this on my vk_image_t
typedef struct {
    VkDeviceMemory vk_memory;
    VkImage vk_image;
    VkImageView vk_image_view;
    VkExtent2D vk_extent;
    vk_image_sync_state_t sync_state;
} vk_image_t;
and remove those indices from the sync state
I can actually now just keep track of it without changing my render graph and then transition to referring to its state as an iterative step
and see if that works, and if it does I can remove the explicit barriers and add the barriers on the render graph nodes that do operations that would add hazards
searching through the spec to see how the sync required for operations I am doing is documented, for example for blit and copy it says
All copy commands are treated as “transfer” operations for the purposes of synchronization
barriers
I feel like when something operates on an image the spec should be really clear on these things
so when I do something that takes an image or a buffer I should know what layout it requires and what stage it happens in and what access masks are needed/impacted
that's not really sync, but that's the kind of details spec should also have for sync imo
these things are just haphazardly spread throughout the spec with no clear single place to figure it out
like sync is so important they created a validator for it
but then in the spec they just winged it
the "must be synchronized with" is just in chapter 38, 39 and chapter 4
I understand, which stages an operation occurs in depends on how it's been configured
the Pipeline stage section in chapter 7 is the key then
sort of
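the kind of single lookup table being wished for above could be sketched like this. Everything here is a stand-in so it compiles without the Vulkan headers: the enums and image_use_requirements are hypothetical, and the comments name the real Vulkan values each one is meant to mirror. The layout choices are the typical ones, not the only legal ones (GENERAL is also allowed for transfers, for example).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for VkImageLayout / VkPipelineStageFlags2 / VkAccessFlags2. */
typedef enum {
    LAYOUT_UNDEFINED,
    LAYOUT_TRANSFER_SRC,     /* VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL */
    LAYOUT_TRANSFER_DST,     /* VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL */
    LAYOUT_COLOR_ATTACHMENT  /* VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL */
} layout_t;

typedef enum {
    STAGE_COPY         = 1 << 0, /* VK_PIPELINE_STAGE_2_COPY_BIT */
    STAGE_BLIT         = 1 << 1, /* VK_PIPELINE_STAGE_2_BLIT_BIT */
    STAGE_COLOR_OUTPUT = 1 << 2  /* VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT */
} stage_t;

typedef enum {
    ACCESS_TRANSFER_READ  = 1 << 0, /* VK_ACCESS_2_TRANSFER_READ_BIT */
    ACCESS_TRANSFER_WRITE = 1 << 1, /* VK_ACCESS_2_TRANSFER_WRITE_BIT */
    ACCESS_COLOR_WRITE    = 1 << 2  /* VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT */
} access_t;

/* Hypothetical list of the ways a render graph node can touch an image. */
typedef enum {
    IMAGE_USE_COPY_SRC,
    IMAGE_USE_COPY_DST,
    IMAGE_USE_BLIT_SRC,
    IMAGE_USE_BLIT_DST,
    IMAGE_USE_COLOR_ATTACHMENT_WRITE,
    IMAGE_USE_COUNT
} image_use_t;

typedef struct {
    layout_t layout;
    stage_t stage;
    access_t access;
} image_use_req_t;

/* One row per use: the layout the image should be in, the stage the
   operation runs in, and the access it performs. */
static const image_use_req_t k_image_use_reqs[IMAGE_USE_COUNT] = {
    [IMAGE_USE_COPY_SRC] = {LAYOUT_TRANSFER_SRC, STAGE_COPY, ACCESS_TRANSFER_READ},
    [IMAGE_USE_COPY_DST] = {LAYOUT_TRANSFER_DST, STAGE_COPY, ACCESS_TRANSFER_WRITE},
    [IMAGE_USE_BLIT_SRC] = {LAYOUT_TRANSFER_SRC, STAGE_BLIT, ACCESS_TRANSFER_READ},
    [IMAGE_USE_BLIT_DST] = {LAYOUT_TRANSFER_DST, STAGE_BLIT, ACCESS_TRANSFER_WRITE},
    [IMAGE_USE_COLOR_ATTACHMENT_WRITE] =
        {LAYOUT_COLOR_ATTACHMENT, STAGE_COLOR_OUTPUT, ACCESS_COLOR_WRITE},
};

static image_use_req_t image_use_requirements(image_use_t use) {
    assert((int)use >= 0 && use < IMAGE_USE_COUNT);
    return k_image_use_reqs[use];
}
```

a render graph node would then only declare its image_use_t and the barrier code could look up the rest.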
my image sync state render graph works yay
I pass in my vk_image_t to my image barrier helper function
it reads the old layout, stage mask and access mask, and then writes the new layout, stage mask and access mask and that works
no VVL sync errors
I do have to update the vk image access masks after render begin since that writes to the image given without an image barrier
and after acquiring a swapchain image I set its access mask to none
no VVL, no more hard coded image barriers where I manually have to hard code image states
it's all programmatically managed now
no ALL_COMMANDS
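a minimal sketch of that helper's bookkeeping, with plain integers standing in for VkImageLayout, VkPipelineStageFlags2 and VkAccessFlags2 so it builds without Vulkan. The type and function names are hypothetical, not the ones from the actual renderer. In real code the returned src/dst pair would fill the oldLayout/newLayout, srcStageMask/dstStageMask and srcAccessMask/dstAccessMask fields of a VkImageMemoryBarrier2 passed to vkCmdPipelineBarrier2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the layout/stage/access part of the tracked sync state. */
typedef struct {
    uint32_t layout;
    uint64_t stage_mask;
    uint64_t access_mask;
} sync_state_t;

/* Stand-in for an image that carries its own sync state. */
typedef struct {
    sync_state_t sync_state;
} tracked_image_t;

/* What a barrier needs: where the image was and where it is going. */
typedef struct {
    sync_state_t src;
    sync_state_t dst;
} barrier_desc_t;

/* Read the old state, record the new one, return both for the barrier. */
static barrier_desc_t image_barrier(tracked_image_t *image, sync_state_t next) {
    assert(image != NULL);
    barrier_desc_t barrier = { .src = image->sync_state, .dst = next };
    image->sync_state = next;
    return barrier;
}

/* For state changes that happen without a barrier, e.g. resetting the access
   mask after acquiring a swapchain image, or noting the writes that
   beginning rendering performs on its attachments. */
static void image_set_access(tracked_image_t *image, uint64_t access_mask) {
    assert(image != NULL);
    image->sync_state.access_mask = access_mask;
}
```

the nice property is that each call site only states the destination state; the source half of the barrier always comes from whatever the previous operation left behind.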
it's just a single queue render with a very simple render, I'm sure this gets harder
but this is so much better than any vulkan renderer I have written so far, which was just a 2000 line monolithic typed out render function in Rosy lol
I can finally work on gpu software rasterization and work on rendering stuff now
a two week vk scaffolding detour
added keyboard input handling and can now switch between graphics pipeline render graph, shader object render graph and cpu raster render graph by pressing a number, going to add gpu raster next
the graphics pipeline graph and the shader object graph just render the exact same thing right now so you can't tell that's changing, but I confirmed in renderdoc
before it was command line arg
and that was annoying to test to make sure I wasn't getting vvls
I used to think that everything would take more time to write with C since there’s no stl and no generics but actually the opposite is true. Faster compile times, void *everything, arenas, no fucks given -> light speed development
Not like vblanco 2 weeks to make a Minecraft clone light speed
More like 3 weeks and still staring at a triangle light speed
It took him like a year lmao, he just moved it to unreal in a few weeks
Yeah that’s true
Thinking about the uniform conversation in Grub’s thread I think I will actually try what that led me to think about. An OpenGL-like uniform mechanism that lets me specify arbitrary data types I want available in my shader. So if I wanted to have a float4 or a float or a uint or mat4x4 I just call a function that creates this, gets an index back, and then I use that index to update the value and I send that index in with my push constant.
It will all be backed by a per fif buffer that is copied every frame. The index points into a table that returns the offset in bytes and type of the value. The bda will be into a global buffer that has all this available via the scene buffer. Maybe just stick them at the end of the scene buffer.
Hrm, this would be separate data from the scene graph that I get from an asset. Just arbitrary convenience data I could use when I wanted it
I like that
I don’t think I need a BDA in my push constants, they’ll just be global and the push constants will have offsets
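A rough CPU-side sketch of that uniform table, under some assumptions: every name here (uniform_table_t, uniform_create, uniform_set, uniform_offset) is hypothetical, the capacity limits are arbitrary, and every value is aligned to 16 bytes just to keep the layout simple. The data array is what would get copied into the current frame-in-flight buffer each frame, and the byte offset is what would go in the push constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNIFORM_MAX_ENTRIES 64
#define UNIFORM_BUFFER_BYTES 1024

typedef enum {
    UNIFORM_FLOAT,
    UNIFORM_UINT,
    UNIFORM_FLOAT4,
    UNIFORM_MAT4X4
} uniform_type_t;

/* Table entry: where the value lives in the buffer and what it is. */
typedef struct {
    uint32_t offset;
    uniform_type_t type;
} uniform_entry_t;

typedef struct {
    uniform_entry_t entries[UNIFORM_MAX_ENTRIES];
    uint32_t count;
    uint32_t used_bytes;
    unsigned char data[UNIFORM_BUFFER_BYTES]; /* CPU copy, uploaded per frame */
} uniform_table_t;

static uint32_t uniform_type_size(uniform_type_t type) {
    switch (type) {
    case UNIFORM_FLOAT:  return 4;
    case UNIFORM_UINT:   return 4;
    case UNIFORM_FLOAT4: return 16;
    case UNIFORM_MAT4X4: return 64;
    }
    return 0;
}

/* Reserve space for a value; returns its index, or -1 if the table is full. */
static int uniform_create(uniform_table_t *table, uniform_type_t type) {
    uint32_t size = uniform_type_size(type);
    uint32_t offset = (table->used_bytes + 15u) & ~15u; /* 16-byte alignment */
    if (table->count >= UNIFORM_MAX_ENTRIES) return -1;
    if (offset + size > UNIFORM_BUFFER_BYTES) return -1;
    table->entries[table->count].offset = offset;
    table->entries[table->count].type = type;
    table->used_bytes = offset + size;
    return (int)table->count++;
}

/* Copy a new value into the CPU buffer; size comes from the stored type. */
static void uniform_set(uniform_table_t *table, int index, const void *value) {
    assert(index >= 0 && (uint32_t)index < table->count);
    const uniform_entry_t *entry = &table->entries[index];
    memcpy(table->data + entry->offset, value, uniform_type_size(entry->type));
}

/* Byte offset to hand to the shader, e.g. through a push constant. */
static uint32_t uniform_offset(const uniform_table_t *table, int index) {
    assert(index >= 0 && (uint32_t)index < table->count);
    return table->entries[index].offset;
}
```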
I got excused from jury duty. Just silently sat in a chair for 1.5 days
Nobody even talked to me
Typical lol
civil service = accomplished
I did my part
Since I am just going to use the rsy asset format I get the benefit from all my months of working on that and will just get mipmaps, mesh optimizations, mikktspace, BC compression. Only thing I need is to copy over the Rosy scene graph and parse the dds header and I should have no problem loading sponza
No need to write a json parser or add image decoding
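For the dds header part, a minimal sketch of what parsing it could look like, following the standard DDS file layout (4-byte magic, 124-byte header, optional 20-byte DX10 extension). The struct and function names are hypothetical, it only pulls out a handful of fields, and whether the rsy assets actually use the DX10 extension header is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields a texture loader usually needs. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint32_t mip_count;
    uint32_t four_cc;
    bool has_dx10;        /* true when a DX10 extension header follows */
    uint32_t dxgi_format; /* only meaningful when has_dx10 is true */
    size_t data_offset;   /* where the pixel data starts in the file */
} dds_info_t;

static uint32_t read_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static bool dds_parse_header(const unsigned char *bytes, size_t size, dds_info_t *out) {
    assert(bytes != NULL && out != NULL);
    if (size < 128) return false;                          /* magic + DDS_HEADER */
    if (read_u32_le(bytes) != 0x20534444u) return false;   /* "DDS " */
    if (read_u32_le(bytes + 4) != 124u) return false;      /* dwSize */
    if (read_u32_le(bytes + 76) != 32u) return false;      /* ddspf.dwSize */

    out->height = read_u32_le(bytes + 12);
    out->width = read_u32_le(bytes + 16);
    out->mip_count = read_u32_le(bytes + 28);
    if (out->mip_count == 0) out->mip_count = 1;           /* no mip chain stored */
    out->four_cc = read_u32_le(bytes + 84);
    out->has_dx10 = out->four_cc == 0x30315844u;           /* "DX10" */
    out->dxgi_format = 0;
    out->data_offset = 128;

    if (out->has_dx10) {
        if (size < 148) return false;                      /* DX10 header is 20 bytes */
        out->dxgi_format = read_u32_le(bytes + 128);
        out->data_offset = 148;
    }
    return true;
}
```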
But I will have to write some C++ to extend it to get HDRI
So will do some work on Rosy
I guess that project is not dead yet
Not doing this work yet though
It’s all going to move real fast soon, once my software rasterizers and RT pipelines are working
Maybe eventually Palinode is just the renderer for Rosy 
Incredible
I am going to try and add the image2D builtin type to vcc so I can write to an image in a compute shader. Not fully familiar with how the vcc IR works but this seems like a small task given I can follow the sampler builtins that are already present
alright
been blocked on compute shader stuff because I want to use descriptor indexing in my shader and that's not working due to vcc compiler errors
was assuming I was doing something wrong or it was maybe a small bug in vcc, which may still be the case, that it is a small bug
but without a broad understanding of how the compiler works, my random attempts to get lucky with small changes I didn't really understand failed, which of course they would
so just gonna spend the weekend learning more
I have created a test case that fails in my fork of vcc
and while I haven't figured out the specifics I have a pretty good understanding of how the frontend llvm works and how the backend spirv emit works
and I can read the IR dumps
I understand the address space a little bit
goal today is to understand how the descriptor annotation works right now without DI
like from C to spirv end to end
then I am just going to create my own node type in the shady IR and try to have it emit some simple spirv
so I understand that part
maybe as I grow to understand it I'll notice that this is just a small bug, but to get anything working at all with DI I want to bypass all the existing array address space stuff with a new annotation that skips the existing logic and just treats DI as the opaque black box it is anyway
and then I can go back to my stuff
ok ok I think I understand the annotations thing end to end, the LLVM IR representation, and how they're parsed into shady nodes and how the global variable nodes get and store annotations, and how they're referred to in the backend to emit spirv
I'm going to add some debug logging, step through it in the debugger and print out the LLVM IR and Shady IR and then I think I will feel pretty good about it
these are cool
and helpful
like const char * LLVMGetValueName (LLVMValueRef Val) 🙏
TEXSAMPLER: Converting LLVM global 'texSampler' at 0000022C979F0320
TEXSAMPLER: Created GlobalVariable node at 0000022C97D5DF50
Converting function: main
TEXSAMPLER: Attaching annotation to dict for node 0000022C97D5DF50
TEXSAMPLER: Adding DescriptorSet=0 annotation
TEXSAMPLER: Attaching annotation to dict for node 0000022C97D5DF50
TEXSAMPLER: Adding DescriptorBinding=1 annotation
TEXSAMPLER: Attaching annotation to dict for node 0000022C97D5DF50
IR dumped
TEXSAMPLER: Emitting SPIR-V for global at 0000022C97D5F7D0
TEXSAMPLER: Has 3 annotations
TEXSAMPLER: Emitting DescriptorSet decoration = 0
TEXSAMPLER: Emitting DescriptorBinding decoration = 1
Wrote result to output.spv
%105 = ref(UniformConstant, %104): CrossDevice %105 = @Exported("texSampler") @DescriptorSet(0) @DescriptorBinding(1) GlobalVariable(type: %104, address_space: UniformConstant, is_ref: true, init: null)
%116: CrossDevice %104 = Load(mem: %102, ptr: texSampler)
this is the llvm IR:
@.str.11 = private unnamed_addr constant [28 x i8] c"shady::builtin::LaunchIdKHR\00", section "llvm.metadata"
@.str.12 = private unnamed_addr constant [25 x i8] c"shady::descriptor_set::0\00", section "llvm.metadata"
@.str.13 = private unnamed_addr constant [28 x i8] c"..\\test\\vcc\\textured.frag.c\00", section "llvm.metadata"
@.str.14 = private unnamed_addr constant [29 x i8] c"shady::descriptor_binding::1\00", section "llvm.metadata"
@.str.15 = private unnamed_addr constant [15 x i8] c"shady::io::398\00", section "llvm.metadata"
@.str.16 = private unnamed_addr constant [19 x i8] c"shady::location::0\00", section "llvm.metadata"
@.str.17 = private unnamed_addr constant [19 x i8] c"shady::location::1\00", section "llvm.metadata"
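the annotation strings in that dump all look like shady::<key>::<payload>, so here is a small standalone sketch of splitting one apart. To be clear this is not how vcc or shady actually parses them, just an illustration of the string format seen above; shady_annotation_t and parse_shady_annotation are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a "shady::<key>::<payload>" string. */
typedef struct {
    char key[32];
    char payload[64];
    bool is_number; /* true when the payload is a plain decimal integer */
    long number;
} shady_annotation_t;

static bool parse_shady_annotation(const char *s, shady_annotation_t *out) {
    static const char prefix[] = "shady::";
    const size_t prefix_len = sizeof prefix - 1;
    assert(s != NULL && out != NULL);

    /* Anything without the prefix (e.g. the source file path) is skipped. */
    if (strncmp(s, prefix, prefix_len) != 0) return false;

    const char *key = s + prefix_len;
    const char *sep = strstr(key, "::");
    size_t key_len = sep ? (size_t)(sep - key) : strlen(key);
    const char *payload = sep ? sep + 2 : "";
    size_t payload_len = strlen(payload);
    if (key_len == 0 || key_len >= sizeof out->key) return false;
    if (payload_len >= sizeof out->payload) return false;

    memcpy(out->key, key, key_len);
    out->key[key_len] = '\0';
    memcpy(out->payload, payload, payload_len + 1);

    /* Numeric payloads such as descriptor_set::0 or location::1. */
    char *end = NULL;
    long value = strtol(payload, &end, 10);
    out->is_number = payload_len > 0 && *end == '\0';
    out->number = out->is_number ? value : 0;
    return true;
}
```

so "shady::descriptor_binding::1" comes back as key descriptor_binding with number 1, and "shady::builtin::LaunchIdKHR" as key builtin with a non-numeric payload.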
this is the spirv disassembly:
OpName %texSampler "texSampler"
OpName %texSampler "texSampler"
OpName %texSampler "texSampler"
OpName %stack_ptr_2 "stack_ptr"
OpName %generated_fini "generated_fini"
OpName %fragColor_physical_2 "fragColor_physical"
OpName %fragTexCoord_physical_2 "fragTexCoord_physical"
OpName %outColor_physical_2 "outColor_physical"
OpName %stack_ptr_3 "stack_ptr"
OpName %generated_fini_0 "generated_fini"
OpName %outColor "outColor"
OpName %outColor "outColor"
OpName %outColor "outColor"
OpName %stack_ptr_4 "stack_ptr"
OpDecorate %fragTexCoord Location 1
OpDecorate %fragColor Location 0
OpDecorate %texSampler DescriptorSet 0
OpDecorate %texSampler Binding 1
that also has unrelated out and in variables
alright, next thing is learning how to create my own annotation and IR node, and have it emit spirv, just to learn
this is interesting though from the spirv dis
%41 = OpLoad %40 %texSampler
%42 = OpLoad %v2float %fragTexCoord_physical_1
%43 = OpImageSampleImplicitLod %v4float %41 %42
the fuck is code motion
the compiler can legally move code around if it doesn't affect its semantics
ahh right
that's just clarifying that the compiler won't move your code in certain cases
thanks makes sense
to be clear, it's talking about texture instructions that use implicit derivatives, which are undefined in non-uniform control flow
