#Rosy
lucky you
that's the plan
I use it at work because that's the only choice. It's nice but some things are not
I hate that you cannot open a file in raddebugger without doing the text tab file:"...".data thing
i like how they wanna explore visualisations more
also the file path mapper is nice
because I work on 2 computers at the same time, and when sending binaries to the other computer it is really handy
what is gpu mode? https://github.com/gpu-mode/lectures
a YT channel?
feels like some kind of organization
a four letter organization
no graphics content 
what is a four letter organization
hrmmm I see amd mentioned a lot
there's nvidia and apple gpu content though
idk
mysterious
I joined the discord
like a three letter organization but with an extra letter
so it's 33% spoopier

it just seems a little AI pilled
boring
looks like there's a bindless extension for wgpu https://docs.rs/wgpu/latest/wgpu/struct.Features.html#associatedconstant.SAMPLED_TEXTURE_AND_STORAGE_BUFFER_ARRAY_NON_UNIFORM_INDEXING
Features that are not guaranteed to be supported.
yea
unfortunately just an extension and not on webgpu
i could really use bindless right about now
alright I'm going to add a slider to my ui
and a button
and then I can start on stuff
real 3D stuff
I think some of this server's activity has moved to other servers
that's fine
I shouldn't make it 4x4
it should be 2x4
32 * 4 * 2 = 256
I don't have 512 stuff
probably fine though actually
are you curious about using webgpu?
hell no
I tried it once
it's actually cool, except for the bindless part, wgsl syntax is weird
I just got everything I need with 
I love being able to send a giant buffer without describing anything about it other than the offset, a usage flag and how many bytes there are
btw you can simplify it even further because the usage flag has no effect on desktop GPUs (except for descriptor buffer)
oh I mean memory usage
yes
if by weird you mean ass then yes 
it's kind of interesting in how different it is, but ya
idk I made some spinny colored cubes with it iirc
I was working on a sameline component
like imgui
and I have decided instead that it is easier to just have to specify next line
I like that it is explicit
otherwise it has to kind of be an undo operation or a weird state management
and I rather have an explicit API and simple logic without weird state or lookahead or other bs
and I want these things to be nodes and not attributes
and it can serve as a horizontal rule tbh
hrm
maybe that should be its own node_type
I kind of want tabs also
SameLine() could just set a flag and widgets only increment the line if the flag isn't set
and if the flag was set, they unset the flag
but then every widget has to advance the line
I think explicit is fine though
and do logic on that
yeah
that is nicer than anything I was thinking of though
yes I like it explicit, nice
I think I will use a function pointer for button callbacks
hrmmmm
thinking
I think so
or I could return a bitmask
or you can give it a bitmask as an arg that can be read
oh no the bikeshed possibilities are growing exponentially
fuck it function pointer was my first idea
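A minimal sketch of the function pointer option. It assumes a hypothetical `ui_button` struct that carries a `void *user_data` passed through to the callback; none of these names come from the actual project. It also returns whether the click happened, so call sites that prefer the return-value style can still use it.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical callback signature: user data is passed through untouched.
typedef void (*ui_button_fn)(void *user_data);

typedef struct {
    float x, y, w, h;        // button rect in window space
    ui_button_fn on_click;   // may be NULL
    void *user_data;
} ui_button;

// Calls the callback when the mouse is released inside the button.
// Returns true if the button was clicked this frame.
static bool ui_button_update(const ui_button *b, float mx, float my, bool mouse_released) {
    bool inside = mx >= b->x && mx < b->x + b->w &&
                  my >= b->y && my < b->y + b->h;
    bool clicked = inside && mouse_released;
    if (clicked && b->on_click) b->on_click(b->user_data);
    return clicked;
}

// Example callback: bumps a counter owned by the caller.
static void increment_counter(void *user_data) {
    int *counter = user_data;
    (*counter)++;
}
```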
@broken fog do you attend Universidad Nacional del Noroeste de la Provincia de Buenos Aires ?
argentina is so pretty
every time my daughter advances into some new stage of her life I completely change how I feel about other people in that stage. Now I see young people attending college and it makes me happy to think about their future and what they're going to learn and do in their lives. Like I feel hope. I did not used to think about college kids in this way.
how did you think of them
nope, don't think i've heard of it either, why?
Someone from there on linkedin added me and I was wondering who it would be
Mostly just didn't, felt nothing if I did I guess
is nvim telescope dead
it has tons of deprecation warnings and no updates in months and tons of github issues and prs ignored
ah
the latest release is from 2024 and switching to the master branch fixed the issues
my ui is getting a bit complex with multiple components having interactivity, so I'm switching to unique ids and adding helper functions with safety built in; before I was just indexing into arrays yolo
I'm also going to have nested components hrm like tabs and trees and whatnot
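One common way to get stable unique IDs that also works for nested components is to hash the widget label together with the parent's ID. A sketch with hypothetical names (`ui_id`, `ui_make_id`) using 32-bit FNV-1a: since each FNV-1a step (xor a byte, multiply by an odd prime) is invertible, the same label under two different parent IDs always hashes differently. Different labels can still collide, as with any 32-bit hash.

```c
#include <stdint.h>

typedef uint32_t ui_id;

// FNV-1a over the label, seeded with the parent's ID so e.g. an "ok" button
// in two different tabs gets two different IDs.
static ui_id ui_make_id(ui_id parent, const char *label) {
    uint32_t h = 2166136261u ^ parent;   // FNV-1a 32-bit offset basis mixed with parent
    for (const unsigned char *p = (const unsigned char *)label; *p; p++) {
        h ^= *p;
        h *= 16777619u;                   // FNV-1a 32-bit prime
    }
    return h;
}
```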
#software-rasterization is weird, in api channels people state facts about how to use APIs and refer to specs, in that channel, it's just opinions, arguments and negativity and a weird gpus are bad take?
maybe that's harsh
it's just not my vibe
I just looked and see a huge argument about OOP lol
voxel's videos are really cool though
you might just be unlucky
also you say "in API channels" but you have not had your soul crushed by #opengl like deccer
Yeah thatβs a special case due to opengl popping up in gamedev beginner searches
oh yeah no idea
i don't use linkedin like ever
from what i've seen it's been taken over by a bunch of grumpy retro enthusiasts who screech any time someone asks about gpu sw raster
instead of just like
letting the convo happen
idk why
I just don't find that a useful channel to participate or lurk in
whatever this is the cooler sw raster channel
#1311466891415519302 message
neat
makes me want to do some sw raster
oh well i'll get the chance to improve the one in my kernel thing for uni soon
though i was also thinking of implementing a smol desktop environment
how does the Argentina university system work? are they private or public or a mix of both?
just curious
we have both
public unis are either completely fucked because they have zero budget or great but small and very hard to get in (hard cap on how many students are admitted every year)
I see
private ones you have all kinds of stuff, from meme unis to very good ones
and they go anywhere from affordable to pretty damn expensive
not us expensive though
the military paid for my uni after I got out, I went to an inexpensive state school
like they are "your family is well off" expensive but not "get in lifelong debt" expensive
I see
student loans are very bad imo
like people don't move out to go to uni, unless you live in some province and are coming to buenos aires to study
but if you live with your family in ba you most likely stay there
I think if student loans were restricted to majors that led to job outcomes it might be better, but also more unfair I guess
I knew people with 6 figure debt and musical degrees
cause students just can't afford rent
6 figure debt is absolutely insane regardless of the job you get
yes
it's same as a house purchase basically
my school tuition was like $2k a semester back in 2002
you can get through all of the most expensive private unis here and pay tuition + some basic rent without ever hitting 6 figures total money spent
I didn't pay it though
it depends where you work in the US, if you work in the bay area you are doing really well
if you're working in oklahoma you are basically same salary as any professional service
yeah here swe has pretty good salaries and benefits overall but like, all salaries are shit
i think we have like 40-50% of the country below the official poverty line or something like that
also uhh

hola
(that wasn't even our worst inflation in the last decade)
yes
the line getting flatter is worse actually
that's the usd being kept kinda artificially low
I thought that chainsaw guy was fixing things
it's uhhh complicated and very schmolitical
at this point i honestly don't know if it's better or worse than the alternative 
oh ok, idk anything about politics or economics
the thing is 2020-2023 we had massive inflation and the usd shot up
if you had savings in ars you were fucked ofc but no one is that dumb
and if you are in software engineering you were very likely being paid in usd so you could shrug off inflation and cost of living in usd was very low even though salaries were also much lower than in us/eu
but more recently the exchange rate is stuck, but inflation just kept going just a wee bit slower
so cost of living in usd/eur is now higher than many european countries
like, some stuff is more expensive than austria
anyway yeah the economy is fun
at one point we had 9(!) different exchange rates for ars-usd 
wow
wow
a reasonable reaction 
arbitrage opportunities abound
I have to keep a window cursor to track inline positioning I realized, so I can detect the hover area for a button
buttons are a bit more complex than I expected
that'll apply to the slider too
every component has to update the window with how it impacts the cursor's x and y in the window so the next thing is positioned correctly
not just in terms of how components are drawn
but also for handling events
this is made easier by being an explicit api
nothing will be determined at draw time
my ui really had to grow up to add a button
unique ids, positioning cursor, new lines, helper functions
I thought I'd just draw a box and detect mouse coordinate and call a function
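The window cursor idea can be sketched as a hypothetical `ui_window` whose `ui_place` call hands each component its rect and advances the cursor, with an explicit `ui_new_line`. Using the same rect for drawing and for hover tests keeps input and rendering in agreement. The names and the spacing scheme are assumptions, not the project's actual API.

```c
#include <stdbool.h>

typedef struct { float x, y, w, h; } ui_rect;

// Hypothetical per-window layout state.
typedef struct {
    float cursor_x, cursor_y;  // where the next component goes
    float content_x;           // left edge of the content area
    float row_height;          // tallest component on the current row
    float spacing;             // gap between components
} ui_window;

// Returns the component's rect and moves the cursor past it.
static ui_rect ui_place(ui_window *win, float w, float h) {
    ui_rect r = { win->cursor_x, win->cursor_y, w, h };
    win->cursor_x += w + win->spacing;
    if (h > win->row_height) win->row_height = h;
    return r;
}

// Explicit new line: back to the left edge, below the tallest item.
static void ui_new_line(ui_window *win) {
    win->cursor_x = win->content_x;
    win->cursor_y += win->row_height + win->spacing;
    win->row_height = 0.0f;
}

// Hover test against the same rect that will be drawn.
static bool ui_rect_contains(ui_rect r, float px, float py) {
    return px >= r.x && px < r.x + r.w && py >= r.y && py < r.y + r.h;
}
```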
idk how I raised such a talented kid, she sounds like Evanescence. she's got a performance coming up I'm going to record it and put it up if she lets me
wasn't me though, was her choir and guitar teachers and all the work she's put into it
almost done with this button
something is making my frame rate like half
idk what
I'm going to work on the slider tomorrow
and then this weekend I'll be into rasterizing
probably for a bit
I should read through how box2D uses SIMD
see this makes sense it's using immintrin.h types and functions throughout and not converting back and forth, I think maybe my piecemeal approach was an issue
I'm not worried about it right now
just something to learn from
typedef __m256 b2FloatW;
hrmm
it is done, I have a good enough UI to start rasterizing
it has edge cases and small issues but it's good enough to proceed, I'm not building a high polished UI here right now
Yes
I am currently working on a cpu software rasterizer but I blit to a vulkan swapchain
This is all CPU otherwise though, not GPU
O ok
re #general: or you coerce gobi to implement the missing feature in vichichi [veetcheetchee]
im sure he is getting some motivation to work on it if more people use it and find missing things for him to fix/impl
padme-to-anakin-right?.gif
we need to convince some mega corp with too much money to fund gob
if he gets more exposure, then this could happen
if someone pressured me on my open source project I would delete the project, which is why I don't have one
so I'm probably not the person, but I do want to support vcc
and Shady
#general message
lol
I should shut up
hehe
let me tell you about this thing I really like going over everything wrong with it first is totally how my brain works
thats a totally legal thing to do
just like rolling all aspects of a game yourself, without third-party libs
financially ? i have pretty much a dream job in this uni and unless you have enough money to hire more PhD students (and for that we have to make it a proper official thing through the uni, or maybe DFKI) there's little I can do with donations
I'm always up for undergrad students/volunteers though, and we do have some budget to hire HiWi part-time (if you're studying at Saarland University, or possibly elsewhere in Germany)
well if I know anyone like that I think might be talented enough to do a good job I'll try and send them your way
some resolution scaling, I refactored the code also now to accept meshes with vertices and indices
it's just a mesh with 3 vertices and 3 indices though still
I'll work on adding transforms tomorrow, and a perspective matrix, and after that I can add more triangles
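A sketch of an indexed mesh plus a per-vertex model transform. It assumes a row-major `mat4` that multiplies column vectors and affine transforms only (w = 1 going in, w ignored coming out); `mesh`, `mat4` and `mesh_transform` are illustrative names. The index buffer never changes, only the transformed vertex positions do.

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float m[4][4]; } mat4;   // row-major, multiplies column vectors

// Hypothetical indexed mesh: every 3 indices form one triangle.
typedef struct {
    const vec3 *vertices;
    size_t vertex_count;
    const unsigned *indices;
    size_t index_count;
} mesh;

// p' = M * (x, y, z, 1); the bottom row is assumed to be (0, 0, 0, 1).
static vec3 mat4_mul_point(const mat4 *a, vec3 p) {
    return (vec3){
        a->m[0][0] * p.x + a->m[0][1] * p.y + a->m[0][2] * p.z + a->m[0][3],
        a->m[1][0] * p.x + a->m[1][1] * p.y + a->m[1][2] * p.z + a->m[1][3],
        a->m[2][0] * p.x + a->m[2][1] * p.y + a->m[2][2] * p.z + a->m[2][3],
    };
}

// Writes the transformed vertices to out (vertex_count elements).
static void mesh_transform(const mesh *msh, const mat4 *model, vec3 *out) {
    for (size_t i = 0; i < msh->vertex_count; i++)
        out[i] = mat4_mul_point(model, msh->vertices[i]);
}
```

A rotation of 90 degrees about +Y in this layout is the matrix with rows (0, 0, 1, 0), (0, 1, 0, 0), (-1, 0, 0, 0), (0, 0, 0, 1); it maps +X to -Z.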
I simplified the background also
perhaps you could tap into government aids? (ah i didnt expect to come out like this but its also applicable i suppose in that context)
some game development fund bs they keep talking about
just make a game, call it vichichi [vee-tchee-tchee] (jaker may or may not have TM'd it already, he's 🇺🇸 after all), some cool monchichi jump 'n' run that makes use of the tech
then you just found a company with your uni hiwis, and let the prof think about a goal
these proposals also take paper levels of time to write
and i kinda need to graduate
that shall be your graduation project then πͺ
nope
i already have a thesis topic
you need to be a PhD to submit a grant proposal too, so if I get a post-doc I can write and submit one on my own, not relying on my prof or another post-doc
ah
i'm actually going to keep the lid on what i'm working on next
i went public far too early with vcc
vchichi 2 incoming
vcc 2
opengl 5 obviously
which has vccShaderSource, got it
a monchichi is probably a cool mascot for vcc
nevermind it's probably copyrighted to death like Nintendo
I'm going to solve my shader issues by just not having any
I'm likely just going to have functions that have a builder state, and have builder functions that emit just the SPIRV I need, I'm not building a generic library for anyone to use, just going to solve problems I have
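The core of a "builder that emits only the SPIR-V I need" is small: a word buffer, an ID counter, and the rule that each instruction's first word packs the word count into the high 16 bits and the opcode into the low 16. A sketch with hypothetical `spv_*` names that writes the five-word module header plus `OpCapability Shader` and `OpMemoryModel Logical GLSL450`. This is not yet a complete, valid shader module; entry points, types and functions would be further builder calls.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t words[256];
    size_t count;
    uint32_t next_id;   // SPIR-V result IDs start at 1
} spv_builder;

enum {
    SPV_MAGIC = 0x07230203,
    SPV_VERSION_1_0 = 0x00010000,
    SPV_OP_MEMORY_MODEL = 14,
    SPV_OP_CAPABILITY = 17,
    SPV_CAPABILITY_SHADER = 1,
    SPV_ADDRESSING_LOGICAL = 0,
    SPV_MEMORY_MODEL_GLSL450 = 1,
};

static void spv_emit(spv_builder *b, uint32_t word) {
    if (b->count < sizeof b->words / sizeof b->words[0]) b->words[b->count++] = word;
}

// First word: total word count (opcode word + operands) << 16 | opcode.
static void spv_op(spv_builder *b, uint16_t opcode, const uint32_t *operands, uint16_t n) {
    spv_emit(b, ((uint32_t)(n + 1) << 16) | opcode);
    for (uint16_t i = 0; i < n; i++) spv_emit(b, operands[i]);
}

static uint32_t spv_new_id(spv_builder *b) { return b->next_id++; }

static void spv_begin(spv_builder *b) {
    b->count = 0;
    b->next_id = 1;
    spv_emit(b, SPV_MAGIC);
    spv_emit(b, SPV_VERSION_1_0);
    spv_emit(b, 0);   // generator magic number (left as 0 in this sketch)
    spv_emit(b, 0);   // ID bound, patched in spv_end
    spv_emit(b, 0);   // schema, reserved
    spv_op(b, SPV_OP_CAPABILITY, (uint32_t[]){ SPV_CAPABILITY_SHADER }, 1);
    spv_op(b, SPV_OP_MEMORY_MODEL,
           (uint32_t[]){ SPV_ADDRESSING_LOGICAL, SPV_MEMORY_MODEL_GLSL450 }, 2);
}

// The bound must be greater than every ID used, i.e. the next unused ID.
static void spv_end(spv_builder *b) { b->words[3] = b->next_id; }
```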
I'm sure anyone looking at whatever I come up with will go "but why" and that's fine
"but why" could be the tagline for my project
ah i thought it was first introduced in gl 2, oki then
I'm busy refreshing my understanding of LA and game math, rereading the first 7 chapters of essential math and the first 3 chapters of Lengyel's foundations book 1 before I get into doing anything
LA?
LA is a dangerous place
oh linear alg
yea
these are 2 of the 4 books I read last year when I was working on my foundation project
I might add some more UI while I reread these, but focus is probably to just read for the next week and refresh it all
when I did this last year I wrote my own math library in zig, this time I will write my own math library in C
heh i had to write my own tiny math lib in c for the sw renderer
kinda fun, only did the absolute necessities tho
not having the stdlib will do that
I don't need or want a std lib
I like doing it myself
I got a profiler I can improve whenever I want, I have a UI I can hack on, I didn't need Tracy and I didn't need Dear ImGui and I don't need a glm or a stdlib
or an stb or anything else
you aren't using the C std lib?
or u mean u dont need the cpp std lib (which is why u are using C)
there's no C std lib
there's the CRT
oh libc
I was thinking like std in C++
hrm
yes I use libc
it would be pretty hard to do anything without it
i see
maximum bikeshed 
or the kernel stuff bluescreen was doing
I guess it is called the C standard library, ok I didn't know that
I guess I do need it
hehe
you're not nihing until you have to roll your own printf 
oh and no malloc cause you didn't implement a memory manager 
i love syscalls
asm by beloved
i actually get to work on that assignment again next week, gonna be fun
want to see if i can get a little gui desktop going
i made a syscall to draw gouraud shaded trongles 

so this is actually creating your own character representation from integer and floating point types?
like looking at the bits of the sign, mantissa and exponent in a float to write some characters into a buffer?
the custom printf? yea pretty much
i didn't add float support tho
didn't need it
and it's way more complex than anything else lol
chars and strings are trivial, ints are pretty shrimple
floats are where it gets complicated
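The integer case really is short. A sketch of signed 64-bit decimal formatting with no libc (`fmt_i64` is a hypothetical name): digits come out least significant first and get reversed, and the magnitude is taken as unsigned so `INT64_MIN` does not overflow.

```c
#include <stddef.h>
#include <stdint.h>

// Writes v in decimal plus a NUL terminator into buf and returns the length
// (without the NUL). buf needs at least 21 bytes: sign + 19 digits + NUL.
static size_t fmt_i64(char *buf, int64_t v) {
    char tmp[20];
    size_t n = 0, len = 0;
    // Negate in unsigned arithmetic, which is well defined for INT64_MIN.
    uint64_t mag = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    do {
        tmp[n++] = (char)('0' + mag % 10);   // least significant digit first
        mag /= 10;
    } while (mag != 0);
    if (v < 0) buf[len++] = '-';
    while (n > 0) buf[len++] = tmp[--n];     // reverse into the output
    buf[len] = '\0';
    return len;
}
```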
i actually have fp math in the kernel which is kinda sus
but i don't care i want those 3d giraffics 
this runs in a virtual machine?
I just don't want to link third party code, I am not going to reinvent C or the operating system
ye, on qemu
tried booting it on real hw but it didn't work on my old thinkpad 
iirc it did boot on a uni laptop but it was unusably slow for some reason
but still seeing a thing you wrote actually boot on a real computer is cool
(ok we didn't write the bootloader part but still,,,)
inb4 π¦ πΊ coerces bjorn into BjornOS development
in december I am going to pause on working on Palinode and just try to make a small game with it in what ever state it is in by then, maybe a small snake game or something
I could already do that now, so should be even more capable by then
copy paste the repo and change whatever I have to make a small game with what's there
that should be fun and instructive
so 2.5 months of changes should maybe have some 3D and lighting and controls, probably just on the CPU software raster side though, GPU stuff is blocked by not having shaders anymore and needing to generate spirv
it's great re-reading the math, I lose nuance over time and forget things I'm not using, but I'm not using them because I forgot them. each time it's also less of a lift and I find I know some things now more intuitively
writing a math library with tests now, this is going to take a bit as I read and write code probably a couple of weeks, nihmaxing
I think what I am going to do is spend two months per quarter working on the renderer and one month on a small game, that'll be 4 games a year
Tiny games time boxed to a month
Using this project
so what game are you making this month?
depends on what palinode can do by then
probably will just be cpu software graphics since I don't have shaders
nice, very doable
gta 6
I'm glad I did all that hard work last year making my own math library and doing all those math visualizations, reading 4 different math books and writing tons of notes. It's all coming back to me quickly. It's kind of interesting how I forgot about some of the details, but just reading is a quick refresher. I should probably do this like once a year until I just got it down. Then I can dive more into some of the complex things I always have to look up and maybe someday it all becomes intuitive. I have to look up rotation matrices and reread the perspective matrix math every time I want to touch one of those matrices still
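Since the perspective matrix keeps needing a re-read, here is one common form written out as a sketch: right-handed view space looking down -Z, clip-space depth mapped to [0, 1] (Vulkan/D3D style), no Y flip. `focal` is 1 / tan(fov_y / 2), passed in so the sketch needs no libm. Other conventions (OpenGL's [-1, 1] depth, left-handed, reversed-Z) change the third row. The names are illustrative.

```c
typedef struct { float x, y, z, w; } vec4;
typedef struct { float m[4][4]; } mat4;   // row-major, multiplies column vectors

// Maps view-space z = -n to depth 0 and z = -f to depth 1 after the divide.
static mat4 perspective_rh_zo(float focal, float aspect, float n, float f) {
    mat4 p = {{{0}}};
    p.m[0][0] = focal / aspect;
    p.m[1][1] = focal;
    p.m[2][2] = f / (n - f);
    p.m[2][3] = (n * f) / (n - f);
    p.m[3][2] = -1.0f;                    // w_clip = -z_view
    return p;
}

static vec4 mat4_mul_vec4(const mat4 *a, vec4 v) {
    float in[4] = { v.x, v.y, v.z, v.w }, out[4];
    for (int r = 0; r < 4; r++)
        out[r] = a->m[r][0] * in[0] + a->m[r][1] * in[1] +
                 a->m[r][2] * in[2] + a->m[r][3] * in[3];
    return (vec4){ out[0], out[1], out[2], out[3] };
}

// Perspective divide: clip space to normalized device coordinates.
static vec4 to_ndc(vec4 c) { return (vec4){ c.x / c.w, c.y / c.w, c.z / c.w, 1.0f }; }
```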
What sorts of things does your math library do?
I don't really like glm
I've never used glm so I don't have an opinion
I always just end up writing my own stuff too, like you.
just via frustum and winding order
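Winding-order culling comes down to the sign of the projected triangle's area. A sketch with hypothetical names, assuming counter-clockwise front faces in a y-up space; in y-down screen space the sign flips, so pick one convention and stick to it.

```c
#include <stdbool.h>

typedef struct { float x, y; } vec2;

// Twice the signed area of (a, b, c); positive for counter-clockwise order
// in a y-up coordinate system.
static float signed_area2(vec2 a, vec2 b, vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// With counter-clockwise front faces, a non-positive area means the triangle
// faces away or is degenerate (zero area) and can be skipped.
static bool is_back_facing(vec2 a, vec2 b, vec2 c) {
    return signed_area2(a, b, c) <= 0.0f;
}
```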
I added triangle clipping but it is slow af
so I turned it off
I need some kind of near plane clipping: if I don't clip a triangle whenever it crosses the near plane, it either disappears, causes a division by 0, or appears upside down
Guard bands will make it so you don't need to clip on the sides most of the time
can't guard band near plane as far as I understand the math
Yeah
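Near-plane clipping is usually done in clip space, before the perspective divide, so w never reaches zero and geometry behind the camera never flips. A sketch of Sutherland-Hodgman clipping against the near plane only, assuming a [0, 1] depth range where the near plane is z = 0 (for OpenGL's [-1, 1] range the distance would be z + w). A clipped triangle becomes 0, 3 or 4 vertices; a 4-vertex result gets fan-triangulated into two triangles. Names are hypothetical.

```c
#include <stddef.h>

typedef struct { float x, y, z, w; } vec4;

// Signed distance to the near plane; >= 0 is inside ([0, 1] depth assumed).
static float near_dist(vec4 v) { return v.z; }

static vec4 lerp4(vec4 a, vec4 b, float t) {
    return (vec4){ a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t,
                   a.z + (b.z - a.z) * t, a.w + (b.w - a.w) * t };
}

// Writes the clipped convex polygon to out and returns its vertex count.
static size_t clip_triangle_near(const vec4 in[3], vec4 out[4]) {
    size_t n = 0;
    for (size_t i = 0; i < 3; i++) {
        vec4 cur = in[i], nxt = in[(i + 1) % 3];
        float dc = near_dist(cur), dn = near_dist(nxt);
        if (dc >= 0.0f) out[n++] = cur;                  // keep inside vertex
        if ((dc >= 0.0f) != (dn >= 0.0f))                // edge crosses the plane
            out[n++] = lerp4(cur, nxt, dc / (dc - dn));  // intersection point
    }
    return n;
}
```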
I have really learned to appreciate GPUs
I currently don't have a depth buffer, I think I will work on that next
then I'll add a torus, and then some lighting
Do you have an obj loader
not yet
I will write one though
I think it would look horrible without a depth buffer, so I think that should come first
Yeah for sure
I really like the torus shape for working on lighting
How will you implement the depth buffer?
was thinking just a f32 array
from 0.f to 1.f values
have a little debug visual so I can copy the values to the bitmap
What about the depth test?
I mean it's trivial with single threaded code
But if you want to process triangles in parallel, it's more complex
it's still single threaded, but I was thinking I'd always be rasterizing one surface at a time?
my rasterizer is uh, a bunch of functions and some shoe strings tying it all together, I think I will pass the z around and have the depth buffer just on the context object
rasterizer should be independent of fragment tests imo
in hw, the rasterizer generates a bunch of fragments and then you have early fragment tests, fragment shader, then late fragment tests and blending+srgbisms
yeah
the tricky part is that all of it is an atomic operation basically 
AND it has to happen in triangle submission order
or at least appear as though it did
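A minimal single-threaded sketch of the f32 depth buffer, with hypothetical names: cleared to 1.0 (the far plane) each frame, a "less than" test, and the test and write done together so results follow submission order. Because z/w varies linearly across a triangle in screen space, the depth can be interpolated with the same screen-space barycentric weights used for coverage.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    float *depth;       // width * height floats
    int width, height;
} depth_buffer;

static void depth_clear(depth_buffer *db) {
    for (size_t i = 0; i < (size_t)db->width * (size_t)db->height; i++)
        db->depth[i] = 1.0f;
}

// If z is closer than the stored value, store it and return true;
// the caller shades the pixel only in that case.
static bool depth_test_and_write(depth_buffer *db, int x, int y, float z) {
    float *d = &db->depth[(size_t)y * (size_t)db->width + (size_t)x];
    if (z < *d) {
        *d = z;
        return true;
    }
    return false;
}
```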
I put a wrong if statement anywhere and my frame time increases an order of magnitude
I'm going to think about the concept of fragments
frogments
probably would be a rewrite to do that
it'll be interesting how all this changes once I start on the gpu software rasterizer
added support for rendering multiple meshes, now ready for adding a depth buffer and depth test
I can probably make some changes and reuse that "resolution factor" (happens at the end) as a generic quantization effect to use on the GPU, it looks kind of cool.
perf is looking great, massive improvement from before
is this still single threaded?
oh ig it's also not rendering at 4k anymore lol
It's still 4k
Yes single threaded
The perf got a little better, but it depends on how many pixels I draw. The paint pixel function is a bottleneck at per screen pixel resolution
I need a specialized draw function when it's directly to a screen pixel I think
I will need a proper pipeline eventually, it's just a big for loop that descends into a function call stack in a single thread
oh right
way fewer pixels covered than with the big trongle
before:
now:
I used a win32 thread pool https://learn.microsoft.com/en-us/windows/win32/procthread/using-the-thread-pool-functions
I'm rasterizing 24 rows at a time
cut my frame time in half
haven't started on the depth buffer yet
I could batch the work differently
it ain't much, but it's better
see the frame time approach 40ms in the first and stick around 15ms in the second
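```c
// Rasterize the triangle's bounding box in batches of NUM_RASTER_THREADS rows,
// one thread pool work item per row.
i32 num_threads = NUM_RASTER_THREADS;
for (i32 y = y_min; y < y_max; y += num_threads) {
    // The last batch can have fewer rows left than threads; without this,
    // work items for rows past y_max would be queued.
    i32 rows = (y_max - y < num_threads) ? (y_max - y) : num_threads;
    // Fixed-size arrays (assuming NUM_RASTER_THREADS is a compile-time constant)
    // rather than VLAs. They stay valid because we wait before the next batch.
    raster_arg rargs[NUM_RASTER_THREADS];
    PTP_WORK work[NUM_RASTER_THREADS];
    for (i32 yo = 0; yo < rows; yo++) {
        rargs[yo].t1 = t1;
        rargs[yo].bitmap = bitmap;
        rargs[yo].dimensions = (uint2){scale, scale};
        rargs[yo].raster_bounds = bitmap_dims;
        rargs[yo].bitmap_dims_f = (float2){width_f, height_f};
        rargs[yo].mins = (int2){x_min, y_min};
        rargs[yo].maxs = (int2){x_max, y_max};
        rargs[yo].pitch = pitch;
        rargs[yo].y = y + yo;
        work[yo] = CreateThreadpoolWork(raster_surface_pixel_row, &rargs[yo],
                                        &actx->crctx->thread_pool.cleanup_env);
    }
    for (i32 si = 0; si < rows; si++) {
        SubmitThreadpoolWork(work[si]);
    }
    // Assuming cleanup_env is bound to cleanup_group: waits for every callback
    // to finish (FALSE = don't cancel pending ones) and releases the work objects.
    CloseThreadpoolCleanupGroupMembers(actx->crctx->thread_pool.cleanup_group, FALSE, NULL);
}
```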
my debug build is also as fast as my release was prior to adding worker pools so that's nice
putting my CPU fan to work though
I should sleep the thread to cap max frame time to 16ms or something
I can reuse this thread pool for other things tbh
neat
nice, threadpools are cool
have you tested against your own DIY threadpools?
I feel like you could get some improvement with some dedicated threads you feed with simple work queues
Yeah I am going to wrap the windows thread pool into a job system where I can iterate on it.
It's nice to see this lead to an improvement
I'm unsure why it would be faster if I wrote it myself, though
I am able to yield and wait these work threads
I think I need to break up the work better and think more on cache use across threads to get better performance right now
But it would not be a lot of work to implement myself
I just wrote all this last night after a full day of work so haven't tested it against anything yet other than just yolo threads which ground everything to a snail's pace
I want to make progress on depth testing now
it basically just depends on how much work you're doing to keep the threadpool fed, maybe the windows API for it has a negligible cost comparable to DIY thread pools
I just kind of want to do my own thing anyway
I think early tests to discard unnecessary work are next, which means just checking the barycentric coordinate values at the bounds of the work group; depth testing the pixels at those bounds will also help
like if all four min max coordinates are not part of a surface, or occluded, I can avoid a lot of work
although maybe not in the latter actually
like maybe there's small surfaces occluding just the four corners nevermind, that won't work
I think occlusion culling will have to be its own thing via path tracing or something
if the bounds of one mesh are occluded fully by a single other mesh
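The corner check for early rejection needs one tweak to be safe: a block can be skipped only when all four corners are outside the same edge. "No corner is inside the triangle" is not enough, since a small triangle can sit entirely inside a block. A sketch with hypothetical names, assuming the triangle is wound so all three edge functions are positive inside. It is conservative: some empty blocks near a vertex will not be rejected, but a covered block is never skipped.

```c
#include <stdbool.h>

typedef struct { float x, y; } vec2;

// >= 0 when p is on the inner side of edge a->b.
static float edge_fn(vec2 a, vec2 b, vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True if the block [x0, x1] x [y0, y1] cannot overlap the triangle.
static bool block_outside_triangle(const vec2 tri[3], float x0, float y0, float x1, float y1) {
    vec2 corners[4] = { {x0, y0}, {x1, y0}, {x0, y1}, {x1, y1} };
    for (int e = 0; e < 3; e++) {
        vec2 a = tri[e], b = tri[(e + 1) % 3];
        int outside = 0;
        for (int c = 0; c < 4; c++)
            if (edge_fn(a, b, corners[c]) < 0.0f) outside++;
        if (outside == 4) return true;   // every corner beyond this one edge
    }
    return false;
}
```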
I like how people are able to use "Scene views" framed by a ton of unnecessary UI empty space to excuse having a smaller resolution render, I will use this trick and nobody will notice #showcase message
actually I just like the UI
the smaller render is just a bonus :P
I'm kidding anyway about it being an excuse
This screenshot is from one of my favorite games, The Immortal. But even as a kid I noticed that all that space around the window wasn't doing anything (except for the health bar above)
(Some of the Ultima games also had small game windows, but at least the rest of the screen had other UI and served a purpose)
It also had great music (which I still listen to today). It was Will Harvey's studio.
If you've never played it it's probably hard to recommend today because it involved lots of trial and error and unfair deaths, but at the time it was amazing!
yeah Blizzard came out with a remake for Diablo II and I loved that game so much when it came out, but when I tried the remake (which I bought) it was unplayable
I played so many hours of that game
when it first came out
I couldn't play it for 1 hour when the remake came out, it was so frustrating lol
Diablo II Resurrected
our standards for playability have changed, or at least mine have
I've never played any Diablo games, believe it or not 🫣
I do like the vibe, but the gameplay never seemed like my thing.
What was frustrating about the remake?
it's very difficult to see where the loot dropped, the navigation is really clunky and the pace of the game is very slow
it was just not any fun at all
I don't know, maybe I was having a bad day and I should try it again
I paid money for it and never even got past the entry level
That must be frustrating... Maybe they've released patches to improve it
I think gob left the server?
yea 

trying to break thread pools
I set the minimum number of threads to 1, and max number to 500 and just submit each pixel in a row to a thread which is thousands
my frame time for max draw dropped by 5ms
let me up it to uh more?
I want to know where it breaks
it doesn't break it just doesn't get better
oh interesting, the more work I give it the better it performs, increasing the number of threads actually has less of an effect
I was being conservative, just giving it a tiny bit of work at a time and waiting
I have 12 cores, but just giving it 6 threads to do the same amount of work only increased frame time by 1 ms
from 10ms to 11
I'm going to try to submit the entire number of pixels of the bitmap and wait for it
I'm doing science 🧑‍🔬
I've peaked
this went from 40ms frame time to < 9ms
just submit as many work items as there are pixels in the window, and that's the best perf
with 2x number of cores
hrm
I switched from stack memory to heap, because I got a stack overflow trying to create too big of an array for all the args for the thread functions and it got < 1ms faster
anyway
this is good for now
time for depth testing
I guess I'll work on a torus tomorrow and start on lighting
also I was stepping around the disassembly in a release build it is definitely vectorized, tons of SSE instructions
I think using the clang vector types helps; I had to align my arena memory allocations to 16 bytes, although I know nano said I can specify an alignment
let me see what that does
nothing
typedef f32 float4 __attribute__((ext_vector_type(4), aligned(4))); still needs to be 16 byte aligned like this
I don't want that anyway
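Instead of aligning every arena allocation to 16 bytes, the alignment can be passed per allocation. A sketch of a bump arena with a hypothetical `arena_alloc(arena, size, align)`, where `align` must be a power of two; only allocations holding 16-byte vector types then pay for the padding.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t offset;   // bytes used so far
} arena;

// Returns size bytes aligned to align (a power of two), or NULL when full.
static void *arena_alloc(arena *a, size_t size, size_t align) {
    uintptr_t start = (uintptr_t)(a->base + a->offset);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t padding = (size_t)(aligned - start);
    if (padding + size > a->capacity - a->offset) return NULL;
    a->offset += padding + size;
    return (void *)aligned;
}
```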
rendering is slow again 
time to get a threadripper
I'm no match for 500 trongles
it's actually not drawing that's slow now, it's just the number of triangles, it's slow at any resolution
so the vertex processing is slow?
time to profile harder
ya
at least the time use was polite enough to bunch up mostly in one bin
sampling profiler time? or are you gonna instrument more
going to try and do my own sampling
get some % of times, then average the times and then multiply by number of runs or some janky strategy
so like 100k iterations, sample like 1% of that, get 1000 times
average those
multiply by 100k
idk
I should read about it
tbh I think just that would be useful
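The "time about 1% of iterations, average, multiply by the total" estimate can be sketched as a tiny accumulator (hypothetical `sample_timer`), with the tick source left to the caller (for example QueryPerformanceCounter on Windows). Timing every Nth iteration is cheaper than timing all of them, but it can be skewed if the workload has a period that lines up with N. It is also a different technique from a sampling profiler, which interrupts the program at intervals and records where it is.

```c
#include <stdint.h>

typedef struct {
    uint64_t stride;         // time 1 in every `stride` iterations (100 = 1%)
    uint64_t iterations;     // all iterations seen
    uint64_t samples;        // iterations actually timed
    uint64_t sampled_ticks;  // sum of the timed durations
} sample_timer;

// Call once per iteration; nonzero means "time this one".
static int sample_timer_should_time(sample_timer *st) {
    return (st->iterations++ % st->stride) == 0;
}

static void sample_timer_record(sample_timer *st, uint64_t ticks) {
    st->samples++;
    st->sampled_ticks += ticks;
}

// total ~= mean(sampled durations) * iteration count
static double sample_timer_estimate_total(const sample_timer *st) {
    if (st->samples == 0) return 0.0;
    return (double)st->sampled_ticks / (double)st->samples * (double)st->iterations;
}
```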
I need to do that thread local for the stuff in the jobs
I can have them write to their input and once they've joined I can read from them
idk
you don't need to use thread_local, it can be slower than just using normal memory
just operating data-parallel without threads stepping on each others' toes and requiring fine grained sync should be enough
nvm I might have misread
I'm not actually qualifying anything as thread_local
the variables are just thread local because they're scoped at function scope in the job function
maybe I used the wrong words
ok I set up a good test scene, that has a ton of things that need fixing
performance, there's a weird bug where I'm not drawing pixels, I think at values close to zero or something? my y axis is inverted, and scaling is affecting frustum culling in a weird way
81ms in raster trongle 
I actually need a UI change first, I am going to add tabs, because I don't have enough space to add more tracing info
is the issue fillrate or vertex calcs?
does fps get good if you zoom way out
yeah drawing fewer pixels improves perf a little bit, but if I have just one giant triangle covering the full screen it's < 5ms per frame
so it's not drawing, but drawing is part of it
ah not fillrate issue then
oh I see what you mean, I didn't know what that word means
oh ye it's like, how many pixels can you draw per unit of time
I just have to measure and figure it out
ig your bigger issue now is whatever you're doing per vertex is slow
I like my hexagon floor though
I came by it by accident
I was working on making a sphere
and realized they looked nice tiled together as a plane
hexagons are cool
I think I could make a cool torus from hexagons since they tile so nicely
I should make all my shapes from hexagons
including cubes
they are the bestagons after all
alright time to start on a tabs UI
I think the weird blank pixels is a bug in my pixel to screen space function or precision issue or something, it gets worse depending on the angle yea, just looking at it
hrm
maybe the rasterization rules you're using are not watertight
this page has nice illustrations
https://learn.microsoft.com/en-us/windows/win32/direct3d11/d3d10-graphics-programming-guide-rasterizer-stage-rules
oh I think you're right, it's not a floating point precision issue, since I can replicate it with super huge pixels
lol
looks horrible lol
I'm going to read that thank you
I should share this ^^ in #software-rasterization to show off my skills

I don't have a concept of a pixel center
huh
idk what that means for my code right now
I have to think about it
pixels are basically points, like a thing that has no width or height, I guess that's a problem
having to do more math per pixel is not going to make my perf better lol
I'll figure it out
fixed
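A sketch of coverage with pixel centers and a tie-breaking rule, in the spirit of the top-left convention on the Microsoft page linked above: sample at (x + 0.5, y + 0.5), and when a center lands exactly on an edge, only a top or left edge claims it, so two triangles sharing an edge never both draw (or both skip) a pixel. This assumes y-down screen space with vertices reordered so `edge_fn(v0, v1, v2) > 0`; `covers_pixel` and friends are hypothetical names, and a real rasterizer would step the edge functions incrementally instead of recomputing them per pixel.

```c
#include <stdbool.h>

typedef struct { float x, y; } vec2;

static float edge_fn(vec2 a, vec2 b, vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// For this winding in y-down space: a top edge is horizontal and points
// right, a left edge points up.
static bool is_top_left(vec2 a, vec2 b) {
    return (a.y == b.y && b.x > a.x) || (b.y < a.y);
}

static bool covers_pixel(vec2 v0, vec2 v1, vec2 v2, int px, int py) {
    float area = edge_fn(v0, v1, v2);
    if (area == 0.0f) return false;                        // degenerate
    if (area < 0.0f) { vec2 t = v1; v1 = v2; v2 = t; }     // fix winding
    vec2 p = { (float)px + 0.5f, (float)py + 0.5f };       // pixel center
    const vec2 e[3][2] = { {v0, v1}, {v1, v2}, {v2, v0} };
    for (int i = 0; i < 3; i++) {
        float w = edge_fn(e[i][0], e[i][1], p);
        if (w < 0.0f || (w == 0.0f && !is_top_left(e[i][0], e[i][1]))) return false;
    }
    return true;
}
```

With a 4x4 square split along its diagonal into (0,0),(4,0),(4,4) and (0,0),(4,4),(0,4), the pixel centers on the diagonal go to the first triangle only, and every pixel is covered exactly once.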
I love those hexagon tiles so much
ok, now working on ui tabs
that was really bothering me so glad to fix those gaps
TIL about clang blocks https://clang.llvm.org/docs/BlockLanguageSpec.html
I'm already all the way in with vectors and matrices
I'm going to use blocks and __block storage to power my tabs
I could have also used a static local variable, but those are kind of gross
idk
AKA gauge blocks
https://en.wikipedia.org/wiki/Gauge_block
||because you clang them together||
I like those
Variables qualified by __block act as if they were in allocated storage and this storage is automatically recovered after last use of said variable. An implementation may choose an optimization where the storage is initially automatic and only "moved" to allocated (heap) storage upon a Block_copy of a referencing Block. Such variables may be mutated as normal variables are.
oh
this is macos only lol
weird
I guess that was a waste of my time
static local variable it is
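With a static local, the active tab survives across frames without an explicit state struct. A minimal sketch (hypothetical `ui_tabs`, taking the index of the tab clicked this frame or -1); the catch is that there is exactly one such variable for the whole program, so a second tab bar would share it, and state keyed by widget ID would be needed at that point.

```c
// Returns the active tab; clicked_tab is the tab clicked this frame, or -1.
static int ui_tabs(int tab_count, int clicked_tab) {
    static int active = 0;   // persists between calls, i.e. between frames
    if (clicked_tab >= 0 && clicked_tab < tab_count) active = clicked_tab;
    return active;
}
```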
I changed my colors around
it now cycles between a bright blue sky and a darker blue sky and I changed the UI colors and added tabs
I also figured out what was slow
I didn't even need to do sampling
I have to rewrite the entire pipeline
the problem is how often I wait
I just returned immediately from the jobs without doing anything and it was slow, it's just the job scheduling & waiting
I shouldn't have needed the thread pool; people are getting better performance than I am, with more triangles than I have, just single threaded. I am going to start over on how it works
single threaded perf is the fundamental thing
yeah
I consider multithreading to be a last resort unless it's super easy to add
it's a fun problem though
rasterization does not seem trivial to parallelize
I'm not frustrated or anything
yeah I didn't get that vibe
I have made a horrible attempt at parallelizing rasterization using cuda and it uh
somehow a single triangle ended up with 30 ms render latency iirc
I wonder what strats are optimal for CPU software rendering, maybe some kind of tile based solution where you bin your triangles into tiles and rasterize each tile on a different thread
that would probably lead to minimal sync requirements
I'd totally reach for tiling as a first option. Bet it works pretty nicely.
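Here is a minimal sketch of the binning step described above. It assumes a fixed 4x4 grid of 32-pixel tiles and per-triangle screen-space bounding boxes, and all names are hypothetical. Each triangle index is appended to every tile its bounding box overlaps.

```c
#include <stdint.h>

#define TILE_SIZE 32           /* pixels per tile side (assumed) */
#define TILES_X 4
#define TILES_Y 4
#define MAX_TRIS_PER_TILE 64   /* fixed capacity for the sketch */

typedef struct { float min_x, min_y, max_x, max_y; } bbox_t;   /* screen-space triangle bounds */

typedef struct {
    int32_t count[TILES_Y][TILES_X];
    int32_t tris[TILES_Y][TILES_X][MAX_TRIS_PER_TILE];
} tile_bins_t;

/* Map a screen coordinate to a tile index, clamped to the grid. */
static int32_t tile_of(float v, int32_t num_tiles) {
    if (v < 0.0f) return 0;
    if (v >= (float)(num_tiles * TILE_SIZE)) return num_tiles - 1;
    return (int32_t)(v / (float)TILE_SIZE);
}

/* Add each triangle index to every tile its bounding box overlaps. */
static void bin_triangles(const bbox_t *boxes, int32_t num_tris, tile_bins_t *bins) {
    const float screen_w = (float)(TILES_X * TILE_SIZE);
    const float screen_h = (float)(TILES_Y * TILE_SIZE);
    for (int32_t ty = 0; ty < TILES_Y; ty++)
        for (int32_t tx = 0; tx < TILES_X; tx++)
            bins->count[ty][tx] = 0;
    for (int32_t t = 0; t < num_tris; t++) {
        const bbox_t *b = &boxes[t];
        /* Skip triangles that are entirely off screen. */
        if (b->max_x < 0.0f || b->max_y < 0.0f || b->min_x >= screen_w || b->min_y >= screen_h)
            continue;
        int32_t x0 = tile_of(b->min_x, TILES_X), x1 = tile_of(b->max_x, TILES_X);
        int32_t y0 = tile_of(b->min_y, TILES_Y), y1 = tile_of(b->max_y, TILES_Y);
        for (int32_t ty = y0; ty <= y1; ty++) {
            for (int32_t tx = x0; tx <= x1; tx++) {
                int32_t n = bins->count[ty][tx];
                if (n < MAX_TRIS_PER_TILE) {   /* a real binner would grow or flush here */
                    bins->tris[ty][tx][n] = t;
                    bins->count[ty][tx] = n + 1;
                }
            }
        }
    }
}
```

Each tile's list can then be rasterized on its own thread without locking the bitmap, because tiles never share pixels. The only sync point is waiting for all tiles to finish.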
so I broke apart my rasterizer into different stages where each stage generates a flattened array based on the previous stage: meshes -> produce a flat array of triangles -> triangles produce a flat array of fragments
well these are more like potential fragments
everything is fast as can be
per frame raster time < 1ms
so now I am in this function that operates on each potential fragment, which is basically all the pixels that exist within the bounds of each triangle
it does nothing
I add a simple addition
just one addition
frame time increases by 1ms
that's with the addition
1.2M potential fragments
I can't do shit in this function
no addition
and yet the CPU can do tens of billions of additions per second
maybe the compiler optimized out a bunch of stuff when you didn't do the add, or it pushed some heuristic over a threshold that made it do something else, idk
I lied
I was commenting out and commenting the addition
but the thing I was adding was derived from a function that was optimized out
ok so I changed it
it wasn't the addition great
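The false alarm above is the usual micro-benchmark trap: if nothing observable depends on a computation, the compiler can delete it. Here is a sketch of one way to keep a measured loop alive by publishing its result to a `volatile` sink. The names are hypothetical. `clock()` is coarse; on Windows, `QueryPerformanceCounter` would be the higher-resolution choice.

```c
#include <stdint.h>
#include <time.h>

/* Storing to a volatile object is an observable side effect, so the compiler
   must keep the loop that computes the stored value. */
static volatile float g_bench_sink;

/* Hypothetical micro-benchmark: times a summing loop over `values` and
   publishes the result so the loop cannot be optimized out. */
static double bench_sum_seconds(const float *values, int32_t n) {
    clock_t start = clock();
    float sum = 0.0f;
    for (int32_t i = 0; i < n; i++) {
        sum += values[i];
    }
    g_bench_sink = sum;
    clock_t end = clock();
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```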
working on that function now, the one that added 1ms! it's a tiny function
single adds are impactful though: I just added a simple float value to see how it compared to type casting an int to a float, and it made a meaningful difference in frame time
they're both single ops
single instructions
I should look at the disassembly at this point
this is like squeezing blood from a stone
if I had tracy I would know more
I need to clean this shit up
I just need to pass in scalar values
well scalar values and the vectors
it's kind of cool though how fast everything else is
and completely isolated from this
at this point it's just a big array of fragments which I want to turn into just arrays of a depth and color pair that then test and paint
I'll keep going on this tomorrow
I don't remember, is there a reason that you can't use Tracy? Or have you just not set it up yet?
it's a from scratch project
no libraries or dependencies outside of the OS, the compiler and the vulkan driver and headers
I think counting the fragments is creating a dependency
it's a single integer
that probably is creating an issue for the compiler
are you doing anything with it besides incrementing?
look at the disasm
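On the fragment counter: a single integer can still hurt if it lives behind a pointer inside the hot loop. Here is a sketch, with hypothetical names, of why counting into a local and storing once is often cheaper. The disassembly is the place to confirm which version the compiler actually produced.

```c
#include <stdint.h>

typedef struct { int32_t num_fragments; } raster_stats_t;   /* hypothetical stats struct */

/* Counter kept in memory: uint8_t is unsigned char on mainstream compilers,
   and character types may alias any object, so the store through `px` could
   modify stats->num_fragments. The compiler then generally has to load and
   store the counter on every iteration. */
static void fill_counting_in_memory(uint8_t *px, int32_t n, raster_stats_t *stats) {
    for (int32_t i = 0; i < n; i++) {
        px[i] = 255;
        stats->num_fragments++;
    }
}

/* Counter kept in a local: it can live in a register and is written once. */
static void fill_counting_locally(uint8_t *px, int32_t n, raster_stats_t *stats) {
    int32_t count = 0;
    for (int32_t i = 0; i < n; i++) {
        px[i] = 255;
        count++;
    }
    stats->num_fragments += count;
}
```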
yeah
I guess that means you need to write your own debugger
it could be a lot of things but I'd bet a lot has to do with the overall memory access patterns
cache is king
I allocate enough memory for all 12M fragment structs and then just iterate through them to calculate the barycentric coords, mix color and depth, and then draw. I'm not at home right now so I can't share the code, but it's a small struct, should fit in the cache, and I am just iterating over the memory in the order it is written. If I do very little work in that loop it is < 1ms frame time
it's very interesting
this morning before my work commute I removed basically everything that loop does, and I will slowly add operations to it to see how it impacts the performance
there are no function calls, it is only working with the memory in the struct, although I do have a pointer to the bitmap and its dimensions in scope
it's hard to imagine that I am having any kind of memory access problems in this scenario based on what I understand, I will try and experiment more with it and work through the disassembly
if it does turn out to be a memory access issue I'll give in and add Tracy to investigate that
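The barycentric step mentioned above can reuse the same edge function used for coverage. This is a sketch with hypothetical `vec2`, `barycentric` and `interp3` names. Interpolating NDC depth linearly in screen space is correct. Attributes such as color or UVs need perspective-correct interpolation (divide by w, interpolate, multiply back) once a projection is involved.

```c
typedef struct { float x, y; } vec2;   /* hypothetical screen-space point */

static float edge_fn(vec2 a, vec2 b, vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Barycentric weights of p in (v0, v1, v2). Each weight is the area of the
   sub-triangle opposite a vertex divided by the full area, so they sum to 1. */
static void barycentric(vec2 v0, vec2 v1, vec2 v2, vec2 p, float w[3]) {
    float area = edge_fn(v0, v1, v2);   /* caller should skip degenerate (area == 0) triangles */
    w[0] = edge_fn(v1, v2, p) / area;
    w[1] = edge_fn(v2, v0, p) / area;
    w[2] = edge_fn(v0, v1, p) / area;
}

/* Blend one per-vertex scalar (depth, or one color channel) with the weights. */
static float interp3(const float w[3], float a0, float a1, float a2) {
    return w[0] * a0 + w[1] * a1 + w[2] * a2;
}
```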
you should try to switch up the memory layout and see how it affects it
maybe try to make it a SOA
hrm
that's interesting
there's probably some alignment padding with the clang vectors
since they are 16 byte aligned
so maybe that will be a meaningful change
I was so motivated to work on this I didn't want to go to sleep last night and I woke up this morning to hack on it a little lol
Gob pinged me on a change he made to vcc to add a DSL to configure/declare SPIRV intrinsics in vcc
It's really cool
It's still in draft but it's pretty awesome
AOS vs SOA
so SOA was like a 20x+ improvement?
p_fragment_t *frags = p_malloc(sizeof(p_fragment_t) * num_fragments);
...
frags[i].screen_coords = (float4){((f32)frags[i].pixel_coords.x), 0.f, 0.f, 0.f};
vs
p_fragment_data_t frag_data = {0};
frag_data.t1 = p_malloc(sizeof(p_triangle) * num_fragments);
frag_data.bc = p_malloc(sizeof(float4) * num_fragments);
frag_data.color = p_malloc(sizeof(float4) * num_fragments);
frag_data.screen_coords = p_malloc(sizeof(float4) * num_fragments);
frag_data.pixel_coords = p_malloc(sizeof(int2) * num_fragments);
frag_data.depth = p_malloc(sizeof(f32) * num_fragments);
...
frag_data.screen_coords[i] = (float4){(frag_data.t1[i].positions[0].x), 0.f, 0.f, 0.f};
yeah, but this is just test code in my loop of 12M things, it's not rendering yet, I confirmed it was just accessing data that was slow
well it does actually render
but to one pixel
all 12M fragments are currently painting the same pixel just so the loop does something
let me slowly start adding instructions back in to see what happens
Which of those fields in the struct were actually being accessed?
i32 num_fragments = 0;
{
for (i32 si = 0; si < num_surfaces; si++) {
for (i32 y = surface_ctxs[si].y_min; y < surface_ctxs[si].y_max; y++) {
num_fragments += surface_ctxs[si].x_max - surface_ctxs[si].x_min;
}
}
}
actx->crctx->num_fragments_generated = num_fragments;
p_fragment_t *frags = p_malloc(sizeof(p_fragment_t) * num_fragments);
p_fragment_data_t frag_data = {0};
frag_data.t1 = p_malloc(sizeof(p_triangle) * num_fragments);
frag_data.bc = p_malloc(sizeof(float4) * num_fragments);
frag_data.color = p_malloc(sizeof(float4) * num_fragments);
frag_data.screen_coords = p_malloc(sizeof(float4) * num_fragments);
frag_data.pixel_coords = p_malloc(sizeof(int2) * num_fragments);
frag_data.depth = p_malloc(sizeof(f32) * num_fragments);
{
// Generate pixel coordinates for frags
i32 fi = 0;
for (i32 si = 0; si < num_surfaces; si++) {
for (i32 y = surface_ctxs[si].y_min; y < surface_ctxs[si].y_max; y++) {
for (i32 x = surface_ctxs[si].x_min; x < surface_ctxs[si].x_max; x++) {
frag_data.t1[fi] = surface_ctxs[si].t1;
frag_data.pixel_coords[fi] = (int2){x, y};
fi++;
}
}
}
}
u8 *bm = raster_ctx.attachments.bitmap;
i32 pitch = raster_ctx.ro_raster_ctx.pitch;
u64 *pixel = (u64 *)bm;
f32 width_f = raster_ctx.ro_raster_ctx.double_scaled_dims_f.x;
f32 height_f = raster_ctx.ro_raster_ctx.double_scaled_dims_f.y;
for (i32 i = 0; i < num_fragments; i++) {
frags[i].screen_coords = (float4){((f32)frags[i].pixel_coords.x), 0.f, 0.f, 0.f};
frag_data.screen_coords[i] = (float4){(frag_data.pixel_coords[i].x), 0.f, 0.f, 0.f};
frag_data.color[i] = (float4){1.f, 0.f, 0.f, 0.f};
frag_data.depth[i] = 0.1f;
frag_data.bc[i] = (float4){1.f, 0.f, 0.f, 0.f};
rgba_to_rgba16(frag_data.color[i], pixel);
}
is the relevant test code
I have both lines uncommented here
frags[i].screen_coords = (float4){((f32)frags[i].pixel_coords.x), 0.f, 0.f, 0.f};
frag_data.screen_coords[i] = (float4){(frag_data.pixel_coords[i].x), 0.f, 0.f, 0.f};
but in my test I just had one or the other
rgba_to_rgba16 actually writes to the bitmap
you can see that none of the other fields are currently being used
all the slowness is in the for (i32 i = 0; i < num_fragments; i++) { loop
yeah it's pretty clear to see why that'd cause cache perf gain then
why is that
you can keep the stuff you actually use in cache without it being padded out by the parts of the struct you don't
basically the concept behind SOA in general
ah right
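To put numbers on the AOS vs SOA point, here is a sketch that mirrors the fragment struct. GCC-style vector types are assumed to stand in for clang's 16-byte-aligned `float4` and 8-byte-aligned `int2`, and `tri_t` is a hypothetical stand-in for `p_triangle` (three float4 positions). The padding from the vector alignment shows up in the struct size.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match clang's ext_vector_type float4/int2 layout (16- and 8-byte aligned). */
typedef float float4 __attribute__((vector_size(16)));
typedef int32_t int2 __attribute__((vector_size(8)));

typedef struct { float4 positions[3]; } tri_t;   /* hypothetical stand-in for p_triangle */

/* AOS: 48 + 16 * 3 + 8 + 4 = 108 bytes of fields, padded to 112 because the
   struct inherits float4's 16-byte alignment. */
typedef struct {
    tri_t t1;
    float4 bc, color, screen_coords;
    int2 pixel_coords;
    float depth;
} frag_aos_t;

/* AOS pass: each iteration touches 24 of the 112 bytes in its element, so most
   of every cache line loaded is data this loop never uses. */
static void aos_pass(frag_aos_t *f, int32_t n) {
    for (int32_t i = 0; i < n; i++)
        f[i].screen_coords = (float4){(float)f[i].pixel_coords[0], 0.0f, 0.0f, 0.0f};
}

/* SOA pass: reads 8 bytes and writes 16 bytes per fragment, all contiguous. */
static void soa_pass(const int2 *pixel_coords, float4 *screen_coords, int32_t n) {
    for (int32_t i = 0; i < n; i++)
        screen_coords[i] = (float4){(float)pixel_coords[i][0], 0.0f, 0.0f, 0.0f};
}
```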
typedef struct p_fragment_t {
p_triangle t1;
float4 bc;
float4 color;
float4 screen_coords;
int2 pixel_coords;
f32 depth;
} p_fragment_t;
typedef struct p_fragment_data_t {
p_triangle *t1;
float4 *bc;
float4 *color;
float4 *screen_coords;
int2 *pixel_coords;
f32 *depth;
} p_fragment_data_t;
yeah I have padding issues I noticed with the vector types
I saw in the debugger that it would add padding in other instances
I didn't realize that would cause issues here
yeah padding as in perpetually-unused space would be even worse, but it's also in the context of the loop
if you're not using depth for example, it's as good as unused space for a tight loop
oh I will use all of those
this was just testing
I need to start adding functionality back in
hrm
yeah I'll just test as I add pipeline stuff back in and see what happens
although I wasn't using them in the same loop so it was dead space actually
Always cool/interesting to see a real-life example of the benefits of cache friendliness in the wild
yeah thats what I mean

over the last few days something I may have noticed is that conditional logic to avoid doing a small bit of extra work actually doesn't seem worth it, it's either not much of a win, or hurts perf
idk I haven't really confirmed it
it depends on several factors
e.g. likelihood of taking a particular side of the branch, where the compiler put the code for each side, and whether the compiler actually emitted a branch instruction
I read a book that had some content about performance related to branches that I found interesting: https://www.amazon.com/gp/product/B09BZTGJM2/ref=ppx_yo_dt_b_d_asin_title_351_o07?ie=UTF8&psc=1
the author looks like me lol
x86 krill issue for not having conditional instructions 
but yeah if the branch is consistently taken/not taken the branch predictor will get it right basically 100% of the time and the perf hit shouldn't be too bad
then again, if the work you're avoiding is like two adds and a multiply it's probably not worth it
always benchmark
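A concrete case of the branch trade-off above is a depth test written both ways. This is a sketch with hypothetical names. The branchy version skips the stores when the test fails but can mispredict. The select version always stores but has no data-dependent jump, provided the compiler lowers the ternaries to cmov/blend instructions, which is not guaranteed.

```c
#include <stdint.h>

/* Branchy depth test: only writes when the fragment passes. */
static void depth_test_branch(float *depth, uint32_t *color, int32_t idx, float z, uint32_t c) {
    if (z < depth[idx]) {
        depth[idx] = z;
        color[idx] = c;
    }
}

/* Branch-free depth test: always writes, choosing the old or new value.
   Whether this becomes cmov/blend is up to the compiler, so check the disassembly. */
static void depth_test_select(float *depth, uint32_t *color, int32_t idx, float z, uint32_t c) {
    float old_z = depth[idx];
    uint32_t old_c = color[idx];
    int pass = z < old_z;
    depth[idx] = pass ? z : old_z;
    color[idx] = pass ? c : old_c;
}
```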
idk, that's just as good as it's going to be for a bit
have some frustum bugs
I kind of want to work on a gpu thing and take a break from the software rasterizer for a bit
I'm going to work on my spirv builder idea
see if I can replace my shader hello world triangle
I need to get my UI rendering on top of my GPU renders
I think I just need to set the clear color to 0 for the bitmap each frame
hrm
and then I sample it from a full screen triangle
love to see this update
Same
So my next goal is a hello triangle with my own spirv builder and then make that a full screen triangle I will sample my cpu raster bit from so I can have a UI with my gpu render
ooh you decided on writing a spir-v VM as well?
or rather just having your software rasterizer run spirv somehow
my project is not a software rasterizer, it has a software rasterizer
I was intending to use spirv for the gpu stuff
TIL about https://github.com/heroseh/hcc
isn't that just vcc?
no
it's a different project
with different goals
hcc looks like it is a shader language
i meant like, isn't that just trying to do the same thing as vcc
no
vcc's goal is to be able to write programs on the GPU using the full capabilities of a programming language, like pointers everywhere
recursion
etc
vcc doesn't have these limitations https://github.com/heroseh/hcc?tab=readme-ov-file#limitations
hcc is more of a shading language in C
Vcc - the Vulkan Clang Compiler, is a proof-of-concept C and C++ compiler for Vulkan leveraging Clang as a front-end, and Shady our own research IR and compiler. Unlike other shading languages, Vcc aims to stick closely to standard C/C++ languages and merely adds a few new intrinsics to cover GPU features. Vcc is similar to CUDA or Metal...
Vcc supports advanced C/C++ features usually left out of shading languages such as HLSL or GLSL, in particular raising the bar when it comes to pointer support and control-flow:
This is a lot of effort. Why ?
Dissatisfaction with “Legacy” Shading Languages. Back in the early 2000s a significant revolution happened in the world of realtime computer graphics: we moved from “dumb” graphics accelerator that had only a fixed set of functionality (texturing slots, blended vertex colors, hardware T&L …) to increasi...
Gob has a very strong opinion about the unnecessary limitations of shader languages and a vision for how it should be, it's a very different project.
I actually don't know what I want to do right now
I think I might fork palinode and just make a game
and I can add whatever libraries I want to the fork just to make the game
and then go back to palinode when I get bored with that
and add anything that might be useful to palinode back from what I did in the game
Actually making something (like a game) is a good way to figure out what an engine needs. Seems like not a bad idea if you want a break to do something different!
An MMO?
lol no
Sorry, I didn't mean to interrupt your idea with my silly comment... Tell us your idea!
wasn't silly at all, was funny :P
it's gonna be a small racing game
maybe just a race against time to avoid having to do any game AI, at least at first. I don't know, anyway, just a small thing
going to cut every corner and reduce scope to bare minimum
Seems like it could work and have a pretty limited and feasible scope
I would like to use splines and a mesh shader for the track
switching between the graphics pipeline on the GPU and my software rasterizer (via key pressing so it's a little janky)
I didn't have any 3D set up via the vulkan graphics pipeline, I didn't have a depth image or depth testing, no mesh or index buffers, no transforms or camera or scene data. So I just set it all up the last couple of days
next thing is to get my ui to render on the graphics pipeline
this AI review bot has saved me from so many problems and headaches over the last year
GPUMeshVertex exists because the clang language extension vectors can't be sent to the GPU as they are
the bot understands that, which is cool
typedef struct MeshVertex {
float4 position;
float4 color;
float4 normal;
float2 uv;
} MeshVertex;
typedef struct GPUMeshVertex {
f32 position[4];
f32 color[4];
f32 normal[4];
f32 uv[2];
} GPUMeshVertex;
why not? do they not have the same representation in memory as sequential floats?
vk_buffer_t source = actx->gctx->staging_buffer;
size_t total_data_size = 0;
for (i32 i = 0; i < num_meshes; i++) {
size_t data_size = mesh_data[i]->num_vertices * sizeof(MeshVertex);
memcpy((u8 *)source.data + total_data_size, mesh_data[i]->vertices, data_size);
total_data_size += data_size;
}
VK_CHECK(copy_buffer(actx, source.vk_buffer, dest.vk_buffer, total_data_size));
vk_buffer_t source = actx->gctx->staging_buffer;
size_t total_data_size = 0;
for (i32 i = 0; i < num_meshes; i++) {
GPUMeshVertex *gpu_vertices = p_malloc(mesh_data[i]->num_vertices * sizeof(GPUMeshVertex));
for (i32 vi = 0; vi < mesh_data[i]->num_vertices; vi++) {
MeshVertex mv = mesh_data[i]->vertices[vi];
gpu_vertices[vi] = (GPUMeshVertex){
.position = {mv.position.x, mv.position.y, mv.position.z, 1.f},
.color = {mv.color.x, mv.color.y, mv.color.z, 1.f},
.normal = {mv.normal.x, mv.normal.y, mv.normal.z, 1.f},
.uv = {mv.uv.x, mv.uv.y},
};
}
size_t data_size = mesh_data[i]->num_vertices * sizeof(GPUMeshVertex);
memcpy((u8 *)source.data + total_data_size, gpu_vertices, data_size);
total_data_size += data_size;
p_free(gpu_vertices);
}
VK_CHECK(copy_buffer(actx, source.vk_buffer, dest.vk_buffer, total_data_size));
the docs don't have anything about their layout in memory
what's interesting is in renderdoc the vector types look correct
one thing I know about them is they are 16 bytes aligned
because if I don't align the memory I allocate that way the application crashes
Note: The implementation of vector builtins is work-in-progress and incomplete.
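To answer the "same representation as sequential floats?" question with a sketch: the fields do line up, but the stride does not. This sketch assumes GCC-style vector types behave like clang's OpenCL vectors (float4 is 16 bytes and 16-byte aligned, float2 is 8 bytes and 8-byte aligned). `MeshVertex` then gets 8 bytes of tail padding, giving a 64-byte stride, while `GPUMeshVertex` is 56 bytes. Whether that breaks anything depends on how the shader declares the buffer. Under std430, a struct of three vec4 and one vec2 also has a 64-byte array stride. Scalar block layout, or a vertex binding stride of `sizeof(GPUMeshVertex)`, reads 56. `pack_vertices` is a hypothetical alternative to the field-by-field loop.

```c
#include <stddef.h>
#include <string.h>

/* GCC-style vectors, assumed to match clang's ext_vector_type layout. */
typedef float float4 __attribute__((vector_size(16)));
typedef float float2 __attribute__((vector_size(8)));

typedef struct { float4 position, color, normal; float2 uv; } MeshVertex;
typedef struct { float position[4], color[4], normal[4], uv[2]; } GPUMeshVertex;

/* The fields sit at the same offsets in both structs... */
_Static_assert(offsetof(MeshVertex, uv) == offsetof(GPUMeshVertex, uv), "same uv offset");
/* ...but MeshVertex is padded so its size stays a multiple of 16. */
_Static_assert(sizeof(MeshVertex) == 64, "MeshVertex stride is 64");
_Static_assert(sizeof(GPUMeshVertex) == 56, "GPUMeshVertex stride is 56");

/* Repack into a tight 56-byte stride by copying the first 56 bytes of each
   vertex. Unlike the conversion loop above, this keeps the stored w values. */
static void pack_vertices(const MeshVertex *src, size_t n, GPUMeshVertex *dst) {
    for (size_t i = 0; i < n; i++) {
        memcpy(&dst[i], &src[i], sizeof(GPUMeshVertex));
    }
}
```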
oh ok
what's interesting is that I'm using the OpenCL vector type
which works on the GPU if you use OpenCL? I don't know anything about OpenCL
I think I will need a per frame image for my UI
I'm going to have to do some math for these texture coordinates
oh ezpz
I remember someone saying that there's a severe limit on how many memory allocations you can do in vk and I checked my vulkan info
maxMemoryAllocationCount = 4294967295
it's the max u32 value (2^32 - 1)
so
maxMemoryAllocationCount = 4294967295
maxComputeSharedMemorySize = 49152
minMemoryMapAlignment = 64
VkPhysicalDeviceCopyMemoryIndirectPropertiesKHR:
VkPhysicalDeviceExternalMemoryHostPropertiesEXT:
maxTaskSharedMemorySize = 32768
maxTaskPayloadAndSharedMemorySize = 32768
maxMeshSharedMemorySize = 28672
maxMeshPayloadAndSharedMemorySize = 28672
maxMeshOutputMemorySize = 32768
maxMeshPayloadAndOutputMemorySize = 48128
maxMemoryAllocationSize = 0xffe00000
I'm going to just do one off memory allocations until something breaks or gets slow
I'm not cargo cult building a GPU memory allocator unless there's a reason for it that I run into
maybe it's something those poor souls who want to support android have to deal with
it wouldn't be too much work to add a basic gpu allocator though
I just want to kind of see something break or behave badly to see what happens
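For when a basic GPU allocator does become worth it: the Vulkan spec only guarantees `maxMemoryAllocationCount` >= 4096, so the 4294967295 above is specific to this driver. Here is a sketch of a linear (bump) suballocator over one large block. All names are hypothetical. The offsets it returns would go to `vkBindBufferMemory` / `vkBindImageMemory`. Mixing linear buffers and optimal-tiling images in one block also requires respecting `bufferImageGranularity`, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bump allocator over one big VkDeviceMemory block. */
typedef struct {
    uint64_t capacity;   /* size of the device memory block in bytes */
    uint64_t offset;     /* first free byte */
} gpu_linear_alloc_t;

/* `alignment` comes from VkMemoryRequirements and is always a power of two,
   so rounding up can use a mask. */
static bool gpu_linear_alloc(gpu_linear_alloc_t *a, uint64_t size, uint64_t alignment, uint64_t *out_offset) {
    uint64_t aligned = (a->offset + alignment - 1) & ~(alignment - 1);
    if (aligned > a->capacity || size > a->capacity - aligned) {
        return false;   /* out of space: caller allocates another block */
    }
    *out_offset = aligned;
    a->offset = aligned + size;
    return true;
}

/* Frees everything at once, e.g. for per-frame or per-level resources. */
static void gpu_linear_reset(gpu_linear_alloc_t *a) { a->offset = 0; }
```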
it's still a bit too much work to add another graphics pipeline with new shaders and draw commands; I need to make that more ergonomic
spent some time today on improving my neovim experience, it is going well
I improved the status line and have information about what struct or function I am in, turned on line numbers and cursor line, which I think really helps
fixed a bunch of lsp problems
have the UI on the gpu now
it's really expensive
it's like 2-3 ms clearing the buffer, rendering it and copying it to the gpu
idc
I'm going to work on the game now
I think I can move the UI to its own thread maybe
I'm not going to worry about it right now I have a lot of frame time to spare for now
I used this fullscreen triangle solution https://wallisc.github.io/rendering/2021/04/18/Fullscreen-Pass.html
This is my graphics blog where I'll post about graphics programming. Probably.
float2 uv = float2((id << 1) & 2, id & 2);
return float4(uv * float2(2, -2) + float2(-1, 1), 0, 1);
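Here is a C port of the two shader lines above, with hypothetical `f2`/`f4` types, to show which three vertices come out. Vertex ids 0, 1 and 2 give uv (0,0), (2,0), (0,2) and clip positions (-1,1), (3,1), (-1,-3). That is one oversized triangle covering the whole [-1,1] square, so no vertex buffer is needed, just a draw of 3 vertices with culling off. One hedged caveat: the blog's formula targets D3D. In Vulkan, clip-space +y points down with a default viewport, so if the sampled bitmap comes out upside down, `uv * 2 - 1` for the position (or a flipped viewport) is the usual fix.

```c
#include <stdint.h>

typedef struct { float x, y; } f2;
typedef struct { float x, y, z, w; } f4;

/* Vertex-ID fullscreen triangle: uv = ((id << 1) & 2, id & 2),
   position = uv * (2, -2) + (-1, 1). */
static f4 fullscreen_tri(uint32_t id, f2 *uv_out) {
    f2 uv = { (float)((id << 1) & 2u), (float)(id & 2u) };
    *uv_out = uv;
    return (f4){ uv.x * 2.0f - 1.0f, uv.y * -2.0f + 1.0f, 0.0f, 1.0f };
}
```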
so yesterday I built a 3D graphics pipeline for a scene, and today I got my UI to render on the GPU; each of those took a full day's worth of work, and I wish I had made some progress beyond them, but it is what it is. I need to reduce the amount of code it takes to make these kinds of changes
I think what I might do is start a ui raster job before the submit and wait for it after present, but again, I'm just not going to worry about it right now
I love these graphs the ai review bot creates
placeholder vehicle
going to add a super tiny placeholder track
once I have a little track I'll need shadows, I'm going to use the RT pipeline I set up for shadows, that may take a bit
I have an RT pipeline set up but it just does a triangle
I think I will also use the RT pipeline for the background scenery that isn't anything on the track
the track will be via mesh shader
once I have the track and shadows I'll start on the gameplay
with just this little test thing
I will need physics to accelerate the thing
also if you go off the track you fall into oblivion
I think the initial gameplay will just be beat your previous time for now
My friends and I were going to try and make a racing game for a game jam and I spent most of the time bikeshedding a tool for generating tracks 
Marching squares can be handy for turning MS paint drawings into meshes
I want to just do simple splines
it's going to be a 3D race track
there's going to be elevation
I don't make two dimensional games
Do you not like them or are you just not interested in making one?
not interested in making one
I love a lot of 2D games
I like doing 3D stuff is all
in terms of making something 2D is super boring for me
I want to be able to look around in a virtual world
I want to feel like I'm in the world I made
That makes sense, and I think I'm the same way
Well, I don't know if I would say "boring", but I guess 3D is more interesting to me to actually work on
I have dramatically reduced the amount of work it takes to create a new graphics pipeline, finally
internal VkResult init_road_pipeline(AppContext *actx, Arena *arena) {
if (!actx)
fatal("no actx");
if (!arena)
fatal("no arena");
FileData road_shader = p_read_file(actx->arena, road_shader_path);
FileData source_spirv[] = {
road_shader,
road_shader,
road_shader,
};
const char *function_names[] = {"road_task", "road_mesh", "road_frag"};
VkShaderStageFlagBits stages[] = {
VK_SHADER_STAGE_TASK_BIT_EXT,
VK_SHADER_STAGE_MESH_BIT_EXT,
VK_SHADER_STAGE_FRAGMENT_BIT,
};
constexpr i32 num_shaders = 3;
VkPipelineShaderStageCreateInfo create_infos[num_shaders];
p_shader_cfg_t shader_cfg = {
.source_spirv = source_spirv,
.function_names = function_names,
.shader_stages = stages,
.create_infos = create_infos,
.num_shaders = num_shaders,
};
p_create_shaders(arena, &shader_cfg);
p_pipeline_layout_cfg_t pipline_layout_cfg = {0};
pipline_layout_cfg.push_constant_size = sizeof(road_pc_t);
pipline_layout_cfg.debug_name = "rosy v3 road_pipeline_layout";
VK_CHECK(p_create_pipeline_layout(actx, arena, &pipline_layout_cfg));
actx->gamectx->vk_road_pipeline_layout = pipline_layout_cfg.pipeline_layout;
p_pipeline_cfg_t pipeline_cfg = {0};
pipeline_cfg.shader_cfg = &shader_cfg;
pipeline_cfg.debug_name = "rosy v3 road_pipeline";
pipeline_cfg.pipline_layout = actx->gamectx->vk_road_pipeline_layout;
pipeline_cfg.pipeline = &actx->gamectx->vk_road_pipeline;
pipeline_cfg.enable_depth = true;
VK_CHECK(create_graphics_pipeline(actx, arena, pipeline_cfg));
return VK_SUCCESS;
}
I don't have materials yet, obviously
it's all very basic still
stuff is coming together, deleting the old stubbed out demo graphics and mesh shader pipelines, things are starting to feel focused
and purposeful
I'll start on the RT shadows tomorrow, that will probably be a bunch of work for a few days
you can't tell, but the vehicle is floating, it's not going to be a car game
shadows will help
I should add a scrollbar to my ui at some point
doing some research, people often use a g-buffer to generate ray queries to use for ray traced shadows
and a g-buffer is deferred rendering, which is not what I have right now
I really want shadows
I'm just going to add a simple single shadow map for now
I'll briefly look at doing a ray query
I don't know anything about those
oh I think that works
just load the meshes into the AS, and then see if anything at all obstructs the light direction, and if so the light is occluded
don't need to set up the RT pipeline for it or have an SBT
and it will only do it if the fragment gets any light already
and I only need to do it for the road right now, since there's nothing that could block light from hitting the vehicle
ezpz maybe
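Here is a CPU sketch of the occlusion query described above. It casts a ray from the shaded point toward the light and returns as soon as anything blocks it, since any hit at all means shadow. Möller–Trumbore is used for the ray/triangle test. The vector helpers and names are hypothetical. On the GPU, the acceleration structure replaces the linear loop, and with GLSL ray queries the `gl_RayFlagsTerminateOnFirstHitEXT` flag plays the role of the early return.

```c
#include <stdbool.h>

typedef struct { float x, y, z; } v3;

static v3 v3_sub(v3 a, v3 b) { return (v3){a.x - b.x, a.y - b.y, a.z - b.z}; }
static v3 v3_cross(v3 a, v3 b) { return (v3){a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }
static float v3_dot(v3 a, v3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Möller–Trumbore: does o + t * d hit triangle (a, b, c) with t_min < t < t_max? */
static bool ray_hits_triangle(v3 o, v3 d, v3 a, v3 b, v3 c, float t_min, float t_max) {
    const float eps = 1e-7f;
    v3 e1 = v3_sub(b, a), e2 = v3_sub(c, a);
    v3 pvec = v3_cross(d, e2);
    float det = v3_dot(e1, pvec);
    if (det > -eps && det < eps) return false;   /* ray parallel to the triangle */
    float inv_det = 1.0f / det;
    v3 tvec = v3_sub(o, a);
    float u = v3_dot(tvec, pvec) * inv_det;
    if (u < 0.0f || u > 1.0f) return false;
    v3 qvec = v3_cross(tvec, e1);
    float v = v3_dot(d, qvec) * inv_det;
    if (v < 0.0f || u + v > 1.0f) return false;
    float t = v3_dot(e2, qvec) * inv_det;
    return t > t_min && t < t_max;
}

/* Shadow test: stop at the first blocker. t_min > 0 avoids self-intersection. */
static bool occluded(v3 p, v3 to_light, const v3 *tris, int n_tris, float max_dist) {
    for (int i = 0; i < n_tris; i++) {
        if (ray_hits_triangle(p, to_light, tris[3 * i], tris[3 * i + 1], tris[3 * i + 2], 1e-3f, max_dist))
            return true;
    }
    return false;
}
```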
I already have BLAS and TLAS setup code
Yep, and then if it's not occluded, inverse square law has you covered for the light's intensity
I have it set up in my dx12 renderer where I output an intensity from my shadow pass from 0.0 to whatever, 0 if in shadow or calculate intensity and output that
Then I just put that into pbr calculations or whatever
oh does dx12 allow ray queries from pixel shaders?
that's cool thank you
I imagine you don't use light intensity for sunlight though?
that's cool I can use additional ray queries for other light sources hrm nice
Not sure
I have a traditional rt pipeline setup
are you using just RT or also a graphics pipeline?
Just RT
nice
Doing a rewrite in vulkan with a raster gbuffer and rt shadows tho
I'm going to use the RT pipeline for everything not the road and vehicles and things that are not part of the race
It's the same code path, but returns either a flat 1.0/sunIntensity or 0.0
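The shadow-pass output described above boils down to something like this sketch. The names are hypothetical, and the near-zero clamp is an added assumption. The result is 0 when the shadow ray is blocked, a flat value for a directional light like the sun, and intensity / d² for a point light.

```c
#include <stdbool.h>

typedef enum { LIGHT_DIRECTIONAL, LIGHT_POINT } light_kind_t;

/* Light arriving at a shaded point: 0 if the shadow ray was blocked, a flat
   intensity for a directional light, and intensity / d^2 for a point light. */
static float light_intensity(light_kind_t kind, float intensity, float dist, bool occluded) {
    if (occluded) return 0.0f;
    if (kind == LIGHT_DIRECTIONAL) return intensity;   /* no distance falloff */
    float d2 = dist * dist;
    if (d2 < 1e-4f) d2 = 1e-4f;   /* avoid dividing by ~0 right at the light */
    return intensity / d2;
}
```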
ah yeah maybe I end up doing this too eventually, but just to get a game going I will try and start with ray queries in the fragment shader, I don't have a lot of stuff so should be cheap I hope
I also won't have any skinned meshes in this game
the other thing that's nice about ray queries is I could use MSAA, whereas I'm not sure I could with deferred rendering
I'm not worried about AA right now
just looked at the spec, it does actually
inline RT is a thing, i haven't touched it tho
raytracing is never cheap lol
is this still cpu rasterized or are you doing sw raster on gpu?
why not just do normal gpu raster
because I already wrote it for the CPU I guess
I don't want to redo it, it's fine, it's super fast
except for the upload the image thing
where's the fun in that
I'm going to eventually upload the UI image in a separate transfer queue via a thread or something
it's just a debug ui
it's not the game UI
I'm sorry, I misspoke: this is just normal graphics pipeline stuff, it is not software rasterised on the GPU, it's just the normal GPU pipeline
I don't know why I said that
oh
will you be using the cpu raster stuff for anything?
if it makes sense to, maybe for UI editor widgets or something
maybe all my debug lines are just software rasterized tbh
I don't know
it's just a tool in the toolbox I have
I cleaned up my BLAS and TLAS creation code so I can use my mesh types to create arbitrary tlas/blas now, with support for instances, and it also sets up the descriptors too and adds them to the big bindless set. Should just be mostly shader work now to do the ray query and add a shadow. I also added the ray query extension and features enabling.
one of the issues with using RT shadows is my track is created in the mesh shader and has no vertices to stick in an AS
I may bake its self shadowing
I think so
hrm
I'm not going to worry about it right now
yeah
well the track is going to get procedurally generated from a long spline
it'll have a spline parallel to the track and then an array of curves perpendicular to the track as well
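As one way to get the track frame from a center spline, here is a sketch. Catmull-Rom is a swapped-in choice (it passes through its control points), and all names are hypothetical. It evaluates the position and tangent between p[1] and p[2], and the sideways direction as cross(tangent, up). That direction needs normalizing before offsetting by half the track width, and elevation simply comes from the control points' y.

```c
typedef struct { float x, y, z; } v3;

/* Uniform Catmull-Rom basis weights for p[0..3] at t in [0, 1]. */
static void cr_weights(float t, float w[4]) {
    float t2 = t * t, t3 = t2 * t;
    w[0] = -0.5f * t3 + t2 - 0.5f * t;
    w[1] = 1.5f * t3 - 2.5f * t2 + 1.0f;
    w[2] = -1.5f * t3 + 2.0f * t2 + 0.5f * t;
    w[3] = 0.5f * t3 - 0.5f * t2;
}

/* Derivatives of the weights above with respect to t. */
static void cr_dweights(float t, float w[4]) {
    float t2 = t * t;
    w[0] = -1.5f * t2 + 2.0f * t - 0.5f;
    w[1] = 4.5f * t2 - 5.0f * t;
    w[2] = -4.5f * t2 + 4.0f * t + 0.5f;
    w[3] = 1.5f * t2 - t;
}

static v3 combine(const v3 p[4], const float w[4]) {
    v3 r = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; i++) {
        r.x += w[i] * p[i].x;
        r.y += w[i] * p[i].y;
        r.z += w[i] * p[i].z;
    }
    return r;
}

static v3 cross3(v3 a, v3 b) { return (v3){a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }

/* Point on the center spline, its tangent, and the (unnormalized) direction across the track. */
static void track_frame(const v3 p[4], float t, v3 up, v3 *pos, v3 *tangent, v3 *side) {
    float w[4], dw[4];
    cr_weights(t, w);
    cr_dweights(t, dw);
    *pos = combine(p, w);
    *tangent = combine(p, dw);
    *side = cross3(*tangent, up);
}
```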
I am going for this https://www.youtube.com/live/MwnmN9yrCPI?si=dO2oJhXhb6GlnKNq&t=90
that would contribute more to realizing the game than AO tbh
yeah
I'm not going to worry about that now
I like this too https://x.com/Jakob_Wahlberg/status/1974118026263101932
I'll have a track
can't tell if this character is a baby or a bald guy
I wonder how the suction works
I'm not sure, I'm going more in the killer loop single track thing since that'll be easier
maybe he's using sdfs to find the nearest surface
I just saw that video today
or just shooting rays in a sphere
I'm basically making a killer loop clone
idk feels doable
I should be able to get something interesting with very little graphics work I hope, and then I can work on the graphics and do all the graphics learning I want
and then I'm doing that on a game instead of sponza
So I am going to read the physics in a weekend document and see if it will work for me. If not I may just use Jolt but I would prefer to have my own physics code
I don't need complex physics for this game
I would like it to feel fast and to handle well and for the drops to feel like falling
I am going to buy an xbox controller
I want rumbling support
I am also going to add audio
you know I'm fully just using win32, I should just be using DX12 instead of Vulkan tbh
it's the only cross platform code I have, the vulkan stuff
I'm happy with vulkan though, although I wish I could use PIX
looked through the physics in a weekend books, seems ok, I'm going to refresh my quaternion math and write some orientation code and start adding basic physics
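As a refresher for the orientation code, here is a sketch of the quaternion basics. It covers the Hamilton product, rotating a vector with q v q*, and integrating a world-space angular velocity. All names are hypothetical. The renormalization uses the approximation 1/sqrt(n) ≈ (3 - n) / 2, which only holds while |q|² stays near 1, as it does when the integration runs every small step. A full `sqrtf` normalize is the safer default.

```c
typedef struct { float w, x, y, z; } quat;
typedef struct { float x, y, z; } v3;

/* Hamilton product a * b (as rotations: apply b first, then a). */
static quat quat_mul(quat a, quat b) {
    return (quat){
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    };
}

static quat quat_conj(quat q) { return (quat){q.w, -q.x, -q.y, -q.z}; }

/* Rotate v by unit quaternion q: v' = q * (0, v) * conj(q). */
static v3 quat_rotate(quat q, v3 v) {
    quat p = {0.0f, v.x, v.y, v.z};
    quat r = quat_mul(quat_mul(q, p), quat_conj(q));
    return (v3){r.x, r.y, r.z};
}

/* q += 0.5 * dt * (0, omega) * q for world-space omega (rad/s), then renormalize. */
static quat quat_integrate(quat q, v3 omega, float dt) {
    quat o = {0.0f, omega.x, omega.y, omega.z};
    quat d = quat_mul(o, q);
    float h = 0.5f * dt;
    q.w += h * d.w; q.x += h * d.x; q.y += h * d.y; q.z += h * d.z;
    float n = q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
    float s = 0.5f * (3.0f - n);   /* ~ 1 / sqrt(n) when n is close to 1 */
    return (quat){q.w * s, q.x * s, q.y * s, q.z * s};
}
```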
i have custom behavior but I still use physx for handling the actual collisions
yeah I am not opposed to just hooking up jolt, want to see how far I can get on my own
my vehicle has to hover, so I think I will start with that
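One simple way to start the hover is a spring-damper toward a ride height, integrated with semi-implicit Euler. Here is a sketch along a single vertical axis. The names and constants are made up, and a fuller version would raycast along the track normal. While the track is within `max_range`, it pushes the vehicle toward `ride_height`. Beyond that range only gravity acts, which also gives the fall-off-the-edge behavior for free.

```c
typedef struct {
    float height;     /* height above the track surface (m) */
    float velocity;   /* vertical velocity (m/s), positive is up */
} hover_state_t;

/* One semi-implicit Euler step of a spring-damper hover. */
static void hover_step(hover_state_t *s, float ride_height, float max_range,
                       float stiffness, float damping, float mass, float dt) {
    const float gravity = -9.81f;
    float accel = gravity;
    if (s->height < max_range) {
        float force = stiffness * (ride_height - s->height) - damping * s->velocity;
        accel += force / mass;
    }
    s->velocity += accel * dt;       /* update velocity first... */
    s->height += s->velocity * dt;   /* ...then position with the new velocity */
}
```

At rest the spring balances gravity, so the vehicle settles slightly below `ride_height`, at ride_height - mass * 9.81 / stiffness.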
the sequel to F-Zero GX we never got
F-Zero, that's the game I have been trying to remember
thank you
I spent the past couple weeks playing every F-Zero game, and now I have tons of thoughts on this amazing series that I want to share. I'm so glad F-Zero 99 released and pushed me to go back and play these games, as they're all fantastic. (And yes, I know I didn't technically play EVERY F-Zero game, but I got all the big ones)
Twitter: https://...
I should get a nintendo switch or whatever
I used to have one of the elite ones but it kinda sucked, as the buttons would fall out :/
they're still charging $50 for that? i think that's what I paid for it 10 years ago
yes the eggs to xbox controller price ratio is collapsing
soon a dozen eggs will cost one xbox controller
so my goal is to get my vehicle to hover, move forward, turn and fall off the edge of the track
that's it
I have no idea how long this will take me
My xbox 360 controllers are still going strong
yeah they are great, I should just have gotten a plain xbox controller