#Kakadu
Shaded wireframe and texture coordinates modes
Only remaining one is the "shading normals" mode, then I'm done
Thicc wireframe
I'm back again
After 3 weeks!
I've been focusing on the wireframe method because I wanted to blog it
But it had a major problem as Cirdan put it
Those are some thicc lines
And more importantly those are some inconsistently thick lines
So I wanted to make it so that I could specify thickness in screen-space pixels and every triangle, no matter the perspective/depth would have uniform line thickness
My current method worked by assigning barycentric coordinates <1,0,0>, <0,1,0> & <0,0,1> to the vertices of a triangle respectively, in a geometry shader, and letting the rasterizer do its magic so I could use the interpolated barycentric coordinates in the frag. shader to check whether any of them are below a certain threshold; i.e., we are near an edge.
But doing this causes the thickness to depend on the triangle size (specifically the distances between vertices and the opposing edges, since that is essentially what a barycentric coordinate encodes, in normalized form).
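A minimal sketch of why a fixed barycentric threshold gives size-dependent lines. It assumes the vertices are already 2D screen-space positions in pixels; `Vec2`, `altitude` and `line_width_px` are names made up for this illustration, not anything from Kakadu.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Distance from vertex p to the line through edge (a, b), i.e. the triangle's
// altitude from p. The barycentric coordinate of p is 1 at p and 0 on (a, b).
float altitude(Vec2 p, Vec2 a, Vec2 b) {
    float ex = b.x - a.x, ey = b.y - a.y;
    float len = std::sqrt(ex * ex + ey * ey);
    assert(len > 0.0f); // degenerate edges are not handled in this sketch
    // |cross(b - a, p - a)| / |b - a|
    return std::fabs(ex * (p.y - a.y) - ey * (p.x - a.x)) / len;
}

// With the test "barycentric < threshold", the band drawn along edge (a, b)
// is threshold * altitude pixels wide, so it scales with the triangle.
float line_width_px(float threshold, Vec2 p, Vec2 a, Vec2 b) {
    return threshold * altitude(p, a, b);
}
```

With a threshold of 0.05, a triangle whose altitude is 100 px gets a 5 px band, while one with a 400 px altitude gets a 20 px band from the same shader.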
So I said OK, let me calculate these vertex-to-edge distances in clip space, convert them to screen-space and use that in frag. shader. That way I could work directly with pixel-unit distance values.
Of course there are major problems with this strategy, which I didn't realize at the time, due to my lack of understanding in some important areas.
And implementing this, I saw there were problems visually: Line thickness for a triangle changed when the camera rotated. This happened for triangles where the depth varies from vertex to vertex. Triangles completely parallel to the near plane did not suffer from this problem for example.
Turns out this was the perfect challenge to deep-dive into rasterization, projection and pers. correct depth/attribute interpolation.
I mostly did reading/study and did very little coding during these 3 weeks
Scratchapixel has nice, detailed, but not too mathy (or rather, too rigorously mathy) explanations on these topics
So I started from scratch (pun intended) there, i.e., from the beginning.
Up until the end of the rasterization chapter
Also re-read Fabian Giesen's "A trip through the graphics pipeline", specifically the parts on rasterization and attribute interpolation
After all of these and some back and forth with ChatGPT (which was surprisingly insightful and didactic)
I realized 2 major problems
- Doing geometric operations in clip space does not make sense semantically: It is not a Euclidean space.
I was calculating the vertex-to-opposing-edge in clip space and then converting to screen-space.
It became clear that I had to invert this: First convert from clip-space vertex positions to screen-space, then do all those geometric operations.
But this in itself was not enough to solve the varying thickness problem
Because
- The default perspective correct interpolation (or hyperbolic or rational-linear interpolation) was distorting the edge distances I calculated in the geometry shader.
While this default behaviour of perspective correct interpolation is necessary for depth and attribute values, it is counter-productive in this case because the edge distances were already projected and transformed into screen-space.
There (in screen-space), simply linearly interpolating across the triangle's pixels is actually needed to produce the correct values.
In a way, we are accounting for the perspective twice if we both project and interpolate hyperbolically
At least this is how I understood it.
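To make the "accounting for perspective twice" point concrete, here is a rough sketch comparing the two interpolation modes for a single value along an edge. It assumes the value is an edge distance already measured in screen-space pixels, and that `t` is the screen-space fraction between the two vertices; the function names are made up for the example.

```cpp
#include <cassert>

// Plain linear interpolation in screen space: what GLSL's `noperspective`
// qualifier requests. t is the screen-space fraction between the vertices.
float lerp_screen(float a0, float a1, float t) {
    return (1.0f - t) * a0 + t * a1;
}

// Default perspective-correct (rational-linear) interpolation: interpolate
// a/w and 1/w linearly in screen space, then divide.
float lerp_perspective(float a0, float w0, float a1, float w1, float t) {
    assert(w0 > 0.0f && w1 > 0.0f); // both vertices in front of the camera
    float num = (1.0f - t) * a0 / w0 + t * a1 / w1;
    float den = (1.0f - t) / w0 + t / w1;
    return num / den;
}
```

For a distance of 100 px at a vertex with w = 1 and 0 px at a vertex with w = 4, the screen-space midpoint should read 50 px, but the perspective-correct path returns 80 px. When both w values are equal the two paths agree, which matches the observation that triangles parallel to the near plane looked fine.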
So disabling that via noperspective interpolation qualifier in shader code
I got ... something.
Nice wireframes, all equal thickness
But I conveniently disabled the ground, left, and right walls
Because there is a MAJOR problem
And I'm not sure why it happens but I strongly believe it has to do with clipping
As long as the triangle remains in viewport, the lines are of consistent thickness
But if it straddles the near plane, this artifact happens
welcome back
Part 1 of the blog post is live!
This is my first actual blog post! I had planned to write other blog posts first, but I decided to write about a recent subject I worked on instead. So, wireframe rendering it is!
Anyway, the reason I got into this topic in the first place is because I wanted to add “Editor Shading Modes” as I call it, similar to Unity’s, to my study rende...
Hey as far as I’m aware you can use fwidth to get a consistent edge width 🙂 I’m sure that’s what I did in my unity thing when doing wireframe debug
You're right of course
Part II of the blog will cover the stuff I explained above
Part III will be utilizing fwidth and calling it a day
The only "downside" to the fwidth approach as far as I can tell is you are not exactly working in screen-space values so you can't say "every line should be 4 pixels wide". But it is insignificant of course, just set a value (of whatever unit it is) you like and you're done
Of course, very good:)
Btw avoid geometry shaders, pipe it through tessellation stage instead, or just use a compute to make the barycentric coordinates, geometry shaders suck
fwidth is some derivation of ddx/ddy I think?
fwidth is abs(ddx(...)) + abs(ddy(...)) I think
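A CPU-side sketch of that definition, assuming derivatives are estimated by finite differences over a 2x2 pixel quad (which is roughly how GPUs do it). `Quad`, `fwidth_at_00` and `edge_factor` are hypothetical names, and the smoothstep falloff is one common choice rather than the only one.

```cpp
#include <cassert>
#include <cmath>

// Values of one barycentric coordinate on a 2x2 pixel quad.
struct Quad { float v00, v10, v01, v11; }; // vXY: x offset X, y offset Y

// fwidth(v) = |dv/dx| + |dv/dy|, estimated with finite differences
// taken from the top-left pixel of the quad.
float fwidth_at_00(const Quad& q) {
    float ddx = q.v10 - q.v00;
    float ddy = q.v01 - q.v00;
    return std::fabs(ddx) + std::fabs(ddy);
}

// Edge coverage for a line of roughly `width` in units of fwidth:
// 1 on the edge (v == 0), fading to 0 once v exceeds width * fwidth.
float edge_factor(float v, float fw, float width) {
    assert(fw > 0.0f && width > 0.0f); // a constant v has no edge to find
    float t = v / (fw * width);
    t = std::fmin(std::fmax(t, 0.0f), 1.0f);  // clamp to [0, 1]
    float s = t * t * (3.0f - 2.0f * t);       // smoothstep
    return 1.0f - s;
}
```

Because the threshold is scaled by how fast the coordinate changes per pixel, the band stays about the same width on screen regardless of triangle size.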
Yeah I wasn't happy with using geometry shaders in the first place. I know they are not that performant. I thought about switching to tessellation, once I get to the topic of tessellation of course 🙂 Didn't think about compute though (and I also didn't get to compute shaders yet, at least on OpenGL).
Thanks for solid advice by the way, much appreciated 🙏
yeah, I didn't realise btw that you were doing tutorials, it's good though 🙂 also not speaking from some vast knowledge or trying to be rude, I think you understood what I meant anyway but I have anxiety so yeah 😄
I didn't think you were rude, not even for a second. Relax friend 
btw, what do you think of my idea... if I have 50 spot lights all drawing shadows from 50 meshes.. that's 2500 draw calls for all the shadows...
Instead, I wanna collect which lights affect each object, build an instance buffer, and draw the 50 shadowmaps with a draw instanced call with 50 instances, each instance has its own shadow viewproj and tile offset/scale variables, then the only additional "cost" is pixel shader discard and vertex shader tile transform... but the cpu side will be blessed by only 50 instance draw calls 😄
I'm sure it'll improve performance considerably as I'm currently cpu bound on my mac at least
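A rough sketch of what the per-instance tile offset/scale and the pixel shader discard could look like, written as plain C++ rather than shader code. It assumes a square atlas of `tiles_per_side` x `tiles_per_side` equally sized tiles with tile (0, 0) at the bottom-left; all names here are hypothetical and the actual layout may well differ.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Scale/offset that squeezes full-range NDC xy ([-1, 1]) into one tile of a
// tiles_per_side x tiles_per_side shadow atlas.
struct TileTransform { float scale; Vec2 offset; };

TileTransform tile_transform(int tile_index, int tiles_per_side) {
    assert(tiles_per_side > 0);
    assert(tile_index >= 0 && tile_index < tiles_per_side * tiles_per_side);
    int tx = tile_index % tiles_per_side;
    int ty = tile_index / tiles_per_side;
    float scale = 1.0f / tiles_per_side;
    // Centre of the tile in atlas NDC.
    Vec2 offset = { -1.0f + scale * (2.0f * tx + 1.0f),
                    -1.0f + scale * (2.0f * ty + 1.0f) };
    return TileTransform{ scale, offset };
}

// Vertex-shader side: applied after the light's viewproj and the divide by w
// (a real shader would apply it to clip xy and multiply the offset by w).
Vec2 to_atlas_ndc(Vec2 light_ndc, const TileTransform& t) {
    return Vec2{ light_ndc.x * t.scale + t.offset.x,
                 light_ndc.y * t.scale + t.offset.y };
}

// Pixel-shader side: geometry outside the light's own [-1, 1] range would
// otherwise spill into a neighbouring tile, so those fragments are discarded.
bool should_discard(Vec2 light_ndc) {
    return light_ndc.x < -1.0f || light_ndc.x > 1.0f ||
           light_ndc.y < -1.0f || light_ndc.y > 1.0f;
}
```

With 50 lights, an 8 x 8 atlas (64 tiles) would be enough; light 0 then maps its whole view into the NDC range [-1, -0.75] on both axes.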
plus opengl on mac is god awful
🥺
honestly I admire the graphics community, everyone has their place here, I'm just doing opengl and c# so not in the major leagues heh but I've learned a lot so far and continue to learn 😄
You are instancing the meshes right?
Or the light data (viewproj etc.)?
Can you make it 50 spot lights and 60 meshes so I can understand easier 
I think I understand what you're proposing: Instance the light data, not the meshes, by finding out which light affects which object and building an instance buffer based on that. I'm not familiar with tiled shadow maps but a quick research shows that it is basically spatially partitioning the shadow map. So by vertex shader workload you mean to find out which tile of the shadow map (of a light instance) a mesh belongs to I assume.
I didn't understand what you are discarding in the pixel shader though.
Overall, sounds solid to me though
Well instancing the meshes with each instance using its own viewproj
Btw this is your thread right? @astral hawk
Yes
Do you mind if I post what I’m doing here ?
Sure
Shitty but it works, rough reflections using brute force importance sampling (because I don’t have time to perfect a mip gen for pbr)
I’ll show rough stuff
Or at least 0.3 rough to give you an idea of what it should be like heh
I still think my metalness is inverse but I’m not sure
The floor doesn’t have a metallic texture so it won’t be affected by me potentially inverting the metallic
I’ll probs redo the pbr anyway
Need to send the reflection as a hdr texture too
It looks awesome, though it's way over my head on the reflection front. But the AO looks nice I think
It’s sdr unsigned byte atm
Yeah I’m doing a massive hack
Getting a reflection vector and using atan and asin to convert to uv coords 😅
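For reference, a minimal sketch of that kind of mapping from a direction to lat-long (equirectangular) UVs. The axis convention is an assumption (+Y up, u wrapping around the Y axis, v = 1 at the top); the real shader may use a different one, and `direction_to_latlong_uv` is a made-up name.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

// Maps a normalized direction to equirectangular (lat-long) UVs in [0, 1].
Vec2 direction_to_latlong_uv(Vec3 d) {
    assert(d.y >= -1.0f && d.y <= 1.0f); // asin needs a normalized direction
    const float pi = 3.14159265358979f;
    float u = std::atan2(d.z, d.x) / (2.0f * pi) + 0.5f;
    float v = std::asin(d.y) / pi + 0.5f;
    return Vec2{ u, v };
}
```

Under this convention the +X direction lands in the middle of the image (u = 0.5, v = 0.5) and straight up lands on the top row (v = 1).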
atan in a shader 
You're working with Unity's SRP right?
How is it? Acerola started that way too if I'm not mistaken
nope
this is using opentk / gl
🙂
and yes 😄 😄 slow as fuck I know hah
might use that atan2 approximation from shadertoy
🙂
Oh yeah you mentioned gl in mac I forgot sorry
I looked at your blog and saw that you tinkered with Unity's SRP
Nice stuff!
Yeah hehe I made my own pipeline
Which even performs much better than my opengl uni shit
Main fps killer is my shadowmap generation passes
So that's why you are trying to optimize it, got it 👍 Your approach sounded good to me, have you tried it yet?
That looks gorgeous. Is there image based lighting in there?
Also, do you have a thread here in this server?
Not just yet tbh I don’t wanna uproot my code yet, also not sure if I should send an array of indices to the shader as vert arrays with a ssbo of all light viewprojs in view or the matrices themselves
I don’t yet
Also yes there’s reflections from the background hdri
Idk if they’re correct as I need to validate stuff before I move forward
It’s just my uni graphics module coursework so I don’t think I should make a thread heh
🙂
I’m gonna try slap a latlong to cubemap thing out and do proper mipmap generation 😭
Because even mipmap roughness shit looks better and runs far better than importance sampling ha
I’ll take a comparison pic when I get to uni lol
Hope it doesn’t take longer than 10 mins to get to the bus stop I need to be off at cos I’ve got a presentation at 4:45 😭😭😭
What I’ve done to my pbr should be illegal 🤣🤣🤣
However when I’ve got the metals properly integrated (need to have reflections in a separate pass, the lighting needs to be diffuse and spec and additively composited, somehow without destroying the fragility of my ropey pbr, and then reflections need to correctly composite in to the specular) 😭😭😭😭😭
I am sooooo behind schedule
Still not off the bus someone best let the bus through now
I did some debugging and the "moment" this bug starts to happen is when a vertex' w coordinate becomes <= 0
I know graphics shit well enough, I’m assuming that when it’s w <0 it’s behind the near plane then? Makes sense heh…
Wait is this still geometry shader pipe?
Maybe it’s legit being clipped?
Or the geometry shader can’t actually see that vert when it’s behind the near plane?
It sure is, I'm trying to wrap my head around why it is a problem
First image is showing a wall that is perfectly perpendicular to the near plane (i.e., a quad rotated 90 degrees around Y axis).
The top left vertex is at world point <-15,+15,-30>
This is when the camera is at Z = -31.0 (my engine is left handed: x right y up z forward). The wireframe works because all w values are >= 0
Second image is showing the same scene, but now the camera moved to Z = -29.99. The bug is present, because now the top left vertex' w is -0.01.
The second image is telling us that the vertex-to-opposing edge distances (in screen-space pixels) I calculated in the geometry shader are all zero or extremely close to zero.
I was thinking "Ok there is clipping, so what? The vertex-to-edge distances I calculated in the geometry shader are interpolated for the newly created clipped vertices. So why are they all zero?"
And then I did some plotting.
For the second image, the top left vertex has w = -0.01, which maps to NDC of X = 1500 and Y = 1500
That's absurdly large
No wonder why this happens
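The arithmetic behind that number, as a tiny sketch. It assumes a clip-space x of -15 (i.e. a projection scale of 1 on x, which is a simplification) and a 1000 px wide viewport purely for illustration; the function names are made up.

```cpp
#include <cassert>

// Perspective divide for one clip-space coordinate.
float clip_to_ndc(float clip, float w) {
    assert(w != 0.0f); // the divide is undefined exactly at w == 0
    return clip / w;
}

// Viewport transform for x: NDC [-1, 1] -> pixels [0, viewport_width].
float ndc_to_pixel_x(float ndc_x, float viewport_width) {
    return (ndc_x * 0.5f + 0.5f) * viewport_width;
}
```

Dividing -15 by w = -0.01 gives an NDC x of 1500, which the viewport transform turns into roughly 750,500 px. Any distance measured against a vertex that far away dwarfs the on-screen part of the triangle.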
I mean I knew that perspective distortion is a thing
But to see it in action, to this degree, is wild to say the least
And this, will conclude the Part II of the wireframe rendering blog
In Part III I will say "just use fwidth" and call it a day
To put things into perspective (pun not intended), for image 2 (buggy version), these are the vertex-to-opposing-edge-midpoint distances.
And now it's time for me to go to bed
when I see markdown with that color scheme I instantly just think its GPT...
That's because it is
And I verified the calculations on pen and paper
Up until the plots
I didn't actually verify the edge distances but from a quick glance they seemed correct
If I do decide to include these in the blog post, I'll verify them
Come on, if we can't even use it for mundane calculations, what's the point of it existing 
It's a glorified wolfram alpha
then what about using wolfram alpha
There’s nothing wrong with using ChatGPT tbh
Fair point. But then again, as long as I know my way around the domain and can verify/dispute the results, I’d argue there is no harm in utilizing chatgpt.
There are cases when I can’t verify (this was not one of those), for which I abandon chatgpt and do actual research, study actual resources.
yeah, legit 🙂
physically-somewhat-based
128 importance sample iters, with 128 per pixel atan2 calcs to get reflection vector -> uv coord
:V :V :V
I'm back I guess
Or rather I will be (I hope)
I've been going through a burn out for 2 months now
In the last post, we implemented a triangle-based wireframe solution using barycentric coordinates to shade the edges of each triangle. We also brought back anti-aliasing, which we previously got “for free” with the default glPolygonMode( GL_FRONT_AND_BACK, GL_LINE ) API—assuming anti-aliasing was enabled to begin with.
However, a glaring ...
Onto part 3 and finally closing this chapter.
Damn its been 6 months huh
I am back again
Finalizing cod aw style bloom as we speak but I have some questions regarding the upsampling "radius" and the karis avg.
During the upsampling passes, both the source texture (mip i) and the render target (mip i + 1) are known, so why does everyone use a uniform scaling "radius" to scale delta uvs with when performing the tent (or whatever) filter in the upsampling shader for bloom?
Why not simply set it to one texel of the input texture (the one being filtered, mip i in this scenario), i.e., 1 over mip i's size along each axis?
I get "artistic freedom" and have nothing against it, but isn't the "exact fit" I described the sensible default?
@fluid parrot Would love to pick your brain on this one
Welcome back !! 🎉
I just recently implemented this bloom
not sure what you mean by a uniform scaling radius in the tent filter
I just did exactly what the slides said to do
you mean the 1/16?
because that's what the slides said to do is why I did it
if your technique works better you should do it, I'm pretty sure they derived this empirically by testing different techniques
it says there "the filter do not map to pixels, has holes in it"
Hi Bjorn how are you?
Yeah I followed these slides (and Froyok's blog as well).
The third bullet point mentions the radius
Unless I am missing something, I'd say that "radius" is already known analytically
for upscaling mip 0 to mip 1, it is 1 over mip 0's resolution or explicitly, it is
delta_u = 1.0 / mip_0_width;
delta_v = 1.0 / mip_0_height;
Why not simply do this and call it a day instead of using a shader uniform for this radius parameter as it is referred to in this particular slide?
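A small sketch of the "exact fit" idea: derive the tap spacing from the source texture's resolution and build the nine 3x3 tent taps with the 1/16-normalized weights. It only computes tap positions and weights, not the texture fetches; `texel_size`, `Tap` and `tent_taps` are hypothetical names.

```cpp
#include <array>
#include <cassert>

struct Vec2 { float x, y; };

// One texel of the texture being sampled, in UV units. This is the
// "exact fit" radius: 1 / resolution of the source mip.
Vec2 texel_size(int src_width, int src_height) {
    assert(src_width > 0 && src_height > 0);
    return Vec2{ 1.0f / src_width, 1.0f / src_height };
}

struct Tap { Vec2 uv; float weight; };

// 3x3 tent filter taps around `uv`: weights (1 2 1; 2 4 2; 1 2 1) / 16.
std::array<Tap, 9> tent_taps(Vec2 uv, Vec2 delta) {
    std::array<Tap, 9> taps{};
    int i = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            float wx = (dx == 0) ? 2.0f : 1.0f;
            float wy = (dy == 0) ? 2.0f : 1.0f;
            taps[i++] = Tap{ Vec2{ uv.x + dx * delta.x, uv.y + dy * delta.y },
                             wx * wy / 16.0f };
        }
    }
    return taps;
}
```

Multiplying `delta` by a user-controlled radius is what turns this into the artistic knob; leaving it at one texel keeps the taps on neighbouring texels of the source mip.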
no idea, the bloom looks good though
I guess you're not like going to get arrested by the call of duty police if you do it some other way
Lol of course I just wanted to know
Bonus: If you do increase it beyond a certain point, you get this cool fake-bokeh-like effect
Do you have a repo I can look at for your bloom impl.?
I am having more of a problem with the Karis avg. anyway
no mine isn't public, but it's very similar to the logl one
I followed the slides only to do my implementation, and then someone later showed me that, and mine is pretty much the same
except mine is in C++ not some silly shader language :P
is what mine looks like
Looks beautiful
Yeah this one dropped the ball on karis avg. for sure
I was gonna reply but yours looked ok to me
logl one double accounted for dividing by 4 for example
I would appreciate it if you could show it to me or dm me if it is proprietary etc. I just want to look at someone else's code to sanity check mine
In the slides, there is a distinction between the full karis avg. vs. the partial karis avg.
They went with the partial
I'll explain what the hell I am blabbering about shortly
oh I just thought I had spammed your channel
I can put it back
suddenly a wall of code
You could never!
My channel is your channel please do spam it always
Here is mine
I don't touch the tent filter to adjust the bloom radius, instead I play with the blending intensity/weight between each upsample pass
Also Karis average is only for downsampling during the first pass
If you do on each pass you are going to kill your HDR range
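For context, a minimal sketch of a Karis average over four samples, assuming linear HDR input and Rec. 709 luminance weights. Implementations differ in details (some use a different brightness measure), so treat `karis_average` as an illustration of the weighting idea rather than anyone's actual shader.

```cpp
#include <cassert>

struct Color { float r, g, b; };

// Relative luminance of a linear RGB color (Rec. 709 weights).
float luminance(Color c) {
    return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b;
}

// Karis average of four samples: each sample is weighted by 1 / (1 + luma),
// so a single very bright texel cannot dominate the block.
Color karis_average(Color a, Color b, Color c, Color d) {
    const Color s[4] = { a, b, c, d };
    Color sum = { 0.0f, 0.0f, 0.0f };
    float wsum = 0.0f;
    for (const Color& p : s) {
        float w = 1.0f / (1.0f + luminance(p));
        sum.r += p.r * w; sum.g += p.g * w; sum.b += p.b * w;
        wsum += w;
    }
    assert(wsum > 0.0f); // holds for non-negative colors
    return Color{ sum.r / wsum, sum.g / wsum, sum.b / wsum };
}
```

Three black texels plus one texel at 99 average to about 0.33 here instead of 24.75 with a plain mean. That is great for killing fireflies in the first downsample, and it also shows how quickly highlights would be flattened if the same weighting ran on every pass.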
Yeah I did implement the whole thing according to cod aw slides and only applied karis avg. on the first downsample pass.
It is done on paper and visually satisfying but I have some questions on how to apply karis avg.
Although it'll have to wait because I have a framebuffer resizing regression atm.
I'll get back to you guys once I am more free
You do pass a pixel size uniform to the upsample shader though
And it is set to 1 / resolution (of the previous mip in your repo's lingo) in bloom.lua
I am inclined to do that as well because I don't really see the appeal in providing that as a knob to the artist
Your Blend makes much more sense as a knob!
That's question 1 answered, thanks!
Trying out stuff for question 2 (karis avg.) which I realize I still didn't ask properly. Sorry about that but there is a chance that I'll be able to work it out (mostly from trial and error, with Froyok and Bjorn codes as well as the logl one)
If I can't, then I'll properly explain the problem and ask you guys
That SafeHDR utility function is interesting, have you ever been bitten by this @fluid parrot ? Or is it a preemptive measure?
Ok my karis average is the exact same as yours (except mine is needlessly verbose and yours is more compact) semantically.
You calculate 13 karis averages and also incorporate the spatial weights for the cod aw custom 13 tap downsample filter, which is also what I did.
Ours is the "full karis average" right?
And the partial they (cod aw) did is similar to Bjorn's?
I guess these are my only questions huh
Logl drops the ball in two areas mainly if I'm not mistaken:
- There is a divide by 4 for blocks and there is also a multiply by 0.25 inside the KarisAverage() as well which is wrong and darkens the image I think.
- Not sure if this is wrong per se but inside KarisAverage, they first convert the (presumably) linear HDR color values to sRGB and then do RGBToLuminance(). Even if this is true, shouldn't there be a re-conversion on the way out, back into linear HDR ?
Aside from logl, no one does this from what I can tell, so I didn't either.
I think I'll put Bjorn's version (partial averaging) as a shader feature as well and let the client/game decide which one to use
idk if what I did is correct, it's just how I got to having a bloom. I am approaching these things breadth first, so I can continue to make progress on multiple areas of my project and dive deeper into specific topics as I need to. There's a conflict between deep learning and producing something interesting, and I recognize I can probably spend the rest of my life learning about just one tiny slice of graphics programming, so when this conflict comes up making progress always wins out and I learn as much as I can along the way without ever grinding anything to a halt
Well said
I on the other hand am autistic
That's probably an insult to actually autistic people so I'm redacting it
I am ... sub-optimal
it's all sub-optimal, it's a trade off, what you're doing is more common on this server and is a better long game
What you did is (as far as I can tell) correct by the way
especially if your interest is doing this professionally
Logl is not but yours looks like it is to me
this is just a hobby for me
Yeah but the thing is there is (and must be) a balance to this game
yes
How am I going to pass an interview without pbr, low-level apis etc.? Am I going to bore them to death with the particulars of a bloom technique from mid 2010s?
Anyone who lectures me (not saying you are lecturing but still) on this is more than welcome to because frankly I do need this told to me more often and in fact I made a pact with me that I would close the Bloom chapter this weekend for this precise reason
So thank you sincerely
that's questions you might ask @torn stream as I understand it dvesh hires graphics developers
there's also #1020406707488313444
Yeah I should do that
I don't know I guess I feel like I should get through pbr and whatnot before I can ask people about career advice, given that I (apparently) am in the long haul for this one
Yeah another reason I am not ping-bombarding people like Devsh, Matt Pettineo etc. lol
But from what I can tell, most people here are really chill and helpful
idk, people can set their notification settings
Yeah I trusted that when I pinged Froyok today
Hopefully you are right 🙂
I should get accustomed to frog emojis again
These default ones are cringe
By the way how's it going with you? Last I saw you you were working on a software rasterizer?
Today you shared some cuda code
yeah I just write CUDA now, I don't have a thread, I just update my website sometimes
I am over shader languages
A level of zen I aspire to
well, I can do whatever I want since it's just for personal projects and I don't care what anyone else thinks
How did the rasterizer go? Was it fun and/or educational?
it was educational. I don't think I did very well on it and I got bored with it
I will likely do it again, via CUDA on the GPU though
It seems like I won't really understand understand fabian giesen's "A trip through the graphics pipeline" without writing a rasterizer
Cuda/compute seems like the other half of the medallion
I wrote a pretty completish rasterizer, it just sucked and the perf was horrible
Isn't that kinda the thing though? Seeing as it is a software impl.
Unless you optimize to the bone, do simd etc.
Which still would probably suck I don't know
well I sort of gave myself a dumb self inflicted roadblock by being on a from scratch kick at the time and I was refusing to use tracy which would have helped me solve my bottlenecks, so that was also an issue
I see
are you working on your bloom on the same code base as your previous project?
what have you been doing for 6 months?
Yeah everything I do I incorporate into Kakadu (also slows me down I guess but I get to work on renderer arch. and a rendering pipeline so it is not a complete waste of time imo)
My actual work was kicking my ass for a while so I kinda had to slow down on my graphics studies
But I got lots of bugfixes in, did a major renderer refactor, introduced fullscreen effects as distinct stuff from regular rendering etc. to pave the way for Bloom
Did some research into msaa etc.
Not the best 6 months but I am picking up the pace
Monday -> Starting SSAO
Or maybe tomorrow if I can commit everything today
Oh I did a ray tracer project last Friday I think
That was fun
I may do fragment shader based ray tracing like this in Kakadu perhaps
Thanks, yeah I will force myself to be active no matter what
I think it's preemptive, I did that a while ago so I don't remember exactly
No it's the partial because it's applied in groups and not as one block
Logl seems wrong.
You can check out Blender EEVEE code for a reference if you want, they do it and know what they are doing.
To compute luminance you should stay in a linear color space anyway
Thanks a lot
Coarse is what Bjorn did essentially
Fine is the partial karis average on 4x4 blocks, using 13 averages
13 karis averages*
Along with spatial weights
how many mips is that?
I have 7 mip levels where 0 is the source
with fewer mips the bloom gets really tight
I probably over did it but I like the result
if I turn the bloom intensity all the way up it looks like the whole world is glowing
Oh yeah I set it to 4 mips (including the original reso) during implementation and forgot to increase it
Let's try 6-7
obs darkens the colors despite my effort to remedy it but it is not that bad
My texture viewer doesn't render the mips for some reason I know that but didn't have time to check it
Now that bloom is all done and pushed I can check it
very nice
Thanks!
Either I remember it wrong or Nvidia captured faithfully
I swapped graphics cards with my wife because my trusty old gtx 1080 couldnt drive my new monitor. Now I have a radeon 6700xt
screenshots look a little worse too
I probably remember wrong
Oh that's interesting
Doubt it. These things are needlessly complex nowadays
None of us should, aside from Nvidia Amd Microsoft etc.
The idiot me would probably waste a couple more days on why the orbs are flickering (or more like a scanline effect, I don't know how to describe it).
Logl article has a comment saying that the upsampling radius (the one I removed from my shader just like Froyok's) should be aspect-ratio aware or otherwise it could cause this.
I did notice that visually and guarded against it.
But now that I increased the mip from 4 to 6-7, it became visible again for some reason.
But my answer to this is
This
I don't see any flickering?
oh I do
hrmmmm
no that's just you changing the mips
Although it does look distracting
oh, that's a sampling artifact
how do you sample?
shouldn't be square like that
that's why I wrote the bilinear sample that was in my code
to get rid of a similar artifact I had
but if you're just using a hardware texture you can just set the sampling?
Indeed
I had a very similar issue until I fixed how I sampled
Thanks for the tip. I will look into the sampling
RenderDoc shows that all mips including the original hdr color buffer use GL_LINEAR for min and mag
with the wrapping mode set to clamp to edge for both directions
hrm, what resolution is your bloom image at?
I use a separate image for the mips, the same dimension of my very large draw image (maximum physical display size dimensions), and it doesn't include mip0, I only write mips 1 through 6 to it
my image is way too big
I don't think that's it though, maybe the mip dimensions have an off by one error or something
this is my sampling code
__forceinline__ __device__ float4
sample_clamped(int2 uvs, int2 mip_dims, int2 src_dims, int2 mip_offset, float4 *pixels) {
    // Keep the texel coordinate inside the mip's region (offset .. offset + dims - 1).
    i32 x = CLAMP_MIN(CLAMP_MAX(uvs.x + mip_offset.x, mip_offset.x + mip_dims.x - 1), mip_offset.x);
    i32 y = CLAMP_MIN(CLAMP_MAX(uvs.y + mip_offset.y, mip_offset.y + mip_dims.y - 1), mip_offset.y);
    return pixels[y * src_dims.x + x];
}

__forceinline__ __device__ float4
sample_bilinear(float2 uv, int2 mip_dims, int2 src_dims, int2 mip_offset, float4 *pixels) {
    // uv is in texel units; shift by half a texel so integer coordinates sit on texel centers.
    float2 texel = uv - 0.5f;
    int2 i0 = make_int2(floorf(texel.x), floorf(texel.y));
    int2 i1 = i0 + make_int2(1, 1);
    // Fractional position between the four neighbouring texels.
    float2 f = make_float2(texel.x - i0.x, texel.y - i0.y);
    // clang-format off
    float4 s00 = sample_clamped(i0,                     mip_dims, src_dims, mip_offset, pixels);
    float4 s10 = sample_clamped(make_int2(i1.x, i0.y), mip_dims, src_dims, mip_offset, pixels);
    float4 s01 = sample_clamped(make_int2(i0.x, i1.y), mip_dims, src_dims, mip_offset, pixels);
    float4 s11 = sample_clamped(i1,                     mip_dims, src_dims, mip_offset, pixels);
    // clang-format on
    return lerp(lerp(s00, s10, f.x), lerp(s01, s11, f.x), f.y);
}
I am not using a hardware texture, just a bitmap
__global__ void ps_upsample(
    i32 width, i32 height, float4 *pixels, float4 *mips, i32 mip_level, f32 bloom_intensity
) {
    i32 x = threadIdx.x + blockIdx.x * blockDim.x;
    i32 y = threadIdx.y + blockIdx.y * blockDim.y;
    i32 mip_width = width >> mip_level;
    i32 mip_height = height >> mip_level;
    if (x >= mip_width || y >= mip_height)
        return;
    // ... (rest of the kernel not included in the message)
}

__global__ void ps_downsample(i32 width, i32 height, float4 *pixels, float4 *mips, i32 mip_level) {
    i32 x = threadIdx.x + blockIdx.x * blockDim.x;
    i32 y = threadIdx.y + blockIdx.y * blockDim.y;
    int2 idx = make_int2(x, y);
    int2 src_dims = make_int2(width, height);
    i32 mip_width = width >> mip_level;
    i32 mip_height = height >> mip_level;
    if (x >= mip_width || y >= mip_height)
        return;
    // ... (rest of the kernel not included in the message)
}
this is how I get my mip dimensions
my number of threads are greater than the image dimensions
ps_internal void downsample(
    ps_ctx_t &ctx, cudaStream_t stream, i32 width, i32 height, float4 *pixels, float4 *mips
) {
    {
        for (i32 mip_level = 1; mip_level < 7; mip_level++) {
            i32 block = 16;
            i32 mip_width = width >> mip_level;
            i32 mip_height = height >> mip_level;
            dim3 dimBlock(block, block);
            dim3 dimGrid((mip_width + block - 1) / block, (mip_height + block - 1) / block);
            ps_downsample<<<dimGrid, dimBlock, 0, stream>>>(width, height, pixels, mips, mip_level);
            getLastCudaError("downsample failed");
        }
    }
}

ps_internal void
upsample(ps_ctx_t &ctx, cudaStream_t stream, i32 width, i32 height, float4 *pixels, float4 *mips) {
    {
        i32 end_mip = ctx.gpuc_ctx->debug_bloom_downsample ? 1 : 0;
        for (i32 mip_level = 5; mip_level >= end_mip; mip_level--) {
            i32 block = 16;
            i32 mip_width = width >> mip_level;
            i32 mip_height = height >> mip_level;
            dim3 dimBlock(block, block);
            dim3 dimGrid((mip_width + block - 1) / block, (mip_height + block - 1) / block);
            ps_upsample<<<dimGrid, dimBlock, 0, stream>>>(
                width,
                height,
                pixels,
                mips,
                mip_level,
                ctx.gpuc_ctx->bloom_intensifier
            );
            getLastCudaError("upsample failed");
        }
    }
}
dim3 dimGrid((mip_width + block - 1) / block, (mip_height + block - 1) / block);
notice the + block - 1
that avoids clipping
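The `+ block - 1` trick is plain ceiling division. A tiny sketch, with `blocks_for` as a made-up helper name:

```cpp
#include <cassert>

// Number of blocks of size `block` needed to cover `n` items (ceiling division).
int blocks_for(int n, int block) {
    assert(n >= 0 && block > 0);
    return (n + block - 1) / block;
}
```

For example, a 631-pixel-wide mip (1262 >> 1) with 16-wide blocks needs 40 blocks. Plain integer division would give 39 and leave the last 7 columns unprocessed, while the 9 surplus threads in the 40th block are the ones rejected by the `x >= mip_width` guard in the kernels.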
you shouldn't use my sample_bilinear, since you have hardware textures, just showing it to you
Original hdr color buffer is 1262x1649 (apparently the size I dragged the viewport borders to be at)
does it include mip 0?
Mips are all separate textures
Original is a separate texture2d
Mips are all separate texture2ds as well
I didn't "get to" texture arrays yet lol
Yes sure thanks anyway. Always helps to see other code
np, gl
I missed the video
This is so cool
Whatever became of this?
Oh ok I thought you switched to something else because Rosy had a thread
Oh it is still there ok
This was me being idiotic
Renderer updates the "intrinsic" (i.e., renderer-internal) UBO's viewport_size field whenever framebuffer resizes.
I used that when calculating the "radius" (delta uv) for the upsampling shader but during bloom steps, that uniform is not updated so it is always the original viewport size.
Passing the source/input texture's resolution as a uniform to the shader and using that to calculate the radius/deltauv fixed it.
There still is a little flicker but it looks normal and non-squarey
obs causes stutter during record sometimes
I made a revised study plan. I am open to suggestions
## Study / Implementation Plan
### Phase 1 — Image-Space Effects
- **RTR**
  - Chapter 12: Image Space Effects
- **Kakadu / LOGL**
  - SSAO
  - Depth of Field
  - Motion Blur
  - Lens Flare
  - (Bloom already done)
Use Jorge Jimenez slides for these.
---
### Phase 2 — Physically Based Lighting & Local / Global Illumination
- **RTR**
  - Chapter 9: Physically Based Shading
  - Chapter 10: Local Illumination
    - 10.1 Area Light Sources
  - Chapter 11: Global Illumination
    - 11.3 Ambient Occlusion
- **Kakadu**
  - Implement PBR (LOGL and/or other sources)
  - Implement Area Lights
---
### (In Between)
- **Kakadu: DSA refactor across codebase**
- **Kakadu: Integrate Texture Arrays**
---
### Phase 3 — Shadow Mapping Enhancement
- **RTR**
  - Chapter 7 (relevant sections): Shadows
- **Kakadu**
  - Implement Cascaded Shadow Maps (CSM)
  - Implement smart/auto directional-light camera placement
---
### Phase 4 — Re-evaluation
- **Kakadu**
  - Revisit and refine SSAO using insights from RTR Chapter 11.3 IF NEEDED.
---
### Phase 5 — Compute
- **RTR**
  - Chapter 23 (selective revisit)
- **Kakadu**
  - Introduce SSBOs
  - Introduce compute shaders
  - Migrate an existing pass to compute
    - SSAO or blur
    - light culling or similar
---
### Phase 6 — Deferred Rendering
- **RTR**
  - Chapter 20.1: Deferred Shading
- **Kakadu / LOGL**
  - Implement Deferred Rendering
  - Integrate compute-based light culling where appropriate
---
### Phase 7 — Visibility
- **RTR**
  - Chapter 19.4: View Frustum Culling
- **Kakadu / LOGL**
  - Implement frustum culling
I am straying from the ordering in logl because ssao for example is something I can tackle right now as I implemented the beginnings of a fullscreen effects system in Kakadu and I only have bloom as an effect for now.
No need to put deferred in the middle of bloom and ssao I think.
In RTR I am chronologically at chapter 9 physically based shading. I jumped to chapter 23 (I think) to read the hardware chapter.
I am thinking about skipping 9-10-11 for now and getting image-space effects (chapter 12) out first, then going into full-blown PBR and continuing from there.
looks great!
Hmm ok there is a reason deferred rendering comes before ssao in logl: per-fragment normals are a thing by default in deferred rendering, which the ssao (crysis) uses.
I'll just read the deferred rendering chapter for now to get an idea on the implementation details and decide then.
I was almost brainwashed: https://www.youtube.com/watch?v=QVbOp1h-Jb4
Personal and strongly opinionated rant about why one should never use deferred shading.
Slides: https://docs.google.com/presentation/d/1kaeg2qMi3_8nQqoR3Y2Ax9fJKUYLigPLPfdjfuEGowY/edit?usp=sharing
Links:
https://github.com/zeux/meshoptimizer
https://vkguide.dev/docs/gpudriven/gpu_driven_engines/
https://vkguide.dev/docs/gpudriven/compute_culli...
Then I saw Devsh's comments here: #graphics-techniques message
Disclaimer: I shouldn't even begin to compare forward vs deferred at this stage, I know. That is a given and I fully acknowledge it. It was past working hours and I saw the vid on youtube and wanted to share.
Seeing @fluid parrot's immensely cool splash screen I thought why not
Animating stuff so I can get cool images with bloom
Maybe this one dunno
wow
I love bloom
I can't wait until I have better lighting and more stuff that looks good, it adds so much
Bloom is the key to nice images 😎
I wanted to make it animated by using the renderer itself (which initializes in 200 ms, give or take), but I can't: the splash screen animation would have to block initialization of the rest of the engine/app until it finishes, because the rest of the init code contains lots of gl calls as well, and opengl is not multithreaded, right?
This could be a cool idea in a vulkan/d3d12 renderer perhaps but not for opengl if I'm thinking right.
If the rest of the init. code was pure cpu code without any gl dependencies then sure. Otherwise, a blocking splash screen kinda defeats the purpose imo.
well, you could do a little bit of extra stuff each splash screen frame instead of using a thread
you can use command buffers in separate threads in vulkan, and opengl contexts are not as convenient
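A rough sketch of the "little bit of extra stuff each splash screen frame" idea, assuming startup can be cut into small steps that all run on the thread owning the GL context. `IncrementalInit` is a hypothetical helper for illustration, not something from Kakadu:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: holds startup work as a list of small steps and
// runs one of them per splash-screen frame, all on the same thread.
class IncrementalInit {
public:
    void AddStep(std::function<void()> step) { steps_.push_back(std::move(step)); }

    // Runs the next pending step, if any. Returns true while work remains.
    bool RunNextStep() {
        if (next_ < steps_.size()) {
            steps_[next_]();
            ++next_;
        }
        return next_ < steps_.size();
    }

    // Fraction of steps completed, usable for a progress bar on the splash.
    float Progress() const {
        return steps_.empty()
                   ? 1.0f
                   : static_cast<float>(next_) / static_cast<float>(steps_.size());
    }

private:
    std::vector<std::function<void()>> steps_;
    std::size_t next_ = 0;
};
```

The splash loop would then be: draw one splash frame, call `RunNextStep()`, swap buffers, and repeat until it returns false. The animation only stays smooth if each step is short, so a long step (e.g. loading a big texture) would still cause a visible hitch.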
Yeah in gl it would be diminishing returns for unnecessary hassle it seems
Unity gets by with a static splash screen so why not
We can't all be Ombre
I'll try changing the preview image of this post/thread. Hope I don't screw it permanently
Yeah I nuked it lol
Ok it's back but it looks meh
Maybe this will look better due to aspect ratio
Heyo
Update: I am working on a big refactor that was long overdue before I continue with augmenting the Renderer with newer stuff (deferred rendering is on the way).
Currently the engine architecture is in a weird place (by design, or at least by intentional postponing of fixing it)
Inside the solution, there are multiple projects:
- Vendor (has deps like stb, fastgltf, ImGui etc.) -> produces static lib.
- Engine (has EVERYTHING else that is not client application/game specific) -> also produces a static lib. and consumes vendor.lib.
  Most notably (for the refactor), it contains ALL editor code as well, dispersed across the codebase in the form of `#ifdef _EDITOR` blocks. All ImGui code is editor code.
- Client applications: Currently there are 3 projects: Sandbox, HDR-Demo and Bloom Demo. There should/could be many more, but creating a new project essentially means duplicating the dir/proj of an existing one and modifying it. These are all executables, by the way, and weirdly they contain the editor and the engine runtime as well as the client app logic, all in one.
Creating projects is a pain point right now.
Also, for some reason I took Cherno's Application base class, I think (not sure, it's been a while), and have virtual functions like Update(), Initialize() etc. even though there cannot be more than 1 client application for a given project. So virtual dispatch is unnecessary (although irrelevant perf-wise, since every frame there is only 1 indirection into long-running functions like Update(), Render() etc.).
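One way to drop the virtual base class while the client app is still compiled into the same executable is to make the main loop a template. This is only a sketch under that assumption; `SandboxApp` and `RunFrames` are hypothetical names, and the fixed frame count stands in for a real "until the window closes" condition:

```cpp
#include <cassert>

// Hypothetical client app: plain member functions, no base class, no virtuals.
struct SandboxApp {
    int updates = 0;
    int renders = 0;
    void Initialize() {}
    void Update(float /*dt*/) { ++updates; }
    void Render() { ++renders; }
};

// The loop is a template, so the calls to Update()/Render() are resolved
// at compile time for the one application type the project has.
template <typename App>
void RunFrames(App& app, int frame_count, float dt) {
    app.Initialize();
    for (int i = 0; i < frame_count; ++i) {
        app.Update(dt);
        app.Render();
    }
}
```

This only works while the application type is known at compile time. Once client apps are loaded from a dll, some form of runtime indirection (function pointers or an interface) comes back anyway.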
What I want is to do some ordered refactors:
1. Separate editor code from engine code; Do this in the form of a new Editor namespace and translation units, containing all editor code.
   The rest of the codebase should mostly be unaware of the editor except for the Application calling the update and render functions of the editor per frame and keeping an Editor::Context member around.
2. Separate the engine and the editor into 2 projects.
3. Turn engine into a dll. I am not so sure whether I should do this or not. Aside from possibly hot reloading the Engine itself, I don't see an immediate gain from this.
4. Turn editor into an executable. This is bound to happen soon. It is weird launching the client application executable but in reality you are launching the editor, but with custom logic per application.
5. Turn client applications into dlls. They could be hot reloaded which is a must for iteration in my opinion.
6. Separate client applications into two parts: A stub executable and a dll that will contain the actual logic. See number 7 below.
7. Implement project creation from the Editor; When a new project is created, the editor will simply create a new c++ project that will house the client app logic in the form of a dll. When the client app is "built" from the editor, the editor will generate the stub executable which will load the engine runtime dll and give the control to it. Engine runtime will initialize itself (most notably the renderer, which is the bulk of the engine atm) and then will load and initialize the client app dll. Then it will enter the main loop and every frame call into the client app dll to execute custom update/render etc. logic on top of the engine runtime logic.
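The control flow described in the last item could look roughly like the sketch below. Everything here is an assumption for illustration: `ClientApi`, `RunEngine` and the `Demo*` functions are hypothetical, and the function pointers are filled in by hand where a real build would look them up from the loaded client dll:

```cpp
#include <cassert>

// Hypothetical C-style interface a client app dll could expose to the runtime.
struct ClientApi {
    void (*initialize)(void* state);
    void (*update)(void* state, float dt);
    void (*render)(void* state);
    void (*shutdown)(void* state);
};

// Engine runtime side. Its own init (renderer etc.) would come first,
// then the client is initialized and called into once per frame.
inline void RunEngine(const ClientApi& client, void* client_state,
                      int frame_count, float dt) {
    client.initialize(client_state);
    for (int frame = 0; frame < frame_count; ++frame) {
        // Engine-side update/render would go here.
        client.update(client_state, dt);
        client.render(client_state);
    }
    client.shutdown(client_state);
}

// Stand-in for the client dll's logic, kept in-process for this sketch.
struct DemoState { int inits = 0, updates = 0, renders = 0, shutdowns = 0; };
inline void DemoInitialize(void* s) { static_cast<DemoState*>(s)->inits++; }
inline void DemoUpdate(void* s, float /*dt*/) { static_cast<DemoState*>(s)->updates++; }
inline void DemoRender(void* s) { static_cast<DemoState*>(s)->renders++; }
inline void DemoShutdown(void* s) { static_cast<DemoState*>(s)->shutdowns++; }

inline ClientApi MakeDemoClientApi() {
    return { DemoInitialize, DemoUpdate, DemoRender, DemoShutdown };
}
```

Keeping the client state behind a `void*` owned by the runtime is one possible design choice: it would let the state survive a reload of the client dll, as long as its layout does not change between builds.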
I am not sure how heavily I will invest in this, except for number 1, because I am doing that fully.
I plan on implementing all of these sometime but I need to balance maintaining Kakadu with learning graphics as well (although one could argue this type of engine arch. stuff is also a responsibility of a graphics programmer, at least the renderer).
Let's just see where things will go from 1).
I do want to create a "Many Lights" project/sample as well by the way and that is what triggered this refactor that was long overdue.
Oh and 8) Currently there is no scene data serialized to drive. Scene data is initialized via code during every run. So serialize/deserialize via a format like json maybe.
7 and 8 combined would make creating new projects a breeze compared to status quo.
#1067777224528375858 message
Adding this to the pile of stuff to fix after the refactor
Except for the Renderer.cpp, which contains a shit load of ImGui calls, the decoupling is (half) done.
Though I am fully done for the day
Update: Renderer is also rid of all editor code (except for some debug bounds checks but those are unimportant right now). I've been busy with my job again but all is good.
I brushed up on dynamic linking in the MSDN docs today. I will go for explicit linking so I can hot reload stuff when I want to get into that.
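For reference, explicit linking on Windows comes down to `LoadLibrary`, `GetProcAddress` and `FreeLibrary`. A minimal sketch, assuming the client dll exports a function named `ClientUpdate` (the export name, `ClientModule` and both helper functions are hypothetical). The non-Windows branch exists only so the sketch compiles elsewhere; it always reports failure:

```cpp
#include <cassert>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical export signature; the name "ClientUpdate" is an assumption.
using ClientUpdateFn = void (*)(float dt);

struct ClientModule {
    void* handle = nullptr;           // HMODULE on Windows
    ClientUpdateFn update = nullptr;  // resolved export
};

// Loads the dll and resolves the export. Returns false on any failure.
inline bool LoadClientModule(const char* path, ClientModule& out) {
#ifdef _WIN32
    HMODULE dll = LoadLibraryA(path);
    if (!dll) return false;
    FARPROC proc = GetProcAddress(dll, "ClientUpdate");
    if (!proc) {
        FreeLibrary(dll);
        return false;
    }
    out.handle = dll;
    out.update = reinterpret_cast<ClientUpdateFn>(proc);
    return true;
#else
    (void)path;
    (void)out;
    return false;  // this sketch only covers the Windows API
#endif
}

// Unloads the dll so a rebuilt copy can be loaded again (the basis of hot reload).
inline void UnloadClientModule(ClientModule& m) {
#ifdef _WIN32
    if (m.handle) FreeLibrary(static_cast<HMODULE>(m.handle));
#endif
    m = ClientModule{};
}
```

A hot reload would call `UnloadClientModule`, then `LoadClientModule` again on the rebuilt file. Since a loaded dll's file is normally locked on Windows, a common workaround is to load a copy of the dll rather than the build output itself.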

