#Iris - A Journey through OpenGL and beyond to learn Graphics
uh
diffuse lighting means random direction right?
Should this random direction be within the "hemisphere" of the triangle's normal
I don't understand
This is cornell box
The one we all love
this is path traced with the emissive light on top
where the hell are these colors coming from
With 16 samples per pixel it's somewhat recognizable
There are still some bogus colors near the light source though
I kinda like how low spp counts yield completely bogus colors 
I'll try temporal accumulation
Nanite2 with raytracing next 😈
LESSGOO
I donno anyone who has it tbh
(We are not talking about the integrated one right ?)
Why ?
Extension holdin ya back ?
No, it shrimply can't
it's too shit 
i.e crashes on startup
It's not even old, it's a 12700H
I still have the issue of the image "converging" after a while or so
Even though it's very clearly still noisy
What’s the sample count and history depth at?
what is the ray offset after a bounce?
```cpp
const auto normal = glm::normalize(
    n0 * bary.x +
    n1 * bary.y +
    n2 * bary.z);
const auto point =
    glm::vec3(ray.org[0], ray.org[1], ray.org[2]) +
    glm::vec3(ray.dir[0], ray.dir[1], ray.dir[2]) *
    ray.tmax;
ray = bvh_ray(as_vec3(point), as_vec3(random_direction_in_hemisphere(normal)), 0.1f);
```
all in world space btw
hmm so you aren't offsetting the ray when you bounce?
How should I offset it?
offset by the normal scaled by a tiny amount
that should help prevent self intersection
your image should get brighter as a result, but it won't change peter panning
which tbh I thought was caused by too much offset
I think there is something fundamentally wrong here
With this
```cpp
ray = bvh_ray(as_vec3(point + direction * 0.5f), as_vec3(direction), 0.1f);
```
I no longer have peter panning and the image is darker
somehow the exact opposite of what you said jaker

Lmao
I am currently very tempted to move everything to the GPU
.5 is a huge offset though
just so I don't have to wait 30 seconds for a single frame
should I try with .01
or .1
and direction is the wrong value to offset by since you will escape the surface more slowly than if you use the surface normal
I will
the point is to avoid self intersection caused by floating point error
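A rough sketch of the normal offset being suggested here, assuming a unit-length shading normal. `Vec3`, `offset_ray_origin` and the default `epsilon` are made up for illustration and are not from the raytracer's code. The sign flip keeps the offset on the same side of the surface as the outgoing direction, which may be one reason a plain `+normal` offset can look like it needs negating.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical helper: nudges the hit point off the surface along the normal,
// on whichever side the outgoing direction points to.
// `normal` is assumed to be unit length; `epsilon` is a scene-scale guess.
inline Vec3 offset_ray_origin(Vec3 hit_point, Vec3 normal, Vec3 out_dir, float epsilon = 1e-3f) {
    assert(epsilon > 0.0f);
    const float side = dot(normal, out_dir) >= 0.0f ? 1.0f : -1.0f;
    return hit_point + normal * (side * epsilon);
}
```

With an offset like this, `tmin` can usually stay very small, since the origin already sits off the surface.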
I also notice a super weird "band" in the middle of the box
as if pixels clump up there more for some reason
Alright this is it
we're going fully GPU
I am a very impatient man
lmao
also I'm not sure why the offset is being weird
maybe you gotta negate it for some reason 
cursed lmao
Anyways, for faster iteration times, I'll be going GPU
```cpp
template <typename Node>
template <bool IsAnyHit, bool IsRobust, typename Stack, typename LeafFn, typename InnerFn>
void Bvh<Node>::intersect(Ray<Scalar, Node::dimension>& ray, Index start, Stack& stack, LeafFn&& leaf_fn, InnerFn&& inner_fn) const {
    auto inv_dir = ray.get_inv_dir();
    auto inv_org = -inv_dir * ray.org;
    auto inv_dir_pad = Ray<Scalar, Node::dimension>::pad_inv_dir(inv_dir);
    auto octant = ray.get_octant();

    auto intersect_node = [&] (const Node& node) {
        return IsRobust
            ? node.intersect_robust(ray, inv_dir, inv_dir_pad, octant)
            : node.intersect_fast(ray, inv_dir, inv_org, octant);
    };

    stack.push(start);
restart:
    while (!stack.is_empty()) {
        auto top = stack.pop();
        while (top.prim_count == 0) {
            auto& left = nodes[top.first_id];
            auto& right = nodes[top.first_id + 1];

            inner_fn(left, right);

            auto intr_left = intersect_node(left);
            auto intr_right = intersect_node(right);

            bool hit_left = intr_left.first <= intr_left.second;
            bool hit_right = intr_right.first <= intr_right.second;

            if (hit_left) {
                auto near = left.index;
                if (hit_right) {
                    auto far = right.index;
                    if (!IsAnyHit && intr_left.first > intr_right.first)
                        std::swap(near, far);
                    stack.push(far);
                }
                top = near;
            } else if (hit_right)
                top = right.index;
            else [[unlikely]]
                goto restart;
        }

        [[maybe_unused]] auto was_hit = leaf_fn(top.first_id, top.first_id + top.prim_count);
        if constexpr (IsAnyHit) {
            if (was_hit) return;
        }
    }
}
```
This is
the beast
That I must implement
somehow 
goto 
For this one, what are the args?
ray = bvh_ray(as_vec3(point + direction * 0.5f), as_vec3(direction), 0.1f);
is it new position, direction, idk last one lol
tmin
this one looks kind of right to me
except
mult by tmax seems weird
if the ray intersects a triangle
tmax is the distance the ray had to travel to intersect it
oh ok
```cpp
template <typename T>
std::optional<std::pair<T, T>> PrecomputedTri<T>::intersect(Ray<T, 3>& ray, T tolerance) const {
    auto c = p0 - ray.org;
    auto r = cross(ray.dir, c);
    auto inv_det = static_cast<T>(1.) / dot(n, ray.dir);

    auto u = dot(r, e2) * inv_det;
    auto v = dot(r, e1) * inv_det;
    auto w = static_cast<T>(1.) - u - v;

    // These comparisons are designed to return false
    // when one of t, u, or v is a NaN
    if (u >= tolerance && v >= tolerance && w >= tolerance) {
        auto t = dot(n, c) * inv_det;
        if (t >= ray.tmin && t <= ray.tmax) {
            ray.tmax = t;
            return std::make_optional(std::pair<T, T> { u, v });
        }
    }

    return std::nullopt;
}
```
I was thinking you had tmin, tmax as bounds and t was the hit point
ah ok
Tmin of .1 is really high
I guess they also use that to prevent self intersection, but it's not as good as the normal offset method
yeah I'm using both
actually using tmin = 0.001 fixed peter panning
I don't know why I didn't think of that
it actually looks like a proper cornell box now
With both offset and small tmin
This is with 64spp though
Otherwise the image is still dark
you can probably eliminate tmin entirely if you have the normal offset
The stupid band of clumped pixels is still here though
uhh
yeah probably a bad idea to use C++'s mersenne twister isn't it? 
For the random directions I'm using a normal distribution btw
it's battle tested at least
I'll plug in PCG
so that's probably not failing here
see if it's any different
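For reference, a minimal sketch of the "normal distribution" trick for random directions: normalizing three Gaussian samples gives a uniform direction on the sphere, and flipping it when it points below the surface gives a uniform direction on the hemisphere around the normal. The `Vec3` type and the extra `rng` parameter are assumptions for this sketch (the real `random_direction_in_hemisphere` only takes the normal). Any generator that works with `<random>` can be plugged in, so swapping Mersenne Twister for PCG should not change the logic.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Sketch: uniform direction on the hemisphere around a unit-length `normal`.
template <typename Rng>
Vec3 random_direction_in_hemisphere(Vec3 normal, Rng& rng) {
    assert(std::fabs(dot(normal, normal) - 1.0f) < 1e-3f);
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    Vec3 d{};
    float len2 = 0.0f;
    do {
        d = {gauss(rng), gauss(rng), gauss(rng)};
        len2 = dot(d, d);
    } while (len2 < 1e-12f);  // reject the (extremely unlikely) near-zero vector
    const float inv_len = 1.0f / std::sqrt(len2);
    d = {d.x * inv_len, d.y * inv_len, d.z * inv_len};
    // Mirror directions that point into the surface back above it.
    if (dot(d, normal) < 0.0f) d = {-d.x, -d.y, -d.z};
    return d;
}
```

Note that this is uniform, not cosine-weighted, so the estimator still has to apply the cosine term and divide by the uniform-hemisphere pdf of 1/(2π).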
wonder why it's only on two of the walls but not the third wall or the box
perhaps
I still need a ridiculous amount of samples per pixel to make the image non dark
I think it's normal to appear dark until you denoise it or take a lot of samples
This is with 4spp for eg
we register the darkness more than the pixels that are 100x as bright as they should be
hmm
it has surprisingly little noise for being 4spp
or is this with your temporal filter
My "temporal filter" stops working after a while
I currently do this:
```glsl
void main() {
    const vec3 old = texture(color[0], uv).rgb;
    const vec3 new = texture(color[1], uv).rgb;
    const float weight = 1.0 / float(frame + 1);
    o_pixel = vec4(mix(old, new, weight), 1.0);
}
```
It's still this thing
it "converges" after 100 frames or so
what format is your texture
RGBA8
well that's gonna converge terribly
Do I have to make the texture itself RGBA32F?
use RGBA32F and then tonemap to RGBA8 for display
well that's fine as long as you don't accumulate with low precision values
it works
and is better than aces or whatever dumb """filmic""" "tone"mapper for the purpose of actually seeing what color is produced
but yeah my main recommendation is to accumulate RGBA32F values
second recommendation is to accumulate before tonemap (so tonemap can still output to RGBA8 buffer)
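A small CPU-side sketch of why the RGBA8 history "converges" early, assuming the history is rounded to 8-bit UNORM on every write. All names here are made up for illustration. Once `weight * |sample - history|` drops below half an 8-bit step, the blended value rounds back to the old one and new samples stop having any effect.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Linear blend, same formula as GLSL mix().
inline float mix(float a, float b, float t) { return a * (1.0f - t) + b * t; }

// Simulates writing one channel to an 8-bit UNORM target and reading it back.
inline float store_unorm8(float v) {
    const float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return std::round(clamped * 255.0f) / 255.0f;
}

// One accumulation step where the history lives in an 8-bit target.
inline float accumulate_unorm8(float history, float sample, int frame) {
    assert(frame >= 0);
    return store_unorm8(mix(history, sample, 1.0f / float(frame + 1)));
}

// The same step with the history kept in floating point (e.g. RGBA32F).
inline float accumulate_float(float history, float sample, int frame) {
    assert(frame >= 0);
    return mix(history, sample, 1.0f / float(frame + 1));
}
```

At frame 999 the weight is 0.001, so a sample of 1.0 against a history of 0.0 moves the float history to about 0.001, while the 8-bit version rounds back to 0 and never changes.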
I will accumulate before tonemap and srgb
and then do another fullscreen tri to display
btw are you doing any sort of gamma compression/encoding/OETF thingy
just srgb conversion after tonemap
ok good
then your thingy will be Perfect and require no further changes after you do the accumulation thing
shockingly good
4 samples per pixel
I'll just use RGBA32 attachments for anything (color related) now 
The temporal accumulation still stops after a while
yeah that's normal
you could also try switching to exponential accumulation instead of linear after a certain number of frames so the contribution of new frames doesn't approach 0
e.g. after 100 frames switch to constant .01 factor instead of using 1/frames (btw I do not know if this will make it look better)
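The switch from linear to exponential accumulation could be as small as clamping the blend weight from below. This is only a sketch of that idea; `accumulation_weight` and `min_weight` are hypothetical names, and whether the result actually looks better is untested.

```cpp
#include <algorithm>
#include <cassert>

// Running-average weight 1/(frame + 1) for the first frames, then a constant
// weight (exponential moving average) once that would drop below `min_weight`.
inline float accumulation_weight(int frame, float min_weight = 0.01f) {
    assert(frame >= 0);
    return std::max(1.0f / float(frame + 1), min_weight);
}
```

The returned value would replace `weight` in the `mix(old, new, weight)` call. With the default of 0.01 the behaviour changes after 100 frames, and from then on each new frame always contributes 1%.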
is there anything more clamplicated
no
both of which are stupidly clamplicated I assume
somewhat
I haven't made a decent path tracer so idk the mechanics of sampling algorithms
But I can comprehend denoisers since I made one
And the best ones today consist of two key parts: temporal reuse and spatial reuse (AKA a blur)
How does temporal reuse work
It's what you're doing
beautiful
Just the noisy parts
What I mean is that you want to blur irradiance
Because some aspects of the scene have no noise, like the surface material
And you don't want to blur the albedo, for example
Here's what you do
You trace normally, but don't apply the cosine or albedo factors of the first bounce until after you denoise
So you're looking at the "incoming light" of the surface the eye sees, essentially
I can show a pic of how it should look prior to denoising in a sec
raytracing in summer be like
Anyways
I have sponza
The most basic of path tracers can do this 
Simplest would probably be:
- nearest neighbor average of irradiance
- add albedo to step 1
- temporally accumulate
Then steps 1 and 3 are the subject of lots of research papers lol
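Steps 1 and 2 above can be sketched on a single row of pixels, assuming the tracer outputs irradiance with the first-bounce albedo left out. The function names and the 3-tap neighbor average are placeholders, not what any real denoiser does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1: average each pixel's demodulated irradiance with its direct
// neighbors (1D row for brevity, edges use the neighbors that exist).
std::vector<float> blur_irradiance(const std::vector<float>& irradiance) {
    std::vector<float> out(irradiance.size());
    for (std::size_t i = 0; i < irradiance.size(); ++i) {
        const std::size_t lo = i > 0 ? i - 1 : i;
        const std::size_t hi = i + 1 < irradiance.size() ? i + 1 : i;
        float sum = 0.0f;
        int count = 0;
        for (std::size_t j = lo; j <= hi; ++j) {
            sum += irradiance[j];
            ++count;
        }
        out[i] = sum / float(count);
    }
    return out;
}

// Step 2: apply the first-bounce albedo only after filtering, so texture
// detail is never blurred.
std::vector<float> remodulate(const std::vector<float>& irradiance,
                              const std::vector<float>& albedo) {
    assert(irradiance.size() == albedo.size());
    std::vector<float> out(irradiance.size());
    for (std::size_t i = 0; i < irradiance.size(); ++i)
        out[i] = irradiance[i] * albedo[i];
    return out;
}
```

With constant lighting and a checkerboard albedo, the blur leaves the irradiance untouched and the remodulated result is exactly the albedo pattern. That is the point of demodulating first.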
amazing, which res and how much fps?
1280x720
fps are 0 
it's one frame every 30 seconds or so
The intersection itself isn't bad
at 1spp?
It's the looping over all pixels, for all samples, for all bounces
64 spp
16 bounces
ah ok
yeah thats actually really good
1 spp 4 bounce = 3 fps?
yes
720p
I was wondering though
can this be classified as "irradiance"
after all it's just how much light a given pixel has
could I use this as an "irradiance mask", blur it and then apply albedo and all the other stuff
Possibly
Yeah you could and it would preserve texture details but your final result would be very binary
It should contain some color from bounce light though
base color?
E.g. the floor next to the curtains should be colored
Not base color
Color from bounces (incoming light/irradiance)
sir I'm not sure what this means
Do you mean the color of the light?
ok I thought you meant base color as in the material
yes I meant that
the surface you are looking at should basically not be factored in the color which is being denoised
the base color factor in a gltf material
damn I'm not following at all 
yes
building it
I am sipping coffee while I await and look at my beautiful sponza
uh 2am coffee?
honestly path tracing is epic
ye why not
I can't recall the last time I've had 8 hrs of sleep 
I just remembered that fwog has a button to skip albedo modulation
ight so what am I looking at
so that's how it looks if I don't multiply the light value by the albedo of the first-bounce surface
that's essentially the color that gets denoised
actually here's how it looks before any filtering
ye but I still need to sample the texture and tint the ray color right?
Otherwise where do I get the red from
but you only do that after you denoise
So how do I get the red before denoising?
the red comes from secondary bounces, which are allowed to sample albedo and shiz
it's just the primary bounce that contains high-frequency surface info that needs to be preserved
just to clarify
the "primary bounce" is the one from the camera to the surface or from the one after that
the former (the very first surface the ray from the camera hits)
ok good
btw I have an example of what it looks like if you don't demodulate albedo in your denoiser
let's see
actually I'm not even denoising, so it would look worse than that
here is with proper albedo demodulation (plus denoising)
i.e the thing where secondary bounces sample albedo and all that
also you're apparently not supposed to apply the cosine term until after denoising
the surface2eye cosine term I mean
uh the one where you weight the contribution by the cosine of the angle
something something projected area
yes got it
amazing so denoising is another truck full of worms
beautiful
Well that'll come after I force the GPU to trace my bvh
btw here is another explanation of this demodulation thingy (just like 15 seconds)
https://youtu.be/2GYXuM10riw?t=265
This talk will present an efficient and high-quality Final Gather for fully dynamic Global Illumination with ray tracing, targeted at next generation consoles and shipping in Unreal Engine 5. Part of the SIGGRAPH 2021 Advances in Real-Time Rendering in Games course ( http://advances.realtimerendering.com/).
Hardware Ray Tracing provides a new ...
well ignore his words and just look at the slides 
That’s such a cool presentation
My take away is that lumen is basically a real time miracle
the point is that geometry normals are incoherent (they have no spatial coherence), but the light is basically a smooth gradient. That means applying the normals (via the cosine term) will make your luminance incoherent and thus harder to denoise
the only miracle here is that I am still awake
my explanation sucks, but hopefully it gives a geometric intuition
my monke brain needs to impl GPU traversal
by the power of covfefe
Alright I don't think I can continue 
I can feel parts of my brain shutting off
It is time to schlepp
gn
lvstri speedrunning graphics algorithms 100% completion
you are fast af though
apparently that's what happens when you ignore sleep and ingest caffeine
Where can I get this superpower
I use super special coffee beans imbued with various uh, substances
||my lawyer requires me to specify that this is a joke||
man seeing ur progress makes me wish im not working on an old ass engine
i actually dont drink coffee
maybe thats whats missing
My progress is being hindered by how cool this sponza looks btw
I can't stop looking at it from different angles 
I know this has been done countless times but it's epic seeing it myself
rendered like that it do be looking more interesting than the usual sponzaism
I 🅱️ushed the thing btw
https://github.com/LVSTRI/Raytracer If anyone is brave enough to try it (it will make your CPU sweat a little)
I wonder if this is how cornell box should look with albedo demodulation
I'm only tinting the ray color after the first bounce
what is this thing
[build] ModuleNotFoundError: No module named 'jinja2'
ah
glad2
the same error as ever 
arch cant seem to find python-jinja2 ;p
I think you can just install it through pip
thats the thing
pip wants you to install it via python-packagename using the package manager of the distro
smh python
maybe i should setup a build container in the future, all this deps bs is starting to annoy me
[deccer@rootfs Raytracer]$ pip install jinja2
error: externally-managed-environment
× This environment is externally managed
╰─> To install Python packages system-wide, try 'pacman -S
python-xyz', where xyz is the package you are trying to
install.
If you wish to install a non-Arch-packaged Python package,
create a virtual environment using 'python -m venv path/to/venv'.
Then use path/to/venv/bin/python and path/to/venv/bin/pip.
If you wish to install a non-Arch packaged Python application,
it may be easiest to use 'pipx install xyz', which will manage a
virtual environment for you. Make sure you have python-pipx
installed via pacman.
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
: )
[deccer@rootfs Raytracer]$ sudo pacman -S python-jinja2
error: target not found: python-jinja2
heh
I guess you could use pipx?
I dunno
pip3
```
it may be easiest to use 'pipx install xyz', which will manage a
virtual environment for you. Make sure you have python-pipx
installed via pacman.
```
amazing
ill replace your glad cmakeism with the one from my cmake template
ye probably easiest
weird
```
[build] /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.1.1/../../../../include/c++/13.1.1/bits/stl_iterator.h:2618:35: error: missing 'typename' prior to dependent type name 'iterator_traits<_It>::iterator_category'
[build] { using iterator_category = iterator_traits<_It>::iterator_category; };
[build] ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
remains
no idea what to make of it
gcc builds fine now
[build] [27/28 96% :: 2.066] Building CXX object CMakeFiles/BVH.dir/main.cpp.o
[build] FAILED: CMakeFiles/BVH.dir/main.cpp.o
[build] /usr/bin/clang++ -DGLM_FORCE_AVX2 -DGLM_FORCE_DEPTH_ZERO_TO_ONE -DGLM_FORCE_RADIANS -DGLM_FORCE_RIGHT_HANDED -I/home/deccer/Personal/Code/External/Raytracer/deps/cgltf -I/home/deccer/Personal/Code/External/Raytracer/deps/bvh/src/bvh/v2/../.. -I/home/deccer/Personal/Code/External/Raytracer/deps/glm -I/home/deccer/Personal/Code/External/Raytracer/deps/glfw/include -I/home/deccer/Personal/Code/External/Raytracer/build/_deps/glad-build/include -g -std=gnu++2b -loop-vectorize -march=native -mmmx -msse -msse2 -msse3 -mssse3 -msse4 -msse4a -msse4.1 -msse4.2 -mavx -mavx2 -msha -maes -MD -MT CMakeFiles/BVH.dir/main.cpp.o -MF CMakeFiles/BVH.dir/main.cpp.o.d -o CMakeFiles/BVH.dir/main.cpp.o -c /home/deccer/Personal/Code/External/Raytracer/main.cpp
[build] clang-15: warning: -loop-vectorize: 'linker' input unused [-Wunused-command-line-argument]
[build] In file included from /home/deccer/Personal/Code/External/Raytracer/main.cpp:1:
[build] In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.1.1/../../../../include/c++/13.1.1/filesystem:48:
[build] In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.1.1/../../../../include/c++/13.1.1/bits/fs_fwd.h:35:
[build] In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.1.1/../../../../include/c++/13.1.1/system_error:43:
[build] In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.1.1/../../../../include/c++/13.1.1/stdexcept:39:
[build] In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.1.1/../../../../include/c++/13.1.1/string:48:
[build] /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.1.1/../../../../include/c++/13.1.1/bits/stl_iterator.h:2618:35: error: missing 'typename' prior to dependent type name 'iterator_traits<_It>::iterator_category'
[build] { using iterator_category = iterator_traits<_It>::iterator_category; };
[build] ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] 1 error generated.
??
Linux uses libstdc++ as the default stdlib
if you want to use clang's you need to explicitly specify that
-stdlib=libc++ or something
i thought cmake will take care of that when schwitching compilers/kits
the gcc variant builds, but doesnt run
runtime error?
The program '/home/deccer/Personal/Code/External/Raytracer/build/BVH' has exited with code 377 (0x00000179).
oh
it's probably the target_compile_options or shader/model paths
i did update my system earlier today, and that completely fucked with my gfx drivers
let me reboot
yup
now it runs
ladies and gents
```glsl
#version 460
#extension GL_ARB_separate_shader_objects : enable
#extension GL_EXT_scalar_block_layout : enable
#extension GL_EXT_buffer_reference : enable

layout (location = 0) in vec2 i_uv;

layout (location = 0) out vec4 o_pixel;

layout (scalar, buffer_reference) restrict readonly buffer b_camera_block {
    mat4 projection;
    mat4 view;
    mat4 pv;
    vec4 position;
} camera;

layout (push_constant) uniform u_pc_block {
    b_camera_block camera_ptr;
};

void main() {
    o_pixel = vec4(i_uv.x, 1.0 - i_uv.y, 0.0, 1.0);
}
```
we got a shader
we got uv coordinates
we got the camera
we don't got the BVH though 
i like the neat naming
Having renderdoc is so good
I don't know who or where the person who decided on bitfields is
But I hope he will forever regret this decision
lodge a complaint with Dennis Ritchie
Jaker
do you reckon this is how cornell box should look with albedo demodulation?
#1090390868449558618 message
That looks probably roughly correct
The ray color is always getting tinted by the light's color though
Even the primary bounce
Should that also be deferred?
that's good
most GPU inefficient ray-tri intersection 
```glsl
intersection_result_t primitive_intersect(inout ray_t ray, in uint start, in uint end) {
    uint prim_id = -1;
    intersection_result_t result = intersection_result_t(prim_id, vec3(0.0));
    for (uint i = start; i < end; ++i) {
        prim_id = uint(bvh_fetch_primitive_id(i));
        const vec3[3] v012 = vec3[3](
            vec3_from_float(vertices_ptr.data[prim_id * 3 + 0].position),
            vec3_from_float(vertices_ptr.data[prim_id * 3 + 1].position),
            vec3_from_float(vertices_ptr.data[prim_id * 3 + 2].position));
        const vec3 p0 = v012[0];
        const vec3 e10 = p0 - v012[1];
        const vec3 e20 = v012[2] - p0;
        const vec3 normal = cross(e10, e20);
        const vec3 center = p0 - ray.origin;
        const vec3 r = cross(ray.direction, center);
        const float inv_det = 1.0 / dot(normal, ray.direction);
        const float u = dot(r, e20) * inv_det;
        const float v = dot(r, e10) * inv_det;
        const float w = 1.0 - u - v;
        if (u >= FLT_EPS && v >= FLT_EPS && w >= FLT_EPS) {
            const float t = dot(normal, center) * inv_det;
            if (t >= ray.t_min && t <= ray.t_max) {
                ray.t_max = t;
                result.prim_id = prim_id;
                result.barycentric = vec3(w, u, v);
            }
        }
    }
    return result;
}
```
We do got le tringle though
GPU accelerated, raytraced tringle
Now let's draw the rest of the fucking owl
ikea cornel box
you need to assemble it yourself
but half the parts are missing
how can this even happen

I've no clue 
looks more like froyok's volume shadows bleed through channel walls 😄
Alright I've found the issue
And it's not good
at all
I fixedn't
It's better than before for sure 
I'm not sure how or why
But it looks like I'm doing anyhit or something, instead of closest hit
What the hell is this 
how fast does this run
It's a bit broken so I'm not sure perf is meaningful but it do be running quite slow
1.25ms for sponza
Fortunately it doesn't scale linearly with triangle count
logN ftw
2.5ms for intel sponza
With the tree, curtains and leaves
Weird occupancy graph though, the hell happened in the middle
It would be very cool if the API to use the RT cores directly were exposed
But I guess they need their internal BVH to work correctly
madmann did it for me 
I just reimplemented the tracing on the GPU
I still don't understand building algorithms unfortunately 😦
Yeah, the bvh itself is two vectors
very easy to send to the GPU
minus the garbage bitfield but we don't talk about that
The tracing algo is also very simple
nice
eyyy
I got it
turns out I was overwriting a good intersection with a bad intersection if none were found 
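The bug class described here (a later miss clobbering an earlier hit) can be illustrated with a tiny closest-hit loop. This is a sketch, not the actual shader: `Candidate` stands in for the per-triangle test result, and `Hit` and `closest_hit` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

struct Hit {
    // Max value doubles as the "no hit" sentinel, like `uint prim_id = -1` in GLSL.
    std::uint32_t prim_id = std::numeric_limits<std::uint32_t>::max();
    float t = std::numeric_limits<float>::infinity();
    bool valid() const { return prim_id != std::numeric_limits<std::uint32_t>::max(); }
};

// Stand-in for the outcome of one ray-triangle test.
struct Candidate {
    std::uint32_t prim_id;
    float t;
    bool hit;
};

Hit closest_hit(const std::vector<Candidate>& candidates, float t_min, float t_max) {
    assert(t_min <= t_max);
    Hit best;
    for (const Candidate& c : candidates) {
        // Only overwrite on an actual hit inside the current interval.
        // A miss must leave `best` untouched.
        if (c.hit && c.t >= t_min && c.t <= t_max) {
            best.prim_id = c.prim_id;
            best.t = c.t;
            t_max = c.t;  // shrink the interval so later hits must be closer
        }
    }
    return best;
}
```

The same rule applies one level up: the result of a leaf should only replace the running result when the leaf actually reported a hit.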
Lovely
Beautiful
I also optimized it a little
The intersection code is utter garbage though 
Poor occupancy, you can feel when there is a very divergent pixel 
Damn 
lots of NaNs damn
can you not interpolate between neighbors when center is nan? 🙂
or run a denoiser over that
Nah this is a problem that must be fixed
hm
Not relying on NaNs doesn't solve the issue

epic circular pattern
feels like it was last weekend
I don't remember having these weird circles in my CPU raytracer 
maybe I am having a stroke
perhaps not in the cpu raytracer, but i rember seeing these in context of your or jakers recent shizzles
That’s a really cool pattern though
self intersection moment
not self intersection moment
You can see here I get circular patterns when I'm not intersecting a primitive at all #ray-tracing message
green = I hit at least 1 primitive
red = I hit no primitive whatsoever in any of the bounces
Investigation results are in
Bounding boxes are extremely small sometimes, when applying a min and max bound to them the circles stop
You should start adding debug visualizations in this renderer
Also, I guess that precision on the GPU is inevitably lower or something, because the same bounding boxes work fine on the CPU
Like what I'm doing in ff (frogfood)
perhaps
I know the issue though, it's precision
Either precision, or I don't understand the ray-node intersection test
GPU floats don't intrinsically have less precision, though compilers seem to act as though -ffast-math is enabled
The standard guarantees some things about fp precision too
Well check the spec to make sure I'm not bullshitting you first
But also, I don't think you can change that
See what guarantees you actually get
NaNs are not required to be generated mfw
Any subnormal (denormalized) value input into a shader or potentially generated by any operation in a shader can be flushed to 0. mfw pt2
Oh come on
Operations including built-in functions that operate on a NaN are not required to return a NaN as the result.
Seriously
GLSL saddens me
```glsl
// precision issues require me to do this ugliness
const bool hit_left = intr_left.x - 0.000001 <= intr_left.y + 0.000001;
const bool hit_right = intr_right.x - 0.000001 <= intr_right.y + 0.000001;
```
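For comparison, here is a sketch of the slab test with the slack applied to the box instead of to the final comparison, along the lines of the "min and max bound" observation above. This is an alternative written for illustration, not the code from the shader or from the bvh library. `pad` is a hypothetical parameter, and the sketch assumes no direction component is exactly zero.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

struct Vec3 { float x, y, z; };

// Classic slab test. Returns (t_entry, t_exit); the box is hit when
// t_entry <= t_exit. `pad` grows the box on every side so that flat boxes
// (e.g. around an axis-aligned plane) keep a non-empty interval.
inline std::pair<float, float> intersect_aabb(Vec3 org, Vec3 inv_dir,
                                              Vec3 bmin, Vec3 bmax,
                                              float t_min, float t_max,
                                              float pad = 0.0f) {
    assert(pad >= 0.0f);
    const float o[3]  = {org.x, org.y, org.z};
    const float id[3] = {inv_dir.x, inv_dir.y, inv_dir.z};
    const float lo[3] = {bmin.x, bmin.y, bmin.z};
    const float hi[3] = {bmax.x, bmax.y, bmax.z};
    float t_entry = t_min;
    float t_exit = t_max;
    for (int axis = 0; axis < 3; ++axis) {
        float t0 = (lo[axis] - pad - o[axis]) * id[axis];
        float t1 = (hi[axis] + pad - o[axis]) * id[axis];
        if (t0 > t1) std::swap(t0, t1);
        t_entry = std::max(t_entry, t0);
        t_exit = std::min(t_exit, t1);
    }
    return {t_entry, t_exit};
}
```

Padding at build time (once per node) has the same effect without extra per-ray work. Either way the tradeoff is the same: a fatter box means a few more nodes get traversed.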
At least sponza looks as nice as ever
heelll yee
I second bvh visualization, helped me solve bunch of bugs
Ye it's probably the next or second next thing I will do
I kinda want to see some albedo
I get it I get it 😄
just one more feature bro, then stuff to help me debug 

I am not satisfied with my current solution btw
the adding of an epsilon adds the chance of traversing a lot more of the BVH than required
but I don't know how else I can address GLSL's special needs
there is no way to know what the internal spirv compiler will do
maybe you can slap the invariant keyword on some stuff
I mean
everything is random
I could dump the values of the bounding boxes on the C++ side and on the GLSL side
Through renderdoc
see if I have any loss of precision
try putting invariant on the output color and see if the epsilon hack is still needed
I guess I should also remove BDA... (shader debugger doesn't work otherwise)
I'll try
I've never used the keyword, so good luck
amazing
btw
you can represent any floating point number in GLSL as you would in C++ right?
I think so
for example, if I push const FLT_MIN, will it be the exact same?
the specs may offer different guarantees about that
particularly w.r.t. ULP precision
I'm about to use fp64
why do you need such exact precision anyways
yeah you need epsilon anywhere 🫄
now it just so happens that the bounding box of a plane is smol on the y side
sometimes it's so smol the GPU freaks out
I'll try looking at some other BVH traversal impls on the interwebs
perhaps they do clever things
The precise qualifier ensures that operations contributing to a variable’s value are done in their
stated order and with operator consistency.
I don't get how this differs from the invariant qualifier, but okay
well I guess invariant implies that certain optimizations can be done, they just need to be consistent
so I guess putting precise on your output will make all computations leading up to it precise as well, epic
I think that is a lot closer to the fp rules of C and C++
amazing
how did you do it?
I think it will propagate up if you just apply it to the output color, no?
probably yeah
smh trying the hard thing before the easy thing 😄
yep
finally
thank god Khronos noticed that someone might want a bit more precision 
perchance
This path tracer really slows down my pc damn 
i remember compiling the fsr2 c++ lib grinds my entire pc to a halt for a good 10 seconds
ah yeah those shader permutations be thicc
dFdx and dFdy are undefined behavior in divergent control flow right?
Lucky for me, raytracing is anything but divergent, right guys?

yes
and here it won't be "slightly" wrong, it will be horrifically wrong
I remember debugging a similar issue when I was using texture for SSR
which danny
were you guys racing or something
nah I was poking him to make sponza
because he procrastinated
so I made my own (jk, this is for learning BVHs)
In fact I'll work more on the BVH after this
maybe, maybe even make the building on the GPU
maybe though
I do love myself some
ground truth
ye tiling on the floor looks like cobbled road 😛
whats the perf ?
also is this the super sponza you made or regular sponza ?
you better get used to it, because that's how the numbers typically look when doing GPU RT
even with RTX™️?
Yes
how many bounces tho
also sample count ? 👀
1spp and 4 bounces 
ahh
now do 2 bounces and 0.25 spp 
wait 40ms for 1spp and 4 bounces ?
thats too much no ?
that is reasonable imo
ye that's what I'm saying 
denoising + upscaling time smh
yeah u need to do 240p
ahh try at quarter res ez perf ++
lol already sick of the perf
BVH build on the GPU
you gotta optimize it for the frogfood implementation 
I wonder how much faster will hardware rt be
cause at the end you're kinda doing the same thing
except certain bits of the hw are locked when you're not doing hw rt
classic
the sad thing is that it's faster to rasterize primary rays too, so rip the only fast part about GPU PT
btw
I'm trying the albedo modulation thing
but your RSM thingy has a lot more red and green near the curtains
it won't do anything until you denoise fyi
There is barely any red here
my scene was a little more optimal for getting the color bleed
since I angled the sun to only hit the curtains
:(
You can kinda see the ground in the curtain lol
Try a different scene. I don't know exactly how it should look here
Make a scene where certain surfaces can only be illuminated by colored bounce light
Should I upscale after accumulating or before accumulating? 
Also tonemapping, before or after upscaling?
Tonemap last
why tonemap after upscaling?
Cuz upscaling affects the pixel color in physical ways
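A tiny numeric sketch of why the order matters: a tonemap curve is nonlinear, so averaging (which is what accumulation and upscaling filters do) gives a different answer before and after it. Reinhard is used purely as an example curve here, and the function names are made up.

```cpp
#include <cassert>

// Example tonemap curve (Reinhard), chosen only because it is short.
inline float tonemap_reinhard(float x) {
    assert(x >= 0.0f);
    return x / (1.0f + x);
}

// Stand-in for any linear filter step (accumulation, upscaling, blur).
inline float average(float a, float b) { return 0.5f * (a + b); }
```

For radiance values 0 and 4, filtering in linear space and then tonemapping gives `tonemap_reinhard(average(0, 4))`, about 0.667. Tonemapping first and then filtering gives `average(tonemap_reinhard(0), tonemap_reinhard(4))` = 0.4. Only the first is a tonemapped version of the physically averaged light.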
go up the stack and see which line of yours triggers it
It's random
sometimes vkWaitForFences, sometimes vkQueueSubmit2 and sometimes vkCmdDraw
Also, the simple act of binding textures does this
Here's another ridiculous thing: If I change the render resolution it works fine sometimes 
missing fence/barrier perhaps?
Not sure, but a vkDeviceWaitIdle doesn't fix the issue so I don't think it's sync related
I'll try slapping some full barriers everywhere
and then remove until it kicks you in the butt again
btw that's the name of their opengl driver dll 
opengl truly is a low-level api for nvidia
As usual OpenGL trumps everything
ogl and vk is the same dll for nv
o 😳
does validation scream?
No errors
Even with GPU assisted
I don't even have to sample any textures for the crash to happen, just bind a descriptor set and that's it 
just a sanity check
you are not destroying the resource you are waiting on are you?
Also you might be TDR'ing
Nope, I have checks for that, everything is refcounted
No TDRs either
they implement vulkan in the same dll as their opengl driver lol
this sort of nondeterminism smells like TDR though
I heard somewhere that they implement vulkan in terms of opengl, but I have no idea if that is a meme
Let me try reducing the shader to nothing
I could see it being true in the early days of their vulkan support
I think there is some truth to it
afaik that's the same reason why you get issues with windows that have an ogl context
i mean there has to be a reason why passing glsl in as shader module source without the ext used to work right
meanwhile, Vulkan is based off an API that AMD donated to Khronos 
mantle revival in 2024 pls
3dfx glide revival pls
can't have issues on nvidia cards if your api doesn't support them :)
Ok so
Reducing the shader to nothing except a texture sample works fine
changing the resolution (either up or down) works fine (sometimes)
Not binding the set works fine
Sampling from texture 0 only is bork
NOT sampling at all and binding the set is bork (except if the shader is basically empty)
bork as in crash?
Not sampling being an issue is weird
it might be some bad access (gpu based won't catch all of them)
what happens if you reboot and try again
I am not accessing the texture array in any way though
Yeah, bindless
array of texture descriptors or texture array
does the descriptor set contain anything
Are you dynamically updating the size of the set
No
hm
Btw. is VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT set as well?
yes
and the variable count binding is the last binding?
(not sure if validation checks for that)
This has resulted in failure
It still crashes
what the hell is going on
can you remove the variable size bit?
also the reboot will clean up a bit perhaps
if there was a driver crash definitely
I did reboot
ah
I am thinking the shader has some kind of issue
Because if I shrimply output black, everything works fine
I've narrowed it down to BVH intersection
what is the pipeline?
if I call bvh_intersect, binding the set will crash
Is it an rt pipeline?
No it's a frag shader
regular old pipeline
I've narrowed it down further
It's the ray triangle intersection function
calling it will crash
what exactly is the function doing
beyond arithmetic
because arithmetic should never cause a crash unless it is a tdr 
nothing, it's pure ALU
And apparently this is the bit that makes it die:
```glsl
// Accept the hit only if all three barycentric coordinates are non-negative (within tolerance).
if (u >= -FLT_EPS && v >= -FLT_EPS && w >= -FLT_EPS) {
    const float t = dot(n, c) * inv_det;
    // Keep the hit only if it lies inside the ray's current [t_min, t_max] interval.
    if (t >= ray.t_min && t <= ray.t_max) {
        const vec3 barycentric = vec3(w, u, v);
        const vec3 n0 = vec3_from_float(vertices_ptr.data[prim_id * 3 + 0].normal);
        const vec3 n1 = vec3_from_float(vertices_ptr.data[prim_id * 3 + 1].normal);
        const vec3 n2 = vec3_from_float(vertices_ptr.data[prim_id * 3 + 2].normal);
        const vec2 uv0 = vec2_from_float(vertices_ptr.data[prim_id * 3 + 0].uv);
        const vec2 uv1 = vec2_from_float(vertices_ptr.data[prim_id * 3 + 1].uv);
        const vec2 uv2 = vec2_from_float(vertices_ptr.data[prim_id * 3 + 2].uv);
        ray.t_max = t; // shrink the interval so later hits must be closer
        result.prim_id = prim_id;
        result.barycentric = barycentric;
        result.position = vec3[](p0, p1, p2);
        result.normal = normalize(vec3(n0 * barycentric.x + n1 * barycentric.y + n2 * barycentric.z));
        result.uv = vec2[](uv0, uv1, uv2);
    }
}
```
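For reference, the barycentric part of that snippet can be sketched on the CPU like this. It is only an illustration, not the project's code: `Vec2`, `Vec3`, `interpolate_normal` and `interpolate_uv` are made-up names, and the UV interpolation is an assumption about what happens later with the three UVs the shader stores.

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Weighted sum of three per-vertex normals, renormalized to unit length.
// `bary` holds the weights for vertex 0, 1 and 2, i.e. the vec3(w, u, v) order in the shader.
inline Vec3 interpolate_normal(Vec3 n0, Vec3 n1, Vec3 n2, Vec3 bary) {
    const Vec3 n = {
        n0.x * bary.x + n1.x * bary.y + n2.x * bary.z,
        n0.y * bary.x + n1.y * bary.y + n2.y * bary.z,
        n0.z * bary.x + n1.z * bary.y + n2.z * bary.z,
    };
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    // Opposing normals can cancel out; return the raw sum instead of dividing by zero.
    if (len == 0.0f) return n;
    return { n.x / len, n.y / len, n.z / len };
}

// Same weighting for texture coordinates (no normalization needed).
inline Vec2 interpolate_uv(Vec2 uv0, Vec2 uv1, Vec2 uv2, Vec3 bary) {
    return {
        uv0.x * bary.x + uv1.x * bary.y + uv2.x * bary.z,
        uv0.y * bary.x + uv1.y * bary.y + uv2.y * bary.z,
    };
}
```

With weights `(1, 0, 0)` this returns vertex 0's attributes unchanged, which is a quick sanity check that the weight order matches the vertex order.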
is the vertex data valid
yep
idk how binding the set makes this here crash though
is it possible to run this glsl through some glsloptimizer and run the outcome of it?
I wish there was an easy way to look at the spirv
https://shader-playground.timjones.io/ this no worky anymore sadly
does it not have that expression-esque output?
ye it do
download it
NOW
it works without AMD drivers
this is a sad world innit
gib your shader
the interface for RGA needs an upgrade
it only exposes like 25% of what RGA can actually do
otherwise you have to use a stinky command line
i wish we had proper debug info for shaders to correlate the isa back to the source/at least spirv
look at the ray tracing extensions for spirv
Huh, I am compiling with --target-env vulkan1.3 and it works 
you probably won't be seeing that for a while
deccer mightn't've done that
i can dream right
eh i better not confuse the horse conch
I've been contemplating the frog for a while now
Still dunno how the fix works
This is the frog I've been contemplating btw
For the record, the "fix" is the following
Pardon my screenshot
But as you can see, I'm loading all the vertex data early in the loop
Now if I were to do separate loads for position, normal and uv
I would get the same old crash
Why? I have no clue
Is accessing memory in divergent control flow UB?
I would hope it's not?
the only potentially UB part of reading a buffer is with descriptor indexing
I will assume this is a driver bug and not think about it further
or OOB reads without robust buffer access 😉
Oh good call
Let me enable robustness one sec
Nope, even with robustness it crashes
hec
The next stage would be to have you guys test my thing
But eh
uncomment #define USE_BUGGY_BEHAVIOR to get crash (maybe)
huh you are doing VRS as well?
Yeah but not in this pathtracing project 
My bad
There we go, this is the version with basically all extensions disabled except the ones I actually need (BDA, desc indexing)
Ok, so here without and with USE_BUGGY_BEHAVIOR. Both seem to work fine, but "with" gets me lower fps
no linux build smh
I thought lvstri was on Linux
I wouldn't expect a few milliseconds of delta between loading from memory 3 times and only once
But it's better than crashing in the driver 
Usually, but I'm gamin with frens while figuring this stuff out 
Will our tiny 1060s explode
No worries, the froge is but a mere 1 million tris
for you ill boot into windows my frog
i suppose USE_BUGGY_BABABOOEY thingy is an env var?
it's in trace.frag
amazing
$ ./PathTracer.exe
[2023-07-24 23:31:15.394] [wsi] [info] initializing window (width: 1920, height: 1080, title: "PT")
[2023-07-24 23:31:15.434] [instance] [info] initialized volk
[2023-07-24 23:31:15.478] [instance] [info] instance initialized
[2023-07-24 23:31:15.479] [instance] [info] validation layers initialized
[2023-07-24 23:31:15.479] [device] [info] found GPU: NVIDIA GeForce GTX 1060 6GB
[2023-07-24 23:31:15.479] [device] [info] acquired GPU: NVIDIA GeForce GTX 1060 6GB
[2023-07-24 23:31:15.480] [device] [info] API version: 1.3.242
[2023-07-24 23:31:15.480] [device] [info] device features enabled
[2023-07-24 23:31:15.480] [instance] [error] [general] terminator_CreateDevice: Failed in ICD C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_675be35f1ba2315e\.\nvoglv64.dll vkCreateDevice call
[2023-07-24 23:31:15.480] [instance] [error] [general] vkCreateDevice: Failed to create device chain.
[2023-07-24 23:31:15.480] [device] [info] device initialized
Segmentation fault
h
ah i may or may not have vulkan sdk installed
no its installed, i just rembered and checked, did it a week ago
updating dribers too...
hehe
colors on windows look much better than on linux, and discord is more schnappier too
ah im on 536.23, new one is 536.67
great
we should all switch to amd
Woah failed why?
i cant explain why but i prefer novideo
cant say, nothing in eventvwr
That’s so weird
634mb of blob
ah
when i use the installer, it shows the error : >
too low disk space
worked now, but i dont get why it cant init the device
is it using anything RTXy?
nope, only BDA, descriptor indexing and extended integer/fp types (uint64_t, etc.)
perhaps that wont work on the 1060
I'll integrate FSR2 tomorrow
anyways, path tracing is epic
I forgor to put this here this morning 
It'll serve as a good collection of all the reference stuff I get
Btw I tried to look at DLSS
And apparently you need some kind of APPID given directly by NVIDIA or something
did you remove the black lines on the texture?
black lines?
in the "path tracing nightmare" scene I have there are these black lines
Ah, mine doesn't have them for some reason
Here it is, complete with the big emissive sphere of doom
looks like one is with textures, the other without
Weird, I exported this to .gltf + Textures because I cant load .glb, and the resulting image file has the lines.
Huh, I'm not sure
I'm not sampling any textures, for the record
Actually I lied, I am sampling base color, if there is a base color
Probably something wrong with the loading
yep. A shame really
at least there is fsr
just give me 1 hour and I can send you a Noisefree Real-Time Video
what more do you want
30ms per frame is real time enough for me 😛
needs to build under linux 😛
@wicked notch Suggestion: You could ask mods to pin first post here: #1090390868449558618 message
path tracing has no right looking this good
wanna add AA pls?
ye I promised I'd add FSR 
are you planning to make this real time?
like with little noise as well not just high fps
Eh I have no idea how
it's already 1spp
I'm not really sure how to optimize traversal
I would like to though
FSR can already smooth out the noise a bit
regarding the AA: if you plan on accumulating multiple samples it's very easy to add. Just randomly jitter the primary ray inside the pixel
like with TAA
https://www.youtube.com/watch?v=VlGfFOZRubc 2nd asset potentially has a gazillion of tringles again
Lovely
Mayhaps I should invest in point shadows
I see aliasing too 
nothing that a little AA can't fix tho
do this!!11!1!
in the function that converts pixel to ray dir, add a random float in 0-1 to the pixel position. After a few frames you get perfect AA
or is it random float in [-.5, .5]
should be [0.0, 1.0) in compute shader
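A rough CPU-side sketch of that jitter, assuming integer pixel coordinates (as from `gl_GlobalInvocationID` in a compute shader) and a running average as the accumulation scheme. `jittered_ndc` and `accumulate` are hypothetical names, not taken from the project. The two ranges mentioned above are the same thing seen from different origins: `[0, 1)` added to the integer pixel corner equals `[-0.5, 0.5)` added to the pixel center.

```cpp
#include <cstdint>

struct Vec2 { float x, y; };

// Maps an integer pixel coordinate plus a sub-pixel jitter in [0, 1) to NDC in [-1, 1].
// jitter = (0.5, 0.5) reproduces the usual un-jittered pixel-center ray.
inline Vec2 jittered_ndc(uint32_t px, uint32_t py, uint32_t width, uint32_t height, Vec2 jitter) {
    const float u = (static_cast<float>(px) + jitter.x) / static_cast<float>(width);
    const float v = (static_cast<float>(py) + jitter.y) / static_cast<float>(height);
    return { u * 2.0f - 1.0f, v * 2.0f - 1.0f };
}

// Running average over frames: frame_index 0 overwrites the history,
// frame_index n blends the new sample in with weight 1 / (n + 1).
inline float accumulate(float history, float sample, uint32_t frame_index) {
    return history + (sample - history) / static_cast<float>(frame_index + 1);
}
```

Feed a fresh random jitter each frame and run each color channel through `accumulate`; the edge aliasing averages out the same way the path tracing noise does. The history has to be reset whenever the camera moves.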
Damn I always forget how good RT looks
btw hardware accel makes things go about 3 times faster
Which is less than what I'd expected tbh
im just leaving this here https://www.youtube.com/watch?v=ZVN50Oxyh5I
very nice
I'm currently struggling to implement nanite like LODs 
I've been doing nothing but that for the past week or so
I mean I get how it works conceptually
kind of
up until where they go for persistent threads on the GPU 
But the LOD'ing I mostly understood
What part about persistent threads conchfusez
I just fail to understand graph partitioning
oh jeez
first off, it would help knowing how to actually do persistent threads on the GPU
regardless of workload
because I have no idea what they are or how they work
Isn't it just the thingy where there is a work queue and threads keep grabbing work until there's none left
It's this part https://youtu.be/eviSykqSUUw?t=1553
ye
that's right
except on the GPU 
how the hell do you make a work queue on the GPU
uhh
so the job queue has all the cluster nodes to be processed
uh
processing involves checking whether the cluster should even be considered when selecting the LOD?
It's possible that they implement a mini job queue for each WG, that way they can sync pushes and pops
Instead of having a single global queue
Idk lock free programming is hard, so I want to imagine they did something the shrimple way 
mfw I have no idea about lock free on the CPU
NaiveJobQueue.cpp
how do you expect me to understand lock free on the GPU 
I just use moodycamel's queue
is there a moodycamel's queue for the GPU?
Uh atomic comp swap is your friend
perchance
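A minimal CPU analogue of the "work queue plus compare-and-swap" idea, written with `std::atomic` instead of GPU atomics. This is a sketch under simplifying assumptions, not how Nanite does it: all work is known up front, workers only pop and never push child nodes, and `WorkQueue`, `drain` and `run_workers` are invented names. On the GPU the same claim loop would sit on a buffer counter using something like GLSL's `atomicCompSwap`.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// Work list filled up front; `head` is the index of the next unclaimed item.
struct WorkQueue {
    std::vector<uint32_t> items;
    std::atomic<uint32_t> head{0};

    explicit WorkQueue(std::vector<uint32_t> work) : items(std::move(work)) {}

    // Claims one item with a compare-and-swap loop. Returns false once the queue is empty.
    bool try_pop(uint32_t& out) {
        uint32_t h = head.load(std::memory_order_relaxed);
        while (h < items.size()) {
            // On failure, compare_exchange_weak reloads `h` with the current head and we retry.
            if (head.compare_exchange_weak(h, h + 1, std::memory_order_relaxed)) {
                out = items[h];
                return true;
            }
        }
        return false;
    }
};

// One "persistent" worker: keeps grabbing items until nothing is left.
inline uint64_t drain(WorkQueue& queue) {
    uint64_t local_sum = 0;
    uint32_t item = 0;
    while (queue.try_pop(item)) {
        local_sum += item; // stand-in for real per-item work (e.g. testing one cluster)
    }
    return local_sum;
}

// Launches a fixed number of workers that all drain the same queue.
inline uint64_t run_workers(WorkQueue& queue, unsigned worker_count) {
    std::atomic<uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < worker_count; ++i) {
        workers.emplace_back([&] { total.fetch_add(drain(queue), std::memory_order_relaxed); });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

Each item is claimed by exactly one worker because only the thread whose compare-and-swap succeeds gets to read `items[h]`. The hard part discussed here, workers pushing new nodes while others pop and knowing when everyone is truly done, is deliberately left out.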
https://ye-yuan.com/DSGPURayTracing/ uhm, no i think thats not what you are talking about
well regardless, I'm not even at step 0
actually yeah, it is
Just not applied to LOD but raytracing
the fundamental problem they're solving is one and the same
Damn this is interesting
I'm still lost 
but interesting nonetheless
literally me fr
Reminds me of a series I read where the main character journeys to the core of the earth (not the one with the movie starring Brendan Fraser)
and in it he falls through a hole (or multiple) like that one
Jules Verne 😉
Nope
It was a young adult series I think
I read it when I was in middle school (the last time I read fiction recreationally) so I can't remember much
ah
Ah it was the Tunnels series
i rember a few drawings of young jaker 🙂
