#Platinum — Path Tracing in Metal
lmao
common discord L
yea
right
you can't say what it was tonemapped for
so the browser/OS/monitor each get to choose what they do
one after the other
unless one of them disables the tonemapping in the later stages
argh i stacked my monitors and my old monitor looks so oversaturated 
i love 'wide' gamut monitors that just stretch the input
and the srgb clamp mode just tints everything red 
right i'm just reading a bit about formats and i think i might as well give up on getting hdr images to look good on anything other than the exact display they were authored on/for

or you can export a video with a single frame
i mean isn't that kinda what avif is
idk if av1 supports hdr metadata then lol
hey you can use ultra hdr
jpeg tho 
it's pretty nice tho
it's sdr + gain map
so it looks good on sdr, and on hdr
does jpeg even support primaries other than 709?
dunno
ah, it can embed icc profiles so yes
what about .raw
is that an actual format? or you mean like, just output pixel data
i was kinda joking but yes
any reading on how this works?
i assume it's storing the gain map in one of the APPn markers
seems to be supported by openimageio as well
so now i just need to figure out icc color profiles and jpeg should do what i need
it's so jank though, i hate it 
can we please retire fucking jpeg already
oh nvm input only
fucking libraries
fine, i'll do it myself
lol
if i render in rec2020 and don't tonemap at all i get this 
(idk what the screenshot is doing to color)
looks about right for an underexposed render, are you clipping to srgb?
(if you aren't, your display probably is)
why would it
I'm outputting to a rec2020 pq swapchain
So the input colors and textures are in sRGB and you convert them to rec 2020 and render in that colorspace?
yeah
and that's the same as rendering in sRGB and then converting to rec2020 just before I write the pixel
which makes sense
the multiplier factors out
I don't get a similar effect to this
with the reflection changing
that effect is probably because the light color is the same or just (1.0,1.0,1.0) but the meaning of that has changed
otherwise what I’ve noticed is just slight changes in saturation or brightness, fairly similar to bluescreen’s sponza
if you have more complex light simulation I could very well imagine some effect shifting colors more drastically in BT.2020 in comparison with smaller colorspaces
ah
ight then
i assumed srgb display
it shouldn't, i might have a bug
also the dragon color is different in mine, i made them saturated to test gamut, i'd have to try the original scene
i know but i just noticed that i didn’t consider light color in my renderer which would be the reason my renders look different in every color space
so might have the same issue here
i am converting my emission from srgb, so that shouldn't be happening
wdym by this
no I have ascended to HDR master race 
@winged veldt what does the constant 2000.0f mean here?
and why 2000.0f?
https://github.com/teofum/platinum/blob/975dd47a8884a4e5e4048f5f6c91fa43dcf051a5/src/renderer_pt/renderer_pt.cpp#L974-L976
did you change your russian roulette luminance calc 
i did not
shouldn't make a difference though
iirc i'm just using the highest component
well, that's wrong, for one
1000 because it's converting mm (conventional unit for focal length) to meters (unit length for my renderer), 2 because it's converting a diameter to radius
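A minimal sketch of that unit conversion. It assumes the constant is applied to focal length divided by f-number; the f-number part is a guess, since the linked code isn't quoted here. Function and parameter names are made up.

```python
def aperture_radius_m(focal_length_mm: float, f_number: float) -> float:
    """Aperture radius in meters for a thin lens.

    Assumption: aperture diameter = focal length / f-number.
    / 1000 converts mm to m, / 2 converts diameter to radius,
    so the two fold into a single constant of 2000.
    """
    return focal_length_mm / (f_number * 2000.0)


# 50 mm lens at f/2: 25 mm diameter, 12.5 mm radius
radius = aperture_radius_m(50.0, 2.0)
```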
is it
iirc what you use for rr doesn't matter
it's unbiased either way
so it won't change the results
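A quick way to see why the choice of heuristic doesn't bias the result: the expected value of the roulette estimator works out to the original contribution for any nonzero survival probability. The sketch below computes that expectation exactly; the two heuristics and all names are illustrative, not taken from either renderer.

```python
def rr_expected_value(contribution: float, survive_prob: float) -> float:
    """Exact expectation of a Russian roulette estimator.

    With probability survive_prob the path survives and is weighted by
    1 / survive_prob; otherwise it is terminated and contributes 0.
    """
    survived = contribution / survive_prob
    terminated = 0.0
    return survive_prob * survived + (1.0 - survive_prob) * terminated


def survive_prob_max_component(throughput):
    """Heuristic 1: highest throughput component, clamped to 1."""
    return min(1.0, max(throughput))


def survive_prob_luminance(throughput):
    """Heuristic 2: Rec. 709 luminance of the throughput, clamped to 1."""
    r, g, b = throughput
    return min(1.0, 0.2126 * r + 0.7152 * g + 0.0722 * b)
```

The heuristic only changes variance (how noisy the result is for a given sample count), not the value it converges to.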
ye
fuck no 
why
i can try lol
you already have the code for that i assume
but yeah won't make a difference since it's unbiased
unless you hardcoded rec 709 -> rec 2020 without going through xyz
in my renderer my sun was always just emitting (1,1,1) light regardless of color space, so having a different internal color spaces will obviously make the output change
i go through xyz cpu side and feed the shader a 709->2020 matrix
I have constexpr functions for this so that it automatically does this
(another reason metal > glsl/slang/...)
bro it's a constant the shader compiler will optimize out
remember i have the option to change working space in the ui
it's not hardcoded 2020
what's the problem with taking a matrix in the args buffer
template<typename T>
kernel void bar(device T *x) { … }
template [[host_name("bar_int")]] kernel void bar(device int *);
template [[host_name("bar_float")]] kernel void bar(device float *);
do this but for color spaces
(this is just a sample from the metal spec)
but why 
why not
oh i should make a col<T> type
cause it adds code complexity for no good reason?
how is it more complex
your conditionals for switching color spaces just become if constexpr, and then the few extra lines of template instantiations
ok. thanks
i don't have any conditionals in the shader right now
ah true
i also don't have any hardcoded conversion matrices
they're calculated from the primaries and whitepoint, which gives me the flexibility to define any colorspace i might want
probably not strictly necessary, but it's cleaner imo
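For reference, deriving the matrices from primaries and whitepoint can be sketched like this. It's plain Python, not the renderer's actual code, and it assumes both spaces share the same white point (D65 here), so no chromatic adaptation step is included. Helper names are made up.

```python
def xy_to_xyz(x, y):
    """Chromaticity (x, y) to XYZ with Y = 1."""
    return [x / y, 1.0, (1.0 - x - y) / y]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_inv(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def rgb_to_xyz_matrix(primaries, white):
    """RGB -> XYZ matrix from three (x, y) primaries and an (x, y) white."""
    cols = [xy_to_xyz(x, y) for x, y in primaries]
    p = [[cols[j][i] for j in range(3)] for i in range(3)]  # primaries as columns
    # Scale each primary so that RGB (1, 1, 1) lands exactly on the white point.
    s = mat_vec(mat_inv(p), xy_to_xyz(*white))
    return [[p[i][j] * s[j] for j in range(3)] for i in range(3)]


D65 = (0.3127, 0.3290)
REC709 = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]
REC2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]

M_709_TO_XYZ = rgb_to_xyz_matrix(REC709, D65)
M_709_TO_2020 = mat_mul(mat_inv(rgb_to_xyz_matrix(REC2020, D65)), M_709_TO_XYZ)
```

The middle row of `M_709_TO_XYZ` gives the familiar Rec. 709 luminance weights, which is a handy sanity check.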
I just slapped the whitepoint etc. into an online calculator and then got my matrices
and mightve copied quite a bit from jaker
🤓
you should render in the monitor's colorspace directly
why? there's no real benefit to matching the working and display colorspace afaik
better to render in as wide a colorspace as possible
how would you tonemap to the display then 
hmmmmmmm how do you do autoexposure for HDR output?
my SDR autoexposure maps the average luminance to middle gray (~0.18)
but i need an output in nits 
no idea, i don't do ae
but you should be doing that in hdr anyway, no?
so ig it would still be the same
it should be half of the sdr range
hdr doesn't mean you should always use the full range, if the entire dynamic range of the image fits in sdr you display sdr
what is SDR range 
good question 
lol
ask your monitor vendor
what a monitor does when given an SDR signal doesn't really matter
i mean you'd want consistency between sdr and hdr versions of the same image ideally
hmm
can't really get that
it's why HDR handling is so fucked
SDR isn't consistent in the first place
most monitors just have their own primaries, and ofc brightness can't be controlled at all
I guess that's what the 'paper white' settings in games is
yea
i mean as far as brightness goes middle gray is just "half the max sdr luminance of the display, at the current brightness setting"
as for the primaries well just assume the monitor is properly calibrated, if it isn't that's a skill issue

I guess I make that configurable
i mean you can calibrate output for the monitor's primaries ig if you know them
that's kinda what monitor calibration is
edid?
(it never is)

[200,200]
ooooh i can ask windows for the SDR content brightness and set my white point according to that
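A rough sketch of how that could fit together. It assumes a log-average (geometric mean) luminance for the exposure, and that `sdr_white_nits` is whatever the OS reports for SDR content brightness. All names are made up for illustration.

```python
import math


def log_average(luminances, eps=1e-6):
    """Geometric mean of scene luminance; eps avoids log(0) on black pixels."""
    return math.exp(sum(math.log(eps + l) for l in luminances) / len(luminances))


def exposure_scale(luminances, middle_gray=0.18):
    """Scale factor that maps the average scene luminance to middle gray."""
    return middle_gray / log_average(luminances)


def to_nits(scene_luminance, scale, sdr_white_nits):
    """Exposed value 1.0 maps to SDR/paper white; brighter values go above it."""
    return scene_luminance * scale * sdr_white_nits


scene = [2.0, 2.0, 2.0]
scale = exposure_scale(scene)
nits = to_nits(2.0, scale, sdr_white_nits=200.0)  # average ends up near 0.18 * 200
```

With this setup the SDR and HDR versions agree on everything up to paper white, and only the highlights differ.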
if you read his articles it makes some kind of sense
the last few are totally unhinged though
hg2dc?
i have read them
i still don't understand shit
yeah took me a while and i still don't get some parts
he sounds like a weird guy and i want to imagine he was totally normal before getting into color math
I don't mean just that reply...
color math more like just color
"not even once"
if you read the articles in order you can tell when he goes off the rails 
no thank you
i've been gone for a while cause uni semester started and i've been getting settled in but anyway
getting started with my jpeg lib
i've decided to just write a library so i can use it for other projects as well (and also because there's a distinct lack of simple image i/o libs that don't suck)
fastjpeg??
so here's a jpeg
it's basically libjpeg hello world but hey it's working
now i gotta make it work with different input data types (int16, float32 etc) and figure out how icc profiles work so i can do colorspace stuff
then, ultra hdr support with the gain map thingy
oh and also be better at cmake so i can have the thing fetch and build libjpeg-turbo instead of relying on it being installed in the system (which it probably is because it's ubiquitous, but i'd rather not anyway)
ye, preview works
so does any web browser, afaik
arc (chromium) does indeed display p3 jpegs, so does safari
ye but what about hdr
i mean even stuff like VLC can't fucking display HDR content on Macs
it just works™
i can open ultra hdr "jpegs" just fine in preview
and iirc works in a browser as well
what does "ultra hdr" even mean here?
macos is actually like the one os where hdr is sort of sane
well apple was very early on the hdr train
it's android's jank format for doing hdr in a regular old jpeg
i've only skimmed the spec, but basically you have a container file with two jpegs
the first one is the actual sdr image
second one immediately follows and is a gain map
any software that doesn't support ultra hdr will just read the first, sdr jpeg and display it
however in that jpeg, the app1 marker has some metadata
that basically tells whoever is reading the thing "psst hey this looks like a regular jpeg but actually it's hdr, there's another jpeg with a gain map appended to the end of it"
that software is then supposed to read the gain map and compute the hdr values
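Roughly, "compute the hdr values" means scaling each linear SDR pixel by a per-pixel boost stored (in log space) in the gain map. The sketch below shows the general idea only; parameter names and defaults are assumptions, and it's not a spec-accurate decoder.

```python
import math


def apply_gain_map(sdr_linear, gain_encoded, min_boost, max_boost,
                   weight=1.0, gamma=1.0, offset_sdr=0.0, offset_hdr=0.0):
    """Reconstruct one HDR channel value from SDR + gain map.

    sdr_linear:   linear SDR pixel value
    gain_encoded: gain map sample in [0, 1]
    min_boost, max_boost: boost range from the file's metadata
    weight:       0 = show plain SDR, 1 = full HDR (depends on display headroom)
    """
    log_recovery = gain_encoded ** (1.0 / gamma)
    # Interpolate between min and max boost in log2 space.
    log_boost = (math.log2(min_boost) * (1.0 - log_recovery)
                 + math.log2(max_boost) * log_recovery)
    return (sdr_linear + offset_sdr) * 2.0 ** (log_boost * weight) - offset_hdr


hdr = apply_gain_map(0.5, 1.0, min_boost=1.0, max_boost=4.0)  # fully boosted pixel
```

Setting `weight` to 0 gives back the SDR image unchanged, which is the graceful-degradation part.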
just use a normal format what the fuck
it's absolute jank but hey it works, degrades gracefully to a sdr jpeg, and is more widely supported than... uhh most hdr formats
if this thing i'm making (i'm calling it simple_jpeg) turns out well i might make simple_avif or whatever
i disagree
oh stbi 2.0
exr is not good as an output format though
but with less suck
afaik stbi does the decoding itself and it's slowww
i'm just wrapping libjpeg and adding colorspace/hdr stuff
i mean i want to actually make this thing and have p3/hdr output for my renderer, not bikeshed over the jpeg spec forever
supports png and jpeg under one api
and is quicccc
oh wait nevermind
doesnt include encoders
yea
it also has no docs 
and the c api is generated from their nih language thing
but yea decoding only iirc
really?
i remember it was quite easy to get started
heavy use of c++ in the api so it has to be good
if it has i could not find them
it's a whole ass language they came up with
They actually managed to achieve memory safety^tm
fake, it's not written in rust so it's obviously unsafe ||/s||
@winged veldt maybe you can be interested in this https://psychopath.io/post/2022_07_24_owen_scrambling_based_dithered_blue_noise_sampling
Blog also contains info about blackmagic luts for color manipulation
I found an implementation of parts of psychopath's Owen-scrambled Sobol sampler
https://github.com/NVIDIA-RTX/RTXPT/blob/main/RTXPT/PathTracer/StatelessSampleGenerators.hlsli
got icc profile embeds working, the left image is display p3 and right is srgb (they contain the exact same data not adjusted for color space, so the left one should be more saturated and clip differently on srgb displays)
now working on a minimal icc profile generator so you can write images in any arbitrary colorspace
turns out a minimal icc profile is pretty shrimple, you can basically just define the primaries and transfer function and that's it
i realized something amazing today
for all the talk about colorspaces i do, my gltf importer doesn't import base color as srgb
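For context, glTF base color textures are sRGB-encoded, so they need the sRGB decode applied before shading (the `baseColorFactor` is already linear). A minimal sketch of the decode for one channel:

```python
def srgb_to_linear(c: float) -> float:
    """Decode one sRGB-encoded channel value in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92                    # linear toe segment
    return ((c + 0.055) / 1.055) ** 2.4     # power-law segment


mid_gray_linear = srgb_to_linear(0.5)
```

On the GPU the same thing usually comes for free by loading the texture with an sRGB pixel format.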

hey mate
you know you're on mac, can you give me your compiled dylib for imguizmo?
I can't figure out how to compile it and of COURSE there's no compiled binaries anywhere online
:C
@winged veldt :)))))
btw I did a cool thing I need to make a thread on community projects for my "engine" 😄
also do you know why on mac, when going into fullscreen imgui looks sharp, but in a window it's slightly blurry, it's also breaking my fucking picking as the viewport pos I'm clicking doesn't align perfectly with the render texture faaack
like
in fullscreen it works perfectly
in window, it has a y offset so you can click top of an object it won't select but under the bottom where the object isn't it does
like by maybe 2-3 mm on my screen (13" screen)
😦
might be your metal layer's contentsScale isn't getting updated
It’s OpenGL via openTK
should probably move the discussion over to #opengl then
Yeah mainly was asking bluescreen anyway
I know it’s to do with retina dpi
Like I have my Mac scaled to 2560x1200 I think it is
Which makes dpi scale 1
Going to recommended res makes the dpi scale 2 and my entire imgui drawing ends up cut into the top left quarter
Mac is strange like that
yeah retina layer scaling is a bit jank
It also sucks for scalability when it comes to their ui, if you run at the legitimate max res
It goes tiny
Like idk I don’t understand if running native res makes it sharper I don’t think it does
Just makes the ui go tiny for everything
Retina is very strange yeah
But yeah I think what I need to do is derive the dpi scale from the openTK window and multiply my gl viewport with it
Btw @winged veldt tell me to go away if you don’t want me to message here 🙂
My work in progress windowing 🙂
if you can get at the NSWindow that contains your application the relevant value is the backingScaleFactor ivar
which is always either 1.0 or 2.0 afaik
Hmmmm I can calculate the dpi scaling anyway mate, using Size and Width/Height
Size is the larger of the two
Physical size ig
I just need to find the correct place to stick in the imgui windowing stuff
But I’m pretty sure it just goes in onresized
sorry, didn't integrate imguizmo yet—it's pretty high on the todo list though, so i'll probably get around to it after i'm done with the colorspace stuff
i don't know when that'll be, i've put my projects essentially on hold for a bit as life stuff happened
might come back to this next month, or whenever i have the time and regain motivation to work on it
probably your dpi scaling is fucked
how are you handling it?
I’m not currently lol
well yeah, you're going to 1x scaling on a display with tiny pixels
I wasn’t aware of Mac dpi
Yeah I run my Mac on higher res so like
It’s 1x dpi scale so when it’s that resolution it behaves
I don’t like how big everything is on the suggested res
But I think i just need to add the scale factor for imgui when I do the resize event
though iirc i don't handle changing monitors so it might break if you have two displays with different dpi scaling settings
it's pretty straightforward dpi scaling
Yeah it’s just weird to me because it’s a Mac specific thing
Windows always runs native monitor res so
Unless you like blurry shit
no it doesn't
you can set scaling and it defaults to 150% iirc in a lot of laptop displays
It’s also doubling my res for the fbos when dpi scale goes up
Oh yeah but it doesn’t do this shit lmao
windows having major scaling skill issue is a separate thing
Like what I mean you just don’t need to worry about your window size being different to pixel res on windows usually
I don’t do any dpi shit atm and had no problems with windows but mac it broke as soon as i changed res to test
But I may consider going to recommended res on Mac as it fixes the blurry imgui
yeah idk if windows handles it behind the scenes but mac does expect you to handle dpi yourself
Yeah haha
It’s fun anyway
Also guess what
Metal being the actual api on openTK on Mac has its perks
/*
* Scale fonts for high DPI rendering
* TODO: should rescale on monitor change
*/
int renderWidth, windowWidth;
SDL_GetRendererOutputSize(m_sdlRenderer, &renderWidth, nullptr);
SDL_GetWindowSize(m_sdlWindow, &windowWidth, nullptr);
m_dpiScaling =
static_cast<float>(renderWidth) / static_cast<float>(windowWidth);
I can't for the life of me get 10bit output on windows from my engine lol
that's what i do with sdl
Yeah that’s all I need to do and then pipe in to imguicontroller.resized
Just scale gl viewport with the dpi
Basically anyway
Imgui resize in that controller does more stuff
Like ui mouse hit detection has to work with scaling too so
Will see if it fixes it when I pipe in the dpi scale
yeah you need to use the scaling factor in multiple places, input is one of them
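For the input side, a minimal sketch using the same ratio as the SDL snippet above: mouse positions arrive in window coordinates (points) and have to be scaled before comparing against the render texture. Names are made up; this isn't the OpenTK or imgui API.

```python
def dpi_scale(render_width_px: int, window_width_pts: int) -> float:
    """Ratio of framebuffer pixels to window points (2.0 on a typical retina setup)."""
    return render_width_px / window_width_pts


def window_to_pixel(x_pts: float, y_pts: float, scale: float):
    """Convert a mouse position in window points to framebuffer pixels."""
    return (x_pts * scale, y_pts * scale)


scale = dpi_scale(2560, 1280)
pick_pos = window_to_pixel(100.0, 50.0, scale)  # use this for picking
```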
Yeah
I’m pretty sure tho
The imguicontroller does it
Simply by calling resize
It should get the pixel width then
what's the imguicontroller
ah
you're on your own with that, i don't even know what opentk is
Let me know if you need to build the Dylib
Either way I would legit buy you a game on steam if you can give me the Mac lib for gizmo haha
i mostly statically link my deps tbh, but i'll share my cmake to do so
It’s not in any nuget packages
Ah ok heh well if you could that’d be ace
I’ll treat you 🤣
i'll let you know when i do
Thanks mate
no need for that lol
What you been up to lately, this thread disappeared and I had to search metal path tracing to find it because I forgot the name haha
It fell off my discord pinned threads haha
Btw I’m not versed in c++ I only use c# - dynamic linking is when there’s a DLL and static is when the lib is part of your app?
yea
just life happened, been going through some stuff and also super busy with uni and work so i haven't had time to work on any projects
i do want to get back to this soon, though, there's a lot i want to experiment with
Yeah, someone posting in showcase has some nice caustics idk if you’d be able to do that real time tho idk
Oh but then you aren’t doing real time anyway yet so
"yet" lol
i'm not even planning on it
least, not for this project—i might mess around with realtime rt down the line, but it's not in scope for now
though i do want to try implementing better light transport for caustics etc, starting with bdpt
just needed to get the colorspace crap out of the way first—i actually already have working jpeg output for display p3, so i think rather than adding support for arbitrary color spaces, cleaning it up and making it a library as i originally intended i'm just gonna integrate it into platinum and be done with it
Ahhhh good idea 🙂 and yeah deffo would be cool to see caustics 🙂
yeah bdpt and volumetrics are the two things i most want to try out
ok but when hdr tonemapping 
by my back of the napkin calculations, bdpt is at least twice as good as udpt
what about tdpt then
no
i mean some day
but probably not any time soon

or i might just tackle it now, idk
that was the plan, anyway
i'm just not too motivated to work on hdr when i'm missing a bunch of more important stuff, tbh
I'll have to do the math
if you like you can just yoink my thing when it's done and then scale it for pq or whatever.
if that's what HDR is in this case... I think, assuming Rec. 2020 working space, it could just be something like
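```python
m1 = 0.1593017578125
m2 = 78.84375
c1 = 0.8359375
c2 = 18.8515625
c3 = 18.6875

def pq_inv_eotf(x):
    # x is linear luminance normalized so that 1.0 = 10000 nits
    ym = pow(x, m1)
    return pow((c1 + c2 * ym) / (1.0 + c3 * ym), m2)

# transmittance, some_projector_brightness, ootf and paper_white_nits are placeholders
rec2020_lin_val = transmittance * some_projector_brightness
pq_val = [pq_inv_eotf(ootf(x) * (paper_white_nits / 10000.)) for x in rec2020_lin_val]
```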
in theory, projector brightness could just be a, uhh... HDR-ness control knob of sorts, with radiance extremes already reined in
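A self-contained version of the PQ encode for anyone who wants to try it, with the inverse added so it can be round-trip checked. The constants are the standard PQ (SMPTE ST 2084) ones written as fractions; `encode_nits` and its clamping are illustrative additions.

```python
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_inv_eotf(y: float) -> float:
    """Linear luminance (1.0 = 10000 nits) to PQ signal in [0, 1]."""
    ym = y ** M1
    return ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2


def pq_eotf(e: float) -> float:
    """PQ signal in [0, 1] back to linear luminance (1.0 = 10000 nits)."""
    ep = e ** (1.0 / M2)
    return (max(ep - C1, 0.0) / (C2 - C3 * ep)) ** (1.0 / M1)


def encode_nits(nits: float) -> float:
    """Clamp to the PQ range and encode an absolute luminance in nits."""
    return pq_inv_eotf(min(max(nits, 0.0), 10000.0) / 10000.0)


signal_100 = encode_nits(100.0)   # roughly half the signal range
```

100 nits landing near the middle of the signal range is why un-tonemapped SDR-level content looks dim on a PQ swapchain unless it's scaled by a paper white value first.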
or - wait, you're doing apple stuff, right? I guess they have their own display p3 thing?
yeah, haven't looked into hdr output
wikipedia says dci uses pq, but apple is .... apple, so I have no idea
it's probably pq
oh good, a new version of openimageio broke the project
i hate image libs so much
errr... I dunno
I've had nothing but trouble with oiio
I guess they use PQ for consumer stuff anyway.
yeah it's kinda dogshit
but what image i/o lib isn't
stb is simple but horrendously slow and only really handles jpeg and i think png
python's imageio is mostly decent, but I'm guessing that doesn't help you
oiio is pretty solid and supports a lot of formats but it's a major pain in the ass to build and/or link hence why i end up relying on it just being installed on the system
pain in the ass to use too
I'm not a fan of their docs or API, IIRC
but then apparently they make breaking changes on a patch bump and my code no longer compiles against it (3.0.4.0 vs 3.0.1.0)
api is kinda jank but fine, i don't think you can do much better while being as generic as oiio is
the docs are kinda ass but at least they fucking exist lmao
stb has some comments in a header file, wuffs has literally nothing
might just deal with libjpeg, libpng and libexr directly, i'm already switching to libjpeg for output to support color spaces
but also i wish writing images to a file would just get out of the fucking way, because it's very boring code to write and has nothing to do with what i actually want to do (rendering)
I don't know. I find it very sparse and lacking options where it should be detailed and explicit and very thorough everywhere that idgaf.
the state of image io in general strikes me as surprisingly raw and immature-looking for how ancient and fundamental it is to just about anything
not the library, but just all of it
yeah you'd expect there to be at least one decent lib for it by now
honestly, i might just extend my libjpeg wrapper to input, add support for png and exr, and forget about image io libs
i don't want to spend the next month writing image i/o code, but it might be easier long term
like, even in python, which is reputed to be largely plug-and-play for most things, I've concluded on several occasions that the most appropriate image format is a raw binary dump
because there's always something... tiff API is broken, EXR is corrupted on linux, etc, etc, etc
ppm is unironically the best image format
PIL/Pillow is beyond help
that's like... too far the other way.
API simplified to the point that you get a fisher price image library.
"save this for interwebs and I don't care how"
yeah i'm convincing myself writing a libX wrapper is the way to go
reading a little more about HDR standards now I would like to reaffirm my position that the whole thing is unhinged and improvisational
Tiff is a horrendous format really
What formats you want to use also?
I've briefly contributed to a tiff decoding library and while the spec is nice, it's missing a lot of details. TIFF is one of those "the spec is what the official implementation (libtiff) does".
And libtiff's source code is... one of the worst I've read. Like no wonder the thing gets CVE's almost every year.
It's the first codebase where I've seen the register keyword used. It's that ancient 
I don't think the format does anything particularly well.
ah, yeah that sucks
if you're gonna have the code be the spec at least make the code good 
fair enough, it's kind of an ancient format and dng/exr probably do a better job at what it's used for
tbh so are jpeg and png but we're still using them
need to read exr (for luts and hdris), png and jpeg (gltf textures), write jpeg (render output) and exr (raw output, lut gen)
tinyexr?
For saving PNGs I use lodepng. It's slow but compresses well
i'm just gonna use libexr and libpng directly in my own abstraction
Or you could convert all textures to QOI with a cmdline tool 
Why not tinyexr?
since tinygltf I am quite scared of “tiny” libs
because of it you start to use "fast"
because i've had enough of dealing with someone else's shitty library
i'm making my own shitty library instead
what's wrong with tinyexr? I used it for reading and writing with no problems
fastexr???
boost 
fast uhh something
i'm probably just going to write a wrapper around libexr/libpng/libjpeg though, unless they're bad enough i actually go fuck it, i'm doing it myself
but i really hope it doesn't come to that because i don't actually want to write image i/o code
well to be fair I would rather write a wrapper around that thing google made
forgot the name of it
wuffs?
yes wuffs
use wuffs as much as possible is what i'd say
and then use other libraries for the things it doesnt support
wuffs scared @winged veldt 🤣
🐕
it only does decoding, and isn't documented, like at all
so that's a no from me
you need decoding for importing no?
and it is documented I am fairly sure
or use my code as documentation
yeah, but i'd rather not have two separate libs for encoding and decoding

what difference does it make
there's gonna be two separate code paths regardless
if whatever i use for encoding decodes as well might as well just use that, reduce the amount of dependencies
yeah, i know wuffs is fast, but i'm not decoding png in a performance critical context anyway
if it is, i could never find it
honestly the whole thing is just confusing as hell to me, every time i've looked at the repo i just have no idea what's going on
yeah true
I just use it because I don't need exporting, and it is safe and fast
basically my entire graphics programming endeavours boil down to having the fastest possible asset loading times
never got to the graphics part 
yea, i do need exporting and don't care about safety or speed (within reason, stb is just too slow)
though honestly i'd just use stb despite the slowness if it had the features i need
actually idk what you've worked on besides fastgltf but it's great
i wish more libs were like that
use it for importing. for exporting, have a popup tell user to photograph screen with phone
I made a DDS library
uh for graphics that's about it
I had a raytracer back in the day which didn't converge correctly and had broken diffuse shading, I had a Vulkan mesh shader renderer that just put triangles on the screen, I had a Metal mesh shader based renderer where I actually managed to get cascaded shadow mapping in, and I had a Metal based mesh shader based renderer with a basic VSM impl that borked the drivers and Xcode
I think that's about all I've managed to achieve 👍
oh and I've done MVK work
it's all just random garbage basically
besides my libs
tinyexr is not great tbh
I use it in my renderer and would like to replace it eventually.
What's wrong with it?
I had a problem with outputting images in XYZ iirc.
WE'RE SO BACK
i got the thing building again, without relying on clion now just plain cmake+make
yeeted oiio, added tinyexr for lut loading
just need to bring in a png lib to bring back texture loading and render export and i can start working on it again
imgui 
a nice font and a bit of custom theming goes a long way
it supports a light theme as well
damn
doesn't look like imgui at all
btw I meant the whole render and everything looks amazing
do you like argentinian bbq
once i get everything working again i want to get volumes in
are you in buenos aires?
I was thinking of visiting clarolab team
the flight is like 14 hours or something :S
it's crazy though 14 hours and very little jet lag
because the direction is north to south
my brain has a hard time with that
yeh, it's a long flight
i'm used to them lol
flight to europe is about the same time
you should visit, it's a cool city
if you have the time travel around the country while you're at it
that sounds nice
hdri loading is back in, now supports exr (with tinyexr) and hdr as before with good ol' stb_image
uh oh my glass may have a little nan problem
so the only thing that changed is the lut loading
and the other materials work fine, so it's gotta be some issue with the 3d luts the glass material uses
It's honey for my eyes 🍯
You lowkey really have amazing looking ui
i think tinyexr might be ass
it can't cope with single channel exr files, just spits out rgba data so now i have to process it instead of doing a memcpy to a buffer
got it working, we're so back
gltf import is back, i now just fall back to stb_image for texture loading because it handles different formats seamlessly (unless it's exr, then it uses tinyexr)
it's slow, but whatever, i'll just multithread asset import later
p3 output works 🎉
if your display is srgb those two images should look exactly the same, but if you have a p3 display (and an image viewer that's properly color managed) the walls should be way more saturated in the left image
Suurrelly Discord is an image viewer that's properly color managed
actually, yes
no waydge
it's chromium which handles color profiles fine
nice
So what's next now that the image IO stuff is sorted out?
uhh i still need to fix the lut export i think then i'm done
i mean
not that i'm using it anymore, but i do want it to work
i think ima add support for volumes next
simple beer's law first, but i want to do it in such a way that it supports proper volumetrics as well and not just the ugly hack yart had
which does mean messing around with the renderer architecture a bit, because now i'll need to actually track what medium rays are in
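A minimal sketch of what that could look like: Beer's law attenuation along a ray segment, plus a tiny stack tracking which medium the ray is currently inside. `RayState` borrows the name used later in the thread, but the fields and methods are invented for illustration.

```python
import math


def transmittance(sigma_a, distance):
    """Beer's law: per-channel fraction of light surviving `distance` in the medium."""
    return tuple(math.exp(-s * distance) for s in sigma_a)


class RayState:
    """Tracks the media a ray is inside (hypothetical layout)."""

    def __init__(self):
        self.media = []  # stack of absorption coefficients, innermost last

    def enter(self, sigma_a):
        self.media.append(sigma_a)

    def exit(self):
        if self.media:
            self.media.pop()

    def attenuate(self, throughput, distance):
        """Apply the current medium's absorption to the path throughput."""
        if not self.media:
            return throughput  # in vacuum, nothing to absorb
        t = transmittance(self.media[-1], distance)
        return tuple(a * b for a, b in zip(throughput, t))


state = RayState()
state.enter((math.log(2.0), 0.0, 0.0))         # red halves every unit of distance
tinted = state.attenuate((1.0, 1.0, 1.0), 1.0)
```

A stack handles nested volumes; proper volumetrics would also need scattering, which this leaves out.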
Yeah the RayVolumeState is quite a bit of a feature creep in my renderer, you always have to bring this one along in many places because there are many places where you may want to know if you're inside a volume or not, it's a bit annoying
May not even be a bad idea
keep it as shrimple as possible
I suspect my RayVolumeState is spilled to gmem anyways because of how long its lifetime is
So having it in gmem just simplifies the variable passing a bit and may not even be slower than having it in (spilled) registers
hm wdym
i'm not too familiar with memory management on the gpu
it would still be thread local ofc
just a global in the sense that i'm not passing it around as a parameter all over the place
yeah so in a global VRAM buffer right?
thread local because each thread wants its own state but by "global" you mean in an SSBO?
shader storage buffer object
just a buffer
like the one you store your materials in
oh
and geometry indices and whatnot
no need for that though
i don't need that data cpu side
i'm talking about literally just a global variable in the shader
thread RayState *rayState;
done
Ah so it behaves like a local variable just that it's accessible everywhere?
I see
if it doesn't i can just pass it around whatever
then:
Basically the ray volume state is created at the very beginning of your kernel (where your ray is created)
And you have to keep it around until the very end of the kernel when your ray exits the scene / has bounced the maximum number of times
So the ray volume state exists from beginning to end. That's why its lifetime is long.
The few cosine terms and contribution terms when evaluating NEE for example don't have a long lifetime because they're only there for a few lines of code and when you exit your NEE(), these variables are "freed" and registers allocated for them are also freed
The ray volume state is never freed, from the beginning of the kernel to the end. And big-ish variables / structures that have long lifetimes like that are often spilled by the compiler ime to save on vector registers usage
right
yeah in that case it's probably getting yeeted to vram
i honestly don't think about that too much
probably yeah
how many registers does a gpu have
I didn't either until I realized that the sole existence of my full BSDF code halves the performance of my entire rendering if not more
wait how
depends totally on the GPU. I literally have no idea how that works on Apple
In general it's 65k per SM for NVIDIA/AMD
oh that's a completely different ballpark from like cpu registers
oh ye
are we even talking about the same type of thing
yes
huh
it's just GPUs have a massive ton to allow for ultra fast context switching basically
how does that not make the chips chungus
also, what was causing this?
because the rest of the GPU core are quite small compared to a CPU core, that's why GPU cores are way slower on their own than a CPU core
yep hold on, writing two things at the same time, the explanation is cooking in the clipboard 
Because the BSDF code is so massive that it requires a ton of registers. So many that tons of registers are yeeted to VRAM.
And I suspect (haven't checked the assembly though) that the registers spilled to VRAM are the ones used in the eval/sample function of the BSDF
This makes the eval/sample functions super slow because they're pretty much operating on VRAM instead of registers lol. That's a bit of a simplified way of saying that but that's the idea
And so the difference between
- a Lambertian BRDF
- my Uber BSDF with all materials using parameters such that only the diffuse lobe is evaluated
is 2x in perf
Quite big, it's almost a full OpenPBR impl basically. It's around 2.5k lines I'd say, 8 lobes
i've thought about using metal's fancy compile time branching thing to build specialized shaders for each material before rendering
might do that at some point, see what it does for perf
would also eliminate a lot of branching in the shader
Yeah I think the best way to go for me at this point is wavefront path tracing + material sorting + specialized shaders such that it only compiles the shader code that is exactly needed for the material being path traced: do not compile the chonky glass lobe / thin-film interferences / double-metal / ... if the material is just specular + diffuse
I think doing this could be really good for registers (and thus perf)
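A rough CPU-side sketch of the material sorting step in that wavefront setup, assuming each live path already stores the ID of the material it hit. The names `MaterialBins` and `sortPathsByMaterial` are made up for illustration, and on the GPU this would be a compute pass rather than a host loop. The idea is a counting sort, so each specialized shader gets one contiguous range of paths:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: paths grouped by material.
struct MaterialBins {
    // Paths of material m live in sorted[offsets[m] .. offsets[m + 1]).
    std::vector<uint32_t> offsets;
    std::vector<uint32_t> sorted;  // path indices, grouped by material ID
};

// Counting sort of path indices by material ID (assumes every ID < materialCount).
MaterialBins sortPathsByMaterial(const std::vector<uint32_t>& materialOfPath,
                                 uint32_t materialCount) {
    MaterialBins bins;
    bins.offsets.assign(materialCount + 1, 0);

    // 1. Histogram: how many paths hit each material.
    for (uint32_t m : materialOfPath) bins.offsets[m + 1]++;

    // 2. Prefix sum turns the counts into start offsets.
    for (uint32_t m = 0; m < materialCount; ++m)
        bins.offsets[m + 1] += bins.offsets[m];

    // 3. Scatter each path index into its material's range.
    std::vector<uint32_t> cursor(bins.offsets.begin(), bins.offsets.end() - 1);
    bins.sorted.resize(materialOfPath.size());
    for (std::size_t i = 0; i < materialOfPath.size(); ++i)
        bins.sorted[cursor[materialOfPath[i]]++] = static_cast<uint32_t>(i);

    return bins;
}
```

Each specialized shading kernel would then be dispatched over `offsets[m + 1] - offsets[m]` paths, so a diffuse-only range never touches the glass or thin-film code.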
Would you have to do sorting too then?
the thing that worries me a bit with wavefront path tracing is that it doesn't like path guiding 🙁
or path guiding doesn't like wavefront
I see it
in chrome browser
works as is
why do my photos I send from phone never have hdr previews though
if you could get p3 to show up
no idea, discord krill issue i think
hdr images only show in hdr when i zoom in for me
well shit
just tried to render intel sponza and it froze the entire os 
this worked before
at least it's not setup_loader_term_phys_devs: Failed to detect any valid GPUs in the current config that I have
though I do wonder why entire os has to die
likely something the gpu driver didn't like
i had full os crashes before when i had like an infinite loop in the shader or something
have you tried getting a better driver/os
shut

ok yeah i'm definitely doing something to lock up the gpu
the question is wtf is it
intel sponza rendered fine before and not much changed
well uhh
the old build also crashes
and if i import the gltf instead of loading my old scene it renders fine
what the fuck
gippy yous 
gippy moment
sponza now with curtains, ~5 min at 4k samples
i should actually do stuff lol
anyway nice to have this functional again, and exporting in p3 too
rec2020 smh
exactly 



👀
i was planning on moving onto volumes right after doing mis
we can be volume buddies
https://github.com/teofum/platinum/blob/975dd47a8884a4e5e4048f5f6c91fa43dcf051a5/src/renderer_pt/shaders/kernel.metal#L554 @winged veldt why do you consider the bxdf sample when sampling direct lighting,,
no point doing NEE if the brdf is perfectly specular?
Not doing NEE on emissive materials? What is this biased felony 
am i not supposed to do that? the specular flag indicates the bsdf is a dirac delta so there's no point doing nee (bsdf pdf will be zero)
actually i'm not sure why emissive is in there 
i think it used to be a separate lobe with a nonsense pdf, so i skipped nee if i sampled the emissive lobe
but then i merged it into diffuse so it should be fine
The flag indicates that for a particular sample of bsdf though
Not the entire bxdf
Which is my point
Your bxdf might have both specular and diffuse components
But if you picked specular component when sampling you won't do nee
Even though it would still be beneficial??
Because there's diffuse component
Pbrt snippet I linked shows this btw
They have a flags method on bxdfs
That returns flags of all components basically
That contribute to bxdf eval
i'll try fixing that later today, cause i think i do have some missing energy from just that type of material
Read,,
thanks for pointing that out btw
Ye np
i probably have countless issues in my mis and bsdf code 
I'm dumb at light transport, I'm soaking things up from pbrt but I'm looking at your code for how do you organize things
Btw you should add programmable materials
It makes everything at least 10x more fun

Sampling is as easy as before tbc but you get more plumbing to piece together
ngl i'm also dumb at light transport
my bsdf is probably a mess so feel free to keep pointing out anything wrong/shitty code it helps
how's that work
Alr
like blender's node based thing?
Yeah
yeaa i'm skipping that for now 
@keen bear removed the condition on light sampling, doesn't seem to make any visual difference in the render 🤔
but maybe it'll help convergence on a more complex scene
it's certainly more correct, though
i've changed it to just the bsdf itself not being fully specular (either roughness >0, or metallic+transmission < 1 so it has a diffuse component)
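A minimal sketch of that check, in the spirit of the pbrt-style flags method mentioned above. The flags describe every lobe the BSDF can evaluate, not just the lobe that happened to be sampled. The `Material` fields and the flag names here are assumptions for illustration, not the actual platinum structs:

```cpp
#include <cstdint>

// Hypothetical lobe flags; the names are illustrative only.
enum LobeFlags : uint32_t {
    kLobeNone     = 0,
    kLobeDiffuse  = 1u << 0,
    kLobeGlossy   = 1u << 1,  // rough reflection/transmission, finite pdf
    kLobeSpecular = 1u << 2,  // Dirac delta lobe, NEE can never hit it
};

// Hypothetical material parameters.
struct Material {
    float roughness;
    float metallic;
    float transmission;
};

// Flags of ALL lobes that can contribute to eval(), independent of which
// lobe the last sample() call picked.
uint32_t bsdfFlags(const Material& m) {
    uint32_t flags = kLobeNone;
    if (m.metallic + m.transmission < 1.0f) flags |= kLobeDiffuse;
    flags |= (m.roughness > 0.0f) ? kLobeGlossy : kLobeSpecular;
    return flags;
}

// Do NEE whenever at least one non-delta lobe exists.
bool shouldSampleLights(const Material& m) {
    return (bsdfFlags(m) & (kLobeDiffuse | kLobeGlossy)) != 0;
}
```

With this, a smooth coat over a diffuse base still gets light samples even when the specular lobe was the one picked for the bounce.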
also i clearly have a major firefly problem
i suspect something in my bsdf is bad
I'd figure it should yeah
if the light is double sided, i would edit the model to bring the light even closer to the ceiling just to be sure it's not the top of the light that's hard to sample and generating those fireflies
i'm getting fireflies too with your scene despite only having diffuse materials
hmm, doesn't change anything, so i don't think it's that
the light is double sided yea, but it's just two tris
so if it gets sampled you can't sample one specific side, if that makes any sense
if a ray happens to hit the top—or sample the light from the ceiling then yeah it'll hit the backside, but that should be no problem
though that does make me think if a sample at grazing angles might be causing fireflies with some insane pdf due to numerical instability
but it's much more unlikely to happen, no? especially after a few bounces
hm, there may be multiple sources of fireflies
you have way fewer than i do
how many samples is that btw
the left was 15k i think, but honestly it was getting worse over time
changing the light to a sphere should minimize grazing angle nee hits, but does little to reduce the fireflies
you're right it isn't many fireflies
lmao i'm doing like 250 samples
15k takes forever
lemme just do a lower res render
uhhh yeah 2.5k and counting, i definitely have waaay more fireflies
i don't have a pure diffuse brdf to test with though
i would hope i don't get fireflies with that cause it's super shrimple
could also be caustics maybe?
don't think so, it's consistent across almost every scene even without shiny materials
made a new material by mixing metallic with diffuse just for testing and yeah, it's noisier but still not comparable to yours
yeah ima try to figure out what's causing all the fireflies
ugh
i wish i could write my shader code in a real language
fuck shading languages
not having any lsp support or even syntax highlight in my editor sucks absolute ass
that's not even the issue, it's the complete lack of syntax highlight
tbf there are lsp implementations for glsl and slang
cause people like actually use those
msl is a pain in the ass to work with because there's zero support outside of xcode
and i'm not using xcode
tbh when kosmickrisp enables vk rt on mac i'm probably rewriting this thing with a vk backend
funny enough clang-format actually works just fine with msl because it's just c++
but the clangd lang server shits itself with all the custom keywords and doesn't find the include dirs
so you're basically coding in plain text? that does not sound fun
yeahh
i did manage to enable nvim's basic syntax highlight so i can at least tell keywords, comments, and numbers, but that's about it
honestly your codebase seems small enough that a shrimple text editor, maybe with syntax highlighting + git grep is fine no?
I do struggle slightly without go to definition on larger codebases but these tools also often fall apart on the said larger codebases
like e.g. vscode's go to def in linux (the kernel (the sorse)) often lands me at incorrect definitions so I have to resort to git grep again
it works slightly better with mesa but our driver does some cursed macro magic (there's other drivers that do the same thing) that vscode doesn't cope with and so I'm often back to using git grep again
I guess it's just a cee moment (similarly applies to c++ ofc)
it's manageable, and at least the autoformatter does work
but without lsp support the syntax is kinda ass
lol I wish clang-format was actually good and we had ci require that things are formatted
this is what nvim basic syntax highlighting looks like
better than plaintext but barely
and no autocomplete 
i mean clangd should work on msl with a little trickery
if i could just get it to see the include paths
I don't get autocomplete at work 
well
it works sometimes
the other times it's borked
what the fuck
because of the cursed macro magic
oh
leme show lmao
oh no
fwiw xcode's autocomplete is also broken
so the situation is not better using apple's own stuff 
when I worked as an ue monkey, VS proper (i.e. not vscode) would also be broken a lot
so I had to use,, you guessed it, git grep
for ue
yes
and ue code is NOT straightforward to navigate

god I love when my sbc randomly reboots for no reason
alright I give up I'll boot into stable kernel
probs something is borked in kernel with my changes
so uhh to give you the first impression
vsc doesn't believe this is a function (but it is)

i'm working on UE rn and clangd works almost perfectly tbh
autocomplete? yeah sure buddy
all the UHT macro spam resolves fine too
btw do you work on gippy drivers?
vX files are compiled once per gpu arch
with panvk_per_arch and bunch of other things substituted with names identifying the arch
only similar issue i ever had at work was certain libs murdering the ts language server
just need to wait 20 min for reindex when i realize i didn't include a target when genning compile_commands 
I'm not a fan of this approach tbh but the other extreme is doing things like radv does and I feel like that's worse
in drivers using genx approach (like the above), the hw stuff is represented with structs
in drivers like radv which don't have this cursed macro magic things are arguably more tedious, you put things into bags of uint32_ts or w/e by yourself, shift and mask bits around, etc
anyway back into the kernel mines I go,,
so either way you don't get typechecking lol
you do get typechecking with genx approach
it's just that fancy tools fall apart
well yea at compile time right
hopefully eventually things get rustified at least a lil
(and hopefully it won't be taking morbillion years to compile, as many rust codebases seem to tend in that direction)
it only uses 50 gigs of ram while doing that 
and 70% of a 64 core threadripper 
my work laptop has 16G soldered I'm suffering
I got 48G for gf so she can have morbillion layers in krita
but I myself continue suffering
i had a work laptop and it was too slow so they sent me a workstation 
I compile stuff on the sbc
easier than cross compiling 
and the sbc is fast enough for compiling linux and mesa (though not vk validation layers, or vk cts, those are too chungus)
i need the threadripper and UBA to compile in a sane amount of time
(still takes 20 min when i get a new build)
the only downside is that it has an nvme and in summer nvme would get so hot in some workloads it would trip the thermal fuse
so like yeah
it would throttle the soc down into oblivion but it would still not help because even fully throttled cpu is fast enough to feed nvme with enough work to blow past whatever is the thermal threshold
still, your iteration times seem a lot nicer than mine 
in certain workloads
yes
I hate C++
I did some work on ANGLE in the past
it was pain
my iteration times suck less because of c++ and more because of UE
well in case of ANGLE it's also down to how ANGLE uses C++ but ime a lot of C++ codebases just tend to have horrid compile times
so unless there's additional knowledge/assumptions, I prefer to avoid working with C++
but yeah UE also has this uhh
the last time I touched UE was in 4.26 times or something
I do recall there was a weird step before actual compile taking like 10-20 seconds and barely utilizing the hw
UHT?
also my workstation at the time was i7 6700 (non-k) iirc
though when I moved to wfh I got 5900x
5900x was p good tbh
though
I do feel like windows was making an impact on compile times
ok actually i have to compile the engine 4 times, so uh
might be better if you don't
I remember git grep being noticeably slower on my 5900x with nvme ssd running windows than on my laptop with i5 6300U with sata ssd running fedora
I think I had to tinker with windows guard or w/e it's called to make things not garbage
but ofc it became the fastest when I stopped being an ue monkey, wiped windows and my 5900x became busy running linux instead of windows
Can't you force treesitter to highlight it as a cpp file ?
It's cope but atleast it will be better than this lmao
@winged veldt have you considered getting rid of LightSample::pos in favor of dist or smth
so pos would be equivalent to hit.pos + LightSample::wi * LightSample::dist
infinite lights like env light could specify dist = inf
so then the code that handles light sampling would do min(ls.dist, max ray dist or w/e)
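Roughly what that suggestion could look like, as a sketch only. The field names follow the messages above, while `Ray`, `makeShadowRay`, `lightPos` and the epsilon value are assumptions:

```cpp
#include <algorithm>
#include <limits>

struct Vec3 { float x, y, z; };

// Env lights and other infinite lights report an infinite distance.
constexpr float kInfiniteDist = std::numeric_limits<float>::infinity();

struct LightSample {
    Vec3 wi;     // unit direction from the shading point toward the light
    float dist;  // distance along wi, kInfiniteDist for infinite lights
    float pdf;   // solid angle pdf of the sample
};

struct Ray { Vec3 origin; Vec3 dir; float tMin; float tMax; };

// The caller clamps the distance, so lights never need to know the scene bounds.
Ray makeShadowRay(const Vec3& hitPos, const LightSample& ls,
                  float maxRayDist, float eps = 1e-4f) {
    // Stop slightly short of the light so the ray does not hit the emitter itself.
    float tMax = std::min(ls.dist - eps, maxRayDist);
    return Ray{hitPos, ls.wi, eps, tMax};
}

// The old `pos` field can still be recovered for finite lights.
Vec3 lightPos(const Vec3& hitPos, const LightSample& ls) {
    return Vec3{hitPos.x + ls.wi.x * ls.dist,
                hitPos.y + ls.wi.y * ls.dist,
                hitPos.z + ls.wi.z * ls.dist};
}
```

Finite and infinite lights then share one code path: the only difference is whether `min` picks the light distance or the ray limit.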
in my toy program I also decided to factor out Li into its own function because Li will become non-trivial to compute once materials are wired into the lights so it'd be preferable to only compute Li if the point has been hit
hm, what's the advantage here?
over what i have now
that way you could push it onto the whatever is doing light sampling
do you plan to make emission value be defined by a texture
if so you'd probs benefit from doing this too
yep
hm yart had that did i not port it over to platinum yet
i frogot it's been a while
but yeah i'll make li a function if it gets any more complex
afaiu you can't have maxt=inf
but you can know the scene bounds
which'd be effectively inf ig
and arguably lights should not be the ones handling that
nothing stopping you from just not setting it afaik
at least in metal
inderesding
yeah lights shouldn't have to know the scene bounds
i mean ig you'd set it to float_max or something
which is effectively infinite
I guess yeah
yeah i'll keep that in mind it'll be cleaner
rn i'm trying to find what's causing all the fireflies
i want to improve overall render/code quality
select random parts of the brdf and only render them
like disable lobes?
Still have them with just brute force path tracing? If so, this may be the BSDF, otherwise probably something with light sampling?
Also @winged veldt , what if for GMoN we could look up the neighbor pixels to get more sets for not more VRAM?
Each pixel does 3 sets
And for computing the median of means, we could look at say the pixel to the right, left, above, below to add 12 more sets
That deviates a bit since neighbor statistics aren't the same as center pixel statistics but with pixels this close, this could work very well?
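A small sketch of the neighbor-pooling idea, working on one scalar (say luminance) per set for simplicity. This is plain median of means only. Full GMoN also blends between the mean and the median of means using the Gini coefficient, which is left out here. The function names and the buffer layout are assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of a non-empty list (takes a copy so the input stays untouched).
float medianOf(std::vector<float> v) {
    std::sort(v.begin(), v.end());
    std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5f * (v[n / 2 - 1] + v[n / 2]);
}

// setMeans[y * width + x] holds the K per-set means of pixel (x, y).
// Pools the center pixel's sets with those of its 4 direct neighbors
// (3 + 12 = 15 sets for K = 3 away from the image border), then takes the median.
float medianOfMeansWithNeighbors(const std::vector<std::vector<float>>& setMeans,
                                 int width, int height, int x, int y) {
    static const int offsets[5][2] = {{0, 0}, {1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    std::vector<float> pooled;
    for (const auto& o : offsets) {
        int nx = x + o[0], ny = y + o[1];
        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;  // off-image
        const std::vector<float>& sets =
            setMeans[static_cast<std::size_t>(ny) * width + nx];
        pooled.insert(pooled.end(), sets.begin(), sets.end());
    }
    return medianOf(pooled);
}
```

Since the neighbors' sets only vote in the median and are not averaged into the pixel, this is not a straight box blur, but it can still pull the center value toward its neighbors across sharp edges.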
@winged veldt when switching to spherical trongle sampling,,
uniform solid angle sampling?
i don't understand it tbh, brain too smooth 
mood
i read the pbrt stuff and it was so much code just to sample a tri
i tried to implement it in yart (my earlier cpu path tracer) and gave up 
if you know of a simpler implementation i can look at that would be super helpful tbh
i've since learned a lot of pbrt code is overcomplicated and hard to parse
that's true
I don't like pbrt's light sampling interface for example
there's too much data in the light sample and it also has different interfaces for infinite and finite lights instead of unifying the two
yeah i don't like pbrt much in general, the book is great for theory but the code is way bloated if you're trying to figure out the concepts
yea
hm, wouldn't this blur the image? unless you're considering the neighbors for the confidence function only and doing the mon on the pixel buckets
also that seems closer to that other method for yeeting fireflies that's mentioned in the gmon paper
What is a spherical triangle. Three non-collinear points form a plane
trongle projected onto a sphere
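For reference, the solid angle of that projected triangle has a fairly compact closed form (the Van Oosterom and Strackee formula): with unit vectors a, b, c from the shading point to the vertices, tan(Ω/2) = |a · (b × c)| / (1 + a·b + b·c + c·a). The sketch below only computes Ω, which gives the pdf 1/Ω of uniform solid angle sampling. Actually drawing the sample direction is the longer part and is not shown. Helper names are made up:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Solid angle subtended by triangle (v0, v1, v2) as seen from point p.
// Assumes p does not coincide with any vertex.
double triangleSolidAngle(Vec3 p, Vec3 v0, Vec3 v1, Vec3 v2) {
    // Project the vertices onto the unit sphere around p.
    Vec3 a = normalize(sub(v0, p));
    Vec3 b = normalize(sub(v1, p));
    Vec3 c = normalize(sub(v2, p));
    double num = std::fabs(dot(a, cross(b, c)));
    double den = 1.0 + dot(a, b) + dot(b, c) + dot(c, a);
    // atan2 keeps the result valid when den is zero or negative (large triangles).
    return 2.0 * std::atan2(num, den);
}
```

As a sanity check, a triangle spanning one octant of the sphere should come out as π/2, an eighth of the full 4π.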
neat, treesitter will happily highlight msl as if it was c++
so even if i can't get clangd to work, i at least get syntax highlighting and formatting
that's sweet
I mean using the samples of the neighbors to decide on your MoN at the center pixel
It wouldn't be perfect but probably better quality/VRAM ratio
hm
possibly, but vram usage isn't my main concern with gmon
it's the effect it has on caustics and darker areas
Ah I see
How many sets do you use? I tend to use only like 5/7
I'd rather have a tiny bit of fireflies (but even with only 5 sets they are extremely mild) left than massive darkening because of using 21 sets or something
I think RWMC darkens a bit less than GMoN but still a little depending on the variance if that's a solution 😅
Or just get a better GI / caustics integrator 
"shrimply git gud" is always the best idea
yeah that's the plan atm
debug the bsdf and integrator to see what's causing the fireflies
as for caustics i do eventually want to add bdpt, path guiding and such
Have you tried that yet?
nope, haven't had time for it
naive integrator also has firefly issues so yeah most likely the bsdf
i mean good news is, probably not a mis bug
bad news is, probably a bsdf bug 
the fireflies look worse in the mis+nee renders but that's mostly because the noise floor for naive is much higher so they're less obvious, but i think they're just as bad
here's a set of "control" renders i'll be using to test the firefly stuff
cornell box renders are 10k spp naive and 1k spp mis+nee, veach lights thingy (turned out to be a great scene to expose fireflies) is 20k spp naive and 1k spp mis
alright test 1: replace the entire bsdf with a simple lambertian diffuse
absolutely no firefly issues in the naive renders as one would expect (the veach scene is super noisy ofc, it's literally designed to be a bad case for a naive path tracer)
mis renders are showing some fireflies though, especially visible in the veach scene
but also some in the dragons and even the basic cornell box has some brighter pixels in the small box closer to the camera if you look closely
so there are mis issues, i'll go ahead and fix those first
oh now i remember why i had it set to skip nee if the last hit was emissive @keen bear
i suspect what's happening here is ray hits a light -> nee ray samples the same light -> distance is near zero, numerical instability causes the sample to blow up
which i fixed by simply skipping nee if the hit was on an emissive material, idk if there's a better way there probably is
But the cosine term here should be 0, or at least very close to it, so it shouldn't firefly?
If anything, enabling MIS usually fixes some fireflies for me with everything Lambertian, probably because the BSDF sampling helps in some places where NEE is unstable
yeah, i suspect some numerical instability with near zero / near zero
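To make the near zero / near zero case concrete: for an area light, the area pdf gets converted to solid angle as pdf_w = pdf_A * d² / cos θ_light, so the NEE weight is cos θ_surface * cos θ_light / (d² * pdf_A). When a point on a light samples the same flat light, both cosines and d² go to zero together. Below is one possible guard, as a sketch only: the thresholds are arbitrary assumptions, and dropping samples like this technically loses a tiny amount of energy.

```cpp
#include <cmath>

// Geometric weight of one NEE sample on an area light, excluding BSDF and Le.
// cosSurface: cosine at the shading point, cosLight: cosine at the light,
// dist2: squared distance between them, pdfArea: pdf w.r.t. light surface area.
float neeWeight(float cosSurface, float cosLight, float dist2, float pdfArea) {
    // Hypothetical thresholds; tune them for the scene scale.
    const float kMinCos = 1e-4f;
    const float kMinDist2 = 1e-8f;

    // Back-facing, grazing, or degenerate (near zero distance) samples
    // contribute nothing instead of evaluating an unstable 0/0.
    if (cosSurface <= kMinCos || cosLight <= kMinCos ||
        dist2 <= kMinDist2 || pdfArea <= 0.0f)
        return 0.0f;

    // Area measure to solid angle measure.
    float pdfSolidAngle = pdfArea * dist2 / cosLight;
    return cosSurface / pdfSolidAngle;
}
```

That would keep NEE enabled on emissive hits and only reject the degenerate samples, rather than skipping NEE for the whole material.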
i gotta eep now, but i'll keep looking into it tomorrow
the veach test one
it's not mis vs no mis, btw
i'm always testing mis+nee
i don't have a nee only integrator
so what's on the right?
here
ah NEE vs no NEE


