Rosy | Graphics Programming | Page 23

cloud rivet Dec 13, 2025, 12:04 AM

#

servers yes

#

hrm

brisk chasm Dec 13, 2025, 12:04 AM

#

bjorn switching to linux fulltime, you read it here first

vivid barn Dec 13, 2025, 12:06 AM

#

anyway for memory management I'd prefer to stick to at least cudaMallocManaged as "default"

#

that absolves you from the need to do cudaMemcpy and such

#

it can be a perf footgun in a way, because something might be sticking around in host memory while being very heavily hammered by the device, but there's cudaMemAdvise to hint where it should live (and cudaMemPrefetchAsync to migrate it)

cloud rivet Dec 13, 2025, 12:11 AM

#

I still have to read the cuda programming guide

#

I've been focused on the optix side of things

#

it's got a full chapter on unified memory

#

given my current progress, unless there's some massive hidden whopper that I cannot yet see that's going to hit me right in the face with OptiX, there's likely no chance I'm going to go back to try to do graphics again with Vulkan. It's like 10x less pain with CUDA and OptiX. I made the right choice.

vivid barn Dec 13, 2025, 12:22 AM

#

truly joever

cloud rivet Dec 13, 2025, 12:22 AM

#

right for right now anyway

#

ya

#

idc it's fun

#

going to uh try to make this suzanne appear now I guess

astral hinge Dec 13, 2025, 12:53 AM

#

this is the only thing I'm unsure about when it comes to submodules

#

it seems like a pain to specify a specific commit

brisk chasm Dec 13, 2025, 1:02 AM

#

forking and tagging them yourself feels nicer, if you dont want to rely on the upstream repo

#

or you git gud

cloud rivet Dec 13, 2025, 1:06 AM

#

astral hinge it seems like a pain to specify a specific commit

I just did:

cd external\fastgltf
git checkout v0.9.5
cd ..\..
git add -i 
...
git commit

works?

#

you mean in cmake?

#

I guess if I wanted that in cmake I'd use FetchContent_Declare

#

but anyone cloning the repo and checking out a commit would get the right versions of your git submodule dependencies, since the pinned commit is recorded in the parent repo (alongside the entry in .gitmodules)

brisk chasm Dec 13, 2025, 1:19 AM

#

include(FetchContent)

FetchContent_Declare(
    fastgltf
    GIT_REPOSITORY https://github.com/spnda/fastgltf.git
    GIT_TAG        v0.9.0
)

message("Fetching fastgltf")
set(FASTGLTF_USE_CUSTOM_SMALLVECTOR OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_TESTS OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_EXAMPLES OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_DOCS OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_GLTF_RS OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_ASSIMP OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_DEPRECATED_EXT ON CACHE BOOL "" FORCE)
set(FASTGLTF_DISABLE_CUSTOM_MEMORY_POOL OFF CACHE BOOL "" FORCE)
set(FASTGLTF_USE_64BIT_FLOAT OFF CACHE BOOL "" FORCE)
set(FASTGLTF_COMPILE_AS_CPP20 OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_CPP_MODULES OFF CACHE BOOL "" FORCE)
set(FASTGLTF_USE_STD_MODULE OFF CACHE BOOL "" FORCE)
FetchContent_MakeAvailable(fastgltf)

#

and then

target_link_libraries(YourProject
    PRIVATE fastgltf
)

#

then you can #include <fastgltf/fastgltf.hpp>

#

or with cpm

# this file would sit in libs/fastgltf.cmake
include(../cmake/CPM.cmake)

CPMAddPackage(
    NAME            fastgltf
    GIT_REPOSITORY  https://github.com/spnda/fastgltf.git
    GIT_TAG         2483b87d19a67c91f2bffc386082001d3ea8bc07
    OPTIONS
        "FASTGLTF_USE_CUSTOM_SMALLVECTOR OFF"
        "FASTGLTF_ENABLE_TESTS OFF"
        "FASTGLTF_ENABLE_EXAMPLES OFF"
        "FASTGLTF_ENABLE_DOCS OFF"
        "FASTGLTF_ENABLE_GLTF_RS OFF"
        "FASTGLTF_ENABLE_ASSIMP OFF"
        "FASTGLTF_ENABLE_DEPRECATED_EXT ON"
        "FASTGLTF_DISABLE_CUSTOM_MEMORY_POOL OFF"
        "FASTGLTF_USE_64BIT_FLOAT OFF"
        "FASTGLTF_COMPILE_AS_CPP20 OFF"
        "FASTGLTF_ENABLE_CPP_MODULES OFF"
        "FASTGLTF_USE_STD_MODULE OFF"
)

#

CPM can install itself with a 2 liner

cloud rivet Dec 13, 2025, 1:45 AM

#

ok looking at the OptiX_Apps repo, I like how they do it, gpu pointer to vertex data and pointer to indices and material and light go on the SBT record for a shader. In your closest hit shader you get the primitive index (the index of the triangle hit), and using that triangle index you look up the vertex indices for the triangle you hit via the indices buffer pointer. That gives you a way to look up whatever you stored per vertex (position, normal, uv, color, etc.) in the gpu memory, given a defined stride, and then you can also use that triangle index to look up your material
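A rough CPU-side sketch of the lookup described above, in plain C++ so it runs without a GPU. `Vertex`, `HitRecordData`, `fetch_vertex` and `interpolate_color` are made-up names for illustration, not anything from OptiX_Apps or the OptiX API. In a real closest-hit program the primitive index would presumably come from `optixGetPrimitiveIndex()`, the barycentrics from `optixGetTriangleBarycentrics()`, and the pointers would be device pointers read from the SBT record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-vertex layout; a real renderer may pack this differently.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
    float color[4];
};

// Hypothetical stand-in for the data stored on an SBT hit record.
struct HitRecordData {
    const std::uint8_t* vertices;   // raw vertex bytes (a device pointer in the real thing)
    const std::uint32_t* indices;   // three indices per triangle
    std::size_t vertex_stride;      // bytes between consecutive vertices
    std::uint32_t material_id;      // whatever the material lookup needs
};

struct Color4 { float r, g, b, a; };

// Primitive index -> index buffer -> vertex, for one corner (0, 1 or 2) of the triangle.
inline Vertex fetch_vertex(const HitRecordData& rec, std::uint32_t primitive_index, int corner) {
    assert(corner >= 0 && corner < 3);
    assert(rec.vertex_stride >= sizeof(Vertex));
    const std::uint32_t vertex_index =
        rec.indices[primitive_index * 3u + static_cast<std::uint32_t>(corner)];
    Vertex v;
    std::memcpy(&v, rec.vertices + vertex_index * rec.vertex_stride, sizeof(Vertex));
    return v;
}

// Blend the three corner colors with barycentric weights (1 - u - v, u, v).
inline Color4 interpolate_color(const HitRecordData& rec, std::uint32_t primitive_index,
                                float bary_u, float bary_v) {
    const Vertex v0 = fetch_vertex(rec, primitive_index, 0);
    const Vertex v1 = fetch_vertex(rec, primitive_index, 1);
    const Vertex v2 = fetch_vertex(rec, primitive_index, 2);
    const float w0 = 1.0f - bary_u - bary_v;
    auto mix = [&](int c) {
        return w0 * v0.color[c] + bary_u * v1.color[c] + bary_v * v2.color[c];
    };
    return {mix(0), mix(1), mix(2), mix(3)};
}
```

The stride is kept separate from `sizeof(Vertex)` so the same lookup still works if per-vertex data is padded or interleaved with something else.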

astral hinge Dec 13, 2025, 1:46 AM

#

cloud rivet I just did: ``` cd external\fastgltf git checkout v0.9.5 cd ..\.. git add -i .....

is this what you do to add a submodule?

cloud rivet Dec 13, 2025, 1:46 AM

#

the per instance stuff I'll stick at the beginning of the gas vertex memory

cloud rivet Dec 13, 2025, 1:47 AM

#

astral hinge is this what you do to add a submodule?

well I do a git submodule add & git checkout tag etc but yea

#

I mean I do that very rarely

#

so not a big deal in my opinion

astral hinge Dec 13, 2025, 1:47 AM

#

hmm I see

#

the only way I use submodules is by right clicking and going to tortoisegit then clicking "add submodule" frog_bath

cloud rivet Dec 13, 2025, 1:48 AM

#

using ui for git is a crime

wraith urchin Dec 13, 2025, 1:48 AM

#

tortoisegit is pretty good

astral hinge Dec 13, 2025, 1:48 AM

#

I only use git through UIs hehe

#

UIs that are G

cloud rivet Dec 13, 2025, 1:49 AM

#

I only use the cli with git

wraith urchin Dec 13, 2025, 1:49 AM

#

I use both but its really easy to fuck up complicated workflows like rebase interactive with cli

cloud rivet Dec 13, 2025, 1:49 AM

#

I dunno, I just do git rebase -i and it works

#

I rebase, I don't do merges

wraith urchin Dec 13, 2025, 1:50 AM

#

but then you get dumped into a text editor (probably vim)

astral hinge Dec 13, 2025, 1:50 AM

#

Last time I tried rebasing was at an internship and I fucked it up lol

cloud rivet Dec 13, 2025, 1:50 AM

#

a git merge is also a crime

wraith urchin Dec 13, 2025, 1:50 AM

#

Yeah rebase is a better workflow imo

cloud rivet Dec 13, 2025, 1:50 AM

#

I work on a team with like 30+ engineers in a monorepo and we almost never fuck it up

wraith urchin Dec 13, 2025, 1:50 AM

#

or rebase then merge

cloud rivet Dec 13, 2025, 1:51 AM

#

everyone working in branches together and we do ok

wraith urchin Dec 13, 2025, 1:51 AM

#

Hey I mean whatever works for you

cloud rivet Dec 13, 2025, 1:51 AM

#

yes

wraith urchin Dec 13, 2025, 1:51 AM

#

but I like my UIs

astral hinge Dec 13, 2025, 1:51 AM

#

my git history is long and linear

cloud rivet Dec 13, 2025, 1:51 AM

#

same

#

super flat

astral hinge Dec 13, 2025, 1:52 AM

#

last time there was a branch was when saky was adding his sky stuff to it

cloud rivet Dec 13, 2025, 1:52 AM

#

oh, well I work in branches

#

I just merge them and delete them when I am done with it

#

I can't live without branches

#

also I get AI reviews that save my life

wraith urchin Dec 13, 2025, 1:53 AM

#

At work we all do branches, but for ASO I only do a branch if I'm gonna really make a huge breaking change, so I can easily get back to the working version

cloud rivet Dec 13, 2025, 1:53 AM

#

yeah I will branch off in my branch also, to try experimental stuff

#

like when I'm not sure something will work

#

I feel dirty pushing to main, I don't want to get in the habit of doing it, I'll accidentally do it at work

#

I am an org admin so none of the branch protection rules apply to me

#

so I can accidentally do it

wraith urchin Dec 13, 2025, 1:55 AM

#

For solo dev I feel no need to be overly rigorous about my workflow, but on a team, yeah it makes sense to be careful

spiral ice Dec 13, 2025, 2:00 AM

#

astral hinge Last time I tried rebasing was at an internship and I fucked it up lol

Interesting, almost all places I've worked use a "rebase before merging" workflow to preserve linear history

cloud rivet Dec 13, 2025, 2:01 AM

#

merge commits are horrible

#

if you want to revert a PR and it has a merge commit in it, you are in big trouble

#

if this is in an incident, it's going to increase the time of impact

#

I'm like at the point now where I pretty much know exactly what I need to do, I just have to do the work.

#

I will break it up into tiny small surgical code changes, and verify it along the way. I don't like doing a big bulk of work with tons of changes in them that could be hell to debug if there's UB or something in there

#

so I think the first thing is to just capture the gpu memory pointer to the vertices data and stop freeing it after making the gas

#

then stick the pointer to it on the SBT record for the ch shader

#

then add the normals to it, and get the normals from there instead of from the beginning of the buffer

#

then add vertex colors

#

and then once I have this all setup, I can extract fastgltf data to GPU memory in the same way and it should just work

#

if I get this working today, maybe tomorrow I can start on textures, I have no idea how to do textures yet

#

or race track, not sure

#

probably race track, eager to get it back, I can't actually use it though as my controller is at home

vivid barn Dec 13, 2025, 2:33 AM

#

cloud rivet ok looking at the OptiX_Apps repo, I like how they do it, gpu pointer to vertex ...

light? sbt record? guh

cloud rivet Dec 13, 2025, 2:41 AM

#

vivid barn light? sbt record? guh

I don't actually know why it's there, I just saw it was, maybe it means emissive?

vivid barn Dec 13, 2025, 2:41 AM

#

I guess

cloud rivet Dec 13, 2025, 2:42 AM

#

man emissive materials

#

I have never had such a thing

#

I'm pretty excited about it

cloud rivet Dec 13, 2025, 3:39 AM

#

#1248864886772072591 message

#

#1248864886772072591 message

#

man

#

that was 1.5 years ago

#

also after I had worked on blockens

astral hinge Dec 13, 2025, 3:40 AM

#

believe it or not, that was 10 years ago

cloud rivet Dec 13, 2025, 4:04 AM

#

I have vertex colors now and per-vertex normals

#

ready for some gltf loading I think

#

I didn't pick the best colors for my cube mesh

brisk chasm Dec 13, 2025, 10:09 AM

#

time flies too fast, we need more gravity

brisk chasm Dec 13, 2025, 10:29 AM

#

i still have no shadows : >

cloud rivet Dec 13, 2025, 10:29 AM

#

you have mipmaps though

#

man RT looks great

#

even with such a tiny model

#

I actually don't have any lighting math. not even n dot l

#

I just did shadows

#

if occluded color *= 0.5f

astral hinge Dec 13, 2025, 10:37 AM

#

you can also do reflections with the same level of complexity

#

well, slightly more

cloud rivet Dec 13, 2025, 10:37 AM

#

it's pretty cool

#

I probably am like a month ahead of where I would have been had I done it in vulkan instead I think

#

who knows

broken fog Dec 13, 2025, 4:09 PM

#

yeah rt is great, you can basically get full gi, hard+soft shadows and perfect specular or rough reflections in 100 lines of code

#

making it fast otoh,,,

cloud rivet Dec 13, 2025, 10:36 PM

#

Walking around Bremerton today

cloud rivet Dec 13, 2025, 10:55 PM

#

Found a nice hobby store, they have Critical Role stuff froge_love

#

Too big to bring back with me though

cloud rivet Dec 13, 2025, 11:18 PM

#

This place is busy

cloud rivet Dec 14, 2025, 5:10 AM

#

hrm

#

I think OptiX supports mipmaps through cuda texture objects already

#

I don't think I have to like write my own solution

#

well

#

I'm not sure, I see:

optixTexFootprint2D
optixTexFootprint2DGrad
optixTexFootprint2DLod

which take cuda texture objects and have mipmap parameters, I don't really understand how they work

#

the optixWhitted example in the SDK samples a texture

cloud rivet Dec 14, 2025, 5:55 AM

#

https://forums.developer.nvidia.com/t/optix-7-2-build-in-texture-mipmapping/185076

NVIDIA Developer Forums

Optix 7.2 Build in Texture Mipmapping

Hi I don’t see any option for automatic mipmap generation for textures in Optix. Is there a way for this to work? Thanks.

#

it's all via cuda

#

I wonder what optixTexFootprint2D helps out with

#

https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/src/Texture.cpp#L1277

#

For that you would usually implement tracking of differential rays inside the ray tracer which would allow calculating the current derivatives at a hit point which in turn lets the resp. texture grad function select the best mipmap LODs.
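To make the quoted forum answer a bit more concrete: once a ray tracer has UV derivatives at the hit point (from ray differentials or any other estimate), picking a mip level is a small formula. The sketch below uses the conventional isotropic rule, LOD = log2 of the larger screen-space footprint measured in texels. `UvGradients` and `mip_lod` are hypothetical names, and this is only an assumption about what a gradient-based fetch such as CUDA's `tex2DGrad` does conceptually, not a description of the hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// UV derivatives with respect to one pixel step in screen x and screen y.
struct UvGradients {
    float dudx, dvdx;
    float dudy, dvdy;
};

// Conventional isotropic mip selection: measure the footprint in texels along
// each screen axis, take the larger one, and use log2 of it as the level.
inline float mip_lod(const UvGradients& g, int width, int height, int mip_count) {
    assert(width > 0 && height > 0 && mip_count > 0);
    const float len_x = std::hypot(g.dudx * width, g.dvdx * height);
    const float len_y = std::hypot(g.dudy * width, g.dvdy * height);
    const float rho = std::max(len_x, len_y);
    if (rho <= 1.0f) {
        return 0.0f;  // magnification: one pixel covers at most one texel
    }
    return std::min(std::log2(rho), static_cast<float>(mip_count - 1));
}
```

For a 1024x1024 texture where one pixel step covers four texels, this lands on level 2, which is the level whose texels are four base texels wide.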

#

womp womp

#

yeah I guess that didn't make any sense to me anyway

cloud rivet Dec 14, 2025, 6:25 AM

#

yeah I'm going to figure out textures

#

fuck it

#

we work on what we want to work on at any given moment gpAkkoShrug

astral hinge Dec 14, 2025, 7:03 AM

#

cloud rivet fuck it

we'll do it live

cloud rivet Dec 14, 2025, 7:13 AM

#

ok so Suzanne has a texture

#

so my first thing I'll try is to just create a cuda texture object without it creating an error

#

so the first actual thing is to just load the image

#

I'll try and use SDL for that

#

since I already have it as a dependency

#

i spent a lot of time today reading the optix programming guide, I have a much better understanding of shader binding tables now

#

I think it's maybe the best explanation of an SBT I have read so far

#

now that I understand how the records work and how they're associated with geometry and instances it makes a lot of sense

#

it's just a way to index into a table to get data, and the acceleration structure on a hit returns the sbt-index which I use to get the data that was configured by me in the sbt record

#

in my mind an sbt record is basically working similar to a push constant, except instead of me supplying it per draw, I set up the shader program sbt record with params, and the acceleration structure gives me the sbt-index to that record, plus the index of the triangle I hit. Then I take that sbt index to get the sbt record and look up all my data in a buffer, just like I would use the data in a push constant to look up all my data in a buffer for my slang shader
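For reference, the OptiX programming guide gives the hit-group index as sbt-index = sbt-instance-offset + (sbt-GAS-index * sbt-stride-from-trace-call) + sbt-offset-from-trace-call, and the record address as base + index * record stride. A tiny host-side sketch of that arithmetic, with hypothetical helper names (`hit_sbt_index`, `record_address`) that are not part of the OptiX API:

```cpp
#include <cassert>
#include <cstdint>

// Offset and stride passed by the program that traces the ray
// (typically the ray type and the number of ray types).
struct TraceSbtParams {
    std::uint32_t sbt_offset;
    std::uint32_t sbt_stride;
};

// Index of the hit-group record selected for a hit.
inline std::uint32_t hit_sbt_index(std::uint32_t instance_sbt_offset,
                                   std::uint32_t gas_sbt_index,
                                   TraceSbtParams trace) {
    return instance_sbt_offset + gas_sbt_index * trace.sbt_stride + trace.sbt_offset;
}

// Byte address of that record inside the hit-group table.
inline std::uint64_t record_address(std::uint64_t table_base,
                                    std::uint32_t sbt_index,
                                    std::uint32_t record_stride_bytes) {
    assert(record_stride_bytes > 0);
    return table_base + static_cast<std::uint64_t>(sbt_index) * record_stride_bytes;
}
```

With two ray types (stride 2), a shadow ray (offset 1) hitting the second build input (GAS index 1) of an instance whose SBT offset is 4 selects record 4 + 1 * 2 + 1 = 7.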

solid grove Dec 14, 2025, 7:25 AM

#

cloud rivet I'll try and use SDL for that

sdl3 can load PNGs now, don't even need to add SDL_image

#

https://wiki.libsdl.org/SDL3/SDL_LoadPNG

SDL3/SDL_LoadPNG

The Simple Directmedia Layer Wiki

#

i think they are just wrapping stb_image though. SDL_image uses the real libpng

cloud rivet Dec 14, 2025, 7:26 AM

#

hrm

#

well suzanne has a png so I will use SDL3 since I already have it

#

didn't realize SDL_image is a separate library

solid grove Dec 14, 2025, 7:27 AM

#

image decoders can get pretty big, i think they keep it separate so as to not make the main lib huge

#

and to keep it free of external deps

cloud rivet Dec 14, 2025, 7:27 AM

#

it's already huge

solid grove Dec 14, 2025, 7:28 AM

#

and that's after I went in and trimmed out stuff from the decoder libraries. it was like 400mb before that

cloud rivet Dec 14, 2025, 7:30 AM

#

SDL repos are absolute units

#

I want to load exr

#

so I will eventually get SDL_image

#

but not right now

solid grove Dec 14, 2025, 7:32 AM

#

sdl_image doesn't load exr

cloud rivet Dec 14, 2025, 7:32 AM

#

wtf

#

tinyexr then

solid grove Dec 14, 2025, 7:33 AM

#

I use tinyexr. I tried getting the real OpenEXR SDK building but its dependency math lib requires RTTI to compile. the offline world is wild

cloud rivet Dec 14, 2025, 7:33 AM

#

gross

#

I remember reading about the math library for openexr when I was investigating how to write my own exr lib

astral hinge Dec 14, 2025, 7:34 AM

#

solid grove I use tinyexr. I tried getting the real OpenEXR SDK building but its dependency ...

how is that wild

cloud rivet Dec 14, 2025, 7:34 AM

#

it got pulled out of openexr into its own project if I recall

#

I actually got it to build

solid grove Dec 14, 2025, 7:35 AM

#

why does a math lib need RTTI?

cloud rivet Dec 14, 2025, 7:35 AM

#

to test it

#

the cli tool

astral hinge Dec 14, 2025, 7:35 AM

#

well disabling language features is on you

cloud rivet Dec 14, 2025, 7:38 AM

#

it was a bit involved to get that thing to build if I recall, it was a while ago, and it was related to the math library

#

I don't remember what the specific challenge was

#

I didn't disable any language features

#

once I had the cli tool it was really cool, it has a lot of features

#

you can change the compression, even remove compression

astral hinge Dec 14, 2025, 7:42 AM

#

cloud rivet I didn't disable any language features

oh I was responding to ravbug

cloud rivet Dec 14, 2025, 7:57 AM

#

hrm

#

I don't see SDL_LoadPNG

#

https://wiki.libsdl.org/SDL3/SDL_LoadPNG says it's in SDL_surface.h

#

This function is available since SDL 3.4.0.

#

wonder what version is shipped in the vulkan sdk

#

GitHub Repo: sdl3, Version Tag: release-3.2.26

#

womp womp

#

hrmmm

#

I'll just use SDL_image, I don't want to change my SDL

#

let's see how hard that is

#

3.4 is not even released yet

#

the latest pre-release is 3.3.4

astral hinge Dec 14, 2025, 8:20 AM

#

you could also use stb image

cloud rivet Dec 14, 2025, 8:23 AM

#

C:\Users\swart\projects\pixel_storm>git submodule update --init --recursive
Submodule 'external/aom' (https://github.com/libsdl-org/aom.git) registered for path 'external/SDL_image/external/aom'
Submodule 'external/dav1d' (https://github.com/libsdl-org/dav1d.git) registered for path 'external/SDL_image/external/dav1d'
Submodule 'external/jpeg' (https://github.com/libsdl-org/jpeg.git) registered for path 'external/SDL_image/external/jpeg'
Submodule 'external/libavif' (https://github.com/libsdl-org/libavif.git) registered for path 'external/SDL_image/external/libavif'
Submodule 'external/libjxl' (https://github.com/libsdl-org/libjxl.git) registered for path 'external/SDL_image/external/libjxl'
Submodule 'external/libpng' (https://github.com/libsdl-org/libpng.git) registered for path 'external/SDL_image/external/libpng'
Submodule 'external/libtiff' (https://github.com/libsdl-org/libtiff.git) registered for path 'external/SDL_image/external/libtiff'
Submodule 'external/libwebp' (https://github.com/libsdl-org/libwebp.git) registered for path 'external/SDL_image/external/libwebp'
Submodule 'external/zlib' (https://github.com/libsdl-org/zlib.git) registered for path 'external/SDL_image/external/zlib'

#

lol

#

it's git submodules all the way down

cloud rivet Dec 14, 2025, 8:24 AM

#

astral hinge you could also use stb image

I should have, but I would be a quitter if I gave up now

astral hinge Dec 14, 2025, 8:25 AM

#

well idk what sdl's png library's trade-offs are, so I can't judge

cloud rivet Dec 14, 2025, 8:25 AM

#

probably not worth it whatever they are

astral hinge Dec 14, 2025, 8:26 AM

#

yeah being an unfamiliar library is a disadvantage in itself

cloud rivet Dec 14, 2025, 9:11 AM

#

I switched from the Vulkan SDK's SDL to a git submodule anyway

#

forat 😅

#

that dependency change dramatically increased my compile time for that file

#

it's like 8 seconds

#

brutal

#

I'm going to put that in its own TU tbh

#

excited about having textures finally again

#

this project is cool

#

tomorrow I'll try and actually make a cuda texture object from this data

cloud rivet Dec 14, 2025, 9:52 AM

#

cloud rivet in my mind an sbt record is basically working similar to a push constant, except...

I kind of like this push constant mental model with regards to understanding SBTs. I want to refine it a bit:

SBTs are arrays of data that are similar to the data supplied per draw in a graphics pipeline, including what shaders to use and what materials are needed.

The difference is that with ray tracing you have to supply all that data at once, so it's put into a table that is constructed in such a way that a ray trace knows how to get the index into that array via the acceleration structure upon a hit.

graphics pipeline -> supply context to render via binding/push constants/uniforms for each draw
ray trace -> supply context to render via indexes into arrays of data in a table for each ray trace

brisk chasm Dec 14, 2025, 10:36 AM

#

i was building openexr just last night, it was a little kick in the nuts i have to admit

vivid barn Dec 14, 2025, 4:12 PM

#

SBT entry in my head is function pointer + associated data sitting next to it
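The "function pointer + associated data sitting next to it" picture can be written down almost literally on the CPU. This is only a mental-model sketch: in real OptiX the record starts with an opaque header filled in by `optixSbtRecordPackHeader` rather than a raw function pointer, and every name below (`SbtRecord`, `shade_flat`, `shade_emissive`, `dispatch_hit`) is hypothetical. The occlusion handling mirrors the "if occluded color *= 0.5f" shadows mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };

// Data that would sit next to the program in a record.
struct MaterialData {
    Color base_color;
};

// Stand-in for a closest-hit program.
using ClosestHitFn = Color (*)(const MaterialData&, bool occluded);

// One table entry: which program to run, plus the data it reads.
struct SbtRecord {
    ClosestHitFn program;
    MaterialData data;
};

// Flat shading with the crude shadow term: halve the color when occluded.
inline Color shade_flat(const MaterialData& m, bool occluded) {
    const float k = occluded ? 0.5f : 1.0f;
    return {m.base_color.r * k, m.base_color.g * k, m.base_color.b * k};
}

// An emissive surface ignores occlusion entirely.
inline Color shade_emissive(const MaterialData& m, bool /*occluded*/) {
    return m.base_color;
}

// What the hit effectively does: index the table, call the program with its data.
inline Color dispatch_hit(const std::vector<SbtRecord>& sbt, std::size_t sbt_index, bool occluded) {
    assert(sbt_index < sbt.size());
    const SbtRecord& rec = sbt[sbt_index];
    return rec.program(rec.data, occluded);
}
```

The whole table is built up front, which is the difference from per-draw binding: the traversal picks the index, not the host code.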

cloud rivet Dec 14, 2025, 7:46 PM

#

Oh it is yeah

#

Nice, I went from being confused a couple of days ago to feeling pretty good about SBTs

#

Now I want to clear up cuda texture objects and ray differentials

#

At least just for primary rays. I am not building a path tracer yet

cloud rivet Dec 14, 2025, 9:22 PM

#

it looks like textures and surfaces are considered legacy in CUDA 13.1

#

they removed almost all mention of them from the new programming guide and are pointing people with "existing code bases" to the now-legacy version of the programming guide

#

so like functions like tex2D are now only in the legacy guide

#

I asked on the optix forum about the status, since idk what someone using optix is supposed to use now, other than legacy functionality you have to go read about in a doc that's not being updated anymore

https://forums.developer.nvidia.com/t/using-cuda-texture-apis-with-optix-and-the-legacy-cuda-programming-guide/354581

NVIDIA Developer Forums

Using CUDA texture APIs with OptiX and the legacy CUDA programming ...

I’m learning how to use OptiX and I wanted to sample textures in my renderer. The CUDA 13.1 Programming Guide states: Texture and Surface memory are not discussed further in this guide as there is no advantage to using them in CUDA on any currently supported NVIDIA GPU. CUDA developers should feel free to ignore these APIs. For developers...

#

well they still exist so I will use them

brisk chasm Dec 14, 2025, 9:34 PM

#

so what do they sell now? pixelbuffers? ididnotread

cloud rivet Dec 14, 2025, 9:40 PM

#

nothing I guess, read from linear memory using a pointer

#

I could do that tbh

brisk chasm Dec 14, 2025, 9:42 PM

#

ah even "lower level" so to speak

cloud rivet Dec 14, 2025, 9:42 PM

#

well the uvs map to some quad, and I randomly sample from that region
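Reading from linear memory through a pointer means doing the addressing and filtering by hand. A minimal sketch of a bilinear fetch from a tightly packed RGBA8 image, assuming clamp-to-edge addressing and texel centers at half-integer coordinates; `LinearImage`, `load_texel` and `sample_bilinear` are hypothetical names, and this swaps in plain bilinear filtering rather than the random sampling mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgba { float r, g, b, a; };

// Tightly packed RGBA8 image in linear memory (a device pointer in a real kernel).
struct LinearImage {
    const std::uint8_t* texels;
    int width;
    int height;
};

// Fetch one texel with clamp-to-edge addressing, normalized to [0, 1].
inline Rgba load_texel(const LinearImage& img, int x, int y) {
    x = std::max(0, std::min(x, img.width - 1));
    y = std::max(0, std::min(y, img.height - 1));
    const std::uint8_t* p =
        img.texels + (static_cast<std::size_t>(y) * img.width + x) * 4u;
    return {p[0] / 255.0f, p[1] / 255.0f, p[2] / 255.0f, p[3] / 255.0f};
}

inline Rgba lerp(const Rgba& a, const Rgba& b, float t) {
    return {a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t,
            a.b + (b.b - a.b) * t, a.a + (b.a - a.a) * t};
}

// Bilinear filter at normalized coordinates (u, v) in [0, 1].
inline Rgba sample_bilinear(const LinearImage& img, float u, float v) {
    assert(img.width > 0 && img.height > 0);
    const float x = u * img.width - 0.5f;   // texel centers sit at i + 0.5
    const float y = v * img.height - 0.5f;
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const float fx = x - x0;
    const float fy = y - y0;
    const Rgba top = lerp(load_texel(img, x0, y0), load_texel(img, x0 + 1, y0), fx);
    const Rgba bottom = lerp(load_texel(img, x0, y0 + 1), load_texel(img, x0 + 1, y0 + 1), fx);
    return lerp(top, bottom, fy);
}
```

This is the part a texture object would otherwise handle, along with wrap modes, mip chains and block-compressed formats.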

brisk chasm Dec 14, 2025, 9:43 PM

#

its a raytracing engine after all

cloud rivet Dec 14, 2025, 9:47 PM

#

but then I'm copying uncompressed images to the device?

brisk chasm Dec 14, 2025, 9:48 PM

#

perhaps optix/cuda has extensions for compressed formats?

#

or supports that natively somehow

#

or you have to go with vulkan for that part?

cloud rivet Dec 14, 2025, 9:50 PM

#

cuda texture objects support block compressed formats

#

but the documentation for working with textures is in the legacy docs

brisk chasm Dec 14, 2025, 9:52 PM

#

then the material has to come from ray intersections only

#

no textures, just material properties

cloud rivet Dec 14, 2025, 9:54 PM

#

I'm just going to use the stuff the way the sdk does frogshrug

cloud rivet Dec 14, 2025, 10:28 PM

#

Maybe I can send the block compressed image to the device and, since the shader is just C++, use a decompression library in the shader

#

Or maybe the NVIDIA texture tools sdk has something in it for device side

cloud rivet Dec 15, 2025, 3:00 AM

#

When passing device pointers to NVTT, make sure the pointer refers to memory NVTT's device (and runtime API context if using the CUDA Driver API) can access. Similarly, when accessing data from device pointers returned from NVTT, make sure the current device (and context, if using the CUDA Driver API) can access allocations made by NVTT's device using the CUDA Runtime API. Device pointers are returned from nvtt::Surface::gpuData(), and used in the low-level GPU compression function API in nvtt_lowlevel.h.

#

seems like I can

#

// Shows how to use NVTT 3's low-level GPUInputBuffer API to compress a texture
// directly from a CUDA buffer. Using this API allows the input and output to
// exist on the GPU, avoiding GPU-to-CPU and CPU-to-GPU copies.

#

https://github.com/nvpro-samples/nvtt_samples/blob/main/cuda_input/main.cu

#

it just uses a cuda texture object lol

#

I suspect the response I'm going to get to my question is to continue to use tex2D and texture objects, despite them being "legacy". All of their sample code across three different NVIDIA platforms uses cuda texture objects.

cloud rivet Dec 15, 2025, 3:59 AM

#

It is recommended programs use cudaMallocHost() to allocate buffers which
will be used to send or receive data from GPUs.

#

oh

#

Allocates page-locked memory on the host

#

Page-locked memory will not communicate with hard drive. Therefore, the efficiency of reading and writing in page-locked memory is more guaranteed.

#

Host to GPU copies are much faster when they originate from pinned (page-locked) memory.

cloud rivet Dec 15, 2025, 4:51 AM

#

I want mipmaps, but I don't think Suzanne's model has a good texture to see if mipmaps look good. I think the chess board ABeautifulGame gltf is better for that, but that's a whole scene. I think I will work on loading a proper scene next, using ABeautifulGame, and then work on mipmaps.

I think at this point though I could load my track and vehicle at any point and I could be working on my game, but I want to figure out scene material things first. Once I get tired of rendering stuff I can go back to my game.

solid grove Dec 15, 2025, 5:02 AM

#

cloud rivet

the council will decide your fate

pseudo dock Dec 15, 2025, 5:06 AM

#

the council will decide your fate
Does The Combine know that headcrabs are taking their orders from Suzanne?

cloud rivet Dec 15, 2025, 5:07 AM

#

I'm going to pitch valve on supporting this project

pseudo dock Dec 15, 2025, 5:09 AM

#

Also:

#

(Yes, after warming up the kids with a Half-Life 2 reference I did pull out a Transformers: The Movie reference 👴 )

cloud rivet Dec 15, 2025, 5:26 AM

#

maybe I use deccer cubes to work on scene loading, and then abeautifulgame for mipmaps

vivid barn Dec 15, 2025, 5:29 AM

#

for mipmaps you can make a tube scene

#

where the camera is placed inside a tube, facing along it

#

basically

#

or make a tube game

cloud rivet Dec 15, 2025, 5:41 AM

#

hrmm a tube game

vivid barn Dec 15, 2025, 5:42 AM

#

there was a game I played a few times at a friend's very long time ago as a smol kid

#

that I'm reluctant to name because the company behind it is cringe now

#

"lagsters"

#

it was basically racing but within a tube track

#

so you didn't have to turn but you had to like

#

position yourself within the tube to avoid obstacles

#

and pick up power ups

#

looking down a tube is going to be very anisotropic

#

could also be a use case for curve primitives

cloud rivet Dec 15, 2025, 5:44 AM

#

I like the idea, thank you

astral hinge Dec 15, 2025, 6:03 AM

#

vivid barn position yourself within the tube to avoid obstacles

temple run

shut hornet Dec 15, 2025, 10:02 AM

#

vivid barn looking down a tube is going to be very anisotropic

this game was in a web browser in 2001 or something. https://www.youtube.com/watch?v=urIqLheLwtU

YouTube

Bryan Villard

Betty Bad (WildTangent Original) Gameplay 1

Here's the full verison
http://lostgames00.blogspot.com/2016/02/here-are-list-of-wildtangent-games-that.html

To play the Full version:

first create both a file "Shockwave.com"1st and "Betty Bad"2nd on Program File before installing,
then Browse the destination folder to Program Files.

For Example:
C:\Program Files\Shockwave.com\B...

#

feel like we are just getting scammed non-stop

#

actually, I should probably cancel all projects and do a remake of this 🤔

cloud rivet Dec 15, 2025, 4:35 PM

#

shut hornet this game was in a web browser in 2001 or something. https://www.youtube.com/wat...

that's fun looking

#

hrmmm

#

a tube game is a series of tubes

#

I'd need to make some textures for the tubes

vivid barn Dec 15, 2025, 4:37 PM

#

shut hornet feel like we are just getting scammed non-stop

what

cloud rivet Dec 15, 2025, 4:41 PM

#

I thought it was a joke

#

big GPU is always trying to sell us more GPUs when we're having fun with what we had already

#

but no, I want more GPUs in reality

#

with more stuff

vivid barn Dec 15, 2025, 4:52 PM

#

real

cloud rivet Dec 15, 2025, 8:47 PM

#

https://youtu.be/YFrP03KuMZ8?si=o2S9f2_DMDUCRUTW

YouTube

NVIDIA Developer

Deep Dive: How to Use cuTile Python

With the largest advancement since the NVIDIA CUDA platform was invented in 2006, CUDA 13.1 is launching NVIDIA CUDA Tile. This exciting innovation introduces a new virtual instruction set for tile-based parallel programming, focusing on the ability to write algorithms at a higher level and abstract away the details of specialized hardware, such...

#

Pretty interesting

#

cuTile lets you do array operations where tiles are sub-arrays, opaque objects that are cheap to copy and unlock better compiler optimizations

#

It’s funny when NVIDIA refers to portability they mean portable across NVIDIA architectures

#

This isn’t available via C++ yet, just python

brisk chasm Dec 15, 2025, 8:53 PM

#

C++ython

cloud rivet Dec 15, 2025, 8:53 PM

#

YouTube learned I am using CUDA

#

I will get CUDA video recommendations now

brisk chasm Dec 15, 2025, 8:55 PM

#

wait until it pivots to Modern OpenGL tutorials (in python)

cloud rivet Dec 15, 2025, 8:55 PM

#

I am pretty CUDA pilled right now. It would be a bitter pill to swallow to go back to vulkan

#

I do miss validation

#

there is validation for optix, I haven't turned it on yet

#

I should actually

#

options.validationMode = OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL;

#

agonyfrog

[OptiX][2][VALIDATION_ERROR]: [TRACE_DEPTH_EXCEEDED] Trace depth exceeded.
    launch index: [891, 367, 0]
    additional occurrences: 9407
The maximum optixTrace recursion depth is exceeded. See OptixPipelineLinkOptions::maxTraceDepth.
[OptiX][2][ERROR]: Error syncing stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error launching work to RTX
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
OptiX error: Error during validation mode run (code 7053) at C:\Users\swart\projects\pixel_storm\src\ps_game.cu:821

#

I don't recurse though thinkeyes

#

I have misconfigured something

#

The link options consist of the maximum recursion depth setting for recursive ray tracing, along with pipeline level settings for debugging. However, the value for the maximum recursion depth has an upper limit that overrides any limit set by the link options.

#

ohh

#

I bet it's my shadows

#

  OptixPipelineLinkOptions link_options = {};
  link_options.maxTraceDepth = 1;

vivid barn Dec 15, 2025, 9:16 PM

#

when cluster AS

cloud rivet Dec 15, 2025, 9:17 PM

#

yup that fixed it
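Why shadows would trip a `maxTraceDepth` of 1, as a toy model: every trace that is started while another trace is still in flight adds one level. This assumes the shadow ray is traced from inside the closest-hit program of the primary ray, which the log does not actually confirm. `TraceStats` and `trace` are hypothetical helpers that only count nesting, nothing here calls OptiX.

```cpp
#include <algorithm>
#include <cassert>

// Tracks how deeply trace calls are nested.
struct TraceStats {
    int current_depth = 0;
    int max_depth_seen = 0;
};

// Models one trace call: depth goes up while the hit program runs.
template <typename HitProgram>
void trace(TraceStats& stats, HitProgram&& on_hit) {
    ++stats.current_depth;
    stats.max_depth_seen = std::max(stats.max_depth_seen, stats.current_depth);
    on_hit();
    --stats.current_depth;
}

// Primary ray only: the hit program shades without tracing again.
inline int required_depth_primary_only() {
    TraceStats stats;
    trace(stats, [] {});
    return stats.max_depth_seen;
}

// Primary ray whose hit program traces a shadow ray before returning.
inline int required_depth_with_shadow_ray() {
    TraceStats stats;
    trace(stats, [&stats] {
        trace(stats, [] {});  // shadow ray launched from inside the hit program
    });
    return stats.max_depth_seen;
}
```

Under that assumption the pipeline needs a depth of 2. Tracing the shadow ray from the ray generation program after the primary trace returns would keep it at 1.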

cloud rivet Dec 15, 2025, 9:17 PM

#

vivid barn when cluster AS

soon

brisk chasm Dec 15, 2025, 9:17 PM

#

im looking forward to the blog post on our blog explaining all that optix shit

vivid barn Dec 15, 2025, 9:17 PM

#

bjorn in for some suffering

cloud rivet Dec 15, 2025, 9:17 PM

#

I live to suffer

#

I don't think I'm a credible authority on optix, I have been using it for like 2 minutes, but I'm always happy to share how I do things, and if I do, I hope someone tears into it to tell me about all of the millions of things I'm doing incorrectly

brisk chasm Dec 15, 2025, 9:20 PM

#

that wont happen, nano will make sure you are on the right track 🙂

vivid barn Dec 15, 2025, 9:20 PM

#

vk rt transfers pretty straightforwardly to optix, and back (vk has all the features optix has either in form of khr or nv exts)

#

I haven't used any nv ones (I don't have the hw) and my optix knowledge comes from zooming around optix programming guide and cycles' source but I think I'm pretty comfortable looking at optix code

cloud rivet Dec 15, 2025, 9:22 PM

#

yes, it seems very similar to vulkan conceptually

#

I saw there's now a reordering extension with the latest vk sdk

#

I think maybe people interested in vulkan ray tracing might benefit from reading the optix programming guide

#

because it's very good I think

vivid barn Dec 15, 2025, 9:23 PM

#

well these days these features come to those other apis first and to optix second

#

optix docs are pretty bad tbh

#

like even look at optixAccelBuild

cloud rivet Dec 15, 2025, 9:24 PM

#

oh I read the guide for a high level understanding, then look at the sample code for how to use it

vivid barn Dec 15, 2025, 9:24 PM

#

nothing ever tells you the instance array takes a pointer to an array of optixInstance

cloud rivet Dec 15, 2025, 9:25 PM

#

I am constantly switching back and forth between pdfs, and notepad++ to figure stuff out

#

does optix get the feature second? it has shader execution reordering and cluster AS

#

I thought it would get it first

#

what's an example of a feature that vulkan/dx got first?

#

the resource allocation for the vulkan/dx drivers is probably larger than for optix

#

so I guess that makes sense

vivid barn Dec 15, 2025, 9:28 PM

#

cloud rivet does optix get the feature second? it has shader execution reordering and cluste...

vk got khr ser recently but it had nv ser for a few years already

#

nv-specific cluster AS also for like a year or more

cloud rivet Dec 15, 2025, 9:28 PM

#

oh

#

so the vulkan nv version comes out before optix

#

yes that only came out recently on optix

vivid barn Dec 15, 2025, 9:28 PM

#

there's also khr position fetch

#

I think optix had pos fetch in 8 so it might w/eburger

#

ignore that one

vivid barn Dec 15, 2025, 9:30 PM

#

cloud rivet > Host to GPU copies are much faster when they originate from pinned (page-locke...

I'm p sure what it's saying is that when you do copies from/to random host pointer there's an additional host copy from the pointer you passed to device-accessible mapping, or that it's host importing that region

#

while page-locked means it has been mapped to the device earlier

#

and cudaMallocHost can also be swapped to disk you know

#

it's just that because demand paging is generally non-existent for non-cpu-threads (current exception being linux with certain compute apis) faulting happens at submission boundaries

#

and the memory stays basically mlock'ed for the duration of a submission

#

anyway I'd rather not think about cudaMalloc this cudaMallocHost that but just use cudaMallocManaged

#

it's basically cudaMalloc{,Host} in one but you can cudaMemAdvise later to migrate it between host and device

#

also removes the need for some cudaMemcpies like I said

#

it's basically the nicest thing short of fully functioning unified shared memory that you get on cuda linux

#

where you can just pass any random pointer to the device and it just werks, no need for cudaMalloc* of any kind

vivid barn Dec 15, 2025, 9:36 PM

#

cloud rivet I suspect the response I'm going to get to my question is to continue to use tex...

also how are texture objects legacy?

cloud rivet Dec 15, 2025, 9:44 PM

#

vivid barn also how are texture objects legacy?

the new 13.1 CUDA programming guide removed all mention of texture fetch, and says to go look in the legacy programming guide

#

where it is still documented

#

it also says texture memory offers no benefits

#

regarding cudaMallocManaged I don't know, it seems like I know what's happening if I explicitly cudaMemcpy. I should try it

#

and see in nsight what it does

#

to build trust

vivid barn Dec 15, 2025, 9:47 PM

#

cudaMallocManaged is basically just cudaMalloc{,Host} that can be migrated between host and device

#

it doesn't magically do things on its own except on first use

cloud rivet Dec 15, 2025, 9:48 PM

#

going to try it

vivid barn Dec 15, 2025, 9:50 PM

#

cloud rivet the new 13.1 CUDA programming guide removed all mention of texture fetch, and sa...

it says that in a subsection of "device memory spaces"

#

because yes there's no "texture memory" (hasn't been in a while, like 14 years)

#

it doesn't say texture/surface objects are deprecated though

cloud rivet Dec 15, 2025, 9:50 PM

#

reading the cudaMallocManaged huge doc comment

#

no it doesn't say deprecated

#

it says legacy and should no longer be used for texture memory

#

and if you want to know how to make sense out of texture fetching from the CUDA/Optix example code you somehow have to figure out to look at the legacy programming guide

vivid barn Dec 15, 2025, 9:51 PM

#

no it doesn't call them out as legacy apis

#

idk why they're not documented

cloud rivet Dec 15, 2025, 9:52 PM

#

Introduced the new CUDA Programming Guide, the official, comprehensive resource on the CUDA programming model. The guide has been restructured into five parts that cover a language agnostic overview of CUDA, introductory and advanced CUDA programming in C++ and Python, detailed descriptions of specific CUDA features, and technical appendices for reference. The legacy CUDA C++ Programming Guide remains available in this release but is deprecated and includes notices that direct readers to the new guide.

vivid barn Dec 15, 2025, 9:52 PM

#

but it'd be pretty weird because there's no replacement for these apis for cases when you want to sample an image with filtering

cloud rivet Dec 15, 2025, 9:52 PM

#

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

CUDA Toolkit 13.1 - Release Notes — Release Notes 13.1 documentation

The Release Notes for the CUDA Toolkit.

#

The legacy CUDA C++ Programming Guide remains available in this release but is deprecated and includes notices that direct readers to the new guide.

vivid barn Dec 15, 2025, 9:52 PM

#

yes I'm looking at that right now

#

https://docs.nvidia.com/cuda/cuda-programming-guide/02-basics/writing-cuda-kernels.html#texture-and-surface-memory

cloud rivet Dec 15, 2025, 9:52 PM

#

it sucks tbh

#

yeah

vivid barn Dec 15, 2025, 9:53 PM

#

I'm compelled to just write it off as docs being a meme

vivid barn Dec 15, 2025, 9:53 PM

#

vivid barn but it'd be pretty weird because there's no replacement for these apis for cases...

^

cloud rivet Dec 15, 2025, 9:53 PM

#

well I asked about it

#

https://forums.developer.nvidia.com/t/using-cuda-texture-apis-with-optix-and-the-legacy-cuda-programming-guide/354581/3

#

just a random person responded, no nvidia devs

vivid barn Dec 15, 2025, 9:54 PM

#

in any case there are reasons to give up hw sampling sure

#

at which point you can also kinda give up hw doing format conversion

#

e.g. if you one day were to decide to implement neural textures you wouldn't be using hw sampling

#

though you'd still benefit from using surface apis to write out results of your renderer

cloud rivet Dec 15, 2025, 9:55 PM

#

how do you give up on hw sampling? can you do sampling with block compression image formats?

#

or just don't use compression?

vivid barn Dec 15, 2025, 9:55 PM

#

you nih your own conversion as seen in neural textures

cloud rivet Dec 15, 2025, 9:55 PM

#

why is that better

vivid barn Dec 15, 2025, 9:56 PM

#

well for a start it lets you not use hw formats but nih your own which might be better

#

neural textures are built on observation that you don't just compress like 3 or 4 channels at once but like 15

#

(the texture sets, like beside rgb base color you also have normal map, microfacet roughness, blend between metallic and dielectric, etc)

cloud rivet Dec 15, 2025, 9:57 PM

#

ohh

vivid barn Dec 15, 2025, 9:57 PM

#

those channels have a lot of correlation

#

so nihing your own format lets you exploit that correlation and potentially win over hw BCn etc compression

#

a prerequisite to giving up on hw sampling is usually switching to stochastic filtering

#

(see filtering after shading)

#

if you're migrating away from hw sampling you usually do that step first

#

that entails replacing your tex2DGrads with texel fetches of randomly chosen texels

#

what were weights in the weighted sum of aniso box filter become PMFs of texels so it's pretty straightforward in that regard
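
A minimal sketch of the "weights become PMFs" idea, simplified to a plain bilinear footprint rather than the anisotropic box filter mentioned above. The names `Texel` and `pick_bilinear_texel` are hypothetical, and texture addressing (wrap/clamp) is left out. Instead of blending four texels, one texel is fetched, chosen with probability equal to its bilinear weight, so the average over many samples matches the filtered result.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Texel { int x; int y; };

// Picks one of the four texels around (u, v) with probability equal to its
// bilinear weight. u and v are in texel units with texel centers at n + 0.5;
// xi is a uniform random number in [0, 1).
Texel pick_bilinear_texel(float u, float v, float xi) {
  assert(xi >= 0.0f && xi < 1.0f);
  const float fx = u - 0.5f;
  const float fy = v - 0.5f;
  const int x0 = static_cast<int>(std::floor(fx));
  const int y0 = static_cast<int>(std::floor(fy));
  const float tx = fx - static_cast<float>(x0);
  const float ty = fy - static_cast<float>(y0);

  // The same four weights a bilinear filter would use; they sum to 1,
  // so they can be read as a probability mass function.
  const std::array<float, 4> pmf = {
      (1.0f - tx) * (1.0f - ty), tx * (1.0f - ty),
      (1.0f - tx) * ty,          tx * ty};

  // Walk the running sum (CDF) until it passes xi.
  float cdf = 0.0f;
  for (int i = 0; i < 4; ++i) {
    cdf += pmf[i];
    if (xi < cdf) return {x0 + (i & 1), y0 + (i >> 1)};
  }
  return {x0 + 1, y0 + 1};  // only reached through floating-point rounding
}
```

In a renderer the returned coordinates would feed a plain texel fetch, with `xi` coming from whatever per-sample random sequence is already in use.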

#

neural textures though are kind of a meme tbh

#

don't bother with those

#

filtering after shading is useful though

cloud rivet Dec 15, 2025, 10:01 PM

#

will take a look thanks

#

I have to look up a few of these things

cloud rivet Dec 15, 2025, 10:20 PM

#

so thinking through how to set up a scene, just as first attempt with very minimal materials I am thinking of this approach:

typedef struct ps_texture_t {
  cudaTextureObject_t texture;
  cudaArray_t pixels;
} ps_texture_t;

typedef struct ps_sampler_t {
  bool linear_filtering;
} ps_sampler_t;

typedef struct ps_material_t {
  ps_texture_t base_color_texture;
  ps_sampler_t base_color_sampler;
} ps_material_t;

typedef struct ps_triangle_material_t {
  size_t material_index;
} ps_triangle_material_t;

typedef struct ps_mesh_t {
  ps_vertex_attributes_t *vertices;
  uint3 *indices;
  ps_triangle_material_t tri_material_map;
  CUdeviceptr d_vertices;
  CUdeviceptr d_indices;
  CUdeviceptr d_gas_output;
  OptixTraversableHandle gas_handle;
} ps_mesh_t;

typedef struct ps_node_t {
  ps_m4x4 transform;
  ps_node_t *child_nodes;
  OptixInstance *instance;
} ps_node_t;

typedef struct ps_scene_data_t {
  ps_material_t *materials;
  ps_mesh_t *meshes;
  ps_node_t *nodes;
  ps_node_t *root;
  OptixTraversableHandle ias_handle;
} ps_scene_data_t;

vivid barn Dec 15, 2025, 10:21 PM

#

btw consider making a typesafe wrapper for CUdeviceptr

#

or just use T*

cloud rivet Dec 15, 2025, 10:22 PM

#

why?

vivid barn Dec 15, 2025, 10:22 PM

#

because it gets really hard to follow at 2 levels of indirection already

#

same way void* is hard to follow

#

anyway

cloud rivet Dec 15, 2025, 10:23 PM

#

oh I see

vivid barn Dec 15, 2025, 10:23 PM

#

looks like roughly good general direction

cloud rivet Dec 15, 2025, 10:24 PM

#

I see so that I'm not passing a random pointer to the wrong function, and I get compile time safety
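
One possible shape for such a wrapper, as a sketch only. `raw_device_ptr` is a stand-in for `CUdeviceptr` so the snippet builds without the CUDA headers, and `device_ptr`, `vertex`, `triangle_indices`, and `vertex_buffer_bytes` are hypothetical names. The point is that the element type travels with the address, so mixing up buffers fails at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Stand-in for CUdeviceptr (an integer-sized device address).
using raw_device_ptr = unsigned long long;

// A device address tagged with the element type it points to.
template <typename T>
class device_ptr {
public:
  device_ptr() = default;
  // Explicit, so a raw address never converts silently.
  explicit device_ptr(raw_device_ptr raw) : raw_(raw) {}

  raw_device_ptr raw() const { return raw_; }

  // Address of the i-th element; the stride comes from T, not the caller.
  device_ptr element(std::size_t i) const {
    return device_ptr(raw_ + i * sizeof(T));
  }

private:
  raw_device_ptr raw_ = 0;
};

// Hypothetical element types for illustration.
struct vertex { float position[3]; float uv[2]; };
struct triangle_indices { std::uint32_t a, b, c; };

// Only accepts a vertex buffer; passing device_ptr<triangle_indices> or a
// raw address does not compile.
std::size_t vertex_buffer_bytes(device_ptr<vertex> vertices, std::size_t count) {
  assert(vertices.raw() != 0);
  return count * sizeof(vertex);
}
```

Calls into the driver or OptiX API would unwrap with `.raw()` at the boundary and stay typed everywhere else.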

spiral ice Dec 15, 2025, 10:49 PM

#

cloud rivet so thinking through how to set up a scene, just as first attempt with very minim...

Your node has a pointer to children, but doesn't have a count?

cloud rivet Dec 15, 2025, 10:51 PM

#

Oh yes I need counts

#

This was just me thinking through the structures

shut hornet Dec 15, 2025, 11:40 PM

#

vivid barn what

that we had basically full DX9 capability on the web 25 years ago, and the situation hasn't improved much

vivid barn Dec 15, 2025, 11:43 PM

#

ping @ jasperrlz about it I'm sure he'll instantly tell you where you're wrong

cloud rivet Dec 17, 2025, 2:15 PM

#

cloud rivet Dec 17, 2025, 2:44 PM

#

the video capture is creating all those artifacts when I go in

#

RT looks pretty good. There's no actual lighting in this, just the shadow trace

#

the lack of mipmaps though is brutal

hushed creek Dec 17, 2025, 2:55 PM

#

nice sponzer

cloud rivet Dec 17, 2025, 2:59 PM

#

thanks!

brisk chasm Dec 17, 2025, 3:00 PM

#

nice cubes too

tight torrent Dec 17, 2025, 3:00 PM

#

cloud rivet RT looks pretty good. There's no actual lighting in this, just the shadow trace

Is there any raster going on or is it all rt?

cloud rivet Dec 17, 2025, 3:01 PM

#

yeah there's no graphics pipeline in cuda

#

so it's all RT

tight torrent Dec 17, 2025, 3:01 PM

#

Yikes

#

Calculating the gradient for mipmaps was rough when I was doing full rt

#

Have fun

cloud rivet Dec 17, 2025, 3:02 PM

#

heh

cloud rivet Dec 17, 2025, 9:55 PM

#

I did a C++ metaprogramming thingy

template <typename T>
ps_internal void allocate_list(auto &list, i32 &count, T **ptr) {
  count = list.size();
  if (!count) {
    *ptr = nullptr;
    return;
  }
  *ptr = (T *)malloc(sizeof(T) * count);
  PS_MEM_CHECK(*ptr);
}

void ps_gltf_set_counts(ps_gltf_t *gltf, ps_asset_t *asset) {
  fastgltf::Asset &fgltf = gltf->asset.get();
  allocate_list(fgltf.scenes, asset->num_scenes, &asset->scenes);
  allocate_list(fgltf.nodes, asset->num_nodes, &asset->nodes);
  allocate_list(fgltf.meshes, asset->num_meshes, &asset->meshes);
  allocate_list(fgltf.textures, asset->num_textures, &asset->textures);
  allocate_list(fgltf.images, asset->num_images, &asset->images);
  allocate_list(fgltf.samplers, asset->num_samplers, &asset->samplers);
  allocate_list(fgltf.materials, asset->num_materials, &asset->materials);
}

#

I don't normally do those, but here it seemed like a nice thing to do

wraith urchin Dec 17, 2025, 10:01 PM

#

cloud rivet I did a C++ metaprogramming thingy ```C++ template <typename T> ps_internal voi...

You forgot to add requires( is_trivially_copyable_v<T> )

#

I think this code is UB if the type isn't trivially copyable

#

also why not return the pointer instead of taking a pointer to a pointer?

cloud rivet Dec 17, 2025, 10:07 PM

#

to set it to nullptr if size is empty

#

nothing is being copied here though

#

it's just setting a count and allocating memory

#

my gltf code had so many bugs, the overall approach worked, it was just death by a thousand cuts, and I bled through so many. it's working well now though

#

misplaced increments, wrong pointers used, all kinds of trivial issues that just caused illegal pointer use errors or cuda checks to fail with no helpful error messages, even with optix validation on

cloud rivet Dec 17, 2025, 10:30 PM

#

based on https://en.cppreference.com/w/cpp/language/classes.html#Trivially_copyable_class I think all my types are trivially copyable

#

I just use structs, pointers and scalar values

wraith urchin Dec 17, 2025, 10:32 PM

#

cloud rivet I just use structs, pointers and scalar values

yeah it doesn't hurt to put that in the requires clause, then you will at least get an error if you try to pass it anything else
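
A sketch of how the two suggestions (constrain the type, return the pointer) could look together. This is not the project's code: it uses `std::enable_if_t` so it also builds as C++17, whereas the C++20 spelling from the chat would be `requires(std::is_trivially_copyable_v<T>)`, and `int` stands in for the project's `i32`.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>
#include <vector>

// Sets `count` from the source list and returns uninitialized storage for
// that many T, or nullptr when the list is empty. The enable_if guard
// rejects types that are not trivially copyable at compile time.
template <typename T, typename List,
          typename = std::enable_if_t<std::is_trivially_copyable_v<T>>>
T *allocate_list(const List &list, int &count) {
  count = static_cast<int>(list.size());
  if (count == 0) {
    return nullptr;
  }
  T *ptr = static_cast<T *>(std::malloc(sizeof(T) * count));
  assert(ptr != nullptr);  // stand-in for the project's PS_MEM_CHECK
  return ptr;
}
```

A call site would read `asset->nodes = allocate_list<ps_node_t>(fgltf.nodes, asset->num_nodes);`, assuming those field names, and the empty case still ends up as `nullptr`.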

cloud rivet Dec 17, 2025, 10:41 PM

#

current project scale. I quarantined fastgltf and sdl_image to cpp files, because they slow down compile time and require std and C++ semantics I don't like

#

fastgltf also had a bunch of problems in .cu files

#

I'm gonna add tinyexr, a sky and then work on ray differentials I guess

wraith urchin Dec 18, 2025, 4:04 AM

#

Is the EXR for the skybox?

cloud rivet Dec 18, 2025, 4:16 AM

#

yes

#

alright I think I'm done researching, time to write the code

shut hornet Dec 18, 2025, 4:27 AM

#

cloud rivet I'm gonna add tinyexr, a sky and then work on ray differentials I guess

i have that, it was simple to add, but honestly i like .hdr format better

cloud rivet Dec 18, 2025, 4:29 AM

#

why do you like .hdr better?

astral hinge Dec 18, 2025, 4:30 AM

#

it has a cooler name

cloud rivet Dec 18, 2025, 4:30 AM

#

the optix sdk has a tinyexr example is why I am using it, and I don't really know what I'm doing so I need that crutch

wraith urchin Dec 18, 2025, 4:31 AM

#

Isn't .hdr like some borked thing with a shared exponent?
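
For context, a sketch of what "shared exponent" means in the Radiance `.hdr` (RGBE) pixel encoding: three 8-bit mantissas plus one 8-bit exponent that applies to all three channels. The function name `decode_rgbe` and the `Rgb` struct are hypothetical; the scale factor follows the commonly used reference decoding, and some decoders add 0.5 to the mantissa before scaling.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb { float r, g, b; };

// Decodes one RGBE pixel: each channel is mantissa * 2^(e - 136),
// i.e. (mantissa / 256) * 2^(e - 128). An exponent byte of 0 means black.
Rgb decode_rgbe(std::uint8_t r, std::uint8_t g, std::uint8_t b, std::uint8_t e) {
  if (e == 0) {
    return {0.0f, 0.0f, 0.0f};
  }
  const float scale = std::ldexp(1.0f, static_cast<int>(e) - (128 + 8));
  return {r * scale, g * scale, b * scale};
}
```

That is 32 bits per pixel, which is where a size advantage over uncompressed float data comes from; the cost is that dim channels next to a bright one keep fewer significant bits.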

cloud rivet Dec 18, 2025, 4:31 AM

#

what does ASO use?

wraith urchin Dec 18, 2025, 4:31 AM

#

EXR

#

16bit PNG is also supported

cloud rivet Dec 18, 2025, 4:32 AM

#

I have a tiny bit of understanding of the exr format, as in my previous project I was investigating writing my own exr library, so I read through the spec a bit and played around with the cli tool

wraith urchin Dec 18, 2025, 4:32 AM

#

I use OpenEXR though, not tinyexr

cloud rivet Dec 18, 2025, 4:32 AM

#

the problem with any of this NIH stuff and images is always having to write compression code

shut hornet Dec 18, 2025, 4:32 AM

#

cloud rivet why do you like .hdr better?

STB Image loads HDR, you don't need another library

cloud rivet Dec 18, 2025, 4:33 AM

#

I don't use stb

wraith urchin Dec 18, 2025, 4:33 AM

#

although for the version on steam I ship it as BC6H inside LZ4

shut hornet Dec 18, 2025, 4:33 AM

#

HDR is also about 1/3rd the file size (for same resolution) as EXR

wraith urchin Dec 18, 2025, 4:33 AM

#

I just use EXR for dev because it's supported by GIMP & Blender

shut hornet Dec 18, 2025, 4:34 AM

#

HDR is as well

cloud rivet Dec 18, 2025, 4:34 AM

#

my image libraries are going to be SDL_image, tinyexr (assuming it works), and NVTT and hopefully that's all I'll ever need

wraith urchin Dec 18, 2025, 4:34 AM

#

The reason it's 1/3 the size is because it loses data lol

shut hornet Dec 18, 2025, 4:34 AM

#

yes, it's compressed

wraith urchin Dec 18, 2025, 4:34 AM

#

Lossy compression

shut hornet Dec 18, 2025, 4:34 AM

#

but 33% the size is significant

cloud rivet Dec 18, 2025, 4:34 AM

#

exr also has compression options

#

which compression are you comparing it to I guess

wraith urchin Dec 18, 2025, 4:34 AM

#

But I never ship the EXR so the size doesn't matter

cloud rivet Dec 18, 2025, 4:35 AM

#

exr doesn't just have a default compression

wraith urchin Dec 18, 2025, 4:35 AM

#

the lossy compression stage at the end is gonna be BC6H anyways

#

And you definitely don't wanna do two rounds of lossy compression

shut hornet Dec 18, 2025, 4:35 AM

#

fair

#

in my case i didn't see a difference, but it depends what you are doing

cloud rivet Dec 18, 2025, 4:36 AM

#

idk I appreciate learning about .hdr, thank you

wraith urchin Dec 18, 2025, 4:36 AM

#

EXR has two different options for compression

cloud rivet Dec 18, 2025, 4:36 AM

#

it has a bunch based on my experience with the cli tool

#

idk for sure

shut hornet Dec 18, 2025, 4:36 AM

#

probably i guess I can leave EXR in there since it already works but I'd prefer less moving parts

cloud rivet Dec 18, 2025, 4:36 AM

#

maybe I was mistaken

wraith urchin Dec 18, 2025, 4:37 AM

#

Oh wait actually it does have more

cloud rivet Dec 18, 2025, 4:37 AM

#

yeah

wraith urchin Dec 18, 2025, 4:37 AM

#

I only ever used zip & piz

cloud rivet Dec 18, 2025, 4:37 AM

#

tinyexr also documents a bunch of compression options

#

which I think are just a subset

#

if you look at the open exr tool

#

its cli help is like the clang compiler

#

massive

wraith urchin Dec 18, 2025, 4:38 AM

#

Ok so I'm using Imf::PIZ_COMPRESSION

#

To save

cloud rivet Dec 18, 2025, 4:38 AM

#

I'm just using whatever I downloaded from polyhaven

shut hornet Dec 18, 2025, 5:14 AM

#

cloud rivet I'm just using whatever I downloaded from polyhaven

they have both exr and hdr

cloud rivet Dec 18, 2025, 6:21 AM

#

ok

#

sun is glaring

#

everything looks a little nicer now

#

ok I guess I have to learn what ray differentials are now

#

at some point I should compact these acceleration structures too

#

and use mesh optimizer

#

I feel like I'm on track still to be back to working on my game before the end of the month

astral hinge Dec 18, 2025, 6:31 AM

#

cloud rivet ok I guess I have to learn what ray differentials are now

that paper you linked is good

cloud rivet Dec 18, 2025, 6:32 AM

#

https://link.springer.com/content/pdf/10.1007/978-1-4842-4427-2_20.pdf this one yes

#

I will read through it

astral hinge Dec 18, 2025, 6:33 AM

#

the basic idea for ray differentials is to intersect two imaginary rays (one above the current pixel, and one to the right) with the plane of the surface hit by the primary ray

cloud rivet Dec 18, 2025, 6:33 AM

#

will this require I start resorting to a cache

astral hinge Dec 18, 2025, 6:33 AM

#

then you can get the uv of the imaginary rays and then calculate the partial derivatives through shrimple subtraction

astral hinge Dec 18, 2025, 6:34 AM

#

cloud rivet will this require I start resorting to a cache

wdym

#

the code for what I'm describing is like 5 loc I think

cloud rivet Dec 18, 2025, 6:35 AM

#

you know how ray tracing renders look noisy and then increase in detail if you remain still

#

there's some cache yes

#

I think I see that in your game?

#

idk

astral hinge Dec 18, 2025, 6:37 AM

#

instead of

auto ray = MakeRay(coord);
auto hit = TraceRay(ray);
auto color = Sample(hit.texture, hit.uv);

you do

auto ray = MakeRay(coord);
auto rayDx = MakeRay(coord + {1, 0});
auto rayDy = MakeRay(coord + {0, 1});
auto hit = TraceRay(ray);
auto hitDx = IntersectPlane(rayDx, hit.position, hit.normal);
auto hitDy = IntersectPlane(rayDy, hit.position, hit.normal);
auto uvDx = hitDx.uv - hit.uv;
auto uvDy = hitDy.uv - hit.uv;
auto color = SampleGrad(hit.texture, hit.uv, uvDx, uvDy);

#

you can see the extra two fake rays I added
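
The pseudocode above can be fleshed out on the CPU as a rough sketch. Everything here (`Vec3`, `Ray`, `Triangle`, `intersect_plane`, `uv_at`, `uv_gradients`) is a hypothetical helper, not an OptiX or CUDA API. The offset rays are intersected with the plane of the hit triangle, UVs are extrapolated barycentrically at those points, and the differences to the hit UV are the gradients a `tex2DGrad`-style lookup would take.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float u, v; };
struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Ray { Vec3 origin, dir; };
struct Triangle { Vec3 p0, p1, p2; Vec2 uv0, uv1, uv2; };
struct UvGradients { Vec2 ddx, ddy; };

// Point where the ray meets the plane through `point` with normal `normal`.
// Assumes the ray is not parallel to the plane.
Vec3 intersect_plane(const Ray &ray, Vec3 point, Vec3 normal) {
  const float denom = dot(ray.dir, normal);
  assert(std::fabs(denom) > 1e-8f);
  const float t = dot(point - ray.origin, normal) / denom;
  return ray.origin + ray.dir * t;
}

// UV at a point in the triangle's plane, by barycentric interpolation.
// The point may lie outside the triangle; UVs are extrapolated linearly.
Vec2 uv_at(const Triangle &tri, Vec3 p) {
  const Vec3 e1 = tri.p1 - tri.p0, e2 = tri.p2 - tri.p0, d = p - tri.p0;
  const float d11 = dot(e1, e1), d12 = dot(e1, e2), d22 = dot(e2, e2);
  const float dp1 = dot(d, e1), dp2 = dot(d, e2);
  const float det = d11 * d22 - d12 * d12;
  const float b1 = (d22 * dp1 - d12 * dp2) / det;
  const float b2 = (d11 * dp2 - d12 * dp1) / det;
  const float b0 = 1.0f - b1 - b2;
  return {b0 * tri.uv0.u + b1 * tri.uv1.u + b2 * tri.uv2.u,
          b0 * tri.uv0.v + b1 * tri.uv1.v + b2 * tri.uv2.v};
}

// UV change per one-pixel step in x and y, from the two offset rays.
UvGradients uv_gradients(const Triangle &tri, Vec3 hit_pos,
                         const Ray &ray_dx, const Ray &ray_dy) {
  const Vec3 normal = cross(tri.p1 - tri.p0, tri.p2 - tri.p0);
  const Vec2 uv = uv_at(tri, hit_pos);
  const Vec2 uv_x = uv_at(tri, intersect_plane(ray_dx, hit_pos, normal));
  const Vec2 uv_y = uv_at(tri, intersect_plane(ray_dy, hit_pos, normal));
  return {{uv_x.u - uv.u, uv_x.v - uv.v}, {uv_y.u - uv.u, uv_y.v - uv.v}};
}
```

No extra traversal is needed for the two offset rays: they only hit a plane analytically, which is why the cost is a handful of dot products per shaded hit.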

cloud rivet Dec 18, 2025, 6:38 AM

#

thank you

astral hinge Dec 18, 2025, 6:38 AM

#

cloud rivet there's some cache yes

oh that is temporal denoising which is often used with path tracing

#

my game uses DDGI but it still has some temporal stuff to make it work good. it doesn't have the characteristic "noisy then smooth" artifact though

#

ray differentials (which I don't even have btw) are just for nice texture sampling

#

and don't require anything but a little bit of math in the shader

cloud rivet Dec 18, 2025, 6:40 AM

#

https://github.com/NVIDIAGameWorks/RTXGI-DDGI/blob/main/docs/Algorithms.md

astral hinge Dec 18, 2025, 6:40 AM

#

astral hinge instead of ```cc auto ray = MakeRay(coord); auto hit = TraceRay(ray); auto color...

I think the paper you linked has something almost exactly like this which I'm just recalling

cloud rivet Dec 18, 2025, 6:41 AM

#

oh ok

astral hinge Dec 18, 2025, 6:41 AM

#

but writing it myself helps solidify it in my brain 😌

cloud rivet Dec 18, 2025, 6:41 AM

#

that's why I want too

#

I don't want any black boxes I don't understand

astral hinge Dec 18, 2025, 6:44 AM

#

your paper has another technique based on ray cones, but I didn't read all of the paper to understand how the cones come into existence

#

I'm guessing it is simpler though

cloud rivet Dec 18, 2025, 6:45 AM

#

I'm concerned a bit with perf

astral hinge Dec 18, 2025, 6:45 AM

#

does optix or cuda give you a function for sampling with gradients? or do you have to implement anisotropic filtering yourself?

cloud rivet Dec 18, 2025, 6:46 AM

#

cuda has something

#

yes

#

tex2DGrad and tex2DLod

#

or something

astral hinge Dec 18, 2025, 6:46 AM

#

ok nice, that's the only part that would destroy perf if you had to implement it manually

#

but doing it in hardware is practically free

cloud rivet Dec 18, 2025, 6:47 AM

#

auto ray = MakeRay(coord);
auto hit = TraceRay(ray);
auto color = Sample(hit.texture, hit.uv);

#

this isn't how my shaders work fwiw

#

the first part is in the raygen

#

the hit then does the sample

#

in your example you seem to be sampling the texture in the same place as where the trace is being called

#

which is the raygen shader

astral hinge Dec 18, 2025, 6:48 AM

#

where do you have the texture sampling?

cloud rivet Dec 18, 2025, 6:48 AM

#

in the closest hit shader

astral hinge Dec 18, 2025, 6:49 AM

#

you can pass it the info necessary to spawn the imaginary rays

#

which is just the pixel coordinate tbh

cloud rivet Dec 18, 2025, 6:49 AM

#

well

#

there are some optix functions I can call in the hit shaders

#

that can probably give me this yes

#

stuff like

const float3 ray_dir = optixGetWorldRayDirection();

const float3 P = optixGetWorldRayOrigin() + optixGetRayTmax() * ray_dir;

astral hinge Dec 18, 2025, 6:51 AM

#

hmm I'm not sure how optix works exactly. how can you pass info to hit shaders?

cloud rivet Dec 18, 2025, 6:51 AM

#

a few different ways

#

I can over allocate the buffer for the acceleration structure and stick data there, I can vary the vertex stride and stick data there adjacent to the vertex, I can allocate device memory and stick pointers to that memory on the shader binding table records, and then there's a global param I can stick pointers and scalar values on

#

I don't really have any problem getting whatever data I need

#

in terms of application data

#

the data available in the context of the ray trace execution all comes via optix functions though

#

I don't know of a way to pass data directly from ray gen to the hit shader, I think that would be problematic

#

hrm

#

although optixTrace might let me attach something

astral hinge Dec 18, 2025, 6:55 AM

#

in vulkan rt there is a small piece of data that can be passed between stages

#

I forgor what it's called

cloud rivet Dec 18, 2025, 6:56 AM

#

oh yeah there's a payload

#

yeah

astral hinge Dec 18, 2025, 6:56 AM

#

ah yeah that's what it's called in vk

cloud rivet Dec 18, 2025, 6:56 AM

#

I can do that too

#

I haven't had a need for it

astral hinge Dec 18, 2025, 6:56 AM

#

so I think the payload is how you're "supposed" to pass, like, the pixel coord that spawned the ray

#

then you can use that in the hit shader to spawn the fake rays for calculating derivatives
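
Since payload values are 32-bit unsigned integers, one way to hand the pixel coordinate to the hit shader is to pack it into a single value. This is a host-side sketch with hypothetical helper names (`pack_pixel`, `unpack_pixel`), assuming both coordinates fit in 16 bits; the same bit operations would work unchanged in device code.

```cpp
#include <cassert>
#include <cstdint>

struct Pixel { std::uint32_t x, y; };

// Packs a pixel coordinate into one 32-bit payload value: y in the high
// 16 bits, x in the low 16 bits. Assumes a launch size below 65536 per axis.
std::uint32_t pack_pixel(std::uint32_t x, std::uint32_t y) {
  assert(x < 65536u && y < 65536u);
  return (y << 16) | x;
}

// Inverse of pack_pixel, for use on the receiving side.
Pixel unpack_pixel(std::uint32_t packed) {
  return {packed & 0xFFFFu, packed >> 16};
}
```

Using one register for the coordinate leaves the remaining payload registers free for the values passed back, such as the color.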

cloud rivet Dec 18, 2025, 6:57 AM

#

oh perfect, thank you

#

how's your game going?

astral hinge Dec 18, 2025, 6:58 AM

#

I'm distracted trying to implement something like the api described here
https://www.sebastianaaltonen.com/blog/no-graphics-api

#

idk how far I'll take it. it just seems neat

astral hinge Dec 18, 2025, 6:59 AM

#

cloud rivet oh yeah there's a payload

ok so it seems like you can pass data in both directions with these

#

which is handy

cloud rivet Dec 18, 2025, 7:00 AM

#

oh

#

yes

astral hinge Dec 18, 2025, 7:00 AM

#

that means you can do the shading in the raygen shader if you pass the hit info back from the hit shader

cloud rivet Dec 18, 2025, 7:00 AM

#

that's what I do, return the colors

#

sorry

astral hinge Dec 18, 2025, 7:00 AM

#

but having a big payload is limiting for perf so maybe you want to shade in the hit shader anyway

cloud rivet Dec 18, 2025, 7:01 AM

#


static __forceinline__ __device__ void setPayload(float3 p) {
  optixSetPayload_0(__float_as_uint(p.x));
  optixSetPayload_1(__float_as_uint(p.y));
  optixSetPayload_2(__float_as_uint(p.z));
}

#

is how this works in my project
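
What `__float_as_uint` does there is reinterpret the float's bits, not convert its value, and the raygen side undoes it with `__uint_as_float` on the values it passed to `optixTrace`. A host-side sketch of that round trip, using `memcpy`-based stand-ins (the names `float_as_uint` and `uint_as_float` are hypothetical equivalents of the CUDA intrinsics):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw IEEE-754 bit pattern of f (no numeric conversion).
std::uint32_t float_as_uint(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Inverse: rebuilds the float from its bit pattern.
float uint_as_float(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

A plain cast like `static_cast<std::uint32_t>(0.25f)` would truncate to 0, which is why the bit reinterpretation is needed on both ends.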

astral hinge Dec 18, 2025, 7:01 AM

#

noice

cloud rivet Dec 18, 2025, 7:02 AM

#

I just write data back, but I hadn't considered reading from it in the hit shader I guess

#

and it's done via those functions so I didn't associate it with the param I guess

astral hinge Dec 18, 2025, 7:04 AM

#

I'm not sure whether it's better to do the shading in the hit shader or the raygen shader

#

I think it might be cleaner to have the raygen shader do everything

cloud rivet Dec 18, 2025, 7:04 AM

#

my hit shader does its own optixTraverse to see if it is occluded

astral hinge Dec 18, 2025, 7:05 AM

#

astral hinge I think it might be cleaner to have the raygen shader do everything

also it'd map better to the code I posted smart

cloud rivet Dec 18, 2025, 7:05 AM

#

yes

#

hrm

astral hinge Dec 18, 2025, 7:08 AM

#

oh I think perf favors the big raygen shader because you can do shader execution reordering at a finer granularity

#

I don't think SER is supported in other stages

cloud rivet Dec 18, 2025, 7:09 AM

#

I have only briefly read about reordering

#

optixReorder is available in ray gen only in optix

cloud rivet Dec 18, 2025, 7:15 AM

#

astral hinge I'm distracted trying to implement something like the api described here <https:...

I like this doc you linked to

#

man

#

yes

#

you're building this yourself on top of vulkan?

astral hinge Dec 18, 2025, 7:18 AM

#

yeah I don't think it will be hard

cloud rivet Dec 18, 2025, 7:18 AM

#

I love this document

#

I am going to reread it a bunch

astral hinge Dec 18, 2025, 7:33 AM

#

it's 18000 words btw

cloud rivet Dec 18, 2025, 7:33 AM

#

yes

#

I skimmed it and decided I wanted to read it in detail

#

over time

astral hinge Dec 18, 2025, 7:34 AM

#

it's mostly history tbh

#

it gives context for the api simplifications he makes

vivid barn Dec 18, 2025, 7:39 AM

#

cloud rivet you're building this yourself on top of vulkan?

I already have?

cloud rivet Dec 18, 2025, 7:40 AM

#

nice

vivid barn Dec 18, 2025, 7:40 AM

#

years ago

#

I'm going to post something toxic for your eyes (go)

#

also slang

cloud rivet Dec 18, 2025, 7:42 AM

#

oh no

vivid barn Dec 18, 2025, 7:42 AM

#

public struct MeshPart {
    public void *triangles;
    public uint32_t num_triangles;
    public float3 *posBuffer;
    public float3 *normals;
}

// TODO: kill kill kill
public struct MaterialParams {
    public interpreterProgram program;

    public MeshPart meshPart;

    // must be aligned to 8-byte boundary
    public uint8_t[256] args;
}

public struct Scene {
    public int64_t maxPartsPerMesh;

    public Slice<MaterialParams> materialParams;

    public Slice<AccelInstance> accelInstances;

    public Accel accel;

    public lightAccel lightAccel;

    public Sampler sampler;

    // there's more fields here, consult scene.go
};

#

many indirections

#

public struct interpreterProgram {
    public interpreterABI abi;
    public uint32_t *code;
}

#

abi should not be called abi but something more specific and be behind an indirection too

#

host side is very simple

#

func NewInterpretedMaterial(blob *matc.InterpretedMaterial) *InterpretedMaterial {
    device := gpu.MakeSliceUncached[uint32](len(blob.Code))
    copy(device.Value(), blob.Code)

    return &InterpretedMaterial{
        emissive: blob.ABI.EDFCount > 0,
        program: material.InterpreterProgram{
            ABI:  blob.ABI,
            Code: gpu.SliceData(device),
        },
    }
}

#

gpu.MakeSliceUncached thingy and that's it

#

.Value() gets you host []T from a gpu.Slice[T]

#

unfortunately I have to deal with this cope until I'm able to have addresses match between host and device address spaces (or better yet have just one)

#

this is not something I cooked up in my go abstraction btw

#

I had similar thingy in my C stuff before

#

https://gitlab.freedesktop.org/nanokatze/gpu-hello-world/-/blob/main/main.c?ref_type=heads#L144-145 this doesn't particularly show anything because it literally only works for one (1) allocation but it's the same interface I used in my C stuff in the past

#

that bit of code is just something I used to poke the driver with

#

https://github.com/nanokatze/worldspawn/blob/main/src/gpu/memory.go#L73-L87

#

this is how I map device pointers to host address space in my toy program

#

it's horrendous yes

#

and it's very easy to do better

#

with e.g. a radix tree

#

I just don't care atm

cloud rivet Dec 18, 2025, 7:48 AM

#

is it slow?

vivid barn Dec 18, 2025, 7:48 AM

#

probably

#

I mean my app barely spends any time on cpu atm

#

also I should stop calling my thing uncached

#

I haven't been using ~HOST_CACHED type in forever

#

there doesn't seem to be a penalty for it on either deck or on the puter with gigachungus igp

cloud rivet Dec 18, 2025, 7:52 AM

#

seems fine then

#

that's a nice API

vivid barn Dec 18, 2025, 7:53 AM

#

it's really not it'd be nicer if I could have things in the same address space

#

like cudaMallocManaged

cloud rivet Dec 18, 2025, 7:53 AM

#

I don't think everyone should have to write their own

vivid barn Dec 18, 2025, 7:53 AM

#

yes there should be a library for it I agree

#

or better yet we should have demand paging so we can just use our normal allocator we hammer on host all the time

#

i.e. malloc
but this is me asking too much tbh

cloud rivet Dec 18, 2025, 7:54 AM

#

once I get to reading through the cuda programming guide I will clean up my memory use

#

I think I will have to rewrite some code

vivid barn Dec 18, 2025, 7:57 AM

#

astral hinge I'm distracted trying to implement something like the api described here <https:...

which parts are you trying to implement exactly

#

I might have had some of them implemented already

astral hinge Dec 18, 2025, 7:58 AM

#

well none of it looks particularly hard to implement

#

it's just a reduced api surface basically

#

plus a hashmap for the pointer thingy

vivid barn Dec 18, 2025, 7:58 AM

#

uhh

#

you can't use hashmap

astral hinge Dec 18, 2025, 7:58 AM

#

oh actually the pointer thing is gonna be harder yeah

vivid barn Dec 18, 2025, 7:58 AM

#

you have to use some kind of interval mapper

astral hinge Dec 18, 2025, 7:58 AM

#

yeah

#

that's probably easy though ididnotread

vivid barn Dec 18, 2025, 7:58 AM

#

sorted list of intervals is the shrimplest

#

it is

#

radix tree is more annoying

astral hinge Dec 18, 2025, 7:59 AM

#

yeah that's ez

vivid barn Dec 18, 2025, 7:59 AM

#

I'm p happy about my hw image api

astral hinge Dec 18, 2025, 7:59 AM

#

ong

vivid barn Dec 18, 2025, 7:59 AM

#

shame I hate hw images

#

are you switching to shader bobject too frog_pregnant

astral hinge Dec 18, 2025, 8:00 AM

#

idk I'm going to see what I can do with just dynamic state

#

uh I think shader object has renderdoc support now

#

and descriptor buffer

shut hornet Dec 18, 2025, 8:01 AM

#

astral hinge it's 18000 words btw

i started reading it, but no attention span left

vivid barn Dec 18, 2025, 8:01 AM

#

descriptor buffer is honestly a meme

#

unless you're dxvk or something then it's very good

astral hinge Dec 18, 2025, 8:01 AM

#

yeah I don't personally think it's very useful for me

#

but since this is an exercise in reducing api surface it might be interesting to mess around with

vivid barn Dec 18, 2025, 8:02 AM

#

in my toy program's abstraction I have very uhh

#

glbrained

#

image api

astral hinge Dec 18, 2025, 8:03 AM

#

how so

vivid barn Dec 18, 2025, 8:03 AM

#

and like you don't need descriptors for anything else so that's the only thing that needs addressing

#

https://github.com/nanokatze/worldspawn/blob/main/src/gpu/image_test.go#L11

#

so you create an image

#

but actually it creates an image + image view

#

so the resulting object is immediately usable for e.g. descriptors and stuff

astral hinge Dec 18, 2025, 8:04 AM

#

oh yes I was actually thinking of having a hidden chungus descriptor set instead of exposing any descriptor stuff

#

and you just get resource indices

vivid barn Dec 18, 2025, 8:04 AM

#

yeah

#

ok right you're in C++

#

you can do better than resource index cope I think

#

or rather hide it away

astral hinge Dec 18, 2025, 8:04 AM

#

I can have decent code sharing to make it nice to work with

vivid barn Dec 18, 2025, 8:04 AM

#

ok hear me out

astral hinge Dec 18, 2025, 8:04 AM

#

like a function in c++ that returns Image2D and an equivalent type (with the same memory layout) in glsl

#

I already have that in my game

vivid barn Dec 18, 2025, 8:05 AM

#

what if you had your own object type (like I have Image in my toy abstraction)

#

but you allocated it out of a huge std::vector<Image>

astral hinge Dec 18, 2025, 8:05 AM

#

hmm like a VkJaker?

vivid barn Dec 18, 2025, 8:05 AM

#

no

#

https://github.com/nanokatze/worldspawn/blob/main/src/gpu/image.go#L132

#

like this idk this is already basically a VkImageView (well it just has two descriptors)

#

anyway

#

you could allocate it out of a chungus vector<Image>

#

pass the host pointer as is to the gpu

astral hinge Dec 18, 2025, 8:06 AM

#

hmm why would I do that

vivid barn Dec 18, 2025, 8:06 AM

#

do some math like

#

(pointer - where_that_vector_starts) / sizeof(Image)

#

????????

#

descriptor index.
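
A minimal sketch of the pointer-to-index arithmetic described above, assuming a fixed-capacity pool so element addresses never move. `Image`, `ImagePool` and `indexFromAddress` are hypothetical names, and the two descriptor fields are placeholder payload, not the real layout. `indexFromAddress` is the byte-address form a shader would have to evaluate if it received the raw host pointer plus the pool base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image object; the two fields stand in for whatever it really holds.
struct Image {
    std::uint64_t sampledDescriptor;
    std::uint64_t storageDescriptor;
};

// Fixed-capacity pool: the vector is never resized, so Image* stays valid.
class ImagePool {
public:
    explicit ImagePool(std::size_t capacity) : images_(capacity) {}

    Image* at(std::size_t i) { return &images_[i]; }

    // Host side: pointer subtraction already divides by sizeof(Image).
    std::uint32_t descriptorIndex(const Image* image) const {
        return static_cast<std::uint32_t>(image - images_.data());
    }

private:
    std::vector<Image> images_;
};

// Device-side form of the same math, working on raw 64-bit addresses.
inline std::uint32_t indexFromAddress(std::uint64_t imageAddr, std::uint64_t poolBaseAddr) {
    return static_cast<std::uint32_t>((imageAddr - poolBaseAddr) / sizeof(Image));
}
```

With this layout the host pointer is the only identity an image has; the index is derived on demand rather than stored anywhere.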

#

@shut hornet cease and disperse

astral hinge Dec 18, 2025, 8:07 AM

#

I see but I don't think it helps a lot tbh

#

it moves complexity to the shader where it now has to think about the cpu pointer offset

vivid barn Dec 18, 2025, 8:07 AM

#

it helps in that you don't have several identities for your image objects, only host pointer

#

and in shader you hide that complexity somewhere

#

same as you'd on host

#

or

#

or

astral hinge Dec 18, 2025, 8:08 AM

#

true but I think I can smuggle the texture identity into a few bits of the descriptor index

vivid barn Dec 18, 2025, 8:08 AM

#

you could do the opposite

#

and have your Image be descriptor index on host

#

and have host methods look things up in some out of band vector<ActualImageStuff> array

#

so your class Image is just uint32_t index

vivid barn Dec 18, 2025, 8:09 AM

#

astral hinge true but I think I can smuggle the texture identity into a few bits of the descr...

my point is that with this you'd have the same identity for an image at all times

#

no Image* vs descriptor index dichotomy

astral hinge Dec 18, 2025, 8:09 AM

#

and I could do that with what I'm describing

vivid barn Dec 18, 2025, 8:09 AM

#

i c i c

astral hinge Dec 18, 2025, 8:09 AM

#

hmm

#

ok I am not describing what you are

#

I thought you were trying to avoid having Image2D, Image3D, Image2DMS, etc.

cloud rivet Dec 18, 2025, 8:10 AM

#

the big image vector thing reminds me of https://floooh.github.io/2018/06/17/handles-vs-pointers.html

Handles are the better pointers

28-Nov-2018: I’ve added a small update at the end on how to prevent ‘handle collisions’ with a per-slot generation counter

astral hinge Dec 18, 2025, 8:10 AM

#

but I think avoiding those isn't necessarily a good thing

vivid barn Dec 18, 2025, 8:10 AM

#

cloud rivet the big image vector thing reminds me of https://floooh.github.io/2018/06/17/han...

the title sounds gross already

vivid barn Dec 18, 2025, 8:10 AM

#

astral hinge but I think avoiding those isn't necessarily a good thing

yeah that's orthogonal to what I'm describing

cloud rivet Dec 18, 2025, 8:11 AM

#

group items of the same type into arrays, and treat the array base pointer as system-private
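
A rough sketch of that idea together with the per-slot generation counter mentioned in the link preview. This is not code from the article; `Handle`, `Pool` and all member names are made up for illustration. Items live in a private array, callers only hold an index plus a generation, and a stale handle fails to resolve once its slot has been reused.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Handle {
    std::uint32_t index;
    std::uint32_t generation;
};

template <typename T>
class Pool {
public:
    Handle create(const T& value) {
        std::uint32_t i;
        if (!free_.empty()) {
            i = free_.back(); // reuse a previously destroyed slot
            free_.pop_back();
        } else {
            i = static_cast<std::uint32_t>(slots_.size());
            slots_.emplace_back();
        }
        slots_[i].value = value;
        slots_[i].alive = true;
        return Handle{i, slots_[i].generation};
    }

    void destroy(Handle h) {
        if (get(h) == nullptr) return; // stale or invalid handle: ignore
        slots_[h.index].alive = false;
        ++slots_[h.index].generation;  // invalidates every outstanding handle to this slot
        free_.push_back(h.index);
    }

    // Returns nullptr when the handle is out of range, destroyed, or stale.
    T* get(Handle h) {
        if (h.index >= slots_.size()) return nullptr;
        Slot& s = slots_[h.index];
        if (!s.alive || s.generation != h.generation) return nullptr;
        return &s.value;
    }

private:
    struct Slot {
        T value{};
        std::uint32_t generation = 0;
        bool alive = false;
    };
    std::vector<Slot> slots_;         // the array base pointer never leaves this class
    std::vector<std::uint32_t> free_; // indices of reusable slots
};
```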

vivid barn Dec 18, 2025, 8:11 AM

#

yes

#

but

#

that's generally not a great thing to do/have imo

#

we're only doing it here because we're coping with the fact that some hw insists on having descriptors basically live in their own address space

#

(and the addresses are indices)

#

and that's how the apis are anyway

astral hinge Dec 18, 2025, 8:12 AM

#

your point is I can just pass &image to a shader and then, with just that and some secret extra info, sample the texture from the gpu

#

whereas with my original design I need to query an index from the image, then pass it to the shader

vivid barn Dec 18, 2025, 8:12 AM

#

astral hinge your point is I can just pass `&image` to a shader and then, with just that and ...

or, have your image itself be just an index, even on host

#

and store the actual data, which would be accessed by the methods of that type, out of band

#

but ye

astral hinge Dec 18, 2025, 8:13 AM

#

how would it look to write code that uses that?

#

I'm just trying to see how it simplifies things

vivid barn Dec 18, 2025, 8:14 AM

#

astral hinge how would it look to write code that uses that?

basically nothing would change except your images (image views) would become pass-by-value (because they're indices now) and you wouldn't be "getting an index", just passing the image into the device-accessed structures directly

#

so

astral hinge Dec 18, 2025, 8:15 AM

#

hmm ok

#

so that means moving the zoo of different image types to the cpu if I'm striving for type safety

vivid barn Dec 18, 2025, 8:16 AM

#

that's orthogonal

#

struct ActualFunStuff {
 VkImageView vkView;
};

BitAllocator allocator; /* placeholder: hands out free indices into actualFunStuff */
std::vector<ActualFunStuff> actualFunStuff;

class Image {
 uint32_t index; /* index into actualFunStuff, also what the device sees */

 /* e.g. */
 Image SubImage(Format format /* , other things for reinterpretation */) const {
  /* create new image view here */
 }

 /* other stuff you'd want to do with an image view on cpu */
};

astral hinge Dec 18, 2025, 8:16 AM

#

well how do I do .memberWhichIsAnImage2D = someImage on the cpu then, and have the types match

vivid barn Dec 18, 2025, 8:17 AM

#

ok yeah in that case you probably wanna do a bunch of typed wrappers I guess

#

Image2D etc
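
One way the typed wrappers could look, as a sketch only: `TypedImage`, the tag types and `MaterialParams` are invented names, and the assumption is that an image is nothing more than a 32-bit index on both host and device. Each wrapper is a distinct C++ type with the same memory layout, so assigning a multisampled image to an `Image2D` member does not compile, while the struct can still be copied byte-for-byte into device-visible memory.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Untyped image: just an index, passed by value.
struct Image {
    std::uint32_t index;
};

// Tag-parameterised wrapper; the tag exists only at compile time.
template <typename Tag>
struct TypedImage {
    Image image;
};

struct Tag2D {};
struct Tag2DMS {};
using Image2D = TypedImage<Tag2D>;
using Image2DMS = TypedImage<Tag2DMS>;

// Hypothetical device-accessed structure.
struct MaterialParams {
    Image2D baseColor;
    Image2D normal;
};

static_assert(sizeof(Image2D) == sizeof(std::uint32_t), "wrapper adds no storage");
static_assert(std::is_trivially_copyable<MaterialParams>::value, "safe to memcpy to the device");
static_assert(!std::is_convertible<Image2DMS, Image2D>::value, "image shapes do not mix silently");
```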

astral hinge Dec 18, 2025, 8:17 AM

#

it's just a detail tbh

#

not an important one

cloud rivet Dec 18, 2025, 8:19 AM

#

oh I guess I can have type safe image types in my project, I can typedef metallic/normal/base image shaders on the cpu when I extract them from fastgltf or wherever and then use those types in my materials that I read in the shader. right now I just have a bunch of cudaTextureObject_t

#

also a bit orthogonal

vivid barn Dec 18, 2025, 8:20 AM

#

that feels like the wrong "plane" to encode into the types

astral hinge Dec 18, 2025, 8:22 AM

#

hmm yeah there are definitely cases where you want an image object to represent different image shapes

#

like multisample vs not multisample

cloud rivet Dec 18, 2025, 8:22 AM

#

In February I spent a long time being frustrated with my normal maps because they ended up being srgb

astral hinge Dec 18, 2025, 8:22 AM

#

but otherwise I don't think anyone needs to replace a 2D image with a 3D image or something

cloud rivet Dec 18, 2025, 8:22 AM

#

and had visible artifacts

#

that I misunderstood as being something wrong with my math but it was just the dumb image format

vivid barn Dec 18, 2025, 8:23 AM

#

astral hinge but otherwise I don't think anyone needs to replace a 2D image with a 3D image o...

I mean the user (you) can just use union for when a single location might either have a 2D or 2DMS image in it, depending on some other knob

astral hinge Dec 18, 2025, 8:23 AM

#

I'm not sure encoding srgbness into the type is the solution but maybe it is

cloud rivet Dec 18, 2025, 8:23 AM

#

I don't know

vivid barn Dec 18, 2025, 8:24 AM

#

yeah it's probably not

astral hinge Dec 18, 2025, 8:24 AM

#

this might just be one of those problems where you suffer once and then the only solution is to never forget the lesson you learned

cloud rivet Dec 18, 2025, 8:26 AM

#

well if you all make a cool easy to use GPU API that makes life easier working with images and memory and open source it I would try it

#

I'm pretty happy with what I have so far in my project

vivid barn Dec 18, 2025, 8:30 AM

#

I have such api already

#

it's up on my github!

cloud rivet Dec 18, 2025, 8:34 AM

#

your toy project uses the worldspawn library?

vivid barn Dec 18, 2025, 8:34 AM

#

yes

#

my toy project is called worldspawn

#

it's just the gpu thing is sourcedropped

#

the rest is not open source although many of the components could be

#

I'll get to it some time

#

it's just that it needs to be sanitized

#

or the gazers might end up needing eyebleach

#

#1181368932637093888 is worldspawn tbc

astral hinge Dec 18, 2025, 8:54 AM

#

I think I'll just do compute and no graphics or presentation initially

#

all that stuff is bloat

vivid barn Dec 18, 2025, 8:54 AM

#

based

cloud rivet Dec 18, 2025, 9:41 AM

#

who needs mipmaps

#

this is totally fine

#

jk, jk

#

I think that's a good test right there

#

that view

#

also this one

cloud rivet Dec 18, 2025, 10:09 AM

#

so in the optix samples I see

#

#include <cuda/random.h>

#

const float2 subpixel_jitter =
        subframe_index == 0 ? make_float2( 0.5f, 0.5f ) : make_float2( rnd( seed ), rnd( seed ) );

#

this is in a raygen shader

#

 unsigned int seed = tea<4>( launch_idx.y * launch_dims.x + launch_idx.x, subframe_index );

    // The center of each pixel is at fraction (0.5,0.5)
    const float2 subpixel_jitter =
        subframe_index == 0 ? make_float2( 0.5f, 0.5f ) : make_float2( rnd( seed ), rnd( seed ) );

    const float2 d =
        2.0f
            * make_float2( ( static_cast<float>( launch_idx.x ) + subpixel_jitter.x ) / static_cast<float>( launch_dims.x ),
                           ( static_cast<float>( launch_idx.y ) + subpixel_jitter.y ) / static_cast<float>( launch_dims.y ) )
        - 1.0f;
    const float3 ray_direction = normalize( d.x * U + d.y * V + W );
    const float3 ray_origin    = eye;

astral hinge Dec 18, 2025, 10:12 AM

#

what's the question

cloud rivet Dec 18, 2025, 10:15 AM

#

no question. just looking through the sampler code for how to do things and commenting out loud about it I guess

#

I was looking at this paragraph

#

so I thought maybe I could just use some jitter

#

and sample level 0

astral hinge Dec 18, 2025, 10:17 AM

#

give it a shot

#

what's your goal for the jitter?

cloud rivet Dec 18, 2025, 10:31 AM

#

reduce the aliasing

#

still reading through the doc

pseudo dock Dec 18, 2025, 10:45 AM

#

In my renderer I always do one primary ray exactly in the center of the pixel (using rasterization) and then one additional ray that is randomly jittered (for antialiasing rather than texturing). It converges (mentally) to the correct result over time but is definitely noisy for any one single frame (as the quote you show from the article says, although it's talking about textures). I have a memory that you've maybe said you don't want the noisy look, though?

#

It's not directly comparable, but I found a section of a video where I zoom in close in my renderer and show an example of what I'm talking about (hopefully the link with the timestamp works): https://www.youtube.com/watch?v=oUfrtmJeyEU&t=1632s
It's hard to tell from a still shot or from video what it actually looks like in realtime (it definitely looks much better in motion than it does in a single screenshot hehe), but maybe might give at least a general idea of what texturing might look like if you went that route with only a few rays per pixel.

If you do temporal accumulation though then of course this doesn't apply as much and it actually would converge over time.

YouTube

John-Paul Ownby

Recreating my Room with Real-Time Path Tracing

I had a goal to create graphics that emulate what my eyes see when I am in a real room in my house and I wanted the program to run interactively at high frame rates.

This video shows the results and discusses some of what makes path tracing cool. It also explores some of the other graphics techniques that contribute to making the generated imag...

▶ Play video
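
The convergence under temporal accumulation mentioned above can be illustrated with a running average per pixel. This is a generic sketch, not taken from either renderer; `PixelAccumulator` is a hypothetical name and a single float stands in for an RGB value.

```cpp
#include <cassert>
#include <cmath>

struct PixelAccumulator {
    float mean = 0.0f;
    unsigned frames = 0;

    // Incremental mean: after n calls, mean equals the average of all n samples.
    void add(float sample) {
        ++frames;
        mean += (sample - mean) / static_cast<float>(frames);
    }

    // Accumulated history is only valid while the view is static,
    // so a renderer would typically reset when the camera moves.
    void reset() {
        mean = 0.0f;
        frames = 0;
    }
};
```

Each frame contributes one jittered sample, so a single frame is noisy while the stored mean settles as frames accumulate.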

cloud rivet Dec 18, 2025, 11:10 AM

#

thank you

#

I've watched that video before

#

it's great

#

you have a very clear way of explaining concepts

pseudo dock Dec 18, 2025, 11:22 AM

#

That's kind of you to say! I think maybe having a music background and only coming to programming later in life leads to me thinking about things differently compared to many programmers and maybe the way I talk about things also takes a more intuitive/conceptual approach because of that (and also because I often have trouble understanding more programmer-y explanations myself hehe). At work, at least, I sometimes am able to talk artist language hehe, so that's gratifying to hear that some of that maybe also comes through in video form.

But I didn't mean to derail your thread! I've considered maybe trying the same approach to textures when I finally add them to my current project so I'd be interested in seeing the results if you try. Intuitively, though, my suspicion is that using mipmaps is probably better both for quality and to help with texture caching, as that quote says.

cloud rivet Dec 18, 2025, 11:24 AM

#

you didn't derail anything, thank you. i have to learn all of these things

#

I think I understand what that paper is saying: I send some extra data from the ray gen corresponding to a ray 1 pixel up and another one pixel right, then I use the information I get from the triangle that I intersect with to generate additional points for those offset rays, and from that I can get a differential to calculate the lod when sampling the texture
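
A sketch of the last step, turning the UV differences into a mip level. This is the common isotropic selection (log2 of the larger footprint axis in texels) and is an assumption here, not necessarily the exact formula from the paper. `mipLevel` and `Vec2` are hypothetical names; `uvRight` and `uvUp` are the texture coordinates where the two offset rays land on the same triangle.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 {
    float x, y;
};

inline float mipLevel(Vec2 uv, Vec2 uvRight, Vec2 uvUp,
                      float texWidth, float texHeight, float maxLod) {
    // UV change per one-pixel step on screen, scaled to texels.
    const float dudx = (uvRight.x - uv.x) * texWidth;
    const float dvdx = (uvRight.y - uv.y) * texHeight;
    const float dudy = (uvUp.x - uv.x) * texWidth;
    const float dvdy = (uvUp.y - uv.y) * texHeight;

    // Length of each footprint axis in texels; take the larger one.
    const float lenX = std::sqrt(dudx * dudx + dvdx * dvdx);
    const float lenY = std::sqrt(dudy * dudy + dvdy * dvdy);
    const float rho = std::max(lenX, lenY);

    if (rho <= 1.0f) return 0.0f;            // magnification: use the base level
    return std::min(std::log2(rho), maxLod); // one pixel covers rho texels
}
```

A footprint of about 4 texels per pixel selects level 2, where each texel of that level covers 4 texels of the base image along each axis.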

#

regarding your video, I really like the HDR portion, the whole video is great

#

but I also like how you explain how noise helps the brain fill more information than there is

#

tomorrow I will generate mipmaps and try to see if I can generate a render without all the aliasing

#

if I can get that all done I'll get back to my track and vehicle physics, and get that working, and I'll be where I left it before I reached that precipice

astral hinge Dec 18, 2025, 11:31 AM

#

cloud rivet reduce the aliasing

are you gonna do temporal AA

cloud rivet Dec 18, 2025, 11:36 AM

#

no I don't think I will. just try ray differentials

#

seems straightforward, it's just one ray, it's just a big payload per ray

pseudo dock Dec 18, 2025, 11:38 AM

#

The thing that really made noise click for me personally was working on dithering at my previous job. Unfortunately I can't go into details too much because that industry is much more closed and secretive than games, but the effect that good dithering has on the brain feels like magic to me.
Since falling in love with dithering I've probably gone too far with embracing all kinds of noise, but it has also made me appreciate monte carlo techniques in a way that I didn't quite understand as much back when I was in school doing path tracing. It really does help the brain in a way that feels magical to me (like, the fact that we can effectively do these continuous integrals over the entire hemisphere at every single bounce is incredible to me).
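
As a small worked example of the hemisphere integrals mentioned above: the integral of cos(theta) over the hemisphere is exactly pi, and a Monte Carlo estimator recovers it by averaging f/pdf over random directions. This is a generic textbook sketch, not code from any renderer discussed here; `estimateCosineIntegral` is a hypothetical name and `sampleCount` is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Monte Carlo estimate of the integral of cos(theta) over the hemisphere.
inline double estimateCosineIntegral(std::uint32_t sampleCount, std::uint32_t seed) {
    const double pi = 3.14159265358979323846;
    const double pdf = 1.0 / (2.0 * pi); // uniform density over the hemisphere's solid angle
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (std::uint32_t i = 0; i < sampleCount; ++i) {
        // For directions distributed uniformly over the hemisphere,
        // cos(theta) is itself uniformly distributed in [0, 1).
        const double cosTheta = static_cast<double>(rng()) / 4294967296.0;
        sum += cosTheta / pdf; // integrand divided by the sampling density
    }
    return sum / static_cast<double>(sampleCount);
}
```

The error shrinks roughly with the square root of the sample count, which is why a few samples per pixel look noisy and many look smooth.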

astral hinge Dec 18, 2025, 11:42 AM

#

noise is epic

#

I love monte carlo integration

pseudo dock Dec 18, 2025, 11:43 AM

#

Yes, me too! 😍

#

It is seriously so cool

vivid barn Dec 18, 2025, 11:53 AM

#

you can have both ray differentials and noise

#

with stochastic filtering

astral hinge Dec 19, 2025, 2:08 AM

#

@vivid barn re: mapping offset host pointers to gpu pointers
idk what the radix thing is you suggested for storing intervals, but wouldn't a simple binary tree work? the intervals can't overlap, so you just need to locate the two nearest intervals (sorted by the start of range), then one of the intervals will contain your offset pointer

#

I am using a shrimple linear search for now btw, but if perf sucks I will explore the option I just explained (unless you tell me it can't work for some reason)

cloud rivet Dec 19, 2025, 2:14 AM

#

That’s what is in nano's worldspawn go project

#

A binary tree

vivid barn Dec 19, 2025, 2:15 AM

#

No I use sorted list of intervals not even a tree, basically an even dumber solution

#

But yes an interval tree will work

vivid barn Dec 19, 2025, 2:16 AM

#

astral hinge <@324651651224829952> re: mapping offset host pointers to gpu pointers idk what ...

Radix tree

#

Like how an MMU works

#

You have a tree that's a few levels deep, like 3 to 5
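
A sketch of what such an MMU-style radix tree could look like, under assumptions that are not from the chat: 4 KiB pages, three levels of 9 bits each (so only the low 39 address bits are distinguished), and lazily allocated nodes. `RadixMap` and its members are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

constexpr unsigned kPageBits = 12; // 4 KiB pages
constexpr unsigned kLevelBits = 9; // 512 entries per node
constexpr std::size_t kFanout = std::size_t{1} << kLevelBits;

struct Leaf {
    std::array<std::uint64_t, kFanout> hostPage{}; // host address of each mapped page
    std::array<bool, kFanout> present{};
};
struct Mid {
    std::array<std::unique_ptr<Leaf>, kFanout> child;
};
struct Root {
    std::array<std::unique_ptr<Mid>, kFanout> child;
};

class RadixMap {
public:
    // Maps the page containing deviceAddr to the page starting at hostPageAddr.
    void mapPage(std::uint64_t deviceAddr, std::uint64_t hostPageAddr) {
        std::unique_ptr<Mid>& mid = root_.child[index(deviceAddr, 2)];
        if (!mid) mid = std::make_unique<Mid>();
        std::unique_ptr<Leaf>& leaf = mid->child[index(deviceAddr, 1)];
        if (!leaf) leaf = std::make_unique<Leaf>();
        const std::size_t i = index(deviceAddr, 0);
        leaf->hostPage[i] = hostPageAddr;
        leaf->present[i] = true;
    }

    // Walks one node per level; the cost does not depend on how many pages are mapped.
    bool translate(std::uint64_t deviceAddr, std::uint64_t& hostAddr) const {
        const std::unique_ptr<Mid>& mid = root_.child[index(deviceAddr, 2)];
        if (!mid) return false;
        const std::unique_ptr<Leaf>& leaf = mid->child[index(deviceAddr, 1)];
        if (!leaf) return false;
        const std::size_t i = index(deviceAddr, 0);
        if (!leaf->present[i]) return false;
        const std::uint64_t pageOffset = deviceAddr & ((std::uint64_t{1} << kPageBits) - 1);
        hostAddr = leaf->hostPage[i] + pageOffset;
        return true;
    }

private:
    // Extracts the 9-bit slice of the address that selects the entry at a level.
    static std::size_t index(std::uint64_t addr, unsigned level) {
        return static_cast<std::size_t>((addr >> (kPageBits + level * kLevelBits)) & (kFanout - 1));
    }
    Root root_;
};
```

Compared with a sorted list, inserts do not shift anything, at the price of more code and per-page granularity.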

cloud rivet Dec 19, 2025, 2:17 AM

#

Oh I assumed: i, ok := slices.BinarySearch(deviceAddrs, uint64(p))

astral hinge Dec 19, 2025, 2:17 AM

#

wikipedia calls it a radix "trie"

vivid barn Dec 19, 2025, 2:17 AM

#

Oh

astral hinge Dec 19, 2025, 2:17 AM

#

oh that's just an alternative name, nvm

vivid barn Dec 19, 2025, 2:17 AM

#

Ok I guess it's a trie

#

Whatever

vivid barn Dec 19, 2025, 2:18 AM

#

cloud rivet Oh I assumed: i, ok := slices.BinarySearch(deviceAddrs, uint64(p))

Yeah that's searching a sorted list

cloud rivet Dec 19, 2025, 2:18 AM

#

Right

vivid barn Dec 19, 2025, 2:18 AM

#

I guess it's a tree

astral hinge Dec 19, 2025, 2:19 AM

#

implicit 🅱️ree

vivid barn Dec 19, 2025, 2:19 AM

#

But it doesn't have certain properties of a tree like log n insertion

#

Because you have to shift things around when inserting or deleting
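
A minimal sketch of the sorted-list approach, written in C++ rather than the Go of the linked project and not copied from it. `Mapping` and `IntervalMap` are hypothetical names. It shows why a plain hashmap is not enough: the lookup has to resolve any address inside an allocation, not only its base, so it finds the last interval starting at or before the address and then checks the size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: one device allocation and the host address it is mapped at.
struct Mapping {
    std::uint64_t deviceBase; // first device address of the allocation
    std::uint64_t size;       // size of the allocation in bytes
    std::uint64_t hostBase;   // host address corresponding to deviceBase
};

class IntervalMap {
public:
    // Keeps the vector sorted by deviceBase. This is O(n) because later
    // entries are shifted to make room, unlike insertion into a balanced tree.
    void insert(const Mapping& m) {
        mappings_.insert(firstAfter(m.deviceBase), m);
    }

    // Translates any address inside a mapped range, not only its base.
    // Returns false if the address is not covered by any interval.
    bool toHost(std::uint64_t deviceAddr, std::uint64_t& hostAddr) const {
        auto it = firstAfter(deviceAddr);
        if (it == mappings_.begin()) return false; // below every interval
        const Mapping& m = *(it - 1);              // last interval starting at or before deviceAddr
        const std::uint64_t offset = deviceAddr - m.deviceBase;
        if (offset >= m.size) return false;        // falls into a gap between intervals
        hostAddr = m.hostBase + offset;
        return true;
    }

private:
    // Binary search: first interval whose base is strictly greater than addr.
    std::vector<Mapping>::const_iterator firstAfter(std::uint64_t addr) const {
        return std::upper_bound(mappings_.begin(), mappings_.end(), addr,
            [](std::uint64_t a, const Mapping& e) { return a < e.deviceBase; });
    }

    std::vector<Mapping> mappings_; // non-overlapping, sorted by deviceBase
};
```

Lookups are O(log n); only insertion and removal pay the linear shifting cost.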

vivid barn Dec 19, 2025, 2:19 AM

#

astral hinge implicit 🅱️ree

This basically I guess

bronze socket Dec 19, 2025, 3:10 AM

#

yeah a trie (pronounced "try") is a special subset of tree for building up prefixes, it's the kind of thing I largely haven't heard outside of college classes

cloud rivet Dec 19, 2025, 3:42 AM

#

it's good for auto completion

cloud rivet Dec 19, 2025, 8:27 AM

#

haven't gotten quite to generating mipmaps yet, but I am close, I just finally got bc7 compressed texture objects to render. there's not a way to inspect the texture objects in nsight compute though, so I can't like show that, but they are compressed now

#

maybe it's viewable via total memory allocation or something

vivid barn Dec 19, 2025, 9:07 AM

#

bronze socket yeah a trie (pronounced "try") is a special subset of tree for building up prefi...

I think the reason I didn't have trie come up in my mind is because the dominant term for this thing in my environment was and is "prefix tree"

spiral ice Dec 19, 2025, 10:22 AM

#

"radix tree" is another common terminology

cloud rivet Dec 19, 2025, 10:46 AM

#

I was able to get rid of SDL_image as nvtt handles image loading. I can get rid of tinyexr too I think but I will worry about that later

#

I just spent most of my day figuring out how to use nvtt with block compression and cuda texture objects :\

#

I finally had to find an example on github that someone had working

cloud rivet Dec 19, 2025, 11:41 AM

#

ok I have compressed images with mip maps

#

now I just have to do the ray differential math in the shader

#

I didn't expect that to be so hard, but it's only hard because the docs were bad

vivid barn Dec 19, 2025, 12:53 PM

#

spiral ice "radix tree" is another common terminology

well clearly it's not common enough for e.g. jaker

brisk chasm Dec 19, 2025, 3:59 PM

#

not sure if it was here where we derailed the thread into which thinkpads are/were good or not

#

but, i acquired a new ~~work laptop~~ lunch grill

#

finkpad p16 gen2, 65THOUSANDFIVEHUNDREDANDFIRDYFIVE megabytes of ram

cloud rivet Dec 19, 2025, 8:02 PM

#

brisk chasm not sure if it was here where we derailed the thread into which thinkpads are/we...

You like it?

brisk chasm Dec 19, 2025, 8:03 PM

#

the keyboard is really nice

#

i havent tested anything on it yet : >

#

after christmas probably

cloud rivet Dec 19, 2025, 8:13 PM

#

I would like to buy new hardware but I don't want to spend the money, maybe 2027 :/

shut hornet Dec 19, 2025, 8:14 PM

#

brisk chasm finkpad p16 gen2, 65THOUSANDFIVEHUNDREDANDFIRDYFIVE megabytes of ram

When I built my new PC, they sent me 16GB of DDR5 for free with the motherboard. Back in June. How times have changed

cloud rivet Dec 19, 2025, 11:39 PM

#

brisk chasm Dec 19, 2025, 11:41 PM

#

this is very cool

cloud rivet Dec 19, 2025, 11:42 PM

#

thanks, just followed the paper

#

did exactly what it said

brisk chasm Dec 19, 2025, 11:43 PM

#

still, i suppose you need to be a little smart to read the paper

cloud rivet Dec 19, 2025, 11:44 PM

#

I had to look a lot of it up

#

I'm not very strong with the math, but I mostly understand what's happening

#

this is just for single rays, I don't have follow up rays or anything, I'd have to expand it in that case

brisk chasm Dec 19, 2025, 11:47 PM

#

ah you mean the derivatives/differentials

#

i mean the thing as a whole

cloud rivet Dec 19, 2025, 11:48 PM

#

yeah

#

glad it works, the aliasing was very bad

brisk chasm Dec 19, 2025, 11:49 PM

#

it did be

pseudo dock Dec 20, 2025, 12:55 AM

#

What if you made your skimmer game inside of Sponza? And the skimmers could be like the size of a bug, and the tracks could go around inside Sponza?

#

Sponza Racing (TM)

cloud rivet Dec 20, 2025, 1:12 AM

#

I would have to speed up loading sponza, it takes a good while right now

pseudo dock Dec 20, 2025, 1:16 AM

#

Sponza Loading Racing (TM)

#

I'm not really advocating it, and you should obviously do what you want. I just suddenly got this image in my head of going through the environment at high speeds but on tracks like yours that didn't have to follow the ground so you could zoom up and down and through windows and off the balcony and stuff and it seemed potentially cool in my head 🙃

spiral ice Dec 20, 2025, 1:21 AM

#

Putting the texture loading into a thread pool should be way faster. If you'd want it to be really fast, you probably need to preprocess the scene somehow.
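
A small sketch of that suggestion using only the standard library. It is a fixed set of worker threads pulling indices from a shared counter rather than a full thread pool, and `parallelFor` is a hypothetical helper. The per-texture work (decode, compress, generate mips) is assumed to be independent for each index, and each job must only write to its own output slot.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs job(i) for every i in [0, count) across `workers` threads.
inline void parallelFor(std::size_t count, unsigned workers,
                        const std::function<void(std::size_t)>& job) {
    if (workers == 0) workers = 1;
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&] {
            // Each worker claims the next unprocessed index until none are left.
            for (std::size_t i = next.fetch_add(1); i < count; i = next.fetch_add(1)) {
                job(i);
            }
        });
    }
    for (std::thread& t : threads) t.join();
}
```

A call could look like `parallelFor(textures.size(), std::thread::hardware_concurrency(), loadOne)`, where `loadOne` is whatever per-texture function the loader already has.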

#

By the way, you've made so much progress on your project. I am really impressed. I've barely worked on mine for the last two weeks

tight torrent Dec 20, 2025, 1:27 AM

#

cloud rivet I would have to speed up loading sponza, it takes a good while right now

me too, im using fastgltf for loading but i think the bottleneck is waiting on the gpu to create buffers and such

#

part of it too is im executing command lists for every individual mesh and waiting for them to complete before going on and processing the next one

cloud rivet Dec 20, 2025, 3:00 AM

#

yeah, it's all completely serial right now and it is compressing the BC5 and generating mipmaps for all 50 sponza textures before it shows anything. I can obviously speed it up, but it's not what I want to work on right now

cloud rivet Dec 20, 2025, 3:01 AM

#

spiral ice By the way, you've made so much progress on your project. I am really impressed....

thanks, I have been on vacation and that has helped, and also have been sort of racing to meet my goal of being back to where I was before rewriting the renderer

#

3 weeks now I think?

#

#1323084490997895198 message 11/26 - 12/19, 23 days

#

that's pretty good for going from no code to sponza with a new vulkan + optix renderer

#

hrm

#

wondering if I should bring in my previous physics code or start from new

#

I'm gonna work on gameplay now for a while

solid grove Dec 20, 2025, 3:10 AM

#

planning to do fully custom physics again?

cloud rivet Dec 20, 2025, 3:10 AM

#

I was thinking I would follow along the book I bought

solid grove Dec 20, 2025, 3:11 AM

#

then u won’t be doing gameplay for a while KEKW

cloud rivet Dec 20, 2025, 3:11 AM

#

that's sort of gameplay?

#

I guess in my brain any work that is not rendering is gameplay

#

asset pipeline/rendering/animation/level editing

#

heh

#

I don't know

#

I think it will be fun

#

there's a lot of work left on the rendering but it's good enough for now I think

astral hinge Dec 20, 2025, 6:02 AM

#

@vivid barn what do you think about deleting semaphores and just having submit operations return a thing that can be waited on (I know your api has something like this)

#

I'm mainly concerned that there might be something else I didn't think of for which I'd need a semaphore

#

I hope bjorn doesn't mind us hijacking his thread to talk about this froge_yeehaw

vivid barn Dec 20, 2025, 6:07 AM

#

astral hinge <@324651651224829952> what do you think about deleting semaphores and just havin...

no my api isn't like that, I have my own sync object

astral hinge Dec 20, 2025, 6:08 AM

#

ah I thought your stream submissions returned an awaitable token thingy

vivid barn Dec 20, 2025, 6:08 AM

#

astral hinge <@324651651224829952> what do you think about deleting semaphores and just havin...

this idea doesn't allow you to wait on a syncobj before you submit the signaller

#

which idk how valuable it is to you but it prevents doing a thing where you record n cmdbufs at the same time

#

which might be convenient in some cases like idk

#

say you're doing skinning so you record two cmdbufs, one does compute dispatches that do skinning, then the second one does AS rebuilds

astral hinge Dec 20, 2025, 6:10 AM

#

vivid barn this idea doesn't allow you to wait on a syncobj before you submit the signaller

couldn't I just add a wait parameter for the things to wait on before submitting

#

"things" being sync "primitives"

vivid barn Dec 20, 2025, 6:10 AM

#

ok w8 to clear it up are you suggesting that you wouldn't have random VkSemaphores but rather there would be one hard-associated with each VkQueue

#

so if you wanna wait on something you specify (VkQueue, counter) pair?

astral hinge Dec 20, 2025, 6:10 AM

#

hmm

vivid barn Dec 20, 2025, 6:11 AM

#

you can replace VkQueues with higher level thingies of your own

#

this would be fine I guess

astral hinge Dec 20, 2025, 6:11 AM

#

I was just thinking that my Submit would return something I could wait on at a later date (whether that be another submit or just a free wait function)

#

instead of making the user construct semaphores themselves

vivid barn Dec 20, 2025, 6:11 AM

#

yeah that sounds fine I guess

#

I don't like the level of abstraction that VkQueues and cmdbufs offer at all so it's kinda hard for me to judge this

astral hinge Dec 20, 2025, 6:12 AM

#

yeah I've been running into that as well

vivid barn Dec 20, 2025, 6:12 AM

#

what do you find you dislike?

astral hinge Dec 20, 2025, 6:12 AM

#

atm I just have three hardcoded queues and the user can pick one to submit to

#

well it feels like a mix of low and high level concepts

#

but idk

vivid barn Dec 20, 2025, 6:14 AM

#

gob had an idea (for vk-tier abstraction level) where instead of VkQueues, you'd be submitting to a queue family and the magic would happen underneath to find an available hw resource to run this submission as soon as its sync conditions are met

#

this would mean that VkQueues disappear

#

VK_NV_command_buffer_inheritance wouldn't work

#

and instead of VkQueues, you have just VkSemaphores

#

VkSemaphore sync becomes mandatory because you can't just rely on ordering you get by submitting to the same queue

#

what do you think

astral hinge Dec 20, 2025, 6:15 AM

#

eliminating the weird thing where you allocate queues from a family would be nice

#

yeah sounds nice

vivid barn Dec 20, 2025, 6:15 AM

#

if you're ok with cmdbufs I'd implement this then

astral hinge Dec 20, 2025, 6:15 AM

#

I guess I kind of do have that already

vivid barn Dec 20, 2025, 6:15 AM

#

I wasn't ok with cmdbufs either (I wanted sync to be "transposed" so to speak, like in cuda or opencl) so I ended up abstracting over those too

#

o i c

#

cool

astral hinge Dec 20, 2025, 6:16 AM

#

submit is missing wait semaphores but yeah

typedef enum gfx_queue
{
  GFX_QUEUE_GRAPHICS,
  GFX_QUEUE_COMPUTE,
  GFX_QUEUE_TRANSFER,
} gfx_queue;

void gfx_submit(gfx_queue queue, gfx_command_buffer command_buffer, gfx_semaphore semaphore, uint64_t signal);

vivid barn Dec 20, 2025, 6:16 AM

#

right i c

#

note that with gob's idea you can't have a magical thing to wait on anymore I guess

astral hinge Dec 20, 2025, 6:17 AM

#

btw I don't know how individual VkQueues allocated(?) from the families maps to the actual hw

vivid barn Dec 20, 2025, 6:17 AM

#

unless you actually return smth like (VkSemaphore, timeline value) rather than just timeline value
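
A CPU-only mock of that (semaphore, timeline value) pair, with no Vulkan calls. `Timeline`, `SubmitToken`, `submit` and `isDone` are hypothetical names; in a real backend the counter would be a timeline semaphore and `completed` would be advanced by the GPU rather than by hand. The point is only that the returned token names which timeline it belongs to, so it stays meaningful even when several timelines exist.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a timeline semaphore: a counter that only ever increases.
struct Timeline {
    std::uint64_t completed = 0;     // highest value signalled so far
    std::uint64_t lastSubmitted = 0; // highest value handed out so far
};

// What a submit call returns: which timeline to watch and which value to wait for.
struct SubmitToken {
    const Timeline* timeline;
    std::uint64_t value;
};

// Reserves the next value on the timeline for this submission.
inline SubmitToken submit(Timeline& timeline) {
    return SubmitToken{&timeline, ++timeline.lastSubmitted};
}

// A token is done once its timeline has reached the token's value.
inline bool isDone(const SubmitToken& token) {
    return token.timeline->completed >= token.value;
}
```

A later submit could take a list of such tokens as its wait set, which would cover the missing wait semaphores in the `gfx_submit` signature above.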

astral hinge Dec 20, 2025, 6:18 AM

#

hmm that makes sense ye

vivid barn Dec 20, 2025, 6:18 AM

#

astral hinge btw I don't know how individual VkQueues allocated(?) from the families maps to ...

it's like cpu threads

#

I think

astral hinge Dec 20, 2025, 6:18 AM

#

oh there's some new(?) queue priority thing I saw when I was making queues yesterday

vivid barn Dec 20, 2025, 6:18 AM

#

but a number of kmds/fw have problems allocating more queues at runtime basically

#

I think the thing windows got called "hardware assisted scheduling" means scheduling decisions are deferred to fw sitting in the gpu

#

scheduling in fw means that each VkQueue corresponds to an actual small ring thing of submissions that's mapped to host and to fw

#

there's obviously lots of VkQueues at any point in time

#

but there's only so many hw resources that can handle each one

#

so anyway same as cpu threads, when you have free hw resource, you have something somewhere choose a VkQueue for it to get busy with

cloud rivet Dec 20, 2025, 6:21 AM

#

astral hinge I hope bjorn doesn't mind us hijacking his thread to talk about this <:froge_yee...

I don't mind at all, I am very interested in this

vivid barn Dec 20, 2025, 6:21 AM

#

idk why I went on a tangent about scheduling in fw

#

ignore that

vivid barn Dec 20, 2025, 6:21 AM

#

vivid barn so anyway same as cpu threads, when you have free hw resource, you have somethin...

this is the only important bit

astral hinge Dec 20, 2025, 6:24 AM

#

it sounds like the "ideal" abstraction would make as many VkQueues as possible and treat them as a thread pool, then the user would want to submit relatively granular tasks so all queues can be busy

#

except I'm skeptical of how that would turn out in practice given that you can saturate the gpu with just one submit

vivid barn Dec 20, 2025, 6:24 AM

#

astral hinge except I'm skeptical of how that would turn out in practice given that you can s...

only until you run into a barrier

#

but yes stuff being like thread pool is my thinking too

astral hinge Dec 20, 2025, 6:25 AM

#

true, but then it seems like the ideal abstraction is just a frame graph

vivid barn Dec 20, 2025, 6:25 AM

#

my abstraction lets me use all the queues without a frame graph.

#Rosy