#Rosy


cloud rivet
#

servers yes

#

hrm

brisk chasm
#

bjorn switching to linux fulltime, you read it here first

vivid barn
#

anyway for memory management I'd prefer to stick to at least cudaMallocManaged as "default"

#

that absolves you from the need to do cudaMemcpy and such

#

it can be a perf footgun in a way because something might be sticking around in host memory while being very heavily hammered by the device but there's cudaMemAdvise to migrate it around

cloud rivet
#

I still have to read the cuda programming guide

#

I've been focused on the optix side of things

#

it's got a full chapter on unified memory

#

given my current progress, unless there's some massive hidden whopper that I cannot yet see that's going to hit me right in the face with OptiX, there's likely no chance I'm going to go back to try to do graphics again with Vulkan. It's like 10x less pain with CUDA and OptiX. I made the right choice.

vivid barn
#

truly joever

cloud rivet
#

right for right now anyway

#

ya

#

idc it's fun

#

going to uh try to make this suzanne appear now I guess

astral hinge
#

this is the only thing I'm unsure about when it comes to submodules

#

it seems like a pain to specify a specific commit

brisk chasm
#

forking and tagging them yourself feels nicer, if you dont want to rely on the upstream repo

#

or you git gud

cloud rivet
#

you mean in cmake?

#

I guess if I wanted that in cmake I'd use FetchContent_Declare

#

but anyone cloning the repo and checking out a commit would get the right versions of your git submodule dependencies, since the pinned submodule commit is committed in the repo itself (.gitmodules only holds the path and url)

brisk chasm
#
include(FetchContent)

FetchContent_Declare(
    fastgltf
    GIT_REPOSITORY https://github.com/spnda/fastgltf.git
    GIT_TAG        v0.9.0
)

message("Fetching fastgltf")
set(FASTGLTF_USE_CUSTOM_SMALLVECTOR OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_TESTS OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_EXAMPLES OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_DOCS OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_GLTF_RS OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_ASSIMP OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_DEPRECATED_EXT ON CACHE BOOL "" FORCE)
set(FASTGLTF_DISABLE_CUSTOM_MEMORY_POOL OFF CACHE BOOL "" FORCE)
set(FASTGLTF_USE_64BIT_FLOAT OFF CACHE BOOL "" FORCE)
set(FASTGLTF_COMPILE_AS_CPP20 OFF CACHE BOOL "" FORCE)
set(FASTGLTF_ENABLE_CPP_MODULES OFF CACHE BOOL "" FORCE)
set(FASTGLTF_USE_STD_MODULE OFF CACHE BOOL "" FORCE)
FetchContent_MakeAvailable(fastgltf)
#

and then

target_link_libraries(YourProject
    PRIVATE fastgltf
)
#

then you can #include <fastgltf/fastgltf.hpp>

#

or with cpm

# this file would sit in libs/fastgltf.cmake
include(../cmake/CPM.cmake)

CPMAddPackage(
    NAME            fastgltf
    GIT_REPOSITORY  https://github.com/spnda/fastgltf.git
    GIT_TAG         2483b87d19a67c91f2bffc386082001d3ea8bc07
    OPTIONS
        "FASTGLTF_USE_CUSTOM_SMALLVECTOR OFF"
        "FASTGLTF_ENABLE_TESTS OFF"
        "FASTGLTF_ENABLE_EXAMPLES OFF"
        "FASTGLTF_ENABLE_DOCS OFF"
        "FASTGLTF_ENABLE_GLTF_RS OFF"
        "FASTGLTF_ENABLE_ASSIMP OFF"
        "FASTGLTF_ENABLE_DEPRECATED_EXT ON"
        "FASTGLTF_DISABLE_CUSTOM_MEMORY_POOL OFF"
        "FASTGLTF_USE_64BIT_FLOAT OFF"
        "FASTGLTF_COMPILE_AS_CPP20 OFF"
        "FASTGLTF_ENABLE_CPP_MODULES OFF"
        "FASTGLTF_USE_STD_MODULE OFF"
)
#

CPM can install itself with a 2 liner

cloud rivet
#

ok looking at the OptiX_Apps repo, I like how they do it: gpu pointers to the vertex data, the indices, the material and the light go on the SBT record for a shader. In your closest hit shader you get the primitive index (the index of the triangle hit), and using that triangle index you look up the vertex indices for the triangle via the indices buffer pointer. That gives you a way to look up whatever you stored per vertex (position, normal, uv, color, etc) in gpu memory, given a defined stride, and you can also use that triangle index to look up your material
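
A rough CPU-side sketch of that lookup chain, to make the indirection concrete. It is not taken from OptiX_Apps: `HitGroupData`, `Vertex`, `shade_lookup` and the per-triangle material array are hypothetical names and layouts. In a real closest hit program the pointers would be device pointers read from the SBT record, `prim_index` would come from `optixGetPrimitiveIndex()` and `(u, v)` from `optixGetTriangleBarycentrics()`.

```cpp
#include <cassert>
#include <cstdint>

struct Float3 { float x, y, z; };
struct UInt3 { std::uint32_t x, y, z; };

// Hypothetical per-vertex layout; the real stride is whatever the app defines.
struct Vertex {
  Float3 position;
  Float3 normal;
};

// Hypothetical payload stored in a hit group SBT record.
struct HitGroupData {
  const Vertex *vertices;             // per-vertex attributes
  const UInt3 *indices;               // three vertex indices per triangle
  const std::uint32_t *tri_material;  // one material index per triangle
  std::uint32_t triangle_count;
};

struct HitInfo {
  Float3 normal;
  std::uint32_t material_index;
};

// Triangle index -> vertex indices -> per-vertex data, plus the material.
// Barycentric weights are (1 - u - v) for vertex 0, u for vertex 1, v for vertex 2.
inline HitInfo shade_lookup(const HitGroupData &data, std::uint32_t prim_index,
                            float u, float v) {
  assert(prim_index < data.triangle_count);
  const UInt3 tri = data.indices[prim_index];
  const Vertex &a = data.vertices[tri.x];
  const Vertex &b = data.vertices[tri.y];
  const Vertex &c = data.vertices[tri.z];
  const float w = 1.0f - u - v;
  const Float3 n = {w * a.normal.x + u * b.normal.x + v * c.normal.x,
                    w * a.normal.y + u * b.normal.y + v * c.normal.y,
                    w * a.normal.z + u * b.normal.z + v * c.normal.z};
  return HitInfo{n, data.tri_material[prim_index]};
}
```

The interpolated normal is left unnormalized here; a shader would normally normalize it before use.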

astral hinge
cloud rivet
#

the per instance stuff I'll stick at the beginning of the gas vertex memory

cloud rivet
#

I mean do that very rarely

#

so not a big deal in my opinion

astral hinge
#

hmm I see

#

the only way I use submodules is by right clicking and going to tortoisegit then clicking "add submodule" frog_bath

cloud rivet
#

using ui for git is a crime

wraith urchin
#

tortoisegit is pretty good

astral hinge
#

I only use git through UIs hehe

#

UIs that are G

cloud rivet
#

I only use the cli with git

wraith urchin
#

I use both but its really easy to fuck up complicated workflows like rebase interactive with cli

cloud rivet
#

I dunno, I just do git rebase -i and it works

#

I rebase, I don't do merges

wraith urchin
#

but then you get dumped into a text editor (probably vim)

astral hinge
#

Last time I tried rebasing was at an internship and I fucked it up lol

cloud rivet
#

a git merge is also a crime

wraith urchin
#

Yeah rebase is a better workflow imo

cloud rivet
#

I work on a team with like 30+ engineers in a monorepo and we almost never fuck it up

wraith urchin
#

or rebase then merge

cloud rivet
#

everyone working in branches together and we do ok

wraith urchin
#

Hey I mean whatever works for you

cloud rivet
#

yes

wraith urchin
#

but I like my UIs

astral hinge
#

my git history is long and linear

cloud rivet
#

same

#

super flat

astral hinge
#

last time there was a branch was when saky was adding his sky stuff to it

cloud rivet
#

oh, well I work in branches

#

I just merge them and delete them when I am done with it

#

I can't live without branches

#

also I get AI reviews that save my life

wraith urchin
#

At work we all do branches, but for ASO I only do a branch if I'm gonna really make a huge breaking change, so I can easily get back to the working version

cloud rivet
#

yeah I will branch off in my branch also, to try experimental stuff

#

like when I'm not sure something will work

#

I feel dirty pushing to main, I don't want to get in the habit of doing it, I'll accidentally do it at work

#

I am an org admin so none of the branch protection rules apply to me

#

so I can accidentally do it

wraith urchin
#

For solo dev I feel no need to be overly rigorous about my workflow, but on a team, yeah it makes sense to be careful

spiral ice
cloud rivet
#

merge commits are horrible

#

if you want to revert a PR and it has a merge commit in it, you are in big trouble

#

if this is in an incident, it's going to increase the time of impact

#

I'm like at the point now where I pretty much know exactly what I need to do, I just have to do the work.

#

I will break it up into tiny small surgical code changes, and verify it along the way. I don't like doing a big bulk of work with tons of changes in them that could be hell to debug if there's UB or something in there

#

so I think the first thing is to just capture the gpu memory pointer to the vertices data and stop freeing it after making the gas

#

then stick the pointer to it on the SBT record for the ch shader

#

then add the normals to it, and get the normals from there instead of from the beginning of the buffer

#

then add vertex colors

#

and then once I have this all setup, I can extract fastgltf data to GPU memory in the same way and it should just work

#

if I get this working today, maybe tomorrow I can start on textures, I have no idea how to do textures yet

#

or race track, not sure

#

probably race track, eager to get it back, I can't actually use it though as my controller is at home

cloud rivet
vivid barn
#

I guess

cloud rivet
#

man emissive materials

#

I have never had such a thing

#

I'm pretty excited about it

cloud rivet
#

man

#

that was 1.5 years ago

#

also after I had worked on blockens

astral hinge
#

believe it or not, that was 10 years ago

cloud rivet
#

ready for some gltf loading I think

#

I didn't pick the best colors for my cube mesh

brisk chasm
#

time flies too fast, we need more gravity

cloud rivet
brisk chasm
#

i still have no shadows : >

cloud rivet
#

you have mipmaps though

#

man RT looks great

#

even with such a tiny model

#

I actually don't have any lighting math. not even n dot l

#

I just did shadows

#

if occluded color *= 0.5f

astral hinge
#

you can also do reflections with the same level of complexity

#

well, slightly more

cloud rivet
#

it's pretty cool

#

I probably am like a month ahead of where I would have been had I done it in vulkan instead I think

#

who knows

broken fog
#

yeah rt is great, you can basically get full gi, hard+soft shadows and perfect specular or rough reflections in 100 lines of code

#

making it fast otoh,,,

cloud rivet
#

Walking around Bremerton today

cloud rivet
#

Found a nice hobby store, they have Critical Role stuff froge_love

#

Too big to bring back with me though

cloud rivet
#

This place is busy

cloud rivet
#

hrm

#

I think OptiX supports mipmaps through cuda texture objects already

#

I don't think I have to like write my own solution

#

well

#

I'm not sure, I see:

optixTexFootprint2D
optixTexFootprint2DGrad
optixTexFootprint2DLod

which take cuda texture objects and have mipmap parameters, I don't really understand how they work

#

the optixWhitted example in the SDK samples a texture

cloud rivet
#

it's all via cuda

#

I wonder what optixTexFootprint2D helps out with

#

For that you would usually implement tracking of differential rays inside the ray tracer which would allow calculating the current derivatives at a hit point which in turn lets the resp. texture grad function select the best mipmap LODs.
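
For reference, a minimal sketch of what "derivatives select the mipmap LOD" boils down to. This is the textbook isotropic version, not something from the OptiX or CUDA docs and not necessarily what the hardware does inside `tex2DGrad`. `UvDerivatives` and `mip_lod` are hypothetical names, and the derivatives are assumed to come from ray differentials at the hit point.

```cpp
#include <algorithm>
#include <cmath>

// UV change per one-pixel step in screen x and y, e.g. from ray differentials.
struct UvDerivatives {
  float dudx, dvdx;
  float dudy, dvdy;
};

// Isotropic LOD: log2 of the longer pixel-footprint axis measured in texels,
// clamped to [0, max_lod].
inline float mip_lod(const UvDerivatives &d, float width, float height,
                     float max_lod) {
  const float len_x = std::hypot(d.dudx * width, d.dvdx * height);
  const float len_y = std::hypot(d.dudy * width, d.dvdy * height);
  const float rho = std::max(len_x, len_y);
  if (rho <= 1.0f) {
    return 0.0f;  // footprint is one texel or smaller: use the base level
  }
  return std::min(std::log2(rho), max_lod);
}
```

A footprint 4 texels wide on a 256x256 texture lands on LOD 2, i.e. the 64x64 level.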

#

womp womp

#

yeah I guess that didn't make any sense to me anyway

cloud rivet
#

yeah I'm going to figure out textures

#

fuck it

#

we work on what we want to work on at any given moment gpAkkoShrug

astral hinge
cloud rivet
#

ok so Suzanne has a texture

#

so my first thing I'll try is to just create a cuda texture object without it creating an error

#

so the first actual thing is to just load the image

#

I'll try and use SDL for that

#

since I already have it as a dependency

#

i spent a lot of time today reading the optix programming guide, I have a much better understanding of shader binding tables now

#

I think it's maybe the best explanation of an SBT I have read so far

#

now that I understand how the records work and how they're associated with geometry and instances it makes a lot of sense

#

it's just a way to index into a table to get data, and the acceleration structure on a hit returns the sbt-index which I use to get the data that was configured by me in the sbt record

#

in my mind an sbt record is basically working similar to a push constant, except instead of me supplying it per draw, I set up the shader program sbt record with params, and the acceleration structure gives me the sbt-index to that record, plus the index of the triangle I hit. Then I take that sbt index to get the sbt record and look up all my data in a buffer, just like I would use the data in a push constant to look up all my data in a buffer for my slang shader

solid grove
#

i think they are just wrapping stb_image though. SDL_image uses the real libpng

cloud rivet
#

hrm

#

well suzanne has a png so I will use SDL3 since I already have it

#

didn't realize SDL_Image is a separate library

solid grove
#

image decoders can get pretty big, i think they keep it separate so as to not make the main lib huge

#

and to keep it free of external deps

cloud rivet
#

it's already huge

solid grove
#

and that's after I went in and trimmed out stuff from the decoder libraries. it was like 400mb before that

cloud rivet
#

SDL repos are absolute units

#

I want to load exr

#

so I will eventually get SDL_image

#

but not right now

solid grove
#

sdl_image doesn't load exr

cloud rivet
#

wtf

#

tinyexr then

solid grove
#

I use tinyexr. I tried getting the real OpenEXR SDK building but its dependency math lib requires RTTI to compile. the offline world is wild

cloud rivet
#

gross

#

I remember reading about the math library for openexr when I was investigating how to write my own exr lib

cloud rivet
#

it got pulled out of openexr into its own project if I recall

#

I actually got it to build

solid grove
#

why does a math lib need RTTI?

cloud rivet
#

to test it

#

the cli tool

astral hinge
#

well disabling language features is on you

cloud rivet
#

it was a bit involved to get that thing to build if I recall, it was a while ago, and it was related to the math library

#

I don't remember what the specific challenge was

#

I didn't disable any language features

#

once I had the cli tool it was really cool, it has a lot of features

#

you can change the compression, even remove compression

astral hinge
cloud rivet
#

hrm

#

I don't see SDL_LoadPNG

#

This function is available since SDL 3.4.0.

#

wonder what version is shipped in the vulkan sdk

#

GitHub Repo: sdl3, Version Tag: release-3.2.26

#

womp womp

#

hrmmm

#

I'll just use SDL_image, I don't want to change my SDL

#

let's see how hard that is

#

3.4 is not even released yet

#

the latest pre-release is 3.3.4

astral hinge
#

you could also use stb image

cloud rivet
#
C:\Users\swart\projects\pixel_storm>git submodule update --init --recursive
Submodule 'external/aom' (https://github.com/libsdl-org/aom.git) registered for path 'external/SDL_image/external/aom'
Submodule 'external/dav1d' (https://github.com/libsdl-org/dav1d.git) registered for path 'external/SDL_image/external/dav1d'
Submodule 'external/jpeg' (https://github.com/libsdl-org/jpeg.git) registered for path 'external/SDL_image/external/jpeg'
Submodule 'external/libavif' (https://github.com/libsdl-org/libavif.git) registered for path 'external/SDL_image/external/libavif'
Submodule 'external/libjxl' (https://github.com/libsdl-org/libjxl.git) registered for path 'external/SDL_image/external/libjxl'
Submodule 'external/libpng' (https://github.com/libsdl-org/libpng.git) registered for path 'external/SDL_image/external/libpng'
Submodule 'external/libtiff' (https://github.com/libsdl-org/libtiff.git) registered for path 'external/SDL_image/external/libtiff'
Submodule 'external/libwebp' (https://github.com/libsdl-org/libwebp.git) registered for path 'external/SDL_image/external/libwebp'
Submodule 'external/zlib' (https://github.com/libsdl-org/zlib.git) registered for path 'external/SDL_image/external/zlib'
#

lol

#

it's git submodules all the way down

cloud rivet
astral hinge
#

well idk what sdl's png library's trade-offs are, so I can't judge

cloud rivet
#

probably not worth it whatever they are

astral hinge
#

yeah being an unfamiliar library is a disadvantage in itself

cloud rivet
#

I switched from the vulkan sdk's SDL to a git submodule anyway

#

forat 😅

#

that dependency change dramatically increased my compile time for that file

#

it's like 8 seconds

#

brutal

#

I'm going to put that in its own TU tbh

#

excited about having textures finally again

#

this project is cool

#

tomorrow I'll try and actually make a cuda texture object from this data

cloud rivet
# cloud rivet in my mind an sbt record is basically working similar to a push constant, except...

I kind of like this push constant mental model with regards to understanding SBTs. I want to refine it a bit:

SBTs are arrays of data that are similar to the data supplied per draw in a graphics pipeline, including what shaders to use and what materials are needed.

The difference is that with ray tracing you have to supply all that data at once, so it's put into a table that is constructed in such a way that a ray trace knows how to get the index into that array via the acceleration structure upon a hit.

graphics pipeline -> supply context to render via binding/push constants/uniforms for each draw
ray trace -> supply context to render via indexes into arrays of data in a table for each ray trace

brisk chasm
#

i was building openexr just last night, it was a little kick in the nuts i have to admit

vivid barn
#

SBT entry in my head is function pointer + associated data sitting next to it
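
A small host-side sketch of that mental model, assuming nothing beyond the hit group index formula from the OptiX programming guide (instance SBT offset + GAS SBT index × stride from the trace call + offset from the trace call). `HitRecord`, `flat_color` and `dispatch_hit` are made-up stand-ins: in real OptiX the "function pointer" is the packed program header and the data block sits right after it in the record.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Rgb { float r, g, b; };

// Stand-in for a hit group record: which program to run plus the data it reads.
struct HitRecord {
  Rgb (*closest_hit)(const void *data);
  const void *data;
};

// Example "program": treats the record data as a flat color.
inline Rgb flat_color(const void *data) {
  return *static_cast<const Rgb *>(data);
}

// Which record a hit selects:
// instance offset + (GAS build input index * trace stride) + trace offset.
inline std::uint32_t sbt_index(std::uint32_t instance_offset,
                               std::uint32_t gas_index,
                               std::uint32_t trace_stride,
                               std::uint32_t trace_offset) {
  return instance_offset + gas_index * trace_stride + trace_offset;
}

// What the runtime does on a hit, conceptually: index the table, call the
// program, hand it its data.
inline Rgb dispatch_hit(const std::vector<HitRecord> &table,
                        std::uint32_t index) {
  assert(index < table.size());
  const HitRecord &rec = table[index];
  return rec.closest_hit(rec.data);
}
```

With two ray types (say radiance = 0 and shadow = 1) the trace stride is 2, so each geometry owns two consecutive records and the trace offset picks between them.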

cloud rivet
#

Oh it is yeah

#

Nice, I went from being confused a couple of days ago to feeling pretty good about SBTs

#

Now I want to clear up cuda texture objects and ray differentials

#

At least just for primary rays. I am not building a path tracer yet

cloud rivet
#

it looks like textures and surfaces are considered legacy in CUDA 13.1

#

they removed almost all mention of them from the new programming guide and are pointing people with "existing code bases" to the now legacy version of the programming guide

#

so like functions like tex2D are now only in the legacy guide

#

I asked on the optix forum about the status, since idk what someone using optix is supposed to use now, other than legacy functionality you have to go read about in a doc that's not being updated anymore

https://forums.developer.nvidia.com/t/using-cuda-texture-apis-with-optix-and-the-legacy-cuda-programming-guide/354581

#

well they still exist so I will use them

brisk chasm
#

so what do they sell now? pixelbuffers? ididnotread

cloud rivet
#

nothing I guess, read from linear memory using a pointer

#

I could do that tbh

brisk chasm
#

ah even "lower level" so to speak

cloud rivet
#

well the uvs map to some quad, and I randomly sample from that region

brisk chasm
#

its a raytracing engine after all

cloud rivet
#

but then I'm copying uncompressed images to the device?

brisk chasm
#

perhaps optix/cuda has extensions for compressed formats?

#

or supports that natively somehow

#

or you have to go with vulkan for that part?

cloud rivet
#

cuda texture objects support block compressed formats

#

but the documentation for working with textures is in the legacy docs

brisk chasm
#

then the material has to come from ray intersections only

#

no textures, just material properties

cloud rivet
#

I'm just going to use the stuff the way the sdk does frogshrug

cloud rivet
#

Maybe i can send the block compressed image to the device and since the shader is just C++ just use a compression library in the shader

#

Or maybe the NVIDIA texture tools sdk has something in it for device side

cloud rivet
#

When passing device pointers to NVTT, make sure the pointer refers to memory NVTT's device (and runtime API context if using the CUDA Driver API) can access. Similarly, when accessing data from device pointers returned from NVTT, make sure the current device (and context, if using the CUDA Driver API) can access allocations made by NVTT's device using the CUDA Runtime API. Device pointers are returned from nvtt::Surface::gpuData(), and used in the low-level GPU compression function API in nvtt_lowlevel.h.

#

seems like I can

#

// Shows how to use NVTT 3's low-level GPUInputBuffer API to compress a texture
// directly from a CUDA buffer. Using this API allows the input and output to
// exist on the GPU, avoiding GPU-to-CPU and CPU-to-GPU copies.

#

it just uses a cuda texture object lol

#

I suspect the response I'm going to get to my question is to continue to use tex2D and texture objects, despite it being "legacy". All of their sample code across three different NVIDIA platforms use cuda texture objects.

cloud rivet
#

It is recommended programs use cudaMallocHost() to allocate buffers which
will be used to send or receive data from GPUs.

#

oh

#

Allocates page-locked memory on the host

#

Page-locked memory will not communicate with hard drive. Therefore, the efficiency of reading and writing in page-locked memory is more guaranteed.

#

Host to GPU copies are much faster when they originate from pinned (page-locked) memory.

cloud rivet
#

I want mipmaps, but I don't think Suzanne's model has a good texture to see if mipmaps look good. I think the chess board ABeautifulGame gltf is better for that, but that's a whole scene. I think I will work on loading a proper scene next, using ABeautifulGame, and then work on mipmaps.

I think at this point though I could load my track and vehicle at any point and I could be working on my game, but I want to figure out scene material things first. Once I get tired of rendering stuff I can go back to my game.

solid grove
pseudo dock
#

the council will decide your fate
Does The Combine know that headcrabs are taking their orders from Suzanne?

cloud rivet
#

I'm going to pitch valve on supporting this project

pseudo dock
#

(Yes, after warming up the kids with a Half-Life 2 reference I did pull out a Transformers: The Movie reference 👴 )

cloud rivet
#

maybe I use deccer cubes to work on scene loading, and then abeautifulgame for mipmaps

vivid barn
#

for mipmaps you can make a tube scene

#

where the camera is placed inside a tube, facing along it

#

basically

#

or make a tube game

cloud rivet
#

hrmm a tube game

vivid barn
#

there was a game I played a few times at a friend's very long time ago as a smol kid

#

that I'm reluctant to name because the company behind it is cringe now

#

"lagsters"

#

it was basically racing but within a tube track

#

so you didn't have to turn but you had to like

#

position yourself within the tube to avoid obstacles

#

and pick up power ups

#

looking down a tube is going to be very anisotropic

#

could also be a use case for curve primitives

cloud rivet
#

I like the idea, thank you

astral hinge
shut hornet
# vivid barn looking down a tube is going to be very anisotropic

this game was in a web browser in 2001 or something. https://www.youtube.com/watch?v=urIqLheLwtU

Here's the full version
http://lostgames00.blogspot.com/2016/02/here-are-list-of-wildtangent-games-that.html

To play the Full version:

  1. first create both a file "Shockwave.com"1st and "Betty Bad"2nd on Program File before installing,

  2. then Browse the destination folder to Program Files.

For Example:
C:\Program Files\Shockwave.com\B...

#

feel like we are just getting scammed non-stop

#

actually, I should probably cancel all projects and do a remake of this 🤔

cloud rivet
#

hrmmm

#

a tube game is a series of tubes

#

I'd need to make some textures for the tubes

cloud rivet
#

I thought it was a joke

#

big GPU is always trying to sell us more GPUs when we're having fun with what we had already

#

but no, I want more GPUs in reality

#

with more stuff

vivid barn
#

real

cloud rivet
#

Pretty interesting

#

cutiles lets you do array operations where tiles are sub-arrays, opaque objects that are cheap to copy and unlock better compiler optimizations

#

It’s funny when NVIDIA refers to portability they mean portable across NVIDIA architectures

#

This isn’t available via C++ yet, just python

brisk chasm
#

C++ython

cloud rivet
#

YouTube learned I am using CUDA

#

I will get CUDA video recommendations now

brisk chasm
#

wait until it pivots to Modern OpenGL tutorials (in python)

cloud rivet
#

I am pretty CUDA pilled right now. It would be a bitter pill to swallow to go back to vulkan

#

I do miss validation

#

there is validation for optix, I haven't turned it on yet

#

I should actually

#
options.validationMode = OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL;
#

agonyfrog

[OptiX][2][VALIDATION_ERROR]: [TRACE_DEPTH_EXCEEDED] Trace depth exceeded.
    launch index: [891, 367, 0]
    additional occurrences: 9407
The maximum optixTrace recursion depth is exceeded. See OptixPipelineLinkOptions::maxTraceDepth.
[OptiX][2][ERROR]: Error syncing stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error launching work to RTX
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
OptiX error: Error during validation mode run (code 7053) at C:\Users\swart\projects\pixel_storm\src\ps_game.cu:821
#

I don't recurse though thinkeyes

#

I have misconfigured something

#

The link options consist of the maximum recursion depth setting for recursive ray tracing, along with pipeline level settings for debugging. However, the value for the maximum recursion depth has an upper limit that overrides a limit set by the link options.

#

ohh

#

I bet it's my shadows

#
  OptixPipelineLinkOptions link_options = {};
  link_options.maxTraceDepth = 1;
vivid barn
#

when cluster AS

cloud rivet
#

yup that fixed it
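
A CPU-side mock of why shadows can trip this check. It only counts how deeply trace calls nest, which is the quantity `OptixPipelineLinkOptions::maxTraceDepth` has to cover. All names here are hypothetical, and it assumes a setup with one primary ray plus one shadow ray; it does not say which of the two layouts the project ended up with.

```cpp
#include <algorithm>

// Tracks how deeply "trace" calls are nested at any moment.
struct TraceCounter {
  int current = 0;
  int deepest = 0;
};

// Stand-in for a shadow ray: it hits or misses and traces nothing further.
inline void trace_shadow(TraceCounter &c) {
  ++c.current;
  c.deepest = std::max(c.deepest, c.current);
  --c.current;
}

// Stand-in for the primary ray. If the closest hit program traces the shadow
// ray itself, that second trace is nested inside the first one.
inline void trace_primary(TraceCounter &c, bool shadow_from_closest_hit) {
  ++c.current;
  c.deepest = std::max(c.deepest, c.current);
  if (shadow_from_closest_hit) {
    trace_shadow(c);
  }
  --c.current;
}

// Stand-in for raygen. When the shadow ray is traced from raygen after the
// primary trace has returned, the two traces are sequential, not nested.
inline int required_trace_depth(bool shadow_from_closest_hit) {
  TraceCounter c;
  trace_primary(c, shadow_from_closest_hit);
  if (!shadow_from_closest_hit) {
    trace_shadow(c);
  }
  return c.deepest;
}
```

So a shadow ray traced from inside closest hit needs a depth of 2, while hoisting it into raygen keeps the requirement at 1.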

cloud rivet
brisk chasm
#

im looking forward to the blog post on our blog explaining all that optix shit

vivid barn
#

bjorn in for some suffering

cloud rivet
#

I live to suffer

#

I don't think I'm a credible authority on optix, I have been using it for like 2 minutes, but I'm always happy to share how I do things, and if I do, I hope someone tears into it to tell me about all of the millions of things I'm doing incorrectly

brisk chasm
#

that wont happen, nano will make sure you are on the right track 🙂

vivid barn
#

vk rt transfers pretty straightforwardly to optix, and back (vk has all the features optix has either in form of khr or nv exts)

#

I haven't used any nv ones (I don't have the hw) and my optix knowledge comes from zooming around optix programming guide and cycles' source but I think I'm pretty comfortable looking at optix code

cloud rivet
#

yes, it seems very similar to vulkan conceptually

#

I saw there's now a reordering extension with the latest vk sdk

#

I think maybe people interested in vulkan ray tracing might benefit from reading the optix programming guide

#

because it's very good I think

vivid barn
#

well these days these features come to those other apis first and to optix second

#

optix docs are pretty bad tbh

#

like even look at optixAccelBuild

cloud rivet
#

oh I read the guide for a high level understanding, then look at the sample code for how to use it

vivid barn
#

nothing ever tells you the instance array takes a pointer to an array of optixInstance

cloud rivet
#

I am constantly switching back and forth between pdfs, and notepad++ to figure stuff out

#

does optix get the feature second? it has shader execution reordering and cluster AS

#

I thought it would get it first

#

what's an example of a feature that vulkan/dx got first?

#

the resource allocation for the vulkan/dx drivers is probably larger than for optix

#

so I guess that makes sense

vivid barn
#

nv-specific cluster AS also for like a year or more

cloud rivet
#

oh

#

so the vulkan nv version comes out before optix

#

yes that only came out recently on optix

vivid barn
#

there's also khr position fetch

#

I think optix had pos fetch in 8 so it might w/eburger

#

ignore that one

vivid barn
#

while page-locked means it has been mapped to the device earlier

#

and cudaMallocHost can also be swapped to disk you know

#

it's just that because demand paging is generally non-existent for non-cpu-threads (current exception being linux with certain compute apis) faulting happens at submission boundaries

#

and the memory stays basically mlock'ed for the duration of a submission

#

anyway I'd rather not think about cudaMalloc this cudaMallocHost that but just use cudaMallocManaged

#

it's basically cudaMalloc{,Host} in one but you can cudaMemAdvise later to migrate it between host and device

#

also removes the need for some cudaMemcpies like I said

#

it's basically the nicest thing short of fully functioning unified shared memory that you get on cuda linux

#

where you can just pass any random pointer to the device and it just werks, no need for cudaMalloc* of any kind

vivid barn
cloud rivet
#

where it is still documented

#

it also says texture memory offers no benefits

#

regarding cudaMallocManaged I don't know, it seems like I know what's happening if I explicitly cudaMemcpy. I should try it

#

and see in nsight what it does

#

to build trust

vivid barn
#

cudaMallocManaged is basically just cudaMalloc{,Host} that can be migrated between host and device

#

it doesn't magically do things on its own except on first use

cloud rivet
#

going to try it

vivid barn
#

because yes there's no "texture memory" (hasn't been in a while, like 14 years)

#

it doesn't say texture/surface objects are deprecated though

cloud rivet
#

reading the cudaMallocManaged huge doc comment

#

no it doesn't say deprecated

#

it says legacy and should no longer be used for texture memory

#

and if you want to know how to make sense out of texture fetching from the CUDA/Optix example code you somehow have to figure out to look at the legacy programming guide

vivid barn
#

no it doesn't call them out as legacy apis

#

idk why they're not documented

cloud rivet
#

Introduced the new CUDA Programming Guide, the official, comprehensive resource on the CUDA programming model. The guide has been restructured into five parts that cover a language agnostic overview of CUDA, introductory and advanced CUDA programming in C++ and Python, detailed descriptions of specific CUDA features, and technical appendices for reference. The legacy CUDA C++ Programming Guide remains available in this release but is deprecated and includes notices that direct readers to the new guide.

vivid barn
#

but it'd be pretty weird because there's no replacement for these apis for cases when you want to sample an image with filtering

cloud rivet
#

The legacy CUDA C++ Programming Guide remains available in this release but is deprecated and includes notices that direct readers to the new guide.

cloud rivet
#

it sucks tbh

#

yeah

vivid barn
#

I'm compelled to just write it off as docs being a meme

cloud rivet
#

well I asked about it

#

just a random person responded, no nvidia devs

vivid barn
#

in any case there are reasons to give up hw sampling sure

#

at which point you can also kinda give up hw doing format conversion

#

e.g. if you one day were to decide to implement neural textures you wouldn't be using hw sampling

#

though you'd still benefit from using surface apis to write out results of your renderer

cloud rivet
#

how do you give up on hw sampling? can you do sampling with block compression image formats?

#

or just don't use compression?

vivid barn
#

you nih your own conversion as seen in neural textures

cloud rivet
#

why is that better

vivid barn
#

well for a start it lets you not use hw formats but nih your own which might be better

#

neural textures are built on observation that you don't just compress like 3 or 4 channels at once but like 15

#

(the texture sets, like beside rgb base color you also have normal map, microfacet roughness, blend between metallic and dielectric, etc)

cloud rivet
#

ohh

vivid barn
#

those channels have a lot of correlation

#

so nihing your own format lets you exploit that correlation and potentially win over hw BCn etc compression

#

a prerequisite to giving up on hw sampling is usually switching to stochastic filtering

#

(see filtering after shading)

#

if you're migrating away from hw sampling you usually do that step first

#

that entails replacing your tex2DGrads with texel fetches of randomly chosen texels

#

what were weights in the weighted sum of aniso box filter become PMFs of texels so it's pretty straightforward in that regard
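
A tiny sketch of the "weights become PMFs" idea, using a plain bilinear filter instead of the anisotropic box filter mentioned above because it is the simplest case. `pick_bilinear_texel` is a hypothetical name, `xi` is assumed to be a uniform random number in [0, 1), and texel centers are assumed to sit at half-integer coordinates. Edge clamping and wrapping are left out.

```cpp
#include <cassert>
#include <cmath>

struct TexelChoice { int x, y; };

// Instead of blending the 2x2 footprint with the bilinear weights, pick ONE
// of the four texels with probability equal to its weight. Averaged over many
// samples (after shading), this converges to the filtered result.
inline TexelChoice pick_bilinear_texel(float x, float y, float xi) {
  assert(xi >= 0.0f && xi < 1.0f);
  const float fx = x - 0.5f;  // shift so texel centers land on integers
  const float fy = y - 0.5f;
  const int x0 = static_cast<int>(std::floor(fx));
  const int y0 = static_cast<int>(std::floor(fy));
  const float tx = fx - static_cast<float>(x0);
  const float ty = fy - static_cast<float>(y0);
  // Bilinear weights for (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1).
  const float w[4] = {(1.0f - tx) * (1.0f - ty), tx * (1.0f - ty),
                      (1.0f - tx) * ty, tx * ty};
  // Walk the CDF; the last texel takes whatever probability is left.
  float cdf = 0.0f;
  for (int i = 0; i < 3; ++i) {
    cdf += w[i];
    if (xi < cdf) {
      return TexelChoice{x0 + (i & 1), y0 + (i >> 1)};
    }
  }
  return TexelChoice{x0 + 1, y0 + 1};
}
```

The chosen texel is then fetched unfiltered, which is what makes this compatible with custom or non-hardware texture formats.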

#

neural textures though are kind of a meme tbh

#

don't bother with those

#

filtering after shading is useful though

cloud rivet
#

will take a look thanks

#

I have to look up a few of these things

cloud rivet
#

so thinking through how to set up a scene, just as first attempt with very minimal materials I am thinking of this approach:

typedef struct ps_texture_t {
  cudaTextureObject_t texture;
  cudaArray_t pixels;
} ps_texture_t;

typedef struct ps_sampler_t {
  bool linear_filtering;
} ps_sampler_t;

typedef struct ps_material_t {
  ps_texture_t base_color_texture;
  ps_sampler_t base_color_sampler;
} ps_material_t;

typedef struct ps_triangle_material_t {
  size_t material_index;
} ps_triangle_material_t;

typedef struct ps_mesh_t {
  ps_vertex_attributes_t *vertices;
  uint3 *indices;
  ps_triangle_material_t tri_material_map;
  CUdeviceptr d_vertices;
  CUdeviceptr d_indices;
  CUdeviceptr d_gas_output;
  OptixTraversableHandle gas_handle;
} ps_mesh_t;

typedef struct ps_node_t {
  ps_m4x4 transform;
  ps_node_t *child_nodes;
  OptixInstance *instance;
} ps_node_t;

typedef struct ps_scene_data_t {
  ps_material_t *materials;
  ps_mesh_t *meshes;
  ps_node_t *nodes;
  ps_node_t *root;
  OptixTraversableHandle ias_handle;
} ps_scene_data_t;
vivid barn
#

btw consider making a typesafe wrapper for CUdeviceptr

#

or just use T*

cloud rivet
#

why?

vivid barn
#

because it gets really hard to follow at 2 levels of indirection already

#

same way void* is hard to follow

#

anyway

cloud rivet
#

oh I see

vivid barn
#

looks like roughly good general direction

cloud rivet
#

I see so that I'm not passing a random pointer to the wrong function, and I get compile time safety
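
A minimal sketch of what such a typesafe wrapper could look like, assuming nothing beyond standard C++. `raw_device_ptr` stands in for `CUdeviceptr` so the example builds without the CUDA headers, and `device_ptr`, `vertex`, `index3`, and `vertex_byte_offset` are hypothetical names, not from the project above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Stand-in for CUdeviceptr (an integer-sized handle) so this compiles anywhere.
using raw_device_ptr = std::uint64_t;

// A device address that remembers what it points at.
template <typename T>
struct device_ptr {
  raw_device_ptr addr;

  device_ptr() : addr(0) {}
  explicit device_ptr(raw_device_ptr a) : addr(a) {}

  // Element-wise offset, like pointer arithmetic on a T*.
  device_ptr<T> operator+(std::size_t n) const {
    return device_ptr<T>(addr + n * sizeof(T));
  }
  explicit operator bool() const { return addr != 0; }
};

struct vertex { float px, py, pz; };
struct index3 { std::uint32_t a, b, c; };

// Mixing up buffers is now a compile error instead of a runtime crash.
static_assert(!std::is_convertible<device_ptr<index3>, device_ptr<vertex>>::value,
              "index pointers must not convert to vertex pointers");

// Only accepts vertex pointers; passing device_ptr<index3> does not compile.
inline std::size_t vertex_byte_offset(device_ptr<vertex> base, device_ptr<vertex> p) {
  assert(p.addr >= base.addr);
  return static_cast<std::size_t>(p.addr - base.addr);
}
```

The raw value is still available as `.addr` at the point where it has to be handed to an API that takes the untyped handle.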

spiral ice
cloud rivet
#

Oh yes I need counts

#

This was just me thinking through the structures

shut hornet
# vivid barn what

that we had basically full DX9 capability on the web 25 years ago, and the situation hasn't improved much

vivid barn
#

ping @ jasperrlz about it I'm sure he'll instantly tell you where you're wrong

cloud rivet
cloud rivet
#

the video capture is creating all those artifacts when I go in

#

RT looks pretty good. There's no actual lighting in this, just the shadow trace

#

the lack of mipmaps though is brutal

hushed creek
#

nice sponzer

cloud rivet
#

thanks!

brisk chasm
#

nice cubes too

tight torrent
cloud rivet
#

yeah there's no graphics pipeline in cuda

#

so it's all RT

tight torrent
#

Yikes

#

Calculating the gradient for mipmaps was rough when I was doing full rt

#

Have fun

cloud rivet
#

heh

cloud rivet
#

I did a C++ metaprogramming thingy

template <typename T>
ps_internal void allocate_list(auto &list, i32 &count, T **ptr) {
  count = list.size();
  if (!count) {
    *ptr = nullptr;
    return;
  }
  *ptr = (T *)malloc(sizeof(T) * count);
  PS_MEM_CHECK(*ptr);
}

void ps_gltf_set_counts(ps_gltf_t *gltf, ps_asset_t *asset) {
  fastgltf::Asset &fgltf = gltf->asset.get();
  allocate_list(fgltf.scenes, asset->num_scenes, &asset->scenes);
  allocate_list(fgltf.nodes, asset->num_nodes, &asset->nodes);
  allocate_list(fgltf.meshes, asset->num_meshes, &asset->meshes);
  allocate_list(fgltf.textures, asset->num_textures, &asset->textures);
  allocate_list(fgltf.images, asset->num_images, &asset->images);
  allocate_list(fgltf.samplers, asset->num_samplers, &asset->samplers);
  allocate_list(fgltf.materials, asset->num_materials, &asset->materials);
}
#

I don't normally do those, but here it seemed like nice to do

wraith urchin
#

I think this code is UB if the type isn't trivially copyable

#

also why not return the pointer instead of taking a pointer to a pointer?

cloud rivet
#

to set it to nullptr if size is empty

#

nothing is being copied here though

#

it's just setting a count and allocating memory

#

my gltf code had so many bugs, the overall approach worked, it was just death by a thousand cuts, and I bled through so many. it's working well now though

#

misplaced increments, wrong pointers used, all kinds of trivial issues that just caused illegal pointer use errors or cuda checks to fail with no helpful error messages, even with optix validation on

cloud rivet
#

I just use structs, pointers and scalar values

wraith urchin
cloud rivet
#

current project scale. I quarantined fastgltf and sdl_image to cpp files, because they slow down compile time and require std and C++ semantics I don't like

#

fastgltf also had a bunch of problems in .cu files

#

I'm gonna add tinyexr, a sky and then work on ray differentials I guess

wraith urchin
#

Is the EXR for the skybox?

cloud rivet
#

yes

#

alright I think I'm done researching, time to write the code

shut hornet
cloud rivet
#

why do you like .hdr better?

astral hinge
#

it has a cooler name

cloud rivet
#

the optix sdk has a tinyexr example is why I am using it, and I don't really know what I'm doing so I need that crutch

wraith urchin
#

Isn't .hdr like some borked thing with a shared exponent?
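
For reference, the shared-exponent idea can be sketched like this: each pixel is four bytes, three mantissas plus one exponent that all three channels share. This is a minimal C++ sketch of decoding a single pixel, not a file loader, and `rgb` and `decode_rgbe` are names invented here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct rgb { float r, g, b; };

// Decode one RGBE pixel: three 8-bit mantissas sharing one 8-bit exponent.
// The exponent is biased by 128, and the extra 8 accounts for the mantissas
// being 8-bit integers rather than fractions.
inline rgb decode_rgbe(std::uint8_t r, std::uint8_t g, std::uint8_t b, std::uint8_t e) {
  if (e == 0) {
    return {0.0f, 0.0f, 0.0f};  // exponent 0 is reserved for black
  }
  const float scale = std::ldexp(1.0f, static_cast<int>(e) - (128 + 8));
  assert(scale > 0.0f);
  return {r * scale, g * scale, b * scale};
}
```

Because the exponent is shared, the precision of the dimmer channels depends on the brightest one, which is where the loss comes from.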

cloud rivet
#

what does ASO use?

wraith urchin
#

EXR

#

16bit PNG is also supported

cloud rivet
#

I have a tiny bit of understanding of the exr format, as in my previous project I was investigating writing my own exr library, so I read through the spec a bit and played around with the cli tool

wraith urchin
#

I use OpenEXR though, not tinyexr

cloud rivet
#

the problem with any of this NIH stuff and images is always having to write compression code

shut hornet
cloud rivet
#

I don't use stb

wraith urchin
#

although for the version on steam I ship it as BC6H inside LZ4

shut hornet
#

HDR is also about 1/3rd the file size (for same resolution) as EXR

wraith urchin
#

I just use EXR for dev because it's supported by GIMP & Blender

shut hornet
#

HDR is as well

cloud rivet
#

my image libraries are going to be SDL_image, tinyexr (assuming it works), and NVTT and hopefully that's all I'll ever need

wraith urchin
#

The reason its 1/3 the size is because it loses data lol

shut hornet
#

yes, it's compressed

wraith urchin
#

Lossy compression

shut hornet
#

but 33% the size is significant

cloud rivet
#

exr also has compression options

#

which compression are you comparing it to I guess

wraith urchin
#

But I never ship the EXR so the size doesn't matter

cloud rivet
#

exr doesn't just have a default compression

wraith urchin
#

the lossy compression stage at the end is gonna be BC6H anyways

#

And you definitely don't wanna do two rounds of lossy compression

shut hornet
#

fair

#

in my case i didn't see a difference, but it depends what you are doing

cloud rivet
#

idk I appreciate learning about .hdr, thank you

wraith urchin
#

EXR has two different options for compression

cloud rivet
#

it has a bunch based on my experience with the cli tool

#

idk for sure

shut hornet
#

probably i guess I can leave EXR in there since it already works but I'd prefer less moving parts

cloud rivet
#

maybe I was mistaken

wraith urchin
#

Oh wait actually it does have more

cloud rivet
#

yeah

wraith urchin
#

I only ever used zip & piz

cloud rivet
#

tinyexr also documents a bunch of compression options

#

which I think are just a subset

#

if you look at the open exr tool

#

its cli help is like the clang compiler

#

massive

wraith urchin
#

Ok so I'm using Imf::PIZ_COMPRESSION

#

To save

cloud rivet
#

I'm just using whatever I downloaded from polyhaven

shut hornet
cloud rivet
#

ok

#

sun is glaring

#

everything looks a little nicer now

#

ok I guess I have to learn what ray differentials are now

#

at some point I should compact these acceleration structures too

#

and use mesh optimizer

#

I feel like I'm on track still to be back to working on my game before the end of the month

astral hinge
cloud rivet
#

I will read through it

astral hinge
#

the basic idea for ray differentials is to intersect two imaginary rays (one above the current pixel, and one to the right) with the plane of the surface hit by the primary ray

cloud rivet
#

will this require I start resorting to a cache

astral hinge
#

then you can get the uv of the imaginary rays and then calculate the partial derivatives through shrimple subtraction

astral hinge
#

the code for what I'm describing is like 5 loc I think

cloud rivet
#

you know how ray tracing renders look noisy and then increase in detail if you remain still

#

there's some cache yes

#

I think I see that in your game?

#

idk

astral hinge
#

instead of

auto ray = MakeRay(coord);
auto hit = TraceRay(ray);
auto color = Sample(hit.texture, hit.uv);

you do

auto ray = MakeRay(coord);
auto rayDx = MakeRay(coord + {1, 0});
auto rayDy = MakeRay(coord + {0, 1});
auto hit = TraceRay(ray);
auto hitDx = IntersectPlane(rayDx, hit.position, hit.normal);
auto hitDy = IntersectPlane(rayDy, hit.position, hit.normal);
auto uvDx = hitDx.uv - hit.uv;
auto uvDy = hitDy.uv - hit.uv;
auto color = SampleGrad(hit.texture, hit.uv, uvDx, uvDy);
#

you can see the extra two fake rays I added

cloud rivet
#

thank you

astral hinge
#

my game uses DDGI but it still has some temporal stuff to make it work good. it doesn't have the characteristic "noisy then smooth" artifact though

#

ray differentials (which I don't even have btw) are just for nice texture sampling

#

and don't require anything but a little bit of math in the shader
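
Here is a more complete C++ sketch of the idea described above, assuming the hit triangle's positions and UVs are available. All names (`vec2`, `vec3`, `triangle`, `intersect_plane`, `uv_at`, `uv_derivatives`) are hypothetical. The offset rays are intersected with the plane of the hit triangle, their UVs are found through barycentric interpolation, and the gradients are plain differences.

```cpp
#include <cassert>
#include <cmath>

struct vec2 { float x, y; };
struct vec3 { float x, y, z; };

inline vec3 operator-(vec3 a, vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline vec3 operator+(vec3 a, vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline vec3 operator*(vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline vec3 cross(vec3 a, vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct triangle { vec3 p0, p1, p2; vec2 uv0, uv1, uv2; };

// Intersect a ray with the infinite plane of the triangle. The offset rays may
// land outside the triangle itself; that is fine for estimating derivatives.
inline vec3 intersect_plane(vec3 origin, vec3 dir, const triangle &t) {
  const vec3 n = cross(t.p1 - t.p0, t.p2 - t.p0);
  const float denom = dot(n, dir);
  assert(std::fabs(denom) > 1e-12f);  // ray must not be parallel to the plane
  const float dist = dot(n, t.p0 - origin) / denom;
  return origin + dir * dist;
}

// Interpolate UVs at a point on the triangle's plane using barycentrics.
inline vec2 uv_at(vec3 p, const triangle &t) {
  const vec3 e1 = t.p1 - t.p0, e2 = t.p2 - t.p0, d = p - t.p0;
  const float d11 = dot(e1, e1), d12 = dot(e1, e2), d22 = dot(e2, e2);
  const float dp1 = dot(d, e1), dp2 = dot(d, e2);
  const float inv = 1.0f / (d11 * d22 - d12 * d12);
  const float b1 = (d22 * dp1 - d12 * dp2) * inv;
  const float b2 = (d11 * dp2 - d12 * dp1) * inv;
  const float b0 = 1.0f - b1 - b2;
  return {b0 * t.uv0.x + b1 * t.uv1.x + b2 * t.uv2.x,
          b0 * t.uv0.y + b1 * t.uv1.y + b2 * t.uv2.y};
}

struct uv_gradients { vec2 ddx, ddy; };

// dir_dx / dir_dy are the directions of the rays through the neighbouring
// pixels (one to the right, one above); hit_uv is the UV of the real hit.
inline uv_gradients uv_derivatives(vec3 origin, vec3 dir_dx, vec3 dir_dy,
                                   vec2 hit_uv, const triangle &t) {
  const vec2 uv_dx = uv_at(intersect_plane(origin, dir_dx, t), t);
  const vec2 uv_dy = uv_at(intersect_plane(origin, dir_dy, t), t);
  return {{uv_dx.x - hit_uv.x, uv_dx.y - hit_uv.y},
          {uv_dy.x - hit_uv.x, uv_dy.y - hit_uv.y}};
}
```

This only covers primary rays from a pinhole camera; tracking differentials through reflections and refractions needs the extra machinery from the ray differentials literature.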

astral hinge
cloud rivet
#

oh ok

astral hinge
#

but writing it myself helps solidify it in my brain 😌

cloud rivet
#

that's why I want too

#

I don't want any black boxes I don't understand

astral hinge
#

your paper has another technique based on ray cones, but I didn't read all of the paper to understand how the cones come into existence

#

I'm guessing it is simpler though

cloud rivet
#

I'm concerned a bit with perf

astral hinge
#

does optix or cuda give you a function for sampling with gradients? or do you have to implement anisotropic filtering yourself?

cloud rivet
#

cuda has something

#

yes

#

tex2DGrad and tex2DLod

#

or something

astral hinge
#

ok nice, that's the only part that would destroy perf if you had to implement it manually

#

but doing it in hardware is practically free
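
For intuition about what the gradients are used for, this is a C++ sketch of the common textbook isotropic level-of-detail estimate. It is an approximation for illustration, not a description of what any particular GPU does internally, and the function name `mip_level_from_gradients` is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Isotropic mip level estimate from UV gradients. The gradients are in
// normalized UV units per pixel; width and height are the level-0 size in
// texels; num_levels is the number of mip levels in the chain.
inline float mip_level_from_gradients(float dudx, float dvdx, float dudy, float dvdy,
                                      int width, int height, int num_levels) {
  assert(width > 0 && height > 0 && num_levels > 0);
  // Convert to texel units: how many texels one pixel step covers.
  const float sx = dudx * static_cast<float>(width);
  const float tx = dvdx * static_cast<float>(height);
  const float sy = dudy * static_cast<float>(width);
  const float ty = dvdy * static_cast<float>(height);
  // Squared length of the longer footprint axis.
  const float rho2 = std::max(sx * sx + tx * tx, sy * sy + ty * ty);
  if (rho2 <= 1.0f) {
    return 0.0f;  // magnification: the finest level is enough
  }
  const float lod = 0.5f * std::log2(rho2);  // equals log2(sqrt(rho2))
  return std::min(lod, static_cast<float>(num_levels - 1));
}
```

A footprint that covers 4 texels per pixel lands on level 2, where each texel covers 4 of the original ones.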

cloud rivet
#
auto ray = MakeRay(coord);
auto hit = TraceRay(ray);
auto color = Sample(hit.texture, hit.uv);
#

this isn't how my shaders work fwiw

#

the first part is in the raygen

#

the hit then does the sample

#

in your example you seem to be sampling the texture in the same place as where the trace is being called

#

which is the raygen shader

astral hinge
#

where do you have the texture sampling?

cloud rivet
#

in the closest hit shader

astral hinge
#

you can pass it the info necessary to spawn the imaginary rays

#

which is just the pixel coordinate tbh

cloud rivet
#

well

#

there are some optix functions I can call in the hit shaders

#

that can probably give me this yes

#

stuff like

const float3 ray_dir = optixGetWorldRayDirection();

const float3 P = optixGetWorldRayOrigin() + optixGetRayTmax() * ray_dir;
astral hinge
#

hmm I'm not sure how optix works exactly. how can you pass info to hit shaders?

cloud rivet
#

a few different ways

#

I can over-allocate the buffer for the acceleration structure and stick data there, I can vary the vertex stride and stick data there adjacent to the vertex, I can allocate device memory and stick pointers to that memory on the shader binding table records, and then there's a global param I can stick pointers and scalar values on

#

I don't really have any problem getting whatever data I need

#

in terms of application data

#

the data available in the context of the ray trace execution all comes via optix functions though

#

I don't know of a way to pass data directly from ray gen to the hit shader, I think that would be problematic

#

hrm

#

although optixTrace might let me attach something

astral hinge
#

in vulkan rt there is a small piece of data that can be passed between stages

#

I forgor what it's called

cloud rivet
#

oh yeah there's a payload

#

yeah

astral hinge
#

ah yeah that's what it's called in vk

cloud rivet
#

I can do that too

#

I haven't had a need for it

astral hinge
#

so I think the payload is how you're "supposed" to pass, like, the pixel coord that spawned the ray

#

then you can use that in the hit shader to spawn the fake rays for calculating derivatives

cloud rivet
#

oh perfect, thank you

#

how's your game going?

astral hinge
#

idk how far I'll take it. it just seems neat

astral hinge
#

which is handy

cloud rivet
#

oh

#

yes

astral hinge
#

that means you can do the shading in the raygen shader if you pass the hit info back from the hit shader

cloud rivet
#

that's what I do, return the colors

#

sorry

astral hinge
#

but having a big payload is limiting for perf so maybe you want to shade in the hit shader anyway

cloud rivet
#

static __forceinline__ __device__ void setPayload(float3 p) {
  optixSetPayload_0(__float_as_uint(p.x));
  optixSetPayload_1(__float_as_uint(p.y));
  optixSetPayload_2(__float_as_uint(p.z));
}
#

is how this works in my project
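
Since payload registers are 32-bit unsigned values, anything else has to be bit-packed into them. A host-side C++ sketch of the same idea follows, assuming the pixel coordinate fits in 16 bits per axis. `float_as_uint` mirrors what CUDA's `__float_as_uint` does, while `pack_pixel` and `unpack_pixel` are hypothetical helpers for passing the launch pixel to the hit shader in a single register.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "payload registers are 32-bit");

// Reinterpret the bits of a float as an integer (no numeric conversion).
inline std::uint32_t float_as_uint(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

// Inverse of float_as_uint.
inline float uint_as_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Pack a pixel coordinate into one register: y in the high half, x in the low.
inline std::uint32_t pack_pixel(std::uint32_t x, std::uint32_t y) {
  assert(x < 65536u && y < 65536u);
  return (y << 16) | x;
}

// Recover the pixel coordinate on the receiving side.
inline void unpack_pixel(std::uint32_t packed, std::uint32_t &x, std::uint32_t &y) {
  x = packed & 0xffffu;
  y = packed >> 16;
}
```

On the device side the packed value would go in as one of the payload arguments to `optixTrace` and be read back in the hit program with `optixGetPayload_0()` or whichever register was used.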

astral hinge
#

noice

cloud rivet
#

I just write data back, but I hadn't considered reading from it in the hit shader I guess

#

and it's done via those functions so I didn't associate it with the param I guess

astral hinge
#

I'm not sure whether it's better to do the shading in the hit shader or the raygen shader

#

I think it might be cleaner to have the raygen shader do everything

cloud rivet
#

my hit shader does its own optixTraverse to see if it is occluded

astral hinge
cloud rivet
#

yes

#

hrm

astral hinge
#

oh I think perf favors the big raygen shader because you can do shader execution reordering at a finer granularity

#

I don't think SER is supported in other stages

cloud rivet
#

I have only briefly read about reordering

#

optixReorder is available in ray gen only in optix

cloud rivet
#

man

#

yes

#

you're building this yourself on top of vulkan?

astral hinge
#

yeah I don't think it will be hard

cloud rivet
#

I love this document

#

I am going to reread it a bunch

astral hinge
#

it's 18000 words btw

cloud rivet
#

yes

#

I skimmed it and decided I wanted to read it in detail

#

over time

astral hinge
#

it's mostly history tbh

#

it gives context for the api simplifications he makes

vivid barn
cloud rivet
#

nice

vivid barn
#

years ago

#

I'm going to post something toxic for your eyes (go)

#

also slang

cloud rivet
#

oh no

vivid barn
#
public struct MeshPart {
    public void *triangles;
    public uint32_t num_triangles;
    public float3 *posBuffer;
    public float3 *normals;
}

// TODO: kill kill kill
public struct MaterialParams {
    public interpreterProgram program;

    public MeshPart meshPart;

    // must be aligned to 8-byte boundary
    public uint8_t[256] args;
}

public struct Scene {
    public int64_t maxPartsPerMesh;

    public Slice<MaterialParams> materialParams;

    public Slice<AccelInstance> accelInstances;

    public Accel accel;

    public lightAccel lightAccel;

    public Sampler sampler;

    // there's more fields here, consult scene.go
};
#

many indirections

#
public struct interpreterProgram {
    public interpreterABI abi;
    public uint32_t *code;
}
#

abi should not be called abi but something more specific and be behind an indirection too

#

host side is very simple

#
func NewInterpretedMaterial(blob *matc.InterpretedMaterial) *InterpretedMaterial {
    device := gpu.MakeSliceUncached[uint32](len(blob.Code))
    copy(device.Value(), blob.Code)

    return &InterpretedMaterial{
        emissive: blob.ABI.EDFCount > 0,
        program: material.InterpreterProgram{
            ABI:  blob.ABI,
            Code: gpu.SliceData(device),
        },
    }
}
#

gpu.MakeSliceUncached thingy and that's it

#

.Value() gets you host []T from a gpu.Slice[T]

#

unfortunately I have to deal with this cope until I'm able to have addresses match between host and device address spaces (or better yet have just one)

#

this is not something I cooked up in my go abstraction btw

#

I had similar thingy in my C stuff before

#

that bit of code is just something I used to poke the driver with

#

this is how I map device pointers to host address space in my toy program

#

it's horrendous yes

#

and it's very easy to do better

#

with e.g. a radix tree

#

I just don't care atm

cloud rivet
#

is it slow?

vivid barn
#

probably

#

I mean my app barely spends any time on cpu atm

#

also I should stop calling my thing uncached

#

I haven't been using ~HOST_CACHED type in forever

#

there doesn't seem to be a penalty for it on either deck or the puter with gigachungus igp

cloud rivet
#

seems fine then

#

that's a nice API

vivid barn
#

it's really not it'd be nicer if I could have things in the same address space

#

like cudaMallocManaged

cloud rivet
#

I don't think everyone should have to write their own

vivid barn
#

yes there should be a library for it I agree

#

or better yet we should have demand paging so we can just use our normal allocator we hammer on host all the time

#

i.e. malloc
but this is me asking too much tbh

cloud rivet
#

once I get to reading through the cuda programming guide I will clean up my memory use

#

I think I will have to rewrite some code

vivid barn
#

I might have had some of them implemented already

astral hinge
#

well none of it looks particularly hard to implement

#

it's just a reduced api surface basically

#

plus a hashmap for the pointer thingy

vivid barn
#

uhh

#

you can't use hashmap

astral hinge
#

oh actually the pointer thing is gonna be harder yeah

vivid barn
#

you have to use some kind of interval mapper

astral hinge
#

yeah

#

that's probably easy though ididnotread

vivid barn
#

sorted list of intervals is the shrimplest

#

it is

#

radix tree is more annoying

astral hinge
#

yeah that's ez

vivid barn
#

I'm p happy about my hw image api

astral hinge
#

ong

vivid barn
#

shame I hate hw images

#

are you switching to shader bobject too frog_pregnant

astral hinge
#

idk I'm going to see what I can do with just dynamic state

#

uh I think shader object has renderdoc support now

#

and descriptor buffer

shut hornet
vivid barn
#

descriptor buffer is honestly a meme

#

unless you're dxvk or something then it's very good

astral hinge
#

yeah I don't personally think it's very useful for me

#

but since this is an exercise in reducing api surface it might be interesting to mess around with

vivid barn
#

in my toy program's abstraction I have very uhh

#

glbrained

#

image api

astral hinge
#

how so

vivid barn
#

and like you don't need descriptors for anything else so that's the only thing that needs addressing

#

so you create an image

#

but actually it creates an image + image view

#

so the resulting object is immediately usable for e.g. descriptors and stuff

astral hinge
#

oh yes I was actually thinking of having a hidden chungus descriptor set instead of exposing any descriptor stuff

#

and you just get resource indices

vivid barn
#

yeah

#

ok right you're in C++

#

you can do better than resource index cope I think

#

or rather hide it away

astral hinge
#

I can have decent code sharing to make it nice to work with

vivid barn
#

ok hear me out

astral hinge
#

like a function in c++ that returns Image2D and an equivalent type (with the same memory layout) in glsl

#

I already have that in my game

vivid barn
#

what if you had your own object type (like I have Image in my toy abstraction)

#

but you allocated it out of a huge std::vector<Image>

astral hinge
#

hmm like a VkJaker?

vivid barn
#

no

#

like this idk this is already basically a VkImageView (well it just has two descriptors)

#

anyway

#

you could allocate it out of a chungus vector<Image>

#

pass the host pointer as is to the gpu

astral hinge
#

hmm why would I do that

vivid barn
#

do some math like

#

(pointer - where_that_vector_starts) / sizeof(Image)

#

????????

#

descriptor index.
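
The pointer math above can be sketched in C++ as follows. `Image` here is a placeholder struct and `descriptor_index` a hypothetical helper; the only real assumption is that every image lives in one contiguous pool that never reallocates while handles are in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder for whatever per-image state the abstraction keeps.
struct Image {
  std::uint64_t view_handle;
  std::uint32_t width, height;
};

// Derive the descriptor index from the object's position in the pool.
// Subtracting two Image* already divides by sizeof(Image).
inline std::uint32_t descriptor_index(const std::vector<Image> &pool, const Image *image) {
  assert(!pool.empty());
  const Image *first = pool.data();
  assert(image >= first && image < first + pool.size());
  return static_cast<std::uint32_t>(image - first);
}
```

The pool has to be sized up front (or grown in a way that keeps addresses stable), since a `std::vector` reallocation would invalidate every pointer handed out so far.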

#

@shut hornet cease and disperse

astral hinge
#

I see but I don't think it helps a lot tbh

#

it moves complexity to the shader where it now has to think about the cpu pointer offset

vivid barn
#

it helps in that you don't have several identities for your image objects, only host pointer

#

and in shader you hide that complexity somewhere

#

same as you'd on host

#

or

#

or

astral hinge
#

true but I think I can smuggle the texture identity into a few bits of the descriptor index

vivid barn
#

you could do the opposite

#

and have your Image be descriptor index on host

#

and have host methods look things up in some out of band vector<ActualImageStuff> array

#

so your class Image is just uint32_t index

vivid barn
#

no Image* vs descriptor index dichotomy

astral hinge
#

and I could do that with what I'm describing

vivid barn
#

i c i c

astral hinge
#

hmm

#

ok I am not describing what you are

#

I thought you were trying to avoid having Image2D, Image3D, Image2DMS, etc.

cloud rivet
astral hinge
#

but I think avoiding those isn't necessarily a good thing

vivid barn
cloud rivet
#

group items of the same type into arrays, and treat the array base pointer as system-private

vivid barn
#

yes

#

but

#

that's generally not a great thing to do/have imo

#

we're only doing it here because we're coping with the fact that some hw insists on having descriptors basically live in their own address space

#

(and the addresses are indices)

#

and that's how the apis are anyway

astral hinge
#

your point is I can just pass &image to a shader and then, with just that and some secret extra info, sample the texture from the gpu

#

whereas with my original design I need to query an index from the image, then pass it to the shader

vivid barn
#

and store the actual data, which would be accessed by the methods of that type, out of band

#

but ye

astral hinge
#

how would it look to write code that uses that?

#

I'm just trying to see how it simplifies things

vivid barn
#

so

astral hinge
#

hmm ok

#

so that means moving the zoo of different image types to the cpu if I'm striving for type safety

vivid barn
#

that's orthogonal

#
struct ActualFunStuff {
 VkImageView vkView;
};

bit allocator;
std::vector<ActualFunStuff> actualFunStuff;

class Image {
 uint32_t index;

 /* e.g. */
 Image SubImage(format Format, /* other things for reinterpretation */) const {
  /* create new image view here */
 }

 /* other stuff you'd want to do with an image view on cpu */
};
astral hinge
#

well how do I do .memberWhichIsAnImage2D = someImage on the cpu then, and have the types match

vivid barn
#

ok yeah in that case you probably wanna do a bunch of typed wrappers I guess

#

Image2D etc

astral hinge
#

it's just a detail tbh

#

not an important one

cloud rivet
#

oh I guess I can have type safe image types in my project, I can typedef metallic/normal/base image shaders on the cpu when I extract them from fastgltf or wherever and then use those types in my materials that I read in the shader. right now I just have a bunch of cudaTextureObject_t

#

also a bit orthogonal

vivid barn
#

that feels like the wrong "plane" to encode into the types

astral hinge
#

hmm yeah there are definitely cases where you want an image object to represent different image shapes

#

like multisample vs not multisample

cloud rivet
#

In February I spent a long time being frustrated with my normal maps because they ended up being srgb

astral hinge
#

but otherwise I don't think anyone needs to replace a 2D image with a 3D image or something

cloud rivet
#

and had visible artifacts

#

that I misunderstood as being something wrong with my math but it was just the dumb image format

vivid barn
astral hinge
#

I'm not sure encoding srgbness into the type is the solution but maybe it is

cloud rivet
#

I don't know

vivid barn
#

yeah it's probably not

astral hinge
#

this might just be one of those problems where you suffer once and then the only solution is to never forget the lesson you learned

cloud rivet
#

well if you all make a cool easy to use GPU API that makes life easier working with images and memory and open source it I would try it

#

I'm pretty happy with what I have so far in my project

vivid barn
#

I have such api already

#

it's up on my github!

cloud rivet
#

your toy project uses the worldspawn library?

vivid barn
#

yes

#

my toy project is called worldspawn

#

it's just the gpu thing is sourcedropped

#

the rest is not open source although many of the components could be

#

I'll get to it some time

#

it's just that it needs to be sanitized

#

or the gazers might end up needing eyebleach

#

#1181368932637093888 is worldspawn tbc

astral hinge
#

I think I'll just do compute and no graphics or presentation initially

#

all that stuff is bloat

vivid barn
#

based

cloud rivet
#

who needs mipmaps

#

this is totally fine

#

jk, jk

#

I think that's a good test right there

#

that view

#

also this one

cloud rivet
#

so in the optix samples I see

#
#include <cuda/random.h>
#
const float2 subpixel_jitter =
        subframe_index == 0 ? make_float2( 0.5f, 0.5f ) : make_float2( rnd( seed ), rnd( seed ) );
#

this is in a raygen shader

#
 unsigned int seed = tea<4>( launch_idx.y * launch_dims.x + launch_idx.x, subframe_index );

    // The center of each pixel is at fraction (0.5,0.5)
    const float2 subpixel_jitter =
        subframe_index == 0 ? make_float2( 0.5f, 0.5f ) : make_float2( rnd( seed ), rnd( seed ) );

    const float2 d =
        2.0f
            * make_float2( ( static_cast<float>( launch_idx.x ) + subpixel_jitter.x ) / static_cast<float>( launch_dims.x ),
                           ( static_cast<float>( launch_idx.y ) + subpixel_jitter.y ) / static_cast<float>( launch_dims.y ) )
        - 1.0f;
    const float3 ray_direction = normalize( d.x * U + d.y * V + W );
    const float3 ray_origin    = eye;
astral hinge
#

what's the question

cloud rivet
#

no question. just looking through the sample code for how to do things and commenting out loud about it I guess

#

I was looking at this paragraph

#

so I thought maybe I could just use some jitter

#

and sample level 0

astral hinge
#

give it a shot

#

what's your goal for the jitter?

cloud rivet
#

reduce the aliasing
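
Stripped of the camera basis, the jitter in that sample boils down to the mapping below. This is a plain C++ sketch with invented names (`vec2f`, `pixel_to_ndc`), with the random numbers passed in rather than generated, so it makes no claim about the SDK's `tea` or `rnd` helpers.

```cpp
#include <cassert>
#include <cmath>

struct vec2f { float x, y; };

// Map a pixel plus a sub-pixel offset to normalized device coordinates in
// [-1, 1]. jitter_x and jitter_y are in [0, 1); (0.5, 0.5) is the pixel center.
inline vec2f pixel_to_ndc(int px, int py, int width, int height,
                          float jitter_x, float jitter_y) {
  assert(width > 0 && height > 0);
  assert(jitter_x >= 0.0f && jitter_x < 1.0f && jitter_y >= 0.0f && jitter_y < 1.0f);
  const float u = (static_cast<float>(px) + jitter_x) / static_cast<float>(width);
  const float v = (static_cast<float>(py) + jitter_y) / static_cast<float>(height);
  return {2.0f * u - 1.0f, 2.0f * v - 1.0f};
}
```

With a different random offset each subframe, the accumulated image averages over the whole pixel area instead of always sampling its center, which is what softens edge aliasing.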

#

still reading through the doc

pseudo dock
#

In my renderer I always do one primary ray exactly in the center of the pixel (using rasterization) and then one additional ray that is randomly jittered (for antialiasing rather than texturing). It converges (mentally) to the correct result over time but is definitely noisy for any one single frame (as the quote you show from the article says, although it's talking about textures). I have a memory that you've maybe said you don't want the noisy look, though?

#

It's not directly comparable, but I found a section of a video where I zoom in close in my renderer and show an example of what I'm talking about (hopefully the link with the timestamp works): https://www.youtube.com/watch?v=oUfrtmJeyEU&t=1632s
It's hard to tell from a still shot or from video what it actually looks like in realtime (it definitely looks much better in motion than it does in a single screenshot hehe), but maybe might give at least a general idea of what texturing might look like if you went that route with only a few rays per pixel.

If you do temporal accumulation though then of course this doesn't apply as much and it actually would converge over time.

cloud rivet
#

thank you

#

I've watched that video before

#

it's great

#

you have a very clear way of explaining concepts

pseudo dock
#

That's kind of you to say! I think maybe having a music background and only coming to programming later in life leads to me thinking about things differently compared to many programmers and maybe the way I talk about things also takes a more intuitive/conceptual approach because of that (and also because I often have trouble understanding more programmer-y explanations myself hehe). At work, at least, I sometimes am able to talk artist language hehe, so that's gratifying to hear that some of that maybe also comes through in video form.

But I didn't mean to derail your thread! I've considered maybe trying the same approach to textures when I finally add them to my current project so I'd be interested in seeing the results if you try. Intuitively, though, my suspicion is that using mipmaps is probably better both for quality and to help with texture caching, as that quote says.

cloud rivet
#

you didn't derail anything, thank you. i have to learn all of these things

#

I think I understand what that paper is saying: I send some extra data from the ray gen corresponding to a ray 1 pixel up and another one pixel right, and then I use the information I get from the triangle that I intersect with to generate additional points for the information I sent, and from that I can get a differential to calculate the lod when sampling the texture

#

regarding your video, I really like the HDR portion, the whole video is great

#

but I also like how you explain how noise helps the brain fill more information than there is

#

tomorrow I will generate mipmaps and try to see if I can generate a render without all the aliasing

#

if I can get that all done I'll get back to my track and vehicle physics, and get that working, and I'll be where I left it before I reached that precipice

astral hinge
cloud rivet
#

no I don't think I will. just try ray differentials

#

seems straightforward, it's just one ray, it's just a big payload per ray

pseudo dock
#

The thing that really made noise click for me personally was working on dithering at my previous job. Unfortunately I can't go into details too much because that industry is much more closed and secretive than games, but the effect that good dithering has on the brain feels like magic to me.
Since falling in love with dithering I've probably gone too far with embracing all kinds of noise, but it has also made me appreciate monte carlo techniques in a way that I didn't quite understand as much back when I was in school doing path tracing. It really does help the brain in a way that feels magical to me (like, the fact that we can effectively do these continuous integrals over the entire hemisphere at every single bounce is incredible to me).

astral hinge
#

noise is epic

#

I love monte carlo integration

pseudo dock
#

Yes, me too! 😍

#

It is seriously so cool

vivid barn
#

you can have both ray differentials and noise

#

with stochastic filtering

astral hinge
#

@vivid barn re: mapping offset host pointers to gpu pointers
idk what the radix thing is you suggested for storing intervals, but wouldn't a simple binary tree work? the intervals can't overlap, so you just need to locate the two nearest intervals (sorted by the start of range), then one of the intervals will contain your offset pointer

#

I am using a shrimple linear search for now btw, but if perf sucks I will explore the option I just explained (unless you tell me it can't work for some reason)

cloud rivet
#

That’s what is in nanos worldspawn go project

#

A binary tree

vivid barn
#

No I use sorted list of intervals not even a tree, basically an even dumber solution

#

But yes an interval tree will work

vivid barn
#

Like how an MMU works

#

You have a tree that's a few levels deep, like 3 to 5

cloud rivet
#

Oh I assumed: i, ok := slices.BinarySearch(deviceAddrs, uint64(p))
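
The sorted-list-of-intervals lookup discussed above can be sketched in C++ like this. The `mapping` struct and `device_to_host` are hypothetical names; the approach relies only on the ranges being sorted by start address and not overlapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One mapped allocation: a device address range and where it lives on the host.
struct mapping {
  std::uint64_t device_base;
  std::uint64_t size;
  std::uint8_t *host_base;
};

// Translate a device address (possibly pointing into the middle of an
// allocation) to a host pointer. 'sorted' must be ordered by device_base with
// non-overlapping ranges. Returns nullptr if the address is not mapped.
inline std::uint8_t *device_to_host(const std::vector<mapping> &sorted, std::uint64_t addr) {
  assert(std::is_sorted(sorted.begin(), sorted.end(),
                        [](const mapping &a, const mapping &b) {
                          return a.device_base < b.device_base;
                        }));
  // Find the first mapping that starts strictly after addr...
  auto it = std::upper_bound(sorted.begin(), sorted.end(), addr,
                             [](std::uint64_t a, const mapping &m) {
                               return a < m.device_base;
                             });
  if (it == sorted.begin()) {
    return nullptr;  // addr is below every mapping
  }
  --it;  // ...so the one before it is the only possible owner.
  const std::uint64_t offset = addr - it->device_base;
  return offset < it->size ? it->host_base + offset : nullptr;
}
```

Lookups are O(log n), while inserting or removing a mapping means shifting elements in the vector, which matches the trade-off mentioned in the thread.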

astral hinge
#

wikipedia calls it a radix "trie"

vivid barn
#

Oh

astral hinge
#

oh that's just an alternative name, nvm

vivid barn
#

Ok I guess it's a trie

#

Whatever

vivid barn
cloud rivet
#

Right

vivid barn
#

I guess it's a tree

astral hinge
#

implicit 🅱️ree

vivid barn
#

But it doesn't have certain properties of a tree like log n insertion

#

Because you have to shift things around when inserting or deleting
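
A minimal sketch of the sorted-interval lookup discussed above, assuming the ranges are kept sorted by host start address and never overlap. The `mapped_range` struct and `host_to_device` function are hypothetical names for illustration, not the actual code from either project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one registered host allocation and its device mirror. */
typedef struct mapped_range {
    uintptr_t host_base;   /* start of the host allocation */
    size_t    size;        /* length in bytes */
    uint64_t  device_base; /* device address corresponding to host_base */
} mapped_range;

/* ranges[] must be sorted by host_base and must not overlap.
 * Returns the device address for host pointer p, or 0 if p is unmapped. */
static uint64_t host_to_device(const mapped_range *ranges, size_t count, uintptr_t p)
{
    assert(ranges != NULL || count == 0);
    size_t lo = 0, hi = count;
    /* Binary search for the first range whose host_base is greater than p. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ranges[mid].host_base <= p)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0; /* p lies below every registered range */
    /* The only candidate is the last range that starts at or before p. */
    const mapped_range *r = &ranges[lo - 1];
    if (p - r->host_base >= r->size)
        return 0; /* p falls in the gap after the candidate */
    return r->device_base + (uint64_t)(p - r->host_base);
}
```

Because the ranges cannot overlap, only the predecessor (the last range starting at or before the pointer) needs to be checked. Lookups are O(log n), while inserting or removing a range still means shifting array elements, as noted above.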

vivid barn
bronze socket
#

yeah a trie (pronounced "try") is a special subset of tree for building up prefixes, it's the kind of thing I largely haven't heard outside of college classes

cloud rivet
#

it's good for auto completion

cloud rivet
#

haven't gotten quite to generating mipmaps yet, but I am close, I just finally got bc7 compressed texture objects to render. there's not a way to inspect the texture objects in nsight compute though, so I can't like show that, but they are compressed now

#

maybe it's viewable via total memory allocation or something

vivid barn
spiral ice
#

"radix tree" is another common terminology

cloud rivet
#

I was able to get rid of SDL_image as nvtt handles image loading. I can get rid of tinyexr too I think but I will worry about that later

#

I just spent most of my day figuring out how to use nvtt with block compression and cuda texture objects :\

#

I finally had to find an example on github that someone had working

cloud rivet
#

ok I have compressed images with mip maps

#

now I just have to do the ray differential math in the shader

#

I didn't expect that to be so hard, but it's only hard because the docs were bad
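
For the ray differential step mentioned above, the last part (turning a texture-space footprint into a mip level) is usually the standard LOD selection. Here is a rough sketch of that part only, assuming the UV derivatives per pixel step have already been computed from the ray differentials. The function name and parameters are made up for illustration, it returns an integer level rather than a fractional one for trilinear filtering, and it is not necessarily what the paper or the actual shader does.

```c
#include <assert.h>

/* Picks an integer mip level from the texture-space footprint of one pixel.
 * dudx, dvdx, dudy, dvdy: UV change per one-pixel step in screen x and y.
 * width, height: level-0 texture size in texels. mip_count: levels available. */
static int mip_level_from_footprint(float dudx, float dvdx, float dudy, float dvdy,
                                    int width, int height, int mip_count)
{
    assert(mip_count > 0);
    /* Scale the UV derivatives into texel units. */
    float tx_x = dudx * (float)width, ty_x = dvdx * (float)height;
    float tx_y = dudy * (float)width, ty_y = dvdy * (float)height;
    /* Squared length of each footprint axis, in texels squared. */
    float len2_x = tx_x * tx_x + ty_x * ty_x;
    float len2_y = tx_y * tx_y + ty_y * ty_y;
    float len2 = len2_x > len2_y ? len2_x : len2_y;
    /* floor(log2(length)) without libm: each mip halves the footprint
     * length, which quarters the squared length. */
    int level = 0;
    while (len2 >= 4.0f && level < mip_count - 1) {
        len2 *= 0.25f;
        ++level;
    }
    return level;
}
```

Using the longer axis of the footprint trades some blur for less aliasing at grazing angles.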

vivid barn
brisk chasm
#

not sure if it was here where we derailed the thread into which thinkpads are/were good or not

#

but, i acquired a new work laptop lunch grill

#

finkpad p16 gen2, 65THOUSANDFIVEHUNDREDANDFIRDYFIVE megabytes of ram

brisk chasm
#

the keyboard is really nice

#

i havent tested anything on it yet : >

#

after christmas probably

cloud rivet
#

I would like to buy new hardware but I don't want to spend the money, maybe 2027 :/

shut hornet
cloud rivet
brisk chasm
#

this is very cool

cloud rivet
#

thanks, just followed the paper

#

did exactly what it said

brisk chasm
#

still, i suppose you need to be a little smart to read the paper

cloud rivet
#

I had to look a lot of it up

#

I'm not very strong with the math, but I mostly understand what's happening

#

this is just for single rays, I don't have follow up rays or anything, I'd have to expand it in that case

brisk chasm
#

ah you mean the derivatives/differentials

#

i mean the thing as a whole

cloud rivet
#

yeah

#

glad it works, the aliasing was very bad

brisk chasm
#

it did be

pseudo dock
#

What if you made your skimmer game inside of Sponza? And the skimmers could be like the size of a bug, and the tracks could go around inside Sponza?

#

Sponza Racing (TM)

cloud rivet
#

I would have to speed up loading sponza, it takes a good while right now

pseudo dock
#

Sponza Loading Racing (TM)

#

I'm not really advocating it, and you should obviously do what you want. I just suddenly got this image in my head of going through the environment at high speeds but on tracks like yours that didn't have to follow the ground so you could zoom up and down and through windows and off the balcony and stuff and it seemed potentially cool in my head 🙃

spiral ice
#

Putting the texture loading into a thread pool should be way faster. If you'd want it to be really fast, you probably need to preprocess the scene somehow.

#

By the way, you've made so much progress on your project. I am really impressed. I've barely worked on mine for the last two weeks

tight torrent
#

part of it too is im executing command lists for every individual mesh and waiting for them to complete before going on and processing the next one

cloud rivet
#

yeah, it's all completely serial right now and it is doing the BC5 compression and generating mipmaps for all 50 sponza textures before it shows anything. I can obviously speed it up, but it's not what I want to work on right now

cloud rivet
#

3 weeks now I think?

#

#1323084490997895198 message 11/26 - 12/19, 23 days

#

that's pretty good for going from no code to sponza with a vulkan + optix renderer

#

hrm

#

wondering if I should bring in my previous physics code or start from new

#

I'm gonna work on gameplay now for a while

solid grove
#

planning to do fully custom physics again?

cloud rivet
#

I was thinking I would follow along the book I bought

solid grove
#

then u won’t be doing gameplay for a while KEKW

cloud rivet
#

that's sort of gameplay?

#

I guess in my brain any work that is not rendering is gameplay

#

asset pipeline/rendering/animation/level editing

#

heh

#

I don't know

#

I think it will be fun

#

there's a lot of work left on the rendering but it's good enough for now I think

astral hinge
#

@vivid barn what do you think about deleting semaphores and just having submit operations return a thing that can be waited on (I know your api has something like this)

#

I'm mainly concerned that there might be something else I didn't think of for which I'd need a semaphore

#

I hope bjorn doesn't mind us hijacking his thread to talk about this froge_yeehaw

vivid barn
astral hinge
#

ah I thought your stream submissions returned an awaitable token thingy

vivid barn
#

which idk how valuable it is to you but it prevents doing a thing where you record n cmdbufs at the same time

#

which might be convenient in some cases like idk

#

say you're doing skinning so you record two cmdbufs, one does compute dispatches that do skinning, then the second one does AS rebuilds

astral hinge
#

"things" being sync "primitives"

vivid barn
#

ok w8 to clear it up are you suggesting that you wouldn't have random VkSemaphores but rather there would be one hard-associated with each VkQueue

#

so if you wanna wait on something you specify (VkQueue, counter) pair?

astral hinge
#

hmm

vivid barn
#

you can replace VkQueues with higher level thingies of your own

#

this would be fine I guess

astral hinge
#

I was just thinking that my Submit would return something I could wait on at a later date (whether that be another submit or just a free wait function)

#

instead of making the user construct semaphores themselves

vivid barn
#

yeah that sounds fine I guess

#

I don't like the level of abstraction that VkQueues and cmdbufs offer at all so it's kinda hard for me to judge this

astral hinge
#

yeah I've been running into that as well

vivid barn
#

what do you find you dislike?

astral hinge
#

atm I just have three hardcoded queues and the user can pick one to submit to

#

well it feels like a mix of low and high level concepts

#

but idk

vivid barn
#

gob had an idea (for vk-tier abstraction level) where instead of VkQueues, you'd be submitting to a queue family and the magic would happen underneath to find an available hw resource to run this submission as soon as its sync conditions are met

#

this would mean that VkQueues disappear

#

VK_NV_command_buffer_inheritance wouldn't work

#

and instead of VkQueues, you have just VkSemaphores

#

VkSemaphore sync becomes mandatory because you can't just rely on ordering you get by submitting to the same queue

#

what do you think

astral hinge
#

eliminating the weird thing where you allocate queues from a family would be nice

#

yeah sounds nice

vivid barn
#

if you're ok with cmdbufs I'd implement this then

astral hinge
#

I guess I kind of do have that already

vivid barn
#

I wasn't ok with cmdbufs either (I wanted sync to be "transposed" so to speak, like in cuda or opencl) so I ended up abstracting over those too

#

o i c

#

cool

astral hinge
#

submit is missing wait semaphores but yeah

typedef enum gfx_queue
{
  GFX_QUEUE_GRAPHICS,
  GFX_QUEUE_COMPUTE,
  GFX_QUEUE_TRANSFER,
} gfx_queue;

void gfx_submit(gfx_queue queue, gfx_command_buffer command_buffer, gfx_semaphore semaphore, uint64_t signal);
vivid barn
#

right i c

#

note that with gob's idea you can't have a magical thing to wait on anymore I guess

astral hinge
#

btw I don't know how individual VkQueues allocated(?) from the families map to the actual hw

vivid barn
#

unless you actually return smth like (VkSemaphore, timeline value) rather than just timeline value

astral hinge
#

hmm that makes sense ye
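
To make the "submit returns something you can wait on" idea concrete, here is a small mock of the bookkeeping, with no Vulkan in it at all. It assumes one timeline per queue, with plain counters standing in for timeline semaphores. `gfx_wait_token`, `gfx_submit_mock`, `gfx_mock_retire_one`, `gfx_token_is_done` and `GFX_QUEUE_COUNT` are hypothetical names for the sketch and are not part of the API shown above.

```c
#include <assert.h>
#include <stdint.h>

typedef enum gfx_queue {
    GFX_QUEUE_GRAPHICS,
    GFX_QUEUE_COMPUTE,
    GFX_QUEUE_TRANSFER,
    GFX_QUEUE_COUNT /* hypothetical addition, used only to size the table */
} gfx_queue;

/* Hypothetical token: names the queue's timeline and the value that
 * marks completion of one submission on it. */
typedef struct gfx_wait_token {
    gfx_queue queue;
    uint64_t  value;
} gfx_wait_token;

/* Stand-in for one timeline semaphore per queue. A real backend would
 * hold a semaphore handle and query or wait on it instead. */
typedef struct gfx_timeline {
    uint64_t submitted; /* last value handed out by a submit */
    uint64_t completed; /* last value the (mock) GPU has reached */
} gfx_timeline;

static gfx_timeline g_timelines[GFX_QUEUE_COUNT];

/* Records a submission and returns the token that identifies it. */
static gfx_wait_token gfx_submit_mock(gfx_queue queue)
{
    assert(queue < GFX_QUEUE_COUNT);
    gfx_wait_token token = { queue, ++g_timelines[queue].submitted };
    return token;
}

/* Simulates the GPU finishing the oldest pending submission on a queue. */
static void gfx_mock_retire_one(gfx_queue queue)
{
    gfx_timeline *t = &g_timelines[queue];
    if (t->completed < t->submitted)
        ++t->completed;
}

/* A token is done once its queue's timeline has reached its value. */
static int gfx_token_is_done(gfx_wait_token token)
{
    return g_timelines[token.queue].completed >= token.value;
}
```

The token carries the (timeline, value) pair, so it can be handed to a later submit as a wait dependency or polled from the host, and the user never constructs a semaphore directly.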

astral hinge
#

oh there's some new(?) queue priority thing I saw when I was making queues yesterday

vivid barn
#

but a number of kmds/fw have problems allocating more queues at runtime basically

#

I think the thing windows got called "hardware assisted scheduling" means scheduling decisions are deferred to fw sitting in the gpu

#

scheduling in fw means that each VkQueue corresponds to an actual small ring thing of submissions that's mapped to host and to fw

#

there's obviously lots of VkQueues at any point in time

#

but there's only so many hw resources that can handle each one

#

so anyway same as cpu threads, when you have free hw resource, you have something somewhere choose a VkQueue for it to get busy with

cloud rivet
vivid barn
#

idk why I went on a tangent about scheduling in fw

#

ignore that

vivid barn
astral hinge
#

it sounds like the "ideal" abstraction would make as many VkQueues as possible and treat them as a thread pool, then the user would want to submit relatively granular tasks so all queues can be busy

#

except I'm skeptical of how that would turn out in practice given that you can saturate the gpu with just one submit

vivid barn
#

but yes stuff being like thread pool is my thinking too

astral hinge
#

true, but then it seems like the ideal abstraction is just a frame graph

vivid barn
#

my abstraction lets me use all the queues without a frame graph.