#Rosy
i see (sic)
and it's one memcpy to add the cube mesh to the gas, one memcpy to add all the instances with just the transform 3x4 matrices
and then one build for the gas, one build for the ias
and bob's your uncle
ezpz
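A rough host-side sketch of the "instances with just the transform 3x4 matrices" part, in plain C++ with no OptiX headers. `Transform3x4`, `make_translation` and `make_grid_transforms` are made-up names for illustration; the only thing taken from OptiX is the layout, 12 floats in row-major order with the translation in the last column, which is what `OptixInstance::transform` holds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the 12-float, row-major 3x4 transform of one instance.
using Transform3x4 = std::array<float, 12>;

// Identity rotation/scale with a translation in the last column.
Transform3x4 make_translation(float x, float y, float z) {
    return {1.0f, 0.0f, 0.0f, x,
            0.0f, 1.0f, 0.0f, y,
            0.0f, 0.0f, 1.0f, z};
}

// One transform per cube, laid out on an n-by-n grid in the XZ plane.
std::vector<Transform3x4> make_grid_transforms(std::size_t n, float spacing) {
    assert(n > 0);
    std::vector<Transform3x4> out;
    out.reserve(n * n);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            out.push_back(make_translation(static_cast<float>(col) * spacing,
                                           0.0f,
                                           static_cast<float>(row) * spacing));
        }
    }
    return out;
}
```

In a real build each of these would be copied into an instance record next to the GAS handle before the IAS build; that part is left out here.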
i wanted to look into picking at some point, i suppose those structures come in handy too, not gas/ias but blas and tlas, since i wont be using optix i guess
Silly Bait Traversal
shader binding table
aye, like your vk descriptor set
so when a ray hits a deccer cube, how do I know what it looks like
like how do I color the pixel
hrm, in vulkan you can have a descriptor set for the acceleration structure, but you can just use an address
the shader binding table, SBT is basically a collection of shaders
when you draw with opengl you always know which shader is going to be invoked
but with RT you have no idea
until the ray hits something
and you look at the SBT somehow to figure that out
makes sense that you need something to look up
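For the "look at the SBT somehow" step, this is a sketch of the index arithmetic as I understand it from the OptiX programming guide: the hit group record is picked from the instance's SBT offset, the SBT index of the geometry inside the GAS, and the offset and stride passed to the trace call. The struct and field names below are illustrative, not SDK names.

```cpp
#include <cassert>
#include <cstdint>

// Inputs that decide which hit group record runs. Field names are illustrative.
struct SbtLookup {
    std::uint32_t instance_offset;  // per-instance SBT offset
    std::uint32_t gas_index;        // SBT index of the build input within the GAS
    std::uint32_t trace_offset;     // SBT offset passed to the trace call (ray type)
    std::uint32_t trace_stride;     // SBT stride passed to the trace call (ray type count)
};

// index = instance offset + (GAS index * trace stride) + trace offset
std::uint32_t hitgroup_record_index(const SbtLookup& s) {
    return s.instance_offset + s.gas_index * s.trace_stride + s.trace_offset;
}
```

With one ray type and one geometry per GAS, the stride is 1 and the other terms are 0, so the result is just the instance's offset.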
I'm starting to understand Gob's point of view a lot more now. I don't understand why cuda and optix can have all this cool easy to use stuff, but vulkan can't
if you have resources that require a complicated sync, you can do that with cuda, but if you just want an ordered task execution queue, you get it for free
I don't have to create complicated nested structs to update device memory
you also get libcu++ https://nvidia.github.io/cccl/libcudacxx/standard_api.html#libcudacxx-standard-api
oh
it's now called libcudacxx
or those are both referring to the same thing
**Thrust** is the C++ parallel algorithms library which inspired the introduction of parallel algorithms to the C++ Standard Library. Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs via configurable backends that allow using multiple parallel programming frameworks (such as CUDA, TBB, and OpenMP).
**CUB** is a lower-level, CUDA-specific library designed for speed-of-light parallel algorithms across all GPU architectures. In addition to device-wide algorithms, it provides cooperative algorithms like block-wide reduction and warp-wide scan, providing CUDA kernel developers with building blocks to create speed-of-light, custom kernels.
**libcudacxx** is the CUDA C++ Standard Library. It provides an implementation of the C++ Standard Library that works in both host and device code. Additionally, it provides abstractions for CUDA-specific hardware features like synchronization primitives, cache control, atomics, and more.
anyway, I don't have to worry about what other apis support right now
broke up the monolith, no more unity builds
haven't gotten to animated cubes yet, found the monolith too difficult to work with
apparently I did 5.8 months of work and created $105K in value in two weeks 💰
a $100K triangle 
oh that vec math code
let me get rid of that
I brought that in just from the sdk but I don't use it
rip $50K 
ok now I can work on animated cubes
I'm going to pull in my math from my previous project
well from the game in its C/Vulkan form
oh I should prefix my files
what is this lol
cash money
the cost estimation is funny
apparently platinum is worth $333k
it's for sale, if anyone's interested 
and my website is $700k
where's my goddamn money
Yeah where do we collect wtf
that's how much you owe
I think that's 700 rather than 700k?
Ahh, I just tried it on my site. I wish I had that kind of money
huh that's weird actually, my mac is in us english
the formatting is all over the place it seems, it's doing . for decimals on the complexity/lines column and the size but , on the months and people thing
🤑
I'm a 7xer according to this 😎
I wonder how accurate this thing is for projects that actually cost money
lol
I'm using half a billion dollars' worth of dependencies
it's actually counting everything five times here so the real number is "only" about $100m
Hrmmmm maybe I want a blue terminal
this is just default powershell
You use the powershell app?
I'm on Windows 10 so I don't think that's available
my other option is cmd but it kinda sucks
it is available on 10, I use it
You're on an unsupported OS that’s already missing critical security updates fwiw
ope
Assuming the Windows 11 updates also apply to 10
Like the fixes
idk playing with fire
hmm I'm skeptical about win10 being unsafe to use
ok it seems like they did stop security updates
so the unsafety will only grow over time unless I switch to the LTS version
I'll upgrade to win11 sometime
I plan on continuing to use win10 until it sinks in that people aren't going to migrate to 11 at which point MS will start supporting it again
idk why people think this. they won't start supporting it again
i say this as someone who hasn't updated yet
I was mostly joking
I'll probably upgrade to some LTS version if it becomes necessary, but security updates are kind of overrated. I feel like if you use a browser ad blocker to suppress questionable web elements and don't run any programs that you don't trust, the attack surface is probably still pretty small
There are probably vulnerabilities in some applications that connect to the internet that could remain a threat but idk much about security stuff
upgrade to li-
"upgrading" to Linux for me would mean a complete upheaval of all my workflows and having to learn and cope with the quirks and defects of another OS
I'm happy to support native Linux builds for my software though 
you can't get the LTS version
can you?
that's like for enterprise customers
there's nothing wrong with windows 11 lol
you can shut off all the copilots
and turn off the ads
yeah it's the same as win10 basically
ok having to turn that stuff off is stinky, but it is easy to do
I'm not truly that allergic to upgrading
win10 already has all that bs you mentioned
like if you were paranoid
you could assume you're maybe in a bad place
already
I had a couple of red dot updates
lemme see what they were
Install Linux
2026 Year of Linux Desktop (for real this time) https://www.youtube.com/watch?v=mfv0V1SxbNA
It is finally here, the computer build you have (and possibly the whole world) been waiting for. The Linus Tech Tips and Linus Torvalds Collab PC build! Linus Torvalds talks through Linux developmen...
idk
just skimmed through the video, but it's actually a banger setup
hmm ecc ram
sadly I did install Windows 11 on my new machine, but I barely use it cause Win 11 is a meme OS
what OS do you use?
but some of the 3D monitor stuff I'm doing, and Cuda apps are working poorly or not at all on Linux
I use Fedora
ok hows photoshop on Fedora?
same as Linus picked in the video
hence why I have Win 11 on my better machine, but only use it against my will
are those actually on linux or are you still trolling
Substance actually has a native Linux version
they're not
you can get Maya working on linux, and Houdini works out of the box
but if you want to edit images you're fucked
aside from Adobe, many other apps have a Linux version or can be made to work
Blender is okay for sculpting if you get plug-ins
not even paint dot net is on linux
well that uses dlls for extensions so that would be a problem
maybe it works with wine
so yes, it's not without pain, you can look into what apps you actually need
well
I don't need to
because they all work on my OS
:P
"install linux"
I actually would though
GIMP is not that bad honestly
you can make a Picasso painting with a stick and dirt, it's not the tools
at least it doesn't have AI
true
actually true
if I went broke
I would be happy on linux
I could make it work
there's no way not to
there is, with a lot of pain
where do you shop online
who makes your clothing
do you drive?
do you use electricity?
those companies are providing good customer service tho
i just don't want to get screwed, it's not my business if someone else gets screwed 
hrm, I think the only bad customer service I get is like from this one guy in SF at the Chipotle
very rude that guy
idk how he keeps his job, I don't say anything
was it with regards to the guac?
no, he just wants people to go fast
sorry, you can use Windows 11 if you want, it's a free country
you use cuda on linux?
what do you do with cuda?
well my 3D image app works on ROCm, so that's fine, but there are other tools that only work on Windows/Nvidia (most stuff using ML)
you are doing ML?
yeah
photo stuff?
a lot of animation studios use linux for their animation workstations. i'm not entirely sure what the reason for this is
hrm
sorry, not trying to hijack the thread, but this is the app I had out: https://rendepth.com/
because those same studios don't use linux for everything
probably cause they do render farms, and it's easier to have the servers on Linux
i think Pixar mains linux on their animation workstations too
for automation
yeah but they have builds for all platforms
at siggraph i saw dreamworks demo their software on a mac
well yeah, cause it's probably the same app with client/server builds
and Mac has been huge with graphics design and film for a while
yeah
Steve Jobs made Pixar
mac kinda fell off in the film world when they discontinued Shake
don't know what shake is? exactly
but back in ~2004 ish it was pretty much the only compositing software in use
nuke was born to be a windows competitor to shake
Shake is an image compositing package used in the post-production industry. Available for Mac OS X and GNU/Linux (support for IRIX and Microsoft Windows was once available, but has since been discontinued), Shake delivers visual effects and digital compositing for film, HD and commercials.
hrm
never heard of it
yeah apple bought shake
Shake is a discontinued image compositing package used in the post-production industry developed by Nothing Real for Windows and later acquired by Apple Inc. Shake was widely used in visual effects and digital compositing for film, video and commercials. Shake exposed its node graph architecture graphically. It enabled complex image processing s...
why did I get that link and not wikipedia
oh I clicked an AI link
not a search result link
I can't even tell anymore
i think mac is slowly increasing again in film just because bespoke add-in cards aren't used as much anymore and the unified memory lets them load more at once
I need Apple to be successful so air tags work
edit houses used to be all powermacs / mac pros with special add-in cards
if people stop carrying iphones I will never find my lost stuff
hrm
I am just going to use the optix sdk util code
it has a full math library using CUDA's built in vectors
it's got quaternions, matrices with operator overloading built in
GNU/Linux, eh?
Also cause the Apple M4 chips are legitimately better
Like a MacBook is likely more powerful than an entire desktop Windows machine
had a big debate with my brother about it, when he was building a new PC, and the Apple laptop was better than a Windows desktop in almost all work tasks
or one of those 128-core server boards for doing cpu stuff
except he did build a Windows 11 RTX 4090 for a racing sim cockpit cause Mac still blows for games
the m-series RT capabilities are getting good but they're nowhere near 4090 level yet
but in normal work stuff, like Photoshop or Final Cut Pro, web, etc. it's not even close
yeah the OptiX benchmarks for blender are at the top
photoshop for windows is also just slower for some reason, even on spec-equivalent hardware
when i had a bootcamped intel mac where the hardware was as identical as you can get, the windows version of photoshop was probably about 15% slower
nah, if u want to do ray traced rendering, a mac isn't going to be a good fit
well not for 3D graphics as much
tile-based GPUs have some party tricks like MLAB being super cheap but i don't know of any games that actually leverage this
also cockblocking OpenGL/Vulkan not helping the situation
the lack of a game library on mac is fully a corporate policy issue. all of the major game engines natively support mac
the Apple chips probably would be better if they made one with the power target of Nvidia/AMD dGPU
it's up to apple if they want to make gaming better
like RE8 and Death Stranding running on the iPhone
huh, I guess I don't need a projection matrix anymore
I'm in a weird place trying to figure out how to do stuff right now with RT
it's uh different
I need to do my own interpolation I think?
pretty interesting
I'm in a refactoring mode still, just organizing code better since now I need to track ownership of memory since I no longer have a unity build in a single file
not making progress on animated cubes
this was a good time to do this, it would only have gotten messier
and harder to unwind
seems good now
I would have had to rewrite the whole thing eventually if I had ignored this
I did a thing
it's not cubes
it is changing the "camera" z
just had to refactor the whole app so I could have some dynamic state
I can make a real free camera now
missed the cash money posting thread
not quite a 7xer
this is your game?
its my engine
mostly nothing
it has a vulkan backend w/ frame graph and resource management, it also has a painstaking reproduction of roblox's GUI system
oh and a bunch of luau integration stuff like scheduling scripts and events and managing native object lifetimes
no, including my own actually
let me count my text library
I think the lua scripts are in my build dir, all the lua interop stuff is the C++ side
uhh this particular iteration? maybe 4 years
of which I spent at least 2 not working on it
actually wait, good that you caught that, that was 1 iteration old
and my text layout lib
NIH nanite was someone else
oh
my frame time is a steady 16.5ms for no explainable reason, except when I record, when it drops? also recording on my laptop is super jittery
that must be SDL I think right
it's capping the frames
it's weird that the act of recording my screen would make it faster
I'm using the Optix sdk's camera and trackball code for this. Seems to work fine
I am using SDL_MAIN_USE_CALLBACKS
I don't have any ideas or solutions, but how are you recording? Maybe that might be a clue
oh just the built in windows screen recording app
I can use OBS fine on my PC
but this is on my laptop
SDL_AppResult rc = (SDL_AppResult)SDL_GetAtomicInt(&apprc);
if (rc == SDL_APP_CONTINUE) {
    rc = SDL_main_iteration_callback(SDL_main_appstate);
    if (!SDL_CompareAndSwapAtomicInt(&apprc, SDL_APP_CONTINUE, rc)) {
        rc = (SDL_AppResult)SDL_GetAtomicInt(&apprc); // something else already set a quit result, keep that.
    }
}
it is doing something
but I don't think it's rate limiting
ah
maybe this
static void SDLCALL MainCallbackRateHintChanged(void *userdata, const char *name, const char *oldValue, const char *newValue)
{
    iterate_after_waitevent = newValue && (SDL_strcmp(newValue, "waitevent") == 0);
    if (iterate_after_waitevent) {
        callback_rate_increment = 0;
    } else {
        const double callback_rate = newValue ? SDL_atof(newValue) : 0.0;
        if (callback_rate > 0.0) {
            callback_rate_increment = (Uint64) ((double) SDL_NS_PER_SECOND / callback_rate);
        } else {
            callback_rate_increment = 0;
        }
    }
}
/**
* Request SDL_AppIterate() be called at a specific rate.
*
* If this is set to a number, it represents Hz, so "60" means try to iterate
* 60 times per second. "0" means to iterate as fast as possible. Negative
* values are illegal, but reserved, in case they are useful in a future
* revision of SDL.
*
* There are other strings that have special meaning. If set to "waitevent",
* SDL_AppIterate will not be called until new event(s) have arrived (and been
* processed by SDL_AppEvent). This can be useful for apps that are completely
* idle except in response to input.
*
* On some platforms, or if you are using SDL_main instead of SDL_AppIterate,
* this hint is ignored. When the hint can be used, it is allowed to be
* changed at any time.
*
* This defaults to 0, and specifying NULL for the hint's value will restore
* the default.
*
* This doesn't have to be an integer value. For example, "59.94" won't be
* rounded to an integer rate; the digits after the decimal are actually
* respected.
*
* This hint can be set anytime.
*
* \since This hint is available since SDL 3.2.0.
*/
#define SDL_HINT_MAIN_CALLBACK_RATE "SDL_MAIN_CALLBACK_RATE"
I never set this
it's super suspicious that I'm capped at exactly 60Hz
yes it's SDL
I set it 120 and now it's capped at 120
they have a bug, setting it to 0 just caps it at 60 again
I'm going to stick at 120 anyway, mystery solved
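To make the numbers concrete, here is a small standalone sketch of the same arithmetic as the quoted hint handler: a rate in Hz becomes a nanosecond interval between iterations. `callback_interval_ns` and `kNsPerSecond` are hypothetical names; this is not SDL code and it says nothing about why "0" ended up capped at 60 in the case above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr std::uint64_t kNsPerSecond = 1000000000ull;

// Mirrors the arithmetic in the quoted function: a positive rate in Hz maps to a
// nanosecond interval, anything else (unset, "0", non-numeric) maps to 0.
std::uint64_t callback_interval_ns(const char* hint_value) {
    const double rate = hint_value ? std::atof(hint_value) : 0.0;
    if (rate > 0.0) {
        return static_cast<std::uint64_t>(static_cast<double>(kNsPerSecond) / rate);
    }
    return 0;
}
```

A rate of "60" gives an interval of about 16.67 ms, which is close to the steady frame time reported earlier, and "120" halves it to about 8.33 ms.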
well I have a working free camera, using the sdk's camera/trackball and math code
ok actually time for cubes now
I got a reply to my bug report https://forums.developer.nvidia.com/t/optix-ir-seems-to-fail-for-me-with-vector-types-updated-with-reproduction/353662/4 apparently you have to use --use_fast_math with optix shaders, and what it does is replace the math intrinsics
this fastmath bs seems to come up quite often in all sorts of places
well I think the semantics of it are specialized for optix as it creates its own IR. I mean if it's required they should just do that when I supply the --optix-ir parameter
idk
generally I don't want to use fast_math for anything
this is different from clang, which is what I was assuming it meant
but it's a different arg
with different meaning
I was thinking I would be using visual studio with all this cuda optix stuff but I don't need it. raddbg works and I am pretty sure I can step through a shader with nsight compute
neovim + raddbg has been working nicely for me for a while now
This is too much code to read on my phone. But SDL has Hints you need to enable for certain things like V sync or swap buffer numbers that will affect the frame cap
Yes that’s what I ended up doing!
🙏 thank
I installed this neovim plugin that will show me what key maps are available as I type them
you can see at the bottom
I typed <leader> g
it's showing me what options are available based on what I typed
leader is just the space bar for me
it's really cool
I really like neovim
the lsp support is great, I can refactor quite a bit
the big missing feature, devastating honestly, is project wide search and replace
it's not possible in neovim to do that since it's buffer based
there are plugins that can change files, but there's no undo support for it, it's effectively the same as doing it with a command line program
and if you've changed any files that aren't even open in the editor neovim has no idea about it
that current active buffer is the whole world and any feature that sort of works project wide in something like intellij or vs code is just not a thing in neovim
that's pretty bad for refactoring
yeah
I wonder if you can just open a whole project though
it's only a few MBs of text after all
idk what a buffer is in this context
maybe there's just like a plugin that creates some kind of session or project context that I just don't know about
the current open and focused file, effectively
wow
if you go with vanilla vim it's even worse
I thought it would be like, all the files you have open
I think it will just delete your changes if you change buffers
or prompt you to save them
before you switch
imagine alt tab forcing you to save or lose data in vs code
it depends on your settings
well yeah, but whatever commands you do are only for the current buffer, but plugins can act on all buffers in neovim
I tried configuring neovim, couldn't figure it out and I fucking hate lua. My current conf just kinda works
oh
if you hate lua, you probably don't want to try neovim
do you like lisp? maybe you want spacemacs or doom emacs or whatever
or just stick with your current editor
so is the advantage of vim that it's super lightweight or something
because that's really the only thing I'm missing in vs and vscode
hrm, I don't think neovim has any objective advantage
I just prefer it
it has a big ecosystem?
I mean
Basically
Yeah
neovim, it's actually not lightweight
if you run some heavy lsp
neovim will own that
like go's gopls
Yeah but the lsp is the heavy thing, not vim, right?
well to the OS it thinks it's neovim
with these honestly sad restrictions I'd expect the base program's perf to be off the charts
I think anyway, my linux dev box at work oomkills neovim due to gopls and there's 32GB of memory available
I think if you want an editor with perf you probably want 4coder or 10x
pretty sure 10x is just a customised imgui app, but apparently it's super performant
Probably the use case that vim or whatever would fit would be if you ssh into something, or you really like vim motions
Vim or nano yeah
4coder is abandonware but Ryan Fleury maintains a fork that is popular
vim is not neovim
man 10x is a cringe name ngl
I'm not saying any other aspect of it is cringe
it just seems like an arrogant thing to call it
well, there's definitely cringe vibe in that community
lol
just my opinion
I'm flying to seattle tomorrow because I had bought a flight and hotel to attend handmade seattle
but the event got cancelled
yeah, rip
-use_fast_math worked and I can use optix-ir now
you should check out Mox Boarding House if you're interested in tabletop games
I don't think the light rail goes very close to it so you may have to bus
is it walking distance
with how many steps bjorn takes per day it probably is
6 miles is walking distance yes
actually
no that's not walking distance from my hotel
oh
wait there's two results
yeah that's fine
that's a nice walk
there is a bigger store in Bellevue
both moxes are very nice
me when i solve y=-x^2+5 for y=0
is that a nice area to hang out?
I didn't go there last year
that's cool I can just spend the day over there
hmm there is food nearby but idk about other stuff to do in Bothell
Thats 5i right?
when i take a wrong turn and end up in twitter
I've only ever gone there to visit Mox and eat
lol Bothell is a place north of Seattle
is it pronounced "Both-elle" or "bot-hell"
the former lol
ok ok
if you want an interesting place that can consume much of a day, check out the university district
bring an empty belly
my stomach is a bottomless pit
another thing to check out is the Pacific science center
I haven't been to the actual exhibit in a long time but I did visit their imax theater to see Avatar 2 and it was pretty epic
that movie is peak gp
I love avatar movies for how amazing they look
so immersive
hrmmmm I will take a look at that too thank you
I think the hotel offers a ride service too anyway
alright
to baby step into the cube
I'm going to try and make a plane instead
from a triangle
to use the index buffer
once I have an index buffer working I think cubes are ezpz
and once I have one cube I can make an IAS with just one instance and a transform
and then once I have that I can just make a whole bunch of cubes
then I'll animate them
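A minimal sketch of the plane step in plain C++: four shared vertices and two index triplets instead of six loose vertices. `Vec3`, `Tri`, the array names and `indices_in_range` are made up for illustration and are not from the SDK sample.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct Tri  { std::uint32_t a, b, c; };

// Unit plane in XY: four shared vertices.
constexpr std::array<Vec3, 4> kPlaneVertices = {{
    {-0.5f, -0.5f, 0.0f},
    { 0.5f, -0.5f, 0.0f},
    { 0.5f,  0.5f, 0.0f},
    {-0.5f,  0.5f, 0.0f},
}};

// Two counter-clockwise triangles sharing the 0-2 diagonal.
constexpr std::array<Tri, 2> kPlaneIndices = {{
    {0, 1, 2},
    {0, 2, 3},
}};

// Sanity check worth running before handing the buffers to an acceleration
// structure build: every index must refer to an existing vertex.
template <std::size_t V, std::size_t T>
bool indices_in_range(const std::array<Vec3, V>&, const std::array<Tri, T>& tris) {
    for (const Tri& t : tris) {
        if (t.a >= V || t.b >= V || t.c >= V) return false;
    }
    return true;
}
```

A cube follows the same pattern with 8 vertices and 12 triplets, two per face.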
totally read this wrong 
Does the existing single triangle not use an index buffer?
no, I didn't give it one
I just copied the code from the one shot sample, it didn't use one
but going by the header file for the sdk, there is an index buffer
so I will just try and use it
I wonder what you would think about the OptiX API, it seems like it's super easy to use
compared to vulkan and dx12
it just leaves the interesting bits for the developer to do
I was thinking it might be great for prototyping
you could build some RT idea in optix and get it working and if you like it you could then build it with vulkan/dx12
you build the acceleration structs, memcpy the data and params, SBT and define the raygen, hit, miss etc and that's it?
I mean that's kind of how it seems vulkan is at a high level, but vulkan makes all this way more painful
I have a friend at NVIDIA who works with (on?) OptiX, and when we talk about things I always feel kind of guilty when I have to remind him that I'm using D3D12 and don't have access to some of what he talks about hehe (although apparently NvAPI makes some of it available also?)
But I wanted it to work with AMD also, and also have access to standard rasterization for things like debug drawing and UI. (And I also ended up rasterizing the primary intersections.) Do you have plans for how you're going to do that stuff? Or will that still work with what you're doing? (I know you used debug lines a lot when working on the skimmer physics.)
I have no idea how I am going to debug lines
I still have the vulkan renderer
it's rendering via vulkan
so actually can just do it there
So you can still draw on top of it?
yeah vulkan creates a buffer with an external handle that I give to cuda
and then there's two external semaphores for the sync
and then cuda/optix launches -> write to shared memory, signal semaphore -> vulkan copies data to draw image -> blit to swapchain -> presents
so I can still rasterize
Nice
I can't debug vulkan though
because of the external handle
nsight graphics and renderdoc both don't support it
so I wouldn't want to do anything too complex in vulkan
also I'd not have a depth buffer?
the debug lines would just be on top of everything?
Oh... maybe I missed something. You're saying that it's not just the OptiX part that you couldn't debug, but even any kind of rasterization you did after wouldn't be debuggable because of the external handle?
I can debug optix with Nsight Compute
Ah, that's a relief then
it has the acceleration structure viewer and shader debuggers
But none of the vulkan stuff afterwards?
I can look at the memory etc it's amazing
well
vulkan is debugged with Nsight Graphics
you can't use one app to debug both
I see...
and if you have an external memory handle it errors
so you can't debug it at all actually
Well, just don't program any bugs and you'll be fine!
Using thin quads, you mean?
like just a bunch of debug planes always facing the camera
yeah
I haven't figured it out
hrm
I think there's a solution right
Yeah 🙂
you can do particles with RT?
Well...............................
:|||
You can, but not as easily/cheaply as classic rasterization
I was actually thinking of building a software rasterizer in cuda for this maybe
and then I can just use nsight compute for both
but there wouldn't be a depth buffer for the stuff the RT pipeline is drawing
unless I render RT geometry to the depth buffer
via the rasterizer
how does your RT debug line stuff work? is it all on top?
I think you could also generate your own depth buffer while ray tracing, couldn't you? You know the depth, so you could write to a UAV I think.
oh true
but there's no projection matrix 
the depth buffer is in clip space
in a rasterization pipeline
there's no clipspace in RT?
I mean I guess the math for that is not hard
Ah.... I haven't actually done any debug drawing yet heh. Since I'm rasterizing the primary intersections I already have a classic depth buffer (with the depth sample in the very center of the pixel). But I haven't actually used it yet, I just plan to one day.
it would just be for debugging if the perf was bad anyway
outlines are another issue
I want to draw outlines for picking
I was thinking just scaling the picked object and returning a solid color
You could still create a projection matrix. My impression is that many (most?) people still use an inverse projection matrix to generate rays (although I don't do it that way), but there's no reason you can't still calculate a classic projection matrix with the camera info. (Does that make sense what I'm trying to say?)
yes that makes sense
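A sketch of "the math for that", assuming a perspective camera and a zero-to-one depth convention (0 at the near plane, 1 at the far plane). The function names and the small `Vec3` type are hypothetical; a real renderer would match whatever projection its rasterizer uses.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// View-space depth of a hit: distance along the camera's forward axis, not along
// the ray. ray_dir and cam_forward are assumed to be unit length.
float view_depth(Vec3 ray_dir, Vec3 cam_forward, float t_hit) {
    return t_hit * dot(ray_dir, cam_forward);
}

// Maps view-space depth to the value a perspective projection with a zero-to-one
// depth range would write: 0 at z_near, 1 at z_far, nonlinear in between.
float projected_depth(float z_view, float z_near, float z_far) {
    assert(z_near > 0.0f && z_far > z_near && z_view > 0.0f);
    return z_far * (z_view - z_near) / ((z_far - z_near) * z_view);
}
```

The ray tracer already knows the hit distance, so it could write `projected_depth(view_depth(...), near, far)` into a buffer that a rasterized debug pass then depth-tests against.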
there's a cutout example in the optix sdk
I was wondering how that would work
I'm really excited about RT though
ok on to using an index buffer
Good luck!
OptiX is a painless and fun to use graphics API
that took like 10-15 minutes
I didn't even read any docs
and that's the plane truth
oh
nm
looks a bit weird with barycentric coordinate colors
hrm
how do I do vertex colors
let me figure that out
I have an SDL bug with the mouse movement where it hits the edge of the window when the cursor is disabled, I think SDL has a solution for this
I'll fix that, then figure out vertex colors, then create an IAS
SDL_SetWindowMouseRect
no it's SDL_SetWindowRelativeMouseMode
that's what I'm using hrmm
Yeah I toggle that one whenever the player goes into a menu
yeah that worked for me, also I switched to using xrel and yrel and keeping track of the x and y positions via those relative values instead of using actual coordinates in window, and warping the mouse to the center of the window when it gets enabled. seems like that solved all my looking around bugs
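Roughly what that bookkeeping could look like, as a sketch with made-up names (`VirtualCursor`, `recenter`, `accumulate`) and no SDL calls: the position is built only from the relative deltas, so it is not limited by the window bounds.

```cpp
#include <cassert>

// Hypothetical virtual cursor: a position built only from relative motion,
// so it never clamps at the window edge the way absolute coordinates do.
struct VirtualCursor {
    float x = 0.0f;
    float y = 0.0f;
};

// Start from the window center, e.g. when relative mouse mode is switched on.
void recenter(VirtualCursor& c, int window_w, int window_h) {
    c.x = static_cast<float>(window_w) * 0.5f;
    c.y = static_cast<float>(window_h) * 0.5f;
}

// Fold one motion event's relative deltas (xrel, yrel) into the position.
void accumulate(VirtualCursor& c, float xrel, float yrel) {
    c.x += xrel;
    c.y += yrel;
}
```

The accumulated position can then be fed to whatever expects mouse coordinates, such as a trackball helper.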
ok vertex colors are next
hrmmmm
nfi how to do this
I need an index for the triangle hit
maybe this
yeah
numIndexTriplets
that's just the index of the triangle I think
well that's the total number of triangles
but yes this is what I want I think
so I need to have a buffer of colors indexed by triangle
and have that be available to the closest hit shader
so a pointer to that buffer
I think that's a CUdeviceptr I get when I cudaMalloc
I need to find an example for how to pass a CUdeviceptr to a shader
or look at the programming guide
ok ok ok
so the path tracer example has this:
struct HitGroupData
{
    float3 emission_color;
    float3 diffuse_color;
    float4* vertices;
};
that's a pointer to some data there
ok so on the host side:
CUDA_CHECK( cudaMemcpy(
    reinterpret_cast<void*>( state.d_vertices ),
    g_vertices.data(), vertices_size_in_bytes,
    cudaMemcpyHostToDevice
    ) );
d_vertices is the device pointer
and here is where it passes the pointer to the hit group data:
hitgroup_records[sbt_idx].data.vertices = reinterpret_cast<float4*>( state.d_vertices );
it's casting the CUdeviceptr to float4*
HitGroupData* rt_data = (HitGroupData*)optixGetSbtDataPointer();
const int prim_idx = optixGetPrimitiveIndex();
const float3 ray_dir = optixGetWorldRayDirection();
const int vert_idx_offset = prim_idx*3;
const float3 v0 = make_float3( rt_data->vertices[ vert_idx_offset+0 ] );
const float3 v1 = make_float3( rt_data->vertices[ vert_idx_offset+1 ] );
const float3 v2 = make_float3( rt_data->vertices[ vert_idx_offset+2 ] );
and this is in the closest hit shader
it's getting the hit group data via
HitGroupData* rt_data = (HitGroupData*)optixGetSbtDataPointer();
and accessing it via
rt_data->vertices[ vert_idx_offset+0 ]
so there you go
that'll do it
per triangle data
I'll do that tomorrow I guess
I have to catch a flight tomorrow
vi or nano
I have had in the past per FIF scene transforms, so I think I would have per FIF IAS's yeah?
also, I think it may be the case that debug lines are possible with OptiX since it has a bezier curve geometry acceleration structure, which is intended to be used for things like simulating hair and grass
the sphere gas can be used for smoke and other forms of particles
anyway
back to working on vertex colors
one of the things the programming guide suggests for per vertex data is to over allocate the space for the gas output buffer, to stick things like vertex colors and normals in front of the gas in memory, and then you can use a convenience function to get a pointer to the gas memory, from which you can then subtract an offset to get your per vertex data
you can just take the memory and reinterpret cast it to whatever since it's just C++
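A host-only sketch of that layout, using an ordinary byte buffer in place of device memory. The 128 is the value of `OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT`; everything else (`round_up`, `gas_offset_for`, `read_color`, `Float4`) is a hypothetical name. The point it tries to show is that the offset is in bytes and has to be rounded up so the acceleration structure still starts on an aligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kAccelAlignment = 128;  // value of OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT

struct Float4 { float x, y, z, w; };

constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Byte offset from the start of the allocation to the acceleration structure.
constexpr std::size_t gas_offset_for(std::size_t prefix_bytes) {
    return round_up(prefix_bytes, kAccelAlignment);
}

// Given the address the acceleration structure starts at, step back to the prefix
// and read color i. The arithmetic is on a byte pointer; subtracting from a
// Float4* would move sizeof(Float4) bytes per unit.
Float4 read_color(const unsigned char* gas_ptr, std::size_t gas_offset, std::size_t i) {
    Float4 c;
    std::memcpy(&c, gas_ptr - gas_offset + i * sizeof(Float4), sizeof(c));
    return c;
}
```

In a shader the same subtraction would start from the pointer returned for the GAS handle, with the offset passed in through the launch or hit group parameters.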
hrm
did you make it?
still working on it
been really tired today
I'm going to start with per triangle colors
and then if that works move up to per vertex colors
getting per triangle should be easy
since I can just get an offset of a float4 by triangle index
in the color data
but per vertex data requires adding the per vertex data, all of the vertices and indices into the buffer
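For the per vertex step, a CPU-side sketch of the interpolation a closest-hit program would have to do itself: look up the triangle's three vertex indices by primitive index, then blend the three colors with the barycentrics. It assumes the usual convention where `u` and `v` weight the second and third vertex and the first gets `1 - u - v`; `Color`, `Tri` and `interpolate_vertex_color` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Color { float r, g, b; };
struct Tri   { std::uint32_t a, b, c; };

// u and v are the weights of the triangle's second and third vertex;
// the first vertex gets the remainder.
Color interpolate_vertex_color(const std::vector<Color>& vertex_colors,
                               const std::vector<Tri>& indices,
                               std::size_t prim_idx, float u, float v) {
    assert(prim_idx < indices.size());
    const Tri t = indices[prim_idx];
    const Color& c0 = vertex_colors[t.a];
    const Color& c1 = vertex_colors[t.b];
    const Color& c2 = vertex_colors[t.c];
    const float w = 1.0f - u - v;
    return {w * c0.r + u * c1.r + v * c2.r,
            w * c0.g + u * c1.g + v * c2.g,
            w * c0.b + u * c1.b + v * c2.b};
}
```

Per triangle colors are the degenerate case: skip the index lookup and the blend, and read one color at `prim_idx`.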
I mean did you make it to seattle
oh yeah
I walked around the public market today
it's pretty cool
the optix devs recommend 128 byte alignment for data
some of the buffers have it as a requirement
so even though the formats are float3 and int3
I use float4 and int4 and set the strides to sizeof(float4) and sizeof(int4)
there is no float4/int4 format
OPTIX_VERTEX_FORMAT_FLOAT3
etc
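a small sketch of what the stride trick amounts to, assuming the build input's vertexFormat stays at three floats while vertexStrideInBytes is sizeof(float4). `Float3`, `Float4` and `read_position` are hypothetical host-side names, not OptiX API. Only the first 12 bytes of each 16 byte slot are read as the position, the last 4 are free for anything else:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Read the i-th position from a buffer whose format is three floats but whose
// stride is larger, the way a float3 format with a float4 stride is consumed.
Float3 read_position(const unsigned char* base, std::size_t stride_bytes,
                     std::size_t i) {
    assert(stride_bytes >= sizeof(Float3));
    Float3 p;
    std::memcpy(&p, base + i * stride_bytes, sizeof(Float3));
    return p;
}
```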
this feels a little bit cursed tbh
void *gas_buffer;
// Pad the color prefix to 128 bytes so the GAS behind it stays aligned
// (OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT).
size_t color_size = (sizeof(colors) + 127) & ~(size_t)127;
size_t gas_buffer_size = gas_sizes.outputSizeInBytes + color_size;
checkCudaErrors(cudaMalloc(&gas_buffer, gas_buffer_size));
// Byte offset: (float4 *)gas_buffer + color_size would skip 16x too far.
ctx.game_ctx->optix.d_gas_output = (CUdeviceptr)gas_buffer + color_size;
OPTIX_CHECK(optixAccelBuild(
ctx.game_ctx->optix.context, 0, &accel_options, &build_input, 1, d_temp,
gas_sizes.tempSizeInBytes, ctx.game_ctx->optix.d_gas_output,
gas_sizes.outputSizeInBytes, &ctx.game_ctx->optix.gas_handle, nullptr,
0));
i haven't added the colors yet, but this works
heh
void *gas_buffer;
// Pad the color prefix to 128 bytes so the GAS behind it stays aligned
// (OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT).
size_t color_size = (sizeof(colors) + 127) & ~(size_t)127;
size_t gas_buffer_size = gas_sizes.outputSizeInBytes + color_size;
checkCudaErrors(cudaMalloc(&gas_buffer, gas_buffer_size));
// Byte offset: (float4 *)gas_buffer + color_size would skip 16x too far.
ctx.game_ctx->optix.d_gas_output = (CUdeviceptr)gas_buffer + color_size;
OPTIX_CHECK(optixAccelBuild(
ctx.game_ctx->optix.context, 0, &accel_options, &build_input, 1, d_temp,
gas_sizes.tempSizeInBytes, ctx.game_ctx->optix.d_gas_output,
gas_sizes.outputSizeInBytes, &ctx.game_ctx->optix.gas_handle, nullptr,
0));
// Copy only the real color bytes; the padding stays unused.
checkCudaErrors(
cudaMemcpy(gas_buffer, colors, sizeof(colors), cudaMemcpyHostToDevice));
so I just stick some extra data at the beginning of the buffer
and the idea is that I should now be able to call optixGetGASPointerFromHandle()
in the shader program
cast it to (float4*), do pointer math to decrement by number of color_size offset
and now I have per surface color
I need to stick the color_size on the hit params
I'm printing the color size in my hit shader ^^
extern "C" __global__ void __closesthit__ch() {
HitGroupData *data =
reinterpret_cast<HitGroupData *>(optixGetSbtDataPointer());
printf("color size: %u\n", data->color_size);
const float2 bary = optixGetTriangleBarycentrics();
// Color based on barycentric coordinates
setPayload(make_float3(bary.x, bary.y, 1.0f - bary.x - bary.y));
}
that's nuts right? directly from the terminal?
can you do that with vulkan?
it fucking works
this is the shader code:
extern "C" __global__ void __closesthit__ch() {
HitGroupData *data = (HitGroupData *)optixGetSbtDataPointer();
i32 triangle_index = optixGetPrimitiveIndex();
// color_size is a byte count, so step back from the GAS in bytes before
// casting; subtracting it from a float4 * would move 16 bytes per unit.
float4 *colors = (float4 *)(optixGetGASPointerFromHandle(data->gas_handle) -
data->color_size);
float4 color = colors[triangle_index];
setPayload(make_float3(color.x, color.y, color.z));
}
this is how I allocate the memory:
void *gas_buffer;
// Pad the color prefix to 128 bytes so the GAS behind it stays aligned
// (OPTIX_ACCEL_BUFFER_BYTE_ALIGNMENT).
color_size = (sizeof(colors) + 127) & ~(size_t)127;
size_t gas_buffer_size = gas_sizes.outputSizeInBytes + color_size;
checkCudaErrors(cudaMalloc(&gas_buffer, gas_buffer_size));
// Byte offset: (float4 *)gas_buffer + color_size would skip 16x too far.
ctx.game_ctx->optix.d_gas_output = (CUdeviceptr)gas_buffer + color_size;
OPTIX_CHECK(optixAccelBuild(
ctx.game_ctx->optix.context, 0, &accel_options, &build_input, 1, d_temp,
gas_sizes.tempSizeInBytes, ctx.game_ctx->optix.d_gas_output,
gas_sizes.outputSizeInBytes, &ctx.game_ctx->optix.gas_handle, nullptr,
0));
gas_handle = (u64)ctx.game_ctx->optix.gas_handle;
// Copy only the real color bytes; the padding stays unused.
checkCudaErrors(
cudaMemcpy(gas_buffer, colors, sizeof(colors), cudaMemcpyHostToDevice));
@vivid barn ^^ if you're interested, regarding my question about RT materials, this is just per triangle color but I could just make it per triangle material index instead. I think this will work
I can put a big materials buffer on the device, add the pointer to that buffer to the hitgroup record, and then find the material for the current triangle via this sort of look up
I'd have to iterate over surface's triangle and set a material for it though
could be an initial thing I iterate on
ok, I'm going to skip doing per vertex stuff, I think that's not useful in the long run, although it would look cool, and work on instancing my cube and animating it
that's nano btw, I wasn't @ everyone
devious username right there 
The launch params can have the buffers actually
That way I can just reuse the shaders?
Actually just offsets into big buffer
that gas thing reminds me of a very german meme
I'll read shortly (I just woke up)
I guess I should update Notepad++ (via winget and not its updater)
do you wanna do things yourself or would you be willing to rely on api sbt indexing?
because if you'd be willing to rely on api sbt indexing, you can provide multiple OptixBuildInput
the triangles from all the inputs would be considered "together"
and the index of OptixBuildInput would be basically a special blessed per-triangle attribute
that you can access either with optixGetGeometryIndex() in the *hit kernels
or
oh
when doing optixTrace or was it optixLaunch I forgor (optixLunch) you can pass sbt geometry stride that will be used in sbt index calculation
I'd start with doing things yourself though
I am kinda conflicted about api sbt indexing
I guess it's fair and pretty good as far as it can be but idk I hate it
I will take a look at sbt indexing
so yeah ig just replace per-tri color with per-tri material index
and use that to look up the material and idk optixDirectCall it
or optixContinuationCall I guess idk
or run it in an interpreter
or have a very shrimple material model like @broken fog does
I like simple
thank you for taking a look, I am now reading through the sbt stuff and continuation/direct call programs
to build a mental model of how that might work
also tbc per-tri material index should not be an index into some global table
rather, you would have a small (the number of materials a piece of geometry uses) material table for each instance in the tlas
that then points to other things
that way you could have two instances of the same blas with different materials
this is the way data is organized in e.g. blender
and the nice thing is that it also doesn't require patching huge numbers of attributes when loading geo, which itself would necessitate new attribute buffer for each instance
but just a few per-instance things
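one way to picture that indirection, as a host-side C++ sketch. `Material`, `Geometry`, `Instance` and `material_for_hit` are hypothetical names, nothing here is OptiX API. The geometry only stores a small slot number per triangle, and each instance carries its own slot to material table, so two instances of the same geometry can resolve the same triangle to different materials:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Material { float r, g, b; };  // hypothetical, deliberately simple material

// Shared by every instance of the same geometry: one slot number per triangle.
struct Geometry {
    std::vector<std::uint8_t> triangle_slot;
};

// Per instance: maps a slot to an index into the scene-wide material array.
struct Instance {
    const Geometry* geometry;
    std::vector<std::uint32_t> slot_to_material;
};

// triangle -> slot (per geometry) -> material index (per instance) -> material
const Material& material_for_hit(const std::vector<Material>& scene_materials,
                                 const Instance& instance,
                                 std::size_t triangle_index) {
    assert(instance.geometry != nullptr);
    const std::uint8_t slot = instance.geometry->triangle_slot.at(triangle_index);
    return scene_materials.at(instance.slot_to_material.at(slot));
}
```

loading new geometry then only touches the per-triangle slots once, and swapping materials on an instance is a change to a table with a handful of entries.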
yes, that makes sense
ohhh
I understand
the SDK has an example simple path tracer that uses sbt indexing
OPTIX_CHECK( optixSbtRecordPackHeader( state.radiance_hit_group, &hitgroup_records[sbt_idx] ) );
hitgroup_records[sbt_idx].data.emission_color = g_emission_colors[i];
hitgroup_records[sbt_idx].data.diffuse_color = g_diffuse_colors[i];
hitgroup_records[sbt_idx].data.vertices = reinterpret_cast<float4*>( state.d_vertices );
also I learned I have to configure the stack sizes
I still have a lot to read through and understand
I think I just have to keep rereading the programming guide while I make progress because I get to a point where I start becoming unfamiliar with what is being discussed because I haven't encountered enough of this in code yet
ok ok ok
per instance material variation
alright so I think what I am doing now makes sense for per geometry data that doesn't change per mesh, like normals
and maybe also material id, but there's more data to go by so yes I agree that there's a better mechanism available than per geometry material index -> global table
I have a lot of concepts I still need to learn, optixTrace vs optixTraverse, optixDirectCall, shader binding table construction and stack size stuff
I'm gonna keep going with just getting my instanced cubes animated for now
optixTraverse is ray query iirc
it's the simplest option to start with
because it doesn't require any sbt setup except raygen entry
oh so I could call an optixTraverse to get a simple shadow maybe
yee do it
I'm going to modify my colors to be normals for now, and just output normals as colors until I understand some of these other things better while I work on instancing and animation
yeah I think I can probably add a directional light to these animated cubes and see if I can get them to occlude the light as they animate
yeah I can just see if traverse gets a hit via optixHitObjectIsHit from the closest hit shader
the SDK's simple path tracer example calls this from its closest hit program:
// Returns true if ray is occluded, else false
static __forceinline__ __device__ bool traceOcclusion(
OptixTraversableHandle handle,
float3 ray_origin,
float3 ray_direction,
float tmin,
float tmax
)
{
// We are only casting probe rays so no shader invocation is needed
optixTraverse(
handle,
ray_origin,
ray_direction,
tmin,
tmax, 0.0f, // rayTime
OptixVisibilityMask( 1 ),
OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT | OPTIX_RAY_FLAG_DISABLE_ANYHIT,
0, // SBT offset
RAY_TYPE_COUNT, // SBT stride
0 // missSBTIndex
);
return optixHitObjectIsHit();
}
glad you posted this, just updated to 8.8.9
same
I had to download the update from the notepad++ site
thanks @brisk chasm 
notepad++ is my code explorer
I also found out that I was apparently using 32 bit n++ the whole time somehow
but now I have 64 bit
oh huh, interesting
is how I use it
just browse through sdks code, browse through other projects
I can't use it to edit, but it's easier to browse code with it than neovim
hrm
just trying to get a couple of cubes to render
I don't know what trace offset and trace stride are 
but since I just have one gas and one ias it doesn't matter I can just set this to 0
oh I don't set that at all
it explains what the index is
That's for SBT indexing
Oh NVM
Those are parameters that optixTrace takes
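for reference, the hit group record lookup described in the programming guide is: instance SBT offset, plus the build input's SBT index times the stride passed to the trace call, plus the offset passed to the trace call. a sketch of that arithmetic in plain C++ (function and parameter names are mine, not API names):

```cpp
#include <cassert>
#include <cstdint>

// Which hit group record a hit selects.
// instance_sbt_offset: per-instance offset stored with the instance
// gas_sbt_index:       index of the build input inside the GAS
// trace_sbt_stride:    "SBT stride" trace argument, commonly the ray type count
// trace_sbt_offset:    "SBT offset" trace argument, commonly the ray type
std::uint32_t hitgroup_record_index(std::uint32_t instance_sbt_offset,
                                    std::uint32_t gas_sbt_index,
                                    std::uint32_t trace_sbt_stride,
                                    std::uint32_t trace_sbt_offset) {
    return instance_sbt_offset + gas_sbt_index * trace_sbt_stride + trace_sbt_offset;
}

// Device address of that record given the table base and record stride.
std::uint64_t hitgroup_record_address(std::uint64_t base,
                                      std::uint32_t record_stride_bytes,
                                      std::uint32_t index) {
    assert(record_stride_bytes % 16 == 0);  // SBT records are 16-byte aligned
    return base + static_cast<std::uint64_t>(record_stride_bytes) * index;
}
```

with one GAS built from a single input, an instance offset of 0 and a trace offset of 0, the index is 0 no matter what the stride is, which is why leaving both at 0 works for the one-cube case.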
having some issues with allocating memory for the ias, going to come back to all this, this evening
there's something I'm missing
I’m on a boat
Cool
he's on a boat
i don't see a boat in that picture, how do we know you're not just walking on the water
I’m still on a boat
This is a boat thread now sorry
bjorn went from software engineering to salty sea dog arc within 8 hours
when you touch Vulkan you yearn to be a castaway
Reading the #ray-tracing channel apparently everyone is doing OptiX now 
is that the fast ferry
No
oh is it the normal one
Bainbridge Island?
You are officially an influencer now
well
pretty sure they've been doing it for a while and I only just noticed
alright time to see why I got problems making an ias
got ias working
oh I can get the transform on the device via optixGetInstanceTransformFromHandle
so I could fix the normals
hrm that returns a float4 *
the api ref shows how the functions might be implemented so that helps
so it's f32[12] being cast to a float4*, so it's three rows of float4s
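a sketch of using those three rows, assuming the row-major 3x4 object-to-world layout where the w of each row is the translation. `Float3`, `Float4` and the helper names are hypothetical. Applying only the 3x3 part to a normal is an assumption that holds for rotation plus uniform scale; non-uniform scale would need the inverse transpose instead:

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Points pick up the translation stored in the w column.
Float3 transform_point(const Float4 rows[3], Float3 p) {
    return {rows[0].x * p.x + rows[0].y * p.y + rows[0].z * p.z + rows[0].w,
            rows[1].x * p.x + rows[1].y * p.y + rows[1].z * p.z + rows[1].w,
            rows[2].x * p.x + rows[2].y * p.y + rows[2].z * p.z + rows[2].w};
}

// Directions use only the 3x3 part and ignore the translation.
Float3 transform_vector(const Float4 rows[3], Float3 v) {
    return {rows[0].x * v.x + rows[0].y * v.y + rows[0].z * v.z,
            rows[1].x * v.x + rows[1].y * v.y + rows[1].z * v.z,
            rows[2].x * v.x + rows[2].y * v.y + rows[2].z * v.z};
}

Float3 normalized(Float3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}
```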
found more NVIDIA OptiX example code https://github.com/NVIDIA/OptiX_Apps
it includes a gltf renderer
so that example code uses optixGetTransformListHandle
yes that works
ok next is animating
which requires updating the ias
and I think requires fif ias
when I add suzanne I'll add it to the cubes so I have a second gas
hrm
I think that requires a per frame pipeline and sbt too
hrmmmm
no
I don't know
nevermind
I can update the params per fif 
so no need
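the per FIF idea in miniature, as a hedged C++ sketch: the pipeline and SBT are shared, and only a small slot of per-frame state (here a stand-in IAS handle) is picked by frame number. `kFramesInFlight`, `FrameSlot` and `FrameRing` are assumed names and the count of 2 is arbitrary:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFramesInFlight = 2;  // assumed count

// Per-frame state; everything else (pipeline, SBT) is shared across frames.
struct FrameSlot {
    std::uint64_t ias_handle = 0;  // stand-in for a per-frame traversable handle
    std::uint64_t last_frame = 0;  // last frame number that used this slot
};

class FrameRing {
public:
    // Pick the slot for this frame; slots are reused every kFramesInFlight frames.
    FrameSlot& begin_frame(std::uint64_t frame_number) {
        FrameSlot& slot = slots_[frame_number % kFramesInFlight];
        assert(frame_number >= slot.last_frame);  // frames only move forward
        slot.last_frame = frame_number;
        return slot;
    }

private:
    std::array<FrameSlot, kFramesInFlight> slots_{};
};
```

each frame would update the IAS in its own slot and write that slot's handle into the launch params before launching.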
super jittery recording on my laptop but whatever
nice, now you can add shadows
yes, I'll start on that tomorrow
it's just one line of code, can't be too hard 
hrm, I think it might not be too hard
I keep running into unknown unknowns though
finding that NVIDIA OptiX_Apps repo really helped
The cmake learning page recommends a book called Professional CMake: A Practical Guide, which I have found very useful
The way generators play into building was a meaningful thing to understand
also the custom targets and commands chapter helped with setting up my shader builds
The way cmake uses build directories is really easy to understand and work with
well, they released one
I personally just skimmed the cmake page about generator expressions and that was enough, but to each their own
Quickly learn the CMake essentials with The CMake Tutorial, so you can focus on writing C++ code instead of fighting with the build system... Want to see a sample of the book first? Go to this page (click here). NOTE: The paperback edition is printed on-demand just for you, and shipped directly from one of multiple locations globally.
I haven't read this yet, the author keeps asking me to write a review. his other books are great tho
I haven’t found cmake to be hard so far but I am not doing anything complex
that's the secret, it's not
it's not hard. just mental illness
not even, the only thing that's a little weird are the stringiness of underlying types
and you uh, just don't wanna deal with that in the first place if you can help it
and 99.99% of the time you can
it was fine for my code, but if you have dependencies, and want everything compiled together it can turn into a PITA
I think learning how it works at a high level just builds helpful context, and the practical recommendations and use cases are nice.
idk my deps have all been easy except for the most stubborn GNU automake horrors
Well I am not going to do anything GNU build related if I can ever help it. Just keeping it focused to win32. Not that there’s anything wrong with GNU, it’s just not relevant to what I am doing
yeah for the most part you can, there are just certain very popular/fundamental C and C++ libraries that are exclusively automake artifacts, you'll know them when you see them
Good to know
it is because the alternative is i'm stupid and i reject that
everything is fine until i have to pull in deps that have a non trivial build then it's 
I feel personally challenged when I find a lib like that and the only time I've lost is ffmpeg
The biggest CMake cringe I’ve encountered is the fact that it can’t target multiple platforms in the same config. So eg if you’re compiling for android, you can’t compile a build tool and then use it during the build, because the build tool will get built for android and not for the host OS.
I had to make a janky workaround where I spawn a second CMake configure and hook it all up with custom commands. It’s rancid
The LLVM project has the same problem and they solved it the same way
yeah thats pretty brutal
i tried but the cmake docs are just... bad? how do y'all learn cmake istg whoever wrote the manual never read the thing
I learned it slowly and painfully, then wrote a tutorial https://www.ravbug.com/tutorials/cmake-easy
there are tools for this, but it starts to get complicated
this one I've used looks interesting, and has Lua so editing the files by hand is sane. https://github.com/bkaradzic/GENie
genie is just a premake fork afaik
yeah, basically premake
and it already does some annoying things like putting the build artifacts in a hidden folder
hmm
It rains a bit here and every place only offers paper bags
Not convenient when walking
I am doing a tour of bookstores and found this book
i have that book, it’s pretty cool
It’s really cool
was especially surprised at how small the Atari 2600 board is compared to the enclosure
it’s mostly air in there
nice
did i show you the video game museum in berlin? they have every console system ever.
Wow that’s amazing
Berlin hmmm
Walking down Pike street in Seattle. Nice shops
I am having a my scene is blank moment that I would usually solve with Renderdoc
so this is my first time trying to actually debug a scene with Nsight Compute
so I see my instances
I can see the transforms
they look correct
so I think the "failed to find traverseable handles" are the key thing
I can see the vertices for the gas too, and that all looks correct
nsight compute feels very good for debugging a scene
unrelated to debugging this issue, I can actually have per vertex data easily, in addition to my per surface data. I can set the vertex format and its stride, which means I can stick normals, and vertex colors and uvs directly with the per vertex data 
as long as the actual vertex data is at the beginning of the stride, whatever else is in there I can use for whatever I want
per vertex
and I get barycentric coordinates via the optix API so I can interpolate easily
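a sketch of that, assuming a hypothetical interleaved `Vertex` with the position first and a stride of sizeof(Vertex). The weighting follows the OptiX convention for triangle barycentrics: x weights the second vertex, y weights the third, and the remainder weights the first. `interpolate` is my own helper name:

```cpp
#include <cassert>
#include <cstddef>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Hypothetical interleaved vertex: position leads so the same buffer can be
// handed to the acceleration structure build with stride sizeof(Vertex).
struct Vertex {
    Float3 position;
    Float3 normal;
    Float2 uv;
};
static_assert(offsetof(Vertex, position) == 0, "position must lead the stride");

// Interpolate a per-vertex attribute across a triangle.
// bary.x weights a1, bary.y weights a2, and 1 - x - y weights a0.
Float3 interpolate(Float3 a0, Float3 a1, Float3 a2, Float2 bary) {
    assert(bary.x >= 0.0f && bary.y >= 0.0f && bary.x + bary.y <= 1.0f + 1e-5f);
    const float w0 = 1.0f - bary.x - bary.y;
    return {w0 * a0.x + bary.x * a1.x + bary.y * a2.x,
            w0 * a0.y + bary.x * a1.y + bary.y * a2.y,
            w0 * a0.z + bary.x * a1.z + bary.y * a2.z};
}
```

interpolated normals should be renormalized before shading since the weighted sum is generally not unit length.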
oh I think I know the issue
nice
it is the traversable handles that are wrong
nsight compute++
my video capture is jittery, not the render itself
if I don't call cudaStreamSynchronize after a launch and IAS update I get visual tearing
actually I can get away with not calling it after accel update, but it doesn't really impact anything either way
like meaningfully I mean, at least not yet
going to work on gltf loading
after I have gltf loading working I'll figure out how to get my track to render with optix
I want to bring that all back now
I'm going to hang out in pioneer square and chinatown today
I'm going to use fastgltf for gltf loading
in the past adding fastgltf was always such a pain, with cmake it's just
FetchContent_Declare(fastgltf
GIT_REPOSITORY https://github.com/spnda/fastgltf.git
GIT_TAG 0d1b67a28c4950ea2deb796702006dcbe31e02b3 # v0.9.0
FIND_PACKAGE_ARGS NAMES fastgltf
)
FetchContent_MakeAvailable(fastgltf)
find_package(fastgltf REQUIRED)
it was a pain in the past because I was using vcpkg and or just doing it manually
I made my life a lot more difficult by not using cmake from the start when I switched from zig
when people complain about a dependency or tooling as much as everyone does about cmake all the time
it really presents it in a bad light and that dissuaded me from using it
"man everyone hates cmake, I should just skip that"
that really hurt me
I have learned from this
people who are unhappy advertise their complaints, people who are happy with cmake don't advertise it
some do and even go the extra mile D:
I suggest using git submodules over fetchcontent
it makes unconditional web requests every time you reconfigure
and I guess it does create a _deps per build
you just do git submodule add <git dir> in your third_party directory or wherever you want it
and then add_subdirectory that dir somewhere
thanks I will do that
oh
that's even easier, it's just
add_subdirectory("external/fastgltf")
that's it
cool thanks again
alright I'll get back to work this evening on all this
huh that shrimple?
i've just been manually chucking in files for my deps, like an idiot
with submodules do you specify a commit/tag/branch to pull when you clone the repo?
you can, by default it takes the latest commit from the default branch at the time of making the submodule, and stays on that commit until you manually update it
i also do this so that I can delete the examples / tests / etc from the repo to make the download smaller
but only for some projects
cmake is really not hard when you understand a few basics
its just a pain in the ass to pull your shit together and give it a try 🙂
a lot of people like to make overcomplicated war crime cmakes
yeah
looking at you compressonator & physx
even boost is SUPER trivial these days, and that was a fucking pain in the ass few years ago
idk it's kinda convoluted as hell ime
it doesn't need to be
i have a bunch of cmake stuff to compile metal shaders with my project
it was a pain in the ass to get working correctly
i found that I was able to add metal shaders to my target and have Xcode compile them into the default metallib for me
for assetpipelines i would probably write a bespoke tool and not use cmake or any other build system
i just wrapped the cmake build in a makefile instead, done in 5 minutes
you just need to tell cmake what the language is
now make runs the metal commands if those files changed then runs the cmake build
because cmake doesn't auto detect metal shaders
I compile my optix shaders with a simple cmake target
It’s just like compiling a C++ program
yeah for me the issue is that any time i want to learn how to do something there isn't really a good resource
the syntax is very unintuitive and the manual is awful
for simple things it's definitely not bad though
i sat down for half an hour and looked at one of jakers cmake isms and tried to reproduce it with my shit, and then i modified a few bits and modularized my libs, had to lookup set and option once or twice, and put all deps in stone afterwards, now i just copy paste that shit into new folders and im set. slapped a cmakepresets.json on later, and now i can have it configure for gcc/clang and msvc
there is probably a bit more to learn, but i will do that once i need it
and re assets now, i just copy_directory_if_newer my data into output/data
i also wrap loose files into libs and those get statically linked, including stb, imgui and one or two others. all that manual copy pasting bs i have no time for anymore like others still do
yea i mostly figured shit out by looking at other people's cmake as well
fastgltf is super shrimple to integrate tho
yes
literally the easiest to work with lib i've used other than header only stuff
I just wanted to look anyway
i just did add_subdirectory and that's it
Well it’s easy if you use cmake
fastgltfs cmake does a lot
that it does
i haven't really looked in there
my approach to libs' cmake files is "here be dragons"
Yeah
yeah I usually don't need to look, cmake-gui shows me the variables I need to set
# my libs/fastgltf.cmake
include(../cmake/CPM.cmake)
CPMAddPackage(
NAME fastgltf
GIT_REPOSITORY https://github.com/spnda/fastgltf.git
GIT_TAG 2483b87d19a67c91f2bffc386082001d3ea8bc07
OPTIONS "FASTGLTF_USE_CUSTOM_SMALLVECTOR OFF"
OPTIONS "FASTGLTF_ENABLE_TESTS OFF"
OPTIONS "FASTGLTF_ENABLE_EXAMPLES OFF"
OPTIONS "FASTGLTF_ENABLE_DOCS OFF"
OPTIONS "FASTGLTF_ENABLE_GLTF_RS OFF"
OPTIONS "FASTGLTF_ENABLE_ASSIMP OFF"
OPTIONS "FASTGLTF_ENABLE_DEPRECATED_EXT ON"
OPTIONS "FASTGLTF_DISABLE_CUSTOM_MEMORY_POOL OFF"
OPTIONS "FASTGLTF_USE_64BIT_FLOAT OFF"
OPTIONS "FASTGLTF_COMPILE_AS_CPP20 OFF"
OPTIONS "FASTGLTF_ENABLE_CPP_MODULES OFF"
OPTIONS "FASTGLTF_USE_STD_MODULE OFF"
)
though I do ctrl+f add_library in their cmakes to find out the library names
I can't think of a better way to do that
i never understood cmake-gui, its ugly af and confuses the heck out of me hehe, i think that's where my initial hate with cmake started ages ago
that stuff should really be in the readme
it's honestly baffling that a short section of "here's how you get this thing running in a cmake project" is not a standard part of the readme for every c/c++ library
if the maintainers have done their jobs properly, then all it would be is just add_subdirectory, maybe some additional add_subdirectory before it for the dependencies of that project
you won't be able to immediately figure out the required variables or the library names from that though
true
yeah cmake-gui sucks but its one good feature is that it lists out all the variables in the cmake cache cleanly
and lets you filter them
yeah
I walked 20K steps today and I'm so beat up lol
how much is that in km
that's a long step
then consider it an upper bound
nice
i should walk more 
used to walk all the way home from school back in hs, like 8-10km every day
I walk a lot, but this was twice as much as I walk in a full day and this was all in like 4 hours
there's a higher level OptiX RT library on NVIDIA's github that hasn't received much attention https://github.com/NVIDIA/owl
it makes it even easier to do RT if someone wanted to
I'm fine with just OptiX, I don't need this
if i could get decent XR glasses and some haptic keyboardisms then i could work while walking and keeping my hands in my pockets 🙂
I think you can allocate memory on the gpu with just
cudaSetDevice(0);
cudaMalloc(&ptr, sizeof(something) * N);
the amount of code this same thing requires with Vulkan is kind of absurd.
this amount of code is absurd too
on cuda on linux you can allocate memory on the gpu with just malloc(N * sizeof(something))
you don't have to set up the device?
so clang has a cuda target
which works on windows, maybe that's possible too
I haven't looked at it
I mean that unified shared memory works
oh, unified shared memory is not available to my hardware
I'm p sure it is available on any turing+ gpu on linux
on pascal+ actually but iirc there were some gotchas on pascal
yeah those should have the feature
hrmm
tbc it's not supported on windows
including wsl
on those you have to do cudaMallocManaged or w/e
yes I'm on windows
rip
hrmm nsight compute is available on linux too, seems like cuda support on linux is pretty good
nsight graphics works on lunix, chances are good that nsight compute also does
it would be very surprising if cuda support on linux was pretty bad given that that's where almost all of cuda users are lol
oh I didn't know that