#Rosy
he's there to help people having L-system emergencies
I'm probably on some body cam footage from when the cops came over to see us dealing with this insanely wasted guy, all they cared about was his birthday probably to decide if they could arrest him for drinking underage lmao
with gaffer tape wrapped around your head again like the other day? 😄
Lmao no
easiest vulkan init ever
idk why that's a video
I clicked the wrong button that's why
I think in total my vulkan init took me maybe 5-6 hours over the last few days
need to add resize/minimize support and then I'll create the memory for cuda and start on cuda
going to stick with a single main.cpp until I have rendered a Suzanne monkey via an acceleration structure with a controllable camera and imgui debug ui
although, that may require cu files
not sure
my C++ is looking a lot like my C only now I get to use references and C++ libraries
it is not C however
it just looks a lot like my C
no use of any std:: right now :/
3 line diff to get resize working
minimize crashes, however
tomorrow I'll finally write some cuda I think, I'll just start by writing a color value to the memory and blit it to the swapchain image
via cuda
then I'll do the triangle example, then the triangle example with an acceleration structure, then I'll add imgui, then gltf loading, then Suzanne and then a camera
then I'll try and add my track and skimmer back in
I won't have a mesh shader anymore for the track so need to think about that
I won't have any shadows either, you don't get those for free
you just rename main.cpp to main.cu. cuda c++ lets you mix cpu and gpu code in the same file
Oh cool
won't the cpu code be compiled with Clang if you do this?
I would imagine nvcc does something like this (no idea tho):
would that be a problem?
- cpu code ==> clang cpu path
- gpu code ==> clang GPU path
depends, is Bjorn assuming msvc/gcc?
idk
same here
(it's worth bringing up so stuff doesn't magically break and cause major confusion)
Depends on the build
The ninja build is clang
The msbuild is msvc
I only have that one to use with visual studio
cmake lets you tell nvcc what the host compiler is
It’s a compiler arg given to nvcc
So it can be whichever you want, msvc, clang or g++
there's even a --allow-unsupported-compiler option
Disable nvcc check for supported host compiler versions.
Using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk. This option has no effect on MacOS.
clang has a cl.exe and you can use clang instead of nvcc too
well a clang-cl I guess
idk was just looking it up again
that's the thing to test just the CUDA VK interop
cudaProcess<<<grid, block, sbytes>>> <<<
>>>
I heard you like angle brackets
I wonder what is going on there
I should look at the header file
oh it uses a ppm file
man
calling a function on the device from application code directly is kind of mind blowing
hrm
// Find the GPU which is selected by Vulkan
while (current_device < device_count) {
  cudaGetDeviceProperties(&deviceProp, current_device);
  int computeMode;
  checkCudaErrors(cudaDeviceGetAttribute(&computeMode, cudaDevAttrComputeMode, current_device));
  if ((computeMode != cudaComputeModeProhibited)) {
    // Compare the cuda device UUID with vulkan UUID
    int ret = memcmp((void *)&deviceProp.uuid, vkDeviceUUID, UUID_SIZE);
    if (ret == 0) {
      checkCudaErrors(cudaSetDevice(current_device));
      checkCudaErrors(cudaGetDeviceProperties(&deviceProp, current_device));
      printf("GPU Device %d: \"%s\" with compute capability %d.%d\n\n",
             current_device,
             deviceProp.name,
             deviceProp.major,
             deviceProp.minor);
      return current_device;
    }
  }
  current_device++;  // not a match, try the next device
}
hrm this is actually the thing to follow
VkExternalMemoryImageCreateInfo vkExternalMemImageCreateInfo = {};
vkExternalMemImageCreateInfo.sType = VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO;
vkExternalMemImageCreateInfo.pNext = NULL;
#ifdef _WIN64
vkExternalMemImageCreateInfo.handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_BIT;
#else
vkExternalMemImageCreateInfo.handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
#endif
imageInfo.pNext = &vkExternalMemImageCreateInfo;
if (vkCreateImage(device, &imageInfo, nullptr, &image) != VK_SUCCESS) {
  throw std::runtime_error("failed to create image!");
}
it goes to the pNext for vkCreateImage create info
so it's just a draw image like I have been creating and it just gets a pNext
oh
and then a VkExportMemoryWin32HandleInfoKHR and VkExportMemoryAllocateInfoKHR
which goes into the VkMemoryAllocateInfo pNext
and then a VkSemaphoreGetWin32HandleInfoKHR from a VkSemaphore
which is attached to a cuda object and given to cudaImportExternalSemaphore
and then a VkMemoryGetWin32HandleInfoKHR for the image memory
which is given to cudaImportExternalMemory
I just want to write some rgba values into that memory
hrm
this uses a single image
// Added sleep of 10 millisecs so that CPU does not submit too much work to
// GPU
std::this_thread::sleep_for(std::chrono::microseconds(10000));

wat
ok I think in context this is just a frame rate limiter, albeit a terrible one
could use vsync to achieve something similar (without ruining frame pacing) as long as the gpu isn't already the bottleneck
yeah
maybe I use flecs for this game
it comes with a frame rate limiter
or just write my own
idk
I can just do it based on frame time
I wonder what a frame rate limiter is doing in an ecs lib
I guess it is a useful utility to have in a game
I don't know, but it's useful
yeah
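A frame-time based limiter like the one mentioned above could look roughly like this. This is only a sketch: `remaining_frame_time` and `limit_frame` are hypothetical names, not anything from flecs or SDL, and a real one would probably want to account for sleep overshoot.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using frame_clock = std::chrono::steady_clock;

// Hypothetical helper: how long to wait so that a frame which has already
// taken `elapsed` lasts `target` in total. Returns zero if the frame ran long.
inline frame_clock::duration remaining_frame_time(frame_clock::duration elapsed,
                                                  frame_clock::duration target) {
    assert(target > frame_clock::duration::zero());
    return elapsed < target ? target - elapsed : frame_clock::duration::zero();
}

// Call once at the end of each frame with the time the frame started.
inline void limit_frame(frame_clock::time_point frame_start,
                        frame_clock::duration target) {
    const auto elapsed = frame_clock::now() - frame_start;
    std::this_thread::sleep_for(remaining_frame_time(elapsed, target));
}
```

Unlike the fixed 10 ms sleep in the sample, this only sleeps for whatever is left of the frame budget, so a slow frame is not made slower.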
so based on that vulkan cuda image example I will start on CUDA init next
I was planning today to keep going on the vulkan side to create these shared resources but based on the sample
it makes more sense to have cuda initialized by then
I think it would be nice to fail fast if there's anything here that becomes a blocker for me
so I think it makes sense not to invest time in things only to find out I can't init a cuda device, stream, whatever
I should just get cuda hello world going without any interop
and once I know cuda works, I can start doing the rest of it
easy
unless it isn't
the next thing would be to try the simpleStream example, since the vulkan sample uses a stream
no, that's too much
I'll just do the hello world printf
I'll start by just renaming my main.cpp to main.cu
and see what happens
C:\Users\swart\projects\code\pixel_storm>cmake --build build-dev
[0/1] Re-running CMake...-- Configuring done (1.6s)
-- Generating done (0.0s)
-- Build files have been written to: C:/Users/swart/projects/code/pixel_storm/build-dev
[1/3] Building CUDA object CMakeFiles\pixelStorm.dir\src\main.cu.obj
main.cu
tmpxft_00004b38_00000000-7_main.cudafe1.cpp
[2/3] Linking CXX executable pixelStorm.exe
C:\Users\swart\projects\code\pixel_storm>
well it built
and it runs
one thing I like about using C++ again is references
oh clangd hates me now
that fixed it
and removed some args from compile_commands.json it didn't understand
I wish .clangd could read env variables
looks like it just needs someone to do and own that
Uh oh, are you approaching another precipice?
no
I'm staying focused
I need to wrap SDL_Log, and all the other SDL crutches I am using so I don't pepper my code with an SDL dependency
at some point
not right now
or maybe I just don't care
internal void init_cuda(ps_ctx_t &ctx) {
#ifndef __CUDA_RUNTIME_H__
  SDL_Log("No CUDA runtime detected");
  abort();
#endif
  i32 device_count;
  checkCudaErrors(cudaGetDeviceCount(&device_count));
  SDL_Log("num cuda devices: %d", device_count);
}
nice, my first ever CUDA code
oh
this explains the <<<>>>
A kernel is defined using the __global__ declaration specifier and the number of CUDA threads that execute that kernel for a given kernel call is specified using a new <<<...>>> execution configuration syntax (see Execution Configuration). Each thread that executes the kernel is given a unique thread ID that is accessible within the kernel through built-in variables.
Any call to a __global__ function must specify the execution configuration for that call. The execution configuration defines the dimension of the grid and blocks that will be used to execute the function on the device, as well as the associated stream (see CUDA Runtime for a description of streams).
The execution configuration is specified by inserting an expression of the form <<< Dg, Db, Ns, S >>> between the function name and the parenthesized argument list, where:
Dg is of type dim3 (see dim3) and specifies the dimension and size of the grid, such that Dg.x * Dg.y * Dg.z equals the number of blocks being launched;
Db is of type dim3 (see dim3) and specifies the dimension and size of each block, such that Db.x * Db.y * Db.z equals the number of threads per block;
Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;
S is of type cudaStream_t and specifies the associated stream; S is an optional argument which defaults to 0.
its just like a compute shader, except the cuda stream thing
so <<< >>> is how to call the shader from the host is how I read that
sort of like vkCmdDispatch
I guess it's a kernel, not a shader
hrmmmmmm
it's supposed to log something
printf doesn't do anything in a win32 windows application I think
ah the debugger was hiding it
yay 🎉
__global__ void testKernel(int val) {
  printf("[%d, %d]:\t\tValue is:%d\n", blockIdx.y * gridDim.x + blockIdx.x,
         threadIdx.z * blockDim.x * blockDim.y + threadIdx.y * blockDim.x +
             threadIdx.x,
         val);
}
{
  SDL_Log("starting testKernel");
  dim3 dimGrid(2, 2);
  dim3 dimBlock(2, 2, 2);
  testKernel<<<dimGrid, dimBlock>>>(10);
  checkCudaErrors(cudaDeviceSynchronize());
  SDL_Log("finished testKernel");
}
great
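To make the two numbers that testKernel prints less mysterious, here is the same index arithmetic on the host in plain C++. `Dim3` is a hypothetical stand-in for CUDA's dim3, and the function names are made up for this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 so this compiles without nvcc.
struct Dim3 { unsigned x, y, z; };

// Same arithmetic as the first printf argument in testKernel.
inline unsigned flat_block_id(Dim3 blockIdx, Dim3 gridDim) {
    assert(blockIdx.x < gridDim.x && blockIdx.y < gridDim.y);
    return blockIdx.y * gridDim.x + blockIdx.x;
}

// Same arithmetic as the second printf argument in testKernel.
inline unsigned flat_thread_id(Dim3 threadIdx, Dim3 blockDim) {
    assert(threadIdx.x < blockDim.x && threadIdx.y < blockDim.y &&
           threadIdx.z < blockDim.z);
    return threadIdx.z * blockDim.x * blockDim.y +
           threadIdx.y * blockDim.x + threadIdx.x;
}

// Number of threads, and so printf lines, a launch produces.
inline unsigned total_threads(Dim3 grid, Dim3 block) {
    return grid.x * grid.y * grid.z * block.x * block.y * block.z;
}
```

With the dimGrid(2, 2) and dimBlock(2, 2, 2) launch above, that is 4 blocks of 8 threads, so 32 lines of output with block ids 0..3 and thread ids 0..7.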
yep
the cuda programming guide is actually doing a really great job explaining how compute on a gpu works
it goes into hardware details, guidelines, talks about occupancy, how threads become inactive during divergence
When a CUDA kernel accesses a data region in the global memory repeatedly, such data accesses can be considered to be persisting. On the other hand, if the data is only accessed once, such data accesses can be considered to be streaming.
Starting with CUDA 11.0, devices of compute capability 8.0 and above have the capability to influence persistence of data in the L2 cache, potentially providing higher bandwidth and lower latency accesses to global memory.
ok so a stream can carry an access policy window that marks a region of global memory as persisting in L2
I don't have distributed shared memory, since that's in 9.0
I'd have to get a new graphics card
ok so the vulkan example needs the stream for cudaSignalExternalSemaphoresAsync
it looks like the optix shaders can get an arbitrary memory to write to
extern "C" {
__constant__ Params params;
}
extern "C"
__global__ void __raygen__draw_solid_color()
{
    uint3 launch_index = optixGetLaunchIndex();
    RayGenData* rtData = (RayGenData*)optixGetSbtDataPointer();
    params.image[launch_index.y * params.image_width + launch_index.x] =
        make_color( make_float3( rtData->r, rtData->g, rtData->b ) );
}
that params is passed in just as memory
Params output_buffer;
params.image = output_buffer.map();
params.image_width = width;
CUdeviceptr d_param;
CUDA_CHECK( cudaMalloc( reinterpret_cast<void**>( &d_param ), sizeof( Params ) ) );
CUDA_CHECK( cudaMemcpy(
    reinterpret_cast<void*>( d_param ),
    &params, sizeof( params ),
    cudaMemcpyHostToDevice
    ) );
OPTIX_CHECK( optixLaunch( pipeline, stream, d_param, sizeof( Params ), &sbt, width, height, /*depth=*/1 ) );
OPTIXAPI OptixResult optixLaunch(
    OptixPipeline pipeline,
    CUstream stream,
    CUdeviceptr pipelineParams,
    size_t pipelineParamsSize,
    const OptixShaderBindingTable* sbt,
    unsigned int width,
    unsigned int height,
    unsigned int depth);
typedef unsigned long long CUdeviceptr;
// CUDA device pointer.
that's cool
so I think I can just write to the vulkan image memory in whatever format
heh
feel pretty confident that if I can write colors to the image memory with just CUDA that I'll have no problems writing colors via OptiX
the thing I also want is an image per FIF and not just a single one
I'll start with a single one since that's what the sample has
ok this evening's goal is to add draw images to my Vulkan renderer, and an unlikely stretch goal to write colors to them from CUDA.
per FIF draw images like I have always done tbh
that turned out to be pretty easy since I could copy a lot of my code over from my previous project
well just the draw image itself
now I need to do that special memory stuff
let me check sync validation though
:|
this sucks
A pointer to a SECURITY_DESCRIPTOR structure that controls access to the object. If the value of this member is NULL, the object is assigned the default security descriptor associated with the access token of the calling process. This is not the same as granting access to everyone by assigning a NULL discretionary access control list (DACL). By default, the default DACL in the access token of a process allows access only to the user represented by the access token.
I'm just going to copy paste this win32 security class from the sample to get this to work
it's pretty gross
EXPLICIT_ACCESS explicitAccess;
ZeroMemory(&explicitAccess, sizeof(EXPLICIT_ACCESS));
explicitAccess.grfAccessPermissions = STANDARD_RIGHTS_ALL | SPECIFIC_RIGHTS_ALL;
explicitAccess.grfAccessMode = SET_ACCESS;
explicitAccess.grfInheritance = INHERIT_ONLY;
explicitAccess.Trustee.TrusteeForm = TRUSTEE_IS_SID;
explicitAccess.Trustee.TrusteeType = TRUSTEE_IS_WELL_KNOWN_GROUP;
explicitAccess.Trustee.ptstrName = (LPTSTR)*ppSID;
I think that just means no security at all
and would not be shippable
so
good this is a solo project
Wait... Why not just pass null?
I think it has to be defined for the win32 external memory handler
If VkExportMemoryAllocateInfo is included in the pNext chain of VkMemoryAllocateInfo with a Windows handleType, but either VkExportMemoryWin32HandleInfoKHR is not included in the pNext chain, or it is included but pAttributes is set to NULL, default security descriptor values will be used, and child processes created by the application will not inherit the handle, as described in the MSDN documentation for “Synchronization Object Security and Access Rights”. Further, if the structure is not present, the access rights used depend on the handle type.
If you specify NULL, the object gets a default security descriptor
I don't know what the default security descriptor is
But I think that's ok. Child processes won't inherit it, you're not giving it a name, so that's fine
but the sample has the above code
hrm
oh I see so it is not a huge risk
the wikipedia article is better than the msdn docs
Well, I shouldn't act confident, I haven't used vulkan or its memory allocators. But generally I think it's ok to use the default security descriptor, especially for things that don't have names that other processes can find and that child processes won't inherit.
ok, I hope so. since I have no idea what I am doing, I'm going to follow the sample and then once I get it working, set it to null
and see what happens
because if I hit an error I probably won't be able to tell what caused it
hrm
I'm going to be happy to delete that class
I think I have to link win32 stuff now too
I get this cool icon in neovim for my cuda file
I have no idea what that is
looks cool though
ok everything seems to render fine with just the vulkan clear color applied to the draw image with all external memory ceremony
next is the external semaphore
my first time using a timeline semaphore
I think I need one of these per FIF
oh that semaphore needs the win32 security stuff too wow
pAttributes is a pointer to a Windows SECURITY_ATTRIBUTES structure specifying security attributes of the handle.
If pAttributes is not NULL, pAttributes must be a valid pointer to a valid SECURITY_ATTRIBUTES value
can be null
I will follow the sample for now
oh it is not a timeline semaphore nm
cool cool, I guess time to do the CUDA side now for the memory and semaphores 😅
oh so my queue submit for the frame command buffer has to wait for two semaphores the frame present semaphore and a cuda -> vk semaphore signaled by cuda, and then I have another vk -> cuda semaphore
makes sense for these to be FIF
I switched to a buffer
Shared buffer created: 132710400 bytes, CUDA ptr: 0000000404C00000
Shared buffer created: 132710400 bytes, CUDA ptr: 000000040CC00000
Shared buffer created: 132710400 bytes, CUDA ptr: 0000000414C00000
works
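The 132710400 bytes in the log lines above are consistent with one float4 per pixel at 3840x2160. That resolution is an inference from the number, not something stated in the chat. A quick check of the arithmetic, with `shared_buffer_bytes` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One float4 pixel: four 32-bit floats, so 16 bytes.
constexpr std::size_t kBytesPerPixel = 4 * sizeof(float);

// Hypothetical helper: bytes needed for a width x height buffer of float4.
inline std::size_t shared_buffer_bytes(std::uint32_t width, std::uint32_t height) {
    assert(width > 0 && height > 0);
    // Widen before multiplying so large extents cannot overflow 32 bits.
    return static_cast<std::size_t>(width) * height * kBytesPerPixel;
}
```

3840 * 2160 * 16 = 132710400, which matches each of the three per-FIF buffers.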
__global__ void fillColorKernel(float4 *output, u32 width, u32 height, float r,
                                float g, float b, float a) {
  u32 x = blockIdx.x * blockDim.x + threadIdx.x;
  u32 y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x < width && y < height) {
    output[y * width + x] = make_float4(r, g, b, a);
  }
}
cuda_fill_buffer(&fctx.shared_buffer, ctx.swapchain.extent.width,
                 ctx.swapchain.extent.height, 0.8f, 0.2f, 0.3f,
                 1.0f);
ok
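The `x < width && y < height` guard in fillColorKernel is there because the grid usually has to be rounded up to whole blocks. The body of cuda_fill_buffer isn't shown, so this is a guess at the usual host-side sizing, with a hypothetical `Dim2` and a 16x16 block picked only for illustration:

```cpp
#include <cassert>

// Hypothetical 2D extent, standing in for the x/y of CUDA's dim3.
struct Dim2 { unsigned x, y; };

// Integer division that rounds up instead of down.
inline unsigned div_round_up(unsigned n, unsigned d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

// Number of blocks needed so that every pixel gets at least one thread.
inline Dim2 grid_for(unsigned width, unsigned height, Dim2 block) {
    return Dim2{div_round_up(width, block.x), div_round_up(height, block.y)};
}
```

When the extent is not a multiple of the block size, the last row and column of blocks spill past the image, and the bounds check in the kernel keeps those extra threads from writing out of range.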
let me try set those win security things to null
I just copy the buffer to the draw image
@pseudo dock you're right I don't need them! 🎉
I'm allowed to wear the
thing now as my role
I'll try to do a optix thing tomorrow
i didnt catch the sudden cuda integration
i suppose that was just an itch you tried to cure by just trying it out, neh?
This is a complete rewrite so I can avoid using a shader language
I had to sacrifice a graphics pipeline along the way, it’s fully RT now
i see
Hoping to be where I left off by end of month
Also no longer a from scratch NHI project
i can see the little headcrab looking vehicle driving in and on and around my little gpu when i look through the side panel 🙂
are you still using shady btw?
No
It doesn’t have stuff
No
Using C now for my shaders
Via OptiX
It’s a C++ compiler however
It’s not coming back, I have made a model for it 
I think I could possibly have a triangle by end of day today
ah right because you're using cuda now you don't need to bother with a shading lang
I remember the first time I tried linking a string literal with GLSL code in it and trying to get an error out of GL to tell me why nothing worked
I have hated since the start
I am free now
I could still use the graphics pipeline if I really wanted to since I am using both CUDA and Vulkan via shared GPU memory
It is still presenting via Vulkan
That’s not a thing CUDA can do
In my case it is just memory to CUDA, using a buffer, but yes it supports images with tiling
You can use textures, mipmaps etc
It just doesn’t have a swapchain WSI thing
afaik
that's cool
I'm glad I haven't bought an AMD GPU like I had considered to test on
What are we working on now @cloud rivet? I feel like every time I come into this thread its a new project
Something with CUDA?
Bjorn is just speedrunning all major technologies, because why not 
amd has a cuda counterpart.... it was either ROCm or Zluda, although the latter got canned again iirc
Does nobody use OpenCL anymore?
only die hards i would imagine, and big libs which are hardware accelerated use cuda or something else
i thought that was the motivation of amd to push rocm/zluda
well I'm using CUDA specifically for OptiX
I'm migrating my game from vulkan graphics pipeline + Vulkan RT to CUDA OptiX RT (and no graphics pipeline) with a Vulkan present. It's technically the same project, just a new code base. Also no longer extreme NHI and now using a C++ compiler (since nvcc, the cuda compiler, doesn't compile C)
I think/hope I will be back to where I was by end of month, plus also no longer NHI I should go much faster
since I'll use tinygltf, and SDL for images
NHI? you mean NIH?
You did it twice 😛
my brain works in mysterious ways
not homemade invention
NIH is too mainstream as an acronym so had to come up with a new and improved v2
what would the mandatory french canadian alternate acronym be
PII
Pas Inventé Ici
I had a really good reason for my change this time
NIH the NIH
famous last words
I'm still waiting for the next blockens update 
I miss working on it
It was pretty cool
ok time to get an optix thing to work
new CUDA release today 13.1 https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
new programming guide came out today too https://docs.nvidia.com/cuda/cuda-programming-guide/index.html
this was the previous one https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#execution-configuration
I think recent releases are more aimed at server stuff
1 more day of work and I have a two week vacation
first vacation this year
I really need it
my secret is to not take vacation during the actual holiday weeks, because it's actually super quiet nearly every year
nobody is doing much or expects too much
because most people are on vacation and nobody is trying to release anything
hrm I hard code my cuda paths in CMakeLists I should use my %CUDA_PATH%
oh
it's nice it did that check
they improved the installer too lol
just trying all the tools
this DXGI Layer is interesting
that's sharing the memory with CUDA I think
I'm like using three graphics APIs, not two
I have a Vulkan context, a CUDA context and a DX12 context
ok ok anyway time for OptiX
I'll need to set up Tracy at some point
optixGetTriangleBarycentrics()
static __forceinline__ __device__ float2 optixGetTriangleBarycentrics()
Convenience function that returns the first two attributes as floats. When using OptixBuildInputTriangleArray objects, during intersection with a triangle, the barycentric coordinates of the hit are stored into the first two attribute registers.
Available in AH, CH
this is cool
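What you would typically do with those two floats is blend per-vertex data. A host-side sketch, assuming the usual convention that (u, v) weight the second and third vertices and 1 - u - v weights the first. `Float3` and `interpolate` are hypothetical names, not OptiX API:

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's float3.
struct Float3 { float x, y, z; };

// Blend three per-vertex values (normals, colors, ...) with barycentrics.
// Assumed convention: a0 gets weight 1 - u - v, a1 gets u, a2 gets v.
inline Float3 interpolate(Float3 a0, Float3 a1, Float3 a2, float u, float v) {
    assert(u >= 0.0f && v >= 0.0f && u + v <= 1.0f + 1e-6f);
    const float w = 1.0f - u - v;
    return Float3{w * a0.x + u * a1.x + v * a2.x,
                  w * a0.y + u * a1.y + v * a2.y,
                  w * a0.z + u * a1.z + v * a2.z};
}
```

In a closest-hit program the same blend would run on the device with the float2 returned by optixGetTriangleBarycentrics().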
also
optixGetTriangleVertexData() [1/2]
static __forceinline__ __device__ void optixGetTriangleVertexData( float3 data[3] )
Returns the object space triangle vertex positions of the currently intersected triangle at the current ray time. Similar to the random access variant optixGetTriangleVertexDataFromHandle, but does not require setting flag OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS when building the corresponding GAS.
It is only valid to call this function if the return value of optixGetPrimitiveType( optixGetHitKind() ) equals OPTIX_PRIMITIVE_TYPE_TRIANGLE.
Available in AH, CH
you can get instance id, instance index id, primitive index id, world transform
man i have no idea what you're doing at all but seeing you this active makes me want to work on my stuff
I don't know what I am doing either
apparently OptiX also requires precompiled shaders
which is different from CUDA kernels
but at least they're still C++
they get compiled to PTX files, which I guess is an IR like SPIR-V
oh there's a newer version than PTX
OptiX-IR
OptiX has a couple of objects similar to a blas and tlas, but they are a little different:
DXR and VulkanRT use the terms top-level acceleration structure and bottom-level
acceleration structure. A bottom-level acceleration structure is the same as a geometry
acceleration structure; a top-level acceleration structure is similar to an instance
acceleration structure. Traversing against a single geometry acceleration structure,
motion transform nodes, or nested instance acceleration structures (multi-level
instancing) are not supported in DXR or VulkanRT. In NVIDIA OptiX, the terms were
changed due to the additional possible configurations of scene graphs beyond the strict
two-level, top-bottom configurations supported by DXR and VulkanRT.
also
OptiX has an AI denoiser
The denoiser is based on statistical data sets that guide the denoising process. These data,
represented by a binary blob called a training model, are produced from a large number of
rendered images in different stages of convergence. The images are used as input to an
underlying deep learning system. (See the NVIDIA Developer article “Deep Learning” for
more information about deep-learning systems.)
Because deep-learning training needs significant computational resources — even obtaining a
sufficient number of partially converged images can be difficult — a general-purpose model
is included with the OptiX software. This model is suitable for many renderers. However, the
model may not yield optimal results when applied to images produced by renderers with
very different noise characteristics compared to those used in the original training data.
hrm
it also has image filters
idk
The ray payload is used to pass data between optixTrace and the programs invoked during
ray traversal. Payload values are passed to and returned from optixTrace, and follow a
copy-in/copy-out semantic. The payload is passed to all the intersection, any-hit, closest-hit
and miss programs that are invoked during the execution of optixTrace. The payload can be
read and written by each program using the thirty-two pairs of optixGetPayload and
optixSetPayload functions (for example, optixGetPayload_0 and optixSetPayload_0).
like in and out variables for shaders
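Since each payload value is a 32-bit register, a common trick is to split a 64-bit pointer to a per-ray struct across two of them. A host-side sketch of that split, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(void*) <= sizeof(std::uint64_t), "pointer must fit in 64 bits");

// Split a pointer into two 32-bit values (high half first).
inline void pack_pointer(const void* ptr, std::uint32_t& p0, std::uint32_t& p1) {
    const std::uint64_t bits = reinterpret_cast<std::uintptr_t>(ptr);
    p0 = static_cast<std::uint32_t>(bits >> 32);
    p1 = static_cast<std::uint32_t>(bits & 0xffffffffu);
}

// Rebuild the pointer from the two halves written by pack_pointer.
inline void* unpack_pointer(std::uint32_t p0, std::uint32_t p1) {
    const std::uint64_t bits = (static_cast<std::uint64_t>(p0) << 32) | p1;
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits));
}

// Debug check that packing then unpacking returns the same address.
inline void check_round_trip(const void* ptr) {
    std::uint32_t p0 = 0, p1 = 0;
    pack_pointer(ptr, p0, p1);
    assert(unpack_pointer(p0, p1) == ptr);
}
```

On the device the two halves would go through optixSetPayload_0/optixSetPayload_1 and come back via optixGetPayload_0/optixGetPayload_1, but that part is not shown here.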
you know what I like about OptiX, I don't have to go through the ceremony of creating vulkan buffer
I just cudaMalloc
OK but why
Why not just use vulkan alone or something
because I want to use a real programming language for shader code
and OptiX gives me that
the shader setup: a shader module is basically a compiled .cu file full of raygen, miss, any-hit, closest-hit etc. functions. these get separated by type into program groups, which are associated with shader binding records, which are then attached to a shader binding table, and then the SBT is supplied to a launch function
ok ok I think I get the basics
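For the shader binding record part, the layout is roughly an opaque header followed by whatever data the program wants to read. A host-only sketch: the two constants mirror OPTIX_SBT_RECORD_HEADER_SIZE and OPTIX_SBT_RECORD_ALIGNMENT as I understand them (32 and 16), and `SbtRecord` and `record_offset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumed to mirror OPTIX_SBT_RECORD_HEADER_SIZE and OPTIX_SBT_RECORD_ALIGNMENT.
constexpr std::size_t kSbtRecordHeaderSize = 32;
constexpr std::size_t kSbtRecordAlignment = 16;

// Same fields the raygen program reads through optixGetSbtDataPointer().
struct RayGenData { float r, g, b; };

// One shader binding record: opaque header, then user data.
template <typename T>
struct alignas(kSbtRecordAlignment) SbtRecord {
    char header[kSbtRecordHeaderSize];  // real code fills this per program group
    T data;
};

// Byte offset of record `index` in a tightly packed table of `count` records.
template <typename T>
std::size_t record_offset(std::size_t index, std::size_t count) {
    assert(index < count);
    return index * sizeof(SbtRecord<T>);
}
```

The alignment padding is why a record holding three floats ends up 48 bytes rather than 44, which matters when filling in the stride fields of the SBT.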
I'm going to take this triangle example and make it work in my renderer
optix is pretty high level
this isn't some low level vulkan-like api, this is like the opengl of ray tracing imo
it kind of seems easier to use, with more features, it has all these helper functions
you get image filters, and motion blur, a denoiser,
this is the same as it is in vk and dx12, you just have to make the SBT manually
yes I have made an SBT in VK
should call it the CBT for how confusing it is tbh
the blas/tlas set up in vulkan is so gross
it's just some malloc memory and a create struct for the simplest version of an acceleration structure
in OptiX
but they can get more complex
you can just use the geometry acceleration structure without an instance acceleration structure
like you can just use a blas, and no tlas if you want
I haven't rendered anything yet, but right now it feels about 90% less gross than vulkan's version of this stuff
just now got your joke
ping: 1000
on a good day
I think maybe cross device apis are bad
all the good apis are for specific hardware
prove me wrong
the nintendo graphics api, the play station graphics api are all lauded by people familiar with them
metal is good
optix is good
dx12 is meh
dx12 is cross device
it's not for specific hardware
I mean IHVs should make their own apis
Metal is also technically cross-hardware, it works on older macs with nvidia and amd gpus
the main thing metal does well is the difficult behavior is opt-in rather than pushed onto the developer
though with metal 4 they removed auto barriers so 🤷
I don't mind barriers
i think they have a good-ish reason for it though, metal 4 is all bindless all the time so the driver can't really add barriers for you
they got rid of setVertexBytes/setFragmentBytes though which I was using as a push constant equivalent with no replacement 
in metal 4 it's only buffer objects
they used to have a convenience API in setVertexBytes for when you needed a little bit of data
where the driver would do the buffer stuff for you. underneath it was still a buffer object
ok, I'm still confused how you can get per draw state
if push constants were taken away from vulkan it would be useless?
you can bind a constant space buffer that's global for the whole draw (for your push-constant like things), then use instance ID or drawID system-values to look up the per-draw data in another buffer
unless you have like a draw index
ah ok
that works
that's all I sort of do with push constants
give it a way to identify the actual data in a buffer
but the convenient thing about push constants is they are encoded as part of the command buffer so you didn't have to make a pool allocator just to set some simple global draw parameters
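The "global constants plus a per-draw buffer indexed by draw ID" idea from a few messages up, modelled on the CPU. All struct and function names here are hypothetical. In a real renderer the vector would be a GPU buffer and the index would come from the instance ID or draw ID system value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-draw payload that would otherwise go in push constants.
struct DrawData {
    float tint[4];
    std::uint32_t material_index;
};

// Hypothetical constants shared by every draw in the frame.
struct FrameConstants {
    float time_seconds;
};

// What the shader-side lookup amounts to: index the buffer with the draw ID.
inline const DrawData& fetch_draw_data(const std::vector<DrawData>& per_draw,
                                       std::uint32_t draw_id) {
    assert(draw_id < per_draw.size());
    return per_draw[draw_id];
}
```

The cost compared to push constants is that something has to allocate and fill that per-draw buffer every frame, which is the pool allocator complaint above.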
so I can't just have a single file program like I was hoping
with cuda the kernels can live in the application code
but with OptiX they have to be compiled separately
so a second cu file
hmm why is that
the shader programs get compiled into a special type of IR
for optimization and debugging is what it says and future features
does that only happen at program compile time
I'm a little confused how things work exactly which prevents you from mixing cpu and gpu code in a single file
it happens when I run cmake
well it's not a CUDA kernel
it's a special thing for OptiX
OptiX is sort of its own graphics API that is run from a CUDA context
--optix-ir (-optix-ir)
Compile CUDA source to OptiX IR (.optixir) output. The OptiX IR is only intended for consumption by OptiX through appropriate APIs. This feature is not supported with link-time-optimization (-dlto), the lto_NN -arch target, or with -gencode.
Default Output File Name
The source file name extension is replaced by .optixir to create the default output file name. For example, the default output file name for x.cu is x.optixir
I'm kind of thinking about OptiX as an OpenGL for ray tracing in my brain
OpenGL as in an easy to use, high level graphics API, not as in open or cross platform
it's not a CUDA thing, it's just run via a CUDA context
you use CUDA to upload everything to the device
you can't use a bunch of CUDA features in the OptiX shader programs
like barriers or any synchronization
I see
For efficiency and coherence, the NVIDIA OptiX runtime—unlike CUDA kernels—allows the execution of one task, such as a single ray, to be moved at any point in time to a different lane, warp or streaming multiprocessor (SM). (See section “Kernel Focus” in the CUDA Toolkit Documentation.) Consequently, applications cannot use shared memory, synchronization, barriers, or other SM-thread-specific programming constructs in their programs supplied to OptiX.
add_custom_command in CMake is pretty cool
cmake has all these utility commands
like moving files if changed
and using that with add_custom_command is so handy
imagine using a bat file
I just have to learn everything the hard way
yesterday I ran into an issue trying to use Optix-IR and switched to the PTX format, which is what blender uses
it was very late and I didn't do a very good job on this nvidia dev forum post https://forums.developer.nvidia.com/t/optix-ir-seems-to-fail-for-me-with-vector-types/353662
but I wanted to investigate it a bit more
something about the built in vector type was failing to compile when I was using the Optix-IR option when compiling the shader programs
well
it doesn't fail on compile
it fails on creating the shader module at run time
maybe I can create a small reproducible test case for the forum
that's always a good idea
I want to go over all the nvcc compiler options
maybe it is alignment related
native vector types and alignment is a reoccurring topic for me
two more hours and I'm on vacation 
I have used AI for good
This function compiles fine with PTX but fails with Optix-IR. Specifically it fails on this line: float inv_dim_x = 1.0f / (float)dim.x; The only thing I changed is -optix-ir instead of -ptx I am using sm_86 on cuda 13.1 and optix 9.0 It fails when I call optixModuleCreate My raygen program that calls computeRay will end up with 0 instructi...
I had claude help me create an easily reproducible example of the bug
they can download, unzip and see for themselves
even though I used AI it took me several hours to get it right
unlikely anyone will work on it this month, anyway PTX works
but they're pushing everyone to use optix-ir so why would it fail like this
how ironic is it that my first issue using OptiX is a shader bug 
anyway, not blocked
i am pretty sure that it is a bug
oh
someone else just reported a bug too
Hello, I recently updated the driver to 591.44 and tested sample programs of my OptiX wrapper. Then I noticed some programs run weirdly with debug build, specifically with debuggable (-G) ptx/OptiX-IR. For example, a sample program demonstrating deformation blur looks like with the issue: It should look like: I have been able to get the...
I think they shipped some regressions with the latest
I don't know if this exists in Vulkan, but OptiX has a motion transform, which seems like it would help with updates as a result from animations?
I'll have to try and use it to understand it better
oh the OptiX guide goes on to say these things don't exist in vulkan or dx
i think motion transforms are intended for stuff like motion blur
for denoisers to plugin?
I see
I've been doing a force sync after the kernel call, and have not been using the semaphores, I'm going to fix that now
was just trying to get a triangle going
going to use timeline semaphores, haven't used them before
they're nice
hrm actually think binary semaphores are easier for this
and it requires using a cuda stream now
and I'll have to not delete these when resizing so I need to fix my frame reset code
I don't think I need a stream per FIF 
hrm
The cudaStreamSynchronize() function will block until all the work in the stream has completed.
I'm trying to signal an external semaphore
so
syncing the sync sounds bad
is this not something like a commandlist situation
you write to a stream with any thread, when you are done writing and want to submit you sync then
not sure
there's no command list
The simplest way to synchronize the GPU and a host thread is with the use of cudaDeviceSynchronize, which blocks the host thread until all previously issued work on the GPU has completed. In the examples of this chapter this is sufficient because only single operations are being executed on the GPU. In larger applications, there may be multiple streams executing work on the GPU and cudaDeviceSynchronize will wait for work in all streams to complete.
I think maybe I just have a stream per fif
all my sync is per fif
so I think this makes sense
unfortunately all the cuda and optix sample code is one shot
at least all the ones I looked at
I think it is safer to over sync and I'll be able to profile and fix slowness
vs random shit is broken because I didn't sync something and I have no idea what is going wrong
i mean the stream is the "commandlist"
yes that's what it is
the analog to that is the vulkan command buffer, and I have one per FIF
why not just use vulkan for all this? 
ah, and vchichi is not ready yet i assume
vcc is amazing, but to me it seemed that Gob has no plans to make it into a production ready thing someone should use for a game
slang was a disappointment, now you try optix
ok that was the missing piece
yeah I have been spamming this channel, so hard to keep up with my crazy ranting
all good
we do have #gpu-compute as well, not sure if you put your sniffer in there too, just in case other frogs fiddled with optix already, plus the cuda discord linked in related-serveurs
oh there's a cuda specific discord?
that server seems pretty quiet
I'll lurk
last mention of optix was over a year ago
thanks
doesnt hurt to ping peeps, greg might know a thing or two
or nightchild, both are here too
I lurk in #gpu-compute
I don't really participate in the topic specific "Discussion" channels
but yeah I see what's being said
I don't generally ask for help, just try to figure things out on my own. I hate asking for help. It's a flaw
I feel like I'm surrendering
having to ask for help about slang is actually where I ended up having had enough
and that's when I dumped it
yes I will do a cuda stream per fif
and I will call cudaStreamSynchronize
but it should be empty of tasks
no
I should not call that
it interferes with the GPU to GPU sync
hehe
works
if I resize the window too much I can run out of device memory however
I think I need to create a really large shared buffer and only render to a portion of it so I don't have to recreate the buffers on resize
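A minimal sketch of that "allocate once, render to a portion" idea, in plain C++ so it compiles without the CUDA headers. The names `RenderTarget`, `resize_in_place` and `pixel_index` are made up for illustration, and the actual allocation of the shared buffer is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bookkeeping for one big shared buffer that is allocated once
// at the largest size we expect (for example the display resolution).
struct RenderTarget {
    uint32_t capacity_width;   // width the buffer was allocated for
    uint32_t capacity_height;  // height the buffer was allocated for
    uint32_t width;            // portion currently rendered to
    uint32_t height;
};

// Returns true if the new window size fits in the existing allocation, in
// which case only the active extent changes and nothing is reallocated.
inline bool resize_in_place(RenderTarget &rt, uint32_t new_width, uint32_t new_height) {
    assert(new_width > 0 && new_height > 0);  // a minimized window must be handled by the caller
    if (new_width > rt.capacity_width || new_height > rt.capacity_height) {
        return false;  // caller has to fall back to a real reallocation
    }
    rt.width = new_width;
    rt.height = new_height;
    return true;
}

// Index of pixel (x, y) when rows are packed tightly at the active width,
// which keeps the used region contiguous at the start of the buffer.
inline size_t pixel_index(const RenderTarget &rt, uint32_t x, uint32_t y) {
    assert(x < rt.width && y < rt.height);
    return static_cast<size_t>(y) * rt.width + x;
}
```

With this layout a resize that fits only changes the width and height passed to the launch, so no device memory is touched.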
the way this code blows up on resize is hilarious
abort dialog bombs
I will fix that later
internal void optix_render_to_buffer(
    ps_ctx_t &ctx, i32 frame_count, cudaExternalSemaphore_t *cuda_v2c_semaphore,
    cudaExternalSemaphore_t *cuda_c2v_semaphore, cudaStream_t cuda_stream,
    ps_shared_buffer_t *buf, u32 width, u32 height) {
  if (frame_count > 0) {
    cuda_wait_semaphore(ctx, cuda_v2c_semaphore, cuda_stream);
  }
  Params params = {};
  params.output = (float4 *)buf->cuda_ptr;
  params.width = width;
  params.height = height;
  params.handle = ctx.optix.gas_handle;
  params.cam_eye = make_float3(0.0f, 0.0f, 2.0f);
  float aspect = (float)width / (float)height;
  float fov_y = 45.0f * 3.14159f / 180.0f;
  float half_height = tanf(fov_y * 0.5f);
  float half_width = aspect * half_height;
  params.cam_u = make_float3(half_width, 0.0f, 0.0f);
  params.cam_v = make_float3(0.0f, -half_height, 0.0f);
  params.cam_w = make_float3(0.0f, 0.0f, -1.0f);
  checkCudaErrors(cudaMemcpyAsync(reinterpret_cast<void *>(ctx.optix.d_params),
                                  &params, sizeof(Params),
                                  cudaMemcpyHostToDevice, cuda_stream));
  OPTIX_CHECK(optixLaunch(ctx.optix.pipeline, cuda_stream,
                          reinterpret_cast<CUdeviceptr>(ctx.optix.d_params),
                          sizeof(Params), &ctx.optix.sbt, width, height, 1));
  cuda_signal_semaphore(ctx, cuda_c2v_semaphore, cuda_stream);
}
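For reference, a rough CPU-side sketch of what the `cam_u` / `cam_v` / `cam_w` values above mean. The raygen program itself is not shown in the chat, so `ray_direction` below is only an assumption about how it might turn a pixel into a direction; `Vec3`, `Camera` and `make_camera` are illustrative names, not part of CUDA or OptiX.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Same three basis vectors the host code above writes into Params.
struct Camera { Vec3 u, v, w; };

// Builds the basis for a camera looking down -Z with a vertical field of view.
inline Camera make_camera(float fov_y_degrees, unsigned width, unsigned height) {
    assert(width > 0 && height > 0);
    const float pi = 3.14159265f;
    float aspect = (float)width / (float)height;
    float half_height = std::tan(fov_y_degrees * pi / 180.0f * 0.5f);
    float half_width = aspect * half_height;
    // v is negated so that pixel row 0 ends up at the top of the image.
    return Camera{{half_width, 0.0f, 0.0f}, {0.0f, -half_height, 0.0f}, {0.0f, 0.0f, -1.0f}};
}

// Assumed raygen math: map the pixel centre to [-1, 1] on both axes, then
// combine the basis vectors and normalize.
inline Vec3 ray_direction(const Camera &cam, unsigned px, unsigned py,
                          unsigned width, unsigned height) {
    float sx = 2.0f * ((px + 0.5f) / width) - 1.0f;
    float sy = 2.0f * ((py + 0.5f) / height) - 1.0f;
    Vec3 d = {sx * cam.u.x + sy * cam.v.x + cam.w.x,
              sx * cam.u.y + sy * cam.v.y + cam.w.y,
              sx * cam.u.z + sy * cam.v.z + cam.w.z};
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return Vec3{d.x / len, d.y / len, d.z / len};
}
```

The centre pixel looks straight down -Z, and because `cam_v` is negated the top rows of the image point toward +Y.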
optixLaunch starts the ray trace
I should just pass the frame context into this function
those are all per fif objects, the stream, semaphores and buffers
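A small sketch of what "pass the frame context in" could look like. The handle types are placeholders so this compiles without the CUDA headers (in the real code they would be `cudaStream_t`, `cudaExternalSemaphore_t` and so on), and the frame count of 2 is an assumption since the chat doesn't say how many frames are in flight.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder handle types standing in for the CUDA ones.
using StreamHandle = void *;
using SemaphoreHandle = void *;

constexpr uint32_t kFramesInFlight = 2;  // assumed value

// Everything that exists once per frame in flight, bundled together.
struct FrameContext {
    StreamHandle stream = nullptr;
    SemaphoreHandle vulkan_to_cuda = nullptr;
    SemaphoreHandle cuda_to_vulkan = nullptr;
    void *shared_buffer = nullptr;
};

struct FrameRing {
    FrameContext frames[kFramesInFlight];
    uint64_t frame_count = 0;

    FrameContext &at(uint32_t index) {
        assert(index < kFramesInFlight);
        return frames[index];
    }

    // The context for the frame being recorded right now.
    FrameContext &current() {
        return at(static_cast<uint32_t>(frame_count % kFramesInFlight));
    }

    void advance() { ++frame_count; }
};
```

The render function would then take a single `FrameContext &` instead of the stream, both semaphores and the buffer as separate arguments.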
yeah the cuda_stream is just like a command buffer in vulkan
I like it
so far really enjoying all this
that Param struct is used in the shader code, I can just share my header file with the shader code
I can just use normal C++20 in the shader code
it's great
after I fix resize I'll add imgui, which requires adding a dynamic render pass to vulkan
and then I should be done touching vulkan
it'll just be CUDA and OptiX and application/game code from then on
well I should break my single main.cpp file up at some point
but that's just moving code around
ah tokei doesn't count cu files RIP
I think the CUDA cmd stream must be handled in order since I don't have to add any barriers between the cudaMemcpyAsync and optixLaunch
I should confirm this
like a gpu to gpu barrier?
oh
no barriers needed
A stream operates like a work-queue into which programs can add operations, such as memory copies or kernel launches, to be executed in order.
guaranteed to be in order, be praised
imagine still using

I never got out of bed this morning, I've just been sitting here with my laptop writing this stuff
I haven't eaten or anything, it's already dark
lol
engrossed
I still need to get my 10k steps in fml
that's gotta be 1-2 hours of walking 
i wanted to go to bed 4hrs ago, its almost 2:30am and now i feel like starting a new round of anno 117 🙂 and just skip the night, to fix my schlepp cycle
I started vacation, I can be up at any time, no responsibilities, why can't I always live like this.
ill be off work soon too, but i have people to tell what they have to do until then ;C
if you live frugally and accumulate that succulent startup $$$ for a few years, you can
I gotta pay college tuition starting next year lol
frogally*
hmm i could try that new marmelade on some peanut butter
damn u
am getting hungry
on some briefly toasted slice of german bread, with thicc butter underneath the peanut butter
90% nuss
I am going to eat something healthy and all natural like a food truck burrito
They got one by a bar that stays open late
unsanctioned food trucks that only appear in parking lots at night and dump their trash on the street hit different
I mean the food from them does, not the trucks themselves 
hostile interpretation is funny doe
it do be
be doo be
took a little over 2 hours 
nice! how are you feeling?
I wouldn't feel lazy if I worked on my game and walked for over two hours
I guess what I mean is I feel more of a headwind than normal
I've been neglecting my cardio since the sun is setting early and my sleep schedule has been wack 
exercising has gotten much harder since it is so cold now
I exercise very late
I'm gonna take a break from exercise for about a week
I'll still get my steps in though
my shed of a garage is my gym, it's just a wooden shack with no door
it's freezing in there
at like 11pm
I wear thermal wear and a hoodie while I train and my lungs burn
my current roadmap for rest of December to get back to where I was is:
- fix resize mem bug
- add imgui/dynamic render pass
- add animated cubes to learn how to instance draw
- add Suzanne monke head & gltf loading & camera movement
- add deccer cubes to get scene and texture loading
- add Sponza to figure out how materials/normal/surface details maps work in RT
- figure out shadows, there's no shadow thing in OptiX
- add track and vehicle back in
- back to where I was!
I got 3 and a half weeks
mostly vacation and holidays
I think maybe doable
I have done all this before with a graphics pipeline
maybe it's crazy hard though with RT
not NIH'ing anything so will use tinygltf and SDL for image loading
I'll actually be way ahead because of that from where I was before since I'm using libraries again
just more strength to gain per session
lung day
Based on your description and what you had been talking about before I thought this was going to be a list of exercises ☺️
hrmmm, this is still creating an image per resize, but already not losing device memory, so that's good
still going to switch it to not recreate buffers and draw images per resize though
there's no visible aliasing along the triangle edges, it's nice how smooth it is. it's such a pretty triangle
the colors in the video look a bit more washed out than they do in the actual render
the green is especially a lot more vibrant for me than it is in the video
a relative comparison
a screen shot of the video makes it even worse
but you can tell there's a difference, you obviously can't see what it looks like for me though
gross
I want to share the actual colors 
oh I could dump the draw image to a file
but that wouldn't work with a video
I should use CUDA for that
I'll do that later, going to stay focused
oh so now I keep the buffer and draw image per resize, and it performs much better, but I see artifacts now
hrm
I know what it is
I think I know
no that wasn't it, idk
I wonder if this is just on resize or a problem with any kind of change per frame
on to imgui
TIL I can debug step through inside neovim with a DAP https://github.com/mfussenegger/nvim-dap
hrm
learned about this via mason https://github.com/mason-org/mason.nvim
the fuck
how do you debug OptiX 
it doesn't even show up
hrm 
can I see acceleration structure like I can with Nsight graphics though in vulkan?
The User Guide for Nsight Compute.
yes
hrmmm
there it is
nsight graphics is useless to me now
it's just nsight compute
I already uninstalled renderdoc
nsight systems and nsight compute
does it not support cuda-vulkan interop?
no
that answer is unclear to me
why?
so you're saying it does support cuda-vulkan interop?
it does not
rip
yeah I mean
not much to gain from looking at nsight graphics
it's just presenting the image from optix
yeah vulkan isn't doing the real work anyway
I was thinking since I had vulkan I could still use the graphics pipeline to rasterize if I wanted to
but without any kind of a debugger support
apparently
at least you have the validation layers
yeah
anyway, if I want to rasterise now, I'll do it in cuda
hrm, the examples, the docs, the debugging applications, it's all NVIDIA, it's just one company's entire ecosystem I've attached myself to
it's fine
everything is fine 😅
it's going well so far. I think a couple of major drawbacks are apparent now though
updates break things, seems to be a pattern
a completely opaque (binaries, not much is open source) and proprietary ecosystem with a not very large set of visible users
I imagine most users of this tech are private companies not spamming a discord/reddit with what they're working on
it's fun though
gonna keep going
it's a 4 or 5 Trillion dollar market cap company

they're probably not going to kill OptiX tomorrow
there's a solid chance they're spamming private discords
or slacks, which are the same
maybe it's like a cult and I should start wearing a leather jacket
you know, to fit in with the slacks, a leather jacket is required
to get in
the leather jacket is the key
I used to have one, a long long time ago
I don't remember what happened to it
I bought it in korea in the 90's
I don't think I need a graphics pipeline at all
I think I just start a dynamic render pass, call imgui and then end the render pass and done
ezpz?
just a color attachment
init imgui with vk stuff
start frame, end frame, shut down
capture events
You are interoping Vk and Cuda?
yeah
anything you want to render in real time from CUDA you have to get on a screen with another graphics API since it can't present
Yeah. I am doing a similar project (CUDA and Optix), though I use OpenGL, which seems to be an easier path
is that GPU to GPU?
Yeah
nice
yeah I'm using OptiX also
there's two of us!
how far along are you?
I used vulkan because that's all I know anymore
Not very far. I have sphere and triangle mesh primitives. Not much else (I have some "fake" shading currently). I am doing it pretty slowly
nice, what IR are you using? PTX or Optix-IR?
I can't get Optix-IR to work, I posted about it on the NVIDIA forums
for compiling your optix shaders
I am using PTX (compiled via nvrtc)
Runtime compiler. I want to support hot-reloading eventually
oh I see
yeah my plan for runtime hot reloading was just to compile and reload the binary
so nvcc -> new binary -> app detects it updated and reloads
I just read the bytes with file read right now
so nvrtc also supports optix-ir
I should have figured rt stands for run time
they say ptx is deprecated, but blender uses it still
optix-ir has a bunch of bugs tbh
based on the forum
I don't know much about PTX vs OptixIR. I guess I will just use what is currently working until there is a need for change
I don't either, I just saw in the documentation to not use PTX but then optix ir doesn't even work

so I also just use PTX
works for me
apparently optix-ir is supposed to have better performance
i need my single triangle to be super performant
do you have a project thread or share any updates?
what made you pick CUDA/OptiX?
Not yet. I want to make more progress before sharing it
I am making a ray tracer with multiple backends (CPU, CUDA pure software, and hardware acceleration with Optix). Using CUDA/Optix makes it easy to share code between different backends.
Maybe in a distant future, I can also implement a Vulkan backend or something
oh, that's cool
my opinion, having used the vulkan RT just a tiny bit
optix is much better
so far anyway
vulkan has fewer features and a burdensome API
did you get your leather jacket yet?
What's that?
a jacket made of leather
btw I think @shut hornet doesn't actually use CUDA they are likely an imposter with that role
but I could be wrong. I have seen zero pictures of them with a jacket on as evidence
ok dynamic render pass added, but it doesn't do anything
oh I need a barrier for that
cool no vk sync validation issues
alright
imgui init is next
I guess I need to add the library as a git submodule
cmake is great
every time I have to do anything with cmake, it's easy, idk why I have slept on it for so long
well the app I released is ML with Cuda, but I didn't write the Cuda parts, it's a library https://rendepth.com/
but honestly really dislike Nvidia in general and will need to port this to something open, just don't have much time
this is a cool project
thanks!!
and, AFAIK, the training was done with open datasets, so there shouldn't be anything "stolen" in there
right
cool, I did have a designer for the logo, most of the rest of the design on the site is me
i have some freelancers, but i'll mostly just do a rough sketch and then send it to them to make it look professional
makes sense, and comes out looking great imo
thanks a bunch!!
but do you wear a leather jacket
jk jk
it's not required for the cuda role afaik
so after I added imgui to my project I realized I was over gamma correcting my render
i used to for a while in NYC, but I'm old and real leather is super heavy
are you sure, imgui is whack when it comes to srgb
ok
i thought rendepth was your thing 100%
2 times better than 1 time 🙂
10x engineers 10x their gamma correction 
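For anyone wondering why doing it twice looks washed out: the standard linear-to-sRGB curve brightens mid-tones, so applying it to values that are already encoded pushes them up again. A tiny sketch of the per-channel curve; where exactly the second correction came from in this renderer isn't stated, so this only shows the effect.

```cpp
#include <cassert>
#include <cmath>

// Standard linear -> sRGB encoding for one colour channel in [0, 1].
inline float linear_to_srgb(float c) {
    assert(c >= 0.0f && c <= 1.0f);
    return (c <= 0.0031308f) ? 12.92f * c
                             : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}
```

Calling `linear_to_srgb(linear_to_srgb(x))` for a mid-grey `x` gives a noticeably larger value than a single call, which is the over-bright look you get when the encode happens in two places.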
damn it's nice to be using imgui instead of my half baked ui lol
it felt so cool to be able to tell people I made something totally from scratch, but you know, that's just ego
and in the end I had a worse thing
yeah, it's my thing, but there are a bunch of 3rd party libs, not 100% from scratch
you could pick up your ui lib and make it cooler later
true
it's kinda dumb in the end yea. on the first engine I tried to write, I thought I would use no 3rd party code at all (it was DX11 and only used Microsoft DirectX libs)
like even wrote a crappy XML parser from hand, and the COLLADA importer, no libraries at all
it's so rough
cool learning experiment, so i know to never do that again
I agree
though, I did do a custom UI on Rendepth and honestly that alone took like a month, when the rest of the app took like 2 months (for the MVP)
for the actual game UI I want to do my own UI still
but the debug ui tooling, it's fine to use imgui for it
well I want it to be a custom UI unique to the game
it might still be a library if I can customize it enough
alright I have imgui, I have resize, I have correct sync, it's all RT and game stuff going forward, vulkan is done
i'm looking at this, though not sure about cross-platform support: https://github.com/mikke89/RmlUi
imgui is a great debugger but not good for in-game ui
my friend is using this one and we got it working on android so should be fine
cool thanks
I'm gonna star that
all this talk about ui reminded me of something i made decades ago
a custom little ui to browse brushes and maps for some little terrain editor
it felt super easy to make back when, now it takes a bit more head scratching for me if i was to reproduce that thing hehe
sometimes things are easier when you are stupid and don't realize how hard they are 
or it's like gambling, where you always win big the first hand of hold 'em you play, and then progressively just get worse and worse
think I won like $200 the first time i played poker, all downhill from there 😭
in 2mins its leet time here
2mins ago it was leeeet time 😄 (13:33:37)
or leeeeeeet time, 13:33:33.337, haha
RmlUi is cool, I haven't gone too too deep with it but so far it's the best UI solution I looked at
especially for stuff that needs heavier than imgui-tier debug UI but lighter than Qt gigabloat
what I think is really cool is that since it has some subset of CSS it's pretty moddable, including by your users
it's def cross platform, you provide your own backends just like imgui and there's default (GL, Vulkan) x (GLFW, SDL) much like imgui, however the vulkan one last I tried is broken
but if you're integrating into your own vulkan renderer you likely wanna write your own backend to integrate with your resource management/frame graph/whatever
yeah, I understand that much, just looking at the github there was a lot of red x's on the support matrix so didn't seem 100% everywhere
though yeah, I guess you can do the integration yourself
DRUI when
I have my copy of roblox UI in my engine but much like real roblox UI it's mildly ass to work with
RmlUi is awesome because you can build applications with it and people can mod them like foobar all without it being a giant electron app or Qt/WxWidgets/etc. app
though for something like a matrix chat or something approaching minimal browser functionality there's a lot I'd have to mod in myself like adding my text support lib for full text rendering/editing capabilities, as well as getting some kind of wuffs-based image loading for safe rendering of images you receive from the internet
pretty soon some random app is just gonna be a Docker image with an entire OS inside
and still be more optimized than embedding a web browser
Let’s be real, they’re going to do that and run electron in it
i mean, that was the entire point of HTML5, so that webapps could be close to native
and companies realize that (in terms of development) but still ship "native" apps I guess cause it's easier to make money (or embed illegal tracking)
yeah the worst thing about browsers for a company is that they are very well sandboxed
I'm occasionally a little sad that html5 didn't take up the slack for flash's market even though it could have and should have
im still a little sour that silverlight was not turned into a native thing, not requiring any plugin, but running natively in any major brauser on any platform
that would have been "rich" lol (since all that shit was called rich internet application back when)
one of the big appeals of flash was that the creation tool was very comprehensive yet easy to understand. nothing else has really replaced it
kids could start making movies and games in it with no effort, and have them online in minutes
but flash was riddled with bugs and security vulnerabilitilbilities, thats what fonts and drawing primitives do to a mf render-engine
the flash plugin also gave webpages unrestricted access to the user's filesystem 
yeah that's true, but part of the same issue, no one has the appetite to copy 20+ year old tech
the closest thing that comes to mind in functionality is the 2D animation tool in blender but that's not exactly kid friendly
flash is still used today in TV animation
I've only ever played with it once
wow, didn't realize they hadn't replaced it there either
I know flash had a pretty big impact on animation production cycles when it came out
i think ToonBoom Harmony is steadily eating its lunch, but flash is still king
Adobe renamed flash to "animate" and focused almost entirely on TV animation but it's the same app and actionscript is still there
to my knowledge anime is still mostly drawn on physical paper and scanned in
yeah I've heard that as well
this is true, but was also kind of a red herring. like security in general at the time (in IE6 or whatever) was crap, and any website could hack you
or install 50 browser plug-ins and viruses, etc. even without Flash
how was that a red herring smh, it was about flash not everything else
like the industry used that as an excuse to kill Flash, when it was really about money/control
Apple changed the rules on iOS several times to specifically block Flash, even after Adobe enabled native compile (which did not use the web runtime)
yeah, Adobe still sells the product called Adobe Animate now, just not for games: https://www.adobe.com/products/animate.html
that looks just like flash
does it let you write actionscript?
it does!
:O
it's just flash
lol I literally said this here 
it is Flash. they rebranded cause the name was not popular anymore
the steve jobs letter of doom hit them
yes
it handles geometry compression
nobody is keeping you from peeking into the code there and steal relevant bits
as in there's an algorithm in there where it determines the size required to build the acceleration structures via compression
do you mind sharing the link to that repo though?
if it’s there and you’re already tied to optix I see no reason not to
it's not in a repo you have to sign up with an NVIDIA account, agree to the EULA and then you can download it
and the thing you download is an installer
it's not source code
but you get source code
it's just like the FBX SDK
the code does have a license though
SPDX-License-Identifier: BSD-3-Clause
maybe someone put it on github
maybe nvidia did
looking
it's actually not that much code
ah i have a novideo account, no worries
you need one for nsight, or needed one at least
I can't see this code on github
ok if you download it the code ends up in C:\ProgramData\NVIDIA Corporation\OptiX SDK 9.0.0\SDK\sutil\Scene.cpp
I feel like they made this just for their samples
it's not documented
anyway
pretty cool
I think I can build this myself tbh, it's complex but there's a huge comment explaining what it does
and it's not that much code
// Problem:
// The memory requirements of a compacted GAS are unknown prior to building the GAS.
// Hence, compaction of a GAS requires to build the GAS first and allocating memory for the compacted GAS afterwards.
// This causes a device-host synchronization point, potentially harming performance.
// This is most likely the case for small GASes where the actual building and compaction of the GAS is very fast.
// A naive algorithm processes one GAS at a time with the following steps
is the beginning of the long comment
sometimes there are some gems in those code schnippets
yeah there's a lot of code in both the cuda and optix samples
the cuda stuff is on github
it doesn't have the optix code though
it's pretty cool, there's basically an example for anything I might want to do
to build the OptiX SDK I had to use these args: cmake build .. -DCUDA_MIN_SM_TARGET=sm_86 -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -G "Visual Studio 17 2022". The latest cmake I have deprecated the cmake version the project uses, the optix sdk doesn't build with VS 2026, and the minimum cuda version is not supported, at least by my system, so I specified sm_86 which RTX cards support
just in case you wanted to build it, it took me a bit to figure all that out
today I have just been reading through the sample to figure out how to do instance acceleration structures because I wanted a bunch of spinning cubes
optix_instance.traversableHandle = m_meshes[instance->mesh_idx]->gas_handle;
there it is
instances[0].traversableHandle = state.static_gas_handle;
instances[1].traversableHandle = state.static_gas_handle;
instances[2].traversableHandle = state.deforming_gas_handle;
instances[3].traversableHandle = state.exploding_gas_handle;
size_t instances_size_in_bytes = sizeof( OptixInstance ) * instances.size();
CUDA_CHECK( cudaMalloc( ( void** )&state.d_instances, instances_size_in_bytes ) );
CUDA_CHECK( cudaMemcpy( ( void* )state.d_instances, instances.data(), instances_size_in_bytes, cudaMemcpyHostToDevice ) );
state.ias_instance_input.type = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
state.ias_instance_input.instanceArray.instances = state.d_instances;
I know right
ok
ok so the instances also get a transform
7.37.2.5
transform
float OptixInstance::transform[12]
affine object-to-world transformation as 3x4 matrix in row-major layout
there it is
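A sketch of filling that 12-float, row-major 3x4 layout for a spinning cube: a rotation about Y plus a translation. This is plain C++ with no OptiX headers; `make_rotate_y_translate` and `transform_point` are illustrative helpers, not SDK functions, and the result would be copied into `OptixInstance::transform`.

```cpp
#include <cassert>
#include <cmath>

// Fills a row-major 3x4 object-to-world matrix:
//   [ r00 r01 r02 tx ]
//   [ r10 r11 r12 ty ]
//   [ r20 r21 r22 tz ]
// with a rotation about the Y axis followed by a translation.
inline void make_rotate_y_translate(float out[12], float angle_radians,
                                    float tx, float ty, float tz) {
    assert(out != nullptr);
    float c = std::cos(angle_radians);
    float s = std::sin(angle_radians);
    const float m[12] = {
        c,    0.0f, s,    tx,
        0.0f, 1.0f, 0.0f, ty,
        -s,   0.0f, c,    tz,
    };
    for (int i = 0; i < 12; ++i) {
        out[i] = m[i];
    }
}

// Applies the 3x4 transform to a point, handy for checking the layout on the CPU.
inline void transform_point(const float m[12], const float p[3], float out[3]) {
    for (int row = 0; row < 3; ++row) {
        out[row] = m[row * 4 + 0] * p[0] + m[row * 4 + 1] * p[1] +
                   m[row * 4 + 2] * p[2] + m[row * 4 + 3];
    }
}
```

The translation sits in the last column of each row (indices 3, 7 and 11), which is the part that is easy to get wrong when coming from a column-major math library.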
cool I think maybe I can do some cubes
with these sdks instead of having a million browser tabs open I have a million notepad++ tabs open
classic
oh right thanks
also it's an adobe product, they all look like flash
because they all look the same
Does Nvidia still make you sign the NDA to access GameWorks?
no idea, its been a while since i looked
iirc a bunch of it is on their github
if it's the same way it was, yes, that agreement for the Nvidia dev account is somewhat strict
though some of the stuff is open source, it depends which sdk
well, they don't have to worry about me sharing anything I just plan to keep it all on my computer
I got a lot of stuff to read through and learn about. it's great
for sure. i don't really care, just heads up to know what you are signing
yes, thank you
I actually prefer the gas/ias terminology compared to blas/tlas. Also Optix supports multi-level instancing, so calling ias as tlas will be weird
yeah the Optix acceleration structure stuff seems way better than what you get with vulkan
what do gas and ias stand for
geometry acceleration structure
instance acceleration structure
it's just a memcpy and a build with an update operation to update an ias
i have no idea about any of this
you basically stick some vertices data into a buffer with a struct that is a configuration and then optix builds a fast data structure for resolving ray traces
that's all it is
geometry accelellleleeration structure?
it do be sounding more senseful
and then you create an ias
intelligent accelleration structure 🙂
and the number of deccer cubes as instances each with their own transform
ah
