#Rosy
do you know what derivatives are used for?
no, it sounds like it is useful for processing multiple fragments concurrently I am thinking?
and how they're calculated?
no
ok I'll explain
they're usually used for determining which lod of a texture to sample. if you view the texture from far away or at an oblique angle, the derivatives of the texture coordinate with respect to screen space x and y will be large (the screen pixel covers multiple texels from the texture)
so you can sample a higher lod (less detailed) for antialiasing and better perf
thank you for explaining that
one last thing
the graphics pipeline does a lot of work
yeah
this is stuff I'd have to do manually in a software renderer
derivatives are calculated by taking the difference between a variable from adjacent fragment shader instances
so dFdx(uv) might be implemented by taking uv from an invocation that's one pixel to the right and subtracting uv from the current fragment shader invocation
this is possible because GPUs execute fragment shaders in minimum 2x2 blocks
even if your triangle is one pixel or whatever
there will be fs invocations spawned on the edge that exist solely for derivative calculations
they're called "helper invocations" and this is a concept in graphics APIs
all the vertex attributes will be extrapolated into these invocations and they'll execute normally, except they won't write to buffers or images
I think that's all there is to GPU derivatives
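A rough CPU-side sketch of the idea above, for anyone keeping notes: differences across a 2x2 quad give the derivatives, and the larger texel footprint picks the lod. The function names and the lod formula are simplifications made up for illustration; real hardware may use coarse or fine derivatives and its own lod math.

```python
import math

def quad_derivatives(uv_quad):
    """uv_quad[y][x] holds the (u, v) value seen by each invocation of a 2x2 quad."""
    (u00, v00), (u10, v10) = uv_quad[0]
    (u01, v01), _ = uv_quad[1]
    # One difference per axis, shared by the whole quad (a "coarse" derivative).
    ddx = (u10 - u00, v10 - v00)  # right neighbour minus this pixel
    ddy = (u01 - u00, v01 - v00)  # neighbour below minus this pixel
    return ddx, ddy

def select_lod(ddx, ddy, tex_width, tex_height):
    # Convert UV derivatives to texel units, then take log2 of the larger footprint.
    fx = math.hypot(ddx[0] * tex_width, ddx[1] * tex_height)
    fy = math.hypot(ddy[0] * tex_width, ddy[1] * tex_height)
    rho = max(fx, fy)
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0

# Each screen pixel steps 4 texels across a 256x256 texture.
quad = [[(0.0, 0.0), (4 / 256, 0.0)],
        [(0.0, 4 / 256), (4 / 256, 4 / 256)]]
ddx, ddy = quad_derivatives(quad)
lod = select_lod(ddx, ddy, 256, 256)
```

With a one-pixel triangle, three of the four quad entries would come from helper invocations, which is why they have to run at all.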
that's a lot, thank you again!
you learned this while working at AMD or just reading papers?
I added all of this to my notes
actually neither of those
a lot can be learned by reading, e.g. the glsl (shading language) spec and asking people how certain things are implemented
oh that makes sense that I am running into this language reading the SPIR-V spec
if you learned about it from the GLSL spec
I mean those concepts were almost certainly reinforced in my mind by papers and things I read at AMD. They're referenced a lot
Oh that actually mirrors what I said perfectly lol
yes
ok so after I do a custom annotation/node IR and have it emit test spirv I will look at how llvm emits array types and array indexing, and then also look at how shady IR array types and indexing work, and how that gets emitted as SPIR-V
then I'll add that to my own test IR stuff
and then I'll create a GLSL and a slang fragment shader that uses descriptor indexing, look at the spirv disassembly for those
learn how whatever they do works
then I should be in a pretty good position to get my compute shader to compile
and this test to pass
alrighty
so I'm going to add a debug marker annotation that emits an OpName in the spirv for anything annotated in the C
and I'll create a new DebugMarker node type in the Shady IR
this is just as a learning exercise
#include <stdint.h>
#include <shady.h>
descriptor_set(0) descriptor_binding(1) uniform_constant sampler2D texSampler;
location(0) input native_vec3 fragColor;
debug_marker(albedo_texture) location(1) input native_vec2 fragTexCoord;
location(0) output native_vec4 outColor;
fragment_shader void main() {
outColor = texture2D(texSampler, fragTexCoord) * (native_vec4) { fragColor.x * 2.5f, fragColor.y * 2.5f, fragColor.z * 2.5f, 1.0f };
}
notice debug_marker(albedo_texture)
in shady.h:
#define debug_marker(name) __attribute__((annotate("shady::debug_marker::"#name)))
C:\Users\Bjorn\projects\code\shady\build>vcc --log-level debugvv -I..\vcc\include ..\test\vcc\debug_marker.frag.c -emit-llvm -o shader.ll --target spirv > full_debug.txt 2>&1
C:\Users\Bjorn\projects\code\shady\build>type full_debug.txt | findstr marker
built command: clang -c -emit-llvm -S -g -O0 -ffreestanding -Wno-main-return-type -isystem"C:\Users\Bjorn\projects\code\shady\build\bin\Debug/../share/vcc/include/" -D__SHADY__=1 --target=spirv64-unknown-unknown -o gOSnrfySsiRctZ0BavRO74LQeIbuUsYY "..\test\vcc\debug_marker.frag.c" "-I..\vcc\include" "-emit-llvm"
Emitting debug marker 'albedo_texture' for SPIR-V ID '24' for node '%77'
file=..\test\vcc\debug_marker.frag.c tmpfile=gOSnrfySsiRctZ0BavRO74LQeIbuUsYY
C:\Users\Bjorn\projects\code\shady\build>type full_debug.txt | findstr @DebugMarker
%78 = ptr(Generic, %77): CrossDevice %78 = @Exported("fragTexCoord") @DebugMarker("albedo_texture") @Location(1) @IO(5) GlobalVariable(type: %77, address_space: Generic, is_ref: false, init: null)
%76 = ptr(Generic, %75): CrossDevice %76 = @Exported("fragTexCoord") @DebugMarker("albedo_texture") @Location(1) @IO(5) GlobalVariable(type: %75, address_space: Generic, is_ref: false, init: null)
can you automate the placement of debug markers?
by modifying the compiler or something
C:\Users\Bjorn\projects\code\shady\build>spirv-dis output.spv > output.txt
C:\Users\Bjorn\projects\code\shady\build>type output.txt | findstr albedo
OpName %fragTexCoord "DEBUG_albedo_texture"
works :D
vcc already adds OpName's for a lot
all your variables have OpName
but yes I mean that's what I did
I modified the compiler to add debug markers
let me push the diff
what's the difference between a debug marker and OpName
I mean it's just what I called my thing
ah
OpName is the spirv mechanism by which you can give %23 a name
instead of it being %23 or whatever
I don't know what debug tooling supports that
maybe renderdoc?
try it
like when it disassembles your spirv
I hope it does
maybe spirv-cross
yeah I could try it
this is my code for adding the debug marker
src/frontend/llvm/l2s_annotations.c is what fetches the LLVM annotations from the LLVMIR and lets you do things with them, this is in the LLVM frontend in Shady
and src/backend/spirv/emit_spv.c obviously is the backend spirv emitter
ok let me try my thing to see if I can see it in RenderDoc
yeah
I see it
in the disassembly
the spir-v renderdoc thing is horrible
and nobody should ever look at it
and it's not in there
can you see it in the step debugger though
click on draw call then debug fragment button on the bottom right of the texture viewer
or go to pipeline state tab, then the stage you want to debug
it works for all types of shaders
I only see view and edit
for the vertex shader
I do see the debug button on the texture viewer
maybe the option to debug a vertex is only in the mesh viewer
oh let me take a look
sometimes the debug buttons are grayed out for mysterious reasons
I wish RenderDoc was clearer about why
I don't see a debug button on the mesh viewer I have the draw call event selected
oh
frag shader I can debug
neat
There should be debug buttons elsewhere imo. They should always be present in the pipeline tab
no it doesn't show what I did there because it uses the horrible renderdoc spirv output
rip
that would be cool if I could watch that
you could use the same debug marker on a bunch of variables
this renderdoc spirv output is so much worse than spirv-dis
hmm but RenderDoc can intercept the actual spirv you upload, so why is it so bad
it doesn't have to transform it at all
I mean I just find that hard to look at
ulong _58 = *_57;
struct60 _118 = generated_load_Generic_f32_Invocation(pc_physical@4, %48_physical@4, %193_physical@4, out_color_physical@4, stack_ptr@6, _58);
float _123 = _55 * 1000.0000;
float _125 = GLSL.std.450::Floor(_123);
float _127 = FMod(_125, 10000.0000);
float _128 = _127 / 10000.0000;
float _130 = _128 * 2.0000;
float _132 = _130 * 3.1416;
float _133 = GLSL.std.450::Sin(_132);
float _134 = _133 / 2.0000;
float _135 = _121 * _134;
vs
%118 = OpFunctionCall %_struct_60 %generated_load_Generic_f32_Invocation %pc_physical_1 %_48_physical_1 %_193_physical_1 %out_color_physical_1 %stack_ptr_1 %58
%123 = OpFMul %float %55 %float_1000
%125 = OpExtInst %float %124 Floor %123
%127 = OpFMod %float %125 %float_10000
%128 = OpFDiv %float %127 %float_10000
%130 = OpFMul %float %128 %float_2
%132 = OpFMul %float %130 %float_3_14159274
%133 = OpExtInst %float %124 Sin %132
%134 = OpFDiv %float %133 %float_2
%135 = OpFMul %float %121 %134
idk I prefer the latter
that looks like spirv
and it has OpName %pc "DEBUG_supercool"
oh
maybe the last one wins in renderdoc
OpName %pc "DEBUG_supercool"
OpName %pc "pc"
or maybe you can only have one per variable
yeah it's probably just getting renamed
maybe there's a better thing than OpName
; Debug information
OpSource GLSL 450
OpName %4 "main"
OpName %9 "scale"
OpName %17 "S"
OpMemberName %17 0 "b"
OpMemberName %17 1 "v"
OpMemberName %17 2 "i"
OpName %18 "blockName"
OpMemberName %18 0 "s"
OpMemberName %18 1 "cond"
OpName %20 ""
OpName %31 "color"
OpName %33 "color1"
OpName %42 "color2"
OpName %48 "i"
OpName %57 "multiplier"
nah it's just OpName
and I guess I was just renaming it
but the way vcc emits the spirv the variable name comes last so it wins
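A small sketch of the "last one wins" guess, assuming a tool simply overwrites the name each time it sees another OpName for the same id. That behaviour is a guess from this conversation, not something the SPIR-V spec promises, and `collect_names` is a made-up helper:

```python
import re

OPNAME = re.compile(r'^\s*OpName\s+(%\S+)\s+"(.*)"\s*$')

def collect_names(disassembly, last_wins=True):
    """Map each SPIR-V id to a debug name from spirv-dis style text."""
    names = {}
    for line in disassembly.splitlines():
        m = OPNAME.match(line)
        if not m:
            continue
        target, name = m.groups()
        # With last_wins, a later OpName for the same id replaces the earlier one.
        if last_wins or target not in names:
            names[target] = name
    return names

text = '''
OpName %pc "DEBUG_supercool"
OpName %pc "pc"
OpName %fragTexCoord "DEBUG_albedo_texture"
'''
names = collect_names(text)
first_names = collect_names(text, last_wins=False)
```

Here `names["%pc"]` ends up as the plain variable name, which matches what showed up for `%pc`, while `%fragTexCoord` keeps the marker because nothing else renamed it in that sample.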
this was a pointless addition, I was just trying to learn Shady
I'll disassemble GLSL and Slang descriptor indexing tomorrow and maybe I can figure out what my issue is
wtf am I writing C for when I could just write LLVM IR
%arrayidx = getelementptr inbounds %struct.ST, ptr %s, i64 1, i32 2, i32 1, i64 5, i64 13

Isn't LLVM IR not stable between versions?
I was just kidding
I don't want to do that lol
for my interrogation for how SPIR-V looks when descriptor indexing is used I can just use Sascha's descriptor indexing example shaders, he has them for GLSL, HLSL and Slang
https://raw.githubusercontent.com/SaschaWillems/Vulkan/refs/heads/master/shaders/glsl/descriptorindexing/descriptorindexing.frag
https://raw.githubusercontent.com/SaschaWillems/Vulkan/refs/heads/master/shaders/hlsl/descriptorindexing/descriptorindexing.frag
https://raw.githubusercontent.com/SaschaWillems/Vulkan/refs/heads/master/shaders/slang/descriptorindexing/descriptorindexing.slang
he even included the spv files
I'll just generate those myself
idk what version that is
although I think it says in the spv
here's all three run through spirv-dis, although the slang file includes both a vertex and a fragment shader https://gist.github.com/btipling/665eacb64a8c194493a800fdfc1c52ed
ok it's just what I thought it was, just
%158 = OpAccessChain %_ptr_UniformConstant_154 %textures %148
more specifically
OpName %textures "textures"
OpDecorate %textures Binding 1
OpDecorate %textures DescriptorSet 0
%textures = OpVariable %_ptr_UniformConstant__runtimearr_11 UniformConstant
...
%158 = OpAccessChain %_ptr_UniformConstant_154 %textures %148
oh this is important
%_ptr_UniformConstant__runtimearr_154 = OpTypePointer UniformConstant %_runtimearr_154
%_runtimearr_154 = OpTypeRuntimeArray %154
ok the glsl example is definitely the easier to follow
OpCapability RuntimeDescriptorArray
OpCapability SampledImageArrayNonUniformIndexing
OpExtension "SPV_EXT_descriptor_indexing"
...
OpName %textures "textures"
OpName %inTexIndex "inTexIndex"
OpDecorate %outFragColor Location 0
OpDecorate %textures Binding 1
OpDecorate %textures DescriptorSet 0
OpDecorate %inTexIndex Flat
OpDecorate %inTexIndex Location 1
OpDecorate %19 NonUniform
OpDecorate %21 NonUniform
OpDecorate %22 NonUniform
OpDecorate %inUV Location 0
...
%outFragColor = OpVariable %_ptr_Output_v4float Output
%10 = OpTypeImage %float 2D 0 0 0 1 Unknown
%11 = OpTypeSampledImage %10
%_runtimearr_11 = OpTypeRuntimeArray %11
%_ptr_UniformConstant__runtimearr_11 = OpTypePointer UniformConstant %_runtimearr_11
%textures = OpVariable %_ptr_UniformConstant__runtimearr_11 UniformConstant
...
%18 = OpLoad %int %inTexIndex
%19 = OpCopyObject %int %18
%21 = OpAccessChain %_ptr_UniformConstant_11 %textures %19
%22 = OpLoad %11 %21
%26 = OpLoad %v2float %inUV
%27 = OpImageSampleImplicitLod %v4float %22 %26
OpStore %outFragColor %27
interesting the slang spirv doesn't have SPV_EXT_descriptor_indexing
OpCapability RuntimeDescriptorArray seems important
you get a helpful validation error if you try to do something and didn't request the capability though
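A quick text-level check along those lines, as a sketch only: it scans spirv-dis output for a runtime array of an opaque resource type and reports whether RuntimeDescriptorArray was declared. The helper name and the heuristic are assumptions for illustration; spirv-val is the real authority.

```python
import re

def missing_runtime_descriptor_cap(disassembly):
    """True if a runtime array of images/samplers appears without the capability."""
    caps = set(re.findall(r'OpCapability\s+(\w+)', disassembly))
    # Result ids of opaque resource types.
    opaque = set(re.findall(
        r'(%\w+)\s*=\s*OpType(?:Image|Sampler|SampledImage)\b', disassembly))
    # Element types used by any OpTypeRuntimeArray.
    runtime_elems = re.findall(r'=\s*OpTypeRuntimeArray\s+(%\w+)', disassembly)
    uses_descriptor_array = any(e in opaque for e in runtime_elems)
    return uses_descriptor_array and 'RuntimeDescriptorArray' not in caps

types = '''
%10 = OpTypeImage %float 2D 0 0 0 1 Unknown
%11 = OpTypeSampledImage %10
%_runtimearr_11 = OpTypeRuntimeArray %11
'''
with_cap = "OpCapability Shader\nOpCapability RuntimeDescriptorArray\n" + types
without_cap = "OpCapability Shader\n" + types
```

`missing_runtime_descriptor_cap(without_cap)` flags the module, while the GLSL-derived sample above passes because it declares the capability.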
also spent some time understanding LLVM IR arrays and Shady's IR array types
I think I can try fixing this bug again
I can write the spirv disassembly for this test shader myself, I understand the LLVM IR and the Shady IR, I know how to add annotations that are passed through the compilation steps, how to write debug statements, how to print the Shady node IR (the steps the Shady IR goes through are already printed out with very verbose debugging enabled), and how to step through with a debugger, so I think I can get descriptor indexing working tbh
just gotta sit down and do it
after I walk Rosy for about an hour or two though
3 hour walk I guess
it's interesting the LLVM IR for texSamplers[]; is @texSamplers = global [1 x ptr addrspace(4097)]
as opposed to 0
but the compiler says that
..\test\vcc\bindless.frag.c:5:68: warning: tentative array definition assumed to have one element [-Wtentative-definition-array]
5 | descriptor_set(0) descriptor_binding(1) uniform_constant sampler2D texSamplers[];
high address space array found!
high address space array found!
Descriptor index Type_SampledImageType_TAG found in shd_get_default_value!
Error at C:\Users\Bjorn\projects\code\shady\build\src\shady\constructors_generated.c:331: operand 'contents' of node 'Composite' cannot be null
ok that's progress
I've finally detected that I am erroring on a descriptor indexing node type
well it's an arr_type and it wants to turn it into a composite type
but a composite has to have contents
and descriptor indexed contents are not available
hrm
{
    "name": "Composite",
    "class": "value",
    "description": [
        "A value made out of more values.",
        "Re-ordering values does not count as a computation here !"
    ],
    "ops": [
        { "name": "type", "class": "type", "nullable": true },
        { "name": "contents", "class": "value", "list": true }
    ]
},
nullable true
if not (op.get("nullable") or op.get("ignore")):
    g.g += f"{extra}\t\t\tif (!*pop) {{\n"
    g.g += f"{extra}\t\t\t\tshd_error(\"operand '{op_name}' of node '{name}' cannot be null\");\n"
    g.g += f"{extra}\t\t\t}}\n"
that's the key issue
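To make the generator rule concrete, here is a stripped-down, self-contained sketch of it: only operands that are neither nullable nor ignored get a null check. `emit_null_checks` and the emitted C text are simplified stand-ins, not the actual Shady generator.

```python
def emit_null_checks(node):
    """Return one C null-check line per operand that must not be null."""
    checks = []
    for op in node["ops"]:
        if op.get("nullable") or op.get("ignore"):
            continue  # nullable/ignored operands are allowed to be null
        op_name = op["name"]
        name = node["name"]
        checks.append(
            f"if (!{op_name}) shd_error(\"operand '{op_name}' of node '{name}' cannot be null\");"
        )
    return checks

composite = {
    "name": "Composite",
    "ops": [
        {"name": "type", "class": "type", "nullable": True},
        {"name": "contents", "class": "value", "list": True},
    ],
}
checks = emit_null_checks(composite)
```

For `Composite` this yields a check for `contents` only, which is the error message hit above: `type` may be null, `contents` may not.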
if I set the size to NULL on the Shady IR array all kinds of shit breaks, so that was a mistake
I did that last week and I was expecting that to end up as a runtime array without understanding how anything worked
the only code I see that will emit OpAccessChain is pointer offset instructions
I don't think I can use pointers with uniform constants
this code is pretty complex
I think an information gap I still have is understanding the difference in the IRs with types and values of a type
the shader declares a sampler2D type
I then have a line declaring a variable that is an array of sampler2D
that's actually a new type that I think is not currently expressible in Shady IR, a descriptor index type
I have to handle both a new spv emit type and spv emit value if I made a new thing
and indexing a value of the new type would also be a new instruction
I think I'm going to file a bug with what I think I understand about it
I don't see a quick hack around this
or a quick fix
We can have a call if you want
that would be cool
I can do a call in the evenings during the week, I could do one right now in the next hour if you're available
I’m not at work yet, I’ll get there in about an hour
ah it's 11:30pm I gotta get some sleep before work tomorrow
I'm in pacific standard time
I think I am starting to get how physical address space works in emit SPIR-V. I believe that’s what enables Shady IR pointer use in SPIR-V
I am going to try and learn more about pointers and arrays in vcc today, see how they look in the IRs and what happens during the compilation steps
and how they are represented in the SPIR-V
I honestly just need to read through the SPIR-V spec because just looking up things when I need to would benefit from more context about the execution environment
and this is all relevant to my graphics programming learning goals, and someday rendering something more than a triangle 
I want the spirv-spec on audible and each chapter narrated by different celebrity voice actors
Variables declared with the Function Storage Class can have their lifetime’s specified within their function using the
OpLifetimeStart and OpLifetimeStop instructions.
that's cool
so you can have scope blocks in a function
oh there's a SPIR-V guide https://github.com/KhronosGroup/SPIRV-Guide/tree/main/chapters
I kind of read all of the SPIR-V spec today
mostly chapter 1 and 2 and then browsed through 3
LOL
they got the left side of this image wrong
no built in aggregate types
ok specialization is relevant to my interests I think
logical vs physical pointers explained
I saw that in the spec but the guide explains it better
oh godbolt supports spirv as a target and glsl as a source! https://godbolt.org/z/7KPe11GPs
that would have saved me time yesterday heh
oh there's a spirv tutor repo https://github.com/google/spirv-tutor/tree/main/01 - Introduction to SPIR-V
thanks looking
• C++virtual functions. Likewise doable, but sharing C++ objects
from host and device would not result in portable vtables, which
would make the feature a liability.
this is a feature
jk
(that not being implemented, I mean)
being able to safely memcpy structs to the gpu is big
so
UniformConstant can be viewed as a "opaque handle" to images, samples, raytracing acceleration structures, etc variables.
so the uniform constant is a storage class
alongside input and output etc
hrm
RuntimeDescriptorArray (RuntimeDescriptorArrayEXT): Uses arrays of resources which are sized at run-time.
so we detect arrays in the LLVM IR with LLVMArrayTypeKind
%_ptr_UniformConstant__runtimearr_11 = OpTypePointer UniformConstant %_runtimearr_11
%textures = OpVariable %_ptr_UniformConstant__runtimearr_11 UniformConstant
%int = OpTypeInt 32 1
// This will be consumed as a .glsl file and needs the stage and target profile. Example options:
// -S comp --target-env vulkan1.1
#version 450
#extension GL_EXT_nonuniform_qualifier : require
layout(set = 0, binding = 0, rgba8) readonly uniform image2D myImage[];
layout(set = 0, binding = 1, std430) buffer SSBO {
ivec2 coords;
vec4 da...
the runtime array corresponds to an unsized array in shady
ah I see, there's a regression
if (value && as != AsUniformConstant)
this line is supposed to prevent giving an init value to I/O globals
but I have changed shady such that all globals are initially in the generic address space, and later they get rewritten in the relevant one, and a generic ptr cast is inserted
hrm
I tried this:
#include <shady.h>
location(0) output native_vec4 outColor;
typedef struct {
unsigned int color_index;
native_vec4 colors[];
} pc_t;
push_constant pc_t pc;
fragment_shader void main() {
unsigned int color_index = pc.color_index;
outColor = pc.colors[color_index];
}
this compiles
%23 = [u32; 1]
%24 = ref(Function, %23)
%33 = vec[f32; 4]
%34 = ref(Function, %33)
%147 = TupleType(members: [Invocation u32, Invocation u32])
%195 = TupleType(members: [Invocation u32, Invocation %33])
main = @Leaf @Name("main") @Leaf Function(params: [@Name("outColor_physical") outColor_physical_%125: Invocation %34, @Name("memory_Private") memory_Private_%126: Invocation %24, @Name("stack_ptr") stack_ptr_%127: Invocation u32], return_types: [Invocation u32], body: {
%129 = AbsMem(abs: main)
%148: %147 = Call(mem: %129, callee: generated_Load_CrossDevice_u32_Private, args: [@Name("outColor_physical") outColor_physical_%125, @Name("memory_Private") memory_Private_%126, @Name("stack_ptr") stack_ptr_%127, 0])
stack_ptr: Invocation u32 = @Name("stack_ptr") Extract(composite: %148, selector: 0)
%190: Invocation u32 = Extract(composite: %148, selector: 1)
%191: Invocation u64 = Conversion(type: u64, src: %190)
%193: Invocation u64 = PrimOp(op: mul, operands: [%191, 16])
%194: Invocation u64 = PrimOp(op: add, operands: [4, %193])
%196: %195 = Call(mem: %148, callee: generated_Load_Invocation_vector_type_121_Private, args: [@Name("outColor_physical") outColor_physical_%125, @Name("memory_Private") memory_Private_%126, stack_ptr, %194])
%197: Invocation %33 = Extract(composite: %196, selector: 1)
%198 = Store(mem: %196, ptr: @Name("outColor_physical") outColor_physical_%125, value: %197)
Return(mem: %198, args: [@Name("stack_ptr") stack_ptr_%127])
})
main = @EntryPoint("Fragment") @Name("main") @Leaf @Exported("main") Function(params: [], return_types: [], body: {
%18 = AbsMem(abs: main)
memory_Private: Invocation %24 = @Name("memory_Private") LocalAlloc(mem: %18, type: %23)
outColor_physical: Invocation %34 = @Name("outColor_physical") LocalAlloc(mem: memory_Private, type: %33)
stack_ptr: Invocation u32 = @Name("stack_ptr") Call(mem: outColor_physical, callee: generated_init, args: [outColor_physical, memory_Private, 0])
stack_ptr: Invocation u32 = @Name("stack_ptr") Call(mem: stack_ptr, callee: main, args: [outColor_physical, memory_Private, stack_ptr])
stack_ptr: Invocation u32 = @Name("stack_ptr") Call(mem: stack_ptr, callee: generated_fini, args: [outColor_physical, memory_Private, stack_ptr])
Return(mem: stack_ptr, args: [])
})
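The address arithmetic in that dump (`mul` by 16, then `add` 4) can be mimicked on the CPU like this. The constants are read straight off the IR above; the byte buffer, the little-endian layout and the helper names are assumptions for illustration:

```python
import struct

COLORS_BASE = 4    # from `PrimOp(op: add, operands: [4, ...])`: colors[] starts after color_index
COLOR_STRIDE = 16  # from `PrimOp(op: mul, operands: [..., 16])`: four f32 components

def color_offset(index):
    """Byte offset of colors[index] inside the push constant block."""
    return COLORS_BASE + index * COLOR_STRIDE

def load_color(memory, index):
    # Read four packed little-endian floats at the computed offset.
    return struct.unpack_from("<4f", memory, color_offset(index))

# color_index = 1, followed by two colors (red, green).
memory = struct.pack("<I4f4f", 1, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0)
index = struct.unpack_from("<I", memory, 0)[0]
color = load_color(memory, index)
```

This is only the offset math; in the real output the loads go through the generated load functions shown in the dump.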
it does have a validation issue
C:\Users\Bjorn\projects\code\shady\build>spirv-val output.spv
error: line 74: OpTypeArray Length <id> '15[%uint_0]' default value must be at least 1: found 0
%_arr_v4float_uint_0 = OpTypeArray %v4float %uint_0
%stack_ptr_8 = OpFunctionParameter %uint
%generated_fini_0 = OpLabel
%111 = OpLoad %v4float %outColor_physical_5
OpStore %outColor %111
OpReturnValue %stack_ptr_8
OpFunctionEnd
%main = OpFunction %void None %3
%main_0 = OpLabel
%memory_Private = OpVariable %_ptr_Function__arr_uint_ulong_1 Function
%outColor_physical = OpVariable %_ptr_Function_v4float Function
are initially in the generic address space
yes I see this in the IR dump
gl_WorkGroupSize = @Exported("gl_WorkGroupSize") GlobalVariable(type: %96, address_space: Generic, is_ref: false, init: null)
subgroup_id = @Exported("subgroup_id") GlobalVariable(type: u32, address_space: Generic, is_ref: false, init: null)
... etc
well basically the fix is if (value && !shd_lookup_annotation(decl, "IO"))
except that won't quite work
because annotations are processed after the globals are created
yes I saw that also
since the big annotations value references the global and it's converted first (which requires converting the globals)
one trick is to use extern on the declaration to avoid giving it a default value
but it also gets deleted if unused
oh I didn't notice that it excluded for defaults
for extern
oh this is making sense, right it was trying to assign a Composite as a default value for the array
yeah, it wants to zero-init since that's the LLVM IR semantics
but you can't zero-init a descriptor, that doesn't have a default value
yes
diff --git a/src/frontend/llvm/l2s_type.c b/src/frontend/llvm/l2s_type.c
index f9cc44d2..7a92dedd 100644
--- a/src/frontend/llvm/l2s_type.c
+++ b/src/frontend/llvm/l2s_type.c
@@ -70,6 +70,8 @@ const Type* l2s_convert_type(Parser* p, LLVMTypeRef t) {
         case LLVMArrayTypeKind: {
             unsigned length = LLVMGetArrayLength(t);
             const Type* elem_t = l2s_convert_type(p, LLVMGetElementType(t));
+            if (!shd_is_physical_data_type(elem_t) && length == 0)
+                return arr_type(a, (ArrType) { .element_type = elem_t });
             return arr_type(a, (ArrType) { .element_type = elem_t, .size = shd_uint32_literal(a, length)});
         }
         case LLVMPointerTypeKind: {
this is a bit of a hack, but whenever you have non-physical data (opaque) and the length is zero, it should be safe to interpret an array of size zero as a runtime array
this matters because runtime arrays are opaque themselves
[0 * i32] is basically unit, () but [0 * Image] is an opaque runtime array of descriptors
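The rule in that patch, modelled outside the compiler as a sketch: a zero-length array of opaque elements becomes an unsized (runtime) array, everything else keeps its literal length. The `ArrType` class and the `OPAQUE` set are hypothetical stand-ins for Shady's `arr_type` and `!shd_is_physical_data_type`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ArrType:
    element_type: str
    size: Optional[int]  # None models an array with no size, i.e. a runtime array

# Hypothetical stand-in for "not a physical data type".
OPAQUE = {"Image", "Sampler", "SampledImage"}

def convert_array(element_type, length):
    """Mirror the patched branch for LLVM array types."""
    if element_type in OPAQUE and length == 0:
        return ArrType(element_type, None)   # opaque runtime array of descriptors
    return ArrType(element_type, length)     # ordinary sized array, even if empty

runtime = convert_array("SampledImage", 0)
unit_like = convert_array("i32", 0)
one_sized = convert_array("SampledImage", 1)
```

The `one_sized` case is the trap: if the frontend hands over a length of 1 instead of 0, the array never takes the runtime-array path.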
with that change, the following example compiles for me:
#include <stdint.h>
#include <shady.h>
extern descriptor_set(0) descriptor_binding(1) uniform_constant sampler2D texSampler[];
location(0) input native_vec3 fragColor;
location(1) input native_vec2 fragTexCoord;
location(0) output native_vec4 outColor;
fragment_shader void main() {
outColor = texture2D(texSampler[subgroup_local_id], fragTexCoord) * (native_vec4) { fragColor.x * 2.5f, fragColor.y * 2.5f, fragColor.z * 2.5f, 1.0f };
}
note however that this is lacking nonUniform annotations and won't work right
I think the length will be 1
not 0
@texSamplers = global [1 x ptr addrspace(4097)] zeroinitializer, align 8, !dbg !34
I have to step out for a second, I will be back in like 30, thank you for your help!
it's zero for me
hrm maybe it's a clang version? clang version 20.1.8
clang version 20.1.8
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
nope I have the same
oh your source builds for me
ah I see you're not using extern
%4 = getelementptr inbounds nuw [0 x ptr addrspace(4097)], ptr @texSampler, i64 0, i64 %3, !dbg !53
for your shader
#include <stdint.h>
#include <shady.h>
descriptor_set(0) descriptor_binding(1) uniform_constant sampler2D texSamplers[];
location(0) input native_vec3 fragColor;
location(1) input native_vec2 fragTexCoord;
location(0) output native_vec4 outColor;
typedef struct {
uint32_t image_index;
} pc_t;
push_constant pc_t pc;
fragment_shader void main() {
uint32_t tex_idx = pc.image_index;
outColor = texture2D(texSamplers[tex_idx], fragTexCoord) * (native_vec4) { fragColor.x * 2.5f, fragColor.y * 2.5f, fragColor.z * 2.5f, 1.0f };
}
oh
I just took the test that was there already
#include <stdint.h>
#include <shady.h>
descriptor_set(0) descriptor_binding(1) uniform_constant sampler2D texSampler;
location(0) input native_vec3 fragColor;
location(1) input native_vec2 fragTexCoord;
location(0) output native_vec4 outColor;
fragment_shader void main() {
outColor = texture2D(texSampler, fragTexCoord) * (native_vec4) { fragColor.x * 2.5f, fragColor.y * 2.5f, fragColor.z * 2.5f, 1.0f };
}
and made it an array
you need to use extern otherwise LLVM defaults to making it one-sized
tbh the uniform_constant annotation should imply extern
but the zero-init is kind of useful in that it makes the global stay even if unused, and ideally I can just post-process the initializer out
Good to know thank you
In my case it would not get to post process
It would error when setting the default
just use extern for now
Yes I will try that
I will need to emit the capabilities
I will try and figure that out
you don't need any more until you tackle nonUniform
or hum, actually unsure
i have some val issues on my end
ok there's some dumb bs going on I need to tackle
The extension docs I think require the RuntimeDescriptorArray capability; it's in the SPIR-V disassembly you posted from godbolt
actually extern breaks more than I remember I think
ok I pushed a few fixes
and replied to the issue
thank you! I will give it a try after work tomorrow.
the spirv documentation about what feature requires a capability is a bit less obvious than how extensions are documented in vulkan
RuntimeDescriptorArray is for "Uses arrays of resources which are sized at run-time." whereas OpTypeRuntimeArray is "Declare a new run-time array type. Its length is not known at compile time."
the nuance being arrays of resources whose size isn't known until runtime vs arrays whose length isn't known until runtime
they're both arrays whose length is not known until runtime
I mean it's fine
I just like the VUs in the vulkan spec a lot
although not everything is in a VU
like multiview not working with shader objects
not documented at all in the spec, just in the proposal
it's just UB if you try and do that
the spirv stuff has a lot fewer eyes on it and a lot of the spec language is very fuzzy
unfortunately the impls (and the validator/tooling) often are the documentation
hence the value of shady, even if it didn't have to do crazy lowering, there's a bunch of jank that needs to be dealt with
are you making headway with that, in general?
driving improvement to the spir-v spec, I mean
not really
i would basically need to dedicate a full day per week to khr interactions to make meaningful progress
i can't justify that kind of time investment so I stick to just firewalling spir-v nonsense from my goals
the messaging around slang from khronos is uh not ideal as far as i'm concerned, that's all i'll say
are they supportive of slang?
slang is a khronos-backed project now, and i'm not yet convinced this isn't a regression to glsl ecosystem-wise
it's still effectively a transpiler right?
or does it go directly from slang to spir-v now
i mean the issue is presenting slang as an official language
to the exclusion of others
that's been a concern I've voiced since it was first raised
if a given language becomes "blessed" spir-v loses relevance and we revert back to a de-facto standard around one impl
that's true
do you think there hasn't been as much work around compilers targeting spir-v as khronos expected?
I certainly thought there'd be way more tools around by now, but maybe I just don't know about them
i can't really speak to that (idk)
there's definitely a lot of work going on behind the scenes within LLVM and such
but it definitely took a while to get off the ground
i hope i helped stir that particular pot but honestly idk how much vcc did there
nice exciting
I remember seeing the llvm-spirv repo on khronos's github forever ago but it was already archived then
I saw HLSL support in clang 22
so I just assumed they gave up because they ran into the roadblocks you did
so why not SPIR-V
but I guess that's a frontend?
yeah it compiles HLSL
Due to HLSL being a GPU targeted language HLSL is a Single Program Multiple Data (SPMD) language relying on the implicit parallelism provided by GPU hardware. Some language features in HLSL enable programmers to take advantage of the parallel nature of GPUs in a hardware abstracted language.
HLSL also prohibits some features of C and C++ which can have catastrophic performance or are not widely supportable on GPU hardware or drivers. As an example, register spilling is often excessively expensive on GPUs, so HLSL requires all functions to be inlined during code generation, and does not support a runtime calling convention.
I guess I don't know enough about DXC to know what this actually does
As an example, register spilling is often excessively expensive on GPUs, so HLSL requires all functions to be inlined during code generation, and does not support a runtime calling convention.
(this is wrong, idiotic and counterproductive, see also every performant gpu rt stack)
yeah this is received knowledge of some industry "experts"
the reality is that this is a half-truth propped up by bad compiler implementations and technical debt
most critically, it ignores the issue of register pressure and larger programs where inlining has negative programmability, compilation time and instruction cache implications
not the spirv-guide, the spirv white paper:
However, key graphical-shader programming models do not expose pointers, do not support general arithmetic on them, and do
not want them in registers at the machine level. This programming model is also fully supported.
which I think is referring to logical pointers in spirv
Logical addressing model I mean
your paper addresses it
it was interesting reading your paper after reading the spirv white paper
the people who write statements like that are uh ... not very forward looking (to say nothing of actively ignoring existing practices)
after reading the paper I better understood the lowering code
I wish I was better at sussing out intent and all the whys and whats just by looking at code
i don't think that's super realistic when talking about large systems like this
and i understand you have no previous background in compilers either so you're doing really well afaic
thanks, going to keep trying to learn more so I can contribute and unblock myself
I learned a lot from those commits from yesterday
hey my compute shader built and validated on the first try with main and my image2D branch changes 🎉
%49 = OpLabel
%263 = OpLoad %uint %61
%264 = OpUConvert %ulong %263
%265 = OpAccessChain %_ptr_UniformConstant_65 %outputImage %264
%266 = OpLoad %65 %265
%286 = OpFunctionCall %_struct_125 %generated_Load_Invocation_vector_type_442_Private %_125_physical_1 %pc_physical_1 %memory_Private_1 %_257_physical_1 %stack_ptr_17 %148
%287 = OpCompositeExtract %v2uint %286 1
OpImageWrite %266 %287 %291
OpStore %cf_depth %int_0
OpBranch %51
%50 = OpLabel
this is the descriptor index:
%265 = OpAccessChain %_ptr_UniformConstant_65 %outputImage %264
this is the image write:
OpImageWrite %266 %287 %291
ok ok I figured out what's going on
%69 = OpAccessChain %_ptr_UniformConstant_65 %outputImage %67
%70 = OpLoad %65 %69
%72 = OpIAdd %uint %stack_ptr_1 %uint_24
%74 = OpIAdd %uint %stack_ptr_1 %uint_16
%75 = OpUConvert %ulong %74
%77 = OpImageQuerySize %v2uint %70
%stack_ptr_3 = OpFunctionCall %uint %generated_Store_Invocation_vector_type_147_Private %_125_physical_1 %pc_physical_1 %memory_Private_1 %_257_physical_1 %72 %75 %77
%98 = OpLoad %v3uint %_125_physical_1
%99 = OpLoad %v3uint %_125_physical_1
this is me querying the image dimensions of my draw image
here I think it is deciding whether to write or not
%212 = OpCompositeExtract %uint %211 0
%213 = OpBitcast %int %212
%215 = OpSGreaterThanEqual %bool %210 %213
%stack_ptr_13 = OpCompositeExtract %uint %206 0
OpSelectionMerge %53 None
OpBranchConditional %215 %52 %48
%48 = OpLabel
this is my compute shader, it just writes magenta right now
#include "gpu.h" // IWYU pragma: keep
#include <shady.h>
typedef __attribute__((address_space(0x1101))) struct __shady_builtin_image2D *image2D;
void imageStore(image2D img,
native_ivec2 coord,
native_vec4 data) __asm__("shady::impure_op::spirv.core::99::Invocation");
native_ivec2 imageSize(image2D img) __asm__("shady::pure_op::spirv.core::104::Invocation");
descriptor_set(0) descriptor_binding(2) uniform_constant image2D outputImage[0];
typedef struct {
f32 t;
uint32_t draw_image_index;
} gpu_rast_pc_t;
push_constant gpu_rast_pc_t pc;
local_size(16, 16, 1) compute_shader void main() {
u32 image_index = pc.draw_image_index;
native_ivec2 image_size = imageSize(outputImage[image_index]);
native_ivec2 coords = (native_ivec2){(int)gl_GlobalInvocationID.x, (int)gl_GlobalInvocationID.y};
if (coords.x >= image_size.x || coords.y >= image_size.y) {
return;
}
native_vec4 magenta = (native_vec4){1.0f, 0.0f, 1.0f, 1.0f};
imageStore(outputImage[pc.draw_image_index], coords, magenta);
}
should work?
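side note on why the early return matters: a rough sketch of the host-side math, assuming the dispatch rounds the image extent up to whole 16x16 workgroups. The helper names here are made up for illustration and are not part of vcc or Vulkan.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: number of workgroups needed to cover `extent` pixels
// when each workgroup covers `local_size` pixels along that axis.
uint32_t demo_group_count(uint32_t extent, uint32_t local_size) {
    assert(local_size > 0);
    return (extent + local_size - 1) / local_size; // round up
}

// Invocations launched along one axis that land outside the image.
// These are the ones the bounds check in the shader has to reject.
uint32_t demo_out_of_bounds_invocations(uint32_t extent, uint32_t local_size) {
    return demo_group_count(extent, local_size) * local_size - extent;
}
```

For a 1920x1080 image the x axis divides evenly, but the y axis needs 68 groups and the last one hangs 8 rows past the bottom edge.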
I didn't know what SSA was until this weekend, that's a thing I learned
alright gonna see if I can get a pink window
magenta I mean
no trouble loading shader into vk
hey works
it's using the draw image extent so it doesn't actually take the window extent into consideration for the color
ok now it's going by swapchain size instead of draw image size and I added a grid
nice
I'm tired as I stayed up really late on this last night I'm going to exercise and go to sleep
tomorrow I'll rewrite my cpu rasteriser so I can also run that code on the gpu
I was planning on code reuse between cpu and gpu for software rasterization but I changed my mind
The cpu stuff is going to be very specialized
after I get a triangle via gpu compute, I will add an RT pipeline
just a single triangle in an acceleration structure ? https://docs.vulkan.org/spec/latest/chapters/accelstructures.html
then I will convert my vertex shader to a mesh shader
and then I should be done with my basic pipelines setup and can actually start doing interesting things
interesting graphical things
I think I may delete my shader object code, I don't really know what purpose it serves anymore
was a mistake to ever use it
hrm
idk
I'll leave it and just not use it
alright feels good to be working something graphical
I just realized
I can import my binding constants into my shader!
oh that doesn't work
@.str.14 = private unnamed_addr constant [60 x i8] c"shady::descriptor_binding::descriptor_storage_image_binding\00", section "llvm.metadata"
it didn't get preprocessed
#define descriptor_storage_image_binding 2

I'm dumb
#define descriptor_set(i) __attribute__((annotate("shady::descriptor_set::"#i)))
#define descriptor_binding(i) __attribute__((annotate("shady::descriptor_binding::"#i)))
ok this worked
@.str.14 = private unnamed_addr constant [29 x i8] c"shady::descriptor_binding::2\00", section "llvm.metadata"
#define STRINGIFY(x) #x
#define pl_descriptor_binding(i) __attribute__((annotate("shady::descriptor_binding::" STRINGIFY(i))))
descriptor_set(0) pl_descriptor_binding(descriptor_storage_image_binding) uniform_constant image2D outputImage[0];
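for reference, a minimal standalone sketch of the difference, using made-up `DEMO_` names: `#i` stringifies the argument exactly as spelled, while routing it through a second macro lets the preprocessor expand it first.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BINDING 2

// `#i` stringifies the argument exactly as written, before macro expansion.
#define DEMO_DIRECT(i) "shady::descriptor_binding::" #i

// Passing `i` through another macro lets it expand first, so `#x` sees `2`.
#define DEMO_STRINGIFY(x) #x
#define DEMO_EXPANDED(i) "shady::descriptor_binding::" DEMO_STRINGIFY(i)

const char *demo_direct(void)   { return DEMO_DIRECT(DEMO_BINDING); }
const char *demo_expanded(void) { return DEMO_EXPANDED(DEMO_BINDING); }

// Quick sanity check of both spellings.
void demo_check_stringify(void) {
    assert(strcmp(demo_direct(), "shady::descriptor_binding::DEMO_BINDING") == 0);
    assert(strcmp(demo_expanded(), "shady::descriptor_binding::2") == 0);
}
```

The expanded string is 28 characters plus the terminator, which lines up with the `[29 x i8]` constant in the LLVM dump.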
man
I'm going to llvm dump all my code from now on
what an amazing debug tool
next I am stumped
I'm going to go look at the llvm IR
I didn't even think about what I was doing, I just immediately went to go look at the IR, just was in the habit of doing that over the last few days
I should replace my bat scripts with zig's build system tbh 
I can add the shader compilation as compile steps
nice Sascha's vulkan example repo has a ton of RT examples
ah
handmade cities conference has been canceled
it's a meetup now
after I bought a ticket and arranged for travel
I guess it's just a vacation in Seattle now
:(
I'll go to the meetup
I like Abner, and I wish him the best and I'll get to see him and some other people in December, but would like to go to conferences that are similar
I'm going to Vulkanised in February
that's gonna be a long flight
oh google showed me the 2025 location, which was in the UK
it's like 2 miles from where I used to live
which is actually disappointing, i like going to new cities
and I know this one really well and don't actually like it
but the conference will be fun
I don't mind traveling a long distance for a good conference
not too big into handmade hero but the point he's making is kinda sad
conferences in general are dying
do you go to conferences?
The RT struct spam required for a basic pipeline seems nearly equal in size to initial vk init lol
ok that is a fair bit of descriptor spam
no I don't get sent to any of the ones in my industry because my company is wicked stingy
I think that contributes
I would pay to not go to conferences relevant to my industry: IGA (Identity Governance and Administration).
Pretty sure our company is on track to take it all over anyway
i can't see how conferences are dying tbh, we have the graphics programming conference in europe now, and Vulkanised is adding a shading language symposium as an extra event this year
from what I can see, the toxicity of the handmade community did the guy in (to the point he's using the website of a conference people paid for as a soapbox to tell his side)
the whole thing is pretty sad
will you be at Vulkanised?
nice
i can make a joke having a single user in the audience 😆
ha!
I can hand out pamphlets about vcc, maybe find more people to contribute so they build features I want
err i don't know if khronos staff would be happy with me if I start proselytizing like that
but i can print a big batch of stickers
I'm not much of a sales person anyway :P
in 2027 it'll be back in europe and hopefully by then I can show something really exciting i'm working on
nice
i'm aiming for siggraph 2026, whose deadline is right after vulkanised
are you not in Europe now?
i am
I assumed given your hours
and so will Vulkanised most likely
yeah I hope to just keep going to those
it goes back and forth
wherever it is
i prefer the years where it's within driving distance
i visited the UK for the first time this year
drove on a train that went under the sea 🙂
well this year it's like 2 miles from where I used to live
and then had to adjust to driving on the other side of the road
oh that's gotta be tough
nope it became natural in less than an hour
I find driving very hard in Europe even when it is on the right, except germany
i drove all the way to cambridge on the highway like I did it since I got my license
I have trouble with the roundabouts where it's all mayhem
i take them over sitting at red lights
i had a rental fiat 500 at vulkanised 2024, and apparently in the US the base model has what we'd consider a "sport" engine 
heh
so this little thing was flying in between two minute waits at the lights
oh I guess I just don't see them very often
well never
I can't remember seeing one
well they can easily hide two behind one bro-dozer
but I don't know I don't drive
are you even american
that's entirely fair, i dislike driving in the us and large cities too
my wife owns a suv and sometimes she asks me to drive it
like to help her with something
Shady: unimplemented LLVM instruction %99 = fpext float %98 to double, !dbg !177 (opcode=38)
I'm doing something funny somewhere
you have a double-precision float literal
I found a couple
there's no double precision in graphics land (no opengl 4.0 doesn't count)
yes I remember reading that in the spec
in the spirv spec
these were all mistakes anyway these should all be f32's
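for anyone hitting the same `fpext`: in C an unsuffixed literal like `0.5` is a double, so mixing it with a float promotes the whole expression. A tiny sketch (function names are just for illustration):

```c
#include <assert.h>
#include <stddef.h>

// An unsuffixed literal like 0.5 has type double, so `x * 0.5` promotes the
// float operand to double (the `fpext float ... to double` in the LLVM IR).
size_t demo_size_with_double_literal(float x) { return sizeof(x * 0.5); }

// With the `f` suffix the whole expression stays in single precision.
size_t demo_size_with_float_literal(float x) { return sizeof(x * 0.5f); }

// Quick sanity check of both expression types.
void demo_check_literal_types(void) {
    assert(demo_size_with_double_literal(1.0f) == sizeof(double));
    assert(demo_size_with_float_literal(1.0f) == sizeof(float));
}
```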
shady can actually deal with f64 internally just fine so maybe this can be allowed
ok found them all
but then you'd need diagnostics on the backend side
i think there's some way to override clang's double precision bit width
let's gooo 
that's using the same math my cpu rasterizer uses
ok going to work on the RT pipeline tomorrow then
very cool
i had my students do software raster as part of my vk tutorial but they used glsl
i simply don't think it's cool to force unfinished compilers on students
#ifndef __SHADY__
#include <math.h>
#else
f32 powf(f32, f32) __asm__("shady::prim_op::pow");
f32 fabsf(f32) __asm__("shady::prim_op::abs");
f32 floorf(f32) __asm__("shady::prim_op::floor");
f32 fmodf(f32, f32) __asm__("shady::prim_op::mod");
f32 sinf(f32) __asm__("shady::prim_op::sin");
#endif
not a lot going on in my math yet
I got a couple of the #ifndef __SHADY__ in my code now
so how is shady to use right now? is it overtly missing anything?
it looks like you PRed in image2D right
I have a PR where I added image types and ops, that was pretty trivial. There are likely features that need to be added that you would expect to be available like non uniform indexing
As for how is shady to use? I think it is a really amazing project and a lot fun to use
I am happy that I gave it a try and want to keep using it
awesome
does that mean indexing is uniform by default like in the conventional shading languages?
whether indexing is uniform or non-uniform by default is one of those things I hear thrown around as shader language design opinions and missteps
using the nonuniform index operation requires additional SPIRV instructions, I don't know what the consequences are of varying the index in a single dispatch/draw without specifying it is nonuniform, maybe it is UB?
I just sort of understand that I am supposed to use that function when I know the index will be nonuniform
I would imagine if you don't specify nonuniform index the other shader languages will not apply nonuniform, could check what GLSL does on godbolt
I would expect it not to add it
because for example maybe I have a single atlas texture
and I know the index will always be uniform for all the shader executions for a single draw
I guess I wouldn't need nonuniform in that case and wouldn't expect it to add it
you have to add nonuniform explicitly in GLSL, that's the thing
yeah
that's what makes the conventional shader language way weird and why I am surprised gob didn't opine on it and change it
because conventionally you know that certain inputs have a level of uniformity or non-uniformity per the spec
and logically speaking, you'd promote variables to uniformity given assumptions you the programmer know but the compiler doesn't
curious what @dark saffron thinks
shady actually tracks uniformity
this is a matter of emitting the decoration for the sampling instructions
so do you do that dynamically?
oh nice
well, the decorations aren't getting emitted yet
because I don't want to spam every value with it
i suspect drivers and tooling would shit themselves if I did
so currently the information is unused
but it's in the IR
what information is available to know whether the index should be nonuniform without the user having to annotate?
so if I had the shady equivalent of the GLSL
layout (location = 0) flat in int textureID;
texture(sampler2D(textures[textureID], samp), uv);
would it safely give me the nonuniformEXT around the textureID or is that part of what's not emitted
I bet you could trace "dangerous and need annotation" paths at a minimum
I don't think nonuniform is being emitted at all right now
hrm
look at an ir dump
every value is qualified with a scope
that scope is the scope where the value is uniform at
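a rough way to picture scope-qualified values, as a sketch only: the enum below is hypothetical (loosely modeled on scope names like CrossDevice and Invocation) and is not Shady's actual data structure. The assumption is that combining two values gives a result that is only uniform at the narrower of the two scopes.

```c
#include <assert.h>

// Hypothetical scope ordering, from "same for everyone" to "per invocation".
typedef enum {
    DEMO_SCOPE_CROSS_DEVICE,
    DEMO_SCOPE_DEVICE,
    DEMO_SCOPE_WORKGROUP,
    DEMO_SCOPE_SUBGROUP,
    DEMO_SCOPE_INVOCATION
} DemoScope;

// A value computed from two operands is only uniform at the narrower scope.
DemoScope demo_combine_scopes(DemoScope a, DemoScope b) {
    assert(a <= DEMO_SCOPE_INVOCATION && b <= DEMO_SCOPE_INVOCATION);
    return a > b ? a : b;
}

// An index would need a non-uniform marker when it varies inside the scope
// that the consumer requires it to be uniform over.
int demo_needs_nonuniform(DemoScope value_scope, DemoScope required_scope) {
    return value_scope > required_scope;
}
```

With that information in the IR, a compiler could in principle decide where a decoration is needed without the user annotating each access; which scope counts as "required" is left as a parameter here on purpose.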
I will take a look
ok I guess a question I have is
sure some inputs could be non-uniform
but the user can know they are uniform
and that is the reason why the user specifies nonuniform index?
the inverse of that
it's assumed they're uniform by default but the user can know otherwise
right I understand that
I am responding to the idea of automatically deciding to use nonuniform based on the level of uniformity of the input
when the user knows that their input would be uniform even though the input could be non-uniform as per the spec
in that case would it be correct to automatically assume nonuniform and apply the SPIRV for it
yeah
why?
I don't want to use the descriptors API in vulkan so all my calls are bindless, even when they don't vary and it wouldn't be painful to bind
if vk had a sane descriptors api
the actual weird part about shader languages is that they take the position that requires more assumptions by default, and the one that requires less assumptions is an extension
hrm
what information can I supply that would help in this context?
explicitly annotating the uniformity of the input seems sufficient
oh you mean on the input itself
I get it
yeah that's much nicer
so you still have to supply an annotation in the shader
it's just done differently
in a perfect world, all variables would be assumed non-uniform by default and you'd mark them like @uniform uint textureID;
yeah at the end of the day, it's something you the programmer know but the compiler can't
yeah that's fine
so you have to specify it somehow
yeah
I guess which should be default depends on what you're used to, if you're doing a lot of PBR and have tons of maps to sample
you probably prefer nonuniform
if you're writing to a single storage image in the GPU you may be wondering why you have to specify uniform
from the perspective of convenience sure, but from the perspective of what's a safe assumption in the absence of knowledge about your data, nonuniform is a safer bet for producing correct code
ah yes, I agree
I guess this is for historical reasons
but no reason why the shading language could not change the behavior
the compiler
yeah and in practice it barely ever bites you
the only place I've seen it actually matter was that above example of passing a textureID dynamically for bindless image sampling
and only on AMD
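to make the failure mode concrete, here is a CPU simulation of 8 lanes. It assumes hardware where the descriptor index is read once for the whole group unless told otherwise, and models the non-uniform path as the commonly described "waterfall loop". This is a sketch under those assumptions, not a statement about what any particular driver does.

```c
#include <assert.h>

#define DEMO_LANES 8

// "Assumed uniform" path: the descriptor index is read once (here from lane 0)
// and reused for every lane. Returns how many lanes sampled the wrong texture.
int demo_uniform_path_mismatches(const int index[DEMO_LANES], int sampled[DEMO_LANES]) {
    int mismatches = 0;
    for (int lane = 0; lane < DEMO_LANES; lane++) {
        sampled[lane] = index[0];
        if (sampled[lane] != index[lane]) mismatches++;
    }
    return mismatches;
}

// "Non-uniform" path as a waterfall loop: pick the index of the first lane
// still pending, serve every lane that wants that same index, repeat.
// Returns the number of loop iterations (one per distinct index).
int demo_waterfall(const int index[DEMO_LANES], int sampled[DEMO_LANES]) {
    int done[DEMO_LANES] = {0};
    int iterations = 0;
    for (;;) {
        int first = -1;
        for (int lane = 0; lane < DEMO_LANES; lane++) {
            if (!done[lane]) { first = lane; break; }
        }
        if (first < 0) break; // every lane has been served
        int current = index[first];
        for (int lane = 0; lane < DEMO_LANES; lane++) {
            if (!done[lane] && index[lane] == current) {
                sampled[lane] = current;
                done[lane] = 1;
            }
        }
        iterations++;
    }
    assert(iterations <= DEMO_LANES);
    return iterations;
}
```

When every lane uses the same index both paths agree and the waterfall runs once, which would explain why the missing qualifier rarely shows up outside of draws that mix textures.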
I imagine that I have never written graphics code that would work on AMD as I have never had an AMD GPU and have never tested any of my code on AMD, although deccer ran my voxel engine and it seemed to work for him
so maybe I have
I ignored the warnings of using nonuniformEXT for a long time until my friend ran my engine on his PC and got weird artifacts that I initially thought were some kind of sync bug
but syncval told me nothing so I was scratching my head until I realized it was consistent in UI draws where textures changed a lot
I didn't get any sync VVL for not having an image barrier between the compute dispatch that writes to a storage image on the GPU and blitting the storage image to the swapchain
I added one anyway
because it should have a barrier there?
I'm not sure I trust not getting a VVL as a sign that the sync is correct tbh
I do trust that when I do get a VVL that I have a mistake
but I think there are false negatives
yeah though in this case it was correct, and the syncval wasn't the whole story
I was combing my barriers in renderdoc and re-checking all my semaphores between frames
losing my mind over being too lazy to use a nonuniformEXT I "knew" I needed (like 2 years ago when I read it in #vulkan)
yeah that sounds rough
I had a similar experience with trying to use multiview with shader objects
only in that case I didn't know that multiview didn't work with SO
and nobody else did either
everyone telling me I was doing it wrong
and that's why it didn't work
"losing my mind" is how I felt
SO is such a clusterfuck tbh
over promised, under delivered, and the spec has gaps
anyway tangent
yeah especially with GPU stuff when you go off the beaten path just enough you can really tell
you're out there all alone
one thing I might want to do in shady is maybe add profiling and find opportunities to find ways to improve the compile speed if that would be valuable
maybe there's some low hanging fruit
I think to start with RT, since I don’t know anything about it I will start by building and running Sascha’s example, and then try to replace the GLSL shader for it with a vcc shader
And get the Sascha example working
If I do it all from my project I wouldn’t know how to distinguish between a problem in my own code and a compiler problem
It might be a fun and useful exercise to fork Sascha’s repo and add a vcc shader for all of the samples
Not to contribute back to the original project
Just as an exercise
I just kind of want to work on my own thing though
ok so Sascha's raytracingbasic shader is actually three shaders, a ray gen shader, a hit shader and a miss shader
these are the things new to me
GL_EXT_ray_tracing
rayPayloadInEXT
#extension GL_EXT_ray_tracing : enable
#extension GL_EXT_nonuniform_qualifier : enable
layout(location = 0) rayPayloadInEXT vec3 hitValue;
hitAttributeEXT vec2 attribs;
#extension GL_EXT_ray_tracing : enable
#extension GL_EXT_shader_image_load_formatted : enable
layout(location = 0) rayPayloadEXT vec3 hitValue;
const vec2 pixelCenter = vec2(gl_LaunchIDEXT.xy) + vec2(0.5);
const vec2 inUV = pixelCenter/vec2(gl_LaunchSizeEXT.xy);
https://godbolt.org/z/4se9rc6dM rgen
https://godbolt.org/z/1voePdn8s rchit
https://godbolt.org/z/arocv55x7 rmiss
// This will be consumed as a .glsl file and needs the stage and target profile. Example options:
// -S comp --target-env vulkan1.3
#version 460
#extension GL_EXT_ray_tracing : enable
#extension GL_EXT_shader_image_load_formatted : enable
layout(binding = 0, set = 0) uniform accelerationStructureEXT topLevelAS;
layout(binding = 1, set = 0) unif...
#version 460
#extension GL_EXT_ray_tracing : enable
#extension GL_EXT_nonuniform_qualifier : enable
layout(location = 0) rayPayloadInEXT vec3 hitValue;
hitAttributeEXT vec2 attribs;
void main()
{
const vec3 barycentricCoords = vec3(1.0f - attribs.x - attribs.y, attribs.x, attribs.y);
hitValue = barycentricCoords;
}
starting with the simple rmiss is probably the easiest
#define ray_generation_shader __attribute__((annotate("shady::entry_point::RayGeneration")))
ok I see this
looking at the available entry points I see mesh is missing so I'll probably have to add that after I get through getting RT working
case SpvStorageClassRayPayloadKHR:
case SpvStorageClassHitAttributeKHR:
case SpvStorageClassIncomingRayPayloadKHR:
these seem to return an error
shd_error("s2s: Unsupported storage class: %d\n", class);
accelerationStructureEXT is just a descriptor
%86 = OpLoad %83 %topLevelAS
%88 = OpLoad %v4float %origin
%89 = OpVectorShuffle %v3float %88 %88 0 1 2
%90 = OpLoad %float %tmin
%91 = OpLoad %v4float %direction
%92 = OpVectorShuffle %v3float %91 %91 0 1 2
%93 = OpLoad %float %tmax
OpTraceRayKHR %86 %uint_1 %uint_255 %uint_0 %uint_0 %uint_0 %89 %90 %92 %93 %hitValue
it's a descriptor passed into OpTraceRayKHR
nfi how gl_RayFlagsOpaqueEXT ends up in the SPIRV
ok these are ray flags and being substituted or lowered into the spirv by their unsigned integer value uint_1 here
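to line the dump up with the numbers: a sketch of the scalar operands in the order they appear in the `OpTraceRayKHR` line above (ray flags, cull mask, SBT offset, SBT stride, miss index). The bit values are the SPIR-V ray flag values; the `DEMO_` names and the struct are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Ray flag bit values as defined for SPIR-V ray tracing (demo-prefixed copies).
enum {
    DEMO_RAY_FLAGS_NONE = 0x0,
    DEMO_RAY_FLAGS_OPAQUE = 0x1,
    DEMO_RAY_FLAGS_NO_OPAQUE = 0x2,
    DEMO_RAY_FLAGS_TERMINATE_ON_FIRST_HIT = 0x4,
    DEMO_RAY_FLAGS_SKIP_CLOSEST_HIT_SHADER = 0x8
};

// Scalar operands of the trace call, in the order they appear in the dump.
typedef struct {
    uint32_t ray_flags, cull_mask, sbt_offset, sbt_stride, miss_index;
} DemoTraceOperands;

// Operands matching %uint_1 %uint_255 %uint_0 %uint_0 %uint_0.
DemoTraceOperands demo_basic_trace_operands(void) {
    DemoTraceOperands ops = { DEMO_RAY_FLAGS_OPAQUE, 0xFF, 0, 0, 0 };
    // Opaque and NoOpaque are mutually exclusive.
    assert((ops.ray_flags & DEMO_RAY_FLAGS_NO_OPAQUE) == 0);
    return ops;
}
```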
oh I see ClosestHitKHR and MissKHR are also missing entry points
I think the missing storage classes and entry points are what I see will be the initial challenges
#version 460
#extension GL_EXT_ray_tracing : enable
layout(location = 0) rayPayloadInEXT vec3 hitValue;
void main()
{
hitValue = vec3(0.0, 0.0, 0.2);
}
this little shader has both problems
it's an entrypoint I don't see in Shady and it has an unsupported storage class
I solve this little thing I might be good?
this is an rmiss shader, I guess you can't tell from the GLSL, you have to specify it as an arg to the compiler or something
but in shady it's just declared as a function attribute explicitly, I prefer that
so now I need to come up with the shady version of this GLSL shader that I think should work
oh this has a convenient GLSL -> SPIR-V mapping section https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GLSL_EXT_ray_tracing.txt
hitAttributeEXT storage qualifier -> HitAttributeKHR storage class
I don't see HitAttributeKHR in the output spirv at all
I see IncomingRayPayloadKHR
that's a storage class too
the extension doc has:
rayPayloadEXT storage qualifier -> RayPayloadKHR storage class
rayPayloadInEXT storage qualifier -> IncomingRayPayloadKHR storage class
hitAttributeEXT storage qualifier -> HitAttributeKHR storage class
callableDataEXT storage qualifier -> CallableDataKHR storage class
callableDataInEXT storage qualifier -> IncomingCallableDataKHR storage class
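the same mapping as a small lookup table, with the numeric storage class values from the SPIR-V headers added (5342 is the value used in the annotation experiments further down). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *glsl_qualifier;
    const char *spirv_storage_class;
    unsigned value; // numeric value from the SPIR-V headers
} DemoStorageMapping;

static const DemoStorageMapping demo_storage_mappings[] = {
    { "rayPayloadEXT",     "RayPayloadKHR",           5338 },
    { "rayPayloadInEXT",   "IncomingRayPayloadKHR",   5342 },
    { "hitAttributeEXT",   "HitAttributeKHR",         5339 },
    { "callableDataEXT",   "CallableDataKHR",         5328 },
    { "callableDataInEXT", "IncomingCallableDataKHR", 5329 },
};

// Returns the mapping for a GLSL qualifier, or NULL if it is not in the table.
const DemoStorageMapping *demo_lookup_storage(const char *qualifier) {
    assert(qualifier != NULL);
    size_t count = sizeof demo_storage_mappings / sizeof demo_storage_mappings[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(demo_storage_mappings[i].glsl_qualifier, qualifier) == 0)
            return &demo_storage_mappings[i];
    }
    return NULL;
}
```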
idk what the glsl compiler is doing here, but it works I guess
it emitted a different storage class than what was specified in the GLSL
oh I am just reading it wrong
I was confused by the variable name layout(location = 0) rayPayloadInEXT vec3 hitValue;
did you read about ray queries yet?
no
I'm just trying to get Sascha's example to work with shady without knowing how any of it works yet
alright
just wanted to let you know that ray queries are a lot simpler than ray pipelines because they can go in any shader. you just need an acceleration structure
so they are a good starting point imo
I see, well the reason I went with this
is because there's a raygen shader support already in Shady
can I do ray queries in a raygen shader?
oh interesting
yeah you can do ray queries anywhere
I did them in compute shaders in my other renderer
is there any reason to use raygen and rayhit and raymiss?
are these like the v1 of RT
and then they just came up with an easier thing?
ray tracing pipelines are more flexible in the sense that they allow you to call other shaders
so you can have your per-material shaders and they'll be called automatically when a ray hits them
that aspect probably makes more sense for big engines with lots of shader permutations
oh I see
that's why that ray tracing doc I read about brought up pipeline_library
maybe
I didn't quite follow
the other benefit of ray tracing pipelines is that they can be more efficient
nvidia has the thingy in their newer GPUs that lets you reorder threads in the raygen shader for better execution coherency
so does intel
oh fancy
thank you for the helpful information!
once I have the gpu, cpu, mesh, and rt pipelines set up in my render graph I think I can start learning how to do fancy renderings
Multi-Bounce with Russian Roulette
why they call it that, that's horrible
lmao I can see how it sounds bad to the uninitiated
"russian roulette" is a term used in path tracing to refer to the stochastic termination of rays
stochastic is a fancy word for "random"?
it's a way to make sure rays don't keep bouncing forever (bad for perf) while keeping the render unbiased (meaning it converges to the correct result)
I don't understand why RT needs randomness
don't you want deterministic paths for a ray
if you always terminate rays at the nth bounce then your render will be biased because you lose the light that would've been contributed by further bounces
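a toy version of that argument, as a sketch: every bounce keeps a fraction `albedo` of the energy and picks up one unit of light, so the true total is 1 / (1 - albedo). A hard cutoff always undershoots; russian roulette stops paths at random but divides the throughput of survivors by the survival probability, so the average still lands on the true value. All names and the xorshift generator are just for this example.

```c
#include <assert.h>
#include <stdint.h>

// Small xorshift generator so the sketch is deterministic and self-contained.
float demo_rand01(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    *state = x;
    return (float)(x >> 8) * (1.0f / 16777216.0f); // in [0, 1)
}

// Hard cutoff after `max_bounces`: always misses the tail, so it is biased low.
float demo_truncated(float albedo, int max_bounces) {
    float total = 0.0f, throughput = 1.0f;
    for (int i = 0; i < max_bounces; i++) {
        total += throughput;
        throughput *= albedo;
    }
    return total;
}

// Russian roulette: continue with probability `survive`, and divide the
// throughput by `survive` so surviving paths make up for the terminated ones.
float demo_roulette(float albedo, float survive, uint32_t *rng) {
    assert(survive > 0.0f && survive < 1.0f);
    float total = 0.0f, throughput = 1.0f;
    for (;;) {
        total += throughput;
        if (demo_rand01(rng) >= survive) break;
        throughput *= albedo / survive;
    }
    return total;
}

// Average of many roulette paths, using a fixed seed.
float demo_roulette_average(float albedo, float survive, int samples) {
    uint32_t rng = 0x12345678u;
    double sum = 0.0;
    for (int i = 0; i < samples; i++) sum += demo_roulette(albedo, survive, &rng);
    return (float)(sum / samples);
}
```

With albedo 0.5 the true total is 2: three fixed bounces give 1.75 every time, while the roulette average should settle close to 2 as the sample count grows.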
oh it's because you can't bounce all your rays to that further distance due to perf
so you bounce some sample of them
no because you want to make sure you sample the whole domain well
imagine trying to solve an integral that can't be solved analytically (so you have to solve it numerically)
there are different strategies, like taking samples at discrete intervals (riemann sum)
I've read through PBR and I understand the microfacet & scattering light in roughness
is it similar to that?
the monte carlo strategy is to take random samples which can sample a greater amount of the domain, especially if you vary them over time
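the two strategies side by side on an integral with a known answer, as a sketch (x squared over [0, 1], which is 1/3). Function names and the inline xorshift generator are assumptions for the example, nothing renderer-specific.

```c
#include <assert.h>
#include <stdint.h>

double demo_f(double x) { return x * x; } // integral over [0, 1] is 1/3

// Riemann-style estimate: evaluate at evenly spaced midpoints.
double demo_midpoint_sum(int n) {
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += demo_f((i + 0.5) / n);
    return sum / n;
}

// Monte Carlo estimate: evaluate at random points and average.
double demo_monte_carlo(int n, uint32_t seed) {
    assert(n > 0 && seed != 0);
    uint32_t x = seed;
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        x ^= x << 13; x ^= x >> 17; x ^= x << 5; // xorshift32
        sum += demo_f((x >> 8) * (1.0 / 16777216.0));
    }
    return sum / n;
}
```

In one dimension the evenly spaced version converges faster; the random version earns its keep when the domain has many dimensions (every bounce adds more) and a regular grid becomes impractical.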
I suppose they are related, but you can have a path tracer that has really simple materials. that's what I'm doing in my game. everything is just a simple perfect diffuse reflector
I see
oh each place the light bounces, it contributes to some of the color that you see?
and the domain is all of the bounces
I can read through this
the domain is all the directions the ray can go
which is a hemisphere oriented with the surface normal if you view a single ray
yeah
in backwards path tracing (tracing rays from the eye), you have a "throughput" variable that starts at 1, then you multiply it by every surface the ray hits before it hits a light source
if your ray hits a green surface, your throughput gets multiplied by the color green, which makes any light it subsequently hits turn green
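the throughput idea in a few lines, as a sketch: no ray tracing here, just the color bookkeeping along one path whose hits are given as an array. Type and function names are made up.

```c
#include <assert.h>

typedef struct { float r, g, b; } DemoColor;

// Component-wise product of two colors.
DemoColor demo_mul(DemoColor a, DemoColor b) {
    DemoColor c = { a.r * b.r, a.g * b.g, a.b * b.b };
    return c;
}

// Walk a path from the eye: throughput starts at 1 and is multiplied by the
// color of every surface hit; the light at the end is tinted by the result.
DemoColor demo_path_radiance(const DemoColor *surfaces, int hit_count, DemoColor light) {
    assert(hit_count >= 0);
    DemoColor throughput = { 1.0f, 1.0f, 1.0f };
    for (int i = 0; i < hit_count; i++) {
        throughput = demo_mul(throughput, surfaces[i]);
    }
    return demo_mul(throughput, light);
}
```

One pure green hit followed by a white light comes out pure green, and a path with no hits sees the light unchanged.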
that makes sense
nice pic of the color bleeding effect from a random paper
gorgeous
global illumination is awesome and you'll be able to achieve it with ray tracing for sure
I hope so
the simplest possible path tracer is like 30 loc assuming you have a TraceRay function of some sort
yes I hope to get all this working, just starting out with getting triangles and then will start making those renders more capable
#include <shady.h>
#define incoming_ray_payload __attribute__((annotate("shady::io::5342")))
location(0) output incoming_ray_payload native_vec3 hitValue;
ray_miss_shader void main() {
hitValue = (native_vec3){0.f, 0.f, 0.2f};
}
I think this is the shader I need for the rmiss replacement
ray_miss_shader is the entry point function annotation I'll need to capture and then the incoming_ray_payload is an annotation on the variable for its storage class
since there's already a ray gen shader I think I can follow that example
I think the storage class for these are just
AsGlobal
I don't know actually
no that's wrong
that shouldn't be both output incoming_ray_payload
it should not be output
output is its own storage class
#include <shady.h>
#define incoming_ray_payload __attribute__((annotate("shady::io::5342")))
location(0) incoming_ray_payload native_vec3 hitValue;
ray_miss_shader void main() {
hitValue = (native_vec3){0.f, 0.f, 0.2f};
}
ok ok ok I think I get it
C:\Users\Bjorn\projects\code\shady\build>ctest -C Debug -R vcc_basic_rmiss --output-on-failure
Test project C:/Users/Bjorn/projects/code/shady/build
Start 49: vcc_basic_rmiss
1/1 Test #49: vcc_basic_rmiss ..................***Failed 0.18 sec
'C:/Users/Bjorn/projects/code/shady/build/bin/Debug/vcc.exe' 'C:/Users/Bjorn/projects/code/shady/test/vcc/basic.rmiss.c' '--target' 'none' '--vcc-include-path' 'C:/Users/Bjorn/projects/code/shady/build/share/vcc/include/' '--entry-point' 'main' '--execution-model' 'Fragment' '-o' 'C:/Users/Bjorn/projects/code/shady/build/vcc_basic_rmiss.spv'
clang version 20.1.8
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
built command: clang -c -emit-llvm -S -g -O0 -ffreestanding -Wno-main-return-type -isystem"C:/Users/Bjorn/projects/code/shady/build/share/vcc/include/" -D__SHADY__=1 --target=spirv64-unknown-unknown -o LMS8veFfXGq5TDK8f43iPt8IIWyIP4u9 "C:/Users/Bjorn/projects/code/shady/test/vcc/basic.rmiss.c"
clang: warning: argument unused during compilation: '-c' [-Wunused-command-line-argument]
C:/Users/Bjorn/projects/code/shady/test/vcc/basic.rmiss.c:7:1: error: unknown type name 'ray_miss_shader'
7 | ray_miss_shader void main() {
| ^
1 error generated.
Clang returned 1 and replied:
file=C:/Users/Bjorn/projects/code/shady/test/vcc/basic.rmiss.c tmpfile=LMS8veFfXGq5TDK8f43iPt8IIWyIP4u9
CMake Error at C:/Users/Bjorn/projects/code/shady/test/unit_test.cmake:5 (execute_process):
execute_process failed command indexes:
1: "Child return code: 18"
now just have to make the test pass
and then make spirv-val pass
and then make Sascha's example actually work with the spirv
let's try
#include <shady.h>
#define ray_miss_shader __attribute__((annotate("shady::entry_point::RayMiss")))
#define incoming_ray_payload __attribute__((annotate("shady::io::5342")))
location(0) incoming_ray_payload native_vec3 hitValue;
ray_miss_shader void main() {
hitValue = (native_vec3){0.f, 0.f, 0.2f};
}
Warning: unrecognised address space 5342
Error at C:\Users\Bjorn\projects\code\shady\src\shady\passes\abi\specialize_entry_point.c:50: Unknown entry point type: RayMiss
that's better
oh the address spaces have descriptions in Shady IR's grammar.json
ok that address space is wrong, these seem like hard coded values as llvm-id in the grammar
the Shady IR is independent from spirv that's right and this address space is a Shady IR node so that makes sense
#include <shady.h>
#define ray_miss_shader __attribute__((annotate("shady::entry_point::RayMiss")))
#define incoming_ray_payload __attribute__((annotate("shady::io::399")))
location(0) incoming_ray_payload native_vec3 hitValue;
ray_miss_shader void main() {
hitValue = (native_vec3){0.f, 0.f, 0.2f};
}
Error at C:\Users\Bjorn\projects\code\shady\src\shady\passes\abi\specialize_entry_point.c:50: Unknown entry point type: RayMiss
ok that warning about unrecognized address space is gone, progress
I think that Ray Gen entry point in shady.h might be a meme
because I think it would also result in an unknown entry point error
oh I'm looking at the spirv frontend
that's wrong
ok this is interesting
#define EXECUTION_MODELS(EM) \
EM(Compute ) \
EM(Fragment ) \
EM(Vertex ) \
EM(RayGeneration) \
EM(Callable ) \
I need to add it here I think
progress
new error
Assertion failed: false, file C:\Users\Bjorn\projects\code\shady\src\shady\passes\io\promote_io_variables.c, line 46
oh this is address space related
more progress:
Cannot emit address space IncomingRayPayload.
oh ok so these are mapped directly to the storage classes defined in the SPIRV-Headers spirv.h file
noice, more progress, new error
error: line 6: Operand 1 of EntryPoint requires one of these capabilities: Kernel
OpEntryPoint Kernel %main "main" %hitValue
actually getting to the emit spirv point
error: line 2: Capability Kernel is not allowed by Vulkan 1.3 specification (or requires extension)
OpCapability Kernel
I am getting mixed messages
oh
I need an extension
I will try this
OpExtension "SPV_KHR_ray_tracing"
I think this error is lies
I will consult the spec
oh
it was trying to emit an OpEntryPoint Kernel wtf
oh this is meta
#define EM(name) ShdExecutionModel##name,
EXECUTION_MODELS(EM)
macro concatenation is still new to me
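for anyone else new to the pattern: this is usually called an X-macro. One list macro gets expanded several times with different definitions of `EM`, so the enum and any tables derived from it cannot drift apart. A standalone sketch with demo names; the `RayMiss` entry is only a guess at what an added line might look like, not the actual change.

```c
#include <assert.h>
#include <string.h>

// Hypothetical list in the same shape as EXECUTION_MODELS.
#define DEMO_EXECUTION_MODELS(EM) \
    EM(Compute) \
    EM(Fragment) \
    EM(Vertex) \
    EM(RayGeneration) \
    EM(Callable) \
    EM(RayMiss)

// `##` pastes tokens: EM(Compute) becomes DemoExecutionModelCompute, and so on.
typedef enum {
#define EM(name) DemoExecutionModel##name,
    DEMO_EXECUTION_MODELS(EM)
#undef EM
    DemoExecutionModelCount
} DemoExecutionModel;

// The same list expanded a second time, now with `#` to build the name table.
static const char *const demo_execution_model_names[] = {
#define EM(name) #name,
    DEMO_EXECUTION_MODELS(EM)
#undef EM
};

// Returns DemoExecutionModelCount when the name is not in the list.
DemoExecutionModel demo_execution_model_from_name(const char *name) {
    assert(name != NULL);
    for (int i = 0; i < DemoExecutionModelCount; i++) {
        if (strcmp(demo_execution_model_names[i], name) == 0) return (DemoExecutionModel)i;
    }
    return DemoExecutionModelCount;
}
```

A name that is missing from the list falls through to the "unknown" result, which is the same shape of failure as the "Unknown entry point type" error earlier.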
new error
error: line 115: OpEntryPoint Entry Point <id> '2[%main]'s callgraph contains function <id> '12[%generated_init]', which cannot be used with the current execution model:
[VUID-StandaloneSpirv-IncomingRayPayloadKHR-04699] IncomingRayPayloadKHR Storage Class is limited to AnyHitKHR, ClosestHitKHR, and MissKHR execution model
%generated_init = OpFunction %uint None %13
It is trying to use None as the execution model
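one note on reading that output: in `OpFunction %uint None %13` the `None` is the function control operand, while the execution model comes from the `OpEntryPoint` whose call graph reaches the function. The rule the validator quotes can be sketched like this, with execution model values taken from the SPIR-V headers and `DEMO_` names that are made up:

```c
#include <assert.h>

// Numeric values as listed in the SPIR-V headers for the ray tracing stages.
typedef enum {
    DEMO_EXEC_RAY_GENERATION = 5313,
    DEMO_EXEC_INTERSECTION = 5314,
    DEMO_EXEC_ANY_HIT = 5315,
    DEMO_EXEC_CLOSEST_HIT = 5316,
    DEMO_EXEC_MISS = 5317,
    DEMO_EXEC_CALLABLE = 5318
} DemoRayExecutionModel;

// The rule quoted in the validator message: IncomingRayPayloadKHR may only be
// used from any-hit, closest-hit and miss shaders.
int demo_incoming_payload_allowed(DemoRayExecutionModel model) {
    switch (model) {
    case DEMO_EXEC_ANY_HIT:
    case DEMO_EXEC_CLOSEST_HIT:
    case DEMO_EXEC_MISS:
        return 1;
    default:
        return 0;
    }
}

// Quick sanity check of the rule.
void demo_check_payload_rule(void) {
    assert(demo_incoming_payload_allowed(DEMO_EXEC_MISS));
    assert(!demo_incoming_payload_allowed(DEMO_EXEC_RAY_GENERATION));
}
```

So the thing to check is which execution model the entry point that reaches `generated_init` is declared with.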
alright I'm at the point I need to start debug printing and looking at the IR
I will do that tomorrow
hitValue
%67 = ptr(Generic, %43): CrossDevice %67 = @Exported("hitValue") @Location(0) @IO(15) GlobalVariable(type: %43, address_space: Generic, is_ref: false, init: %81)
%86: CrossDevice %66 = BitCast(type: %66, src: hitValue)
main = @EntryPoint("RayMiss") @Exported("main") @Name("main") Function(params: [], return_types: [], body: {
%41 = AbsMem(abs: main)
%46: Invocation %44 = StackAlloc(mem: %41, type: %43)
%55: Invocation %48 = BitCast(type: %48, src: %46)
%64 = Store(mem: %46, ptr: %55, value: %62)
%65: Invocation %47 = Load(mem: %64, ptr: %55)
%91: Invocation %43 = PrimOp(op: shuffle, operands: [%65, undef[%47], 0, 1, 2])
%93: Invocation %47 = PrimOp(op: shuffle, operands: [%91, undef[%43], 0, 1, 2, 0])
%94 = Store(mem: %65, ptr: %86, value: %93)
Return(mem: %94, args: [])
})
and then after some passes
main = @EntryPoint("RayMiss") @Name("main") @Leaf @Exported("main") Function(params: [], return_types: [], body: {
%18 = AbsMem(abs: main)
%123_physical: Invocation %20 = @Name("%123_physical") LocalAlloc(mem: %18, type: u32)
memory_Private: Invocation %32 = @Name("memory_Private") LocalAlloc(mem: %123_physical, type: %31)
stack_ptr: Invocation u32 = @Name("stack_ptr") Call(mem: memory_Private, callee: generated_init, args: [memory_Private, %123_physical, 0])
stack_ptr: Invocation u32 = @Name("stack_ptr") Call(mem: stack_ptr, callee: main, args: [memory_Private, %123_physical, stack_ptr])
stack_ptr: Invocation u32 = @Name("stack_ptr") Call(mem: stack_ptr, callee: generated_fini, args: [memory_Private, %123_physical, stack_ptr])
Return(mem: stack_ptr, args: [])
})
I think I need to look at how the output storage class is handled so the incoming ray payload storage class variable isn't considered unused when written to
looks like it got nuked
looks like output has a size requirement in the fragment execution model
I'm just going to trace the path the output address space takes with debug logging
it gets nuked after this in l2s.c in the passes section below
shd_log_fmt(DEBUGVV, "Shady module parsed from LLVM:\n");
shd_log_module(DEBUGVV, dirty);
because I see it at this point still
I still see it in the "Parsed program successfully" in shd_driver_compile
oh that's after the frontend is done
so not being nuked in the frontend
hrm
it's not in the IR but I see it in the spirv
weird
oh
there's not a print method for this maybe
ok spirv-val passes
OpEntryPoint MissKHR %main "main" %hitValue %SubgroupLocalInvocationId %SubgroupId %resume_at %scheduler_cursor %actual_subgroup_size %scheduler_vector %next_fn %active_branch
hrm I don't know if this is correct since there's so much output in the spirv
%_ptr_IncomingRayPayloadKHR_v3float = OpTypePointer IncomingRayPayloadKHR %v3float
it looks right
change to my fork of Sascha's examples:
- if ((value != "glsl") && (value != "hlsl") && (value != "slang")) {
+ if ((value != "glsl") && (value != "hlsl") && (value != "slang") && (value != "vcc")) {
it works lol
proof from renderdoc that it used the vcc shader for miss
you can tell because of the massive amount of spirv lol
the glsl output of the spirv in renderdoc for comparison
hmm it can't show the actual source?
oh you probably can't debug ray tracing pipelines
actual c source?
yeah
not sure I understand
yeah I mean it would have to understand pdb debug info to do that?
I have no idea
doesn't shady let you generate debug info
it does, and I can see it in a debugger like visual studio
or rad debugger
well
if I use the c backend I guess?
I have never used it
there is SPIR-V debug info output
that vcc emits
it is a massive amount of debug output
ohhh
I know
you're right
you can see that in renderdoc
I need to pick the other option
spirv-dis output shows that
but renderdoc's own spirv disassembler doesn't
ok!
it works
now to do the same thing for the rhit shader
one down two to go
I may look at the ray query example too since I'm at it? idk depends on how hard the rest of these shaders are
I'm talking about the shaders
I don't think you can inspect those in a normal debugger
does that last screenshot show what you were talking about?
yeah probably
the only debug output in SPIRV IR shaders is OpName and a couple of other things, and those are visible in that last screenshot
but RenderDoc should be able to show the actual source
oh you can I think actually communicate line numbers
and a file path?
wait I am thinking of llvm IR
I'm not sure how RenderDoc would know what the original file the SPIRV was created from
since Sascha's program is only uploading the SPIRV from disk
check the link I just sent
ohh
with glslang you'd use the -gVS flag to generate that stuff
I actually see SpvOpExtInstImport support in Shady's spirv backend
I'd have to look how to emit it
it's not in my SPIRV
then you can see the actual glsl source in RenderDoc, not goofy decompiled stuff
hrm not sure
rereading that document, I think it would require more extensive support than what I noticed in the backend maybe
ok time to add support for the ClosestHitKHR execution model and RayPayloadKHR storage class
these are all my changes to my own Shady fork with the work I did to get the Miss shader to work https://github.com/btipling/shady/pull/3/files
now I need to come up with the c version of the Sascha hit shader
this is the glsl
#version 460
#extension GL_EXT_ray_tracing : enable
#extension GL_EXT_nonuniform_qualifier : enable
layout(location = 0) rayPayloadInEXT vec3 hitValue;
hitAttributeEXT vec2 attribs;
void main()
{
const vec3 barycentricCoords = vec3(1.0f - attribs.x - attribs.y, attribs.x, attribs.y);
hitValue = barycentricCoords;
}
man if I had just done ray query I'd be done already lol
maybe
maybe that's harder
I think this would be the c version of the shader
#include <shady.h>
#define ray_chit_shader __attribute__((annotate("shady::entry_point::RayCHit")))
#define ray_payload __attribute__((annotate("shady::io::400")))
#define hit_attribute __attribute__((annotate("shady::io::401")))
location(0) ray_payload native_vec3 hitValue;
hit_attribute native_vec2 attribs;
ray_chit_shader void main() {
const native_vec3 barycentricCoords = (native_vec3){1.f - attribs.x - attribs.y, attribs.x, attribs.y};
hitValue = barycentricCoords;
}
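for reference, the barycentric math in that shader can be checked on the CPU with plain C. This is just a sketch: `vec2`, `vec3` and `barycentrics` are stand-in names I made up, not the `native_vec*` types from shady.h:

```c
/* Stand-in vector types, not the ones from shady.h. */
typedef struct { float x, y; } vec2;
typedef struct { float x, y, z; } vec3;

/* attribs carries the weights of the second and third vertex;
 * the first vertex gets whatever is left so the three sum to 1. */
vec3 barycentrics(vec2 attribs) {
    vec3 b = { 1.0f - attribs.x - attribs.y, attribs.x, attribs.y };
    return b;
}
```

one thing this makes obvious: if attribs arrives as all zeros the result is (1, 0, 0), so a zeroed hit attribute would paint the whole triangle red rather than hide it.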
I actually have to add two new storage classes, HitAttributeKHR and RayPayloadKHR
after I spend some time on this shader I need to do something with my project, probably move from bat files to a zig build
because last saturday I spent all day learning shady, didn't commit and now I have a big gaping hole in my git history 
shameful
oh that's a friday actually
I should probably support any hit also
ezpz
C:\Users\Bjorn\projects\code\shady\build>ctest -C Debug -R vcc_basic_chit --output-on-failure
Test project C:/Users/Bjorn/projects/code/shady/build
Start 50: vcc_basic_chit
1/1 Test #50: vcc_basic_chit ................... Passed 1.16 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 1.19 sec
C:\Users\Bjorn\projects\code\shady\build>spirv-val output.spv
C:\Users\Bjorn\projects\code\shady\build>
OpEntryPoint ClosestHitKHR %main "main" %attribs %hitValue %SubgroupLocalInvocationId %SubgroupId %resume_at %scheduler_cursor %actual_subgroup_size %scheduler_vector %next_fn %active_branch
%uint_0 = OpConstant %uint 0
%19 = OpTypeFunction %uint %_ptr_Function_uint %_ptr_Function_v2float %_ptr_Function__arr_uint_ulong_1027 %uint
%_ptr_HitAttributeKHR_v2float = OpTypePointer HitAttributeKHR %v2float
%attribs = OpVariable %_ptr_HitAttributeKHR_v2float HitAttributeKHR
%_ptr_RayPayloadKHR_v3float = OpTypePointer RayPayloadKHR %v3float
%hitValue = OpVariable %_ptr_RayPayloadKHR_v3float RayPayloadKHR
alright
let's try it with Sascha's vulkan example
hrm
no errors but doesn't show a triangle now
I bet hit attribute is just zeroed out
something's missing
I wish there was a pipeline state for this raytracing thing
so I could see what the values are that the shader is getting
maybe I try nsight
can't see anything sucks
I'll return a different value based on whether the hit attribute is 0 or not
that should tell me
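roughly what I mean, sketched as plain C so it can be checked off-GPU. The types and the `debug_hit_color` name are made up for illustration, and the colors are arbitrary:

```c
/* Stand-in vector types, not the ones from shady.h. */
typedef struct { float x, y; } vec2;
typedef struct { float x, y, z; } vec3;

/* Debug-only: magenta if the hit attribute is exactly zero, green otherwise.
 * A real hit exactly on the first vertex also has (0, 0), but that is a
 * single point, so a fully magenta triangle means the attribute is zeroed. */
vec3 debug_hit_color(vec2 attribs) {
    const vec3 magenta = { 1.0f, 0.0f, 1.0f };
    const vec3 green   = { 0.0f, 1.0f, 0.0f };
    int zeroed = (attribs.x == 0.0f) && (attribs.y == 0.0f);
    return zeroed ? magenta : green;
}
```

in the shader the return value would be written to hitValue instead of the barycentrics.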
maybe I can get vvls via vconfig from the app
aha VVLs
Validation Error: [ VUID-RuntimeSpirv-None-06275 ] | MessageID = 0x1c10ef47
vkCreateShaderModule(): VkPhysicalDeviceShaderSubgroupExtendedTypesFeatures::shaderSubgroupExtendedTypes was not enabled.
The Vulkan spec states: shaderSubgroupExtendedTypes must be enabled for group operations to use 8-bit integer, 16-bit integer, 64-bit integer, 16-bit floating-point, and vectors of these types (https://vulkan.lunarg.com/doc/view/1.4.313.2/windows/antora/spec/latest/appendices/spirvenv.html#VUID-RuntimeSpirv-None-06275)
Validation Error: [ VUID-VkShaderModuleCreateInfo-pCode-08740 ] | MessageID = 0x6e224e9
vkCreateShaderModule(): SPIR-V Capability Int64 was declared, but one of the following requirements is required (VkPhysicalDeviceFeatures::shaderInt64).
The Vulkan spec states: If pCode is a pointer to SPIR-V code, and pCode declares any of the capabilities listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied (https://vulkan.lunarg.com/doc/view/1.4.313.2/windows/antora/spec/latest/chapters/shaders.html#VUID-VkShaderModuleCreateInfo-pCode-08740)
I don't think these are it
the triangle showed fine with the miss
oh
maybe miss wasn't working
and I just couldn't tell
since miss doesn't do anything 
fml
%76 = OpUndef %v3float
%82 = OpCompositeInsert %v3float %81 %76 0
the fuck
that's probably
location(0) ray_payload native_vec3 hitValue;
hrm
maybe it not showing in the IR is a problem after all
idk
I think I need to set up the RT pipeline in my project and do it that way
I'll get the C version of Sascha's RayGen validating, then I'll add the RT pipeline to my project and handle any of the VVLs
if I run into problems I'll make it work with slang shaders
and then I'll try and figure out why valid shaders aren't working
hrm
have realized the AS is a bit more complex
it looks like the nodes in the grammar.json are supplemented by specific imports via spv_imports.json
it currently has "OpTypeSampler": {}, "OpTypeImage": {}, "OpTypeSampledImage": {}
I would be adding OpTypeAccelerationStructureKHR to it for the AS
I am not sure
it is pretty different from those
in the spv from the GLSL it doesn't make an appearance
it looks just like an opaque descriptor
this might be me being ignorant, but I kind of feel like you don't really need to know anything at all about a descriptor in the SPIR-V
you just know it's a descriptor
maybe on the frontend level you want the user to declare a type
so the compiler can throw an error if you attempt to give an image2D to OpTraceRayKHR
but in the SPIR-V they're just like a void *
completely opaque
the device will know what it is by the descriptor set and binding indices
like it could be you just have a single OpaqueDescriptor_TAG as the node
and just type erase whatever it is away
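a rough C sketch of that idea, purely hypothetical: none of these names exist in Shady or SPIR-V, and real SPIR-V does still declare a type for each of these (OpTypeAccelerationStructureKHR and so on), so this only mirrors the "frontend checks the kind, backend only cares about set and binding" thought:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: what the frontend would know about a descriptor. */
typedef enum {
    DESC_SAMPLER,
    DESC_IMAGE,
    DESC_SAMPLED_IMAGE,
    DESC_ACCEL_STRUCT
} DescriptorKind;

/* Hypothetical single node for every descriptor. */
typedef struct {
    DescriptorKind kind;   /* only used for frontend type checks */
    uint32_t set;          /* descriptor set index */
    uint32_t binding;      /* binding index within the set */
} OpaqueDescriptor;

/* Frontend check: only an acceleration structure may be traced against. */
bool can_trace_ray_with(const OpaqueDescriptor *d) {
    return d->kind == DESC_ACCEL_STRUCT;
}

/* Type-erased view: two descriptors are the same slot if set and binding match. */
bool same_slot(const OpaqueDescriptor *a, const OpaqueDescriptor *b) {
    return a->set == b->set && a->binding == b->binding;
}
```

so passing an image to a trace call would be rejected by `can_trace_ray_with` in the frontend, while everything downstream could treat the node as just a (set, binding) pair.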
this kind of all explains to me finally why the VK descriptor API is the way it is
working with Shady has been such a massive amount of intense learning for the last two weeks, it's been great
"filter-name": { "OpTypeSampler": {}, "OpTypeImage": {}, "OpTypeSampledImage": {}, "OpTypeAccelerationStructureKHR": {} },
hrm
it is classless
so running the generator didn't generate much for it
I think the SPIRV just needs to decorate it with its binding and descriptor set and declare it as a uniform constant
I guess this would be a new high level address space in l2s_type
this is vestigial and will die fully soon if it hasn't already
oh ok
ohh
I used the wrong storage class in my closest hit
there's a ray_payload that the ray generation execution model gets
and a ray_payload_in that the closest hit execution model gets
but I gave it the ray_payload storage class
that wouldn't work at all would it
hrmmm
that's just IncomingRayPayloadKHR
which is what I used in miss
great
that didn't fix anything
I think the VVLs are the issue, but that was clearly wrong
raygen gets the ray_payload
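writing the rule down as a small C check so I stop mixing them up. The IncomingRayPayloadKHR row comes straight from the spirv-val message earlier (VUID 04699); the RayPayloadKHR row is from my memory of the Vulkan SPIR-V environment rules and should be double checked against the spec. All the enum and function names are made up:

```c
#include <stdbool.h>

/* Hypothetical enums mirroring the SPIR-V ray tracing execution models. */
typedef enum {
    EM_RAYGEN, EM_INTERSECTION, EM_ANYHIT, EM_CLOSESTHIT, EM_MISS, EM_CALLABLE
} ExecModel;

typedef enum { SC_RAY_PAYLOAD, SC_INCOMING_RAY_PAYLOAD } PayloadClass;

bool payload_class_allowed(ExecModel em, PayloadClass sc) {
    switch (sc) {
    case SC_INCOMING_RAY_PAYLOAD:
        /* From the validator output: AnyHitKHR, ClosestHitKHR, MissKHR. */
        return em == EM_ANYHIT || em == EM_CLOSESTHIT || em == EM_MISS;
    case SC_RAY_PAYLOAD:
        /* Assumption from memory: RayGenerationKHR, ClosestHitKHR, MissKHR. */
        return em == EM_RAYGEN || em == EM_CLOSESTHIT || em == EM_MISS;
    }
    return false;
}
```

if that second row is right, RayPayloadKHR is legal in a closest hit shader (it is the payload for rays the hit shader itself traces), which would explain why spirv-val passed even though it was the wrong class for receiving the incoming payload.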
I think all I need now is to do the work to add the AS type stuff and make sure it gets into the spirv as a uniform constant
and add OpTraceRayKHR
I'll at least get this to validating, but I'm in way over my head on both the RT side and the Shady side, so I'll just get it validating, then learn more about RT, maybe try these shaders to see if they work, and decide what to do next
tomorrow anyway
