#Iris - A Journey through OpenGL and beyond to learn Graphics
same
very good
Gone
all these vulkan structures lost to time, like tears in the rain
If Patrick (or me) actually rewrites task graph there will be so many bugs
I will not pull master for a month at least
oh yeah I am worried about looking at my frame graph again
there is only so many neurons in the human brain
it's either vulkan or your abstraction
can't have both
I really need to look into timeline semaphores though, I initially wrote them off
i am just putting makeup on
i actually start to be backwards compatible + deprecation now
too many people shitting on me when i break things now
I only shit on you because I'm tilted from my bugs and you are near

I always pull by accident in the middle of something and then have to spend 2 years updating
I hate C API though
Wish we could deprecate that
human slavery
continuous integration

The ShaRT is finally complete
resource/state tracking is cancer
Now it's time for the even funnier part
edge cases 
The ShaRT or shader resource table is a big ass descriptor set that keeps track of resources bound to it
it's very useful because I can turn my brain off and get OpenGL level of ease when binding resources
```cpp
template <uint64 N, typename T>
struct _descriptor_table_t {
  struct write_info_t {
    using content_type = std::conditional_t<std::is_same_v<T, arc_ptr<buffer_t<uint8>>>, buffer_info_t, image_info_t>;
    uint32 binding = -1;
    uint32 offset = -1;
    content_type content = {};
  };
  std::array<T, N> resources = {};
  std::array<uint64, N / 64> used = {};
  std::vector<write_info_t> updates;
};

_descriptor_table_t<max_descriptors_in_binding, arc_ptr<buffer_t<uint8>>> _buffer_table;
_descriptor_table_t<max_descriptors_in_binding, arc_ptr<sampler_t>> _sampler_table;
_descriptor_table_t<max_descriptors_in_binding, arc_ptr<image_t>> _image_table;
_descriptor_table_t<max_descriptors_in_binding, arc_ptr<texture_t>> _texture_table;
arc_ptr<descriptor_set_t> _set;
```
i like the lower case naming on first glance
something unusual
looks tempting
arc_ptr is also a sexy name
intel only
the underscore isms for types is ugly af
but i like the lowercase_t stuff too
i suppose _foo_t is supposed to be "internal" only
still ugly 😄
yet it has an internal type that doesn't have an underscore
what is arc supposed to be?
atomic reference counted
because it's not internal relative to the parent type
but it's probably better if all children of an internal type are classified as internal too
for readability
its weird
the more i look at iris, the more i realise thats exactly how i write c# code
besides the self and (auto -> returntype) nonsense its quite neat too how you write code : )
I steal concepts from other langs 
says the rust hater
not me hating rust while integrating half of their features 
EEE but awesome
lvstri my man
how can i use your buffer_t in conjunction with std::make_unique
your buffer is using a neat factory method to create the thing... buffer_t::create which returns a buffer_t... but i would like to
std::unique_ptr<buffer_t> _vertexBuffer and _vertexBuffer = std::make_unique(buffer_t::create(......));
@wicked notch
You can just return buffer and C++ will take care of everything for you
I think buffer needs to be movable though
for ptr = std::make_unique<T>(T::create()); to work
hmmf
I return arc_ptr<buffer_t> if you notice
just removing the arc_ptr part will let you use make_unique, assuming your buffer is movable
i use Iris' bufferisms, not irisvk
: >
let me check one sec
shall i peek at irisvks'?
ok ye
my buffer_t in GL Iris should work fine with make_unique I think
did you try it and it doesn't work?
i am getting a template dump of isms
let me see if i can copy it out somehow
[build] /home/deccer/Personal/Code/Projects/lessGravity/OpenSpacePP/src/GameClient/GameApplication.cpp:52:121: error: no match for ‘operator=’ (operand types are ‘std::unique_ptr<Buffer>’ and ‘Buffer’)
[build] 52 | _vertexBuffer = Buffer::Create("Buffer_Vertices_Triangle", GL_ARRAY_BUFFER, SizeInBytes(_vertices), GL_MAP_WRITE_BIT);
[build] | ^
[build] In file included from /usr/include/c++/13.2.1/memory:78,
[build] from /home/deccer/Personal/Code/Projects/lessGravity/OpenSpacePP/src/Engine/Include/Engine/Application.hpp:4,
[build] from /home/deccer/Personal/Code/Projects/lessGravity/OpenSpacePP/src/GameClient/Include/GameClient/GameApplication.hpp:3,
[build] from /home/deccer/Personal/Code/Projects/lessGravity/OpenSpacePP/src/GameClient/GameApplication.cpp:1:
[build] /usr/include/c++/13.2.1/bits/unique_ptr.h:430:9: note: candidate: ‘template<class _Up, class _Ep> constexpr typename std::enable_if<std::__and_<std::__and_<std::is_convertible<typename std::unique_ptr<_Up, _Ep>::pointer, typename std::__uniq_ptr_impl<_Tp, _Dp>::pointer>, std::__not_<std::is_array<_Up> > >, std::is_assignable<_Dp&, _Ep&&> >::value, std::unique_ptr<_Tp, _Dp>&>::type std::unique_ptr<_Tp, _Dp>::operator=(std::unique_ptr<_Up, _Ep>&&) [with _Ep = _Up; _Tp = Buffer; _Dp = std::default_delete<Buffer>]’
[build] 430 | operator=(unique_ptr<_Up, _Ep>&& __u) noexcept
[build] | ^~~~~~~~
[build] /usr/include/c++/13.2.1/bits/unique_ptr.h:430:9: note: template argument deduction/substitution failed:
[build] /home/deccer/Personal/Code/Projects/lessGravity/OpenSpacePP/src/GameClient/GameApplication.cpp:52:121: note: ‘Buffer’ is not derived from ‘std::unique_ptr<_Tp, _Dp>’
[build] 52 | _vertexBuffer = Buffer::Create("Buffer_Vertices_Triangle", GL_ARRAY_BUFFER, SizeInBytes(_vertices), GL_MAP_WRITE_BIT);
[build] | ^
[build] /usr/include/c++/13.2.1/bits/unique_ptr.h:414:19: note: candidate: ‘constexpr std::unique_ptr<_Tp, _Dp>& std::unique_ptr<_Tp, _Dp>::operator=(std::unique_ptr<_Tp, _Dp>&&) [with _Tp = Buffer; _Dp = std::default_delete<Buffer>]’
[build] 414 | unique_ptr& operator=(unique_ptr&&) = default;
[build] | ^~~~~~~~
[build] /usr/include/c++/13.2.1/bits/unique_ptr.h:414:29: note: no known conversion for argument 1 from ‘Buffer’ to ‘std::unique_ptr<Buffer>&&’
[build] 414 | unique_ptr& operator=(unique_ptr&&) = default;
[build] | ^~~~~~~~~~~~
[build] /usr/include/c++/13.2.1/bits/unique_ptr.h:440:7: note: candidate: ‘constexpr std::unique_ptr<_Tp, _Dp>& std::unique_ptr<_Tp, _Dp>::operator=(std::nullptr_t) [with _Tp = Buffer; _Dp = std::default_delete<Buffer>; std::nullptr_t = std::nullptr_t]’
[build] 440 | operator=(nullptr_t) noexcept
[build] | ^~~~~~~~
[build] /usr/include/c++/13.2.1/bits/unique_ptr.h:440:17: note: no known conversion for argument 1 from ‘Buffer’ to ‘std::nullptr_t’
[build] 440 | operator=(nullptr_t) noexcept
[build] | ^~~~~~~~~
i apologise for this wall of isms
i cant take a screenshot of the popup in vscode, when i do the moment i press enter to capture the popup disappears ;C
I don't see le make_unique though, did you forgor?
one sec
_vertexBuffer = Buffer::Create("Buffer_Vertices_Triangle", GL_ARRAY_BUFFER, SizeInBytes(_vertices), GL_MAP_WRITE_BIT);
left out the make unique because you said so earlier
Oh my bad yeah you need the make_unique here
try this: _vertexBuffer = std::make_unique<Buffer>(Buffer::Create("Buffer_Vertices_Triangle", GL_ARRAY_BUFFER, SizeInBytes(_vertices), GL_MAP_WRITE_BIT));
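A minimal sketch of why the `make_unique` version compiles while the plain assignment does not: `unique_ptr` has no assignment from a `Buffer` value, but `std::make_unique<Buffer>(Buffer&&)` heap-allocates a `Buffer` and move-constructs it from the factory's return value. The `Buffer` class below is a hypothetical stand-in (no GL calls, made-up members), and it assumes the real class is movable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the Buffer class discussed above; the real one
// wraps an OpenGL buffer object, this one only stores a label and a size.
class Buffer {
public:
  // Factory function returning the buffer by value, like Buffer::Create.
  static auto Create(std::string label, std::size_t sizeInBytes) -> Buffer {
    return Buffer(std::move(label), sizeInBytes);
  }

  // Move-only: copying a GPU resource handle would end in a double delete.
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;
  Buffer(Buffer&&) noexcept = default;
  Buffer& operator=(Buffer&&) noexcept = default;

  auto Label() const noexcept -> const std::string& { return _label; }
  auto SizeInBytes() const noexcept -> std::size_t { return _sizeInBytes; }

private:
  Buffer(std::string label, std::size_t sizeInBytes)
    : _label(std::move(label)), _sizeInBytes(sizeInBytes) {}

  std::string _label;
  std::size_t _sizeInBytes = 0;
};

// make_unique<Buffer> forwards the temporary to Buffer's public move constructor.
inline auto MakeVertexBuffer() -> std::unique_ptr<Buffer> {
  return std::make_unique<Buffer>(Buffer::Create("Buffer_Vertices_Triangle", 36));
}
```

One caveat for a real GL wrapper: a defaulted move constructor copies a raw `GLuint` id without clearing the source, so the moved-from object's destructor would still delete the buffer. A hand-written move that zeroes the source id is the usual fix.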
struct custom_hash_unique_object_representation {
using is_avalanching = void;
[[nodiscard]] auto operator()(point const& f) const noexcept -> uint64_t {
static_assert(std::has_unique_object_representations_v<point>);
return ankerl::unordered_dense::detail::wyhash::hash(&f, sizeof(f));
}
};
is ankerl using those usings as comments? there is no use of is_avalanching nor void
ive seen you have it all over the place in irisvk too
ah he explains it
i was just too impatient
weird thing either way
Now you can never complain about my hashisms again
: )
yours has a lot of ...voodoo in it
i admire you guys for understanding all that c++ nonsense, when to use what and how : )
i am also more on the c# side but I imagine its like learning OpenGL
i would say its 100 times more complicated
opengl has no template nonsense, no variadics, no this-and-that-elision
But at the end of the day it's just a programming language and we're all familiar with those already
Learning OpenGL without knowing another API is hard af
when i have
struct A
{
XXX x;
};
struct B
{
YYY y;
};
struct C
{
A a;
A aa;
std::span<const B> bs;
};
my understanding is that i have to implement operator== for all of them first
if i want to put them in a set-like-container
you can do = default; on them
yeah for trivial types?
if the members support comparison already
is that span covered by that as well?
struct C
{
A a;
A aa;
std::span<const B> bs;
bool operator==(const C& other) const
{
return a == other.a &&
aa == other.aa &&
bs == other.bs;
}
};
hmm i believe i saw some std::funky_v thing which can detect whether an array is different, not sure where i did
apparently std::span doesn't implement comparison
but you could use this
https://en.cppreference.com/w/cpp/algorithm/ranges/equal
should be as shrimple as && std::ranges::equal(bs, other.bs)
nice
and now i need to impl a hash function per type too, if i dont want to solve it by 1 template
you can also default <=> for easy mode comparisons
yeah jakerino mentioned it, for trivial types i can just = default
<=> is different
ah its something else
it also does the < and > operators at the same time
:S
it's the "spaceship" operator (your favorite)
: )
ah this one also mentions span_ext
remember to write a specialization for equal_to as well
unless
no nvm you can avoid that
specialization means template?
ye but you can avoid that if you just default/define == no prob
i want to try solve this in a naive fashion first
no unreadable templates for now if possible heh
you have ugly macros to solve unreadable templates though 
hehe
i suppose what im trying right now will also not work
make hash functions for all of those
struct A
{
XXX x;
};
struct B
{
YYY y;
};
struct C
{
A a;
A aa;
std::span<const B> bs;
};
struct HashFunctions
{
std::size_t operator() (const XXX& xxx) const
{
auto h1 = std::hash<whateverX1is>()(xxx.x1);
return h1;
}
std::size_t operator() (const A& a) const
{
auto h1 = std::hash<XXX>()(a.x);
return h1;
}
std::size_t operator() (const YYY& yyy) const
... and so forth... for all types involved
};
yeah you basically need a hash function for everything
the only nice thing is that each thing only needs to call the hasher on each of its members, as long as those members (and its members, etc.) have a hash function
😛
the plain xor should also be wrong, usually you multiply by some prime number and xor, but thats not important here right now
i saw yours
I wish I better understood how it worked exactly
like why they use giant prime numbers instead of just xoring everything like u
avoid collisions would be my guess
ah it probably rotates the bits "randomly"
if you hash two of the same value with your thing, you'll end up with 0 when they get xor'd
but a smarter hasher will have a seed that gets randomized before it combines one hash with another
that makes sense
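To make the xor problem concrete, here is a minimal sketch comparing plain xor with a Boost-style combine. The mixing formula is one common recipe and not necessarily what the `Hash::CombineHash` shown further down does; `SPair` and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Plain xor: two equal member hashes cancel out to 0 and member order is lost.
inline auto CombineXor(std::size_t seed, std::size_t value) noexcept -> std::size_t {
  return seed ^ value;
}

// Boost-style combine. The constant is derived from the golden ratio and the
// shifts make the result depend on the order in which values are fed in.
inline auto CombineMixed(std::size_t seed, std::size_t value) noexcept -> std::size_t {
  return seed ^ (value + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Hypothetical two-member type to hash.
struct SPair {
  int First = 0;
  int Second = 0;
};

inline auto HashXor(const SPair& pair) noexcept -> std::size_t {
  return CombineXor(std::hash<int>{}(pair.First), std::hash<int>{}(pair.Second));
}

inline auto HashMixed(const SPair& pair) noexcept -> std::size_t {
  auto seed = std::size_t(0);
  seed = CombineMixed(seed, std::hash<int>{}(pair.First));
  seed = CombineMixed(seed, std::hash<int>{}(pair.Second));
  return seed;
}
```

With the xor version, `{7, 7}` always hashes to 0 and `{1, 2}` collides with `{2, 1}`, which is exactly the failure mode described above.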
ah you also only use it for the VertexArrayCache
not for the framebuffercache
I made the framebuffer cache a vector for some reason
perhaps one of the two existed before the other
could do the same for the vao cache
not great, not terrible
size_t GetFramebufferInfoHash(const FramebufferInfo& framebufferInfo)
{
size_t hashValue = {};
auto part1 = std::make_tuple(framebufferInfo.DepthAttachment.Attachment._id, framebufferInfo.DepthAttachment.ClearOperation);
Hash::CombineHash(hashValue, Hash::Hash<decltype(part1)>{}(part1));
auto part2 = std::make_tuple(framebufferInfo.StencilAttachment.Attachment._id, framebufferInfo.StencilAttachment.ClearOperation);
Hash::CombineHash(hashValue, Hash::Hash<decltype(part2)>{}(part2));
for (const auto& colorAttachment : framebufferInfo.ColorAttachments)
{
auto part3 = std::make_tuple(colorAttachment.Attachment._id, colorAttachment.ClearOperation);
Hash::CombineHash(hashValue, Hash::Hash<decltype(part3)>{}(part3));
}
return hashValue;
}
btw I'm not sure if caching framebuffers is worth it
compared to just having one framebuffer that you respec a bunch
hmm
id still need to cache those specs
that i know when to respec the single fbo
or just respec every frame?
how often are you binding the same set of render targets
0 per frame
yeah
and you do it every time you bind a new set of render targets
yes
yep, right now im not caching anything at all
and that will most likely be fine too
bindpipeline will enable/disable state, bindasubo/bind-ass-sbo/bindimage/texture will just run per frame, then setting the fbo can run per frame too
i think i wanted to use the cache as impl detail
to hide the fbo creation
instead of
_fbo1 = createfbo(attachments...)
_fbo2 = createfbo(otherattachments...);
while (true)
{
beginrender(_fbo1);
bindpipeline(_pipeline1);
...
endrender();
...
}
have
...
beginrender({ ... describe fbo info here ... })
...
you can do exactly that when re-modifying a single fbo
quack
fresh off the press
for a fraction of a second
I'd hoped to see some epic algorithm that generated meshlets on the fly like we do in frogfood
rip
the paper links this repo
https://github.com/Senbyo/meshletmaker
I looked at a random source file and saw
#define TINYOBJLOADER_IMPLEMENTATION
so I'm guessing it's not supposed to be used as a library
😄
a bunch of empty files too
cleaning indices isnt important enough
std::vector<uint32_t> AreaWeightedTriangleList(const std::vector<Triangle*>& triangles, const Vertex* vertexBuffer) {
do they normally not review the language in the paper to make sure it uses correct spelling and grammar
id hope somebody does
I'm on page 1 and I've found several issues already 😄
oof
revive thyself!
ok so big update, the shader resource table is now fully functional
also big update, it's winter break!
I have one week of time before death
ill get bob the necromancer, after you have passed, no worries
you cant escape VSM/Nanite 2.0 🙂
Also, I think I'm going to do a slight rewrite of IrisVk
there's a lot of clutter and stuff from the past
time for me to integrate renderpass2 (double the renderpasses to put the world into equilibrium)
Reject render pass, return to immediate drawing (GPUs are fast) 
vkQueueSubmit for every triangle 

I have a question for my boy potrick
Would you mind explaining why this line here works: https://github.com/Ipotrick/Daxa/blob/75f4b85da63af1ad01de0934d479a27c8307ed6f/src/impl_swapchain.cpp#L179-L183?
I have the same thing in my code, except I wait for the next target value
```cpp
auto SyncHostTimeline::WaitForNextTimelineValue() noexcept -> uint64 {
  const auto target = static_cast<int64>(_hostTimelineValue) - static_cast<int64>(_maxTimelineDifference);
  _deviceTimelineSemaphore->Wait(std::max<int64>(target + 1, 0));
  return _hostTimelineValue % _maxTimelineDifference;
}
```
My reasoning is:
First FIF: wait for timeline 0 -> signals timeline 1
Second FIF: wait for timeline 0 -> signals for timeline 2
First FIF Again: wait for timeline 1 -> signals for timeline 3
...
Here _hostTimelineValue starts out as 0 and _maxTimelineDifference is 2
hmm your reasoning makes sense rn
i dont remember the details
When I'm back home in 2 days i ll take another look
sure thing, merry christmas potrick 
merry Christmas lvstri 🌲
If you do this you only have 1 frame in flight no?
As frame 0 is your worked on frame, frame 1 is your frame in flight, and frame 2 waits for frame 0 to finish - so to become the frame in flight instead of frame 1
That's what happens though isn't it?
0 waits NONE (or 0) signals 1
1 waits NONE (or 0) signals 2
2 waits 1 signals 3
3 waits 2 signals 4
...
Well in Daxa (with 2 frames in flight), 2 would not wait for 1 but wait for 0 no?
As the previous signal value is 2 - 2fif = 0
so 2 frames in flight means that the CPU can prepare 3 frames before waiting right?
Yes
gotcha
(I was also confused about that, before Patrick explained)
I defined frames in flight as the number of frames that the CPU can prepare
so when you make buffers that are updated from the CPU you allocate (FRAMES_IN_FLIGHT + 1) * capacity instead of just FIF * capacity?
We should
Now thinking about it I'm not sure I do in my projects hehe
Because I, until recently, thought the same and thought FIF = CPU buffered frames
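A minimal sketch of the sizing rule being discussed, under the definition used here where N frames in flight means the CPU can be writing one frame while N older frames may still be read by the GPU. `RingSlotCount` and `RingOffset` are hypothetical helpers, not Daxa or IrisVk functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of per-frame regions needed when the CPU may record one frame while
// `framesInFlight` older frames are still executing on the GPU.
constexpr auto RingSlotCount(std::uint32_t framesInFlight) noexcept -> std::uint32_t {
  return framesInFlight + 1;
}

// Byte offset of the region that frame number `frame` writes into.
constexpr auto RingOffset(
  std::uint64_t frame,
  std::uint32_t framesInFlight,
  std::size_t capacity
) noexcept -> std::size_t {
  return static_cast<std::size_t>(frame % RingSlotCount(framesInFlight)) * capacity;
}
```

With 2 frames in flight and a capacity of 256 bytes the buffer would be 768 bytes, and frame 3 reuses the region of frame 0, which by then has to be finished.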
we have many comments in the cpp for swapchain
without them i wouldnt be able to reunderstand
sadly they only seem to be enough for me to understand. Probably cause i wrote them for me 
llvm's cmake tries to run sh on windows if you don't use MSVC
How can people be so big brained to make LLVM and then proceed to make the most unhinged, retarded and stupid ass cmake build script ever known to man
ihatewindowsihatewindowsihatewindowsihatewindows
Completely random errors
llvm cmake can't find python with MSVC
made sure it was in the path and all
perfect timing, you're getting your brain in a twist with FIF and I'm finally upgrading to timeline semaphores
the +1 is quite weird though
it comes from my misunderstanding for what a FIF is
I thought that N FIF meant "The CPU can prepare N frames before waiting"
that's my understanding, and I'm pretty sure that's how my code works, e.g. 0 FIF would mean nothing happens
1 FIF means that, if my CPU takes less time to submit+present than my GPU takes to execute, I will have to wait on my sema every frame
ye, if you define FIF like that then you need to either target the next timeline value, or decrease the max timeline difference by one
yeah, that's what I do, doing that difference thing seems weirdly complicated. I literally just do
WaitSemaphores(frameIndex);
Submit(signal = frameIndex + 1);
++frameIndex;
and I use frameIndex % FIF for the binary semas on my swapchain
nah, frame 0 waits for 0 which is a no-op since that's the start of the semaphore, frame 1 waits for 1 because of the ++frameIndex, and the last submit signals frameIndex + 1, so for frame 0 that'd be 1
wait hold on let me think about this some more...
you likely want this behavior
for 2 FIF
yeah right shit
I was just happy that switching from fences to timeline semas ran with no validation errors so I didn't think about it lol
How would this work for more fif
You never take the number of fif into consideration
lmao yep, time to rethink it (and probably make it work the same way as yours)
I guess it's just
signal(frame) = frame + 1
wait(frame) = max(frame - (FIF - 1), 0)
ye
and I'm guessing you don't do FIF - 1 hence the discrepancy
I do max(frame - FIF + 1, 0) which is the same thing
yeah, I meant that the - 1 is the reason for the discrepancy in thinking that FIF frames is FIF + 1 frames prepared before waiting
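The two formulas above as a minimal sketch, assuming the timeline semaphore starts at 0 and that "N frames in flight" means the CPU can prepare N frames before it has to wait. `SignalValue` and `WaitValue` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Timeline value that frame number `frame` signals when its GPU work is done.
constexpr auto SignalValue(std::uint64_t frame) noexcept -> std::uint64_t {
  return frame + 1;
}

// Timeline value the CPU waits for before it starts preparing frame `frame`.
// Waiting for 0 is a no-op because the semaphore is assumed to start at 0.
// Precondition: framesInFlight >= 1.
constexpr auto WaitValue(std::uint64_t frame, std::uint64_t framesInFlight) noexcept -> std::uint64_t {
  const auto target =
    static_cast<std::int64_t>(frame) - static_cast<std::int64_t>(framesInFlight - 1);
  return static_cast<std::uint64_t>(std::max<std::int64_t>(target, 0));
}
```

For 2 frames in flight this reproduces the table from earlier: frames 0 and 1 wait for 0, frame 2 waits for 1 (the value frame 0 signals), frame 3 waits for 2. Dropping the `- 1` gives the other convention from the discussion, where frame 2 still waits for 0 and the CPU can prepare one more frame before blocking.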
How does a human being remove vcpkg's cmake integration
vcpkg integrate remove says there's no integration
but running cmake fatally fails with the message that vcpkg is missing
is there some FindVcpkg nonsense maybe?
there's a singular cmake variable in a cache somewhere
I have no idea where this "global cache" is
I have fixed everything
#include <Macros.hlsl>
struct SVertexOutput {
Location(0) float4 Position : SV_Position;
Location(1) float4 Color : Color;
};
SVertexOutput VSMain(uint vertexId : SV_VertexID) {
const float3 trianglePositions[] = {
float3( 0.5, 0.5, 0.0),
float3(-0.5, 0.5, 0.0),
float3( 0.0, -0.5, 0.0)
};
const float3 triangleColors[] = {
float3(1.0, 0.0, 0.0),
float3(0.0, 1.0, 0.0),
float3(0.0, 0.0, 1.0)
};
SVertexOutput output;
output.Position = float4(trianglePositions[vertexId], 1.0);
output.Color = float4(triangleColors[vertexId], 1.0);
return output;
}
struct SPixelOutput {
Location(0) float4 Color : SV_Target0;
};
SPixelOutput PSMain(SVertexOutput input) {
SPixelOutput output;
output.Color = input.Color;
return output;
}
le tringle with le HLSL
let's see how far I can push DXC before it gives up on me
Also, with new IrisVk the tringle sample is only 100 LOC
```hlsl
struct SCamera {
  float4x4 Projection;
  float4x4 View;
  float4x4 ProjView;
  float4 Position;
};

[[Binding(0, 0)]] StructuredBuffer<SCamera> b_cameraBuffer[];
```
this syntax is honestly pretty great
who said HLSL buffers suck
bindless when
;C
also funny how you went PascalCase : )
idk, devsh is working on it
ah
he's bullying DXC devs full time
aaah i remember some conversation about it
add bda to hlsl pls 🥺
I love dxc
```
ModuleProcessed("dxc-commit-hash: 93ad5b31");
ModuleProcessed("dxc-cl-option: -E VSMain -T vs_6_6 -spirv -fspv-target-env=vulkan1.3 -Zpc -Zi -O0 -I D:/Dev/CLion/Retina/Shaders");
struct SCamera {
  float4x4 Projection : [[RowMajor, MatrixStride(16), Offset(0)]];
  float4x4 View : [[RowMajor, MatrixStride(16), Offset(64)]];
  float4x4 ProjView : [[RowMajor, MatrixStride(16), Offset(128)]];
  float4 Position : [[Offset(192)]];
}
```
me: make the default column major please
DXC: oh yeah sure (he doesn't know lmao)
@glass sphinx
```hlsl
#ifndef RETINA_HLSL_BINDINGS_HEADER
#define RETINA_HLSL_BINDINGS_HEADER

#include <Macros.hlsl>

#define RETINA_MASTER_ADDRESS_BINDING [[Binding(0, 0)]]

RETINA_MASTER_ADDRESS_BINDING ByteAddressBuffer b_masterBuffer[];
RETINA_MASTER_ADDRESS_BINDING RWByteAddressBuffer b_rwMasterBuffer[];

#define DeclareAliasedStructuredBuffer(type, name) \
  RETINA_MASTER_ADDRESS_BINDING StructuredBuffer<type> name[];

#define DeclareAliasedRWStructuredBuffer(type, name) \
  RETINA_MASTER_ADDRESS_BINDING RWStructuredBuffer<type> name[];

#endif
```
here's my epic fix for HLSL not having BDA
Is there any better way?
i do exactly the same 
rip
i think i ll kick out structured buffer tho
why?
byteaddressbuffer is the best of the bunch
i find its interface the best after having used it a lot at work
you can only load uint's tho
maybe the docs are outdated?
they are its a little sus with the new things
the only downside of byteaddressbuffer is that it generates a lot of loads
it doesnt understand alignment past 4 bytes
so if make an aligned load/store of lets say 16 bytes it will cut it up into 4 spirv loads
sad life
damn
only for spirv supposedly
need to put that on devsh's agenda
afaik its also not a technical limitation they just implemented the spirv support for byteaddressbuffer poorly
could be wrong on that one
its fine TM
tho dont be afraid of the syntax suggesting you load a whole struct or store a whole struct
if you do this:
RWByteAddressBuffer b : register(u0);
struct Tester{ float4 vec; uint4 intvec; };
void test()
{
Tester t = b.Load<Tester>(some_offset);
t.vec[0] = 1.3f;
b.Store(some_offset, t);
}
This will consistently be optimized to only store the first element of vec and do nothing else
I saw some posts in #vulkan a while back about that generating different spirv than the equivalent GLSL, did they fix that or is it moreso that it converges at the driver level
that was about RawBufferLoad
and RawBufferWrite, the poopy bda integration for hlsl. ByteAddressBuffers do not have this problem.
does the same hold for StructuredBuffer<T>
yes
nice
oh so what you're saying is HLSL BDA can give you perf landmines if you're porting from GLSL
fun
definetly yes
well depends on how you do it
but they just dont optimize it properly
sadge
as I am seeing if you use StructuredBuffer<T> you're gonna be fine
BufferPtr in hlsl will save us all
ByteAddressBuffer is more tricky but it'll be fine as well
not for me tho. I use tons of non aligned offsets everywhere within my buffers
cant really use structured buffers for much in my engine
maybe 2024 will be the year of the HLSL BDA
damn, I've yet to encounter a scenario where I need non-aligned load/store

well i also have lots of buffers that have a counter in the first few bytes
i guess that is a common one you might encounter, no?
💀
what is unaligned in this scenario
well structured buffer requires alignment to the whole struct size
so you cant just put 4 bytes in the beginning
as your structs may have wild sizes
so you can't just be a smartass and make it part of your struct?
and have the rest of the data as the inner part
huuuuuuuuuuuuuuuh
i guess so
yea that works but then you cant have unbound anything
wait hold up how the hell do I translate to HLSL
```glsl
layout (...) buffer X {
  uint x;
  T data[];
};
```
???
yea cant do
💀
wow that sux
inb4 lvstri reinventing Cg
: )
glsl is fine tbh, gotta give hlsl more time to git gud
🤠
even potti wishes to go back to glsl but is stuck in d3d12
also it's kinda crazy how the vulkan sdk is so broken with other dependencies like shaderc or dxc lol
they have absolute paths to PDBs in their cmake apparently
lmao
can you not apply for some internship at lunarg or something
and finish your masters there
i dont approve of C prefix in Regina, but i understand why its there and therefore implicitly approve its use
```cpp
template <
  typename C,
  typename R = C::Resource,
  typename D = C::Descriptor,
  typename... Args
>
requires (std::same_as<D, SBufferDescriptor>)
RETINA_NODISCARD auto AllocateResource(
  SConstructResourceInlineTag,
  Args&&... args
) noexcept -> C;

template <
  typename C,
  typename R = C::Resource,
  typename D = C::Descriptor,
  typename... Args
>
requires (std::same_as<D, SBufferDescriptor>)
RETINA_NODISCARD auto AllocateResource(
  uint32 count,
  Args&&... args
) noexcept -> std::vector<C>;

template <
  typename C,
  typename R = C::Resource,
  typename D = C::Descriptor,
  typename... Args
>
requires (std::same_as<D, SImageDescriptor>)
RETINA_NODISCARD auto AllocateResource(
  SConstructResourceInlineTag,
  EImageLayout layout,
  Args&&... args
) noexcept -> C;

template <
  typename C,
  typename R = C::Resource,
  typename D = C::Descriptor,
  typename... Args
>
requires (std::same_as<D, SImageDescriptor>)
RETINA_NODISCARD auto AllocateResource(
  uint32 count,
  EImageLayout layout,
  Args&&... args
) noexcept -> std::vector<C>;
```
This was the template madness of yesterday btw 
ye types and functions being pascal case caused some issues
so I went with the Unreal Engine™️ code style
R = C
D = C
😄
can you write out C, D and R? Descriptor, Resource, Cock reads better
layout (push_constant) uniform UPushConstant {
uint u_inputImageId;
};
RetinaDeclareStorageImage(restrict readonly image2D, u_image2DBlock);
#define inputImage RetinaGetStorageImage(u_image2DBlock, u_inputImageId)
vec3 Tonemap(in vec3 color) {
const float luminance = GetLuminance(color);
const vec3 reinhard = color / (color + vec3(1.0));
return Saturate(mix(color / (luminance + 1.0), reinhard, reinhard));
}
void main() {
o_output = vec4(ToNonLinearFromLinear(Tonemap(vec3(imageLoad(inputImage, ivec2(gl_FragCoord.xy))))), 1.0);
}
beautiful
nice Lisp program
@glass sphinx how do you manage daxa::BufferId lifetimes
fr tho: for daxa to dictate the lifetime management for buffers and images aside from manual free and delete is sus because they have too varied lifetime requirements
and refcounting is too slow for this many objects for a general solution imo
fair
le tringle
its a good solution too
rendered out of thin air i assume?
ref counting gives you a lot of benefits that many ppl ignore/dont know
yeah like i can just fire and forget 
and i don't want to worry about manually cleaning it all up later but that can also be a shot in the foot if i somehow mess up and do circular ref counts
also lots of multithreading guarantees
if you pass a ref counted handle to a function via ref you can know 100% the object wont be destroyed while running the function
which is a big problem with manual management
yeah i kinda just accepted that i won't probably be making 1000+ vulkan objects in a single frame so performance is just negligible
in daxa i avoided these problems with lots of hacks and a big id system

ref counting would have been muuuuch simpler
huh big id system?
hard to explain
index + version like entt
but it is also mt safe and such via some other shingus
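Since "index + version" is hard to explain in chat, here is a minimal single-threaded sketch of the general idea. `SResourceId` and `CSlotPool` are hypothetical names; this is not Daxa's or EnTT's actual layout, and it has none of the thread safety mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical handle: an index into a slot array plus the version the slot
// had when the handle was created.
struct SResourceId {
  std::uint32_t Index = 0;
  std::uint32_t Version = 0;
};

template <typename T>
class CSlotPool {
public:
  // Reuses a free slot when one exists, otherwise appends a new one.
  auto Create(T value) -> SResourceId {
    if (!_free.empty()) {
      const auto index = _free.back();
      _free.pop_back();
      _slots[index].Value = std::move(value);
      _slots[index].Alive = true;
      return SResourceId{ index, _slots[index].Version };
    }
    _slots.push_back(SSlot{ std::move(value), 0, true });
    return SResourceId{ static_cast<std::uint32_t>(_slots.size() - 1), 0 };
  }

  // Bumping the version invalidates every outstanding handle to this slot.
  auto Destroy(SResourceId id) -> bool {
    if (!IsValid(id)) {
      return false;
    }
    _slots[id.Index].Alive = false;
    ++_slots[id.Index].Version;
    _free.push_back(id.Index);
    return true;
  }

  auto IsValid(SResourceId id) const noexcept -> bool {
    return id.Index < _slots.size() &&
           _slots[id.Index].Alive &&
           _slots[id.Index].Version == id.Version;
  }

  // Returns nullptr for stale or out-of-range handles instead of dangling.
  auto Get(SResourceId id) noexcept -> T* {
    return IsValid(id) ? &_slots[id.Index].Value : nullptr;
  }

private:
  struct SSlot {
    T Value;
    std::uint32_t Version = 0;
    bool Alive = false;
  };

  std::vector<SSlot> _slots;
  std::vector<std::uint32_t> _free;
};
```

The trade-off against ref counting: a stale handle is detected at lookup time instead of keeping the object alive, so destruction stays explicit and nothing pays for atomic increments.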
i don't even know what entt does
mhm
Iirc nvidia dos and donts says that it is not worth it compacting really small meshes such as particles (which only take up 1-2 triangles)
```cpp
struct SAccelerationStructureGeometryInstance {
  glm::mat3x4 Transform = {};
  uint32 ObjectIndex : 24 = 0;
  uint32 Mask : 8 = 0xff;
  uint32 ShaderBindingTableOffset : 24 = 0;
  EAccelerationStructureGeometryInstanceFlag Flags : 8 = {};
  uint64 AccelerationStructureAddress = 0;
};
```
why bitfields
why
why would you do this to me
refresh me on matrix memory layout
if the transform matrix is row major then the last column contains the transform?
how the matrix is displayed there shouldn't depend on the majorness since that only defines the memory layout
tl;dr that column should be the translation
True
I guess this helps more
```cpp
auto ToNativeTransformMatrix(const glm::mat4& transform) noexcept -> glm::mat3x4 {
  RETINA_PROFILE_SCOPED();
  auto result = glm::mat3x4();
  std::memcpy(&result, AsConstPtr(glm::transpose(transform)), sizeof(glm::mat3x4));
  return result;
}
```
I think this is correct
maybe
another way to remember is that the rotation+scale are expressed by a 3x3, and the translation is a 1x3
and with a non-square matrix you can tell by looking
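The same conversion spelled out without GLM, as a minimal sketch: take a column-major 4x4, drop the last row (0, 0, 0, 1) and write the remaining three rows out row by row, which is the layout `VkTransformMatrixKHR::matrix` uses (`float[3][4]`, row-major). The type aliases and `ToRowMajor3x4` are made up for illustration.

```cpp
#include <array>
#include <cassert>

// Column-major 4x4, indexed as m[column][row] (the convention GLM uses).
using Mat4ColumnMajor = std::array<std::array<float, 4>, 4>;
// Row-major 3x4, indexed as m[row][column].
using Mat3x4RowMajor = std::array<std::array<float, 4>, 3>;

// Transposes the indexing and drops the fourth row of the affine matrix.
constexpr auto ToRowMajor3x4(const Mat4ColumnMajor& m) noexcept -> Mat3x4RowMajor {
  Mat3x4RowMajor result = {};
  for (int row = 0; row < 3; ++row) {
    for (int column = 0; column < 4; ++column) {
      result[row][column] = m[column][row];
    }
  }
  return result;
}
```

The translation sits in the fourth column of the source and ends up as the last element of each output row, which matches the "last column is the translation" rule of thumb from above.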
solo un pochino
what laziness does to a mf:
```glsl
if (u_frameIndex > 0) {
  const vec3 previousLighting = vec3(imageLoad(g_accumulationImage, ivec2(gl_LaunchIDEXT.xy)));
  incomingLight = mix(previousLighting, incomingLight, 1.0 / float(u_frameIndex + 1));
}
imageStore(g_accumulationImage, ivec2(gl_LaunchIDEXT.xy), vec4(incomingLight, 1.0));
imageStore(g_inputImage, ivec2(gl_LaunchIDEXT.xy), vec4(incomingLight, 1.0));
```
good enough
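The `mix` in that shader is a running mean over frames. Here is a scalar CPU-side sketch of the same arithmetic, using one float instead of a `vec3`; `Mix` and `Accumulate` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Same as GLSL mix(a, b, t): linear interpolation between a and b.
constexpr auto Mix(float a, float b, float t) noexcept -> float {
  return a * (1.0f - t) + b * t;
}

// Folds the sample of frame `frameIndex` (0-based) into the mean of all
// previous samples: mean_n = mean_(n-1) * n / (n + 1) + sample / (n + 1).
constexpr auto Accumulate(float previousMean, float sample, unsigned frameIndex) noexcept -> float {
  if (frameIndex == 0) {
    return sample;
  }
  return Mix(previousMean, sample, 1.0f / static_cast<float>(frameIndex + 1));
}
```

Feeding in the samples 2, 4, 6, 8 gives the running means 2, 3, 4, 5, so every frame ends up weighted equally. The catch is that the history has to be reset (frame index back to 0) whenever the camera or scene changes.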
Alright
with this RT sample new Iris (codename Retina) is back to the same functionality level of old Iris
...but since RT is fucking addictive I'll be messing around for a day or two
Or a year or few 🥹
ptsd
it truly is, addictive
ye let me 🅱️ush
are you pushing bistor too? 😛 i do have it here already
a klingon love poem
sorry for calling Retina Regina, no idea why i thought it was the latter : (
noice gltfpack uses all the cores, thx zeux
why is everything so dark 
can you try commenting out the section that grabs the textures
sure
it's in the rchit shader
something's wrong with how I shrimple textures maybe
gpu runs at 144W
do you want a little video fly through too?
or is that image enough?
I'll test this too
revert back to where it was and comment this section only https://github.com/LVSTRI/Retina/blob/master/Shaders/RayTracing/Main.rchit.glsl#L106-L116
one sec
dont forget to add -i bistro.gltf -o bistro.glb 😛 for the gltfpack line
when i alt tabbed the first time from the running retina 😄
if you use this bistro it should be good with normal mapping too
I think
alright
RT makes the GPU sweat yes
yeah and much less vram usage too it seems just 1.5gb
interesting my bistro.glb is 2xx mb
yours is 3xx mb
it has interior too I think
It's addictive to look at too!! Looks awesome 😍
hmm looks rather flat
i believe i commented the rchit in again and rebuilt everything
it compiles shaders at runtime so there's no need to rebuild
I think it is
ok
I think my cosine weighted distribution is wrong
sun angle might just be in a weird angle 🙂
sun position is in the ray generation shader
I hardcoded it because I wanted to see shadows asap 
you should set the spawn point outside though
first time i thought it doesn't work because black screen 😛
now that I'm done bringing this new iris to full functionality I'll make the actual engine
first things first I gotta read what elias daler shared with me
Iris/Retina confirmed 99% stolen from me
true and real
It's quite incredible that the FBX importer/exporter is so garbage that it can work at full speed even when there's a gigabillion page faults happening
(it has to load the entire FBX in memory so I'm currently using 170GB of virtual memory with a working set of 32GB)
Ah yes 178772237 page faults and counting
one must imagine Unreal users happy
they dont have time to be happy, they are waiting for startup and or compilation
ffffffffuuuuuuuck
Why in the fucking hell would you use int to store the number of triangles of a model
break the model into chunks perhaps : (
I'm still waiting for the export to be done
Only now I've noticed that FBX2glTF can handle only 2 billion triangles
if this fails I'm gonna be so pissed
since here 💀
fuck
btw we're up to 400 million page faults 
Honestly Windows is handling this incredibly well
(its another column you can enable)
ye right now it's small
~1k ?
so more doing IO then
I/O to page files exclusively 
: >
Anyways, glad to see that Windows' memory management systems are working properly
I went into this fully expecting to crash, but the system's been running smoothly for hours now
next time say something
UnsignedLongLongBuffer
I think it's time to call the Darian
Just download more VRAM from google
alright time to document FBX sdk adventures
numero uno
FbxStream is a class responsible for opening/reading/writing files
it has a pure virtual function that opens a file virtual bool Open(void* pStreamData) = 0;
now, which file is this function going to open
nobody fucking knows
: D
oh wait I get it
I'm supposed to put the filename in the constructor
and then Open just resets the file pointer to 0
this is great
look at this poor excuse of an example
FBXSDK_strcpy(mFileName, 30, "CustomStreamClass_temp.txt");
hmm yes very clear thank you
this tells me exactly what I have to do
the heck
there are 2 open sauce variants of fbx
ufbx i believe is the one and openfbx the other
I looked at them
they're not multithreaded and they can't handle big files
everyone seems to like 4GiB max
apparently so
FBX2glTF did manage to export it
after I changed every single uint32_t in the codebase to a size_t
...
int offset = (int)pOffset;
switch (pSeekPos)
{
case FbxFile::eBegin:
fseek(mFile, offset, SEEK_SET);
...
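For reference, a minimal sketch of why the `(int)pOffset` cast above breaks on big files, assuming `pOffset` is a 64-bit value. The helper names `fits_in_int` and `seek64` are made up for illustration and are not FBX SDK API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <limits>
#if !defined(_WIN32)
#include <sys/types.h> // off_t
#endif

// Hypothetical helper: true when a 64-bit file offset survives a cast to int.
// An 11 GiB file has offsets far past INT_MAX, so the cast silently wraps.
inline bool fits_in_int(std::int64_t offset) {
    return offset >= std::numeric_limits<int>::min() &&
           offset <= std::numeric_limits<int>::max();
}

// Hypothetical 64-bit seek wrapper. _fseeki64 is the Windows CRT call and
// fseeko the POSIX one (off_t is assumed to be 64-bit on the target).
inline int seek64(std::FILE* file, std::int64_t offset, int origin) {
    assert(file != nullptr);
#if defined(_WIN32)
    return _fseeki64(file, offset, origin);
#else
    return fseeko(file, static_cast<off_t>(offset), origin);
#endif
}
```

With something like this, the `eBegin` case would call `seek64(mFile, pOffset, SEEK_SET)` instead of going through `int`.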
the slow part is surprisingly not the FBX SDK
ok
writing a custom export plugin might probably still be a better idea
yes
even if you just dump everything obj style into a txt file heh
but figuring out unreal doesn't seem fun 
FBX SDK has been cursed by every developer that had to integrate it. So don't forget the tradition.
Alright
near 50% speedup already
thank god they put in zero effort, they make me look like a good programmer
LVSTRI is the city scene you posted gltf or FBX?
gltf
I'd never post anything FBX
it doesn't deserve taking my precious SSD space even
And you say it's ~9 gigs?
the FBX or the gltf?
Gltf
FBX is 11GiB, gltf is 500MiB of JSON + 7.5GiB of binary
That might actually fit in my VRAM
Hmmmm
I have 1080
Just compress
mate
We do some index compression
Palette compressing the indices to 8 bits if I understood correctly
damn
Microindices
Meshoptimizer does it iirc
But I have no binary gltf loading and no block compressed texture loading setup yet
But I wanna see if Patrick's culling can handle it
I'm still waiting for Darian to make moana island into a gltf
(he ain't gonna do it)
yeah..
I opened part of it in blender but yeah
Out of core rt sure
But real time I gave up 
How difficult would it be to modify gltf to support >4gb files
zero effort
just find any exporter that uses 8 byte integers instead of 4
cgltf and fastgltf come to mind
@ sean
Nanite stinks, I'm all tessellation and displacement team now
Patrickpilled
I gladly will be, but geometry stuff too scary for me
bro
I'm dying
but I won't let fear take over me
I just have to grow a brain big enough to store all the nanite knowledge
retvrn to triangle strips
no need for a horrible index buffer using all your memory :^)
this is painful
I need a mesh small enough so that I can debug it easily
but it should also be big enough for me to be able to actually generate meshlets out of it
ok sphere it is
sphere is good
rip cubes
cubes have 12 faces
can you use deth's doom.wad?
not gud
i know cubes are not good
that has thousands of triangles 
ah too much
then some icosahedron-esque shape ye
low poly sphere 
add subdivs
assuming my code is correct
(huge assumption)
these are the meshlets for suzanne
how the hell do I validate this
@frank sail if I give you vertex position in a txt file (ready to be copy pasted in C++) can you try loading it in frogfood and checking if the meshlets are correct? 
hmm could you feed it as vertex positions into fastgltf and have it write a gltf?
oh yeah I could write a GLTF
let me try
after this I really gotta go study otherwise my exam tomorrow is gonna go baad
lol
its UEs fault for taking so long
(I still don't understand it but I can fumble my way through at least)
the meshlets being all the same size is sus..
ffs writing a gltf file is too involved
```cpp
for (auto i = 0; i < meshlets.size(); ++i) {
    const auto& meshlet = meshlets[i];

    auto accessor = new cgltf_accessor();
    accessor->type = cgltf_type_vec3;
    accessor->component_type = cgltf_component_type_r_32f;
    accessor->offset = 0;
    accessor->stride = sizeof(SVertex);
    accessor->count = meshlet.size();
    accessor->buffer_view = &meshletBufferViews[i];

    auto attribute = new cgltf_attribute();
    attribute->type = cgltf_attribute_type_position;
    attribute->data = accessor;

    auto primitive = new cgltf_primitive();
    primitive->type = cgltf_primitive_type_triangles;
    primitive->attributes_count = 1;
    primitive->attributes = attribute;

    auto mesh = cgltf_mesh();
    mesh.primitives_count = 1;
    mesh.primitives = primitive;
    meshes.emplace_back(mesh);
}
```
who said memory leaks weren't useful
i cant load them with my loader : > it only accepts triangulated primitives 😢
alright I have a hack
please ignore me, my renderer wouldnt know if a mesh was indexless or not either
alright time to engage overdrive mode
Imma write an OpenGL gltf viewer in 5 nanoseconds
true
let me try it
```
Target "fastgltf_gl_viewer" links to:
glfw::glfw
```
bruh
ah gotta execute the python script myself
smh
; (
after fixing the build issue it crashes
probably can't handle non indexed meshes
well it doesn't matter
Iris is still there ready for us
alright moment of truth
welp
at least I see something
ah you are trying to write a meshlet algorithm?
and want to see if it spits any meshlets out
yes, using graph partitioning
aaah
time to study
coool
we celery later
: )
i think you are the smartest frog among us
so that I don't lose this
as far as I'm aware the only ones using METIS are:
- me
- unreal
that's it 
alright NOW it is time to prepare for exams
Sure
Gltf is preferable
we got the thing already
now it's time to make this actually work 
this generates meshlets close to MAX_PRIMITIVES
but it doesn't actually respect the upper bound
and honestly, I have no idea how to make it work besides splitting the offending meshlet in 2
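The "split the offending meshlet in 2" fallback could look roughly like the sketch below. It assumes a partition is just a list of triangle ids. `Cluster` and `enforce_upper_bound` are illustrative names, and halving by list position is only a placeholder for a smarter split:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Cluster = std::vector<std::uint32_t>; // triangle ids of one partition

// Fallback sketch: keep halving any partition larger than max_primitives
// until every piece fits. Splitting by list position is a placeholder; a
// real split would re-run the partitioner or sort spatially first.
inline std::vector<Cluster> enforce_upper_bound(const std::vector<Cluster>& partitions,
                                                std::size_t max_primitives) {
    assert(max_primitives > 0);
    std::vector<Cluster> result;
    // Use a stack, seeded in reverse so the output keeps the input order.
    std::vector<Cluster> pending(partitions.rbegin(), partitions.rend());
    while (!pending.empty()) {
        Cluster current = std::move(pending.back());
        pending.pop_back();
        if (current.size() <= max_primitives) {
            if (!current.empty()) result.push_back(std::move(current));
            continue;
        }
        const auto mid = current.begin() + static_cast<std::ptrdiff_t>(current.size() / 2);
        Cluster left(current.begin(), mid);
        Cluster right(mid, current.end());
        pending.push_back(std::move(right)); // processed after left
        pending.push_back(std::move(left));
    }
    return result;
}
```

This guarantees the upper bound but not locality, so the two halves may come out less compact than what the partitioner produced.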
https://youtu.be/eviSykqSUUw?t=1172 sir brian says "managed to coerce it with small slack and fallbacks" though I have no idea what "slack" is or which fallbacks he used
because he gave up on metis neh?
I don't think he ever thought of METIS 
ah, i thought there was some conversation about it happening
possible that im mixing shit
regardless, I think I'll make my own clusterizer too
METIS is too garbage
I wonder if I can abstract away the graph too
in CSR format
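A minimal sketch of what a CSR triangle graph could look like, assuming two triangles count as neighbors when they share an edge. The `xadj` / `adjncy` names follow the convention METIS uses for its inputs; `CsrGraph` and `build_triangle_graph` are made-up names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct CsrGraph {
    std::vector<std::int32_t> xadj;   // neighbors of node i live in adjncy[xadj[i] .. xadj[i + 1])
    std::vector<std::int32_t> adjncy; // all neighbor lists, concatenated
};

// Builds the triangle adjacency (dual) graph of an indexed triangle list.
inline CsrGraph build_triangle_graph(const std::vector<std::uint32_t>& indices) {
    assert(indices.size() % 3 == 0);
    const std::size_t triangle_count = indices.size() / 3;

    // Map each undirected edge (smaller vertex first) to the triangles using it.
    std::map<std::pair<std::uint32_t, std::uint32_t>, std::vector<std::int32_t>> edge_to_triangles;
    for (std::size_t t = 0; t < triangle_count; ++t) {
        for (std::size_t e = 0; e < 3; ++e) {
            std::uint32_t a = indices[t * 3 + e];
            std::uint32_t b = indices[t * 3 + (e + 1) % 3];
            if (a > b) std::swap(a, b);
            edge_to_triangles[{a, b}].push_back(static_cast<std::int32_t>(t));
        }
    }

    // Every pair of distinct triangles on the same edge becomes a graph edge.
    std::vector<std::vector<std::int32_t>> neighbors(triangle_count);
    for (const auto& entry : edge_to_triangles) {
        const std::vector<std::int32_t>& triangles = entry.second;
        for (std::size_t i = 0; i < triangles.size(); ++i) {
            for (std::size_t j = i + 1; j < triangles.size(); ++j) {
                if (triangles[i] == triangles[j]) continue; // degenerate triangle
                neighbors[triangles[i]].push_back(triangles[j]);
                neighbors[triangles[j]].push_back(triangles[i]);
            }
        }
    }

    // Flatten the per-node lists into CSR, dropping duplicate neighbors.
    CsrGraph graph;
    graph.xadj.push_back(0);
    for (auto& list : neighbors) {
        std::sort(list.begin(), list.end());
        list.erase(std::unique(list.begin(), list.end()), list.end());
        graph.adjncy.insert(graph.adjncy.end(), list.begin(), list.end());
        graph.xadj.push_back(static_cast<std::int32_t>(graph.adjncy.size()));
    }
    return graph;
}
```

The `std::map` keeps the sketch short; a hash map or a sorted edge array would likely be the better pick for multi-million triangle meshes.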
took sir brian just a man-year
if you start now, we might have something by new years eve 2024/2025
I'm more of a child than a man if we compare me to brian 
true, but, its worth a try heh
although you are italian, you probably look more manly than him already
TODO fix self reference
```cpp
clusterPrimitiveBuffer.emplace_back((face * 3 + 0) - minPrimitiveIndex);
clusterPrimitiveBuffer.emplace_back((face * 3 + 1) - minPrimitiveIndex);
clusterPrimitiveBuffer.emplace_back((face * 3 + 2) - minPrimitiveIndex);
```
I have no idea what I'm doing
Hmm maybe I'll stray away from meshoptimizer's design completely
but EXT_mesh_shader kinda requires it
man
every second that passes I realize how big zeux's brain is and how small mine is in comparison 
I gotta use some more heuristics to make sure my partitions are topologically and spatially local
like using distance as edge weights
but there's still the problem of generating the micro index buffer
which I have no idea what it represents
(I don't have an idea about the vertex index buffer either)
hmm
maybe that's the "slack" sir brian was talking about 😄
the micro indices point into a per meshlet vertex index buffer
the per meshlet vertex index buffer points to actual vertices
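The two levels of indirection can be sketched like this: 8-bit micro indices are built by giving each vertex a local slot the first time the meshlet touches it. It assumes the meshlet's triangles are already known as global indices and that it uses at most 256 unique vertices. `MeshletData`, `build_meshlet` and `resolve` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct MeshletData {
    std::vector<std::uint32_t> vertex_indices; // local slot -> index into the mesh's vertex buffer
    std::vector<std::uint8_t> micro_indices;   // 3 per triangle, each an index into vertex_indices
};

// Builds both buffers from the meshlet's triangles given as global vertex indices.
inline MeshletData build_meshlet(const std::vector<std::uint32_t>& global_triangle_indices) {
    assert(global_triangle_indices.size() % 3 == 0);
    MeshletData meshlet;
    std::unordered_map<std::uint32_t, std::uint8_t> slot_of; // global index -> local slot
    for (std::uint32_t global : global_triangle_indices) {
        auto found = slot_of.find(global);
        if (found == slot_of.end()) {
            // First use of this vertex in the meshlet: give it the next slot.
            assert(meshlet.vertex_indices.size() < 256);
            const auto slot = static_cast<std::uint8_t>(meshlet.vertex_indices.size());
            found = slot_of.emplace(global, slot).first;
            meshlet.vertex_indices.push_back(global);
        }
        meshlet.micro_indices.push_back(found->second);
    }
    return meshlet;
}

// Decoding a triangle corner: micro index -> per meshlet vertex index -> actual vertex.
inline std::uint32_t resolve(const MeshletData& meshlet, std::size_t corner) {
    return meshlet.vertex_indices[meshlet.micro_indices[corner]];
}
```

Seen this way the micro index buffer is an ordinary index buffer over a tiny per meshlet vertex list, which is why 8 bits per index are enough.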
Yeah but what does it represent
for example, we know the index buffer is basically a half-edge list for a mesh graph
also with edge weights I get real funky meshlets 
yep I can get that
unfortunately unless I deeply understand things I can't reason with them
it's a big flaw of mine
I think I'll start with a cube mesh
And look at what meshoptimizer outputs
just single meshlet, no funky business
deccer face & body reveal : (

i have my laptop on the belly most of the day : )
whats the difference between yours and potrick's, lvstri?
besides meshoptimizer
or are you saying potrick's way is shit?
or cant do something you want to do?
this
I'm dumb
I can't partition a graph in a good way
Also I don't understand microindex buffers
How to generate them at least
then we need to find people who can explain it better to you
alright this looks promising
I changed the heuristic to (1.0 / (distance * distance + 1.0) * 2^20)
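Written out, that heuristic might look like the sketch below. It assumes `distance` is measured between two points such as the centroids of neighboring triangles, which the message does not specify. The clamp to 1 is an extra assumption so an integer-weight partitioner never sees a zero weight, and `Vec3` / `edge_weight` are made-up names:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 {
    float x, y, z;
};

// weight = 1 / (distance^2 + 1) * 2^20, rounded to an integer.
// Note that 2^20 has to be spelled out: in C++ `2 ^ 20` is XOR, not a power.
inline std::int32_t edge_weight(const Vec3& a, const Vec3& b) {
    const double dx = static_cast<double>(a.x) - b.x;
    const double dy = static_cast<double>(a.y) - b.y;
    const double dz = static_cast<double>(a.z) - b.z;
    const double distance_squared = dx * dx + dy * dy + dz * dz;
    const double scale = 1048576.0; // 2^20
    const double weight = 1.0 / (distance_squared + 1.0) * scale;
    assert(weight <= scale); // the +1 in the denominator caps the weight at 2^20
    // Clamp to 1 (assumption): far apart triangles would otherwise round to 0.
    return std::max<std::int32_t>(1, static_cast<std::int32_t>(std::lround(weight)));
}
```

Close neighbors get weights near 2^20 and distant ones fall off quadratically, so cutting a short edge costs the partitioner more than cutting a long one.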
how come the arches have no meshlet looking meshlets?
coincidence? or just not enough vertices?
ah
das ist what i thought
another weird question
do you perhaps need models with properly made "source" meshes
before even processing them into meshlets
not really, I can optimize them myself
ok
so long as they aren't completely degenerate
do those arches have weird vertices compared to the curtains?
I don't think they're that much of a problem, just an annoyance
hmm no, that doesnt even make sense
knot looks good
I have not tried
and maybe you'll break my thing
because I can't really handle 1 sized partitions 
SM_Deccer_Cube_Complex.gltf i think its called
there's no transforms yet sadly
remember that I'm testing this on IrisGL 
some meshletisms end up squished
probably a tringle count problem
the froge
oh : )
I guess my deccer cubes role will be revoked now
yeah like I said, I don't handle transforms yet
they're a hassle when you don't understand whatever the hell you're doing 
clion crashed
nevermind
are you on EAP?
nope
kk, eap is quite unstable
CLion.exe was using 100% of my CPU for whatever reason
I guess it didn't like me renaming a variable
ah indexingisms
