#Fwog and co.
hey man static constexpr is actually considered helpful
or maybe it was inline constexpr
but still
no doubt it has an effect, but i wonder why that single declaration merits it, but not the rest
ah
probably copied it from my examples which have a similar construct in the global namespace
putting the static bit in a function is a bit quirky fr
Fwog is love
Fwog is live 
I did but now I realize when I moved it into a function, I should have just made it a regular auto instead.
Thanks for spotting the mistake.
The current version of 01_hello_triangle.cpp doesn't have it either https://github.com/JuanDiegoMontoya/Fwog/blob/9215996c3ca1e7bc855abba6f53f3a48264c4f84/example/01_hello_triangle.cpp#L79
I am looking through the history of that file to see if I did get it from there
Yea i don't know where it came from. It has no reason to be a static constexpr: https://github.com/JuanDiegoMontoya/Fwog/blob/6a5d37a97bd50117914458a585186df297e94c13/example/01_hello_triangle.cpp#L79
In the original it's just an auto inside a static function. I will fix the mistake.
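For anyone skimming, a minimal sketch of the two spellings being compared. The array contents and function names are made up for illustration and are not the code from 01_hello_triangle.cpp.

```cpp
#include <array>
#include <cassert>

// Illustrative data only; not the vertex data from 01_hello_triangle.cpp.
inline float SumStaticConstexpr()
{
  // Static storage duration: one copy for the whole program, constant-initialized.
  static constexpr std::array<float, 6> positions = {-0.5f, -0.5f, 0.5f, -0.5f, 0.0f, 0.5f};
  float sum = 0.0f;
  for (float p : positions) sum += p;
  return sum;
}

inline float SumPlainAuto()
{
  // Plain local: conceptually rebuilt on every call, which is fine for a one-off setup function.
  auto positions = std::array<float, 6>{-0.5f, -0.5f, 0.5f, -0.5f, 0.0f, 0.5f};
  float sum = 0.0f;
  for (float p : positions) sum += p;
  return sum;
}

// Usage: both spellings behave the same from the caller's point of view.
inline void ExampleBothSpellingsAgree()
{
  assert(SumStaticConstexpr() == SumPlainAuto());
}
```

The only observable difference is storage duration, which is why a regular auto is enough inside a function that runs once.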
thats exactly how i pictured you for a long time
: >
this kind of glasses hehe
@fiery sorrel i did click it, but the dark theme looks better
I think I could set if I want it to be Dark or Light as default. I prefer light and find it more professional. Maybe there's a way to set it to system default or something even from a website?
Yea. There are for native apps. Not sure if there is from javascript tho or if thats a security thing. Maybe id look into it as a form of curiousity
anyways my Sunday is nearly out so it was nice to chat on the Discord
ive been more productive without bein on the cord. I DM @long robin sometimes with the side stuff i work on
jaker should install office hours and charge for it too 🙂
I dont ask him for help!
Unless you meant office hours like a therapist type of deal
:p
anyways point is even tho im less active, im actually "more active" when it comes to coding and GP stuff in general.
Been spending my newfound free time freed up by less Discording with studying and stuff
mayb id become more active in a year or two once I have the skill level to slack off, but now I feel more inspired to catch up to the big guys ig haha
point is, just becos i ain't here doesn't mean I gave up on GP and all the stuff I learned here u know
mostly
I let myself come on Sunday
but I also notice that when I have it i end up way more distracted n get less done
I guess for now its like a 'reminder' why i have such a personal goal.
I wish I had your will
I don't know how I got through this years exam session 
I'm stuck in endless self-destructive procrastination
but at least I have traced sponza
you got this! If you start studying as soon as you can its a lot easier. You seem to have the work ethic since you do work on your projects often, so its just a matter of prioritization for you. I believe you can do it :)
not yet. I haven't thought of a good way to do it
well, I haven't tried too hard either 
i don't think it is too bad as long as you don't use glsl to align
aka let glsl layout and then match the host to that
hmm
anyways, i wuz thinking - it would not be terribly difficult to have a lil' bit of cheeky cmake build part that would dump layouts of the shared structs for both glsl and ZE HOST and then verify if they match
I could imagine doing shared headers if I had a compatibility header, like what we use in FidelityFX
how do you propose dumping the struct layout
which one?
either one
glsl via glslang
now how do I match the structs between the two
with the power of baba and booey
or is that where the shared header part comes in
well yeah
you just match the fields
if they are at the same offset
the contract is correct
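A rough sketch of the "match the fields by offset" idea, assuming the shader-side offsets have already been dumped somewhere (for example by a reflection step). `FieldLayout`, `LightHost`, and the hand-written shader offsets are hypothetical and only there to show the comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

struct FieldLayout
{
  std::string_view name;
  std::size_t offset; // byte offset from the start of the struct
};

// Hypothetical host-side struct that is also declared in a shader.
struct LightHost
{
  float position[4]; // vec4 in GLSL
  float color[3];    // vec3 in GLSL
  float intensity;   // float in GLSL
};

inline std::vector<FieldLayout> HostLayout()
{
  return {
    {"position", offsetof(LightHost, position)},
    {"color", offsetof(LightHost, color)},
    {"intensity", offsetof(LightHost, intensity)},
  };
}

// True when both sides list the same fields, in the same order, at the same offsets.
inline bool LayoutsMatch(const std::vector<FieldLayout>& host, const std::vector<FieldLayout>& shader)
{
  if (host.size() != shader.size()) return false;
  for (std::size_t i = 0; i < host.size(); ++i)
  {
    if (host[i].name != shader[i].name || host[i].offset != shader[i].offset) return false;
  }
  return true;
}

// Usage: offsets as a std430 dump might report them (written by hand here).
inline void ExampleLayoutCheck()
{
  const std::vector<FieldLayout> shaderSide = {{"position", 0}, {"color", 16}, {"intensity", 28}};
  assert(LayoutsMatch(HostLayout(), shaderSide));
}
```

A build step could run a check like this and fail when the two dumps disagree.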
I think it would be better to make matching the layouts easier in the first place, rather than offering a way to check that they match
ya feel?
I could ask potrick how he does it for daxa (I think with macros)
i do not feel
making a struct-layout-checker feels equivalent to a sanitizer, when those bugs could (ideally) be impossible to make by construction
@snow sun reveal thine secrets
custom types with alignas should be able to match std430 without too much difficulty I think
could gate those types with a define in a header, then include that in headers where I define shared structs
you would need a macro to introduce the type name i guess - not the end of the world i suppose
but still, what you get is not superbly C++
its scalar only
naw, won't save you
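One concrete way alignas alone falls short, as a sketch: under std430 a vec3 is 16-byte aligned but only occupies 12 bytes, so a float declared right after it lands at offset 12. A C++ type with alignas(16) is padded to 16 bytes, which pushes the float to offset 16. The type names below are hypothetical.

```cpp
#include <cstddef>

// Hypothetical host stand-in for a GLSL vec3 with std430 alignment.
struct alignas(16) Vec3Aligned
{
  float x, y, z;
};

struct HostNaive
{
  Vec3Aligned direction;
  float intensity; // std430 puts this at offset 12, C++ does not
};

// alignas rounds the size of the type up to 16, so nothing can use the last 4 bytes.
static_assert(sizeof(Vec3Aligned) == 16, "vec3 stand-in is padded to 16 bytes");
static_assert(offsetof(HostNaive, intensity) == 16, "float lands after the padding");

// Matching the std430 offsets means aligning the outer struct and packing the members by hand.
struct alignas(16) HostPacked
{
  float direction[3];
  float intensity;
};

static_assert(offsetof(HostPacked, intensity) == 12, "float sits right after the vec3 data");
static_assert(sizeof(HostPacked) == 16, "whole struct is still 16 bytes");
```

With scalar block layout the plain unaligned version would match directly, which is roughly the point being made in the replies.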
also don't use std140
ah

also thats just scalar with extra steps
exshrimpactly
this word has permanently damaged my anterior cingulate cortex
i'd still like to know the impact of scalar on the various hw and buffer types
on AMD, you get at worst unnecessarily small loads (in my testing, though it seems the SC has gotten better since), which can cause extra cache misses
I'd much rather study the impact of scalar on my brain
idk about NV and idc since they won't let me see ISA very easily
clamclusion: always use scalar block layout 
mao le
but yes
AMD should be perfectly capable of doing it for UBOs and SSBOs
what do you base this on
my own tests and interpretation of the ISA
I have benched the perf impact of scalar block layout on AMD
but coalescing is independent of the isa
then clarify what you mean
ah, I thought you meant turning small loads into big loads in the ISA
thats one thing
I do not have any idea about the hw coalescing
looks at author
god he's everywhere
yeah
I guess the question is (pardon my jargon) how does unaligned mem access affect hw schtuff
ig this is what devsh is sayin
you still do 430 alignments, the compiler doesn't generate shittier code with scalar
that's boring
maybe we can send pixel into the aco mines
can you unroll that abbreviation for me
P pixelduck
I ig its pixelduck
X x gon give it to ya (it == pixelduck)
E eeeey its pixelduck
L life is going by while i meme on this server
aco stands for AMD compiler
v helpful I know
(it’s the amd gpu compiler backend in mesa)
@valid oriole do you know anything bout this stuff
hmm I think at least as long as you stay 4-byte aligned amd hw shouldn't really do stupid stuffs
at least I don't know of any reason to do stupid things
i'd be quite disappointed if scalar loads of e.g. a vec3 + a float won't turn up as buffer_load_dwordx4
right, but thats not the entire picture
@silver shoal might know
but can he tell 
he can tell if he can tell
but can he
alternatively, I can dig and then see if I can tell
but I don't feel like looking at hw docs tomorrow 
does that mean you look today
well it's 3am, so technically you're correct
which is one of the corrects of all time
arrays break necks
yeah std140 arrays suck and I don't use them
430 arrays of vec3 too
why would you ever want more than one of something
one of the problems i had with scalar was that anything above 64 didnt properly align but that was a bug and fixed
usually scalar exactly matches c
but you have to qualify it or martty hits you with a chair
as long as you have the same type bit width. eg you must enable int64 if you wanna transport buffer references in structs to match them
i have some tests for it
but i went to always use scalar to preserve my hair, the aligning thing was just too sketchy for me in the end
but if you have shared macros
you can cheat
just make vec3 a vec4
😎
its an option to ban vec3
yea
alignment doesn't really cause me issues anymore
it's more annoying having to synchronize structs
agreed
there is benefit to shared, yes
@heavy cipher curious what cases could cause you trouble with unalignedness and worse coalescing
i accidentally tested unaligned atomic perf a few weeks back, even if you are 3 bytes off they are the same speed on a 1080
i would assume amd can do similar but my renderer crashes on amd atm
but i am in the camp now that when i could be read or write limited i manually make sure vec4 and mats and such are 16byte aligned
no biggie
scalar for life
don't think the atomic perf is the key
ok
this is also an option
Uhhhhh in terms of coalescing vmem operations it should all happen inside the L0? Just by manner of being a cache
It just takes an address per thread so if they all hit it will be quick
same tbh
older gpus can get bottlenecked by queued read/ write addresses
not sure if that is much of a problem anymore
but i got into situations where even okish close loads were just filling the read queue so it couldnt issue more reads
but this is oldold
new gpus likely can just have the unaligned load in the queue instead of having to issue split loads directly
nvidia also makes a big point about how coalescing within warps is very important, but they never really mention alignment
Yeah thankfully that changed with navi - there's a whole section in the rdna1 white paper going "damn when we stopped shoving buffer ops down the slow quad per cycle texturing path things went faster"

Damn the word quad has been changed to queue
just uhh send over the hw docs, we can fish it out from there 
In that case yes hits will get stuck behind misses to return to shader in order
You and I wish it were that simple to just read the docs and know what's going on 
the gcn article above says it, 64 byte alignment or shit gets doingted
but idk bout rdna
as i suspected, you have to look at cuda
Global memory resides in device memory and device memory is accessed via 32-, 64-, or 128-byte memory transactions. These memory transactions must be naturally aligned: Only the 32-, 64-, or 128-byte segments of device memory that are aligned to their size (i.e., whose first address is a multiple of their size) can be read or written by memory transactions.
When a warp executes an instruction that accesses global memory, it coalesces the memory accesses of the threads within the warp into one or more of these memory transactions depending on the size of the word accessed by each thread and the distribution of the memory addresses across the threads. In general, the more transactions are necessary, the more unused words are transferred in addition to the words accessed by the threads, reducing the instruction throughput accordingly. For example, if a 32-byte memory transaction is generated for each thread’s 4-byte access, throughput is divided by 8.
cuda docs are actually pretty insightful many times
let me correct you: nvidia forgoes publishing anything else basically
sad that its so vague on the coalescing
but its clear that aligned makes life easier
the limitation of the coalescing
it doesnt really say it will always merge all addresses that follow on each other into larger loads
i guess they say that
what does it mean with "unused words"
does it do a 16 byte load when for example you issue 4byte loads at 0 and 8 leaving half the load unused?
yes
the loads are always at least 32 byte big
if the invocs don't use all of it, effective throughput goes down
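To make the quoted rule concrete, here is a toy model that counts how many naturally aligned 32-byte segments a warp's accesses touch. It only models the text quoted from the CUDA docs, not what any particular GPU actually does, and the function names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Counts the distinct naturally aligned segments touched by a set of per-thread accesses.
inline std::size_t CountSegments(const std::vector<std::uint64_t>& addresses,
                                 std::uint64_t accessBytes,
                                 std::uint64_t segmentBytes = 32)
{
  assert(accessBytes > 0 && segmentBytes > 0);
  std::set<std::uint64_t> touched;
  for (std::uint64_t address : addresses)
  {
    const std::uint64_t first = address / segmentBytes;
    const std::uint64_t last = (address + accessBytes - 1) / segmentBytes;
    for (std::uint64_t s = first; s <= last; ++s) touched.insert(s);
  }
  return touched.size();
}

// Addresses for one warp: thread i reads at base + i * strideBytes.
inline std::vector<std::uint64_t> StridedAddresses(std::uint64_t base,
                                                   std::uint64_t strideBytes,
                                                   std::size_t threads = 32)
{
  std::vector<std::uint64_t> out;
  out.reserve(threads);
  for (std::size_t i = 0; i < threads; ++i) out.push_back(base + i * strideBytes);
  return out;
}
```

In this model 32 contiguous, aligned 4-byte loads touch 4 segments, the same loads shifted by 4 bytes touch 5, and a 32-byte stride touches 32 segments (8x the data moved for the same useful bytes). Two 4-byte loads at 0 and 8 share a single segment.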

then it makes total sense to me
so its more about having addresses close together
and ofc not overstepping the bounds of an aligned load when possible
so when loading 32 bytes, better make that aligned
this is global memory
yea
interesting that its not simply cache line size, or is a cache line only 32 bytes?
128bytes
hmm
so on amd it seems like the cacheline is 64 byte
unarchiveth thee
did you get some oil money or sth 
mashallah
FwogVk confirmed 💯
pog
wow 100 stars
I have no idea how other people use github tbh
same
vuk is similar
272 stars, only like 10 people max who i've seen talking about actually having used it
i'd guess the percentages of the remainder are
- people who use it but don't communicate at all (5% max)
- people who don't use it but think they might at some point (~45%)
- people who just think it's cool (~50%)
ja its sad somehow
but same for all my other projects too
at least we have us 🙂
mfw I am one of the people who don't give feedback 
you do
aren't the last 2 types pretty much the same
95% "oh this is neat I gotta use it/copy it at some point" forgets
no
#3 isn't even really necessarily a graphics programmer to me
like the couple people in college that i talked to about my game at the time that starred it despite the fact that they could never build it
i also forget about things i have starred in the past : )
I basically don't star anything, if it's useful I just clone it into my "other people's repos" dump folder
especially since keyword searching in github sucked for a long time (maybe still sucks?)
greping local files is convenient
i know what you mean
finding/searching information really sucks
or catalogueing and then finding 🙂
the talk about tracy in #bikeshed-😇 has me thinking if I should add some integration in fwog
hmm how should I handle the case where the user already has tracy installed
Yuh
dunno
and i shall prepare for the wurst
hmm
can tracy even use runtime identifiers for scopes
I remember someone here having an issue with it
transient zones or summin
3.1
Handling text strings
When dealing with Tracy macros, you will encounter two ways of providing string data to the profiler. In
both cases, you should pass const char* pointers, but there are differences in the expected lifetime of the
pointed data.
- When a macro only accepts a pointer (for example: TracyMessageL(text)), the provided string data
must be accessible at any time in program execution (this also includes the time after exiting the main
function). The string also cannot be changed. This basically means that the only option is to use a string
literal (e.g.: TracyMessageL("Hello")).
- If there’s a string pointer with a size parameter (for example TracyMessage(text, size)), the profiler
will allocate a temporary internal buffer to store the data. The size count should not include the
terminating null character, using strlen(text) is fine. The pointed-to data is not used afterward.
Remember that allocating and copying memory involved in this operation has a small time cost.
we good
now I'm trying to remember why I didn't use a string_view here 
void (*verboseMessageCallback)(const char* message) = nullptr;
robert downstream junior
be me
want type-erased callable wrapper to dodge a fat include that leaks impl details in a header
realize std::function is exactly what I want
realize that #include <functional> is probably 10x more expensive than the other header
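One common middle ground, sketched here as an assumption rather than what Fwog does: keep a plain function pointer but add an opaque user pointer, which gives a small type-erased callback without pulling in <functional>. `MessageCallback` and `MakeCallback` are hypothetical names.

```cpp
#include <cassert>

// Function pointer plus opaque user pointer: type erasure without <functional>.
struct MessageCallback
{
  void (*fn)(void* user, const char* message) = nullptr;
  void* user = nullptr;

  // Calling an unset callback is a no-op.
  void operator()(const char* message) const
  {
    if (fn) fn(user, message);
  }
};

// Wraps any non-const callable lvalue. The wrapper does not own it,
// so the caller must keep the callable alive while the callback is in use.
template <typename F>
MessageCallback MakeCallback(F& callable)
{
  MessageCallback cb;
  cb.fn = [](void* user, const char* message) { (*static_cast<F*>(user))(message); };
  cb.user = &callable;
  return cb;
}

// Usage: counts how many messages arrive through the wrapper.
inline int CountMessages()
{
  int calls = 0;
  auto sink = [&calls](const char*) { ++calls; };
  MessageCallback cb = MakeCallback(sink);
  assert(cb.fn != nullptr);
  cb("first");
  cb("second");
  return calls;
}
```

The tradeoff versus std::function is that lifetime management is on the caller.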
How difficult do you think FwogVk would be? Do you think it would be possible to keep 100% of the current Fwog api but have a version that was Vulkan behind the scenes?
fwog is more or less inspired by vookan
The tricky stuff would be the implicit sync
I think if I made a vk version of this I'd just expose raw sync primitives
I could also make a render graph but I know that's a bit of a worm innit
vuk backend for fwog
Interesting ok
I wonder how close fwog is to being a layer on top of a low level backend that people could add as needed
Especially since you did all the hard work to come up with the interface already
make a validator that throws errors at the user for not calling your barrier funcs appropriately, even in GL
that way when you make FwogVk no code will change
Currently there are no concepts of descriptor sets, memory barriers (except for the OpenGL ones), image layouts, memory allocation command buffers, etc. And Fwog lets you do some glisms like updating buffers and images with implicit sync
the problem is that the amount of info you give to the gl driver is not the optimal amount
Yeah that's why I'm not keen on making fwogvk have the exact same interface
but you have figured out the optimal amount of info to ask for and determined what things can be automated with that
did you fix the srgb-imgui-ism yet?
PRs welcome deccy
Have you considered getting over it
Inshallah deccer will write an epic Vulkan renderer
vkEnumerateInstanceExtensionProperties is the only function call in there vk related : >
just use vook
: )
i would like to be comfy enough making a gpu driven thing first, using opengl 🙂
mayhamps
more optimal than gl at least 🙂
||sorry, that must be borderline insulting as it's not hard to be better than GL||
what i mean is that there plenty of interesting things left
yeah like [redacted]
no, nothing of the sort
ah
would be interesting to see people go from api1 to api2 and compare, see if api2 is the holy grail and that it was absolute worth the effort
or whether performance is now worse 🙂
mmm, currently i have ideas for: new FE, new BE, improving allocators, new pipeline paradigm, support lib stuff like reactors, batcher, SPIRV templating and third party integrations of le FSR and SPD and other algos
time machine when 😔
when do you start?
time machine as in: generate time
or time machine as in go back
perchance lvstri can help
I'd rather the first one tbh 
vukfood thread when
go back, shoot my kneecaps out of mercy
fukvood*
I literally have no clue about anything he wrote in that message 
except maybe FSR and SPD
haha
FE = frontend
BE = backend
martty is a webdev, trying to improve build pipelines and CSS generators 😛
BE = bellend
almost 🙂
hehe
i have
most of these have some work done already
is that going into the (series of) article(s)
ah yes, thanks, i could also be writing articles 
Btw reflection is a bit annoying to implement because
- Images and samplers use regular uniforms, which are queried with GL_UNIFORM
- variables in uniform blocks are also iterated over with GL_UNIFORM
blocks are ugly af to reflect
Luckily I just need the binding
But for images and samplers, I have to iterate all uniforms 😦
So I'll have a massive switch to check the uniform type
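A small sketch of what that switch could look like. The type value would come from something like glGetProgramResourceiv with GL_TYPE; the enum values are copied from the OpenGL headers so the snippet compiles without a loader, and only a handful of the many sampler and image types are listed. `UniformKind` and `ClassifyUniform` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the OpenGL headers so this compiles without a GL loader.
constexpr std::uint32_t kGlFloatVec3   = 0x8B51; // GL_FLOAT_VEC3
constexpr std::uint32_t kGlSampler2D   = 0x8B5E; // GL_SAMPLER_2D
constexpr std::uint32_t kGlSampler3D   = 0x8B5F; // GL_SAMPLER_3D
constexpr std::uint32_t kGlSamplerCube = 0x8B60; // GL_SAMPLER_CUBE
constexpr std::uint32_t kGlImage2D     = 0x904D; // GL_IMAGE_2D
constexpr std::uint32_t kGlImage3D     = 0x904E; // GL_IMAGE_3D

enum class UniformKind { Sampler, Image, Other };

// A real version needs every sampler/image variant (shadow, array, int, uint, ...).
constexpr UniformKind ClassifyUniform(std::uint32_t glType)
{
  switch (glType)
  {
  case kGlSampler2D:
  case kGlSampler3D:
  case kGlSamplerCube:
    return UniformKind::Sampler;
  case kGlImage2D:
  case kGlImage3D:
    return UniformKind::Image;
  default:
    return UniformKind::Other; // plain uniforms and members of uniform blocks
  }
}

// Usage: a vec3 uniform is neither a sampler nor an image.
inline void ExampleClassify()
{
  assert(ClassifyUniform(kGlFloatVec3) == UniformKind::Other);
}
```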
continously reflecting on if torturing oneself with opengl is a good hobby
Actually vulkan doesn't have built-in shader reflection. Checkm8 atheists
is fwog spirv only or not btw
dank
As in uncomfortably damp and moldy
e
does this wrapper add some debugging features? as people generally complain about openGL being a bad debugging experience
Ye not as good as Vulkan/dx in its debug layer, but better than raw gl
belmu just linked this... https://placeholderart.wordpress.com/2014/12/15/implementing-a-physically-based-camera-automatic-exposure/
right now I don't do the camera response thing at all lol
this
nice
you mean not yet. thats ok
I'm gonna have to add a feature to fwog to enable rendering to imageless framebuffers
um I'll show you
let me strap in
oooh (its gl4.3+ stuff, explains why i thought it would otherwise be incomplete)
actually, shouldn't glViewport suffice already 
depends on what you want to do? 🙂
e.g., I want to render to a 16k x 16k area
but have no texture attached
I want to use gl_FragCoord.xy and write to an image instead
i would imagine, fbs need to be complete with a color or depth target
and if you dont you get an error 🙂 no?
then the param you showed will make sure you can have an empty fb
param specifies the assumed width for a framebuffer object with no attachments. If a framebuffer has attachments then the width of those attachments is used, otherwise the value of GL_FRAMEBUFFER_DEFAULT_WIDTH is used for the framebuffer
maybe this stuff is just for glGet 
should do the trick then
hmm
render to an image... is also still beyond my imagination
i am thinking you draw to using a compoot shader?
aaah : this feels like a hack, crammed in there after the fact
the reason is that we want to render to a virtual texture where all pages may not be physically resident
we don't want a literal 16k texture, we just want to see where the samples land and write them to resident pages
at least a warning
However, the rasterization of primitives is always based on the area and characteristics of the bound framebuffer
yeah so I guess I need glViewport in addition to this
indeed it also says "otherwise business as usual"
you only pretty much just disabled fragment stage
now re fwog
would it be reasonable to have a dedicated BeginRenderism function for that sort of ism
I can add a shrimple check in fwog to see if there are any attachments, then call glFramebufferParameter if there are none
hmm
i think thats dangerous, what if you accidentally forgor to specify any targets, and dont want to render into empty fbo
because the Render function already takes a viewport that we can derive a size from
idk when the viewport size would need to differ from the framebuffer size in this case
then your screen is black and you open up renderdoc
can we spdlog::warn loggerHackCallback("are you sure you are doing the right thing? ill assume x for now. just so you know. dont come back and complain if the screen is black my mf");
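The "check for attachments, fall back to the viewport, and warn" idea could look roughly like this. It is a sketch with made-up types (`RenderInfoSketch`, `EmptyFramebufferPlan`), not Fwog's API; the actual GL call would be glFramebufferParameteri (or its named variant) with GL_FRAMEBUFFER_DEFAULT_WIDTH and GL_FRAMEBUFFER_DEFAULT_HEIGHT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

struct Extent2D
{
  std::uint32_t width;
  std::uint32_t height;
};

// Hypothetical, stripped-down stand-in for the info passed to a render call.
struct RenderInfoSketch
{
  std::size_t attachmentCount;
  Extent2D viewportExtent;
};

struct EmptyFramebufferPlan
{
  Extent2D defaultSize; // would be fed to GL_FRAMEBUFFER_DEFAULT_WIDTH/HEIGHT
  std::string warning;  // would be handed to the user's message callback
};

// Returns a plan only when there are no attachments; otherwise business as usual.
inline std::optional<EmptyFramebufferPlan> PlanEmptyFramebuffer(const RenderInfoSketch& info)
{
  if (info.attachmentCount != 0) return std::nullopt;
  return EmptyFramebufferPlan{
    info.viewportExtent,
    "no attachments specified; assuming framebuffer size from the viewport"};
}

// Usage: a 16k x 16k attachment-less pass, as in the virtual shadow map case.
inline void ExampleEmptyFramebuffer()
{
  const RenderInfoSketch info{0, Extent2D{16384, 16384}};
  const auto plan = PlanEmptyFramebuffer(info);
  assert(plan.has_value());
}
```

Emitting the warning keeps the "forgot to specify targets" case visible without blocking the intentional one.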
ikr
it would allow expressing the other glFramebufferParameter stuff like layers too
and shrimple count
exactly
curious
Conceptually, virtual shadow maps are just very high-resolution shadow maps. In their current implementation, they have a virtual resolution of 16k x 16k pixels
max viewport for most things is also 16k
16k should be enough
ye
would be cool if all that technicality could be visualized later down the road too
for debugging purposes... somehow
visualize what exaccly
I could make a residency visualization like js has in #1090536732769927178
HOWEVER
#1090536732769927178 message
usually my dumb comments trigger a little corner of your brains a few days later somehow, and you always come around with some idea/solution/implementation of sorts
Me when deccer makes a random offhand comment
had to scroll really far to find that one
proof once again i am right 😛 (jk) (jk = jaker kidding)
so log2(16k) is 14 (bits), and 99% of implementations have 8 subpixel bits in the viewport
that only adds up to 22 bits
maybe there are 2 bits for the guard band for 24 bits total
then the other 8 bits are ???
__reserved
or maybe it's just 10 guard band bits
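The bit counting above, written out. The 32-bit total is an assumption read off the arithmetic in the chat (22 + 10, or 24 + 8), not something from a hardware doc.

```cpp
// Floor of log2 for a positive integer.
constexpr int Log2Floor(unsigned value)
{
  int bits = 0;
  while (value > 1)
  {
    value >>= 1;
    ++bits;
  }
  return bits;
}

constexpr int kWordBits = 32;                  // assumed fixed-point coordinate width
constexpr int kIntegerBits = Log2Floor(16384); // 16k viewport -> 14 bits
constexpr int kSubpixelBits = 8;               // typical subpixel precision mentioned above
constexpr int kLeftoverBits = kWordBits - kIntegerBits - kSubpixelBits;

static_assert(kIntegerBits == 14, "log2(16k) is 14");
static_assert(kIntegerBits + kSubpixelBits == 22, "integer + subpixel bits");
static_assert(kLeftoverBits == 10, "bits left over for guard band or reserved use");
```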
that gives the gpu a yuuge area in which it doesn't have to do clipping
which is a good thing
I should pay closer attention to those internal docs
the ones from internal.amd.com/gl/4.7/preview/?
there's no deccer at work to dogjiff.gifjif me when I forgor something
put it on your fone, quick access button thing
-100000 productivity
maybe I should just tape a pic of the dogjif to my monitor to remind me
or find one of those photo frame thingies which can play a clip
Hey cool this will be super useful. Ive seen some voxelization methods that also need this
Feature request : https://mastodon.gamedev.place/@[email protected]/110862555675970086
you're not gonna believe this, but I already have one
Neat 📸
it is not the funny number. you may not laugh
now push two commits at once
is that volumetric frog
unreal engine in shambles
fwog does a weird thing with stbi
i wonder why you decided to include it that way
hand including vendor/stb_image.cpp to all projects etc, rather than just compiling into a liblibstbi 😛
i cant trivially include <stb_image.h> in Application.cpp
jaker, fix your 🇸🇹 🇧🇮🇸🇲s immediately though
- signed, deccer
yep, that's why i have fixed it, exshrimples fetch stb and link with it
and i did a thing
I do it to make the headers visible or something
make the icon be
but the eyes follow the cursor
i use that icon for my shit already 😄
error C2664: 'char *stb_include_string(char *,char *,char *,char *,char [])': cannot convert argument 3 from 'const char [19]' to 'char *'
wth
char* accumulateDensity =
stb_include_string(Application::LoadFile("shaders/volumetric/CellLightingAndDensity.comp.glsl").data(),
nullptr,
"shaders/volumetric",
"CellLightingAndDensity",
error);
const cast time
seems to be a msvc via cmake-tools-in-vscode thing now
didnt even touch those lines
alright, vs doesnt give a shit... and the frog works
Lol
You need to use my stb fork to get a version of stb_include that is const correct
Nope. I just fixed const correctness
@golden schooner what is this (I am finally at my computer)
https://github.com/JuanDiegoMontoya/Fwog/pull/92/commits/e3570086f09aa55e2c378a836d79cd3f6fccfba3
that is why you needed const_cast
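For reference, the usual shape of the const_cast workaround when a C function takes char* but only reads from it. `legacy_length` is a hypothetical stand-in, not stb_include_string itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical legacy C-style function: takes 'char*' even though it never writes.
inline std::size_t legacy_length(char* text)
{
  return std::strlen(text);
}

// Const-correct wrapper. The const_cast is only acceptable because the callee
// is known not to modify the string; if it might write, copy into a mutable buffer instead.
inline std::size_t LegacyLength(const char* text)
{
  return legacy_length(const_cast<char*>(text));
}

// Usage: string literals are 'const char[N]', which is what the MSVC error complained about.
inline void ExampleLegacyLength()
{
  assert(LegacyLength("shaders/volumetric") == 18);
}
```

Fixing the signature upstream (or in a fork) removes the need for the cast entirely.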
explains the entire previous convo 
the version that's vendored in fwog also uses the safe c string functions, which isn't in my fork
hmm
i kind of remember now
but im also not a fan of keeping copies which go stale on day 1
i could perchance fetchcontent your fork instead of pulling it from upstream, unless stb accepts your PR
fat chance of that i think
Why do you have to fetchcontent anything
This all feels pretty orthogonal to what the PR is supposed to do
but isn't this normal practice? you lock your dependencies to a version to begin with and you fast forward them if there's a reason
imagine rerunning cmake and every time there's a chance that some web pull will maybe break your engine
using a never-upstream'd fork is not that different to keeping a submodule locked on a particular commit
it would just be fantastic if stb wrote warning free code, but thats what you get from using a 30 year old compiler?
yup, ill redo
30 year old coding practices more like
its also partially my inability to c++, vscode, vs and clion also behave different
Some (most) people are stuck doing what they are comfortable with
personally I'm starting to warm up a ton to keeping frankenstein'd forks of libs I use
that's the beauty of open source after all
why fetchcontent? because everything else is also fetchedcontented tbhed, and having loose dependencies flying around in your folders is "weird" (tm)
third_party directory
its still a mix
i like order 🙂
and i want to smear it all over the place when i can, obviously hehe
so many people use the third_party directory thing that it doesn't feel out of order to me
silently making web requests whenever I rerun cmake is out of order
if cmake worked/is used properly it shouldnt, but i understand what you mean
Ye that's something that could be changed at some point, but not in this PR. I'd have to put my other changes in my fork first
Hm
ci can it seems
but when i change #include <stb_image.h> to #include "stb_image.h" in vendor/stb_image.cpp then it works
< > means there needs to be a -I set properly
but there is none
oki, fetchcontent is gone, only thing i did to stbi is the "stb_image.h" thing
I don't get the difference between <> and ""
I thought they just changed which directories were searched first in most compilers
from my little c++ understanding is "foo" is just raw directory structure, while <foo> relies on include dirs
<stb_image.h> would work if i did -Ivendor
I think in msvc it works a little differently but you can change that if you need
i did change it 😉
it should also be same in vs, but vs do be doing something differently for some obvious raisins
let me boot into windows to just double check that it also works there still
is there some sort of "sane" default flags one should set?
for either cmake or individual compilers somehow
I'm compiling with permissive- on msvc
I have the compile flags a bit looser for gcc/clang since I don't regularly build with them
what are all the flags?
that syntax is a nightmare 😄
Ye luckily I copied it from another project so I know it works 
inshallah the structure was padded
asan also seems to be available as a definition already ASAN
or possible leaked from other dependencies which doesnt properly name it
anywho... booting into michaelsoft binbows
why are you enabling asan for debug builds 🤔
pointing at hajji
asan is a significant overhead i thought
I think there are indeed asan-related builtins in cmake
uh ye, but how much
at least 240 fps (vsync limited)
Tbh I'm not even sure if asan is being enabled correctly
I'll have to check later
is DEBUG a cmake thing
good question
i haven't seen it, nor does it seem to be mentioned in your cmakelists
CMAKE_BUILD_TYPE is and default values seem to be Debug, Release, RelWithDebInfo and MinSizeRel
this asan is gonna be fast bc its not enabled i bet
: )
mao le
and [empty string], the secret build type
where can I file this bug
https://github.com/GraphicsProgramming/RVPT/blob/master/CMakeLists.txt#L58
at thomas the train
these docs are outdated for enabling asan with cmake 💀
https://learn.microsoft.com/en-us/cpp/sanitizers/asan?view=msvc-170
you have to look at the schema and use this other thingy
https://learn.microsoft.com/en-us/cpp/build/cmakesettings-reference?view=msvc-170
nvm it still doesn't crash with this 💀
int i[5];
int main()
{
i[100] = 2;
}
oh but it crashes when I make the array have 100 elements like in their example
maybe it only works if you go barely oob
asan found one error on cleanup already
seems like a false positive since it is erroring in a non-null optional with "container overflow" 
(it wasn't a false positive in the end)
how is that not a compiletime error by default
a static analyzer can detect it
code analyzers dont prohibit the compiler from saying no
unless its confiktured to emit one and not just a hint/warning???!
its ridiculous
the C++ standard™️ says that OOB access is undefined behavior, so no diagnostic is required
smh
many things are "no diagnostics required" 
im sure its useful for some stupid embedded bs
probably
but then it should be an opt in
you can get the header of a malloc allocation by going out of bounds, if you so wish 
anyway i shouldnt complain about anything c++ related, i barely know her
my main monitor is at -something, +something not at 0,0
you get that when you arrange your desktops in various control panels... right monitor is vertical and i do be rembering that i fiddled with moving the main one around to not get stuck all the time when moving le cursor
interesting
and was wondering why my setwindowpos always made the window appear towards the top of the screen and is no longer centered 😛
since you and i use the same centering code, i took it upon me to fix it in fwoggi too
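The fix boils down to adding the monitor's origin to the centered offset. A sketch with made-up types; with GLFW the monitor rectangle would come from something like glfwGetMonitorPos plus glfwGetVideoMode, and the result would go to glfwSetWindowPos.

```cpp
#include <cassert>

struct Rect
{
  int x, y;          // monitor origin in virtual desktop coordinates (can be negative)
  int width, height; // monitor size
};

struct Point
{
  int x, y;
};

// Centers a window on a monitor whose origin is not necessarily (0, 0).
inline Point CenterOnMonitor(const Rect& monitor, int windowWidth, int windowHeight)
{
  return Point{
    monitor.x + (monitor.width - windowWidth) / 2,
    monitor.y + (monitor.height - windowHeight) / 2};
}

// Usage: a monitor placed to the left of and below the desktop origin.
inline void ExampleCenterOnMonitor()
{
  const Rect monitor{-1920, 200, 1920, 1080};
  const Point p = CenterOnMonitor(monitor, 1280, 720);
  assert(p.x == -1600 && p.y == 380);
}
```

Dropping the `monitor.x` and `monitor.y` terms reproduces the bug described above on any setup where the main monitor is not at 0,0.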
i should check erhe too probably
and send PRs to ALL!! other repos using glfw xD
I can fix it myself in frogfood lol
i should have made it into 3 PRs tbh ;P
heh idc
its good practice 🙂
i need to look into tbb, its quite annoying 🙂
Is it legal to not provide any input desc if I'm doing a bufferless FST?
no, but you need a default vao for it
wdym
: (
you say FST, i understand FST vs, where the vertices are defined there, the layout implicitly too => no vao setup needed, besides having a default vao bound
where FST = Full Screen Triangle
@robust bough or what do you mean?
I probably should have waited till I had the code running to ask but
im basically wondering if I can do this
```cpp
void renderer::composite(const glm::ivec2 &target_size) {
  Fwog::RenderToSwapchain(
    Fwog::SwapchainRenderInfo{
      .name = "Render Triangle",
      .viewport = Fwog::Viewport{.drawRect{.offset = {0, 0}, .extent = {static_cast<uint32_t>(target_size.x), static_cast<uint32_t>(target_size.y)}}},
      .colorLoadOp = Fwog::AttachmentLoadOp::CLEAR,
      .clearColorValue = {.2f, .0f, .2f, 1.0f},
    },
    [&]
    {
      Fwog::Cmd::BindGraphicsPipeline(pipelines.composite);
      Fwog::Cmd::Draw(3, 1, 0, 0);
    });
}
```
when the pipeline is just a bufferless fst
if your vs is a FST that should work
then your composite pipeline wouldnt require vertexinputdescriptions
yeah, i really should have just seen if it worked before asking 
(im having unrelated issues now)
hehe, does it?
heh
yes opengl, main is already defined for sure
#version 460 core
layout(location = 0) out vec3 v_color;
layout(location = 1) out vec2 uv;
void main()
{
vec2 position = vec2(gl_VertexID % 2, gl_VertexID / 2) * 4.0 - 1;
gl_Position = vec4(position, 0.0, 1.0);
uv = (position + 1) * 0.5;
}
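A quick CPU-side mirror of the position math in that vertex shader, to see where the three vertices land. `Vec2`, `FstPosition`, and `FstUv` are made-up names for this sketch.

```cpp
#include <cassert>

struct Vec2
{
  float x, y;
};

// Same math as the shader: vec2(gl_VertexID % 2, gl_VertexID / 2) * 4.0 - 1
inline Vec2 FstPosition(int vertexId)
{
  return Vec2{
    static_cast<float>(vertexId % 2) * 4.0f - 1.0f,
    static_cast<float>(vertexId / 2) * 4.0f - 1.0f};
}

// Same math as the shader: (position + 1) * 0.5
inline Vec2 FstUv(int vertexId)
{
  const Vec2 p = FstPosition(vertexId);
  return Vec2{(p.x + 1.0f) * 0.5f, (p.y + 1.0f) * 0.5f};
}

// Usage: vertex 0 is the bottom-left corner of clip space.
inline void ExampleFst()
{
  const Vec2 p = FstPosition(0);
  assert(p.x == -1.0f && p.y == -1.0f);
}
```

The vertices come out at (-1, -1), (3, -1), and (-1, 3), so the triangle covers the whole [-1, 1] clip-space square and the uv range inside the screen is [0, 1].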
perhaps you assigned vertexshader twice
not sure if fwog asserts proper vs/fs types
```cpp
Fwog::GraphicsPipeline renderer::create_composite_pipeline() {
  auto vertex = create_vertex_shader("assets/shaders/composite/vert.glsl");
  auto fragment = create_fragment_shader("assets/shaders/composite/frag.glsl");
  return Fwog::GraphicsPipeline{{
    .name = "Composite Pipeline",
    .vertexShader = &vertex,
    .fragmentShader = &fragment,
    .inputAssemblyState = {.topology = Fwog::PrimitiveTopology::TRIANGLE_LIST},
  }};
}
```
this is the code for creating it
oh for fucks sake
😄

i dont want to talk about it
dont copy paste code kids!
```cpp
Fwog::Shader renderer::create_fragment_shader(const std::filesystem::path &path) {
  return Fwog::Shader(Fwog::PipelineStage::VERTEX_SHADER, read_shader_source(path));
}
Fwog::Shader renderer::create_vertex_shader(const std::filesystem::path &path) {
  return Fwog::Shader(Fwog::PipelineStage::VERTEX_SHADER, read_shader_source(path));
}
Fwog::Shader renderer::create_compute_shader(const std::filesystem::path &path) {
  return Fwog::Shader(Fwog::PipelineStage::VERTEX_SHADER, read_shader_source(path));
}
```
i swear i never had these errors back when I did graphics a lot more ;-;
```
"0(8) : error C5052: gl_VertexID is not accessible in this profile\n0(8) : error C5052: gl_VertexID is not accessible in this profile\n0(10) : error C5052: gl_Position is not accessible in this profile\n"
```
```cpp
glfwInit();
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 6);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_COMPAT_PROFILE);
glfwWindowHint(GLFW_DOUBLEBUFFER, GLFW_TRUE);
handle = glfwCreateWindow(width, height, title.data(), nullptr, nullptr);
glfwMakeContextCurrent(handle);
gladLoadGL(glfwGetProcAddress);
Fwog::Initialize({});
```
did I?
you need core
i tried core before
```
"0(8) : error C5052: gl_VertexID is not accessible in this profile\n0(8) : error C5052: gl_VertexID is not accessible in this profile\n0(10) : error C5052: gl_Position is not accessible in this profile\n"
```
thats why im so confused ._.
can you confirm its not the fragment shader?
gl_Position/gl_VertexID are not allowed in fs
Fwog::Shader renderer::create_compute_shader(const std::filesystem::path &path) {
return Fwog::Shader(Fwog::PipelineStage::VERTEX_SHADER, read_shader_source(path));
}
stage is wrong here too
I've fixed it there
oh my god
i fixed it there, but rereading it made me realize what i did wrong
i quit
im never doing graphics again
Fwog::Shader renderer::create_fragment_shader(const std::filesystem::path &path) {
return Fwog::Shader(Fwog::PipelineStage::VERTEX_SHADER, read_shader_source(path));
}
Fwog::Shader renderer::create_vertex_shader(const std::filesystem::path &path) {
return Fwog::Shader(Fwog::PipelineStage::FRAGMENT_SHADER, read_shader_source(path));
}
😛
im sorry i've failed you
lel yeah
now to see if i can get a compute shader working
(something tells me im about to fail horribly)
@heavy cipher what was the issue you had with std::source_location in msvc?
I was thinking about upgrading my exceptions
@long robin how is this meant to be avoided 
are you using my enet fork
ye
I see
tbh i didn't even remember until i went to source and saw your name lmao
yeah I put my name in there so I'd remember where I made changes lol
i can just ifndef it but perhaps that is not "correct"
yeah it's kinda tricky
I think the most robust way would be to ifndef it in enet.h, then later undef it only if enet.h def'd it
#ifndef NOMINMAX
#define NOMINMAX
#define UNDEF_NOMINMAX
#endif
#include <Winsock.h> // or whatever windows header
#ifdef UNDEF_NOMINMAX
#undef UNDEF_NOMINMAX
#undef NOMINMAX
#endif
nice and symmetrical
mmm i might just undef it when including enet
I can update my fork too
i don't really care to not have it defined anywhere 
if you come up with a solution, let me know
otherwise drop an issue in the enet fork so I don't forgor
re Fwog::Cmd::DispatchInvocations can i nitpick to distract you a little?
whats the difference to ::Dispatch?
it takes number of invocations
if this has been inspired by vuk
normal dispatch takes # of workgroups
ah its an actual thing
DispatchInvocations sounds like DispatchIndirect
ah its not

It's just sugar for glDispatchCompute((x + w - 1) / w, ...)
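the rounding in that expression is a plain ceiling division. a minimal sketch of the idea, assuming a made-up `Extent3D` struct and helper names (this is not Fwog's actual implementation):

```cpp
#include <cstdint>

// Hypothetical stand-in for an extent type; the field names are assumptions.
struct Extent3D { std::uint32_t width, height, depth; };

// Ceiling division: how many groups of `groupSize` are needed to cover
// `invocations` items. Assumes groupSize > 0 and that the sum does not overflow.
constexpr std::uint32_t DivideRoundUp(std::uint32_t invocations, std::uint32_t groupSize)
{
    return (invocations + groupSize - 1) / groupSize;
}

// Converts an invocation count per axis into the workgroup count per axis
// that a plain dispatch call would expect.
constexpr Extent3D WorkGroupsFor(Extent3D invocations, Extent3D localSize)
{
    return {DivideRoundUp(invocations.width, localSize.width),
            DivideRoundUp(invocations.height, localSize.height),
            DivideRoundUp(invocations.depth, localSize.depth)};
}
```

since the count rounds up, the last workgroup can run past the end of the data, so the shader still needs its own bounds check.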
ye i saw the code
something came to me in a dream
I was thinking about the fact that descriptors in metal are called arguments, and I realized that we should be able to call our shaders like regular (albeit async) functions
so basically cuda lol
possibly unhinged idea: custom build step that reflects shaders and generates a header with declarations that allow you to call the shader like a function
if we restrict it to compute shaders, a shader like this one:
// FooShader
layout(binding = 0) buffer FooBuffer {
float data[];
};
void main() {}
would generate the following declaration:
void FooShader(Extent3D numWorkGroups, Buffer& FooBuffer);
the shit part is that, since it's not single-source, definitions and struct declarations would still not be shared automagically
I guess if you're willing to go as far as to make a custom build step, you might as well just make a C++ dialect 
this sounds like something martty would say
you pick things up when you live in people's walls
all roads lead to writing custom C++ -> shaders
https://ebitengine.org/en/documents/shader.html
Ebitengine has custom shaders written in Go-like language. It's kinda impressive
It's written from scratch, btw
(but this is a bit of a different problem, yeah :D)
But still, imagine coding shaders in C++ and calling them directly 
cute logo
it is similar though, thanks for sharing
Ebitengine's dev and some guy also made it so you can call C functions in Go without cgo (which is too magical) and it's really impressive 😄
https://github.com/ebitengine/purego
This. So much hackery
why are my dreams building sized frogs shooting lasers and me eating a melon
I wish I had dreams like this 
Gonna post it here too
geryu is right, have you seen my spirv templates
Yes and they're somewhat disturbing
But very cool nonetheless
iirc there are some inherent restrictions with the method
thank you, thats high praise
Like all good template metaprogramming of course
after more than a year, my typo correction pr was merged on the refpages

Does fwog have abstractions for array textures and cubemaps?
Want to see how this can be done 
Found it: https://github.com/JuanDiegoMontoya/Fwog/blob/3aaf7ea8afbee789b44bd871237cd076bc3f3afa/src/Texture.cpp#L99
Noice
I see so many things I can’t abstract away as neatly without going full modern OpenGL 
yes
you just fill in TextureCreate/TextureUpload accordingly
ah lol
i was too late
Yes
It's part of the texture class
weird how github never picks the default branch when copying source code links by default
Isn't it good, so your links don't decay when the branch is updated
It does make the links ugly though
It’s good that it does it, because without it, 99% of old code links would be broken, which is pretty infuriating when you read old issues
Tbh you could make a similar abstraction with just 3.3 features
At least for the texture and buffer classes
No texture views though, which would hurt
Yeah, it’s just the binding part that requires a bit more work, since no glBindTextureUnit
But that’s pretty easy to do
it's just two function calls instead of one if you don't have that
I just have this "texture" class which only abstracts "textures" (mostly diffuse) and I do array/cubemaps by hand
Also I don’t want to write stuff like "shader.bindCubeMap("skybox", tex, slot)" anymore
A thing I see in texture abstractions that mildly triggers me are constructors that take a path
no you are right
Even worse, the only non-move constructor taking a path
textures should be created from buffers 🙂
Vulkan moment
Fwulkan moment
I'm going to call my Vulkan library Fuk
that was my plan already xD
Well, I have
void loadFromFile(const std::filesystem::path& path);
But I think it’s better to make a free/static function which returns a texture and loads it from file
a method factory for Texture is also ok, which consumes a path
static Texture Texture::LoadFromFile(...)
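a rough sketch of that shape: the constructor only takes pixels that are already in memory, and the file path goes through a named factory. everything here is hypothetical, including the tiny raw file format (two native-endian uint32 values for width and height, then RGBA8 pixels), which just stands in for a real image decoder:

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <ios>
#include <optional>
#include <utility>
#include <vector>

class Texture
{
public:
    // The constructor never touches the filesystem: it takes decoded pixels.
    Texture(std::uint32_t width, std::uint32_t height, std::vector<std::uint8_t> rgba)
        : width_(width), height_(height), pixels_(std::move(rgba)) {}

    // Named factory: file I/O lives here and failure is reported as an empty optional.
    static std::optional<Texture> LoadFromFile(const std::filesystem::path& path);

    std::uint32_t Width() const { return width_; }
    std::uint32_t Height() const { return height_; }
    const std::vector<std::uint8_t>& Pixels() const { return pixels_; }

private:
    std::uint32_t width_;
    std::uint32_t height_;
    std::vector<std::uint8_t> pixels_;
};

// Reads the made-up raw format described above (an assumption for this sketch).
inline std::optional<Texture> Texture::LoadFromFile(const std::filesystem::path& path)
{
    std::ifstream file(path, std::ios::binary);
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    if (!file.read(reinterpret_cast<char*>(&width), sizeof width) ||
        !file.read(reinterpret_cast<char*>(&height), sizeof height))
    {
        return std::nullopt; // missing file or truncated header
    }

    std::vector<std::uint8_t> pixels(static_cast<std::size_t>(width) * height * 4);
    if (!file.read(reinterpret_cast<char*>(pixels.data()),
                   static_cast<std::streamsize>(pixels.size())))
    {
        return std::nullopt; // truncated pixel data
    }
    return Texture(width, height, std::move(pixels));
}
```

the upside is that procedural or downloaded textures use the same constructor, and a failed load does not have to throw from a constructor.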
I also recently abstracted away default FBO and made it look like a normal frame buffer… made me feel good 😂
No more "bind fbo 0" now.
we do it via BeginRenderToFramebuffer(someStruct)/EndRender
and BeginRenderToSwapchain/EndRender, which renders to fb0
No, I need to suffer through it all to fully understand it 
fair 🙂
Btw, about abstractions… what does fwog do on render passes? Does it reset all state on each pass?
E.g. clear scissor (if not specified), disable blend, disable depth testing, disable cull, etc.
That’s the thing I don’t really like about my code now - constantly have to keep in mind if the rasterizer state is correct
And it’s easy to remember to enable depth/blend where you need it (and I do it explicitly), but for more "rare" stuff (like scissor), I turn it on and then off
yes
you describe your state into the struct you pass into BeginRenderXXX
thats what it will be set to for the time being within BeginRender/EndRender
there is not really a "turn off again after it was turned on" thing
each pass will just set what it was set up for in said descriptor
and explicitly turned on/off at certain corner cases (framebuffer srgb ism for instance)
Yeah, I need to make some more explicit "render pass" descriptions and objects which control it
Because right now my code is too stateful at times
(But the main renderer is immediate, you add draw commands and lights to it on each frame, and then call "draw" - makes it so much easier to manage)
fwog internally tracks state so a minimal number of state calls have to be made when a render pass begins or a pipeline is bound
the way I track state right now is kinda stupid and will lead to terrible bugs with certain usage patterns that I don't currently do, but I really need to fix it
https://github.com/JuanDiegoMontoya/Fwog/issues/90
what I plan to implement is a generic state-tracking wrapper below fwog, then fwog will call that to set state
hmm I might implement it as a separate library
Yeah, I guess you can have a "global" context which keeps track of current state and changes it to a new render state if necessary
anyways, it will allow me to efficiently do this without having state tracking clutter the impl
void BindPipeline(Pipeline pipeline)
{
state.glEnable(pipeline.someState1);
state.glEnable(pipeline.someState2);
state.glEnable(pipeline.someState3);
state.glEnable(pipeline.someState4);
}
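a minimal sketch of what such a state-tracking layer could look like, assuming a made-up `StateCache` class and `Capability` enum. the real GL call is replaced by a counter so the sketch runs without a context:

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical subset of toggleable GL capabilities.
enum class Capability : std::size_t { DepthTest, Blend, CullFace, ScissorTest, Count };

class StateCache
{
public:
    // Returns true if a driver call was actually needed.
    bool SetEnabled(Capability cap, bool enabled)
    {
        auto& known = enabled_[static_cast<std::size_t>(cap)];
        if (known.has_value() && *known == enabled)
        {
            return false; // redundant request, skip the driver call
        }
        known = enabled;
        ++driverCalls_; // a real wrapper would call glEnable/glDisable here
        return true;
    }

    std::size_t DriverCalls() const { return driverCalls_; }

private:
    // Empty optional = state unknown, so the first request always reaches the driver.
    std::array<std::optional<bool>, static_cast<std::size_t>(Capability::Count)> enabled_{};
    std::size_t driverCalls_ = 0;
};
```

with something like this underneath, a pipeline bind can request every piece of state unconditionally and only the differences turn into GL calls.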
ye it's pretty typical in engines that use GL afaik
Btw, have you seen any engines which have similar abstractions around GL? Most games I’ve seen seemed to do a lot of stuff manually
I might slim things down by removing my vao and framebuffer caches too
no, most of this is just speculation
and most (all?) engines that use GL I've looked at do not use it in a very good way lol
or let's say, "modern"
I guess I’m already far ahead if I start using fwog as an inspiration 
At least I stopped unbinding vaos/textures 
you can have just one vao and set the attrib format and whatnot whenever a pipeline is bound
likewise, you can have just one fbo and bind render targets when a pass begins
i also believe our engines do a better job at abstracting opengl than anything else ive seen, other engines just wrap buffer/texture/shader and thats it
Huh, I thought attaching textures to different fbos ahead of time and not changing it was needed for performance, but I’m probably wrong? 
Also I have a "mesh" abstraction, which has vao+vbo/ebo for each mesh, which is also a pretty GL-beginner territory, probably
it's not bad for 3.3
yeah its fine
the only thing I'd change for moderner gl is to not put a vao in every mesh
that's approaching
territory, but yeah that's what I'd currently do
and use an indirection to handle multiple meshies
This makes me think… FBO is probably just a "current state" thing, right? Kinda like a shader program, where you constantly set uniforms (if not using UBOs), but with attachments as outputs
Some day, some day… I implemented frustum culling at least 💪
yeah who knows if drivers just cache each unique fbo combination or if each fbo corresponds to some low-level object in the driver
I doubt it's the latter
fwog kinda attempts to make performance more predictable by restricting the user in certain ways (you can only set state with Render and BindPipeline calls), but if you want a guarantee you would just use vulkan or d3d12
I guess you can really have only two fbos for your entire program (one for reading, one for writing) and swapping between them when needed
That’s a problem for a future bikeshedding, though 😂
true, I forgot about read and write fbos, but that's only needed for blitting, no?
you could just make two temp fbos in your blit function
Yeah. 99% of the time I only write, not read
Why even two? One would probably do, and the other one will be the "main" one
just to keep things shrimple I guess, but yeah that'd work
I also learned about fb blitting only this week 
it's a dumb name for what is essentially a copy
though it behaves more like a fullscreen tri draw that samples the read framebuffer
Yeah, I'd only previously seen blit as a "draw to screen, swap backbuffers" thing
And I copied contents of FBOs via shaders 
Speaking of abstractions… I see so much student code (sometimes pretty impressive stuff) with 0 abstractions, haha
"Download zip of my project" ones especially
Just raw gl calls, nothing else
I did one renderer with raw (modern) gl and it wasn't too horrible
same
a little space esque (ofc) scene, with a ring of "cubes" and some jank 6dof thing driven by physx
it sucks having to constantly remember the state when you add some new functionality lol
https://github.com/JuanDiegoMontoya/GLest-Rendererer/blob/main/glRenderer/Renderer.cpp
the fact that fwog lets me use local reasoning when adding a new pass is probably the biggest productivity boon
You have a shader abstraction there at least 
(But I think most people at least do it to keep their sanity)
I also have a really bad texture abstraction in there
I think that's like the bare minimum you need
my text renderer thingy just has a Pipeline and Image, and that's about as far as I'll touch GL
I wish there were books that went beyond explaining the basics of OpenGL and taught you how to build a renderer. I guess "3D Game Engine Architecture" is pretty close, but it feels very outdated, especially how it relies on scene graphs
at that point you're beyond API-specific guides
something like this
https://blog.mecheye.net/2023/09/how-to-write-a-renderer-for-modern-apis/
I was wondering why this article was so good, but then I noticed it’s Jasper 
Also was wondering why I never found it before. sends this article to myself from 5 years ago
Well, I don’t even need API specific guides anymore… the problem is that no one writes stuff like this even without being API specific. It’s all "sort your draw calls! merge geometry together!", but no one shows how to do it efficiently and not-stupidly 
do people tell you to sort your draw calls in 2023
https://realtimecollisiondetection.net/blog/?p=86
Damn, this article is 15 years old now
yep I knew you were thinking of that one
I have it bookmarked somewhere but I think it was irrelevant even before I started graphics stuff
Yeah, but you get my point. Deferred shading is the closest to the "structure" that was given to me, and I’m grateful that at least it helped me get on the right track
yeah it's kinda a too-optimized way to sort draws
like bro your draws can be more than 64 bits of data
std::sort will get you 90% of the way there
and really the only thing you need to sort is your alpha stuff at the end of the day, assuming you're not doing OIT
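a minimal sketch of the "just use std::sort" approach, assuming a made-up `DrawCommand` struct: opaque draws are grouped by state, and transparent draws go last, sorted back to front.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-draw record; the fields are assumptions for this sketch.
struct DrawCommand
{
    std::uint32_t pipelineId;
    std::uint32_t materialId;
    float viewDepth;  // distance from the camera, larger = farther away
    bool transparent;
};

inline void SortDraws(std::vector<DrawCommand>& draws)
{
    // Opaque draws first, transparent draws at the end.
    const auto firstTransparent = std::stable_partition(
        draws.begin(), draws.end(),
        [](const DrawCommand& d) { return !d.transparent; });

    // Opaque: group by pipeline, then material, to reduce state changes.
    std::sort(draws.begin(), firstTransparent,
              [](const DrawCommand& a, const DrawCommand& b) {
                  return std::tie(a.pipelineId, a.materialId) <
                         std::tie(b.pipelineId, b.materialId);
              });

    // Transparent: back to front so blending composites correctly.
    std::sort(firstTransparent, draws.end(),
              [](const DrawCommand& a, const DrawCommand& b) {
                  return a.viewDepth > b.viewDepth;
              });
}
```

no packed 64-bit keys needed; the comparator can look at whatever fields the draw has.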
sorting draws is still useful if you have an engine that doesn't do hardcore batching
and it's super easy to implement
I'd argue other constructs should naturally cause things to be grouped
with pretty much everything except alpha passes you have stuff you might want to be "together" but not necessarily "in order"
Are there any good resources on batching, btw?
I get the idea, but the implementation seems really tricky
how hard do you wanna go with batching? like full MDI?
the vkguide compute culling section is probably good for that
there are different levels of batching
the easiest one is just instancing the same mesh
the 2ndiest is merging geometry
and using an indirectbuffer to point at the elements + gl(Multi)DrawIndirect
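for the merged-geometry level, a rough sketch of filling the indirect buffer on the CPU. the command struct follows the five-field layout that glMultiDrawElementsIndirect reads; `MeshRange` and `BuildIndirectCommands` are made-up names, and using the mesh index as baseInstance is just one possible convention:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Field layout consumed by gl(Multi)DrawElementsIndirect.
struct DrawElementsIndirectCommand
{
    std::uint32_t count;         // number of indices to draw
    std::uint32_t instanceCount; // number of instances
    std::uint32_t firstIndex;    // offset into the merged index buffer, in indices
    std::int32_t  baseVertex;    // added to every index before fetching the vertex
    std::uint32_t baseInstance;  // first instance ID
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must be tightly packed");

// Hypothetical description of one mesh before merging.
struct MeshRange
{
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// Assumes all meshes were appended, in order, to one vertex and one index buffer.
inline std::vector<DrawElementsIndirectCommand> BuildIndirectCommands(
    const std::vector<MeshRange>& meshes)
{
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(meshes.size());

    std::uint32_t firstIndex = 0;
    std::int32_t baseVertex = 0;
    for (std::size_t i = 0; i < meshes.size(); ++i)
    {
        // baseInstance = mesh index, so a shader could look up per-draw data with it.
        commands.push_back({meshes[i].indexCount, 1u, firstIndex, baseVertex,
                            static_cast<std::uint32_t>(i)});
        firstIndex += meshes[i].indexCount;
        baseVertex += static_cast<std::int32_t>(meshes[i].vertexCount);
    }
    return commands;
}
```

the resulting array is what would get uploaded to the indirect buffer and drawn with a single multi-draw call.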
That’s assuming you have tons of repeating meshes. Which I probably won’t, tbh
the current complexity of your game world is probably still alright for good old plain glDrawElements
yeah if anything that sounds more like a job for fairly naive culling
especially since you have a fixed camera angle right
Yeah… it’s not like I’ll be drawing huge cities any time soon 😂
but when you add shadows, you might want to have the gpu tell the gpu what to draw, so you merge all geometry, feed it to a compute shader, have the cs cull all the things, and what is not culled is written into the indirect buffer
you could probably just chunk your map on the CPU
and that would be more than good enough
Okay, I think I’ll just do render passes and stop bikeshedding for now 
or you could convert it all to meshlets and implement compute culling
I definitely know which one is more fun
doesnt have to be meshlets yet, as in visbuffer meshlets, but that would probably be the next level
No cs on 3.3 
i think you can do something like that in a fragment shader as well
but yeah 🙂
gl46 my beloved
indirect draw is also a nono on gl33 iirc
yeah, your world is definitely in a very good state for simple CPU-side hints to go a long way though
ah no, its an extension in 3.1
I should do a multi-part writeup of ff's renderer at some point
in collaboration with lvstri of course
Final Fantasy’s? 
ye
Which one?
FF:F
I had an idea while ponderin me orb
related to a rant I had a day or two ago
but wouldn't it be neat if you could invoke a pipeline with one function call, then you just provide all the bindings as literal function arguments
to make it more like normal programming and reduce boilerplate
Need some code examples 
uh right
let's do hello tringle
right now, hello tringle in fwog is something like
Fwog::RenderToSwapchain({
.name = "Render Triangle",
.viewport = Fwog::Viewport{.drawRect{.offset = {0, 0}, .extent = {windowWidth, windowHeight}}},
.colorLoadOp = Fwog::AttachmentLoadOp::CLEAR,
.clearColorValue = {.2f, .0f, .2f, 1.0f},
},
[&]
{
Fwog::Cmd::BindGraphicsPipeline(pipeline);
Fwog::Cmd::BindVertexBuffer(0, vertexPosBuffer, 0, 2 * sizeof(float));
Fwog::Cmd::BindVertexBuffer(1, vertexColorBuffer, 0, 3 * sizeof(uint8_t));
Fwog::Cmd::Draw(3, 1, 0, 0);
});
let's also pretend that manual vertex pulling is the only thing that exists
so you have to provide vertices via ssbo
Ideally, I'd want to write something more like
sceneTexture.Clear(.2f, .0f, .2f, 1.0f);
pipeline.Draw(3, 1, 0, 0, viewport, vertexPosBuffer, vertexColorBuffer);
I mean the API looks kinda unhinged for draw calls, but I think for compute it would be more streamlined since the required state is much smaller
computePipeline.Dispatch(dispatchDims, someBuffer1, someBuffer2, someImage);
which is more CUDA-like
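a toy sketch of how such a call could map arguments to bindings, with mock types that only log what they would bind. `Buffer`, `Image`, `Extent3D` and `ComputePipeline` here are stand-ins, not Fwog's classes, and the mapping rule (buffers and images each get consecutive binding indices in argument order) is an assumption:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Mock resource handles; a real version would wrap GL object names.
struct Buffer { std::uint32_t id; };
struct Image { std::uint32_t id; };
struct Extent3D { std::uint32_t width, height, depth; };

class ComputePipeline
{
public:
    // Binds every resource argument in order, then records the dispatch.
    template <typename... Resources>
    void Dispatch(Extent3D groups, const Resources&... resources)
    {
        std::uint32_t bufferSlot = 0;
        std::uint32_t imageSlot = 0;
        // Comma fold: evaluated left to right, one Bind call per argument.
        (Bind(resources, bufferSlot, imageSlot), ...);
        log.push_back("dispatch " + std::to_string(groups.width) + "x" +
                      std::to_string(groups.height) + "x" + std::to_string(groups.depth));
    }

    std::vector<std::string> log; // stands in for the GL calls a real version would make

private:
    void Bind(const Buffer& buffer, std::uint32_t& bufferSlot, std::uint32_t&)
    {
        log.push_back("buffer " + std::to_string(buffer.id) + " -> binding " +
                      std::to_string(bufferSlot++));
    }

    void Bind(const Image& image, std::uint32_t&, std::uint32_t& imageSlot)
    {
        log.push_back("image " + std::to_string(image.id) + " -> binding " +
                      std::to_string(imageSlot++));
    }
};
```

overload resolution picks the right bind per argument type, so passing something that is not a resource fails to compile. what it cannot check without shader reflection is whether the argument order matches the GLSL declarations.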
Maybe can even make this overload:
pipeline.Draw(viewport, vertexPosBuffer, vertexColorBuffer); // 1 instance, use all vertices
while this looks not-ideal for perf (since it seems like redundantly setting state), I'm at the point where I make few enough API calls that idgaf about API call overhead, plus the abstraction can deduplicate stuff automagically
I don't think there's an easy way to determine how many vertices there are unless you expose fixed-function vertex fetching which I'd rather not do
mostly because it complicates things lol
Can't you store it in buffer structs?
to the API it would just be an untyped buffer
it wouldn't know how big a vertex is or even if the buffer held vertices
Hmm, well, maybe it can be like this at least:
pipeline.Draw(3, viewport, vertexPosBuffer, vertexColorBuffer);
since there's no point of using instances everywhere and having to write "1, 0, 0" everywhere might be a bit annoying
In my game, I store number of vertices next to EBO/VBO so I don't have to manually pass the count to glDrawElements
fair
yeah in fwog I only expose the best draw calls
which means glDrawElementsInstancedBaseVertexBaseInstance, glMultiDrawElementsIndirect, glMultiDrawElementsIndirectCount, and the non-indexed variants
glDrawElementsInstancedBaseVertexBaseInstance what Vulkan-speak is this... 
it's vkCmdDrawIndexed
it lets you specify:
- indexCount is the number of vertices to draw.
- instanceCount is the number of instances to draw.
- firstIndex is the base index within the index buffer.
- vertexOffset is the value added to the vertex index before indexing into the vertex buffer.
- firstInstance is the instance ID of the first instance to draw.
opengl just makes it look scary because it's opengl
Yeah, I was joking because the function name is so long, but Vulkan's is surprisingly shorter
the difficult bit with creating a high level interface is getting it right
when you depart from the wrap to greenfield the decisions are not obvious:/
yeah the neat thing about cuda is that a kernel really is just a main function that you invoke and pass some arguments to
with graphics pipelines it's way more complicated
you could do compoot first
I suppose I could make such an abstraction for compute pipelines as a start, but there are still some things I'd have to 'shed since it's not truly single-source
like deciding how function arguments are mapped to glsl resource declarations
join me brother for my spirv template quest
for gl it just needs to be spirv crossed back
gl can eat spirv directly
but aren't there some fundamental limitations to the template approach or am I trippin
yeah and i can eat glass shards too
well, is there a shader complexity limit with this approach
I mean, it's probably turing complete, so I guess what I'm asking is if it's sane to write complex shaders with it
the key is to throw sanity in the bin at the first opportunity
but i'd say the tricky bit is control flow
yeah I have no idea how you would express that in a way that feels like normal programming to use
gimme uhh
branch(condition, trueExec, falseExec)
doesn't shady allow unstructured control flow? you could just make the programmer write it as a linear stream of instructions and have them use goto
shrimple solutions for clamplex problems
uhh, can't parse that
yea
gob certainly didn't make it easy to find his thing on google
but iirc the IR for it has unstructured control flow
anyways I'm just memeing so it doesn't matter
bueno
anyways, i too was meming
i am not sure how feasible it is to do full spirv
the effort skyrockets
whereas "gpu lambdas" is reasonable
yeah that was basically my concern
I do have a few places where that would be handy, but ideally I'd like my hot new shading lang system to support everything (that I deem to be useful)
maybe I could cope by telling myself that I'm actually using a cool functional language when I write spirv templates
What does fwog provide?
in general? or with respect to the ongoing convo?
gotcha
I'd refer to the docs or the github
https://fwog.readthedocs.io/en/latest/
https://github.com/JuanDiegoMontoya/Fwog
tl;dr: it wraps OpenGL 4.6 in a modern C++ API that is less painful to use than plain GL
there are examples in the repo
GL_DONT_CARE 