#DARE (Danny's Awesome Rendering Engine)
Nope
Wrap them within a reference counted handle then
And only drop it when the last one gets dropped
wait
```rust
// Dropping the handle will decrease the reference count of that handle
// If we drop the last valid handle, then the stored value will get dropped
impl<T: 'static> Drop for Handle<T> {
    fn drop(&mut self) {
        // If the counter reaches 0, it means that we must drop the inner value
        if unsafe { self.decrement_count() } == 0 {
            self.trackers.dropped.lock().push(self.key);
            self.trackers.cleaned.store(false, Ordering::Relaxed);
        }
    }
}
```
this was my handle system for example
let me cook
Vec<Weak<T>>
we can put handles where the Weak<T> no longer exists as free
thus eliminating the need to hold a ref to the original
but it would also imply
i would need to iterate over the entire data vec every time i wanna add
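A minimal sketch of the `Vec<Weak<T>>` idea, assuming a hypothetical `WeakTable` type (not from anyone's engine here). A slot counts as free once its `Weak` has no strong references left, and finding one is the linear scan mentioned above.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical table where a slot is considered free once its `Weak` is dead.
pub struct WeakTable<T> {
    slots: Vec<Weak<T>>,
}

impl<T> WeakTable<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new() }
    }

    /// Store a weak ref to `value` and return the slot index it landed in.
    /// Looking for a dead slot walks the whole vec in the worst case.
    pub fn insert(&mut self, value: &Arc<T>) -> usize {
        let weak = Arc::downgrade(value);
        // strong_count() == 0 means every Arc is gone, so the slot is reusable.
        match self.slots.iter().position(|w| w.strong_count() == 0) {
            Some(index) => {
                self.slots[index] = weak;
                index
            }
            None => {
                self.slots.push(weak);
                self.slots.len() - 1
            }
        }
    }

    /// Upgrade the slot back to an `Arc`, or `None` if the value was dropped.
    pub fn get(&self, index: usize) -> Option<Arc<T>> {
        self.slots.get(index).and_then(Weak::upgrade)
    }
}
```

Keeping a separate free list would avoid the scan, but then something has to notice the drop, which is what the handle `Drop` impl above was doing.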
wait so the raster pipeline permutation problem, is that mainly because of shaders?
also i assume if im storing my pipelines, it should be stored in the material struct, yeah?
okay, when uploading a fuck ton of floats
can i just do
```glsl
layout (buffer_reference, std430) readonly buffer Vec3Array {
    vec3 vectors[];
};
layout (buffer_reference, std430) readonly buffer Surface {
    Material material;
    mat4 transform;
    uint bit_flags;
    uint _padding;
    Vec3Array vertices;
    Vec3Array indices;
    Vec3Array normals;
    Vec3Array tangents;
    Vec3Array uv;
};
```
Without having to worry about padding for my stuff like vertices
or am i cooked
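For what it's worth, under std430 a `vec3` is still aligned to 16 bytes, so `vec3 vectors[]` has a 16-byte array stride and tightly packed 12-byte floats from the CPU will not line up. A rough CPU-side sketch of the padding, assuming hypothetical names `Std430Vec3` and `pad_for_std430` (the alternatives are declaring `float data[]` and indexing by `3 * i` yourself, or enabling scalar block layout if the device supports it):

```rust
use std::mem::size_of;

/// Tightly packed positions, as a mesh loader would typically hand them over.
type PackedVec3 = [f32; 3];

/// One element of a `vec3 vectors[]` array under std430: 12 bytes of data
/// plus 4 bytes of padding, which gives the 16-byte array stride.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Std430Vec3 {
    pub xyz: [f32; 3],
    pub _pad: f32,
}

/// Stride the shader will use when stepping through `vectors[]`.
pub const STD430_VEC3_STRIDE: usize = size_of::<Std430Vec3>();

/// Expand packed vec3 data to the padded layout before uploading it.
pub fn pad_for_std430(packed: &[PackedVec3]) -> Vec<Std430Vec3> {
    packed
        .iter()
        .map(|&xyz| Std430Vec3 { xyz, _pad: 0.0 })
        .collect()
}
```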
Dude how tf do you make asset loading faster
multithread it
use a runtime like tokio and just spawn a task for each (sub)asset
it goes brrrr
problemo no. 2
i load too many assets at once
and blue screened my OS

but also turns out tokio has semaphores so i can just use those to manage max amount of memory
pretty sure you can also just limit the amount of concurrent worker threads
but yeah thats not good
i just used tokio semaphores
i made 1 semaphore = 1 byte
then just have each asset loading acquire the semaphores
worst case, it'll wait
idk how to make it cleaner though without introducing another async crate 
Seems pretty sensible
Just need to make sure you don't end up with a deadlock somehow
Like
total space: 1000b
asset a: needs 300b -> 700 left
asset b: needs 500b -> 200b left
asset a: during loading, recursively needs 250 more bytes -> cannot be granted, wait
asset b: during loading, suddenly needs 100 more bytes -> cannot be granted, wait
deadlock
many ways this could happen, for example a gltf file loading in mesh data first and then resolving paths to textures and starting those
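One way around the hold-and-wait pattern above is to reserve an asset's whole worst-case footprint in a single all-or-nothing request, so nothing ever waits while already holding bytes. A minimal std-only sketch, assuming a hypothetical `Budget` type as a stand-in for the tokio semaphore (this is not tokio's API):

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical byte budget; a stand-in for the semaphore in the chat.
#[derive(Clone)]
pub struct Budget {
    free: Arc<Mutex<usize>>,
}

/// Bytes held by one asset; they go back to the budget on drop.
pub struct Reservation {
    bytes: usize,
    free: Arc<Mutex<usize>>,
}

impl Budget {
    pub fn new(total: usize) -> Self {
        Self { free: Arc::new(Mutex::new(total)) }
    }

    pub fn available(&self) -> usize {
        *self.free.lock().unwrap()
    }

    /// All-or-nothing and non-blocking: either the whole request fits now,
    /// or the caller gets `None` and can requeue the asset for later.
    pub fn try_reserve(&self, bytes: usize) -> Option<Reservation> {
        let mut free = self.free.lock().unwrap();
        if *free < bytes {
            return None;
        }
        *free -= bytes;
        Some(Reservation { bytes, free: Arc::clone(&self.free) })
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        *self.free.lock().unwrap() += self.bytes;
    }
}
```

With the numbers above, asset a would ask for 550 bytes up front instead of 300 and later 250. The catch is that a glTF has to know (or overestimate) the size of its textures before it starts.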
mmm
yeah deadlocking could be problematic
that being said if it dead locks
skill issue frfr
duality

plan numero 2
mpsc
we make a giant fucking queue
which contains information about assets we need to process
this itself issues out new processes
to read the data
when 1gb of data is read, it will all be submitted to the main thread to copy the data to the gpu by issuing copy commands
🙏
that should prevent the bottleneck of having to create a gazillion different transfer buffers
making those transfer buffers on the fly isnt too bad because you end up suballocating from a real buffer with gpu alloc
the biggest delay is memcpy/buffercopy
which you can just do easily off thread
and of course reading the file is slow as hell
i think if you send a whole gb to the main thread to process you'll end up with spikes
MMMMMMM
true
i mean i guess i'll just have tasks
handle loading assets
making transfer buffer
copying to transfer buffer
then main thread's only job is to literally just submit
ill still need to chunk it
so i don't keep literally double the scene memory
I thought submission had to be on a single thread
access to a queue needs to be externally synchronized
So if you have a queue for all your transfer operations you can just lock it, submit, unlock
Nothing has to be single threaded with vk, you just need to synchronize yourself
Well considering all my submissions do not access the same data
I don’t think I would need to sync?
I would only need to ensure all submitted commands are completed before moving on
The VkQueue access needs to be synced
It writes to some internal memory that's not synchronized by the API
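The lock, submit, unlock pattern can be sketched without any Vulkan at all. `FakeQueue` below is a made-up stand-in for a `VkQueue`: its `submit` takes `&mut self`, which is roughly what "externally synchronized" means in Rust terms, and the `Mutex` is what lets several loader threads share it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for a VkQueue. `submit` needs exclusive access.
pub struct FakeQueue {
    submitted: Vec<u32>,
}

impl FakeQueue {
    fn submit(&mut self, batch_id: u32) {
        self.submitted.push(batch_id);
    }
}

/// Spawn `thread_count` workers that each submit one batch to a shared queue.
/// Returns the submitted batch ids, sorted.
pub fn submit_from_threads(thread_count: u32) -> Vec<u32> {
    let queue = Arc::new(Mutex::new(FakeQueue { submitted: Vec::new() }));

    let workers: Vec<_> = (0..thread_count)
        .map(|id| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                // Recording would happen here without the lock held;
                // the lock only wraps the submit itself.
                queue.lock().unwrap().submit(id);
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }

    let mut ids = queue.lock().unwrap().submitted.clone();
    ids.sort();
    ids
}
```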
Ahhhh
Would I need to wait for the submission fence to be completed prior to submitting another command?
huh alright
uh 1 command pool per thread or
yeh
cannot record to command buffers from the same pool on different threads
since recording a command buffer may modify some internal memory in the pool
finally, multithreaded asset loading
it's weird however
it seems like the gpu transfers all buffers at once
ah it was because i was deleting my command buffers and pools before 
Hey, how would i generate mipmaps using a shader?
raw vk, no confusion
penguin, why did you make vk::Queue Arc<Mutex<T>> when you're not even cloning it
also i should probably make some sort of "QueuePool" struct or smth so i can simply yoink available queues
hmmmmmmmm
i dont member
maybe ancient
I gave up compute shaders
mainly cuz i didn't know it well enough
so i just used blits
that works too
compute shaders allow you to use slightly better filtering methods
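Whether the downsample is a blit or a compute dispatch, the bookkeeping is the same: a full chain has floor(log2(max(w, h))) + 1 levels and each level halves the extent, clamped to 1. A small sketch with hypothetical helper names:

```rust
/// Number of mip levels in a full chain: floor(log2(max(w, h))) + 1.
pub fn mip_level_count(width: u32, height: u32) -> u32 {
    let largest = width.max(height).max(1);
    32 - largest.leading_zeros()
}

/// Extent of a given mip level. Each level halves, clamped to 1 texel.
pub fn mip_extent(width: u32, height: u32, level: u32) -> (u32, u32) {
    let w = width.checked_shr(level).unwrap_or(0).max(1);
    let h = height.checked_shr(level).unwrap_or(0).max(1);
    (w, h)
}

/// (source extent, destination extent) for every downsample step,
/// i.e. one blit or one dispatch per entry.
pub fn mip_chain(width: u32, height: u32) -> Vec<((u32, u32), (u32, u32))> {
    (1..mip_level_count(width, height))
        .map(|level| {
            (
                mip_extent(width, height, level - 1),
                mip_extent(width, height, level),
            )
        })
        .collect()
}
```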
(You could’ve just copy pasted mine)
yea i could've but i wanna understand my code 
im shocked i do
my prof does like image convolution or smth and he was hyping me up to learn sampling methods
okay so the next goal is getting gpu acceleration structures rolling
then ill become ray tracing itself after that moment
i think
real rt
Better than wtf I had in my engine lol (it was horrible cpu side filtering 🥰)
ewwww
just realized i was using nearest filtering
and not linear
explains why i had weird artifacts on images
mm im gonna use slangc
ill lose it if i have to use dogshit glsl with no checking at all
@tropic vigil sorry for ponging, but why does your build_info implementation only have one geometry but multiple build range infos?
Pretty sure it’s mostly a copy of the api
The api lets you pass multiple geometry i think?

so how would i deal with dynamic objects... like do i just throw a mutex on them and call it a day? also what about the underlying Mesh info buffers should i have it so i keep a copy per frame or have just one with a mutex
wat da hell
nvm 
inshallah
okay so i realized
for ray tracing to work
i do need a large array of mesh infos

Thus I introduce: BackedSparseSlotMap
A sparse slot map (aka free list with generation counter), that is backed by a GrowableBuffer on the GPU
now should i probably introduce a shrinking mechanism
probably
will i
fuck no
seems reasonable tbh
i lost motivation :(
my main issue is uploading my mesh info
to my buffer
should i just write to it per frame when needed?
what kind of mesh info is this
Model transform + bda addresses of the vertices + indices + material info
yeah just write every frame to a HOST_VISIBLE | DEVICE_LOCAL buffer
actual brain dead code
```rust
/// Insert an element into a sparse slot map
pub fn insert(&mut self, data: T) -> Slot<T> {
    let next_free_index = self.free_list.pop().unwrap_or_else(|| {
        self.data.push(SlotEntry {
            data: None,
            slot: Slot::new(self.data.len() as u64, None),
        });
        self.free_list.push(self.data.len() - 1); // bruh moment
        self.data.len() - 1
    });
    let slot = self.data.get_mut(next_free_index).unwrap();
    slot.data = Some(data);
    slot.slot.clone()
}
```

i was so confused why my slot ids were
0,1,1,2,2,3
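The duplicate ids come from the freshly pushed index also landing on the free list, so a later insert hands the same index out again. A minimal sketch of the fixed version, using simplified hypothetical types (`Slot` with a plain generation counter, no GPU backing) rather than the actual `BackedSparseSlotMap`:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Slot {
    pub index: usize,
    pub generation: u32,
}

struct Entry<T> {
    data: Option<T>,
    generation: u32,
}

pub struct SparseSlotMap<T> {
    entries: Vec<Entry<T>>,
    free_list: Vec<usize>,
}

impl<T> SparseSlotMap<T> {
    pub fn new() -> Self {
        Self { entries: Vec::new(), free_list: Vec::new() }
    }

    pub fn insert(&mut self, data: T) -> Slot {
        // Reuse a freed index if there is one, otherwise grow. The new index
        // is NOT pushed onto the free list: it is about to be occupied.
        let index = self.free_list.pop().unwrap_or_else(|| {
            self.entries.push(Entry { data: None, generation: 0 });
            self.entries.len() - 1
        });
        let entry = &mut self.entries[index];
        entry.data = Some(data);
        Slot { index, generation: entry.generation }
    }

    pub fn remove(&mut self, slot: Slot) -> Option<T> {
        let entry = self.entries.get_mut(slot.index)?;
        if entry.generation != slot.generation {
            return None;
        }
        let data = entry.data.take()?;
        // Bump the generation so stale slots stop resolving.
        entry.generation += 1;
        self.free_list.push(slot.index);
        Some(data)
    }

    pub fn get(&self, slot: Slot) -> Option<&T> {
        let entry = self.entries.get(slot.index)?;
        if entry.generation == slot.generation {
            entry.data.as_ref()
        } else {
            None
        }
    }
}
```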

okay so i gave up
and used a deletion queue
to manage all my meshes
i mean it works really well
well besides from having to increment them all by one but regardless, it cleans up deleted stuff well
but now how do i deal with asset streaming 
jumping from major feature #1 to major feature #2 i see
average engine development experience
real
i realized i should get my asset loading down first or else i'll end up with my old shitty engine
HMMMMMMMMMMMMMMM
ecs or
event based
in other words, do i bite the bullet and use bevy
penguin pls help
nvm i figured it out
its a bit annoying but not too bad
i just have winit call schedule.run
yeah
okay gang, how to asset stream
```rust
pub enum ResourceLoadInfo {
    Gltf(Accessor),
}
pub enum GPUResource<T> {
    NotLoaded(ResourceLoadInfo),
    Loading,
    Loaded(T),
    Unloading,
}
```
what
mmm
i think i got it
i just use a sparse slot map however
hmmmmmmmmmm
wtf is erased storage
oh
it's cursed
circumvent the type system
ill stick with enums 
but then you have to add assets to the enum which sucks
I even explained it here https://notapenguin0.github.io/posts/rust-event-systems/#intermezzo-type-erasure
its safe!
you cant deny that it makes for a really clean asset system though
le raw
indeed
yeah
you could
i forgor i was using bevy so ill just use that to manage my assets kekw
lmao
man i still have no clue how to handle streaming well
I guess wrapping my structs in an Enum is probably best
then pseudo copying the gltf 2.0. spec to my own classes to provide info on how to load the assets
cuz penguin's asset storage is good for storing loaded assets, but no clue on streaming it in and out
okay so like i have my assets in storage with the enum
how would i handle the condition that the asset is currently just being loaded in
```rust
pub enum StreamableAsset<T> {
    NotLoaded,
    Loading,
    Loaded(T),
    Unloading,
}
```
How is Loading different from NotLoaded
Simply indicates the asset is currently loading
i assume it would deflect duplicated load attempts
I guess that would be useful yeah
Actually I do have Pending(…) I think, I just don’t have NotLoaded at all
If you have a map <path, Asset> with just the last 3 variants then if path isn’t in the map you know it’s NotLoaded so the variant doesn’t really have to exist
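A rough sketch of that map idea, with made-up names (`AssetMap`, `request_load`, `finish_load`). A missing key plays the role of `NotLoaded`, and the `Loading` entry is what turns away duplicate load attempts.

```rust
use std::collections::HashMap;

/// Only the "tracked" states exist. Not being in the map means not loaded.
#[derive(Debug, PartialEq)]
pub enum AssetState<T> {
    Loading,
    Loaded(T),
    Unloading,
}

pub struct AssetMap<T> {
    assets: HashMap<String, AssetState<T>>,
}

impl<T> AssetMap<T> {
    pub fn new() -> Self {
        Self { assets: HashMap::new() }
    }

    /// Returns true if the caller should actually start loading.
    /// Returns false if the path is already loading, loaded, or unloading.
    pub fn request_load(&mut self, path: &str) -> bool {
        if self.assets.contains_key(path) {
            return false;
        }
        self.assets.insert(path.to_owned(), AssetState::Loading);
        true
    }

    /// Called by whatever did the load once the data is ready.
    pub fn finish_load(&mut self, path: &str, value: T) {
        self.assets.insert(path.to_owned(), AssetState::Loaded(value));
    }

    /// `None` means the asset is not loaded at all.
    pub fn state(&self, path: &str) -> Option<&AssetState<T>> {
        self.assets.get(path)
    }
}
```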
o7
as school approaches, this project will now be lobotomized until next break/work term
Real 😭
Dude asset management + streaming is impossible dawg
Wait
I’m fucking stupid
I could just use my existing one then just add the enum bullshit to it
yo chat
what if I changed my gpu resource table
from using handles
to also support holding Weak<T> references
so I don't have to do:
```rust
pub struct ArcHandle<T> {
    pub handle: Handle<T>,
    pub gpu_rt: GPUResourceTable<T>
}
impl<T> Drop for ArcHandle<T> {
    // trait methods take no visibility qualifier, so no `pub` here
    fn drop(&mut self) {
        self.gpu_rt.remove(self.handle.clone());
    }
}
```
I love c# and kotlin too
besides from the doodoo UpperCase variables, i think it's cool especially with Harmony letting you do runtime patches like in minecraft java
They’re not bad yeah
That and the weird way brackets convention
I honestly couldn't care though in my unity projects I just do whatever I want
```rust
pub trait UnloadedAsset {
    type QueryingAsset: LoadingAsset;
    type LoadedAsset: LoadedAsset;
    async fn load(self) -> Result<impl LoadingAsset<LoadedAsset=Self::LoadedAsset>>;
}
pub trait LoadedAsset {
    type QueryingAsset: LoadingAsset;
    async fn unload(self) -> Result<impl LoadingAsset>;
}
pub trait UnloadingAsset: Future<Output=Self::UnloadedAsset> {
    type UnloadedAsset: UnloadedAsset;
}
pub trait LoadingAsset: Future<Output=Self::LoadedAsset> {
    type LoadedAsset: LoadedAsset;
}
```

penguin what is the opinion
uhhhh
I'm confused what's wrong with
```rust
trait Asset {
    type Loaded;
    type Unloaded;
    async fn load(self) -> Result<Self::Loaded>;
    async fn unload(self) -> Result<Self::Unloaded>;
}
enum AssetState<A: Asset> {
    Loaded(A::Loaded),
    Unloaded(A::Unloaded),
    // etc
}
```
AHH FUCK IT ive been bike shedding too long
yo
so like when transferring data to the gpu
do you think i should just have a massive transfer manager which has some sort of pre-allocated transfer pool that will handle all transfers?
or like have it so each buffer asset will simply handle transfers on it's own
imo each asset can handle it on its own
since you do it asynchronously anyway
if you really have a lot of tiny assets you can pool the submits together
but the buffers will already be pooled by using VMA/vk-mem-rs/gpu-allocator
Mm yeah
My only issue is that if im loading too much at once
I might go over my memory limit 
Chat
we have an issue
```rust
-> Result<impl Stream<Item = Vec<u8>>> {
    match x {
        x::Path(path) => {
            file_stream
        }
        x::Url(link) => {
            link_stream
        }
    }
}
```
How tf do i make sure they return the same type
since they obviously won't 
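The usual fix is to erase the concrete type behind a trait object (or wrap both in an enum). A std-only sketch using `Iterator` as a stand-in for `Stream`, with a hypothetical `Source` enum whose payloads pretend to be a file and an HTTP body. For real streams the same trick should work with `Pin<Box<dyn Stream<Item = Vec<u8>> + Send>>`, though that is not shown or tested here.

```rust
/// Hypothetical source; the payloads stand in for a real file or response body.
pub enum Source {
    Path(Vec<u8>),
    Url(Vec<Vec<u8>>),
}

/// Both arms build a different concrete iterator type, but boxing them as
/// `dyn Iterator` gives the function a single return type.
pub fn open_chunks(
    source: Source,
    chunk_size: usize,
) -> Box<dyn Iterator<Item = Vec<u8>> + Send> {
    match source {
        Source::Path(bytes) => {
            // Concrete type here: std::vec::IntoIter<Vec<u8>>
            let chunks: Vec<Vec<u8>> = bytes
                .chunks(chunk_size.max(1))
                .map(|chunk| chunk.to_vec())
                .collect();
            Box::new(chunks.into_iter())
        }
        // Concrete type here: Filter<IntoIter<Vec<u8>>, {closure}>
        Source::Url(packets) => {
            Box::new(packets.into_iter().filter(|packet| !packet.is_empty()))
        }
    }
}
```

The cost is one allocation and dynamic dispatch per `next`, which is noise next to disk or network reads.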
also i found out about streams
they were what i wanted
wait, does this mean i could theoretically send a Pin<T> around threads without much of a worry?
FUCK
maybe making it into async streams was a bad idea
there is no reason for it to even be since im not even awaiting on any async operation. file reading is basically synchronous
penguin advice please
I'm just chunking my files into blocks which I can turn into an async stream but not sure if it's worth it since realistically, it's the stream itself that will be polled/next on multiple threads, not the actual iterator
nuh uh
i mean im just chunking my assets into like chunks
then streaming that into my gpu
only exception is images which i kinda have to load all at once
hmm
also does someone have a co-op position for summer 2025-fall 2025
Uh there any good algos to generate random numbers on the gpu?
```glsl
// Generate a random unsigned int from two unsigned int values, using 16 pairs
// of rounds of the Tiny Encryption Algorithm. See Zafar, Olano, and Curtis,
// "GPU Random Numbers via the Tiny Encryption Algorithm"
uint tea(uint val0, uint val1) {
    uint v0 = val0;
    uint v1 = val1;
    uint s0 = 0;
    for(uint n = 0; n < 16; n++) {
        s0 += 0x9e3779b9;
        v0 += ((v1 << 4) + 0xa341316c) ^ (v1 + s0) ^ ((v1 >> 5) + 0xc8013ea4);
        v1 += ((v0 << 4) + 0xad90777d) ^ (v0 + s0) ^ ((v0 >> 5) + 0x7e95761e);
    }
    return v0;
}
// Generate a random unsigned int in [0, 2^24) given the previous RNG state
// using the Numerical Recipes linear congruential generator
uint lcg(inout uint prev) {
    uint LCG_A = 1664525u;
    uint LCG_C = 1013904223u;
    prev = (LCG_A * prev + LCG_C);
    return prev & uint(0x00FFFFFF);
}
// Generate a random float in [0, 1) given the previous RNG state
float rnd(inout uint prev) {
    return (float(lcg(prev)) / float(0x01000000));
}
```
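A CPU-side port of the same functions can be handy for checking what the shader should produce. This is a direct translation sketch, assuming GLSL's wrap-around unsigned arithmetic maps to Rust's `wrapping_*` methods.

```rust
/// Tiny Encryption Algorithm hash, 16 rounds, same constants as the shader.
pub fn tea(val0: u32, val1: u32) -> u32 {
    let (mut v0, mut v1, mut s0) = (val0, val1, 0u32);
    for _ in 0..16 {
        s0 = s0.wrapping_add(0x9e37_79b9);
        v0 = v0.wrapping_add(
            (v1 << 4).wrapping_add(0xa341_316c)
                ^ v1.wrapping_add(s0)
                ^ (v1 >> 5).wrapping_add(0xc801_3ea4),
        );
        v1 = v1.wrapping_add(
            (v0 << 4).wrapping_add(0xad90_777d)
                ^ v0.wrapping_add(s0)
                ^ (v0 >> 5).wrapping_add(0x7e95_761e),
        );
    }
    v0
}

/// Numerical Recipes LCG; advances the state and returns its low 24 bits.
pub fn lcg(prev: &mut u32) -> u32 {
    *prev = 1_664_525u32.wrapping_mul(*prev).wrapping_add(1_013_904_223);
    *prev & 0x00FF_FFFF
}

/// Random float in [0, 1) from the low 24 bits of the LCG state.
pub fn rnd(prev: &mut u32) -> f32 {
    lcg(prev) as f32 / 0x0100_0000 as f32
}
```

In a ray tracer the seed is typically `tea(pixel_index, frame_index)`, then `rnd` is called repeatedly on that state.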
holy shit
THERE ARE UPDATED BINDINGS HOLY SHIT
slang changed my life forever (an actual functional fucking lsp)
OH MY GOD I REALIZED MY UNIVERSITY HAS SO MANY TEXTBOOKS 💀💀
They’re all online so i have pdf copies of real time rendering and so many rendering books holy shit
i forgot i was in a university
YESSIRRRT
(I am going to borrow the rendering ones even though I already know how to do that shit for the funsies)
Okay so like how do i reconcile my resource table for bindless
And asset manager
Would having the asset manager hold Arc<T> then the resource table holds Weak<T> be fine?
Asset manager uses a deferred deletion to ensure unused resources are removed
wtf
rtx 3050 doesn't support bindless?
ohh wtfff
rtx 3050 doesn't support shader storage non uniform indexing?
nvm it does i forgor vulkan sdk
for sure it does lool
Okay now for some incomprehensible reason, the min surface size is 800x600 
This has to be sdk related no shot
Okay i have no fucking clue now
Is there some like thing that can report all information about my Vulkan device including min surface size
As it’s reporting a min size of 800x600???
You know what
Why does Rust have the most incomprehensible async known to man
And why is any basic QoL feature literally locked behind nightly 
I thought rust was originally made for concurrency
they are addicted to nightly
because making async work with all of rusts safety rules turns out to be a very hard problem
alright
i figured out the 800x600 min image issue
fix: clamp everything the fuck down
i have no clue why
nor do i wanna know why
but for some reason winit on my laptop will first make a surface that is 800x600 min,max size
then actually make one that is usable
hmm
should i only load assets into the gpu
as long as the surface (struct that references the buffers, textures, etc.) is still in the render stack?
probably
Deferred deletion is kinda annoying wthf
```rust
pub struct DeferredDeletionSlot<T: AssetDescriptor> {
    /// static ttl
    ttl: usize,
    /// lifetime of resource
    t: AtomicUsize,
    /// resource state
    state: Arc<RwLock<AssetState<T>>>
}
pub enum AssetState<T: Asset> {
    Unloaded,
    Loading(tokio::sync::watch::Receiver<Arc<T::Loaded>>),
    Loaded(Arc<T::Loaded>),
    Unloading(Weak<T::Loaded>)
}
```
I just really don’t like the double nested arc 
I kinda need it so I don’t have to scream to RwLock if I’m just a frame trying to access my resource
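The ttl part on its own can be fairly small. A single-threaded sketch, assuming a hypothetical `DeferredSlot` that owns one `Arc` and treats "nobody else holds a clone" as unused. It leaves out the `RwLock` and the loading states entirely.

```rust
use std::sync::Arc;

pub struct DeferredSlot<T> {
    /// How many consecutive unused frames the resource may survive.
    ttl: usize,
    /// Consecutive frames in which only the slot itself held the resource.
    idle_frames: usize,
    resource: Option<Arc<T>>,
}

impl<T> DeferredSlot<T> {
    pub fn new(resource: Arc<T>, ttl: usize) -> Self {
        Self { ttl, idle_frames: 0, resource: Some(resource) }
    }

    /// Call once per frame. Returns true if the resource was dropped this tick.
    pub fn tick(&mut self) -> bool {
        let strong = match &self.resource {
            Some(resource) => Arc::strong_count(resource),
            None => return false,
        };
        if strong > 1 {
            // Someone outside the slot still uses it, so reset the countdown.
            self.idle_frames = 0;
            return false;
        }
        self.idle_frames += 1;
        if self.idle_frames > self.ttl {
            self.resource = None;
            true
        } else {
            false
        }
    }

    pub fn is_alive(&self) -> bool {
        self.resource.is_some()
    }
}
```

For GPU resources the ttl should be at least the number of frames in flight, so nothing is freed while a submitted command buffer still references it.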
Erased storage, i just used flashmap and made it static after initialization to ensure i effectively get insanely fast reads
@tropic vigil sorry for ponging, but do you have any idea if vulkan resources are Send safe?
they should be, yeah? as long as i don't implement clone or copy on them
thank you my glorious king
a normal raw pointer wouldnt be, but since sending it to another thread doesnt invalidate the gpu resource its fine
hmmm what is a good ratio to allocate for transfer streams
like uhhh
max(100_000, DEVICE_MEMORY_SIZE * (0.05))
sure
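That ratio in integer form, as a tiny sketch (the function and constant names are made up, and 5% is just the number floated above, not a recommendation):

```rust
/// Floor so tiny devices still get a usable staging pool.
pub const MIN_TRANSFER_BYTES: u64 = 100_000;

/// max(100_000, 5% of device memory), using integer math (5% == 1/20).
pub fn transfer_pool_size(device_memory_bytes: u64) -> u64 {
    (device_memory_bytes / 20).max(MIN_TRANSFER_BYTES)
}
```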
would i be smarter to have render resources in my render world (ecs system)
as their own entities
or put them all in one massive dense storage struct
and make that into an ecs resource
i should probably turn them into virtual resources soon
but idfc
i haven't seen pretty images in a while
and im gonna start tweaking out if i don't
1 fps damn
it's wrong trust me
anyways i bikeshedded the fuck ton out of the asset streaming system so now i can watch it stream everything in and see all my assets progressively show up
it's a shame all of this work will go to waste as I begin to bike shed the render graph 
alr gang
We say that memory is visible to a particular stage + access combination if memory has been made available and we then make that memory visible to the relevant stage + access mask.
What the fuck does that mean
they used visible to define visible
what the actual fuck
Should each frame have its own mesh buffer?
I hold like a hash or smth so i dont have to do a cpu transfer all the time but still
just upload each mesh to a buffer once
Like a mesh has its own buffer? Destroy it when the mesh is destroyed?
i yearn for the virtual resources
indirect draw now actually actually works

Penguin i am gonna use your rendergraph as reference
uh do you have any advice on things i shouldn't use as reference iirc you were talking prior how you wanted to rewrite one part of it
also should i make people use names for their resources or just force them to use unique handles
names are very convenient imo
im not really convinced yet my way is fully robust
but it works reasonably well most of the time
Okay yeah I went with unique handles. I figured I would end up re-implementing more unique handle allocator at the end of the day
That being said, I’ll probably add optional debugging symbols onto handles
Hey penguin
How did you deal with
virtual resources + the fact vulkan lets you synchronize parts of a buffer
i mean easy approach is to turn off brain and assume a virtual resource implies the entire buffer
but like what if I want to only capture a portion of said buffer?
All my buffer dealings use BufferSlice which is just a subregion of a buffer
Hmmm
so like if i got 2 virtual buffer slices
your implementation will treat them effectively as separate resources?
how tf do you make sure that overlapping slices are properly synchronized 
Yeah pretty much
User error TooTroll
You should disallow creating overlapping slices
No aliasing allowed
Buffers could keep track of which slices are allocated and refuse to give out more
Like your average suballocator would
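A sketch of the "refuse to give out overlapping slices" idea, assuming a hypothetical `SliceTracker` that only tracks byte ranges. A real `BufferSlice` would also carry the buffer handle.

```rust
use std::ops::Range;

/// Tracks which byte ranges of one buffer are currently handed out.
pub struct SliceTracker {
    size: u64,
    allocated: Vec<Range<u64>>,
}

impl SliceTracker {
    pub fn new(size: u64) -> Self {
        Self { size, allocated: Vec::new() }
    }

    /// Hand out `range` only if it is non-empty, in bounds, and does not
    /// overlap any slice that is already live. No aliasing allowed.
    pub fn try_slice(&mut self, range: Range<u64>) -> Option<Range<u64>> {
        if range.start >= range.end || range.end > self.size {
            return None;
        }
        // Half-open ranges overlap when each one starts before the other ends.
        let overlaps = self
            .allocated
            .iter()
            .any(|live| range.start < live.end && live.start < range.end);
        if overlaps {
            return None;
        }
        self.allocated.push(range.clone());
        Some(range)
    }

    /// Give a slice back so the range can be handed out again.
    pub fn release(&mut self, range: &Range<u64>) {
        self.allocated.retain(|live| live != range);
    }
}
```

With that guarantee, the render graph can treat every slice as an independent resource and never has to reason about partial overlaps.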
giga brain
now to make rust happy and tell it that is okay i am referencing a slice of a buffer
i think i might just use unsafe
im not willing to go through that effort to gas light rust that it is safe
You know I've been thinking about switching to C++
Primarily because, I probably do need projects on my resume using actual industry languages
but like sunk-cost you know??
Rust isn't that bad or weird though
And like if you know Rust you probably have no problem picking up C++
hey penguin
how do you ensure reads and writes are ordered properly
ie you dont make too many barriers for multiple reads
uhhhh
good question
i think the way it works is that if you do a read, you can take the "output" of that read as an input and you have a R->R in the graph so it wont make a barrier
its honestly not the best system rn
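The core rule can be written down in a couple of lines: a barrier is needed between two consecutive accesses unless both are reads. This is a deliberately simplified sketch with made-up names. It ignores image layout transitions and pipeline stages, and it is not how phobos implements it.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

/// Read-after-read is the only pair that needs no synchronization.
/// W->R, W->W and R->W all need at least an execution dependency.
pub fn needs_barrier(previous: Access, next: Access) -> bool {
    !(previous == Access::Read && next == Access::Read)
}

/// Count barriers for one resource given its accesses in submission order.
pub fn count_barriers(accesses: &[Access]) -> usize {
    accesses
        .windows(2)
        .filter(|pair| needs_barrier(pair[0], pair[1]))
        .count()
}
```

One caveat with this naive version: in W, R, R the single barrier before the first read has to cover the stages of both reads, otherwise the second read is not actually ordered after the write.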
hmm okok
i think ill just work on virtual resources first
since i can more easily integrate that
my issue rn is dealing with create info stuff
I could use what you did and hash it, but what if i have multiple identical buffers? I'd eventually have to use some uid system
I don't hash create infos for buffers etc
Just for pipelines/descriptor set nonsense
ahh
virtual resources are just string ids that are bound to actual buffers at recording time
penguin
this is gonna sound dumb
but do you like store the create infos for your resources ie images and buffers?
so you can like call them by name and if they're not there physically, you can instantiate them using the pre-defined create info?
I don’t do this but you could if you wanted to
I don’t really see the advantage though
Buffers are generally a lot easier to manage creation/destruction of “manually”
I did make something like that for image attachments but not for images/buffers in general
hmm penguin should i just assume every image will have an expected image view
or should i bother to encode that
every image will have a view sure
but not necessarily one that covers the whole image
am i tripping or like
renderdoc becomes really slow
when viewing a lot of resources
like not at once, but if there are just a lot of them, it will struggle
You should
There's no point in using EXCLUSIVE except for images that you're going to use as attachments iirc (on desktop)
is it just because functionally almost every sane gpu doesn't really care?
Yes
alr gang
do we think it's more intelligent to simply contain a sampler info with an image or make it's own component in an ECS
to prevent duplicate samplers from forming, we just use a hashmap everytime we want to transfer an image onto the gpu
whoa whoa hang on
why does your ecs know about samplers
ecs is gameplay stuff not rendering backend stuff
I just have my renderer make the few samplers it needs manually, theres not a lot of them
how would that even work? would images be entities? that smells like overgeneralization
this seems very unnecessary
hmm does bevy do everything with ecs?
I effectively extract data from my main world ie a mesh and decompose said mesh into buffer + material components
During rendering I produce GPU commands from the render world
Tbfh i dont need this much overkill and I 100% could simplify it
Bevy uses the ECS everywhere in the engine, not just in gameplay, though its renderer uses it less. Currently assets like images aren't entities, though we want them to be so that we can get the optimizations that have been written for the ECS (we have some [silly, imo] benchmarks where we do things like change 160,000 material assets per frame and then everything has to be really optimized)
BTW @dawn wave there's disagreement on the team about the value of having a separate render world now that Bevy's main branch is GPU driven based on MDIC. Some people want the "render world" only to exist on GPU. But we still have the separate render world for now. I'm in favor of keeping it for a variety of reasons, the biggest one being that we can't use MDIC/bindless on all platforms yet.
Hmmm it probably makes sense for me to abandon ECS render worlds then
My engine just uses bindless. It’s kinda been a pain point for me that iirc bevy doesn’t have a way to do a slot map to upload bindless arrays
If you mean the bindless slot allocator I just rewrote that code yesterday in main.
The biggest problem with Bevy bindless is that wgpu currently requires that you recreate the entire bindless array from scratch every time you modify it, and it does validation on every resource you put in it. wgpu has plans to fix those, but they aren't done yet. We work around it by capping the size of our bindless arrays and breaking MDIC batches every once in a while. This is actually not as bad as it sounds... it adds maybe a dozen drawcalls to every frame, but that's nothing for modern GPUs.
penguin
do you think i should move the implementation of physical resource storage as "left to the user to implement" rather than embedding it into my library
it's probably too opinionated so ill probably leave it up to implementation by user. rn it's just a slot map (id, gen counter)
that has a crossbeam channel so when i drop handles, it will send into the queue to remove the physical resource from queue hopefully letting me implement stuff like ref counting + deferred deletion
imo users should take care of that for most things
Like, a pipeline isnt really exposed to the user in the traditional sense in phobos so users dont store it
But buffers are, so those are up to them
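The channel-on-drop idea from a few messages up, as a std-only sketch. `std::sync::mpsc` stands in for crossbeam here, and `Storage`, `Handle` and the `String` "resource" are all hypothetical placeholders.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Dropping a handle only queues its id; nothing is freed immediately.
pub struct Handle {
    id: u32,
    drop_tx: Sender<u32>,
}

impl Handle {
    pub fn id(&self) -> u32 {
        self.id
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // If the storage is already gone there is nothing left to free.
        let _ = self.drop_tx.send(self.id);
    }
}

pub struct Storage {
    resources: Vec<Option<String>>,
    drop_tx: Sender<u32>,
    drop_rx: Receiver<u32>,
}

impl Storage {
    pub fn new() -> Self {
        let (drop_tx, drop_rx) = channel();
        Self { resources: Vec::new(), drop_tx, drop_rx }
    }

    pub fn insert(&mut self, resource: String) -> Handle {
        self.resources.push(Some(resource));
        Handle {
            id: (self.resources.len() - 1) as u32,
            drop_tx: self.drop_tx.clone(),
        }
    }

    pub fn is_live(&self, id: u32) -> bool {
        matches!(self.resources.get(id as usize), Some(Some(_)))
    }

    /// Drain the drop queue, e.g. once per frame after the GPU is done with
    /// the resources. Returns how many resources were actually freed.
    pub fn collect_garbage(&mut self) -> usize {
        let mut freed = 0;
        while let Ok(id) = self.drop_rx.try_recv() {
            if let Some(slot) = self.resources.get_mut(id as usize) {
                if slot.take().is_some() {
                    freed += 1;
                }
            }
        }
        freed
    }
}
```

Ref counting falls out of wrapping `Handle` in an `Arc`, and deferred deletion is just choosing when `collect_garbage` runs.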
How did you guys deal with ensuring unique samplers?
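One common answer is a cache keyed on the sampler description. A minimal sketch, assuming a cut-down hypothetical `SamplerDesc` and a `u32` standing in for the real `vk::Sampler` handle. Float fields such as LOD bias would need to be stored as bits or quantized before they can be hashed.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Filter {
    Nearest,
    Linear,
}

/// Cut-down description; a real one mirrors the sampler create info.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SamplerDesc {
    pub mag_filter: Filter,
    pub min_filter: Filter,
    pub repeat: bool,
    pub max_anisotropy: u8,
}

pub struct SamplerCache {
    /// Value is a stand-in for the created sampler handle.
    samplers: HashMap<SamplerDesc, u32>,
    next_id: u32,
}

impl SamplerCache {
    pub fn new() -> Self {
        Self { samplers: HashMap::new(), next_id: 0 }
    }

    /// Return the existing sampler for `desc`, creating one only on a miss.
    pub fn get_or_create(&mut self, desc: SamplerDesc) -> u32 {
        if let Some(&id) = self.samplers.get(&desc) {
            return id;
        }
        // This is where the actual sampler creation call would go.
        let id = self.next_id;
        self.next_id += 1;
        self.samplers.insert(desc, id);
        id
    }

    pub fn len(&self) -> usize {
        self.samplers.len()
    }
}
```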

