#Voxel Game Engine
dope
it loads streams properly but only for rendering, the actual block data is deleted when the geometry is built
working on a structure that can keep the block data around but without being too slow, im stuck on multithreaded unloading as the issue so far 
rendering is currently not very optimized, it has the most trivial block face culling with zero frustum culling so it runs horribly on lower end hw
an hd 7750 can run it with a 10 chunk render distance at ~32fps in 900p 💀
sandy bridge GT1 igp is ~8.8fps
oh, its also built to natively compile and run on both windows using MSYS2 and linux systems
godly performance
its got the opportunity to run much faster, im just getting basic functionality working before doing optimizations like that 
yea get it working first
every block on the border of a chunk has every face rendered which is killing it pretty hard 
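A minimal sketch of the border-face problem described above, assuming a cubic chunk where 0 means air. `DemoChunk`, `NeighborQuery` and `countVisibleFaces` are hypothetical names, not the engine's API. A face is counted only when the adjacent block is not solid; for positions outside the chunk the decision is handed to a callback, which is where a neighbouring-chunk lookup could go.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical cubic chunk of n*n*n blocks; 0 means air, anything else is solid.
struct DemoChunk {
    int n;
    std::vector<std::uint32_t> blocks;
    DemoChunk(int size, std::uint32_t fill)
        : n(size), blocks(static_cast<std::size_t>(size) * size * size, fill) {}
    std::uint32_t at(int x, int y, int z) const { return blocks[x + n * (z + n * y)]; }
};

// Called for coordinates outside the chunk; returns true if that block is solid.
using NeighborQuery = std::function<bool(int x, int y, int z)>;

// Counts the faces that would actually be emitted into the mesh.
int countVisibleFaces(const DemoChunk &c, const NeighborQuery &outside) {
    static const int dirs[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                                   {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    int faces = 0;
    for (int y = 0; y < c.n; ++y)
        for (int z = 0; z < c.n; ++z)
            for (int x = 0; x < c.n; ++x) {
                if (c.at(x, y, z) == 0) continue;  // air has no faces
                for (const auto &d : dirs) {
                    int nx = x + d[0], ny = y + d[1], nz = z + d[2];
                    bool inside = nx >= 0 && ny >= 0 && nz >= 0 &&
                                  nx < c.n && ny < c.n && nz < c.n;
                    bool solid = inside ? c.at(nx, ny, nz) != 0 : outside(nx, ny, nz);
                    if (!solid) ++faces;  // only faces touching air are visible
                }
            }
    return faces;
}
```

Passing a callback that always returns false reproduces the current behaviour (every border face is drawn); a callback that consults the neighbouring chunk removes the hidden ones.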
lol yea thatll do it
also i cant tell are you really high up here or are the voxels small
ah alr
one of my favorite voxel engines has to be silvermans voxlap engine
its just insane
all on the cpu using sse
oh damn
legacy is just easier to use
i dont feel like learning all the new batching stuff (at least rn)
yeah it is, anything more than what im doing will get really tedious and confusing tho
for simple textured geometry its pretty good if you just want to get something rendered and figure out the rest later
yea youll probably have to switch over to modern gl at some point
my rendering stuff is setup to be extensible so i can just add functionality to it later
it has 2 texturing modes and 4 vertex specification modes
it can be compatible with ogl 1.0 if you want 
ive been iterating on this engine's design over the past 5ish years 
learning what works and what doesnt
😳
extern std::function<void(Geometry *g)> initBuffers;
//syncs the data in g.vertAttribs and g.vertInds with their GL buffers
extern std::function<void(Geometry *g)> syncBuffers;
//clears the geometry in g.vertAttribs and g.vertInds
extern std::function<void(Geometry *g)> clearGeometry;
//sets up the GL state by calling the appropriate glEnableClientState and gl*Pointer functions for the attribute layout in g, should be matched with a cleanupVertexAttribLayout call
extern std::function<void(Geometry *g)> setupVertexAttribLayout;
//draws the geometry in g, uses whatever state GL is currently in
extern std::function<void(Geometry *g)> draw;
//cleans up the GL state by calling the appropriate glDisableClientState functions given the attribute layout in g
extern std::function<void(Geometry *g)> cleanupVertexAttribLayout;
//deletes the GL buffers
extern std::function<void(Geometry *g)> deleteBuffers;
the interface for geometry handling is pretty much function objects that can be reassigned to different actual functions based on what vertex specification mode is desired 
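A rough sketch of how reassignable function objects like these can be rebound per mode. Everything in `demo_geometry` is hypothetical: the stand-in `Geometry`, the two mode constants and the stub draw functions only record which path ran, where real ones would issue GL calls.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

struct Geometry { std::string lastDrawPath; };  // stand-in for the real Geometry

namespace demo_geometry {
// Reassignable entry point, mirroring the `draw` function object above.
std::function<void(Geometry *)> draw;

// Hypothetical per-mode implementations.
void drawImmediate(Geometry *g) { g->lastDrawPath = "immediate"; }
void drawVertexArray(Geometry *g) { g->lastDrawPath = "vertex_array"; }

enum Mode : std::uint32_t { MODE_IMMEDIATE = 0, MODE_VERTEX_ARRAY = 1 };

// Rebinds the function object; callers keep calling demo_geometry::draw.
bool setVertexSpecificationMode(std::uint32_t mode) {
    switch (mode) {
        case MODE_IMMEDIATE:    draw = drawImmediate;   return true;
        case MODE_VERTEX_ARRAY: draw = drawVertexArray; return true;
        default:                return false;  // unknown mode: keep the current binding
    }
}
}  // namespace demo_geometry
```

The render loop never needs to know which mode is active, it just calls `demo_geometry::draw(&g)`.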
rendering chunks looks like this
for (auto it = chunkManager->chunks.begin(); it != chunkManager->chunks.end(); ++it) {
    if (!it->second.ptr->isRenderable) {
        continue;
    }
    //compute the model matrix from its position
    glm::dmat4 modMat = glm::translate(glm::dvec3(it->second.ptr->worldspaceBlockLocation.x, it->second.ptr->worldspaceBlockLocation.y, it->second.ptr->worldspaceBlockLocation.z));
    //load the modelview matrix with the product of the view and model matrices, this transforms the chunk into view space for rendering
    gl::transform::fixed::loadModelViewMat(viewMat * modMat);
    gl::geometry::setupVertexAttribLayout(&it->second.ptr->geometry);
    gl::geometry::draw(&it->second.ptr->geometry);
    gl::geometry::cleanupVertexAttribLayout(&it->second.ptr->geometry);
}
of course you need to setup the gl state before the loop and clean it after, like all your glEnables and glDisables
i have void setVertexSpecificationMode(uint32_t mode); that lets you set the way geometry is rendered, this can be done even at runtime
however changing it from VAO mode to immediate mode requires that you either clean up all the GL state or create a new one and set it up in order for it to actually work
eventually im going to implement a vulkan renderer but from what ive seen of vulkan so far, it would take forever and be a giant waste of time currently
yea vulkan is a massive pain
the current problem is unloading regions and chunks in a multithreaded environment with minimal blocking
this is what my structures look like rn
struct World;
struct Chunk {
std::atomic<bool> isLoaded = 0;
std::atomic<int32_t> numLoaders = 0;
//int64_t chunkLocationX, chunkLocationY, chunkLocationZ;
Coordinates worldspaceChunkLocation;//the chunk's location in the world, in chunk coordinates
std::unique_ptr<uint32_t[]> blocks;
};
struct Region {
std::unique_ptr<AtomicSharedPtr<Chunk>[]> chunks;
};
struct Dimension {
std::shared_mutex regionsMutex;
std::map<std::array<int64_t,3>, Region> regions;
};
struct World {
//the dimensions of a chunk, in block coordinates
uint32_t chunkWidth, chunkHeight, chunkLength;
//the dimensions of a region, in chunk coordinates
uint32_t regionWidth, regionHeight, regionLength;
//std::map<int32_t, Dimension> dimensions;
};
AtomicSharedPtr is this thing i made 
template <class T> class AtomicSharedPtr {
public:
T *ptr = nullptr;
std::atomic<uint32_t> *use_count = 0;
AtomicSharedPtr() {
ptr = new T();
use_count = new std::atomic<uint32_t>(0);
}
AtomicSharedPtr(const AtomicSharedPtr &copyFrom) {
ptr = copyFrom.ptr;
//*copyFrom.use_count += 1;
use_count = copyFrom.use_count;
*use_count += 1;
}
~AtomicSharedPtr() {
if(*use_count == 0) {
delete ptr;
return ;
}
*use_count -= 1;
}
};
shared_ptr 🥲
std::shared_ptr didnt do exactly what i wanted 
use_count is approximate in a multithreaded environment and it doesnt auto allocate either
i meant it more as the whole concept of shared ownership is fishy
but its usually passable for games
it works perfectly for what im using it for
yea
currently stuff can be used by 3 separate systems, the graphics system, the world system and the jobs system
putting guards to keep something from being freed if a job is still running on it is very difficult with how i have it setup
a shared pointer works perfectly to prevent use-after-free crashes
i can remove elements from a map without them being deleted if a reference to that thing still exists
the weird AtomicSharedPtr thing automatically allocates memory for whatever type is specified and automatically deletes when the last copy is deconstructed
namespace chunk {
size_t getBlockOffset(World *w, size_t x, size_t y, size_t z);
void allocBlocks(World *w, Chunk *c);
//void freeBlocks(Chunk *c);
int8_t genTerrain(World *w, AtomicSharedPtr<Chunk> c, voxel_world::Coordinates worldspaceChunkLocation);
//int8_t loadTerrain();
}
namespace region {
size_t getChunkOffset(World *, size_t x, size_t y, size_t z);
void init(World *world, Region *region);
Chunk *loadChunk(World *world, Region *region, Coordinates regionspaceChunkPosition);
Chunk *getChunk(World *world, Region *region, Coordinates regionspaceChunkPosition);
void unloadChunk(World *world, Region *region, Coordinates regionspaceChunkPosition);
void unload(World *world, Region *region);
}
namespace dimension {
Chunk *allocChunk(World *world, Dimension *dim, Coordinates worldspaceChunkLocation);
int8_t loadChunk(World *world, Dimension *dim, Coordinates worldspaceChunkPosition);
int8_t unloadChunk(World *world, Dimension *dim, Coordinates worldspaceChunkPosition);
}
this is my current interface for each type of structure
most things should be done in the context of a dimension, the rest should be handled transparently
havent worked out all the details tho 
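For illustration, here is one way a function like `getBlockOffset` could map block coordinates to an index into the flat `blocks` array. The memory layout is an assumption (x fastest, then z, then y), and `DemoWorld` is a cut-down stand-in for `World`; the engine's real layout isn't shown in the chat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Only the fields this sketch needs; mirrors the chunk dimensions in World.
struct DemoWorld {
    std::uint32_t chunkWidth, chunkHeight, chunkLength;
};

// Assumed layout: x varies fastest, then z, then y (one horizontal slab per y).
std::size_t getBlockOffset(const DemoWorld *w, std::size_t x, std::size_t y, std::size_t z) {
    assert(x < w->chunkWidth && y < w->chunkHeight && z < w->chunkLength);
    return x + w->chunkWidth * (z + w->chunkLength * y);
}
```

With a 16x256x16 chunk, stepping x by one moves 1 element, stepping z moves 16, and stepping y moves 256.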
this is an older iteration that has no multithreading and only supports immediate mode ogl 
dat grass 🔥
made it in blender 
this is a much earlier version where i was learning ogl
made a custom export script for blender and a custom loader 
this is simple textured geometry tho, the lighting is baked into the texture
that looks real nice ngl
got an opengl test from 2015 
ive been working on this for a long time
oh damn that means its been 8 years 💀
its ass but i didnt want to learn direct3d
ive never been able to really get into it for fun
YEA
i saw the boilerplate for that
compared to opengl
and you can probably guess what i picked
i never even looked into d3d 
i wanted this to run natively on linux systems as well so d3d wasnt a good choice
itd need some kind of translation layer or something
and im not gonna learn a proprietary API like d3d or something, it feels like a waste of time to lock myself to a specific vendor
same reason why i wont touch CUDA
cant even use CUDA anyway, i have an amd gpu in my desktop 
cudas also so fucking annoying to write
id rather take my chances with vulkan compute or OCL or something
msvc which "supports" it wont let you use <<<...>>> without screaming that its wrong
anything that doesnt try and integrate with an ide is probably a better option
ill just try my hand at vulkan compute if i ever get to gpgpu stuff 
vulkan 2 scary 4 me
its an insane amount of boilerplate but it may not be too bad when you get that out of the way
yea i wrote a simple 2d "engine" in it
took like a week to even get it drawing a white screen
💀
just found my proof of concept for using multitexturing to make a texture atlas mosaic
😍
this was may 2021 apparently 
i later decided to drop the texture atlas mosaic thing tho
so for the needed amount of texels, 4096x4096 texture support is needed to get 16x16 block face textures
i may try to see if 8x8 textures are usable so that lowers the requirement to 2048x2048
using a mosaic of texture atlases through multitexturing could fix that whole issue but its kinda cumbersome to work with
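The atlas arithmetic above can be checked at compile time. This sketch only works out how many square tiles fit in a square atlas and where a tile starts; `tilesPerAtlas` and `tileOrigin` are hypothetical helpers, and no claim is made about how many tiles the engine actually needs.

```cpp
#include <cassert>
#include <cstdint>

// Number of tileSize x tileSize tiles that fit in an atlasSize x atlasSize texture.
constexpr std::uint32_t tilesPerAtlas(std::uint32_t atlasSize, std::uint32_t tileSize) {
    return (atlasSize / tileSize) * (atlasSize / tileSize);
}

struct TileOrigin { std::uint32_t x, y; };

// Pixel position of the top-left texel of tile `index`, tiles laid out row by row.
constexpr TileOrigin tileOrigin(std::uint32_t index, std::uint32_t atlasSize, std::uint32_t tileSize) {
    return TileOrigin{(index % (atlasSize / tileSize)) * tileSize,
                      (index / (atlasSize / tileSize)) * tileSize};
}

// 16x16 tiles in 4096x4096 and 8x8 tiles in 2048x2048 give the same tile count.
static_assert(tilesPerAtlas(4096, 16) == 65536, "256 x 256 tiles");
static_assert(tilesPerAtlas(2048, 8) == tilesPerAtlas(4096, 16), "same capacity");
```

So halving the tile edge halves the required atlas edge for the same number of block face textures.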
version from june 2021 testing drawing models and chunks at the same time
stole the minecraft textures 
so, im kinda stuck on how id do multithreaded chunk processing with my current architecture 
have a program wide thread pool and just submit a "job" that's just a void (void) lambda that takes in everything by capture?
i already have a thread pool to process a job queue that takes std::functions
but scheduling it may be a bit weird
are they not asynchronous and operating on a global "world" memory pool?
also check out https://github.com/cameron314/concurrentqueue
It's an absolutely wonderful primitive
I wrote a wrapper around it so the ergonomics of using it are a bit better however it's amazing to structure everything around
they operate on whatever memory they need to operate on, currently the submitable jobs are generating chunk terrain and building chunk geometry
id need a chunk tick function thats launched for every chunk every server tick, however long that is
ill probably go for 1/20th of a second, it seemed fine on minecraft
chunk tick? why are your chunks updating at all every tick
well they dont right now 
ill need some kind of processing done on the chunks every server tick at some point
like growing vegetation, processing machines, mob spawning and other stuff like that
lock free hash map? http://erdani.org/publications/cuj-2004-12.pdf
seems a little sketch
im not sure 
ahh I see, I feel like this could be solved with the main thread being thought of as a dispatcher thread, basically all it does is go over every chunk that needs to be updated, and wraps a call to its update method or whatever to be run async on another thread
idk tho I gotta bounce
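A small sketch of that dispatcher idea. `std::async` stands in for the existing thread pool here, and `TickChunk`, `tickChunk` and `dispatchTick` are made-up names. Each task touches a different chunk, so no locking is needed in this simplified case.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical per-chunk state for the sketch.
struct TickChunk {
    int ticks = 0;
    bool needsUpdate = false;
};

void tickChunk(TickChunk &c) { ++c.ticks; }

// Dispatcher: launches one async task per chunk that needs work, then waits
// for all of them. Returns how many chunks were ticked.
int dispatchTick(std::vector<TickChunk> &chunks) {
    std::vector<std::future<void>> pending;
    for (auto &c : chunks) {
        if (!c.needsUpdate) continue;
        pending.push_back(std::async(std::launch::async, tickChunk, std::ref(c)));
    }
    for (auto &f : pending) f.get();  // join point at the end of the server tick
    return static_cast<int>(pending.size());
}
```

In a real engine the per-tick work would go through the job queue rather than spawning a task per chunk, but the shape (walk, submit, wait) is the same.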
ping me in like 90 minutes, ive got more than a few ideas, Ill draw some of them out

also fences are your friend
(most) uses of mutexes can be eliminated with the proper use of fences
and theyre a helluva lot faster
ill look into those 
I've been reading about fences and I still have no clue how they actually work or how you're supposed to use them 
well shit
I had like a paragraph typed out then I accidentally CMD + Q'd my discord
basically
are you familiar with condition variables
the things where you can set one off, whatever is looking at it checks to see if its signaled, if its not it goes back to sleep. you normally combine them with a mutex since they can be a bit fucky if youve got multiple threads accessing the same one causing spurious wakeups
but a fence is guaranteed to only ever have one reader and one writer
so thus no spurious wake ups anymore
so you create a chain of fences, where thread 1 signals THREAD 1-2 OPERATION FENCE when its done, and thread 2 wakes up when its signaled
hopefully I explained it good enough
tldr if you know the order of mutex accesses you can use fences
and theyre faster
however doing so youre still limited to one thread at a time working on a shared resource

I recommend just trying shit out and using the thread sanitizers
actually what i've described here is best thought about using a semaphore
for the signalling bit
memory fences are a completely different ball game
those are for single threaded operations if the CPU or compiler is doing some funny instruction reordering
is there a specific thing i should be looking at like std::atomic_thread_fence or is that something else 
yea thats a memory fence
those are separate
are you familiar with instruction reordering?
this is down at the assembly level
ya, for OOE cpus?
yes
basically a memory fence is you telling the compiler / cpu dont reorder past this boundary
so nothing after can come before and nothing before can come after
thats the general idea
theres a billion more technicalities
however not important rn
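To make the memory fence idea concrete, here is the usual release/acquire pairing with `std::atomic_thread_fence`. It is a sketch with hypothetical names (`payload`, `ready`, `runFenceDemo`): the release fence keeps the write to `payload` from moving below the flag store, and the acquire fence keeps the read of `payload` from moving above the flag load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag used to publish it

void producer() {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);  // earlier writes stay above this
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) std::this_thread::yield();
    std::atomic_thread_fence(std::memory_order_acquire);  // later reads stay below this
    return payload;
}

int runFenceDemo() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Without the two fences (and with only relaxed operations on `ready`) the consumer would not be guaranteed to see 42.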
ooh i saw those on the C++ reference website but didnt look too much into them
imma be honest idk wtf you'd use a counting semaphore for
but I also dont know a whole lot about concurrency in the grand scheme
but a binary semaphore is what youd use to implement this signalling
basically youd create one for each "boundary" where access is handed from one thread to another
you pass one reference to each
THREAD 1
/// normal code
std::binary_semaphore::release() // adds one to the bool
THREAD 2
std::binary_semaphore::acquire() // tries to decrement the value from 1 -> 0, if its successful, control has been passed and the function returns, if unable the thread goes to sleep and is woken up *automagically* by the OS when the corresponding call to std::binary_semaphore::release() adds one and passes control over
/// normal code
semaphores are super light weight
heres an example of an MPMC queue that uses semaphores to absolutely smoke mutexes
i could probably use binary semaphores on a per chunk basis to ensure jobs dont clobber each other
yup thats the idea
you have one "master" thread that constructs all of the semaphores and passes them around and all the worker threads doing their business
you'll want to use multiple semaphores per thread to stage operations the way a CPU executes multiple instructions at once, thats how you get faster than single thread performance
in the bottom scenario each row can be thought of as a thread and each column can be a chunk
so you get essentially a min(MAX_SEPARATE_CHUNK_OPERATIONS, NUMBER_OF_THREADS) times speedup
so say you have the lighting, block update, entity update, idk what else you get the idea
and you can pipeline them
ya thats a really good idea
also this just shows you some of the performance speedup that is possible a literal 4!!!!! order of magnitude increase in performance
ive done some asm programming but only for a z80 
vulkan has to call them fences because of other shit
honestly not a bad place to start
yeah it makes it really confusing 
i thought you were talking about memory fences
twas not :P
in vulkan theyre a combo of memory fence + semaphore
so thats why theyre called fences
also you might want to move to vulkan since in vulkan you can submit calls from All threads
im doing opengl for compatibility reasons 
none of this master thread callback bullshit
vulkan 1.0 is super widely supported rn
also vulkan has a lot of boilerplate that makes it hard for me to get through it 
this is the downside, however the thought of working with the stateful as hell opengl api across threads makes me shudder
vulkan has literally no state iirc
oh well
actually
oh yeah opengl is awful, being a global state machine is the worst thing 
its so confusing
i need to make something that can construct a render ready vulkan context
setup swapchains and images and whatever it is
See this is the thing, I got so damn fed up with the opengl state machine that I stopped messing around with graphics all together for years when I was younger and I regret it
here
let me link you my repo
its (fairly?) well sectioned out
I'm refactoring it rn to dynlink vulkan
however each bit is logically separated from each other
if I remember when I finish dynlinking vulkan
ill publish my new repo and all youll need is a
lib/libfmt.a and lib/libglfw3.a to build it
and ill get rid of libfmt once it becomes standard in gcc
but yea Ive been doing all of the vulkan boilerplate because I want to get back into graphics but just couldnt stomach working with opengl again
ive been mostly doing legacy ogl so far, all i have implemented rn is geometry rendering, textures and fog
just to get something rendered so i can develop the rest of the engine
My one gripe with vulkan is that certain things like say texturing are heavily optimized and as a result you really have to watch tutorials or dig into the api docs to use them
for instance depth buffering
I wish it was manually implemented in the fragment shader and accessing global memory in a uniform buffer
however that would be slow as balls
💀
so you have to jump through a lot of hoops to implement it
but the end result is fucking mad performance
yeah i assume the ROPs deal with depth buffer modification pretty fast vs something a shader would do
yes they do, but theyre only programmable by the driver so thus specific shit
ya
IMO vulkan is the dream graphics api
Also if you need me to explain some of the code i'd be more than happy to. I implemented it in this weird middle ground between procedural and OOP that I really like and seems to scale well, however if youre used to 10 class inheritance hierarchies then its going to be a bit different to look at
tldr if you need me to explain shit
id be more than happy to do so
oh im used to weird mixes of OOP and procedural stuff 
thats generally how my engine is structured, mostly procedural stuff with simple classes for specific jobs
most functions take pointers to structures to operate on for instance
basically I go hmm, I need a vulkan buffer for something, time to make a wrapper class that means that I pass in only the bits I care about
are you writing this in C, you poor soul?
C++ lol
references and member functions
IMO I avoid pointer like the plague
references are guaranteed ™️ to be valid
i seem to deal fine with them 
i tried references but they throw a wrench into the syntax for my job queue
i dont want to wrap everything in a std::ref when i can just pass a pointer to std::bind
are you not using a channel like the moodycamel::ConcurrentQueue that I linked?
nope
IMO I like rust's balance between explicitly passing references but still having them be known good
also lifetimes, I miss
i made a thing that can automatically constructs a new thing for something when constructed, then automatically deconstructs it when the last copy of it is deconstructed
uses an atomic counter thats incremented when copy constructed and decremented when deconstructed
you mean a std::shared_ptr ?
its kinda like that, but its easier to use when accessing std::maps
also use_count in a std::shared_ptr is approximate in a multithreaded environment, which was a problem
cursed
thats why i made my own better shared_ptr 
template <class T> class AtomicSharedPtr {
public:
T *ptr = nullptr;
std::atomic<uint32_t> *use_count = 0;
AtomicSharedPtr() {
ptr = new T();
use_count = new std::atomic<uint32_t>(0);
}
AtomicSharedPtr(const AtomicSharedPtr &copyFrom) {
ptr = copyFrom.ptr;
//*copyFrom.use_count += 1;
use_count = copyFrom.use_count;
*use_count += 1;
}
~AtomicSharedPtr() {
if(*use_count == 0) {
delete ptr;
return ;
}
*use_count -= 1;
}
};
declaring it is equivalent to making a new T
its almost transparent, you simply access the pointer to the thing through ptr
I fail to see how this is different from std shared ptr
you have to assign a new thing to the shared pointer when its declared
this does that part transparently
so accessing an element in a map will automatically allocate a new T in that map instead of you having to call the new operator and assign it to the map
also use_count is exact instead of approximate
auto sharp = std::make_shared<std::unordered_map<std::uint32_t, std::string>>()
gl::Chunk *glChunk = chunkManager->chunks[{x, y, z}].ptr;
this for instance will get the pointer of the element accessed in the map without having to check if its not there and having to assign a new gl::Chunk to it
itll handle that part itself
when the element is removed from the map, itll still exist if there is a copy of the AtomicSharedPtr somewhere
for example if one was bound to a std::function for a job
the map access part is the thing i really wanted it for
the map is declared like this std::map<std::array<int64_t,3>, AtomicSharedPtr<Chunk>> chunks;
accessing an uninitialized element in the map will automatically construct a new Chunk
ohh now I see the point
indeed std shared ptr would happily give you a null
see I saw the difference I just didnt see why it was useful
lol
i tried making a move constructor but i apparently dont know how thats supposed to work 
instead of making my own shared ptr, I would have made a class World
and a function World::accessChunkAtCoords(ChunkCoordinates c)
that sees if its in the internal map and if not creates it
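A sketch of that alternative, assuming plain `std::shared_ptr` and a map keyed by chunk coordinates. `DemoWorld`, `DemoChunk` and `unloadChunk` are illustrative names; only `accessChunkAtCoords` comes from the suggestion above. It is not thread-safe as written, a real version would need the map guarded (for example by the `regionsMutex` idea from earlier).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>

struct DemoChunk { bool isLoaded = false; };
using ChunkCoordinates = std::array<std::int64_t, 3>;

class DemoWorld {
public:
    // Returns the chunk at c, creating an empty one on first access.
    std::shared_ptr<DemoChunk> accessChunkAtCoords(const ChunkCoordinates &c) {
        auto it = chunks_.find(c);
        if (it == chunks_.end())
            it = chunks_.emplace(c, std::make_shared<DemoChunk>()).first;
        return it->second;
    }

    // Drops the map's reference; outstanding shared_ptr copies keep the chunk alive.
    void unloadChunk(const ChunkCoordinates &c) { chunks_.erase(c); }

    std::size_t loadedCount() const { return chunks_.size(); }

private:
    std::map<ChunkCoordinates, std::shared_ptr<DemoChunk>> chunks_;
};
```

A job that captured the returned `shared_ptr` can keep running after `unloadChunk`, which is the same use-after-free protection the custom pointer is used for.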
the best way to explain move constructors is with std vector
are you familiar with how std vector is implemented?
three pointers:
pointer to beginning
pointer to end of elements
pointer to end of allocated memory
nope
can you see how the vector operations could be implemented from these three pointers?
i assume it does pointer arithmetic to figure out the size?
yup
and that gets extended to all the other stuff like accessing an element etc
creates references from pointer arithmetic
etc
the copy constructors / assignment operators follow the pointers down, copy all the data to a new location on the heap, and create a new vector completely independent from the first
the move constructors / assignment operators yoink the pointers, copy them into the new instance, and null out the pointers in the old instance, effectively making it a zombie,
thus there is no data copied, just the three pointers
you do this because C++ doesn't have destructive move (look it up, rust has it: a moved from object is destroyed immediately), so you need to leave the old object semi-valid: invalid to where it doesnt work, but just valid enough that when the destructor runs it does like 3 nops, since theres nothing to free at a nullptr
oh that may be where my crashing was coming from
i never nulled the old one, just copied the pointers
that still doesnt work 
if you just copy the pointers, remember the destructor still gets run on the moved from object, so it freed the pointers and then you tried to use said pointers in the moved to object
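The copy versus move behaviour described above, shown on a tiny owning buffer. `IntBuffer` is a made-up class reduced to two pointers instead of a vector's three; the point is that the move constructor steals the pointers and nulls the source so the source's destructor has nothing to free.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer, reduced to begin/end pointers.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : begin_(new int[n]()), end_(begin_ + n) {}

    // Copy: allocate new storage and duplicate every element.
    IntBuffer(const IntBuffer &o) : begin_(new int[o.size()]), end_(begin_ + o.size()) {
        for (std::size_t i = 0; i < o.size(); ++i) begin_[i] = o.begin_[i];
    }

    // Move: take the pointers, then null the source so its destructor is a no-op.
    IntBuffer(IntBuffer &&o) noexcept : begin_(o.begin_), end_(o.end_) {
        o.begin_ = nullptr;
        o.end_ = nullptr;
    }

    // Copy-and-swap covers both copy and move assignment.
    IntBuffer &operator=(IntBuffer o) noexcept {
        std::swap(begin_, o.begin_);
        std::swap(end_, o.end_);
        return *this;
    }

    ~IntBuffer() { delete[] begin_; }  // delete[] on nullptr does nothing

    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    int *data() { return begin_; }

private:
    int *begin_;  // declared first, so it is initialized before end_
    int *end_;
};
```

Forgetting the two `nullptr` assignments gives exactly the crash described in the chat: the moved-from object's destructor frees memory the new object still points at.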
oooooh
there we go
implement a move constructor / assignment operator in something, ill look it over
ayy that fixed it
template <class T> class AtomicSharedPtr {
public:
T *ptr = nullptr;
std::atomic<uint32_t> *use_count = 0;
AtomicSharedPtr() {
ptr = new T();
use_count = new std::atomic<uint32_t>(0);
}
AtomicSharedPtr(const AtomicSharedPtr &copyFrom) {
ptr = copyFrom.ptr;
//*copyFrom.use_count += 1;
use_count = copyFrom.use_count;
*use_count += 1;
}
AtomicSharedPtr(AtomicSharedPtr &&moveFrom) {
ptr = moveFrom.ptr;
use_count = moveFrom.use_count;
moveFrom.ptr = nullptr;
moveFrom.use_count = nullptr;
}
~AtomicSharedPtr() {
if(use_count == nullptr) {
return;
}
if(*use_count == 0) {
delete ptr;
return ;
}
*use_count -= 1;
}
};
had to put a check in the deconstructor to keep it from crashing
i didnt realize the deconstructor was called when move constructing
now itll skip incrementing then decrementing the atomic counter when it just needs to be moved
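One thing worth checking in the class above: the destructor tests `*use_count == 0` and then decrements in a separate step, so two threads dropping the last two copies at the same moment could both take the decrement branch, and the counter allocation itself is never freed. Below is a hedged sketch of a variant that uses a single `fetch_sub` instead. `CountedPtr` is a hypothetical name, and the count here starts at 1 (number of owners) rather than 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical variant: default-constructs a T and frees it when the last copy dies.
template <class T> class CountedPtr {
public:
    T *ptr = nullptr;
    std::atomic<std::uint32_t> *use_count = nullptr;

    CountedPtr() : ptr(new T()), use_count(new std::atomic<std::uint32_t>(1)) {}

    CountedPtr(const CountedPtr &other) : ptr(other.ptr), use_count(other.use_count) {
        if (use_count) use_count->fetch_add(1, std::memory_order_relaxed);
    }

    CountedPtr(CountedPtr &&other) noexcept : ptr(other.ptr), use_count(other.use_count) {
        other.ptr = nullptr;
        other.use_count = nullptr;
    }

    // Copy-and-swap: the by-value parameter handles both copy and move assignment.
    CountedPtr &operator=(CountedPtr other) noexcept {
        std::swap(ptr, other.ptr);
        std::swap(use_count, other.use_count);
        return *this;
    }

    ~CountedPtr() {
        if (!use_count) return;  // moved-from: nothing to release
        // fetch_sub returns the previous value, so exactly one owner ever sees 1.
        if (use_count->fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete ptr;
            delete use_count;
        }
    }
};
```

Like the original, this only makes the count thread-safe; access to the pointed-to `T` still needs its own synchronization.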
AtomicSharedPtr(AtomicSharedPtr &&moveFrom) {
ptr = moveFrom.ptr;
use_count = moveFrom.use_count;
moveFrom.ptr = nullptr;
moveFrom.use_count = nullptr;
}
This isn't threadsafe btw. what if the count was copied, incremented on the next clock cycle? nvm you fucking heap allocated the std atomic, 💀 dont do that. if that cache lookup fails its slower than a mutex
actually
wait
let me think about this a bit more
it should work fine 
except if heap allocating an atomic variable is bad 
well no
heap allocating an atomic is fine
however ive just never seen it
let me look at a reference implementation of std shared ptr
its copying pointers, not the value, modifying the value itself is safe since its atomic
I'm not too sure rn
i can tell you that it at least doesnt crash my engine in its current state 
I'm saying what if you copied it, but wait its a pointer so its fine
yea I guess its fine
yea I guess a shared ptr has to use a heap allocated atomic
I dont think theres any other way
ignore me im being dumb

since you have to share it across instances
ya
i assume atomics are some special threadsafe instructions used to modify basic types?
yes / no
I'm going to use rust syntax here because I like it better and its clearer
Imagine a Mutex<std::uint32_t>
where the mutex is controlling a std::uint32_t
you understand syntax?
hopefully
i havent seen much rust so im assuming 
basically its a way to encode in the type system what is being controlled
I like it many dont :P
anyway
Mutex<std::uint32_t>
so I want to increment this variable
so I do the following
lock mutex (maybe wait)
increment variable
release mutex
clever CPU developers have figured out absolute fucking magic I dont understand it, but basically it allows all of the cores to implicitly mutex eachother
i.e every core has a unique lock on every memory address as long as it only modifies one address
yeah i have no clue how that would actually work 
good 
So an atomic is just a word that generalizes this notion of Mutex<std::uint32_t> but allows the compiler to do optimizations using those fancy pants cpu instructions
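The lock / increment / release steps and the atomic equivalent, side by side. This is a sketch with made-up function names; both versions give the same total, the atomic one just lets the hardware do the exclusion per increment instead of taking a lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// "Mutex<uint32_t>" by hand: lock, increment, release.
std::uint32_t countWithMutex(int threads, int perThread) {
    std::uint32_t value = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < perThread; ++i) {
                std::lock_guard<std::mutex> lock(m);  // lock (maybe wait)
                ++value;                              // increment
            }                                         // release at end of scope
        });
    for (auto &th : pool) th.join();
    return value;
}

// Same result with an atomic read-modify-write per increment.
std::uint32_t countWithAtomic(int threads, int perThread) {
    std::atomic<std::uint32_t> value{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < perThread; ++i)
                value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto &th : pool) th.join();
    return value.load();
}
```

Replacing `std::atomic<std::uint32_t>` with a plain `std::uint32_t` in the second function would be a data race and would usually lose increments.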
how expensive are they anyway 
so on x86 a std::atomic<std::uint32_t> is a built in atomic because of that architecture fuckery
its free
like no if ands or buts
free
oh, even if you have many atomic flags that need to be modified?
;asm -O3
#include <atomic>
int main()
{
std::atomic<int> a = 0;
a++;
}
main:
mov DWORD PTR [rsp-4], 0
lock add DWORD PTR [rsp-4], 1
xor eax, eax
ret
i guess the only slowdown would be if one thread tries to read/modify/write when another one is busy doing that at the same time, but thats super fast
as you can see x86 has a special lock add instruction
yeah
but this is what a semaphore is if you remember what I was talking about
its just a convenient wrapper around an atomic bool
so then what exactly is a mutex then 
do those have their own instruction or something
this is "free" since it also has to flush the cache to whatever address was modified to the other cores, but as long as you only modify one atomic every like 16 instructions its free, if you do more oh no it takes like 4 cycles
A mutex is just an atomic bool, but with a bunch of OS stuff in the way because the OS can do funny shit otherwise
ah
that os stuff is where all the overhead comes from
so thats why they're expensive
yup
its because even though your program can see the mutex, the OS has to manage the mutex since what if multiple applications try and access the same memory address
oh huh
do you get it or do you want me to explain again?
i think i get it lol
i should probably try and see if i can get my thread pool to work using semaphores
Imagine two programs both trying to access 0x1000, if program 1 had a mutex for that address at 0x8000, program 2 has no idea that exists, so thus the OS has to step in, and make a master look up table of mutexes and their addresses so there arent memory races
semaphores remember are just atomic bools
but they dont have all of that OS stuff
ya, but rn it uses mutexes when it really doesnt have to
So when you use them, they lock access to an object and not an address
its just to block other threads when work is being done on the job queue that they're working with
yup
and dont forget to use std::this_thread::sleep_for() (or std::this_thread::yield())
or if you use std::binary_semaphore
oh they wait on a condition variable 
its all handled for you :P
condition variables introduce an OS overhead since the OS has to wake up the thread. I forget how semaphore is implemented, but it's lighter than that iirc
std::binary_semaphore::acquire() 👉 👈
what if the job queue is empty 
oh well youd have to ditch the queue
let me explain
// master thread
// TODO: remove the heap allocation on the binary semaphore, theres a way I just can't think of it rn
std::binary_semaphore sema(0), sema2(0), sema3(0);
// start | end
createJob(std::function<void(void)>, nullptr, sema)
// this job waits until the first one is signaled to start
// and signals the end semaphore once its done
createJob(std::function<void(void)>, sema, sema2);
createJob(std::function<void(void)>, sema2, sema3);
///...
I'd recommend channels to collect the jobs in a central spot, attaching the required semaphores there, and then passing them on to whatever processing facilities you have
the jobs are currently stored in a central std::vector<std::function<int8_t()>> thats stored in a JobQueue class
what does std::int8_t represent :P
error code 
moving on....
if it returns -1 for instance, the job is put on the back of the queue
💀 you do you imma not critique design stuff here :p

anyway, do you get my idea?
yeah im looking at it rn 
basically store a bunch of semaphores on the main thread, pass mutable references to the "jobs" and free them once theyre used
rn the jobs are processed on a loose FIFO basis by worker threads that pop the job, run it, if it returns an error, its put on the back of the queue, if not, its deconstructed
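The retry-on-error behaviour can be sketched without any threading. This is a single-threaded illustration with hypothetical names (`Job`, `drain`); the real workers would do the same pop / run / requeue steps under whatever synchronization the queue uses. `maxRuns` is an added safety limit so a job that always fails cannot spin forever.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

using Job = std::function<std::int8_t()>;

// Drains the queue in FIFO order. A job returning -1 is put on the back of the
// queue to be retried; any other result drops it. Returns the number of runs.
int drain(std::deque<Job> &queue, int maxRuns) {
    int runs = 0;
    while (!queue.empty() && runs < maxRuns) {
        Job job = std::move(queue.front());
        queue.pop_front();
        ++runs;
        if (job() == -1) queue.push_back(std::move(job));  // retry later
    }
    return runs;
}
```

A job that fails twice and a job that succeeds immediately take four runs in total: fail, ok, fail, ok.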
well, I'll leave you to grapple with your own design decisions, it's the best part of programming :P
yeah, ive been working on this for a while now 
I recommend paper and pencil
also, it seems like you can have a thread wait on an atomic variable? 
oh it doesnt seem like that would work in the way i need it to
i may be able to skip waiting on the condition variable if the job queue isnt empty yet 
that should reduce the overhead a bit
#include <iostream>
#include <thread>
#include <chrono>
#include <semaphore>
#include <vector>
void worker(std::binary_semaphore &sem, int *num) {
sem.acquire();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
*num += 1;
sem.release();
}
int main() {
std::binary_semaphore sem(1);
int num = 0;
std::vector<std::thread> threads;
for (int i = 0; i < 1024; ++i) {
threads.push_back(std::thread(worker, std::ref(sem), &num));
}
for (int i = 0; i < 1024; ++i) {
threads[i].join();
}
std::cout << num << "\n";
return 0;
}
just made a quick program to see if semaphores can be used for preventing race conditions and it turns out yes
Semaphores are also often used for the semantics of signalling/notifying rather than mutual exclusion, by initializing the semaphore with 0 and thus blocking the receiver(s) that try to acquire(), until the notifier "signals" by invoking release(n). In this respect semaphores can be considered alternatives to std::condition_variables, often with better performance.
so i can probably change my jobs system from using mutexes and condition variables to binary and counting semaphores
I did a bit more reading and you should probably use std latch
Fundamentally the same just its a std lib wrapper
The standard library just gets bigger and bigger I swear I find something new every
Like three weeks
wait why would i need a latch 
arent latches/barriers used to make sure threads have reached the latch/barrier so they can all move on right after?
std::latch is just a semaphore with differently named member functions. I was just looking and it seems that it might be more intuitive, idk though
semaphores and latches are literally the same :P
;compile
<source>:6:6: error: variable or field 'worker' declared void
6 | void worker(std::binary_semaphore &sem, int *num) {
| ^~~~~~
<source>:6:18: error: 'binary_semaphore' is not a member of 'std'; did you mean 'binary_negate'?
6 | void worker(std::binary_semaphore &sem, int *num) {
| ^~~~~~~~~~~~~~~~
| binary_negate
<source>:6:36: error: 'sem' was not declared in this scope
6 | void worker(std::binary_semaphore &sem, int *num) {
| ^~~
<source>:6:41: error: expected primary-expression before 'int'
6 | void worker(std::binary_semaphore &sem, int *num) {
| ^~~
<source>: In function 'int main()':
<source>:13:10: error: 'binary_semaphore' is not a member of 'std'; did you mean 'binary_negate'?
13 | std::binary_semaphore sem(1);
| ^~~~~~~~~~~~~~~~
| binary_negate
<source>:17:39: error: 'worker' was
;compile -std=c++20
#include <iostream>
#include <thread>
#include <chrono>
#include <semaphore>
#include <vector>
void worker(std::binary_semaphore &sem, int *num) {
sem.acquire();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
*num += 1;
sem.release();
}
int main() {
std::binary_semaphore sem(1);
int num = 0;
std::vector<std::thread> threads;
for (int i = 0; i < 1024; ++i) {
threads.push_back(std::thread(worker, std::ref(sem), &num));
}
for (int i = 0; i < 1024; ++i) {
threads[i].join();
}
std::cout << num << "\n";
return 0;
}
/opt/compiler-explorer/gcc-12.2.0/bin/../lib/gcc/x86_64-linux-gnu/12.2.0/../../../../x86_64-linux-gnu/bin/ld: /tmp/ccsiIQAg.o: in function `std::thread::thread<void (&)(std::counting_semaphore<1l>&, int*), std::reference_wrapper<std::counting_semaphore<1l> >, int*, void>(void (&)(std::counting_semaphore<1l>&, int*), std::reference_wrapper<std::counting_semaphore<1l> >&&, int*&&)':
/opt/compiler-explorer/gcc-12.2.0/include/c++/12.2.0/bits/std_thread.h:135: undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
Build failed

;compile -std=c++20 -llibpthread
#include <iostream>
#include <thread>
#include <chrono>
#include <semaphore>
#include <vector>
void worker(std::binary_semaphore &sem, int *num) {
sem.acquire();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
*num += 1;
sem.release();
}
int main() {
std::binary_semaphore sem(1);
int num = 0;
std::vector<std::thread> threads;
for (int i = 0; i < 1024; ++i) {
threads.push_back(std::thread(worker, std::ref(sem), &num));
}
for (int i = 0; i < 1024; ++i) {
threads[i].join();
}
std::cout << num << "\n";
return 0;
}
/opt/compiler-explorer/gcc-12.2.0/bin/../lib/gcc/x86_64-linux-gnu/12.2.0/../../../../x86_64-linux-gnu/bin/ld: cannot find -llibpthread: No such file or directory
/opt/compiler-explorer/gcc-12.2.0/bin/../lib/gcc/x86_64-linux-gnu/12.2.0/../../../../x86_64-linux-gnu/bin/ld: note to link with /lib/x86_64-linux-gnu/libpthread.a use -l:libpthread.a or rename it to liblibpthread.a
collect2: error: ld returned 1 exit status
Build failed
;compile -std=c++20 -lpthread
#include <iostream>
#include <thread>
#include <chrono>
#include <semaphore>
#include <vector>
void worker(std::binary_semaphore &sem, int *num) {
sem.acquire();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
*num += 1;
sem.release();
}
int main() {
std::binary_semaphore sem(1);
int num = 0;
std::vector<std::thread> threads;
for (int i = 0; i < 6; ++i) {
threads.push_back(std::thread(worker, std::ref(sem), &num));
}
for (int i = 0; i < 6; ++i) {
threads[i].join();
}
std::cout << num << "\n";
return 0;
}
6
ayy there we go
;compile -std=c++20 -lpthread
#include <iostream>
#include <thread>
#include <chrono>
#include <semaphore>
#include <vector>
void worker(std::binary_semaphore &sem, int *num, std::chrono::time_point<std::chrono::steady_clock> timePoint) {
std::this_thread::sleep_until(timePoint);
sem.acquire();
*num += 1;
sem.release();
}
int main() {
std::chrono::time_point<std::chrono::steady_clock> start = std::chrono::steady_clock::now();
std::binary_semaphore sem(1);
int num = 0;
int nthreads = 1024;
std::vector<std::thread> threads;
for (int i = 0; i < nthreads; ++i) {
threads.push_back(std::thread(worker, std::ref(sem), &num, start + std::chrono::milliseconds(100)));
}
for (int i = 0; i < nthreads; ++i) {
threads[i].join();
}
std::cout << num << "\n";
return 0;
}
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
oh oops 
;compile -std=c++20 -lpthread
#include <iostream>
#include <thread>
#include <chrono>
#include <semaphore>
#include <vector>
void worker(std::binary_semaphore &sem, int *num, std::chrono::time_point<std::chrono::steady_clock> timePoint) {
std::this_thread::sleep_until(timePoint);
sem.acquire();
*num += 1;
sem.release();
}
int main() {
std::chrono::time_point<std::chrono::steady_clock> start = std::chrono::steady_clock::now();
std::binary_semaphore sem(1);
int num = 0;
int nthreads = 10;
std::vector<std::thread> threads;
for (int i = 0; i < nthreads; ++i) {
threads.push_back(std::thread(worker, std::ref(sem), &num, start + std::chrono::milliseconds(100)));
}
for (int i = 0; i < nthreads; ++i) {
threads[i].join();
}
std::cout << num << "\n";
return 0;
}
10
and then you remove the semaphore acquire and release
;compile -std=c++20 -lpthread
#include <iostream>
#include <thread>
#include <chrono>
#include <semaphore>
#include <vector>
void worker(std::binary_semaphore &sem, int *num, std::chrono::time_point<std::chrono::steady_clock> timePoint) {
std::this_thread::sleep_until(timePoint);
//sem.acquire();
*num += 1;
//sem.release();
}
int main() {
std::chrono::time_point<std::chrono::steady_clock> start = std::chrono::steady_clock::now();
std::binary_semaphore sem(1);
int num = 0;
int nthreads = 10;
std::vector<std::thread> threads;
for (int i = 0; i < nthreads; ++i) {
threads.push_back(std::thread(worker, std::ref(sem), &num, start + std::chrono::milliseconds(100)));
}
for (int i = 0; i < nthreads; ++i) {
threads[i].join();
}
std::cout << num << "\n";
return 0;
}
bruh
;compile -std=c++20 -lpthread
10
;compile -std=c++20 -lpthread
#include <iostream>
#include <vector>
#include <chrono>
#include <thread>
void worker(int *num, std::chrono::time_point<std::chrono::steady_clock> timePoint) {
std::this_thread::sleep_until(timePoint);
*num += 1;
}
int main() {
int numThreads = 12;
int num = 0;
std::vector<std::thread> threadHandles;
std::chrono::time_point<std::chrono::steady_clock> start = std::chrono::steady_clock::now();
for (int i = 0; i < numThreads; ++i) {
threadHandles.push_back(std::thread(worker, &num, start + std::chrono::milliseconds(100)));
}
for (int i = 0; i < numThreads; ++i) {
threadHandles[i].join();
}
std::cout << num << "\n";
return 0;
}
12
damn
ah, so you can do about 12 threads 
anyway, it does trip up with enough threads
$ clang++ -o ./out.exe ./main.cpp; ./out.exe
993
should be 1024
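The 993 is what lost updates look like: `*num += 1` on a plain `int` is a read, an add, and a write, and two threads can interleave those steps (formally it is a data race, so the result is undefined). The small runs above most likely print the right number only because the threads rarely overlap. Besides the semaphore, a `std::atomic<int>` is another way to make the increment safe. A small sketch with a hypothetical `countWithAtomic` helper, using fewer threads but many increments each so it does not hit the thread limit seen earlier:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Every increment is a single indivisible read-modify-write,
// so no update can be lost regardless of how the threads interleave.
int countWithAtomic(int threadCount, int incrementsPerThread) {
    assert(threadCount > 0 && incrementsPerThread >= 0);
    std::atomic<int> num{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&num, incrementsPerThread] {
            for (int j = 0; j < incrementsPerThread; ++j) {
                num.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread &t : threads) t.join();
    return num.load();
}
```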
Worlds best compiler bot
I'm thinking about going in the deep end with my Vulkan renderer and making it a voxel based renderer (how original!)
Ive got a couple of questions?
What's the high level overview of your engine?
How does lighting work? do you use a compute shader or just a heavily optimized cpu side per voxel lighting algorithm?
runtime storage formats so you dont copy all the data to the gpu every frame?
How does lighting work? do you use a compute shader or just a heavily optimized cpu side per voxel lighting algorithm?
i currently have no lighting
the plan was a cpu side algo for compatibility and just making it a runnable job
eventually the lighting would just be done fully on the gpu in a deferred renderer
for simplicity's sake i wasnt going to make light level a gameplay mechanic as i have chunks along all 3 axes and that would get expensive to compute
at least it would be hard with sunlight, but block light wouldnt be that bad
making light levels not a gameplay mechanic makes it much easier to have chunks along all 3 axes, if i want just visual lighting, its much easier to just compute it for visible chunks
runtime storage formats so you dont copy all the data to the gpu every frame?
so for runtime storage, i made an abstraction layer for geometry related stuff in opengl, underneath that it uses buffer objects for vertex data and loads it onto the gpu with glBufferData, then deletes the vertex data from the cpu side
you can then simply draw from the buffer with all the geometry stored on the gpu, it only needs to be reuploaded when there are updates to the geometry, like blocks being added or removed
it can also handle stuff like GL 1.1 vertex arrays, where it doesnt allocate any buffers and simply draws from the vertex data on the cpu side
What's the high level overview of your engine?
i dont really have a way to concisely explain my engine on a high level rn
best i got is that it has multiple systems ive implemented that handle specific tasks like managing graphics chunks, world chunks, drawing text, and taking input and making the camera move based on that input
im trying to make it as modular as possible, for example the graphics chunk system depends on the world chunk system but there is no dependency the other way around
this would make it easier to make a server as i can simply omit stuff that isnt needed
i can give code snippets if you want but its all kinda undocumented 
i was going to clean it up and properly document it when its actually something to begin with
rn its not much
ok sounds like I'm thinking too much into this, I'll just go for it and probably make a thread called "poor design decisions" and chronicle my adventure there like you have

I should be able to get voxels up and rendering by the end of this week
ive been overthinking this for 8 years 
my refactor should be done by like tomorrow and I want to get dynamic vertex generation
my problem rn is interfacing my renderer / (dynamic vertex / voxel generator, idk what to call it, pls help)
I dont want them to interact
I want to be able to just return a list of chunks and a camera position and have it get rendered
mine has that rn 
gl::chunk_manager::draw(&glChunkManager, &camera);
the chunk manager has a map of all the chunks that its working with and the draw function simply sets up the ogl state and renders them
I'm thinking of storing two copies of the data, one on the renderer side and one on the voxel side, so I iterate over each chunk, hash it, and if its changed update the corresponding render chunk
I want to have my generator completely isolated from the renderer
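A rough sketch of that hash-and-compare idea, assuming blocks are stored as a flat array of `uint32_t` ids. `RenderChunk`, `hashBlocks`, and `syncRenderChunk` are illustrative names, and FNV-1a is just one simple hash picked for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over the raw block ids, one byte at a time.
std::uint64_t hashBlocks(const std::vector<std::uint32_t> &blocks) {
    std::uint64_t hash = 14695981039346656037ull;   // FNV offset basis
    for (std::uint32_t block : blocks) {
        for (int shift = 0; shift < 32; shift += 8) {
            hash ^= (block >> shift) & 0xffu;
            hash *= 1099511628211ull;               // FNV prime
        }
    }
    return hash;
}

// Renderer-side record for one chunk (hypothetical).
struct RenderChunk {
    std::uint64_t lastHash = 0;
    int rebuildCount = 0;   // stand-in for "vertices were regenerated"
};

// Rebuilds only when the voxel-side data hashes differently than last time.
// Returns true if a rebuild happened.
bool syncRenderChunk(RenderChunk &renderChunk,
                     const std::vector<std::uint32_t> &blocks) {
    const std::uint64_t hash = hashBlocks(blocks);
    if (renderChunk.rebuildCount > 0 && hash == renderChunk.lastHash) {
        return false;       // unchanged since the last sync
    }
    renderChunk.lastHash = hash;
    ++renderChunk.rebuildCount;
    assert(renderChunk.rebuildCount > 0);
    return true;
}
```

Two caveats: hashing every chunk every frame touches every block, so a dirty flag set by whoever edits the chunk would be cheaper, and a hash collision could in principle hide a change.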
see this works, but my brain is too small to work with global state like this :P

this is what my chunk manager looks like
typedef struct ChunkManager {
shapes::Cube cube;
std::map<std::array<int64_t,3>, AtomicSharedPtr<Chunk>> chunks;
gl::Texture texture;//currently the only texture needed
uint32_t renderDistanceXZ = 96;
uint32_t renderDistanceYPositive = 0;
uint32_t renderDistanceYNegative = 0;
} ChunkManager;
and then i have these functions for it
namespace chunk_manager {
//initializes the chunk manager opengl state and textures
void init(ChunkManager *chunkManager);
//must be run once per frame to stream chunks properly, syncs chunks until the specified time point, will sync chunk geometry if shouldSync is set
void syncChunksUntil(ChunkManager *chunkManager, std::chrono::time_point<std::chrono::steady_clock> timePoint);
//draws chunks if isRenderable is set
void draw(ChunkManager *chunkManager, gl::transform::Camera *camera);
//frees chunks outside of the render volume, currently does not unload chunks
void freeDistantChunksUntil(ChunkManager *chunkManager, gl::transform::Camera *camera, voxel_world::World *world, std::chrono::time_point<std::chrono::steady_clock> timePoint);
void loadCloseChunksUntil(JobQueue &jobQueue, gl::ChunkManager *chunkManager, gl::transform::Camera *camera, voxel_world::World *world, std::chrono::time_point<std::chrono::steady_clock> timePoint);
//utility function for when graphics resets happen
void freeAllChunks(JobQueue &jobQueue, gl::ChunkManager *chunkManager);
}
it allows me to change functions globally at runtime 
ill stop making fun :p
no need to think about messing with classes and assigning methods
also it makes using functions as jobs much easier
my question was what did you call your dynamic vertex generator thing
rn I have this
but I dont like the word scene
polymorphism in C
thats actually a fair point, no need for std::bind nonsense
oh it does use std::bind but i find the syntax of using a class method for a job kinda oof
adding jobs looks like this
jobQueue.appendJob(std::bind(voxel_world::chunk::genTerrain, ...
see I just wrap everything in a void lambda
seb::submitJob([&]{stuff}).await()
ez
i find making lambdas kind of a pain just for this 
see I just find it really convenient to pass std::function<void(void)> s around since I dont have to muck around with templates
i use std::queue<std::function<int8_t()>> for my job queue
I need to finish my channels implementation (definitely not an overlay over moodycamel::concurrentqueue :P)
the return is used as a success/fail code, failed jobs are put back on the end of the queue and ones that succeeded are destroyed
this allows me to make sure jobs with dependencies on other jobs are executed in the correct order
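The requeue-on-failure scheme can be sketched on a single thread like this. It assumes 0 means success and anything else means "try again later"; `Job` and `drain` are hypothetical names rather than the engine's API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>

// Assumed convention: 0 = success, non-zero = not ready yet, retry later.
using Job = std::function<int8_t()>;

// Runs jobs on the calling thread until the queue is empty or maxRuns
// invocations have happened. Returns the number of invocations.
int drain(std::queue<Job> &jobs, int maxRuns) {
    assert(maxRuns > 0);
    int runs = 0;
    while (!jobs.empty() && runs < maxRuns) {
        Job job = std::move(jobs.front());
        jobs.pop();
        ++runs;
        if (job() != 0) {
            jobs.push(std::move(job));   // failed: back of the queue
        }
        // succeeded: the job is destroyed when it goes out of scope
    }
    return runs;
}
```

If a geometry job is queued before the terrain job it depends on, the geometry job fails once, goes to the back, and succeeds on its second try. The `maxRuns` cap is an addition for the sketch so that a job whose dependency never shows up cannot spin forever.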
the data structures themselves have atomic flags that are set based on what the current state of the structure is
that part is cumbersome but honestly idk how else i would do it 
something i need to do to improve chunk loading performance is pool my VBOs so the driver doesnt have to realloc every time chunks are loaded
itll impact gpu memory usage a bit but it should be fine, i could just make it an option 
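A sketch of the pooling idea with the OpenGL calls abstracted away so it can run anywhere. `BufferPool` and its `create` callback are hypothetical; in the real renderer the callback would presumably wrap `glGenBuffers`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using BufferId = std::uint32_t;   // stands in for a GL buffer name

class BufferPool {
public:
    // `create` makes a brand new buffer when the pool has none idle.
    explicit BufferPool(std::function<BufferId()> create)
        : create_(std::move(create)) {
        assert(create_);
    }

    // Hand out an idle buffer if there is one, otherwise create one.
    BufferId acquire() {
        if (!idle_.empty()) {
            BufferId id = idle_.back();
            idle_.pop_back();
            return id;
        }
        return create_();
    }

    // Called when a chunk is freed: keep the buffer around for reuse.
    void release(BufferId id) { idle_.push_back(id); }

    std::size_t idleCount() const { return idle_.size(); }

private:
    std::function<BufferId()> create_;
    std::vector<BufferId> idle_;
};
```

Reusing an id on its own only skips creating and deleting buffer objects. To also avoid reallocating storage, the upload path would likely need to write into the existing data store (for example with `glBufferSubData`) instead of calling `glBufferData` again, which is where the extra GPU memory mentioned above comes from.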
i fucking love settings 
so far i was going to have settings for vertex specification modes, texture modes, block face culling modes and i guess a VBO pool setting as well 
im also gonna implement a lot of rendering settings as well when i get to shaders
my draw function for chunks looks like this btw
for (auto it = chunkManager->chunks.begin(); it != chunkManager->chunks.end(); ++it){
if(!it->second.ptr->isRenderable) {
continue;
}
glm::dmat4 modMat = glm::translate(glm::dvec3(it->second.ptr->worldspaceBlockLocation.x, it->second.ptr->worldspaceBlockLocation.y, it->second.ptr->worldspaceBlockLocation.z));//compute the model matrix from its position
gl::transform::fixed::loadModelViewMat(viewMat * modMat);//load the modelview matrix with the product of the view and model matrices, this transforms the chunk into view space for rendering
gl::geometry::setupVertexAttribLayout(&it->second.ptr->geometry);
gl::geometry::draw(&it->second.ptr->geometry);
gl::geometry::cleanupVertexAttribLayout(&it->second.ptr->geometry);
}
gl::geometry::setupVertexAttribLayout(&it->second.ptr->geometry);
gl::geometry::draw(&it->second.ptr->geometry);
gl::geometry::cleanupVertexAttribLayout(&it->second.ptr->geometry);
these 3 are assignable std::functions for example
i can switch what these actually do under the hood to change what vertex specification mode it uses, even at runtime
however switching at runtime is rather difficult as i need to either keep track of and delete all related ogl objects and reinit them, or completely reinit the ogl context itself in order for it to work right
it does however work perfectly on load time, just call the function to set the appropriate functions before anything is done
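The assignable-function pattern can be sketched roughly as follows. The `Geometry` stand-in, the mode enum, and `setVertexSpecMode` are assumptions for illustration; the real functions would issue OpenGL calls instead of recording a string.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Stand-in for the engine's geometry type.
struct Geometry {
    std::string lastDrawPath;   // records which implementation ran
};

namespace geometry {

// Assignable entry point: callers always go through this.
std::function<void(Geometry *)> draw;

// Two interchangeable implementations (placeholders for real GL code).
void drawVertexArrays(Geometry *g) {
    assert(g != nullptr);
    g->lastDrawPath = "vertex arrays";
}
void drawBufferObjects(Geometry *g) {
    assert(g != nullptr);
    g->lastDrawPath = "buffer objects";
}

enum class VertexSpecMode { VertexArrays, BufferObjects };

// Call before anything is drawn to pick the implementation.
void setVertexSpecMode(VertexSpecMode mode) {
    draw = (mode == VertexSpecMode::VertexArrays) ? drawVertexArrays
                                                  : drawBufferObjects;
}

} // namespace geometry
```

The draw loop never changes; only the target of `geometry::draw` does. That is also why a runtime switch is harder than it looks: objects created under the old mode still exist and have to be cleaned up separately.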
doing all of this allows me to support even GL 1.0 
First rule of graphics, if it looks right, it is right

its not too cumbersome as im not really shooting myself in the foot too often by forgetting to set the flags properly 
i did run into that problem with preventing use-after-free crashes but i solved that with ✨ smart pointers ✨
RAII is I think my favorite thing in all of programming
all the benefits of GC without any of the drawbacks
it cant use RAII everywhere in my engine as the last I is quite expensive sometimes 
See I solve this by making most of my classes non default constructible and then just using a std::unique_ptr<RAIIWrapperClass> wrapped {nullptr}. It's a heap allocation, but idgaf,
is there any way to delay construction of a non default constructible object on the stack?
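One standard answer to that question is `std::optional` (C++17): it reserves storage for the object in place and only constructs it when `emplace()` is called, so there is no heap allocation for the wrapper. A small sketch where `RAIIWrapperClass` and `Engine` are placeholders:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Placeholder for a resource-owning class with no default constructor.
class RAIIWrapperClass {
public:
    RAIIWrapperClass() = delete;
    explicit RAIIWrapperClass(std::string name) : name_(std::move(name)) {}
    const std::string &name() const { return name_; }

private:
    std::string name_;
};

struct Engine {
    // Empty until init() runs; the storage lives inside Engine itself.
    std::optional<RAIIWrapperClass> wrapped;

    void init(std::string name) {
        assert(!wrapped.has_value());
        wrapped.emplace(std::move(name));   // constructs in place
    }
};
```

Calling `wrapped.reset()` runs the destructor early, which gives the same explicit lifetime control as resetting a `unique_ptr`.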
dont std::map and std::unordered_map need their elements to be default constructible 
no clue 
hopefully not :P
can't I just hash the uniqueptr<T>?
big brain
fill the heap
it is for the [] operator 
or i guess
DefaultInsertable
ah so yes it must be default constructible if using the [] operator to insert elements
When the default allocator is used, this means that key_type must be CopyConstructible and mapped_type must be DefaultConstructible.
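In other words, only `operator[]` needs the default constructor. `try_emplace` (C++17), `emplace`, `find`, and `at` work with a mapped type that has none. A sketch with a hypothetical `Chunk` type and helper names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

// A mapped type with no default constructor.
struct Chunk {
    explicit Chunk(int blockCount) : blockCount(blockCount) {}
    int blockCount;
};

using ChunkKey = std::array<std::int64_t, 3>;

// chunks[key] would not compile here, because operator[] has to be able to
// default-construct a Chunk. try_emplace builds it from the given arguments
// and leaves an existing element untouched.
Chunk &getOrCreate(std::map<ChunkKey, Chunk> &chunks, const ChunkKey &key,
                   int blockCount) {
    assert(blockCount > 0);
    auto [it, inserted] = chunks.try_emplace(key, blockCount);
    (void)inserted;
    return it->second;
}

// Lookup that never inserts; returns nullptr when the chunk is absent.
const Chunk *findChunk(const std::map<ChunkKey, Chunk> &chunks,
                       const ChunkKey &key) {
    auto it = chunks.find(key);
    if (it == chunks.end()) return nullptr;
    return &it->second;
}
```

A side benefit of `findChunk` is that merely reading a key no longer creates an empty entry, which `operator[]` does silently.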
just dont do that 👀
ah but i did anyway 
gl::Chunk *glChunk = chunkManager->chunks[{x, y, z}].ptr;//get the pointer to the working chunk, this also automatically creates it by reading from the std::unordered_map structure
gl::chunk::init(glChunk, GL_VATTRIBLAYOUT_T2F_V3F);//initialize the chunk
->chunks[{x, y, z}] ctad at its limits :P
ctad? 
oh i just got it 
very dope
;asm -std=c++20
#include <iostream>
#include <mutex>
#include <semaphore>
void worker_sem(int *n, std::binary_semaphore &sem) {
sem.acquire();
*n += 1;
sem.release();
}
void worker_mut(int *n, std::mutex &mut) {
std::unique_lock<std::mutex> lock(mut);
*n += 1;
}
💀
;asm -std=c++20
#include <iostream>
#include <mutex>
#include <semaphore>
void worker_sem(int *n, std::binary_semaphore &sem) {
sem.acquire();
*n += 1;
sem.release();
}
hear me out here
this does the hard part for you
just have each thread poll this
and if it fails
just put it to sleep
use std::this_thread::yield
I asked a question in #c-cpp-discussion about this as well since this is what im currently doing
i think i just have a big issue with NIH syndrome 
Nih?
Not Invented Here
Ahh
its easier for me to know exactly how something will behave if i made it 
thats mostly why
i just thought of something 
load and unload distances could be different 
this is 10 chunk render distance without unloading chunks 
if you really wanted to get fancy, detect times when threads are sleeping and put them to work doing other stuff :P


i mean i could just submit other jobs 
but thats stuff that doesnt require that its done this frame kind of quick
i need to make this load from the player outwards, rn it loads in whatever direction it feels like which is usually x+ then z+
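One simple way to get player-outwards loading is to generate every chunk offset in the load volume once and sort the list by squared distance from the center. A sketch with hypothetical names; the offsets are relative to the chunk the player is standing in.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using ChunkOffset = std::array<std::int64_t, 3>;   // {x, y, z} in chunks

std::int64_t distanceSquared(const ChunkOffset &o) {
    return o[0] * o[0] + o[1] * o[1] + o[2] * o[2];
}

// All offsets within the given radii, nearest to the player first.
std::vector<ChunkOffset> loadOrder(std::int64_t radiusXZ,
                                   std::int64_t radiusY) {
    assert(radiusXZ >= 0 && radiusY >= 0);
    std::vector<ChunkOffset> offsets;
    for (std::int64_t y = -radiusY; y <= radiusY; ++y) {
        for (std::int64_t z = -radiusXZ; z <= radiusXZ; ++z) {
            for (std::int64_t x = -radiusXZ; x <= radiusXZ; ++x) {
                offsets.push_back({x, y, z});
            }
        }
    }
    std::stable_sort(offsets.begin(), offsets.end(),
                     [](const ChunkOffset &a, const ChunkOffset &b) {
                         return distanceSquared(a) < distanceSquared(b);
                     });
    return offsets;
}
```

Since the list only depends on the render distance, it can be built once and reused: each frame the loader adds the player's chunk position to each offset and submits jobs in that order.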
uninitialized block data 
kinda wish i could emulate what this is doing but im not sure thats even possible 
I always love it when people do this stuff
it has the same energy as turning off execution protection on a micro controller and just letting it run wild

im still stuck on world streaming
i have a high level idea as to how i want it to work but the specifics are very difficult to figure out properly
@flint kettle this is the thread where i ramble about the design of my engine 
im still stuck on world streaming
Ramblage Kek
im trying to design a way to load/unload regions in a multithreaded environment but its really difficult and hard to keep track of everything
When was the last time you just took a break, busted out some pen and paper and drew out your high level design ideas?
Maybe assign simulation for chunks per thread
the high level idea is to have chunks in regions, only unload them when the entire region is unloaded, but have unloaded chunks compressed in memory until they're loaded again
i get the high level design but the specifics are hard
Chunk owns thread and releases thread after whatever work is done
RAII seems like your friend here
i need to make sure i dont clobber my memory with threads running over each other
Chunks will request ownership for singular simulation jobs maybe
I tried to do this but failed

what did you think about the designator thread / semaphores idea?
Maybe store chunk state within the chunk and that chunk can float around to various threads depending on load conditions
how was that supposed to work again?
True
currently the chunks can be processed by any worker thread that runs a job for it
yeah i thought about doing that 
#c-cpp-discussion message
I have precisely no idea what I’m talking about but somehow this is making sense to me 
that's parallelization for you!
itd be way easier if it was singlethreaded 
but bruh i have 16 cores that arent going to be sitting around on my watch 
scalability was a problem i ran into when trying to play the minecraft modpack i like
I still contend that the semaphore idea is the best since you can essentially just think about it like single thread, except instead of doing the processing on said thread, you just delegate the work for other threads to do
thus you think about it in a single threaded manner
it seems most stuff in that is run on a single thread for processing all chunks
yeah i should use semaphores
Semaphores are hands down my favorite threading primitive
rn the jobs reschedule themselves if the chunk isnt in the correct state
yeah 
it takes the job and puts it back on the end of the queue if the chunk isnt in the correct state
honestly it sounds like you know where your faults are, would it be a good idea to just burn everything and start over?

Idk how to use a semaphore but it seems like the best choice based on what is described here
hoping the dependent job runs at some point
If it’s anything like the “job stack” I described
Job stack idea fits nicely inside of brain
think an atomic bool, but you can call a wait command on the bool and the OS handles everything else so youre not spinning
its a job queue 
std::queue<std::function<int8_t()>> jobs;
Implement unlinked list

the job on the front of the queue is processed by the first free worker thread
So it just queues up functions?
ya
why can't you just have many threads popping off that queue
thats the simple solution
For most consistent simulation rate
i guess the system i have rn is cumbersome or something 
its hard to get a region of chunks to not get clobbered by multiple threads
currently chunks are generated for building geometry and then immediately deleted
as i have no actual working world system
i have everything designed like this so far but im missing so much
namespace chunk {
size_t getBlockOffset(World *w, size_t x, size_t y, size_t z);
void allocBlocks(World *w, Chunk *c);
//void freeBlocks(Chunk *c);
int8_t genTerrain(World *w, AtomicSharedPtr<Chunk> c, voxel_world::Coordinates worldspaceChunkLocation);
//int8_t loadTerrain();
}
namespace region {
size_t getChunkOffset(World *, size_t x, size_t y, size_t z);
void init(World *world, Region *region);
Chunk *loadChunk(World *world, Region *region, Coordinates regionspaceChunkPosition);
Chunk *getChunk(World *world, Region *region, Coordinates regionspaceChunkPosition);
void unloadChunk(World *world, Region *region, Coordinates regionspaceChunkPosition);
void unload(World *world, Region *region);
}
namespace dimension {
Chunk *allocChunk(World *world, Dimension *dim, Coordinates worldspaceChunkLocation);
int8_t loadChunk(World *world, Dimension *dim, Coordinates worldspaceChunkPosition);
int8_t unloadChunk(World *world, Dimension *dim, Coordinates worldspaceChunkPosition);
}
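The declarations above do not show how `getBlockOffset` maps a 3D position to an index, so the following is only one plausible layout: x varies fastest, then z, then y. `ChunkDims` is a stand-in for the chunk dimensions stored in `World`.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the chunk dimensions kept in World (in blocks).
struct ChunkDims {
    std::size_t width;    // x extent
    std::size_t height;   // y extent
    std::size_t length;   // z extent
};

// Flattens (x, y, z) into an index into the chunk's block array.
// Assumed layout: rows along x, rows stacked along z, layers along y.
std::size_t getBlockOffset(const ChunkDims &dims, std::size_t x,
                           std::size_t y, std::size_t z) {
    assert(x < dims.width && y < dims.height && z < dims.length);
    return x + dims.width * (z + dims.length * y);
}
```

The same formula works one level up for `region::getChunkOffset`, with region dimensions in chunks instead of chunk dimensions in blocks.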
What does clobbered mean in this context
Sorry I am slow with this
Lmao new to parallelized stuff in C++
having multiple threads trying to modify the same data at the same time, itll lead to it essentially corrupting the data as both modify the data
its called race conditions afaik
wait a darn second here
What's actually holding all of the chunks at the highest level?
like for instance, if im trying to save a region to disk but another thread wants to load a chunk in that region
do you have vector of chunks or like a hash map or what
If it’s global state I cry
struct World;
struct Chunk {
std::atomic<bool> isLoaded = 0;
std::atomic<int32_t> numLoaders = 0;
//int64_t chunkLocationX, chunkLocationY, chunkLocationZ;
Coordinates worldspaceChunkLocation;//the chunk's location in the world, in chunk coordinates
std::unique_ptr<uint32_t[]> blocks;
};
struct Region {
std::unique_ptr<AtomicSharedPtr<Chunk>[]> chunks;
};
struct Dimension {
std::shared_mutex regionsMutex;
std::map<std::array<int64_t,3>, Region> regions;
};
struct World {
//the dimensions of a chunk, in block coordinates
uint32_t chunkWidth, chunkHeight, chunkLength;
//the dimensions of a region, in chunk coordinates
uint32_t regionWidth, regionHeight, regionLength;
//std::map<int32_t, Dimension> dimensions;
};
well thats unavoidable, you have a world after all
i have a std::map to hold regions which has an array of chunk pointers

You could also literally physically map threads to chunks
Using some gridlike pattern
also my blood sugar is a bit low rn so my brain may not be working at 100% 
the issue is how do i get the worker threads to process the chunks without having to iterate over all regions and all chunks
I have caffeine and insanity coursing through my brain
I am gods strongest soldier and Minecraft is his hardest battle 
Gonna go prototype stuff now
an actually good minecraft engine is harder to make than one would think 
ok
lookie here
The things marked with a 🚫 are data races
You can have chunks do internal processing on themselves from multiple separate threads
You cannot move chunks or insert chunks into the map without a data race occurring
You need to create "dummy" chunks in the map and then pass those pointers around to have their processing done separately
i want to limit the spawning and destruction of threads, and i also want to closely control how many threads are active
the insertion of new chunks is fundamentally limited
however once theyre constructed
you can do whatever
as long as the threads stay isolated
insertion of chunks into a std::map is safe, which is what im currently using
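Both points can be true at once, depending on what "safe" means. Inserting into a `std::map` never invalidates references or iterators to elements already in it, but inserting while another thread reads or inserts into the same map is still a data race unless a lock guards it. The `regionsMutex` in the `Dimension` struct above looks like it is there for exactly that. A sketch of how it might be used, with a simplified `Region` and hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>

struct Region {
    int loadedChunks = 0;   // simplified stand-in
};

using RegionKey = std::array<std::int64_t, 3>;

struct Dimension {
    std::shared_mutex regionsMutex;   // guards the map structure itself
    std::map<RegionKey, std::shared_ptr<Region>> regions;
};

// Lookups take a shared lock, so many readers can search at once.
std::shared_ptr<Region> findRegion(Dimension &dim, const RegionKey &key) {
    std::shared_lock<std::shared_mutex> lock(dim.regionsMutex);
    auto it = dim.regions.find(key);
    if (it == dim.regions.end()) return nullptr;
    return it->second;
}

// Insertion takes the exclusive lock. try_emplace re-checks under that lock,
// so two threads racing to create the same region end up sharing one.
std::shared_ptr<Region> getOrCreateRegion(Dimension &dim,
                                          const RegionKey &key) {
    if (auto existing = findRegion(dim, key)) return existing;
    std::unique_lock<std::shared_mutex> lock(dim.regionsMutex);
    auto [it, inserted] = dim.regions.try_emplace(key);
    if (inserted) it->second = std::make_shared<Region>();
    assert(it->second != nullptr);
    return it->second;
}
```

Handing out `shared_ptr` copies means a job that is still working on a region keeps it alive even if another thread erases it from the map in the meantime.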
you already have a job pool?
yes
i need to clean it up because its hacked together but it does work
just spawn like 15 threads and use a mutex and cv on a queue to wait when its empty
and otherwise they just pop off the next job and execute it
again NIH syndrome 
I dont wanna look at this however this is needlessly complex
i prefer to make my own thing over having to learn how something else works
let me show you my implementation
again i need to clean it up
i need to change the cv to a counting semaphore, it provides a less confusing interface imo
discord moment
template<class T>
class Receiver
{
public:
Receiver() = delete;
Receiver(Receiver& other) = default;
Receiver(Receiver&& other) = default;
Receiver& operator=(Receiver& other) = default;
Receiver& operator=(Receiver&& other) = default;
[[nodiscard]] auto receive() const noexcept
-> std::optional<T>
{
std::optional<T> output = std::nullopt;
this->mutex_ptr->lock([&output](std::deque<T>& value_safe)
{
if (!value_safe.empty())
{
output = std::move(value_safe.back());
value_safe.pop_back();
}
});
return output;
}
private:
template<class J>
friend std::pair<Sender<J>, Receiver<J>> create() noexcept;
Receiver(std::shared_ptr<Mutex<std::deque<T>>> ptr) noexcept
: mutex_ptr(ptr) {}
std::shared_ptr<Mutex<std::deque<T>>> mutex_ptr;
}; // class Receiver<T>
template<class T>
std::pair<Sender<T>, Receiver<T>> create() noexcept
{
auto mutex_ptr = std::make_shared<Mutex<std::deque<T>>>();
return std::make_pair<Sender<T>, Receiver<T>>(
Sender {mutex_ptr},
Receiver {mutex_ptr}
);
}
} // namespace mpsc
template<class T>
class Sender;
template<class T>
class Receiver;
template<class T>
std::pair<Sender<T>, Receiver<T>> create() noexcept;
template<class T>
class Sender
{
public:
Sender() = delete;
Sender(Sender& other) = default;
Sender(Sender&& other) = default;
Sender& operator=(Sender& other) = default;
Sender& operator=(Sender&& other) = default;
void send(T&& valueToSend) const noexcept
{
this->mutex_ptr->lock([&valueToSend](std::deque<T>& lockedQueue)
{
lockedQueue.push_front(std::move(valueToSend));
});
}
private:
template<class J>
friend std::pair<Sender<J>, Receiver<J>> create() noexcept;
Sender(std::shared_ptr<Mutex<std::deque<T>>> ptr) noexcept
: mutex_ptr(ptr) {}
std::shared_ptr<Mutex<std::deque<T>>> mutex_ptr;
}; // class Sender<T>
I know NIH syndrome but look at this implementation to get ideas
this is just a channel
Not Invented Here syndrome
i personally prefer to make my own systems so i know exactly how they work, rather than learn a system i didnt make
class LoggerSingleton
{
public:
~LoggerSingleton() = default;
LoggerSingleton(LoggerSingleton& ) = delete;
LoggerSingleton& operator=(LoggerSingleton& ) = delete;
static void sendMessage(Message&& message) noexcept
{
static LoggerSingleton logger {};
logger.thread_sender->send(std::forward<Message>(message));
}
private:
LoggerSingleton() noexcept
{
auto [threadSender, threadReceiver] = seb::mpsc::create<Message>();
this->thread_sender = std::move(threadSender);
this->worker_thread = std::jthread(
[receiver = std::move(threadReceiver)](std::stop_token token)
{
// Main loop
while (!token.stop_requested()) {
std::optional<Message> val = receiver.receive();
if (val.has_value())
{
std::cout << static_cast<std::string>(val.value());
}
else
{
// TODO: refactor to use std::condition_variable
std::this_thread::yield();
}
}
// Cleanup Loop
for (;;) {
std::optional<Message> val = receiver.receive();
if (val.has_value())
{
std::cout << static_cast<std::string>(val.value());
}
else
{
break;
}
}
}
);
}
std::jthread worker_thread;
std::optional<seb::mpsc::Sender<Message>> thread_sender;
};
look at this simple use
better to do this than blackbox
but im too Idiot to write my own things as complicated as this
(not that complicated maybe idk)

this is just a simple asynchronous logger
however it's obvious whats being done and how this could be easily extended into a proper thread pool
say I wanted 3 printing threads
it's not that bad
take a look at this
get some ideas
i gotta go
i think using a system of shared and unique locks may do what i need 
no wait 
yeah the system of chained semaphores is probably what i need to ensure all jobs are ran before a region is unloaded for example 
im gonna go get some food and see if that fixes my blood sugar
its hard to think rn
there has to be a good way to do this 
tbh im kinda waiting for the idea to pop into my head suddenly, thats usually how i figure stuff out 
chained semaphores seems like the best way to handle this, just be sure to actually set up multiple concurrent chains rather than just having one chain bouncing its execution between threads :P

also make sure you get it working in single threaded first :P
yeah, i have a way to execute jobs on the main thread without spawning workers at all
im gonna take a break from trying to figure this out for now, depression got hands
Hi
What are the optimisations you are currently using
Frustum culling is really important i hope you know
Thats currently planned
So far limited block face culling is really the only optimization I've done
The rest is just trying to design stuff to run fast and well
So its not too bad on the cpu end, but its ridiculously heavy on the gpu end
However I know why and how to fix it
I'm currently stuck on making a proper world structure for world streaming, then I'm going to move onto proper full block face culling and then view frustum culling
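For context, block face culling within a single chunk can be sketched as below: a face is only emitted when the neighboring cell is not solid. `ChunkBlocks` and `countVisibleFaces` are illustrative names, block id 0 is assumed to be air, and the block layout (x fastest, then z, then y) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ChunkBlocks {
    int width, height, length;
    std::vector<std::uint32_t> blocks;   // 0 = air (assumed)

    // Anything outside this chunk is treated as air in this sketch.
    bool isSolid(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 ||
            x >= width || y >= height || z >= length) {
            return false;
        }
        return blocks[static_cast<std::size_t>(
                   x + width * (z + length * y))] != 0;
    }
};

// Counts the faces that would be turned into geometry.
int countVisibleFaces(const ChunkBlocks &c) {
    assert(c.blocks.size() ==
           static_cast<std::size_t>(c.width) * c.height * c.length);
    static const int dirs[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                                   {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    int faces = 0;
    for (int y = 0; y < c.height; ++y) {
        for (int z = 0; z < c.length; ++z) {
            for (int x = 0; x < c.width; ++x) {
                if (!c.isSolid(x, y, z)) continue;
                for (const auto &d : dirs) {
                    if (!c.isSolid(x + d[0], y + d[1], z + d[2])) ++faces;
                }
            }
        }
    }
    return faces;
}
```

Treating everything outside the chunk as air is the "limited" version: faces on chunk borders are always emitted. Full culling would have `isSolid` look into the neighboring chunk instead, which is where the world structure comes in.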
Vertex buffer pooling is also something I should probably do
yeah vertex buffer pooling would help speed a decent amount, seems the main thread is spending lots of time loading geometry onto the gpu, probably because it needs to keep freeing and allocating buffers each time it loads a chunk
Both of those optimizations will become necessary very quickly 
Yes quite
also im quite interested to see its performance on devices like a pi4 and the atom n470 machine i have, and those are the only rendering optimizations i can really make
Okay, instead of sitting around waiting for me to figure shit out, I guess I could try and clean it up so its presentable and see what you all think of it so far
Tbh I'm not sure how I want to license it tho 
MIT all the way everytime.
Basically anyone can use it for anything provided they give credit.
Not sure, depends on what rights you want to reserve
Id go with MIT for smaller things 
With this i don't want it to be used for commercial purposes, unless I get something for that 
Personally I don't think I'd release any paid for software using this, but if someone is gonna try and make money off of it I'd like something 
I think the libraries I'm using are compatible with that
I need a license that is for FOSS but allows me to dual license for commercial purposes if I ever need to, which realistically I won't need to 
I could just release it initially with no license so I don't have to make an actual decision yet 
Make it FOSS but you have to request source code lmao
Can’t just view it
It must be mailed to you as a binder of paper

okay, the 8-9fps for 5 chunk render distance on a sandy bridge GT1 igp was not what i thought it was
that was with no culling at all 
Lmao
If I render five chunks I want to render it all
No culling

I paid for five chunks I’ll get five chunks

Fuck :((( goddammit
