#Logic gate simulator
I would also be concerned about cache efficiency with a large state space
Maybe something like a small, flat truth table that's indexed simply by treating the input bitset as an index could scale better. Like encoding truthiness for blocks of 8 inputs. Then you have instant lookup, and it's much more cache friendly.
Would have to think about how you'd go about compiling that though
i kinda dont want to just choose some random numbers tho
8 seems arbitrary
Well, 8 bits have 256 possible combinations. Which is easy to encode
Next full step up would be 16, 65536 is a lot less manageable
Could do something in-between. It would just be about finding a balance between reducing number of lookups, and keeping the state space down
Only way to know is to profile though
maybe i could just use a Vec and then index it based on what bits are set
Yea that's what I meant. Although you could even just use a packed Bitset
That would decrease your truth table size by 8x
Does Vec<bool> in rust have a bitset impl?
like
index = 0
power = 1
for bit in bits {
if bit then index += power
power *= 2
}
output = outputs[index]
not afaik
thats why i got FixedBitSet
would this be good?
Assuming rust optimizes it away, should be fine.
It should just be unrolled and a set of shift and mask operations
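A minimal sketch of that idea in Rust, assuming a single-output table packed one bit per row into `u64` words. `index_from_bits` and `PackedTable` are made-up names for illustration, not from FixedBitSet or any other crate.

```rust
/// Packs input bits into a table index: bit `i` of the result is `bits[i]`.
/// Same result as the `index += power; power *= 2` loop, written with shifts.
fn index_from_bits(bits: &[bool]) -> usize {
    bits.iter()
        .enumerate()
        .fold(0, |index, (i, &bit)| index | ((bit as usize) << i))
}

/// Hypothetical single-output truth table, one bit per input combination.
struct PackedTable {
    words: Vec<u64>,
}

impl PackedTable {
    /// Builds the packed table from one `bool` per row (row = input index).
    fn from_rows(rows: &[bool]) -> Self {
        let mut words = vec![0u64; (rows.len() + 63) / 64];
        for (row, &value) in rows.iter().enumerate() {
            if value {
                words[row / 64] |= 1u64 << (row % 64);
            }
        }
        PackedTable { words }
    }

    /// Looks up the output for the given inputs with one shift and one mask.
    fn lookup(&self, bits: &[bool]) -> bool {
        let index = index_from_bits(bits);
        (self.words[index / 64] >> (index % 64)) & 1 == 1
    }
}
```

For an XOR table built from `[false, true, true, false]`, the inputs `[true, false]` map to index 1. Whether the compiler actually unrolls the fold is something only a profiler or godbolt can confirm.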
i could also have 1 array for the outputs and just take a slice of it
like
```rs
let (start, end) = outputs[index];
let output = &output_array[start..end];
```
or smth
but idk how much better that would be
Yea, I'm not sure. It's hard to guess about perf, especially with these kind of algorithms.
Keep throwing stuff at the wall and see what sticks
i kinda need something working first before i can profile it 
True, maybe just get what you've already got in a state where you can profile it. Rather than going back and forwards
Once you've got numbers, you can tell whether you're going in the right direction
Can't you make a trait for every predefined component?
you mean enum variant?
yes
I was thinking about a component trait, but enum variant works too
well make usage of gta xd
wouldnt a trait require heap allocation?
if i was going to store a list of them
well i mean a vec is heap allocation
but at least they arent all separately allocated
storing a Vec<Box<dyn SomeTrait>> would probably be slow
Yeah
cause more indirection
Well you can use both IG. Store them in an enum and use trait to define shared behaviour
meh
i dont mind using match
i have done it before and it ended up with a lot of boilerplate inside the enum for just dispatching the function to all the variants
unless you implement Deref<Target = dyn Trait>
never thought of that
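A rough sketch of the enum-plus-`Deref` idea, assuming a made-up `Gate` trait with two toy gates. The values live inline in the enum (no `Box`), and the only `match` is the one inside `deref`.

```rust
use std::ops::Deref;

/// Hypothetical shared behaviour for every component.
trait Gate {
    fn eval(&self, inputs: &[bool]) -> bool;
}

struct AndGate;
struct NotGate;

impl Gate for AndGate {
    fn eval(&self, inputs: &[bool]) -> bool {
        inputs.iter().all(|&bit| bit)
    }
}

impl Gate for NotGate {
    fn eval(&self, inputs: &[bool]) -> bool {
        !inputs[0]
    }
}

/// Components are stored by value, so a `Vec<Component>` is one allocation.
enum Component {
    And(AndGate),
    Not(NotGate),
}

impl Deref for Component {
    type Target = dyn Gate;

    /// The single place that dispatches on the variant.
    fn deref(&self) -> &Self::Target {
        match self {
            Component::And(gate) => gate,
            Component::Not(gate) => gate,
        }
    }
}
```

With this, `Component::And(AndGate).eval(&[true, true])` resolves through auto-deref. Calls still go through a vtable, so whether it beats a plain `match` per method is a question for profiling.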
Just thinking about using 8 bit truth tables as gates. To fix my lambda issue when moving to an AOT language...
Minimal branching logic required, certainly no more guarded dynamic dispatch. Just lots of index and mask operations.
Still lose the ability to express arbitrarily complex logic though 
i kinda want to make my own bitset at this point
this is just ugly
use logic_sim::*;

fn main() {
    let not_truth_table = HashMap::from([
        {
            let mut input = FixedBitSet::with_capacity(1);
            input.set(0, false);
            let mut result = FixedBitSet::with_capacity(1);
            result.set(0, true);
            (input, result)
        },
        {
            let mut input = FixedBitSet::with_capacity(1);
            input.set(0, true);
            let mut result = FixedBitSet::with_capacity(1);
            result.set(0, false);
            (input, result)
        },
    ]);
    let mut scene = Group::new();
    let delay = scene.add_delay();
    let not = scene.add_truth_table(1, 1, not_truth_table);
}
Do it
i found another lib which is at least a little more sane
let not_truth_table = HashMap::from([
([false].iter().collect(), [true].iter().collect()),
([true].iter().collect(), [false].iter().collect()),
]);
but not by much
You should not really use a hashmap tho...
You should really just use a single large bitset
And index based on what the inputs are
i did have this idea
but im just trying to get something working for now
is it just me or is discord lagging?
So for a 2 input 2 output node giving the input 0b10 would result in the bit index being 2
Yeah it lagged a second
yea that was my idea 
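One way the "single large bitset" could look for multi-output nodes. This is only a sketch: `FlatTable` is a made-up type, a `Vec<bool>` stands in for a packed bitset, and it assumes each row occupies `output_count` consecutive bits, so the input value picks the row and the row's first bit sits at `input * output_count`.

```rust
/// Hypothetical flat truth table: row `r` stores its outputs in bits
/// `r * output_count .. (r + 1) * output_count` of one bit vector.
struct FlatTable {
    output_count: usize,
    bits: Vec<bool>, // a packed bitset would be indexed the same way
}

impl FlatTable {
    /// Allocates `2^input_count` rows, all outputs initially false.
    fn new(input_count: usize, output_count: usize) -> Self {
        FlatTable {
            output_count,
            bits: vec![false; (1usize << input_count) * output_count],
        }
    }

    /// Stores the outputs for one input combination.
    /// Panics if `outputs.len()` differs from `output_count`.
    fn set_row(&mut self, input: usize, outputs: &[bool]) {
        let start = input * self.output_count;
        self.bits[start..start + self.output_count].copy_from_slice(outputs);
    }

    /// Returns the outputs for one input combination as a slice.
    fn row(&self, input: usize) -> &[bool] {
        let start = input * self.output_count;
        &self.bits[start..start + self.output_count]
    }
}
```

For a 2-input, 2-output half adder (sum, carry), `set_row(0b11, &[false, true])` fills row 3 and `row(0b11)` reads it back without any hashing.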
So turns out my UTF converter is actually rather fast xD
And the funny part is I use the worst function ever 
constexpr std::uint8_t UTF8Length(char8_t c)
{
    if (c <= 0x7F)
        return 1;
    else if (c <= 0xBF)
        return 0;
    else if (c <= 0xDF)
        return 2;
    else if (c <= 0xEF)
        return 3;
    else
        return 4;
}
That only calculates the length of one codepoint
But how is that faster than using intrinsics

you would probably have to look at what the intrinsics are doing
godbolt time
Like this runs under 20ms for 5 paragraphs of lorem ipsum 10'000 times.
Whilst using the intrinsic takes 178ms xD
It's even worse with clang 
at 201ms
Also the function that was given by someone is actually not working properly 
Yeah all the different intrinsics I try seem to be slower xD
Well I haven't tried to do the conversion with SIMD but oh well xD
You should check out Bob Steagall's UTF8 conversion video if you haven't already. https://www.youtube.com/watch?v=h5oczBeib_M
And this https://github.com/simdjson/simdjson has an absurdly fast UTF-8 parser in it as well. The whole project is kind of just insane SIMD magic
Prototyped a tick-less 8 bit truth table based engine, using a fixed circular buffer (backed with a flat-set to prevent duplicate entries) as an update queue. Supports any gate structure that takes up to 8 inputs and produces up to 8 outputs.
Just need to build a converter from my old engine's circuits so I can run the same stress-test on it.
MSVC made a meal of the assembly though, so probably gonna require some finagling to improve codegen.
Not sure yet how it will stack up to the JIT based version as it is, but it opens the door to SIMD via an SSE lut operation.
Testing on some simple circuits and getting increases in gate updates from ~53M/s to ~145M/s
(~2.73x). Prior to any code-gen optimization as well. Need to profile some more complex circuits to better see how well it handles branching.
And a neat side-effect from dropping the global "tick" is that it naturally resolves (some) race conditions and stabilizes any weird global cycles.
Maybe i got distracted a bit xd
Well time to add some logic
I guess I will hardcode a few components but let the user select and create their own, which are loaded at the start or smth like that
But well later
First hardcoded
imo being able to update gates is probably just as important as the logic simulation. 'cause without it it'll be so annoying to build any circuits that you won't want to 
First thing I did was get a sandbox that I could just draw circuits in. No logic whatsoever.
One thing that is very useful to have in the editor is copy and paste, the sooner the better (for your sanity) 
Well I need some kind of selection
I will have a vec of components
So I probably need to go through the whole vec after the selection ended and check if they are in the selected area
I'm just using a HashMap<Vec3, WorldObject> currently.
When I rewrite, I'm planning to change how I represent connections so I'll probably switch to using a BVH or some other acceleration index
Well I will have a vec of components
And I want to store the index to the other components for the connections
but hmhm
What if I remove smth then everything is messed up
I could change it to a map and then recycle the key if the component is deleted
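A small sketch of the "recycle the key" idea, assuming plain `usize` keys. `Slots` is a made-up type: removing a component leaves a hole instead of shifting everything, and the next insert reuses that hole.

```rust
/// Hypothetical slot storage: indices stay valid when other slots are
/// removed, and freed indices are reused by later insertions.
struct Slots<T> {
    items: Vec<Option<T>>,
    free: Vec<usize>,
}

impl<T> Slots<T> {
    fn new() -> Self {
        Slots { items: Vec::new(), free: Vec::new() }
    }

    /// Stores a value and returns its key, reusing a freed slot if one exists.
    fn insert(&mut self, value: T) -> usize {
        match self.free.pop() {
            Some(index) => {
                self.items[index] = Some(value);
                index
            }
            None => {
                self.items.push(Some(value));
                self.items.len() - 1
            }
        }
    }

    /// Empties a slot and remembers it for reuse; other keys are unaffected.
    fn remove(&mut self, index: usize) -> Option<T> {
        let removed = self.items.get_mut(index)?.take();
        if removed.is_some() {
            self.free.push(index);
        }
        removed
    }

    /// Returns the value for a key, or `None` if the slot is empty.
    fn get(&self, index: usize) -> Option<&T> {
        self.items.get(index)?.as_ref()
    }
}
```

One caveat: a connection that still holds a recycled key will silently point at the new component. Pairing each key with a generation counter is the usual fix, at the cost of a slightly bigger key.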
I just send connection events to neighbouring objects on placing / removing
Then there's a whole load of code in the objects themselves to fix up connections. It's a bit of nightmare
Gonna change it so that the objects/connections have no knowledge of each other, just a position and channel
Well I will make some kind of prototype and then refactor it later
Yea, don't try and be too clever about it right now, just iterate over every object every time. Worry about indexes and efficiency when you actually have enough components to cause issues
To be fair, that's more valid when you have technical debt in the first place. You have an excuse when you've got basically a blank slate 
Added some buttons to select the current gate to place
Looking good
I will probably replace them later with smth else. I just want to make it working now xd
Well I think I will try to make it grid based
Okay I changed it to a dropdown
Now time to implement the logic
The fun bit
Are you using raylib ?
Yes
ive almost gotten to 3d rendering in my custom engine that ive been working on 
i just need rotation matrices and perspective projection
No I mean the grid editor that u guys are working on
This
yea ill make a grid in it
just procrastinating making an actual logic sim because the logic sim that im trying to make keeps breaking 
I built my world representation to support 3D from the start. But just haven't built a 3D editor. The rendering isn't hard, but figuring out exactly how the interactions should work is.
the actual logic simulator should be identical for 2d or 3d
its just the rendering that has to be changed
Oh, btw I was experimenting with 8 bit luts and that seems to be a sweet spot.
Nice thing about 8 bits, is that you can use AVX2 gather to process 8 luts in one go if you wanna go SIMD.
hey asshole
i did it
it has
real time
logic gate
simulation
mostly accurate, but does not include power / ground

this includes a full logic description language
so, it has a parser
here's a d latch
the first (...) is input pins
the second one is output pins
the third one is internal pins
pins can have sizes up to 64 bits!
it includes about 10 builtin gate types: and nand or nor xor xnor not buf clock RAM
it indeed has a RAM controller module
which is also simulated
my next goal is to add primitive gates controlled by lua, so you can make wacky shit
i wouldnt call it a programming language
hardware description language, maybe
i didnt make it specifically for this challenge obv
i made it to simulate my 64 bit CPU
it has very little debugging capability, so you have to look at a large list of 1s and 0s and figure shit out from there
what is the '1 thing when declaring pins?
it's the size of the pin
'1 is 1 bit
'5 is 5 bits etc.
during compilation, that size information is lost and everything is converted to 1 bit wires instead
so how do multi-bit pins work with the other gates
does it just copy-paste the component + connections for every bit
no
basically theres a parsing phase and a compilation phase
say you have a nand gate made of 2 gates: and + not
module nand
('1 a '1 b)
('1 c)
('1 anb);
and (a b) (anb);
not (anb) (c);
end;
this is stored as a struct Module which has 2 gates (and, not)
this is the same
for all ICs you make
but the gates
are not copies
they're just a pointer to the module + the input / output pins
during the compilation phase
each gate is "synthesized" with that module and input output pins
what i mean is, when you have 2 5-bit pins and then you do AND (or some other random operation) on them, does it just copy that gate for every bit
oooh
you cant do and on more than 2 bits
you have to make an IC for it
ok i think i got it wrong again
sorry, can you draw a diagram or something
my head is fucked
ive been working on this for 8 hours
it's nearly 1 AM
module big_and
('5 a '5 b)
('5 output);
and (a b) (output);
end;
i think i have the right syntax
mostly yes
as i said you cant have that
module big_and
('5 a '5 b)
('5 output)
();
and (a[0] b[0]) (output[0]);
and (a[1] b[1]) (output[1]);
and (a[2] b[2]) (output[2]);
and (a[3] b[3]) (output[3]);
and (a[4] b[4]) (output[4]);
end;
this is what you have to do
i would add variable sized primitive gates
but they get clusterfucked too quickly
and break everything
- they're inconsistent
sooo
so making pins bigger than 1 bit is just for grouping? it doesnt seem to have any real functionality
also why dont you have to do a[0] when its 1 bit
just doing a would mean the entire "group"
so if you have a 5 bit a
then a means a[0:4]
oh yes you can also have that
you can divide and concat groups
like, a[0:3]..b[3:4]..c[1] is a 7 bit wire
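To make the arithmetic concrete: `a[0:3]` is 4 wires, `b[3:4]` is 2 and `c[1]` is 1, so 7 in total. A sketch of how such a concatenation might be flattened into 1-bit wires, assuming inclusive ranges and left-to-right ordering (both guesses about the language, and `WireSlice` / `flatten` are made-up names):

```rust
/// One piece of a concatenation such as `a[0:3]`: a pin name plus an
/// inclusive bit range. `c[1]` is written as first = last = 1.
struct WireSlice<'a> {
    pin: &'a str,
    first: u32,
    last: u32,
}

/// Flattens a concatenation into 1-bit wires, identified as (pin, bit) pairs.
fn flatten<'a>(parts: &[WireSlice<'a>]) -> Vec<(&'a str, u32)> {
    parts
        .iter()
        .flat_map(|part| (part.first..=part.last).map(move |bit| (part.pin, bit)))
        .collect()
}
```

Flattening the three slices from the example gives 7 entries, starting at `("a", 0)` and ending at `("c", 1)`.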
i dont think you get how important even basic grouping is
yea i dont see how useful it is if it doesnt allow you to just do operations on N bits at a time
you can do operations on N bits at a time
just not on primitive gates
see the example here
means low signal * means high signal
think of them like constant wires
outputting to them doesnt change their values
@merry oar as for things like this
i'll add a basic macro system
that can do these type of parallel gates
ah ok
maybe not, depends on how useful i find it
i mean you are making a 64 bit cpu, so there will be a lot of gates to place
yeah but all ICs will be scaled accordingly
i wont be using 64 d flip flops, i will be using a Reg64
this might prove to be a problem with tristate buffers
but i dont have high impedance state rn
so theyll just be AND + OR combination
i added them
$identifier{start:end} text
basically it's like summation notation in math
or a basic for loop
for example $idx{1:6} idx means idx goes from 1 to 6 and the text is copied that many times with idx replaced with the number
so $idx{1:6} idx would translate to 1 2 3 4 5 6
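A guess at how such an expansion could be implemented, as a sketch only. It assumes the range is inclusive (as in the `1 2 3 4 5 6` example), that copies are joined with a space, and that only whole-word occurrences of the identifier are replaced. Arithmetic like `idx-1` is not handled here, and `expand` is a made-up function name.

```rust
/// Expands `text` once per value in `start..=end`, replacing every
/// whole-word occurrence of `ident` with the value, joined by spaces.
fn expand(ident: &str, start: i64, end: i64, text: &str) -> String {
    (start..=end)
        .map(|value| substitute(ident, value, text))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Replaces whole-word occurrences of `ident` in `text` with `value`.
fn substitute(ident: &str, value: i64, text: &str) -> String {
    let mut out = String::new();
    let mut word = String::new();
    for ch in text.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            word.push(ch); // still inside an identifier-like word
        } else {
            push_word(&mut out, &word, ident, value);
            word.clear();
            out.push(ch); // punctuation and spaces are copied as-is
        }
    }
    push_word(&mut out, &word, ident, value);
    out
}

/// Appends either the word itself or, if it equals `ident`, the value.
fn push_word(out: &mut String, word: &str, ident: &str, value: i64) {
    if word == ident {
        out.push_str(&value.to_string());
    } else {
        out.push_str(word);
    }
}
```

With this, `expand("i", 0, 4, "and (a[i] b[i]) (output[i]);")` would generate the five `and` lines from the `big_and` module in one go.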

nice, cool that you can also do math on it like idx-1
yeah
no multiplication / division though
should i add that?
You could, depends what level of abstraction you're going for
I kept everything to 8 bit luts as a limit
To keep everything suitably low level
this is constant arithmetic for selecting wires
Oh, in a circuit description language
yes
technically wires are always 1 bit wide but in the language you can have up to 64 bits
so wire[4:7] would be converted to a wire list of 4,5,6,7
May as well add multiplication/division, if it's just for the language
obv
i wouldnt add any behavioural shit to the circuit stuff
everything's logic gates and wires
i have primitive gates that every IC is compiled down to, though
I just saw this comment from earlier and assumed from that you might do some more high-level stuff
basically every IC you make is truncated down to "primitive" gates that are builtin
so, AND, NAND, NOT, OR, NOR, XOR, XNOR, CLOCK, RAM
RAM and CLOCK were needed because they're impossible to make (i dont have resistors / capacitors, damn)
what does IC stand for? integrated circuit?
you could use a xor gate hooked into itself
then the second input would toggle it
I have memory implemented as a lut with feedback
well, you see, gates have delays
when you have 50 gates serially connected, that could take 100 femtoseconds to settle into a stable state
when you have an XOR gate hooked into itself that you call a "clock", that will be faster than any time a circuit would take to settle into a state
You can do latches without oscillation, but you need them to stabilize in a single tick
So I also had to implement latches into a single LUT, as opposed to creating them from multiple gates
you could maybe have more delays in between for a clock 🤷, but i was meaning for the ram
They work as multiple gates, but it can result in oscillations
oh
for RAM, it is just a controller that has behavior in C
My clock inputs are all handled separately from gate logic. There is no "gate" for a clock
nice
how do you handle truth tables for things like flip flops though
because there is Q and Qnext
the moment you have wires that go backwards in the circuit, it's hell
The outputs just loop back to the inputs on the gate
The gate won't oscillate because the change in state only happens over one tick
The way I simulate gates is that there is no global "tick". Instead it's just a circular buffer of gates to update. When a gate input is changed it gets added to the tail of circular buffer. I use a O(1) lookup/update (just a bitset of all gate indices in the circuit) set to avoid double updating a gate
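A stripped-down sketch of that kind of tick-less loop, not the actual engine. It assumes 8-bit LUT gates and uses `VecDeque` (a growable ring buffer) in place of a fixed circular buffer, with a `Vec<bool>` as the "already queued" set. All names here are made up.

```rust
use std::collections::VecDeque;

/// Gate with up to 8 inputs and 8 outputs, evaluated through a 256-entry LUT.
struct LutGate {
    lut: [u8; 256],
    inputs: Vec<usize>,  // wire indices; input i becomes bit i of the LUT index
    outputs: Vec<usize>, // wire indices; output i is bit i of the LUT entry
}

struct Circuit {
    wires: Vec<bool>,
    gates: Vec<LutGate>,
    readers: Vec<Vec<usize>>, // for each wire, the gates that read it
    queue: VecDeque<usize>,   // pending gate updates, oldest first
    queued: Vec<bool>,        // one flag per gate: is it already in the queue?
}

impl Circuit {
    fn new(wire_count: usize, gates: Vec<LutGate>) -> Self {
        let mut readers = vec![Vec::new(); wire_count];
        for (g, gate) in gates.iter().enumerate() {
            for &wire in &gate.inputs {
                readers[wire].push(g);
            }
        }
        Circuit {
            wires: vec![false; wire_count],
            queued: vec![false; gates.len()],
            gates,
            readers,
            queue: VecDeque::new(),
        }
    }

    /// Adds a gate to the tail of the queue unless it is already waiting.
    fn enqueue(&mut self, gate: usize) {
        if !self.queued[gate] {
            self.queued[gate] = true;
            self.queue.push_back(gate);
        }
    }

    /// Changes a wire and schedules every gate that reads it.
    fn set_wire(&mut self, wire: usize, value: bool) {
        if self.wires[wire] == value {
            return;
        }
        self.wires[wire] = value;
        for i in 0..self.readers[wire].len() {
            let gate = self.readers[wire][i];
            self.enqueue(gate);
        }
    }

    /// Runs queued updates until the queue drains or `max_updates` is reached.
    /// Returns the number of gate updates performed.
    fn run(&mut self, max_updates: usize) -> usize {
        let mut updates = 0;
        while updates < max_updates {
            let g = match self.queue.pop_front() {
                Some(g) => g,
                None => break,
            };
            self.queued[g] = false;
            let mut index = 0usize;
            for (bit, &wire) in self.gates[g].inputs.iter().enumerate() {
                index |= (self.wires[wire] as usize) << bit;
            }
            let result = self.gates[g].lut[index];
            for bit in 0..self.gates[g].outputs.len() {
                let wire = self.gates[g].outputs[bit];
                self.set_wire(wire, (result >> bit) & 1 == 1);
            }
            updates += 1;
        }
        updates
    }
}

/// 1-input NOT gate as a LUT: index 0 -> 1, index 1 -> 0.
fn not_gate(input: usize, output: usize) -> LutGate {
    let mut lut = [0u8; 256];
    lut[0] = 1;
    LutGate { lut, inputs: vec![input], outputs: vec![output] }
}
```

The `max_updates` cap is an addition for safety: a gate that feeds itself and keeps flipping would otherwise never let the queue drain.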
i see
i also have that
but not for double updating
im talking about the bitset
i have a global tick
i have 3 wire state arrays
1 for the current wire states
1 for the next tick's wire states
1 for the wire states that are set by the gates (this is for debugging, so a wire isnt set by two sources)
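Roughly what a tick with those three arrays might look like. This is a sketch under assumptions: only 2-input NAND gates, no delays, and made-up names (`DebugSim`, `NandGate`). The `driven` array is the debugging one: it flags a wire that two gates try to set in the same tick.

```rust
/// A 2-input NAND gate described only by wire indices.
struct NandGate {
    a: usize,
    b: usize,
    out: usize,
}

struct DebugSim {
    current: Vec<bool>, // wire states gates read from this tick
    next: Vec<bool>,    // wire states gates write to this tick
    driven: Vec<bool>,  // wires already written this tick (debug check)
    gates: Vec<NandGate>,
}

impl DebugSim {
    fn new(wire_count: usize, gates: Vec<NandGate>) -> Self {
        DebugSim {
            current: vec![false; wire_count],
            next: vec![false; wire_count],
            driven: vec![false; wire_count],
            gates,
        }
    }

    /// Advances one tick. Returns `Err(wire)` if two gates drive the same
    /// wire; in that case `current` is left unchanged.
    fn tick(&mut self) -> Result<(), usize> {
        self.next.copy_from_slice(&self.current); // undriven wires keep state
        self.driven.fill(false);
        for gate in &self.gates {
            if self.driven[gate.out] {
                return Err(gate.out);
            }
            self.driven[gate.out] = true;
            self.next[gate.out] = !(self.current[gate.a] && self.current[gate.b]);
        }
        std::mem::swap(&mut self.current, &mut self.next);
        Ok(())
    }
}
```

Because every gate reads `current` and writes `next`, the order of the loop does not change the result.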
each gate has a natural delay and a randomized delay on top of that
except for the clock
which always ticks at a perfect rate
Oh so you have a non-deterministic simulation. I guess you can run that in parallel then?
I just use fixed timesteps
Means it only really works on a single core 
i dont think i could run it in parallel
emoji fail 
discord switched my emojis around 
gates having randomized delays does mean you get different values every time you restart the simulation
say you have a basic D latch made with 2 NORs
that could start in any state, depends on how it settles
well, there is no reason why i shouldnt be able to
Yea since you don't have to worry about strict deterministic execution, which causes all sorts of problems for me xD
Basically an eventually correct system
Which is neat
my goal is to simulate hardware as correctly as possible
but i dont have high/low impedance wire states
nor basic circuitry elements (voltage, power, current)
i do have what is needed to make a 64 bit CPU from scratch
it would be great if it wasnt piss slow, but theres not much i can do about that
haha, well you do have delays
I just focused on getting as small a simulation loop as possible
So it runs about ~150M gate updates per second, where each gate can do whatever you can fit into an 8-bit lut
But basically no room for ever making it parallel
i have about 3-4 thousand gate updates per second
how are you handling the delays?
yes
technically it can only be off by 1
which is close enough to hardware
still way too much
but i cant really have fractions
i pretty much ran into the same problems as real hardware
had to ask some engineers to know how to proceed on some stuff
Yea, hard to balance perf and precision with your delays
well, with delays in general
yeah
im gonna need to calculate the amount of serially connected gates
and i have no idea how
if i dont, i cant calibrate the clock with precision
this is also a real life roadblock
Yea I can imagine 
your simulator sounds useful for debugging though
the way i debug circuits in my simulation is by looking at an array of 0s and 1s, then deducing if they're correct or not
Well, I do have the advantage of a GUI and visual circuits
yeah, lol
But.. that is also its own limitation
can take a lot of work to actually build a complex circuit when you've literally got to make it block by block xD
I did at least make copy and paste though 
lol
The roadblock I have is currently I'm limited by the connections I can express on a purely 2D grid
It supports 3D, but I don't have the UI to show that
so large circuits are largely infeasible currently
i see
i ran into a similar roadblock when using sebastian lague's DLS
- it wasnt accurate
- you couldnt fit more than 16-19 pins
one of the reasons why i made my own sim
lol
Fuck these emojis being reordered
i want to make a 64 bit CPU completely from logic gates and wires, which is a daunting task, but not impossible
i would use verilog but it was too hard, so i did this
this is actually harder than verilog but whatever 
I did that in Minecraft, so my original motivation was to build a sim that I could effectively rebuild it in, and get orders of magnitude more performance
But.. I never actually moved into 3D and then it's kinda sat there for near enough a year now
im dedicating my life to this project
no way in hell it sits in the dust, lol
i even have a compiler set up for the CPU i will be building
Haha, fair
I kinda stopped to work on my renderer and engine, with the idea that I would go back and rebuild the sim in it
Which, is still on the table. But it's not ready yet
Path traced logic now I guess 
i stopped backend development of my compiler to get this stuff ready
i cant make a backend for an architecture that doesnt exist
soo
truue
i'll start off by making a few RISC 16/32 bit cpus though
i need to get the experience + get used to my own software
one of the roadblocks im currently facing is the fact that there is no source material for this stuff
because nobody programs in logic gates
they use HDLs
with behavioral stuff
self taught logic is the best kind
that and piecing together circuits from others breaking them down and rebuilding them better (or just to fit a different architecture)
I guess I've never been a "formal" learner. I just try things and see what works
ive also been doing that a lot
but research also helps
sometimes you cant just reinvent the wheel
it takes a lot of time
yes
Gives you an interesting perspective on why people actually make the wheel a different way
you go through all the steps other people went through
and then you realize
"damn, this is why they made the seemingly stupid decision thats actually a masterpiece"

my goal is to create a completely self hosted ecosystem of software
with no 3rd party software involved
ill even get it running on silicon if i have enough money for that
it is a huge goal
ive started working on it in june 2022
this logic stuff has been the last 1.5 months
i spent time perfecting my compiler in the previous 5 months
which will be bootstrapped once i get it working
it'll probably be over by 2030 or something

Night 
@sharp spindle I'mma just poke you, when you gonna rewrite your logic simulator in C++ with AVX-512 intrinsics?
ysn't
Nice
I was going to actually re-implement some 2D quad rendering in pyrite this weekend
So, that would be a base

Vertex expansion go brr?
What do you think I am, a pleb? Just instancing

Instancing was slower tho
Did you not see my extensive testing on which was faster
Well I mean I could assume that to be just as slow as vertex pushing 6 vertices 
I'm going to be profiling several things
4 vertices, and they're implicit, generated from quad instance data
Oh btw smartest thing is to make the quads only have a position, scale and rotation and then create the matrix on the gpu...
Or possibly a single mat2x3
position, scale, rotation 
position, up magnitude, aspect 

Another poke, cuz now me not too sure how I'd reimplement the logic simulator shit to use AVX512 tbh...
You'd need to figure out what gates you can combine together into simd operations at circuit compile time. Exactly how, no idea 
Cuz to use the byte lookup, I'd have to ensure the logic gates have 8 or less inputs...
If it's 4 inputs, I could have two in a single lookup
2 inputs would be 4 in a single lookup
and 1 input would be 8 in a single lookup
Well if output count is the same
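The "two 4-input gates in a single lookup" case could be precomputed at circuit compile time, something like the sketch below. It assumes single-output gates and a made-up bit layout: low 4 index bits feed the first gate, high 4 bits feed the second, and the two results land in bits 0 and 1 of the entry.

```rust
/// Combines two 4-input, 1-output tables (16 rows each) into one 256-entry
/// byte table. Index bits 0..4 feed `first`, bits 4..8 feed `second`;
/// bit 0 of the entry is `first`'s output, bit 1 is `second`'s.
fn pack_two(first: &[bool; 16], second: &[bool; 16]) -> [u8; 256] {
    let mut packed = [0u8; 256];
    for (index, entry) in packed.iter_mut().enumerate() {
        let lo = first[index & 0xF] as u8;
        let hi = second[index >> 4] as u8;
        *entry = lo | (hi << 1);
    }
    packed
}
```

The same trick extends to four 2-input gates or eight 1-input gates per byte lookup. Deciding which gates are worth pairing is the hard part and is not addressed here.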
Yeah, different complexities of gates logic and inputs would affect what you can pack
Ait I'mma just try to do the original non AVX-512 simulation and see if I can make that work...
And try from there
Just use std::function for every gate 

Sheets can
:>
Well normal gates would just be truth tables
Sheets would be like a function
All I have so far 
Basically finished 
Perfect
But yeah I'm still kinda unsure how you'd actually use AVX-512 to do lots of gates in one go...
Because normal gates take two inputs and produce a single output
You usually have 1000000 gates
Just vectorize gate groups
Thing is, a gate with two inputs and one output, will obviously compress the output by 50%
But what if it was two inputs and three outputs, then it would decompress the output by 50%
let on_table = Rc::new([true]);
let off_table = Rc::new([false]);
let xor_table = Rc::new([
    false, // false false
    true,  // true false
    true,  // false true
    false, // true true
]);

let mut world = World::default();
let on_gate = world
    .add_component(Component::Table {
        input_count: 0,
        output_count: 1,
        truth_table: on_table,
    })
    .unwrap();
let off_gate = world
    .add_component(Component::Table {
        input_count: 0,
        output_count: 1,
        truth_table: off_table,
    })
    .unwrap();
let xor_gate = world
    .add_component(Component::Table {
        input_count: 2,
        output_count: 1,
        truth_table: xor_table,
    })
    .unwrap();

world
    .add_connection(Connection {
        a_component: on_gate,
        a_output_index: 0,
        b_component: xor_gate,
        b_input_index: 0,
    })
    .unwrap();
world
    .add_connection(Connection {
        a_component: off_gate,
        a_output_index: 0,
        b_component: xor_gate,
        b_input_index: 1,
    })
    .unwrap();

world.update();
world.update();
assert_eq!(world.get_component_state(xor_gate), &[true]);
works
there is a maximum of 64 inputs to a table
though tbh you would run out of memory to store the table well before that point
if i ever get around to doing this again, ill use compute shaders for the simulation
parallelizing logic simulation is actually easy af
think of it like this
you have 3 arrays in a basic logic sim
current wire states (bitfield)
next wire states (bitfield)
logic gates (array of structs that are input + output wire indices)
you pass the current and the next wire states as textures to the compute shader
and the logic gates of course
you'll just magically get a shit ton of performance
this is actually amazing my man
doing more optimizations on the "compilation" part of my sim, i think i could actually get a 16 bit RISC CPU running almost as fast as an actual emulator
That gets you a lot of potential throughput
but it doesn't address propagation delay
great if you have many millions of gate state changes per tick though
how do you center the dots/connection points on a component?
i cant seem to figure it out, though i might just be dumb
it does
in real hardware there is gate delay
this accurately simulates it
well, not too accurately
but still better than some half assed impls i see online
I don't mean simulating it, I mean minimizing it
oh i see
```rs
fn get_output_positions(position: Vector2, component: &Component) -> impl Iterator<Item = Vector2> {
    let mut i = 0;
    let rect = get_component_rect(position, component);
    let center = position
        + Vector2 {
            x: rect.width,
            y: rect.height,
        } * 0.5;
    let output_count = component.get_output_count();
    std::iter::from_fn(move || {
        if i >= output_count {
            return None;
        }
        let position = center
            + Vector2 {
                x: rect.width * 0.25,
                y: (i as f32 / output_count as f32 - 0.25) * rect.height,
            };
        i += 1;
        Some(position)
    })
}
```
i could just be overcomplicating it, idk
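One possible reason the dots look off: `i / output_count - 0.25` only comes out symmetric around the center when there are exactly two outputs. A sketch of an alternative spacing, written without raylib types so it stands alone (`pin_offsets` is a made-up helper): split the side into `count` equal slots and put each pin in the middle of its slot.

```rust
/// Evenly spaces `count` pins along a side of length `height`, centered on
/// `center_y`: pin `i` sits in the middle of the i-th of `count` equal slots.
fn pin_offsets(center_y: f32, height: f32, count: usize) -> Vec<f32> {
    (0..count)
        .map(|i| center_y + ((i as f32 + 0.5) / count as f32 - 0.5) * height)
        .collect()
}
```

For a side of height 100 centered at 0 this gives a single pin at 0, or two pins at -25 and 25. Whether that matches the look you want is a design call; some editors snap pins to the grid instead.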
you mean turning big components into truth tables?
My single threaded circuit simulator propagates about 200M gate updates per second
also why does raylib have holes in the circle 
So if you have a relatively sparse circuit, and you want to propagate a long sequence of gate updates, then going wide on the entire circuit and scheduling a huge number of invocations could be very suboptimal
really depends on the characteristics of the circuit though
i think we have different ideas on how a basic logic sim should work
my system runs on "ticks", and on each tick, every single gate is updated (depending on their delay, they might skip a tick. the order in which they are updated doesnt matter as 2 bitfields are used for current and next wire states)
this means if you have 20000 serially connected gates that will take 20000 ticks to fully propagate
but it also means you can parallelize this a lot
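To illustrate why the two-buffer layout parallelizes: every gate only reads the current states, so gates can be split across workers in any order. The sketch below uses CPU threads from the standard library as a stand-in for a compute shader, assumes only NAND gates with no delays, and assumes no wire has two drivers. Spawning threads every tick would be far too slow in practice, so treat it as a demonstration of the data flow only.

```rust
use std::thread;

/// A 2-input NAND gate described only by wire indices.
#[derive(Clone, Copy)]
struct Nand {
    a: usize,
    b: usize,
    out: usize,
}

/// Computes the next wire states from `current` using up to `threads` workers.
/// Each worker evaluates its own chunk of gates against the shared,
/// read-only `current` buffer; results are merged afterwards.
fn tick_parallel(current: &[bool], gates: &[Nand], threads: usize) -> Vec<bool> {
    let workers = threads.max(1);
    let chunk_len = ((gates.len() + workers - 1) / workers).max(1);
    let results: Vec<Vec<(usize, bool)>> = thread::scope(|scope| {
        // Spawn all workers first, then join, so they actually overlap.
        let handles: Vec<_> = gates
            .chunks(chunk_len)
            .map(|part| {
                scope.spawn(move || {
                    part.iter()
                        .map(|g| (g.out, !(current[g.a] && current[g.b])))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    let mut next = current.to_vec(); // undriven wires keep their state
    for (wire, value) in results.into_iter().flatten() {
        next[wire] = value;
    }
    next
}
```

The result is the same for any worker count, which is the property that makes the per-tick design a good fit for a GPU.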
Yea that's a perfectly fine way to run a logic sim
handling sparse updates (potentially only a tiny fraction of gates actually need to be updated on each tick) and keeping tick periods to a minimum is crucial there
(if you have a circuit with sparse gate updates)
I've experimented with many full-circuit-tick designs, and my previous simulator runs exactly like that
how do you calculate the information of what gates need to be updated / pass that to the compute shader?
i can think of it being a bottleneck on the CPU
yea, I'm not really sure the best way of going about that that wouldn't end up a bottleneck
since different gates have different delays you could avoid updating gates that are currently skipping a tick
I've been experimenting with a new continuous ring buffer update based design, which doesn't have a global tick, and instead simply updates gates and enqueues affected gates to the ring buffer. But that's very much a single threaded design, not really any room for parallelization there
and that one doesn't accurately simulate delay
it's just there for raw single threaded performance
it also may perform worse than a parallelized approach
again, depends on the circuit. It was specifically intended to resolve the issue of long propagations
It propagates sparse updates extremely fast, but would chug if there's lots of updates that could be parallelized across a tick
i cant think of a way to mix the two
me neither 
do you do C?
C++, but in the style of C with classes
would be interesting, but I'm mostly working on my engine at the moment
happy to bounce ideas
nice
i already demoed a system that was "logic description language -> compiler -> logic gate table -> simulation"
im thinking of doing that exact same thing but better + faster
sounds neat, there's a lot of compiler level optimizations that you could apply as well. I never explored that because I was always focused on just directly simulating fundamental gates
you have to preserve gate delay in compiler level optimizations because that might break a circuit relying on it
say you have a module (set of gates, like an ALU / register) that takes 700 ticks to settle
you have to both figure out that tick count and make a truth table
and you will encounter the halting problem in some way i think (just a feeling)
say you have a NOT gate that has input and output on the same wire
you cant make a single dimensional truth table for this
when you introduce gate delays it becomes a cluster fuck
theres literally no way to optimize fast beyond a certain point
yea, that sounds.. non trivial 
the compiler doesnt have to figure out the tick count necessarily, the user could enter it
at that point the user could just provide a truth table, lol
imagine this circuit (and say that the wire that loops around at the top is the output)
now imagine each NOT gate takes 2 ticks and each buffer takes 1 tick to propagate
(the tristate also takes 1 tick)
ooh yeah i forgot about tristates
if you include tristate logic in your sim, it needs 4 tables
current wire states bitfield, next wire states bitfield, wires set bitfield and the logic gates
plus, note that this is with fixed gate delays
the moment you try to be more accurate and add randomization to the delays, you get fucked in the ass threefold