#Performance

1 messages Β· Page 3 of 1

left night
#

If you're interested I can send it anyway

sturdy sequoia
left night
tight glade
#

Very nice read as well!

I'm wondering, is it always the best option to use span for memoization? or could we do that more subtly when needed for reasons?

I mean, what other information do we have access to at that stage? πŸ€”

glossy shore
#

Wouldn't Arc<(Hash, RestOfData)> already store the hash alongside the refcounts?

#

Call me foolish, but It's All Just One Instruction Anywayβ„’, no?

left night
left night
sturdy sequoia
#

@left night I kinda want to work on the whole "bytecode" thing, would you be open to that?

left night
sturdy sequoia
#

I don't know what that means πŸ’€

left night
#

it's obviously not completely free from existing constraints, but it involves building lots of new free-standing stuff

sturdy sequoia
#

Yes, I think I'll start with a fairly basic system that doesn't do any kind of smart checking (like validating args beforehand) to try and make it simpler, then grow the complexity (and performance) as I go along

untold turret
sturdy sequoia
untold turret
#

So there is no bytecode in typst, and you want to make one.

sturdy sequoia
#

yes

lunar kettle
sturdy sequoia
#

eval in this case refers to the module inside of the typst source πŸ˜„

untold turret
sturdy sequoia
#

@left night do you know where values are collected? Like joined into one big value for returning 🀨

#

Ah, I found it sorry for the ping

lunar kettle
# sturdy sequoia all typst code

So my understanding is correctly, currently typst code is interpreted by walking through the tree basically, and you want to do something like in Oython where it’s first compiled to Bytecode and then the bytecode gets run instead?

sturdy sequoia
lunar kettle
#

I seee

sturdy sequoia
#

The goal being to make eval blazzing fast πŸš€ and easier to understand (hopefully)

lunar kettle
#

Sounds like a huge undertaking tho haha

sturdy sequoia
#

Actually: not that much πŸ˜„

lunar kettle
#

Would you generate your own bytecode language or are there libraries for that

#

Or how exactly

sturdy sequoia
#

Because I just need to traverse the tree and generate OpCodes instead

#

And then eval the opcodes which is fairly simple

lunar kettle
#

Icic

sturdy sequoia
#

At present it can already do basic operations

lunar kettle
#

Well keep us posted πŸ’ͺ

sturdy sequoia
#

mind you I am not compiling to bytecode yet

#

My goal is eventually to add (primitive) type checking in it (once we have type validation) and check whether variables, functions, etc. exist

left night
sturdy sequoia
#

Later on we might even be able to optimize the bytecode during eval, but that's wayyyyyy out of scope for my early prototype

left night
#

@sturdy sequoia This is more fun than dealing with subtle bugs in the styling system, eh? πŸ˜„

sturdy sequoia
#

The thing is that I don't understand all of the complexity of the styling system so I feel a bit dumb lmao

cunning wadi
#

it might be a good idea to do a register based interpreter instead, while you are already rewriting the interpreter

#

that should increase the performance over a stack based vm

sturdy sequoia
#

That's literally how I'm handling producing the output and joining of values

#

and it's dead easy

#

I'll probably remove the Variable stack value mind you

cunning wadi
#

I guess for joining with a register based vm, we would simply need a join instruction: JOIN accumulator, return_register

sturdy sequoia
#

how would you handle name and spread arguments tho?

#

My idea currently is that calling a function looks like this: Call(function_id, arg_count) and then I pop arg_count arguments and if they are NamedArgument or SpreadArgument I apply the proper behaviour

#

would you essentially do the same?

#

I'm just not sure I want to deal with the complexity of register based VM 😐

#

because (I think) it'll make the compilation much harder

cunning wadi
#

here's how lua 5 does it

sturdy sequoia
#

Hmmm, I see how that leads to a smaller eval indeed

#

It also has the advantage that we don't need to allocate like ever

tight glade
sturdy sequoia
tight glade
#

You did xD

glossy shore
#

We need function signatures to simply not exist at evaltime

#

Oh god that means we will have two entirely distinct compilation passes
terminology is gonna aa

hoary dew
sturdy sequoia
sturdy sequoia
#

I am now looking into the register-based system that @cunning wadi suggested and I wonder if I can use that to make it faster

#

But obviously, I will need to do a lot of testing and benchmarking to make it work

glossy shore
sly pecan
sturdy sequoia
#

it's a bit tricky for those

glossy shore
#

mmnope

#

namedβ€”not any different, just have some canonical ordering
optionalβ€”optional is a call site privilege, simply substitute the value in the bytecode

#

though dynamic dispatch is harder.

sturdy sequoia
#

@glossy shore my current design is as follows:

  • The compiler builds a list of instructions (for which each has a span in a second array)
  • The compiler also builds a list of constants (to avoid cloning and keep instructions copy)
  • And a list of jump labels
  • Then an Executor is created which is very lightweight, containing a fixed-size array of register (initially all Value::None)
  • It executes each instruction one-by-one borrowing as much as possible (to avoid cloning)
#

My instruction set is around 23 instructions so far

#

I'm definitely missing some, for example you cannot build a dict or array at the moment, but I'll add that

#

Cannot do show-set rules either

#

But I'll work on that now

onyx furnace
#

bytecode will be the road to jit🀩

sturdy sequoia
#

Each instruction is only 8 bytes which imo is perfectly fine

sturdy sequoia
onyx furnace
sturdy sequoia
#

I still think that for now I'll handle arguments and variables as special values that are accessed with an ID

sturdy sequoia
#

we'll see how that ends up working!

#

the hardest part will be building the compiler itself I think

untold turret
#

Also possibility to browser side layout computing? If we can convert the bytecode, we can layout and produces frames by executing the bytecode dynamically in browser, without having to downloading entire 17mb compiler.wasm to client.

#

If the bytecode could be lowered to wasm, we can even use browser native wasm interpreter.

left night
#

To lower the bytecode to wasm, it would need to operate on a very low-level basis. I think the planned implementation would continue using all of Typst's data structure and runtime infrastructure, meaning that layout is separate from it.

sturdy sequoia
untold turret
sturdy sequoia
#

My goal is a nice performance bump for now

sturdy sequoia
#

Do you think it's a good idea to do the following:

  • Recusirve decent in the tree, each nodes gets compile to an instruction but has a register that keeps increasing every time we need one
  • Do a second pass once we have collected all of the instructions deduplicating them: essentially once a register is no longer used, mark it as "free" and give the next instruction that needs a register access to it
#

That's essentially a first recursive pass, followed by a second linear pass

#

note that currently locals are stored in a special local store and not in registers, same with arguments. Therefore only values that are not bound are ever stored in registers

sturdy sequoia
#

@left night do assignments return a value in typst?

glad urchin
#

i believe they return None

sturdy sequoia
#

Yes, so in my case they won't return anything

glad urchin
#

wdym?

sturdy sequoia
#

well in my case, an assignment is just a single instruction

#

it won't store anything in any register

#

because where it's used will automatically and always be a none

glad urchin
#

yea but what if the function only has an assignment?

sturdy sequoia
glad urchin
#

and what if the assignment is used in an expression?

sturdy sequoia
glad urchin
#

well

#

then it does return something in the end

#

:p

sturdy sequoia
#

well no, the operator doesn't, it will just get automatically set to a None

#

Basically, register 0 is always a None

#

That way

glad urchin
#

well, if you ensure it behaves exactly as it does now, then i dont see a problem

sturdy sequoia
#
Join 0 1 1

is easy to detect

#

My goal is to be able to remove useless assignments πŸ˜‚

#

And useless ops that use a none

glossy shore
glossy shore
sturdy sequoia
#

Register re-mapping is implemented

#

😎

#

now I just need to finish implementing the VM

#

and it should work?

lunar kettle
#

how long do you think itll take? πŸ‘€

sturdy sequoia
#

I ended up being able to re-use a lot of the existing infrastructure so it's not too bad

#

What I mostly need to do is implement the VM, and optimize everything with IDs (which is already mostly the case, I just need to do it for scopes)

#

The one big thing missing right now is that you can't do any kind of scopes, there is only one scope: the global scope

#

However once I'm done, it'll be okay I think

lunar kettle
#

keep us posted! :D

#

looking forward to seeing how much speed it ends up gaining

sturdy sequoia
#

One of the big changes mind you is also that I'm turning it from a clone fest to a borrow fest

#

which should help reduce cloning a TON

#

And it should be able to much more agressively memoize compilation

lunar kettle
sturdy sequoia
#

I don't know yet

#

I'm kind of stuck on closures πŸ’€

lunar kettle
#

Maybe it’ll be worse πŸ˜‚

sturdy sequoia
#

Now that's just mean πŸ˜‚

#

The main advantage is that it should cache much better, less instructions, etc.

lunar kettle
#

Like that

#

Nah I’m joking

#

I’m sure it’ll be worth it πŸ’ͺ

sturdy sequoia
#

I hope so too

#

it's a lot of work πŸ’€

#

and it's nowhere near merge-quality code yet

tight glade
#

You know I'd love to review help and document right?

tight glade
sturdy sequoia
#
pub struct CompiledClosure {
    /// The span of the closure.
    span: Span,
    /// The instructions that make up the closure.
    instructions: Vec<Instruction>,
    /// The spans of the instructions.
    spans: Vec<Span>,
    /// The captured variables.
    captures: Vec<Capture>,
    /// The default values of the named parameters.
    /// To be loaded into the closure's scope.
    defaults: Vec<Register>,
    /// The number of local variables.
    locals: usize,
    /// The constants of the closure.
    constants: Vec<Value>,
    /// The strings of the closure.
    strings: Vec<String>,
    /// The patterns of the closure.
    patterns: Vec<Pattern>,
    /// The closures of the closure.
    closures: Vec<CompiledClosure>,
}
#

Now that's freaking fat

sly pecan
#

Thicc

sturdy sequoia
#

About half of the instructions are implemented

feral imp
#

If you improve the performance of the typst language, will that improve the performance of the compilation of documents?

tight glade
#

Is eval a big bottleneck?

onyx furnace
#

not really. for oi-wiki, most eval time used is for wasm plugin: generating qrcode. and eval takes about 20%-30%

#

for smaller docs things may become different

#

also might be different for docs which make heavy use of scripting

#

but bytecode and jit is very very cool. i really want to see how it looks like

onyx furnace
sturdy sequoia
#

it depends

#

If you use libraries like tablex or codly, have complex templates, or do any sort of compute? yes

#

oi-wiki is really special because it's bottlenecked by the wasmi runtime

#

but most docs I have seen (so far) did have quite a bit of compute and decreasing our reliance on this will definitely help

#

Will it 10x typst's perfs? no

#

Will it 2x them? maybe, depends on the document

sly pecan
sturdy sequoia
sly pecan
sturdy sequoia
sturdy sequoia
#

(well a decent chunk was written by ChatGPT πŸ˜‚)

sly pecan
sturdy sequoia
#

and I wrote it by hand

#

so it's hand written assembly

tight glade
#

Thanks yous for your answers!

sturdy sequoia
#

Ok, so the compiler mostly works, the following code gets compiled into:

= Hello, world!
This is a more complex example.

#lorem(300)

Gets compiled to the following instructions:

consts: [ Space, Text("Hello, World!"), Text("This is a more complex example."), Parbreak, 300]
isrs: [
  Set { register: Register(1), value: ConstId(0) },
  Join { lhs: Register(0), rhs: Register(1), target: Register(0) },
  Set { register: Register(2), value: ConstId(1) },
  Join { lhs: Register(3), rhs: Register(2), target: Register(3) },
  Heading { span: Span(1), level: 1, body: Register(3), target: Register(4) },
  Join { lhs: Register(0), rhs:   Register(4), target: Register(0) },
  Set { register: Register(5), value: ConstId(0) },
  Join { lhs: Register(0), rhs: Register(5), target: Register(0) },
  Set { register: Register(6), value: ConstId(2) },
  Join { lhs: Register(0), rhs: Register(6), target: Register(0) },
  Set { register: Register(7), value: ConstId(3) },
  Join { lhs: Register(0), rhs: Register(7), target: Register(0) },
  LoadModule { module: ModuleId(0), local: LocalId(58), target: Register(8) },
  Args { target: Register(9) },
  Set { register: Register(10), value: ConstId(4) },
  ArgsPush { args: Register(9), value: Register(10) },
  Call { callee: Register(8), args: Register(9), target: Register(11), math: false, trailing_comma: false },
  Join { lhs: Register(0), rhs: Register(11), target: Register(0) },
  Set { register: Register(12), value: ConstId(0) },
  Join { lhs: Register(0), rhs: Register(12), target: Register(0) }
]
#

What do y'all think?

#

(especially @left night @onyx furnace @tight glade @glossy shore and others that were interested like @sly pecan)

#

As you can see all names, etc. are completely removed instead using indices to try and make accesses much faster

untold turret
sturdy sequoia
sly pecan
sturdy sequoia
#

Otherwise pretty much is implemented

#

show set rules I plan on supporting using a rule stack that changes the behaviour of the Join instruction because it's the easiest

#

And scoping is more about me being lazy πŸ˜‚

onyx furnace
#

how many registers do we have?

sturdy sequoia
onyx furnace
#

does the number of regs matter? I dont know about bytecode design but I feels like there is not a 1:1 mapping between logical regs and physical ones

#

And I think we also have a memory?(If we use up all the regs)

sturdy sequoia
#

But apart from that, not really

untold turret
#

I think they all are in memory. And we may not have to bind them to physical registers in initial impl.

sturdy sequoia
#

the instructions are quite high level in order to reuse as much as possible

onyx furnace
#

Would it be better if we have exactly 0 reg and only relies on stack? And we may do clever reg allocation things after we want to JIT/compile to native code.

sturdy sequoia
#
pub struct Executor<'a> {
    /// The instructions to execute.
    instructions: &'a [Instruction],
    /// The spans in the instruction set.
    spans: &'a [Span],
    /// The labels in the instruction set.
    labels: &'a [usize],
    /// The constants in the instruction set.
    constants: &'a [Value],
    /// The closures in the instruction set.
    closures: &'a [CompiledClosure],
    /// The strings in the instruction set.
    strings: &'a [EcoString],
    /// The scopes used in the instruction set.
    scopes: Scopes<'a>,
    /// The locals used in the instruction set.
    locals: Vec<Value>,
    /// The captured locals used in the instruction set.
    captured: &'a [Value],
    /// The arguments used in the instruction set.
    arguments: &'a [Value],
    /// The current register table.
    registers: RegisterTable,
}


#[derive(Clone, Debug, PartialEq, Hash, Default)]
pub struct RegisterTable {
    pub registers: [Value; REGISTER_COUNT as usize],
}
sturdy sequoia
#

Additionally, it makes the compiler insanely easy to write πŸ˜‚

onyx furnace
#

sounds like cisc vs riscπŸ˜‚

sturdy sequoia
#

Some instructions run a TON of code behind the scene lol

sly pecan
#

HISC

sturdy sequoia
#

Especially closure initialization

sly pecan
#

humongous instruction set computer

sturdy sequoia
#

But each instruction is 32 bytes

untold turret
sturdy sequoia
#

maybe 64 but no more

#

note that currently locals are not stored in registers (which I admit is a bit weird)

#

My plan is to convert locals to registers soonℒ️

#

But I'd like the whole thing to work and be debugged first

untold turret
#

Make instructions having infinite registers brings benefits to static analysis. I don't know whether it introduces overhead to bytecode compiler.

sturdy sequoia
#

currently the compiler actually uses infinite register then does a second pass to reuse registers and optimize

#

My goal with this is mostly to avoid using too much memory and to avoid allocating

#

but we could definitely do this but not put a hard-cap on the number of registers

untold turret
#

Sounds good

sturdy sequoia
#

Like optimizing without limiting to 32 registers

#

My goal is also (maybe) to have multiple sizes of register pages and depending on the function use a bigger or smaller one among multiple sizes

#

to try and be as cache and memory efficient as possible

untold turret
#

We may also have comemo on executing blocks.

#

And using Defer<T> to send a sufficient big instructions batch to another thread to execute...

tight glade
#

Like we could start by allocating 32 registers but grow as needed! I can take a deeper look this evening, where can i take a look at the implementation? ❀️

sturdy sequoia
sturdy sequoia
untold turret
#

Many crazy optimization..

tight glade
#

Istrs 😬

#

At that point just instructions πŸ˜‚

tight glade
untold turret
sturdy sequoia
tight glade
#

Skillissue chat gpt

tight glade
#

But typst users will never see it

sturdy sequoia
#

@untold turret @onyx furnace I actually really like the idea of having a variable number of registers, because it also removes the need to distinguish between: locals, captured, and arguments as being "special" cases which simplifies everything

#

So I think that as soon as it all works, I'll be implementing that way πŸ˜‰

untold turret
#

If we don't target to utilize physical registers in "part 1", I think it is a really good decision. But I heard you had restricted it, so I thought the limiting registers is easy.

sturdy sequoia
#

but if all of those need to be in registers, it makes the compilation much harder

#

because you'll need to reorder things just to make room

#

or have a stack

left night
sturdy sequoia
#

I was already doing it because handling styles was basically impossible

#

The way it works is as follows:

  • Each markup, block, etc. creates a join group, when in a join group, the Join instruction basically appends the value to the group (which is pre-allocated to a size close enough to the output)
  • When a markup, block, etc. ends, it pops the join group saving it to a register (as one big content)
  • Whenever a style rule is encountered, it does the same but pushes it into a StyledElem instead (so a special kind of join group).
#

If a join group is empty, it just produces an empty sequence elem, etc.

#

My idea is that we can therefore pre-allocate a lot, and then we can just build the sequence from the arrays (EcoVec) directly

#

Additionally, I have some smart to handle subsequent show and set rules where they get collected into a single Styles to try and save memory and decrease the depth of the finaly Content tree

#

I plan on writing a LOT of docs in the module to explain how it all works, as well as I have commented basically every line of the compiler

#

Because the VM is actually quite complex, but this complexity is really aimed at improving performance

#

I also need to make the compiler use TrackedMut<Compiler> that way I can just #[comemo::memoize] each block πŸ˜„

left night
sturdy sequoia
#

I might just have two functions one which isn't memoized and based on the size then decide which one to take

left night
#

When a block has a lot of assignments, the mutable constraints will probably tank performance. If it's mostly pure, it could be okay.

sturdy sequoia
#

Because it sounds... hard

left night
#

What's remote assignment?

sturdy sequoia
#

remote assignments = assignments in a parent scope

left night
#

ah

sturdy sequoia
#

can a closure do a mutable assignment (in the parent scope)?

#

I hope not πŸ˜„

left night
#

Overall, I think caching less and smarter (not even every function call) might be more the way to go. Less hashing, lookup, constraint, and memory overhead

left night
sturdy sequoia
#

'cause I was worried πŸ˜‚

#

Then mutable assignments should be relatively easy

#

Just need a Scopes::enter and Scopes::exit call and then it's good πŸ˜„

left night
glossy shore
glossy shore
left night
#

perhaps, it'd be worthwhile to do more struct-of-arrays? a Vec<u8> with opcodes and then one supplemental maybe Vec<u32> for common index arguments and then something for rarer arguments. but it starts complicating things more.

#

By the way, I suspect import "..": * is gonna be a pain to implement

#

Right now the imported string can even be dynamic, but I'm thinking about forbidding that. For import, I think almost nobody uses it, but for include some people do. Not sure.

#

It's also a security concern should we allow some sort of URL imports: #1176122103355953162 message

glossy shore
#

And exactly how comemo-able is this bytecode?

glossy shore
glossy shore
left night
glossy shore
glossy shore
#

we could also allocate it all at once beforehand

sturdy sequoia
sturdy sequoia
#

I am thinking of compiling and evaluating on-the-fly and cross my fingers that it's enough

sturdy sequoia
sturdy sequoia
#

for include it's fine because I can just evaluate it at runtime (relying on comemo for fast compilation)

#

but letting import be dynamic is a big no-no

glossy shore
#

as its linear

sturdy sequoia
left night
#

my problem is that if the URL is dynamic (even if it's a git URL), you can exfiltrate arbitrary data via the URL

glossy shore
#

what exactly is the attack vector here?

left night
#

The whole read ssh keys, embed them into the PDF in some invisible way and get the person to send you the PDF is imo not that big of a deal. but this here would be a pretty big attack surface imo.

sturdy sequoia
#

Ok, I reduced instructions to 16 bytes

sturdy sequoia
#

But indeed, leaking stuff through URLs would be 100x worse

#

I brought it down to 12 bytes

#

and that's bigger than I'd like

#

but it's pretty good

left night
#

what do you think of SOA for instructions (struct of arrays)?

sturdy sequoia
#

I meant to ask for clarifications but then I forgot

#

like just having an array of u8 and decoding instructions on the fly?

sturdy sequoia
#

Ok, I found the culprit and it's now down to 8 bytes

#

That means that each cache line can contain 8 instructions, I'd argue that's about as good as it gets

left night
#

Since not every instruction has the same structure, it would require some cleverness

#

But 8 bytes is pretty good

sturdy sequoia
#

I kind of want to keep it reasonably simple for when I have to debug it πŸ˜‚

#

BTW @left night how can I chain two Styles

#

As in, I already have a style and I want to add one more

#

(some cleverness to avoid nested StyledElem)

feral imp
#

@atomic violet isn't this stuff more like your speed?

left night
sturdy sequoia
#

And I want to keep allocations to the bare minimum if I can avoid it

#

But of course we can revisit it once it actually work

#

Only things missing now are import, include and a few instructions like the ability to store in remote contexts, and the ability to run iterators (for x in y)

#

for which the instructions are there, they're just not implemented

feral imp
#

Amazing. Atleast someone is dedicating their weekend to typst. A hero amongst mortals.

sturdy sequoia
#

Oh, and destructuring values is not yet implemented either!

feral imp
#

I retract my praise then.

atomic violet
tight glade
#

Which is hella cute

sturdy sequoia
#

I have done it like this:
destructuring takes a pattern, the pattern indicates in which locals to store values

#

pretty rudimentary atm

tight glade
#

If the values are already stored somewhere we can just change which register is pointing to it right?

atomic violet
sturdy sequoia
#

it's the joining mechanism in typst

#

like joining multiple pieces of content into one big content

atomic violet
#

I guessed that much, but what does it do in the inside? Like create a tree node, or push to a vector of things to join, or ...?

sturdy sequoia
#

but it has the concept of a context, which is a vector of vector to handle nested joining

atomic violet
#

why is it signature so weird then? shouldn't it only have "where to push" and "what to push" arguments? Like lhs += rhs instead of target = lhs + rhs?

sturdy sequoia
#

now it would just be Push(capacity), Join(what_to_add), Pop(where_to_store)

#
    /// Push a new join group
    JoinGroup {
        /// The capacity hint of the join group.
        capacity: u16,
    },
    /// Pop a join group
    PopGroup {
        // The register in which to store the result.
        target: Register,
    },
    /// Join two values.
    Join {
        /// The value to add to the join group.
        value: Register,
    },
atomic violet
#

ah, i see, that makes more sense

sturdy sequoia
#

styles = show and set rules

#

I've tried to make things fairly simple and clever

#

but there ends up being quite a few moving parts

#

But I think it's okay

#

Because I'm trying to reuse as much of the existing logic as possible, most notably all of the ops::... like ops::add(rhs, lhs) that already exist in the code base

sturdy sequoia
#

@left night do we want to keep this feature:

        if let Some(sink_name) = sink_name {
            if let Some(sink_pos_values) = sink_pos_values {
                remaining_args.items.extend(sink_pos_values);
            }
            vm.define(sink_name, remaining_args);
        }
#

If a sink is unnamed, the other arguments get defined as variables in the scope

#

This feels very error prone?

#

And is basically completely incompatible with my design 😦

glad urchin
sturdy sequoia
glad urchin
#

uh

#

well

#

the idea is that you should be able to take any args without caring about them

#

that's all

sturdy sequoia
#

For now I'll just treat .. as "discard everything else"

glad urchin
#

yea thats exactly what it is

#

not sure if i understand then

sturdy sequoia
#

yes figures, because it defines the variables which is super weird

glad urchin
#

what's the issue?

glad urchin
sturdy sequoia
#

ah no

#

okay

#

my bad

#

I had misunderstood the code, it was a bit confusing with the if let Some(..) = .. everywhere πŸ˜‚

glad urchin
#

:p

#

hapepns

sturdy sequoia
#

ok, so it's the right way πŸ˜„

glad urchin
#

but yeah you can just throw it all away

sturdy sequoia
#

it's...

#

kind of the point πŸ˜„

#

throwing all of the eval code into a volcano πŸ˜„

#

😈

tight glade
#

Question, when we want to change the evaluation , will we have to work with your new bytecode already or is there some level of abstraction?

Like, is the eval trait still a thing? Does it returns bytecode now?

sturdy sequoia
#

Note that the output may very well be Register::NONE in which case it means it returns nothing

#

As an example:

#
impl Compile for ast::MathPrimes<'_> {
    fn compile(&self, compiler: &mut Compiler) -> SourceResult<Register> {
        let value = compiler.const_(
            PrimesElem::new(self.count()).pack().spanned(self.span()).into_value(),
        );
        let register = compiler.reg();

        compiler.spans.push(self.span());
        compiler.instructions.push(Instruction::Set { value, register });

        Ok(register)
    }
}
#

Here is how ' in an equation gets compiled

#

later on, I plan on remplacing the manual span and instruction push with:

#
compiler.set(self.span(), value, register);
#

Which I think will be much nicer

#

Same with removing the .into_value() by making const_ take an impl IntoValue instead

#

Which would lead to:

#
impl Compile for ast::MathPrimes<'_> {
    fn compile(&self, compiler: &mut Compiler) -> SourceResult<Register> {
        let value = compiler.const_(
            PrimesElem::new(self.count()).pack().spanned(self.span()),
        );

        let register = compiler.reg();
        compiler.set(self.span(), value, register);
        Ok(register)
    }
}
#

Obviously some stuff like loops are much harder to do, but I plan on creating a nice-ish API for branching as well

#

Now here are the updated items missing:

  • Destructuring (the instruction exists but is not implemented)
  • Iteration (the instruction exists but is not implemented)
  • Import and include of other files
#

This means that I should be able to start compiling simple docs!!!!

sturdy sequoia
#

IT WORKS BABYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

#

I AM SO FREAKING GOOD

#

GODDAMN

#

@left night ferrisWink

sly pecan
#

now do your thesis

sturdy sequoia
#

missing modules and closures don't work for some reason

sly pecan
#

skill issue

feral imp
#

There's still Sunday.

#

It's just so absurd to have a jit for a typesetting language. I just can't fathom this.

#

Not a jit but whatever this is.

sturdy sequoia
feral imp
#

I know, I know.

#

It's still wild.

sturdy sequoia
# feral imp It's still wild.

if you think of typst more as a jupyter alternative where you can build your visualizations (graphs, etc.) directly in, then I think it makes perfect sense

#

But tomorrow I won't have time to work on it

#

and it'll likely have to wait until next w-e

feral imp
#

Another blanket shall be delivered. Good work!

lament fulcrum
#

a jit would be fun though

#

and maybe even doable with cranelift as a pure rust dependency

tight glade
#

What do you think of naming that compile trait ByteCode instead? (yes I love bikeshedding :p)

sturdy sequoia
#

I think it overall makes more sense

tight glade
#

compiling the module into what? modules? 🀣

left night
#

and the thing is called the compiler of the Typst typesetting language

sturdy sequoia
#

but the function internally is called typeset

glad urchin
#

well yea i guess Compile could be a bit generic perhaps

#

maybe ByteCodeCompile?

#

or VmCompile

sturdy sequoia
glad urchin
#

yea

#

ByteCode on its own is probably a better idea than ByteCodeCompile

#

though VmCompile doesn't sound that bad either

tight glade
tight glade
sturdy sequoia
#

Stop bikesheding angryeyes

left night
sturdy sequoia
#
fn typeset(
    world: Tracked<dyn World + '_>,
    tracer: &mut Tracer,
    content: &Content,
) -> SourceResult<Document> {
#

I mean you wrote it πŸ˜„

left night
#

but it's not the full compilation

#

it's just the post-eval phases

sturdy sequoia
left night
#

I thought you wanted to rename the public compile function

sturdy sequoia
#

now we have
compile -> eval -> typeset -> export internally

sturdy sequoia
left night
#

I'm not sure I follow. Would you rename any of the existing functions?

sturdy sequoia
#

I am just defending the Compile name for the trait

#

πŸ˜‚

left night
#

ah

#

I think it's okay

sturdy sequoia
#

Thanks ❀️

left night
#

I would probably also have called it that

#

But I guess I also have named both Trace and Tracer which are completely unrelated.

#

on an unrelated note: do you use cargo check or cargo clippy as rust-analyzer's check command?

#

because regarding performance I have found that going back to cargo check makes my IDE much more performant.

#

a bit unfortunate that I have to call cargo clippy manually now to ensure that CI will pass

cunning wadi
#

are you using auto-save?

left night
#

no

#

but it still takes between 5-15s to update diagnostics

#

after save

cunning wadi
#

that is a lot

#

much more than on my machine for sure

left night
#

which is why I'm actually saving less and more after having done a batch of changes

sturdy sequoia
left night
cunning wadi
#

I did not, let me check

left night
#

it's a little slower since the two main crates were merged

#

the diagnostics also come step by step. when doing a larger refactor, I can see folder by folder getting red

#

as mentioned before, for me (in VS Code) diagnostics completely disappear when saving and only come back once the new ones are available (even the unchanged ones). which is a bit of a pain.

left night
#

typst t shorthand could also mean test I just noticed and, in fact, test is a subword of typeset.

#

for some definition of subword

left night
# cunning wadi that is a lot

I should also add that it differs. if the project has currently many errors, it takes longer, closer to 15s maybe. if it's in a compiling state, closer to 5s.

cunning wadi
#

and I've been running a program for a few hours now that constantly takes up two threads

#

might just be the processor though

left night
#

I must admit that I haven't actually measured. I just know that it feels pretty slow at times. But I think the worst times might have been on battery rather than plugged in.

#

But even though M1 is impressive, your desktop might also just be faster.

cunning wadi
#

yeah, it's probably that

#

(after mainly using the work laptop for a while I definitely do notice the improved performance, even when just navigating in the editor)

#

(though that one has got an eighth gen low powered intel processor, so, not exactly the most powerful device on the market)

sly pecan
#

The difference between desktop cpu and a laptop one is that the desktop one can operate at max power for an extended period of time

#

Laptop ones cannot for thermal reasons

cunning wadi
#

yeah

sturdy sequoia
#

so the difference in continuous TDP shouldn't matter that much

feral imp
#

when I write pointer-code, unsafe code, clippy is too much anyways, and will fail code that builds/compiles. I'd use cargo check as r-a check command as a standard..

cunning wadi
#

that sounds like the unsafe code might be doing things it should not be doing

sturdy sequoia
#

I'm happy to say I fixed closures!

#
= Hello, world!

#set text(fill: red)
The lazy fox jumped over the brown dog.

#let call_once(fn) = { (fn)(100) }

#call_once(lorem)
#

This for example can already compile πŸ˜‰

#

So it's really getting there! πŸ˜„

tight glade
#

lovely πŸ˜„

sturdy sequoia
#

Ok, I found a very sneaky bug with register remapping, I'm pretty sure it's correct now πŸ˜„

#

(basically I could remap the output register without re-assigning it)

#

Ok so I'm having issues with captured values in closures, I'll try and figure it out tomorrow

sturdy sequoia
#

@left night How are # before a value rejected normally? Because from what I can tell it should be at the lexer level? But for some reason it's allowing it in code mode 😐

#

ah no okay I need to extract the errors separately my bad

sturdy sequoia
#

I fixed closure scopes and captured values πŸŽ‰

#

And now I fixed scopes during eval

sly pecan
sturdy sequoia
#

We can now iterate over values πŸŽ‰

#

only missing now:

  • destructuring
  • modules
glossy shore
#

destructuring ought to be easy

sturdy sequoia
#

modules won't be too hard either imo

#

I have a pretty good idea of how to implement them relatively easily

tight glade
#

Doing so good @sturdy sequoia

glossy shore
sturdy sequoia
glossy shore
#

good.

sturdy sequoia
#

I think it's error prone but I'm curious what you think πŸ˜‰

glossy shore
#

yeah mostly error prone

sturdy sequoia
#

mind you #include will work the same as before

#

just #import is changing

glossy shore
#

mhm

sturdy sequoia
#

(for now at least)

glossy shore
#

module imports just feel like they should be analogous to Rust use

#

if I write if cond { import "mod.typ"; } what's the scope of the import? Just the conditional?

sturdy sequoia
glossy shore
#

is that already the case?

glad urchin
#

As in being able to conditionally import?

#

Or being able to provide a "dynamic string" to imports

glossy shore
#

dynamic string is what I meant

sturdy sequoia
#

you will still be able to conditionally import

sturdy sequoia
#

Import and Include are in

#

just destructure left πŸŽ‰

feral imp
#

If you finish today, and it changes performance by +-20% on thesis, we'll need to go on voice chat and celebrate/rant.....................

sly pecan
#

10x slower

sturdy sequoia
#

Stop being mean 😭

sturdy sequoia
#

then debugging

sly pecan
sturdy sequoia
#

Destructuring is in

#

Destructuring works 😎

#

For those wondering, without the doc (there is some but very sparse), the compiler and executors are close to 6k lines long

#

Poor @left night

#

With the doc and cleanup it'll likely be in the neighborhood of 10k lines

#

πŸ˜„

#

The first ever document compiled with the new eval

#

Ok so it's very picky about failing to remap registers

#

😐

#
= Hello, world!

#show "Lorem": text(fill: red, "Lo")

#lorem(30)
#lorem(30)
#lorem(30)
#lorem(30)
#lorem(30)
#

This uses too many registers πŸ˜‚

#

I figured out why

sturdy sequoia
#

It's 450 times slower πŸ˜‚

#

Probably because of compilation lol

#

IT'S FASTER

#

@feral imp @sly pecan IT IS FASTER

feral imp
#

thank god.

sturdy sequoia
#

(I had forgotten to memoize closure calls πŸ’€)

feral imp
#

we were literarly holding our breath.

sturdy sequoia
#

Now onto fix the bug so that I can test it on masterproef 😎

glad urchin
feral imp
glad urchin
#

scare away typsters with this simple trick

sturdy sequoia
glad urchin
#

wa

#

what didy ou send then

sturdy sequoia
glad urchin
#

ohok

sturdy sequoia
#

I think tonight masterproef will compile using the new compiler πŸ˜„

feral imp
#

For 10k loc I bet laurmaedje needs it to be a bit fast.

sturdy sequoia
#

Ok, it's a lot less than I thought

glad urchin
#

im more impressed that you did this in like a few days

#

:p

#

i dont have that much free time πŸ˜‚

sturdy sequoia
#

THERE ARE NO MORE todo!()

tight glade
#

!!!! ❀️❀️❀️

#

The guy is blazing fast

#

By the thesis!

sturdy sequoia
#

I mean there are bugs, I am now fixing the tests πŸ˜„

#

Actually most of the failed tests are just slight span differences

feral imp
#

waiting intensifies for Goossa; Will typst performance exceed what Man has yet to witness?

low sapphire
#

@sturdy sequoia have you already submitted your thesis or is optimizing Typst just part of your procrastination? xD

sturdy sequoia
low sapphire
#

ok haha nice πŸ˜„

lunar kettle
#

@sturdy sequoia where are the numberssss

#

how much faster/slower on your thesis?

feral imp
glad urchin
glad urchin
#

But maybe today? :p

lunar kettle
#

Dherse stands to his words except when it comes to gradients

glad urchin
#

Well "tonight" has just started in their timezone I believe

#

So let's give Dherse some time πŸ˜‚

sly pecan
sturdy sequoia
#

but I'll work on improving it!

sly pecan
tight glade
#

I love seeing everyone so invested in your work Dherse 😁❀️

glossy shore
#

it's a good investment

sturdy sequoia
sly pecan
#

So what does having bigger spans mean?

sturdy sequoia
#

so instead of this it might be this. that gets globbed

#

obviously not great but I plan on improving upon that later

sly pecan
#

Does it have consequence for the actual output?

sturdy sequoia
#

but it might affect preview <-> source click (in the webapp)

#

and it makes most tests fail

sly pecan
#

How fast is your thesis?

#

Most important question

sturdy sequoia
#

I don't know yet, I am tracking down two bugs that prevent me from compiling it

#

and I took a long break

sturdy sequoia
#

Ok so there's still something wrong with scopes

#

and methods will be the death of me

#

they're parsed so... weirdly

feral imp
#

time to.. rewrite typst?

low sapphire
#

rewrite it in Zig

sturdy sequoia
sturdy sequoia
#

Ok so, doing register optimization S-U-C-K-S

#

I am going for another method of re-using registers

sturdy sequoia
#

I'm happy to say that this new method shaved a good 2k lines from the total

west light
glad urchin
west light
sturdy sequoia
#

I am building a typst VM for faster eval

#

and it uses registers

#

πŸ˜‚

west light
# sturdy sequoia and it uses registers

So it uses custom instructions and virtual registers…. And you’re working on jit optimization? … mostly guessing. How wide are the registers, how many are used?

cunning wadi
#

This does not include a JIT

#

Register optimization is simply the process of choosing the best registers so that as little moves or interactions with the stack as possible have to occur

#

It's 32 registers. Their width I don't know

#

But the registers aren't simply 64 bit numbers like in hardware, but complex values

#

At least I assume as much

west light
sly pecan
cunning wadi
#

It just needs to check things like whether the value of one register is still being required

sturdy sequoia
cunning wadi
#

Or it could determine that one register should be used for something else instead and the existing value can simply be dropped onto the stack in the meantime

west light
glad urchin
sturdy sequoia
#

I mean we have more than one

#

but they're for very specific situations

#

for example we have an iterator stack to store rust native iterators in loops

#

since typst doesn't have iterators

cunning wadi
#

What happens when a function is called?

#

Where does the context of the calling function end up

sturdy sequoia
#

the reason why it spins up a new VM is because the calls get memoized

#

and the VMs are dirt cheap to initialize since they mostly borrow values

cunning wadi
#

Huh

sturdy sequoia
#

So it's not as fast as possible but that isn't the goal either

cunning wadi
#

Do the arguments not end up in specific registers but instead some special value then?

sturdy sequoia
#

I plan on replacing locals and args with registers but I haven't done it yet

west light
cunning wadi
#

What happens if a function takes more than 32 arguments?

sturdy sequoia
sturdy sequoia
#

There is no stack πŸ˜‚

#

Mind you you can call a function with more than 32 args

glad urchin
#

crash and burn?

west light
#

Is a dictionary one argument?

sturdy sequoia
sturdy sequoia
cunning wadi
#

That was my actual question

sturdy sequoia
#

When a function is called, it loads the arguments into registers on use

cunning wadi
#

Oh I see

sturdy sequoia
#

So you can have as many args as you want

west light
#

I’m just guessing the arguments are by pointer and that there typically just one Args item

sturdy sequoia
west light
#

Or perhaps nothing.

sturdy sequoia
west light
#

So what do the other 31 registers do? Do you mark them as in use and just allocate deallocate them based on need?

sturdy sequoia
#

so you always have 32 registers

#

it's only when you're calling a function that it uses the Args data structure

#

Ok, methods are finally fixed afaik

#

Now all that's left are the scopes being a bit borked yet again

#

but I'll fix that next time I have time to work on it

#

I mean I did figure it out: (noting here for future me)
It cannot capture from the parent's parent's scope. This means that I need to create a bigger hierarchy of captures and be able to do recursive capturing, not particularly hard but I need to do it

tight glade
sturdy sequoia
#

I fixed the scoping issues

#

Now I have other issues, but I'm making progress πŸ˜‚

sturdy sequoia
glad urchin
sturdy sequoia
glad urchin
#

i can understand that

untold turret
sturdy sequoia
#

who cares that I have a technical interview later today πŸ˜‚

untold turret
#

Very charmπŸ”₯

sturdy sequoia
#

I am wayyyyy too junior for this position

#

I almost feel bad

sly pecan
#

Fake it till you make it

sturdy sequoia
#

BTW, removing the instruction deduplication almost halfed rust's compile time πŸ’€

#

So it was definitely worth it

#

And I checked another bytecode VM and they do it the way I am now doing it so I'm not re-inventing the wheel

tight glade
sturdy sequoia
sturdy sequoia
tight glade
#

No you can't, because you're obsessed

sturdy sequoia
#

Could've slept some more tho

tight glade
#

Wonderful ❀️❀️❀️

#

You do godly work that's why

sturdy sequoia
#

So now there is something wrong with conditions producing a boolean angryeyes

#

I figured it out

#

But now I really must prepare my interview πŸ˜‚

sturdy sequoia
#

Next issue: some captures don't work for reasons unknown 😐

tight glade
sturdy sequoia
tight glade
#

😎 😎 😎 😎 😎 😎 😎 😎 😎 😎 😎

sturdy sequoia
#

no no

cunning wadi
sturdy sequoia
sturdy sequoia
#

Ok, so, it can compiler: the preface and the introduction

#

getting there!

#

Chapter 1 compiles too 😎

#

Now tablex doesn't compile 😭

#

@glad urchin see what you're doing to me!

glad urchin
sturdy sequoia
#

Ok, it's improving

sturdy sequoia
#

I fixed variable shadowing

#

and now I'm off to bed

#

😴

sturdy sequoia
#

It almost compiles the thesis now

#

it fails many iterations in

#

I think my code just found a bug in queries

#

🧐

sturdy sequoia
#

@tight glade @sly pecan @glad urchin @left night @feral imp I can now confirm that the new evaluation system is significantly faster, I don't know yet exactly how much faster (I need to do more testing) but at least 3x faster even when taking into account the extra compilation that needs to be performed before eval πŸŽ‰

#

πŸš€

#

And I think that with borrowing and a few other optimizations it might be even faster πŸ˜„

sly pecan
#

I assume compilation as a whole isn't 3x faster

sturdy sequoia
#

πŸ’€

#

Ok, so using the handy-dandy --timings arguments:

  • Before rework: 5.7s of which 1.2s is layout => eval is 4.5s
  • After rework: 2.33s of which 1.2s is layout => eval is 1.13s
#

Meaning a speedup of almost 4x in eval

sly pecan
#

is that your thesis?

sturdy sequoia
#

These are obviously very rough numbers because compilation is cached across invokations

sturdy sequoia
#

My thesis still doesn't compile, but I don't know what the bug is... YET!

feral imp
#

4x is nuts.

#

I'd have said 20%-80% percent would be worth it!

sturdy sequoia
#

@left night I do think I will end up rewriting it as an array of structs because currently I am quite limited in doing things like "either a register, a constant, a local, or an arg" which leads to way more instructions than needed

#

Instead I could just have a system like OpCode(Flags) being 16 bits where the Opcode is 8 bits and the flags are 8 bits indicating how the arguments work

#

I think that could lead to much faster execution and less instructions overall

low sapphire
sturdy sequoia
#

like you would see using cetz or tablex, or anything that does data processing in general

low sapphire
#

ah okay

atomic violet
sturdy sequoia
#

I'm tired

atomic violet
#

is there a reason for it though other than some instructions possibly being longer than they should be?

sturdy sequoia
#

It saves cache utilization (for the same reason)

#

It can be faster

atomic violet
#

well, it can, but you will need to measure it

sturdy sequoia
#

It can save tons of instructions by giving me more freedom in how I am "crafting" the instructions by being able to have more than one value

sturdy sequoia
atomic violet
#

the problem with SoA is that it's harder to support, and here I don't see much reason to use it

#

instructions are executed in sequential order anyway, so you will get good cache occupancy thanks to prefetching either way

sturdy sequoia
atomic violet
#

even if instructions are like 32 bytes long

sturdy sequoia
atomic violet
#

well, L1 is 32 kb anyway - that's pretty big, enough to fit every hotspot for sure

sturdy sequoia
atomic violet
#

thing is, SoA is usually implemented if you can take the advantage of parallelism, because you will be able to work with multiple objects anyway thanks to SIMD

#

and here it is not the case

#

well

#

unless you are going to implement some kind of SIMD instruction parsing...

atomic violet
#

which would be kind of cool but also too cool

sturdy sequoia
#

OUT OF ORDER EXECUTION

#

πŸ˜‚

atomic violet
#

lmao

#

anyway, you can try, it would be interesting to see the difference for sure

#

how are constants stored? is it like a global (for every function/block of code/something) array which you can index within some instructions?

sturdy sequoia
atomic violet
#

ok, makes sense... I was going to say that if you want to improve code cache occupancy it might be worth considering improving cache occupancy of something else instead (making layout of everything more predictable will make predictable layout of code even more predictable), but I suppose it's not as simple as it may seemπŸ€”

#

anyway, do your thing and try to compensate for carbon emissions of your compilations, I suppose πŸ˜…

sturdy sequoia
#

What I want is mostly to get a first draft that works, and then try different improvements mostly

sturdy sequoia
left night
#

@sturdy sequoia fun memories. this was the "first" content rework, which moved from an enum to something pretty close to what we have now (but obviously with way less features). the dynamic Attr thing only came later. this goes to show how far-reaching a single bad design decision is.

sturdy sequoia
tight glade
#

What's the bad design decision here?

sturdy sequoia
#

the old Content sytem with a EcoVec<Attr> to store fields

tight glade
#

Hmmm πŸ€”

sly pecan
untold turret
sturdy sequoia
#

But eval speed seems to be really improved

#

incrementally it should be even higher 🀞

#

I'm still having somes issues in tablex and running tests is actually surprisingly hard (because spans don't match for... reasons)

untold turret
#

Have you already applied comemo, or this is a result of executing bytecode without caching.

sturdy sequoia
#

but there is no caching dedicated to compilation (yet)

untold turret
#

You have been announced 2x, 3x, 4x performance improvement again and again. There should be at one another science

sturdy sequoia
#

Small gains x infinity = big gains πŸ˜„

#

I think it's quite nice because I have also made bytecode compilmation part of the timings

glad urchin
#

πŸ˜‚

sturdy sequoia
glad urchin
sturdy sequoia
#

but eventually it will be of course

glad urchin
#

i see

untold turret
#

The thesis and, the table package.

sturdy sequoia
#

I also added module eval as a thing, I just need to add closure compilation

glad urchin
#

cuz i often see some talks about the idea of using a JIT but I think that's going perhaps way too far
some (comparatively) simple bytecode memoization could perhaps be more suitable

#

if that makes sense

sturdy sequoia
#

I generally agree, I think the gains we'll have here will already be awesome

untold turret
sturdy sequoia
glad urchin
#

53 GiB typst-target folder

sturdy sequoia
#

it's just a BUNCH of u16s

glad urchin
#

i mean what can i say

#

it would probably speed up compilation

#

who doesnt want that

sturdy sequoia
#

Really depends how long it takes to compile to bytecode

glad urchin
#

fair

sturdy sequoia
#

Like compiling most of the bytecode in my thesis takes around 20ms

#

and my thesis is long and includes tablex πŸ˜‚

glad urchin
#

sorry but 20ms is too long

#

i want 0

sturdy sequoia
#

and I think that parsing and compilation could be deferred to multiple threads 😎

cunning wadi
glad urchin
#

thanks

cunning wadi
#

happy to help πŸ‘

glad urchin
#

im gonna make a patch for typst 0.[version with bytecode].0 which just replaces all Compile implementations with a mock

#

that should help

untold turret
cunning wadi
#

from what I saw, yes

sturdy sequoia
#

but for bytecode then it's trivial because there are no pointers

#

everything is done with IDs into shared arrays

#

like ConstId for constants, StringID, etc.

untold turret
sturdy sequoia
#

I have tested it and it like halved the compilation time

cunning wadi
#

(doesn't have to be time specifically, could also just be counting number of instructions executed)

sturdy sequoia
#

That's why I've been thinking of "deferred" calls, where the value is only known when it's needed

#

But of course that would require plugins to be Send + Sync

#

Anyway, I'm off to bed ❀️

cunning wadi
#

that sounds like a good idea for me as well

untold turret
#

I think we will get furthermore improvement. But you are right it is other ways to improve and already fast enough even without persisting cache.

untold turret
onyx furnace
onyx furnace
onyx furnace
#

may i know which doc you use for testing? i guess thesis, tablex doc, cetz doc should be "eval-heavy" ones?

sturdy sequoia
sturdy sequoia
# onyx furnace the flamegraph looks amazing

yes I am soooooo happy about the --timings feature and how it came out. Thank you very very much to you and @untold turret for the idea of making it compatible with chrome traces! It's awesome πŸ˜„

left night
#

but overall yes

sturdy sequoia
left night
#

but we potentially also eval shared leafs twice unnecessarily

sturdy sequoia
left night
#

and parsing we can notably not do completely in parallel

#

since we obviously need to find those imports

sturdy sequoia
#

true true, but perhaps the parser could automatically add "static" imports to a big list of files that are "in the path" so-to-speak for parsing and compiling in a deferred way?

#

It's no big deal either way, it doesn't account for very much anyway

#

But I think it's a "low hanging fruit" for making eval faster

sturdy sequoia
#

@left night BTW I hope you don't mind that I am looking into eval but I like these "bigger" projects, I find them really rewarding to work on and with all of the reworks you're doing in other parts of the codebase I didn't want to cause trouble there so I figured eval was a good place to work on to cause as little friction as possible on your own work

#

It also has really motivated me to work on Typst again ❀️

#

Really sorry for being a bit harsh with the revoke thing the other day 😐

left night
#

Don't worry about it, it's already forgotten

#

It's very cool that you're taking this bytecode thing on

onyx furnace
sturdy sequoia
#

I wanted something that would really test eval more than anything else

sturdy sequoia
#

@left night are destructure patterns in for loop supposed to be defined within the parent's scope? like this:

for i in range(10) { ... }

// Is `i` valid here?
i
#

or should i remain scoped to the loop?

#

I think it should be the later but I'm curious what you think

feral imp
#

Former is how it is done in R, so.... I don't think so.

sturdy sequoia
feral imp
#

Off-topic.

sturdy sequoia
left night
sturdy sequoia
#

I'll fix that πŸ˜‰

untold turret
#

There may be some specification interpreter for typst, which never launch too many optimization. And the answer is its result.

sturdy sequoia
#

@left night what do you think of a macro like this for creating instructions:

#

isr! {
    Add -> Register | Local {
        lhs: Register | Constant | Local | Global | Param | Capture,
        rhs: Register | Constant | Local | Global | Param | Capture,
    } => |add| {
        std::ops::add(add.lhs(), add.rhs())
    }

    #[scope]
    Iter {
        value: Register | Constant | Local | Global | Param | Capture,
        target: Iterator,
    }

    #[flow]
    And {
        lhs: Register | Constant | Local | Global | Param | Capture,
        rhs: Register | Constant | Local | Global | Param | Capture,
        target: Jump,
    } => |and, cf| {
        let lhs = and.lhs().cast::<bool>()?;
        if !lhs {
            cf.jump(and.target())

            Value::None
        } else {
            let rhs = and.rhs().cast::<bool>()?;

            Value::Bool(lhs && rhs)
        }
    }

    #[flow]
    If {
        condition: Register | Constant | Local | Global | Param | Capture,
        then: Jump,
        else: Jump | None,
    } => |if, cf| {
        if if.condition().cast::<bool>()? {
            cf.jump(if.then())
        } else {
            cf.jump(if.else())
        }
    }

    #[flow]
    #[scope(consume)]
    Label {
        label: Jump,
        pop: bool,
    } => |label| {
        if label.pop() {
            label.scope().pop()
        }
    }

    #[flow]
    Next -> Register | Local {
        iter: Iterator,
        bottom: Jump,
    } => |next, cf| {
        if let Some(value) = next.iter().next() {
            value
        } else {
            cf.jump(next.bottom())
            Value::None
        }
    }
    #[flow]
    Jump {
        target: Jump,
    } => |jump, cf| {
        cf.jump(jump.target())
    }
    
    #[flow]
    Break {
        target: Jump,
    } => |break, cf| {
        // We want to go into breaking mode.
        cf.break(break.target())
    }
}
#

(obviously mockup code)

glad urchin
sturdy sequoia
#

all of the opcodes (as a repr(u8) enum), all of the bitflags for the different types that each argument can take, etc.

#

and builder structs for every single instruction to make creation nicer

tight glade
sturdy sequoia
#

so Add produces either a Register or a Local

#

and then when the closure is called, it would produce a Value that would automatically be stored in the right place

tight glade
#

Hmmm maybe i just don't have enough context to understand
But typically what is the right place

sturdy sequoia
#

my idea is that it can be done with the one instruction

tight glade
#

But don't we need to know which register or local?

sturdy sequoia
#

that's the annoying part right now

#

with this it could be made much easier

#

I would also write it in a binary format with dynamic length instructions

tight glade
#

I remain confused as to what happens precisely as a consequence of this macro invocation

I can only work with what i imagine is needed for your vm so I'd suggest choosing one of the simplest instructions and writing the non pseudo code version of the macro

sturdy sequoia
#

It would produce:

tight glade
#

Wouldn't dynamic length instructions complicate jumping z lot?

glad urchin
sturdy sequoia
#
  • A builder for each instruction
  • The Instruction enum
  • The Compiler struct
  • The Executor struct
  • An accessor for each instruction
  • The eval method on the executor
glad urchin
#

Maybe even a derive macro

sturdy sequoia
glad urchin
#

Yeah well i don't

#

😎

#

lol jk

#

But I mean I just think it's weird to generate the types as well, idk

#

Unless it would be really hard to generate the types by hand

#

Somehow

tight glade
#

Well i see an interest in grouping all that logic, would the macro be comprehensible by mere mortals? 😁

sturdy sequoia
tight glade
#

πŸ˜­πŸ˜‚

sturdy sequoia
#

but it would make reading the VM code much much easier

tight glade
#

Like how do you plan to handle dynamic length instructions?

sturdy sequoia
tight glade
#

Sounds like you maybe need zero abstraction types to handle that complexity and then combine these lower levels brick to build an system in which you can express the vm

sturdy sequoia
#

it would look like:

| 8-bit  | 8-bit  |     16-bit     | 8-bit  |     16-bit     | 8-bit  |     16-bit     |
|--------|--------|----------------|--------|----------------|--------|----------------|
| OpCode | Flag   |      Arg0      | Flag   |      Arg1      | Flag   |       Dest     |
#

The flag would tell it what the argument is, so if it's a register, a const id, a local id, etc.

tight glade
#

That should reduce the need for a macro then right?

sturdy sequoia
tight glade
#

Maybe simpler macros here can help? πŸ˜…πŸ˜‡

#

Seems like we're moving the complexity around πŸ˜…

sturdy sequoia
#

I mean I could do one macro per instruction and then a big macro that takes all of the other ones in

tight glade
#

πŸ€·β€β™€οΈ

sturdy sequoia
#

like:

instruction! {
  struct Add -> Register | Local {
    lhs: Register | Constant | Local | Global | Param | Capture,
    rhs: Register | Constant | Local | Global | Param | Capture,
  }
}
tight glade
#

That's like the same thing to be honest πŸ˜‚

sturdy sequoia
#

Then I can do something like:

impl Exec for Add<'_> {
  fn exec(self, cf: &mut ControlFlow) -> StrResult<...> {
    ops::add(self.lhs(), self.rhs()
  }
}
glad urchin
#

I think you can still use macros, i just think you don't need to make everything a macro

tight glade
#

Oh!

glad urchin
#

It would be much more readable too

sturdy sequoia
glad urchin
#

Huh?

sturdy sequoia
#

additionally, I also need to specify its output type

#

a derive macro cannot do that

#

an attribute macro could

glad urchin