Performance | Typst | Page 9

sturdy sequoia Apr 23, 2025, 2:14 PM

#

haha yes

#

Gotta start somewhere, the test suite is too large to effectively debug with initially

#

as I go on I'll use that but first I start simple

#

(remember that initially zero test pass, then 1k, then 2k, etc. it's not test-by-test thankfully)

placid pivot Apr 23, 2025, 2:35 PM

#

Good luck krkrkr

sturdy sequoia Apr 23, 2025, 3:00 PM

#

1008 passed, 1624 failed, 0 skipped
It's something

#

https://tenor.com/view/its-something-something-here-you-go-it's-something-its-something-funny-gif-1669161523447633417

Tenor

sly pecan Apr 23, 2025, 3:13 PM

#

Somehow I'm surprised typst has only 2632 tests

cunning wadi Apr 23, 2025, 3:14 PM

#

sly pecan Somehow I'm surprised typst has only 2632 tests

There are a bunch of tests that should probably be multiple

#

But also there are not enough tests

#

(There never are)

sturdy sequoia Apr 23, 2025, 3:24 PM

#

cunning wadi But also there are not enough tests

I am pretty sure that things like inspection are not tested or have very few tests and that's one of the biggest chance with the VM, it works completely differently 🙀

proud sandal Apr 23, 2025, 3:35 PM

#

No tests means you can silently break stuff and no one will know 😈

untold turret Apr 23, 2025, 5:15 PM

#

i didn't know I added 500+ tests for the language server. Adding tests is a heavy work tbh. In that meaning, 3000+ tests is huge. But the giant repo stories like rust and llvm do have much more insane number of tests.

feral imp Apr 23, 2025, 5:28 PM

#

I know you're tired of working on the VM, but the VM Chronicles that you've been posting here, have really been better than Netflix. I'll miss it when vm3d is merged. Hopefully laurmadje reviews it for a good few episodes.

sly pecan Apr 23, 2025, 5:30 PM

#

There'll be a 2 Virtual 4 Machine after

#

Or Virtual M4chine

#

Then Virtual Machine with the V stylized as a roman numeral

#

then VIrtual machine

sturdy sequoia Apr 23, 2025, 5:44 PM

#

As I am debugging I keep printing instructions to debug them, so I finally made them "pretty":

[ 0] AllocateArray { size_hint: 4 }
[ 1] ArrayPush { value: Int(1), span: Span(285158810540467) }
[ 2] ArrayPush { value: Int(2), span: Span(285609892233913) }
[ 3] ArrayPush { value: Int(3), span: Span(286060973927359) }
[ 4] ArrayPush { value: Int(1), span: Span(286512055620805) }
[ 5] Iter { iterable: Stack, range: PointerRange(6, 38), span: Span(284858089411503), content: false }
>>>[ 6] Next { span: Span(283730385177888), pattern: Single(MutAccess { head: 0, segments: [], span: Span(283730385177888) }) }
   [ 7] Eq { lhs: Slot(0), rhs: Int(1), span: Span(290069022724325) }
   [ 8] Conditional { condition: Stack, cond_span: Span(290069022724325), end_if: Pointer(15) }
   [ 9] Scoped { range: PointerRange(10, 14), span: Span(290200588218245), content: false, looping: false }
   >>>[10] AllocateArray { size_hint: 2 }
      [11] ArrayPush { value: Const(0), span: Span(290234654283635) }
      [12] ArrayPush { value: Const(1), span: Span(290241702435095) }
      [13] Join { value: Stack, span: Span(290229955515995) }
<<<[14] Jump { end: Pointer(25), span: Span(290017336280285) }
   [15] Eq { lhs: Slot(0), rhs: Int(2), span: Span(290527152569225) }
   [16] Conditional { condition: Stack, cond_span: Span(290527152569225), end_if: Pointer(23) }
   [17] Scoped { range: PointerRange(18, 22), span: Span(290592935316185), content: false, looping: false }
   >>>[18] AllocateArray { size_hint: 2 }
      [19] ArrayPush { value: Const(2), span: Span(290609968348880) }
      [20] ArrayPush { value: Const(3), span: Span(290613492424610) }
      [21] Join { value: Stack, span: Span(290607618965060) }
<<<[22] Jump { end: Pointer(25), span: Span(290501309347205) }
   [23] Scoped { range: PointerRange(24, 25), span: Span(290743295880665), content: false, looping: false }
   >>>[24] ContinueOp
<<<[25] FlowOp
   [26] Destructure { value: Stack, pattern: Items([Single(MutAccess { head: 1, segments: [], span: Span(289679025010205) }), Single(MutAccess { head: 2, segments: [], span: Span(289735410221885) })], Span(289641434869085)), span: Span(289538061981005) }
   [27] Scoped { range: PointerRange(28, 36), span: Span(295101402866839), content: true, looping: false }
   >>>[28] Join { value: Slot(1), span: Span(295204775754919) }
      [29] Join { value: Const(4), span: Span(295223570825479) }
      [30] Join { value: Const(5), span: Span(295242365896039) }
      [31] Join { value: Const(4), span: Span(295261160966599) }
      [32] Join { value: Slot(2), span: Span(295298751107719) }
      [33] Join { value: Const(4), span: Span(295317546178279) }
      [34] Join { value: Const(6), span: Span(295336341248839) }
      [35] Join { value: Const(4), span: Span(295355136319399) }
<<<[36] Join { value: Stack, span: Span(295101402866839) }
   [37] Jump { end: Pointer(6), span: Span(282828221790996) }
<<<[38] Join { value: Stack, span: Span(282828221790996) }

sturdy sequoia Apr 23, 2025, 5:45 PM

#

feral imp I know you're tired of working on the VM, but the VM Chronicles that you've been...

I think that VM3D might actually stand a chance of being merged

#

Like it's the best VM so far

#

Don't know about performance ||yet||, but in terms of quality for sure

untold turret Apr 23, 2025, 5:45 PM

#

I need a cool bytecode printer here

sturdy sequoia Apr 23, 2025, 5:46 PM

#

sly pecan then VIrtual machine

Ok but like, that's kind of awesome 😂

#

And once we get to episode 995 it'll be just VM

sturdy sequoia Apr 23, 2025, 5:47 PM

#

untold turret I need a cool bytecode printer here

It's not bytecode anymore, I am gonna be exploring multiple repr:

Arc<dyn Instruction>
&dyn Instruction
dyn Fn(...)
a giant enum (I'll have to write it 😭)

#

I am curious which performs better

#

Also, currently constants are indexed in a common array (for the currently compiled code), but I am thinking of just doing deduplication of constants and not storing IDs

untold turret Apr 23, 2025, 5:48 PM

#

why not enum Inst?

sturdy sequoia Apr 23, 2025, 5:48 PM

#

Also also, currently the VM uses a Readable enum for the different types of data (see below) and I might make it have a Const(Value) type.

#[derive(Debug, Clone, Copy, PartialEq, Hash)]
pub enum Readable {
    None,
    Stack,
    Auto,
    Slot(usize), // i.e read a local var
    Take(usize), // i.e take ownership of a local var
    Const(usize),
    Bool(bool),
    Int(i64),
    Float(Scalar),
    Std(usize),
    Math(usize),
}

sturdy sequoia Apr 23, 2025, 5:49 PM

#

untold turret why not `enum Inst`?

that's what I mean with the "giant enum"

untold turret Apr 23, 2025, 5:49 PM

#

oh I ignored it

sturdy sequoia Apr 23, 2025, 5:50 PM

#

for dev, the Arc<dyn Instruction> is by far the easiest

#

But it's probably garbage due to the two levels of indirection

#

also adds complexity in other places

#

Will probably make a macro for declaring the instructions again 🤷‍♂️

proud sandal Apr 23, 2025, 5:50 PM

#

sturdy sequoia But it's probably garbage due to the two levels of indirection

Two levels?

sturdy sequoia Apr 23, 2025, 5:50 PM

#

proud sandal Two levels?

It's a Vec<Arc<dyn ...>>

proud sandal Apr 23, 2025, 5:50 PM

#

Oh because of the dynamic dispatch

#

Oh

sturdy sequoia Apr 23, 2025, 5:51 PM

#

so one level from the vec, then one level to the arg,

#

then yeah actually

#

the dynamic dispatch too lol

proud sandal Apr 23, 2025, 5:51 PM

#

emoji_255

sturdy sequoia Apr 23, 2025, 5:51 PM

#

haha cache goes 💩

#

But the compiler of the new VM is heaps and heaps simpler

proud sandal Apr 23, 2025, 5:52 PM

#

I'll be taking notes when the PR goes live

sturdy sequoia Apr 23, 2025, 5:52 PM

#

just have these magic compiler.flow() calls everywhere which I H-A-T-E but Typst has very peculiar control flow semantics which I can't find a better way of doing in compiled code 🤷‍♂️

#

(it adds a special FlowOp if the previous instruction does not cause control flow)

#

and many instructions can have control flow (which wasn't the case with the old VM where there was only FlowOp)

#

so it might be a bit faster

untold turret Apr 23, 2025, 5:58 PM

#

sturdy sequoia But it's probably garbage due to the two levels of indirection

using arc dyn this doesn't look like to introduce a huge penalty on common PC, because cpus optimize and precisely predict such indirections. iot devices may hate it but this is not a first goal. but whether to use depends on whether it helps to build a full vm3d quickly.

sturdy sequoia Apr 23, 2025, 6:00 PM

#

untold turret using arc dyn this doesn't look like to introduce a huge penalty on common PC, b...

That's why I am planning on testing all the reprs, to measure it on both my MPB and my Desktop PC

untold turret Apr 23, 2025, 6:00 PM

#

this is actual the best plan.

#

😄

proud sandal Apr 23, 2025, 6:07 PM

#

could also try a mixture of both, having hot paths without indirection as enum members with an "unlikely" member that uses dyn Trait

#

which one is which is best found out on real documents

sturdy sequoia Apr 23, 2025, 6:34 PM

#

Set & Show rules are in!

sturdy sequoia Apr 23, 2025, 6:34 PM

#

proud sandal which one is which is best found out on real documents

that's actually a very good idea! Especially with "bigger" instructions

#

(i.e ones where the struct is big)

#

1831 passed, 801 failed, 0 skipped

#

https://tenor.com/view/im-almost-there-pinocchio-white-lies-liar-on-my-way-gif-22303615

Tenor

#

I'll add all of the math stuff now before it keeps bugging me lol

feral imp Apr 23, 2025, 6:39 PM

#

sturdy sequoia I'll add all of the math stuff now before it keeps bugging me lol

Do that! We merge at MIDNIGHT!

sturdy sequoia Apr 23, 2025, 8:42 PM

#

1836 passed, 796 failed, 0 skipped
I am very dissapointed in how few test adding math fixed

#

That means that most of math is broken lol

#

1900 passed, 732 failed, 0 skipped
Fixed one bug, just 732 more to go?

#

Should probably add context next

#

1983 passed, 649 failed, 0 skipped

#

Indeed, it helped

feral imp Apr 23, 2025, 8:52 PM

#

Oh boy. Midnight might have been too ambitious.

sturdy sequoia Apr 23, 2025, 8:53 PM

#

feral imp Oh boy. Midnight might have been too ambitious.

yes 😦

sturdy sequoia Apr 23, 2025, 8:53 PM

#

feral imp Oh boy. Midnight might have been too ambitious.

I mean, if every 10 minutes I make 50 more tests pass, then I just need 1h30 to finish

#

lol

sly pecan Apr 23, 2025, 8:54 PM

#

sturdy sequoia yes 😦

Didn't say which midnight

#

🧠

sturdy sequoia Apr 23, 2025, 8:55 PM

#

bigBrain

sly pecan Apr 23, 2025, 9:35 PM

#

@sturdy sequoia I'm curious, approximately how much of execution time is spent comparing hashes?

#

Not talking the vm specifically

#

Iirc a 128 bit hash is used. I'm wondering if there would be a minor performance gain in only comparing the first 64 bits unless a collision is detected

sturdy sequoia Apr 23, 2025, 9:42 PM

#

sly pecan Iirc a 128 bit hash is used. I'm wondering if there would be a minor performance...

There would likely already be a gain from just hashing to 64-bit instead of 128-bit

sly pecan Apr 23, 2025, 9:42 PM

#

sturdy sequoia There would likely already be a gain from just hashing to 64-bit instead of 128-...

You would still need to either hash to 128 bit or two different 64 bit hashes

#

I'm specifically talking about the comparison aspect

sturdy sequoia Apr 23, 2025, 9:43 PM

#

then two u64 would likely be slower

#

because SIMD

sly pecan Apr 23, 2025, 9:44 PM

#

You can treat a 128 bit hash as two 64 bit ones though. You would almost never have to check the other half of it

#

Man my phone's auto completion is driving me nuts

#

Does that make sense?

sturdy sequoia Apr 23, 2025, 9:53 PM

#

sly pecan Does that make sense?

Yes but that would be three instructions vs just one

#

so I doubt it would actually be faster

#

and since it's 128-bit, there should be no clock division on x86 (don't know on arm)

#

(and even the clock division would only be on Intel CPUs)

#

And actually that's only valid for AVX512 so it doesn't matter

#

I bet direct comp at 128-bit is faster

sly pecan Apr 23, 2025, 9:55 PM

#

sturdy sequoia Yes but that would be three instructions vs just one

Simd instructions are slower no?

sly pecan Apr 23, 2025, 9:55 PM

#

sturdy sequoia Yes but that would be three instructions vs just one

Why 3?

cunning wadi Apr 23, 2025, 9:58 PM

#

sly pecan You can treat a 128 bit hash as two 64 bit ones though. You would almost never h...

that's basically what the compiler already does when comparing 128 integers

#

and well, it still checks the other half

#

but not doing this would be slower because conditionals are slower than just a simple xor and or

sturdy sequoia Apr 23, 2025, 10:02 PM

#

sly pecan Simd instructions are slower no?

not really no

#

not integer 128-bit, they are pretty darn optimized

#

but best way would be direct compare

sly pecan Apr 23, 2025, 10:05 PM

#

😭

cunning wadi Apr 23, 2025, 10:08 PM

#

cunning wadi that's basically what the compiler already does when comparing 128 integers

128 bit integers*

#

I'm keeping this error

sly pecan Apr 23, 2025, 10:16 PM

#

cunning wadi 128 bit integers*

Nah, just VLIW

left night Apr 24, 2025, 6:05 AM

#

sturdy sequoia There would likely already be a gain from just hashing to 64-bit instead of 128-...

that would result in too high of a collision probability

sturdy sequoia Apr 24, 2025, 7:39 AM

#

left night that would result in too high of a collision probability

I know but I meant that’s where the fais would be

#

Gains *

sturdy sequoia Apr 24, 2025, 8:34 AM

#

@left night In the new VM, I would like to dissalow the following:

#

Are you okay with that?

#

I can support it but it gets more complicated in the compiler & slower

#

?r

#import "@preview/codly:1.3.0": codly-init

#{codly-init = 5}

#codly-init

delicate fossilBOT Apr 24, 2025, 8:35 AM

#

sturdy sequoia Apr 24, 2025, 8:36 AM

#

Because imo this is very cursed

#

and imports & includes are already a major pain point of the VM

#

(note that I don't want to prevent shadowing, just overriding)

left night Apr 24, 2025, 9:15 AM

#

so basically you want imports to become const?

sturdy sequoia Apr 24, 2025, 9:15 AM

#

Yes

untold turret Apr 24, 2025, 9:16 AM

#

if you could i wish breaks continues and returns won't be able on any positions of expressions.

left night Apr 24, 2025, 9:16 AM

#

sturdy sequoia Are you okay with that?

I can't give a clear yes now, but I could imagine it

sturdy sequoia Apr 24, 2025, 9:18 AM

#

untold turret if you could i wish breaks continues and returns won't be able on any positions ...

I agree with you, but I am trying to break as little as possible, the only breaking change (so far) would be:

glob imports only work on "known" imports (i.e an import of just a string #import "hello.typ": * or an import based on an global variable #import calc: * but not dynamic imports like: #import "a" + "/b.typ": *), all other imports would need to be named
imported values would be constants

#

(the first is something that was discussed back in the VM 1.0 days already)

left night Apr 24, 2025, 9:21 AM

#

an import based on an global variable
and the global variable needs to be const as well?

#

is that the reason why imports need to be const?

sturdy sequoia Apr 24, 2025, 9:21 AM

#

left night > an import based on an global variable and the global variable needs to be cons...

it can be another imported value

left night Apr 24, 2025, 9:21 AM

#

so that import std.calc; import calc: * works?

sturdy sequoia Apr 24, 2025, 9:21 AM

#

left night so that `import std.calc; import calc: *` works?

yep basically

#

otherwise I need to keep track whether something has been modified

#

which increases complexity a ton

left night Apr 24, 2025, 9:22 AM

#

calc.push(..)

sturdy sequoia Apr 24, 2025, 9:22 AM

#

https://tenor.com/view/scares-me-i-fear-no-man-gif-21507033

Tenor

sturdy sequoia Apr 24, 2025, 9:22 AM

#

left night `calc.push(..)`

is this intended behavior? 😭

left night Apr 24, 2025, 9:23 AM

#

I mean, your approach in making it const fixes it

sturdy sequoia Apr 24, 2025, 9:23 AM

#

left night I mean, your approach in making it const fixes it

"fixes" it, I don't know if it needs fixing

#

I just find the idea of modifying imported variable potentially bad, but mostly it makes my life harder lol

untold turret Apr 24, 2025, 9:24 AM

#

What is a calc.push? Or it simply means a mutable call to some global array.

sturdy sequoia Apr 24, 2025, 9:24 AM

#

I was scared for a second that this would work lol

left night Apr 24, 2025, 9:24 AM

#

untold turret What is a `calc.push`? Or it simply means a mutable call to some global array.

It was just meant to illustrate that it's hard to tell whether a variables is potentially mutated

sturdy sequoia Apr 24, 2025, 9:25 AM

#

This would no longer be allowed essentially

untold turret Apr 24, 2025, 9:25 AM

#

do you value to introduce a const?

#

const let

sturdy sequoia Apr 24, 2025, 9:25 AM

#

No, in this case, imports would be constant

#

We could introduce consts too, but I am specifically talking about imports being enforced const for now

#

(again, trying to minimize breaking stuff to make merging easier)

left night Apr 24, 2025, 9:26 AM

#

not a fan of const let personally

sturdy sequoia Apr 24, 2025, 9:26 AM

#

I would prefer then const x = ... instead of const let but not sure it's needed for typst

left night Apr 24, 2025, 9:27 AM

#

I wish we didn't have mutable methods

#

then it would be trivial to analyze statically whether a variable will be mutated

sturdy sequoia Apr 24, 2025, 9:27 AM

#

left night I wish we didn't have mutable methods

same, the whole MutAccess thing is so painful

#

(it's the name in the VM code)

left night Apr 24, 2025, 9:28 AM

#

though if there is no mutable method name called on a variable and no assignment, we still know for sure it won't be mutated statically, right?

#

that could be useful for optimizations

untold turret Apr 24, 2025, 9:28 AM

#

I am more a fan of val/var, so that all variables are on the "4-indent columns", but I didn't shopping the syntax but would like to encourage people use const let instead of mutable let.

sturdy sequoia Apr 24, 2025, 9:29 AM

#

left night though if there is no mutable method name called on a variable and no assignment...

yes

#

currently I am not using this property

#

but I don't know how to make use of that property (yet)

untold turret Apr 24, 2025, 9:49 AM

#

sturdy sequoia but I don't know *how* to make use of that property (yet)

I guess you will love constant function references, like:

#let f(x) = x
#(f = x => x + 1) // in current typst, function definitions are unfortunately mutable

sturdy sequoia Apr 24, 2025, 1:40 PM

#

untold turret I guess you will love constant function references, like: ```rs #let f(x) = x #(...

Currently they would still be mutable

#

I agree with you overall, but again the fewer breaking changes the VM introduces the more likely we are that Laurenz will merge 🫣

untold turret Apr 24, 2025, 2:09 PM

#

sturdy sequoia I agree with you overall, but again the fewer breaking changes the VM introduces...

No, I try to explain what laurmaedje meant. You could check that the references are constant and optimize accordingly.

though if there is no mutable method name called on a variable and no assignment, we still know for sure it won't be mutated statically, right? that could be useful for optimizations

sturdy sequoia Apr 24, 2025, 2:13 PM

#

untold turret No, I try to explain what laurmaedje meant. You could check that the references ...

Yes but what optimizations could we do ?

untold turret Apr 24, 2025, 2:15 PM

#

🐱 For example, the ultimate inline call optmizations. Not only reduce overhead of comemo, but also elmininate cost to boxing and unboxing arguments.

#

You can also prepare static references of these variables when compiling to avoid looking up scopes repeatedly.

proud sandal Apr 24, 2025, 2:16 PM

#

Most values are already boxed, but this could help reduce copies for arrays and maps with only one reference

untold turret Apr 24, 2025, 2:18 PM

#

proud sandal Most values are already boxed, but this could help reduce copies for arrays and ...

I believe most values are not boxed. enum Value lightly wraps integers, floats and many things, and only some typst types are wrapped in a Value::Dyn(Arc<dyn ..>)

proud sandal Apr 24, 2025, 2:45 PM

#

Ah right this is not a register VM

#

I forgot he's implementing a stack vm I was thinking if he had registers for passing arguments then it would be fine mostly

sturdy sequoia Apr 24, 2025, 5:09 PM

#

proud sandal I forgot he's implementing a stack vm I was thinking if he had registers for pas...

I do plan on redoing the argument system later on by instead passing the last n stack entries

#

But for args it mostly uses move semantics only copying local vars

#

The problem is that named variables always make things difficult, currently I have a MakeArgs instruction that collects the argument into one stack entry of type Value::Args but I am thinking of revamping it by "integrating" it into the FuncCall instruction instead, should make my life a bit easier 😆

#

And it removes an otherwise useless instruction

#

Ok, I removed the MarkArgs because I thought it was redundant, and removing it should improve performance (but I have not measured obviously)

#

Current state:

2003 passed, 629 failed, 0 skipped

#

Ok, last time I did not find a way of doing parallel module compilation

#

but this time I did!

#

I think that might unlock the performance to compensate for the initial compilation!

sturdy sequoia Apr 24, 2025, 6:05 PM

#

Ok, I am really proud that it actually works, it's a nightmare with lifetimes, but essentially by carrying a rayon::Scope along the entire chain, I can sort of bind the lifetime of the Engine to the lifetime of the scope and pass it around nicely with no lifetime issue, no unsafe nothing, and then I just need to collect the loaded module at the end of compilation (or when they're needed) and collect the sinks at the same time!

#

Brilliant I love it!

feral imp Apr 24, 2025, 6:20 PM

#

WE MERGE AT MIDNIGHT! 🤣

sturdy sequoia Apr 24, 2025, 7:04 PM

#

Forgive me for I have sinned 💀

#

I told you impors were tricky 😄

proud sandal Apr 24, 2025, 7:06 PM

#

what the

sturdy sequoia Apr 24, 2025, 7:06 PM

#

proud sandal what the

ASK LAURENZ IT'S HIS CODE!

#

I just intended it some more

#

Low key imports are responsible for like 1.5k LOC

proud sandal Apr 24, 2025, 7:07 PM

#

dedented some of it too

#

for good measure

sturdy sequoia Apr 24, 2025, 7:07 PM

#

My goals of a small VM were crushed by f-ing imports lol

sturdy sequoia Apr 24, 2025, 7:07 PM

#

proud sandal for good measure

I have line wrap enabled :-p

#

it's because I often write text in VSCode, so having line wrap helps make it more manageable

feral imp Apr 24, 2025, 7:12 PM

#

sturdy sequoia Forgive me for I have sinned 💀

Comical.

sturdy sequoia Apr 24, 2025, 7:12 PM

#

feral imp Comical.

Agreed lol

#

I just can't be bothered to make it clean for now

#

this is the code required to turn deferred constants (i.e constants from imported values) into actual "resolved" constants when compilation ends

sturdy sequoia Apr 24, 2025, 7:42 PM

#

Static (parallel) imports work

#

🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀 🚀

#

@left night I think that just for this, the VM might be worth it 😎

#

Added the compile timings

#

Note that this isn't a complete trace as it errors out, but it shows that it's using one thread per file! 🚀

left night Apr 24, 2025, 8:21 PM

#

very cool

glad urchin Apr 24, 2025, 10:23 PM

#

sturdy sequoia Note that this isn't a complete trace as it errors out, but it shows that it's u...

you might be cooking 👀

#

i do wonder how much % of a performance boost this might bring in practice, but i suppose this can be significant for eval-heavy crates

sturdy sequoia Apr 24, 2025, 10:27 PM

#

glad urchin i do wonder how much % of a performance boost this might bring in practice, but ...

Well the main gain here is that we know eval-ing bytecode is faster (as per VM1 & VM2) but we know that compiling to bytecode was a significant enough slowdown to basically kill any performance boost from eval in a real world doc

#

so my hope is that by parallelising the compilation & the evaluation of dependencies of the "main" file, we might run into a scenario where the compilation is short enough to have a significant overall gain

#

the neat thing is: it's very easy to disable parallelisation if I want to to test its realy impact (and I don't mean -j 1, I mean in the compiler itself)

#

It gives a lot of room for exploration of:

#

Instruction repr (enum vs dyn Trait vs mix of both)
Parallelization of compile & eval
Instruction architecture (large instruction vs Bytecode by comparing VM3D to VM2)

#

Stuff like that

#

it's a more "interesting" VM than the previous two

#

But imports are still an ungodly mess unfortunately 😐

#

Simply because there are two import paths: static (i.e I can know when compiling) and dynamic (i.e I can't know when compiling and resolve during eval)

#

Oh no, the VM is finding bugs in packages again 😭 (namely codly)

sturdy sequoia Apr 24, 2025, 11:16 PM

#

2238 passed, 394 failed, 0 skipped

#

Okay, that's enough for tonight

#

Getting there!

#

Just one final trace for tonight showing how it actually evals when it goes far enough!

#

And you can see how short the compile are compared to the eval for those that don't have huge dependencies (like the third green box from the top in the middle)

#

glob imports unfortunately greatly decrease this effictiveness since it requires waiting on the module to be evaluated (to be able to declare all of the imports)

#

but even in that case, we can benefit from parallelism

#

just glob imports = bad, good to know for codly, cetz, etc. if this thing ever gets merged

glad urchin Apr 24, 2025, 11:20 PM

#

sturdy sequoia glob imports unfortunately greatly decrease this effictiveness since it requires...

maybe you can go as far as possible before depending on a (potentially) imported variable?

sturdy sequoia Apr 24, 2025, 11:21 PM

#

glad urchin maybe you can go as far as possible before depending on a (potentially) imported...

that's exactly what I do, I actually "resolve" named imports once compilation is done

#

but for glob imports, I need to create the list of variables to declare them

#

and since I just have a * instead of a list of names

#

I have to busy wait for them to be declared 🤷‍♂️

glad urchin Apr 24, 2025, 11:22 PM

#

sturdy sequoia but for glob imports, I need to create the list of variables to declare them

not if you do it for every unknown variable

sturdy sequoia Apr 24, 2025, 11:22 PM

#

so:

#import "codly.typ": * => slowest
#import "codly.typ" => fast
#import "codly.typ": a, b, c => fast
#include "codly.typ" => fastest

sturdy sequoia Apr 24, 2025, 11:23 PM

#

glad urchin not if you do it for every unknown variable

Yes but then it either "delays" the error at the end, or just slightly delays when I need to wait for the import to be done (i.e when I encounter an unknown variable)

#

so I don't think the gains are that huge and we still benefit from parallel compilation mostly

#

one place where we always 100% benefit from parallel compilation is #include "file.typ" since (if the file is constant), it always resolves to a deferred constant!

#

Which I think is amazing, that means that a project like my thesis that does one include per chapter gets parallel eval "for free"

#

So if we assume that my 10 chapters have the same eval time, that divides the total eval time by > 10x (since I assume compilation + eval is faster than just eval which was the case for VM1 and VM2)

#

I am kind of hoping that for larger projects that leads to a 4M decrease of eval time where 4 is the fixed speedup from using the VM and M is the number of chapter files

#

note that it's eval time not compile time

glad urchin Apr 24, 2025, 11:29 PM

#

yeah thats one of the reasons why i was wondering about real %

sturdy sequoia Apr 24, 2025, 11:29 PM

#

glad urchin yeah thats one of the reasons why i was wondering about real %

Unfortunately that's basically impossible to measure :/

glad urchin Apr 24, 2025, 11:29 PM

#

layout can often take much much more time, especially with state and stuff

sturdy sequoia Apr 24, 2025, 11:30 PM

#

Yes, but think about it this way: every time you re-eval part of the doc to update things (due to state and stuff), you get a significant speedup

#

because the compiled bytecode is shared

#

so larger documents with many introspection loop iterations should benefit most (and eval heavy docs in general too)

#

As well as in watch mode: any closure, etc. should re-use the previous result

#

(that is not implemented yet however)

glad urchin Apr 24, 2025, 11:34 PM

#

sturdy sequoia so larger documents with many introspection loop iterations should benefit most ...

yeah i think this can help, no doubt. Let's see how it goes

#

The parallelization factor will definitely be critical here

sturdy sequoia Apr 25, 2025, 5:11 PM

#

The Thesis™️ compiles!

#

There are still bugs

#

but IT COMPILES

proud sandal Apr 25, 2025, 5:12 PM

#

Woooo

sturdy sequoia Apr 25, 2025, 5:14 PM

#

A flat 15% improvement on the thesis with zero optimizations so far

#

(cold compile)

#

Memory usage is about 10% lower too!

#

About 10% incremental too so far

#

(and again that's on masterproef which is not an eval-heavy doc)

#

Time to try the raytracer

#

well it doesn't compile (yet)

sly pecan Apr 25, 2025, 6:13 PM

#

sturdy sequoia A flat 15% improvement on the thesis with **zero** optimizations so far

Does it also skip 15% of the document? 🧠

sturdy sequoia Apr 25, 2025, 6:13 PM

#

sly pecan Does it also skip 15% of the document? 🧠

No the doc seems correct to my very trained eye

proud sandal Apr 25, 2025, 6:17 PM

#

Dherse checking his features:

#

👁️ 👄 👁️

feral imp Apr 25, 2025, 6:50 PM

#

Alright, I'm not going to say, we merge at midnight, because you're too close to the finish line. But I will say, this is epic, and I'm glad we get to follow along your vm experiments.

I'm glad this time around, you're seeing some improvements.

proud sandal Apr 25, 2025, 6:52 PM

#

We merge at midnight

sturdy sequoia Apr 25, 2025, 9:29 PM

#

proud sandal 👁️ 👄 👁️

I checked it again and it looks correct to me, I am sure there are some minute differences most likely, but none that are significant at least

#

Even the only eval-heavy part (which is a diagram) looks good

#

2274 passed, 358 failed, 0 skipped
Tests are slowly improving but a lot of the remaining ones are span mismatches because errors have the wrong span, just takes tedious fixing unfortunately

sturdy sequoia Apr 26, 2025, 11:09 AM

#

I also can't wait to test all of the different ideas I have to further improve performance, what I find reassuring is that compiling + eval is already quite a bit faster than old eval, but it can definitely be better

sturdy sequoia Apr 26, 2025, 1:34 PM

#

2281 passed, 351 failed, 0 skipped
Progress!

sly pecan Apr 26, 2025, 2:42 PM

#

sturdy sequoia > 2281 passed, 351 failed, 0 skipped Progress!

We merge at midnight

lofty turret Apr 26, 2025, 7:39 PM

#

glad urchin Apr 26, 2025, 8:07 PM

#

sturdy sequoia > 2281 passed, 351 failed, 0 skipped Progress!

remember that you're only done when you can run elembic's tests 😎

#

(please do though cuz im very curious about the potential speedup there, it is very eval-heavy lol, though probs not as much as cetz)

feral imp Apr 26, 2025, 8:11 PM

#

As always, I'm super duper curious about memory usage reduction... Cuz better performance is great, but if the memory is reduced greatly.. that's even better, even if perf gains are small.

untold turret Apr 27, 2025, 5:50 AM

#

glad urchin remember that you're only done when you can run elembic's tests 😎

do you have plan to make or are you interested in making "customed element" PRs that integrate elembic into rust typst 😀?

glad urchin Apr 27, 2025, 5:54 AM

#

untold turret do you have plan to make or are you interested in making "customed element" PRs ...

i mean, there is definitely a plan to bring its most important functionality (custom elements and types) to the core, no question about that

#

but whether i will be the one implementing it, or when , or in which way, that's not yet defined

#

what i can say though is that ive been working on elembic again

#

i want to publish it to the package manager soon (within 1-2 weeks)

#

so that will be interesting to see

#

will be of great help to see how far we can go with the idea at least, and see what would we want to actually bring to the compiler

untold turret Apr 27, 2025, 7:07 AM

#

glad urchin i mean, there is definitely a plan to bring its most important functionality (cu...

laurmaedje said it would give preasure to maintain a dashboard of plans, but I only would like to have some brief lists see how html export, customed elements, and other things on roadmap will go concretely. would be great even if we are updating the plan dashboards quarterly or even annually.

rust wg-async has some good list:
https://rust-lang.github.io/wg-async/vision/roadmap.html#overview

📅 Roadmap - wg-async

untold turret Apr 27, 2025, 7:25 AM

#

This extends too much probably. I only wondered when I saw elembic again: how many steps we should take to achieve custom elements, and how many steps we have achieved.

left night Apr 27, 2025, 7:39 AM

#

untold turret This extends too much probably. I only wondered when I saw elembic again: how ma...

There are still a few design blockers that need to be resolved

#

I'll post more about that in https://discord.com/channels/1054443721975922748/1175895383600275516

sturdy sequoia Apr 27, 2025, 8:28 AM

#

2303 passed, 329 failed, 0 skipped
Fixing the issue I noted in #discussions (which was a VM bug) improved tests by 22!

sturdy sequoia Apr 27, 2025, 8:28 AM

#

feral imp As always, I'm super duper curious about memory usage reduction... Cuz better pe...

It's about the same TBH

#

Just wanted to share some small performance results on masterproef:

-j 1 gives 15% improved cold compile times
-j 4 gives 15% improved cold compile times
-j 8 gives 12% improved cold compile times
What I find strange is that I expected the speedup to increase with higher thread count due to parallel compilation
(there were run with hyperfine)

#

To be fair, the difference in initial eval time (before layout) is:

-j 1: 152 ms
-j 8: 135 ms
Which is a surprisingly small improvement judging by the level of parallelism

#

It tells me something isn't going right with parallelism or the "joining" at the end of the compiler is very slow

#

could be both tbh

proud sandal Apr 27, 2025, 8:50 AM

#

Oh no it's going to be vm2.0 all over again

#

/jk

sturdy sequoia Apr 27, 2025, 8:50 AM

#

proud sandal Oh no it's going to be vm2.0 all over again

I mean the gains eval on a low-eval doc are 15%

#

so there's definitely hope for it

#

but I don't understand why parallel eval doesn't give a better return

sturdy sequoia Apr 27, 2025, 9:25 AM

#

Ok, I did some improvement to the joining logic which gives a 15% performance increase at -j 8 too!

sturdy sequoia Apr 27, 2025, 10:04 AM

#

To my surprise, using a big enum does not really improve performance 😮

sturdy sequoia Apr 27, 2025, 10:30 AM

#

Like after some layout optimization I get about 16% performance improvement vs 15% previously

#

so the gains are tiny

untold turret Apr 27, 2025, 11:05 AM

#

In most computers, the hardware will optimize virtual calls heavily and give similar overhead as making a big switch, i.e. both of them cost like a single branch instruction with knowing the branch target.

sturdy sequoia Apr 27, 2025, 11:06 AM

#

untold turret In most computers, the hardware will optimize virtual calls heavily and give sim...

Yes, I managed to get it to be slightly better but really like 1% of the overall exec time

#

What is still missing optimizations wise:

using PicoStr for arg & field names (super easy)
using Cow for values borrowed from the VM instead of systematically cloning (tedious but easy)
improving the repr of several instructions (i.e making them smol) and of the CompiledSource (hard-ish)
adding a new type of constant (globals) which are just &'static Value (easy)
adding the disabling of memoization for small function (especially now that eval is so cheap) (tedious)
adding the Readable::Move for values that are at the end of their lifetime (easy most code is there)

#

My goals are: a further 30-50% reduction in compile time (i.e AST -> Vec<Instructions>) and a further 30% decrease in eval time

#

at least those are my goals

#

My hope is that a doc like masterproef can reduce by 25% total compilation time (i.e complete runtime) and an eval heavy doc could be 2-4x faster hopefully. Don't have a sense for how much time it should reduce in incremental so will test later

untold turret Apr 27, 2025, 11:14 AM

#

adding the disabling of memoization for small function
Is that the reason you started vm3d? I was not actually getting that we gave up and we are now making VM great again.

sturdy sequoia Apr 27, 2025, 12:07 PM

#

untold turret > adding the disabling of memoization for small function Is that the reason you ...

Well my thinking is that if we saw a slight performance decrease (bad) for huge memory decrease (good), then with the VM being heaps faster, we might be able to get the huge memory decrease with no performance decrease hyperthink

sturdy sequoia Apr 27, 2025, 12:22 PM

#

I also compared main with my fork on my Desktop compute (7950x3D): (time ± stddev)

-j 1: 2% ± 2% faster
-j 4: 9% ± 3% faster
-j 8: 8% ± 3% faster
So the gains are much higher on my M4 Max MBP

#

it's also interesting that typst is much faster on my MBP than my Desktop

#

thonk2

#

I have a feeling it's a memory latency thing

sturdy sequoia Apr 27, 2025, 1:02 PM

#

The more I compare my MBP to my desktop the more I want to buy an M3 Ultra Mac Studio

feral imp Apr 27, 2025, 1:34 PM

#

Alright. Are you comparing linux to macos, or do you have stinky windows on your desktop? Cuz, I can tell you... Windows sucks.

sturdy sequoia Apr 27, 2025, 1:36 PM

#

feral imp Alright. Are you comparing linux to macos, or do you have stinky windows on your...

Windows 😭

feral imp Apr 27, 2025, 1:37 PM

#

Stop... Stop comparing, and go back to coding. Jesus Christ.... I have an airfrier in my kitchen that can beat any desktop with Windows on it..

sturdy sequoia Apr 27, 2025, 1:38 PM

#

feral imp Stop... Stop comparing, and go back to coding. Jesus Christ.... I have an airfri...

I'm doing both angryeyes

#

And building is so slooooooooooooooooooooooooooooooooooooooooooooooow

#

but I think it's more of a Windows issue than a my PC is slow issue

sly pecan Apr 27, 2025, 1:41 PM

#

feral imp Alright. Are you comparing linux to macos, or do you have stinky windows on your...

You know what sucks? Linux drivers 😛

feral imp Apr 27, 2025, 1:42 PM

#

sly pecan You know what sucks? Linux drivers 😛

That's why we go Mac now. It's done. The war is over.

sly pecan Apr 27, 2025, 1:43 PM

#

🙅‍♂️

sturdy sequoia Apr 27, 2025, 2:00 PM

#

Ok, so interning of arg names (at -j 8):

7950X3D: 8% -> 13%
MBP: 9% -> 12%
Note that the stddev is often ~3% so this is barely outside of the margin of error :/

#

At -j 1 on the MBP the gains are negligeable 15% -> 16% and on the desktop: almost none

#

But this is all on masterproef, gotta try on notes.typ, PgSuper's test doc, and raytracer.typ then do in-depth comparison

#

With enough simples to see the actual distributions

lost meteor Apr 28, 2025, 5:26 PM

#

sturdy sequoia I also compared main with my fork on my Desktop compute (7950x3D): (time ± stdd...

Might be a better data prefetching pipeline?

sturdy sequoia Apr 28, 2025, 5:31 PM

#

lost meteor Might be a better data prefetching pipeline?

Yes but then the amd cpu should be better no?

#

Like typst is almost 50% on my MBP

#

And even more so with the VM

sly pecan Apr 28, 2025, 5:35 PM

#

sturdy sequoia Ok, so interning of arg names (at `-j 8`): - 7950X3D: 8% -> 13% - MBP: 9% -> 12%...

Did you force it to run on the 3d cache ccd?

sturdy sequoia Apr 29, 2025, 5:16 PM

#

sly pecan Did you force it to run on the 3d cache ccd?

I didn't do any pinning 💀

#

Too lazy, especially on Windows

untold turret May 9, 2025, 4:47 AM

#

@sturdy sequoia I have a problem on profiling tinymist using typst-timing. cli resolves spans after each compilation, but tinymist cannot do with the same way because it runs typst tasks concurrently. Speaking two worlds X and Y run at the same time and have different content (revision) of a same source file, where a span of the same raw number may point to two physical file location. The recorded span from X and Y of the source file will be mixed together and I cannot resolve them by the correct world. Any idea? Btw, I have applied typst-timing to the analyzers and it works really well. typst-ide could also do that I think.

sly pecan Jun 15, 2025, 4:29 PM

#

https://github.com/andylokandy/smallbox is this something that could be useful for typst?

#

oh I see it was proposed once but was problematic because of unsafe use

sturdy sequoia Jun 19, 2025, 8:25 AM

#

untold turret <@130737672951037952> I have a problem on profiling tinymist using `typst-timing...

We could add a thread ID to the spans, that should resolve it

#

right?

untold turret Jun 19, 2025, 8:26 AM

#

sturdy sequoia We could add a thread ID to the spans, that should resolve it

maybe a compile task ID, because typst's layout can be dispatched to rayon threads.

sturdy sequoia Jun 19, 2025, 8:27 AM

#

untold turret maybe a compile task ID, because typst's layout can be dispatched to rayon threa...

Right, that might make sense

#

We could add a thread_local! for a task ID in Typst

#

But right now we do use thread IDs actually

#

Since we can see multithreaded traces from Typst rayon dispatch

#

so I am not sure what is the source of this issue in tinymist

untold turret Jun 19, 2025, 8:30 AM

#

sturdy sequoia so I am not sure what is the source of this issue in tinymist

I run multiple worlds that map same FileId to different content, which make span resolving problematic.

sturdy sequoia Jun 19, 2025, 8:31 AM

#

untold turret I run multiple worlds that map same FileId to different content, which make span...

Wow

#

Why?

#

But we could add a task_id in addition to the thread, but I don't think that would really solve the issue

#

I guess we could do one recording per "task"

untold turret Jun 19, 2025, 8:32 AM

#

For example, I compile thousands of docstrings in parallel and every docstring's main has file id main.typ

untold turret Jun 19, 2025, 8:37 AM

#

sturdy sequoia But we could add a `task_id` in addition to the thread, but I don't think that w...

I actual don't know how to introduce a "task_id" for typst::compile.

#

My current solution is simply try resolving otherwise giving up. This is not good but does almost work.

sly pecan Jun 26, 2025, 10:39 AM

#

@sturdy sequoia surely there's a pre-existing issue for https://github.com/typst/typst/issues/6508, but I can't find it. You would know I guess

sturdy sequoia Jun 26, 2025, 10:41 AM

#

sly pecan <@130737672951037952> surely there's a pre-existing issue for <https://github.co...

I don’t think there is, it has been discussed ad nauseum on the Discord, but I don’t think an issue has ever been opened. I would be very curious if they have a good example document for this. Because I couldn’t show a huge advantage overall. An advantage? Yes, especially on memory usage but not much on performance

#

#1176509648707256370 message

sly pecan Jun 26, 2025, 10:42 AM

#

There are some benchmarks there, but I didn't read it because LLM walls of text make me want to rip my eyes out

sturdy sequoia Jun 26, 2025, 10:43 AM

#

Is it an LLM wall of text?

lofty turret Jun 26, 2025, 10:43 AM

#

sly pecan There are some benchmarks there, but I didn't read it because LLM walls of text ...

Oh, I personally didn't get that impression (is it the en-dashes?)
-# seems like i need to re-tune my AI detector

sly pecan Jun 26, 2025, 10:44 AM

#

sturdy sequoia Is it an LLM wall of text?

Like 110%

sturdy sequoia Jun 26, 2025, 10:44 AM

#

Ok fair

#

But like all he shows in the benchmark seems to be amortisation of bigger function calls?

#

Is it something we didn’t already know?

sly pecan Jun 26, 2025, 10:47 AM

#

I think the main complaint here is memory usage. Not sure exactly what they claimed speedup is

sly pecan Jun 26, 2025, 10:49 AM

#

lofty turret Oh, I personally didn't get that impression (is it the en-dashes?) -# seems like...

I think it's been edited in part after, so it's not the raw response, but if it's not LLM then OP has gone to the chatgpt school of writing ™️

#

Anyway I could swear there was an issue, but I guess it's just been discussed to death here on discord

lofty turret Jun 26, 2025, 10:53 AM

#

sly pecan Anyway I could swear there was an issue, but I guess it's just been discussed to...

Maybe this issue? Though in this case it only applies to plugins: https://github.com/typst/typst/issues/2309

left night Jun 26, 2025, 11:02 AM

#

I didn't get the impression that it is written by LLM tbh

#

Either way, they did quite a bit of measurements, which I can appreciate

sly pecan Jun 26, 2025, 11:13 AM

#

I'm like McCarthy but for LLMs instead of communism

feral imp Jun 26, 2025, 11:35 AM

#

sly pecan <@130737672951037952> surely there's a pre-existing issue for <https://github.co...

That's one raaaaaad issue.

left night Jul 17, 2025, 2:28 PM

#

As some of you have also experienced, it's quite tricky to benchmark the Typst compiler because repeated runs have high variance in wall clock times (sometimes between 5-10%). This makes it really tricky to make any kind of statements about performance improvements or regressions as very few changes have such drastic impact.

This was really quite frustrating. So I looked at what the Rust project does to benchmark rustc and tried to replicate it by

renting a dedicated Hetzner server where we have 100% control over the OS
tweaking it to reduce noise as much as possible (no ASLR, no hyper-threading, core isolation, dylib interposition to override getrandom with a deterministic version); most of these tricks directly follow what Rust does
built some infrastructure to perform benchmarks and compare measurements (I've had some fun by building the comparisons via Typst-powered server-side rendering)

It's all still quite rough, but the results are very promising. Below, there are two runs of the exact same binary. As you can see clock time varies a bit, but the instruction count is really almost exactly the same. (Here, the difference is literally just 1. Typically it's a bit more, but not more than ~0.0005% variance.)

#

mortal bough Jul 17, 2025, 3:06 PM

#

Is this something that will eventually be part of CI in some way?

left night Jul 17, 2025, 3:10 PM

#

mortal bough Is this something that will eventually be part of CI in some way?

Likely in some capacity, but I don't think it would need to run on every PR

#

And if it should run on user PRs, I also need to make sure that it doesn't compromise the server

sturdy sequoia Jul 20, 2025, 7:18 PM

#

left night

Does it include export in different target? 😄

left night Jul 21, 2025, 7:12 AM

#

sturdy sequoia Does it include export in different target? 😄

Only PDF at the moment

sly pecan Jul 24, 2025, 10:46 AM

#

@left night cooking something in comemo-prescient?

left night Jul 24, 2025, 10:52 AM

#

sly pecan <@311948531835469827> cooking something in comemo-prescient?

comemo-prescient is a testbed to check what the theoretical limit for optimizing comemo is. it runs the compiler twice, first recording the exact the hit/miss sequence and the second time, it already knows exactly how it's gonna play out and does not need to do any comemo tracking (which has significant overhead).

#

I'm in the process of fixing https://github.com/typst/typst/issues/5220

sly pecan Jul 24, 2025, 10:53 AM

#

what do you mean by hit/miss sequence? As in what is actually reused versus not?

left night Jul 24, 2025, 10:56 AM

#

which calls are a cache hit vs not. in reality, comemo has to do a bunch of stuff to validate whether there is a cache entry that is a hit. in this branch it already knows whether it will be a hit or not as soon as it enters a memoized function call.

on main, the validation can take quadratic time (that's the issue). I've implemented something that can validate it in linear time with a tree data structure. this makes incremental somewhat faster on average and much faster in the cases where the issue comes into play, but it slightly regressed some clean compiles and I was trying to figure out why.

sly pecan Jul 24, 2025, 10:57 AM

#

That'e exciting. I think incremental compilation speed is more important than cold/clean, but regressions are obviously not ideal anyway

left night Jul 24, 2025, 10:58 AM

#

interestingly it seems to be a slight regression in instruction count, but not in wall clock time

#

at least it appears that way. the thing is, wall clock time is noiser so it's a bit harder to say for sure

#

it could be that the instructions used in the one case are more expensive

#

but optimizing for instruction count is much nicer as it's super stable

sly pecan Jul 24, 2025, 10:59 AM

#

This could differ from platform to platform too no?

#

based on what the bottleneck is

left night Jul 24, 2025, 10:59 AM

#

it could probably

#

but optimizing for the benchmark server is good enough for me and the rest will behave reasonably similar I think

#

how much all of this caching stuff matters also highly depends on the document

#

if it's mostly text, it doesn't matter much; if it uses a ton of introspection, comemo overhead starts taking up a fair chunk of the compile time

sly pecan Jul 24, 2025, 11:02 AM

#

As the package ecosystem grows and you have users blindly using a ton of them it's likely that even seemingly simple documents may have a lot of introspection

left night Jul 24, 2025, 11:02 AM

#

yeah

sly pecan Jul 24, 2025, 11:02 AM

#

I've seen some...creative solutions

left night Jul 24, 2025, 11:03 AM

#

at least, with the changes I'm implementing, performance will not randomly degrade quadratically

#

if you have some contextual construct that appears a few hundred times in your document, that can already turn quadratic cache validation into a significant amount of the incremental compile time

sturdy sequoia Jul 25, 2025, 3:05 PM

#

sly pecan I've seen some...creative solutions

Like inlining the entire package in one 1k+ LOC function? 😄

sturdy sequoia Jul 25, 2025, 3:05 PM

#

left night at least, with the changes I'm implementing, performance will not randomly degra...

Can I have a bit of a rundown? Just curious 😄

left night Jul 25, 2025, 3:53 PM

#

sturdy sequoia Can I have a bit of a rundown? Just curious 😄

the problem and solution are described in the https://discord.com/channels/1054443721975922748/1325795747165114479 forge if you read from the start. the forge isn't that long in total.

spiral current Jul 28, 2025, 5:39 PM

#

(Not sure if this is the right place) We have Rust code that uses Typst as a library and we recently implemented generating multiple pdf files in a loop from the same Typst code, with a different JSON file input for each iteration. We noticed that each iteration compiles significantly slower than the previous one. It turns out that adding comemo::evict(0); after each iteration fixes this, so it looks like Typst slows down once the cache grows beyond a certain threshold. Do you think this is related to what the quadratic cache validation posted above, or some other issue?

left night Jul 28, 2025, 6:07 PM

#

spiral current (Not sure if this is the right place) We have Rust code that uses Typst as a lib...

Quite likely. But luckily the quadratic cache issue will be gone soon and then you can test :)

spiral current Jul 28, 2025, 8:24 PM

#

Sounds good, thanks!

sturdy sequoia Aug 1, 2025, 11:42 AM

#

@left night Congrats on the performance improvements, on masterproef with -j 8 I am seeing around 4% improvement on my MBP, and in -j 1 I am not seeing improvements. This is on cold, but it's cool to see. I'll try watch mode later because I am going to the gym in five minutes 💪 lol

left night Aug 1, 2025, 12:09 PM

#

sturdy sequoia <@311948531835469827> Congrats on the performance improvements, on masterproef w...

I use masterproef in my benchmark setup actually and it's the document that performed worse with the comemo changes ^^

#

I assume the 4% are from faster hashing

#

In clean compiles, that is. In watch mode with -j 1, it was 15% faster.

sly pecan Aug 1, 2025, 12:17 PM

#

I'm guessing the exact numbers will depend on hardware

sturdy sequoia Aug 1, 2025, 1:16 PM

#

I feel like masterproef is too text heavy to benefit from these things as much as other more recent documents. In a way masterproef was written when typst was a lot more basic than it is now and with those limitations in mind (remember the whole 27 GiB of ram to compile thing 🤣)

#

So in a way I don’t find it surprising.

#

To be fair many of the features I contributed and other improvements to performance were tailored for masterproef so maybe it has just become a document for which typst is very heavily optimised.

sturdy sequoia Aug 1, 2025, 1:37 PM

#

@left night I did test watch mode and it does seem even 20% faster on masterproef! Really congratulations 🚀 I can't wait to test it with the vm3d too, I need to finish updating it sweatwipe

sly pecan Aug 3, 2025, 1:05 AM

#

https://github.com/orxfun/orx-parallel is this something that could be useful for typst?

GitHub

GitHub - orxfun/orx-parallel: A performant and configurable paralle...

A performant and configurable parallel computing library for computations defined as compositions of iterator methods. - orxfun/orx-parallel

#

Claims to be much faster than rayon

proud sandal Aug 3, 2025, 6:55 AM

#

Probably not, I don't think that rayons overhead currently contributes enough to warrant a switch

wanton meadow Aug 3, 2025, 8:29 AM

#

something that competes with rayon sounds very interesting in general

proud sandal Aug 3, 2025, 8:34 AM

#

The benchmarks seem interesting, they also noted that they don't use container-specific implementations, which is why some container-specific rayon implementations are necessarily faster. So we'd only benefit from a switch in cases where rayon itself is the bottleneck, as I understand it. My understand is that, since layout can be fairly expensive, the rayon overhead should be negligible.

sly pecan Aug 3, 2025, 10:10 AM

#

Maybe some things that aren't currently parallelized because the overhead would be too big?

#

I don't know. Just thought I'd mention it here

proud sandal Aug 3, 2025, 10:27 AM

#

I think trying it out is definitely a good idea

sturdy sequoia Aug 3, 2025, 1:04 PM

#

Trying it out is definitely worth it imo

#

But I don't think this'll make a huge difference

#

perhaps some of the Deferred<T> will be a bit faster with less overhead

#

but that's about it

sly pecan Aug 3, 2025, 1:21 PM

#

sturdy sequoia Trying it out is definitely worth it imo

The performance guru has spoken

sturdy sequoia Aug 3, 2025, 1:21 PM

#

sly pecan The performance guru has spoken

Well it's always worth it 😉

#

@silent I think that crown goes to @left night now with the comemo changes

#

nooooooo

#

I heck-ed it up, sorry Laurenz meant to silent ping

sly pecan Aug 3, 2025, 1:22 PM

#

You woke the sleeping dragon

cunning wadi Aug 4, 2025, 2:20 PM

#

Depending on if Typst hasher inputs tend to be big or small, https://github.com/hoxxep/rapidhash might improve performance a bit

GitHub

GitHub - hoxxep/rapidhash: An extremely fast, high-quality, non-cry...

An extremely fast, high-quality, non-cryptographic hash function. Platform independent compile-time and run-time hashing in rust. - hoxxep/rapidhash

#

Could perhaps be worth a shot

#

(Chromium and a few other well known projects use the C++ version of the same hash function)

buoyant lark Aug 4, 2025, 2:24 PM

#

Is this beyond what was done with #6678 last week? That changed the hash function to rustc-hash, a polynomial hash with inspiration from wyhash.

Seems like it'd be a marginal improvement within typst for bringing in a pretty new dependency with few users so far https://github.com/hoxxep/rapidhash?tab=readme-ov-file#benchmarks

cunning wadi Aug 4, 2025, 2:26 PM

#

buoyant lark Is this beyond what was done with [#6678](<https://github.com/typst/typst/pull/6...

That one's fxhash. There are benchmarks in the rapidhash repo comparing the two

sly pecan Aug 4, 2025, 2:26 PM

#

cunning wadi Depending on if Typst hasher inputs tend to be big or small, https://github.com/...

Maybe I'm misreading things, but that seems to be 64 bit only?

cunning wadi Aug 4, 2025, 2:27 PM

#

sly pecan Maybe I'm misreading things, but that seems to be 64 bit only?

According to the readme it's platform-independent

sly pecan Aug 4, 2025, 2:27 PM

#

cunning wadi According to the readme it's platform-independent

I mean the hash is 64 bit

#

Doesn't typst use 128?

wanton meadow Aug 4, 2025, 2:28 PM

#

it might be rustc-hash that should be the item to look at (for current typst git hasher) in the benchmarks from that github repo I guess

cunning wadi Aug 4, 2025, 2:29 PM

#

sly pecan I mean the hash is 64 bit

Not everywhere: https://github.com/typst/typst/pull/6678

GitHub

Use `rustc-hash` for hash maps and sets by laurmaedje · Pull Reque...

This gives 2-3% speedup on average documents. I also switched to fxhash in comemo (not yet part of this PR), which gives another 2-5% on top.

#

And with the comemo thing mentioned in the PR, I'm not sure it uses 128 bit hashes anywhere now

cunning wadi Aug 4, 2025, 2:30 PM

#

buoyant lark Is this beyond what was done with [#6678](<https://github.com/typst/typst/pull/6...

The performance improvements are mainly interesting if typst likes to hash large chunks of data

#

(Example: images)

buoyant lark Aug 4, 2025, 2:36 PM

#

I guess laurmaedje is the only person that really has the tools to analyze the real-world performance impact of this currently and would have to be the one to make the choice 😅

looks interesting certainly!

sly pecan Aug 4, 2025, 2:38 PM

#

cunning wadi And with the comemo thing mentioned in the PR, I'm not sure it uses 128 bit hash...

I thought 128 bit was crucial to avoid collisions?

#

Based on what I can tell (not a rust dev), comemo is still using 128 bit hashes

wanton meadow Aug 4, 2025, 2:41 PM

#

the Rust stdlib interface for hashing only works with u64 final values anyway, so that's also what a Rust HashMap uses, a 64-bit hash. The internal state of the hash algo could be bigger of course

#

siphash's internal state is 128 bits (Rust's default hasher)

#

oh 128 bit is the key size the internal state is bigger

#

It's hashes in two different places, that explains the collision of concepts

pub struct CacheData<C, Out> {
    /// Maps from hashes to memoized results.
    entries: FxHashMap<u128, Vec<CacheEntry<C, Out>>>,
}

The u128 value is a hash. The hash will be (64-bit) hashed to find its slot as a key in the hash map. That's the two layers.

cunning wadi Aug 4, 2025, 2:54 PM

#

wanton meadow It's hashes in two different places, that explains the collision of concepts ``...

I see, that's good to know and does indicate that rapidhash isn't the best choice there

wanton meadow Aug 4, 2025, 2:57 PM

#

I don't know the details but if the u128 value is already a good hash value, wouldn't the trivial hash function (truncate 128-bit to 64-bits) be faster and so on?

#

that's for hash map performance only

left night Aug 4, 2025, 2:57 PM

#

siphasher is for resistant hashes, rustc-hash for fast collision-is-ok hashes

left night Aug 4, 2025, 2:57 PM

#

wanton meadow I don't know the details but if the u128 value is already a good hash value, wou...

yeah probably

sly pecan Aug 4, 2025, 3:00 PM

#

wanton meadow I don't know the details but if the u128 value is already a good hash value, wou...

Whether or not the truncated value would be a good 64 bit hash would depend on the algorithm used for the 128 bit hash no?

#

I can't see that being automatic

left night Aug 4, 2025, 3:00 PM

#

but regardless the rustc-hash is also used in a few of other places in comemo

cunning wadi Aug 4, 2025, 3:00 PM

#

cunning wadi I see, that's good to know and does indicate that rapidhash isn't the best choic...

Though one interesting thing to note is that rapidhash passes all of SMHasher3, while fxhash doesn't

wanton meadow Aug 4, 2025, 3:02 PM

#

sly pecan Whether or not the truncated value would be a good 64 bit hash would depend on t...

then "good hash" in the conditional has to ensure all that 🙂

#

we think that siphash has very nice properties, it should be able to be truncated as you want

sturdy sequoia Aug 4, 2025, 6:37 PM

#

cunning wadi Depending on if Typst hasher inputs tend to be big or small, https://github.com/...

saw it as well lol

molten kayak Aug 5, 2025, 8:50 AM

#

wanton meadow It's hashes in two different places, that explains the collision of concepts ``...

I wonder why not use a passthrough hasher then, since the keys are already high quality hashes?

left night Aug 5, 2025, 9:20 AM

#

molten kayak I wonder why not use a passthrough hasher then, since the keys are already high ...

see here: #1176509648707256370 message

#

#1176509648707256370 message

#

the answer is: nobody did it

untold turret Aug 5, 2025, 9:29 AM

#

needs some testing. nohash-hasher sometimes makes accesses to hashmap slower for bad distribution but worth a try.

wanton meadow Aug 5, 2025, 9:29 AM

#

I did it now, just need to figure out how to benchmark it https://github.com/typst/comemo/pull/18

#

in this case the u128 is a siphash output so it should be at least as good as the hash was before (the change to rustc-hash)

left night Aug 5, 2025, 9:31 AM

#

wanton meadow I did it now, just need to figure out how to benchmark it https://github.com/typ...

if you give me a git commit hash of a [patch]ed typst version and the git commit hash of the baseline typst version, I can start a run

wanton meadow Aug 5, 2025, 9:43 AM

#

working on it

wanton meadow Aug 5, 2025, 9:46 AM

#

left night if you give me a git commit hash of a [patch]ed typst version and the git commit...

Then I suggest we test (new typst) 459a8654437902f68e2f753a171199c2b3482359 vs (main/previous commit typst) 6177c1b22df6d675646102247eb4382a3f1db60d
Maybe I should make a PR vs the typst repo so that the commit appears there and not just in my fork https://github.com/bluss/typst ?

left night Aug 5, 2025, 9:48 AM

#

wanton meadow Then I suggest we test (new typst) 459a8654437902f68e2f753a171199c2b3482359 vs...

no, it's fine

#

ok, benchmarks are queued, will take ~10-20mins

wanton meadow Aug 5, 2025, 9:52 AM

#

excited

lunar kettle Aug 5, 2025, 10:01 AM

#

is that benchmarking tool typst agnostic? could imagine that there are other uses cases where this could be useful 😄

#

say for example to benchmark a PDF renderer 👀

left night Aug 5, 2025, 10:03 AM

#

currently, it is fairly Typst specific

#

but it could be extended I guess

#

the denoised setup is not Typst specific, just the infra around it

left night Aug 5, 2025, 10:04 AM

#

wanton meadow <:excited:1165730189976809583>

unfortunately, it seems to barely make a difference (for Typst), just ~0.05% fewer instructions on average

wanton meadow Aug 5, 2025, 10:05 AM

#

oh ok, that was quick response. Did we do something wrong? It also replaced all the FxHashMaps except for this one: edges: FxHashMap<(InnerId, u128), NodeId> because it's the one with more than just u128 as a key

left night Aug 5, 2025, 10:07 AM

#

the diff looks correct to me

#

probably, it's just not a bottleneck anymore, at least for Typst

#

that raw gains in comemo could be different, though they're probably also not that large because comemo overhead is a considerable part of Typst compilation in some of the benchmark documents

wanton meadow Aug 5, 2025, 10:11 AM

#

aha, now we know... you don't have any microbenchmarks in the comemo repo either? Then you can't be misled by them 🧠

left night Aug 5, 2025, 10:12 AM

#

no, there are no comemo-specific benchmarks

#

but it's also not used by anybody notable except Typst afaik

wanton meadow Aug 5, 2025, 10:46 AM

#

if fxhash was a 2-5% win; i'd have a priori expected a similar or maybe 2% win with this too. I spot missing inline directives and the only thing that might have ruined the first try.
Is there a benchmark for typst or comemo I can run locally to verify a difference?

left night Aug 5, 2025, 11:00 AM

#

wanton meadow if fxhash was a 2-5% win; i'd have a priori expected a similar or maybe 2% win w...

For me, locally was always too noisy

#

Isn't #[inline] within a single crate not really needed anymore

#

But it's no problem, I can start another run if you'd like

wanton meadow Aug 5, 2025, 11:03 AM

#

I'm not sure, because for example CallSequence is generic, isn't it monomorphized into typst somewhere, and then it still becomes cross crate(?)

wanton meadow Aug 5, 2025, 11:10 AM

#

left night But it's no problem, I can start another run if you'd like

If you don't mind, the new typst commit is 8a7853cf85590c6aabb26d53d4789e9080376021 vs (main/previous commit typst) 6177c1b22df6d675646102247eb4382a3f1db60d (same as before)

#

just running it seems to be the easiest way to know

#

matklad sums it up https://matklad.github.io/2021/07/09/inline-in-rust.html "There’s a lot of tribal knowledge surrounding #[inline] attribute in Rust." 😆

Inline In Rust

There's a lot of tribal knowledge surrounding #[inline] attribute in Rust.
I often find myself teaching how it works, so I finally decided to write this down.

sturdy sequoia Aug 5, 2025, 12:17 PM

#

Well I mean, even then, using a passthrough hasher surely can't be a bad thing, right? It's not like it matters in the grand scheme of things?

#

(code/binary size wise I mean)

#

And I would expect the gains, small as they may be, to increase in watch mode

#

And I don't think the current benchmarking harness Laurenz has made does watch mode?

#

I wonder if a small one could be declared to avoid dependencies: one that just hash a u128 and turns it into XOR-ing the u64 pieces? 🤷‍♂️

wanton meadow Aug 5, 2025, 12:20 PM

#

the WIP one I did is just a file, it's like 15 lines of code and it truncates the u128 to 64-bits. So basically it's done like that already

#

would be more satisfying to be able to solve this one too FxHashMap<(InnerId, u128), NodeId> but it can't be a passthrough anyway so maybe it's not worth more custom code

#

(InnerId is a usize)

left night Aug 5, 2025, 12:24 PM

#

the #[inline] annotation shows ~0.000% difference

#

but thin-lto is also enabled on release builds

wanton meadow Aug 5, 2025, 12:24 PM

#

ok ok. thanks a lot for running it

#

thin lto seems to be enough to enable inlining anyway then (with codegen-units=1)

left night Aug 5, 2025, 12:28 PM

#

fwiw, I am not per se against this change even if the gains are little

left night Aug 5, 2025, 12:28 PM

#

sturdy sequoia And I don't think the current benchmarking harness Laurenz has made does watch m...

no, because I'm too lazy :(

#

also, the benchmarking harness uses only a small amount of documents

#

I'd kinda need a better collection

#

I was considering to ask people for them, but at the very least, it'd need some clarity on licensing

wanton meadow Aug 5, 2025, 12:29 PM

#

as always would be good to understand what's going on. When I'm looking in typst/typst (git) there are zero matches for "benchmark". It would be nice if there was a hint where to look for existing benchmarks, or which testcase to use

mortal bough Aug 5, 2025, 12:29 PM

#

Could you use random documents from users of typst.app?

sturdy sequoia Aug 5, 2025, 12:30 PM

#

left night also, the benchmarking harness uses only a small amount of documents

Could you share it with me the repo such that I could try and see if watch could be added? 😄

left night Aug 5, 2025, 12:30 PM

#

mortal bough Could you use random documents from users of typst.app?

That would be against our Terms of Service

sturdy sequoia Aug 5, 2025, 12:30 PM

#

And would be wildly problematic imo

left night Aug 5, 2025, 12:31 PM

#

sturdy sequoia Could you share it with me the repo such that I could try and see if watch could...

it's kinda held together by duct tape and doesn't really have a proper deployment stuff, it's literally a deploy.sh that needs your local shell to be able to ssh into root on the dedicated server

mortal bough Aug 5, 2025, 12:31 PM

#

sturdy sequoia And would be wildly problematic imo

I don't know... As long as the contents aren't published anywhere, I don't expect ful privacy given that documents are encrypted I think

wanton meadow Aug 5, 2025, 12:32 PM

#

what's the current pointer towards benchmarking if we want to do some experiments, find our own documents to test?

sturdy sequoia Aug 5, 2025, 12:33 PM

#

left night it's kinda held together by duct tape and doesn't really have a proper deploymen...

oh wow

sturdy sequoia Aug 5, 2025, 12:33 PM

#

mortal bough I don't know... As long as the contents aren't published anywhere, I don't expec...

I would rather see an opt-in system for this then. I definitely have documents I would be willing to share for the purpose of benchmarking

#

but definitely not opt-out

mortal bough Aug 5, 2025, 12:34 PM

#

There is already a Telemetry setting

#

It could be made document-wise

left night Aug 5, 2025, 12:35 PM

#

we can access document contents (both to support people and also because building a collaborative encrypted online document editor is actually super hard; I know of basically only one, by Proton), but do so only to support people and enforce the ToS

#

if it were encrypted to the point that we couldn't access it, we couldn't run benchmarks either, so I'm not sure I understand the idea

#

but anyway, I don't think it's necessary

#

just a few representative documents should be more than enough

sturdy sequoia Aug 5, 2025, 12:36 PM

#

Necessary? No, I don't think so either

left night Aug 5, 2025, 12:36 PM

#

I just want a public repo with a few test cases and with proper licenses on the documents

sturdy sequoia Aug 5, 2025, 12:36 PM

#

I know y'all use at least masterproef in it héhé, still the document for which Typst is the most optimized 😈

sturdy sequoia Aug 5, 2025, 12:37 PM

#

left night I just want a public repo with a few test cases and with proper licenses on the ...

I need to re-add one for masterproef, it used to have one then I accidentally yeeted it

left night Aug 5, 2025, 12:37 PM

#

masterproef repo doesn't have a license either, so I could not commit it and am probably already in violation for using it :(

sturdy sequoia Aug 5, 2025, 12:37 PM

#

left night masterproef repo doesn't have a license either, so I could not commit it and am ...

It used to have one in the repo

#

I need to upload one

#

or re-upload one

#

As far as I know the content is under some Creative Commons license by the uni

left night Aug 5, 2025, 12:37 PM

#

wanton meadow what's the current pointer towards benchmarking if we want to do some experiment...

there isn't really anything. I've only just set up the dedicated machine recently myself.

#

and unfortunately I don't really have the time currently to build it into a proper system usable by anyone

wanton meadow Aug 5, 2025, 12:40 PM

#

I was thinking it doesn't need to be as good as your main benchmarking setup. But if the repo of documents doesn't even exist publically, then that's how it is now

left night Aug 5, 2025, 12:40 PM

#

4/5 are public basically

#

or rather, can be

wanton meadow Aug 5, 2025, 12:40 PM

#

a benchmark corpus should preferably be stable so that you can compare versions. Then you need to port a little bit if there are breaking changes

sturdy sequoia Aug 5, 2025, 12:41 PM

#

Ok @left night crazy idea: spin up a QEMU emulated VM, and count instructions of just typst, that way it's a "pure" instruction count? Doesn't take into account the effect of memory layout - that we know are important, see Content Rework 2 - but at least it should give a rough idea. 😂 (it's horrible I know)

sturdy sequoia Aug 5, 2025, 12:41 PM

#

wanton meadow a benchmark corpus should preferably be stable so that you can compare versions....

I regularly update my thesis because it gets used for bencharmking, at least by myself

left night Aug 5, 2025, 12:41 PM

#

I have these basic cases: A cetz diagram, hello world, and a ton of lorem ipsum.

📎 basic-cases.zip

#

Then masterproef

#

And then one math-heavy document that I cannot share

#

That's it currently

left night Aug 5, 2025, 12:42 PM

#

sturdy sequoia Ok <@311948531835469827> crazy idea: spin up a QEMU emulated VM, and count instr...

I mean, that would maybe also work.

sturdy sequoia Aug 5, 2025, 12:44 PM

#

I have some slides I could share, I'd need to ask for permission but they have been presented outside of the company quite a few times before at events/conferences so it should be fine. (I'll ask on Thursday). They're nice and slow because of introspection catwow

sly pecan Aug 5, 2025, 12:46 PM

#

left night fwiw, I am not per se against this change even if the gains are little

Speaking of. Did you see the rayon alternative mentioned earlier?

sly pecan Aug 5, 2025, 12:49 PM

#

left night I have these basic cases: A cetz diagram, hello world, and a ton of lorem ipsum.

How about some of the more beefy package manuals?

left night Aug 5, 2025, 12:50 PM

#

sly pecan Speaking of. Did you see the rayon alternative mentioned earlier?

I didn't really look at it, but I have my doubts whether rayon is at all a bottleneck.

sly pecan Aug 5, 2025, 12:51 PM

#

The difference in performance seemed very substantial, but I guess it's very situational based on what the limiting factor is

thick solstice Aug 5, 2025, 12:59 PM

#

How powerful is the benchmarking server?

sturdy sequoia Aug 5, 2025, 1:01 PM

#

left night I didn't really look at it, but I have my doubts whether rayon is at all a bottl...

The codly one is rather chonky 😉

left night Aug 5, 2025, 1:02 PM

#

thick solstice How powerful is the benchmarking server?

https://www.hetzner.com/de/dedicated-rootserver/ax41-nvme

sturdy sequoia Aug 5, 2025, 1:18 PM

#

left night <https://www.hetzner.com/de/dedicated-rootserver/ax41-nvme>

Low key I thought you guys would've bought a base Mac Mini M1/2/3/4 and shoved it in a closet in your office/at home lol

sly pecan Aug 5, 2025, 1:32 PM

#

sturdy sequoia Low key I thought you guys would've bought a base Mac Mini M1/2/3/4 and shoved i...

I'm not sure how representative macs are for performance in general

sly pecan Aug 5, 2025, 1:39 PM

#

left night masterproef repo doesn't have a license either, so I could not commit it and am ...

Is CC-BY-SA alright, or would it need to be CC-BY?

sturdy sequoia Aug 5, 2025, 1:40 PM

#

sly pecan I'm not sure how representative macs are for performance in general

Surprisingly, compared to my 7950x3d, any change to instructions count leads to the same speedup (say 10% for both) but memory speedups are bigger on the mac (like cache efficiency stuff)

left night Aug 5, 2025, 1:46 PM

#

sly pecan Is CC-BY-SA alright, or would it need to be CC-BY?

both would work

#

I just want to be able to compile it and commit it to a repo

sly pecan Aug 12, 2025, 8:34 PM

#

https://www.reddit.com/r/rust/comments/1mo8z2c/derive_hash_fast_over_2x_faster_hashing_than/ is this interesting?

left night Aug 13, 2025, 5:53 AM

#

I think it could only be applied to a very small number of structs in Typst

sly pecan Aug 16, 2025, 5:46 PM

#

In guessing this could eventually substantially speed up the web app https://github.com/WebAssembly/wide-arithmetic/blob/main/proposals%2Fwide-arithmetic%2FOverview.md

left night Aug 17, 2025, 9:09 AM

#

Currently it's not worth the effort but at some point we could have separate wasm blobs for newer and older browsers.

sly pecan Aug 17, 2025, 11:10 AM

#

I don't think a single browser supports it yet anyway

sly pecan Aug 18, 2025, 9:51 PM

#

@left night do you know if your recent changes will speed up https://typst.app/universe/package/equate/ ?

left night Aug 19, 2025, 6:33 AM

#

sly pecan <@311948531835469827> do you know if your recent changes will speed up <https://...

No idea

rigid latch Aug 21, 2025, 6:30 AM

#

Oh, speeding up the equate package would be so great 😕

sly pecan Sep 4, 2025, 4:46 PM

#

How much time is spent on shaping in a typical document? Asking since harfrust is supposedly like 3 times faster than rustybuzz

left night Sep 4, 2025, 4:47 PM

#

sly pecan How much time is spent on shaping in a typical document? Asking since harfrust i...

Can you read minds?

#

Because I have literally today gotten an email from Behdad where he was inquiring about how much shaping is a bottleneck in Typst.

#

Copying a part of my response here:

On a long document that consists only of Lorem ipsum, just the shaping with rustybuzz (excluding shape plan creation) made up ~25% of total compile time (including PDF export which is about 1/3 of compilation there). For previews, PDF export is skipped, so there it would be a little more. This was with naive first-fit line breaking. With optimized line breaks, it would become less, at least as long as the line boundaries are safe to break.
On a document that uses more of Typst's other features, shaping quickly becomes a much smaller factor. On one thesis document I tried, it was only 3% of compile time (still with rustybuzz).

sly pecan Sep 4, 2025, 4:49 PM

#

left night Can you read minds?

Maybe

#

That should still be a decent performance boost for text heavy documents. Maybe also memory usage if that's improved?

#

But not game changing

sturdy sequoia Sep 7, 2025, 10:37 AM

#

left night Copying a part of my response here: - On a long document that consists only of L...

Is masterproef also in the 3% neighbourhood? I really can't recall, but I think it's a fair bit heavier isn't it?

left night Sep 8, 2025, 7:34 AM

#

sturdy sequoia Is masterproef also in the 3% neighbourhood? I really can't recall, but I think ...

masterproef was the 3% doc 😅

sturdy sequoia Sep 8, 2025, 8:27 AM

#

left night masterproef was the 3% doc 😅

Fair lol

sly pecan Sep 10, 2025, 3:49 PM

#

Harfrust is now only 16% slower than Harfbuzz based on the geometric mean of the latest update in their spreadsheet

#

Compared to rustybuzz which is like 3-4 times slower

sly pecan Sep 10, 2025, 4:12 PM

#

Apparently less even https://github.com/linebender/resvg/pull/922#issuecomment-3275567195

GitHub

Port resvg to `harfrust` and `fontations` by nicoburns · Pull Requ...

Changes made

Replaces rustybuzz with harfrust
Replaces ttf_parser with skrifa

Notes

This is failing 23 tests. These probably need to be manually checked. I'm waiting on Improve the test ...

minor kelp Oct 1, 2025, 9:33 PM

#

Was there a behchmarking script somewhere or am I misremembering?

buoyant lark Oct 1, 2025, 10:53 PM

#

IIRC the benchmarking stuff has more to do with steps towards consistent results than specialty stuff around Typst itself: #1176509648707256370 message. I don't think anyone outside of the team currently has that set up, but in the past they've helped get various patches tested #1176509648707256370 message

for less critical stuff / initial testing you can probably follow standard practices.

minor kelp Oct 1, 2025, 11:27 PM

#

The first link is probably what I had in mind. Well at least now I know what it was about, thanks

buoyant lark Oct 1, 2025, 11:33 PM

#

I mean there is also this #1176509648707256370 message (right below you can find some documents that are used for it), which I guess fits "benchmarking script" better, but its just a hacky interface to the former 😅
There are a couple of things users have tried in this thread but I dont think those have been really validated yet.

Wishing u the best with your shenanigans

spiral current Nov 7, 2025, 2:05 PM

#

Just rechecked our use case (#1176509648707256370 message) with Typst 0.14 and Comemo 0.5, unfortunately the issue still exists so we still have to evict comemo cache after each iteration. There is noticable increase in memory usage in both debug and release builds, and in debug builds the pdf generation still gets noticably slower after a few dozen iterations.

left night Nov 7, 2025, 2:12 PM

#

spiral current Just rechecked our use case (https://discord.com/channels/1054443721975922748/11...

Could you share a minimal reproduction?

spiral current Nov 7, 2025, 2:19 PM

#

Yes I can create one (but probably not very soon)

#Performance