Performance | Typst | Page 8

sturdy sequoia Nov 24, 2024, 6:10 PM

#

For PicoStr, could just say that the LSB/MSB determines whether it's a "const" string or not

#

But it's true that it's tricky

left night Nov 24, 2024, 6:11 PM

#

a global list is probably the only way

sturdy sequoia Nov 24, 2024, 6:11 PM

#

yep

#

the question is, when the user writes a string that is in the global list, how do we check that it's in the global list?

#

by hash I assume?

left night Nov 24, 2024, 6:11 PM

#

yeah

#

could even use perfect hashing

sturdy sequoia Nov 24, 2024, 6:11 PM

#

left night could even use perfect hashing

my beloved

left night Nov 24, 2024, 6:12 PM

#

though rustc I think just dumps it into the normal interner hash map in a Lazy

sturdy sequoia Nov 24, 2024, 6:15 PM

#

For performance, I also wonder how much adding a "flatten" pass to the AST would bring

left night Nov 24, 2024, 6:19 PM

#

sturdy sequoia For performance, I also wonder how much adding a "flatten" pass to the AST would...

it's tricky with closures

#

they keep around a strong ref to part of the AST and might outlive the source file (in the comemo cache)

sturdy sequoia Nov 24, 2024, 6:23 PM

#

Ah, I see

lost meteor Nov 24, 2024, 9:31 PM

#

sturdy sequoia For performance, I also wonder how much adding a "flatten" pass to the AST would...

What would you be thinking for that?

#

I've thought about ways to impl flattening just a little and haven't felt confident in anything

sturdy sequoia Nov 24, 2024, 9:32 PM

#

lost meteor What would you be thinking for that?

using IDs instead of storing nodes

#

and having all the nodesw in a bi array

lost meteor Nov 24, 2024, 9:32 PM

#

Also in particular with interactive use

#

binary array?

sturdy sequoia Nov 24, 2024, 9:35 PM

#

big array *

#

sorry was loading into an ow match lol

lost meteor Nov 24, 2024, 9:39 PM

#

lol. Almost had me getting flashbacks to implementing heaps for algo class

sturdy sequoia Nov 24, 2024, 9:41 PM

#

lost meteor lol. Almost had me getting flashbacks to implementing heaps for algo class

could you imagine, a binary tree for ASTs

#

that sounds kind of cursed

lost meteor Nov 24, 2024, 9:43 PM

#

It would either have godlike performance or completely destroy your cache hit rate and I'm not sure which it would be

sturdy sequoia Nov 24, 2024, 10:01 PM

#

lost meteor It would either have godlike performance or completely destroy your cache hit ra...

One of the two for sure 😂

left night Nov 27, 2024, 4:00 PM

#

@sturdy sequoia I've come up with a fairly nice approach for compile-time constant PicoStr. :)

The idea is as follows:

We increase the size of PicoStr to 64-bit (PicoStr = NonZeroU64)
We encode strings of length <=12 with the charset abcdefghijklmnopqrstuvwxyz-1234 directly in the number
Longer strings can be compile-time interned by adding them to an exception list
Runtime interning is also possible, as before

Compared to just listing all strings, the benefit is that most strings can just be written directly without touching the exception list. The small downside is 64-bit instead of 32-bit.

Here is an initial gist with an implementation. All of the decoding and exception handling is 100% const fn, no macros!

Notes:

I tried various exception lookup mechanism, but perfect hashing in const fn was a little much, so I opted for a simpler binary search.
I also tried more efficient number encodings (Huffman coding, Arithmetic coding), but they didn't really give much of a benefit over a simple 5-bit encoding, so I skipped that as well.

sturdy sequoia Nov 27, 2024, 4:03 PM

#

thonk

#

bigbrain

#

@left night Do you have an idea on how many built-in vars/fields/etc. that covers?

#

I would assume most of them?

left night Nov 27, 2024, 4:04 PM

#

I did check that and there were about 10 exceptions

#

Maybe 15

sturdy sequoia Nov 27, 2024, 4:04 PM

#

How do you distinguish between interned strings and exceptions?

left night Nov 27, 2024, 4:05 PM

#

index < exceptions.len() => exception
index > exceptions.len() => subtract exceptions.len() and use

sturdy sequoia Nov 27, 2024, 4:05 PM

#

I see, the 4 top bits

left night Nov 27, 2024, 4:05 PM

#

Top 1 bit is for bitcode vs indexed

#

And indexed covers both exception and runtime

sturdy sequoia Nov 27, 2024, 4:05 PM

#

Have you tried swapping the current impl with it and see if it degrades perf? (beyond any additional addition of strings)

left night Nov 27, 2024, 4:06 PM

#

I haven't, but I doubt it would affect anything much, since PicoStr is only used for Labels

#

Which are somewhat hot, but 32-bit vs 64-bit shouldn't that big of a deal

sturdy sequoia Nov 27, 2024, 4:06 PM

#

BTW for perfect hashing, there is a crate to generate it for you

left night Nov 27, 2024, 4:07 PM

#

But it doesn't work in const fn

sturdy sequoia Nov 27, 2024, 4:07 PM

#

Do you want me to try (after work ~1 hour) to see how it impacts fields too?

left night Nov 27, 2024, 4:07 PM

#

Probably best if I first make a PR, so that we don't need to integrate it twice

#

And then you can do some experiments :)

#

Though I guess it's mostly copy-paste

#

I didn't really write in Typst, it's a playground crate

#

My main motivation, tbh, is to be able to write struct HtmlTag(PicoStr) and have HtmlTag::H1 as a const 😂

sturdy sequoia Nov 27, 2024, 4:08 PM

#

In the meantime, I am working on my own cursed creations:

left night Nov 27, 2024, 4:09 PM

#

what the heck

sturdy sequoia Nov 27, 2024, 4:09 PM

#

😂

#

You have no idea how cursed code I am doing for work is

#

but it's fast 😎

left night Nov 27, 2024, 4:09 PM

#

well, at least that

#

I'm kind of into const fn now

sturdy sequoia Nov 27, 2024, 4:09 PM

#

It's for C++ <-> rust interop

left night Nov 27, 2024, 4:09 PM

#

It's quite limited, but super nice

#

If you're willing to walk the mile, you can do a ton of stuff

sturdy sequoia Nov 27, 2024, 4:10 PM

#

Yes, you can even do some basic hashing, which I am planning on using in my project to test if I need to regenerate files, etc.

#

Kind of like comemo but between invokation of my own cargo subcommand

left night Nov 27, 2024, 4:10 PM

#

ah interesting

#

I did write a mini-hashmap in earlier iteration of pico.rs (~50 lines of code), but I think it would have been slower than binary search

#

because it wasn't perfect hashing and linear probing could've taken a bit

sturdy sequoia Nov 27, 2024, 4:11 PM

#

it's true that it's linear probing of pointers :/

#

not of the direct values

#

(did someone say inline byte array with \0 to delimit strings?)

left night Nov 27, 2024, 4:12 PM

#

can build that in a const fn from the array!

#

the interesting part to me is that you can kind of by dynamic, but you need to do it in two steps

#

you can build an array of dynamic size by first computing its size (one const fn to sum the strings length + 0 bytes) and then another one that returns a [u8; THAT_LEN] which actually builds it.

#

it's interesting how you need to split it

sturdy sequoia Nov 27, 2024, 4:14 PM

#

you can do some nifty stuff indeed, not as powerful as zig unfortunately

left night Nov 27, 2024, 4:14 PM

#

yes, but more powerful than I thought previously

sturdy sequoia Nov 27, 2024, 4:15 PM

#

at work my solution is the following:

have proc macros that hide functions in the user's final library
generate (using rustc) a tiny executable that calls all of those functions and write to a file the serialized output
read those outputs
generate c++ code
bundle the rust library with the c++ code
???
Profit!

#

It allows me to get information you normally would not like the size, alignment, etc. of structs

#

It also fully works with feature flags, etc. 😎

left night Nov 27, 2024, 4:16 PM

#

kinda like wasm-bindgen I think

sturdy sequoia Nov 27, 2024, 4:16 PM

#

Probably, I took inspiration from a crate for building PostgreSQL plugins

#

Before I used a heap of build script and it was... messy

glad urchin Nov 27, 2024, 4:21 PM

#

left night I'm kind of into const fn now

FWIW the newer rust versions are getting lots of new const functions

#

mostly cuz &mut will work in const in the next rust version

#

iirc

#

and together with the new (already released) const { } syntax, const stuff in Rust is gonna go boom

sturdy sequoia Nov 27, 2024, 4:24 PM

#

glad urchin mostly cuz &mut will work in const in the next rust version

So good

left night Nov 27, 2024, 4:24 PM

#

tomorrow actually!

#

Rust 1.83

sturdy sequoia Nov 27, 2024, 4:26 PM

#

I'm curious what people will be able to do with it

#

Next, I want proc_macro_span to be stabilized 😭

left night Nov 27, 2024, 4:33 PM

#

sturdy sequoia Do you want me to try (after work ~1 hour) to see how it impacts fields too?

With fields we need to be somewhat careful. We don't want to accidentally intern a ton of dynamic field accesses but we want fast comparison.

sturdy sequoia Nov 27, 2024, 4:33 PM

#

Yes of course

left night Nov 27, 2024, 4:33 PM

#

How would you approach it?

sturdy sequoia Nov 27, 2024, 4:35 PM

#

I don’t know yet science

left night Nov 28, 2024, 11:17 AM

#

sturdy sequoia Do you want me to try (after work ~1 hour) to see how it impacts fields too?

PR is up: https://github.com/typst/typst/pull/5491

GitHub

Compile-time `PicoStr` interning by laurmaedje · Pull Request #5491...

This PR allows PicoStr (an interned string type) to be created at compile-time, so that it can be used in constants, match patterns, etc. My main motivation for this is to be able to have struct Ht...

sturdy sequoia Nov 28, 2024, 11:19 AM

#

left night PR is up: https://github.com/typst/typst/pull/5491

We should probably fuzz that btw

left night Nov 28, 2024, 11:22 AM

#

sturdy sequoia We should probably fuzz that btw

Don't question my code, it's flawless 😂

proud sandal Nov 28, 2024, 11:22 AM

#

me a few days ago after i got called for the production segfault

sturdy sequoia Nov 28, 2024, 11:22 AM

#

I won’t do it again 😅

proud sandal Nov 28, 2024, 11:22 AM

#

jk, it segfaulted in tests

left night Nov 28, 2024, 11:23 AM

#

sturdy sequoia We should probably fuzz that btw

We do have that fuzzing harness

#

So we can just let google do the fuzzing

#

isn't it lovely? that's all I wanted.

#

beautiful number comparisons at runtime

#

I guess it could've been an enum, but this is so much nicer

#

btw, did you guys know that html attributes may contain whitespace like the paragraph separator?

sturdy sequoia Nov 28, 2024, 12:12 PM

#

left night isn't it lovely? that's all I wanted.

That's indeed super cool 😮

sturdy sequoia Nov 28, 2024, 12:12 PM

#

left night btw, did you guys know that html attributes may contain whitespace like the para...

what do you mean?

#

like class = "Hello, World"?

left night Nov 28, 2024, 12:14 PM

#

sturdy sequoia what do you mean?

I mean my-attr="hello", but with the - being replaced by U+2029 PARAGRAPH SEPARATOR

sturdy sequoia Nov 28, 2024, 12:14 PM

#

What in heaven's name is that

#

🤢

left night Nov 28, 2024, 12:15 PM

#

Let's give it a try in Discord: my attr="hello"

#

Well yeah, more or less

proud sandal Nov 28, 2024, 12:15 PM

#

left night btw, did you guys know that html attributes may contain whitespace like the para...

:(

sturdy sequoia Nov 28, 2024, 12:15 PM

#

BTW for PicoStr, do you think we could use a faster hashing function?

#

would make interning a tiny bit faster

#

Also, using a const-constructable HashMap could avoid having the LazyLock which would likely make it faster too

left night Nov 28, 2024, 12:20 PM

#

yes, both things make sense

#

though I'm not sure how much dependency cost I'd be willing to pay to get them

sturdy sequoia Nov 28, 2024, 12:26 PM

#

I mean for the hashing, I think we already have some faster ones in it, don't we?

#

And for the HashMap I'd have to look, I don't know what's the state of const hashmaps

left night Nov 28, 2024, 12:30 PM

#

sturdy sequoia And for the HashMap I'd have to look, I don't know what's the state of const has...

actually std hash map is const if you provide your own hash builder!

#

it's probably just not const because of RandomState

sturdy sequoia Nov 28, 2024, 12:30 PM

#

But for this, we don't care about ahving a random state?

#

I don't think this is DDoS sensitive

#

Maybe I'm wrong

left night Nov 28, 2024, 12:30 PM

#

agreed

sturdy sequoia Nov 28, 2024, 12:30 PM

#

So it might be relatively easy overall

left night Nov 28, 2024, 12:31 PM

#

if you want to ddos yourself, go ahead and write range(10000000)

slim sequoia Nov 28, 2024, 12:36 PM

#

https://tenor.com/view/bocchi-the-rock-glitch-anime-bocchi-gif-27016262

Tenor

proud sandal Nov 28, 2024, 12:50 PM

#

Nerd

#

Uhm actually that's a DoS, not DDoS

left night Nov 28, 2024, 1:09 PM

#

proud sandal Uhm actually that's a DoS, not DDoS

I know and I don't care 🙃

#

cause DoS sounds dumb

feral imp Nov 28, 2024, 1:12 PM

#

You'd probably like it if it was D-o-S.

sturdy sequoia Nov 28, 2024, 1:20 PM

#

@left night couldn't PicoStr contain an inline string or a static pointer to a string?

#

that way resolving is faster?

#

using the LSBs to differentiate

#

since it's 8 bytes now anyway

#

Make the eq impl between &str and PicoStr trivial

#

No need to lock when resolving a pointer that way

left night Nov 28, 2024, 1:37 PM

#

sturdy sequoia <@311948531835469827> couldn't `PicoStr` contain an inline string or a static po...

The static pointer cannot be known at compile time

#

Preventing const fn

#

Maybe we can only use indices for the exceptions and a raw pointer for runtime. Though LSB doesn't directly work because of string's alignment being 1.

sly pecan Nov 28, 2024, 3:51 PM

#

Endgame is building all conceivable documents at compile time. Truly instant preview 🧠

sturdy sequoia Nov 28, 2024, 4:03 PM

#

left night _Maybe_ we can only use indices for the exceptions and a raw pointer for runtime...

ah shoot

sturdy sequoia Nov 28, 2024, 4:03 PM

#

sly pecan Endgame is building all conceivable documents at compile time. Truly instant pre...

duh!

#

We can even compute the derivative of an arbitrary function at compile time now

#

https://doc.rust-lang.org/nightly/std/autodiff/attr.autodiff.html

autodiff in std::autodiff - Rust

This macro handles automatic differentiation. Automatic Differentiation macro which allows generating a new function to compute the derivative of a given function. It may only be applied to a function. The expected usage syntax is #[autodiff(NAME, MODE, INPUT_ACTIVITIES, OUTPUT_ACTIVITY)] where: NAME is a string that represents a valid functio...

feral imp Nov 28, 2024, 4:30 PM

#

If this shit ever ships to nightly, so it can actually be used without rebuilding rust, then and only then, will I entertain it's existence.

sturdy sequoia Nov 28, 2024, 7:20 PM

#

feral imp If this shit ever ships to nightly, so it can actually be used without rebuildin...

but it is in nightly

#

isn't it?

#

There's even a crate using it

feral imp Nov 28, 2024, 7:21 PM

#

There are still PRs open.

sly pecan Nov 29, 2024, 6:53 AM

#

https://arxiv.org/pdf/2411.11469

#

Is this interesting?

stone vector Nov 29, 2024, 1:47 PM

#

VM comeback?

sturdy sequoia Nov 29, 2024, 2:40 PM

#

stone vector VM comeback?

poggers

left night Nov 29, 2024, 2:47 PM

#

sly pecan https://arxiv.org/pdf/2411.11469

Not sure how applicable it really is to Typst because a large part of Typst execution time is in the "runtime" itself and not in actually running specific instructions. But it certainly looks good, even a bit too good to be true.

placid pivot Nov 29, 2024, 6:41 PM

#

stone vector VM comeback?

Dherse nightmare / dream

slim sequoia Nov 29, 2024, 7:33 PM

#

I think JIT makes sense if the API were massive, but as it is, a user only imports what is needed and so it probably most codepaths will be run. The only exceptions are those behemoth packages like cetz, and it would probably massively benefit them, though there are easier ways of handling that currently (splitting up packages as much as possible)

sturdy sequoia Nov 29, 2024, 8:05 PM

#

slim sequoia I think JIT makes sense if the API were massive, but as it is, a user only impor...

I think, in general, that packages like cetz will already benefit from the VM itself

lament fulcrum Nov 29, 2024, 9:06 PM

#

sly pecan https://arxiv.org/pdf/2411.11469

huh i recognize the authors

#

they also wrote the infamous copy&patch compilation paper

#

https://fredrikbk.com/publications/copy-and-patch.pdf

#

it seems this paper builds upon that one

sturdy sequoia Nov 30, 2024, 12:37 PM

#

lament fulcrum they also wrote the infamous copy&patch compilation paper

TL;DR?

lament fulcrum Nov 30, 2024, 12:39 PM

#

sturdy sequoia TL;DR?

they use so-called "stencils" to stitch together code snippets from a vast library of binary snippets which during code generation gets their missing values filled in

#

and they provide a system for generating these binary stencils

proud sandal Nov 30, 2024, 12:41 PM

#

ah cool, that's what the new python jit uses

lament fulcrum Nov 30, 2024, 12:42 PM

#

yep

#

pretty neat that they saw that paper and were like "we're using that"

sturdy sequoia Nov 30, 2024, 3:36 PM

#

That's pretty cool

#

but a bit overkill for typst?

lost meteor Dec 1, 2024, 2:50 AM

#

left night btw, did you guys know that html attributes may contain whitespace like the para...

I learned that IDs can be plain numbers recently, but if you want to style them from css you need to escape the first character 🙃

#

https://bsky.app/profile/gareth.fyi/post/3laytyr3dy42a

lost meteor Dec 11, 2024, 6:32 PM

#

Found this neat paper about how they tricked out a tree walking interpreter for R with a bunch of tree optimizations and beat the standard bytecode vm
https://dl.acm.org/doi/10.1145/2576195.2576205

proud sandal Dec 11, 2024, 6:32 PM

#

dherse in shambles rn

sturdy sequoia Dec 12, 2024, 8:23 PM

#

proud sandal dherse in shambles rn

impressed more than anything

left night Jan 9, 2025, 8:43 PM

#

@sturdy sequoia I did a small performance experiment and I thought you might be interested in the result.

What I tested:

Normal Typst on a few small to medium sized documents.
A customized version where the global allocator is just a bump allocator. Essentially trading off memory for the theoretical optimum in terms of allocation (and especially deallocation) overhead.
A customized version of Typst with Mimalloc.

Interestingly, the bump allocator was only ever so slightly faster than the normal one, but Mimalloc was a fair bit faster than the bump allocator!

I can see two potential reasons for that:

Mimalloc has support for realloc, which is probably quite relevant for vectors, or really any growable collection
IIRC Mimalloc has thread-local free lists, whereas the bump allocator was one global atomic, so it might have more sync overhead

I do wonder which (if any) of the above is the primary reason. Either way, very impressive by Mimalloc.

sturdy sequoia Jan 9, 2025, 8:50 PM

#

left night <@130737672951037952> I did a small performance experiment and I thought you mig...

I had tested mimalloc previously and it generally provides a nice speedboost, I would assume that it's because it's also nicely made for parallel applications (iirc). Realloc does make sense too but I don't know how much realloc-inc rust does by default

#

I remember long ago I could get a 15% uplift just with Mimalloc, for a long while I ran locally a custom Typst with MiMalloc lol

left night Jan 9, 2025, 9:04 PM

#

sturdy sequoia I had tested mimalloc previously and it generally provides a nice speedboost, I ...

I would assume that all std containers use realloc.

#

Until a few weeks ago, I never realized that doubling a big vector in size doesn't even necessarily need a large memcpy: Even if the memory is fragmented, the allocator could just remap the virtual memory! That's something that fundamentally distinguishes native from WebAssembly.

sly pecan Jan 9, 2025, 9:12 PM

#

Why would you call your allocator memealloc?

left night Jan 9, 2025, 9:12 PM

#

because everything Microsoft does is a meme

sly pecan Jan 9, 2025, 9:22 PM

#

Didn't realize it was a Microsoft thing

#

In hindsight that explains the "mi" part...

sturdy sequoia Jan 9, 2025, 9:28 PM

#

sly pecan Didn't realize it was a Microsoft thing

Mind you, for once, Microsoft did something right

#

MiMalloc is awesome

sturdy sequoia Jan 9, 2025, 9:28 PM

#

left night Until a few weeks ago, I never realized that doubling a big vector in size doesn...

Ugh, that's true but not necessarily obvious

#

although I would think that in the std code, they don't handle the realloc themselves

#

does memcpy check if the src and dst addresses are the same?

left night Jan 9, 2025, 9:35 PM

#

sturdy sequoia does memcpy check if the src and dst addresses are the same?

I'm not sure, but why would it matter? If you realloc, the contents get copied over by the allocator after all.

left night Jan 9, 2025, 9:35 PM

#

sturdy sequoia although I would think that in the std code, they don't handle the realloc thems...

what do you mean?

sturdy sequoia Jan 9, 2025, 9:38 PM

#

left night what do you mean?

In the code of the collections themselves (outside of the allocator)

sturdy sequoia Jan 9, 2025, 9:38 PM

#

left night I'm not sure, but why would it matter? If you realloc, the contents get copied o...

oh, it does? 😮

#

Then yeah it doesn't matter indeed

left night Jan 9, 2025, 9:38 PM

#

The collections just do alloc::alloc::realloc

sturdy sequoia Jan 9, 2025, 9:38 PM

#

left night The collections just do `alloc::alloc::realloc`

damn

#

so they do

#

fascinating

left night Jan 9, 2025, 9:39 PM

#

and I suppose mimalloc is probably also tuned for good realloc perf

#

maybe it leaves some gaps and fills them in later

#

idk. I did skim the mimalloc paper recently, but I don't remember anything to that effect.

proud sandal Jan 9, 2025, 9:40 PM

#

The collections doing realloc makes sense, since the allocator doesn't know when an allocation is to be reallocated vs newly allocated, but the collection does

left night Jan 9, 2025, 9:40 PM

#

I do remember that it allocates in size classes, which kind of goes against reallocing in place

sturdy sequoia Jan 9, 2025, 9:40 PM

#

proud sandal The collections doing realloc makes sense, since the allocator doesn't know when...

When I wrote my own collections in C, I manually called free and malloc/calloc

#

I am dirty like this

#

back then I didn't know about realloc

proud sandal Jan 9, 2025, 9:41 PM

#

sturdy sequoia I am dirty like this

You slut /s

left night Jan 9, 2025, 9:41 PM

#

why u hurting ur ram like this?

sturdy sequoia Jan 9, 2025, 9:41 PM

#

proud sandal You slut /s

I have a specific episode of Rick and Morty that comes to mind everytime I hear/read that word

sturdy sequoia Jan 9, 2025, 9:41 PM

#

left night why u hurting ur ram like this?

Sorry, I didn't know better 😭

left night Jan 9, 2025, 9:42 PM

#

don't worry, Typst has already hurt your RAM worse than your own collections ever did

sturdy sequoia Jan 9, 2025, 9:56 PM

#

left night don't worry, Typst has already hurt your RAM worse than your own collections eve...

Iirc correctly, typst used > 28 GB of RAM to compile my thesis back when it used the first iteration of my glossary

#

which BTW, all glossaries on Typst Universe are descendants of that glossary

#

I was ahead of the curve 😎

sturdy sequoia Jan 10, 2025, 1:03 PM

#

left night don't worry, Typst has already hurt your RAM worse than your own collections eve...

Honestly, RAM usage is fine, as long as you have tons of it 😎

molten kayak Jan 10, 2025, 1:07 PM

#

You just need to avoid using cetz in presentations 😭

feral imp Jan 10, 2025, 1:11 PM

#

RAM usage (although I've not seen recent benchmarks) is one pretty big weakpoint.

sly pecan Jan 10, 2025, 1:23 PM

#

sturdy sequoia Honestly, RAM usage is fine, as long as you have tons of it 😎

Couldnt reduced memory usage conceivably even improve performance because of memory bandwidth?

#

Depending how it's done of course

sturdy sequoia Jan 10, 2025, 1:23 PM

#

sly pecan Couldnt reduced memory usage conceivably even improve performance because of mem...

It most likely would indeed

#

If nothing else but also because there would be less stuff to hash, less stress on the allocator, fewer entries in the comemo caches, etc.

sturdy sequoia Jan 10, 2025, 1:24 PM

#

feral imp RAM usage (although I've not seen recent benchmarks) is one pretty big weakpoint...

I agree, but memoization is what makes Typst so incredibly fast, and I don't see an immediate way of improving memory usage

#

I think overall we could allocate less

#

And there are optimizations I have previously suggested that we could use in comemo

#

(mostly disabling memoization when the call stack is deep enough)

#

But these are tricky things

#

Also, I am sure we could also optimize a lot of data structures to be less memory intensive

#

but the problem there is that profiling allocations is tricky in my experience, I haven't found a good way of doing it yet

#

That's why I want to try the one on macOS, maybe it can tell us more

#

Or a custom version of the CLI tool that uses a tracing allocator of some kind

feral imp Jan 10, 2025, 1:28 PM

#

I see. Man, it really is intricate stuff. I wish anyone working on this nothing but good luck, and fortune.. Hopefully all your attempts to optimise yield results..

left night Jan 10, 2025, 1:30 PM

#

sturdy sequoia Also, I am sure we could also optimize a lot of data structures to be less memor...

a single BlockElem has std::mem::size_of ~1KB. and we have a ton of them. I absolutely hate it, but it's not trivial to make elements less heavy.

#

I think most (all) fields should probably be behind a pointer. Then we also get the None case for absence for free

#

but I have no clue how much these huge element sizes and all the cloning of them really affect performance

#

hard to know without fixing it in the first place

sturdy sequoia Jan 10, 2025, 1:36 PM

#

left night a single BlockElem has `std::mem::size_of` ~1KB. and we have a ton of them. I ab...

That's perhaps the one bad consequence of the Content rework

#

It was obviously worth it overall but it did balloon the memory usage 😦

left night Jan 10, 2025, 1:39 PM

#

yeah

untold turret Jan 10, 2025, 2:35 PM

#

left night a single BlockElem has `std::mem::size_of` ~1KB. and we have a ton of them. I ab...

what about make block share data by shallow copying each others? I was impressed by scala using it to allow my compiler to edit ast cheaply.

left night Jan 10, 2025, 2:52 PM

#

most copies are already quite shallow through lots of Arc

#

or do you mean something else?

untold turret Jan 10, 2025, 2:52 PM

#

sounds like most fields of block comes from style chain (probably). it is natural to copy edit the element from some value in the context.

untold turret Jan 10, 2025, 2:56 PM

#

left night most copies are already quite shallow through lots of `Arc`

not exact, arc copies on change, but shallow copy creates new instance by referencing some existing value and attaching changes. scala may have some technique to not slow down reads with using shallow copy. I didn't learn much about the underlying implementation.

left night Jan 10, 2025, 2:59 PM

#

I see

sturdy sequoia Jan 10, 2025, 9:22 PM

#

left night a single BlockElem has `std::mem::size_of` ~1KB. and we have a ton of them. I ab...

Ok I might have a bit of a cursed way of solving it

#

that would likely work

#

Essentially, have two repr: dense, and full:

dense: only contains fields marked as #[dense], they are always present
full: all other fields would only be included in this repr if they are set
That way, any elements that doesn't have all of its fields set will use the dense repr, using less RAM, if they have more fields set, it uses the full repr. We can trivially turn one into the other if needed, and we can use a single bit somewhere in Content to do the switch

#

That actually sounds pretty doable, curious how much it would improve performance

#

On the dense impl, accessing one of the field always accesses the style chain

#

If the element doesn't have any #[dense] fields, then it is always full

#

required fields could also be the selector for dense or not

#

Since everything uses dyn Trait under the hood, the Construct impl can return either one and nobody will ever know

#

Of course, that would mean that places where we create BlockElem (for example) would need to either use a BlockElem or a DenseBlockElem based on what they need

#

Then for the impl for T where T can be dense, we can have a dense proc-macro that duplicates the impl with the second version

#

(just copy pasting the code, swapping T for DenseT)

#

For example in raw.rs, this:

realized = BlockElem::new()
    .with_body(Some(BlockBody::Content(realized)))
    .pack()
    .spanned(self.span());

Would become:

realized = DenseBlockElem::new()
    .with_body(Some(BlockBody::Content(realized)))
    .pack()
    .spanned(self.span());

#

As far as all the field enums, etc. they can just be shared

#

I would not use this for all elements, but for the really big ones that are constructed often (block, box, heading, etc.) that often use the style chain. By doing some experiments, we can determine which fields are good to be #[dense] and which ones are not worth it.

#

Still a bit of manual work, but mostly automated!

left night Jan 10, 2025, 9:48 PM

#

Duplicating elements sound like a bit of a pain to deal with and also like a compile time hit. Generally speaking, I'd also like to reduce the amount of proc macro code generation rather than increase it.

For user-defined types it seems more feasible.

#

There's also the problem that showing an element materializes all fields

#

So it's always fully populated then

sturdy sequoia Jan 10, 2025, 9:52 PM

#

From what I can tell, the big ones are block, box, heading, grid, table, pad, place, bibliography (although there's usually only one so not worth it imo), enum, figure, figure.caption, footnote.entry, list, outline (but usually only one), outline.entry (not sure it would be worth it), underline et al. (they're huge), curve, line, path, polygon, rect et al., image

sturdy sequoia Jan 10, 2025, 9:53 PM

#

left night There's also the problem that showing an element materializes all fields

It does? 😮

#

Or by showing you mean in a show rule?

left night Jan 10, 2025, 9:53 PM

#

I mean anything that gets prepared

#

So any input to a show rule

sturdy sequoia Jan 10, 2025, 9:54 PM

#

Perhaps just using the same technique as par by using more ghost fields on the big ones?

left night Jan 10, 2025, 9:54 PM

#

So that you can access the fields in the show rule

left night Jan 10, 2025, 9:54 PM

#

sturdy sequoia Perhaps just using the same technique as `par` by using more `ghost` fields on t...

The semantic pars will get rid of the ghost fields

sturdy sequoia Jan 10, 2025, 9:54 PM

#

left night The semantic pars will get rid of the ghost fields

thonk

left night Jan 10, 2025, 9:54 PM

#

Generally ghost fields mean that the fields will not be user accessible in a show rule

#

That's not good

sturdy sequoia Jan 10, 2025, 9:54 PM

#

I mean they're still used in quite a few other elements

left night Jan 10, 2025, 9:55 PM

#

Yeah, there they will stay of course

#

I just meant for par

sturdy sequoia Jan 10, 2025, 9:55 PM

#

I kind of liked the idea of dense and full repr

#

I think it is pretty clever nonetheless :-p

left night Jan 10, 2025, 9:57 PM

#

I think we can potentially repurpose it, though in a bit different way

lost meteor Jan 10, 2025, 9:57 PM

#

I was writing something about if we had another variant that held a point to an existing full instance and then contained a list of local changes to fields until I realized that that was effectively what the style chain is already 🙃

left night Jan 10, 2025, 9:58 PM

#

Where we always only have the required fields plus a local dynamic container and that dynamic container is an immutable data structure shared with the style chain

#

Style chain would become owned and pre-fold instead of folding lazily

#

That way materialization is typically just an Arc bump

#

and we can potentially get rid of the accessors depending on the style chain

sturdy sequoia Jan 10, 2025, 10:00 PM

#

Ah I see, and when you set block(spacing: 10pt) on the style chain, it updates the structure in the style chain?

#

Then when we realize, we just store an Arc to the style chain's copy?

lost meteor Jan 10, 2025, 10:02 PM

#

left night Style chain would become owned and pre-fold instead of folding lazily

What does "pre-fold" mean?

sturdy sequoia Jan 10, 2025, 10:02 PM

#

So if I do:

set block(spacing: 10pt)
........ (imagine more code here)
set block(stroke: 1pt + black)

I would have two structures on the style chain:

one with just the spacing
a second one (deeper in the chain) with both the spacing and the stroke?

left night Jan 10, 2025, 10:02 PM

#

more or less. basically it's just a dynamic set of properties and uses the same data structure as the style chain

sturdy sequoia Jan 10, 2025, 10:02 PM

#

And we fold eagerly on the style chian?

sturdy sequoia Jan 10, 2025, 10:02 PM

#

left night more or less. basically it's just a dynamic set of properties and uses the same ...

That actually sounds pretty swell

left night Jan 10, 2025, 10:02 PM

#

the style chain as is I believe optimized for the wrong thing

#

it's cheap to amend but fairly expensive to access

#

but access is probably more common

sturdy sequoia Jan 10, 2025, 10:03 PM

#

When accessing is done a lot, but amending is cheap

left night Jan 10, 2025, 10:03 PM

#

we should have something that is cheap to access and reasonable to amend

left night Jan 10, 2025, 10:03 PM

#

lost meteor What does "pre-fold" mean?

the #[fold] thing were 1pt and red merge into 1pt + red

#

currently it's done lazily and that means we can't access such fields via borrow

#

some of them are pretty big so it's unfortunate

lost meteor Jan 10, 2025, 10:04 PM

#

So pre-fold merges them eagerly when found?

left night Jan 10, 2025, 10:04 PM

#

merge them when amending the style chain essentially

#

instead of when accessing it

sturdy sequoia Jan 10, 2025, 10:05 PM

#

I think that should be fairly doable

left night Jan 10, 2025, 10:05 PM

#

I think we can explore these ideas alongside types

sturdy sequoia Jan 10, 2025, 10:05 PM

#

For custom types it is actually fairly natural too then

left night Jan 10, 2025, 10:05 PM

#

That will need a big internal rework anyway and it gives us an opportunity to start from scratch with the macros and stuff

sturdy sequoia Jan 10, 2025, 10:10 PM

#

left night That will need a big internal rework anyway and it gives us an opportunity to st...

To be fair, that would lead to the removal of #[ghost] fields, massive simplification of the #[elem] macro and a simplification of the StyleChain

#

I dig it

#

I'm curious how quickly I could whip up something

#

Unfortunately won't have any time until Sunday

lost meteor Jan 10, 2025, 10:11 PM

#

On the proc macros, I was curious if it might make sense to do the code generation ahead-of-time? I think Rust Analyzer does this with their AST where they use a DSL and a separate build step in Cargo generates struct definitions. I was thinking that if the DSL for defining elements were to match the new custom type definition syntax for Typst, it could be a way to merge the two, and maybe even write the "default show rules" for some elements in Typst itself. Will likely have downsides in rust tooling experience though

sturdy sequoia Jan 10, 2025, 10:11 PM

#

lost meteor On the proc macros, I was curious if it might make sense to do the code generati...

Would require either a complex build script or a custom cargo CLI

#

not sure how much of a gain you'd have there

#

possible? yes, a heck of a lot of work? also yes

#

I am writing a CLI at work to do Rust <-> C++ interop and it's a 15k LOC beast

#

JUST FOR THE INTEROP, the CLI on its own is like 5k LOC too

lost meteor Jan 10, 2025, 10:12 PM

#

Oh no

sturdy sequoia Jan 10, 2025, 10:12 PM

#

Mind you, I am generating tons of C++ & rust code

#

so it's a bit weird

lost meteor Jan 10, 2025, 10:13 PM

#

Yeah, a separate build step would be implied. It's an interesting design space to think in though

lost meteor Jan 10, 2025, 10:14 PM

#

sturdy sequoia JUST FOR THE INTEROP, the CLI on its own is like 5k LOC too

A CLI to generate the interop code? Wild

left night Jan 10, 2025, 10:16 PM

#

lost meteor On the proc macros, I was curious if it might make sense to do the code generati...

what would be the benefit of doing it that way instead of via a proc macro?

sturdy sequoia Jan 10, 2025, 10:23 PM

#

lost meteor A CLI to generate the interop code? Wild

Allows for generating code in dependencies 😉

#

So essentially, a Styles would look like:

struct Styles {
  set_rules: HashMap<Element, Arc<dyn Styled>>
}

#

Each time you create a new node, you clone the hashmap (would probably not use a hashmap for this but whathever, bear with me)

#

And each time you modify an element, you either: insert a new entry into the map, or use the existing entry and fold everything

#

Of course in partice, you'd still want a style chain to avoid cloning the whole style chain every time

#

Then to find for element E, you just look up the first instance of an E in the style chain, giving you its fully folded style at that point

left night Jan 10, 2025, 10:29 PM

#

what's Styled here?

sturdy sequoia Jan 10, 2025, 10:30 PM

#

left night what's `Styled` here?

a trait that makes one of these immutable structures in the chain

#

but since it's in the chain, it has to be in a dyn T

#

could be a dyn Any, just something that the element can downcast back to a concrete type

left night Jan 10, 2025, 10:30 PM

#

so one field value basically?

sturdy sequoia Jan 10, 2025, 10:30 PM

#

Styled: Send + Sync + DynClone or something like that

sturdy sequoia Jan 10, 2025, 10:31 PM

#

left night so one field value basically?

I would make it a struct of all of the element's non-required fields

left night Jan 10, 2025, 10:31 PM

#

ah interesting, I didn't get that

#

do you think that's better than one per field?

#

I mean it does have the benefit that we can clone just the one

#

that wasn't what I was proposing, but very interesting

sturdy sequoia Jan 10, 2025, 10:32 PM

#

And then each element would look like:

struct BlockElem {
  // The non-required fields in an optional pointer
  extra: Option<Arc<BlockElemNonRequiredField>>,
  // The required fields
  body: Content,
}

sturdy sequoia Jan 10, 2025, 10:32 PM

#

left night do you think that's better than one per field?

I would think so, memory usage should still be lower than it currently is, and it makes lookup easier and faster

#

also less fragmented memory when you're actually using the elem (i.e in the Show impl, etc.)

#

since then it's all behind a single pointer instead of being behind n pointer where n is the number of fields

#

so much more cache friendly

#

More memory usage, but more efficient for the CPU

#

Also easier to impleement and understand imo

#

And we can still always borrow from the style chain!

#

So less cloning = more vroom vroom

#

I really want to work on that now 😅

#

More vroom vroom = happy me 🎉

#

I would also not store the style chain data as Values then, but as the "cast" type

#

That way you get less Value conversions overall

#

(also required for folding tbh)

#

I think that in itself would already save a lot of memory

#

And then I would store the revokation, and show rules separately

#

using two arrays, just to make things a bit easier, since they're essentially used in separate steps, I think it also makes sense

#

more allocations, but we reduce so many other allocations that it would be worth it

#

Oh, the style chain still uses Block so it's already the cast value, my bad, I misremembered

#

For liftable, those could be handled essentially the same, just adding a bitset to each elem with which properties are liftable I guess, a bit weirder I'd need to think about it some more

#

@left night What do you think of the per-element version? Or what upsides do you see for the per-field one (outside of memory usage)

#

For default values BTW, I would make the BlockElemNonRequiredField contain all Option<T> (where T is the field's type) and then get the default value from stored globals when it is None, also nicely allows detecting what was set and what was not

left night Jan 10, 2025, 10:53 PM

#

sturdy sequoia <@311948531835469827> What do you think of the per-element version? Or what upsi...

I think it definitely sounds like a viable option as well

left night Jan 10, 2025, 10:55 PM

#

sturdy sequoia For liftable, those could be handled essentially the same, just adding a bitset ...

For the elements liftable shouldn't matter

#

Only in the style chain. But yeah I guess that's what you meant.

sturdy sequoia Jan 10, 2025, 10:56 PM

#

left night Only in the style chain. But yeah I guess that's what you meant.

I thought it was per-field, is it not?

#

hence why I said a bitset, could even be baked into the BlockElemNonRequiredField for which ones are liftable

#

that way it can use the minimum size needed (i.e u8 or u16, etc.)

left night Jan 10, 2025, 10:58 PM

#

yeah it's per field

#

The fields should also be LazyHash

sturdy sequoia Jan 10, 2025, 10:59 PM

#

That honestly all sounds like a pretty good idea

left night Jan 10, 2025, 10:59 PM

#

Behind the Arc

#

It does sound promising indeed

#

Devil is probably once again in the details though

#

but who knows, maybe not

sturdy sequoia Jan 10, 2025, 11:00 PM

#

I'd be curious to see the impact on memory usage & performance

#

I'll definitely see if I can find time to try it (grossly, not optimized, likely with bugs) and see whether it provides any gains

left night Jan 10, 2025, 11:02 PM

#

yeah just a quick test to check whether it's even worth it would be cool

feral imp Jan 12, 2025, 12:53 PM

#

Alright! 24 hours has passed! Did you do it @sturdy sequoia?

sturdy sequoia Jan 12, 2025, 1:37 PM

#

feral imp Alright! 24 hours has passed! Did you do it <@130737672951037952>?

I wasn't even home yesterday

#

wtf lol

feral imp Jan 12, 2025, 1:38 PM

#

It's a risky jokey comment. Funnily enough, your response leads to a Kevin Smith reference

#

sturdy sequoia Jan 12, 2025, 2:09 PM

#

@left night do we need to hash the liftable?

#

(i.e do I need a manual impl of Hash on the style chain structs, or is a derived one fine)

#

I think they shouldn't but I am not sure

left night Jan 12, 2025, 3:36 PM

#

sturdy sequoia <@311948531835469827> do we need to hash the `liftable`?

I think yes. Any reason why we wouldn't?

sturdy sequoia Jan 12, 2025, 3:44 PM

#

left night I think yes. Any reason why we wouldn't?

My main thinking was that since the styling structure is cloned inside of the element on realization (using arc clones), the lift able field will be included and might lead to some failure at memorising?

#

Once I get there I can test both anyway

#

Same with using a linked list of styles or always owned

#

But from the elem macro, most of it is implemented already, but I am at an event so I won’t get to continue tonight, maybe tomorrow after work, don’t think I’ve got any late meetings 🤞

left night Jan 12, 2025, 4:16 PM

#

sturdy sequoia My main thinking was that since the styling structure is cloned inside of the el...

I see, you mean only in the element itself

#

Tbh, for an initial perf investigation you can just ignore liftable entirely

sturdy sequoia Jan 12, 2025, 4:21 PM

#

Sure, but I am trying to think longer term too, might become Content Rework 3D

left night Jan 12, 2025, 4:26 PM

#

For the element it shouldn't be part of the hash then I think

sturdy sequoia Jan 12, 2025, 4:52 PM

#

I mean I can do a manual implant for the element

left night Jan 12, 2025, 5:11 PM

#

sturdy sequoia I mean I can do a manual implant for the element

For now I would really just not worry about it

sturdy sequoia Jan 12, 2025, 6:35 PM

#

left night For now I would really just not worry about it

What is it used for?

#

is it for follow up set page(...)?

left night Jan 12, 2025, 7:28 PM

#

sturdy sequoia What is it used for?

it's for page wide styles

sturdy sequoia Jan 12, 2025, 8:13 PM

#

left night it's for page wide styles

What are page wide styles?

left night Jan 14, 2025, 12:50 PM

#

sturdy sequoia What are page wide styles?

see here: https://github.com/typst/typst/blob/a4ac4e656267e718a5cf60d1e959f74b2b7346f3/crates/typst-layout/src/pages/run.rs#L226-L247

#

btw, you can now have --timings on the web app!

#

#

(currently, only available in the dev compiler version, but will also be available in 0.13+)

sturdy sequoia Jan 14, 2025, 12:56 PM

#

left night

YOU CAN HAVE WHAT?

#

That's awesome! 😄

#

Those timings ended up being useful 🎉

#

That's actually awesome!

#

Just tried it 😄

lunar kettle Jan 14, 2025, 1:06 PM

#

left night

would be nice if there is some kind of success message after restarting the compiler

#

Also shouldn't a restart trigger a recompile? 🤔

#

ah nvm, I think the compilation is just so fast I can't see it 😂

left night Jan 14, 2025, 1:16 PM

#

lunar kettle ah nvm, I think the compilation is just so fast I can't see it 😂

I considered removing the preview upon restart, but wasn't bothered enough

sturdy sequoia Jan 14, 2025, 1:20 PM

#

@left night would be nice once you're done recording if it gave you a link to perfetto for viewing it

left night Jan 14, 2025, 1:21 PM

#

sturdy sequoia <@311948531835469827> would be nice once you're done recording if it gave you a ...

yeah, same as above I wasn't bothered enough 😅

#

it would also be nice if some more of the web app-side stuff had #[time] annotations

#

right now there's some blanks in there

#

but the "hard" part (adjusting typst-timing) is done now

sturdy sequoia Jan 14, 2025, 1:23 PM

#

left night but the "hard" part (adjusting typst-timing) is done now

What was needed to change?

#

I guess making it WASM compatible?

#

Wow, it's a much bigger deal than I expected 😮

#

(lookin at the PR rn)

left night Jan 14, 2025, 1:23 PM

#

I moved some stuff around

#

but mostly replacing thread_id and SystemTime::now

sturdy sequoia Jan 14, 2025, 1:24 PM

#

Makes sense, super cool stuff!

left night Jan 14, 2025, 1:24 PM

#

it's fun that you can generate a thread id by combination of a thread_local and an atomic static

#

even more fun is that wasm-bindgen uses a very similar approach to make thread locals possible in the first place

lunar kettle Jan 14, 2025, 1:46 PM

#

left night I considered removing the preview upon restart, but wasn't bothered enough

I guess it isn’t necessary, but it does at least let the user know that it actually restarted :p

#

Cool stuff though!

untold turret Jan 16, 2025, 3:16 AM

#

comemo can be pretty ineffective if there are more than one active "worlds" at the same time. I observed this fact in a crash when the language server was hot reloading fonts today.

#

The crash site is here. A typst function f uses the 472th font in some world. Then, another thread runs the function with another world which has only 17 fonts. It crashes intermediately on the self.fonts[index], complaining index out of bounds: the len is 17 but the index is 472.

    fn font(&self, index: usize) -> Option<Font> {
        self.fonts[index].get()
    }

https://github.com/typst/typst/blob/a4ac4e656267e718a5cf60d1e959f74b2b7346f3/crates/typst-cli/src/world.rs#L212-L214

#

Beyond the crash, when executing a same function call, the multiple threads using different worlds will atomically read and write a same shared comemo entry. Is it kind of ineffective?

left night Jan 16, 2025, 9:03 AM

#

untold turret The crash site is here. A typst function `f` uses the 472th font in some world. ...

yeah, that's a bit dumb. we should just do a checked .get(index)? there

left night Jan 16, 2025, 9:03 AM

#

untold turret Beyond the crash, when executing a same function call, the multiple threads usin...

not much we can do about multiple threads racing on the same call

#

if they have at least one tracked argument, we can't know ahead of time whether it's actually the same call

untold turret Jan 16, 2025, 9:04 AM

#

left night yeah, that's a bit dumb. we should just do a checked `.get(index)?` there

I did that, may be I can also make a PR to typst.

left night Jan 16, 2025, 9:05 AM

#

constraint validation may also a cause of slowdown with multiple worlds, but the discussions in #1325795747165114479 outline ways to improve that

#

#1325795747165114479 is probably a better place to discuss this btw

untold turret Mar 31, 2025, 10:18 AM

#

I probably found an interesting case to optimize. @sturdy sequoia See this function: https://github.com/touying-typ/touying/blob/3d8819c8e13fc9fc80620da26dc7ec48970c44f0/src/utils.typ#L92-L108
The function abused by touying will at least double the cost of memory in regard to comemo caching functions unconditionally. A so-called small function optimization is possible:

Only simple native functions are called in the function.
Only a few of simple if/for/import statement are executed. A for is simple if the loop times is know statically and small.
The function is not recursive.
Then we could invoke the function without caching.
I remeber you had surveyed conditional comemo, so you might be interested in it.

GitHub

touying/src/utils.typ at 3d8819c8e13fc9fc80620da26dc7ec48970c44f0 ...

Touying is a powerful package for creating presentation slides in Typst. - touying-typ/touying

sturdy sequoia Mar 31, 2025, 11:57 AM

#

untold turret I probably found an interesting case to optimize. <@130737672951037952> See this...

I have the same issues in many places in codly, so I did not use a function for it (to avoid comemo)

#

I generally think there should be a content.labelled(xxx) method @left night

#

But I think in general small function optimization would make sense

#

but it's hard to "detect" AFAIK

left night Mar 31, 2025, 1:37 PM

#

sturdy sequoia I generally think there should be a `content.labelled(xxx)` method <@31194853183...

I don't want to add new methods on content because they will need to be deprecated with content = value

left night Mar 31, 2025, 2:19 PM

#

Maybe we could have a very conservative detection of small functions that at least works in some cases.

sly pecan Mar 31, 2025, 2:35 PM

#

#let smol f(x)=

left night Mar 31, 2025, 2:42 PM

#

I think C should adopt smol as an alias for inline 😂

#

Though it wouldn't actually work well in their syntax

#

Cause smol int f() vs short int f() would have very different meanings

proud sandal Mar 31, 2025, 3:18 PM

#

Prefix return type were a mistake

sturdy sequoia Mar 31, 2025, 3:47 PM

#

sly pecan ` #let smol f(x)=`

Ok but low key

#

I like it

#

Can we have this pwease pwetty pwease @left night

#

I need it SO much in codly

#

and clearly in touying too

#

much better than annotations

left night Mar 31, 2025, 4:05 PM

#

I think coming up with a conservative complexity metric might not even be that hard

proud sandal Mar 31, 2025, 4:08 PM

#

the real time system i work on at work simply counts the steps it requires to execute a function and if it's less than n it executes it regularly

#

and the steps are just the AST nodes there

#

well, sans loops and such

sly pecan Mar 31, 2025, 4:43 PM

#

left night I think coming up with a conservative complexity metric might not even be that h...

Can we assign every operation a complexity score, with the score of a function being the sum of those?

#

Maybe too naive

left night Mar 31, 2025, 5:31 PM

#

Something like that, but loops will need special treatment

sturdy sequoia Apr 1, 2025, 9:34 AM

#

Loops & iterator functions!

#

Those need special casing too

left night Apr 1, 2025, 10:44 AM

#

sturdy sequoia Those need special casing too

though for a map it's not unlikely that it's just three items

#

in this case, it's really hard to tell

#

interestingly, if built-in functions and methods were memoized, it would be a moot point. but it would make the situation even worse at the same time.

sturdy sequoia Apr 1, 2025, 11:47 AM

#

left night though for a `map` it's not unlikely that it's just three items

yes my point is more that you have to consider them as loops too

sturdy sequoia Apr 1, 2025, 3:35 PM

#

Actually I have a good metric @left night we measure the hashing time vs the run time, if the hashing time is lower, we use memoization otherwise we don't

#

Obviously can't use the sys call for time but in native we can use the fasttime crate of whathever it's called (crate to get time using CPU shennanigans, forgot the exact name)

#

But I guess that small function detection doesn't work this way

#

because it's input dependent

#

the only solution is indeed to do it from the AST nodes

#

ugh

sturdy sequoia Apr 2, 2025, 4:21 PM

#

Ok, so I am experimenting (no promises on an actual PR), and I did the thing on AST nodes (giving each node a cost, kind of random based on my gut feeling hyperthink ), I also am using my comemo branch with enabled for comemo::memoize and the results, at least on masterproef are disappointing:

It turns out that straight up disabling memoization on closures saves (on masterproef) 60 MiB of RAM (i know i know calm your applause)
Adding the cost thing makes basically no difference at any of the random "thresholds" I have set to compile time but it does reduce memory by a similar amount as outright disabling memoization. So perhaps that's an argument (just memory)

This reinforces that eval really isn't the bottleneck in real world documents, which I think is a theme that arkons back to early days of typst optimization.

Now there may be some real-world eval-heavy documents, but I'd need one of those to actually do the work of determining whether this is worth it and finding "better" values than the ones I have chose. As an aside, if this was part of my job, I'd actually go through the trouble of doing some kind of iterative process to find the gradient on those values and descend to an (likely local) optimum, but my brain is too tired to be doing that.

#

So does anybody have good test docs for me? @untold turret @proud sandal @sly pecan @left night @lunar kettle?

#

(sowwy not sowwy for the pinwgs)

proud sandal Apr 2, 2025, 4:22 PM

#

Don't have any large eval heavy documents i think

glad urchin Apr 2, 2025, 4:46 PM

#

sturdy sequoia So does anybody have good test docs for me? <@432835220593704981> <@795342139860...

Make sure to try elembic's integration tests

#

There's one with like 5000 custom set rules

sturdy sequoia Apr 2, 2025, 4:47 PM

#

glad urchin Make sure to try elembic's integration tests

Good idea

glad urchin Apr 2, 2025, 4:47 PM

#

test/integration/many-rules.typ

sturdy sequoia Apr 2, 2025, 4:47 PM

#

I just actually ran a full instrumentation of typst, and like most of the time is spent in the grid layouter, I guess that my thesis has a lot of tables that span pages and that's bad

glad urchin Apr 2, 2025, 4:58 PM

#

sturdy sequoia I just actually ran a full instrumentation of typst, and like most of the time i...

there's probably stuff to improve there in the future

#

maybe you could share your results? that could be helpful

sly pecan Apr 2, 2025, 5:04 PM

#

sturdy sequoia So does anybody have good test docs for me? <@432835220593704981> <@795342139860...

Someone posted a large gallery of cetz drawings. That's an option

#

Or the cetz documentation itself

#

https://github.com/janosh/diagrams

GitHub

GitHub - janosh/diagrams: Diagrams of concepts in physics/chemistry/ML

Diagrams of concepts in physics/chemistry/ML. Contribute to janosh/diagrams development by creating an account on GitHub.

proud sandal Apr 2, 2025, 5:25 PM

#

cetz-plot also has a particularly slow test

#

and the tests (so far) don't need tytanic to run, they don't use any of the test library features so running typst should just work

sturdy sequoia Apr 2, 2025, 8:30 PM

#

glad urchin maybe you could share your results? that could be helpful

Unfortunately they're with the macOS built-in profiler so I don't know for sure how to extract it

#

But I love Instruments, it works so well

glad urchin Apr 2, 2025, 8:32 PM

#

sturdy sequoia Unfortunately they're with the macOS built-in profiler so I don't know for sure ...

I mean you can just send the raw thing and i can figure it out

#

Unless you mean you don't know how to send the raw thing

#

I guess screenshots also work in a worst case scenario lol

sturdy sequoia Apr 2, 2025, 9:22 PM

#

glad urchin Unless you mean you don't know how to send the raw thing

Exactly 😄

#

Still haven't found a way of getting a flamegraph

#

I kind of narrowed it down to being codly blocks in tables that are especially gnarly

untold turret Apr 3, 2025, 7:24 AM

#

sturdy sequoia So does anybody have good test docs for me? <@432835220593704981> <@795342139860...

There is a fresh "cursed" single-file document. https://raw.githubusercontent.com/jthulhu/grothendieck-fibration/049d4766a3dc9a6c662ec0b884c26ee91be7f565/notes.typ

#

It turns out that straight up disabling memoization on closures saves (on masterproef) 60 MiB of RAM (i know i know calm your applause)

Commented out the line on closures (on the "cursed" doc) then cpu goes from 6 secs to 13 secs, mem goes from 1.92GB to 506MB:

/// Call the function in the context with the arguments.
- #[comemo::memoize]
#[allow(clippy::too_many_arguments)]
pub fn eval_closure(

untold turret Apr 3, 2025, 7:51 AM

#

More info: after disabling cache on some functions, the cpu time goes from 6 secs to 5.9 secs, and max memory usage goes from 1.92GB to 905MB.

#

let file_id = func_site.id();
  if file_id.is_some_and(|file_id| {
      let file_name = file_id.vpath().as_rooted_path().to_str().unwrap();
      file_name.ends_with("utils.typ")
      || file_name.ends_with("util.typ")
          || file_name.ends_with("complex.typ")
          || file_name.ends_with("bezier.typ")
          || file_name.ends_with("vector.typ")
          || file_name.ends_with("matrix.typ")
          || file_name.ends_with("coordinate.typ")
          || file_name.ends_with("grouping.typ")
  }) {
      return eval_closure_inner(..);
  }

  eval_closure_cached(..)

sturdy sequoia Apr 3, 2025, 12:59 PM

#

untold turret More info: after disabling cache on some functions, the cpu time goes from 6 sec...

That's actually super useful!

#

Because I can use this to figure out a good score treshold!!

#

BTW, I also found tons of small optimizations throughout the code base 🙂

#

They save basically zero time

#

but they do improve it by like ~2% repeatably 🤷‍♂️

sturdy sequoia Apr 3, 2025, 1:02 PM

#

untold turret > It turns out that straight up disabling memoization on closures saves (on mast...

With the current settings I have it's 6.95s at 494MB

#

(on my MPB so i need to check the "native" case)

#

Compares to 4.8s at 2 GB for main

#

branch         time    mem 
main           4.77    2 GiB
score (1000)   11.08   390 MiB
score (200)    7.22    500 MiB
score (100)    5.88    650 MiB
score (50)     5.71    700 MiB
score (40)     5.44    740 MiB
score (30)     5.48    740 MiB
score (20)     5.10    830 MiB
score (10)     4.60    1.1 GiB
score (5)      4.68    1.2 GiB

(memory obtained from `typst watch ...` time via `hyperfine typst ...`)
(ran on my M4 Max MBP, lots of other stuff running)

#

Clearly need to optimize scores

#

So clearly the score approach saves TONS of memory

#

I have the inkling that allowing the smol function definition would still be better but heh

sturdy sequoia Apr 3, 2025, 1:19 PM

#

sturdy sequoia ``` branch time mem main 4.77 2 GiB score (1000) 11.0...

Ok, I ran a proper sweep

#

So you can see that we can save time & memory for small functions

#

I'll now update it with hyperfine but it'll take a while

untold turret Apr 3, 2025, 1:41 PM

#

what about the performance on incr compile? I think it should affect a bit if we perform SFO (small func opt) on functions which have T_cache < T_exec, where T_cache is the time to find and return a cached result and T_exec is the time to execute the function.

sturdy sequoia Apr 3, 2025, 1:42 PM

#

Here it's cold compile only

#

Don't have time to do incremental right now (I am literally in the office working lol)

sturdy sequoia Apr 3, 2025, 1:44 PM

#

sturdy sequoia ``` branch time mem main 4.77 2 GiB score (1000) 11.0...

Updated with time from hyperfine

untold turret Apr 3, 2025, 1:45 PM

#

sturdy sequoia Don't have time to do incremental right now (I am literally in the office workin...

kind of like me. I oftenly play a joke saying that I made my useful tools (tinymist etc.) when I'm compiling my c++ and rust projects in my office.

untold turret Apr 3, 2025, 1:46 PM

#

sturdy sequoia Updated with time from hyperfine

looks pretty good.

shy sage Apr 3, 2025, 2:18 PM

#

sturdy sequoia With the current settings I have it's 6.95s at 494MB

M4 Ultra MBP the secret processor that gives you superpowers 😆

sturdy sequoia Apr 3, 2025, 2:35 PM

#

shy sage `M4 Ultra MBP` the secret processor that gives you superpowers 😆

Sowwy M4 Max *

sturdy sequoia Apr 3, 2025, 2:52 PM

#

I also tried on masterproef and I can't really tell any difference, memory usage just slightly goes down, but compile times stay around the same, just slightly slower

#

So the threshold seems document dependent

#

I should try on one of my larger slide decks

untold turret Apr 3, 2025, 3:18 PM

#

i would like to profile the memory objects but the tool told me most were comemo entries and I didn't get more information.

sly pecan Apr 3, 2025, 3:19 PM

#

I think lower memory usage by itself is useful even if it doesn't improve speed

sly pecan Apr 3, 2025, 3:23 PM

#

sturdy sequoia I also tried on masterproef and I can't really tell any difference, memory usage...

I'm wondering if the difference would be bigger with very slow ram

molten kayak Apr 3, 2025, 3:37 PM

#

sly pecan I think lower memory usage by itself is useful even if it doesn't improve speed

If your pc ends up having to swap then it definitely does

sly pecan Apr 3, 2025, 3:39 PM

#

molten kayak If your pc ends up having to swap then it definitely does

I'm more concerned about the very memory constrained web app

#

But yeah

sturdy sequoia Apr 3, 2025, 5:39 PM

#

sly pecan I'm wondering if the difference would be bigger with very slow ram

It's true that this thing [the MPB] has insane memory bandwidth

sturdy sequoia Apr 3, 2025, 5:40 PM

#

sly pecan I think lower memory usage by itself is useful even if it doesn't improve speed

I think it's likely useful

#

I made the threshold parameterized with a ENV var

#

called TYPST_SCORE in my branch

#

I'll publish it tonight if I remember to do so once I am home

#

I will also try it on some of my work slidedecks but obviously I cannot share those for reproducibility 😦 Can't even send them to my Desktop to have a second point 😦

sturdy sequoia Apr 3, 2025, 5:54 PM

#

sturdy sequoia ``` branch time mem main 4.77 2 GiB score (1000) 11.0...

@left night what do you think of those results? For my thesis I can reduce memory usage a bit, but most of the memory is table layout so it's less impactful there 😦

left night Apr 3, 2025, 6:19 PM

#

sturdy sequoia <@311948531835469827> what do you think of those results? For my thesis I can re...

looks promising at first glance

sturdy sequoia Apr 3, 2025, 7:54 PM

#

left night looks promising at first glance

I agree, but I need to test more docs & actually run it automated because doing it by hand is a huge pain 😭

sturdy sequoia Apr 3, 2025, 8:18 PM

#

To add one more data point: for a large slide deck of mine (using Touying), I get a reduction of ~100 MiB of RAM for 62 ms increase in cold compile

#

This weekend (iff I have the time) I might try and measure incremental times & memory for each and do a nice set of plots, but no promises

#

But on my slide deck at least, there's no decrease in hot compile time but a decrease (after one slide removal then re-adding it) of 120 MiB of memory usage, so definitely seems interesting

#

I wonder if we could reach better results with the smol indicator

sturdy sequoia Apr 3, 2025, 8:51 PM

#

Actually not caching contextual blocks also seems to help

#

but I'll need to measure it in more details

sly pecan Apr 3, 2025, 8:52 PM

#

sturdy sequoia Actually not caching contextual blocks also seems to help

How are you determining whether something should be cached or not currently?

sturdy sequoia Apr 3, 2025, 9:12 PM

#

sly pecan How are you determining whether something should be cached or not currently?

I give it a score (done by traversing the AST when defining the closure), the score is:

- 1 per AST-node
- 10 per function call (I want to refine this to specifically array functions (map, filter, etc.) and +1 for others but not done yet)
- 1 per AST node in for loop iterable + 10 * body's cost for for-loop
- 10 * (1 per AST node in condition + body's cost) for while-loop
- 100 for module imports (not sure if needed tbh)

#

If the score is below the threshold then the function doesn't get memoized

#

(note that the scores could be changed, they're just what "felt" right lol)

#

I think that constants should likely have zero cost

sly pecan Apr 3, 2025, 9:24 PM

#

sturdy sequoia (note that the scores could be changed, they're just what "felt" right lol)

I think one would have to actually measure to determine what the appropriate cost should be

sturdy sequoia Apr 3, 2025, 9:25 PM

#

sly pecan I think one would have to actually measure to determine what the appropriate cos...

Yes, but good luck with that

#

There are tons of different AST nodes, assigning a cost to each of them that works is tricky

sly pecan Apr 3, 2025, 9:44 PM

#

sturdy sequoia There are tons of different AST nodes, assigning a cost to each of them that wor...

Linear regression?

sturdy sequoia Apr 4, 2025, 9:21 AM

#

sly pecan Linear regression?

I mean what I could do, but it would be horrendous, is make typst load a parameter file then use some gradient descent algorithms to find optimal costs for each but I would need loads of documents and it would take forever to run (since it would need to estimate the gradients since there would be no autograd) and truth is, while I use torch at work all the time I have never done anything "sampled" like with Stochastic Gradient Descent (https://en.wikipedia.org/wiki/Stochastic_gradient_descent) something I have head of but never used and I am unsure if it is suitable 😐

Stochastic gradient descent

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data s...

#

And it would require many documents for testing

#

rather than the three I am using right now

#

that being said, with many docs and memory & time measurements and a suitable loss function, we should be able to reach a very nice objective potentially 😄

feral imp Apr 4, 2025, 9:30 AM

#

If you can create the dataset, than others could do an analysis.. Also, God did grace humanity with other algorithms than SGD... It is more than nested nature of things that would cause an issue I think.

sturdy sequoia Apr 4, 2025, 9:36 AM

#

feral imp If you can create the dataset, than others could do an analysis.. Also, God did ...

I don't know much about this stuff :-p

#

I generally work with fully differentiable circuits (outside of quantizers which make my life difficult) and I can just use torch as-is 🤷‍♂️

sly pecan Apr 4, 2025, 12:35 PM

#

sturdy sequoia I mean what I *could* do, but it would be horrendous, is make typst load a param...

If you model the cost as linear in all ast nodes, you don't actually have to do any gradient descent. You'd just have to do a bunch of measurements (more than the number of parameters) of different combinations of weights and the resulting objective (execution time?)

sturdy sequoia Apr 4, 2025, 1:41 PM

#

sly pecan If you model the cost as linear in all ast nodes, you don't actually have to do ...

Ok, but you have like all of the different ones? or do you do it one-by-one?

sly pecan Apr 4, 2025, 2:03 PM

#

sturdy sequoia Ok, but you have like all of the different ones? or do you do it one-by-one?

I'm not sure I understand what you're asking about

lost meteor Apr 4, 2025, 4:39 PM

#

sturdy sequoia So does anybody have good test docs for me? <@432835220593704981> <@795342139860...

I have a conway's game of life simulator, but its probably bottlenecked in either layout or being slow from copy on write:
https://typst.app/project/riswjh0KOErxkrhx1UTHQj

#

(You can scale N to make it as big as you want)

sturdy sequoia Apr 4, 2025, 5:00 PM

#

lost meteor I have a conway's game of life simulator, but its probably bottlenecked in eithe...

that's cursed in the best way 😄

#

Lots of small functions too!

sly pecan Apr 4, 2025, 5:03 PM

#

sturdy sequoia that's cursed in the best way 😄

The ray tracer is also an option

sturdy sequoia Apr 4, 2025, 5:47 PM

#

Ok, I automated benchmarking and it's running benchmarks on three docs atm, will check back in once they're done (likely > 1 hour)

#

It includes benchmarking memory usage (so it runs watch and measure memory then kills it)

sturdy sequoia Apr 4, 2025, 7:46 PM

#

                  masterproef              notes               slides           
               time         mem      time         mem     time        mem
score                                                                    
0           3.265 s  1483.3 MiB   5.026 s  2001.2 MiB  1.478 s  811.0 MiB
5           3.673 s  1457.6 MiB   5.039 s  1862.0 MiB  1.482 s  794.7 MiB
10          3.638 s  1427.0 MiB   4.990 s  1523.5 MiB  1.478 s  720.2 MiB
20          3.673 s  1412.7 MiB   5.454 s   899.6 MiB  1.510 s  722.3 MiB
30          3.821 s  1430.9 MiB   5.693 s   852.2 MiB  1.512 s  714.9 MiB
40          3.827 s  1412.7 MiB   6.034 s   783.0 MiB  1.515 s  710.0 MiB
50          3.832 s  1386.3 MiB   6.096 s   695.1 MiB  1.524 s  699.1 MiB
100         3.822 s  1401.9 MiB   6.018 s   623.9 MiB  1.525 s  690.9 MiB
150         3.820 s  1381.1 MiB   6.493 s   565.2 MiB  1.529 s  687.0 MiB
200         3.822 s  1370.7 MiB   6.548 s   482.6 MiB  1.528 s  684.0 MiB
250         3.815 s  1393.6 MiB   6.925 s   534.3 MiB  1.530 s  688.8 MiB
500         3.820 s  1377.2 MiB   9.650 s   404.5 MiB  1.538 s  703.0 MiB
1000        3.818 s  1368.6 MiB  11.108 s   373.9 MiB  1.540 s  692.9 MiB
1000000     4.075 s  1352.0 MiB  16.124 s   360.8 MiB  1.663 s  663.2 MiB

#

So, notes on this:

ran with -j 1
masterproef = my thesis, notes = the notes.typ posted here earlier, slides = a slide deck from my job
ran on a Mac Book Pro M4 Max 128 GB RAM
a score of 0 means all functions are memoized
a score of 1000000 means that (essentially) none are memoized

sly pecan Apr 4, 2025, 9:55 PM

#

So basically not worth it?

feral imp Apr 4, 2025, 9:59 PM

#

sly pecan So basically not worth it?

I'm getting the opposite translation. Everything is memoized right now. If nothing is memoized, then some things are improving by a large margin in memory, but not in time. There is a clear trade-off occurring.

#

I'm basically looking at it as the score being the Treatment, and these three are different patients.. God a survival curve would be nice here.

sly pecan Apr 4, 2025, 10:06 PM

#

feral imp I'm getting the opposite translation. Everything is memoized right now. If nothi...

In all instances the best performance is achieved with basically everything memoized

#

Though for one of the documents in particular there is a big memory tradeoff

feral imp Apr 4, 2025, 10:10 PM

#

Slides gained 148 MiB for loss in time by 0.185 s. I think that's also something.

sturdy sequoia Apr 5, 2025, 10:07 AM

#

sly pecan So basically not worth it?

To me it’s also the opposite, small scores provide a nice decrease in memory usage at almost now performance cost

#

Therefore it’s a valid thing. The problem is that most of the savings is in eval which isn’t the majority of work. Layout is

#

I think that if time is what we want to save then eval is the wrong place to look

feral imp Apr 5, 2025, 10:10 AM

#

Again, I might be terribly shortsighted, but the memory usage is too much for my nerves. The notes thing goes from 2 gigs to 360 MiB.. I think this needs to be dealt with somehow.

sturdy sequoia Apr 5, 2025, 10:32 AM

#

sturdy sequoia Apr 5, 2025, 10:32 AM

#

feral imp Again, I might be terribly shortsighted, but the memory usage is too much for my...

I agree, small function memoization is a big issue, actually I could simplify codly a lot if we had small function memoization

#

but I literally inlined EVERY SINGLE SMALL FUNCTION in the main file to reduce memory usage and execution time

#

I LITERALLY MADE PRE-PROCESSOR MACROS TO DO THIS

#

And honestly looking at these plots, a score of 20 sounds delightful

left night Apr 5, 2025, 10:36 AM

#

sturdy sequoia And honestly looking at these plots, a score of 20 sounds delightful

I think these plots would be easier to judge if the time and memory would start at zero at the bottom

sturdy sequoia Apr 5, 2025, 10:36 AM

#

left night I think these plots would be easier to judge if the time and memory would start ...

Aye can do that

left night Apr 5, 2025, 10:36 AM

#

The way it is right now, the slope is not really meaningful

sturdy sequoia Apr 5, 2025, 10:38 AM

#

@left night

#

Obviously my thesis is a special case because it's all table layout 😄

#

For the notes.typ a score of 20 gets you half the RAM for basically the same performance and on my slides a nice 100 MiB reduction for the same performance

feral imp Apr 5, 2025, 10:46 AM

#

Is it possible to get more granular measurements around 20?

sturdy sequoia Apr 5, 2025, 10:49 AM

#

feral imp Is it possible to get more granular measurements around 20?

Sure, give me like 6 hours for it to run 😭

#

These things are sloooooow

feral imp Apr 5, 2025, 10:49 AM

#

I tried a bunch of stuff with the data you provided yesterday, and it all came out non-sense, or atleast non-sense to me. I think we don't have enough around the infliction point (also the 2 responses is new to me 😦 )

sturdy sequoia Apr 5, 2025, 10:51 AM

#

Should take around ~1 hour

#

I also think I should try various scores for different AST nodes

#

but that becomes painful

left night Apr 5, 2025, 10:54 AM

#

sturdy sequoia <@311948531835469827>

Thanks, that's much easier to read for me

#

It does indeed look like we can get a nice amount of memory saving

#

I think for improving performance rather than memory usage, the key is to reduce function call overhead more generally, and not memoization specifically

sturdy sequoia Apr 5, 2025, 10:56 AM

#

I think that if we do this, I can make codly have functions again and we will see a nice reduction in masterproef too since codly will be memoizable again (which I explicitely made it not be)

sturdy sequoia Apr 5, 2025, 10:57 AM

#

left night I think for improving performance rather than memory usage, the key is to reduce...

Well memoization causes large memory usage, but in some cases, it's not strictly necessary, I still think having a smol keyword could lead to more significant benefits, but I can see the argument against manual control here

#

I'll actually add raytracer & conway's game of life too

#

since they're more eval heavy

left night Apr 5, 2025, 10:58 AM

#

What's could also be a worthwhile optimization is a move optimization

#

Where a value is moved instead of cloned on its last use

#

That way, things like arr = push-one(arr) could reuse the array instead of doing clone-on-write all the time.

#

But that would require two-phase eval

#

Similar things with not re-evaluating constants all the time

#

I think this goes into a similar direction as things you explored in your bytecode VM

sturdy sequoia Apr 5, 2025, 10:59 AM

#

left night Where a value is moved instead of cloned on its last use

I KNOW

#

On my branch I have also been hunting usages of .clone() and .cloned() throughout the code base

#

I actually manage to save some compile time there too

sturdy sequoia Apr 5, 2025, 11:00 AM

#

left night But that would require two-phase eval

The VM could do it while compiling hyperthink

#

hyperthink

#

(is it time for VM3?)

proud sandal Apr 5, 2025, 11:01 AM

#

it'll work this time

sturdy sequoia Apr 5, 2025, 11:01 AM

#

proud sandal it'll work this time

I mean it always worked

#

but for real docs, the compile time just removes the benefits

#

outside of memory usage iirc

#

but with move semantics, more borrowing, & small function optimization? could be interesting 😄

#

I also still think that if we cached package compilation it would be 🔥

#

Like when first downloading a package, it gets cache compiled locally

#

wouldn't that be neat

proud sandal Apr 5, 2025, 11:03 AM

#

it would indeed, but then you'd need a stable ABI of sorts

cunning wadi Apr 5, 2025, 11:04 AM

#

sturdy sequoia Like when first downloading a package, it gets cache compiled locally

*especially when type checking is a thing

cunning wadi Apr 5, 2025, 11:04 AM

#

proud sandal it would indeed, but then you'd need a stable ABI of sorts

I don't think you would?

sturdy sequoia Apr 5, 2025, 11:04 AM

#

cunning wadi *especially when type checking is a thing

That would probably be very useful for custom type performance too 🤷‍♂️

cunning wadi Apr 5, 2025, 11:04 AM

#

rust doesn't have one and still does that

left night Apr 5, 2025, 11:04 AM

#

sturdy sequoia The VM could do it while compiling <:hyperthink:707668334442709003>

yep

sturdy sequoia Apr 5, 2025, 11:05 AM

#

proud sandal it would indeed, but then you'd need a stable ABI of sorts

The INSR wouldn't even need to be stable, it could have a header containing the typst version

#

and just re-cache if the typst version has changed 🤷‍♂️

proud sandal Apr 5, 2025, 11:05 AM

#

right sure that works

left night Apr 5, 2025, 11:05 AM

#

I still feel like it would be possible to make these optimizations as incremental improvements which could turn more and more VM-style instead of outright replacing eval immediately

left night Apr 5, 2025, 11:06 AM

#

sturdy sequoia The INSR wouldn't even need to be stable, it could have a header containing the ...

Not stable, but serializable

sturdy sequoia Apr 5, 2025, 11:06 AM

#

left night I still feel like it would be possible to make these optimizations as incrementa...

true, but at some point you'd still have a CHONKY PR that actually implements bytecode VM

left night Apr 5, 2025, 11:06 AM

#

Which already is a bit of pain

sturdy sequoia Apr 5, 2025, 11:06 AM

#

But only spans are a pain, right?

left night Apr 5, 2025, 11:06 AM

#

Also the constant pool

#

Which contains arbitrary values sort of

#

Especially if you constant fold a ton (because we can do that!)

#

Anything that is well-known during compile, could as compiled and run right away

#

I think it would be sort of a hybrid

sturdy sequoia Apr 5, 2025, 11:07 AM

#

Not really, at least not with how the "old" VM worked, the constants were strings, numbers, etc. not dict, arrays, etc.

left night Apr 5, 2025, 11:07 AM

#

But they could be

sturdy sequoia Apr 5, 2025, 11:07 AM

#

Yes

left night Apr 5, 2025, 11:08 AM

#

An interleaved/hybrid compiler/interpreter that always runs things as soon as it knows enough would be interesting I think

#

This way, things run the least amount of times necessary

#

And it's only possible because of all the guarantees Typst can make

#

Compared to e.g. JS

sturdy sequoia Apr 5, 2025, 11:08 AM

#

Sure

#

But then you'd have both an eval & compile loop

left night Apr 5, 2025, 11:08 AM

#

loop?

sturdy sequoia Apr 5, 2025, 11:09 AM

#

compiler *

#

Essentially we'd have two evals: eval, compiler + vm

#

with three components

#

or I guess we could vm-eval instructions as we go

left night Apr 5, 2025, 11:10 AM

#

sturdy sequoia or I guess we could vm-eval instructions as we go

yeah, I think that's closer to what I meant

#

it's a bit like Zig works I guess, edit: though, maybe no

sturdy sequoia Apr 5, 2025, 11:10 AM

#

We could even keep the state of the VM as we go actually

#

and then resume with a prepared VM

left night Apr 5, 2025, 11:12 AM

#

as for a VM, I'm not sure how the instruction set of yours looked like

#

but I wonder whether higher-level instruction would be more the way to go than emitting lots of low-level ones

sturdy sequoia Apr 5, 2025, 11:13 AM

#

left night but I wonder whether higher-level instruction would be more the way to go than e...

Definitely

#

I think the low-level ones were too complex

left night Apr 5, 2025, 11:13 AM

#

I think the performance problems of the current eval is not even really the interpretation

sturdy sequoia Apr 5, 2025, 11:13 AM

#

it's traversing the tree

#

which is super inneficient

left night Apr 5, 2025, 11:13 AM

#

It's

the AST traversal (especially since it's a CST)
the lack of optimizations
the function call overhead

#

but it could even be okay to compile a while loop to a while loop instruction rather than some jump

sturdy sequoia Apr 5, 2025, 11:14 AM

#

I think the original VM (due to being low level) was too slow for making args too

sturdy sequoia Apr 5, 2025, 11:14 AM

#

left night but it could even be okay to compile a while loop to a while loop instruction ra...

Agreed

#

The old vm was too low-level and as a consequence too complex

left night Apr 5, 2025, 11:15 AM

#

especially since set & show can't be well expressed at a low level anyway

sturdy sequoia Apr 5, 2025, 11:15 AM

#

field access is also very messy in Typst 😦

#

Because it can be mutable or not

#

and it's hard to know

#

because of mutable methods

left night Apr 5, 2025, 11:16 AM

#

yeah

#

mutable methods are also a thorn in my side

#

wish we could get rid of them

sturdy sequoia Apr 5, 2025, 11:17 AM

#

If we had type information (which we can't realistically)

#

we could at least do it more easily

#

but in practice we don't and we can't have multiple mutable references because of rust

left night Apr 5, 2025, 11:18 AM

#

but in practice we know with 95% certainty statically whether it's mutable or not, just from the name

#

perhaps that is enough to make it fast

#

the 5% can be a slow path

sturdy sequoia Apr 5, 2025, 11:19 AM

#

true

#

I need to remind myself of how I had done it

#

https://github.com/Dherse/typst/blob/bytecode-vm2/crates/typst/src/lang/compiler/access.rs

GitHub

typst/crates/typst/src/lang/compiler/access.rs at bytecode-vm2 · D...

A new markup-based typesetting system that is powerful and easy to learn. - Dherse/typst

#

It's... overly complicated

#

One thing I could do in the VM is have one "mutable reference register" that uses some unsafe code to work, but has checks to make it safe

#

allowing to make accesses just chains of mutable in that register

left night Apr 5, 2025, 11:22 AM

#

what is the concrete problem you're trying to tackle right now?

sturdy sequoia Apr 5, 2025, 11:22 AM

#

left night what is the concrete problem you're trying to tackle right now?

mutable access

#

I mean not right now, I am just brainstorming

#

mutable access requires you to have a mutable reference to a value of the VM (self-referential mutable ref)

#

and then be able to "chain" those self-referential mutable ref (for nested access)

left night Apr 5, 2025, 11:25 AM

#

sturdy sequoia I mean not right now, I am just brainstorming

I meant right now as-in brainstorming

left night Apr 5, 2025, 11:25 AM

#

sturdy sequoia mutable access requires you to have a mutable reference to a value of the VM (se...

So you want &mut Value and &mut Vm passed to the same function and the value resides in the Vm?

sturdy sequoia Apr 5, 2025, 11:25 AM

#

yep

#

That was needed in the VM

left night Apr 5, 2025, 11:26 AM

#

ah okay

#

in current eval, it somehow works without that

sturdy sequoia Apr 5, 2025, 11:26 AM

#

yes which is nice

left night Apr 5, 2025, 11:26 AM

#

but I remember it being a bit tricky to deal with that

sturdy sequoia Apr 5, 2025, 11:26 AM

#

but the VM would break it down in several INSR which would kind of be needed

#

what I ended up doing was a single INSR for accesses but it was so slooooow

#

and so overly complicated

left night Apr 5, 2025, 11:28 AM

#

sturdy sequoia I think that if time is what we want to save then eval is the wrong place to loo...

I think honestly improving eval performance is less about making documents go faster now, and more about enabling package authors to build more complex things without worrying about making documents go slower.

#

Things like cetz, lilaq, or codly

#

Like, you shouldn't have to inline all the functions by hand just to make it go a bit faster

#

That's just frustrating. Especially for Rust developers that are so used to more or less zero cost abstractions.

sturdy sequoia Apr 5, 2025, 11:34 AM

#

left night Like, you shouldn't have to inline all the functions by hand just to make it go ...

True

#

But it's tricky to figure out how to improve things

#

clearly disabling memoization helps but it doesn't completely solve it

sturdy sequoia Apr 5, 2025, 1:09 PM

#

It's still running but I can already tell that the results for @atomic violet's raytracer are interesting 😂

#

@left night @feral imp There are less samples so the data is more noisy making it less useful 😦

#

I'll run it with more points per-score tonight

untold turret Apr 5, 2025, 1:29 PM

#

@sturdy sequoia For the convey program, you could annotate the exact functions filtered by the scoring algorithm at the critical points on figures. This sounds meaningful to me.

sturdy sequoia Apr 5, 2025, 2:30 PM

#

untold turret <@130737672951037952> For the convey program, you could annotate the exact funct...

I'll try that 😉

sturdy sequoia Apr 6, 2025, 10:16 AM

#

Benchmarking the raytracer is... interesting 😭

#

Also, it's an interesting doc because you can really see that all of the small function don't get cached which saves time & memory. I am not so surprised because when I profiled the raytraycer, most of the time was spent hashing arrays

#

At least this evening we should have the new plots, I'll also make some zooms in the 10-50 range for scores

sturdy sequoia Apr 6, 2025, 1:44 PM

#

@feral imp @left night @sly pecan Noise free data

#

Zoom from score = 0 to score = 50

#

To me it still shows that a score in the ~20 neighborhood is best in most cases, even masterproef

sly pecan Apr 6, 2025, 1:48 PM

#

It's free real estate

sturdy sequoia Apr 6, 2025, 1:48 PM

#

Better scaling on zoomed plots 😉

#

masterproef Δt   = +13.94%
masterproef Δmem = -3.97%  (-58.06 MiB)
---------------------------------------------
notes.typ   Δt   = +8.80%
notes.typ   Δmem = -55.02% (-1099.09 MiB)
---------------------------------------------
slides      Δt   = +2.32%
slides      Δmem = -11.73% (-94.02 MiB)
---------------------------------------------
raytracer   Δt   = +7.30%
raytracer   Δmem = -30.78% (-850.86 MiB)
---------------------------------------------
conway      Δt   = -38.07%
conway      Δmem = -62.22% (-273.55 MiB)

#

With a proposed score of 20, this leads to those change in execution time (+ means it increased, - means it decreased)

#

I need to make a version of codly with the small functions again and test masterproef with that version of codly

#

I am pretty sure it would show a massive difference

#

I also find it fascinating that the conway impl has so many small function the compile time actually decreases

#

@untold turret my slides use touying but I am surprised by the increase compile time rather than decrease :/

sturdy sequoia Apr 8, 2025, 11:05 AM

#

So any comments @left night? 😄

sturdy sequoia Apr 8, 2025, 11:21 AM

#

I've also added a feature (which likely won't be in the PR -- if we get there -- or maybe?) that produces tha score for each closure when --timings are enabled

#

And looking at it, I have a request (which might not get implemented) for Laurenz: we need a matrix API (at least 2D matrices, not generic tensors) because most methods in cetz in matrix.typ have huge scores but I think they would be orders of magnitude faster if they were native

#

and I think that there is an argument to be made that for Typst, a typographical tool, to have a matrix & basic transform API built-in for drawing libraries

#

For example:

{
  "span": "@preview/cetz:0.3.4/src/matrix.typ:231",
  "name": "mul-mat",
  "cost": 25288
},
{
  "span": "@preview/cetz:0.3.4/src/matrix.typ:281",
  "name": "mul-vec",
  "cost": 1437
},
{
  "span": "@preview/cetz:0.3.4/src/matrix.typ:298",
  "name": "inverse",
  "cost": 264652
},

#

remember that each AST node is worth 1 "cost", and a function call is worth 1 unless it's an array function in which case it's worth 10, loops are worth 10 too

#

So a score of 264k means this is a chonker

#

(nested loop mean that for each level it's squared, so a loop in a loop has a cost of 100, a loop in a loop in a loop a score of 1000)

sturdy sequoia Apr 8, 2025, 11:43 AM

#

Or if we want to have this being possible to make with plugins, I think an opaque API that allows plugin to have managed memory (i.e a Value that points to a value inside of the plugin) that way we can avoid the back-and-forth copying of values

#

That would actually be kind of awesome

#

With a simple API for freeing them (an opaque could have a reference counter) and then a simple free(type_id, opaque_addr) where type Opaque = (RefCounter, type_id, opaque_addr) where the type_id is also opaque and defined by the plugin, and the opaque_addr is just a 32-bit address in the plugin's memory

#

Technically this breaks purity, but if we assume the plugin is pure, then it should work

#

(BTW this could be implemented already, but would lack deallocation since there is no way of freeing a value)

#

That would make a numtyp to be possible to build in Typst using a plugin for better performance

left night Apr 8, 2025, 12:02 PM

#

sturdy sequoia So any comments <@311948531835469827>? 😄

It looks promising! Picking exactly 20 might be overfitting to raytracer and conway a bit, but it's definitely a good start.

left night Apr 8, 2025, 12:03 PM

#

sturdy sequoia Technically this breaks purity, but if we assume the plugin is pure, then it sho...

I think the purity constraints might cause too many problems here, also with plugin multithreading and such.

sturdy sequoia Apr 8, 2025, 12:04 PM

#

left night I think the purity constraints might cause too many problems here, also with plu...

I hadn't considered the multithreadedness, but I think that making it required that they're immutable would be okay, no? Worst case scenarios, plugins will break...

#

(the plugins doing things wrong *)

left night Apr 8, 2025, 12:06 PM

#

sturdy sequoia I hadn't considered the multithreadedness, but I think that making it required t...

Well that would be a bad worst case :p

left night Apr 8, 2025, 12:06 PM

#

sturdy sequoia (the plugins doing things wrong *)

okay haha

sturdy sequoia Apr 8, 2025, 12:07 PM

#

left night okay haha

Yeah I saw "laurmaedje typing..." and realized I was missing a key point

#

lol

left night Apr 8, 2025, 12:07 PM

#

I'm not sure right now tbh. Maybe it could work.

#

The thing is that free is sort of inherently a mutable operation

sturdy sequoia Apr 8, 2025, 12:07 PM

#

I like the ability of building numtyp with actual performance

sturdy sequoia Apr 8, 2025, 12:07 PM

#

left night The thing is that `free` is sort of inherently a mutable operation

yes 😦

#

If we couldn't free things (and we assume immutability) it would kind of work

#

but we also need to be able to free memory lol

left night Apr 8, 2025, 12:09 PM

#

It's a bit of a question of whether it's more worthwhile to try and reduce plugin overhead to the point where many small calls are cheaper or whether it's more worthwhile to optimize performance of numeric Typst code.

#

With enough dev time, I think Typst could do a ton of optimistic optimizations on numeric code

sturdy sequoia Apr 8, 2025, 12:10 PM

#

I think that having a vector + matrix API would already do most of the heavy lifting

sturdy sequoia Apr 8, 2025, 12:10 PM

#

left night With enough dev time, I think Typst could do a ton of optimistic optimizations o...

Probably yes

sturdy sequoia Apr 8, 2025, 12:20 PM

#

left night With enough dev time, I think Typst could do a ton of optimistic optimizations o...

To be fair, that's also an argument for compiling to bytecode, we can do far more optimizations than in an AST-based interpreter AFAIK

left night Apr 8, 2025, 12:45 PM

#

sturdy sequoia To be fair, that's also an argument for compiling to bytecode, we *can* do far m...

Yes, though I think it doesn't have to be bytecode per se

#

We can be creative with whatever kind of data structure we think serves our purposes best

#

There might be some different trade-offs in Typst than in your average programming language

#

(Or maybe I just like to reinvent the wheel 😂 )

lament fulcrum Apr 8, 2025, 12:47 PM

#

Tbf the concept of typesetting languages and their optimizations is relatively unexplored so far I'd say lol

#

Since it doesn't quite fit in the typical compiler model

left night Apr 8, 2025, 12:47 PM

#

Yes, definitely. And even among normal PLs, an imperative, pure language with value semantics is one of the rarer things.

sturdy sequoia Apr 8, 2025, 1:18 PM

#

left night Yes, though I think it doesn't have to be bytecode per se

yes bytecode is a bad term

#

I agree

#

just some "lower-level and linear" representation

#

linear = not a tree, just a big array of ops

#

I wonder if by making the structure higher-level (and therefore removing the issue of slow compilation) we can see a benefit

#

because eval was ~4x faster but compilation often made it impractical

#

but if we could have partial compilation (i.e just like partial re-parsing -- forgot the proper name) & higher-level struct that lead to smaller compilation time we might be able to benefit more from it

#

incremental parsing? therefore incremental compilation?

#

could be a cool fun area to explore

left night Apr 8, 2025, 1:55 PM

#

agreed

sturdy sequoia Apr 8, 2025, 2:21 PM

#

Also, by not doing a register-based VM but a 'slot' (i.e local vars) based system, identifying the last use of a 'slot' (or even an unused slot) should be trivial by doing a first pass over all of the ops, and replacing the last one with a "move" version of the operation

#

Therefore doing value moving

#

We can even make it easier by in the compiler having a list of 'slots' with the 'last use' stored in it, then we just replace all of the 'last use' and if it's none then because no side effects -> we yeet the entire slot

#

then once compiled we don't carry that information

#

Also by not being register based it should be much easier to debug since each var points to one 'slot'

#

and temporary values can use registers still or just 'short lived' slots that are move-only

#

Because in the original VM I allowed reference to registers because variables could be in registers

#

but if it's only ever intermediary values, then they're always moved never borrowed

#

Cool optimization of separating them!

#

So we can probably save quite a few clones!

#

Ugh, I am working on this after work 😂

#

VM 3? 🚀

#

VM1
VM2: Electric Boogaloo
~~VM3: Photonic Boogaloo?~~

sly pecan Apr 8, 2025, 2:27 PM

#

We've been through this before

#

VM3D

sturdy sequoia Apr 8, 2025, 2:27 PM

#

sly pecan VM3D

HOW COULD I FORGET

#

VM3D

untold turret Apr 8, 2025, 3:23 PM

#

I guess we could continue making some hacking on compiler to figure out the bottleneck of packages. we might gain a lot without needing to compile all functions but a small number of ones from existing packages. cetz was greatly optimized while touying didn't, reflecting thar cetz should have different workload from touying. I also remember the performance of polylux (not the touying) was greatly affected by some internal parameter, which could be a good start to learn which case is creating huge state (and bloating the size of the cache). https://github.com/Myriad-Dreamin/tinymist/issues/513#issuecomment-2322834251

GitHub

LSP Server doesn't play well with polylux with regard to memory usa...

Describe the bug This is a continuation of #419 I'm attaching a document that, when previewing, quickly scales tinymist memory usage from 500MB up to, sometimes, 5GB. The issue is very easy to ...

lost meteor Apr 8, 2025, 3:40 PM

#

left night Yes, definitely. And even among normal PLs, an imperative, pure language with va...

I've felt like we're much closer to Haskell than we have any right to be, so maybe worth studying some GHC internals

#

In particular, lazy evaluation might be worth investigating

left night Apr 8, 2025, 3:51 PM

#

lost meteor In particular, lazy evaluation might be worth investigating

That might affect semantics though, doesn't it?

#

Depending on what becomes lazy

lost meteor Apr 8, 2025, 4:05 PM

#

Not in a pure language with our semantics for state objects. Our content is exactly like Haskell IO

#

I think it will have semantic differences for performance, but not for output

left night Apr 8, 2025, 4:06 PM

#

lost meteor I think it will have semantic differences for performance, but not for output

I meant semantic difference in terms of what errors are raised

#

If the code doesn't run, it can't error

lost meteor Apr 8, 2025, 4:07 PM

#

left night I meant semantic difference in terms of what errors are raised

Ah, maybe

#

That is the one non-pure effect that we have

left night Apr 8, 2025, 4:09 PM

#

It's not necessarily a side effect. You can model it solely with return values after all. But it still affects lazyness.

#

Though I guess you can model everything in a pure way if you really want to

untold turret Apr 8, 2025, 4:10 PM

#

the compiler could change semantics on raising error if necessary I think. you can ensure nothing but only raise ones during computing.

untold turret Apr 8, 2025, 4:22 PM

#

lost meteor In particular, lazy evaluation might be worth investigating

A problem is probably that being closer to fp doesn't quite mean get better performance. there has been some laziness in typst. i'm interested if there is some hopeful abstract ion (to motivate investigation) could help typst be more lazy and get benefit.

sturdy sequoia Apr 8, 2025, 5:24 PM

#

Ok, so I just started work on VM3D, and I managed to find a new way of writing the compiler that no longer requires recursive iteration to find nesting, joining, etc. which already significantly simplifies the compiler

sturdy sequoia Apr 8, 2025, 5:59 PM

#

@left night Wanted to quickly run this design by you since it involves one unsafe line:

Essentially for mutable access, I am solving it by "locking" the VM (i.e preventing reads & writes to slots = local vars)
Then I recursively descend the access (does not involve running any instructions but Call instructions which don't need to read slots, only the stack)
This allows me to alias a pointer on the slots (local vars) safely

/// Locks the VM to get mutable access to one of its slots.
/// 
/// During the lock, no other access whether mutable or immutable can be done
/// to any other slots.
pub fn lock<F>(
    &mut self,
    slot: usize,
    method: impl FnOnce(&mut Self, &mut Value)
) where F: FnOnce(&mut Self, &mut Value) {
    assert!(!self.lock.get(), "trying to lock a locked VM");
    self.lock.set(true);

    // Safety:
    // - we have asserted that only one mutable reference to the VM exists (&mut self)
    // - we have asserted that the VM was not locked before (!self.lock.get())
    // - we know the reference cannot escape the closure
    // - we know there will be no other aliases (locked)
    // - we have exclusive lock on the VM `self.lock.set(true)`
    let slot = unsafe {
        let ptr = &mut self.slots[slot] as *mut Value;
        &mut *ptr
    };

    method(self, slot);

    self.lock.set(false);
}

#

(this is a method implemented on the new Vm)

proud sandal Apr 8, 2025, 6:13 PM

#

You should note that data can escape the closure if you add some lifetime bounds later, this should be avoided by whomever refactors this in the future.

#

specifically if you add a lifetime 'f : 'v with 'F: 'f and &'v mut Value you can extract the reference.

left night Apr 8, 2025, 6:16 PM

#

sturdy sequoia <@311948531835469827> Wanted to quickly run this design by you since it involves...

essentially you've implemented a mutex now, right?

#

can't you just put the slots into a mutex or refcell?

#

and under normal circumstances you call RefCell::get_mut to skip the overhead

sturdy sequoia Apr 8, 2025, 6:17 PM

#

left night essentially you've implemented a mutex now, right?

yes, but it needs to be manually implemented because I don't want the cost of using it elsewhere

sturdy sequoia Apr 8, 2025, 6:17 PM

#

proud sandal You should note that data can escape the closure if you add some lifetime bounds...

I'll add manual lifetimes bounded to the &mut self lifetime

#

Ah, I see what you mean @left night !

#

but no, right? because then my aliasing mutable is on the whole Vm no?

proud sandal Apr 8, 2025, 6:18 PM

#

I think as is is fine in terms of bounds, but adding more later for whatever reason may invalidate the safety requirements of the unsafe block

#

subtly

left night Apr 8, 2025, 6:19 PM

#

sturdy sequoia but no, right? because then my aliasing mutable is on the whole `Vm` no?

if vm.lock takes &mut self how can it alias anyway? through a nested call within method?

#

I'm not sure whether the code is safe to be honest. since it created two aliasing mutable references. even if they aren't used, isn't that UB by itself?

proud sandal Apr 8, 2025, 6:21 PM

#

where does it create two mutable aliases?

sturdy sequoia Apr 8, 2025, 6:21 PM

#

Actually found a nicer, safer way:

/// Locks the VM to get mutable access to one of its slots.
///
/// During the lock, no other access whether mutable or immutable can be done
/// to any other slots.
pub fn lock<F>(&mut self, slot: usize, method: impl FnOnce(&mut Self, &mut Value))
where
    F: FnOnce(&mut Self, &'_ mut Value),
{
    let Some(mut slots) = self.slots.take() else {
        panic!("trying to lock a locked VM");
    };

    method(self, &mut slots[slot]);

    self.slots = Some(slots);
}

#

It's also zero-alloc

left night Apr 8, 2025, 6:22 PM

#

proud sandal where does it create two mutable aliases?

okay, maybe it doesn't actually create two to the exact same data

#

only one that is a descendant of another

sturdy sequoia Apr 8, 2025, 6:22 PM

#

and no aliasing can ever occur here

left night Apr 8, 2025, 6:22 PM

#

sturdy sequoia Actually found a nicer, safer way: ```rs /// Locks the VM to get mutable access ...

yeah, if slots are cheap to move, that seems like a quite reasonable solution!

sturdy sequoia Apr 8, 2025, 6:22 PM

#

left night yeah, if slots are cheap to move, that seems like a quite reasonable solution!

it's just a Vec<Value>

proud sandal Apr 8, 2025, 6:22 PM

#

fully safe

#

lovely

sturdy sequoia Apr 8, 2025, 6:23 PM

#

It prevents me from making it a SmallVec<[Value; x]> but in my testing on the old VM it didn't provide any major advantages 🤷‍♂️

#

(for registers back then)

left night Apr 8, 2025, 6:23 PM

#

was slots in an option before?

sturdy sequoia Apr 8, 2025, 6:23 PM

#

left night was slots in an option before?

no

left night Apr 8, 2025, 6:23 PM

#

can't you just replace it by an empty vec temporarily?

sturdy sequoia Apr 8, 2025, 6:23 PM

#

yes but then I can't check if it's locked or not

#

(it will then panic due to oob accesses)

left night Apr 8, 2025, 6:23 PM

#

could have an extra bool

#

just to skip the option overhead in other places

sturdy sequoia Apr 8, 2025, 6:24 PM

#

true

#

but the bool is the same AFAIK

left night Apr 8, 2025, 6:24 PM

#

depends on whether you check it

#

very much depends on how the rest of the VM works I guess

sturdy sequoia Apr 8, 2025, 6:24 PM

#

Actually the size_of::<Option<Vec<Value>>>() and size_of::<Vec<Value>>() is the same

#

so there's already optimization of repr

left night Apr 8, 2025, 6:25 PM

#

the size is the same, but if you call unwrap in other places where you are sure it's there, that'll be extra code

sturdy sequoia Apr 8, 2025, 6:25 PM

#

but it's true that currently there's the is_some check and the bounds check (panicking) with your technique you'd just have the bounds check

sturdy sequoia Apr 8, 2025, 6:25 PM

#

left night the size is the same, but if you call unwrap in other places where you are sure ...

true

#

I'll use the bounds check technique

#

but not bool

#

it can just panic since it should never happen either way

#

(that's literally why there's a compiler)

#

/// Locks the VM to get mutable access to one of its slots.
///
/// During the lock, no other access whether mutable or immutable can be done
/// to any other slots.
pub fn lock<F>(&mut self, slot: usize, method: impl FnOnce(&mut Self, &mut Value))
where
    F: FnOnce(&mut Self, &'_ mut Value),
{
    let mut slots = std::mem::take(&mut self.slots);

    method(self, &mut slots[slot]);

    self.slots = slots;
}

#

I'll add a boolean but only in debug mode

#

(to have a debug_assert nicely in there)

proud sandal Apr 8, 2025, 6:27 PM

#

sturdy sequoia Actually found a nicer, safer way: ```rs /// Locks the VM to get mutable access ...

does '_ always bind to the lowest common declaration? i.e. to the closure bound in this case?

sturdy sequoia Apr 8, 2025, 6:28 PM

#

proud sandal does `'_` always bind to the lowest common declaration? i.e. to the closure boun...

I think so, but I'll remove it, it's not needed anymore

proud sandal Apr 8, 2025, 6:28 PM

#

because if you introduce an explicit bound on lock and use it for the closure it can escape, that's preculiar

sturdy sequoia Apr 8, 2025, 6:28 PM

#

you may also notice that I have the wrong generic param definition lol

#

/// Locks the VM to get mutable access to one of its slots.
///
/// During the lock, no other access whether mutable or immutable can be done
/// to any other slots.
pub fn lock(&mut self, slot: usize, method: impl FnOnce(&mut Self, &mut Value)) {
    #[cfg(debug_assertions)]
    assert!(!self.lock, "attempting to lock a locked VM");

    let mut slots = std::mem::take(&mut self.slots);

    method(self, &mut slots[slot]);

    self.slots = slots;
}

#

Complete with the debug check

proud sandal Apr 8, 2025, 6:29 PM

#

i have chosen to ignore that

sturdy sequoia Apr 8, 2025, 6:29 PM

#

(the #[cfg(debug_assertions)] is needed due to self.lock only being declared in debug mode)

left night Apr 8, 2025, 6:29 PM

#

needs to set the lock as well :p

sturdy sequoia Apr 8, 2025, 6:29 PM

#

left night needs to set the lock as well :p

thonk2

proud sandal Apr 8, 2025, 6:30 PM

#

well on the new version it correctly catches the alias problem, i love rust

sturdy sequoia Apr 8, 2025, 6:30 PM

#

/// Locks the VM to get mutable access to one of its slots.
///
/// During the lock, no other access whether mutable or immutable can be done
/// to any other slots.
pub fn lock(&mut self, slot: usize, method: impl FnOnce(&mut Self, &mut Value)) {
    #[cfg(debug_assertions)]
    {
        assert!(!self.lock, "attempting to lock a locked VM");
        self.lock = true;
    }

    let mut slots = std::mem::take(&mut self.slots);

    method(self, &mut slots[slot]);

    self.slots = slots;

    #[cfg(debug_assertions)]
    {
        self.lock = false;
    }
}

sturdy sequoia Apr 8, 2025, 6:30 PM

#

proud sandal well on the new version it correctly catches the alias problem, i love rust

Rust is love, rust is life

proud sandal Apr 8, 2025, 6:31 PM

#

The move makes a valid escaping reference impossible to express in the type system

sturdy sequoia Apr 8, 2025, 6:32 PM

#

The nice way with the new access method I built (using this weird shennanigan) is that it should be heaps faster than the old one and is much easier and shorter code wise

sturdy sequoia Apr 8, 2025, 9:56 PM

#

pub struct MutAccess {
    /// The head of a potentially mutable access.
    ///
    /// If mutability is an option, the head is *always* a slot.
    /// Otherwise this can be resolved with `RefAccess`.
    pub head: usize,

    pub segments: SmallVec<[AccessSegment; 2]>,
    pub span: Span,
}

pub enum AccessSegment {
    Field { name: EcoString, span: Span },
    Call { method: EcoString, span: Span },
}

impl MutAccess {
    fn eval<O>(
        &self,
        vm: &mut Vm,
        ops: impl FnOnce(&mut Value) -> SourceResult<O>,
    ) -> SourceResult<O> {
        vm.lock(self.head, |vm, value| {
            recursive_access(vm, value, self.segments.iter(), ops)
        })
    }
}

fn recursive_access<'a, O>(
    vm: &mut Vm,
    value: &mut Value,
    mut segments: impl Iterator<Item = &'a AccessSegment>,
    ops: impl FnOnce(&mut Value) -> SourceResult<O>,
) -> SourceResult<O> {
    let Some(next) = segments.next() else {
        return ops(value);
    };

    match next {
        AccessSegment::Field { name, span } => {
            let dict = access_dict(value).at(*span)?;

            recursive_access(vm, dict.at_mut(name).at(*span)?, segments, ops)
        }
        AccessSegment::Call { method, span } => {
            let Value::Args(args) = vm.pop().at(*span)? else {
                bail!(
                    *span,
                    "expected arguments on stack";
                    hint: "this is a compiler bug"
                );
            };

            let next = call_method_access(value, method, args, *span)?;
            recursive_access(vm, next, segments, ops)
        }
    }
}

fn call_method_access<'a>(
    value: &'a mut Value,
    method: &str,
    mut args: Args,
    span: Span,
) -> SourceResult<&'a mut Value> {
    ... // Same as before
}

fn access_dict(parent: &mut Value) -> HintedStrResult<&mut Dict> {
    ... // Same as before
}

/// The missing method error message.
#[cold]
fn missing_method(ty: Type, method: &str) -> String {
    format!("type {ty} has no method `{method}`")
}

#

This is the entire logic for making accesses happen with this technique 🎉

#

Most of which is actually the old logic too

#

which makes me very happy

#

Will make access segments use PicoStr eventually tho

#

It's nice because temp value accesses can be detected at compile time making the logic a bit simpler

#

the only case I haven't dealt with is the "basic" assignment

#

but I got to go to work early tomorrow 😦

sturdy sequoia Apr 11, 2025, 6:10 PM

#

Things are moving along nicely, the new compiler+vm (VM3D) will most likely be much smaller (less than half) of the old compiler+vm (VM2: Electric Boogaloo). It is less of a compiler+vm and more of a "flattening" of the tree into a nice list of operations (although it already contains tons of optimizations)

#

(that being said I have not done loops yet lol)

sturdy sequoia Apr 11, 2025, 6:28 PM

#

I also think it's much easier to understand

untold turret Apr 11, 2025, 6:34 PM

#

is there a public branch already?

#

@dherse loops don't matter a lot for some documents. I wonder some micro metrics of compiling and running some functions of cetz or some other packages.

#

I did run a benchmark of fibnacci(20) of same impl but with typst or rust. rust function took 40us and typst took 800us. I think we could also get a number between them on vm3 for it quickly.

sly pecan Apr 11, 2025, 6:46 PM

#

that's less overhead than I was expecting

untold turret Apr 11, 2025, 6:51 PM

#

you know fibnacci can be memorized easily. I was trying to make a big news to say typst is faster than rust on some case, but typst lost :(. vm3 looks promising.

sturdy sequoia Apr 11, 2025, 7:12 PM

#

untold turret is there a public branch already?

not yet

sturdy sequoia Apr 11, 2025, 7:12 PM

#

untold turret I did run a benchmark of fibnacci(20) of same impl but with typst or rust. rust ...

I agree, but loops are actually rather simple at the end of the day 😄

#

The thing is that I am going expr by expr implement the Compile trait and necessary Instruction trait since I am also planning on experimenting with different repr for instructions (a big enum, a big Vec<Box<dyn Instruction>>, a Vec<&syn Instruction> (which should be equivalent) and a twisted Vec<Fn(&mut Vm) -> SourceResult<()>>)

sturdy sequoia Apr 11, 2025, 7:15 PM

#

untold turret you know fibnacci can be memorized easily. I was trying to make a big news to sa...

VM3D is very early days, it's just so much simpler, I am hoping that its "linear" structure will give us performance

sturdy sequoia Apr 21, 2025, 10:06 AM

#

You know what's great about being on VM3D? I can just reuse pieces of VM 2: Electric Boogaloo

pearl sedge Apr 22, 2025, 8:23 AM

#

I have recently been confronted with huge compilation time for par.lines (1s grows to 21s) in part due to layout divergence in some edge cases, which requires re-doing everything layout-wise, or something close to this effect.

Debugging further by looking at the timings, it seems like most of the latency when the layout has to be re-done comes from blocks (equations, etc.) where for some reason the par.line loop (realize-collect, or what I assume to be par lines) has significantly higher runtime than usual.

In a normal compilation, the elapsed time between realize-collect iterations is more or less <1ms but on longer compilation, it can take up to 5ms for the next it to start, which increases the overall compilation time by a huge amount. Anyway, I disabled par line numbering for unneeded blocks such as images/tables/equations, and that improves the runtime. I attach a "normal" timing and a "longer" timing zoomed in. I can also share the timings if needed!

In any case, is there any issue that tracks the above-described issue, or something related?

sturdy sequoia Apr 22, 2025, 9:44 AM

#

pearl sedge I have recently been confronted with huge compilation time for par.lines (1s gro...

Interestingly, I have never actually looked at the "cost" of line numbering, if you disable them, does the timing go down? (i.e is the increased caused by line numbering) Or is it something like a show rule making a mess?

pearl sedge Apr 22, 2025, 9:48 AM

#

sturdy sequoia Interestingly, I have never actually looked at the "cost" of line numbering, if ...

without any line numbering, the compilation is blazing fast. I don't think it's a show rule in particular that's having this effect?

sturdy sequoia Apr 22, 2025, 10:03 AM

#

pearl sedge without any line numbering, the compilation is blazing fast. I don't think it's ...

Ok, I never looked into how line numbers work tbh
Could @glad urchin chime in (I think you did the line numbers, right?)

pearl sedge Apr 22, 2025, 10:54 AM

#

to add to that, without any line numbering you get a "normal" timing, where every step is executed without any delay

#

For some reason line numbering resolves into a very sparse execution in between each block

sturdy sequoia Apr 22, 2025, 11:54 AM

#

pearl sedge For some reason line numbering resolves into a very sparse execution in between ...

no it's likely that some function is not annotated with the macro so it looks like a gap but typst is busy, just not shown

pearl sedge Apr 22, 2025, 12:40 PM

#

T_T is it time to compile typst with new macros 🙃 ?

sturdy sequoia Apr 22, 2025, 1:00 PM

#

probably yeah :/

#

A nice thing I changed from VM2 to VM3D regarding function call is dedicated instructions for method calls, potentially mutating method calls, and function calls, this should generally lead to a nice speedup I expect, it also greatly simplifies the complexity of the old function calls at the cost of midly more code overall (since each type of call can lead to the "common" type of a simple function call)

glad urchin Apr 22, 2025, 1:10 PM

#

sturdy sequoia Ok, I never looked into how line numbers work tbh Could <@180813971853410305> ch...

Well, it could be looked into, but in principle it does use introspection

#

And it attaches to every single line of everything

#

So if you have lots of lines that's lots of stuff it's doing

#

Still, 21s is a lot, but there's too little info here

#

If you could find a MRE, that'd be lovely

#

(cc @pearl sedge )

sturdy sequoia Apr 22, 2025, 1:15 PM

#

Ok: function calls, method calls, destructuring, and a bunch of other stuff are in, still missing a lot but getting there!

#

But the "big" things are done and it's about 10% bigger than typst-eval for now, it's still missing show and set rules as well as a few other things, but I am confident it should be no more than 50% bigger than typst-eval (compared to VM2 which was ~4x bigger iirc)

#

So still an increase but most of the space is actually taken up by the fact that every single instructions need a struct definition which makes it very big, but the actual code is about the same size

#

This is more of a "flattening" than a straight up Bytecode VMs like the other ones

#

@left night How comfortable are you with this increase in overall code size (note that I am not comparing to typst as a whole, just typst-eval crate)

#

||I am greatly enjoying being on holiday from my day job just to work on typst lol||

#

Still missing is: joining, control flow, content blocks, set/show rules, some types of patterns, some types of accesses, all of math, closures, let binding, anything module related, most control flow

#

But most of these are deceptively simple since they reuse code that's already there (all control flow rely on the VM2 code which I have already ported over), math is simple just repetitive, let bindings rely on patterns which are there, etc.

#

So it's coming along nicely

left night Apr 22, 2025, 2:01 PM

#

sturdy sequoia But the "big" things are done and it's about 10% bigger than `typst-eval` for no...

10% more is not a lot!

#

50% is of course already a bit, but typst-eval is only 2872 lines right now, so that's manageable overall

sturdy sequoia Apr 22, 2025, 2:05 PM

#

left night 50% is of course already a bit, but typst-eval is only 2872 lines right now, so ...

Nice!

#

At least it's not completely useless 🎉

#

It has potential

#

(doesn't mean it'll be merged but good to know it's not just wasted effort)

#

I might make a macro for instructions to make their decl a bit smaller because like... a lot of the code is just the impl for the new functions lol

left night Apr 22, 2025, 2:08 PM

#

sturdy sequoia (doesn't mean it'll be merged but good to know it's not just wasted effort)

I think it is a bit inevitable that Typst's evaluation system will be replaced by something faster at some point. It might be VM3 or VM3 might inform future efforts. I think it's not wasted either way. :)

sturdy sequoia Apr 22, 2025, 2:09 PM

#

left night I think it is a bit inevitable that Typst's evaluation system will be replaced b...

I agree but I think the main problem with VM1 & VM2 were that they were true bytecode VMs, this one won't be, it's just flattening with some cleverness baked in some places

#

I am also massively simplifying a bunch of things like argument handling, etc.

#

Also, it opens the possibility for interning some strings still, namely argument names can be moved to PicoStr

pearl sedge Apr 22, 2025, 2:10 PM

#

glad urchin If you could find a MRE, that'd be lovely

That's the hardest part, empirically that 21s edge case for a 30page document 50eq/50text that only occurs on layout divergence
I can try to whip up something close to my document later today but I'm unsure whether that'd reproduce the issue. We'll see!

sturdy sequoia Apr 22, 2025, 2:10 PM

#

Because doing that doens't make sense on main because you need to intern right before comparing which doesn't gain much performance

sturdy sequoia Apr 22, 2025, 2:10 PM

#

pearl sedge That's the hardest part, empirically that 21s edge case for a 30page document 50...

lorem function is your best friend

#

there's also this tool to mangle your document such that it's not readable

#

like it changes the words by words of equivalent length but different meaning

pearl sedge Apr 22, 2025, 2:11 PM

#

I tried that one but equations are not supported yet! Time to whip out my best math BS generator

sturdy sequoia Apr 22, 2025, 10:26 PM

#

First piece of content rendered using the VM3D 🎉

sly pecan Apr 22, 2025, 10:43 PM

#

sturdy sequoia First piece of content rendered using the VM3D 🎉

What is missing to be able to compile more complicated documents?

sturdy sequoia Apr 22, 2025, 10:44 PM

#

sly pecan What is missing to be able to compile more complicated documents?

Show rules, set rules, most control flow isn't there, lots of small things like that

#

Function calls work, methods etc. too

#

Just a bunch of small things

#

Thankfully with the VM2 I was able to grab the (very) optimized Joiner for handling joining & styles, lots of the control flow logic, etc. I also gave some new capabilities for handling control flow which should be nice

#

also as a side effects, we should get show and set rules as values "for free" although they are currently explicitly forbidden

#

but because the Joiner can join Styles with Content you can technically just do it

#

And looks like I was right on my ~50% increase estimate, I am not fully done but I have all of the big things now and it's about 50% bigger, but I will do a cleanup pass later too

#

Oh and math is just... not implemented

#

because I just don't care atm, it's always (in every VM so far) the last thing I implement

#

because it's not needed for testing performance & most docs (since I can just render math as an empty content in the meantime)

sly pecan Apr 22, 2025, 10:54 PM

#

That makes sense

sturdy sequoia Apr 22, 2025, 10:55 PM

#

sly pecan That makes sense

agreed!

#

(obviously)

glad urchin Apr 22, 2025, 11:18 PM

#

sturdy sequoia because I just don't care atm, it's always (in every VM so far) the last thing I...

(in every VM so far)
seems like this is a recurring thing now 😂

#

maybe if we iterate enough times with a procedurally generated vm we'll eventually converge into the perfect vm

sturdy sequoia Apr 22, 2025, 11:19 PM

#

Lots of things are starting to work!

sturdy sequoia Apr 22, 2025, 11:19 PM

#

glad urchin maybe if we iterate enough times with a procedurally generated vm we'll eventual...

VM3D you mean hyperthink

glad urchin Apr 22, 2025, 11:19 PM

#

not perfect.

sturdy sequoia Apr 22, 2025, 11:20 PM

#

We needed 2 content reworks, and now 3 VMs

#

what will take four iterations? thonk

#

But it's getting close to 1:30 AM, time for sleep catired

glad urchin Apr 22, 2025, 11:21 PM

#

it only stops when the document takes 1 cpu instruction to compile and has zero bugs

sturdy sequoia Apr 22, 2025, 11:21 PM

#

glad urchin it only stops when the document takes 1 cpu instruction to compile and has zero ...

that's called a memcpy 😂

glad urchin Apr 22, 2025, 11:21 PM

#

then we've solved it

sturdy sequoia Apr 22, 2025, 11:21 PM

#

(and that isn't even 1 op code 😦 )

glad urchin Apr 22, 2025, 11:21 PM

#

turns out we just need to predict the output of every document and embed it into the binary

#

not sure how i didnt think of that before

sturdy sequoia Apr 22, 2025, 11:22 PM

#

E = mc2 + AI

#

The solution is in this magical equation

glad urchin Apr 22, 2025, 11:22 PM

#

this is truly inspiring

#

thank you

sturdy sequoia Apr 22, 2025, 11:22 PM

#

for people that don't understand this ref

SPOILER_finally-einsteins-equation-has-been-improved-v0-akm7y8c0u93b1.png

#

I would also like to say that the VM is not, in fact, vibe coded

glad urchin Apr 22, 2025, 11:24 PM

#

sturdy sequoia I would also like to say that the VM is not, in fact, vibe coded

that's a relief

sturdy sequoia Apr 23, 2025, 1:20 PM

#

Took a bit of a break from work (since I didn't take a lunch break) and fixed joining 🎉

#

It's still missing set and show rules, math, contextual, but otherwise it's all in there I think

sturdy sequoia Apr 23, 2025, 2:04 PM

#

BTW, for those of you that followed up, VM1 & VM2 were both register based VMs, but VM3 is stack based which overall is quite a bit simpler

#

With dedicated "slots" for arguments & variables

sly pecan Apr 23, 2025, 2:11 PM

#

sturdy sequoia It's still missing set and show rules, math, contextual, but otherwise it's all ...

Damn good at hello world though!

#Performance