Performance | Typst | Page 2

left night Dec 8, 2023, 5:58 PM

#

I can imagine very well

#

Pretty similar to pitching a startup to investors :)

#

I really think it is. And it's totally awesome that it's almost trivial due to purity.

#

Try doing that with LaTeX!

sturdy sequoia Dec 8, 2023, 6:05 PM

#

And rust really does make that a breeze imo

#

just the Deferred abstraction being this easy kinda blows my mind

#

fearless concurrency for sure!

#

@left night I can already tell you: deferred compression makes a big difference

#

I'm not done testing but it looks very promising
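For readers unfamiliar with the idea, a deferred value of the kind mentioned above can be sketched in a few lines of std-only Rust. This is a minimal illustration, not Typst's actual `Deferred` type: it assumes one OS thread per value (a real implementation would more likely use a thread pool), and the `wait` method name and the "compression" workload are placeholders.

```rust
use std::sync::{Mutex, OnceLock};
use std::thread::{self, JoinHandle};

/// A value that starts computing in the background as soon as it is created.
/// Hypothetical sketch; names and layout are assumptions.
pub struct Deferred<T> {
    cell: OnceLock<T>,
    handle: Mutex<Option<JoinHandle<T>>>,
}

impl<T: Send + 'static> Deferred<T> {
    /// Spawn `f` on another thread right away.
    pub fn new<F>(f: F) -> Self
    where
        F: FnOnce() -> T + Send + 'static,
    {
        Self {
            cell: OnceLock::new(),
            handle: Mutex::new(Some(thread::spawn(f))),
        }
    }

    /// Block until the value is ready, then return a reference to it.
    /// Later calls return the stored value without joining again.
    pub fn wait(&self) -> &T {
        self.cell.get_or_init(|| {
            let handle = self
                .handle
                .lock()
                .unwrap()
                .take()
                .expect("worker is joined only once");
            handle.join().expect("deferred computation panicked")
        })
    }
}

fn main() {
    // Kick off several independent jobs (stand-ins for stream compression),
    // then collect the results only when they are needed.
    let jobs: Vec<Deferred<usize>> = (1..=4usize)
        .map(|n| Deferred::new(move || (0..n * 1000).sum::<usize>()))
        .collect();
    for job in &jobs {
        println!("{}", job.wait());
    }
}
```

The work overlaps with whatever the caller does between `new` and `wait`, which is why deferring something like stream compression can help cold compiles.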

sturdy sequoia Dec 8, 2023, 6:49 PM

#

@left night do you know what might cause comemo to randomly miss caches that should hit? 🤔

left night Dec 8, 2023, 6:50 PM

#

sturdy sequoia @left night do you know what might cause comemo to randomly miss cache...

in which code path?

sturdy sequoia Dec 8, 2023, 6:50 PM

#

left night in which code path?

in comemo::memoized with tracked values specifically

left night Dec 8, 2023, 6:50 PM

#

no I mean in which memoized function specifically

#

or do you mean everywhere?

sturdy sequoia Dec 8, 2023, 6:51 PM

#

in comemo's tests for example

left night Dec 8, 2023, 6:51 PM

#

and this happens just with your changes you mean? or do you mean that you found a bug in comemo?

sturdy sequoia Dec 8, 2023, 6:52 PM

#

left night and this happens just with your changes you mean? or do you mean that you found ...

with my changes afaik

left night Dec 8, 2023, 6:52 PM

#

then, it's really hard to say

#

anything could have that effect

#

wrong constraints for instance

sturdy sequoia Dec 8, 2023, 7:05 PM

#

@left night pretty sure I know why, I'm using a hashmap instead of an IndexMap so ordering is random 🤦‍♂️

#

Well no

#

goddamn

sturdy sequoia Dec 8, 2023, 7:25 PM

#

Ok, I determined that it's an issue with the accelerator 💀

#

No, it just happens less without the accelerator

#

GODDAMN BUG

#

HEISENBUGS ARE THE WORST

glad urchin Dec 8, 2023, 7:26 PM

#

sturdy sequoia HEISENBUGS ARE THE WORST

waltuh

sturdy sequoia Dec 8, 2023, 7:47 PM

#

@left night since you know comemo better, maybe you can figure it out by looking over cache.rs: https://github.com/Dherse/comemo/blob/main/src/cache.rs

sturdy sequoia Dec 8, 2023, 8:04 PM

#

Ok, the plot thickens: if I run only one test, it never fails (I literally wrote a bash script that runs the test until it fails)

#

But if I run multiple tests then one of them sometimes fails

#

weird

#

very weird

#

Ok, it's eviction that seems to cause issues

#

OF COURSE

#

GOD DAMNIT

#

Tests are run in parallel

#

so it can evict while another test is running

#

😂

#

Since it's not thread local anymore

#

Bingo, that was it

#

GODDAMN that was an annoying bug to track
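The failure mode described above (a process-wide cache, tests on parallel threads, one test evicting while another counts hits) can be reproduced and avoided in a small sketch. The chat does not say how the comemo tests were actually fixed; serializing the cache-sensitive tests behind a lock is just one possible approach, and `CACHE`, `SERIAL`, `square`, and `check_hits` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::sync::{Mutex, MutexGuard, PoisonError};

/// Process-wide cache: shared by every thread, and therefore by every test.
static CACHE: Mutex<BTreeMap<u64, u64>> = Mutex::new(BTreeMap::new());

/// Lock held by any test that asserts on cache hits and misses.
static SERIAL: Mutex<()> = Mutex::new(());

/// Returns the square of `n` and whether the result came from the cache.
fn square(n: u64) -> (u64, bool) {
    let mut cache = CACHE.lock().unwrap();
    if let Some(&value) = cache.get(&n) {
        return (value, true);
    }
    cache.insert(n, n * n);
    (n * n, false)
}

/// Clears the whole cache, like an eviction pass.
fn evict() {
    CACHE.lock().unwrap().clear();
}

/// Take the serialization lock, ignoring poisoning from a failed test.
fn serial() -> MutexGuard<'static, ()> {
    SERIAL.lock().unwrap_or_else(PoisonError::into_inner)
}

/// A "test" that is only deterministic if nobody evicts in the middle.
fn check_hits(n: u64) {
    let _guard = serial();
    evict();
    assert_eq!(square(n), (n * n, false)); // first call misses
    assert_eq!(square(n), (n * n, true)); // second call must hit
}

fn main() {
    // Mimic the test harness: several checks on separate threads.
    let threads: Vec<_> = (1..=8u64)
        .map(|n| std::thread::spawn(move || check_hits(n)))
        .collect();
    for thread in threads {
        thread.join().unwrap();
    }
    println!("all checks passed");
}
```

Without the `serial()` guard, one thread's `evict()` can land between another thread's two `square` calls, which is the kind of intermittent miss that only shows up when several tests run together.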

left night Dec 8, 2023, 8:32 PM

#

makes sense

sturdy sequoia Dec 8, 2023, 8:44 PM

#

@left night could you test your parallel layout with this version of comemo please 🥰

#

I wonder if it's any faster

#

To get back to just Deferred for the PDF stream compressions: the gains are... meh. Cold they're pretty good, around 6%, but incremental they're basically non-existent because the deflated streams are already mostly memoized anyway 😦

#

Mind you larger docs will see larger gains

#

(can you send me that lorem test doc btw?)

left night Dec 8, 2023, 10:00 PM

#

📎 lorem.typ

feral imp Dec 8, 2023, 10:02 PM

#

@sturdy sequoia can you like almost guess what the performance benefit would be to native with Dash Map?

left night Dec 8, 2023, 10:13 PM

#

sturdy sequoia @left night could you test your parallel layout with this version of c...

for some reason, it is actually a fair bit slower with that branch (on your thesis) :/

#

this is with the threaded branch of the comemo repository

#

this is with your pull request (+ one line addition to the macro to make it compile. it needs Send + Sync on the trait impl).

#

I have pushed the experiment to the parallel-test branch, so feel free to test yourself.

#

One difference I'm seeing is that my threaded branch still uses DashMap because I wrote that before I knew about the Safari issue.

sturdy sequoia Dec 8, 2023, 10:20 PM

#

left night One difference I'm seeing is that my `threaded` branch still uses DashMap because I...

That might explain the huge difference in performance 😱

#

I'm guessing the contention on the locks is very heavy

#

I'll test it locally and see with a DashMap 'cause I'm pretty sure it will be a fair bit faster

sturdy sequoia Dec 8, 2023, 10:22 PM

#

left night this is with your pull request (+ one line addition to the macro to make it comp...

What line? 🤔

left night Dec 8, 2023, 10:25 PM

#

sturdy sequoia What line? 🤔

sturdy sequoia Dec 8, 2023, 10:26 PM

#

left night

Thanks 😄

left night Dec 8, 2023, 10:26 PM

#

sturdy sequoia Thanks 😄

pential issue? :D

sturdy sequoia Dec 8, 2023, 10:27 PM

#

left night pential issue? :D

whoopsy 😂

#

It's late

#

@left night just so you know, in my slowest incremental test it divides (with my comemo) incremental time by four

#

@left night you're also using parking_lot whereas I'm still using std locks

#

I still think parking_lot is better because they don't poison!
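The poisoning point can be shown with the standard library alone. In the sketch below a thread panics while holding a `std::sync::Mutex`; afterwards every `lock()` returns `Err`, which is why std-based code ends up with `.unwrap()` on each lock call. `parking_lot`'s `Mutex::lock` returns the guard directly and has no poisoning, so those calls disappear. The helper names `poison` and `read` are made up for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Panic on another thread while holding the lock, which poisons the mutex.
fn poison(counter: &Arc<Mutex<u32>>) {
    let shared = Arc::clone(counter);
    let result = thread::spawn(move || {
        let mut guard = shared.lock().unwrap();
        *guard += 1;
        panic!("simulated failure while holding the lock");
    })
    .join();
    assert!(result.is_err());
}

/// Read the value even if the mutex is poisoned.
fn read(counter: &Mutex<u32>) -> u32 {
    *counter
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

fn main() {
    let counter = Arc::new(Mutex::new(0));
    poison(&counter);
    // From here on a plain `counter.lock().unwrap()` would panic.
    println!("poisoned: {}", counter.is_poisoned());
    println!("value: {}", read(&counter));
}
```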

#

So, your version wins on cold compiles but gets absolutely demolished on incremental thanks to all of my other optimizations. On my end you're slightly faster cold, by about 5%, but incremental your version of comemo is usually ~25% slower than mine.

#

pardon the mouse handwriting 😂

#

😎

#

Mind you, compared to main, both of these destroy main, my version of comemo with your parallel code leads to 66% faster cold, and between 60 and 242% faster incremental

#

parking_lot on my version makes no difference outside of removing all of the ugly .unwrap() calls

#

And actually, it makes sense that your version is faster in cold: in incremental there is less contention over the locks, and while my version suffers from lock contention, yours does not (or not as much).

left night Dec 8, 2023, 10:41 PM

#

did you try yours with DashMap?

sturdy sequoia Dec 8, 2023, 10:41 PM

#

I also think that by not having functions race, we might get better performance (i.e by having a built-in Deferred<T>)

sturdy sequoia Dec 8, 2023, 10:41 PM

#

left night did you try yours with DashMap?

not yet

#

I am doing the change now

#

but it's quite in-depth since I have RwLock everywhere

left night Dec 8, 2023, 10:41 PM

#

sturdy sequoia I also think that by not having functions race, we might get better performance ...

It's tricky to not race though

#

compile e.g. has no non-tracked argument, so it would allow one compilation at once

#

which is maybe not a problem for our use case but strange conceptually

sturdy sequoia Dec 8, 2023, 10:43 PM

#

ah, you might be right indeed, my idea was to push it early to the cache and have the cache store a Deferred<T> essentially as the output

#

Just making the ACCELERATOR use a DashMap bumps performance on-par with yours in cold

#

It does make incremental compilation slightly slower

#

so it's kind of a tradeoff between cold & hot compile times

#

mind you I'm pretty sure I could optimize it to handle contention a bit better maybe

#

It's insane seeing my thesis mostly compile in sub 4s 😂

left night Dec 8, 2023, 10:48 PM

#

The accelerator is perhaps also conceptually not ideal the way it is now

sturdy sequoia Dec 8, 2023, 10:48 PM

#

it used to take 68s with the old glossary 😱

sturdy sequoia Dec 8, 2023, 10:48 PM

#

left night The accelerator is perhaps also conceptually not ideal the way it is now

I honestly don't understand it still, I get the rest but not the accelerator 😬

left night Dec 8, 2023, 10:49 PM

#

There is a lot of unnecessary contention because it's just a big HashMap that acts like a bunch of small hashmaps.

sturdy sequoia Dec 8, 2023, 10:49 PM

#

left night There is a lot of unnecessary contention because it's just a big HashMap that ac...

is there a way we could "split" them?

#

by that I mean split the little hashmaps

left night Dec 8, 2023, 10:50 PM

#

When a value starts being tracked, it gets a new globally unique ID. And when something is validated on it, this validation is cached in the accelerator.

#

So conceptually, each Tracked owns a hash map in there

#

But I didn't want it to literally own because Tracked should remain Copy

sturdy sequoia Dec 8, 2023, 10:50 PM

#

Ok, why not move it into the Tracked then?

#

ah okay

#

I sent my message as you sent yours

sturdy sequoia Dec 8, 2023, 10:50 PM

#

left night But I didn't want it to literally own because `Tracked` should remain `Copy`

could we point to an ID inside of a global list of accelerators?

#

it stays copy and on evict we clear that list of accelerators

#

it's a bit weird ngl but it kinda solves the issue of contention on that big hashmap

#

or we accept that Tracked is Clone 😐

left night Dec 8, 2023, 10:51 PM

#

since the IDs are ever-growing during execution, it would need to be a global list

left night Dec 8, 2023, 10:52 PM

#

sturdy sequoia or we accept that `Tracked` is `Clone` 😐

Cloning it would split the acceleration, which is kind of pointless

sturdy sequoia Dec 8, 2023, 10:52 PM

#

left night since the IDs are ever-growing during execution, it would need to be a global li...

the lists could be local to tracked types?

left night Dec 8, 2023, 10:52 PM

#

It would rather have to be handled through a reference

sturdy sequoia Dec 8, 2023, 10:52 PM

#

left night Cloning it would split the acceleration, which is kind of pointless

inside it could be an Arc<Mutex<HashMap<>>>

left night Dec 8, 2023, 10:52 PM

#

I don't like it

sturdy sequoia Dec 8, 2023, 10:53 PM

#

left night I don't like it

come on, one more generic bro, one more generic and it'll work bro, zero cost abstractions bro

#

or something like that 😂

left night Dec 8, 2023, 10:53 PM

#

Effectively, the accelerator is a leaky bump allocator with manual garbage collection and no reference tracking

sturdy sequoia Dec 8, 2023, 10:53 PM

#

left night Effectively, the accelerator is a leaky bump allocator with manual garbage colle...

currently, or in one of our ideas on how to re-implement it?

left night Dec 8, 2023, 10:54 PM

#

currently. we "allocate" a hashmap in the accelerator by using a specific ID, we only use increasing IDs and just collect during eviction. since it's only for perf, it doesn't matter if we have "use-after-free" (a tracked instance outlives evict) because it's just a validation miss.

#

bump allocator is maybe the wrong term, but for the IDs basically
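The design described above (one big map keyed by an ever-increasing ID, cleared wholesale on eviction, where a stale ID only costs a miss) could look roughly like the sketch below. It is an illustration under assumptions, not comemo's code: the `(id, constraint hash) -> bool` layout and the names `Accelerator`, `fresh_id`, `validate`, and `evict` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Source of globally unique, ever-increasing IDs for tracked values.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

fn fresh_id() -> usize {
    NEXT_ID.fetch_add(1, Ordering::Relaxed)
}

/// One big map standing in for many small ones: the key is
/// (tracked ID, hash of the constraint), the value is the cached verdict.
#[derive(Default)]
struct Accelerator {
    map: Mutex<HashMap<(usize, u64), bool>>,
}

impl Accelerator {
    /// Return the cached validation result, or run `check` and store it.
    fn validate(&self, id: usize, constraint: u64, check: impl FnOnce() -> bool) -> bool {
        let mut map = self.map.lock().unwrap();
        let verdict = *map.entry((id, constraint)).or_insert_with(check);
        verdict
    }

    /// "Garbage collection": drop everything at once. IDs handed out
    /// earlier stay usable; they simply miss and repopulate the map.
    fn evict(&self) {
        self.map.lock().unwrap().clear();
    }
}

fn main() {
    let accelerator = Accelerator::default();
    let id = fresh_id();
    let first = accelerator.validate(id, 42, || true);
    // Served from the map: the closure is not called, so this is still true.
    let second = accelerator.validate(id, 42, || false);
    println!("{first} {second}");
    accelerator.evict();
    // After eviction the old ID is not an error, only a validation miss.
    println!("{}", accelerator.validate(id, 42, || true));
}
```

Because every tracked value goes through the same `Mutex`, all threads contend on one lock, which is the contention problem raised earlier in the discussion.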

sturdy sequoia Dec 8, 2023, 10:55 PM

#

Yes in my version, it would panic in a "use-after-evict" situation

#

(using the ID to reference and keep Copy)

#

BTW if that can reassure you, with your parallel version of Typst I can't really see the difference in the PDF, so good job 👍

#

ChefsKiss

left night Dec 8, 2023, 10:56 PM

#

sturdy sequoia BTW if that can reassure you, with your parallel version of Typst I can't really...

with the totally broken branch without proper introspection you mean?

sturdy sequoia Dec 8, 2023, 10:56 PM

#

left night with the totally broken branch without proper introspection you mean?

yeah 😂

left night Dec 8, 2023, 10:56 PM

#

lucky that your thesis apparently doesn't really need introspector disambiguation

sturdy sequoia Dec 8, 2023, 10:56 PM

#

left night lucky that your thesis apparently doesn't really need introspector disambiguatio...

that's surprising 😮

left night Dec 8, 2023, 10:57 PM

#

if you have lots of manually written stuff it is needed far less than when generating stuff

#

because the spans are part of the core hash that could be duplicate and thus needs disambiguation

sturdy sequoia Dec 8, 2023, 10:57 PM

#

I mean there are definitely a lot of quirks like some figures are randomly numbered, but overall it's hard to tell 😂

left night Dec 8, 2023, 10:57 PM

#

yeah, that's about what I would expect

left night Dec 8, 2023, 10:58 PM

#

sturdy sequoia Yes in my version, it would panic in a "use-after-evict" situation

you mean Vec<Map<.., ..>>?

sturdy sequoia Dec 8, 2023, 10:58 PM

#

left night you mean `Vec<Map<.., ..>>`?

yes

#

where we have this big vec pointing to individually locked maps

left night Dec 8, 2023, 10:58 PM

#

and you want to reset the ID in evict instead of ever-increasing it?

sturdy sequoia Dec 8, 2023, 10:59 PM

#

left night and you want to reset the ID in evict instead of ever-increasing it?

No idea, I haven't thought about it that far 😂

#

or there is the option of making Tracked Clone and using the Arc<Mutex<HashMap<>>> 💀

left night Dec 8, 2023, 10:59 PM

#

sturdy sequoia No idea, I haven't thought about it that far 😂

alternatively, it could also store a "head" offset

sturdy sequoia Dec 8, 2023, 10:59 PM

#

left night alternatively, it could also store a "head" offset

||just give it some head 😎||

left night Dec 8, 2023, 10:59 PM

#

trackeds from before evict would then not profit from acceleration

#

because if id < head -> None

sturdy sequoia Dec 8, 2023, 11:00 PM

#

left night because if id < head -> None

yes

left night Dec 8, 2023, 11:00 PM

#

if id >= head -> list[id - head]
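The two rules just stated (`id < head -> None`, `id >= head -> list[id - head]`) translate almost directly into code. The sketch below is single-threaded on purpose so that only the indexing logic is visible; the struct and method names are assumptions, and locking is left out.

```rust
use std::collections::HashMap;

/// Per-ID maps stored in a list; `head` is the first ID that is still live.
#[derive(Default)]
struct Accelerators {
    head: usize,
    list: Vec<HashMap<u64, bool>>,
}

impl Accelerators {
    /// Allocate a slot for a newly tracked value and return its ID.
    fn allocate(&mut self) -> usize {
        self.list.push(HashMap::new());
        self.head + self.list.len() - 1
    }

    /// `id < head` gives `None`; `id >= head` gives `list[id - head]`.
    fn get(&mut self, id: usize) -> Option<&mut HashMap<u64, bool>> {
        let index = id.checked_sub(self.head)?;
        self.list.get_mut(index)
    }

    /// Drop all maps but keep IDs ever-increasing by advancing the head.
    fn evict(&mut self) {
        self.head += self.list.len();
        self.list.clear();
    }
}

fn main() {
    let mut accelerators = Accelerators::default();
    let old = accelerators.allocate();
    accelerators.evict();
    let new = accelerators.allocate();
    // A tracked value from before the eviction no longer gets acceleration.
    println!("old: {}", accelerators.get(old).is_some());
    println!("new: {}", accelerators.get(new).is_some());
}
```

A tracked value that outlives `evict` keeps working; it just falls back to unaccelerated validation, which matches the fool-proof behaviour asked for in the chat.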

sturdy sequoia Dec 8, 2023, 11:00 PM

#

but, do we ever keep Tracked beyond an evict call?

left night Dec 8, 2023, 11:00 PM

#

I want comemo to have a fool-proof API

sturdy sequoia Dec 8, 2023, 11:00 PM

#

left night I want comemo to have a fool-proof API

makes sense I suppose

#

I mean we could have a global Tracked counter

#

but it would require it to be Clone

#

(and panic on evict if any tracked values survive)

left night Dec 8, 2023, 11:01 PM

#

I don't follow

left night Dec 8, 2023, 11:01 PM

#

sturdy sequoia I mean we could have a global `Tracked` counter

we do have one

#

ID

sturdy sequoia Dec 8, 2023, 11:01 PM

#

no, I mean a reference counter of all Tracked values

left night Dec 8, 2023, 11:01 PM

#

ah you mean

#

like Arc

sturdy sequoia Dec 8, 2023, 11:01 PM

#

yeah

#

and then panic on evict if there are Tracked values alive

left night Dec 8, 2023, 11:01 PM

#

I don't like that either

#

evicting while tracked is sound in principle

sturdy sequoia Dec 8, 2023, 11:02 PM

#

neither do I because it also means Tracked is Clone

sturdy sequoia Dec 8, 2023, 11:02 PM

#

left night evicting while tracked is sound in principle

yes

left night Dec 8, 2023, 11:02 PM

#

you could try whether the accelerator sharding gives any gains

sturdy sequoia Dec 8, 2023, 11:02 PM

#

left night you could try whether the accelerator sharding gives any gains

sure

left night Dec 8, 2023, 11:02 PM

#

meanwhile, I need to figure something out for measurement

sturdy sequoia Dec 8, 2023, 11:03 PM

#

my gut says yes, but it's unclear

left night Dec 8, 2023, 11:03 PM

#

I want to get the parallel stuff merged soon. the measurement thing is the only remaining blocker.

sturdy sequoia Dec 8, 2023, 11:03 PM

#

left night meanwhile, I need to figure something out for measurement

To most members of the #contributors it looks like I'm doing complicated stuff, but I actually do the baby level stuff compared to what you're doing 😂

left night Dec 8, 2023, 11:04 PM

#

nah

#

I'm just dealing with the problems I created myself

#

it's crazy to me how invested (and obsessed) the contributor community overall is in the project. it makes me really happy.

sturdy sequoia Dec 8, 2023, 11:15 PM

#

left night it's crazy to me how invested (and obsessed) the contributor community overall i...

I think you tapped into a deep desire to get rid of LaTeX, and Martin and you found some really GOOD solutions to some really REALLY hard problems

#

Especially regarding UX, and for that you should be 100000% proud of yourselves!

glad canyon Dec 8, 2023, 11:16 PM

#

yeah, this is the most successful latex alternative so far

sturdy sequoia Dec 8, 2023, 11:30 PM

#

@left night you have no idea how painful it was to modify Typst to using local-accelerators 😭

#

(without using .clone() everywhere)

#

So, it's not that much faster than my initial implementation, it catches up to yours in cold compilation, and it's slightly faster across the board by about another 5% (up to 10% in some of my incremental tests)

#

One change I did make is to allow &Tracked<T> to be used to remove most of the cloning

#

So the ergonomics isn't terrible

#

but it's not perfect either

#

It blows my mind how freaking fast this makes Typst

#

with the parallel export we'll be golden 😮

#

(since with parallel comemo we can also get parallel export across the board)

#

like we are approaching sub 3s territory here for compiling a 160 page thesis filled with figures, introspection/queries, and other stuff

#

it's mind blowing

#

O-M-G

#

@left night are you seeing this?

#

my thesis just compiled in sub 3s

#

HOW

#

JUST HOOOOOOOOOWWWWWWWWWW

#

@onyx furnace with the work @left night and I are doing, oi-wiki and without wasmtime (which is also coming afaik): 44s cold, with wasmtime we'd likely be looking at ~20s cold

sturdy sequoia Dec 8, 2023, 11:45 PM

#

sturdy sequoia O-M-G

BTW, this is done using mimalloc, which is a highly optimized allocator for parallel applications

#

And there are still a ton of micro-optimizations we can do like using hashbrown, using a faster hasher for all of the hashmaps, etc etc etc.

left night Dec 8, 2023, 11:50 PM

#

sturdy sequoia my thesis just compiled in sub 3s

yay!

left night Dec 8, 2023, 11:50 PM

#

sturdy sequoia And there are still a ton of micro-optimizations we can do like using hashbrown,...

doesn't std use hashbrown already?

sturdy sequoia Dec 8, 2023, 11:51 PM

#

left night doesn't std use hashbrown already?

does it? 🤔

#

Maybe it does and I just misremember
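On the question above: the standard library's `HashMap` has been built on hashbrown since Rust 1.36, so the remaining knob is the hasher, which defaults to SipHash. The sketch below swaps in a hand-written 64-bit FNV-1a hasher purely as an illustration of how a different hasher is plugged in. Whether it is actually faster depends on the keys and was not measured here; `Fnv1a` and `FastMap` are made-up names.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a, a simple non-cryptographic hash function.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis.
        Fnv1a(0xcbf29ce484222325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            // FNV 64-bit prime.
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A std `HashMap` that uses the custom hasher instead of SipHash.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    // `HashMap::new` is only available for the default hasher,
    // so maps with a custom hasher are built with `default()`.
    let mut pages: FastMap<&str, u32> = FastMap::default();
    pages.insert("thesis", 160);
    println!("{:?}", pages.get("thesis"));
}
```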

#

all you youngsters that are used to the modern rust std lib

#

back in my days mutexes were slow, channels too, and so were hashtables 😂

left night Dec 8, 2023, 11:54 PM

#

sturdy sequoia back in my days mutexes were slow, channels too, and so were hashtables 😂

hey, those were also my days!

#

I've been using Rust since 1.15 or sth like that

sturdy sequoia Dec 8, 2023, 11:55 PM

#

I started using rust and I remember when the 1.0 was released!

left night Dec 8, 2023, 11:55 PM

#

okay, I'm a youngster

sturdy sequoia Dec 8, 2023, 11:55 PM

#

(I am old, please help 😭)

sturdy sequoia Dec 9, 2023, 12:10 AM

#

So, after the most performance productive day in Typst history (99% being due to your work) I am off to bed 😴

left night Dec 9, 2023, 12:12 AM

#

sturdy sequoia So, after the most performance productive day in Typst history (99% being due to...

good night!

sturdy sequoia Dec 9, 2023, 12:12 AM

#

left night good night!

u too (sleep is important 😄 )

#

One last thing, compared to pre-content rework, that's a 5.6x to 6.2x improvement in incremental compilation time

left night Dec 9, 2023, 12:15 AM

#

needed to calculate that before going to bed to sleep well? :D

sturdy sequoia Dec 9, 2023, 12:15 AM

#

left night needed to calculate that before going to bed to sleep well? :D

Yes! 😄

#

I have a big excel spreadsheet (which is a complete mess) so it's fairly easy

sturdy sequoia Dec 9, 2023, 9:36 AM

#

sturdy sequoia O-M-G

@tight glade look at what Laurenz has done 😱

tight glade Dec 9, 2023, 9:37 AM

#

oh wow that's crazy, that's so so impressive Oo

#

let me INVESTIGATE how the hell is this possible 😮

untold turret Dec 9, 2023, 9:50 AM

#

@sturdy sequoia you are so excellent that the bottleneck is going to move to browsers or PDF viewers and no longer be typst compilation. I never imagined a general comemo in parallel

sturdy sequoia Dec 9, 2023, 9:50 AM

#

untold turret @sturdy sequoia you are so excellent that the bottleneck is going to move ...

parallel comemo was haaaaaaaard

#

especially since it's a rather highly optimized one too

tight glade Dec 9, 2023, 9:51 AM

#

sturdy sequoia parallel comemo was haaaaaaaard

did you make it happen?? 🥳

sturdy sequoia Dec 9, 2023, 9:53 AM

#

tight glade did you make it happen?? 🥳

parallel comemo was me, parallel Typst is all Laurenz

#

mind you Laurenz suggested several comemo optimizations which are now in 🙂

tight glade Dec 9, 2023, 9:53 AM

#

my god, you people are busy that's amazing ❤️

onyx furnace Dec 9, 2023, 9:55 AM

#

sturdy sequoia @onyx furnace with the work @left night and I are doing, oi-wi...

that's too fast!🚀

sly pecan Dec 9, 2023, 10:15 AM

#

How many threads does it use?

sturdy sequoia Dec 9, 2023, 10:16 AM

#

sly pecan How many threads does it use?

https://tenor.com/view/every-single-one-of-them-steve-kornacki-msnbc-all-of-them-everyone-gif-19744458

#

@left night I pushed the changes for parallel comemo with local accelerators so that you can see what it ends up looking like

feral imp Dec 9, 2023, 10:32 AM

#

It would be nice, if the cli had an option on how many threads it is "limited" to use.

#

Like -j or test-threads.
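A thread-limit option of the kind suggested above mostly comes down to resolving one number: an explicit value if the user passed one, otherwise the machine's available parallelism. The sketch below shows only that step. The `-j`/`--jobs` flag names and the `resolve_jobs` helper are hypothetical; per the chat, no such option existed in the CLI at that point.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Resolve the worker count: an explicit value wins, otherwise use
/// whatever parallelism the system reports (falling back to one thread).
fn resolve_jobs(requested: Option<&str>) -> Result<NonZeroUsize, String> {
    match requested {
        Some(text) => text
            .parse::<NonZeroUsize>()
            .map_err(|err| format!("invalid job count {text:?}: {err}")),
        None => Ok(thread::available_parallelism().unwrap_or(NonZeroUsize::MIN)),
    }
}

fn main() {
    // Look for a hypothetical `-j N` / `--jobs N` pair on the command line.
    let args: Vec<String> = std::env::args().collect();
    let requested = args
        .iter()
        .position(|arg| arg == "-j" || arg == "--jobs")
        .and_then(|index| args.get(index + 1))
        .map(String::as_str);

    match resolve_jobs(requested) {
        Ok(jobs) => println!("using {jobs} threads"),
        Err(message) => eprintln!("{message}"),
    }
}
```

The resolved number would then be handed to whatever thread pool does the work; that part is omitted because it depends on the pool in use.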

sturdy sequoia Dec 9, 2023, 10:32 AM

#

feral imp It would be nice, if the cli had an option on how many threads it is "limited" t...

yes, but that I think will come later

#

it's also needed for parallel image encoding

left night Dec 9, 2023, 10:32 AM

#

sturdy sequoia @left night I pushed the changes for parallel comemo with local accele...

is it faster?

sturdy sequoia Dec 9, 2023, 10:32 AM

#

which currently isn't limited at all

sturdy sequoia Dec 9, 2023, 10:32 AM

#

left night is it faster?

yes

#

it's as fast as yours in cold compile, slightly faster incremental

left night Dec 9, 2023, 10:32 AM

#

did you also try the sharded global approach?

sturdy sequoia Dec 9, 2023, 10:33 AM

#

left night did you also try the sharded global approach?

no not yet

left night Dec 9, 2023, 10:33 AM

#

you are using Mutex because it's always write, correct?

sturdy sequoia Dec 9, 2023, 10:33 AM

#

left night you are using Mutex because it's always write, correct?

yes

left night Dec 9, 2023, 10:33 AM

#

but that's only because of the entry API

sturdy sequoia Dec 9, 2023, 10:33 AM

#

so no need for the overhead of a second AtomicUx in there

left night Dec 9, 2023, 10:34 AM

#

I wonder if RwLock with upgrade on miss would be faster

sturdy sequoia Dec 9, 2023, 10:34 AM

#

left night I wonder if RwLock with upgrade on miss would be faster

upgrade on miss?

left night Dec 9, 2023, 10:34 AM

#

when entry is none

#

basically read -> check hash map

#

if success -> return

#

if not in the map -> upgrade read to write

sturdy sequoia Dec 9, 2023, 10:34 AM

#

can we upgrade in a RwLock?

left night Dec 9, 2023, 10:34 AM

#

incurs double hashing but only on the slow path
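The read-then-upgrade idea can be approximated with the standard library, with one caveat: `std::sync::RwLock` has no upgrade operation, so this sketch releases the read lock and then takes the write lock, re-checking through the entry API because another thread may have inserted in between. An upgradable read, as in the `lock_api` method linked in the chat, would avoid that gap. `Cache` and `get_or_compute` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Default)]
struct Cache {
    map: RwLock<HashMap<u64, u64>>,
}

impl Cache {
    fn get_or_compute(&self, key: u64, compute: impl FnOnce() -> u64) -> u64 {
        // Fast path: shared lock, a single hash lookup.
        {
            let map = self.map.read().unwrap();
            if let Some(&value) = map.get(&key) {
                return value;
            }
        } // read guard released here

        // Slow path: exclusive lock and a second hash lookup. `entry`
        // also covers the case where another thread won the race.
        let mut map = self.map.write().unwrap();
        let value = *map.entry(key).or_insert_with(compute);
        value
    }
}

fn main() {
    let cache = Cache::default();
    println!("{}", cache.get_or_compute(3, || 9)); // miss: computes
    println!("{}", cache.get_or_compute(3, || 0)); // hit: closure not called
}
```

Hits only ever take the shared lock, so readers do not block each other; the double hashing is paid on misses alone.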

sturdy sequoia Dec 9, 2023, 10:35 AM

#

I've also thought of just making the age atomic

#

then we don't even need mutability to lookup only when inserting

left night Dec 9, 2023, 10:36 AM

#

https://docs.rs/lock_api/0.4.9/lock_api/struct.RwLock.html#method.upgradable_read

sturdy sequoia Dec 9, 2023, 10:36 AM

#

which should help too

left night Dec 9, 2023, 10:36 AM

#

I'm not sure what the performance implications of an upgradable read are

left night Dec 9, 2023, 10:36 AM

#

sturdy sequoia then we don't even need mutability to lookup only when inserting

yes that could be a win, too

sturdy sequoia Dec 9, 2023, 10:40 AM

#

left night yes that could be a win, too

I'm first trying the method with an atomic, I used the orderings from Arc in std-lib for it hoping that it makes it faster than SeqCst
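The atomic-age idea can be sketched as follows: if each entry's age lives in an atomic, a lookup can reset it through a shared reference, and only insertion and eviction need exclusive access. This is an illustration under assumptions, not comemo's code. `Relaxed` ordering is chosen here for simplicity, whereas the chat mentions borrowing the orderings used by `Arc`; the `Cache` and `Entry` types are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

struct Entry {
    value: u64,
    /// Number of eviction passes since the entry was last used.
    age: AtomicUsize,
}

#[derive(Default)]
struct Cache {
    entries: HashMap<u64, Entry>,
}

impl Cache {
    /// Lookup takes `&self`: resetting the age needs no `&mut`.
    fn lookup(&self, key: u64) -> Option<u64> {
        let entry = self.entries.get(&key)?;
        entry.age.store(0, Ordering::Relaxed);
        Some(entry.value)
    }

    /// Only insertion needs exclusive access.
    fn insert(&mut self, key: u64, value: u64) {
        let age = AtomicUsize::new(0);
        self.entries.insert(key, Entry { value, age });
    }

    /// Age every entry by one and drop those older than `max_age`.
    fn evict(&mut self, max_age: usize) {
        self.entries.retain(|_, entry| {
            let age = entry.age.get_mut();
            *age += 1;
            *age <= max_age
        });
    }
}

fn main() {
    let mut cache = Cache::default();
    cache.insert(1, 10);
    cache.insert(2, 20);
    cache.evict(1);
    println!("{:?}", cache.lookup(1)); // resets the age of key 1
    cache.evict(1);
    println!("{:?} {:?}", cache.lookup(1), cache.lookup(2));
}
```

With lookups needing only `&self`, a surrounding lock can be taken in shared mode on the hot path, which is presumably where the speedup comes from.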

left night Dec 9, 2023, 10:42 AM

#

while the owned accelerator is in principle conceptually cleanest, it does make Tracked conceptually less nice, too. right now, it is just a wrapper around a reference and TrackedMut around &mut. And both can be used like those w.r.t. copying, reborrowing, etc. having it be clone would ruin that.

sturdy sequoia Dec 9, 2023, 10:43 AM

#

@left night atomic based age provided another huge speedup of about 10% across the board!!!!

#

some incremental compiles are now sub 100ms (LSP territory)

left night Dec 9, 2023, 10:44 AM

#

sturdy sequoia @left night atomic based age provided another huge speedup of about 10...

wow

#

I also think that the vec-based global accelerator might actually be faster because it needs no atomic cloning ops. maybe it could even reuse the allocations for the accelerators.

sturdy sequoia Dec 9, 2023, 10:44 AM

#

left night I also think that the vec-based global accelerator might actually be faster beca...

yes I think you're right

#

I'll try that next 🙂

#

@left night using the sharded accelerators is another 10% faster cold, but equivalent incremental (perhaps 1-2% faster)

#

my bad, ignore that I had made a mistake I just caught

#

I'll check again once it's done building

#

Well after fixing my mistake it is equivalent in cold & hot compiles, but it keeps Tracked copy which imo is worth it

left night Dec 9, 2023, 11:02 AM

#

Agreed

#

Did you reuse the allocations or not?

sturdy sequoia Dec 9, 2023, 11:02 AM

#

left night Did you reuse the allocations or not?

like re-using the hashmaps? then no

left night Dec 9, 2023, 11:02 AM

#

yeah that

#

just clearing all of them instead of the vec

#

and resetting the head

sturdy sequoia Dec 9, 2023, 11:03 AM

#

I'll see if it can be done

low sapphire Dec 9, 2023, 11:03 AM

#

btw how much faster are the new changes compared to 0.10.0?

#

might have already been said, but it seems like you find new improvements every day and they add up :D

sly pecan Dec 9, 2023, 11:09 AM

#

low sapphire btw how much faster are the new changes compared to 0.10.0?

A lot

sturdy sequoia Dec 9, 2023, 11:10 AM

#

low sapphire might have already been said, but it seems like you find new improvements every ...

about 3x faster cold, 2-2.5x incremental

#

and that's without some of the other optimizations that I am cooking on the side

left night Dec 9, 2023, 11:14 AM

#

at this point, I think building the introspector is one of the main bottlenecks in incremental

#

this could possibly be fixed by integrating a hierarchical introspection API with layered caching directly into the frames

sturdy sequoia Dec 9, 2023, 11:15 AM

#

left night this could possibly be fixed by integrating a hierarchical introspection API wit...

I've been thinking of that for a while

#

but we'd need to verify the assumption that the introspector is the slowest bit

left night Dec 9, 2023, 11:16 AM

#

we need the tracing 🙃

sturdy sequoia Dec 9, 2023, 11:16 AM

#

left night we need the tracing 🙃

the tracing no longer measures the introspection time :-p

#

(but it could be made to again)

left night Dec 9, 2023, 11:17 AM

#

I don't mean the introspection itself

#

Just building the introspector

sturdy sequoia Dec 9, 2023, 11:17 AM

#

Ok, so I have made typst significantly slower trying to re-use allocations

#

fun fact, it's faster without accelerators 😂

left night Dec 9, 2023, 11:18 AM

#

interesting

#

that was definitely not the case when I first introduced it

#

maybe that's because all tracked methods have become cheaper

#

world is more optimized and introspector, too

#

but I still find it hard to believe that it's really slower

sturdy sequoia Dec 9, 2023, 11:19 AM

#

it could also have to do with all of the goddamn locks everywhere on the accelerator 😂

sturdy sequoia Dec 9, 2023, 11:19 AM

#

left night but I still find it hard to believe that it's really slower

with the sharded thing only, without it's faster

#

so:

one global accelerator: slower cold, slightly slower incremental
one accelerator in Arc per-Tracked: fastest cold, fastest incremental
sharded accelerators: slightly slower cold, barely slower incremental (but doesn't work right yet)

#

I have a feeling that the upgradable rwlock is super duper slow

left night Dec 9, 2023, 11:21 AM

#

The arc version used Mutex right?

#

So this is Apples to Oranges

sturdy sequoia Dec 9, 2023, 11:22 AM

#

left night The arc version used Mutex right?

they all do

#

because in the sharded accelerators I still need mutable access to the one accelerator

#

so I .read() over the list of accelerators then .write() on the individual accelerator

#

and I .write() the list of accelerators when calling evict but that doesn't matter too much imo
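The locking pattern described above (shared lock on the list, exclusive lock on one accelerator, exclusive lock on the list only for eviction) might look like the sketch below. It assumes a `Mutex` per accelerator, since the chat says all variants used one, and every name in it is hypothetical. Note that IDs restart from zero after `evict` in this simplified version, so a stale ID could alias a new slot; the head offset discussed earlier in the chat is what would prevent that.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

/// A list of individually locked accelerators.
#[derive(Default)]
struct Sharded {
    list: RwLock<Vec<Mutex<HashMap<u64, bool>>>>,
}

impl Sharded {
    /// Register a new accelerator; takes the write lock, but only briefly.
    fn allocate(&self) -> usize {
        let mut list = self.list.write().unwrap();
        list.push(Mutex::new(HashMap::new()));
        list.len() - 1
    }

    /// Read-lock the list, then lock only the one accelerator involved.
    fn validate(&self, id: usize, key: u64, check: impl FnOnce() -> bool) -> bool {
        let list = self.list.read().unwrap();
        match list.get(id) {
            Some(shard) => {
                let mut map = shard.lock().unwrap();
                let verdict = *map.entry(key).or_insert_with(check);
                verdict
            }
            // The accelerator was evicted: validate without caching.
            None => check(),
        }
    }

    /// Write-lock the list and drop every accelerator.
    fn evict(&self) {
        self.list.write().unwrap().clear();
    }
}

fn main() {
    let accelerators = Sharded::default();
    let a = accelerators.allocate();
    let b = accelerators.allocate();
    // Threads working on `a` and `b` only share the read lock on the list.
    println!("{}", accelerators.validate(a, 1, || true));
    println!("{}", accelerators.validate(b, 1, || false));
    accelerators.evict();
}
```

Even in this form every validation takes two locks, which may be part of why sharding did not beat the simpler variants in the measurements reported here.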

left night Dec 9, 2023, 11:24 AM

#

ah

#

what you'd need is a lockfree bump allocator

#

but even if that exists, you could probably not clear it through an immutable reference

sturdy sequoia Dec 9, 2023, 11:27 AM

#

technically bumpalo is lockfree

#

so maybe that's the solution 🤔

#

it's not Sync 😭

quaint blaze Dec 9, 2023, 11:32 AM

#

Average rust generic signature

#

So will multi-threaded typst come out next update?

#

I mean considering you've done all you can to do less work the next step is being able to do more work per second

low sapphire Dec 9, 2023, 11:36 AM

#

sturdy sequoia about 3x faster cold, 2-2.5x incremental

ok holy shit, but I guess it will be less on weaker CPUs, right?

sly pecan Dec 9, 2023, 11:36 AM

#

Off to buy a 7995WX for typst

quaint blaze Dec 9, 2023, 11:38 AM

#

GPU parallel typst rendering when

#

Imagine cetz running on the gpu

sly pecan Dec 9, 2023, 11:38 AM

#

quaint blaze GPU parallel typst rendering when

I don't think that many parts of typst could benefit from gpu acceleration

quaint blaze Dec 9, 2023, 11:39 AM

#

sly pecan I don't think that many parts of typst could benefit from gpu acceleration

It was a half joke

lunar kettle Dec 9, 2023, 11:39 AM

#

Only image export

sturdy sequoia Dec 9, 2023, 11:39 AM

#

low sapphire ok holy shit, but I guess it will be less on weaker CPUs, right?

not really, because while it uses many threads, the slow bits are still single threaded, so you should be fine as long as you have four cores or more

lunar kettle Dec 9, 2023, 11:39 AM

#

Which would be good for web app live preview

quaint blaze Dec 9, 2023, 11:39 AM

#

At least I can use all 16 of my cores now

sturdy sequoia Dec 9, 2023, 11:39 AM

#

lunar kettle Which would be good for web app live preview

yes, I have been thinking of writing a webgpu backend for Typst

quaint blaze Dec 9, 2023, 11:39 AM

#

lunar kettle Which would be good for web app live preview

Also typst-live

sturdy sequoia Dec 9, 2023, 11:39 AM

#

quaint blaze At least I can use all 16 of my cores now

it's already the case in 0.10 for PDF export 😉

quaint blaze Dec 9, 2023, 11:40 AM

#

sturdy sequoia it's already the case in 0.10 for PDF export 😉

You see each core runs at 4ghz

low sapphire Dec 9, 2023, 11:40 AM

#

sturdy sequoia not really because while it uses many threads, the slow bits are still single th...

yeah I got 4 cores on a 2019 thinkpad which I mainly use typst on^^

quaint blaze Dec 9, 2023, 11:40 AM

#

I have the 4ghz I WANNA USE THE 4GHZ

lunar kettle Dec 9, 2023, 11:40 AM

#

sturdy sequoia yes, I have been thinking of writing a webgpu backend for Typst

Sounds like quite the task

quaint blaze Dec 9, 2023, 11:40 AM

#

WebGPU is neat though

#

A layer between vulkan, metal, opengl, etc. and your shaders

sturdy sequoia Dec 9, 2023, 11:42 AM

#

lunar kettle Sounds like quite the task

yes indeed

left night Dec 9, 2023, 11:43 AM

#

quaint blaze So will multi-threaded typst come out next update?

Probably yes

quaint blaze Dec 9, 2023, 11:43 AM

#

sturdy sequoia yes indeed

At this point you should just join the typst team as "that one performance guy"

sturdy sequoia Dec 9, 2023, 11:43 AM

#

quaint blaze At this point you should just join the typst team as "that one performance guy"

I wish, I wouldn't need to interview for jobs anymore 😂

quaint blaze Dec 9, 2023, 11:44 AM

#

sturdy sequoia I wish, I wouldn't need to interview for jobs anymore 😂

Working on OSS is one of my dream jobs

feral imp Dec 9, 2023, 12:00 PM

#

sturdy sequoia I wish, I wouldn't need to interview for jobs anymore 😂

you've got three sponsors, can't you live off of that?!

sturdy sequoia Dec 9, 2023, 12:02 PM

#

feral imp you've got three sponsors, can't you live off of that?!

Gotta pump those numbers up 😄

#

I have made 15,63€ so far 😄

sturdy sequoia Dec 9, 2023, 12:46 PM

#

@left night as far as I can tell, the best options are:

local hashmaps in an Arc
a single global hashmap

#

the sharding thing just... isn't working 😦

tight glade Dec 9, 2023, 12:54 PM

#

sturdy sequoia I wish, I wouldn't need to interview for jobs anymore 😂

relatable 🤣

sturdy sequoia Dec 9, 2023, 1:12 PM

#

Ok, I have recovered the lost performance 🎉

#

@left night It's still slightly slower than a local hashmap, but it remains Copy

glossy shore Dec 9, 2023, 1:34 PM

#

sturdy sequoia But if I run multiple tests then one of them sometimes fails

I think that happened to me recently but I couldn't figure out why and it fixed itself

sturdy sequoia Dec 9, 2023, 2:19 PM

#

glossy shore I think that happened to me recently but I couldn't figure out why and it fixed ...

yes, in this case it was tests running in parallel, but it's fixed now 😄

sturdy sequoia Dec 9, 2023, 2:49 PM

#

Ok, I managed to recover the last lil' bits of performance 😄

quaint blaze Dec 9, 2023, 8:02 PM

#

Not sure if this would work but would it be performant to write a comemo cache to disk and when typst starts again load that cache

#

For persistent caching

sly pecan Dec 9, 2023, 8:08 PM

#

quaint blaze Not sure if this would work but would it be performant to write a comemo cache t...

The cache is so big that reading and writing it to disk would likely be slower

#

Especially with all the recent optimizations

quaint blaze Dec 9, 2023, 8:14 PM

#

Interesting

feral imp Dec 9, 2023, 9:14 PM

#

I also think the issue is... Windows support? They were discussing some channel socket business, but windows was a drag.

sturdy sequoia Dec 9, 2023, 9:33 PM

#

quaint blaze Not sure if this would work but would it be performant to write a comemo cache t...

Possible? Yes, absolutely. Necessary? Imo not so much (and even less so with all of the opts coming). You have to realize that we cache lots of things, which leads to ~3GB of memory use for my thesis (imo we probably cache too much), so reading and writing that to disk would be excruciatingly slow. And finally, we frankly don't need it: unless your document is really, really big, it will take like ~20s to compile with these improvements (see oi-wiki being massively down with the improvements), so the further gain would be to maybe make that < 10s, but with all of the complexities of disk caching

#

Additionally, there is the problem of deciding where to store them 🤔

left night Dec 9, 2023, 9:57 PM

#

It's also not that simple technically

#

For instance, we're hashing a lot of pointers to statics, which can be affected by ASLR (Address Space Layout Randomization)
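To illustrate the ASLR point, here is a minimal sketch (not Typst or comemo code; `NAME_A`, `NAME_B`, `hash_by_address`, and `hash_by_content` are made-up names). A hash derived from the address of a static is only meaningful inside one process, because the address may differ on the next run, while a hash of the contents does not depend on where the data lives.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Two hypothetical statics with identical contents.
static NAME_A: &str = "heading";
static NAME_B: &str = "heading";

/// Hashes the address of a static. The address can change between runs
/// (for example under ASLR), so this hash must not be persisted to disk.
fn hash_by_address(value: &'static &'static str) -> u64 {
    let mut hasher = DefaultHasher::new();
    (value as *const &'static str as usize).hash(&mut hasher);
    hasher.finish()
}

/// Hashes the contents, which is independent of the memory layout.
fn hash_by_content(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Within one run, `hash_by_address(&NAME_A)` is stable, which is all an in-memory cache needs. A cache written to disk would have to key on something like `hash_by_content` instead.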

sturdy sequoia Dec 9, 2023, 10:23 PM

#

left night For instance, we're hashing a lot of pointers to statics, which can be affected ...

I didn't even consider that, but indeed, we'd need no hashing of addresses

left night Dec 9, 2023, 10:24 PM

#

sturdy sequoia I didn't even consider that, but indeed, we'd need no hashing of addresses

and then there's also the fact that we have leaky interners and we hash the IDs used by them

#

it's a bunch of problems and I don't think it's worth it

sturdy sequoia Dec 9, 2023, 10:24 PM

#

left night it's a bunch of problems and I don't think it's worth it

definitely not imo

quaint blaze Dec 10, 2023, 8:44 AM

#

sturdy sequoia Possible? yes absolutely, necessary? imo (and even more so with all of the tops ...

3gb ram usage sounds like death for lower power hardware

sturdy sequoia Dec 10, 2023, 8:45 AM

#

quaint blaze 3gb ram usage sounds like death for lower power hardware

Hence why I said we’re probably memoizing too much

quaint blaze Dec 10, 2023, 8:45 AM

#

Yeah

feral imp Dec 10, 2023, 8:48 AM

#

Make it work, make it fast, make it work on hardware from the last decade, make it work on IE7, make it rain ☔

quaint blaze Dec 10, 2023, 9:16 AM

#

feral imp Make it work, make it fast, make it work on hardware from the last decade, make ...

I feel like the cutoff here is hardware from last decade

#

If typst can do that I think it's succeeded in being speed

#

If I can get it to run on my 15 yr old Lenovo laptop locally without performance issues I'll count that as a win

sly pecan Dec 10, 2023, 9:30 AM

#

sturdy sequoia Hence why I said we’re probably memoizing too much

I don't mind memory usage if it has an actual purpose. Ideally it should scale based on the user's amount of RAM, right?

feral imp Dec 10, 2023, 9:32 AM

#

quaint blaze I feel like the cutoff here is hardware from last decade

I'm just making a fun remark. I think that memory usage is a complex thing to optimize. It will probably be a separate investigation.
🤔

sly pecan Dec 10, 2023, 9:33 AM

#

quaint blaze If I can get it to run on my 15 yr old Lenovo laptop locally without performance...

I think 8 GB machines is a reasonable target to make sure it works well on

quaint blaze Dec 10, 2023, 9:33 AM

#

sly pecan I don't mind memory usage if it has an actual purpose. Ideally it should scale b...

I agree with this

quaint blaze Dec 10, 2023, 9:33 AM

#

sly pecan I think 8 GB machines is a reasonable target to make sure it works well on

I know a few people who still are running on 4GB

#

but yeah 8GB is reasonable

#

If we can get 4GB running well that's the best outcome imo

sly pecan Dec 10, 2023, 9:36 AM

#

quaint blaze If we can get 4GB running well thats the best outcome imo

Sure. I'm sure it's possible to dynamically scale back comemo.

#

#

This is from the Steam hardware survey. Obviously biased towards higher ram, but it's a data point

quaint blaze Dec 10, 2023, 10:33 AM

#

I'd say that 8gb is probably the most practical minimum

left night Dec 10, 2023, 10:38 AM

#

It's anyway desirable for Typst to run well with 4GB max to itself (which is already a lot of course), since that's the WebAssembly limit

lunar kettle Dec 10, 2023, 10:44 AM

#

at least until memory64 becomes more prevalent 😄

#

but yeah I guess 4GB should be a sensible limit

tight glade Dec 10, 2023, 10:45 AM

#

Honestly, I would be surprised if Typst wasn't already running pretty well on old hardware

sly pecan Dec 10, 2023, 10:49 AM

#

left night It's anyway desirable for Typst to run well with 4GB max to itself (which is alr...

If you have 4 GB ram you're lucky if half that is available for applications

untold turret Dec 10, 2023, 10:50 AM

#

I feel that the best solution is to adopt a circuit breaker for memoization, because as long as we don't reach the memory limit, the more memory we use, the faster compilation gets. When it detects that a big document takes too much memory, it disables some of the memoization with relatively low efficiency to keep the experience smooth.

sly pecan Dec 10, 2023, 10:52 AM

#

tight glade Honestly, I would be surprised if Typst **wasn't** already running pretty well o...

It uses quite a lot of memory. Though I think a lot of the memory usage when using the web app comes from the preview canvas not being aggressively garbage collected

sly pecan Dec 10, 2023, 10:54 AM

#

sly pecan If you have 4 GB ram you're lucky if half that is available for applications

I didn't catch the "to itself"

sturdy sequoia Dec 10, 2023, 12:37 PM

#

left night It's anyway desirable for Typst to run well with 4GB max to itself (which is alr...

||the memory limit after fixing the lil' mistake you guys made 😈||

sturdy sequoia Dec 10, 2023, 12:38 PM

#

untold turret I feel that a best solution is to adopt a circuit breaker for memorization, beca...

Maybe we could do this by allowing the "memoize" macro to take in a "level" of some kind that says how critical it is to memoize its result
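A rough sketch of how a "level" plus a circuit breaker could fit together, assuming a byte budget and three made-up levels. `Level`, `LeveledCache`, and `trip_breaker` are hypothetical names; nothing here is comemo's actual API or macro syntax.

```rust
use std::collections::HashMap;

/// Hypothetical priority a memoized function could be annotated with.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Level {
    Optional, // cheap to recompute, dropped first
    Useful,
    Critical, // expensive to recompute, never dropped by the breaker
}

struct Entry {
    level: Level,
    bytes: usize,
}

/// Toy cache that tracks an approximate size per entry.
struct LeveledCache {
    entries: HashMap<u64, Entry>,
    used: usize,
    budget: usize,
}

impl LeveledCache {
    fn new(budget: usize) -> Self {
        LeveledCache { entries: HashMap::new(), used: 0, budget }
    }

    fn insert(&mut self, key: u64, level: Level, bytes: usize) {
        if let Some(old) = self.entries.insert(key, Entry { level, bytes }) {
            self.used -= old.bytes;
        }
        self.used += bytes;
        self.trip_breaker();
    }

    /// The "circuit breaker": while over budget, drop whole levels,
    /// starting with the least critical one.
    fn trip_breaker(&mut self) {
        for level in [Level::Optional, Level::Useful] {
            if self.used <= self.budget {
                return;
            }
            let mut freed = 0;
            self.entries.retain(|_, entry| {
                if entry.level == level {
                    freed += entry.bytes;
                    false
                } else {
                    true
                }
            });
            self.used -= freed;
        }
    }

    fn contains(&self, key: u64) -> bool {
        self.entries.contains_key(&key)
    }
}
```

The appeal of manual levels is that the policy stays trivial: no access statistics are needed, only an annotation per memoized function.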

sly pecan Dec 10, 2023, 12:38 PM

#

sturdy sequoia Maybe we could do this by allowing the "memoize" macro to take in a "level" of s...

Couldn't it be dynamically prioritized based on how often it's being accessed?

#

#naive

sturdy sequoia Dec 10, 2023, 12:40 PM

#

sly pecan Couldn't it be dynamically prioritized based on how often it's being accessed?

Sure, but I feel like finding a good heuristic for that might be heaps more difficult than... manually annotating functions with an optional level

left night Dec 10, 2023, 12:49 PM

#

sturdy sequoia ||the memory limit after fixing the lil' mistake you guys at made 😈||

No mistake

#

It was intentional because I wasn't sure whether setting 4GB max would fail in some browser, so I wanted to test with a lower limit first

#

Because the docs aren't clear about whether the full amount is always allocated up front when using shared memory

sturdy sequoia Dec 10, 2023, 12:51 PM

#

left night No mistake

https://tenor.com/view/hmm-suspect-gif-22611582

Tenor

untold turret Dec 10, 2023, 12:52 PM

#

The cache eviction strategy tends to behave like a smart generational GC (@sly pecan, e.g. dynamically prioritized), so I wonder whether we could also steal some techniques from them and still keep things simple.

sturdy sequoia Dec 10, 2023, 1:26 PM

#

untold turret The cache evict strategy tends to behave like a smart (@sly pecan, e....

it's true that if we implemented with a "true" generational GC we might get better results? 🤔

untold turret Dec 10, 2023, 1:31 PM

#

sturdy sequoia it's true that if we implemented with a "true" generational GC we might get bett...

The difference is that we can free some object safely whenever we would like to, because all objects are just for caching computation, so we may not need to make a true generational GC.
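Because every entry is only cached computation, dropping one can cost time but never correctness. A minimal sketch of age-based eviction under that assumption (`AgedCache` and its methods are illustrative names, not comemo's implementation):

```rust
use std::collections::HashMap;

struct Aged<V> {
    value: V,
    age: usize, // number of eviction passes since the last use
}

struct AgedCache<V> {
    entries: HashMap<u64, Aged<V>>,
}

impl<V: Clone> AgedCache<V> {
    fn new() -> Self {
        AgedCache { entries: HashMap::new() }
    }

    /// Returns the cached value or computes it. A hit resets the age.
    fn get_or_compute(&mut self, key: u64, compute: impl FnOnce() -> V) -> V {
        let entry = self
            .entries
            .entry(key)
            .or_insert_with(|| Aged { value: compute(), age: 0 });
        entry.age = 0;
        entry.value.clone()
    }

    /// Meant to run once per compilation: ages every entry and drops those
    /// that have gone unused for more than `max_age` passes. Always safe,
    /// since a dropped entry is simply recomputed on the next request.
    fn evict(&mut self, max_age: usize) {
        self.entries.retain(|_, entry| {
            entry.age += 1;
            entry.age <= max_age
        });
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

This is the part of a generational collector that carries over directly: young, recently used entries survive, old ones go. No reachability analysis is needed because nothing depends on a cache entry staying alive.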

quaint blaze Dec 10, 2023, 1:51 PM

#

Me when the thing to compile my code into pdf contains a generational GC for performance reasons

#

Typst is absurd in some ways and I love it

onyx furnace Dec 12, 2023, 11:05 AM

#

just checked the new parallel comemo PR and I don't quite understand what the accelerator does. I guess I've missed a lot of discussion. 😂 Can anyone tell me about that or improve the docs?

sly pecan Dec 12, 2023, 11:16 AM

#

I just wanna bring up this message by @sturdy sequoia #contributors message Running plugins in parallel would be massive for cold compiles on documents that use them extensively

#

(Say a plugin that adds support for an image format for instance)

sturdy sequoia Dec 12, 2023, 11:17 AM

#

onyx furnace just checked the new parallel comemo pr and i dont quite understand what acceler...

it makes validation across multiple instances of the same Tracked<T> faster if I understand it correctly

#

validation is the act of checking whether two runs have returned the same values, which Typst uses to check whether the doc has converged
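A heavily simplified sketch of the idea of validation, assuming a constraint is just a list of recorded calls plus the hash of what each call returned. `Files`, `Call`, and `Constraint` are invented for illustration; comemo's real data structures (and the accelerator on top of them) are more involved.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Stand-in for a tracked input.
struct Files {
    main: String,
    bib: String,
}

/// The "methods" a computation may call on the tracked input.
#[derive(Clone, Copy)]
enum Call {
    Main,
    Bib,
}

impl Files {
    /// Hash of the value a given call returns on this input.
    fn answer(&self, call: Call) -> u64 {
        match call {
            Call::Main => hash_of(&self.main),
            Call::Bib => hash_of(&self.bib),
        }
    }
}

/// Records which calls were made and the hash of each return value.
struct Constraint {
    calls: Vec<(Call, u64)>,
}

impl Constraint {
    fn record(&mut self, files: &Files, call: Call) {
        self.calls.push((call, files.answer(call)));
    }

    /// A new input is valid for the cached result if every recorded call
    /// would return the same value on it as it did before.
    fn validate(&self, files: &Files) -> bool {
        self.calls.iter().all(|&(call, hash)| files.answer(call) == hash)
    }
}
```

In this sketch a computation that only ever read `main` stays valid when `bib` changes, which is the property that makes incremental compilation cheap.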

sly pecan Dec 12, 2023, 11:19 AM

#

sturdy sequoia it makes validation across multiple instances of the same `Tracked<T>` faster if...

"if I understand it correctly" isn't it your PR? 😂

sturdy sequoia Dec 12, 2023, 12:19 PM

#

sly pecan "if I understand it correctly" isn't it your PR? 😂

Accelerators aren’t science

#

They were already there

onyx furnace Dec 12, 2023, 12:55 PM

#

oh, i thought it was new

sturdy sequoia Dec 12, 2023, 1:00 PM

#

onyx furnace oh, i thought it was new

Mind you, they improve performance a lot

#

so clearly they're needed 😂

left night Dec 12, 2023, 2:28 PM

#

sly pecan I just wanna bring up this message by @sturdy sequoia https://discord.com/...

It's a bit tricky. We can't expect wasm plugins to be compiled with atomics support. While we can fire up a separate instance per thread, that's gonna add overhead and be expensive for plugins that have costly one-time initialization. So it depends a bit on the case whether it's worth it.

sly pecan Dec 12, 2023, 2:30 PM

#

I guess testing would be required, but if I had 100 calls to a plugin, each taking 0.1s, I'm guessing it might be worth it to run them in parallel

#

even with overhead
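The trade-off can be sketched with plain threads, assuming a made-up `Instance` type that stands in for a wasm plugin instance that cannot be shared between threads. Each worker pays the initialization cost once and then handles its share of the calls; `run_parallel` and the `inits` counter are hypothetical helpers for the illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for a plugin instance without atomics support.
struct Instance {
    factor: u32,
}

impl Instance {
    /// Counts initializations so the one-time cost per thread is visible.
    fn new(inits: &AtomicUsize) -> Self {
        inits.fetch_add(1, Ordering::Relaxed);
        Instance { factor: 2 }
    }

    /// `&mut self` models that one instance serves one call at a time.
    fn call(&mut self, input: u32) -> u32 {
        input * self.factor
    }
}

/// Splits the inputs into chunks; every worker creates its own instance.
fn run_parallel(inputs: &[u32], threads: usize, inits: &AtomicUsize) -> Vec<u32> {
    let threads = threads.max(1);
    let chunk = ((inputs.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = inputs
            .chunks(chunk)
            .map(|part| {
                scope.spawn(move || {
                    let mut instance = Instance::new(inits);
                    part.iter().map(|&x| instance.call(x)).collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the results in input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect()
    })
}
```

With 100 calls on 4 threads this performs 4 initializations instead of 1. Whether that pays off depends on how the initialization cost compares to the per-call cost, which is why testing would be required.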

glossy shore Dec 12, 2023, 3:26 PM

#

Perhaps plugins could somehow hint to their parallelism support

#

Plugins that really don't need it will just not ask for it, and those that do will have to ask for it and guarantee themselves that they won't break

sturdy sequoia Dec 12, 2023, 3:31 PM

#

glossy shore Plugins that really don't need it will just not ask for it, and those that do wi...

it's true that we could just have a boolean in the plugin constructor

#

parallel: false by default

glossy shore Dec 12, 2023, 3:32 PM

#

I think it could be more refined than that but that's definitely also an option

#

Also is there any way for plugins to directly generate PDF elements or Typst values?

glad urchin Dec 12, 2023, 3:33 PM

#

no

#

they aren't aware of typst at all

glossy shore Dec 12, 2023, 3:33 PM

#

I figured

glad urchin Dec 12, 2023, 3:33 PM

#

other than the minimal api

sturdy sequoia Dec 12, 2023, 3:33 PM

#

glossy shore Also is there any way for plugins to directly generate PDF elements or Typst val...

no and that won't happen soon at the very least

glossy shore Dec 12, 2023, 3:33 PM

#

Shame

#

could have real potential

lunar kettle Dec 12, 2023, 3:33 PM

#

and could become a real mess 😂

glossy shore Dec 12, 2023, 3:34 PM

#

fair

sturdy sequoia Dec 12, 2023, 3:34 PM

#

glossy shore could have real potential

yes but stability and the interface would be very hard

lunar kettle Dec 12, 2023, 3:34 PM

#

i think the way we have it now is a good balance, at least for now

glossy shore Dec 12, 2023, 3:34 PM

#

for now yeah, it's a good MVP

lunar kettle Dec 12, 2023, 3:34 PM

#

adding typst values somehow would be more reasonable, but writing pdf directly would be pretty hard I think

sturdy sequoia Dec 12, 2023, 3:35 PM

#

lunar kettle adding typst values somehow would be more reasonable, but writing pdf directly w...

I think a Frame interface would make more sense

#

where they can pass serialized frame and load them into Typst

quaint blaze Dec 13, 2023, 7:58 AM

#

lunar kettle adding typst values somehow would be more reasonable, but writing pdf directly w...

And dangerous

lunar kettle Dec 13, 2023, 8:03 AM

#

quaint blaze And dangerous

indeed

tight glade Dec 13, 2023, 9:17 AM

#

sturdy sequoia `parallel: false` by default

noice

left night Dec 14, 2023, 12:02 PM

#

@sturdy sequoia I've reviewed the comemo PR

sturdy sequoia Dec 14, 2023, 12:02 PM

#

left night @sturdy sequoia I've reviewed the comemo PR

Yes I just saw, thanks

sturdy sequoia Dec 14, 2023, 1:19 PM

#

@left night regarding the last_was_hit feature in comemo, I made it default because tests can't specify their own feature 😦

feral imp Dec 14, 2023, 1:30 PM

#

sturdy sequoia @left night regarding the `last_was_hit` feature in comemo, I made it ...

Have you made sure that they are #[cfg(feature = "last_was_hit")] alongside being #[test]? It won't help, but at least you won't get compile errors if the feature isn't specified..

#

(and frankly, I'd do that and make a test-all feature that can be used in CI / local testing and includes this feature, and then remove it from default, just for cleanliness)

glad urchin Dec 14, 2023, 1:31 PM

#

feral imp Have you made sure that they are `#[cfg(feature = "last_was_hit)]` alongside bei...

but then you don't test...?

feral imp Dec 14, 2023, 1:32 PM

#

glad urchin but then you dont test...?

I mean cargo test --all-features is definitely there for a reason, especially if your features are all additive, if not, then you'd make a feature say test-all that includes all additive features, and then it's cargo test --features test-all............

glad urchin Dec 14, 2023, 1:32 PM

#

well, if that works then sure

left night Dec 14, 2023, 1:49 PM

#

sturdy sequoia @left night regarding the `last_was_hit` feature in comemo, I made it ...

they can specify required-features

#

it's a bit of a pain because you still need to pass them manually

#

but that's life
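For context, `required-features` is a key on a `[[test]]` target in `Cargo.toml`; Cargo skips that target unless the listed features are enabled, so they still have to be passed with `--features`. The `cfg` gating suggested earlier can be sketched as follows. This is a toy reimplementation for illustration only: the thread-local flag and `lookup` are assumptions, not comemo's code.

```rust
use std::cell::Cell;

thread_local! {
    // Only written when the feature is enabled.
    static LAST_WAS_HIT: Cell<bool> = Cell::new(false);
}

/// Toy cache lookup that records whether it was a hit, but only when
/// the `last_was_hit` feature is enabled.
pub fn lookup(cache: &[u32], key: u32) -> bool {
    let hit = cache.contains(&key);
    #[cfg(feature = "last_was_hit")]
    LAST_WAS_HIT.with(|flag| flag.set(hit));
    hit
}

/// Test-only introspection, compiled out without the feature.
#[cfg(feature = "last_was_hit")]
pub fn last_was_hit() -> bool {
    LAST_WAS_HIT.with(|flag| flag.get())
}

// Gating on both `test` and the feature avoids compile errors when the
// tests are built without it. The tests then silently do not run.
#[cfg(all(test, feature = "last_was_hit"))]
mod tests {
    use super::*;

    #[test]
    fn reports_hits() {
        assert!(lookup(&[1, 2, 3], 2));
        assert!(last_was_hit());
    }
}
```

The silent skip is exactly the objection raised above ("but then you don't test...?"): CI has to run with the feature enabled, for example via `cargo test --all-features`, for these tests to count.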

sturdy sequoia Dec 14, 2023, 2:34 PM

#

@left night I am done with the code review items, I am just testing whether the immutable: HashMap<...> is really necessary 🙂

#

@left night (sorry for double ping) what would you replace it by, the old method?

#

or just always adding immutable calls? 🤔

#

I can also test both

left night Dec 14, 2023, 2:40 PM

#

Probably the deduplication is still worth it

#

But hard to say

#

Since I don't want to overoptimize it for the Tracer usage, I would probably keep it

sturdy sequoia Dec 14, 2023, 2:49 PM

#

@left night As far as I can tell, on my thesis at least, the two methods are equivalent: without the hashmap it is barely faster (like 10ms on an incremental run of half a second) but it is slower in cold by 0.1s

#

So they're pretty much equivalent

#

And I just tested without dedup

#

and it's a good 30% slower than either method 😂

#

I'll keep the "simpler" manual dedup method then 🙂

#

@left night it's pushed

#

https://tenor.com/view/finished-elijah-wood-lord-of-the-rings-lava-fire-gif-5894611


It's done - Finished


left night Dec 14, 2023, 5:00 PM

#

@sturdy sequoia does the non Send + Sync version really need a separate surface?

#

since the Send + Sync version forces the type to be Send + Sync anyway...

sturdy sequoia Dec 14, 2023, 5:10 PM

#

@left night I did it because the TrackedMut has the #ty passed to it:

#

#

So I wasn't really sure whether I could merge them

#

thonk

left night Dec 14, 2023, 5:12 PM

#

I see. Hmm, the duplication is unfortunate.

sturdy sequoia Dec 14, 2023, 5:13 PM

#

left night I see. Hmm, the duplication is unfortunate.

I agree, but I wasn't sure how to circumvent it

left night Dec 14, 2023, 5:13 PM

#

Maybe a better alternative would be to either just require Send + Sync or allow the annotation to be #[comemo::track(unsync)]. in either case, I think we want to generate just one.

#

since we don't personally have a use case for unsync, we can also just skip it I guess

#

the return value must also be sync, so it's a bit moot

sturdy sequoia Dec 14, 2023, 5:14 PM

#

it's a bit annoying imo to have to write Tracked<dyn Trait + Send + Sync>

#

What I don't understand is that if trait Trait: Send + Sync { ... }, why do we need to specify it again

#

?

#

angrythunk

left night Dec 14, 2023, 5:14 PM

#

did you try that?

sturdy sequoia Dec 14, 2023, 5:14 PM

#

left night did you try that?

hum... I don't know 😂

#

I think so

#

but I can try again

left night Dec 14, 2023, 5:16 PM

#

maybe we don't, from a quick test

#

if we can just annotate the World trait Send + Sync, that'd certainly be ideal and no changes in comemo would be required (compared to main)
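A small sketch of why the supertrait bound is enough. With `trait World: Send + Sync`, the trait object type `dyn World` is itself `Send + Sync`, so it can cross threads without spelling out `dyn World + Send + Sync`. The trait below is a simplified stand-in with one made-up method, not Typst's real `World`.

```rust
use std::sync::Arc;
use std::thread;

/// Simplified stand-in for the real trait, with the auto traits
/// declared as supertraits.
trait World: Send + Sync {
    fn main_source(&self) -> String;
}

struct TestWorld;

impl World for TestWorld {
    fn main_source(&self) -> String {
        "Hello from Typst".to_string()
    }
}

/// Takes a plain `dyn World`. Moving it into another thread compiles
/// only because the supertraits make `dyn World: Send + Sync`.
fn word_count_on_thread(world: Arc<dyn World>) -> usize {
    thread::spawn(move || world.main_source().split_whitespace().count())
        .join()
        .unwrap()
}
```

The cost moves to implementors: every type implementing `World` must now be `Send + Sync` itself, which is where the interior mutability changes come in.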

sturdy sequoia Dec 14, 2023, 5:17 PM

#

left night if we can just annotate the `World` trait `Send + Sync`, that'd certainly be ide...

yes, that's what I like

#

from my quick test it seems to be the case

left night Dec 14, 2023, 5:17 PM

#

very nice!

sturdy sequoia Dec 14, 2023, 5:17 PM

#

Lemme double check before I make a fool of myself 😄

left night Dec 14, 2023, 5:17 PM

#

Tracked<'_, dyn World + Send + Sync> was indeed ugly

sturdy sequoia Dec 14, 2023, 5:18 PM

#

well it's not needed somehow

#

I guess through some of the other cleanups

#

@left night there are still some changes needed to World regarding interior mutability for World to be truly Send + Sync

#

(removing OnceCell, RefCell, etc.)
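One common way to make such a type `Send + Sync` is to swap the single-threaded cells for their `std::sync` counterparts. The sketch below assumes `OnceLock` in place of `OnceCell` and `Mutex` in place of `RefCell`; the struct and field names are invented, and the actual changes on the branch may differ.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

struct FileSlot {
    path: String,
    // Lazily loaded once; `OnceLock` is the thread-safe `OnceCell`.
    source: OnceLock<String>,
}

struct SketchWorld {
    // Bookkeeping that would be a `RefCell<HashMap<..>>` single-threaded.
    loads: Mutex<HashMap<String, usize>>,
    slot: FileSlot,
}

impl SketchWorld {
    fn new(path: &str) -> Self {
        SketchWorld {
            loads: Mutex::new(HashMap::new()),
            slot: FileSlot { path: path.to_string(), source: OnceLock::new() },
        }
    }

    /// Loads the source on first access and counts how often that happens.
    fn source(&self) -> &str {
        self.slot.source.get_or_init(|| {
            let mut loads = self.loads.lock().unwrap();
            *loads.entry(self.slot.path.clone()).or_insert(0) += 1;
            format!("contents of {}", self.slot.path)
        })
    }

    fn load_count(&self) -> usize {
        self.loads.lock().unwrap().values().sum()
    }
}

/// Compiles only if `T` is `Send + Sync`.
fn assert_send_sync<T: Send + Sync>() {}
```

Calling `assert_send_sync::<SketchWorld>()` compiles here, whereas the same struct with `RefCell` fields would be rejected because `RefCell` is not `Sync`.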

left night Dec 14, 2023, 5:21 PM

#

Those are on my branch

sturdy sequoia Dec 14, 2023, 5:23 PM

#

left night Those are on my branch

It's pushed 🙂

#

And I confirm, typst works just fine by just swapping comemo and modifying the interior mutability (as well as some stuff that returns Self which isn't supported anymore 😐 )

left night Dec 14, 2023, 11:08 PM

#

sturdy sequoia It's pushed 🙂

I did some refactoring on the PR. In particular, I replicated the Inner optimization from the mutable constraints to the immutable constraints for less locking, split the thing up into a few more files, and switched to required-features for tests. Feel free to take a look / test it yourself before I merge.

sturdy sequoia Dec 14, 2023, 11:54 PM

#

left night I did some refactoring on the PR. In particular, I replicated the `Inner` optimi...

It looks alright to me, just a warning when building in release which I fixed 🙂

sturdy sequoia Dec 15, 2023, 12:12 AM

#

@left night as a final tally, this provides a 30-35% performance improvement in incremental 😉

untold turret Dec 15, 2023, 12:49 AM

#

I see an incoming commit that wants to make World Send + Sync, i.e. thread safe. This may make some World implementations quite painful.
https://github.com/typst/typst/commit/5159c303133282c7d1c6567c1487d4f351d83eff#diff-fb076026c9166e521c4b4c70de26b2ed2c2052cec8e4df9edc5c506616417aaeR187
For example, a world that contains a JsValue will never be Sync, by design of wasm_bindgen.

GitHub

Switch to multi-threaded comemo · typst/typst@5159c30

#

🥺 @sturdy sequoia idea?

sturdy sequoia Dec 15, 2023, 1:00 AM

#

untold turret 🥺 @sturdy sequoia idea?

I think the easiest way in this case will be to have your JS handling on a separate thread and use channels to communicate from the world to the JS. That will keep the world Send and Sync while not requiring JS values to be Send and Sync
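A minimal sketch of that channel pattern using `std` threads. An `Rc<String>` stands in for the non-`Send` `JsValue`; `JsBridge`, `Request`, and `read` are hypothetical names. In a browser the owning "thread" would be a worker with its own messaging, so this only shows the shape of the idea.

```rust
use std::rc::Rc;
use std::sync::mpsc::{self, Sender};
use std::sync::Mutex;
use std::thread;

/// Hypothetical request sent from the world to the owning thread.
struct Request {
    path: String,
    reply: Sender<String>,
}

/// Handle stored inside the world. It only holds a sender behind a
/// mutex, so it is `Send + Sync` even though the state it talks to is not.
struct JsBridge {
    requests: Mutex<Sender<Request>>,
}

impl JsBridge {
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Request>();
        thread::spawn(move || {
            // `Rc` is neither `Send` nor `Sync`. It is created on this
            // thread and never leaves it.
            let host = Rc::new(String::from("js-host"));
            // The loop ends when every sender has been dropped.
            for request in rx {
                let _ = request.reply.send(format!("{}:{}", host, request.path));
            }
        });
        JsBridge { requests: Mutex::new(tx) }
    }

    /// Blocks until the owning thread has answered.
    fn read(&self, path: &str) -> String {
        let (reply, answer) = mpsc::channel();
        self.requests
            .lock()
            .unwrap()
            .send(Request { path: path.to_string(), reply })
            .unwrap();
        answer.recv().unwrap()
    }
}

/// Compiles only if `T` is `Send + Sync`.
fn assert_send_sync<T: Send + Sync>() {}
```

Every `read` is a round trip between threads, which is the communication cost that comes with this approach.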

untold turret Dec 15, 2023, 1:04 AM

#

That may add a lot of communication cost, but I think that's a general way to make non-Sync things Sync

#

👿 still a pain to be honest, I may use unsafe impl Send until I have time to write that sync wrapper...

#

@sturdy sequoia Another question: will multi-threaded comemo become slower if its user doesn't use anything parallel like rayon? Some of my private projects (not related to typst) use comemo, but they all run single-threaded, as intended.

sturdy sequoia Dec 15, 2023, 8:36 AM

#

untold turret @sturdy sequoia Another question, will multi-threaded comemo become slower...

No it’s in fact significantly faster

sturdy sequoia Dec 15, 2023, 9:47 AM

#

untold turret 👿 still be pain to be honest, I may use `unsafe impl Send` before I have time t...

Sowwwyyyyyy, but it's needed to make Typst multithreaded 💀

#

Maybe you could just use a patch in Cargo.toml to force the use of the old version of comemo (single threaded)?

feral imp Dec 15, 2023, 9:56 AM

#

sturdy sequoia Maybe you could just use a patch in `Cargo.toml` to force use the old version of...

that's not really sustainable?

sturdy sequoia Dec 15, 2023, 10:01 AM

#

feral imp that's not really sustainable?

no indeed, but as a stopgap it should be fine?

cunning wadi Dec 15, 2023, 10:08 AM

#

untold turret 👿 still be pain to be honest, I may use `unsafe impl Send` before I have time t...

you might want to check out the send_wrapper crate

left night Dec 15, 2023, 10:48 AM

#

sturdy sequoia @left night as a final tally, this provides a 30-35% performance impro...

merged 🎉

sturdy sequoia Dec 15, 2023, 10:48 AM

#

OH YES

#

Time for a release? or do you want to test it some more?

left night Dec 15, 2023, 10:52 AM

#

not sure. some more testing might be sensible.

untold turret Dec 15, 2023, 11:03 AM

#

sturdy sequoia Sowwwyyyyyy, but it's needed to make Typst multithreaded 💀

no worries. it's worth it because multithreaded typesetting is definitely so cool! I don't know for sure, but I think latex is still single threaded.

sly pecan Dec 15, 2023, 11:04 AM

#

untold turret no worries. it's worth because multithreaded typesetting is definitely so cool! ...

LaTeX is extremely single threaded, yes

untold turret Dec 15, 2023, 11:05 AM

#

cunning wadi you might want to check out the `send_wrapper` crate

sounds like good news. I had forgotten about it. I'll give it a try.

left night Dec 15, 2023, 11:08 AM

#

since we control comemo, I guess we can switch to a git dependency temporarily

#

I just don't want to use git deps for unreleased things that we don't control because it could mess with our releases

sturdy sequoia Dec 15, 2023, 11:09 AM

#

left night since we control comemo, I guess we can switch to a git dependency temporarily

I think we could, the performance bump is worth it overall

sturdy sequoia Dec 15, 2023, 11:10 AM

#

untold turret no worries. it's worth because multithreaded typesetting is definitely so cool! ...

most likely single threaded, after all writing multithreaded C code is hard

left night Dec 15, 2023, 11:10 AM

#

sturdy sequoia I think we could, the performance bump is worth it overall

this is all that's needed, right? https://github.com/typst/typst/tree/parallel-comemo

sturdy sequoia Dec 15, 2023, 11:11 AM

#

left night this is all that's needed, right? <https://github.com/typst/typst/tree/parallel-...

don't forget to modify the test and bench world otherwise they won't pass CI

#

I can open a PR from my branch if you want

left night Dec 15, 2023, 11:11 AM

#

but CI passed?

sturdy sequoia Dec 15, 2023, 11:11 AM

#

weird

#

'cause it shouldn't

left night Dec 15, 2023, 11:11 AM

#

I already adjusted the test world here: https://github.com/typst/typst/commit/cf6ce9fd53dae24ec46142e2c9b249cb4ae102b1

#

which is already on main

sturdy sequoia Dec 15, 2023, 11:12 AM

#

because the World impl in the CLI and in the tests use OnceCell from the stdlib

left night Dec 15, 2023, 11:12 AM

#

the bench world I didn't touch

sturdy sequoia Dec 15, 2023, 11:12 AM

#

ah right, I didn't know that

left night Dec 15, 2023, 11:12 AM

#

feel free to make a PR instead of me merging my branch

sturdy sequoia Dec 15, 2023, 11:12 AM

#

Then it looks good!

left night Dec 15, 2023, 11:12 AM

#

it's your work after all

sturdy sequoia Dec 15, 2023, 11:12 AM

#

left night it's your work after all

Gimme five minutes then 🙂

left night Dec 15, 2023, 11:12 AM

#

please use a git dep with rev instead of a patch though

#

otherwise typst can't be used as a library properly

sturdy sequoia Dec 15, 2023, 11:19 AM

#

https://tenor.com/view/bamboozled-hoodwinked-led-astray-stephen-a-smith-run-amok-gif-12466806


hoodwinked, bamboozled, led astray


#

I had just done it with a patch

#

it's opened 🎉

#

https://github.com/typst/typst/pull/2973

GitHub

Switch to parallel-comemo using git patch by Dherse · Pull Request ...

left night Dec 15, 2023, 11:29 AM

#

It's still pretty broken, but the diff required for parallelity is just laughably small.

sturdy sequoia Dec 15, 2023, 11:31 AM

#

left night It's still pretty broken, but the [diff required for parallelity](<https://githu...

that's impressive!

#

BTW, as a quick update, compared to pre-content rework, we are at around 5x faster incremental and 2.5x faster cold

#

INSANE

#

Without parallel typst which brings that even higher

left night Dec 15, 2023, 11:32 AM

#

pretty nice, eh?

#

good work

sturdy sequoia Dec 15, 2023, 11:32 AM

#

left night pretty nice, eh?

yeah 😄

#

I think there is still some perf on the table while remaining single threaded, but we've already done so much!

#

So hum... I have two hours to make slides 💀

left night Dec 15, 2023, 11:33 AM

#

oh no, godspeed

keen scroll Dec 15, 2023, 11:34 AM

#

sturdy sequoia So hum... I have two hours to make slides 💀

not a problem with all those performance improvements 😉

sturdy sequoia Dec 15, 2023, 11:34 AM

#

left night oh no, godspeed

it's for a typst presentation too 😂

low sapphire Dec 16, 2023, 8:31 AM

#

you lied TanyaAngry
I tried the recent changes and it's a major improvement again xD

sturdy sequoia Dec 16, 2023, 9:52 AM

#

low sapphire you lied TanyaAngry I tried the recent changes and it's a ...

Sowwy I can’t stop myself

feral imp Dec 16, 2023, 10:04 AM

#

Exposed.

sturdy sequoia Dec 16, 2023, 10:31 AM

#

low sapphire you lied TanyaAngry I tried the recent changes and it's a ...

So hum… I take it you’re happy with parallel comemo then?

#

science science science

low sapphire Dec 16, 2023, 11:00 AM

#

sturdy sequoia So hum… I take it you’re happy with parallel comemo then?

yup yup, cold compilation time went from 1,3s to 800ms for my document (4 threads)

sturdy sequoia Dec 16, 2023, 11:00 AM

#

low sapphire yup yup, cold compilation time went from 1,3s to 800ms for my document (4 thread...

Wait, with main?????

low sapphire Dec 16, 2023, 11:01 AM

#

yeah

sturdy sequoia Dec 16, 2023, 11:01 AM

#

From which version????

low sapphire Dec 16, 2023, 11:01 AM

#

I compiled the latest main via cargo sometime yesterday

#

from 0.10.0

sturdy sequoia Dec 16, 2023, 11:01 AM

#

Okay, what cpu are you using?

#

Because the gains should be 30% incremental but pretty much zero cold

low sapphire Dec 16, 2023, 11:02 AM

#

should be Intel Core i5-8265U

sturdy sequoia Dec 16, 2023, 11:02 AM

#

Goddamn it confirms my theory that lower end cpus benefit more

#

😎

sturdy sequoia Dec 16, 2023, 11:03 AM

#

low sapphire should be Intel Core i5-8265U

And is it better incremental ?

low sapphire Dec 16, 2023, 11:03 AM

#

wait i'll measure it again

#

Benchmark 1: typst c document.typ
  Time (mean ± σ):      1.806 s ±  0.018 s    [User: 1.540 s, System: 0.623 s]
  Range (min … max):    1.787 s …  1.849 s    10 runs
 
Benchmark 2: typst-dev c document.typ
  Time (mean ± σ):      1.028 s ±  0.024 s    [User: 1.247 s, System: 0.086 s]
  Range (min … max):    1.012 s …  1.095 s    10 runs
 
Summary
  typst-dev c document.typ ran
    1.76 ± 0.05 times faster than typst c document.typ

"typst-dev" is 41c0dae2, "typst" is just 0.10.0

#

that was a different document though (35 pages). I make a lot of use of states and some calculations

#

Simple edits for incremental:

typst: 175ms - 240ms
typst-dev: 150ms - 200ms

#

idk how I would easily pipe this into hyperfine though^^ just an estimate, so cold compile time definitely improved a lot more

low sapphire Dec 16, 2023, 2:47 PM

#

@sturdy sequoia I haven't tried it yet, but do any of the recent developments also make wasm faster? I remember it being very slow, especially for cold compilations when it first came out

sturdy sequoia Dec 16, 2023, 2:50 PM

#

low sapphire @sturdy sequoia I haven't tried it yet, but do any of the recent developme...

@left night is working on using wasmtime as the executor which is significantly faster

sly pecan Dec 16, 2023, 3:18 PM

#

sturdy sequoia @left night is working on using wasmtime as the executor which is sign...

is there a branch to try?

lunar kettle Dec 16, 2023, 3:20 PM

#

I mean using it in the compiler is easy but integrating it into the web app is hard

sturdy sequoia Dec 16, 2023, 3:25 PM

#

lunar kettle I mean using it in the compiler is easy but integrating it into the web app is h...

Yes, in the cli it’s trivial, the problem only shows itself for the web app

left night Dec 16, 2023, 5:04 PM

#

sturdy sequoia @left night is working on using wasmtime as the executor which is sign...

I haven't really continued on that for now because of the wasm-bindgen Sync problem.

sturdy sequoia Dec 16, 2023, 5:04 PM

#

left night I haven't really continued on that for now because of the wasm-bindgen Sync prob...

Well you could have the plugins running in a worker and use a channel to send messages to it

#

Should be fairly easy

glad urchin Dec 16, 2023, 5:06 PM

#

sturdy sequoia Should be fairly easy

Famous last words…

left night Dec 16, 2023, 5:06 PM

#

That would still need world integration

#

Which I was hoping to avoid

#

It's also a bit unfortunate to unconditionally spawn evermore workers just in case although that will need to happen for rayon to work anyway

#

In short: I originally thought I could quickly use the browser wasm and just ship it in a day, but since it's more complex, I shifted it back in my priorities.

sly pecan Dec 16, 2023, 5:13 PM

#

sturdy sequoia Yes, in the cli it’s trivial, the problem only shows itself for the web app

The web app just uses the wasm engine of the browser doesn't it?

sturdy sequoia Dec 16, 2023, 5:13 PM

#

sly pecan The web app just uses the wasm engine of the browser doesn't it?

No right now it uses wasmi

#

Which is why it’s so sloooooooow

sly pecan Dec 16, 2023, 5:19 PM

#

sturdy sequoia No right now it uses wasmi

I just assumed it would use the browser

keen scroll Dec 16, 2023, 9:44 PM

#

sly pecan I just assumed it would use the browser

I think the web app itself runs in the browser
But wasm plugins in the web app use wasmi

feral imp Dec 16, 2023, 10:46 PM

#

Node.js != the browser. I've learned this, is this true

sturdy sequoia Dec 16, 2023, 11:16 PM

#

feral imp Node.js != the browser. I've learned this, is this true

Node.js has a ton of specialty APIs and misses tons of web-specific APIs

onyx furnace Dec 17, 2023, 4:27 AM

#

https://crates.io/crates/diplomatic-bag

#

i found this. maybe it can help typst.ts and (webapp?)

#

A wrapper type that allows you to send !Send types to different threads.

quaint blaze Dec 17, 2023, 10:11 AM

#

low sapphire ``` Benchmark 1: typst c document.typ Time (mean ± σ): 1.806 s ± 0.018 s...

We are speed

sturdy sequoia Dec 17, 2023, 11:19 AM

#

quaint blaze We are speed

https://tenor.com/view/cars-lightning-mcqueen-speed-im-speed-i-am-speed-gif-25593042

#

@left night Can you explain to me why you don't want the plugin interface to be part of World? Just curious as to what's the rationale here, it could be just a Sender<PluginCall> or something like that
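
A minimal sketch of the "worker plus channel" idea with a `Sender<PluginCall>`. Everything here is an assumption for illustration: `PluginCall`, its fields, `spawn_plugin_worker`, and `call_plugin` are hypothetical names, the "plugin" only reverses bytes, and a plain `std::thread` stands in for what would be a web worker in the browser.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical message type: one plugin invocation plus a reply channel.
struct PluginCall {
    func: String,
    args: Vec<u8>,
    reply: Sender<Vec<u8>>,
}

/// Spawns a worker thread that owns the plugin state (which may not be
/// `Sync`) and returns a sender that callers can clone and share.
fn spawn_plugin_worker() -> Sender<PluginCall> {
    let (tx, rx) = mpsc::channel::<PluginCall>();
    thread::spawn(move || {
        // Stand-in for a real plugin instance: "reverse" reverses the
        // argument bytes, any other function name echoes them back.
        for call in rx {
            let mut out = call.args.clone();
            if call.func == "reverse" {
                out.reverse();
            }
            // Ignore the error if the caller has already gone away.
            let _ = call.reply.send(out);
        }
    });
    tx
}

/// Blocking helper: sends one call to the worker and waits for the answer.
fn call_plugin(tx: &Sender<PluginCall>, func: &str, args: &[u8]) -> Vec<u8> {
    let (reply, response) = mpsc::channel();
    tx.send(PluginCall { func: func.to_string(), args: args.to_vec(), reply })
        .expect("plugin worker has shut down");
    response.recv().expect("plugin worker dropped the reply channel")
}
```

With this shape the caller only ever holds a `Sender`, so the plugin instance itself never has to cross a thread boundary.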

left night Dec 17, 2023, 12:02 PM

#

sturdy sequoia <@311948531835469827> Can you explain to me why you don't want the plugin interf...

(a) I always want to avoid complexity. In this case maybe it is necessary complexity, but it is still complexity.

(b) It is one more thing that world implementors have to worry about, at least unless it is implemented by default through a feature flag.

sturdy sequoia Dec 17, 2023, 1:08 PM

#

@low sapphire Since you like free performance: https://github.com/typst/typst/pull/2989

GitHub

Parallel export (use par comemo) by Dherse · Pull Request #2989 · t...

This PR does the following changes:

Using parallel comemo: we can now use rayon to get "free" parallel PNG and SVG export (brings a nice speedup)
Using Deferred: PDF page deflate is done...

low sapphire Dec 17, 2023, 1:10 PM

#

Nice!

left night Dec 17, 2023, 1:11 PM

#

sturdy sequoia <@456226577798135808> Since you like free performance: https://github.com/typst/...

why switch from std OnceLock to once_cell?

sturdy sequoia Dec 17, 2023, 1:12 PM

#

left night why switch from std OnceLock to once_cell?

We're already using it elsewhere and it doesn't require unwrapping

#

that's it really

low sapphire Dec 17, 2023, 1:12 PM

#

sturdy sequoia <@456226577798135808> Since you like free performance: https://github.com/typst/...

You should put "I made Typst N times faster" or something on your CV doglaugh

left night Dec 17, 2023, 1:14 PM

#

sturdy sequoia We're already using it elsewhere and it doesn't require unwrapping

OnceLock doesn't require unwrapping?

#

you probably mean parking_lot vs std

sturdy sequoia Dec 17, 2023, 1:14 PM

#

left night OnceLock doesn't require unwrapping?

No it does because by default it returns an Option<&T>

#

or am I confused? thonk

#

Ah no, the big difference is that OnceCell has the .wait() method

#

whoopsy
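
For reference, a small sketch of the std side of this comparison: `OnceLock::get` never blocks and returns an `Option<&T>`, so a reader has to handle the "not set yet" case itself, whereas the `.wait()` mentioned above blocks until the value is there. The function name `observe` is made up for the example.

```rust
use std::sync::OnceLock;
use std::thread;

/// Returns what `get` sees before and after another thread fills the cell.
fn observe() -> (Option<u32>, Option<u32>) {
    let cell: OnceLock<u32> = OnceLock::new();

    // Nothing has been stored yet, so `get` yields `None` instead of blocking.
    let before = cell.get().copied();

    thread::scope(|s| {
        s.spawn(|| {
            // `set` only fails if the cell was already initialized.
            cell.set(7).expect("cell was already set");
        });
    }); // the scope joins the spawned thread here

    // After the writer has finished, `get` yields `Some`.
    let after = cell.get().copied();
    (before, after)
}
```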

left night Dec 17, 2023, 1:16 PM

#

sturdy sequoia Ah no, the big difference is that `OnceCell` has the `.wait()` method

yeah, but for the CLI world that's not needed

sturdy sequoia Dec 17, 2023, 1:17 PM

#

I confuss meself

#

thinkies

left night Dec 17, 2023, 1:18 PM

#

Regarding Hash and PartialEq

#

Does it really happen only later for the pattern or immediately?

sturdy sequoia Dec 17, 2023, 1:18 PM

#

left night Does it really happen only later for the pattern or immediately?

hmm, probably immediately so I could probably just .wait() it

#

in my initial work that's what I had done

left night Dec 17, 2023, 1:19 PM

#

sturdy sequoia hmm, probably immediately so I could probably just `.wait()` it

and the deferred only happens in the first place because that's what construct_page happens to do?

sturdy sequoia Dec 17, 2023, 1:19 PM

#

left night and the deferred only happens in the first place because that's what construct_p...

yes

left night Dec 17, 2023, 1:19 PM

#

waiting directly seems cleaner

sturdy sequoia Dec 17, 2023, 1:20 PM

#

left night waiting directly seems cleaner

Done 😉

#

@left night do we have any CI for the CLI as a whole? Like testing the CLI itself not just the "engine" so-to-speak

left night Dec 17, 2023, 1:21 PM

#

No

feral imp Dec 17, 2023, 1:46 PM

#

low sapphire You should put "I made Typst N times faster" or something on your CV <:doglaugh:...

Dherse has a github sponsor page up. We are three people that sponsor him.
You could join us, if you want.. 😅

Also there is sponsoring typst itself. 🤷‍♂️

left night Dec 17, 2023, 2:12 PM

#

@sturdy sequoia doesn't have to be part of this PR, but parallel cmap creation and font subsetting is probably also relatively low-hanging fruit.

sturdy sequoia Dec 17, 2023, 2:12 PM

#

left night <@130737672951037952> doesn't have to be part of this PR, but parallel cmap crea...

I'll have a quick look 😉

#

Where should I look? 🤔

#

create_cmap I suppose?

left night Dec 17, 2023, 2:35 PM

#

sturdy sequoia Where should I look? 🤔

https://github.com/typst/typst/blob/main/crates/typst-pdf/src/font.rs#L147-L154

#

Just the subsetting would be simplest, but the cmap creation mutably borrows the glyph_set that's used by the subsetting, so doing cmap -> subsetting sequentially per font, but for all fonts at the same time, seems reasonable

#

This line might be a problem: https://github.com/typst/typst/blob/main/crates/typst-pdf/src/font.rs#L34

#

I don't think the glyph_set is used after that loop though, so it could probably be taken

sturdy sequoia Dec 17, 2023, 3:12 PM

#

@left night I doubt I did it how you wanted so here's what I did:

- First: I prepare the items by allocating all of the refs etc. on a single thread
- Second: I use a par bridge to go multithreaded where I do all of the font stuff
- Third: I finalize on a single thread to write to the PDF

#

does that make sense or is it wayyyy too overengineered?

left night Dec 17, 2023, 3:16 PM

#

sturdy sequoia <@311948531835469827> I doubt I did it how you wanted so here's what I did: - Fi...

Is this description for the fonts or for the whole export?

sturdy sequoia Dec 17, 2023, 3:17 PM

#

left night Is this description for the fonts or for the whole export?

of the fonts part

left night Dec 17, 2023, 3:17 PM

#

Why do you need to allocate the refs in the first part?

sturdy sequoia Dec 17, 2023, 3:17 PM

#

let processed_fonts = ctx.font_map
    .items()
    .map(|font| {
        prepare_font(&mut ctx.alloc, &mut ctx.font_refs, &mut ctx.glyph_sets, font)
    })
    .par_bridge()
    .map(process_font)
    .collect::<Vec<_>>();

for processed in processed_fonts {
    finalize(&mut ctx.pdf, processed);
}

#

This is essentially what happens
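
The same three phases can be sketched without rayon, using only `std::thread::scope`. This is an illustration under stated assumptions, not the code from the PR: `Prepared`, `Processed`, `prepare`, `process`, and `export_fonts` are hypothetical stand-ins, "subsetting" is reduced to deduplicating glyph ids, and one thread per font replaces rayon's thread pool.

```rust
use std::thread;

/// Hypothetical per-font data after the single-threaded preparation phase.
struct Prepared {
    id: usize,
    glyphs: Vec<u16>,
}

/// Hypothetical per-font result of the parallel phase.
struct Processed {
    id: usize,
    subset_len: usize,
}

/// Phase 1 (single thread): touches shared mutable state, here an id counter.
fn prepare(next_id: &mut usize, glyphs: Vec<u16>) -> Prepared {
    let id = *next_id;
    *next_id += 1;
    Prepared { id, glyphs }
}

/// Phase 2 (parallel): pure per-font work that needs no shared state.
fn process(font: Prepared) -> Processed {
    let mut glyphs = font.glyphs;
    glyphs.sort_unstable();
    glyphs.dedup();
    Processed { id: font.id, subset_len: glyphs.len() }
}

fn export_fonts(fonts: Vec<Vec<u16>>) -> Vec<String> {
    // Phase 1: sequential, because it mutates the allocator-like counter.
    let mut next_id = 0;
    let prepared: Vec<Prepared> =
        fonts.into_iter().map(|glyphs| prepare(&mut next_id, glyphs)).collect();

    // Phase 2: one scoped thread per font; joining in order keeps the
    // results in the same order as the input.
    let processed: Vec<Processed> = thread::scope(|s| {
        let handles: Vec<_> = prepared
            .into_iter()
            .map(|font| s.spawn(move || process(font)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("font worker panicked"))
            .collect()
    });

    // Phase 3: sequential again, standing in for writing to the PDF.
    let mut out = Vec::new();
    for font in processed {
        out.push(format!("font {}: {} glyphs", font.id, font.subset_len));
    }
    out
}
```

Only the middle phase is free of shared mutable state, which is why it is the only one that can run in parallel here.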

left night Dec 17, 2023, 3:17 PM

#

I had thought to just prepare the cmap and subset font and then merge First + Third

sturdy sequoia Dec 17, 2023, 3:17 PM

#

left night I had thought to just prepare the cmap and subset font and then merge First + Th...

that also works indeed

left night Dec 17, 2023, 3:18 PM

#

maybe we should hold off for now

#

and instead design a proper architecture for the whole export

#

where we (likely) have indeed the three phases you described

#

but for everything

sturdy sequoia Dec 17, 2023, 3:19 PM

#

Okay, so I don't push the font changes

#

On my thesis, I cannot measure the change either way 💀

#

But I guess this would be more impactful on CJK stuff

left night Dec 17, 2023, 3:19 PM

#

that's curious, I would have thought that subsetting is quite expensive

#

but for CFF likely indeed much more than for TrueType

sturdy sequoia Dec 17, 2023, 3:20 PM

#

I mean overall PDF export isn't that slow

#

image encoding was extremely slow

#

but since it's been multithreaded it's very decent

left night Dec 17, 2023, 3:21 PM

#

in the naive parallel test I did, it was quite slow

#

but probably just because the test was naive and everything else was very fast

#

30000 page document or sth

sturdy sequoia Dec 17, 2023, 3:21 PM

#

left night 30000 page document or sth

yeah okay 😂

#

I mean right now it's only parallel across fonts, right?

#

not within a single font

#

so if a single font is slow it won't change much

left night Dec 17, 2023, 3:22 PM

#

yeah

low sapphire Dec 17, 2023, 5:15 PM

#

feral imp Dherse has a github sponsor page up. We are three people that sponsor him. You c...

the $10 option sounds tempting xD

sturdy sequoia Dec 17, 2023, 5:59 PM

#

low sapphire the $10 option sounds tempting xD

Please feel free to 😄

low sapphire Dec 17, 2023, 10:02 PM

#

sturdy sequoia <@456226577798135808> Since you like free performance: https://github.com/typst/...

doesn't seem to be faster though? for cold compilations at least

sturdy sequoia Dec 17, 2023, 10:03 PM

#

low sapphire doesn't seem to be faster though? for cold compilations at least

depends on the length of your doc!

low sapphire Dec 17, 2023, 10:03 PM

#

I tried the same from last time

#

seems to be slightly slower on my hardware, but only by a few milliseconds

#

but if it speeds it up for other docs, I guess that's fine

sturdy sequoia Dec 17, 2023, 10:05 PM

#

How long is your doc and how many cores do you have?

low sapphire Dec 17, 2023, 10:07 PM

#

35 pages, 4 cores

#

I just ran a test with hyperfine --warmup 3 -m 100 ...

#

your PR is ~10ms slower on my laptop

#

for that payload

low sapphire Dec 17, 2023, 10:10 PM

#

low sapphire 35 pages, 4 cores

full of text, at least 30 images, couple of tables etc

#

it's slightly faster on the same template I used, but without any "content"

#

which has 5 pages

sly pecan Dec 17, 2023, 10:32 PM

#

sturdy sequoia <@456226577798135808> Since you like free performance: https://github.com/typst/...

do I need to change anything to enable it? because I'm seeing no change at all really

low sapphire Dec 17, 2023, 10:33 PM

#

I only checked out his branch and compiled it

sly pecan Dec 17, 2023, 10:35 PM

#

the documents I've tested it on might have other bottlenecks

#

that being said, I only tried pdf

low sapphire Dec 17, 2023, 10:48 PM

#

good point

#

lemme try svg

low sapphire Dec 17, 2023, 10:50 PM

#

low sapphire full of text, at least 30 images, couple of tables etc

yeah svg is a bit faster

#

1.011 s vs 989.5 ms

#

might be even better on newer CPUs, idk

sly pecan Dec 17, 2023, 11:57 PM

#

@sturdy sequoia by the way I'm seeing a performance increase with target-cpu=x86-64-v3

#

it's small, but there

#

like <1%

#

granted my testing was very haphazard

sturdy sequoia Dec 18, 2023, 12:04 AM

#

compared to what? 🤔

sly pecan Dec 18, 2023, 12:14 AM

#

sturdy sequoia compared to what? 🤔

The default is x86-64, which excludes many instructions

#

x86-64-v3 does however exclude CPUs older than approximately 10 years

sturdy sequoia Dec 18, 2023, 12:16 AM

#

sly pecan The default is x86-64, which excludes many instructions

Right, I had tried target-cpu=native but I didn't see many gains

sly pecan Dec 18, 2023, 12:17 AM

#

sturdy sequoia Right, I had tried `target-cpu=native` but I didn't see many gains

x86-64-v3 should be similar to native

sturdy sequoia Dec 18, 2023, 12:17 AM

#

figures

sly pecan Dec 18, 2023, 12:18 AM

#

Anyway, it's free real estate

sturdy sequoia Dec 18, 2023, 12:19 AM

#

sly pecan Anyway, it's free real estate

but it breaks some compatibility

#

which I'm sure Laurenz won't like

#

(not that I care, if your CPU doesn't have AVX2 just throw it in the trash)

sly pecan Dec 18, 2023, 12:20 AM

#

sturdy sequoia but it breaks some compatibility

It would just be an additional target. Thats how mpv does it. You can choose to get v3

#

(although I can't imagine that many people are using pre-haswell computers today)

sturdy sequoia Dec 18, 2023, 12:21 AM

#

sly pecan (although I can't imagine that many people are using pre-haswell computers today...

In developing countries? probably loads

low sapphire Dec 18, 2023, 6:05 AM

#

sly pecan (although I can't imagine that many people are using pre-haswell computers today...

Hey I have a librebooted x200 at home :D

#

https://en.m.wikipedia.org/wiki/Penryn_(microprocessor)

Penryn (microprocessor)

Penryn is the code name of a processor from Intel that is sold in varying configurations as Core 2 Solo, Core 2 Duo, Core 2 Quad, Pentium and Celeron.
During development, Penryn was the Intel code name for the 2007/2008 "Tick" of Intel's Tick-Tock cycle which shrunk Merom to 45 nanometers as CPUID model 23. The term "Penryn" is sometimes used to...

rigid latch Dec 18, 2023, 9:56 AM

#

🙂 Naive question. What are you optimizing? Typst is freakingly fast for me.

sturdy sequoia Dec 18, 2023, 10:11 AM

#

rigid latch 🙂 Naive question. What are you optimizing? Typst is freakingly fast for me.

Well, everything?

rigid latch Dec 18, 2023, 10:13 AM

#

Is there anything that is slow in particular?

sturdy sequoia Dec 18, 2023, 10:27 AM

#

rigid latch Is there anything that is slow in particular?

Several things can be faster, on large documents it can take quite a bit of time

#

I should also mention that you must think of it this way: every optimisation has a big impact on people on low end devices since they’re constrained by the performance of their hardware

feral imp Dec 18, 2023, 10:28 AM

#

I think before dherse started, there were a few issues... Thesis long documents were quite slow to compile.

And it was too slow frankly.

Plus I tried starting a paper with a lot of spurious PNG files in it, and performance suffered quite fast.

rigid latch Dec 18, 2023, 10:43 AM

#

Indeed, I had performance issues before 0.8.0 and 0.9.0 release. But now most of it is gone. I wonder what is left to do. If png files are big, I wouldn't expect the compilation / rendering be fast anyway.

sturdy sequoia Dec 18, 2023, 11:34 AM

#

rigid latch Indeed, I had performance issues before 0.8.0 and 0.9.0 release. But now most of...

Yes, that was the content rework, but now we're mostly working on making Typst more parallel

feral imp Dec 18, 2023, 11:40 AM

#

sturdy sequoia Yes, that was the content rework, but now we're mostly working on make Typst mor...

Isn't it more like... You're working on making the rest of typst parallel?

sturdy sequoia Dec 18, 2023, 11:41 AM

#

feral imp Isn't it more like... You're working on making the rest of typst parallel?

Mostly laurenz

#

I made comemo parallel, but he's making the engine parallel

violet axle Dec 19, 2023, 1:37 PM

#

rigid latch 🙂 Naive question. What are you optimizing? Typst is freakingly fast for me.

I have a long document that I converted to typst, that takes 18secs for typst and 12secs in latex on my old laptop on a clean first build so there is still some optimization potential

lament fulcrum Dec 19, 2023, 1:42 PM

#

that seems like another good benchmarking candidate if you can share it or remove sensitive information

#

and maybe theres even some overuse of state and locate one can get rid of

lunar kettle Dec 19, 2023, 1:54 PM

#

violet axle I have a long document that I converted to typst, that takes 18secs for typst an...

If you could share it it would probably help a lot!

sly pecan Dec 19, 2023, 3:19 PM

#

violet axle I have a long document that I converted to typst, that takes 18secs for typst an...

18s in 0.10?

violet axle Dec 19, 2023, 3:27 PM

#

Yes, but it is much faster on my newer system. It was more of a relative comparison to the latex times.

#

For sharing I would need either a more private setting or some cleanup time since we have solutions included. (which get integrated depending on a boolean flag)

sly pecan Dec 19, 2023, 3:29 PM

#

violet axle I have a long document that I converted to typst, that takes 18secs for typst an...

When you say long document, do you mean one that is mostly text but thousands of pages?

violet axle Dec 19, 2023, 3:32 PM

#

No, there are also quite a few tables and graphics (some of which we converted from pdf to svg). The document is only ~100 pages long.

sly pecan Dec 19, 2023, 3:36 PM

#

18 seconds sounds too slow for that, unless this is on a potato

#

Anyway, latex can be quite fast on "simple" documents (i.e. no fancy packages etc). Probably faster than typst cold compile can hope to be (at least single threaded)

left night Dec 19, 2023, 3:40 PM

#

sly pecan Anyway, latex can be quite fast on "simple" documents (i.e. no fancy packages et...

why can't we hope to be faster?

#

because of more primitive text shaping in pdfLaTeX?

sly pecan Dec 19, 2023, 3:44 PM

#

left night why can't we hope to be faster?

I mean, can't is a strong word, but tex is very well optimized. It's only when you start introducing "complicated" things that it slows down to a crawl.

sly pecan Dec 19, 2023, 3:44 PM

#

left night because of more primitive text shaping in pdfLaTeX?

That too

#

And typst presumably introduces extra overhead in order to be able to incrementally compile

left night Dec 19, 2023, 3:45 PM

#

Yes, that's true, although I don't think it's that much

sly pecan Dec 19, 2023, 3:46 PM

#

I haven't tried with all the performance improvements, but pdftex was definitely faster on just pure text

#

Possibly even luatex, but I can't recall

#

Should try again some time

left night Dec 19, 2023, 3:54 PM

#

The text layout itself hasn't really been optimized yet in Typst

sturdy sequoia Dec 19, 2023, 5:01 PM

#

left night The text layout itself hasn't really been optimized yet in Typst

I mean some stuff like caching is_cjk etc. has helped a ton, but it's clear that there's room in there

hoary dew Dec 22, 2023, 5:36 PM

#

violet axle For sharing I would need either a more private setting or some cleanup time sinc...

https://github.com/frozolotl/typst-mutilate
Hope this could help

GitHub

GitHub - frozolotl/typst-mutilate: A tool to replace words in a typ...

A tool to replace words in a typst document with random garbage. - GitHub - frozolotl/typst-mutilate: A tool to replace words in a typst document with random garbage.

untold turret Dec 29, 2023, 12:07 PM

#

@sturdy sequoia when I upgrade comemo, I find this function is completely broken. 🫠 Now I want to have a thread-local comemo and a threaded comemo in the same program.

sturdy sequoia Dec 29, 2023, 12:09 PM

#

untold turret <@130737672951037952> when I upgrade comemo, I find this function is completely ...

I think it would be nice if we actually had that like #[comemo::memoize(local)] or something like that

#

Why does it break BTW?

untold turret Dec 29, 2023, 12:09 PM

#

the return type is not send

#

*It was mentioned before in this forge. But I didn't replace comemo and get errors on comemo::memoize macros.

#

btw, Is comemo::evict weird if there are multiple threads that run different incremental tasks?
In typst preview, there is an incremental compile thread, which evicts the comemo cache after each compilation, and there is an incremental export thread, which evicts the comemo cache after each render task.

sturdy sequoia Dec 29, 2023, 12:15 PM

#

untold turret btw, Is comemo::evict weird if there are multiple threads that run different inc...

comemo evict evicts all caches, everything will still work, but it won't benefit from caching or acceleration

sturdy sequoia Dec 29, 2023, 12:16 PM

#

untold turret the return type is not send

ah!

#

That's indeed an issue 😭

untold turret Dec 29, 2023, 12:16 PM

#

sturdy sequoia comemo evict evicts all caches, everything will still work, but it won't benefit...

There doesn't seem to be a suitable time to evict the cache for both computations.

sturdy sequoia Dec 29, 2023, 12:16 PM

#

untold turret the return type is not send

we really need a per-thread cache too then 💀

sturdy sequoia Dec 29, 2023, 12:17 PM

#

untold turret There doesn't seem to be a suitable time to evict the cache for both computations.

you should be able to evict both caches in their own time, the problem is that one will evict the current cache of the other thread

#

all of the Tracked<T> will still work, they'll just be slower

untold turret Dec 29, 2023, 12:17 PM

#

you are right, comemo is safe to evict at any time, but that looks a bit weird.

sturdy sequoia Dec 29, 2023, 12:18 PM

#

untold turret you are right, comemo is safe to evict at any time, but that looks a bit weird.

I admit it looks a bit weird, but I designed it (with Laurenz) specifically such that you can call it anytime and everything would still work

#

Essentially, since functions only memoize once they reach the output, evicting doesn't affect them.

#

and for accelerators in tracked structs, it just disables accelerations for outdated tracked

#

this is because it is a fairly dumb generational GC with a max generation of 1 for accelerators
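
To make the "evict at any time and everything still works" property concrete, here is a toy age-based cache. It is a simplified model written for this explanation, not comemo's implementation: `AgedCache` and its methods are hypothetical, and the rule assumed here is that an entry is dropped once its age reaches `max_age`, ages grow by one per eviction, and a cache hit resets the age to zero.

```rust
use std::collections::HashMap;

/// A toy memoization cache in which every entry carries an age.
struct AgedCache {
    /// key -> (memoized value, age in evictions since the last hit)
    entries: HashMap<u64, (u64, usize)>,
}

impl AgedCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the value and whether it was a cache hit. A hit resets the
    /// entry's age; a miss computes the value and stores it with age 0.
    fn get_or_compute(&mut self, key: u64, compute: impl FnOnce(u64) -> u64) -> (u64, bool) {
        if let Some((value, age)) = self.entries.get_mut(&key) {
            *age = 0;
            return (*value, true);
        }
        let value = compute(key);
        self.entries.insert(key, (value, 0));
        (value, false)
    }

    /// Drops every entry whose age is at least `max_age`, then ages the rest.
    fn evict(&mut self, max_age: usize) {
        self.entries.retain(|_, (_, age)| {
            let keep = *age < max_age;
            *age += 1;
            keep
        });
    }
}
```

Eviction only ever turns future hits into misses, so a caller that evicts at an awkward moment pays with recomputation, never with a wrong result.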

untold turret Dec 29, 2023, 12:19 PM

#

So how can I go it on 💀.

sturdy sequoia Dec 29, 2023, 12:19 PM

#

I wonder if we could have a #[comemo::memoize(local)] too, it shouldn't be too hard

#

I'll see if I can do that 😉

untold turret Dec 29, 2023, 12:20 PM

#

So we will have an evict and an evict_local?

sturdy sequoia Dec 29, 2023, 12:36 PM

#

untold turret So we will have an evict and an evict_local?

yes that's how I am designing it

untold turret Dec 29, 2023, 12:38 PM

#

@sturdy sequoia If we borrow design ideas from previous work to make threaded comemo really match Rust's thread model, I think tokio's runtime handler design is worth checking. They store lightweight metadata in a thread local. Then a thread function can easily create an isolated async handler, enter that handler, or exit that handler. The handler is also sendable among threads, so threads can be grouped into different async isolates.

#

Similarly, comemo can have a Handler to create. And we group threads and caches by handlers.

sturdy sequoia Dec 29, 2023, 12:45 PM

#

@untold turret

sturdy sequoia Dec 29, 2023, 12:46 PM

#

untold turret Similarly, comemo can have a `Handler` to create. And we group threads and cache...

that could be an idea, but it feels overly complicated for right now 😐

glossy shore Dec 29, 2023, 12:47 PM

#

sturdy sequoia <@432835220593704981>

Oh that's a bit yucky.

untold turret Dec 29, 2023, 12:48 PM

#

🐱 having local is great enough. In preview scenario, A single compiler uses evict and the rest of the render tasks use evict_local.

glossy shore Dec 29, 2023, 12:48 PM

#

Can't you just &'static mut ()

sturdy sequoia Dec 29, 2023, 12:48 PM

#

The only problem @untold turret is that you cannot pass Tracked into locally memoized functions 😐

untold turret Dec 29, 2023, 12:48 PM

#

But lsp may have multiple compilers.

sturdy sequoia Dec 29, 2023, 12:49 PM

#

untold turret 🐱 having local is great enough. In preview scenario, A single compiler uses `ev...

evict and evict_local are completely separate

#

so they won't ever interfere
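
A rough sketch of why the two cannot interfere: the shared cache and the per-thread cache are simply different storage. This is an illustrative model, not how comemo stores its caches; `SHARED`, `LOCAL`, `square_shared`, `square_local`, and `evict_shared` are hypothetical names, and only `evict_local` borrows its name from the discussion.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Cache shared by all threads (requires `Send` values).
static SHARED: Mutex<BTreeMap<u32, u32>> = Mutex::new(BTreeMap::new());

thread_local! {
    /// Cache owned by the current thread only (values need not be `Send`).
    static LOCAL: RefCell<BTreeMap<u32, u32>> = RefCell::new(BTreeMap::new());
}

/// Memoized through the shared cache; the flag reports a cache hit.
fn square_shared(x: u32) -> (u32, bool) {
    let mut cache = SHARED.lock().expect("shared cache poisoned");
    if let Some(&hit) = cache.get(&x) {
        return (hit, true);
    }
    let value = x * x;
    cache.insert(x, value);
    (value, false)
}

/// Memoized through the thread-local cache; the flag reports a cache hit.
fn square_local(x: u32) -> (u32, bool) {
    LOCAL.with(|cache| {
        let mut cache = cache.borrow_mut();
        if let Some(&hit) = cache.get(&x) {
            return (hit, true);
        }
        let value = x * x;
        cache.insert(x, value);
        (value, false)
    })
}

/// Clears only the shared cache.
fn evict_shared() {
    SHARED.lock().expect("shared cache poisoned").clear();
}

/// Clears only the calling thread's cache.
fn evict_local() {
    LOCAL.with(|cache| cache.borrow_mut().clear());
}
```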

untold turret Dec 29, 2023, 12:49 PM

#

sturdy sequoia The only problem <@432835220593704981> is that you cannot pass `Tracked` into lo...

get it.

sturdy sequoia Dec 29, 2023, 12:49 PM

#

glossy shore Oh that's a bit yucky.

that's because I know raw pointers are never Send afaik

sturdy sequoia Dec 29, 2023, 12:49 PM

#

untold turret get it.

is that a problem?

untold turret Dec 29, 2023, 12:49 PM

#

Not a problem till now.

glossy shore Dec 29, 2023, 12:50 PM

#

Oh right mut pointers are send if their payload is sync, which () is probably

#

let me check that

untold turret Dec 29, 2023, 12:50 PM

#

glossy shore Oh right mut pointers are send if their payload is sync, which `()` is probably

I think they are never send.

sturdy sequoia Dec 29, 2023, 12:50 PM

#

mut references can be send, mut pointers never are

glossy shore Dec 29, 2023, 12:50 PM

#

yup seems so

#

I guess because mut pointers are Copy

#

Sad that we can't just

struct NonSend;
impl !Send for NonSend {}

in stable yet.

glad urchin Dec 29, 2023, 12:52 PM

#

yeah you need to use phantom data with pointer or w/e
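
On stable Rust, the usual workaround looks roughly like the following: a zero-sized marker containing `PhantomData<*mut ()>`, which is neither `Send` nor `Sync` because raw pointers are not. The names `NonSend` and `require_send` are made up for this sketch.

```rust
use std::marker::PhantomData;

/// Zero-sized marker type. Raw pointers are neither `Send` nor `Sync`, and
/// `PhantomData<T>` inherits those auto traits from `T`, so this type is
/// neither as well, without needing `impl !Send` from nightly.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct NonSend(PhantomData<*mut ()>);

/// Compiles only for types that are `Send`.
fn require_send<T: Send>(_value: &T) {}

/// Shows the marker in use; the commented-out line is the interesting part.
fn marker_demo() -> NonSend {
    let marker = NonSend::default();
    require_send(&0u32); // fine: `u32` is `Send`
    // require_send(&marker); // would not compile: `NonSend` is not `Send`
    marker
}
```

The marker costs nothing at runtime, since `PhantomData` has size zero.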

glossy shore Dec 29, 2023, 12:55 PM

#

I guess you could &() as &dyn Any

#

Or you'd probably need like Eq or something right?

sturdy sequoia Dec 29, 2023, 1:01 PM

#

@untold turret I did find a way of passing Tracked inside of a local memoized function!

glossy shore Dec 29, 2023, 1:02 PM

#

Well it might look something like this

#[test]
fn test_unsend() {
    trait OpaqueNonSend: PartialEq<()> + Debug {}
    impl OpaqueNonSend for () {}
    #[comemo::memoize(local)]
    fn add(_a: u32, _b: u32) -> &'static dyn OpaqueNonSend {
        &()
    }

    test!(miss: add(1, 2), &());
    test!(hit: add(1, 2), &());
    test!(miss: add(2, 3), &());
    test!(hit: add(2, 3), &());
}

I don't know if this is really better with all the trait mess

sturdy sequoia Dec 29, 2023, 1:02 PM

#

glossy shore Well it might look something like this ```rs #[test] fn test_unsend() { trai...

I mean it's just a test for CI, at the end of the day it doesn't matter too much :-p

glossy shore Dec 29, 2023, 1:03 PM

#

Yeah I know

#

Again I'm not so sure if this makes things any easier to think about

untold turret Dec 29, 2023, 1:05 PM

#

sturdy sequoia <@432835220593704981> I did find a way of passing `Tracked` inside of a local me...

how do you do that?

sturdy sequoia Dec 29, 2023, 1:07 PM

#

untold turret how do you do that?

    #[comemo::memoize(local)]
    fn dump<'local>(mut sink: TrackedMut<'local, Emitter>) {
        sink.emit("a");
        sink.emit("b");
        let c = sink.len_or_ten().to_string();
        sink.emit(&c);
    }

#

You must manually annotate the lifetimes specifically with the name 'local!

#

Internally it substitutes the 'local for 'static

#

it's a bit ugly but it works 😐

glossy shore Dec 29, 2023, 1:08 PM

#

len_or_ten?

sturdy sequoia Dec 29, 2023, 1:08 PM

#

glossy shore len_or_ten?

it's a thing from the tests, never mind that

glossy shore Dec 29, 2023, 1:08 PM

#

oh ok

sturdy sequoia Dec 29, 2023, 1:09 PM

#

@untold turret https://github.com/typst/comemo/pull/6

GitHub

Local memoization by Dherse · Pull Request #6 · typst/comemo

Also allows local memoization of function results with #[comemo::memoize(local)] and a special evict_local function. This is specifically for interfacing with typst-preview.
The only quirks is that...

untold turret Dec 29, 2023, 1:10 PM

#

sturdy sequoia You must manually annotate the lifetimes specifically with the name `'local`!

You mean we cannot elide a '_ lifetime?

sturdy sequoia Dec 29, 2023, 1:10 PM

#

untold turret You mean we cannot elide a `'_` lifetime?

no :/

#

internally it does the following:

#

type __ARGS<'local>  =  <::comemo::internal::Args<(TrackedMut<'local,Emitter> ,)> as ::comemo::internal::Input>::Constraint;

#

It's a limitation of how thread_local in std resolves lifetimes, it's weird because it should work but it doesn't...

#

Probably running into a weird edge case in rustc

untold turret Dec 29, 2023, 1:13 PM

#

thread local is undoubtly a edge case. We have... many crates to loose our coding live.

sturdy sequoia Dec 29, 2023, 1:14 PM

#

the thing is that in the non-local version, we don't need to annotate lifetime and it still resolves just fine

#

but in the local case, where it goes through the thread_local macro we do

#

so it's really dumb

sturdy sequoia Dec 30, 2023, 5:39 PM

#

@glad urchin in tablex, there is this function:

// Measure a length in pt by drawing a line and using the measure() function.
// This function will work for negative lengths as well.
//
// Note that for ratios, the measurement will be 0pt due to limitations of
// the "draw and measure" technique (wrapping the line in a box still returns 0pt;
// not sure if there is any viable way to measure a ratio). This also affects
// relative lengths — this function will only be able to measure the length component.
//
// styles: from style()
#let measure-pt(len, styles) = {
    let measured-pt = measure(box(width: len), styles).width

    // If the measured length is positive, `len` must have overall been positive.
    // There's nothing else to be done, so return the measured length.
    if measured-pt > 0pt {
        return measured-pt
    }

    // If we've reached this point, the previously measured length must have been `0pt`
    // (drawing a line with a negative length will draw nothing, so measuring it will return `0pt`).
    // Hence, `len` must either be `0pt` or negative.
    // We multiply `len` by -1 to get a positive length, draw a line and measure it, then negate
    // the measured length. This nicely handles the `0pt` case as well.
    measured-pt = -measure(box(width: -len), styles).width
    return measured-pt
}

#

Could that not just be using the calc.sign and calc.abs to measure only once? I'm asking because the measure calls really add up

#

Maybe that would be better handled if we had a len.resolve(styles) method, that way you could obtain an absolute size from a relative one (i.e ems)

#

@left night Do you think we could have this: length.resolve(styles) it would also take the styles argument (just like measure) but work directly with a length as a much cheaper alternative to relying on measure here:

    /// Resolve this length to an absolute length.
    #[func]
    pub fn resolve(
        &self,
        /// The styles with which to measure the length.
        styles: Styles,
    ) -> Length {
        let styles = StyleChain::new(&styles);
        Length { abs: self.abs + self.em.resolve(styles), em: Em::zero() }
    }

#

It's a dead simple function too
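
Stripped of Typst's types, the arithmetic is just "absolute part plus em part times the font size". A self-contained sketch under that assumption follows; the `Length` struct here is a simplified stand-in with plain `f64` fields, and the font size is passed in directly instead of being read from a style chain.

```rust
/// Simplified stand-in for a length with an absolute and a font-relative part.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Length {
    /// Absolute component in points.
    abs_pt: f64,
    /// Font-relative component in em units.
    em: f64,
}

impl Length {
    /// Folds the em component into the absolute one for a given font size,
    /// leaving a purely absolute length. The sign is preserved, so negative
    /// lengths need no special handling.
    fn resolve(self, font_size_pt: f64) -> Length {
        Length {
            abs_pt: self.abs_pt + self.em * font_size_pt,
            em: 0.0,
        }
    }
}
```

Unlike the measure-based approach, this does no layout work at all, and a negative input resolves in a single step.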

sly pecan Dec 30, 2023, 5:48 PM

#

You could do measure(stack(dir:ttb, box(width: len), box(width: -len))).width

#

That would give the absolute value of the length I believe

#

But I haven't actually tried...

#

@sturdy sequoia

sturdy sequoia Dec 30, 2023, 5:51 PM

#

sly pecan You could do `measure(stack(dir:ttb, box(width: len), box(width: -len))).width`

That's a pretty good idea!

#

Still I think having a resolve function makes a lot of sense @sly pecan

sly pecan Dec 30, 2023, 6:14 PM

#

sturdy sequoia Still I think having a resolve function makes a lot of sense <@39926906538819584...

Sure

sturdy sequoia Dec 30, 2023, 6:51 PM

#

sly pecan Sure

Although I guess with the whole context thing it might get easier

left night Dec 30, 2023, 6:56 PM

#

sturdy sequoia <@311948531835469827> Do you think we could have this: `length.resolve(styles)` ...

It was proposed before. Maybe there's no harm in adding it now. I just don't want to add more and more things to the old system before doing contexts.

#

But since it will all be breaking anyway, there's probably not much harm in it.

glad urchin Dec 30, 2023, 6:59 PM

#

sturdy sequoia Could that not just be using the `calc.sign` and `calc.abs` to measure only once...

Well, in the PR that added this, I wondered if this could have been a problem for performance, but also we didn’t really know how to handle it

#

Also the second measure technically only happens with 0pt measurements anyway

#

Well, <= 0pt
So it should be relatively rare

#

But still there’s no way to convert em to pt otherwise atm

#

It could be made a bit more efficient by just checking em and doing sign and stuff yeah, but I’d have to restrict that to Typst 0.11.0 since that’s what we can currently “detect” inside tablex code

#

Either way we can discuss this more later but the main problems are the technical limitation of using measure and version compatibility between different approaches

sturdy sequoia Dec 30, 2023, 7:04 PM

#

glad urchin Either way we can discuss this more later but the main problems are the technica...

makes sense yeah

glad urchin Dec 30, 2023, 11:53 PM

#

sturdy sequoia <@311948531835469827> Do you think we could have this: `length.resolve(styles)` ...

btw #1175895972090499112 message

sturdy sequoia Dec 31, 2023, 12:06 AM

#

glad urchin btw https://discord.com/channels/1054443721975922748/1175895972090499112/1177407...

thonk2

glad urchin Dec 31, 2023, 12:50 PM

#

sturdy sequoia Could that not just be using the `calc.sign` and `calc.abs` to measure only once...

calc.sign doesn't exist

#

;-;

sturdy sequoia Dec 31, 2023, 12:50 PM

#

glad urchin But still there’s no way to convert em to pt otherwise atm

Waaaaaaaaaaaa

glad urchin Dec 31, 2023, 1:08 PM

#

sturdy sequoia Waaaaaaaaaaaa

i think you replied to the wrong message lol

sturdy sequoia Dec 31, 2023, 1:08 PM

#

glad urchin i think you replied to the wrong message lol

I did

glad urchin Dec 31, 2023, 1:08 PM

#

anyway np i made my own sign function

#

it's glorious

#

(from stackoverflow of course :D)

feral imp Dec 31, 2023, 1:24 PM

#

glad urchin (from stackoverflow of course :D)

Based

glad urchin Dec 31, 2023, 1:47 PM

#

@left night do you have any opinions on creating a built-in calc.sign? 👀

left night Dec 31, 2023, 1:48 PM

#

No opinion

glad urchin Dec 31, 2023, 1:48 PM

#

oh

left night Dec 31, 2023, 1:48 PM

#

I'm not a huge fan of calc overall though

#

I wish it were methods on a number type

#

That can also be called as num.func

glad urchin Dec 31, 2023, 1:48 PM

#

honestly, i have thought the same thing multiple times lol

#

but ig we'd still have to discuss that and consider all viewpoints and talk to the community and laziness is superior

#

😂

#

maybe such a larger-scale change can come with the type rework in the future

untold turret Dec 31, 2023, 1:57 PM

#

having both (int/float).sign sounds great

left night Dec 31, 2023, 2:06 PM

#

I'm not sure whether this would make sense with distinct int and float types though

glad urchin Dec 31, 2023, 2:06 PM

#

well, we'd probably have to implement the same methods on both somehow

#

but personally I'm in favor of keeping distinct types, using int at least you have no chance of getting precision errors or stuff like that. seems fair enough to me

#

not to mention that there are indeed situations where the distinction is important (e.g. some-array.at(5.5) would error)

#

but thats a bit off-topic for this thread so we can probably discuss this further elsewhere if needed

#

:p

left night Dec 31, 2023, 2:10 PM

#

glad urchin not to mention that there are indeed situations where the distinction is importa...

well, it could error still

left night Dec 31, 2023, 2:11 PM

#

glad urchin well, we'd probably have to implement the same methods on both somehow

yeah, that of course. but if you want to write a generic function you'd need to pick one (or use method notation which is sometimes ugly)

#

and e.g. float.sqrt would mostly work, but float.min would yield a different type

#

maybe it's not a big problem

glad urchin Dec 31, 2023, 2:22 PM

#

well yea, float.min would have to return some enum

#

i guess in the end there is some value to calc for those multi-type things. cuz I mean, .min could also make sense for length types for example

#

so it's a bit hard to decide on the best way to proceed here
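
To make the int/float trade-off above concrete, here is a minimal Rust sketch, assuming a hypothetical `Num` enum with separate integer and float variants. It is not Typst's implementation of `calc` or of any number type; it only shows why a type-preserving `sign` is easy to write while a mixed-type `min` has to return the enum rather than one fixed type.

```rust
/// Hypothetical numeric value with distinct integer and float variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Num {
    Int(i64),
    Float(f64),
}

impl Num {
    /// Sign that keeps the input's variant: an int stays an int, a float stays a float.
    pub fn sign(self) -> Num {
        match self {
            Num::Int(i) => Num::Int(i.signum()),
            // `f64::signum` maps 0.0 to 1.0, so zero (and NaN) are passed through instead.
            Num::Float(f) if f > 0.0 => Num::Float(1.0),
            Num::Float(f) if f < 0.0 => Num::Float(-1.0),
            Num::Float(f) => Num::Float(f),
        }
    }

    /// Minimum of two numbers. With mixed inputs the result can be either variant,
    /// which is why the return type is the enum and not `i64` or `f64`.
    pub fn min(self, other: Num) -> Num {
        if other.as_f64() < self.as_f64() {
            other
        } else {
            self
        }
    }

    /// Lossy conversion used only for comparing values across variants.
    fn as_f64(self) -> f64 {
        match self {
            Num::Int(i) => i as f64,
            Num::Float(f) => f,
        }
    }
}
```

With this sketch, `Num::Int(3).min(Num::Float(2.5))` yields `Num::Float(2.5)`, so a caller that started with an int ends up with a float, which is the typing issue raised above.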

glad urchin Dec 31, 2023, 2:24 PM

#

sturdy sequoia Could that not just be using the `calc.sign` and `calc.abs` to measure only once...

also, btw : https://github.com/PgBiel/typst-tablex/pull/106 👍

GitHub

Use fields from Typst 0.7.0 when possible by PgBiel · Pull Request ...

Should help make calculations accurate, and even improve performance (we don't have to measure twice to find the value of negative em)
Tasks:

Use fields to convert lengths to pt
Use fields t...

feral imp Dec 31, 2023, 3:14 PM

#

@sturdy sequoia https://blocklisted.github.io/blog/arc_str_vs_string_is_it_really_faster/

Arc vs String, is Arc really faster?

Analysis of Arc vs String cloning performance in contentended scenarios.

sturdy sequoia Jan 1, 2024, 7:53 PM

#

glad urchin it's glorious

https://github.com/typst/typst/pull/3117 https://github.com/typst/typst/pull/3118 there you go

GitHub

Added `resolve` method to length. by Dherse · Pull Request #3117 · ...

Adds the .resolve(styles) method to length to allow retrieving a length as absolute (without the em part) when in the right context.
This changes fixes packages like tablex (https://github.com/PgBi...

GitHub

Added `calc.sign` by Dherse · Pull Request #3118 · typst/typst

Closes #3113
I went for the first approach suggested @PgBiel of specializing calc.sign to work differently whether it's a float or an integer. This can of course be trivially changed.

glad urchin Jan 1, 2024, 8:58 PM

#

sturdy sequoia https://github.com/typst/typst/pull/3117 https://github.com/typst/typst/pull/311...

Very epic

#

Thanks 👌

#

I think length.resolve(styles) would probably just become length.resolve() or similar with the context proposal so that's cool

sturdy sequoia Jan 1, 2024, 8:59 PM

#

glad urchin I think length.resolve(styles) would probably just become length.resolve() or si...

yes, I think the change would be relatively small once we have contexts

untold turret Jan 4, 2024, 10:55 PM

#

@sturdy sequoia when I upgrade to new comemo, the performance degrades a lot on small documents.

// baseline (before)
https://github.com/Myriad-Dreamin/typst.ts/commit/71fccd6814be2a1daeea01f23b2567c9fbf5a484
typst_ts_bench_lowering  fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ lower_cached          293.8 µs      │ 1.493 ms      │ 339.6 µs      │ 395.9 µs      │ 100     │ 100
├─ lower_incr            10.73 ms      │ 17.19 ms      │ 11.66 ms      │ 11.89 ms      │ 100     │ 100
├─ lower_the_thesis      128.6 ms      │ 280.2 ms      │ 143.8 ms      │ 152 ms        │ 100     │ 100
╰─ lower_uncached        1.097 ms      │ 2.142 ms      │ 1.214 ms      │ 1.263 ms      │ 100     │ 100

// with threaded comemo (after)
https://github.com/Myriad-Dreamin/typst.ts/commit/52ae23bd8003c73650956148c7a12a29f32f9fc9
typst_ts_bench_lowering  fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ lower_cached          326.9 µs      │ 3.153 ms      │ 419.3 µs      │ 460.9 µs      │ 100     │ 100
├─ lower_incr            23.93 ms      │ 33.19 ms      │ 25.98 ms      │ 26.56 ms      │ 100     │ 100
├─ lower_the_thesis      127.7 ms      │ 149 ms        │ 134.9 ms      │ 135.6 ms      │ 100     │ 100
╰─ lower_uncached        9.389 ms      │ 18.36 ms      │ 10.07 ms      │ 10.48 ms      │ 100     │ 100

Of 93 seconds of bench time, 60 seconds are spent on mutexes/atomics.
😨 The top 10 most time-consuming functions are all mutex-related.

#

But I'm tired of editing thousands of lines of code broken by the latest typst, so I may take a rest and come back later to explore how to rescue the performance.

sturdy sequoia Jan 4, 2024, 11:03 PM

#

untold turret But I'm tired of editing thousands of lines of code broken in latest typst, I m...

what platform are you running on?

untold turret Jan 4, 2024, 11:03 PM

#

I'm on a Windows laptop (amd64)

sturdy sequoia Jan 4, 2024, 11:03 PM

#

Some degradation was to be expected, but that much contention isn't normal 🤔

#

There just isn't anything multithreaded enough

#

what export target?

untold turret Jan 4, 2024, 11:04 PM

#

sturdy sequoia what export target?

My svg export tho

sturdy sequoia Jan 4, 2024, 11:04 PM

#

because only PNG and SVG should ever cause contention

#

makes sense

untold turret Jan 4, 2024, 11:08 PM

#

I have tried adding rayon to parallelize it at the page level or per frame. The results from VTune show that it doesn't help

#

But I should write a cache manually to avoid using threaded comemo for my export with the latest typst, to check which parts of the code cause the performance degradation in the end

quaint blaze Jan 12, 2024, 10:36 AM

#

What's going on with performance stuff rn?

sturdy sequoia Jan 12, 2024, 12:38 PM

#

quaint blaze What's going on with performance stuff rn?

nothing 💀

#

I have been on a bit of a break

quaint blaze Jan 12, 2024, 12:39 PM

#

Ah

sturdy sequoia Jan 13, 2024, 2:31 PM

#

Is it crazy that changing a SINGLE .unwrap() call to an unwrap_unchecked call saves 6% of the total execution time of typst????

glad urchin Jan 13, 2024, 2:33 PM

#

sturdy sequoia Is it crazy that changing a **SINGLE** `.unwrap()` call to an `unwrap_unchecked`...

Daring today, aren't we

#

😂

untold turret Jan 13, 2024, 2:33 PM

#

which check did you skip?

sly pecan Jan 13, 2024, 2:33 PM

#

sturdy sequoia Is it crazy that changing a **SINGLE** `.unwrap()` call to an `unwrap_unchecked`...

Then you're bypassing safety aren't you?

#

I don't know rust, but unchecked sounds dangerous!

glad urchin Jan 13, 2024, 2:34 PM

#

Me when the Typst produces UB and deletes my entire hard disk

#

Noooooo

sturdy sequoia Jan 13, 2024, 2:34 PM

#

Well to be fair, we check the condition right before calling unwrap

lament fulcrum Jan 13, 2024, 2:34 PM

#

means it's some path where the compiler can't statically analyze or optimize the unwrap i suppose

sturdy sequoia Jan 13, 2024, 2:34 PM

#

fn make_mut(&mut self) -> &mut dyn NativeElement {
    let arc = &mut self.0;
    if Arc::strong_count(arc) > 1 || Arc::weak_count(arc) > 0 {
        *arc = arc.dyn_clone();
    }

    unsafe {
        // Safety: We ensured the content is not shared.
        Arc::get_mut(arc).unwrap_unchecked()
    }
}

sly pecan Jan 13, 2024, 2:35 PM

#

sturdy sequoia Well to be fair, we check the condition right before calling `unwrap`

what happens if the condition fails?

sturdy sequoia Jan 13, 2024, 2:35 PM

#

sly pecan what happens if the condition fails?

we clone the arc meaning we create a known-unique arc

#

so no matter what, we know that we can Arc::get_mut safely 🤔
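
For comparison, here is a minimal safe-Rust sketch of the same clone-if-shared pattern. It assumes a concrete `T: Clone` rather than Typst's `dyn NativeElement`, and `make_unique` is a hypothetical helper name. It lets `Arc::get_mut` do the uniqueness check instead of reading the counts by hand (the standard library's `Arc::make_mut` offers this behaviour for sized `Clone` types).

```rust
use std::sync::Arc;

/// Returns a mutable reference to the value, cloning it first if the `Arc` is shared.
/// Hypothetical helper over a concrete `Clone` type, not Typst's `make_mut`.
pub fn make_unique<T: Clone>(arc: &mut Arc<T>) -> &mut T {
    // `get_mut` returns `None` if any other strong or weak reference exists.
    // Calling `is_none()` ends the mutable borrow before `*arc` is reassigned.
    if Arc::get_mut(arc).is_none() {
        let fresh = Arc::new((**arc).clone());
        *arc = fresh;
    }
    // The `Arc` was either unique already or has just been replaced by a fresh one,
    // so this `expect` cannot fail. It is the checked counterpart of `unwrap_unchecked`.
    Arc::get_mut(arc).expect("arc was just made unique")
}
```

The second `get_mut` repeats the atomic check that the first one already did, which is the redundant work the `unwrap_unchecked` variant skips.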

sly pecan Jan 13, 2024, 2:37 PM

#

dumb question: can there be some race condition where it changes from another thread between the check and the unwrap?

sturdy sequoia Jan 13, 2024, 2:38 PM

#

sly pecan dumb question: can there be some race condition where it changes from another th...

probably not? 🤔

#

I mean I don't want to introduce unsafe

#

I just thought it was funny

sly pecan Jan 13, 2024, 2:41 PM

#

further dumb question: can you unwrap without the check beforehand, and instead possibly handle the error after instead?

#

if unwrap does the check anyway

sturdy sequoia Jan 13, 2024, 2:41 PM

#

sly pecan further dumb question: can you unwrap without the check beforehand, and instead ...

no there's some borrow checker limitations 😦

sly pecan Jan 13, 2024, 2:42 PM

#

shouldn't the compiler be smart enough to optimize away the check if it's possible?

sturdy sequoia Jan 13, 2024, 2:43 PM

#

sly pecan shouldn't the compiler be smart enough to optimize away the check if it's possib...

since there are atomic operations involved, it's unlikely

#

I don't know, I'm just struggling to find new avenues for optimization 😦

sly pecan Jan 13, 2024, 2:54 PM

#

Maybe it's mostly fast enough? 🙂

sturdy sequoia Jan 13, 2024, 3:04 PM

#

sly pecan Maybe it's mostly fast enough? 🙂

https://tenor.com/view/pokemon-no-mad-angry-gif-5881426


left night Jan 13, 2024, 3:05 PM

#

sly pecan dumb question: can there be some race condition where it changes from another th...

for another thread to do stuff, it needs to have a ref to the arc itself. in that case the strong count would be > 1. so after the first part of the if condition, it's clear that we are the only one with a strong ref. then we check for weak refs (a weak ref can't introduce a strong one) and there are also none. I'd have to ponder a bit more to be really sure with the weak refs, but generally this kind of check is safe.

#

and for 6% perf across the board, it would honestly be worth it. for 6% in some specific doc, maybe not.

sturdy sequoia Jan 13, 2024, 3:09 PM

#

left night for another thread to do stuff, it needs to have a ref to the arc itself. in tha...

Actually no, because it could also upgrade a weak reference

#

but it could happen that a weak reference gets upgraded between the time you checked the strong ref count and the weak

#

that's why Arc::is_unique does some magic stuff to prevent this scenario, unfortunately it's a private method
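
The weak-reference concern can be seen in a small, single-threaded Rust sketch. `probe_with_weak` is a hypothetical helper name; it only uses public `std::sync` APIs and does not reproduce the race itself, just the facts behind it: a strong count of 1 does not imply uniqueness while a `Weak` exists, because that `Weak` can be upgraded.

```rust
use std::sync::{Arc, Weak};

/// Returns (strong count, weak count, whether `get_mut` succeeded,
/// strong count after upgrading the weak reference).
pub fn probe_with_weak() -> (usize, usize, bool, usize) {
    let mut strong = Arc::new(5_i32);
    let weak: Weak<i32> = Arc::downgrade(&strong);

    let strong_before = Arc::strong_count(&strong);
    let weak_before = Arc::weak_count(&strong);

    // Even with a strong count of 1, `get_mut` refuses: `weak` could still be upgraded.
    let unique = Arc::get_mut(&mut strong).is_some();

    // Upgrading produces a second strong reference to the same allocation.
    let upgraded = weak.upgrade();
    let strong_after = Arc::strong_count(&strong);
    drop(upgraded);

    (strong_before, weak_before, unique, strong_after)
}
```

In a multithreaded setting the `upgrade` call could run on another thread between a manual `strong_count` check and a `weak_count` check, which is the window described above.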

left night Jan 13, 2024, 3:10 PM

#

ah okay, yeah as said with weak I'm not 100% sure how it works

#

but with just strong, the reasoning works

sturdy sequoia Jan 13, 2024, 3:10 PM

#

left night and for 6% perf across the board, it would honestly be worth it. for 6% in some ...

I found it because I wanted to see what the hit rate of Content::get_mut was, and whether it needed to clone all of the time. It clones about 35% of the time (on my thesis) and 55% of the time in incremental

#

Because I believe that cloning (and the many atomic operations that come with it) is likely still a bottleneck

left night Jan 13, 2024, 3:11 PM

#

It clones so much because it sets the guard field

sturdy sequoia Jan 13, 2024, 3:11 PM

#

left night It clones so much because it sets the guard field

yes

left night Jan 13, 2024, 3:11 PM

#

I am (right now) working on extracting the guard, label, location etc. from the individual elements into a generic Packed<T>

#

This is necessary to be able to merge Content and Value

sturdy sequoia Jan 13, 2024, 3:11 PM

#

Which is why I was thinking that if we could extract the guards it could work out really well

sturdy sequoia Jan 13, 2024, 3:12 PM

#

left night This is necessary to be able to merge Content and Value

Why tho? 🤔

left night Jan 13, 2024, 3:12 PM

#

Because e.g. a Color that ends up in the doc also needs to be able to have guards, labels, locations

sturdy sequoia Jan 13, 2024, 3:12 PM

#

Ideally I think Content should generally be immutable with the exception of synthesized fields

left night Jan 13, 2024, 3:12 PM

#

And an int

#

show float etc

sturdy sequoia Jan 13, 2024, 3:12 PM

#

ah right

left night Jan 13, 2024, 3:13 PM

#

I'm making good progress on this and will likely open a PR early next week

tight glade Jan 13, 2024, 3:13 PM

#

That's exciting 🔥

sturdy sequoia Jan 13, 2024, 3:13 PM

#

left night I'm making good progress on this and will likely open a PR early next week

Wow nice 🔥

#

I wonder how it will improve perfs 🤔

#

Or degrade them angryeyes

left night Jan 13, 2024, 3:14 PM

#

For now, I've kept everything in a single Arc (basically Content(Arc<Inner<dyn NativeElement>>) where Inner holds the stuff). I just want to focus on extracting it and keeping all tests green. Then, once Value and Content are merged, I want to optimize the representation a bit.

#

E.g. an int shouldn't be stored as Arc<dyn NativeType>. That's horrible.

sturdy sequoia Jan 13, 2024, 3:15 PM

#

Ok, I think that's a good first step

left night Jan 13, 2024, 3:15 PM

#

(The built-in enum will go out the window.)

tight glade Jan 13, 2024, 3:16 PM

#

Not the built-in enum! I rely on it for typstfmt! /Jk

left night Jan 13, 2024, 3:16 PM

#

And I want to make Packed<T> coercible into Shared<T> (just a shared ref without the metadata). Then we can use Shared<Gradient> in the output and don't have to deal with Arcs internally for every type to keep it small and cheap to clone.

#

Shared might just be Arc. Not sure. Perhaps also a new nice type that saves a lazily initialized hash alongside the ref count. That'd be neat.
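
One way to picture the "lazily initialized hash alongside the ref count" idea is the following Rust sketch. It is an assumption-laden approximation: `LazyHashed` and `SharedHashed` are hypothetical names, and it simply puts a `OnceLock<u64>` next to the value inside a standard `Arc` instead of hand-rolling the pointer type.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, OnceLock};

/// Hypothetical payload wrapper: the hash is computed on first use and then cached.
pub struct LazyHashed<T> {
    hash: OnceLock<u64>,
    value: T,
}

impl<T: Hash> LazyHashed<T> {
    pub fn new(value: T) -> Self {
        Self { hash: OnceLock::new(), value }
    }

    /// Hashes `value` at most once; later calls return the cached result.
    pub fn hash64(&self) -> u64 {
        *self.hash.get_or_init(|| {
            let mut hasher = DefaultHasher::new();
            self.value.hash(&mut hasher);
            hasher.finish()
        })
    }

    /// Whether the hash has already been computed.
    pub fn is_hashed(&self) -> bool {
        self.hash.get().is_some()
    }

    pub fn value(&self) -> &T {
        &self.value
    }
}

/// Stand-in for the shared type: all clones see the same value and the same cached hash.
pub type SharedHashed<T> = Arc<LazyHashed<T>>;
```

Because the cache lives behind the pointer, computing the hash through one clone makes it available to every other clone without rehashing.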

left night Jan 13, 2024, 3:17 PM

#

tight glade Not the built-in enum! I rely on it for typstfmt! /Jk

The Value enum?

tight glade Jan 13, 2024, 3:18 PM

#

/joke

sturdy sequoia Jan 13, 2024, 3:18 PM

#

left night Shared might just be Arc. Not sure. Perhaps also a new nice type that saves a la...

Maybe there is a way where this could use a cheaper non-weak supporting Arc?

#

like the one from triomphe?

left night Jan 13, 2024, 3:19 PM

#

sturdy sequoia Maybe there is a way where this could use a cheaper non-weak supporting `Arc`?

If it would store a hash, it would need to be hand-rolled anyway.

#

Probably at least

#

I would probably also write it in a way where the Arc pointer points directly to the data and the header is in front of it. That way Deref is a no-op.

#

EcoVec does that
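
The "pointer to the data, header in front of it" layout can be sketched in Rust roughly as follows. This is a simplified illustration under stated assumptions, not EcoVec's or Typst's code: `HeaderPtr` and `Block` are hypothetical names, the header holds only a reference count, and no `Send`/`Sync` impls are provided.

```rust
use std::alloc::Layout;
use std::ops::Deref;
use std::ptr::{addr_of_mut, NonNull};
use std::sync::atomic::{fence, AtomicUsize, Ordering};

/// Heap block with the header (ref count) in front of the payload.
/// `repr(C)` guarantees this field order.
#[repr(C)]
struct Block<T> {
    refs: AtomicUsize,
    data: T,
}

/// Hypothetical pointer type that stores the address of `data`, not of the block.
pub struct HeaderPtr<T> {
    data: NonNull<T>,
}

impl<T> HeaderPtr<T> {
    pub fn new(value: T) -> Self {
        let block = Box::into_raw(Box::new(Block { refs: AtomicUsize::new(1), data: value }));
        // SAFETY: `block` comes from `Box::into_raw`, so it is non-null and valid.
        let data = unsafe { NonNull::new_unchecked(addr_of_mut!((*block).data)) };
        Self { data }
    }

    /// Byte offset of `data` within `Block<T>`, following the `repr(C)` layout rules.
    fn data_offset() -> usize {
        let (_, offset) = Layout::new::<AtomicUsize>()
            .extend(Layout::new::<T>())
            .expect("block layout overflows");
        offset
    }

    /// Steps back from the payload to the block that contains it.
    fn block(&self) -> *mut Block<T> {
        // SAFETY: `data` always points at the `data` field of a live `Block<T>`.
        unsafe { self.data.as_ptr().cast::<u8>().sub(Self::data_offset()).cast() }
    }

    pub fn ref_count(&self) -> usize {
        // SAFETY: the block is alive as long as `self` exists.
        let block = unsafe { &*self.block() };
        block.refs.load(Ordering::Acquire)
    }
}

impl<T> Deref for HeaderPtr<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // No offset arithmetic: the stored pointer already addresses the payload.
        unsafe { self.data.as_ref() }
    }
}

impl<T> Clone for HeaderPtr<T> {
    fn clone(&self) -> Self {
        // SAFETY: the block is alive as long as `self` exists.
        let block = unsafe { &*self.block() };
        block.refs.fetch_add(1, Ordering::Relaxed);
        Self { data: self.data }
    }
}

impl<T> Drop for HeaderPtr<T> {
    fn drop(&mut self) {
        // SAFETY: the block is alive until the last reference is dropped.
        let block = unsafe { &*self.block() };
        if block.refs.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            // SAFETY: we held the last reference, so we may free the block.
            drop(unsafe { Box::from_raw(self.block()) });
        }
    }
}
```

The design choice is that the common operation (`Deref`) pays nothing, while the rarer ones (clone, drop, reading the count) pay one pointer subtraction to reach the header.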

lunar kettle Jan 13, 2024, 4:00 PM

#

left night `show float` etc

oh damn then you could do stuff like configuring how floats are formatted?

left night Jan 13, 2024, 4:00 PM

#

lunar kettle oh damn then you could do stuff like configuring how floats are formatted?

yes, after content and value are merged

lunar kettle Jan 13, 2024, 4:00 PM

#

thats awesome 😄

#

looking forward to that!

left night Jan 13, 2024, 4:00 PM

#

also big win for datetime

#

because language aware

lunar kettle Jan 13, 2024, 4:01 PM

#

once there are get rules you could then do something like get the current language and format the float depending on that right?

left night Jan 13, 2024, 4:01 PM

#

yes

lunar kettle Jan 13, 2024, 4:01 PM

#

awesome 😄

left night Jan 13, 2024, 4:01 PM

#

as long as you don't format the float into a string eagerly

#

it basically needs to stay a float and end up in the content tree

lunar kettle Jan 13, 2024, 4:05 PM

#

hmm how would you format it without turning it into a string though?

left night Jan 13, 2024, 4:37 PM

#

lunar kettle hmm how would you format it without turning it into a string though?

In the float show rule you can turn it into a string. But if you do [#str(5.0)] then that string conversion can't be language aware and the float show rule will never run.

lunar kettle Jan 13, 2024, 4:38 PM

#

ahh gotcha!

crystal girder Jan 15, 2024, 4:39 AM

#

absolutely, yes

sturdy sequoia Jan 15, 2024, 4:46 PM

#

@left night I've done some preliminary testing on your PR and here are some notes

#

While compiles (cold & hot) are pretty unaffected (~2% slower),

#

Separate arcs for the metadata and dyn trait does give some gains, especially in incremental

#

Using a box is worse

#

inlining the value all into the Value is actually not nearly as bad as one might expect, being even faster in some incremental tests!

#

Overall, I like the change, and I think it opens the door to better performance and API (Value = Content rework) in the future

#

What I am mostly curious about is whether we can use this to make Content<T>, where T defaults to dyn Bounds, so that a lot of elements, instead of having a Content, could have a typed one (e.g. a Content<RawLine> for a RawLine), which would avoid needing to reallocate.

#

I also think we could try to have the dyn Bounds be Prehashed to gain a bit of performance and remove all other prehashed

left night Jan 15, 2024, 5:24 PM

#

sturdy sequoia - inlining the value all into the `Value` is actually not nearly as bad as one m...

Could you clarify what you mean with that?

left night Jan 15, 2024, 5:25 PM

#

sturdy sequoia What I am mostly curious is if we can use this to make `Content<T>` where `T` de...

This one I also don't really understand. Isn't that more or less Packed<T>?

sturdy sequoia Jan 15, 2024, 5:25 PM

#

left night This one I also don't really understand. Isn't that more or less `Packed<T>`?

no my idea is to keep the Arc but specialize it (which you can do without re-allocating it) to avoid doing cloning in a few elements

sturdy sequoia Jan 15, 2024, 5:26 PM

#

left night Could you clarify what you mean with that?

like having one bigger struct where all of the elements are in the stack and only the dyn bounds is in an Arc

left night Jan 15, 2024, 5:26 PM

#

sturdy sequoia like having one bigger struct where all of the elements are in the stack and onl...

ah, basically making Value huge?

sturdy sequoia Jan 15, 2024, 5:26 PM

#

left night ah, basically making `Value` huge?

yep

left night Jan 15, 2024, 5:26 PM

#

sturdy sequoia no my idea is to keep the `Arc` but specialize it (which you can do without re-a...

but Packed<T> also doesn't reallocate?

sturdy sequoia Jan 15, 2024, 5:27 PM

#

left night but `Packed<T>` also doesn't reallocate?

OOOOOOHHHHHHHHHHHHHHHHHHHH

#

Right

#

you're right

left night Jan 15, 2024, 5:28 PM

#

sturdy sequoia OOOOOOHHHHHHHHHHHHHHHHHHHH

It is basically exactly what you describe, just layout-compatible with Content

#

Which is nice because we can reinterpret a &Content as a &Packed<T>

#

I guess Content<T> might also have worked instead of Packed<T> with #[repr(transparent)] but what I like is that it doesn't depend on any Rust type system coercion foo.

#

It leaves flexibility for how things are implemented.
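
As a rough illustration of reinterpreting a `&Content` as a `&Packed<T>`, here is a Rust sketch built on `std::any::Any`. Everything in it is a simplified assumption: the real `Content` stores elements behind its own trait rather than `dyn Any`, and `to_packed` and `RawLine { number }` are hypothetical names used only for the example.

```rust
use std::any::Any;
use std::marker::PhantomData;
use std::ops::Deref;
use std::sync::Arc;

/// Simplified, hypothetical stand-in for a type-erased `Content`.
pub struct Content {
    inner: Arc<dyn Any + Send + Sync>,
}

/// Typed view of a `Content`. `repr(transparent)` makes it layout-compatible
/// with `Content`, since `PhantomData<T>` is zero-sized.
#[repr(transparent)]
pub struct Packed<T> {
    content: Content,
    marker: PhantomData<T>,
}

/// Hypothetical element type used for the example.
pub struct RawLine {
    pub number: u32,
}

impl Content {
    pub fn new<T: Any + Send + Sync>(elem: T) -> Self {
        Self { inner: Arc::new(elem) }
    }

    /// Views this content as a `Packed<T>` without cloning or reallocating.
    pub fn to_packed<T: Any>(&self) -> Option<&Packed<T>> {
        if self.inner.is::<T>() {
            // SAFETY: `Packed<T>` is `repr(transparent)` over `Content`, so the two
            // references share a layout, and the element type was just checked.
            Some(unsafe { &*(self as *const Content as *const Packed<T>) })
        } else {
            None
        }
    }
}

impl<T: Any> Deref for Packed<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.content
            .inner
            .downcast_ref::<T>()
            .expect("element type was checked in `to_packed`")
    }
}
```

Here `Content::new(RawLine { number: 7 }).to_packed::<RawLine>()` gives typed access to the same allocation, while asking for a different type returns `None`.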

sturdy sequoia Jan 15, 2024, 5:30 PM

#

Yes you're right

left night Jan 15, 2024, 5:31 PM

#

I have a half-finished blog post with a bunch of ideas on how to implement an efficient Value encoding that also supports user-defined types. Unfortunately, it is half finished :(
