Performance | Typst | Page 5

atomic violet Feb 5, 2024, 3:11 PM

#

I removed the Prehashed<...> thing from it and it's like 3 times faster now lmao

#

@sturdy sequoia wake up I found the hasher

#

so we are like 5 times slower than python now 🤔

sturdy sequoia Feb 5, 2024, 3:32 PM

#

atomic violet I removed the `Prehashed<...>` thing from it and it's like 3 times faster now lm...

WHICH ONE

#

😮

atomic violet Feb 5, 2024, 3:33 PM

#

sturdy sequoia WHICH ONE

here

#

in Method

sturdy sequoia Feb 5, 2024, 3:34 PM

#

oh no

#

😭

#

I am so dumb

#

Well done ❤️

atomic violet Feb 5, 2024, 3:34 PM

#

You did better ❤️

lunar kettle Feb 5, 2024, 3:37 PM

#

is it faster than main now? 👀

atomic violet Feb 5, 2024, 3:37 PM

#

It is!

lunar kettle Feb 5, 2024, 3:37 PM

#

niice

#

how much?

atomic violet Feb 5, 2024, 3:38 PM

#

like 1.5x maybe

#

something like that

lunar kettle Feb 5, 2024, 3:38 PM

#

i mean that's something!

sturdy sequoia Feb 5, 2024, 3:38 PM

#

Not great not terrible

lunar kettle Feb 5, 2024, 3:38 PM

#

baby steps 😄

atomic violet Feb 5, 2024, 3:43 PM

#

callgrind now reports that the slowest function in the project is Value::hash?? Somehow??...

#

I don't think I trust it though

#

it's not a very smart function 🤔

feral imp Feb 5, 2024, 4:24 PM

#

lunar kettle baby steps 😄

https://tenor.com/view/putting-in-the-work-what-gif-21519782


sturdy sequoia Feb 5, 2024, 4:27 PM

#

atomic violet it's not a very smart function 🤔

I know right

sturdy sequoia Feb 5, 2024, 4:27 PM

#

atomic violet callgrind now reports that the slowest function in the project is Value::hash??...

yes, there is still some hashing going on

#

somewhere

#

||over the rainbow||

atomic violet Feb 5, 2024, 4:28 PM

#

I don't see it

#

all the callers of hash are call_closure

#

(which is expected)

sturdy sequoia Feb 5, 2024, 4:28 PM

#

yeah but closures are wayyyy too slow

#

I need to improve that

atomic violet Feb 5, 2024, 4:29 PM

#

enter_scope is also pretty slow

#

like 1/4 of what closure is

sturdy sequoia Feb 5, 2024, 4:29 PM

#

atomic violet `enter_scope` is also pretty slow

which is weird 'cause it does... nothing

#

except some std::mem::swap

atomic violet Feb 5, 2024, 4:29 PM

#

Well.. yeah, it does swaps

sturdy sequoia Feb 5, 2024, 4:30 PM

#

they should be cheap, no?

#

unless it's copying ungodly amounts of ram

#

(did somebody say "wrap it in a box"?)
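
To make the swap question concrete, here is a minimal sketch. `BigState` is a made-up stand-in for whatever large struct gets swapped (it is not the real `Joiner` or scope state), sized at 1 KiB purely for illustration. It shows why "wrap it in a box" can help: `std::mem::swap` on the inline struct has to move the whole payload, while swapping two `Box`es only exchanges pointers.

```rust
use std::mem;

/// Hypothetical stand-in for a large inline state struct (1 KiB).
pub struct BigState {
    pub slots: [u64; 128],
}

/// Swapping two inline values moves `size_of::<BigState>()` bytes each way.
pub fn swap_inline(a: &mut BigState, b: &mut BigState) {
    mem::swap(a, b);
}

/// Swapping two boxes only exchanges the pointers; the payloads stay put.
pub fn swap_boxed(a: &mut Box<BigState>, b: &mut Box<BigState>) {
    mem::swap(a, b);
}
```

Whether this is a net win depends on how often the struct is created, since boxing adds an allocation.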

atomic violet Feb 5, 2024, 4:30 PM

#

let me check the code, idk 🤔

sturdy sequoia Feb 5, 2024, 4:30 PM

#

On my machine it's now about twice as fast as main 🎉

#

(mind you I have a more optimized version than you, I'll push gimme a minute)

#

There you go, I just pushed 😉

atomic violet Feb 5, 2024, 4:31 PM

#

the Joiner is not that small 🤔

sturdy sequoia Feb 5, 2024, 4:32 PM

#

True, it's surprisingly large

#

but like that's just a single memcpy, right?

#

RIGHT?

atomic violet Feb 5, 2024, 4:34 PM

#

it should be, I think

#

I want to disassemble it but I am scared of going through inlined garbage

#

does every iteration of the loop require enter_scope?

sturdy sequoia Feb 5, 2024, 4:35 PM

#

atomic violet does every iteration of the loop require `enter_scope`?

no, just once to pass the iterator

atomic violet Feb 5, 2024, 4:35 PM

#

ok I see

sturdy sequoia Feb 5, 2024, 4:35 PM

#

I have my meeting with the Ayar Labs folk in 25 minutes

#

I am scared

#

uwu but sad

atomic violet Feb 5, 2024, 4:36 PM

#

sad? why?

sturdy sequoia Feb 5, 2024, 4:37 PM

#

well more like terrified

#

so

#

😱

atomic violet Feb 5, 2024, 4:37 PM

#

you got this

sturdy sequoia Feb 5, 2024, 4:39 PM

#

Thanks bro ❤️

feral imp Feb 5, 2024, 4:39 PM

#

the more meetings, the less demand is going to be on you fam.. Just breathe and be yourself.

atomic violet Feb 5, 2024, 4:39 PM

#

yeah

#

if you can write a typst vm, you 100% can uhm...

#

give photons a little push to turn a little? idk

sturdy sequoia Feb 5, 2024, 4:42 PM

#

My thesis now compiles 12% faster!

#

We're getting somewhere

atomic violet Feb 5, 2024, 4:45 PM

#

atomic violet I want to disassemble it but I am scared of going through inlined garbage

it allocates about a kilobyte of stack space, and then takes its sweet time copying what I assume is joiner with ~20 SIMD instructions

#

so seemingly nothing special about it

#

... and then run is inlined so it makes no sense afterwards 😭

#

An interesting thing to do may be to memoize the hash of a value right in that same value, possibly with interior mutability, because it seems to me that the same objects get hashed over and over again. In my call graph it goes up to 7 levels of hashing deep, and I have no idea how that is even possible. I don't have anything 7 levels deep in my raytracer, I think 🤔
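
A rough sketch of that idea, under stated assumptions: `LazyHashed` is a hypothetical wrapper (not an existing Typst or comemo type) that stores the hash next to the value and fills it in on first use via interior mutability. A `Cell` is used for brevity, which makes the type non-`Sync`; a thread-safe variant would need an atomic or a lock instead.

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper: caches the hash of `value` the first time it is requested.
pub struct LazyHashed<T> {
    value: T,
    cached: Cell<Option<u64>>,
}

impl<T: Hash> LazyHashed<T> {
    pub fn new(value: T) -> Self {
        Self { value, cached: Cell::new(None) }
    }

    /// Returns the cached hash, computing it on first use only.
    pub fn hash64(&self) -> u64 {
        if let Some(h) = self.cached.get() {
            return h;
        }
        let mut hasher = DefaultHasher::new();
        self.value.hash(&mut hasher);
        let h = hasher.finish();
        self.cached.set(Some(h));
        h
    }

    /// Mutable access invalidates the cache, since the value may change.
    pub fn get_mut(&mut self) -> &mut T {
        self.cached.set(None);
        &mut self.value
    }
}

impl<T: Hash> Hash for LazyHashed<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed only the 8-byte digest upward instead of re-walking nested data.
        state.write_u64(self.hash64());
    }
}
```

With nested containers wrapped this way, hashing an outer value would touch each inner value's contents at most once between mutations.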

#

about 1/6 of all memory reads is spent hashing level 2 objects actually

#

this is a weird call graph though 🤔

sturdy sequoia Feb 5, 2024, 4:57 PM

#

atomic violet this is a weird call graph though 🤔

yes, I think that comemo::memoize for small functions is kinda bad

atomic violet Feb 5, 2024, 4:58 PM

#

if big functions accept deeply nested objects and hash them over and over again, then it's just as bad

sturdy sequoia Feb 5, 2024, 4:58 PM

#

atomic violet if big functions accept deeply nested objects and hash them over and over again,...

yes, I think we should only memoize bigger functions

#

because otherwise hashing might end up being most of the cost of execution when computing the function itself is cheaper

atomic violet Feb 5, 2024, 4:59 PM

#

well... yeah, I suppose this would be a start

sturdy sequoia Feb 5, 2024, 4:59 PM

#

it's fairly easy to do now since we can just put a threshold on the number of instructions
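
A minimal sketch of such a threshold, assuming a compiled function knows its instruction count. `Compiled`, `Cache`, and the cutoff value `MEMOIZE_THRESHOLD` are all hypothetical names chosen for illustration; the real cutoff would have to come from measurements.

```rust
use std::collections::HashMap;

/// Hypothetical cutoff: bodies shorter than this are cheaper to re-run than to hash.
const MEMOIZE_THRESHOLD: usize = 32;

/// Hypothetical compiled function: its instruction count and a pure body.
pub struct Compiled {
    pub instruction_count: usize,
    pub body: fn(u64) -> u64,
}

#[derive(Default)]
pub struct Cache {
    // Keyed by (function id, argument); a real cache would key on hashed arguments.
    entries: HashMap<(usize, u64), u64>,
    pub hits: usize,
}

impl Cache {
    pub fn call(&mut self, id: usize, func: &Compiled, arg: u64) -> u64 {
        if func.instruction_count < MEMOIZE_THRESHOLD {
            // Small function: skip hashing and lookup, just evaluate.
            return (func.body)(arg);
        }
        if let Some(&out) = self.entries.get(&(id, arg)) {
            self.hits += 1;
            return out;
        }
        let out = (func.body)(arg);
        self.entries.insert((id, arg), out);
        out
    }
}
```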

atomic violet Feb 5, 2024, 4:59 PM

#

you can do some simple complexity analysis

#

ACTUALLY

#

It's quite powerful

#

because you can introduce "move semantics"

#

or something like that

#

because small functions may not require clone on write

#

especially when they are not memoized

#

so maybe it's possible to not copy objects => to not bump reference counter => to make writes in them cheap because they no longer require clone
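
The chain "no copy, no reference count bump, cheap write" can be seen with plain `std::rc::Rc`, used here only as a stand-in for the reference-counted arrays being discussed. `push_cow` is a hypothetical helper: it writes through `Rc::make_mut`, which clones the storage only when the value is shared.

```rust
use std::rc::Rc;

/// Appends to a reference-counted array, cloning the storage only if it is shared.
/// Returns `true` when the write happened in place (no clone).
pub fn push_cow(array: &mut Rc<Vec<i64>>, item: i64) -> bool {
    let before = Rc::as_ptr(array);
    // `Rc::make_mut` clones the inner Vec only when another strong reference exists.
    Rc::make_mut(array).push(item);
    before == Rc::as_ptr(array)
}
```

If a small function receives its argument by move, the count stays at one and the write lands in the original allocation.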

atomic violet Feb 5, 2024, 5:03 PM

#

atomic violet it allocates about a kilobyte of stack space, and then takes its sweet time copy...

I added inline(never) to run and now it does not use SIMD for some reason

#

I love compilers

left night Feb 5, 2024, 5:13 PM

#

atomic violet An interesting thing to do may be to memoize hash of value right in that same va...

I wanted to try and do something in that direction while merging content and value. Since everything needs to change then, it's a good opportunity to introduce some lazy prehashing.

#

Right now, with Value being an enum, it might be tricky

atomic violet Feb 5, 2024, 5:13 PM

#

Yeah I saw that

#

I wonder whether it's possible to gain something by just hashing, say, dicts and arrays

#

because if I understand correctly: 1. They are used very actively; and 2. They do require recursive hashing, and it is pretty slow

atomic violet Feb 5, 2024, 5:33 PM

#

ok no sorry I won't do that right now - turns out it's not that simple 😅

tight glade Feb 5, 2024, 8:48 PM

#

There was a release mode? (Jkjk)

tight glade Feb 5, 2024, 8:53 PM

#

atomic violet here

Where's the hashing? 😅

sturdy sequoia Feb 5, 2024, 9:01 PM

#

tight glade There was a release mode? (Jkjk)

They had us bamboozled the whole time!

#

Ok, so, with main it runs 155e9 instructions while with the VM it runs 76e9 instructions

#

quite an improvement @atomic violet

#

Oof

#

it literally OOM'ed 😦

#

I just wanted higher resolution uwu

#

oh no my WSL just OOM'ed

#

like, the entire thing

#

@atomic violet that's the highest resolution I could get 😐

tight glade Feb 5, 2024, 9:29 PM

#

sturdy sequoia Ok, so, with main it runs 155e9 instructions while with the VM it runs 76e9 inst...

Are there instructions in main? 🤔🤔

sturdy sequoia Feb 5, 2024, 9:29 PM

#

tight glade Are there instructions in main? 🤔🤔

like real CPU instructions :-p

#

you 🥔

#

❤️

feral imp Feb 5, 2024, 9:31 PM

#

So... It is twice as fast?
Also, typesetting is what we are here for, but this might be the ideal thing to build a vm. I still don't really get it.

Thesis still at 14%?

#

(new phone)

tight glade Feb 5, 2024, 9:32 PM

#

sturdy sequoia like real CPU instructions :-p

Oh yea should have looked at the numbers 😂

sturdy sequoia Feb 5, 2024, 9:32 PM

#

Hey, I made some improvements, it's 10% faster than it was five minutes ago (relative to that)

sturdy sequoia Feb 5, 2024, 9:32 PM

#

feral imp So... It is twice as fast? Also, typesetting is what we are here for, but this m...

lemme test

#

it's actually only 10% faster since main is a bit faster than before (some changes by @left night)

#

but I mean it's 10% in my thesis, in something like @atomic violet's raytracer it is twice as fast

#

it also uses a ton less ram

#

which is nice

feral imp Feb 5, 2024, 9:34 PM

#

But dude didn't you say eval was only 20 pct of your thesis anyways?

sturdy sequoia Feb 5, 2024, 9:35 PM

#

feral imp But dude didn't you say eval was only 20 pct of your thesis anyways?

I since updated the version of "main" I am comparing to

#

I was like 40-ish commits late 💀

feral imp Feb 5, 2024, 9:36 PM

#

Alright. Taking ram usage down a notch would be great too............

sturdy sequoia Feb 5, 2024, 9:44 PM

#

feral imp Alright. Taking ram usage down a notch would be great too............

it's about a 35% reduction on elteammate's doc

#

and I just gained another ~5% on the raytracer 😉

#

I mean it brings my thesis down to 5.5s compile time

#

that's pretty darn good if I do say so myself

#

down to just 55B instructions (real ones) from 155B on the latest main!

#

And I think there is still room

sturdy sequoia Feb 5, 2024, 10:05 PM

#

and on my thesis it goes from 120B instructions down to 115B

#

sly pecan Feb 5, 2024, 10:27 PM

#

sturdy sequoia I mean it brings my thesis down to 5.5s compile time

what about watch? or is this unaffected by this

sturdy sequoia Feb 5, 2024, 10:28 PM

#

sly pecan what about watch? or is this unaffected by this

Lemme try and get back to you, from what I can tell (I tested earlier) it's slightly improved too

#

Unfortunately I have a (vscode?) bug that prevents me from getting good measurements in incremental because it somehow saves the files twice

#

angryeyes

sturdy sequoia Feb 5, 2024, 11:13 PM

#

I lost some performance (for better code quality & clarity) but I regained it somewhere else sunglassed_crying

low sapphire Feb 6, 2024, 7:09 AM

#

sturdy sequoia I since updated the version of "main" I am comparing to

Race between main and vm science

cunning wadi Feb 6, 2024, 7:11 AM

#

sturdy sequoia Unfortunately I have a (vscode?) bug that prevents me from getting good measurem...

This is probably the same bug as https://github.com/typst/typst/issues/3312

GitHub

`error: file not found (searched at )` after 6999be · Issue #3312 ·...

Description After commit 6999be9, running typst watch <file.typ> <file.pdf> shows the error error: file not found (searched at <file.typ>) whenever the file is saved (with or with...

sturdy sequoia Feb 6, 2024, 10:24 AM

#

cunning wadi This is probably the same bug as https://github.com/typst/typst/issues/3312

Ah, interesting, well yes it's since I merged with main that I've had the issue 💀

left night Feb 6, 2024, 11:36 AM

#

cunning wadi This is probably the same bug as https://github.com/typst/typst/issues/3312

I'm pretty sure that this change is the cause: https://github.com/typst/typst/pull/2665/files#diff-ea95583ebe87e25ee323e9ad5d1a654f2fbf243e9f4d1727c49d7271585149bdL40-R46

sturdy sequoia Feb 6, 2024, 12:14 PM

#

left night I'm pretty sure that this change is the cause: <https://github.com/typst/typst/p...

How would you modify it such that it works again?

#

Because if so I don't mind fixing it locally and testing it

#

(since I can reproduce the issue)

#

I can confirm that it fixes it

#

I'll open a PR right now 😉

left night Feb 6, 2024, 12:28 PM

#

sturdy sequoia I'll open a PR right now 😉

Thanks for testing! I'll wait for @cunning wadi to say what the original idea behind the change was in case some change was necessary after all.

sturdy sequoia Feb 6, 2024, 1:31 PM

#

@left night Have you thought about adding a swap_remove method to EcoVec? My reasoning is that for arguments, we use a linear lookup into an array of Args and then we remove them, but that means that for each argument we eat, consume, etc. we do a full memcpy of the contents of the arguments. While that is not necessarily slow, it's wasted CPU instructions & memory bandwidth as opposed to a swap_remove!

#

I've actually done a ton of optimizations to other aspects of closures and this seems to be the main remaining bottleneck

left night Feb 6, 2024, 2:11 PM

#

sturdy sequoia @left night Have you thought about adding a `swap_remove` method to `E...

Wouldn't that mess with the order of positional arguments?
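
The ordering concern is easy to reproduce with a plain `Vec`, used here as a stand-in since `EcoVec` has no `swap_remove` in this discussion. `take_ordered` and `take_swapped` are hypothetical helpers that differ only in which removal they call.

```rust
/// Removes the first argument matching `name` while preserving the order of the rest.
/// Every later element is shifted down by one (O(n) per removal).
pub fn take_ordered(args: &mut Vec<&'static str>, name: &str) -> Option<&'static str> {
    let idx = args.iter().position(|a| *a == name)?;
    Some(args.remove(idx))
}

/// Same lookup, but fills the hole with the last element (O(1)), which reorders the tail.
pub fn take_swapped(args: &mut Vec<&'static str>, name: &str) -> Option<&'static str> {
    let idx = args.iter().position(|a| *a == name)?;
    Some(args.swap_remove(idx))
}
```

Starting from `["x", "fill", "y", "z"]`, removing `"fill"` with `remove` leaves `["x", "y", "z"]`, while `swap_remove` leaves `["x", "z", "y"]`, so positional arguments would be read in the wrong order.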

sturdy sequoia Feb 6, 2024, 2:12 PM

#

left night Wouldn't that mess with the order of positional arguments?

isn't their order stored in the actual Arg struct?

#

thinkinglare

#

Or am I stupid?

#

'cause then you're probably right

#

😭

#

but but memcpy 😐

#

Ok, the VM is on the verge of being 3x faster than main

#

💪

#

and the code is still readable

left night Feb 6, 2024, 2:13 PM

#

sturdy sequoia isn't their order stored in the actual `Arg` struct?

nope

sturdy sequoia Feb 6, 2024, 2:14 PM

#

left night nope

unfortunate I suppose

left night Feb 6, 2024, 2:14 PM

#

It would require an iterator style approach

#

or we could leave behind empty slots or something
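
One way the "empty slots" idea could look, as a sketch only: `SlotArgs` is a hypothetical type where a consumed argument becomes `None` instead of being removed, so nothing is shifted and positional order is kept. The cost is that every lookup has to skip the holes.

```rust
/// Hypothetical argument list where consumed entries become `None` instead of being removed.
pub struct SlotArgs {
    slots: Vec<Option<(Option<&'static str>, i64)>>,
}

impl SlotArgs {
    pub fn new(items: Vec<(Option<&'static str>, i64)>) -> Self {
        Self { slots: items.into_iter().map(Some).collect() }
    }

    /// Takes the named argument, leaving an empty slot behind (no shifting).
    pub fn named(&mut self, name: &str) -> Option<i64> {
        for slot in self.slots.iter_mut() {
            if matches!(&*slot, Some((Some(n), _)) if *n == name) {
                return slot.take().map(|(_, v)| v);
            }
        }
        None
    }

    /// Takes the next remaining positional argument, in original order.
    pub fn positional(&mut self) -> Option<i64> {
        for slot in self.slots.iter_mut() {
            if matches!(&*slot, Some((None, _))) {
                return slot.take().map(|(_, v)| v);
            }
        }
        None
    }
}
```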

sturdy sequoia Feb 6, 2024, 2:15 PM

#

left night or we could leave behind empty slots or something

that could work I suppose, but the overhead might be higher than the gains

#

I'll try that real quick

sturdy sequoia Feb 6, 2024, 2:51 PM

#

@atomic violet if you wanna try it I just pushed loads of optimizations 😎

sly pecan Feb 6, 2024, 4:56 PM

#

so what is the status of The Thesis™️ ?

#

@sturdy sequoia

sturdy sequoia Feb 6, 2024, 4:56 PM

#

sly pecan so what is the status of The Thesis™️ ?

it compiles 😎

#

Without the math equations *

sly pecan Feb 6, 2024, 4:57 PM

#

sturdy sequoia Without the math equations *

did you break it?

sturdy sequoia Feb 6, 2024, 4:57 PM

#

sly pecan did you break it?

no, math functions have special semantics I haven't replicated yet

#

the information is in the instruction, I just don't use it yet

sly pecan Feb 6, 2024, 4:57 PM

#

sturdy sequoia @atomic violet if you wanna try it I just pushed loads of optimizations ...

best optimization is just not rendering some stuff 👌

sturdy sequoia Feb 6, 2024, 4:58 PM

#

sly pecan best optimization is just not rendering some stuff 👌

I have literally three equations in my thesis 😂

#

Only two of them are disabled

#

three equations that use functions *

sly pecan Feb 6, 2024, 4:58 PM

#

so how much faster is it? 20%?

sturdy sequoia Feb 6, 2024, 4:58 PM

#

sly pecan so how much faster is it? 20%?

about 10% total 😐

#

Which isn't much

#

but I think it can be further improved by decreasing the cost of instantiating a VM

#

which is possible mind you

#

But, it uses 32% less RAM

#

which is amazing too imo

#

and well worth it

feral imp Feb 6, 2024, 5:00 PM

#

I think a thesis shouldn't consume 2 gigs in the first place... So honestly, it might be very good.......

sturdy sequoia Feb 6, 2024, 5:00 PM

#

feral imp I think a thesis shouldn't consume 2 gigs in the first place... So honestly, it ...

I think that with selective memoization based on function complexity, we might be able to reach even lower!

sly pecan Feb 6, 2024, 5:01 PM

#

2 GB for an entire thesis isn't that bad

feral imp Feb 6, 2024, 5:01 PM

#

sly pecan 2 GB for an entire thesis isn't that bad

If you've got no way of lowering that? Sure. But if you can get that down a notch, that may justify a vm by itself.

left night Feb 6, 2024, 5:16 PM

#

sturdy sequoia But, it uses 32% less RAM

do you know why it does that? I would expect comemo usage to be roughly the same and shouldn't the active evaluation be dwarfed by the comemo cache?

sturdy sequoia Feb 6, 2024, 9:29 PM

#

left night do you know why it does that? I would expect comemo usage to be roughly the same...

I know right

#

I have no idea why it does that

#

Weird, I retook my measurements and got different results from yesterday

#

It's still lower, but only by 10%

#

🤔

#

Which I guess makes more sense

#

It gets slightly better when factoring in incremental compilation, going to ~15% better after a few changes

#

so there is definitely something here

#

just less than I had measured

#

weird

#

Maybe I misread? 💀

#

Now I confirm, on @atomic violet's raytracer it's 32% better

#

so that number didn't come from my thesis but the ray tracer

#

@atomic violet that's awesome

low sapphire Feb 6, 2024, 9:55 PM

#

webgpu support when?

feral imp Feb 6, 2024, 9:55 PM

#

low sapphire webgpu support when?

DOOM port when.

sturdy sequoia Feb 6, 2024, 10:44 PM

#

feral imp DOOM port when.

to be fair, with the new VM, you know...

sturdy sequoia Feb 7, 2024, 10:27 AM

#

https://tenor.com/view/dj-khaled-another-one-gif-26093316


#

And by one I mean 10% on the raytracer

#

😎

#

I managed to remove sooooooo much cloning

tight glade Feb 7, 2024, 10:32 AM

#

Nice! ❤️

surreal hemlock Feb 7, 2024, 10:34 AM

#

sturdy sequoia I managed to remove sooooooo much cloning

Typst: the clone wars

sturdy sequoia Feb 7, 2024, 11:23 AM

#

surreal hemlock Typst: the clone wars

That's the name I was going to give to the commit 😂

#

Ok my thesis is now 12% faster (which doesn't seem like a lot, but it's nice anyway)

#

I have made tons of improvements to show and set rule handling

#

reducing cloning, complexity, etc.

#

@left night what do you think of making the argument names as PicoStr

#

since those are pretty much always very similar

#

it would make argument matching much cheaper overall

left night Feb 7, 2024, 11:41 AM

#

sturdy sequoia @left night what do you think of making the argument names as `PicoStr...

we can try

sturdy sequoia Feb 7, 2024, 12:53 PM

#

left night we can try

It's actually a fairly simple change, I modified the macros to directly build the PicoStr using a static lazy, and then I made a pico macro to do the same everywhere else (to avoid rehashing the keys every time)
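
For readers unfamiliar with the pattern, here is a small sketch of string interning behind a lazily initialized static. It is not Typst's `PicoStr` implementation: `Sym`, `Interner`, and `sym_fill` are hypothetical names, and the sketch only uses `std::sync::OnceLock` and `Mutex` from the standard library.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical interned-string handle: comparing or hashing it touches 4 bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Sym(u32);

#[derive(Default)]
struct Interner {
    ids: HashMap<&'static str, u32>,
    names: Vec<&'static str>,
}

/// Global table, created on first use because it cannot be built at compile time.
fn interner() -> &'static Mutex<Interner> {
    static TABLE: OnceLock<Mutex<Interner>> = OnceLock::new();
    TABLE.get_or_init(|| Mutex::new(Interner::default()))
}

impl Sym {
    /// Interns `name`, returning the existing handle if it was seen before.
    pub fn new(name: &'static str) -> Sym {
        let mut table = interner().lock().unwrap();
        if let Some(&id) = table.ids.get(name) {
            return Sym(id);
        }
        let id = table.names.len() as u32;
        table.names.push(name);
        table.ids.insert(name, id);
        Sym(id)
    }

    pub fn resolve(self) -> &'static str {
        interner().lock().unwrap().names[self.0 as usize]
    }
}

/// Per-call-site cache: the string is interned once, and later calls
/// only read the stored handle instead of hashing the key again.
pub fn sym_fill() -> Sym {
    static FILL: OnceLock<Sym> = OnceLock::new();
    *FILL.get_or_init(|| Sym::new("fill"))
}
```

A macro could generate functions like `sym_fill` for each literal, which is roughly the role a `pico`-style macro would play.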

left night Feb 7, 2024, 1:06 PM

#

sturdy sequoia It's actually a fairly simple change, I modified the macros to directly build th...

It is so sad that we need to build PicoStr in a lazy static instead of being able to do it at compile time :/

sturdy sequoia Feb 7, 2024, 1:07 PM

#

left night It is so sad that we need to build PicoStr in a lazy static instead of being abl...

ikr, I was thinking that maybe there would be a way to do that

#

but it would not be wasm compatible afaik 😦

left night Feb 7, 2024, 1:08 PM

#

You know what's funny: The wasm binary is in some respects optimized in ways the local one cannot be.

#

Specifically, when control flow depends on comparing addresses of statics, which happens when checking which kind of element some content is

sturdy sequoia Feb 7, 2024, 1:10 PM

#

I didn't know 🤔

#

Why tho?

left night Feb 7, 2024, 1:10 PM

#

Because the native binary is ASLR-compatible, so the addresses are resolved at startup time

#

Meanwhile, for WASM the Rust compiler does also only provide the addresses at link time (so post-optimization) BUT after that we run wasm-opt which can work with the constants.
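
The kind of check being described can be sketched as follows. All names here (`ElemData`, `HEADING`, `TEXT`, `Content`) are hypothetical and only illustrate "comparing addresses of statics"; the sketch says nothing about what wasm-opt actually does with it.

```rust
/// Hypothetical per-element metadata; one `static` instance exists per element kind.
pub struct ElemData {
    pub name: &'static str,
}

pub static HEADING: ElemData = ElemData { name: "heading" };
pub static TEXT: ElemData = ElemData { name: "text" };

/// Hypothetical content handle that points at its element's static metadata.
#[derive(Clone, Copy)]
pub struct Content {
    pub elem: &'static ElemData,
}

impl Content {
    /// Kind check by address: one pointer comparison, no string comparison.
    pub fn is(&self, kind: &'static ElemData) -> bool {
        std::ptr::eq(self.elem, kind)
    }
}
```

In a position-independent native binary the two addresses are only fixed at load time, so the comparison stays a runtime operation even though its operands never change.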

sturdy sequoia Feb 7, 2024, 1:11 PM

#

of course!

#

That's actually kind of awesome 😂

#

So named arguments as PicoStr, did it help? yes, but not as much as I was hoping 😦

#

I expected that it would make hashing Args cheap enough to help, but it seems that generally the values of the args are the expensive bit 😦

#

Is it worth it? I think so because it also re-uses existing systems we have in place for other, similar things, but I am disappointed that it doesn't do more 😦

#

It does improve memory usage by about 80MB on my thesis

#

so it's something 😂

left night Feb 7, 2024, 1:20 PM

#

seems like it's worth it

left night Feb 7, 2024, 1:20 PM

#

sturdy sequoia I expected that it would make hashing `Args` cheap enough to help, but it seems ...

I think a lazy hash in the future non-inline Value repr would go a long way

sturdy sequoia Feb 7, 2024, 1:30 PM

#

left night I think a lazy hash in the future non-inline Value repr would go a long way

I fully agree

#

I actually implemented something similar for closures: they have a lazy Value::Func(_) inside of them so that when they get called multiple times, it doesn't need to reallocate a Repr::Closure

#

BTW, using PicoStr on @atomic violet saved around 300M instructions out of 49B instructions (real CPU ones that is)

feral imp Feb 7, 2024, 1:45 PM

#

0.61224489795 %. Not bad.

#

sturdy sequoia Feb 7, 2024, 1:48 PM

#

feral imp

don't make fun of me µwµ

feral imp Feb 7, 2024, 1:48 PM

#

sturdy sequoia don't make fun of me µwµ

I'm not. When reporting metrics, it is better to represent the numbers as clearly as possible.

sturdy sequoia Feb 7, 2024, 1:48 PM

#

I just gained another 500M instructions 😎

feral imp Feb 7, 2024, 1:49 PM

#

I'd love time in between commits as well, then I can make a timeseries of the reduced instructions..

sly pecan Feb 7, 2024, 1:49 PM

#

It all adds up

#

500 million here, another 500 million there

#

and baby you've got a stew goin

sturdy sequoia Feb 7, 2024, 1:50 PM

#

Especially since the changes I'm doing are not very intrusive

#

I'm doing my best to keep the code readable and explain why I'm doing what I'm doing

left night Feb 7, 2024, 1:54 PM

#

sturdy sequoia I actually implemented something similar for closures: they have a lazy `Value::...

can you explain that more?

#

where would it allocate multiple times?

#

also: regarding performance improvements. If some of the ones you are doing are independent of the VM (e.g. PicoStr), it would be great if those could be done independently in PRs

sturdy sequoia Feb 7, 2024, 1:57 PM

#

Essentially, closures are handled in two steps. When they're compiled, they become a CompiledClosure; then they get instantiated (the idea here is to do as much of the work on the closure ahead of time as possible, and to do the capture "in situ", since we need the actual values to capture). After instantiation you have a Closure, which is ready to be called. That process is memoized, but since a named closure is declared within itself (to allow recursion), the result of that process is stored inside of the Closure using a OnceCell. The idea is that it lets me avoid redoing the Prehashed<...> needlessly on subsequent instantiations of the closure.
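
A stripped-down sketch of the "instantiate once, reuse afterwards" part, using `std::cell::OnceCell`. The types below are hypothetical and much simpler than the real ones: `Instance` stands in for the ready-to-call closure, and captures are plain integers looked up by name. It also assumes the captured values do not change between instantiations, which is what makes caching valid here.

```rust
use std::cell::OnceCell;
use std::rc::Rc;

/// Hypothetical ready-to-call closure produced by instantiation.
pub struct Instance {
    pub captured: Vec<i64>,
}

/// Hypothetical compiled closure that remembers its first instantiation.
pub struct CompiledClosure {
    pub capture_names: Vec<&'static str>,
    instance: OnceCell<Rc<Instance>>,
}

impl CompiledClosure {
    pub fn new(capture_names: Vec<&'static str>) -> Self {
        Self { capture_names, instance: OnceCell::new() }
    }

    /// Runs the capture step at most once; later calls only clone the `Rc`.
    pub fn instantiate(&self, lookup: impl Fn(&str) -> i64) -> Rc<Instance> {
        Rc::clone(self.instance.get_or_init(|| {
            let captured = self.capture_names.iter().map(|n| lookup(*n)).collect();
            Rc::new(Instance { captured })
        }))
    }
}
```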

sturdy sequoia Feb 7, 2024, 1:57 PM

#

left night also: regarding performance improvements. If some of the ones you are doing are ...

i'll try doing that this week then

feral imp Feb 7, 2024, 1:57 PM

#

left night also: regarding performance improvements. If some of the ones you are doing are ...

Depending on the number, it will be rebasing fest ~~/ bitch~~ then.

sturdy sequoia Feb 7, 2024, 2:39 PM

#

Another 2B instructions spared 😎

#

(real-world instructions, that is *)

sturdy sequoia Feb 7, 2024, 6:44 PM

#

@left night I did end up trying the dynamic size register thing (6, 16, 32, and then infinite using a vec) and on the thesis it's pretty awesome: we are now 15% faster!

#

So an extra 5% just from that change

#

!!!

#

And the size of the binary barely increased (about 200KB)

left night Feb 7, 2024, 6:47 PM

#

sturdy sequoia @left night I did end up trying the dynamic size register thing (6, 16...

wait, so is this with generics or something like smallvec?

sturdy sequoia Feb 7, 2024, 6:47 PM

#

left night wait, so is this with generics or something like smallvec?

generics

#

smallvec gave barely any improvements

left night Feb 7, 2024, 6:48 PM

#

I am confused why this would give a benefit

sturdy sequoia Feb 7, 2024, 6:48 PM

#

left night I am confused why this would give a benefit

no allocation for small/cheap closures is my guess

#

I'll profile it now

left night Feb 7, 2024, 6:48 PM

#

did you try limiting the amount of registers during compilation, but keeping vec storage?

sturdy sequoia Feb 7, 2024, 6:48 PM

#

left night did you try limiting the amount of registers during compilation, but keeping vec...

that's already the case, it aggressively re-uses registers as soon as it can with the exception of function arguments

#

for example the whole of tablex compiles with a max number of registers of ~40

left night Feb 7, 2024, 6:49 PM

#

I have this feeling that the gain can also be had without generics

#

not sure how though

sturdy sequoia Feb 7, 2024, 6:50 PM

#

left night I have this feeling that the gain can also be had without generics

maybe, but I don't know how

left night Feb 7, 2024, 6:51 PM

#

so does it pick between the VM sizes at runtime based on the number of registers used in compilation?

#

what if you use a slice pointing to an array on the stack instead of a smallvec?

sturdy sequoia Feb 7, 2024, 6:52 PM

#

So, yes, it looks how many registers it used and uses an array (or a vec if too big) based on that

#

it behaves like a slice during evaluation regardless whether it's an array or a vec

#

my idea is that small functions and closures that get called often will be more effective due to not allocating a Vec and using the stack instead

#

another theory is that it gets cached immediately in these small closures 🤔

left night Feb 7, 2024, 6:54 PM

#

left night what if you use a slice pointing to an array on the stack instead of a smallvec?

if that's the cause, then this should also help

sturdy sequoia Feb 7, 2024, 6:55 PM

#

left night if that's the cause, then this should also help

I was already no longer using a smallvec btw

#

because it gave no gains but made it use way more stack

left night Feb 7, 2024, 6:55 PM

#

Okay but still

#

using a slice would get rid of generics while keeping it non-allocating
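
Roughly what the slice approach could look like, as a sketch under assumptions: `run` is a hypothetical stand-in for the VM loop and registers are plain `i64`s. The size classes 6, 16, and 32 are the ones mentioned above. Only the small dispatch function knows about sizes; the loop itself sees a `&mut [i64]` and is compiled once.

```rust
/// Hypothetical VM loop: it only ever sees a slice, so it is not generic over the size.
fn run(registers: &mut [i64]) -> i64 {
    for (i, reg) in registers.iter_mut().enumerate() {
        *reg = i as i64 + 1;
    }
    registers.iter().sum()
}

/// Picks stack storage for small register counts and a `Vec` otherwise.
/// Returns the result plus whether a heap allocation was needed.
pub fn execute(register_count: usize) -> (i64, bool) {
    match register_count {
        0..=6 => (run(&mut [0i64; 6][..register_count]), false),
        7..=16 => (run(&mut [0i64; 16][..register_count]), false),
        17..=32 => (run(&mut [0i64; 32][..register_count]), false),
        _ => (run(&mut vec![0i64; register_count]), true),
    }
}
```

The stack arrays still cost stack space per call, so deep recursion with the largest class is the case to watch.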

sturdy sequoia Feb 7, 2024, 6:56 PM

#

left night using a slice would get rid of generics while keeping it non-allocating

ah I see what you mean!

#

Oh yes, that's even better, it's the same "advantage" but doesn't use generics

#

nice

#

advantage * not complexity

#

my brain is fried 💀

sly pecan Feb 7, 2024, 7:12 PM

#

sturdy sequoia @left night I did end up trying the dynamic size register thing (6, 16...

To play devil's advocate, it would be good to test on more than one document in case there are regressions for other things 😉

#

(sounds great though!)

sturdy sequoia Feb 7, 2024, 7:47 PM

#

sly pecan To play devil's advocate, it would be good to test on more than one document in ...

Oh you're absolutely right, I also use @atomic violet's raytracer, definitely the most representative document out there ||/s||

atomic violet Feb 7, 2024, 7:49 PM

#

it's representative of 70% of my documents, so as a typical typst user, I agree

surreal hemlock Feb 7, 2024, 7:56 PM

#

@sturdy sequoia I can send you some notes I made for a course (around 35 pages), if you want. It is mostly text and math equations with a few cetz drawings. Could be another interesting testing document. Last time I checked, cold compilation was around 1 sec

sturdy sequoia Feb 7, 2024, 7:56 PM

#

surreal hemlock @sturdy sequoia I can send you some notes I made for a course (around 35 p...

sure, please feel free to

#

I will need to fix math then 😂

sly pecan Feb 7, 2024, 7:57 PM

#

sturdy sequoia sure, please feel free too

Did I send you a polylux presentation at one point?

#

I think I did

surreal hemlock Feb 7, 2024, 8:10 PM

#

I send them here in case anyone else wants to use them

📎 refs.bib 📎 notes.typ

sturdy sequoia Feb 7, 2024, 9:25 PM

#

surreal hemlock I send them here in case anyone else wants to use them

Every topological manifold has a countable basis of precompact coordinate balls.

#

lol

#

coordinate balls

#

lmao

#

😂

sly pecan Feb 7, 2024, 9:32 PM

#

https://encyclopediaofmath.org/wiki/Wiener_measure

Wiener measure

sturdy sequoia Feb 7, 2024, 9:37 PM

#

sly pecan https://encyclopediaofmath.org/wiki/Wiener_measure

😂

#

Wiener measure

#

lol

sturdy sequoia Feb 7, 2024, 10:59 PM

#

Hey, I was reading that @untold turret 😂

#

Do you mean that I could use a thread-local pools of "pre-used" vecs?

#

Because I cannot use only one vec forever 🤔

#

(since I can call functions recursively)

untold turret Feb 7, 2024, 11:00 PM

#

sturdy sequoia Oh yes, that's even better, it's the same "advantage" but doesn't use generics

After replacing the generic const-sized register array in the VM, you may use a thread-local register pool to reduce allocations further. Another thing is to reduce the size of the VMState struct: we have so many slices referencing module.inner when constructing VMState. We can reduce the size by replacing the module.inner.* fields with a single module.inner or module reference.

sturdy sequoia Feb 7, 2024, 11:01 PM

#

untold turret After replacing the generic const-sized register array in the VM, you may use a thread-l...

that's true, I could harmonize Inner to be the same for functions and closures, it would make it smaller quite nicely imo

#

functions and modules *

sturdy sequoia Feb 7, 2024, 11:02 PM

#

untold turret After replacing the generic const-sized register array in the VM, you may use a thread-l...

I'll do that right now 😎

untold turret Feb 7, 2024, 11:02 PM

#

🐱then vmstate will be cheap to allocate I think.

untold turret Feb 7, 2024, 11:17 PM

#

sturdy sequoia Do you mean that I could use a thread-local pools of "pre-used" vecs?

Yes, I see you use vec! to allocate them. You could give it a try, because I got better performance in my C++ executor by reusing large objects in a struct.

untold turret Feb 7, 2024, 11:24 PM

#

sturdy sequoia Because I cannot use only one vec forever 🤔

A vec of register vecs in a thread local may simply help. We can "pop_or_default" it to get a register vec and push it back for future reuse. We can have three of those, for 6-, 16-, and 32-sized register vecs, but not for variable-sized register vecs.
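
A minimal sketch of such a pool, assuming registers are plain `i64`s. `POOL`, `acquire`, `release`, and `depth_sum` are hypothetical names; "pop_or_default" is spelled here as `pop().unwrap_or_default()`. For brevity the sketch keeps one pool and resizes buffers on acquire instead of keeping one pool per size class.

```rust
use std::cell::RefCell;

thread_local! {
    /// Hypothetical per-thread stash of register buffers waiting to be reused.
    static POOL: RefCell<Vec<Vec<i64>>> = RefCell::new(Vec::new());
}

/// Takes a buffer from the pool (or a fresh one) and sizes it to `len` zeroed registers.
pub fn acquire(len: usize) -> Vec<i64> {
    let mut buf = POOL.with(|p| p.borrow_mut().pop()).unwrap_or_default();
    buf.clear();
    buf.resize(len, 0);
    buf
}

/// Hands a buffer back so a later call can reuse its allocation.
pub fn release(buf: Vec<i64>) {
    POOL.with(|p| p.borrow_mut().push(buf));
}

/// Recursion works because each active call owns its own buffer.
pub fn depth_sum(depth: usize) -> i64 {
    let mut regs = acquire(4);
    regs[0] = depth as i64;
    if depth > 0 {
        regs[1] = depth_sum(depth - 1);
    }
    let out = regs[0] + regs[1];
    release(regs);
    out
}
```

Because a recursive call pops its own buffer while the caller still holds one, a single shared vec is never needed.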

onyx furnace Feb 7, 2024, 11:41 PM

#

sturdy sequoia Do you mean that I could use a thread-local pools of "pre-used" vecs?

does this make sense? i feel like this should be handled in allocator🤔

#

good allocators maintain arenas of different sizes and other complex structures. in some simple cases like

for (...) {
    vector<..> v;  // constructed and destroyed on every iteration
}

the memory will very likely be reused (maybe also thanks to the optimizer)

#

i didnt read the whole thread though. just feel recent thing sounds weird.🙏

untold turret Feb 8, 2024, 12:35 AM

#

onyx furnace good allocators maintain arenas of different sizes and other complex structures. in...

If you think of it that way, then we wouldn't need the many other allocators in Rust (bumpalo, arena, etc.). In general a default allocator is fast, but specialized allocators are even faster for specific patterns.

sturdy sequoia Feb 8, 2024, 9:52 AM

#

So, I was in bed, but I did make the change. It seems positive overall: I introduced some bugs while doing it, which I haven't debugged yet, but it definitely makes the code shorter and easier to follow!

sturdy sequoia Feb 8, 2024, 2:13 PM

#

@left night any idea what might be causing this error:

#

#

The span is clearly valid since it can make an error from it

#

angrythunk

left night Feb 8, 2024, 3:00 PM

#

sturdy sequoia The span is clearly valid since it can make an error from it

The trace has a span. The error doesn't.

#

The argument span seems to be missing

sturdy sequoia Feb 8, 2024, 3:08 PM

#

left night The argument span seems to be missing

hmm

#

it's weird because I think it's there

sturdy sequoia Feb 8, 2024, 3:40 PM

#

@left night you were right, I was reading the wrong spans and by chance it was reading a Span::detached() 💀

sturdy sequoia Feb 8, 2024, 4:25 PM

#

While doing some cleanup I've also managed to gain about 1B real CPU instructions on @atomic violet's raytracer

#

uwu

#

that's like 3%, but it's something!

feral imp Feb 8, 2024, 4:37 PM

#

That's good. I mean you've got a huge implementation, and you being able to improve it, also means it is certainly a little cohesive. So not only are you gaining performance ground, you're giving credence to the VM.

sly pecan Feb 8, 2024, 5:45 PM

#

sturdy sequoia While doing some cleanup I've also managed to gain about 1B real CPU instruction...

Soon there won't be any instructions left

glad urchin Feb 8, 2024, 5:48 PM

#

sly pecan Soon there won't be any instructions left

Snap your fingers and just like that

#

Done!

surreal hemlock Feb 8, 2024, 5:52 PM

#

Just add a raytrace instruction. Easy

sturdy sequoia Feb 8, 2024, 5:55 PM

#

sly pecan Soon there won't be any instructions left

There's still 40-ish billions of them left

#

don't worry 😂

atomic violet Feb 8, 2024, 6:01 PM

#

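; typst.asm
section .data
     msg: db "Do you really need that document though?.. Think about all the time you will have to go outside and touch grass, if you stopped writing that paper", 0
     msg_len: equ $ - msg - 1   ; message length without the trailing NUL
section .text
global _start
_start:
    mov eax, 1          ; sys_write
    mov edi, 1          ; fd 1 = stdout
    mov rsi, msg        ; buffer to print
    mov edx, msg_len    ; number of bytes to write
    syscall
    mov eax, 60         ; sys_exit
    mov edi, 1          ; exit status 1
    syscall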

#

typst 2025

sturdy sequoia Feb 8, 2024, 6:22 PM

#

atomic violet ```nasm ; typst.asm section .data msg: db "Do you really need that document...

that's not 40B angryeyes

atomic violet Feb 8, 2024, 6:22 PM

#

sturdy sequoia that's not 40B angryeyes

40 bytes? No, but it is pretty close

tight glade Feb 8, 2024, 6:25 PM

#

sly pecan Soon there won't be any instructions left

Yea careful there!

sturdy sequoia Feb 9, 2024, 8:24 AM

#

@left night any objection to the use of a Box<[T]>? On the struct I want to use it in, it saves > 128 bytes just by dropping all of the capacity: usize fields

feral imp Feb 9, 2024, 8:31 AM

#

sturdy sequoia @left night any objection to the use of a `Box<[T]>`? On the struct I ...

Instead of what? Vec?

sturdy sequoia Feb 9, 2024, 8:33 AM

#

feral imp Instead of what? Vec?

yes

#

note that it's read only after creation

#

and it doesn't require re-allocations

#

it just removes the capacity field afaik
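
To make the size argument concrete: on current Rust compilers a `Vec<T>` is three words in practice (pointer, capacity, length), while a `Box<[T]>` is a two-word fat pointer, so each converted field saves one `usize`. The sketch below only measures that; `saved_per_field` and `freeze` are hypothetical helper names, not something from the Typst code base.

```rust
use std::mem::size_of;

/// Bytes saved per field by storing `Box<[T]>` instead of `Vec<T>`.
fn saved_per_field<T>() -> usize {
    size_of::<Vec<T>>() - size_of::<Box<[T]>>()
}

/// Freeze a finished vector into an owned slice, dropping the capacity field.
/// Note: this may reallocate once if the vector has spare capacity.
fn freeze<T>(v: Vec<T>) -> Box<[T]> {
    v.into_boxed_slice()
}
```

On a 64-bit target the saving is 8 bytes per field, so a saving of 128 bytes would correspond to sixteen such fields. This only pays off for data that is read-only after creation, as noted above.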

feral imp Feb 9, 2024, 8:33 AM

#

Recent blog posts talk about this... the owned-slice thing. So people are adopting that pattern.

sturdy sequoia Feb 9, 2024, 8:34 AM

#

The thesis now compiles 18% faster!

#

(than main)

glossy shore Feb 9, 2024, 8:35 AM

#

yooo

feral imp Feb 9, 2024, 8:37 AM

#

sturdy sequoia The thesis now compiles 18% faster!

Nice! Memory footprint is also of interest. I think 18% is a substantial speedup...

sturdy sequoia Feb 9, 2024, 8:43 AM

#

feral imp Nice! Memory footprint is also of interest. I think 18% is a substantial speedu...

indeed, since it's stored in an Arc in the end, it does count towards memory use, although not by much of course

sturdy sequoia Feb 9, 2024, 8:58 AM

#

feral imp Nice! Memory footprint is also of interest. I think 18% is a substantial speedu...

BTW, fun fact, with the other improvements I have made, 40% of the execution time of @atomic violet's raytracer is now spent just on argument hashing for function memoization 💀

feral imp Feb 9, 2024, 8:59 AM

#

That's good!?

sturdy sequoia Feb 9, 2024, 8:59 AM

#

feral imp That's good!?

I don't know 🤔

#

I don't like it!

#

it's basically wasted work

sly pecan Feb 9, 2024, 9:01 AM

#

sturdy sequoia BTW, fun fact, with the other improvements I have made, 40% of the execution tim...

Make the hashing faster?

#

Is it because the hash is 128 bit that it's slow @sturdy sequoia ?

sturdy sequoia Feb 9, 2024, 9:03 AM

#

sly pecan Is it because the hash is 128 bit that it's slow @sturdy sequoia ?

yes, it's comemo hashing

#

😦

sly pecan Feb 9, 2024, 9:04 AM

#

How much faster is 64 bit?

#

Yes I know we can't use it

sturdy sequoia Feb 9, 2024, 9:04 AM

#

sly pecan How much faster is 64 bit?

no idea, but we need 128-bit for collision resistance 😦

sly pecan Feb 9, 2024, 9:04 AM

#

The thing is

sturdy sequoia Feb 9, 2024, 9:04 AM

#

sly pecan How much faster is 64 bit?

probably quite a bit faster indeed

sly pecan Feb 9, 2024, 9:04 AM

#

So

#

Hear me out

#

Only fall back on 128 bit on an actual collision

#

Or use two different 64 bit hashes

#

Instead of 128

sturdy sequoia Feb 9, 2024, 9:06 AM

#

sly pecan Or use two different 64 bit hashes

that would be worse on most CPUs since we have SIMD for wider instructions

sturdy sequoia Feb 9, 2024, 9:06 AM

#

sly pecan Only fall back on 128 bit on an actual collision

the thing is that you can't detect collisions

sly pecan Feb 9, 2024, 9:06 AM

#

sturdy sequoia the thing is that you *can't* detect collisions

Right, in my defense I just woke up 😂

feral imp Feb 9, 2024, 9:10 AM

#

What do you mean wasted work?
VM allows for decreasing memory usage; that's a win
VM gained 18% performance increase on an actual document: that's a win too
VM enhanced the raytracing example to now be... within 2-3 times of Python performance; double win I guess?

sturdy sequoia Feb 9, 2024, 9:13 AM

#

feral imp What do you mean _wasted_ work? VM allows for decreasing memory usage; that's a ...

Well it would be at python performance if it wasn't for all of this hashing angryeyes

sly pecan Feb 9, 2024, 9:13 AM

#

Presumably these functions don't even have to be memoized, since they're never called with the same argument twice

feral imp Feb 9, 2024, 9:14 AM

#

Implementing RenderMonkey in typst is of course an important goal, but for now, we can just bask in the glory of having typst jump leaps and bounds afar from what it was mere months ago. I guess that's a small win.

feral imp Feb 9, 2024, 9:14 AM

#

sly pecan Presumably these functions don't even have to be memoized, since they're never c...

Yes. Implement function attribute in typst, and disable caching @sturdy sequoia .

sly pecan Feb 9, 2024, 9:14 AM

#

I mean, it's not a terrible idea

feral imp Feb 9, 2024, 9:15 AM

#

sly pecan I mean, it's not a terrible idea

Not at all. Probably belongs to another PR.. And actually.. couldn't it be used by other packages rather than just raytracing?

sly pecan Feb 9, 2024, 9:15 AM

#

Yes

feral imp Feb 9, 2024, 9:15 AM

#

AFK meeting

sturdy sequoia Feb 9, 2024, 9:16 AM

#

I increased the ray bounce

sly pecan Feb 9, 2024, 9:18 AM

#

sturdy sequoia Feb 9, 2024, 9:19 AM

#

sly pecan

just... no

#

https://tenor.com/view/not-how-it-works-confused-doesnt-work-that-way-no-not-like-this-gif-4953698


sly pecan Feb 9, 2024, 9:19 AM

#

sturdy sequoia just... no

(That was a joke)

sturdy sequoia Feb 9, 2024, 9:20 AM

#

sly pecan (That was a joke)

I mean, we should first hash to 64-bit then to 128-bit hashing using the 64-bit

#

bigbrain

#

(obvious /s)

sly pecan Feb 9, 2024, 9:21 AM

#

Anyway short of faster 128 bit hash, or the disabling of memoization for specific functions, I'm not sure there is much you can do

sturdy sequoia Feb 9, 2024, 9:21 AM

#

sly pecan Anyway short of faster 128 bit hash, or the disabling of memoization for specifi...

yeah that's about it

#

😦

sly pecan Feb 9, 2024, 9:29 AM

#

How is hashing of multiple arguments handled? Is it one hash for everything?

sturdy sequoia Feb 9, 2024, 9:29 AM

#

sly pecan How is hashing of multiple arguments handled? Is it one hash for everything?

I mean it always hashes them one by one

#

but yet you get one big hash out
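
As an illustration of "hashed one by one, one big hash out", here is a minimal sketch of a memo table keyed only by the combined argument hash. It is not comemo's implementation: `hash_args` and `Memo` are hypothetical, and the standard library's 64-bit `DefaultHasher` stands in for the 128-bit hash discussed here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Feed every argument into one hasher and return a single combined hash.
fn hash_args<T: Hash>(args: &[T]) -> u64 {
    let mut state = DefaultHasher::new();
    for arg in args {
        arg.hash(&mut state);
    }
    state.finish()
}

/// Hypothetical memo table that stores results by argument hash only.
struct Memo {
    cache: HashMap<u64, i64>,
    hits: usize,
}

impl Memo {
    fn new() -> Self {
        Self { cache: HashMap::new(), hits: 0 }
    }

    /// Look the call up by hash; run `f` only on a miss.
    fn call(&mut self, args: &[i64], f: impl Fn(&[i64]) -> i64) -> i64 {
        let key = hash_args(args); // paid on every call, hit or miss
        if let Some(&out) = self.cache.get(&key) {
            self.hits += 1;
            return out;
        }
        let out = f(args);
        self.cache.insert(key, out);
        out
    }
}
```

Two things from the discussion show up here: the hash is computed on every call, even when the function is never called twice with the same arguments, and because only the hash is stored, a collision between two different argument lists cannot be detected.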

sly pecan Feb 9, 2024, 9:31 AM

#

sturdy sequoia I mean it always hashes them one by one

Does this step have to happen if the inputs are all numeric?

#

In general, does numeric input even need hashing?

#

I guess that's overcomplicating things though @sturdy sequoia

sturdy sequoia Feb 9, 2024, 9:40 AM

#

@sly pecan it's more that we don't know they're numeric anyway

#

Because it's essentially a Vec<Value> where Value can be numeric or it can be anything

sly pecan Feb 9, 2024, 9:40 AM

#

sturdy sequoia @sly pecan it's more that we don't *know* they're numeric anyway

But typst has types

sturdy sequoia Feb 9, 2024, 9:40 AM

#

sly pecan But typst has types

yes, but from the point of view of comemo they're opaque af

sly pecan Feb 9, 2024, 9:42 AM

#

Do they have to be?

#

(sorry for the questions, I'm aware they're naive)

sturdy sequoia Feb 9, 2024, 9:42 AM

#

sly pecan Do they have to be?

No, but then comemo would be pretty much unusable for others

#

or we would get into trait hell for every single value we want to pass into comemo

feral imp Feb 9, 2024, 10:21 AM

#

sturdy sequoia or we would get into trait hell for every single value we want to pass into come...

Are you sure? Maybe this new stuff with impl-trait in assoc type can help with a slick API for that?

#

Full disclosure, I wish this was possible for my own interests / projects 🙃

sturdy sequoia Feb 9, 2024, 10:21 AM

#

feral imp Are you _sure_? Maybe this new stuff with impl-trait in assoc type can help with...

maybe, but I don't think it's worth it either way

#

The only solutions I can think of are:

- Using a faster hashing function (assuming one exists that meets the requirements)
- Detecting whether memoization is worth it, but this is a double-edged sword: to check whether it's worth it, you would still need to hash the input arguments to see whether the call is memoized, so it's kind of... useless

#

- Annotate functions that don't need memoization (in typst that is)

#

- Caching hashes as suggested by @left night in the Value rework

atomic violet Feb 9, 2024, 10:35 AM

#

does typst have any good incremental benchmarks?

#

all this talk is more about incremental compilation than a single compilation, and I wonder if there are ways to measure recompilation already?

sly pecan Feb 9, 2024, 10:37 AM

#

sturdy sequoia The only solutions I can think of are: - Using a faster hashing function (assumi...

Regarding number 2, it would reduce memory usage

#

Presumably

#

At least

atomic violet Feb 9, 2024, 10:37 AM

#

I like option 3 as an advanced feature for package authors

#

Disabling memoization is like adding #[inline], I feel. Probably won't help much, but if you know what you are doing, go for it

#

the one downside, I feel, is that if Typst's memoization mechanism ever changes internally, it may affect the API too

sly pecan Feb 9, 2024, 10:39 AM

#

atomic violet Disabling memoization is like adding `#[inline]`, I feel. Probably won't help mu...

Shouldn't it help quite a bit for simple functions such as used in ray tracing? Fewer memory accesses.

atomic violet Feb 9, 2024, 10:40 AM

#

It may

#

the question is whether it can help, say, cetz

#

it will also help in this context #1176509648707256370 message

#

maybe when custom types come out there can be a distinction like "we cache functions, but not methods"

sly pecan Feb 9, 2024, 10:49 AM

#

atomic violet the question is whether it can help, say, cetz

I think what would help cetz the most would be moving some of the computation into the rust side

tight glade Feb 9, 2024, 11:48 AM

#

Dumb idea but have you tried turning comemo off to see if it actually makes performance gains in the eval step?

feral imp Feb 9, 2024, 11:50 AM

#

tight glade Dumb idea but have you tried turning comemo off to see if it actually makes perf...

That's causal profiling. Very, very advanced idea. See https://github.com/plasma-umass/coz

tight glade Feb 9, 2024, 11:53 AM

#

So im actually really smart? 🔥

left night Feb 9, 2024, 12:04 PM

#

sturdy sequoia - Caching hashes as suggested by @left night in the `Value` rework

I'm pretty sure that this is the most important thing here

#

There is a lot of recursive redundant hashing going on currently

left night Feb 9, 2024, 12:06 PM

#

sturdy sequoia @left night any objection to the use of a `Box<[T]>`? On the struct I ...

I do not have a problem with using Box<[T]> where appropriate.

sturdy sequoia Feb 9, 2024, 4:14 PM

#

left night I do not have a problem with using Box<[T]> where appropriate.

In this case it saves 128 bytes in the structure containing the compiled data for a module/closure. Since it's read-only, it's worth it imo

sturdy sequoia Feb 9, 2024, 4:14 PM

#

left night There is a lot of recursive redundant hashing going on currently

yes indeed, I think this could help

left night Feb 9, 2024, 4:19 PM

#

sturdy sequoia yes indeed, I think this could help

Unfortunately the lazy hash would need to be atomic I think

sturdy sequoia Feb 9, 2024, 4:20 PM

#

left night Unfortunately the lazy hash would need to be atomic I think

Another thing I was just thinking about is to make arrays be sharded, each shard being pre-hashed

#

it's a bit convoluted, but it could definitely work because upon initial hashing, that can be done in parallel

#

divide and conquer type of approach

left night Feb 9, 2024, 4:21 PM

#

Tree arrays?

#

That's ambitious

sturdy sequoia Feb 9, 2024, 4:21 PM

#

left night Tree arrays?

I was thinking more of sharding each array into chunks of N elements

#

so an array of 1000 elements would be like 4 arrays of 250 elements or something

#

Might be completely dumb mind you
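
A rough sketch of the sharding idea, under assumptions: the array is split into fixed-size chunks, each chunk gets its own hash, and the array hash is derived from the chunk hashes. `hash_one`, `shard_hashes` and `combined_hash` are hypothetical names, and the 64-bit `DefaultHasher` is only a stand-in for the real hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a single value (or slice) with a fresh hasher.
fn hash_one<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut state = DefaultHasher::new();
    value.hash(&mut state);
    state.finish()
}

/// Hash each shard of `shard_len` elements separately.
/// `shard_len` must be non-zero, otherwise `chunks` panics.
fn shard_hashes<T: Hash>(items: &[T], shard_len: usize) -> Vec<u64> {
    items.chunks(shard_len).map(|shard| hash_one(shard)).collect()
}

/// Combine the per-shard hashes into one hash for the whole array.
fn combined_hash(shards: &[u64]) -> u64 {
    hash_one(shards)
}
```

With 1000 elements and shards of 250, changing one element only invalidates one of the four shard hashes; the other three can be reused, and the initial per-shard hashing is independent work that could run in parallel.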

ornate merlin Feb 9, 2024, 7:00 PM

#

sturdy sequoia I mean, we should first hash to 64-bit then to 128-bit hashing using the 64-bit

Maybe you can use 64bit hashing and direct comparison?

I mean we have two ideas:

- Collisions may rarely occur even when using the 64-bit version.
- A function is rarely called twice with the same arguments.

Then we can do one of the following:

- Do not use hashing under any circumstances.
- Switch to using a 64-bit hash plus direct comparison. If the hashes differ, the calls are obviously different. At the same time, any collisions will be caught by direct comparison. This strategy will actually be more reliable than even using a 128-bit hash.

sly pecan Feb 9, 2024, 7:09 PM

#

ornate merlin Maybe you can use 64bit hashing and direct comparison? I mean we have two ideas...

You can't do direct comparison unless you also store the original object used to call the function

west light Feb 9, 2024, 7:10 PM

#

What does Python do for hashing? Can it be quickish?

shy sage Feb 9, 2024, 7:21 PM

#

I don't know Rust, but if it helps, there was a discussion in Godot about replacing the hash functions with more powerful versions. In particular there was this link, which has a comparison between different algorithms: https://github.com/Cyan4973/xxHash

GitHub

GitHub - Cyan4973/xxHash: Extremely fast non-cryptographic hash alg...

Extremely fast non-cryptographic hash algorithm. Contribute to Cyan4973/xxHash development by creating an account on GitHub.

ornate merlin Feb 9, 2024, 7:22 PM

#

sly pecan You can't do direct comparison unless you also store the original object used to...

Can you please explain this in a bit more detail? I really don't see why we need to know anything about the object

feral imp Feb 9, 2024, 7:22 PM

#

There must be a hashing, that is made, for detecting hashing of a similar object, that has changed ever so slightly.. 😅

ornate merlin Feb 9, 2024, 7:24 PM

#

feral imp There must be a hashing, that is made, for detecting hashing of a similar object...

And why can we not compare functions directly?

feral imp Feb 9, 2024, 7:33 PM

#

ornate merlin And why we can not compare functions directly?

Don't know.. Something, something reflection, dynamic something?

ornate merlin Feb 9, 2024, 7:40 PM

#

feral imp Don't know.. Something, something reflection, dynamic something?

May be 🤷‍♂️

In Rust a function pointer is just a number (with some provenance and address-space information, but that is used only by the compiler itself), so comparisons are cheap

feral imp Feb 9, 2024, 7:41 PM

#

ornate merlin May be 🤷‍♂️ In rust function pointer is just a number (with some provenance an...

This is literally in what was released today: https://doc.rust-lang.org/stable/std/ptr/fn.addr_eq.html it says that it ignores the metadata..

addr_eq in std::ptr - Rust

Compares the addresses of the two pointers for equality, ignoring any metadata in fat pointers.

sturdy sequoia Feb 9, 2024, 7:45 PM

#

ornate merlin Maybe you can use 64bit hashing and direct comparison? I mean we have two ideas...

- Well, the state of the art we're based on tends to use 128-bit hashes, and I would assume that comes from experimentation. Although I agree that for most cases 64 bits would likely be enough.
- That's not really true: consider that typst runs your code multiple times in a row until your document stabilizes. This means that all of your query(...), locate(...), etc. get called N times (up to five), and memoization does save a lot of time there

sturdy sequoia Feb 9, 2024, 7:46 PM

#

ornate merlin Maybe you can use 64bit hashing and direct comparison? I mean we have two ideas...

> Switch to using a 64-bit hash plus direct comparison.
Direct comparison would be bad because the structures are very tree-like in Typst, which means that you would get extreme cache inefficiency (cache efficiency has improved a lot since the content rework a couple of months ago)

sturdy sequoia Feb 9, 2024, 7:46 PM

#

sly pecan You can't do direct comparison unless you also store the original object used to...

That is also a factor indeed

sturdy sequoia Feb 9, 2024, 7:47 PM

#

shy sage I don't know rust, if it helps, there was a discussion for Godot to replace the ...

gxhash would likely be a good alternative too since it's in Rust, but it currently has no fallback for WASM and unsupported platforms (it uses the CPU's AES instructions for acceleration). It is, to some extent, good: it's not cryptographically secure, but it's on par with SipHash 1-3 for collision resistance and DoS protection. The problem with xxHash is mostly that it's a C dependency, and we try to avoid those because they can lead to linking hell on WASM

ornate merlin Feb 9, 2024, 7:48 PM

#

sturdy sequoia > Switch to using a 64-bit hash plus direct comparison. Direct comparison would ...

Yes, direct comparison is very expensive, but it would only happen once in a billion times!!!

Or maybe even more rarely, I don't remember actually😅

sturdy sequoia Feb 9, 2024, 7:48 PM

#

ornate merlin May be 🤷‍♂️ In rust function pointer is just a number (with some provenance an...

comparing functions is cheap, comparing inputs to functions not so much. If we used direct comparisons, we would need to do one unholy thing: store the arguments of each call in addition to the hashes, which would eat up RAM like there ain't no tomorrow
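
The memory cost described above can be made visible with a small sketch of a "64-bit hash plus direct comparison" cache. Everything here is hypothetical (`CheckedCache` is not a Typst or comemo type), and plain `i64` slices stand in for arbitrary argument values.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache that verifies hits by comparing the stored arguments.
struct CheckedCache {
    // Every entry keeps a full copy of its arguments next to the result.
    buckets: HashMap<u64, Vec<(Vec<i64>, i64)>>,
}

impl CheckedCache {
    fn new() -> Self {
        Self { buckets: HashMap::new() }
    }

    fn get_or_compute(&mut self, args: &[i64], f: impl Fn(&[i64]) -> i64) -> i64 {
        let mut state = DefaultHasher::new();
        args.hash(&mut state);
        let bucket = self.buckets.entry(state.finish()).or_default();
        // Direct comparison: walks the stored arguments again on every hit.
        if let Some((_, out)) = bucket.iter().find(|(stored, _)| stored.as_slice() == args) {
            return *out;
        }
        let out = f(args);
        bucket.push((args.to_vec(), out));
        out
    }

    /// Number of argument values retained purely for comparison.
    fn stored_values(&self) -> usize {
        self.buckets.values().flatten().map(|(args, _)| args.len()).sum()
    }
}
```

The cache is collision-proof, but `stored_values` grows with every distinct call, and each hit re-reads the stored arguments. For tree-like values that comparison is the pointer-chasing, cache-unfriendly traversal mentioned earlier.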

sturdy sequoia Feb 9, 2024, 7:49 PM

#

ornate merlin Yes, direct comparison is very expensive, but it can occur only in one billion o...

While I agree, the main problem is detecting problematic collisions, let's take a simple case:

- You have a function called twice with two arrays
- They cause a 64-bit collision
- You have to either compare the arrays in their entirety (therefore needing to store all previous arguments)
- Or you must have some kind of additional information that you can use to disambiguate, which is not necessarily trivial 😐

#

For arrays, you could maybe just store the length and have a tuple (u64, usize) with the hash and the length, but do you do that for each argument?

#

It gets really tricky

ornate merlin Feb 9, 2024, 7:52 PM

#

Hmm... Creating unique ID for each function and track it😂

sturdy sequoia Feb 9, 2024, 7:53 PM

#

ornate merlin Hmm... Creating unique ID for each function and track it😂

HMMMMM

#

that sounds awfully like... a hash 😄

ornate merlin Feb 9, 2024, 7:54 PM

#

sturdy sequoia that sounds awfully like... a hash 😄

It can be counter along with 64bit hash?

sturdy sequoia Feb 9, 2024, 7:54 PM

#

ornate merlin It can be counter along with 64bit hash?

yes, but that doesn't help you disambiguate, remember, we want to check if two calls to the same function are the same

#

we already cache per-function

#

admittedly, since we store per-function maybe we could do with 64-bit hashes

#

since functions aren't called a bazillion times

#

🤔

#

@left night what do you think?

shy sage Feb 9, 2024, 8:04 PM

#

sturdy sequoia gxhash would likely be a good alternative too since it's in rust but it currentl...

I didn't know about gxhash, are you sure there's a C dependency? Xxhash has two implementations in Rust. I don't see a compiled or c file in the 3 repos, but I must be missing something I guess.

sly pecan Feb 9, 2024, 8:06 PM

#

@sturdy sequoia I honestly feel like anything other than a 128 bit hash might just be overcomplicating things. The raytracer is pretty much worst case scenario

#

How much time is spent hashing on a more sane document?

atomic violet Feb 9, 2024, 8:09 PM

#

atomic violet does typst have any good incremental benchmarks?

sorry to bring that up again, but are there any incremental benchmarks? 😅 I am genuinely curious

sturdy sequoia Feb 9, 2024, 8:24 PM

#

atomic violet sorry to bring that again, but are there any incremental benchmarks? 😅 I am gen...

I do them by hand by taking multiple samples and writing them in an Excel sheet

#

science

sturdy sequoia Feb 9, 2024, 8:25 PM

#

sly pecan How much time is spent hashing on a more sane document?

My thesis is still 20 percent or so

sturdy sequoia Feb 9, 2024, 8:25 PM

#

shy sage I didn't know about gxhash, are you sure there's a C dependency? Xxhash has two ...

My bad then 😱

#

I confused it with another hash library most likely

ornate merlin Feb 9, 2024, 8:25 PM

#

sturdy sequoia yes, but that doesn't help you disambiguate, remember, we want to check if two cal...

Just another strange idea:

What if we use two hash functions? One, for example, 64-bit and the other 32-bit? Maybe with some parallel hash computation.

This may definitely exclude any collisions

sturdy sequoia Feb 9, 2024, 8:26 PM

#

sly pecan @sturdy sequoia I honestly feel like anything other than a 128 bit hash mi...

Well if we could do with 64 bits that would be even easier 🫠

sturdy sequoia Feb 9, 2024, 8:26 PM

#

ornate merlin Just like another strange idea: What if we use two hash functions? One for exam...

Doing two hash rounds wouldn’t give you better performance because you’d most likely be bottlenecked by your cache and memory bus science

ornate merlin Feb 9, 2024, 8:28 PM

#

sturdy sequoia Doing two hash round wouldn’t give you better performance because you’d most lik...

Is SipHash 1-3 calculated in parallel? In this case, yes, there are no advantages

atomic violet Feb 9, 2024, 8:31 PM

#

I feel like hashes are so fast to compute, you will lose a shit ton of cycles due to synchronization overhead and false sharing 💀

sturdy sequoia Feb 9, 2024, 8:31 PM

#

atomic violet I feel like hashes are so fast to compute, you will lose a shit tone of cycles d...

yep

atomic violet Feb 9, 2024, 8:31 PM

#

hmmmm... maybe not false sharing, actually, due to COW semantics 🤔 ....

#

but it still won't be very fast, semaphores are not free

sly pecan Feb 9, 2024, 8:33 PM

#

sturdy sequoia My thesis is still 20 percent or so

Out of the eval part or total?

#

I'm curious by the way, does x86-64-v3 improve that? @sturdy sequoia

sly pecan Feb 9, 2024, 8:34 PM

#

ornate merlin Just like another strange idea: What if we use two hash functions? One for exam...

It would be significantly weaker than the current 128 bits

sturdy sequoia Feb 9, 2024, 8:35 PM

#

sly pecan Out of the eval part or total?

total

sturdy sequoia Feb 9, 2024, 8:36 PM

#

sly pecan I'm curious by the way, does x86-64-v3 improve that? @sturdy sequoia

not much afaik

#

I did try it and it wasn't much

atomic violet Feb 9, 2024, 8:36 PM

#

SIMD in Typst 👀

sturdy sequoia Feb 9, 2024, 8:36 PM

#

Swapping allocators is much more impactful

sly pecan Feb 9, 2024, 8:38 PM

#

I assume you've tried two different 64 bit hashes instead of one 128 bit?

#

(yes i know it's unlikely to be faster)

atomic violet Feb 9, 2024, 8:39 PM

#

I think optimizing hashing now will yield inaccurate results in the long run because the semantics of Value are about to change

#

I would rather wait a bit until the API changes stabilize, and only then try to squeeze cycles out of hashing

#

it's not that hard to do later, doing it now will probably just complicate further design changes and more productive optimizations

#

(let alone increased compile times and binary size)

ornate merlin Feb 9, 2024, 8:42 PM

#

sly pecan It would be significantly weaker than the current 128 bits

Not so much. It will be 2^32 × 2^16 = 2^48. Of course it is less than the 2^64 for the 128-bit hash, but it is actually a huge number

sly pecan Feb 9, 2024, 8:54 PM

#

ornate merlin Not so much. It will be 2^32 × 2^16 = 2^48. Of course it less than 2^64 for the ...

You've reduced the expected number of hashes before collision from a number with 20 digits to one with 15

#

That's the difference between functionally never and maaaaaybe

#

And it really is important that it never happens
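
The numbers in this exchange follow from the birthday bound: for a hash of b bits, a collision becomes likely after roughly 2^(b/2) hashed values. A 64-bit plus 32-bit pair is 96 bits in total, which gives 2^48 = 2^32 × 2^16, versus 2^64 for 128 bits. The small sketch below checks the digit counts quoted above; `birthday_bound` and `digits` are illustrative helpers only.

```rust
/// Approximate number of hashes before a collision becomes likely
/// (birthday bound): about 2^(bits / 2) for a `bits`-wide hash.
fn birthday_bound(bits: u32) -> u128 {
    1u128 << (bits / 2)
}

/// Number of decimal digits in `n`.
fn digits(n: u128) -> usize {
    n.to_string().len()
}
```

`birthday_bound(128)` is 18,446,744,073,709,551,616 (20 digits) and `birthday_bound(96)` is 281,474,976,710,656 (15 digits), matching the "20 digits to 15" comparison.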

sturdy sequoia Feb 9, 2024, 8:58 PM

#

sly pecan I assume you've tried two different 64 bit hashes instead of one 128 bit?

no I haven't 🤔

sturdy sequoia Feb 9, 2024, 9:27 PM

#

The thing is that other than caching hashes, no matter the hash function, it will still show up

#

hashing over and over again will always be there

#

the difference is whether we can cache those results

#

and the answer is: yes we can

#

https://tenor.com/view/yes-we-can-obama-gif-27099330


#

On a side note: Damn Obama was so charismatic, can we go back please?

sly pecan Feb 9, 2024, 9:40 PM

#

sturdy sequoia On a side note: Damn Obama was so charismatic, can we go back please?

Nah, you'll get four more years of Trump instead

sturdy sequoia Feb 9, 2024, 9:42 PM

#

sly pecan Nah, you'll get four more years of Trump instead

We're together in this sunglassed_crying

#

Knowing you'll be in the US too

feral imp Feb 9, 2024, 9:43 PM

#

You know that guy is still alive right? And he has time? Maybe we can ask him to be spokesperson for typst? (Obama!)

sturdy sequoia Feb 9, 2024, 9:44 PM

#

feral imp You know that guy is still alive right? And he has time? Maybe we can ask him to...

Clearly that's what we need, to pay obama to come make a speech at Typstcon

#

He recently gave a talk in Belgium for the measly sum of 650k€, I'm sure <@&1200368100202258552> can get that much cache just for him 🙂

feral imp Feb 9, 2024, 9:44 PM

#

pay?! are you nuts? I'll just show him the gradients+VM+raytracer, and he'll do it probono.

sly pecan Feb 9, 2024, 9:44 PM

#

sturdy sequoia Clearly that's what we need, to pay obama to come make a speech at Typstcon

Will there be a ball pit?

sturdy sequoia Feb 9, 2024, 9:45 PM

#

sly pecan Will there be a ball pit?

You get 20 minutes extra in the ballpiiiiiiiiiiiiiiiiiiiit!!!!

#

never forget

left night Feb 9, 2024, 10:54 PM

#

sturdy sequoia @left night what do you think?

I am not convinced that the 128 bit hashes or even the hash function is a real problem. I think it's really the hashing of the same thing over and over again.

Until we do the value rework, if you want you could experiment with doing the lazy hash only on content rather than arbitrary values. We can then remove the Prehashed on flow, par, etc. and see if we get gains or not.

Jumping to conclusions and risking hash collisions in real documents seems unwise to me.

sturdy sequoia Feb 9, 2024, 11:08 PM

#

left night I am not convinced that the 128 bit hashes or even the hash function is a real p...

Fair, I might just try the whole lazy hashing, see how that plays out 😉

#

I'm thinking of an Atomic<u128> that gets reset when the DerefMut impl of Packed is called

#

something like that
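
A minimal sketch of the lazy-hash idea, under stated assumptions: it is not the implementation that was written here. `LazyHash`, `cached_hash` and `is_cached` are illustrative, `std::sync::OnceLock<u64>` stands in for the 128-bit atomic, and `DefaultHasher` stands in for the 128-bit hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::{Deref, DerefMut};
use std::sync::OnceLock;

/// Hypothetical wrapper that computes its hash on first use and caches it.
struct LazyHash<T> {
    value: T,
    hash: OnceLock<u64>,
}

impl<T: Hash> LazyHash<T> {
    fn new(value: T) -> Self {
        Self { value, hash: OnceLock::new() }
    }

    /// Return the cached hash, computing it only if it is missing.
    fn cached_hash(&self) -> u64 {
        *self.hash.get_or_init(|| {
            let mut state = DefaultHasher::new();
            self.value.hash(&mut state);
            state.finish()
        })
    }

    /// Whether a hash is currently stored.
    fn is_cached(&self) -> bool {
        self.hash.get().is_some()
    }
}

impl<T> Deref for LazyHash<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T> DerefMut for LazyHash<T> {
    fn deref_mut(&mut self) -> &mut T {
        // Any mutable access may change the value, so drop the cached hash.
        self.hash = OnceLock::new();
        &mut self.value
    }
}
```

Read-only access through `Deref` leaves the cached hash intact, so repeated hashing of an unchanged value costs one load. The reset in `deref_mut` is conservative: it invalidates even when the caller ends up not changing anything.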

left night Feb 9, 2024, 11:10 PM

#

sturdy sequoia I'm thinking of an `Atomic<u128>` that gets reset when the `DerefMut` impl of `P...

exactly!

#

you can also play with whether the hash is just for the element data or also the metadata

#

if it's the former, no invalidation is necessary for metadata changes

sturdy sequoia Feb 9, 2024, 11:15 PM

#

I was first trying to set it inside Inner<T> since it should be pretty easy to detect modifications there

#

@left night is it normal that Content::with_mut doesn't check uniqueness? 😱

#

There's definitely something, my initial kind of baboon-level implementation does speed things up, not as much as I was expecting, but it does make a difference

left night Feb 10, 2024, 12:01 AM

#

sturdy sequoia @left night is it normal that `Content::with_mut` doesn't check unique...

It's explained in the comment above ;)

glossy shore Feb 10, 2024, 9:55 AM

#

sturdy sequoia The only solutions I can think of are: - Using a faster hashing function (assumi...

can't we somehow specialise hashing such that values smaller than 128 bits are just copied with some TypeId-esque tag or something?
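
One way to read this suggestion, as a sketch only: small values skip hashing entirely and become the cache key themselves, tagged with their type so that equal bit patterns of different types stay distinct. `Key` and the `key_for_*` helpers are hypothetical, and the fallback uses the 64-bit `DefaultHasher` as a stand-in.

```rust
use std::any::TypeId;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cache key: small values are stored verbatim next to a
/// type tag, larger ones fall back to a hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Key {
    Inline { tag: TypeId, bits: u64 },
    Hashed(u64),
}

/// Integers are copied as-is; no hashing needed.
fn key_for_i64(v: i64) -> Key {
    Key::Inline { tag: TypeId::of::<i64>(), bits: v as u64 }
}

/// Floats are copied by bit pattern (so 0.0 and -0.0 get different keys).
fn key_for_f64(v: f64) -> Key {
    Key::Inline { tag: TypeId::of::<f64>(), bits: v.to_bits() }
}

/// Anything larger still goes through the hasher.
fn key_for_str(v: &str) -> Key {
    let mut state = DefaultHasher::new();
    v.hash(&mut state);
    Key::Hashed(state.finish())
}
```

Inline keys cannot collide with each other, since they are the value itself. The open question raised in the thread remains: the caller has to know statically that a value is small, which is hard when arguments arrive as opaque values.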

sly pecan Feb 10, 2024, 10:35 AM

#

glossy shore can't we somehow specialise hashing such that values smaller than 128 bits are j...

I think I suggested that, but @sturdy sequoia said it was impossible

sturdy sequoia Feb 11, 2024, 12:58 AM

#

glossy shore can't we somehow specialise hashing such that values smaller than 128 bits are j...

No, I don't think this had been suggested, that's actually quite interesting for comemo, inside of typst it would be hard, but inside comemo it could be interesting!

#

@left night I removed all Prehashed<Content> (I had forgor 💀) and I am shocked, it really does give a nice bump in performance across the board, I think this really is the way to go for Value as well

#

It does contain the label, location, lifecycle, etc. but not the span (which tbf is small enough that we shouldn't care too much imo)

left night Feb 11, 2024, 9:39 AM

#

sturdy sequoia @left night I removed all `Prehashed<Content>` (I had forgor 💀) and I...

very nice!

sturdy sequoia Feb 11, 2024, 10:48 AM

#

left night very nice!

Btw I did measure the hit rate and it’s about 90%

#

So 90% of the time the hash was already stored

#

science science science

#

This is also the case in incremental but sometimes (depending on change) drops down to 82%, so I am wondering whether not pre-hashing the location would be beneficial

#

to avoid recursively hashing the T which is likely the most expensive anyway

left night Feb 11, 2024, 11:20 AM

#

sturdy sequoia This is also the case in incremental but sometimes (depending on change) drops d...

yes, I also think lazy-cache-hashing just T would be worth a try

sturdy sequoia Feb 11, 2024, 11:28 AM

#

left night yes, I also think lazy-cache-hashing just `T` would be worth a try

I'll just add a LazyHash<T> to util then!

left night Feb 11, 2024, 11:28 AM

#

sounds good!

left night Feb 11, 2024, 12:04 PM

#

sturdy sequoia I'll just add a `LazyHash<T>` to `util` then!

how did you realize a 128 bit atomic btw?

sturdy sequoia Feb 11, 2024, 12:05 PM

#

left night how did you realize a 128 bit atomic btw?

with the atomic crate

#

since it's simple swap and store operations, you can just do atomic memory operations on it 😉

#

load and store *

#

I'm tired 😴

left night Feb 11, 2024, 12:06 PM

#

sturdy sequoia since it's simple swap and store operations, you can just do atomic memory opera...

will that incur two atomic ops or do CPUs have 128 bit atomics nowadays or does it use a secondary atomic for controlling access?

sturdy sequoia Feb 11, 2024, 12:07 PM

#

left night will that incur two atomic ops or do CPUs have 128 bit atomics nowadays or does ...

As far as I can tell it's using atomic operations on pointers

#

And there is an AtomicU128 in core apparently 😱

left night Feb 11, 2024, 12:09 PM

#

looking at the crates source it sort of seems like they use a spinlock on stable rust

sly pecan Feb 11, 2024, 12:09 PM

#

sturdy sequoia No, I don't think this had been suggested, that's actually quite interesting for...

That's essentially what I meant when I said numeric types, though that was a more fleshed out idea

#

What do you think @left night ?

sturdy sequoia Feb 11, 2024, 12:09 PM

#

left night looking at the crates source it sort of seems like they use a spinlock on stable...

This library will use native atomic instructions if possible, and will otherwise fall back to a lock-based mechanism. You can use the Atomic::<T>::is_lock_free() function to check whether native atomic operations are supported for a given type. Note that a type must have a power-of-2 size and alignment in order to be used by native atomic instructions.

left night Feb 11, 2024, 12:10 PM

#

sturdy sequoia > This library will use native atomic instructions if possible, and will otherwi...

https://github.com/Amanieu/atomic-rs/blob/79ace14a69d8d4a45cc7f6cd06ff0cbb5849b2fd/src/ops.rs#L46

sturdy sequoia Feb 11, 2024, 12:10 PM

#

left night https://github.com/Amanieu/atomic-rs/blob/79ace14a69d8d4a45cc7f6cd06ff0cbb5849b2...

#

You're right, on nightly it would have 128-bits atomic

#

I mean it's still faster 😄

left night Feb 11, 2024, 12:10 PM

#

I wonder whether a spinlock is more efficient than two AtomicU64

sturdy sequoia Feb 11, 2024, 12:11 PM

#

left night I wonder whether a spinlock is more efficient than two AtomicU64

I'm curious as well now

#

AtomicU128 is actually only available on ARM64 cores 😱

left night Feb 11, 2024, 12:12 PM

#

sly pecan That's essentially what I meant when I said numeric types, though that was a mor...

I do not yet really understand how that should work.

#

Switch dynamically between a hash and the raw value?

sturdy sequoia Feb 11, 2024, 12:13 PM

#

left night I do not yet really understand how that should work.

instead of storing the hash you could store the arguments directly if they're small enough

#

since comemo is now generic it could be done very efficiently

sly pecan Feb 11, 2024, 12:13 PM

#

I think using the raw value instead of the hash

#

Together with the type

left night Feb 11, 2024, 12:13 PM

#

But that will pretty much always be larger than the hash

sly pecan Feb 11, 2024, 12:14 PM

#

We're talking only for integers essentially

left night Feb 11, 2024, 12:14 PM

#

Note that this wouldn't apply to Typst functions that take integers. Just to Rust functions that are memoized.

sly pecan Feb 11, 2024, 12:15 PM

#

Aha

left night Feb 11, 2024, 12:15 PM

#

There are almost none of those that take just integers.

#

Maybe actually none.

sly pecan Feb 11, 2024, 12:15 PM

#

Hopes and dreams dashed

left night Feb 11, 2024, 12:16 PM

#

I was thinking of another approach recently though

#

A more probabilistic approach: When invoking a Typst function we measure how long it took. This does not let us skip hashing right away. But if we call it a bunch of times and it was always cheap, the next time we can skip memoization and hashing altogether. If it is more expensive in a later run, we can deopt and start memoizing again. (All of this would probably happen on the Typst level rather than the comemo level.)
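
A minimal sketch of the deopt idea described above, assuming a small per-function stats record. `CallStats`, `CHEAP`, `STREAK`, and `call_timed` are hypothetical names, not Typst or comemo APIs, and the cutoff and streak length are arbitrary placeholders.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cutoff below which a call counts as "cheap".
pub const CHEAP: Duration = Duration::from_micros(100);
/// Hypothetical number of consecutive cheap calls before memoization is skipped.
pub const STREAK: u32 = 8;

/// Per-function bookkeeping: memoize by default, stop after a streak of cheap
/// calls, and start again (deopt) as soon as one expensive call is observed.
pub struct CallStats {
    cheap_streak: u32,
    memoize: bool,
}

impl CallStats {
    pub fn new() -> Self {
        Self { cheap_streak: 0, memoize: true }
    }

    /// Whether the next call should go through hashing and the cache.
    pub fn should_memoize(&self) -> bool {
        self.memoize
    }

    /// Feed in the measured duration of one call.
    pub fn record(&mut self, elapsed: Duration) {
        if elapsed < CHEAP {
            self.cheap_streak = self.cheap_streak.saturating_add(1);
            if self.cheap_streak >= STREAK {
                self.memoize = false;
            }
        } else {
            // An expensive run resets the streak and re-enables memoization.
            self.cheap_streak = 0;
            self.memoize = true;
        }
    }
}

/// Runs `f`, measures it, and updates the stats.
pub fn call_timed<R>(stats: &mut CallStats, f: impl FnOnce() -> R) -> R {
    let start = Instant::now();
    let out = f();
    stats.record(start.elapsed());
    out
}
```

The caller would consult `should_memoize()` before hashing the arguments, which is where the saving would come from if the function has been cheap for a while.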

sly pecan Feb 11, 2024, 12:21 PM

#

Could that information be cached between compiles? It would presumably be a cheap way to improve "cold" compiles

feral imp Feb 11, 2024, 12:21 PM

#

left night A more probabilistic approach: When invoking a Typst function we measure how lon...

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

left night Feb 11, 2024, 12:26 PM

#

feral imp > A Bloom filter is a space-efficient probabilistic data structure, conceived by...

I have been trying to find an application for a bloom filter in Typst for years but I have never found one where it really seems to make sense. I think it's mostly useful for really huge data sets.

feral imp Feb 11, 2024, 12:27 PM

#

I was doing a chat gpt, when you talked about the probabilistic approach. 🤷‍♂️

left night Feb 11, 2024, 12:27 PM

#

sly pecan Could that information be cached between compiles? It would presumably be a chea...

I don't know but I still don't really want to open that can of worms. Where would you even store it?

sly pecan Feb 11, 2024, 12:28 PM

#

I don't know. It was just a hypothetical thought

left night Feb 11, 2024, 12:28 PM

#

feral imp I was doing a chat gpt, when you talked about the probabilistic approach. 🤷‍♂️

Bloom filters are a really cool data structure.

sturdy sequoia Feb 11, 2024, 12:28 PM

#

left night I don't know but I still don't really want to open that can of worms. Where woul...

let's make a target folder 😈

#

@left night two atomics are slower 😦

#

basically goes back to main perf

left night Feb 11, 2024, 12:30 PM

#

Not that surprising I guess. The spinlock wasn't contended.

sly pecan Feb 11, 2024, 12:30 PM

#

I'm the guy who just throws out dumb ideas to see if anything sticks 🙂

sturdy sequoia Feb 11, 2024, 12:30 PM

#

left night Not that surprising I guess. The spinlock wasn't contended.

indeed since we're single threaded

left night Feb 11, 2024, 12:31 PM

#

do you have an Atomic<Option<u128>> or what is the null state?

#

just 0? :p

sturdy sequoia Feb 11, 2024, 12:31 PM

#

left night do you have an Atomic<Option<u128>> or what is the null state?

null state is 0 :-p

left night Feb 11, 2024, 12:32 PM

#

it should be fine I guess

sturdy sequoia Feb 11, 2024, 12:32 PM

#

Atomic<Option<u128>> isn't an acceptable type

#

for atomic because it doesn't impl some trait

left night Feb 11, 2024, 12:32 PM

#

we are relying on no collisions after all

feral imp Feb 11, 2024, 12:32 PM

#

sly pecan I'm the guy who just throws out dumb ideas to see if anything sticks 🙂

And below you, I'm the guy who throws a reference based on keywords in conversation. Powerful team 😛

left night Feb 11, 2024, 12:32 PM

#

and that'd be just another collision

sturdy sequoia Feb 11, 2024, 12:32 PM

#

left night we are relying on no collisions after all

worst case here is that it will rehash every time

#

which isn't too terrible

left night Feb 11, 2024, 12:32 PM

#

true

sturdy sequoia Feb 11, 2024, 12:32 PM

#

since I would expect a hash of 0 to be rare

#

like 1/2^128 levels of rare 😂
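
A small sketch of the "null state is 0" behaviour, assuming a single-threaded `Cell<u128>` in place of the atomic discussed above. `ZeroSentinel` and `get_or_compute` are illustrative names only. It shows that a hash that happens to be 0 is never cached, so it is recomputed on every access but still returned correctly.

```rust
use std::cell::Cell;

/// Caches a 128-bit hash next to a value, using 0 as the "not computed" state.
/// Single-threaded sketch: `Cell` stands in for the atomic used in the real code.
pub struct ZeroSentinel<T> {
    value: T,
    hash: Cell<u128>,
}

impl<T> ZeroSentinel<T> {
    pub fn new(value: T) -> Self {
        Self { value, hash: Cell::new(0) }
    }

    /// Returns the cached hash, computing it with `hasher` if the slot is 0.
    pub fn get_or_compute(&self, hasher: impl Fn(&T) -> u128) -> u128 {
        let cached = self.hash.get();
        if cached != 0 {
            return cached;
        }
        // Either nothing was cached yet, or the real hash is 0. In the
        // latter case we simply recompute every time.
        let fresh = hasher(&self.value);
        self.hash.set(fresh);
        fresh
    }
}
```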

#

expecting an "ackchually" moment from our resident mathematicians

left night Feb 11, 2024, 12:36 PM

#

sturdy sequoia @left night two atomics are slower 😦

have you tried hashing just T yet?

sturdy sequoia Feb 11, 2024, 12:36 PM

#

left night have you tried hashing just T yet?

like... no prehashing at all?

left night Feb 11, 2024, 12:36 PM

#

no like not lazy hashing location etc

#

only the element itself

sturdy sequoia Feb 11, 2024, 12:36 PM

#

yes, the two atomics are just T

left night Feb 11, 2024, 12:37 PM

#

the two atomics?

sturdy sequoia Feb 11, 2024, 12:37 PM

#

left night the two atomics?

the version with two atomics is just T being hashed

#

the LazyHash approach *

#

left night Feb 11, 2024, 12:38 PM

#

okay, so this was faster than all of Inner being wrapped in LazyHash?

sturdy sequoia Feb 11, 2024, 12:39 PM

#

left night okay, so this was faster than all of Inner being wrapped in LazyHash?

well using two atomics is slow as heck

#

I'll try doing a single Atomic<u128> and then test both

left night Feb 11, 2024, 12:39 PM

#

what I'm talking about should be orthogonal to one vs two atomics

sturdy sequoia Feb 11, 2024, 12:40 PM

#

left night what I'm talking about should be orthogonal to one vs two atomics

yes I am testing all four cases

#

but that's bound to take some time since release builds take 2m30s 💀

left night Feb 11, 2024, 12:41 PM

#

I know, it sucks

sturdy sequoia Feb 11, 2024, 12:42 PM

#

left night I know, it sucks

btw have you merged the fix by @cunning wadi for watch

#

because it's a really annoying bug 💀

left night Feb 11, 2024, 12:47 PM

#

sturdy sequoia btw have you merged the fix by @cunning wadi for watch

I will merge it soon. I wanted to test it again on my system before merging.

sturdy sequoia Feb 11, 2024, 12:54 PM

#

@left night I can confirm that a single atomic/spinlock is faster than two atomics, but whether it's just T or T + location, etc. doesn't matter much, it seems to me like in watch, T + other fields is slightly faster

left night Feb 11, 2024, 1:04 PM

#

sturdy sequoia @left night I can confirm that a single atomic/spinlock is faster than...

okay

ornate merlin Feb 11, 2024, 2:08 PM

#

@sturdy sequoia I'm just curious. What about using salsa in the typst?

https://rustc-dev-guide.rust-lang.org/salsa.html

Salsa is a library for incremental recomputation.

Salsa - Rust Compiler Development Guide

A guide to developing the Rust compiler (rustc)

feral imp Feb 11, 2024, 2:26 PM

#

ornate merlin @sturdy sequoia I'm just curious. What about using `salsa` in the typst? ...

This was discussed ages ago. Something, something rust analyzer is not librarified enough.

#

I actually don't remember.. But it was considered deeply.

left night Feb 11, 2024, 2:44 PM

#

ornate merlin @sturdy sequoia I'm just curious. What about using `salsa` in the typst? ...

I had looked at salsa before building comemo, but ultimately didn't feel like it would work well for Typst. But they are ultimately at the same level of abstraction, so we are already more or less doing what they do.

sturdy sequoia Feb 11, 2024, 4:10 PM

#

left night I had looked at salsa before building comemo, but ultimately didn't feel like it...

It seems to me (through a quick read-through) that they support slightly more advanced features, but I don't think they'd gain us anything significant

left night Feb 11, 2024, 9:26 PM

#

sturdy sequoia It seems to me (through a quick read-through) that they support slightly more ad...

It's just a different design. It is more advanced in some aspects, but also more rigid in others.

sturdy sequoia Feb 18, 2024, 2:33 PM

#

https://github.com/typst/typst/pull/3451

GitHub

Added `LazyHash` by Dherse · Pull Request #3451 · typst/typst

Adds a new LazyHash that mostly replaces PreHashed, it essentially performs the same work: caching the hash, but performs it when the structure is first hashed, using atomics to store the result. I...

#

There you go @left night as we discussed the lazy hash thingy

lavish tree Feb 18, 2024, 7:55 PM

#

I am a little bit curious whether 128bit CAS can be used to atomically update the hash without lock, which should be available in most hardware

sturdy sequoia Feb 19, 2024, 12:07 AM

#

lavish tree I am a little bit curious whether 128bit CAS can be used to atomically update th...

According to the atomic crate, 128-bit CAS is only available on AARCH64, make of that what you will 😦

lavish tree Feb 19, 2024, 12:08 AM

#

Shouldn’t be

#

https://github.com/taiki-e/portable-atomic/blob/HEAD/src/imp/atomic128/README.md

GitHub

portable-atomic/src/imp/atomic128/README.md at 2e0bfd4b676aef185591...

Portable atomic types including support for 128-bit atomics, atomic float, etc. - taiki-e/portable-atomic

sturdy sequoia Feb 19, 2024, 12:08 AM

#

lavish tree Shouldn’t be

I did check in the core doc for rust, and it only has u128 CAS on ARM too

#

Perhaps it would require target-cpu=native on x86 that's why it's not supported by default?

#

or it requires nightly bing_shrug

lavish tree Feb 19, 2024, 12:09 AM

#

I am not so sure but I believe 128bit cas is very important (because of the ABA problem) so most recent arch should support it

lavish tree Feb 19, 2024, 12:09 AM

#

sturdy sequoia or it requires nightly bing_shrug

That can be true

#

I think the core lib doesn’t implement this

lavish tree Feb 19, 2024, 12:10 AM

#

lavish tree https://github.com/taiki-e/portable-atomic/blob/HEAD/src/imp/atomic128/README.md

Maybe this might be a better crate to use as we only need 128bit atomic

sturdy sequoia Feb 19, 2024, 12:11 AM

#

Indeed, lemme swap it and test it 😉

lavish tree Feb 19, 2024, 12:11 AM

#

Sweet

#

They also indicate that 128bit load and store can be supported via avx

#

So we don’t need a cas

#

On the other hand, I am very curious how likely we are going to have race condition?

sturdy sequoia Feb 19, 2024, 12:16 AM

#

That is indeed quite a bit faster

sturdy sequoia Feb 19, 2024, 12:16 AM

#

lavish tree On the other hand, I am very curious how likely we are going to have race condit...

zero atm since we don't use multithreading in the "engine" so-to-speak

#

but we will eventually

lavish tree Feb 19, 2024, 12:16 AM

#

Oh nice to hear

sturdy sequoia Feb 19, 2024, 12:16 AM

#

we already parallelized comemo fully to make this possible in the future

#

but we have (so far) decided that in such cases, racing is okay: at worst you're just doing the same computation twice, and it's likely that the cost of additional sync would be higher than the gains from it

lavish tree Feb 19, 2024, 12:17 AM

#

I was a little bit curious whether we really need atomic or not🤔

#

Yah I would like to see multithreading, and your work there really interests me!

sturdy sequoia Feb 19, 2024, 12:17 AM

#

lavish tree I was a little bit curious whether we really need atomic or not🤔

I mean typst could mostly be done with only Rc and RefCell and stuff like that, but it limits this multithreaded work

lavish tree Feb 19, 2024, 12:18 AM

#

lavish tree I was a little bit curious whether we really need atomic or not🤔

Because hash should be constant for every instance

sturdy sequoia Feb 19, 2024, 12:18 AM

#

lavish tree Yah I would like to see multithreaded and your work there really interest me!

actually @left night has a multithreaded branch

sturdy sequoia Feb 19, 2024, 12:18 AM

#

lavish tree Because hash should be constant for every instance

well yes, but here we want to compute it only once for the value

lavish tree Feb 19, 2024, 12:18 AM

#

Very nice

sturdy sequoia Feb 19, 2024, 12:18 AM

#

that's the trick, it saves tons of computation on nested structures (which happen a lot in typst)

lavish tree Feb 19, 2024, 12:18 AM

#

sturdy sequoia well yes, but here we want to compute it only once for the value

Yah I am just thinking whether we can use optimistic concurrency control there to avoid the atomic

sturdy sequoia Feb 19, 2024, 12:19 AM

#

lavish tree Yah I am just thinking whether we can perform an optimistic concurrent control t...

I mean technically it's true that racing doesn't matter, the issue is more the possibility of the structure having been modified by another thread

#

and therefore the hash having been reset

lavish tree Feb 19, 2024, 12:19 AM

#

Oh it can be reset?

#

Then never mind it would make things much harder

#

If the hash is only computed once and never changed then I think it might be useful to consider that

sturdy sequoia Feb 19, 2024, 12:20 AM

#

lavish tree If the hash is only computed once and never changed then I think it might be use...

yes, that was the previous Prehashed which just stored a u128

#

now the idea is to still hash on-demand, but store the hash anyway

lavish tree Feb 19, 2024, 12:21 AM

#

No I mean once it is stored why changed?

sturdy sequoia Feb 19, 2024, 12:21 AM

#

I haven't tested, but I wonder how a RwLock<u128> performs too 🤔

lavish tree Feb 19, 2024, 12:21 AM

#

Might be very expensive

sturdy sequoia Feb 19, 2024, 12:21 AM

#

lavish tree No I mean once it is stored why changed?

In case the value being hashed is changed

#

like a Content to assign its Location, stuff like that

lavish tree Feb 19, 2024, 12:21 AM

#

Ah ok

#

I thought types avoid mutable stuff to make things easier

lavish tree Feb 19, 2024, 12:23 AM

#

sturdy sequoia I haven't tested, but I wonder how a `RwLock<u128>` performs too 🤔

It seems to provide more than what we need🤔

#

Well wait if a content is changing then atomic hash is not gonna help?

#

We need to hold the things with a lock (maybe rwlock) to protect the potential that hash doesn’t match the content?

sturdy sequoia Feb 19, 2024, 12:26 AM

#

lavish tree We need to hold the things with a lock (maybe rwlock) to protect the potential t...

the hash gets atomically reset when the value is modified, i.e. when the DerefMut impl is called

lavish tree Feb 19, 2024, 12:26 AM

#

Yes but if multithreaded

sturdy sequoia Feb 19, 2024, 12:27 AM

#

lavish tree Yes but if multithreaded

in that case yes, the inner value would be held in some lock

#

the problem is that when Hash::hash is called you don't have a mutable reference

#

therefore you need to use interior mutability

lavish tree Feb 19, 2024, 12:27 AM

#

Another thread can still call it and compute the hash even after reset but before the content change available to it

sturdy sequoia Feb 19, 2024, 12:27 AM

#

in addition, the value might have already been modified and it just needs to be hashed on another thread

sturdy sequoia Feb 19, 2024, 12:27 AM

#

lavish tree Another thread can still call it and compute the hash even after reset but befor...

no, because you'd have a lock around the LazyHash<T>

#

which means if you can mutate it, you can't also have a reference to the hash inside of it
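
A rough sketch of the scheme described in this exchange: the hash is cached through interior mutability because `Hash::hash` only receives `&self`, and it is reset whenever `DerefMut` hands out a mutable reference. This is not the code from the PR. It assumes `std::sync::RwLock` instead of `parking_lot::RwLock` so that it has no dependencies, and `hash128` is a stand-in that glues two seeded 64-bit hashes together rather than a real 128-bit hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::{Deref, DerefMut};
use std::sync::RwLock;

/// Stand-in 128-bit hash (assumption: the real code uses a proper 128-bit hasher).
pub fn hash128<T: Hash + ?Sized>(value: &T) -> u128 {
    let mut lo = DefaultHasher::new();
    0u8.hash(&mut lo);
    value.hash(&mut lo);
    let mut hi = DefaultHasher::new();
    1u8.hash(&mut hi);
    value.hash(&mut hi);
    ((hi.finish() as u128) << 64) | (lo.finish() as u128)
}

/// Wraps a value and caches its hash on first use; 0 means "not computed".
pub struct LazyHash<T> {
    hash: RwLock<u128>,
    value: T,
}

impl<T: Hash> LazyHash<T> {
    pub fn new(value: T) -> Self {
        Self { hash: RwLock::new(0), value }
    }

    /// Returns the cached hash or computes and stores it. Two racing threads
    /// may both compute it, but they store the same result.
    pub fn load_or_compute(&self) -> u128 {
        let cached = *self.hash.read().unwrap();
        if cached != 0 {
            return cached;
        }
        let fresh = hash128(&self.value);
        *self.hash.write().unwrap() = fresh;
        fresh
    }
}

impl<T: Hash> Hash for LazyHash<T> {
    // Only `&self` is available here, hence the lock around the cached hash.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u128(self.load_or_compute());
    }
}

impl<T> Deref for LazyHash<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T> DerefMut for LazyHash<T> {
    // `&mut self` proves exclusive access, so the cache can be reset
    // without taking the lock.
    fn deref_mut(&mut self) -> &mut T {
        *self.hash.get_mut().unwrap() = 0;
        &mut self.value
    }
}
```

Because `deref_mut` needs `&mut self`, the borrow checker already rules out another reference reading the cached hash while the value is being changed, which is the point made in the messages above.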

lavish tree Feb 19, 2024, 12:28 AM

#

Ah ok

#

I didn’t really think about it in the way Rust allows

#

That’s interesting

#

Then I don’t really think we need an atomic128

sturdy sequoia Feb 19, 2024, 12:29 AM

#

lavish tree Then I don’t really think we need an atomic128

yes indeed, it appears that using a parking_lot::RwLock is about as fast 😉

lavish tree Feb 19, 2024, 12:29 AM

#

I guess they implement some very lightweight rwlock

sturdy sequoia Feb 19, 2024, 12:29 AM

#

And doesn't introduce an additional dependency which I like

sturdy sequoia Feb 19, 2024, 12:29 AM

#

lavish tree I guess they implement some very lightweight rwlock

you have to think of it more as being much lighter than actually re-computing the hash 😂

lavish tree Feb 19, 2024, 12:30 AM

#

What would be the performance of a plain u128? As we don’t have multithreading yet

#

Haha of course

sturdy sequoia Feb 19, 2024, 12:30 AM

#

lavish tree What would be the performance of a plain u128? As we don’t have multithreading yet

can't do that since we need mutable access from an immutable reference

lavish tree Feb 19, 2024, 12:30 AM

#

Well, a little bit of unsafe?

sturdy sequoia Feb 19, 2024, 12:30 AM

#

lavish tree Well, a little bit of unsafe?

https://tenor.com/view/no-nooo-nope-eat-fingerwag-gif-6587305437700023105

Tenor

lavish tree Feb 19, 2024, 12:30 AM

#

Just for testing

sturdy sequoia Feb 19, 2024, 12:31 AM

#

lavish tree Just for testing

Fine, but just for you uwu

lavish tree Feb 19, 2024, 12:33 AM

#

Would it be helpful if only one thread is computing the hash and other just wait? But probably computing the hash is fast enough that cost of synchronization is higher

#

But we can’t benchmark without a real multithreaded model🫨

#

Maybe we can gain a lot as this is definitely a hot path🤔

sturdy sequoia Feb 19, 2024, 12:36 AM

#

Well it doesn't even compile using unsafe 😂

#

or I'd need to use an UnsafeCell

lavish tree Feb 19, 2024, 12:37 AM

#

Yes

#

Too bad syncunsafecell is not available in stable

#

You would also need to unsafe impl Sync and Send I suppose😂

lavish tree Feb 19, 2024, 12:40 AM

#

sturdy sequoia yes indeed, it appears that using a `parking_lot::RwLock` is about as fast 😉

Oh might because we have zero contention

#

The rwlock is just as cheap as an u64 atomic read/write😂

#

Even mutex should perform the same I believe

sturdy sequoia Feb 19, 2024, 12:42 AM

#

Using a raw u128 in an UnsafeCell basically doesn't gain anything 🤷‍♂️

#

shrugging

lavish tree Feb 19, 2024, 12:43 AM

#

Thanks for testing hh maybe atomic is cheaper than I thought

sturdy sequoia Feb 19, 2024, 12:43 AM

#

lavish tree Thanks for testing hh maybe atomic is cheaper than I thought

atomics on a modern x86 CPU should be relatively cheap in the grand scheme of things

lavish tree Feb 19, 2024, 12:44 AM

#

I am not an expert in arch, though I should be😪

#

Yeah should be

#

Especially without contention

sturdy sequoia Feb 19, 2024, 12:44 AM

#

lavish tree Especially without contention

exactly

lavish tree Feb 19, 2024, 12:46 AM

#

sturdy sequoia That is indeed quite a bit faster

But why can we gain this🤔

#

Shouldn’t the atomic crate use a very lightweight lock

#

Maybe just some error on the benchmark

sturdy sequoia Feb 19, 2024, 12:48 AM

#

lavish tree Shouldn’t the atomic crate use a very lightweight lock

my guess is that the exact implementation it uses differs and is somehow slower?

#

This is likely very microarchitecture dependent

lavish tree Feb 19, 2024, 12:48 AM

#

Probably

sturdy sequoia Feb 19, 2024, 12:49 AM

#

I know that my other machine with a 12900k performs very differently in typst mostly due to lower memory latency than my main machine (7950x3d)

lavish tree Feb 19, 2024, 12:52 AM

#

The atomic crate caches a bunch of spin locks

#

Well no idea😪

sturdy sequoia Feb 19, 2024, 1:00 AM

#

lavish tree The atomic crate caches a bunch of spin locks

yes

lavish tree Feb 19, 2024, 1:01 AM

#

I have no idea how parking lot can be faster

sturdy sequoia Feb 19, 2024, 1:01 AM

#

lavish tree I have no idea how parking lot can be faster

parking lot is just stupidly well optimized 😂

lavish tree Feb 19, 2024, 1:02 AM

#

True😂

sturdy sequoia Feb 19, 2024, 1:02 AM

#

lavish tree True😂

https://github.com/typst/typst/pull/3451#issuecomment-1951524560

GitHub

Added `LazyHash` by Dherse · Pull Request #3451 · typst/typst

Adds a new LazyHash that mostly replaces PreHashed, it essentially performs the same work: caching the hash, but performs it when the structure is first hashed, using a RwLock to store the ...

#

I updated the PR 😉

lavish tree Feb 19, 2024, 1:02 AM

#

I see they have a dark magic of using the elision but I have no idea about that

#

Sweet

sturdy sequoia Feb 19, 2024, 1:02 AM

#

Anyway, I'm off to bed it's 2AM here 😭

lavish tree Feb 19, 2024, 1:03 AM

#

Oh good night

lavish tree Feb 19, 2024, 1:22 AM

#

I found it, absolutely crazily optimized

#

https://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions

Transactional Synchronization Extensions

Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speeding up execution of multi-threaded software through lock elision. According to different benchmarks, T...

#

that suggests parking_lot is probably as fast as a plain u128

tight glade Feb 19, 2024, 7:39 AM

#

lavish tree I thought types avoid mutable stuff to make things easier

I'm also curious about this 😇

sturdy sequoia Feb 19, 2024, 10:55 AM

#

lavish tree https://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions

lol Intel had to disable it all the way to skylake 😂😂😂😂

#

And it was removed in comet lake from every sku sad

sly pecan Feb 19, 2024, 11:13 AM

#

sturdy sequoia lol Intel had to disable it all the way to skylake 😂😂😂😂

tsx was a security nightmare

sturdy sequoia Feb 19, 2024, 11:18 AM

#

sly pecan tsx was a security nightmare

well I guess it makes a lot of sense on a machine that only runs tasks that you have verified, but I can see how it would be a nightmare on a regular PC

lavish tree Feb 19, 2024, 3:47 PM

#

sturdy sequoia lol Intel had to disable it all the way to skylake 😂😂😂😂

lol

lavish tree Feb 19, 2024, 3:49 PM

#

tight glade I'm also curious about this 😇

Actually @sturdy sequoia said that it is verified by the single mutable reference rule in rust, so I suppose it’s almost like immutable

lavish tree Feb 19, 2024, 4:02 PM

#

sly pecan tsx was a security nightmare

Do you know whether it is the issue with transactional memory in general or just intels issue?🤔

sly pecan Feb 19, 2024, 4:02 PM

#

lavish tree Do you know whether it is the issue with transactional memory in general or just...

I think Intel is the only one that pursued this. At least wikipedia says that AMD abandoned their corresponding tech

#

https://en.wikipedia.org/wiki/Advanced_Synchronization_Facility

Advanced Synchronization Facility

Advanced Synchronization Facility (ASF) is a proposed extension to the x86-64 instruction set architecture that adds hardware transactional memory support. It was introduced by AMD; the latest specification was dated March 2009. As of October 2013, it was still in the proposal stage. No released microprocessors implement the extension.

lavish tree Feb 19, 2024, 4:10 PM

#

I think that arm also have such extension

atomic violet Feb 19, 2024, 4:22 PM

#

From what my prof told me, transactional memory is a miracle everyone wants, but no one can implement it without introducing terrible SPECTRE-like vulnerabilities. He does not know whether it's just an implementation issue or a deeply-rooted problem, i.e. whether safe transactional memory is conceptually impossible. But at least that's the reason why both AMD and Intel abandoned it.

lavish tree Feb 19, 2024, 4:24 PM

#

ah good to learn, are you an arch student?

atomic violet Feb 19, 2024, 4:24 PM

#

just comp sci with an arch course

lavish tree Feb 19, 2024, 4:24 PM

#

curious will the one for arm expose such vulnerability hh? https://developer.arm.com/documentation/102873/0100/Overview

lavish tree Feb 19, 2024, 4:25 PM

#

atomic violet just comp sci with an arch course

very cool I was planning to wait one prof for the arch course but not successful and will graduate soon 😂

feral imp Feb 19, 2024, 4:27 PM

#

lavish tree very cool I was planning to wait one prof for the arch course but not successful...

I had to take OS and an arch course to qualify for a computer science master's programme, and for those reasons, I noped out and took a degree in mathematical statistics instead.

lavish tree Feb 19, 2024, 4:27 PM

#

oh man OS is fun!

feral imp Feb 19, 2024, 4:27 PM

#

Obviously off-topic, so we can talk about that in #off-topic. Sorry, I did it again..

lavish tree Feb 19, 2024, 4:27 PM

#

oops sorry yes

sturdy sequoia Feb 19, 2024, 4:27 PM

#

sly pecan I think Intel is the only one that pursued this. At least wikipedia says that AM...

I mean, IBM and a few others also support it

#

not on x86 mind you

#

on their POWER architecture

sly pecan Feb 19, 2024, 4:28 PM

#

does IBM even make anything other than mainframes at this point?

sturdy sequoia Feb 19, 2024, 4:28 PM

#

sly pecan does IBM even make anything other than mainframes at this point?

yes, their POWER 9 is available in servers angryeyes

#

Don't talk down to my darling

#

one day I'll have one of their CPUs to test

#

uwu

atomic violet Feb 19, 2024, 4:29 PM

#

tbf transactional memory is good if you trust the code you are running

feral imp Feb 19, 2024, 4:30 PM

#

POWER arch wasn't that Mac in the 00s?

sly pecan Feb 19, 2024, 4:42 PM

#

yes

sly pecan Feb 19, 2024, 4:44 PM

#

sturdy sequoia yes, their POWER 9 is available in servers angryeyes

Presumably https://www.ibm.com/z ?

IBM Z Mainframe Servers and Software

IBM Z mainframe servers and software deliver secure, reliable, and fast IT infrastructure for digital transformation.

sturdy sequoia Feb 19, 2024, 4:50 PM

#

sly pecan Presumably https://www.ibm.com/z ?

yeah basically

sly pecan Feb 19, 2024, 4:50 PM

#

Those are mainframes though, IN YOUR FACE

#

🙂

sturdy sequoia Feb 19, 2024, 4:50 PM

#

sly pecan Those are mainframes though, IN YOUR FACE

THEY ARE IN SERVER FORMATS

#

angryeyes

#

https://tenor.com/view/p1ture-gif-20727532

Tenor

sly pecan Feb 19, 2024, 4:50 PM

#

Anyway, it wasn't meant in a negative way

sturdy sequoia Feb 19, 2024, 4:51 PM

#

https://www.ibm.com/servers

Enterprise Business Server Solutions | IBM

Enterprise servers built to handle mission-critical workloads while maintaining security, reliability and control of your entire IT infrastructure.

sly pecan Feb 19, 2024, 4:51 PM

#

sturdy sequoia Feb 19, 2024, 4:58 PM

#

sly pecan

Well well well

glad urchin Feb 19, 2024, 5:23 PM

#

sly pecan

I told you not to download the transactional memory CPU extensions

#

Anyway, the damage is done now... It cannot be stopped

feral imp Feb 19, 2024, 5:24 PM

#

glad urchin I told you not to download the transactional memory CPU extensions

This is my type of humor. Absolutely hilarious.

#

So @sturdy sequoia, what's your immediate intentions with the beloved VM?

sturdy sequoia Feb 19, 2024, 5:27 PM

#

feral imp So @sturdy sequoia, what's your immediate intentions with the beloved VM?

First let's get LazyHash merged for the nice bump in perf

#

then I think it's pretty much ready

#

not merge ready, but the perf is in a good place!

glad urchin Feb 19, 2024, 5:27 PM

#

Will 10000 tables compile in 0 seconds now?

#

(Note: 0.1ms is not acceptable)

feral imp Feb 19, 2024, 5:32 PM

#

sturdy sequoia First let's get `LazyHash` merged for the nice bump in perf

Cute little PR.

sturdy sequoia Feb 19, 2024, 5:33 PM

#

feral imp Cute little PR.

ikr

sturdy sequoia Feb 19, 2024, 5:33 PM

#

glad urchin Will 10000 tables compile in 0 seconds now?

chadNO

glad urchin Feb 19, 2024, 5:34 PM

#

sturdy sequoia chadNO

Sad

#

Also I may have made a confusion in #discussions

#

I think once I said 10000 tables of 5000 cells compiled in 1 second

#

But that's actually a separate benchmark

#

I think it was like 400 tables or so

#

lol

#

10000 i think takes 30s or so

#

:p

#

But yeah after your PR I'm sure we're getting to 1s 👍

#

Trust

sturdy sequoia Feb 19, 2024, 5:47 PM

#

glad urchin But yeah after your PR I'm sure we're getting to 1s 👍

Héhé no

sly pecan Feb 19, 2024, 5:48 PM

#

sturdy sequoia Héhé no

Is héhé French laughter?

sturdy sequoia Feb 19, 2024, 5:50 PM

#

sly pecan Is héhé French laughter?

héhé

#

è_é

sturdy sequoia Feb 19, 2024, 6:00 PM

#

sly pecan Is héhé French laughter?

it's more of an evil laugh

atomic violet Feb 19, 2024, 6:22 PM

#

sly pecan

POV: you are not using typst VM yet

sturdy sequoia Feb 19, 2024, 6:23 PM

#

https://tenor.com/view/dog-husky-soon-gif-7801078

Tenor

atomic violet Feb 19, 2024, 6:23 PM

#

I have a really dumb optimization idea: every time document recompiles, choose like 80% of the functions and don't memoize them

sturdy sequoia Feb 19, 2024, 6:24 PM

#

atomic violet I have a really dumb optimization idea: every time document recompiles, choose l...

that's just plain cursed lol

#

I think having an estimate of the "size" of the input and the "complexity" of the function would be smarter and clearly would just work™️

atomic violet Feb 19, 2024, 6:26 PM

#

motivation is that

20% will still memoize something, and maybe something worth memoizing
If a call is worth memoizing, it is expected to get memoized in 5 iterations anyway
Who cares about the first few recompilations anyway, when typst-preview does that 3 times a second?
It decreases memory usage and potentially causes less hashing
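
The "5 iterations" figure can be checked with a short calculation, assuming each recompilation independently picks a given call for memoization with probability 0.2. That independence is an assumption, since the message does not say how the 20% would be drawn, and the function names are illustrative.

```rust
/// Expected number of recompilations until a call is first picked for
/// memoization, if each recompilation picks it with probability `p`
/// (mean of a geometric distribution).
pub fn expected_recompilations(p: f64) -> f64 {
    1.0 / p
}

/// Probability that the call has been picked at least once within `n`
/// recompilations: 1 - (1 - p)^n.
pub fn memoized_within(p: f64, n: u32) -> f64 {
    1.0 - (1.0 - p).powi(n as i32)
}
```

Under that assumption the expected wait is 5 recompilations, while the chance of having been picked within the first 5 is about 67%, so a fair share of calls would take longer.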

feral imp Feb 19, 2024, 6:38 PM

#

I think having function attribute in typst to opt out of memo is a powerful concept.

sturdy sequoia Feb 19, 2024, 8:58 PM

#

feral imp I think having function attribute in typst to opt out of memo is a powerful conc...

yes, better than what @atomic violet is suggesting

#

although I wouldn't mind some of whatever he's smoking 😂

atomic violet Feb 19, 2024, 8:58 PM

#

https://tenor.com/view/disappointment-gif-21343121


sturdy sequoia Feb 19, 2024, 8:59 PM

#

atomic violet https://tenor.com/view/disappointment-gif-21343121

it ain't my fault 😭

feral imp Feb 19, 2024, 9:01 PM

#

Heuristics heuristics are truly powerful concept.

#

Basically, I'm implemented by a collection of heuristics on top of each other.

sturdy sequoia Feb 19, 2024, 9:32 PM

#

feral imp Heuristics heuristics are truly powerful concept.

Well if you want to make a heuristic for when memoization is worth it, go ahead 😂

#

I am thinking of a simple threshold based system, like if it takes over say 100µs, it gets memoized

#

with that threshold being adjustable of course

#

I think this could greatly reduce memory size and reduce hashing, what do you think @left night

#

Guess it wouldn't reduce hashing actually

#

since you'd still need to check whether it's in the cache

#

So it's not worth it

#

angrythunk
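A minimal sketch of the threshold idea discussed above, assuming a hypothetical `ThresholdCache` keyed by a 64-bit hash of the input (this is not how comemo works, and a real cache would also have to deal with hash collisions). It is only meant to show why the hashing does not go away: the input is hashed on every call to look it up, and the threshold only decides whether the result gets stored.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Hypothetical cache that only stores results of calls that took at
/// least `threshold` to compute.
struct ThresholdCache<V> {
    threshold: Duration,
    entries: HashMap<u64, V>,
    /// Number of times an input was hashed, to make that cost visible.
    hashes: usize,
}

impl<V: Clone> ThresholdCache<V> {
    fn new(threshold: Duration) -> Self {
        Self { threshold, entries: HashMap::new(), hashes: 0 }
    }

    fn call<K: Hash>(&mut self, input: &K, f: impl FnOnce(&K) -> V) -> V {
        // The lookup needs the hash on every call, cached or not.
        let mut hasher = DefaultHasher::new();
        input.hash(&mut hasher);
        let key = hasher.finish();
        self.hashes += 1;

        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }

        let start = Instant::now();
        let output = f(input);
        // Only calls at or above the threshold are stored.
        if start.elapsed() >= self.threshold {
            self.entries.insert(key, output.clone());
        }
        output
    }
}
```

With a very high threshold nothing is ever stored, so memory stays flat, but `hashes` still grows by one per call.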

feral imp Feb 19, 2024, 9:40 PM

#

Can you count instructions?

sturdy sequoia Feb 19, 2024, 9:40 PM

#

feral imp Can you count instructions?

real ones or vm ones?

feral imp Feb 19, 2024, 9:41 PM

#

Either would do for determining memo potential?

#

🤔

low sapphire Feb 19, 2024, 9:45 PM

#

sturdy sequoia yes, their POWER 9 is available in servers :angryeyes:

I have a power9 server at home as well!!!!

#

well I mostly used it as a desktop :^)

untold turret Feb 19, 2024, 11:17 PM

#

sturdy sequoia I am thinking of a simple threshold based system, like if it takes over say 100µ...

how do we measure time?

sturdy sequoia Feb 19, 2024, 11:18 PM

#

untold turret how do we measure time?

yes, that's the issue we can't do it either way

#

there is no good way, we must memoize from the get go

untold turret Feb 19, 2024, 11:19 PM

#

I think we can measure the instruction cost like https://llvm.org/docs/CommandGuide/llvm-mca.html

#

And always memoize recursive functions

sturdy sequoia Feb 19, 2024, 11:36 PM

#

untold turret And always memoize recursive functions

Problem is we can't really do that for typst functions (i.e functions written in typst) 😂

ornate merlin Feb 20, 2024, 3:58 AM

#

sturdy sequoia there is no good way, we must memoize from the get go

Maybe try not caching some certain functions? For example functions with locate, etc. That is, those that are likely to always be called with different inputs?

left night Feb 20, 2024, 7:53 AM

#

sturdy sequoia Guess it wouldn't reduce hashing actually

#1176509648707256370 message

glossy shore Feb 20, 2024, 9:36 AM

#

atomic violet motivation is that 1. 20% will still memoize *something*, and maybe something wo...

in 5 iterations you would expect 67% of functions to be memoised following that scheme, not 100%

atomic violet Feb 20, 2024, 9:39 AM

#

well it's going to be 97% in 15

#

not 100%, sure, but you don't need 100% either
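The percentages above follow from treating each recompilation as an independent draw: if a function is picked for memoization with probability p each time, the chance it was picked at least once in n recompilations is 1 - (1 - p)^n. A small sketch of that arithmetic (the function names are made up for illustration):

```rust
/// Probability that a function was picked for memoization at least once
/// in `n` recompilations, if each recompilation independently picks it
/// with probability `p`.
fn memoized_at_least_once(p: f64, n: u32) -> f64 {
    1.0 - (1.0 - p).powi(n as i32)
}

/// Smallest number of recompilations needed to reach `target` coverage.
/// Assumes `p > 0` and `target < 1`, otherwise this never terminates.
fn recompiles_for(p: f64, target: f64) -> u32 {
    let mut n = 0;
    while memoized_at_least_once(p, n) < target {
        n += 1;
    }
    n
}
```

For p = 0.2 this gives about 67% after 5 recompilations and about 96.5% after 15, in line with the figures quoted in the discussion.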

glossy shore Feb 20, 2024, 9:40 AM

#

unless you like, systematically do 20% of the total each recomp and never choose the same one twice

feral imp Feb 20, 2024, 10:14 AM

#

glossy shore in 5 iterations you would expect 67% of functions to be memoised following that ...

Thank you.

sturdy sequoia Feb 20, 2024, 10:47 AM

#

left night https://discord.com/channels/1054443721975922748/1176509648707256370/12062127569...

I had forgor about that 💀

#

But I might give it a whirl in the VM

sturdy sequoia Feb 20, 2024, 10:48 AM

#

feral imp Thank you.

The statistics gods are appeased... for now

left night Feb 20, 2024, 1:51 PM

#

@sturdy sequoia PR looks nice, just a few comments, in particular about the unsafe, which I think isn't necessary. I think once merged I will remove Prehashed from comemo. It is too opinionated for it, in particular also the PartialEq and Eq implementations.

sturdy sequoia Feb 20, 2024, 11:53 PM

#

left night @sturdy sequoia PR looks nice, just a few comments, in particular about th...

Nice!

#

@left night regarding that PR, I was wondering something, it's now trivial to make Array and Dict use a LazyHash (such that hashes are conserved), should we do that? 🤔
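For readers unfamiliar with the idea: a lazily hashed wrapper computes the hash of its contents on first use and reuses it until the value is mutated. Below is a minimal sketch using only the standard library; the name `LazyHashed` and its fields are hypothetical, and this is not the actual `LazyHash` from the PR.

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper that caches the hash of its contents.
struct LazyHashed<T> {
    value: T,
    /// `None` until the hash is first requested.
    cached: Cell<Option<u64>>,
}

impl<T: Hash> LazyHashed<T> {
    fn new(value: T) -> Self {
        Self { value, cached: Cell::new(None) }
    }

    /// Returns the cached hash, computing it on first use.
    fn hash64(&self) -> u64 {
        if let Some(h) = self.cached.get() {
            return h;
        }
        let mut hasher = DefaultHasher::new();
        self.value.hash(&mut hasher);
        let h = hasher.finish();
        self.cached.set(Some(h));
        h
    }

    /// Mutable access invalidates the cached hash.
    fn get_mut(&mut self) -> &mut T {
        self.cached.set(None);
        &mut self.value
    }
}

impl<T: Hash> Hash for LazyHashed<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed the cached 64-bit hash instead of re-walking the contents.
        state.write_u64(self.hash64());
    }
}
```

Note that the wrapper makes every value larger by the size of the cached hash, which is the kind of cost that matters when it sits inside a frequently used type.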

quaint blaze Feb 21, 2024, 7:16 AM

#

so what cursed performance shenanigans have been done since last release?

left night Feb 21, 2024, 7:49 AM

#

sturdy sequoia @left night regarding that PR, I was wondering something, it's now tri...

For Dict it's trivial but not for Array because of EcoVec and the memory size of Value

feral imp Feb 21, 2024, 8:38 AM

#

quaint blaze so what cursed performance shenanigans have been done since last release?

~~cursed~~ majestic LazyHash and few easy wins, followed by a massive VM that improves memory and performance of typst tremendously. The latter is not yet merged, and probably won't make it into next release. But it is in a state of glory atm.

Declaration: I'm not involved in any of the work, think of me as a correspondent

atomic violet Feb 23, 2024, 5:38 PM

#

https://github.com/python/cpython/blob/5e29021a5eb10baa9147fd977cab82fa3f652bf0/Python/ceval.c#L1269-L1305 interesting comment I found

#

I am afraid doing threaded interpreting may cover the code in the hell of unsafe, and actually won't be as effective because typst is simply not as well optimized as python, but it does not mean it should be disregarded entirely

sly pecan Feb 23, 2024, 5:42 PM

#

atomic violet I am afraid doing threaded interpreting may cover the code in the hell of unsafe...

isn't python a special case because of the GIL?

#

not sure if it's related

atomic violet Feb 23, 2024, 5:43 PM

#

sly pecan isn't python a special case because of the GIL?

I was afraid of someone saying that

#

"threaded code" is unrelated to multithreading, it is a way of writing switch statements

sly pecan Feb 23, 2024, 5:47 PM

#

atomic violet "threaded code" is unrelated to multithreading, it is a way of writing switch st...

The more you know
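To illustrate the distinction, here is a sketch of a tiny stack machine dispatched two ways. `run_switch` is the usual central `match` loop (the "switch" form); `run_table` looks each handler up in a table of function pointers, which is one approximation of threaded dispatch available in safe Rust. Real threaded code, as in the CPython comment, relies on computed gotos, which Rust does not expose. The opcodes and all names are made up for illustration.

```rust
/// Made-up bytecode for a tiny stack machine.
const PUSH: u8 = 0; // followed by one operand byte
const ADD: u8 = 1;
const MUL: u8 = 2;
const HALT: u8 = 3;

struct Machine {
    pc: usize,
    stack: Vec<i64>,
    halted: bool,
}

fn op_push(m: &mut Machine, code: &[u8]) {
    match code.get(m.pc) {
        Some(&v) => {
            m.stack.push(i64::from(v));
            m.pc += 1;
        }
        None => m.halted = true, // truncated program
    }
}

fn binary(m: &mut Machine, f: fn(i64, i64) -> i64) {
    match (m.stack.pop(), m.stack.pop()) {
        (Some(b), Some(a)) => m.stack.push(f(a, b)),
        _ => m.halted = true, // stack underflow
    }
}

fn op_add(m: &mut Machine, _code: &[u8]) { binary(m, |a, b| a + b) }
fn op_mul(m: &mut Machine, _code: &[u8]) { binary(m, |a, b| a * b) }
fn op_halt(m: &mut Machine, _code: &[u8]) { m.halted = true }

/// Switch-style dispatch: one central `match` inside the loop.
fn run_switch(code: &[u8]) -> Option<i64> {
    let mut m = Machine { pc: 0, stack: Vec::new(), halted: false };
    while !m.halted && m.pc < code.len() {
        let op = code[m.pc];
        m.pc += 1;
        match op {
            PUSH => op_push(&mut m, code),
            ADD => op_add(&mut m, code),
            MUL => op_mul(&mut m, code),
            HALT => op_halt(&mut m, code),
            _ => m.halted = true, // unknown opcode
        }
    }
    m.stack.last().copied()
}

type Handler = fn(&mut Machine, &[u8]);

/// Handlers indexed by opcode number.
static HANDLERS: [Handler; 4] = [op_push, op_add, op_mul, op_halt];

/// Table-style dispatch: the opcode indexes a table of function pointers.
fn run_table(code: &[u8]) -> Option<i64> {
    let mut m = Machine { pc: 0, stack: Vec::new(), halted: false };
    while !m.halted && m.pc < code.len() {
        let op = code[m.pc];
        m.pc += 1;
        match HANDLERS.get(usize::from(op)) {
            Some(handler) => handler(&mut m, code),
            None => m.halted = true, // unknown opcode
        }
    }
    m.stack.last().copied()
}
```

Both functions compute the same result for the same program; whether one is faster depends on what the compiler makes of each loop, so this is a starting point for measurement rather than a recommendation.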

atomic violet Feb 23, 2024, 5:50 PM

#

actually, looks like llvm may just do that without any unsafe hackery 🤔
https://github.com/ziglang/zig/issues/8220
https://internals.rust-lang.org/t/computed-gotos-tco-threaded-interpreters-experiments-and-findings/4668
so maybe the code is already compiled that way

#

very unlikely though, llvm probably saw the first bound check in the interpreter loop and went "uhm, yeah, I don't know what the hell the dumb human wrote here... yeah screw optimization, gotta go recompile that regex crate instead" 😂

sturdy sequoia Feb 23, 2024, 7:47 PM

#

atomic violet very unlikely though, llvm probably saw the first bound check in the interpreter...

Question is, how would I give it... more information to make it optimize smarter?

atomic violet Feb 23, 2024, 8:47 PM

#

sturdy sequoia Question is, how would I give it... more information to make it optimize smarter...

looks like "rewrite it in zig" is an answer 😔

#

seriously though, idk, compilers are hard guys to negotiate with

sturdy sequoia Feb 23, 2024, 9:04 PM

#

atomic violet looks like "rewrite it in zig" is *an* answer 😔

||N|| ||O||

#

||O||

sturdy sequoia Apr 5, 2024, 5:40 PM

#

ping

#

||what could this mean? 😎||

sly pecan Apr 5, 2024, 6:05 PM

#

what

feral imp Apr 6, 2024, 5:51 AM

#

https://tenor.com/view/im-so-excited-jonah-hill-shouts-gif-11193901


untold turret Apr 6, 2024, 6:03 AM

#

performance time 🚀

sly pecan Apr 8, 2024, 11:35 AM

#

#contributors message maybe this belongs here

feral imp Apr 8, 2024, 11:44 AM

#

Isn't it a big accuracy concern more than speed (but also speed)

sly pecan Apr 8, 2024, 11:48 AM

#

feral imp Isn't it a big accuracy concern more than speed (but also speed)

The difference in accuracy is likely extremely minimal

sturdy sequoia Apr 9, 2024, 1:54 PM

#

sly pecan https://discord.com/channels/1054443721975922748/1088371867913572452/12268461686...

I would be curious if the == E even works since you know... floats

#

I would argue the int vs float one likely makes sense, the others? probably not so much, would need to see how libc does it

left night Apr 9, 2024, 2:06 PM

#

sturdy sequoia I would be curious if the == E even works since you know... floats

it works if you literally pass calc.e and that's the whole purpose of it
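A sketch of the kind of special case being discussed, assuming a logarithm helper with an exact-equality fast path for base e. The names `log_with_base` and `takes_fast_path` are hypothetical and this is not claimed to match the Typst source.

```rust
use std::f64::consts::E;

/// Hypothetical logarithm with a fast path for base e.
fn log_with_base(x: f64, base: f64) -> f64 {
    if base == E {
        // Exact float equality: only taken when the caller passes the
        // very same constant, not a value that is merely close to e.
        x.ln()
    } else {
        x.log(base)
    }
}

/// Reports whether the fast path would be taken for `base`.
fn takes_fast_path(base: f64) -> bool {
    base == E
}
```

Passing the constant itself hits the fast path, while a hand-typed approximation such as `2.718281828` does not and simply falls back to the general formula, so the comparison is safe even though it looks suspicious for floats.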

sturdy sequoia Apr 9, 2024, 2:11 PM

#

left night it works if you literally pass `calc.e` and that's the whole purpose of it

I guess that makes sense

cunning wadi May 2, 2024, 10:34 PM

#

I just read part of this: https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/

#

This design might work quite well for typst

feral imp May 3, 2024, 5:27 AM

#

@sturdy sequoia VM Part II?

#

Subtitle: Stackless is Thankless

silk wedge May 4, 2024, 9:04 PM

#

cunning wadi I just read part of this: https://kyju.org/blog/piccolo-a-stackless-lua-interpre...

a stackless design could be helpful for catching mistakes leading to infinite loops (or, for that matter, someone trying to DoS typst.app)

#

not sure how useful being able to continue a piece of computation in Typst code would be, though

low sapphire May 4, 2024, 9:10 PM

#

silk wedge a stackless design could be helpful for catching mistakes leading to infinite lo...

Well, the Typst compilation happens in the browser via WASM.

sturdy sequoia May 5, 2024, 10:27 AM

#

silk wedge a stackless design could be helpful for catching mistakes leading to infinite lo...

my existing implementation is stackless 😉

#

it just has an infinite number of registers

#

I mean to work on it but between work and now this week which sucked ass (one of my doggo went to the great park in the sky 😢 ) it's been difficult to find the motivation and time to work on Typst 😦

lunar kettle May 5, 2024, 10:59 AM

#

Are u in the US now btw?

#

(Sorry kinda off topic 😂)

sturdy sequoia May 5, 2024, 11:05 AM

#

lunar kettle Are u in the US now btw?

no no, I ended up taking a job in Belgium, and it's great!

#

My boss is awesome, they're financing my PhD, and they're paying me really well (for a PhD candidate)

#

and I get to work from home 4 days a week 😎

west light May 5, 2024, 5:19 PM

#

sturdy sequoia My boss is awesome, they're financing my PhD, and they're paying me really well ...

Definitely not in the US then.

silk wedge May 6, 2024, 8:03 PM

#

I wish I had a job remotely as good as that

#

I’m comically introverted so 😦

left night May 22, 2024, 5:14 PM

#

@sturdy sequoia I feel a little bad asking this, but since the bytecode VM work seems very stalled, should we perhaps close the PR for the time being? We can always reopen it in the future. I'd just like to keep the PR tracker from growing and growing.

sturdy sequoia May 22, 2024, 10:06 PM

#

left night @sturdy sequoia I feel a little bad asking this, but since the bytecode VM...

I think that's fair no worries 😄

#

I'll work on it at some point, but motivation has been tough lately

sturdy sequoia May 23, 2024, 3:50 PM

#

You know what, I am doing it tonight, I need to work for 1-2 hours, but after that, it's VM time ❤️

#

Won't be done tonight

#

but I'll try and merge with upstream

#

(the existing code I meant)

feral imp May 26, 2024, 1:33 PM

#

How did it go?

sturdy sequoia May 27, 2024, 8:52 AM

#

feral imp How did it go?

Working on it ferrisUwU

#

I'll see if I can finish tonight after work

#

I walked the 20kms of Brussels yesterday, my feet are ✨ destroyed ✨

#

But, I had a significantly easier time than last year, so that's a good sign 😄

sly pecan May 27, 2024, 8:53 AM

#

sturdy sequoia I walked the 20kms of Brussels yesterday, my feet are ✨ destroyed ✨

Is this a race or something?

feral imp May 27, 2024, 8:55 AM

#

sturdy sequoia Working on it ferrisUwU

No rush. We just love hearing from you.

sturdy sequoia May 27, 2024, 8:59 AM

#

sly pecan Is this a race or something?

it's a semi marathon, but after runners there's a walking race too

cunning wadi May 27, 2024, 9:05 AM

#

sturdy sequoia I walked the 20kms of Brussels yesterday, my feet are ✨ destroyed ✨

Very nice!

sturdy sequoia May 27, 2024, 9:09 AM

#

cunning wadi Very nice!

Thanks ❤️

sturdy sequoia May 27, 2024, 11:43 AM

#

@left night I am thinking of splitting the VM into its own crate (i.e typst-vm), but this creates the issue of having external components, etc. The way I am thinking of doing it is by having a Vm<T: VmImpl> where VmImpl allows the creation of all the relevant values (a function to create a SpaceElem, etc.). Because adding all of the (fairly long) VM code into typst feels a bit ugly, what do you think?

#

Actually that really doesn't work 😦

left night May 27, 2024, 12:32 PM

#

sturdy sequoia @left night I am thinking of splitting the VM into its own crate (i.e ...

Did you envision typst depending on typst-vm or the other way around?

#

I think only typst-vm depending on typst is at all feasible since the VM will need access to many of the central data structures. I have thought a bit about how to shatter the crates in the future, and my conclusion was that we need most of the type definitions in a core crate, but could extract computations into separate crates (rustc works similarly) which are dynamically "linked" together through a table of function pointers produced by one thin top-level crate that depends on the core and all computation crates. But I don't think it's necessary to do this for the VM now, we can do it for everything at once sometime in the future.
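A rough sketch of the layout described above, with modules standing in for crates. `core_types` plays the core crate (type definitions plus the table type), `eval_impl` and `layout_impl` play computation crates that depend only on the core, and the top level fills in the table. All names here are hypothetical and the "computations" are placeholders.

```rust
/// Stand-in for the core crate: type definitions plus the table type.
mod core_types {
    pub struct Source(pub String);
    pub struct Content(pub String);
    pub struct Document(pub Vec<String>);

    /// Table of entry points, filled in by the top-level crate.
    pub struct Routines {
        pub eval: fn(&Routines, &Source) -> Content,
        pub layout: fn(&Routines, &Content) -> Document,
    }
}

/// Stand-in for an evaluation crate; depends only on `core_types`.
mod eval_impl {
    use super::core_types::*;

    pub fn eval(_routines: &Routines, source: &Source) -> Content {
        // Placeholder computation: strip surrounding whitespace.
        Content(source.0.trim().to_string())
    }
}

/// Stand-in for a layout crate; also depends only on `core_types`.
mod layout_impl {
    use super::core_types::*;

    pub fn layout(_routines: &Routines, content: &Content) -> Document {
        // Placeholder computation: one "frame" per line. Real code could
        // call back into `routines.eval` without depending on `eval_impl`.
        Document(content.0.lines().map(str::to_string).collect())
    }
}

/// Stand-in for the thin top-level crate that wires everything together.
static ROUTINES: core_types::Routines = core_types::Routines {
    eval: eval_impl::eval,
    layout: layout_impl::layout,
};

fn compile(text: &str) -> core_types::Document {
    let source = core_types::Source(text.to_string());
    let content = (ROUTINES.eval)(&ROUTINES, &source);
    (ROUTINES.layout)(&ROUTINES, &content)
}
```

The point of passing the table into every routine is that the computation modules never name each other, only the core types, so they can be compiled independently.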

sturdy sequoia May 27, 2024, 1:03 PM

#

I was thinking of typst depending on the VM, admittedly it could be the other way around, but I'll keep doing what I am currently doing: shortening the heck out of the VM, I am trying to make it short enough that it's not 10k lines long

sturdy sequoia May 27, 2024, 7:40 PM

#

@left night I have changed the logic for import from dynamic values (i.e. paths which are not known at compile time): I am now allowing it (which I think is cool since I saw a package the other day rely on it) but for it to work you must specify the imports manually or use the name of the import (i.e. if I import codly, I need to use codly)

#

I think it's a good tradeoff, if the import can be done at compile time: then it is done fully at compile time

#

Complexity wise it's really quite okay

left night May 27, 2024, 8:24 PM

#

sturdy sequoia @left night I have changed the logic for `import` from dynamic values ...

so basically not allowing dynamic wildcard imports, right? I think that makes sense. They are cursed.

sturdy sequoia May 27, 2024, 8:46 PM

#

left night so basically not allowing dynamic wildcard imports, right? I think that makes se...

yes exactly

#

everything else should work as before hopefully

untold turret May 28, 2024, 3:15 AM

#

sturdy sequoia @left night I have changed the logic for `import` from dynamic values ...

do you disallow the import cetz.draw: *? I think it is quite often used.

glad urchin May 28, 2024, 3:23 AM

#

untold turret do you disallow the `import cetz.draw: *`? I think it is quite often used.

i think they might be referring to something like #import "./the" + file: ...

#

but not 100% sure

sturdy sequoia May 28, 2024, 8:54 AM

#

untold turret do you disallow the `import cetz.draw: *`? I think it is quite often used.

no, that would work

#

because I can get cetz and .draw at compile time, I have a specific case for these kinds of uses!

#

It really only is what @glad urchin showed: completely dynamic imports

#

note that this would also not work:

let cetz2 = cetz

import cetz2.draw

#

but this limitation could be lifted with a const evaluation and inlining

#

Actually it would be fairly easy to remove this limitation, I may do it

#

Just not just yet 😄

sturdy sequoia May 28, 2024, 10:38 AM

#

Ok, I have implemented it, it's dead easy with the architecture

feral imp May 28, 2024, 11:02 AM

#

We are soooo back baby!!

sturdy sequoia May 28, 2024, 12:17 PM

#

feral imp We are soooo back baby!!

oooooh YEAAAAAAAAAAAAAh

#

It's also quite a bit shorter

#

and it is quite a bit more powerful with dynamic imports (I know some packages rely on them) and automatic constant inlining which imo is pretty darn cool 😎

#

(I should note that the constant inlining is almost free)

sturdy sequoia May 30, 2024, 12:44 PM

#

BTW I should mention that the new compiler produces much better data structures which may improve performance quite a bit by using linear structures instead of tree-like structures, so I am hopeful it will be better 🤞

sly pecan May 30, 2024, 12:51 PM

#

sturdy sequoia BTW I should mention that the new compiler produces much better data structures ...

Wdym new compiler

sturdy sequoia May 30, 2024, 1:14 PM

#

sly pecan Wdym new compiler

I am partially re-writing it while I port it to the latest typst

#

mostly simplifications

sly pecan May 30, 2024, 1:31 PM

#

We really need some sort of CI for performance

sly pecan May 30, 2024, 1:47 PM

#

Would've detected the performance regression in the 0.11 release candidate for instance

feral imp May 30, 2024, 2:53 PM

#

sly pecan Would've detected the performance regression in the 0.11 release candidate for i...

You're right. But to be honest, typst's very pertinent performance issue is the memory usage...
There is no issue at all with testing memory usage in CI, regardless of instancing.. and even hardware..
Alas memory monitoring tools suck. I wouldn't let my worst enemy configure CI memory consumption monitoring scripts.......

sly pecan May 30, 2024, 2:54 PM

#

feral imp You're right. But to be honest, typst's very pertinent performance issue is the ...

Why not both

feral imp May 30, 2024, 2:56 PM

#

I think that memory consumption should be a bit higher priority than compile/inc-compile performance, so I was a lil slick with that relocation of the goal post...

untold turret May 30, 2024, 3:04 PM

#

sly pecan We really need some sort of CI for performance

I remember there was a discussion about it. It is troublesome since the GitHub Actions bot may have different hardware, which affects absolute performance. So we need to set up some platform-independent environment for tracking performance.

#

Besides, we need a nice set of documents for benchmark 😉

feral imp May 30, 2024, 3:07 PM

#

And incremental-compilation isn't that easy to test either...

feral imp May 30, 2024, 3:08 PM

#

untold turret I remember there was discuss about it. It is troublesome since git action bot ma...

but if you start with memory consumption then that isn't such a big issue.

untold turret May 30, 2024, 3:10 PM

#

feral imp but if you start with _memory consumption_ then that isn't such a big issue.

Good metrics.

untold turret May 30, 2024, 3:14 PM

#

feral imp but if you start with _memory consumption_ then that isn't such a big issue.

I used some heap profiler, which instruments heap (de)allocations for profiling. Since typst's comemo heavily uses memory, some small documents will take seconds to collect data. I doubt whether we can perf both CPU and memory at the same time then.
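One way to get a memory number in CI that does not depend on the runner's hardware is to count bytes inside the process instead of attaching an external profiler. The sketch below wraps the system allocator and tracks current and peak usage; the names `CountingAlloc` and `measure_peak` are hypothetical, and this is an illustration rather than something Typst ships. It costs one or two atomic operations per allocation, which is much cheaper than full heap instrumentation but still not free.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently allocated through the global allocator.
static CURRENT: AtomicUsize = AtomicUsize::new(0);
/// Highest value `CURRENT` has reached since the last reset.
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical allocator wrapper that counts live and peak bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let size = layout.size();
            let now = CURRENT.fetch_add(size, Ordering::Relaxed) + size;
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        CURRENT.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // `realloc` and `alloc_zeroed` use the trait's default implementations,
    // which go through `alloc` and `dealloc` above and are counted too.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Resets the peak to the current usage, runs `f`, and returns how many
/// bytes above that baseline were live at the highest point while it ran.
fn measure_peak(f: impl FnOnce()) -> usize {
    let baseline = CURRENT.load(Ordering::Relaxed);
    PEAK.store(baseline, Ordering::Relaxed);
    f();
    PEAK.load(Ordering::Relaxed).saturating_sub(baseline)
}
```

A CI job could run a fixed set of documents through `measure_peak` and compare the byte counts against a stored baseline, which sidesteps the timing noise of shared runners.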

sturdy sequoia May 30, 2024, 5:01 PM

#

sly pecan Would've detected the performance regression in the 0.11 release candidate for i...

https://tenor.com/view/the-what-smile-whut-weird-stare-gif-16592004
