#Performance
I do wonder what had happened tho
I can't find an issue on GH
I don't remember, other than a discord discussion
This at least

How did I not find it what the heck
Oh a 7x increase
oof
Ah I see, the introduction of Context decreased memoization for non-contextual areas, makes a lot of sense
really like the fix that @left night found for it
comemo is too strong
Someone mentioned some other performance regressions with 0.11, but nothing that major
Well I need to start making measurements then
/s
How close are you to finishing the jit compiler?
Actually decently close
i know, right? :)
maybe salsa can do that, but I would have no idea how
BTW, did you make progress on multithreaded typst?
it's still blocked by location assignment, but I want to tackle that soon. it will probably require a bit of breakage though.
Figures, maybe somebody has approached this in a research paper?
this = a similar problem
Can't make an omelet without breaking a few eggs
I think it's a bit too specific for that
It's not really a general problem, it's more about what's right for Typst I feel like
Apparently this is mentioned elsewhere https://discord.com/channels/1054443721975922748/1229435835439517716
Unfortunately there's not much we can do currently. I think that to improve memory usage we would need to be cleverer about how we memoize (i.e. avoid memoizing something that is already memoized at a higher level in the backtrace), but automatically, which might prove difficult. Another way would be a "smarter" cache eviction so... a garbage collector
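To make the "smarter cache eviction" idea concrete, here is a minimal sketch of age-based eviction using only the standard library. The type `AgingCache` and its methods are hypothetical names for illustration; this is not how comemo is implemented, just one simple shape such a policy could take.

```rust
use std::collections::HashMap;

/// A memoization cache whose entries age by one on every sweep and are
/// dropped once they have gone unused for more than `max_age` sweeps.
/// (Hypothetical type, for illustration only.)
pub struct AgingCache {
    entries: HashMap<u64, Entry>,
}

struct Entry {
    value: String,
    age: usize,
}

impl AgingCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the cached value for `key`, computing it on a miss.
    /// A hit resets the entry's age so that hot entries survive sweeps.
    pub fn get_or_insert_with(
        &mut self,
        key: u64,
        compute: impl FnOnce() -> String,
    ) -> &str {
        let entry = self
            .entries
            .entry(key)
            .or_insert_with(|| Entry { value: compute(), age: 0 });
        entry.age = 0;
        &entry.value
    }

    /// Ages every entry by one, then drops those older than `max_age`.
    pub fn evict(&mut self, max_age: usize) {
        self.entries.retain(|_, entry| {
            entry.age += 1;
            entry.age <= max_age
        });
    }

    /// Number of entries currently held.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Calling `evict` once per compilation would keep results that were reused recently and drop the rest, which trades some recomputation for a bounded cache.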
Just make typst pro come with a stick of ram
@left night What do you think of the mod name containing both the compiler & vm being called typst::lang?
Since it's not really evaluation that's how I've been calling it in v3
lang is already Lang and Region and much of the stuff in foundations is language-level as well
hmm, I'll rename it eval then, I am just afraid that'll make the diff a complete mess
yeah the diff will be useless indeed
I'm also open to other proposals, I just think that lang is a bit ambiguous
currently, it's compiler? I agree that that's not ideal since the whole thing is the compiler.
In the older versions I had separate mods for compiler, vm, and the "common" stuff (whose name I forgot) and it was a mess
in principle, it's pointless to look at the diff anyway
will just be bit difficult to leave review comments
yeah that's why I wanted to move it that way it's cleaner
overall
though wait: will the files even have the same names?
some (albeit few) do
but I wouldn't want to pick a worse name just for review
the common ones mostly follow the same naming scheme
like ops.rs
which is still the same just uses references rather than owned values
@sturdy sequoia I was thinking a bit about debugging memory usage and was wondering whether we could override the global allocator and use the exact same typst-timing tracing spans to record where things are allocated and where they are deallocated. We could even export the output in the normal format and use perfetto.dev to view, just with times replaced by bytes allocated/deallocated. What do you think? Note that I haven't yet researched in-depth what other solution exists, so maybe there's something better out there.
There's a crate for that afaik!
tracking-allocator
It's a fairly common task in game dev where memory usage can be critical
and allocations on the "hot path" are an absolute no-no
Could definitely implement support for it in typst-timing and report it as an extra value over time and where they occur, should be easy-ish
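As a rough illustration of the "override the global allocator" idea, the sketch below wraps the system allocator and counts bytes with atomics, using only the standard library. The names `CountingAlloc`, `ALLOCATED`, `DEALLOCATED`, and `measure` are made up for this example; it does not use the tracking-allocator crate and does not hook into typst-timing spans.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Total bytes handed out and returned since program start.
pub static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
pub static DEALLOCATED: AtomicUsize = AtomicUsize::new(0);

/// Wraps the system allocator and counts bytes on every call.
/// (Hypothetical type, for illustration only.)
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        DEALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the number of bytes allocated
/// (process-wide) while it ran.
pub fn measure<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATED.load(Ordering::Relaxed);
    let out = f();
    let after = ALLOCATED.load(Ordering::Relaxed);
    (out, after - before)
}
```

A span-based version would record the same counters at span entry and exit instead of wrapping a closure, so the byte deltas could be exported in place of timestamps.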
I can look at it once I'm done with the VM if you want, will be a great way of optimizing the VM once it's done
That'd be great
... We are sooo back!!!
1338 passed, 707 failed, 0 skipped
You know what, not too shabby for a first run with so many changes
Solid C grade
I should note that most errors are changes in span, there's actually very few actual bugs in there
10 hours and 49 minutes until all is passed
Did you actually calculate that lmao
1498 passed, 547 failed, 0 skipped
Nooo. ChatGPT.
1537 passed, 508 failed, 0 skipped
That's not too bad
1550 passed, 495 failed, 0 skipped
1631 passed, 414 failed, 0 skipped
414 failed
I am taking a wee bit of a break
dherse's eval bug fix twitch (replace with cooler alternative) stream
I should do a programming twitch about typst
1632 passed, 413 failed, 0 skipped
Now that's a huge improvement!!! /s
1651 passed, 394 failed, 0 skipped
@left night This is a very weird test case:
// Error: 2-38 maximum show rule depth exceeded
// Hint: 2-38 check whether the show rule matches its own output
#layout(_ => include "recursion.typ")
But with my compiler since the include is performed at compile time, it will produce the error:
// Error: 22-37 cyclic import
Which imo makes more sense, can we agree on this?
1727 passed, 318 failed, 0 skipped
1785 passed, 260 failed, 0 skipped
1792 passed, 253 failed, 0 skipped
I agree
if i understood correctly, what's happening here is that the evaluation of that include is only called on layout, and its recursion detector is used to detect infinitely recursive show rules
in that sense, the error is definitely wrong
no show rules are being run
im gonna assume here that Laurenz wanted a way to be able to tell things apart better but that was the best design possible at the time
I think so too
but since the compiler does as much as possible statically, it can detect this case trivially
surely
but we also gotta make sure to not produce regressions here
in particular regarding delayed errors
I did change the test to have a better error
i think it's likely that what you're doing is right, just gotta make sure it's consistent
it shouldn't be impacted in general since this happens long before the code is actually evaluated and if it is dynamic, then it will be done at runtime and with the full context
in principle that sounds fair to me, but yeah, gonna need a whole lot of testing :p
Yeah for sure!
Currently I am trying to get the test suite to work
then my thesis, then some other docs I have
then onto the community to help test this
Only about ~10% of them anyway
We are sooooooo back!!!
it ain't too much
yeah
surely we can delete them without any consequences
and immediately merge the PR
eeeasy, I'm starting to think you're serious..
nah
not immediately
we can wait a few hours
just to make sure you know
the bugs might magically report themselves
Good enough, push to production
1825 passed, 220 failed, 0 skipped
5-by-5
(it's a joke, no worries)
Ok, so I'm off to bed
tomorrow I fix the rest
and then Laurenz can hate me for the size of the PR
when did pg turn blue?
when they joined the dark side.
precisely at this moment: #discussions message
1833 passed, 212 failed, 0 skipped
I've discovered a bug in loops which I am now fixing, it's funny because I thought I had made a mistake in the old VM but it was correct and by ""fixing"" it, I introduced a bug lmao
It is great you can still remember.
@left night When are spans for Content assigned? I can't figure it out
Ok I think I figured it out
It looks like it's in Eval for Expr
exactly
that makes things... complicated
'cause on the one hand I can just add a new OpCode/Instruction called Spanned that assigns a span to a value
which is... okay
but it's wasteful if the value doesn't need a span
can't you integrate the spans into the const table during compilation (you probably have such a table right)
Yes for all const things that's the case but not for dynamically created elements
and for dynamic content it would be part of the op that constructs that content
like calling a function creates a value but it isn't spanned
although I could make it spanned
so = Heading #x is fine but #heading[Heading #x] doesn't get a span?
in principle it could happen inside of the function call I think
Mind you, I can make it spanned on function calls relatively easily
exactly, it's what I'm trying
1860 passed, 185 failed, 0 skipped
That does indeed fix a ton of errors
so I think it's a good way of doing it
Now I think a lot of the remaining issues are math stuff
because it's partially broken sigh
ugh I think piping loads of files through codelst might be a tad slow xD
hmu if you want to chat about math stuff! Been really diving into the math parser lately in case you have AST related problems.
The AST is untouched, I actually have no idea what's causing the issue because if I print out the components being produced, they're fine, so I have no clue where the issue lies
I think something about function calls in math is broken
but I have yet to figure out what
But I'll be sure to ping you
multithreaded typst would be amazing
I'm the kid reading sweety Performance messages everyday.
how many files are we talking?
22
https://github.com/jneug/typst-codelst/blob/main/codelst.typ there's a lot of for loops here
not to mention showybox on top of that
Multithreading would probably help a bit, but the underlying reason seems to be that it's just doing a lot of stuff, and eval is pretty slow
yeah...
I've recently read the source code of showybox and the things it puts the compiler through ...
It's placing and hiding and introspecting and counting and stating
All to place the box over the other box
Perhaps, but I think it would probably also be possible to simplify
And do it with just measure and place
But a placement anchor system would go a long way to simplify such things
Basically just a second parameter for place to pick the anchor point
One more branch can't hurt
1871 passed, 174 failed, 0 skipped
1895 passed, 150 failed, 0 skipped
Good progress so far
These fixes.. are they changing the VM in some way? Would you expect different performance outputs once this is all done? I guess.. yes?
no, it's mostly spans, small errors, etc.
overall the impact should be negligible
My friend thou are a mad man, happy pride month btw
@left night I discovered a very bad bug in the implementation of PicoStr which you may be aware of already: it can produce two IDs for the same string meaning that you cannot purely rely on comparison (it's fine to use as a storage of string however)
It's rare but it does happen
I fixed that on main
Darn!
You're so fast
I was panicking that I had broken everything
I ran into that while adding IDE tests which run multi-threaded
I mean I ran into the issue with tests of the compiler since it relies quite heavily on PicoStr to avoid storing string and doing string compares
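The chat does not say what the actual PicoStr bug was. One common way an interner ends up with two IDs for the same string is when the lookup and the insertion are not done under the same lock, so two threads can both miss and both insert. The sketch below is a hypothetical interner (not Typst's `PicoStr`) that avoids this by doing both steps while holding one mutex.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// A string interner that hands out exactly one ID per distinct string.
/// (Hypothetical type, for illustration only.)
pub struct Interner {
    inner: Mutex<Inner>,
}

struct Inner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    pub fn new() -> Self {
        Self {
            inner: Mutex::new(Inner { ids: HashMap::new(), strings: Vec::new() }),
        }
    }

    /// Returns the ID for `s`, allocating a new one on first sight.
    /// Lookup and insertion happen under the same lock, so two threads
    /// interning the same string can never both allocate an ID for it.
    pub fn intern(&self, s: &str) -> u32 {
        let mut inner = self.inner.lock().unwrap();
        if let Some(&id) = inner.ids.get(s) {
            return id;
        }
        let id = inner.strings.len() as u32;
        inner.strings.push(s.to_owned());
        inner.ids.insert(s.to_owned(), id);
        id
    }

    /// Returns a copy of the string stored for `id`.
    pub fn resolve(&self, id: u32) -> String {
        self.inner.lock().unwrap().strings[id as usize].clone()
    }
}
```

With that guarantee, comparing IDs is equivalent to comparing the strings, which is what the compiler relies on.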
1899 passed, 146 failed, 0 skipped
That's not very impressive of a change, but I completely revamped function calls & field accesses to fix a ton of edge cases and make it much simpler
1905 passed, 140 failed, 0 skipped
1938 passed, 107 failed, 0 skipped
Wow that's almost a 30% improvement in one tiny fix
1946 passed, 99 failed, 0 skipped
Get rules work now
(along with loads of simplifications)
Is that slow? I get that compiling a document without codelst
Extremely
in a context you can get the values of set stuff
like:
#context {
text.lang
}
will print en or fr, etc.
yeah you're right
I know of that
But at one point far in the past, we talked about "get rules" and the name is engraved in my brain
I also use that name still
è_é
yeah I somehow forgot
Though I think I've intentionally not used that term anywhere officially
Like in the docs
Ok so most remaining errors are spans and error messages that are changed (which I'll change back duh!)
"contextual expression"
--- ops-assign-to-invalid-unary-op ---
// Error: 2:3-2:8 cannot apply 'not' to string
#let x = "Hey"
#(not x = "a")
@left night can we agree that I can remove these kinds of tests for the VM?
Because of the initial tree traversal, it gets immediately detected as a cannot mutate a temporary value
i've turned it into two tests:
--- ops-assign-to-invalid-unary-op ---
// Error: 2:3-2:8 cannot mutate a temporary value
#let x = "Hey"
#(not x = "a")
--- ops-invalid-unary-op ---
// Error: 2:3-2:8 cannot apply 'not' to string
#let x = "Hey"
#(not x)
two tests are good
Nice, there's a few cases like this in the ops test which I am splitting into smaller tests
because errors are caught super early (which is a breaking change*)
yeah, but it's a good one
I wonder: When I write import, do you eval that module or just compile it?
Yeah, some tests also have like a dont-care which I have changed with panic(...) instead
it's eval'ed immediately
Makes sense
Could change it to compiling it but then I can't statically inline the imported values
yeah
But if it just compiled it, that could be cached on disk fairly easily (but compiling is actually super duper fast)
compared to eval *
Do you inline small functions? :D
no not yet
But it should be doable fairly easily actually
I think that would be a huge win down the road
Yeah it skips memoization
Actually also for memory usage, because memoization.
but that would require a good way of determining what a "small" function is
could be just a number of instructions/opcodes
I hand-inlined a few of the core util functions in cetz a week ago and it gave a 20% boost ^^
that's what I'd do
at least if it doesn't have loops without statically known upper bound
I'll have a look at it with some real docs to find a good balance once I'm done with the current bug hunting I am doing
that's reasonably easy to check
not for while loops mind you
but I could add some metadata on that in the compiler
like an undefined_loop: bool or something that gets switched to true when I encounter a loop without a defined number of iterations
does cause issues for iterators however since those can't be statically analysed (the compiler just isn't sophisticated enough)
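The inlining heuristic discussed above (a cap on instruction count, plus no loop without a statically known bound) could be sketched as follows. `CompiledFunc`, `INLINE_THRESHOLD`, and the threshold value are assumptions for illustration, not the VM's actual data structures or tuning.

```rust
/// Hypothetical summary the compiler could keep for each function.
pub struct CompiledFunc {
    /// Number of instructions/opcodes in the function body.
    pub instructions: usize,
    /// True if the body contains a loop whose iteration count is not
    /// known statically (for example a `while` loop or an iterator).
    pub unbounded_loop: bool,
}

/// Assumed cutoff; a real value would come from measuring real documents.
pub const INLINE_THRESHOLD: usize = 16;

/// A function is considered "small" when it has few instructions and
/// cannot run for an unknown number of iterations.
pub fn should_inline(func: &CompiledFunc) -> bool {
    func.instructions <= INLINE_THRESHOLD && !func.unbounded_loop
}
```

Because the flag is conservative, a loop over an iterator would simply mark the function as not inlinable rather than require deeper analysis.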
I also think we could add a feature to comemo for optional memoization like:
#[comemo::memoize(..., enabled = array.len() > 1000)]
fn does_iteration(array: &[u8]) {
}
Something like that
it should be fairly easy actually
It's like 20 lines top in comemo's proc-macros
It's all heuristics based of course, but I think that's okay for reducing memory consumption
it's like a space vs time tradeoff
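To show what such an `enabled` condition buys, here is a hand-rolled sketch using only the standard library: inputs below a length threshold skip hashing and caching entirely, larger ones go through the cache. `SumCache` and `min_len` are hypothetical names, and this is not comemo's implementation or its macro expansion.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Memoizes a byte-sum, but only for inputs of at least `min_len` bytes.
/// (Hypothetical type, for illustration only.)
pub struct SumCache {
    results: HashMap<u64, u64>,
    min_len: usize,
}

impl SumCache {
    pub fn new(min_len: usize) -> Self {
        Self { results: HashMap::new(), min_len }
    }

    pub fn sum(&mut self, data: &[u8]) -> u64 {
        // Small inputs: computing is cheaper than hashing, so skip the cache.
        if data.len() < self.min_len {
            return compute(data);
        }
        // Simplification: a 64-bit hash stands in for the input. A real
        // cache would need a wider hash or an equality check.
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let key = hasher.finish();
        *self.results.entry(key).or_insert_with(|| compute(data))
    }

    /// Number of results currently stored.
    pub fn cached_entries(&self) -> usize {
        self.results.len()
    }
}

fn compute(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}
```

The threshold is the heuristic knob: raising it lowers memory use and hashing cost at the price of recomputing more calls.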
yeah, that doesn't look bad
I'll make a PR once I'm done working 'cause you know... Typst isn't my actual job
Consider slowing down a bit so you don't get burned out
Fair point, but I also geek on Civ VI outside of working hours
I actually was taking a new medication supposed to boost my moods and for 6 weeks it did the exact opposite, I wasn't sad, but I was angry all the freaking time. So I stopped taking it (after talking about it to my doc, duh!) and I am feeling so much better now
And I was sleeping like shit, I was always tired, always angry, it gave me headaches, really not good
And it feels good to finally have energy again
see? it worked!
1955 passed, 91 failed, 0 skipped
I'm happy you're no longer the hulk
Honestly, it feels good not to be
poor Dr. Banner
@left night sorry for kind of spamming you the last few days but I am a bit puzzled about this test:
--- parser-backtracking-destructuring-assignment ---
#{
let s = "(x) = 1"
let pat = "(x: {_}) = 1"
for _ in range(100) {
s = pat.replace("_", s)
}
// Error: 8-9 cannot destructure integer
eval(s)
}
The compiler complains that x is undefined, but on main it only complains about the destructuring of an integer, can I declare the x before it? that gives me the correct error. Or is there something wrong with the way I am allocating variables?
Code that gives the exact same error (note the let x)
--- parser-backtracking-destructuring-assignment ---
#{
let s = "let (x) = 1"
let pat = "(x: {_}) = 1"
for _ in range(100) {
s = pat.replace("_", s)
}
// Error: 8-9 cannot destructure integer
eval(s)
}
1984 passed, 63 failed, 0 skipped
shouldn't let x be a noop because eval doesn't assume the parent scope?
(meaning the fix should be eval(s, scope: (x: none)))
ah, I did not think of that, weird
since eval does not pass any scope indeed
right, I typed the fixed example wrong
it's fixed now
that also fixes it!
Nice, thanks ❤️
I think it's better with scope because the test is about destructuring assignment
(let s = "let x; (x) = 1" would also work)
I agree, I changed it to use scope π
yeah, this test is really only about the parser. the error message just happened to occur for the simplest snippet I came up with.
1992 passed, 55 failed, 0 skipped
I'm sorry to interrupt but what are you working on?
Thank you!
Math functions are the bane of my existence

I have completely re-built my function & access system which removed like 500+ lines
but the math functions are completely incompatible with this
since they can be both a function and not a function
impl CompiledAccess {
    pub fn get<'a: 'b, 'b, 'c, O>(
        &'a self,
        vm: &'b mut Vm<'a, 'c>,
        engine: &mut Engine,
        accessor: impl for<'d> FnOnce(AccessedValue<'d>) -> SourceResult<O>,
    ) -> SourceResult<O> {
I'll admit that the function definitions are a bit cursed tho
Jesus Christ
to be fair, this does recursive borrowing (both immutable and mutable) of values and allows keeping track of it all using only safe rust
I can only do so much
good luck
@left night as @tight glade and I have been thinking of inspection of values (something the VM currently doesn't support), I did figure out one "easy" and neat way of doing it: recompiling the module (so the opcodes) with a special "inspect" opcode, it allows it to be in the same place as the old code. But: it means that we're decreasing memoization, and I would rather avoid the whole process causing huge amounts of memory use, so my idea is that when a node is inspected, it would disable memoization of the compile & vm system temporarily. Which plays along really nicely with the comemo PR I have opened, and could reduce memory usage nicely in the IDE
this could work. question: have you looked into how breakpoints work in other VMs? since it's essentially the same.
Not yet, my primary worry is that more than one instruction can have the same span
i.e a load followed by a copy for example
think of #my_var.value: it will load the value (because it's an access) and then copy to the Joiner
Mind you there might be something better
If more than one instruction has the same span, it's both not a problem (their output, and input?, should be inspected) and a problem (if they share inputs, the risk is to inspect them twice, isn't it?)
Are there any cases where those would be semantically two different inspections?
Otherwise we could just arbitrarily choose to not attach the span to one of the instructions
Even the Copy instruction must have a Span, since if you're copying to Joiner, it could still result in an error
Oh, because spans are also used for errors...
yep
Do we happen to have a convenient unused bit somewhere in instructions?
then you could sort of mark inspectable: false
that could be done by just editing the macro
but I think it's easier to make it at compile time
and just not memoize that compilation
Why is it a problem to memoize with inspection again btw?
it just increases memory use a lot
for nothing
No but it's probably okay
I'm always asking stupid questions but where did you learn all of these things? I'm super curious it looks very interesting
I started programming at 12-13 yo and I have a bachelor's degree in CS & EE so that helped
Okaaaay thank you!
other than that, if you're more specific about what you want to learn about, it would help
Didn't mean to sound like a dickhead, sorry
Don't worry I didn't perceive it like that
Yeah to be precise, how did you learn VMs like you write? Did you have any recommendations to learn this subject?
Well it's a mix of having a lot of Java experience (especially bytecode engineering as they call it, i.e generating custom bytecode at runtime to do interesting stuff), trial and error, and reading other projects online
always use open source projects to your advantage
currently I am looking into Rune a lot for their VM but I don't understand how they deal with mutable methods
For bytecode VMs and programming language implementation, you might try the book Crafting Interpreters: https://craftinginterpreters.com/contents.html I've referenced it a few times, and it's quite helpful!
Java is my nightmare with their syntax...
I'll do it!
I'll check that, if you can't understand all of it I think I'll be in trouble compared to you x)
Ooooh I'll check that! I know this book from a friend because it explains how to create a programming language but I didn't know they explain VMs too!
Thank you all!
yes that too, how could I forget?
And why didn't I think of checking their function call chapter!
@left night I was actually wondering, is accessing methods as closure something we'd like to do?
#let a = (1, 2, 3)
#repr(a.at.with(1))
I think rather not because it is ambiguous with field access
yes that's what I was wondering
But it's really tricky overall
I think we have a poorly defined concept of function, fields, and methods
And they all have somewhat different semantics
especially when you factor in math
and that makes things difficult
And modules!
What about let clamp10 = calc.max.with(10)
That's unproblematic, since max is not a method
How so?
But why is that not ambiguous?
because a.b is always a field access (and it is here) things are not ambiguous right now. things would become ambiguous if we also allowed accessing methods like this
if a is an array, a.at is simply not defined currently
only array.at is defined and a.at(x) is sugar for type(a).at(a, x)
notably at is not again dispatched as a method call here because the "type" type does not / cannot have methods
I don't know, I just feel like it all kind of clashes with one another, and especially when math is factored in
it's difficult to find a good access pattern
Could I just desugar this in the VM then?
that would make method functions much easier
probably yes, with the exception of mutable methods, which are a hack
by having a:
{ instructions for a }
Type { a } // Stores the type of a
Method { the-type, args }
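The desugaring described above, where `a.at(x)` becomes `type(a).at(a, x)`, can be sketched as a lookup on the receiver's type followed by a plain function call with the receiver as first argument. The `Value`, `Type`, and `call_method` definitions below are simplified stand-ins for illustration (no negative indices, no mutable methods), not Typst's real types.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Array(Vec<Value>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Int,
    Array,
}

/// Signature shared by all associated functions in this sketch.
type Method = fn(&Value, &[Value]) -> Result<Value, String>;

impl Value {
    /// The `Type { a }` step: find the type of the receiver.
    pub fn ty(&self) -> Type {
        match self {
            Value::Int(_) => Type::Int,
            Value::Array(_) => Type::Array,
        }
    }
}

impl Type {
    /// Looks up a function defined on the type, such as `array.at`.
    pub fn method(self, name: &str) -> Option<Method> {
        match (self, name) {
            (Type::Array, "at") => Some(array_at as Method),
            _ => None,
        }
    }
}

fn array_at(this: &Value, args: &[Value]) -> Result<Value, String> {
    match (this, args) {
        (Value::Array(items), [Value::Int(i)]) => usize::try_from(*i)
            .ok()
            .and_then(|index| items.get(index))
            .cloned()
            .ok_or_else(|| format!("index {i} out of bounds")),
        _ => Err("expected an array and one integer index".into()),
    }
}

/// The `Method { the-type, args }` step: `a.at(x)` runs as `type(a).at(a, x)`.
pub fn call_method(receiver: &Value, name: &str, args: &[Value]) -> Result<Value, String> {
    let ty = receiver.ty();
    let method = ty
        .method(name)
        .ok_or_else(|| format!("type {ty:?} has no method `{name}`"))?;
    method(receiver, args)
}
```

Since the receiver is passed by shared reference here, this shape only covers non-mutating methods, which matches the point that mutable methods are the awkward case.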
yeah mutable methods are actually where the pain lies
like they really make things messy
code-wise I mean
but there are very few of them
yes but I hate the hard-coding of the name of methods
because it means that the day we have custom types it will be messy too
and without having type info it's basically impossible to solve at compile time
custom types will not support mutable methods most likely
which leads to access being complicated
they could still define a .at() method or a .insert() method
they just wouldn't be mutable

Do you see a solution for that?
not off the top of my head. we can discuss this more later. I need to get some work done now or else the multi-threading PR won't be up today
è_é I am working too
But on slides, which have been sucking the life force out of me for two weeks
mind you they're 🔥
hopefully done with Typst?
no, didn't have time to make the template
next time
meh
1739 passed, 333 failed, 0 skipped
Math is 100% broken, but judging by the fact I redid the entire function, method, and field access system, I'd say it's pretty good
(and most are span issues)
2003 passed, 69 failed, 0 skipped
Ok, I don't know what I changed that caused this big of a difference (for a few days I have been focusing only on a small subset of tests) but darn I am good
@feral imp are you proud of me? 
I am proud of you
I will now be willing to adopt you, or we can both be adopted by @cunning wadi
let's both adopt dherse then
ecin
2006 passed, 66 failed, 0 skipped
The number of the best
I can now compile The Thesis™, which yields a speedup of 32% (6.7 vs 5.05s)
For a first initial impl, I'd say that's pretty gewd
vs v0.11.0 *
And a nice incremental improvement of 29%
and I think there's a lot of room for improvement, one of which I already know I am going to do
then there's inlining
and actual optimization work
(there's also a bug where it doesn't converge in 5 iters for some reason, so it's actually better than that!)
nice!
yes
And it's before any opt
so I am quite confident I can get that number up
looking forward to that
@left night does this test really make sense?
--- label-in-block ---
// Test that label only works within one content block.
#show <strike>: strike
*This is* #[<strike>] *protected.*
*This is not.* <strike>
in my case the #[<strike>] is equivalent to repr(<strike>) which is weird but imo this should just be an error
39 of the remaining tests are math func calls which I still haven't fixed
i guess a better test would have something like #[#sym.zws <strike>] or smth
but i guess it sorta makes sense if u consider it to be labeling an empty sequence
still weird tho
?r #block(stroke: 1pt)[#sym.zws <strike>]
Weird, when I do this I have an extra space 
Found a bug not covered by tests
No I lied, it works
?r
#show <strike>: strike
*This is* #[#sym.zws<strike>] *protected.*
*This is* #[<strike>] *protected.*
*This is not.* <strike>
well, i guess it makes sense
ah, the two surrounding space are now used?
yes
The test tests specifically whether the label can leak out of the block when nothing is in front of it.
2031 passed, 41 failed, 0 skipped
Good progress in math stuff
yeah I ended up adding a special case: joining a label on an empty Joiner does nothing, it's a no-op
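The special case just described (a label joined onto an empty Joiner is ignored) might look roughly like this. `Joiner`, `Content`, and `push_label` are simplified, hypothetical stand-ins for the VM's real types.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Text(String),
    /// Some content with a label attached to it.
    Labelled(Box<Content>, String),
}

/// Collects the pieces of content produced inside one block.
/// (Hypothetical type, for illustration only.)
#[derive(Default)]
pub struct Joiner {
    parts: Vec<Content>,
}

impl Joiner {
    pub fn push(&mut self, content: Content) {
        self.parts.push(content);
    }

    /// Attaches `label` to the most recently joined content.
    /// If nothing has been joined yet, this is a no-op, so the label
    /// cannot leak out of the block and attach to something outside.
    pub fn push_label(&mut self, label: &str) {
        if let Some(last) = self.parts.pop() {
            self.parts
                .push(Content::Labelled(Box::new(last), label.to_owned()));
        }
    }

    pub fn finish(self) -> Vec<Content> {
        self.parts
    }
}
```

Under this sketch, `#[<strike>]` produces an empty block, while `#[#sym.zws <strike>]` labels the zero-width space inside it.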
2034 passed, 38 failed, 0 skipped
Getting there, math now works
there's actually only one bug left, all the others are just small changes in spans which I'll fix afterwards
and then it's optimization time: parallel eval which is now trivial, inline of small functions, actual optimization, etc.
@left night just a quick aside: I am trying the comemo::memoize(enabled = instructions.len() > 100) for testing whether it works, and it reduces memory usage by a ton
of course this is a very very simplistic way of doing things
We would need to find a good heuristic, but it does seem to me like a nice way to go
@atomic violet with comemo enabled, we can run much higher res:
how much are we talking?
well, for my thesis about 1 GiB of memory usage
from 3.2GB to 2.2GB
of course my thesis still uses tablex *
and on @atomic violet's path tracer, it leads to not compiling (so > 32 GiB) to compiling ~22 GiB
for my thesis it does lead to a small performance drop of about 5% cold, no changes incremental
5% compared to the VM version without enable = ..., so still around 20% faster than main roughly
(I haven't done a full benchmark suite with many samples yet)
Mind you, I am seriously thinking of making Args take references, it would remove the bulk of the cost of calling functions
since the VM already uses mostly references
it kind of makes sense
That doesn't seem that significant.
Have you tried creating a graph which shows memory usage versus the number (100 currently)? And also how it affects compilation time
(Of course it's likely that a smarter heuristic is required)
Just wondering how far this simple one can be taken.
yes I can do that
but later :-p
probably, I think that checking whether there are big arrays in the args would matter
something like adding a value.len() that counts the length of the entire hierarchy
How about the execution time?
As in base caching on how much time it took
Just a naive question
We can't do that since we'd need to hash the elements, see whether they are cached, if not then execute and cache
while this could be done, it has the major drawback that we're still hashing even if we don't need to
Do you know what kind of performance impact it would have, or are you just guessing?
It depends
that's the thing
On small inputs (like small functions) it's gonna be expensive af because the hashing would likely be more expensive than eval
on functions with lots of processing not so much
@tight glade I finally pushed the new function call, method call, and field access code
I also pushed a new Observe opcode to handle observed code
(I didn't modify the ide crate, just the compiler & VM to use it)
2038 passed, 34 failed, 0 skipped
@left night is a let binding of type closure always followed by a closure declaration?
Ok, I fixed a nasty bug in closures that was very tricky and only tested in one test
2040 passed, 32 failed, 0 skipped
the rest are span discrepancies which needs fixing but all major bugs are fixed afaik
there is one more that I haven't found yet which causes issues on The Thesis™ (i.e. it doesn't converge in five iterations and I don't know why)
Maybe it shouldn't have converged in five iterations and you fixed a bug
my internet is so bad over where I am at the moment, it honestly doesn't seem like it, but I trust you 🥰
(good luck with this VM sprint btw!)
that's....... interesting?
That's my suspicion at the moment
Have you looked at what the differences are after 5 iterations?
I cannot... see it
Ok so there are differences
around tablex stuff
which relies on locate(...) so that lends credit to the idea that it doesn't converge
But which one is correct?
main
I think
it's hard to tell from the output
(of diff-pdf)
Actually no
The VM is correct
main seems to add some spacings (which cause the issues) where tablex tables are
really good
The VM first, then main on the right (main = v0.11.1)
The gap at the top of the page is different
@left night do you know why this could happen?
Mind you, it could just be that the VM has a bug that causes it to need more iteration to reach the same state
which would be weird
Yeah there are differences caused only by convergence issues (it's always around locate, etc.)
so there must be a way of fixing this
I retract my prior statement, it is not a VM bug
it is a main bug, I was comparing with v0.11.1 which doesn't have a bug that appears (like @upper bobcat's bibliography bug) with the pure location assignment
basically, my thesis no longer converges
and it seems to be somewhat related
Essentially, your bug made me think that maybe the bug I was having was not VM-related but due to merging with main recently
which does appear to be the case
||I don't really understand, but I also don't really want to. Too much to sink in at once.||
2060 passed, 29 failed, 0 skipped
it is going down...
(and it's merged with the latest version, which is more multithreaded)
And it converges to the exact same document as main
But now for a reality check: it's only 15% faster than main on The Thesis™
cold compile *
and it's slower incremental
(it's not optimized yet, but still, this is shockingly bad)
if I had a cetz heavy doc, I could test that (looking at you @slim sequoia)
I'm still really just looking forward to improvement in memory usage. If it is feasible for you to report change in memory usage that would be great.
sure I'll do that next
Which main commit did you branch from?
right now I just merged with main "main"
before I had branched from v0.11.1
then I had moved on to the commit right before pure location assignment
There have been various changes that would subtly affect spacing I think
Okay, happy to say that while still 15% faster in cold, I am now on-par in hot compiles
I think it's a convergence issue
because the spacing issues only occur around locate calls
which would point to convergence issues
I think that the new pure location assignment has a bug in it
I would but I'm surrounded by NDAs and bloodthirsty lawyers
even if you did the typst-mutilate thing?
of course if you have any hesitation, it's better not to
I'll find a way of optimizing
That tool needs some love
Like obfuscating variable names and stuff would be really useful
And a smarter string (so as not to break constants as often if you enable it) and math obfuscation
Oh and most importantly: the bibliography should be mutilated
Well, more important tools first for now
I'd argue a label with nothing in front of it is an immediate syntax error.
well of course, it passed over twenty years ago
TBF I probably could
I'll send tomorrow
probably best if you don't
I would hate for any of those pesky lawyers to become an annoyance
I'll figure something out
I would also mention that on main, in my thesis, all bibliography links are wrong π
wdym?
the links generated like the [1] that are clickable are all wrong on main
yea but define "wrong"
Wrong destination probably
@sturdy sequoia can u send the pdf? i could try to take a look during the week
yes
@glad urchin Clicking on the links in the text leads back to the first page
clearly something is wrong with the location assignment π¦ @left night
I'll check it out on Monday
The bibliography uses a fairly niche feature of the location system (backlinks), so it might be just that which is broken.
I'm not concerned that it's a fundamental problem, just a small bug probably
probably yeah, but I think it's critical that it gets fixed before release, even though it's probably an easy fix for you
Goals:
- PicoStr everywhere (including Args)
- Unify constants (no more separate labels, constants, strings, etc.)
- Borrowed Args using Cow<'a, Value>
- (temporary) instrumentation for better results in V-Tune
- Access simplification (if possible)
- Optimizing joiner: no longer recursive struct, instead having JoinerSegments for storing rules, and Vec<Content> for creating SequenceElem
I think that with this I should already see a very nice speedup
what kind of instrumentation is needed for better results in V-Tune?
enabling instrumentation only in eval
to remove most of the noise
Moving to PicoStr is done!
Time to see if there are any improvements
That's a nice extra 10% on top of the previous 15% speedup on The Thesis™ (so now it's 25% faster, a full second!!! 4s -> 3s)
And in raytracer, it's a good chunk faster as well (main: 32s, pre-PicoStr: 17s, now: 14s)
thats pretty dope
Does that include multithreading?
it is compared to main always so yes, multithreaded
but no compiler & eval multithreading yet
Wait but wasn't your thesis already at 2 seconds with multithreaded layout?
Different cpu maybe?
how do you do gc on PicoStr?
yes, there is a performance degradation on main + I am exporting in PDF not in SVG
What performance degradation?
on main, I don't know why
it's worse than in the initial multithreading PR
It has to be pretty recent
I don't see anything in particular that should be Relevant
What part is slower exactly?
Can you bisect?
I can confirm in SVG the regression is lesser, but still there
Ok so luckily it's not my subsetter
No, I think it's the paragraph spacing thing
it's the only PR I haven't tested individually
and since the only one before that is the multithreading one which I did test
it can only be that

There are a few between those two no?
could also be 4345 (a10e3324c21daf25fd8e8fc5336aa72718a87e47)
but i'd say 4390 is more likely
True, but it's really the only one that makes sense to me, no?
I can bisect yeah but I finish testing what I am doing first π
Which is the new far more efficient joiner
it's about 1/4 the lines of the old ones
wait, are you saying that performance degraded from 2s to 4s on main?
yeah
that is a lot
and I checked, I am 100% on main when I run that test
let me test that
on my machine it's 3s on main, just like the result I was getting right after multi-threading landed
huh, weird
I also just don't see how any of the things after that could have any perf impact like that
lemme test on that commit and on main
Wait, I did change one thing
I updated rust 

Debug build ?
Weird, I am getting the same results on both main and just after multithreading
the only thing that could explain such a huge performance degradation is the rust update then?
π±
I run with manual commands so it's release:
@left night And does the thesis converge on your end, or does it not, like on my side?
(note that I am on WSL as I have almost always been so it's not some Windows weirdness in theory)
Oh no
converges for me
it's just 2x faster on Windows
wtf
What is going on
Why is it slower on linux
well, i guess it's a VM right
no
It doesn't converge on main
on linux
The other day, I did my tests on Windows
These days I am working on WSL
i meant wsl being a vm
not sure whether we are compiling the exact same thesis
ah, maybe not
I am running a checkout I made months ago
Should be the same in principle
I also changed some words here and there during incremental testing, but that shouldn't affect things much
I think mine has an updated version of codly because the VM found a bug in codly π
no indeed
also, protip: compile both binaries separately and use hyperfine with both
easier than compile -> test -> compile -> test
it also runs multiple times and gives you an average
so thats dope
Wsl 2 has near native performance
ruh roh
yep
π§
π
i can imagine, just thought it could make a difference
Why don't I do this?
but doubling the compile time would be weird yea
I never had one before
I have a benchcp shell command that copies the current release build to that dir
and then a benchit <hash> for raw benchmarks and timeit <hash> for timings.json
the only thing that's missing is that these commands can run different benchmarks instead of just the thesis
no need for anything but the thesis
π
It does test a lot of stuff!
But I need some benchmark that really exercises layout
Not just tablex ^^
I know, I am just being annoying π
ik
maybe something a little more balanced :p
So, I can confirm that on Windows, using an older version of rust, I get the same results as before
throw in some columns, footnotes and you're set
faster or slower?
So on Windows, I get the same results as when multithreading dropped, showing that the issue is on the linux side
so it's either:
- the slight modifications from masterproef/main I have
- the rust update (unlikely)
So I'll rebase masterproef and check again
The convergence issue is due to a codly update
Performance is still weirdly low on Linux
but codly is to blame for the convergence changes
I am still shocked how slow it is on WSL
maybe WSL doesn't have as smart scheduling as on Windows?
I don't know at this point
π
maybe it's running with a different number of threads?
I don't think so, 'cause I can see both of them running with 32-threads
so I think the issue is either:
- the rust update
- some WSL fuckery
Ok so Windows behaves much more predictably, so let's say that's some WSL fuckery
I need to re-install Linux 
but like... a real linux
Friends, I think it is time to accept the fact that the VM just... is not worth it
π¦
I guess that over the months, improvements to eval performance have outweighed any gains that the VM provides
like 15% just isn't worth it for the complexity
Wasn't it 25%?
no, I've been getting a lot more variability since the multithreading
Actually it was 25%
I am tired
Though
Didn't you say it made the raytracer compile 2x faster?
So maybe it matters more on eval-heavy documents
yeah raytracer gets a huge perf bump
If I had a doc heavy on cetz I could check if it's worth it for cetz
(my thesis still uses tablex which is kind of deprecated)
Well
You could test on my experimental 0.1.0 branch which has CeTZ renderer support
I'm just trying to find time to release it but i never do so
;-;
Well that's just an excuse tbh
I released two versions of a brand new project in the meantime
π€£
yeah, I had undone an optimization; with it, it's ~25% faster total
relatable
I need to release the next codly
and here I am, building a VM nobody needs
that will never be merged lol
that's a bit unnecessarily harsh
it's still incredible work
and maybe you'll find even more optimization opportunities which will make it worth it
But I remember the old one was 5x faster
and this one is only 2x faster
I don't know why
It should be more optimized π
You could try profiling the two and comparing
yeah I'll do that when I've got time
And to be fair other optimizations have happened since then so the comparison likely isn't totally fair here
- PicoStr everywhere (including Args)
- Unify constants (no more separate labels, constants, strings, etc.)
- Borrowed Args using Cow<'a, Value>
- (temporary) instrumentation for better results in V-Tune
- Access simplification (if possible)
- Optimizing joiner: no longer recursive struct, instead having JoinerSegments for storing rules, and Vec<Content> for creating SequenceElem
That's the state of the low-hanging fruit things I thought about
function calls are still super slow
yeah
A perhaps more fair comparison would be to compare time spent exclusively in eval
I think that one of the biggest issues is that function calls are actually really really slow even when compared to main I think
it's due to the way I am doing accesses
but I don't know how to improve
without profiling / flamegraphs / etc. you can only guess
but those are also hard to generate for The Thesis
i guess it'd just get too large
maybe a smaller but still accurate benchmark can be found
nah, I need to instrument to limit tracing in V-Tune to only the eval portion
it's actually very easy to do
I'll probably do it this week
although a less image-heavy doc would be nice (since it skews towards decode_image and encode_image lol)
Ok, I discovered that the bulk of gains on performance in my thesis is:
- The fact that small functions aren't being memoized
that's it
If I remove that it's slower because:
- Compilation is single-threaded and takes ~100ms on the main thread
- All other eval operations are done in parallel so it just doesn't matter
This means that to be interesting:
- Compilation would need to speed up dramatically
In summary, the VM is useless. Yes, it's much faster on a not-so-real raytracing test, but anything other than that will likely see little to no gains. Perhaps having a document with lots of cetz drawings would help, but I am not 100% convinced
The only way forward is to reduce compile time, and I don't know how to do that, it's already pretty optimized
So I think it's time to shelve it, it was fun, but there are other things for me to work on π
BTW, for anyone wondering, here's how it compares on startup:
- main: eval takes 258ms (of which 95ms is loading syntaxes)
- VM: compile takes 143ms & eval takes 99ms (of which 95ms is loading syntaxes)
And therein lies the problem: we gain 20ms on startup and then, since the rest of the process is parallelized, the gains just don't matter much. Yes it's slightly faster, but it's also 11.5k lines of new code, which is a lot. And a 500 KiB bigger executable (not that much).
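As a rough sanity check of the startup numbers above, here is a minimal Rust sketch that simply adds them up. The `Startup` struct, its field names, and `saved_ms` are made up for illustration; only the millisecond figures come from the measurements quoted above. The exact difference works out to 16 ms, in the same ballpark as the 20ms mentioned.

```rust
/// Startup cost of one build, in milliseconds (figures quoted above).
/// Hypothetical type, for illustration only.
struct Startup {
    compile_ms: u32, // bytecode compilation (0 on main, which has no compile step)
    eval_ms: u32,    // evaluation, including the ~95 ms of syntax loading
}

impl Startup {
    fn total_ms(&self) -> u32 {
        self.compile_ms + self.eval_ms
    }
}

/// Milliseconds saved by `new` relative to `old` (0 if `new` is slower).
fn saved_ms(old: &Startup, new: &Startup) -> u32 {
    old.total_ms().saturating_sub(new.total_ms())
}

fn main() {
    let main_branch = Startup { compile_ms: 0, eval_ms: 258 };
    let vm = Startup { compile_ms: 143, eval_ms: 99 };
    println!(
        "main: {} ms, VM: {} ms, saved: {} ms",
        main_branch.total_ms(),
        vm.total_ms(),
        saved_ms(&main_branch, &vm)
    );
}
```

Since the syntax loading (95 ms) is paid on both sides, it cancels out of the difference.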
While I am thoroughly disappointed, it's just not worth it, compilation is too slow to make the gains meaningful, making more things parallel and/or faster would be much more interesting.
I would go even as far as to say that if I converted my doc to use native tables (instead of tablex), it would likely be slower since most of the eval code would be gone...
@left night I don't know what you make of it, but I think it's okay to sunset the VM now before I waste more time on it.
Now you might be wondering: "but what about incremental", the truth is the VM is slower because it has to re-compile, it adds ~250 ms to every incremental compile (because module compilation isn't memoized) but even if it was, the gains would be negligible and that's the truth.
And I'd add that module eval is still memoized, so most of the work isn't being re-done.
Before Typst was parallelized, the VM made a lot of sense, but not anymore and that's okay
i think it's a possibility that should stay open for the future
but that doesnt mean that you have to keep pushing it
someone else could pick it up in the future
make sure to share everything you have now so interested folks can hack on it
haha
How about energy usage? I think that's also a consideration.
Performance on computers with fewer cores might also be relevant
If you have a branch I can send you timings on mine?
I tried decreasing with the new j params and it stays the same
Other than that my branch bytecode-vm2 is where you want to look
Niiiice lovely I'll take a look tomorrow ❤️
could you send both timings on main and on the VM?
That way I can compare compile + eval and just eval
Oh that took a turn
Is compilation multi threaded? Memory usage is also important and definitely worth it
The only gains on memory use is the ability to decrease memoization more easily for eval stuff
And compilation is single threaded
but I don't think that making it multithreaded would give significant gains
Of course, I invite all of you to compare main and the VM with your docs on your PCs and see if there are any gains, maybe my PC just doesn't scale that well and you guys'/gals' will
If so then I may continue (wink wink @tight glade @feral imp @left night @slim sequoia @lunar kettle @silk wedge @glad urchin)
Try free tier oracle, they don't have a lot of cores (4 exactly).
Or else, if you give me exactly what to run, I can try directly since I have a lot of weird and not powerful computers
Basically, I have a branch called bytecode-vm2, running it on a document (for example my thesis), and providing me with the --timings output for both main and the VM would help a lot
Will do this evening. I take it I'm checking out main, running, checking out your branch, running, yeah?
I'll try tonight after I got home, thanks!
yeah once on main and once on the VM should be enough ❤️
Thanks 
(if you can give me a copy of your thesis i'll love you)
$ time ~/conlang/typst/target/release/typst compile main.typ --font-path=fonts/ --timings
error: cannot assign to this expression
    ┌─ @preview/cetz:0.2.2/src/matrix.typ:225:9
    │
225 │         (matrix.at(j), matrix.at(i)) = (matrix.at(i), matrix.at(j))
    │          ^^^^^^^^^^^^
help: error occurred while importing this module
  ┌─ @preview/cetz:0.2.2/src/canvas.typ:1:8
  │
1 │ #import "matrix.typ"
  │         ^^^^^^^^^^^^
help: error occurred while importing this module
  ┌─ @preview/cetz:0.2.2/src/lib.typ:3:8
  │
3 │ #import "canvas.typ": canvas
  │         ^^^^^^^^^^^^
help: error occurred while importing this module
  ┌─ main.typ:5:8
  │
5 │ #import "@preview/cetz:0.2.2"
  │         ^^^^^^^^^^^^^^^^^^^^^
________________________________________________________
Executed in 350.83 millis fish external
usr time 276.30 millis 421.00 micros 275.88 millis
sys time 73.14 millis 77.00 micros 73.07 millis
(this is with my 9n-proto)
At this point, it's quite clear that the overhead of compilation is simply too high to make it worthwhile on frequently changing markup/code. For primarily textual files, an interpreter is much simpler and not really slower. This, to me, seems okay and expected. If the code runs just once, compilation can't really make it go faster, after all. Stuff that runs over and over is where the interpreter is bad, so the question is how much the VM helps here. If the VM can present good speedups for this scenario, it can still be worth it. I think most of the initial eval of your thesis isn't a great way to measure this since it is primarily eval of modules that contain just function definitions and set up, while most of the heavy lifting happens in context blocks. I would expect eval of "set-up" modules to be faster on main since it's mostly skipping the functions itself.
Still, I tried to determine how the timings play out. There is one spot to measure this in thesis: The hexagonal illustration code, which changed from ~100ms on main to ~70ms with the VM. This indicates to me that there are considerable amounts of eval slowness that the VM doesn't yet or can't easily eliminate. I think a big part of it is probably function call overhead, but also simply general runtime costs like method dispatch and operations on core data structures.
It might be worth investigating specifically this piece of code more to see where the performance is lost. I'm not entirely up-to-date on how different constructs are compiled, but I think fairly often there are calls into the "runtime" and perhaps there is just more time spent in the runtime than on tree-walking in the interpreter on main.
All that said, I think there are other weaknesses of the current interpreter. It can be very inefficient in unexpected ways: For instance, if you define a constant dictionary in some function close to where it is used, it will be reallocated over and over again for every call. Preventing such unnecessary costs is a worthwhile endeavour in my view.
Perhaps, the right thing to do here is use the lessons learned from your VM work to incrementally bring improvements to the interpreter? I could, for instance, imagine a mechanism to notice that an expression is entirely constant during a function call and then storing its result in a data structure kept in the function itself. Similarly, we can bring your comemo enabled optimization to main to reduce memory usage. We can also try to add hot paths for functions with few positional arguments to reduce parameter parsing overhead. Cetz, for example, has quite a bit of overhead stemming from core functions that are written with a zero-cost abstraction mindset while Typst abstractions are far from zero cost.
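To make the "constant dictionary gets reallocated on every call" point concrete, here is a small Rust sketch of the idea of storing a constant expression's result in the function itself. This is not Typst's interpreter code: `Func`, `build_dict`, `call_naive`, and `call_cached` are hypothetical names, and a `BTreeMap` stands in for a Typst dictionary. It only shows the caching pattern under those assumptions.

```rust
use std::cell::{Cell, OnceCell};
use std::collections::BTreeMap;

// Stand-in for a dictionary value; hypothetical, not Typst's `Dict`.
type Dict = BTreeMap<&'static str, i64>;

/// Hypothetical user-defined function whose body contains a dictionary
/// literal that never changes between calls.
struct Func {
    /// Result of the constant expression, filled on the first call.
    cached: OnceCell<Dict>,
    /// Counts how often the literal was actually built.
    builds: Cell<u32>,
}

impl Func {
    fn new() -> Self {
        Func { cached: OnceCell::new(), builds: Cell::new(0) }
    }

    /// Evaluates the dictionary literal (allocates a fresh map).
    fn build_dict(&self) -> Dict {
        self.builds.set(self.builds.get() + 1);
        BTreeMap::from([("a", 1), ("b", 2)])
    }

    /// Tree-walking style: the literal is rebuilt on every call.
    fn call_naive(&self, key: &str) -> Option<i64> {
        self.build_dict().get(key).copied()
    }

    /// Cached style: the literal is built once and reused afterwards.
    fn call_cached(&self, key: &str) -> Option<i64> {
        self.cached.get_or_init(|| self.build_dict()).get(key).copied()
    }
}

fn main() {
    let naive = Func::new();
    let cached = Func::new();
    for _ in 0..1000 {
        let _ = naive.call_naive("b");
        let _ = cached.call_cached("b");
    }
    println!("naive builds: {}", naive.builds.get());
    println!("cached builds: {}", cached.builds.get());
}
```

The hard part in a real interpreter is proving that the expression is actually constant (no captured mutable state, no context access); the sketch assumes that analysis has already been done.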
idk what's going on but I can't compile your thesis with your branch @sturdy sequoia it works fine on main tho
you need to bump codly to 0.2.0
it has a bug that isn't caught by main since it happens in a function that isn't called
One thing is that we can mix the VM approach and AST interpretation, but it will become more complex.
I think we can cherry-pick a few nice things of the VM into the AST interpretation (e.g. constant folding) without too much complexity.
ah
it's taking much longer for me :( but might also be because it doesn't converge within 5 iterations anymore
I think a use case for the VM could be for packages only since that's where the slowness is
@left night I'll read your long message over my midday break because I am deep in the weeds of some technical issues
@left night this is what I just referred to. And we can use some simple strategy to let interpreter not compile it to bytecode if a fresh updated module can be always executed faster in AST interpretation, like a durability concept.
We can investigate this, but the results would need to be very good for it to be worth maintaining both an interpreter and a bytecode VM. I think the problem isn't per se that the VM is slower on code that just runs once (meaning we'd need both), but more that it isn't currently sufficiently faster on code that runs multiple times to be worth the complexity.
It's also worth investigating or evaluating how much code we need to change for adding/changing some syntax, since the syntax is likely to change frequently at the current stage.
I personally tend to want to see the VM get merged, as long as it doesn't make things slower, as I believe a VM will eventually be added. But we may also implement a VM in another way, such as using cranelift to build a JIT for typst, in the far future, so we don't have to get it merged right now.
Indeed, function call overhead remains largely unchanged due to Args hashing
I agree with you, I think that doing things like:
- Making args use PicoStr does provide a nice speedup by removing a ton of string hashing, which is beneficial
- Trying to reduce string compares in field accesses (i.e. use PicoStr for field names)
- I think that detecting constants should be doable albeit difficult, we could have a mechanism that tracks dependencies and caches code that has none, this could be done fairly easily since we know what dependencies are in code ((im)mutable accesses, etc.)
- Optimizing function calls (somehow) would also be worthwhile, like you said, optimization around smaller functions could be beneficial
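For readers wondering why an interned string type helps with argument and field names, here is a minimal Rust sketch of string interning in general. It is not PicoStr's actual implementation: `Sym`, `Interner`, and `get_field` are hypothetical names chosen for illustration. The point is only that once names are interned, comparing or hashing them is an integer operation rather than a walk over the string bytes.

```rust
use std::collections::HashMap;

/// Small copyable handle for an interned string (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Sym(u32);

/// Maps each distinct string to one `Sym` and back.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, Sym>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing handle for `s`, or allocates a new one.
    fn intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Sym(self.names.len() as u32);
        self.names.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    /// Looks up the original string for a handle.
    fn resolve(&self, sym: Sym) -> &str {
        &self.names[sym.0 as usize]
    }
}

/// Field lookup that only compares integer handles, never string bytes.
fn get_field(fields: &[(Sym, i64)], name: Sym) -> Option<i64> {
    fields.iter().find(|(k, _)| *k == name).map(|(_, v)| *v)
}

fn main() {
    let mut interner = Interner::default();
    let width = interner.intern("width");
    let height = interner.intern("height");
    let fields = [(width, 10), (height, 20)];

    // Interning the same name again yields the same handle.
    let key = interner.intern("height");
    println!("{} = {:?}", interner.resolve(key), get_field(&fields, key));
}
```

The string is hashed once at interning time (ideally at parse time), so hot paths like argument parsing and field access only ever touch the handle.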
I think overall there are lessons learned and I will slowly open PRs with these lessons brought into main and into the eval rather than a complete VM
Sounds good!
I don't think the optimization value of moving aspects of packages into typst itself should be discounted. Not necessarily the exact functionality, but making it more convenient and faster for them to achieve what they're setting out to do. Instead of relying on very slow hacky pure typst implementations
If a package slows down a document by an order of magnitude, then it's clear that something is wrong
Subfigures, subequations
And to a certain extent, cetz would be a nice thing to handle natively
As a low-level, massively-utilized plug-in, I think so too, this must have a major impact on performance on many typst files.
The problem for whoever takes on the task will be to figure out a new line for what should be rust-side and what should be typst-side
I think all the drawing and anchoring should be rust side, instead of relying on huge chains of array and content joins
Charts and plots could be done in typst for that finer control
the equate package is a big one
it was very slow the last time I tried it
and the functionality is very basic
Regarding memory usage, does comemo use any form of compression? @left night
Same for me. It'll feature in my thesis but only in the final version :p
When I'm content with latex compile speeds
See #1244916604106702849 message
man my memory is failing me
@left night Actually, I think the VM would perform much better if we had types π€
Because I feel like types will slow down eval a lot
where the VM will be able to resolve most at compile time
food for thought
sorry for the ping, I meant to use @silent
π
At least it is a weekday
oof, I feel attacked π
I ping too ππΆπ₯
ok so pretend I'm a noob who has never compiled rust stuff before
I've got both branches
I tried running cargo build -p typst-cli
where did the executable go?
target/debug
ta
and don't forget --release :-p
Otherwise it'll be sloooooooooooooooooooooooooooooooooooooooooooooooooooooow
||yes||
π π
so I'm gonna give you two sets of timings, one with the "final" option enabled on my thesis (equate, cetz, general slow jazz) and one with it disabled (speeeed version)
as @slim sequoia is finding out, the VM does have one major advantage: it finds bugs in published packages
π
like codly 0.1.0
tinger is about to get a lot of pings, sorry D:
π
how many can i expect

next bug is in cetz and I can't be doing that at 10pm
also so many absolute path imports in hydra π
Yeah it's a style any super import looks ugly with relative syntax
I've skimmed the channel a bit and it seems the VM is not being finished?
for sure, though it did make it less easy to transplant into my project, I wonder if it's worth reconsidering that when it comes to writing the definitive style guide
(I don't know what vendoring is)
but if you meant that I copied it wholesale into a subdirectory in my project in order to change some code, then yes
yeah that would be it
was it an unreleased bug or feature? or simply a non-configurable part of the API?
if there's something to be improved please open an issue
When using the vm2, it found some bugs in code that doesn't usually run (and yeah I've made them as issues dw)
I thought if I fixed them one after the other, eventually I would get it to compile on the vm branch, but then error messages from cetz appeared and I made a very high-pitched whimper noise
I hope the code path analysis can be ported to a linter or typst itself in some way
that's really valuable imo
Yeah surely this is the start of a linter right dherse :^)
Tried making a graph today with cetz that has a series with 6k data points and it has been over 5 minutes and is still compiling
Cetz being slower than tikz is quite the surprise. Definitely warrants being moved rust-side
That's gonna be slow in a pdf reader regardless
or a super duper fast VM π
Indeed lol
huh, I didn't think of that
could be a neat project
This evening I'll go through more bugs that your VM picks up
Though in the meantime I wonder if it's possible to silence the error or turn it into a warning?
I can't π
It really doesn't support that
because it needs a register to get the data, if it can't, then it crashes
Not to worry then, was only on the off chance that it was an easy thing to do
I guess I could add a panic instruction for this
oddly, slowness seems to be from when things are outside the axes (i.e., it is much faster to render an entire series than to render only a portion of it)
@proven umbra
The path clipping function can be very very very slow. It should be way faster with cetz from main (or cetz-plot (github.com/cetz-package/cetz-plot)), which both have a fix for that.
BTW @slim sequoia should send you the bug he found: you use a var that isn't declared
in a code path that hasn't been tested I assume
Was an illegal assignment, I'll make a GH issue this evening
ah my bad
Well actually don't
I just had a look at it now that it's not midnight
it's a VM bug
Ooooo
There are sometimes issues with destructure into arrays/maps
But it doesn't matter since the VM is ||probably|| dead, long live the eval
did somebody say, the VM as a plugin? π
Lmao
Noo
So yeah, item of note for @left night: there is a significant performance regression on The Thesis™ when compiling using context, all of the changes are available on my GH (Dherse/masterproef), along with it using the latest version of codly
(I did isolate the performance regression to the use of context in glossary.typ at line 38, it may not appear like a regression because I did another change that speeds things up, but with that change locate is even faster-er too, and the regression is still about the same ~0.5s on my machine in any case)
@left night The plot thickens, with the VM, it's 0.8s faster

All of a sudden, the VM provides a bigger performance boost
what in heaven's name
mind you, 1.4s COLD COMPILE TIME
(in SVG output)
1.7s in PDF output
that's kind of insane
This here is really weird
where in the code? (i.e typst file)
the function calls are equally expensive
but context just somehow takes much longer
/common/glossary.typ:38
weird
the whole later iterations just take much longer for some reason
this is really odd, because the code for calling the function is basically the same (outside of the location arg)
on a plain test file with
= Hello
#let c = context query(heading).first().body
#for _ in range(10) [
#c \
]
context is faster
(a bit)
yeah, it's just the same. as locate also provides context.
With the VM, I don't see that
the only difference is that query() doesn't access the context if the loc is provided
so might be worth testing whether keeping locate but removing that arg changes things
On my end, things look fine even on main
Looks like it might be some time taking issue on macOS?
I must admit, seeing the VM almost 30% faster makes me happy
it's not everywhere


