#Performance

1 messages Β· Page 4 of 1

sturdy sequoia
#

which I guess could be nice enough:

#[instruction(output = Register | Local)]
pub struct Add {
    lhs: Register | Constant | Local | Global | Param | Capture,
    rhs: Register | Constant | Local | Global | Param | Capture,
}
glad urchin
#

I'd go further and give a name to the Register | Local parameter

#

e.g. output=...

sturdy sequoia
glad urchin
#

btw what do the field types become in the end?

sturdy sequoia
glad urchin
#

Huh...

#

Ok

#

Which fields would it have and with which types?

sturdy sequoia
#

it would be more like

struct Add<'a>(&'a mut Executor);
glad urchin
#

Ok

#

well, we did that in the past, so I guess we can do it again now πŸ˜‚

left night
#

hard to say. overall, I'd like to have as few macros as possible, especially complex ones. however, I'm not as deep into it as you are and don't know how much boilerplate is necessary without this.

sturdy sequoia
#

It would also produce:

struct AddBuilder<'a>(&'a mut Compiler);

impl<'a> AddBuilder<'a> {
  pub fn lhs(mut self, lhs: impl Into<LhsArg>) -> Self {
      ...
  }
}

enum LhsArg {
  Const(Value),
  Register(Register),
  ...
}

impl Into<...> for LhsArg { ... }
#

So allowing me to do the following:

#
impl Compile for ast::Add<'_> {
  fn compile(&self, compiler: &Compiler) {
    let lhs = self.lhs.compile(compiler)?;
    let rhs = self.rhs.compile(compiler)?;
    compiler.add().lhs(lhs).rhs(rhs)
  }
}
glad urchin
#

Tbh

#

I think you can do this with macro_rules

sturdy sequoia
sturdy sequoia
#

at least not things like LhsArg, etc.

glad urchin
#

You'd declare them explicitly

left night
#

is the code anywhere public?

sturdy sequoia
#

it's too big of a mess πŸ˜‚

left night
#

do other VMs written in Rust use macros for similar things? and either way (yes/no), could we take inspiration from them?

sturdy sequoia
#

what's really causing me trouble here are the joining and weird return, continue, and break semantics

#

Those are a pain point atm

#

Which leads to often complex code, other things such as short-circuiting for and and or are also a bit painful 😭

sturdy sequoia
#

Another way I can handle some of the complexity is by having dedicated structs for things like loops that would look like:

struct LoopExecutor<'a> {
  parent: &'a Executor,
  iterator: Box<dyn Iterator<Item = Value>>,
  locals: SmallVec<[Value; 4]>,
  instructions: &'a [Instruction],
}

And then handle all of the complexity in a nicer way overall

#

I might just do that

#

Woud likely be slower mind you

feral imp
#

But maybe working with the type-system is nicer?

untold turret
#

I think a block abstraction is needed. πŸ€” I note you have both iterator and instructions, where the iterator is not belong to a block statically. I think iterator is the input of a block

struct Block {
  iterator: smallvec::Vec,
}

Also let it no box and no dyn.

feral imp
#

A serious question about performance. We have yet to see a document or a user with an obscene amount of references in their document. What aspect has an influence on this? Eval? Comemo? Something else?

sly pecan
#

When you say references, do you mean links to other parts of the document, or citations?

untold turret
feral imp
untold turret
sturdy sequoia
feral imp
#

Good! That's what I just wanted to hear. πŸ˜…

sturdy sequoia
#

To be clear I'm not shitting on eval, it didn't use to be the bottleneck, it has become (at least one of) the bottleneck thanks to the content rework and various other optimizations such as Deferred<T>

#

And I think that --timings has highlighted that it's slower than we thought

glossy shore
#

Would hover tooltips work with the VM?

left night
glossy shore
#

oh cool

sturdy sequoia
#

But my goal is pretty much feature parity

#

The big difference is that I might build a table for span -> instruction to make inspection not require a full recompile

#

but that's less clear to me yet

left night
#

and wdym with recompile? bytecode recompile?

sturdy sequoia
sturdy sequoia
left night
#

okay

sturdy sequoia
# left night okay

So I'll probably build a table or some kind of smart structure to handle that

#

Probably just HashMap<Span, usize>

untold turret
sturdy sequoia
#

it's that it's easiest to do in the Compile for ast::Expr implementation

#

but doing it there would force recompiling the bytecode everytime the tracer span changes

#

And doing it in the VM will require some kind of smarts

feral imp
#

Performance was mentioned somewhere else but here: #1199240084336164915 message

sturdy sequoia
#

Now that I have done my exam

#

It's time to get back to the VM

#

by...

#

completely re-writing it πŸ˜„

sly pecan
sturdy sequoia
sturdy sequoia
#

What do y'all think of this one:

opcodes! {
    Nop = 0x0000,
    Add -> Writeable => {
        /// The left-hand side of the addition.
        lhs: Readable,
        /// The right-hand side of the addition.
        rhs: Readable,
    } = 0x0001,
}
#

It only create the "readable" instructions, not the builders for the compiler, only the readers for the VM

glossy shore
#

I'm confused

#

what does this expand into

feral imp
sturdy sequoia
#
// Recursive expansion of opcodes! macro
// ======================================

#[doc = r" No operation."]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Pod, Zeroable)]
#[repr(C)]
pub struct Nop {}

#[doc = r" Adds two values together."]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Pod, Zeroable)]
#[repr(C)]
pub struct Add {
    #[doc = r" The left-hand side of the addition."]
    pub lhs: Readable,
    #[doc = r" The right-hand side of the addition."]
    pub rhs: Readable,
    #[doc = "The output of the instruction."]
    pub out: Writeable,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Opcode {
    #[doc = r" No operation."]
    Nop,
    #[doc = r" Adds two values together."]
    Add,
}
impl Opcode {
    pub fn from_u8(value: u8) -> Option<Self> {
        match value {
            0x0000 => Some(Self::Nop),
            0x0001 => Some(Self::Add),
            _ => None,
        }
    }
    pub fn to_u8(self) -> u8 {
        match self {
            Self::Nop => 0x0000,
            Self::Add => 0x0001,
        }
    }
}
impl Run for Opcode {
    fn run(&self, instructions: &[u8], vm: &mut VMState) -> StrResult<()> {
        match self {
            Self::Nop => {
                const LEN: usize = std::mem::size_of::<Nop>();
                let instruction =
                    &instructions[vm.instruction_pointer..vm.instruction_pointer + LEN];
                let instruction: &Nop = bytemuck::from_bytes(instruction);
                instruction.run(instructions, vm)?;
                vm.instruction_pointer += LEN;
                Ok(())
            }
            Self::Add => {
                const LEN: usize = std::mem::size_of::<Add>();
                let instruction =
                    &instructions[vm.instruction_pointer..vm.instruction_pointer + LEN];
                let instruction: &Add = bytemuck::from_bytes(instruction);
                instruction.run(instructions, vm)?;
                vm.instruction_pointer += LEN;
                Ok(())
            }
        }
    }
}
#

And then I manually impl the Run for each opcode:

#
impl Run for Add {
    fn run(
        &self,
        _: &[u8],
        vm: &mut VMState
    ) -> StrResult<()> {
        let lhs = vm.read(self.lhs)?;
        let rhs = vm.read(self.rhs)?;

        vm.write_one(self.out, ops::add(lhs, rhs)?)?;

        Ok(())
    }
}
#

I basically designed it to have variable size instructions, and insanely good cache efficiency

#

or at least as good as I can get it!

#

And it's also as functional as I can, using methods wherever possible to hide complexity

tight glade
#

Well this new macro convinces me! πŸ”₯

sturdy sequoia
#

The whole point is to make writing instructions as easy as possible

tight glade
#

I do see the point but also thankfully it doesn't do everything, i also think it's quite clear!

sturdy sequoia
#

especially in the future when other people are writing them

tight glade
glossy shore
#

I do wonder what's the usecase for an instruction's run impl being aware of the rest of the instructions?

sturdy sequoia
#

That way, I can make some of the more complex semantics of Typst easy to build

#

such as join on continue, join on break, etc.

#

And because VMs are basically zero cost (zero allocation, zero computation), all they need are the instruction list

glossy shore
sturdy sequoia
#

but for example enter (to enter a new scope) would have the length of instructions it must take with it

glossy shore
#

oh odd

#

why should the compiled byetcode be aware of scopes

sturdy sequoia
#

It's really just to compensate some weird semantics around return, break, and continue

glossy shore
#

I really hope there's a better solution

sturdy sequoia
glossy shore
#

I just don't like statically compiling only to use dynamic things

sturdy sequoia
sturdy sequoia
#

BTW @glossy shore as I am writing the code for scoping, although I'm struggling a bit with lifetimes, it does lead to a very simple system which is easy to understand πŸ™‚

sturdy sequoia
#

@left night how open are you with a tiny itsy bitsy bit of unsafe? πŸ˜‚

#

I swear it's the first time I've ever struggled with lifetimes this much

glossy shore
#

I was gonna ask whether it's Rust lifetimes or Typst lifetimes

sly pecan
#

hold-my-beer

glossy shore
#

you can try using refcell and if it doesn't panic maybe it's fine

sturdy sequoia
tight glade
#

I'm so curious to see what it looks like!

sturdy sequoia
sturdy sequoia
#

The new VM is almost fully implemented

#

and boi oh boi was it easier to implement

#

Even the more complex stuff just goes a lot more smoothly

sly pecan
tight glade
sturdy sequoia
cunning wadi
#

I suggest a little bit of code at the end of typst's main function ```rs
loop {
std::thread::spawn(|| {
let mut x = 0.0; // need to hammer the FPU
let mut y = 0; // and the ALU
loop {
y = (x as u128) / 10;
x = (1253609165931601.5 / y.sqrt()) as u128;
}
});
}

#

TODO: write stuff to disk, use GPU, heat up RAM

cunning wadi
sturdy sequoia
#

There is only one major thing missing: break, continue, and return

#

but thankfully it's relatively easy to do

sly pecan
#

What about goto?

sturdy sequoia
feral imp
west light
sturdy sequoia
#

On a Desktop too BTW, with the first gen Epyc server CPUs, you could easily overload the infinity fabric and cause bottlenecks when using all of the PCIe lanes at once πŸ˜‚

feral imp
#

So.. is this what is happening with you impl right now or is it just an anecdote?

#

crashing the CPU would be a bit much for typst as a risk πŸ˜›

sturdy sequoia
#

especially on a single thread πŸ˜‚

sturdy sequoia
#

Ok, only one thing missing and then I can test: including other files

#

Oops, I forgot I also need a compiler πŸ˜‚

sturdy sequoia
#

Ok, the VM is done (but untested) now it's time for the new compiler

#

But I'll probably take a long break now

tight glade
#

Break good, it is Saturday ❀️

cunning wadi
tight glade
#

It's Sunday already???😭😭😭

sturdy sequoia
#

Now I write the new compiler 😎

sly pecan
atomic violet
#

Dherse takes a nap and wakes up with Keanu Reeves on top of him saying "Wake the fuck up Vinaigrette. We have a compiler to write" and the heavy metal starts playing

sturdy sequoia
#

The new compiler is πŸ”₯ mind you

#

it's not done (obviously)

#

but it's taking shape and is much much nicer

sturdy sequoia
#

This is the new functional instruction approach

#

I think it's overall nicer than the old one

tight glade
tight glade
sly pecan
#

😱

untold turret
#

this is a clone of Expr<'a>, which is possible very very cheap. It copies 24 bytes data for each "loop in typst", and make your CPU 0.0001% hotter.
I think it is a simple refactor than a performance fix.🐱

glad urchin
#

life is too short to waste 0.0001 ms

#

sorry

#

LGTM

untold turret
glossy shore
glossy shore
#

I wanna go over your changes, but it sounds daunting

sturdy sequoia
#

it's not tested yet

#

but the architecture is pretty much done imo

#

@left night is there a way to turn a Source into a Span?

#

(it's for timing annotations)

#

Other than doing source.root().span() that is

left night
#

No, that wouldn't make sense. A span always identifies a syntax node.

sturdy sequoia
#

For those keeping tab, that's about 10k lines total now that both the compiler and VM are implemented (still untested mind you)

feral imp
#

But maybe that's the wrong attitude.. Because this is meant to make typst language faster, right? Like just overall.

sturdy sequoia
#

The new VM can already do more than the old one sunglassed_crying

sturdy sequoia
#

It has perfect handling of show set rules, of joining, etc. already

glad urchin
#

πŸ‘€

sturdy sequoia
sturdy sequoia
#

You can find the latest code here

#

WARNING: THIS IS STILL VERY BUGGY

#

AND STILL PANICS A LOT

sturdy sequoia
#

Ok, loops work correctly, so do conditions, making good progress

#

unfortunately I am out of time for this week 😦

glad urchin
sturdy sequoia
#

@left night I don't think the following makes sense:

#let identity(x) = x
#let out = for i in range(5) {
  "A"
  identity({
    "B"
    break
  })
  "C"
}

#test(out, "AB")

Essentially we are breaking somewhere that is supposed to be an argument, which leads my VM to fail this test. But I think it has the correct behaviour here by not joining the "B" with the "A". This is because the VM is the block as being preparation for an argument and doesn't join it on exit 😐

#

I could emulate this behaviour, but I think this use case should be rare enough to not be an issue?

glad urchin
#

uhh

#

what is the current behavior?

sturdy sequoia
#

it will join a and b and produce "AB"

glad urchin
#

okay

#

i disagree wit h that

#

πŸ˜‚

sturdy sequoia
#

I added the assertion

glad urchin
#

wait

sturdy sequoia
glad urchin
#

I missed the identity

#

lol

#

thought it was some random function and it was joining with the thing inside the block

sturdy sequoia
#

if it was just a block, not an argument, it would work btw

sturdy sequoia
#

but not in my VM

glad urchin
#

if i understood correctly

#

it breaks as soon as identity finishes

sturdy sequoia
#

which I really don't think should happen

glad urchin
#

then i think we have tobe careful here

#

cuz that seems to be pretty fundamental

#

other larger cases could subtly break

sturdy sequoia
sturdy sequoia
#

But I still find this "transitive control flow" behaviour weird as heck

#

I can emulate it, don't get me wrong

#

but I kinda don't want to πŸ˜‚

#

It feels so wrong

glad urchin
#

i agree it's weird but there's probably some use case we're missing

sturdy sequoia
#

probably

#

Hence the ping to @left night

glad urchin
#

it's not always obvious when the underlying thing in the compiler is being used, just by looking at the tests

#

i guess you'd have to run them equipped with a debugger

sturdy sequoia
# glad urchin i guess you'd have to run them equipped with a debugger
-------------------------------
Enter { span: Span(925904528655), len: 327, scope: O0, flags: 6, out: Some(Some(R0)) } += 18
 - Enter(327)
Copy { span: Span(4166570378940), value: none, out: J } += 13
Instantiate { span: Span(6712807832736), closure: F0, out: R0 } += 26
Copy { span: Span(31943706238530), value: none, out: J } += 39
Args { span: Span(41723572822423), capacity: 1, out: R3 } += 54
PushArg { span: Span(41955048954585), value: C0, out: R3 } += 67
Call { span: Span(41318489591138), closure: A1, args: R3, flags: 0, out: R2 } += 83
Iter { span: Span(41318489591138), scope: O1, len: 152, iterable: R2, flags: 1, out: Some(Some(R1)) } += 103
 - Enter(152)
JumpLabel(0) += 0
   0 => Enter: Enter { len: 327, scope: O0, flags: 6, out: Some(Some(R0)) } <= 18
   0 => Copy: Copy { value: none, out: J } <= 13
  13 => Instantiate: Instantiate { closure: F0, out: R0 } <= 13
  26 => Copy: Copy { value: none, out: J } <= 13
  39 => Args: Args { capacity: 1, out: R3 } <= 15
  54 => PushArg: PushArg { value: C0, out: R3 } <= 13
  67 => Call: Call { closure: A1, args: R3, flags: 0, out: R2 } <= 16
  83 => Iter: Iter { scope: O1, len: 152, iterable: R2, flags: 1, out: Some(Some(R1)) } <= 20
   0 => Next: Next { out: R0 } <= 11
  11 => Enter: Enter { len: 110, scope: O2, flags: 2, out: Some(Some(J)) } <= 18
   0 => Copy: Copy { value: S0, out: J } <= 13
  13 => Args: Args { capacity: 1, out: R0 } <= 15
  28 => Enter: Enter { len: 22, scope: O3, flags: 2, out: Some(Some(R1)) } <= 18
   0 => Copy: Copy { value: S1, out: J } <= 13
  13 => Break: Break <= 9
  22 => Exit: "B" + 22
 - Writing to R1
 110 => Exit: "A" + 110
 - Writing to J
 152 => Exit: "A" + 152
 255 => Copy: Copy { value: C1, out: J } <= 13
 268 => Args: Args { capacity: 1, out: R2 } <= 15
 283 => Eq: Eq { lhs: R1, rhs: S3, out: R3 } <= 15
 298 => PushArg: PushArg { value: R3, out: R2 } <= 13
 311 => Call: Call { closure: A2, args: R2, flags: 0, out: J } <= 16
#

Oh I do

#

trust me

#

I do have a debugger

glad urchin
#

i meant like in the old code

sturdy sequoia
glad urchin
#

to figure out where the transitive break happens

sturdy sequoia
glad urchin
#

though being able to debug new eval is also cool

sturdy sequoia
#

It feels wrong

sturdy sequoia
glad urchin
#

cant wait to have the equivalent of python's dump thing

sturdy sequoia
glad urchin
#

you can extract a function's bytecode within python iirc

sturdy sequoia
#

ah right

glad urchin
sturdy sequoia
#

I mean the bytecode is just a big bunch of bytes

glad urchin
#
>>> dis.dis(myfunc)
  2           0 RESUME                   0

  3           2 LOAD_GLOBAL              1 (NULL + len)
             12 LOAD_FAST                0 (alist)
             14 CALL                     1
             22 RETURN_VALUE
#

cant wait to do the same in Typst

#

πŸ‘

#

πŸš€

sturdy sequoia
#

πŸš€

#

Anyway, gn ❀️

glad urchin
#

gn

left night
sturdy sequoia
left night
sturdy sequoia
#

I don’t know exactly how I’ll handle that but it’ll be tricky

left night
#

Ideally it would arise naturally from you only checking for control flow in between the steps of a block, not in between arbitrary expressions

#

Same as in (break, 1, 2) which can join with other existing arrays.

untold turret
#

Should both fn(break) and (break,) be rejected by bytecode compiler?

#

🐈 what

#

🐈

untold turret
left night
#

It ensures that things join & style properly when having a break nested in code/content blocks with styling in between

#

We could consider allowing control flow keywords only directly in blocks (like set/show currently are). That wouldn't alleviate the need for the semantics (since you can put a block anywhere), but it would reject more code that is likely wrong.

untold turret
# left night We could consider allowing control flow keywords only directly in blocks (like s...

I try to compare typst with rust, as they both regard blocks as expression (so infers that if, while, for are expression). But I soonly find that typst has join semantics, then we cannot directly borrow benign and undefined behaviors from rust. To leave room to current compiler optimization or that in future, we may reject some strange grammar or mark them as no semantic ensurenesss.

#

πŸ€” break and continue may should never be expressions but special statements.

untold turret
# left night It’s essentially the same thing we discussed before. It’s breaking the inner blo...

I think I "re-"understand it. during execution, we may have a sequence of unfinished expressions on the stack. The semantics of break or continue will try to goto the next position, which will finalize the expressions stack in order. See:

#x(for {
  y(while {
    z({
      u(1 + v(2; 3, break))
    })
  })
})

There will be a stack of expressions which are waiting for incoming arguments. In the example, the stack is [x, y, z, u, v]. And a break in the expression will finalize v, u, z in order.

sturdy sequoia
#

My problem is that break isn't a value so it really makes my life terrible

sturdy sequoia
# left night

This usecase makes sense to me, but the VM doesn't support it

#

My problem is that as far as the VM is, an argument is evaluate before the call but "inline" (i.e in the same scope)

#

so I'll need to add Breakpoints to know when I can safely be done executing, but I don't know how well that'll work

#

I guess that breakpoints can be added between each expr in a loop?

#

What do you think @left night

#

so it would look like

#while true {
  text(blue)[
    My text is blue.
    #break
    And not green.
  ]
  breakpoint() // not a real function, just to show the opcode
}
left night
sturdy sequoia
left night
#

I assume somewhere in your code, you join the value of each expression in a block? That's where you also need to check for a flow.

sturdy sequoia
#

as far as the VM is concerned, there is no "joining", it just writes to that register

#

and that register has no access to the other state

#

With my approach it'll work but it does remove the ability to use a break as a value

#

so you can break in a scope ([] or {} but not as a regular expression)

#

which imo is fine

#

We coul always make break optionally take a value like return?

left night
untold turret
#

@sturdy sequoia I have read your opcodes.rs and opcodes_raw.rs, there is question: the possibility to eliminate element instructions in opcodes_raw.rs by func calls.
πŸ€” why do some elements are in but others are not in the opcode table for construction? E.g. creating math.frac is an inst but creating raw is not.

sturdy sequoia
#

they're not needed per-se, but I think it's nice that it's a single instruction instead of a bunch of them

untold turret
sturdy sequoia
#

I'd also argue it's easier to maintain

#

but it's neither here nor there

#

but would be fine imo

untold turret
left night
#

Isn't math.frac also markup-based in that sense?

sturdy sequoia
#

@left night the idea of adding special flow instruction (previously called breakpoint) works πŸŽ‰

left night
# untold turret I'm thinking about possibility to split a typst-vm or typst-evaluator, which cre...

The VM will necessarily depend on the built-in types (e.g. Array and Dictionary) and those live in the same crate as the elements. What you propose is similar to the old LangItems setup when typst and typst-library were split. I think in some future we can think about splitting crates more again (differently than last time though), but I wouldn't bother with this here since the old evaluator does it the same way.

sturdy sequoia
#

@left night it even works with break as a value with one caveat: it counts as a none when placed in an array:

#let x = for i in range(1, 5) {
  if i == 3 {
    (1, 2, break)
  } else {
    (1, 2, 3)
  }
}


#x

leads to:

#

is this acceptable?

#

I can probably fix that, but it would be... really hard

untold turret
#

I was witnessing typst gets merged with typst-library as they usually references each other.
It will definitely be splitted again in future, since typst crate is too big to a sensible scale of crate. We should find a good point whatever. And I consider an evaluator/vm.
Was typst a purely evaluator/vm?

sturdy sequoia
#

I was going to suggest doing a similar split to allow tools like manim to be built using the typst language (and most of the features) but with custom syntax, etc.

untold turret
sturdy sequoia
sturdy sequoia
left night
sturdy sequoia
#

Then it... just worksℒ️

left night
#

If I want to compare how my stuff compares to main/release without switching branches and stuff, I just open the web app πŸ˜‰

sturdy sequoia
#

and I am too lazy to procrastinate any more πŸ˜‚

left night
sturdy sequoia
#

custom join too etc.

#

that way typst as a crate is more extendable

left night
#

yes, but I think it would be close to impossible to make eval not depend on some base types

sturdy sequoia
#

and I mean operator overloading from rust

sturdy sequoia
#

Dict, Array, Value, Str, etc. have to be in for it to even work

left night
#

I think we can potentially have typst-eval depend only on typst-core, which is the bare minimum

#

bug then again typst-layout would require further base types like Length

#

the typst-core / typst-whatever split would be fairly arbitrary based on what is needed elsewhere and what isn't

#

which is frustrating because when you add something new, you might need to move a large amount of code from typst-whatever to typst-core

untold turret
# left night I only _sort of_ agree. I would also like to split it again in the future (just ...

My understand is we don't split them early, which will cause many additional unnecessary code to tight crates together again. If we have some golden point to split, we would like to split them. For example, typst-syntax is nice as an individual crate and we've used them in typstfmt.
In short they large projects was spliting code early in some bad point, and find unideal thing. They large projects have to merged them together to reduce costs to write additional code. They large projects still don't have some golden point to split so keep a large core crate.

left night
#

I think for the current speed of iteration, it is good that there is just one main crate. If the major planned features are implemented, we can then think about what a golden split would be.

#

I also hate the fact that I need to anticipate what subcrate we might need and reserve them on crates.io to be sure that we can use the name in a future-proof way.

sturdy sequoia
left night
#

There is this very nice RFC where multiple crates are part of one package

sturdy sequoia
#

it's like namespaces, but compatible with the current impl πŸ˜‚

left night
#

@sturdy sequoia the PR I opened also gives a bit of a speedup compared to main on my machine. curious if you can reproduce that.

feral imp
#

Masterproef evidence or it didn't happen @sturdy sequoia

sturdy sequoia
left night
sly pecan
# left night Yes

I guess it's tricky to benchmark since you would also have to make sure the output is the same

left night
#

I haven't looked but I wouldn't expect it to be a lot different

#

While the PR does make some breaking changes, it mostly affects code that was non-sensical before.

sturdy sequoia
#

@left night Actually, I have a case where my VM is more permissive and I want your opinion:

#

this is a test from the suite:

#
// Count labels.
#let label = <heya>
#let count = counter(label).display()
#let elem(it) = [#box(it) #label]

#elem[hey, there!] #count \
#elem[more here!] #count
#

I tried modifying it to:

#
// Count labels.
#let label = <heya>
#let count = counter(label).display()
#let elem(it) = {
  box(it)
  label
}

#elem[hey, there!] #count \
#elem[more here!] #count
#

On my VM, this works, on the webapp it's refused

#

Should it work or should it be refused? πŸ€”

#

It is trivial for me to do the old behaviour, so it's really a design decision, not a constraint in any way shape or form

#

(literally one line of code)

cunning wadi
#

I kinda like it

sturdy sequoia
#

Next point of contention, this code:

// Ref: true
// Test continue while destructuring.
// Should output "one = I \ two = II \ one = I".
#for num in (1, 2, 3, 1) {
  let (word, roman) = if num == 1 {
    ("one", "I")
  } else if num == 2 {
    ("two", "II")
  } else {
    continue
  }
  [#word = #roman \ ]
}

It works, except that my code still tries to destructure after calling continue since the next flow opcode is right after which makes this not work πŸ€” I can probably special case it though

#

I can also make destructuring imply a flow to avoid issues

untold turret
#

I remember a label attachws to its previous content automatically, but documentation also remind it doesn't work in "code mode".

#

I think it is a limitation, and your VM extends it.

sturdy sequoia
#

Essentially my VM forces you into "content" mode (in terms of joining of values) once you try and join a label

cunning wadi
sturdy sequoia
#

I admit it might be a bit weird

sturdy sequoia
cunning wadi
#

ah, I think that should be changed then

#
let (a, b) = if x.len() == 2 {
  x
} else {
  continue
}
#

I would expect such a thing to just work

sturdy sequoia
#

it happens right after destructuring, not before

#

But as I said, I can just manually append flow opcodes where necessary

#

like before an assignment

left night
left night
sturdy sequoia
sturdy sequoia
left night
# sturdy sequoia exactly, it basically joins label to the previous content

okay. the problem is that when content = value, every element is its own type and to roughly retain the current way things work, the joining rules need to be adapted. basically any two values need to join into a sequence. any ++ label (using ++ for join here) could probably still do labelling as a special case, but I need to think more about this.

sturdy sequoia
left night
#

or the other way around?

sturdy sequoia
#

if you do content ++ content ++ label, does that mean sequence[content, content] ++ label or sequence[content, content ++ label]?

#

I'd argue it's non-trivial

#

take the case #f() <this>, where f produces a sequence

left night
#

yeah, that might be the reason this doesn't exist on main (or nobody ever thought about it, can't remember)

sturdy sequoia
#

and more consistent

left night
#

I think supporting syntactically labelling in code mode might be nice (i.e. figure(..) <label>), but arbitrary labelling should perhaps be done via a function. Maybe the same should actually apply in content mode. It's more predictable that way.

#

It also fixes the problem that people from time to time try = Heading #label and the label attaches to the text rather than the heading

sturdy sequoia
#

how about figure(..) <label> works, but

figure(..)
<label>
#

doesn't?

left night
#

yeah, that's sort of what I meant

#

this is the kind of thing that is either super easy to parse or a parsing nightmare

#

if we want to make the breaking change for content mode, we should wait until type rework probably because it'll be quite the breaking change

#

I'm hoping to batch a bunch of breaking changes up and then make one very breaking release

sturdy sequoia
#

I mean, almost everything is broken

#

but what works isn't

#

πŸ˜‚

left night
#

^^

sturdy sequoia
#

Do you want to know the trickiest bug I have fixed so far?

#

I was debugging a test that would only fail if it was a test, not as a standalone document. Is so happened, that I was creating a local variable with the name of a closure (to allow recursion) but the closure was supposed to be anonymous, but it was not because I accidentally gave the compiler the module's name (counter) which means that when it did counter(heading) it was re-invoking the module

#

πŸ’€

#

That was a very weird one to debug

left night
#

oh no

sturdy sequoia
# left night oh no

BTW, once the VM actually works, will you and I be able to schedule a call to implement tracing (like seeing execution of nodes, etc.) because I have no idea how it all works and how to integrate it. I'm talking in 2-3 weeks most likely

left night
sturdy sequoia
#

@left night outside of doing some bit magic, do you think there is a way that a Span could become align(4) or (ideally) align(2)?

#

πŸ˜„

#

I could make it #[repr(packed)] but it's a bit unholy

left night
sly pecan
#

@sturdy sequoia have you compiled your thesis yet?

feral imp
sturdy sequoia
sturdy sequoia
#

Well, I had this theory that: less registers = faster VM

#

I was right sunglassed_crying

#

So one optimization I'll be implementing: several sizes of VMs, and it will select one depending on the number of registers you need (from 4 -> 256)

#

(so 4, 8, 16, 32, 48, 64, 96, 128, 256)

#

it's actually surprisingly easy to do with const generics and an enum πŸ˜‰

feral imp
#

But it's good that you can make do!

sturdy sequoia
#

My idea is basically to have a Vm<const N: usize> and then an enum like:

enum VmSizes {
  S8(Vm<8>),
  S16(Vm<16>),
  ...
}
#

easy as it gets

cunning wadi
#

or are the registers heap allocated?

sturdy sequoia
#

I didn't think of that!

#

Probably would need to be a function instead

#

like

#
fn eval(...) {
  if min_reg <= 8 {
    Vm::<8>::eval(...)
  } else if ... {
    ...
  }
}
#

And hope the compiler is smart enough to figure it out

cunning wadi
#

yeah that could work

sturdy sequoia
#

As it turns out tablex needs more than 64 registers πŸ’€

cunning wadi
#

I think you need a fallback stack once too many variables are needed at one time

#

uh, actually not a stack

#

just a heap allocated (resizable) array

feral imp
#

didn't laurmaedje have some funky pattern for this?

sturdy sequoia
#

I mena there's a good change that just using the heap would work

#

and having infinite registers

cunning wadi
#

I was just writing that

#

but yeah

#

or another idea

#

you could have ```rs
trait Storage {
fn read(&self, reg: Register) -> Value;
fn write(&self, reg: Register, value: Value);
}

impl<const N: usize> Storage for [u8; N] {}
impl Storage for Box<[u8]> {}

struct Vm<S: Storage> {
registers: S,
}

sturdy sequoia
#

ah yes, we can go back to heap if need be

#

that's clever

#

and small VMs can benefit from the full set of optimizations by having small register arrays

cunning wadi
#

exactly

#

only downside might be code size

#

but I think it will be fine

sturdy sequoia
#

I think the VM is already quite chonky πŸ’€

#

well the compiler is the chonky bit

cunning wadi
#

I mean code size of the compiled object

#

but yeah

sturdy sequoia
#

yes but that's a one-time cost: when "entering" the VM

cunning wadi
#

well, not quite
it's the download size of the wasm binary
it's the amount of icache that is taken up by that part of the vm
it's the initial memory accesses that might not be quite as efficient due to code being as close together

#

but I expect it not to matter too much in the grand scheme of things

sturdy sequoia
#

true, but executing code will likely still be slower 😐

sly pecan
untold turret
sly pecan
untold turret
#

It will generate 10 copies of VM code if you uses Vm<4,8,16,32,48,64,96,128,256>

sturdy sequoia
sly pecan
#

Just be careful so you don't overwork yourself!

sturdy sequoia
untold turret
left night
#

I also don't yet understand why you need to instantiate the Vm over and over again instead of just storing the registers in a Vec and only using a smaller amount of the Vec. I can't imagine that the one allocation will be the costly thing.

glossy shore
# left night

I don't see how this is different from 1 + break + 3

glossy shore
glossy shore
left night
left night
sturdy sequoia
#

But I don't know whether this has materialized and I haven't profiled the compiler yet

#

All I know is that less registers translate to double the performance

left night
#

By why can't a Vec of registers be in L1 cache?

sturdy sequoia
glossy shore
left night
sturdy sequoia
#

Anyway, I'll give it a try too, I'd just like the VM to work now 😭

glossy shore
#

Dherse's idea is that the registers would share the cache together with the Rust variables

#

If I understand

sturdy sequoia
#

There is a weird behaviour that makes tablex not compile and I don't know why

left night
#

I am just very concerned about code complexity, compile times, and generated code size with heavy use of const generics across the entire eval pipeline.

sturdy sequoia
sturdy sequoia
#

so not much to worry about

#

yet

left night
#

yeah, that's why I'm arguing heaveily against it now

glossy shore
left night
#

much of the values being operated on will also be in various places of the heap though

sturdy sequoia
glossy shore
#

how big is a register?

sturdy sequoia
left night
#

Value is 32 bytes

sturdy sequoia
#

damn that's chonky

#

so registers are 32 bytes πŸ˜„

glossy shore
#

chonk

left night
#

but even with a fully reworked value representation, I don't see how to go much lower

#

to store an integer inline, we need 8 bytes. we need to store type information somewhere (most likely a pointer), that's another 8 bytes. and then we need some pointer to extra optional stuff like labels and locations. so that's 24 bytes min. if we want 16 bytes of inline capacity (which I think we want), we're back to 32 bytes.

glossy shore
#

16 registers is half a meg

left night
#

megabyte? half a kilobyte.

glossy shore
#

yes lol

#

πŸ€¦β€β™€οΈ

glossy shore
left night
#

the type info pointer could indeed be moved into the extra optional stuff allocation. not sure whether that's worth it as it would add extra branching and potential indirection to all type checks. would need benchmarks.

#

but in the inline case, we still need at least one pointer in addition to the inline value

#

and 8 bytes of inline storage would not let a short string be inline, defeating the entire purpose of EcoString (which is 16 bytes)

glossy shore
#

Technically we could optimise way further by exploiting pointer repr

#

acheieving more than 24 bytes of inlined string

sturdy sequoia
#

@left night @glad urchin @sly pecan @tight glade @feral imp I have made TREMENDOUS progress: IT COMPILES TABLEX

#

πŸš€ πŸš€ πŸš€ πŸš€

#

I ran the whole tablex-test.typ and it all seems correct to me

atomic violet
#

is it fast?

sturdy sequoia
#

IT COMPILES MY THESIS

#

OMG

#

OMG

#

OMG

#

I DID IT

#

IT WORKS

#

πŸŽ‰

#

πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€ πŸš€

sturdy sequoia
#

I won't say πŸ˜‚

sly pecan
#

"no"

atomic violet
#

lmao

#

I'm building it πŸ‘Ί

sturdy sequoia
atomic violet
#

oh

sturdy sequoia
#

I just did 😎

#

BTW, it is really slow atm, don't judge me uwu

atomic violet
#

ok goddammit do I cancel now?..

sturdy sequoia
#

I'll try and figure out why it's so goddamn slow

atomic violet
#

is compilation slow or the interpreter?

sturdy sequoia
#

but I know why

#

(I think at least :D)

#

Small functions are currently quite slow

feral imp
#

Is it inlining?

#

I must admit I've less than a fig clue what you're doing...

tight glade
#

Hourra! Sad its so slow tho πŸ˜‚

untold turret
sturdy sequoia
sturdy sequoia
#

Ok, so I did the change

#

and it's about 20x faster in debug

#

time to see in release πŸ˜„

#

It also simplified a ton of stuff which I like

sly pecan
sly pecan
#

Wondering what kind of cursed stuff you were doing

sturdy sequoia
feral imp
sly pecan
sturdy sequoia
#

Time of dynamic number of registers, maybe that'll work πŸ’€

sly pecan
#

I will neither confirm nor deny that I murdered that man

sturdy sequoia
#

WHY IS IT SO SLOOOOOOOOOOW 😦

glad urchin
sly pecan
#

I don't get that, but I'm sure it's super funny

glad urchin
#

I don't think you're supposed to get it

feral imp
#

It's fast in debug and slow in release?

atomic violet
sturdy sequoia
sturdy sequoia
sturdy sequoia
#

Ok, so it's much better

#

but still... not as good as I'd like 😦

feral imp
#

Alright. Progress is progress. There is a lot of experimental / novel shit here, so do take time to just reflect and appreciate that it is growing organically.

sturdy sequoia
#

I am profiling atm

atomic violet
#

I did perf. I am no expert in hashing, but goddam, that's a lot of hashing.

#

I can't find anything not about hashing, actually

sturdy sequoia
#

ooooooooooooooof

#

ooooooooooooooooooooooooooffffffffffffffffffffffffffffffffffffffffffffff

#

I did a poopsy it seems

atomic violet
#

aha nevermind

sturdy sequoia
#

Ok, I just trippled performance

#

now it's... as fast as main

#

😭

atomic violet
#

what went wrong?

sturdy sequoia
#

I just need to figure out where the rest is angrythunk

#

Wait, I think I know!!!!

#

I did not, in fact, know

atomic violet
#

I was going to say "... and you will first profile it to check if you are right or not?.." but decided not to...

sturdy sequoia
#

I'm trying to figure out why

ornate merlin
# sturdy sequoia too much hashing

May be you should switch from SipHash to another hasher?

For example hashbrown uses AHash as the default hasher, which is much faster than SipHash.

sturdy sequoia
ornate merlin
atomic violet
#

we kind of do though...

#

I think hash collision can be easily abused to extract data from unrelated sources

#

so, say, a package would be able to access the cached read file

#

that's bad

#

plus even if all hashing suddently become something like xor hashing, it would have not helped here

#

it's an algorithmic issue caused by inaccurate use of comemo (I assume)

atomic violet
ornate merlin
atomic violet
#

like what if you replaythe changes to a different object?

#

can you break some of the safety guards somehow?

atomic violet
ornate merlin
atomic violet
#

I know it's not the same thing, but I don't understand what you are saying

ornate merlin
atomic violet
#

ah, so you are just proposing replacing siphash with ahash?

#

ahash is 64 bit though

feral imp
#

?

#

look, isn't it just letting Dherse work at it a bit more, to figure out why there are lots of seemingly unnecessary hashings, and somewhere, sometime later it can discussed to change the hashings?

atomic violet
#

yeah ofc

#

switching hashes now won't give anything

#

current VM performance is a bug

ornate merlin
ornate merlin
atomic violet
#

Is there 128 bit version of AHash? If not, it's basically useless.

#

64 bits are not enough to prevent collisions

sturdy sequoia
#

Hey, it's now faster than main πŸŽ‰

ornate merlin
# sturdy sequoia We'd need to talk about this with <@311948531835469827>

Of course😊. But you need to remember that even if the output hash doesn't have any collision, they can occur when you calculated the remainder (u64 / buckets). And because current rust hashmap implementation use sse2/neon instructions, you probably dos not notice difference on the small size tables (up to 48 elements, I think).

And AHash passes the full SMHasher test suite (https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#Quality), so it's output is pretty good

GitHub

aHash is a non-cryptographic hashing algorithm that uses the AES hardware instruction - tkaitchuck/aHash

atomic violet
#

in other words, there is a chack that hashes are equal, object equality is not checked

#

moreover, the "rehashing creates a new hasher" argument is notr applicable here afaik, because the hasher for a comemo hash table is separate from a hasher for tracked objects

ornate merlin
#

Hashbrown stores the only first 7 bits of the hash, and all other comparisons are done on real keys, not their hash values

atomic violet
# ornate merlin Hm.. Actually you are wrong.

I think there is a misunderstanding going on. I am not talking about hash tables in general.

Here is how comemo works: it has a big hash table for every memoized function call, from the 128bit hash of its arguemnts. Every time the function is called, it hashes it's arguemnts (not quite, but let's pretend it does...) and retrieves the memoized result. The values themselves are not compared.

ornate merlin
atomic violet
#

yeah, it's ok, everyone makes mistakes

#

that's why it needs 128bit hashes, 64 bit is just not enough, so a lot of hashers are basically out of question

atomic violet
#

@sturdy sequoia once you resolve the hashing problem, can you try something? It's probably a bad idea (big 50/50... well... 5/90/5, where 90 is "does nothing"), but at least it's not too hard to check and if I am right I want to be remembered as the best software engineer to ever exist.

In vm/mod.rs, in enter_scope, try boxing the new VM before running vm.run(engine). I have a theory (not confirmed by anything, just an educated guess) that VM is too fat for the stack and it causes worse stack performance because hardware hardware blah blah blah frontend prefetchers L1 stack engine blah blah blah. Of course, preferably the same VM (and same registers) should be used everywhere, but I assume it's not an option.

That's what I was able to infer with my eyes, at least. enter_scope looks the sussiest.

sturdy sequoia
#

I mean, VMState is still on the stack, but the registers are on the heap

#

I'll also try removing the stacker calls

#

as I have the suspicion that they're really bad πŸ˜‚

atomic violet
#

I have looked at stacker source code for a bit, it doesn't look that bad

#

it just maps more pages

sturdy sequoia
#

I also ditched the whole VM thing as it required some unsafe

#

so the VM is fully safe afaik

atomic violet
#

omg I am a genious lol

sturdy sequoia
#

And it reuses registers

#

for nested scopes

atomic violet
#

yeah that's basically perfect

sturdy sequoia
#

and registers are infinite now

atomic violet
#

so... stack?

#

stack based VM ftw let's goo

sturdy sequoia
#

kinda, but there is agressive reuse of registers

#

I don't know if you saw but in the compiler there is a RAII register called RegisterGuard that frees registers for re-use as soon as possible 😎

atomic violet
#

yeah I have not looked at the compiler, only at VM

#

I see πŸ‘οΈπŸ‘„πŸ‘οΈ

sturdy sequoia
#

Performance of eval has improved by 20% with the commit I'm about to push

#

eval = the VM in this scenario

#

my problem now is that function calls are... slooooooooooooooow

#

Much better than an hour ago

#

but too lsow

#

BTW, I just pushed

atomic violet
#

just function calls, or scopes in general?

sturdy sequoia
#

scopes are fine since they no longer involve growing the stack, etc.

#

I KNOW HOW TO MAKE THE REGISTER TABLE SORT OF DYNAMICALLY SIZED

#

USE A SmallVec<[Value; 16]>

#

Makes small function calls cheaper by not allocating!

#

I AM A GENIUS

#

(time to test it sunglassed_crying)

atomic violet
#

if only there was a way to not reinitialize registers every funcition call...

#

hear me out...

#

ABI

#

(/s)

sturdy sequoia
#

Wait no

#

no no nonononono

#

No

#

I objectify

cunning wadi
left night
sturdy sequoia
#

I have reached the point where eval is about 5x faster than main

#

But I was honestly expecting a 10x 😦

#

I guess eval is not as slow as I thought it was angryeyes

#

And the impact on my thesis (in spite of tablex) is surprisingly low

cunning wadi
#

I assume that's because most of the time is spent calling short functions that aren't much more than glue code

#

and well, function calls aren't that fast it seems?

atomic violet
#

inline them

sturdy sequoia
#

it's almost trivial with the current design πŸ€”

#

I just don't know how I would deal with arguments (checking them)

#

but I could only do it when arguments are "simple"

#

(no spread, etc.)

#

Ok, I am now working on improving loop performance by removing allocations and dynamic dispatch

sturdy sequoia
#

Fun fact, using #[repr(packed)] on opcodes does improve performance by a very measurable margin!

lunar kettle
#

what's the current status on how much faster your thesis compiles? πŸ‘€

left night
#

I think what would be very interesting just to get a feeling for what is and isn't possible here is to write a reasonably complex evaluation-heavy script in Typst and Python and compare speed. I'd be curious which orders of magnitude of difference we're talking about here.

sturdy sequoia
#

😭

feral imp
#

Are you still working at it? Or do you think this current version does indeed reflect what's possible with a VM?

sturdy sequoia
#

Wait my math is wrong

#

who cares, eval is not nearly as much as I wanted angryeyes

sturdy sequoia
#

I think closure calls are wayyyyyyyy too expensive for what they do

sturdy sequoia
#

I removed the eval module

#

and it gives an idea of just how MASSIVE this PR is

#

πŸ’€πŸ’€πŸ’€

feral imp
#

It's a lot. But there is no way that improvement from now on won't entail massive efforts like this....

sturdy sequoia
#

I thought it was bigger than that

feral imp
sturdy sequoia
#

anything with CetZ as @feral imp said

untold turret
#

@sturdy sequoia A quite important key for heavy computing. Would you like to do scalar specialization? I mean you stores floats in 32 bytes registers but they could only be 8 bytes, and importantly you may do Value::Float::add/mul/sub/div per instruction, which is very expensive.

sturdy sequoia
untold turret
#

you can do it at runtime

#

since typst doesn't have runtime, all of them are at compile time.

sturdy sequoia
#

I wonder if that would really help

untold turret
#

I just point out why a such vm doesn't quite help the heavy cetz docs.

sturdy sequoia
#

I had to manually rebase 47 times in a row for some reason

#

The VM was broken by that

#

😭

glad urchin
#

did you not keep a backup branch?

sturdy sequoia
proven umbra
#

Git stores a rev pre rebase for you.

#

Search git reflog. All actions are logged there, and as long as you do not git gc, git keeps "backup" references for you.

untold turret
#

🐱 @sturdy sequoia Forget the special cases. VM in parallel with rayon should benefit to both local typst and web typst. That may also beat main typst.

sturdy sequoia
#

@proven umbra it's funny but my VM is finding bugs in CetZ

#

With some variables being used that don't exist πŸ˜‚

proven umbra
#

πŸ˜„ Oh…

#

You have lines/functions so I can look to fix that?

sturdy sequoia
untold turret
#

I believe align can quite benefit from jit.

sturdy sequoia
#

Anyway, I'm off to bed

#

gn everybody ❀️

untold turret
#

In the picture

sturdy sequoia
#

ah right

untold turret
#

gn

sturdy sequoia
#

makes sense

atomic violet
#

oh-oh @sturdy sequoia

#

oh wait, it's just stack overflow, false alarm

feral imp
#

πŸ˜›

#

Does it work on main?

atomic violet
#

just need a recursion limit and it's all should be ok

atomic violet
atomic violet
#

@sturdy sequoia ok a real bug this time, patterns without let don't compile

#let (x, y) = (1, 2)
#(y, x) = (x, y)
#

probably covered by tests but still want to make sure

atomic violet
#

Oh no... typst is soo slooooowwww..... 😭😭😭😭😭😭

#

I have disabled memoization for closure::eval to make sure it won't just get OOM'ed..

#

and it crunches rays for like an hour and 20 minutes now

#

and python just does the whole thing in less than a minute

#

😭😭

#

Lemme check main

atomic violet
feral imp
#

main is fast?

atomic violet
#

vm with no closure memoization is 1:22:xx

#

main will OOM

#

python is like... a minute... haven't measured yet, it's pretty fast

#

I will check for a smaller resolution now

#

oh interesting

#

VM is slower, apparently

#

64.14 secs - VM, 34.24- main, and coughs 2.61 - python

sly pecan
#

Python is a low bar too

atomic violet
#

compiling VM again with no my hacks to make sure it's not me

atomic violet
#

because duh, everyone says python is slow, but it doesn't mean it's not optimized

#

it does have a good untyped VM

atomic violet
#

but also...

#

python does not have comemo at all, so it's still bad

#

damn, my code is so shit I can't even fix it properly...

#

the render itself is pretty dope though

atomic violet
#

yeah, memoization helps in one place...

feral imp
sturdy sequoia
sturdy sequoia
#

😱

sturdy sequoia
atomic violet
#

I previously removed it

#

it would not have finished otherwise

sturdy sequoia
#

hmmm

#

And the VM is slower?

#

That's extremely dissapointing

atomic violet
#

but I have not profiled it

sturdy sequoia
#

Guess I'll do just that

#

missa sad

#

@atomic violet are there lots of small closures?

#

Because I have the suspicion that freaking closures are too goddamn slow

atomic violet
#

and I did pretend to be a guy who tries to optimize a lot in my code

#

(without measuring anything)

sturdy sequoia
atomic violet
#

and there are a bunch of small functions, but I try to not call them in hot spots

#

there is a lot of spreading

#

and destructuring

#

that's basically the main way of moving data around there

#

and as a consequence a lot of functions have a lot of parameters

sturdy sequoia
#

I have the suspicion that I implemented spreading πŸ’©-ily while trying to make it faster πŸ’€

sturdy sequoia
atomic violet
#

The hottest functions should be m33mul, intersect, cast-ray and cast-ray-through-transparent, and a few closures for rendering heart, typst guy, and tiled floor. Maybe it would be useful to run a sampling profiler on typst and on python to compare where the slowness is likely occur

sturdy sequoia
#

@left night I'm having a pretty big issue, for some reason, clalling Route::contains causes a stack overflow πŸ’€

#

Do you know what could be causing this?

#

Here is an example from a stack trace

#

is goes on and on and on and on

#

I think I figured it out!

#

The check for cyclic evaluation was inside of a memoized function instead of the top level and that was the issue, maybe? πŸ€”

#

Yes indeed, that was it now it just panics πŸ’€

#

I had forgotten to check for cyclic import which is now fixed

left night
sturdy sequoia
#

but it wasn't actually detected

#

because it was only being done in the compilation stage

left night
#

ah okay

sturdy sequoia
#

it was a bit weird, but there are still a few things I haven't ported over

left night
#

Regarding performance improvements, I wonder how much of the time is actually spent on walking the AST etc (where a instruction-based VM should be faster) and how much is spent on "runtime" things like ops::join, inside functions operating on Values etc.

sturdy sequoia
#

this is where having types would be useful because I could pre-allocate a lot

left night
#

We should probably figure out a way to profile that more precisely

sturdy sequoia
left night
#

I fear that time would have too much overhead, too

sturdy sequoia
#

@atomic violet on my end, your raytracer takes 18s on the VM, I'll try on main now

sturdy sequoia
atomic violet
#

well, it's not that simple tbf πŸ€”

#

Maybe crunch it though callgrind? Count instructions instead of actual time?

left night
#

You could have, just for some quick tests, a cheaper version of #[time], that measures just one thing at once and its global effect: Effectively, a global mutable static that is increased by the time spent in a single leaf operation every time in runs. And then compare that with the full time spent. This might work better than a flamegraph at showing the full effect of many small leaf calls.

#

I am not sure how much overhead Instant::now() incurs, but it would be easy to check whether this kind of instrumentation increases the runtime significantly.

sturdy sequoia
#

linux and macos *

sturdy sequoia
#

and time could just generate a per-function counter

left night
#

I was more thinking about annotating a function of interest with a special attribute just temporarily

#

Rather than actually committing that to the code base

sturdy sequoia
#

ah right

left night
#

So it doesn't matter whether it's linux/mac only

sturdy sequoia
#

@atomic violet is that the expected result?

#

or is my VM broken πŸ’€

atomic violet
left night
#

poor Typst guy's eyes

sturdy sequoia
sturdy sequoia
#

with main it runs in 9.7s versus 19s on the VM

#

😭

atomic violet
sturdy sequoia
#

no

#

I'm running the gist

atomic violet
#

oh, yeah

#

gist is a bit different, but should be fine

atomic violet
sturdy sequoia
#

What hints to me that the VM is doing too much work is the size of the timings files

#

with main they're about half as big

#

but the VM shouldn't be doing any more timed operations

#

πŸ€”

#

which makes me think it's somehow running the code... twice?

atomic violet
#

just in case

left night
atomic violet
#

Let's see on that version

left night
#

because, if yes, I would be pretty happy with that

atomic violet
#

2.14 python, 29.84 VM, 24.75 - main 0.10

#

I am a dum dum

#

I was calling 0.10 "main" this whole time πŸ€¦β€β™‚οΈ

atomic violet
feral imp
#

How about main main?

atomic violet
#

one sec lemme build it

sturdy sequoia
#

That's main

#

That's the VM

#

what the heck is going on at the end

atomic violet
#

it's chillin' after a hard day of work

#

it's fine, give it a break

sturdy sequoia
#

as far as I can tell it's this code

#

I guess looping is very slow?

atomic violet
#

It may be joining?..

sturdy sequoia
#

hmmm

atomic violet
#

because that's the only place where content is joined iirc

sturdy sequoia
#

that's makes sense

atomic violet
#

oh wait

#

usr time: main - 14.83 vs 0.10 - 16.87

#

wth was it doing for 7 seconds?...

left night
atomic violet
#

seems like it

feral imp
atomic violet
#

I would need a proper measuring procedure so name a standard deviation ofc, but the order seems to be about ~6-9x

atomic violet
left night
atomic violet
#

What does 0.10 spends 7 seconds of system time doing? Writing a pdf?

atomic violet
sturdy sequoia
#

@atomic violet so I think you're probably right, it's probably joining that's so goddamn slow

sturdy sequoia
#

Ok I do think that looping was abnormally slow

#

I think I just fixed it

#

time to see

lunar kettle
#

so?

#

You can’t just leave us hanging like that!

sturdy sequoia
#

πŸ˜„

sly pecan
atomic violet
#

it should do the exact same thing

sly pecan
#

I'm not talking about the result

atomic violet
#

unless github copilot translated it wrong and I missed it

sturdy sequoia
#

Ok so it's not joining that is slow

#

Than I have no fucking idea what is slow

#

😭

atomic violet
#

and considering that even unused variables got translated, it should be the same

sturdy sequoia
#

it must be function calls?

sly pecan
lunar kettle
#

Just to make sure you are running in release mode right? πŸ˜‚

atomic violet
#

what's release mode?

#

(jk)

sturdy sequoia
#

So, I have narrowed it down to function calls

#

Now what I don't know is if it's native function calls

#

(i.e in the global scope)

#

or VM-ed function calls

#

which I assume are at fault

#

wait no

#

function calls are fast

sturdy sequoia
#

Ok so somehow reading values from the VM causes hashing

#

I have no idea why

#

So, as it turns out, it appears that accessing global values is the slow bit

lunar kettle
#

is it easy to fix?

sturdy sequoia
#

I don't know yet

sturdy sequoia
#

For some reason it hashes like crazy

#

but even my attempts at improving it don't help

#

ok so accesses are indeed the slow part

#

for some reason

atomic violet
#

let me callgrind it

#

btw, I don't know how comemo works, but if there is a function like

#let f(big-arr) = {
  big-arr.at(0) = big-arr.at(1)
  big-arr
}

how is it memoized exactly? What are the constraints here?

sturdy sequoia
#

I mean did identify the slow part of the code

#

I just... don't know why it's slow

#

it's access.rs in the typst::vm

#

I did at a typst_macros::time to it and it's indeed extremely slow

#

but I really don't know why

#

I've tried to bring in some optimizations to no avail

#

I think I'll need to rework the entire Access system because it's just bad

#

but I don't know how other VMs handle mutable field access

#

if anybody knows I'm all ears

left night
#

it would be great if it could somehow optimize that of course, but it doesn't atm

atomic violet
#

yeah it will be hard to optimize

#

I think it's either comemo just doesn't memoize this, or it has some sort of "reversible constraints" or something...

atomic violet
#

to optimize you need to profile, to profile you need debug info, to have debug info you need to not optimize

#

literally the worst part of software engineering

feral imp
#

While I welcome memes, it seems like a great indicator, that you guys should take a break. If you continue, I'd highly advise outlining low hanging fruit (even if there isn't one!) and going for that. Memes are a path way to burnout πŸ’€

#

But I'm just a goose anyways...

atomic violet
#

dherse should, I am have not done anything yet lol, I am just trying to find suspicios hashes but the <cycle 3> ruins my attempts πŸ˜‚

sturdy sequoia
#

you can find that in access.rs somewhere in there there 's a get_field or something called

#

that's what's doing it

#

afaik

atomic violet
#

until I see that in my call graph, I am not turning back

#

πŸ‘Ί

atomic violet
#

@sturdy sequoia What is foundations::Func::method?

#

it looks like it prehashes hella lot

#

and it is invoked in Access::read

#

πŸ€”

sturdy sequoia
#

It creates a method from a func and a value

#

it's a dirty hack I created to make method access work

#

πŸ’€

atomic violet
#

Does every array.at(..) prehashes the array? Is this necessary?

#

because it sure looks like it

#

yeah wait lol what