#A new, experimental and unofficial Gleam to WebAssembly compiler

1 messages · Page 1 of 1 (latest)

drowsy mist
#

I spent a good part of the last year to write my Master’s thesis about compiling Gleam to WebAssembly lucywasm

It does not cover all Gleam features, but it's enough to do some useful things with it. Some exciting stuff includes Gleam Externals to call foreign WebAssembly functions, tail recursion and generics (through monomorphization).

I also made an example project to showcase the supported features and to compare the performance between Gleam compiled to JavaScript and Gleam compiled to WebAssembly (spoiler: wasm wins 🚀).
It’s a raytracer running in the browser, rendering to an HTML canvas (with some JS glue).

Compiler source: https://github.com/tufteddeer/gleam-webassembly
Demo: https://tufteddeer.github.io/raytracing-gleam/

I planned writing a few blog posts highlighting different aspects of the project, but since I have not managed to do that yet, I decided to publish the project right away before it ends up forgotten in a private repo.
Nonetheless, if you are interested in certain aspects feel free to reach out and I will do my best to answer or provide an english version of my thesis.

The implementation is kinda messy in some parts to be honest, but I was more focused on features and figuring out how to do stuff (I don’t have a background in compilers) than the quality of the implementation. In hindsight, I should have done more research here and there to come up with cleaner solutions but given the overall results I am still quite happy with what I achieved.

GitHub

A modified Gleam compiler to translate Gleam to WebAssembly - tufteddeer/gleam-webassembly

marsh karma
#

Wow, amazing!

#

How did you approach memory management?

drowsy mist
# marsh karma How did you approach memory management?

I used the wasm gc types, which are managed by the runtime.

Implementing custom memory management would have been fun though, but time was limited.
There are also things where raw memory might have been easier to use than the gc types and instructions, but on the other hand they give you static analysis before execution which saved me a few times 😄
JS/Wasm interop for gc types is pretty much non-existant at this point, so at this point you have to touch unmanaged memory. I only support FFI with strings, float and int for this reason.

For strings, i used wasm js-string-builtins. They use the same string implementation as the runtime executing the binary, as long as the proposal is supported (works with v8/node/deno/chrome, not in Safari and not in non-js runtimes like wasm). Runtime manages everything for you, just call the provided functions and pass a string reference.
This approach is very handy when you want to pass strings arround between js and wasm.

marsh karma
#

I think GC is a wise choice, keep that code size down

#

And making a good memeory management system is a ton of work

#

This is super cool!

fossil rampart
#

yeah usually wasm blobs are comparatively huge, but this looks to be relatively small in comparison!
very impressive!

brave fox
#

That’s so cool!!

drowsy mist
#

thanks for the feedback, great to hear 🙂

drowsy mist
fossil rampart
#

does "no cyclic references" mean that recursive types cannot be expressed?

fossil rampart
drowsy mist
# fossil rampart does "no cyclic references" mean that recursive types cannot be expressed?

yes, some types are a problem here. this can absolutely be fixed, i just failed to find a good solution in time.
this was only an issue after i optimized the custom type representation. the first version used abstract references which simplified declaring the types in the wasm module.
it should be possible to compute the wasm type indices before declaring the types though and then you could reference type 5 when declaring type 2 for example, wasm should accept that

#

in general, managing the type and function indices was kind of a headache most of the time 🥲

haughty hearth
# drowsy mist I spent a good part of the last year to write my Master’s thesis about compiling...

That's impressive ! I'm just not sure why I would like Gleam run in a WebAssembly environment but that's interesting. And so from your test you can optimize your Wasm build to be less space efficient on the disk but in term of runtime performance and interoperability with the real world, is it still great or are we far aways than just having native JS ? (I love WebAssembly but sometime the fact it's a sandbox without stuff like socket make me very sad)

vapid rose
#

Yay I don't have to do this anymore

#

Nice work

drowsy mist
drowsy mist
vapid rose
#

it is definitely no small task despair

drowsy mist
# haughty hearth That's impressive ! I'm just not sure why I would like Gleam run in a WebAssembl...

i don't think i fully understand the question sorry.
for me the best part of wasm is that its a compile target for multiple languages. maybe some day we can build some web ui with gleam and link our favorite rust crates, but you could already do similar things with some js.
The sandboxing is a plus in my opinion and with the component model and WASIp2 there will be ways to interact with the outside world via well defined boundaries. i managed to get this to work for gleam integers only.

marsh karma
#

Small fast self contained binaries 👍

#

Very useful for many many use cases

fossil rampart
#

tbh I'm not even sure anymore that's really that true, or like true to the extent where it makes a big difference

marsh karma
#

What isn't?

fossil rampart
#

that it would be a big improvement (in terms of size) compared to shipping an erlang vm, a wasm vm isn't that much smaller and would require a lot more runtime support

marsh karma
#

Yeah, that's fair, shipping the BEAM is looking progressively more viable these days

haughty hearth
#

I did not check, how large and heavy it’s for an application to install the BEAM ?

#

Is it larger in term of size compared to a node runtime ? How much larger ?

fossil rampart
#

the vm and the jit are a 5mb binary, the kernel/stdlib applications are another 5 mb in total

#

you can potentially strip a bunch of things from the stdlib that are not used, and you can replace the beam with atom too (in a limited way)

fossil rampart
#

for a real-live example, if you were to bundle lustre-dev-tools into a single file it would be a 15mb binary. the baseline is worse, but it seems to grow a lot slower so we're smaller than an equivalent go program would be

marsh karma
#

Just need to remove that 1 second pause now

fossil rampart
#

nah it's fine I can write my own entrypoint that calls halt

#

if it's good enough for escripts I can do it too 😄

marsh karma
#

Nah we should get it removed

#

Seeing as it's supposed not needed any more

haughty hearth
fossil rampart
#

inside the shutdown code of a built-in erlang module there's a 1 second sleep that we think is not needed anymore

haughty hearth
haughty hearth
fossil rampart
#

I'm not exactly sure, it's over 17 year old code but I suspect something like that yeah

marsh karma
#

Nah, application shut down is handled by the supervision tree.
At a guess it was flushing some IO buffers. You'd need to ask the OTP maintainers I suppose

haughty hearth
#

I'm too shy to do this 😅

bright eagle
#

this is so impressive, great work! i love the demo of raytracing too

fallow light
#

Writing a (way smaller) gleam to wasm compiler has been on my list for a while now, yours is so cool, well done!

#

What was your masters thesis about?

drowsy mist
# fallow light What was your masters thesis about?

"development of a webassembly compiler backend for gleam".
i focused on the practical challenges i faced and how i solved them, including some experiments and benchmarking. theres some background information on some topics, but it's mostly documenting my practical work and less research/theory (which my supervisors did not really appreciate 😂 )

marsh karma
#

Will the thesis be published for folks to read?

drowsy mist
drowsy mist
forest pawn
#

Very cool! I've dabbled in compiling Gleam to wasm too (https://github.com/gertvv/gleam-wasm), so I will definitely need to check this out. I was particularly interested in targeting the component model but the specs and tools weren't quite there at the time (which lead me down an interesting rabbit hole). Anyway, real life got in the way, and now I've been overtaken on both sides by advances in Gleam and wasm. I feel like there's a huge design space to explore in compiling Gleam to wasm and it's a bit of a shame that there isn't an easier path to doing that experimentation in Gleam. I still hope to get back into it though, and maybe bridge that gap. Would love to see some blog posts and learn from your experience!

#

Thanks to the gleam weekly for highlighting this BTW!

drowsy mist
forest pawn
#

Thanks. I wrote it because that's what I felt I had been missing too.

forest pawn
#

Hey. This is something I've been thinking about for a while, and this seems like a good place to bring it up. The fact that gleam only needs to compile each module / package once is really great for the developer experience. So in my experimenting, I've been looking for a way to compile each module individually to a valid wasm binary. But obviously you end up doing a lot of boxing/unboxing of primitive types, so while it's great for compiler performance, it's terrible for executable performance. On the other hand, if I understand your process correctly, with monomorphization happening early, you'd need access to the typed AST for every (relevant) module, including for dependencies, in order to generate the monomorphic representation (since polymorphism doesn't end at package boundaries). So I've been wondering if there's a way to have our cake and eat it too. Could we compile each gleam module into a valid wasm binary using a uniform representation, but also store sufficient type and constraint information in a custom section, so that when we know the entry point, we could merge the relevant wasm binaries together, eliminate any dead code, and THEN perform monomorphization? What do you think?

forest pawn
#

By the way that example of higher-rank polymorphism had me worried for a second but fortunately it isn't valid in Gleam - the compiler will complain that False was expected to be an Int:

fn id(x) { x }
fn apply_pair(f, x, y) { #(f(x), f(y)) }
pub fn main() { apply_pair(id, 123, False) }
drowsy mist
#

i was really lucky because that paper was published a month or so after i started working on this. It was really helpful because my own basic implementation was super limited.
also, i don't know any fancy words or concepts so they allowed me to sound smart 😎

loud linden
#

Wow, amazing work! I was thinking about creating something similar

#

Now I need to create fork of Gleam with WASM/WASI support 🙂

drowsy mist
# forest pawn Hey. This is something I've been thinking about for a while, and this seems like...

do you need the (un)wrapping for compiling/merging individual modules or because generics are implemented by boxing everything in a wrapper type?

yes, monomorphization includes information from every gleam module involved.

can you elaborate how the uniform representation would look like?
would the wasm binary be valid before you perform monomorphization on it? so it would be more of a post-processing optimization pass to get rid of the boxing? That would be interesting for other problems as well!

I think there are ways to make compilation more efficient. maybe it's not necessary to create valid wasm modules as intermediary artifacts but wasm code snippets or some sort of intermediary representation that can be translated more efficiently, constraint information could be part of that IR.

for intermediary artifacts, the formerly generic functions don't have to be in the same wasm module as the non generic code, so that might allow for better caching depending on what your module does.
i also think rust has some work on extracting non-generic code from generic functions so the overhead of monomorphization is reduced.

#

generally speaking, trying alternative approaches to monomorphization and boxing and/or combining them would be interesting future work. i am sure there is previous work from other languages

#

regarding the merge of wasm modules: it's easy with functions, but importing/exporting types (like gc structs) is not supported. i figured it should be possible to manually copy the definition into the importing module and merge the modules because wasm cares more about the structure of a type than the index.

forest pawn
#

Yes, for types you need to duplicate them into each module that uses them and then de-duplicate again when you link the modules together.

#

Regarding uniform representation / monomorphization: yes, that's exactly what I meant, valid and executable wasm modules where the uniform representation is implemented in a carefully considered way, so that it's easy to eliminate the boxing and unboxing operations. I should try to write this up in more detail because I'm still going around in circles on how it could work exactly. But it only occurred to me last night that monomorphization might be possible as an optimization pass after generating the uniform representation. I agree other IRs could achieve more or less the same thing, but it seems like such an elegant (and general) idea that I can't let it go now...

#

I think basically you could box everything and pass it around as (ref any)'s, so you never end up unboxing anything unless you hit a basic operation. And you'd implement those basic operations as functions taking (ref any)'s as well, so the unboxing of the arguments and the boxing of the result is isolated within those function calls. Then when you monomorphize, you just substitute in the primitive types for the (ref any)'s and the primitive operation for the function call, and you're done.

#

I think you could probably even assign unique type IDs to "different" (ref any)'s so they're easier to substitute.

forest pawn
#

I think this may have resolved my writer's block on my own compiler 😹 - I was sort-of taking the type erasure route but got stuck on the idea of keeping as much type information as possible. This suggests the opposite.

#

By the way @drowsy mist are you aware of the proposed canonical GC ABI for the component model? That'll be amazing when it arrives.

drowsy mist
drowsy mist
forest pawn
#

So about polymorphic packing. Is there an actual example you could construct where Gleam's type inference wouldn't eliminate the problem? It seems like it would take care of the example they show. How did you deal with it?

ivory falcon
drowsy mist
# forest pawn So about polymorphic packing. Is there an actual example you could construct whe...

yes, this is not a problem with normal gleam.
I don't think type inference can solve this because as far as i understood it, the problem is that we can't define all the types at compile time because they are defined recursively. In 2.5.4 Lutze et al show that we need A to define B and vice versa. we would just come up with an infinite set of type definitions in wasm but maybe this could be fixed by breaking the cycle with some abstract references.
I think In normal gleam this is not an issue because with JS and Erlang its good enough if it works at runtime.
To be fully honest, i did not dig too deep into the constraints mentioned in the original paper (or type theory in general) so my understanding is limited and my assumptions regarding gleam might be wrong!
As for my implementation, the example given in the original paper errors after monomorphization because it struggles to find a type definition it needs to generate code. While i think this is because of the mentioned recursion, there must be some problem in my code because the error should come up during monomorphization

My Gleam version of the example: https://gist.github.com/tufteddeer/843c5fefdb4673956c76f2d4185e6277
fyi, constraints and solution are printed at the trace level, but i just noticed that it does not include nested types in all detail

Gist

Polymorphic packing in Gleam, based on https://dl.acm.org/doi/10.1145/3720472 - gist:843c5fefdb4673956c76f2d4185e6277

drowsy mist