glubs - Subtitle parser and tokenizer | The Gleam Programming Language

primal rose Nov 18, 2023, 9:14 AM

#

To practice writing some Gleam (my first project), I wrote a WebVTT parser and payload tokenizer! There are still quite a few things to be done, but I wanted to get early feedback, as I am used to writing Elixir and wasn't sure about some patterns.

https://github.com/philipgiuliani/glubs

primal rose Nov 18, 2023, 9:42 AM

#

@short plover I just looked at your TOML parser and noticed that you worked with string.graphemes and then matched on the list elements! What's the reason for that? Do you think that would also be better for the WebVTT parser/tokenizer? I worked a lot with binary splitting and pattern matching: https://github.com/philipgiuliani/glubs/blob/main/src/glubs/webvtt.gleam#L129

short plover Nov 18, 2023, 9:43 AM

#

String matching like that is currently slow on JS runtimes because of how strings are implemented there, and I wanted the parser to be faster on JS even if that means it's slower on Erlang.

#

What's best I'm not sure. I haven't done extensive research or any proper benchmarking
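
For readers following along, here is a minimal sketch of the two styles being compared, assuming a recent gleam_stdlib where the grapheme function is named `string.to_graphemes`. The helper names `skip_spaces_prefix` and `skip_spaces_graphemes` are hypothetical and are not taken from glubs or from the TOML parser. No performance claim is implied; as noted above, nothing here has been benchmarked.

```gleam
import gleam/io
import gleam/string

// Style 1: pattern match on string prefixes directly.
// (Hypothetical helper, for illustration only.)
pub fn skip_spaces_prefix(input: String) -> String {
  case input {
    " " <> rest -> skip_spaces_prefix(rest)
    _ -> input
  }
}

// Style 2: work on a list of graphemes and match on list elements.
// The caller splits the string once up front.
// (Hypothetical helper, for illustration only.)
pub fn skip_spaces_graphemes(input: List(String)) -> List(String) {
  case input {
    [" ", ..rest] -> skip_spaces_graphemes(rest)
    _ -> input
  }
}

pub fn main() {
  // Both styles drop the leading spaces from the same input.
  let a = skip_spaces_prefix("   00:01.000")

  let b =
    "   00:01.000"
    |> string.to_graphemes
    |> skip_spaces_graphemes
    |> string.concat

  io.println(a)
  io.println(b)
}
```

Both calls yield the same string. The difference is only in where the work happens: the first style slices the string on every step, while the second pays for one split into graphemes and then walks a list.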

primal rose Nov 18, 2023, 9:45 AM

#

Ok thanks, sounds reasonable!

short plover Nov 18, 2023, 9:48 AM

#

I had to look up WebVTT

#

What a wonderful bit of tech!

#

Are you making something?

primal rose Nov 18, 2023, 9:50 AM

#

In my company we are working a lot with livestreaming and live subtitle transcription/editing! So the first thing that came to my mind was making a WebVTT parser 😄
I'm thinking about writing a WebVTT editor using lustre afterwards 🤔

short plover Nov 18, 2023, 9:53 AM

#

That's super cool!

primal rose Nov 18, 2023, 7:44 PM

#

I've just added support for the SRT format and published it on Hex! https://hexdocs.pm/glubs/index.html

glubs · v0.2.0

WebVTT and SRT parser and serializer.
