# Gleam ISO/IEC 8859 decoding library

lusty void

This is my first Gleam library 😎: https://github.com/richard-viney/iso_8859

It decodes ISO/IEC 8859 text content to a native Gleam string. When targeting JavaScript, it falls back to the built-in `TextDecoder`, which is faster.
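For illustration, here's roughly what decoding with the built-in `TextDecoder` looks like on the JavaScript side. This is a minimal TypeScript sketch, not the library's own JavaScript module. The helper name `decodeIso8859` is hypothetical, and the sketch assumes a runtime whose `TextDecoder` supports the legacy single-byte encodings (browsers, Deno, and Node.js builds with full ICU).

```typescript
// Hypothetical helper (not part of iso_8859): decode ISO/IEC 8859-<part> bytes
// using the runtime's built-in TextDecoder.
function decodeIso8859(bytes: Uint8Array, part: number): string {
  // The constructor throws a RangeError if the label is unknown to the runtime.
  // `fatal: true` makes decode() throw a TypeError on unmapped bytes
  // instead of silently inserting U+FFFD.
  const decoder = new TextDecoder(`iso-8859-${part}`, { fatal: true });
  return decoder.decode(bytes);
}

// 0xA4 is the euro sign in ISO-8859-15 (it's the currency sign in ISO-8859-1).
const sample = new Uint8Array([0x48, 0x69, 0xa4]);
console.log(decodeIso8859(sample, 15)); // "Hi€"
```

One caveat worth checking: `TextDecoder` follows the WHATWG Encoding Standard, which treats the labels `iso-8859-1`, `iso-8859-9` and `iso-8859-11` as aliases for windows-1252, windows-1254 and windows-874, so bytes 0x80 to 0x9F can decode differently from the ISO tables for those parts.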

It's fairly small/niche. I've got a much larger library in the works and decided to extract this small part of it as a trial run for actually publishing something!

Any feedback on it would be gratefully received.

GitHub: Gleam library to decode ISO/IEC 8859 binary data into native UTF-8 strings.

steady yew

TIL you can have both a Gleam implementation and an `@external` for the same function.

Why is there an `@external(javascript, "./text_decoder.mjs", "bits_to_codepoints")` that points to a function that just throws?

A thought: if the LUTs aren't used on the JS target, they could be moved into their own module (perhaps an internal one if you don't want them to be shared) so that a browser doesn't need to load that data. Or can constants be marked with `@target` or something? I forget.

That LUT part is genius btw 🙂

lusty void

I managed to remove that empty external by not specifying `unsigned` in the bit array pattern match. That option isn't currently supported on the JavaScript target, but unsigned is the default, so it can be left out in this case.

True, it might be possible to leave the LUTs out on JS, but tbh I'm not sure how.

steady yew

Ah, I realise the browser would still import the constants, so just moving them to another module isn't a solution.

lusty void

These single-byte character encodings are all fairly simple mappings, so bit array LUTs made sense. They should be faster than a `List(String)`, but I haven't benchmarked it.
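The lookup-table idea can be sketched in TypeScript as a 256-entry `Uint16Array` indexed by byte value. This is an illustration under stated assumptions, not the library's Gleam code (which uses bit arrays): it hard-codes ISO-8859-15 only, and the names `buildIso885915Table`, `ISO_8859_15` and `decodeWithTable` are hypothetical. ISO-8859-15 is convenient here because it matches ISO-8859-1 except at eight positions.

```typescript
// Hypothetical sketch: a byte -> Unicode code point lookup table for ISO-8859-15.
function buildIso885915Table(): Uint16Array {
  const table = new Uint16Array(256);
  // Start from ISO-8859-1, where byte N maps straight to U+00NN...
  for (let byte = 0; byte < 256; byte++) table[byte] = byte;
  // ...then patch the eight positions where ISO-8859-15 differs.
  const overrides: Array<[number, number]> = [
    [0xa4, 0x20ac], // €
    [0xa6, 0x0160], // Š
    [0xa8, 0x0161], // š
    [0xb4, 0x017d], // Ž
    [0xb8, 0x017e], // ž
    [0xbc, 0x0152], // Œ
    [0xbd, 0x0153], // œ
    [0xbe, 0x0178], // Ÿ
  ];
  for (const [byte, codePoint] of overrides) table[byte] = codePoint;
  return table;
}

const ISO_8859_15 = buildIso885915Table();

// Decoding is one indexed read per byte. All code points in the table are
// below U+FFFF, so String.fromCharCode yields exactly one UTF-16 unit each.
function decodeWithTable(bytes: Uint8Array, table: Uint16Array): string {
  const chars: string[] = [];
  for (let i = 0; i < bytes.length; i++) {
    chars.push(String.fromCharCode(table[bytes[i]]));
  }
  return chars.join("");
}

console.log(decodeWithTable(new Uint8Array([0x43, 0x61, 0x66, 0xe9, 0x20, 0xa4]), ISO_8859_15)); // "Café €"
```

Every byte costs a single constant-time lookup, whereas fetching the n-th element of a Gleam `List(String)` means walking the linked list, which is why a flat table is a reasonable bet even before benchmarking.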

steady yew

As a Finn, I have plenty of personal experience with ISO-8859-1 and -15 pain 😄

lusty void

Oh, is it not largely UTF-8 these days?

In my case, I'm dealing with binary formats that have been around for decades and support various non-UTF-8 string encodings.

steady yew

It is, these days. 😛

lusty void

Fair point

glossy citrus

> I've got a much larger library in the works and decided to extract this small part of it as a trial run for actually publishing something!

Multiple small libraries are quite cool.

fresh dirge

Nice code!

What sort of thing does one use this encoding for?

steady yew

I remember using this sort of thing on IRC, where half the clients were speaking ISO-8859-15 and half were speaking UTF-8 🙂
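That IRC mix-up is easy to reproduce: encode text as UTF-8 and decode it as ISO-8859-15, or the other way round. This is a small TypeScript sketch assuming a runtime whose `TextDecoder` supports the `iso-8859-15` label; the Finnish sample word is just an illustration.

```typescript
// The same text, read with the wrong decoder in each direction.
const text = "Hyvää"; // Finnish
const utf8Bytes = new TextEncoder().encode(text); // each ä becomes 0xC3 0xA4

// A client expecting ISO-8859-15 treats every UTF-8 byte as its own character:
// 0xC3 -> "Ã" and 0xA4 -> "€".
const seenByLatin9Client = new TextDecoder("iso-8859-15").decode(utf8Bytes);
console.log(seenByLatin9Client); // "HyvÃ€Ã€"

// Going the other way: ISO-8859-15 encodes ä as the single byte 0xE4, which
// starts a 3-byte UTF-8 sequence that never completes, so a (non-fatal)
// UTF-8 decoder emits U+FFFD for each one.
const latin9Bytes = new Uint8Array([0x48, 0x79, 0x76, 0xe4, 0xe4]);
const seenByUtf8Client = new TextDecoder("utf-8").decode(latin9Bytes);
console.log(seenByUtf8Client); // "Hyv" followed by two U+FFFD replacement characters
```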

magic flower

I use a `TextDecoder` for keyboard input in teashop, so this might help me add support for non-JS targets.