Is there a less ugly way to make a case insensitive string map? | Rust Programming Language Community | Page 1

visual mango Jun 21, 2024, 12:45 AM

#

you could make the parse function a function within your Token enum

#

I'm confused why you wouldn't just use to_lowercase() on the str

proud skiff Jun 21, 2024, 12:54 AM

#

I would use some bit manipulation to force it to be lowercase
This only works for ASCII letters, and Rust has make_ascii_lowercase to do that. But if you only expect to compare each string once or twice, then there's eq_ignore_ascii_case, which doesn't require modifying the string. This is what unicase::Ascii does.

If you need non-ASCII case comparisons/transformations, then you need an ugly way.
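
For reference, a minimal sketch of both standard-library options mentioned here. The keyword list and the function names are made up for illustration; only eq_ignore_ascii_case and make_ascii_lowercase come from std.

```rust
/// Hypothetical keyword list, for illustration only.
const KEYWORDS: [&str; 3] = ["begin", "end", "while"];

/// Returns the canonical keyword equal to `word` when ASCII case is ignored.
/// Nothing is allocated and `word` is not modified.
fn find_keyword(word: &str) -> Option<&'static str> {
    KEYWORDS
        .iter()
        .copied()
        .find(|kw| kw.eq_ignore_ascii_case(word))
}

/// Lowercases the ASCII letters of `s` in place.
/// Non-ASCII characters are left untouched.
fn lower_in_place(s: &mut String) {
    s.make_ascii_lowercase();
}
```

The linear scan is only reasonable for a handful of keywords; with many entries a map keyed on a case-insensitive wrapper is probably the better fit.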

visual mango Jun 21, 2024, 12:57 AM

#

Why did you choose to use a static mapping?

#

It's probably best to use match

#

And make that a method on your enum
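
A rough sketch of what the match-as-a-method suggestion could look like. The Token variants and keyword spellings are hypothetical, since the original enum isn't shown in this thread; parse_kw is the name used later in the conversation.

```rust
/// Hypothetical token type; the real enum likely has many more variants.
#[derive(Debug, PartialEq)]
enum Token {
    Begin,
    End,
    While,
}

impl Token {
    /// Parses a keyword, ignoring ASCII case.
    /// Note: to_ascii_lowercase allocates a new String on every call.
    fn parse_kw(word: &str) -> Option<Token> {
        match word.to_ascii_lowercase().as_str() {
            "begin" => Some(Token::Begin),
            "end" => Some(Token::End),
            "while" => Some(Token::While),
            _ => None,
        }
    }
}
```

Calling Token::parse_kw("WHILE") would give Some(Token::While), and anything that isn't a keyword gives None.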

proud skiff Jun 21, 2024, 12:59 AM

#

All the uppercase versions of the letters you have there are ASCII, so you don't need to worry about non-ASCII stuff if that's what you're doing.

visual mango Jun 21, 2024, 1:00 AM

#

I believe under the hood it becomes a hashmap

proud skiff Jun 21, 2024, 1:01 AM

#

Then you should use Ascii instead of UniCase

visual mango Jun 21, 2024, 1:04 AM

#

I honestly would advise you not to worry about optimizing copying early in a project.

#

Just get a working version going and then refine it

proud skiff Jun 21, 2024, 1:04 AM

#

Also phf is better than a match when there's lots of items, but how many is a lot is gonna vary by use case.

visual mango Jun 21, 2024, 1:06 AM

#

I think it depends on the case, but there are a lot of behind-the-scenes optimizations out there, for example doubling a value with just a bit shift.

#

Looks like a compiler, what's your plan with this project?

#

Let me know how things go

#

Haven't tried my hand at a compiler yet

#

Probably

#

Yeah it's got a lot of those features

#

What are your thoughts on the build tools like cargo?

#

Yeah, cpp is not very friendly

#

What about js

#

Let me know how your project goes

plucky frost Jun 23, 2024, 5:26 AM

#

to add to what's been said, the copying usually isn't what's slow, but the allocation

#

which can be avoided in a myriad of ways, including:

#

using a crate that provides a stackstring or a smallstring

#

having a set of constant strings that you're matching against, which makes it very easy to know what the max size of the string would be.
in that case, if the input is bigger than the max size, it's not a keyword
otherwise, copy/translate each byte to a stack array with the max size
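
a minimal sketch of the stack array idea, assuming a made-up keyword set whose longest entry is 5 bytes (MAX_KW_LEN and parse_kw_stack are names invented for this example):

```rust
/// Length in bytes of the longest keyword in the hypothetical set below.
const MAX_KW_LEN: usize = 5;

/// Looks up `word` as a keyword, ignoring ASCII case, without heap allocation.
fn parse_kw_stack(word: &str) -> Option<&'static str> {
    let bytes = word.as_bytes();
    // Anything longer than the longest keyword cannot be a keyword.
    if bytes.len() > MAX_KW_LEN {
        return None;
    }
    // Copy the bytes into a fixed-size stack buffer, lowercasing ASCII letters.
    let mut buf = [0u8; MAX_KW_LEN];
    for (dst, src) in buf.iter_mut().zip(bytes) {
        *dst = src.to_ascii_lowercase();
    }
    // Compare only the filled part of the buffer.
    match &buf[..bytes.len()] {
        b"begin" => Some("begin"),
        b"end" => Some("end"),
        b"while" => Some("while"),
        _ => None,
    }
}
```

non-ASCII bytes pass through unchanged, so they simply fail to match any keyword.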

#

changing the input text wouldn't be a good idea in C++ either, because you most likely want to do some kind of diagnostics that show the original line, not the modified one

#

so a copy would be required there too

#

though I do question the usefulness of idents that can be any case

plucky frost Jun 23, 2024, 5:59 AM

#

I just realized that the max size point applies to everything

#

you can keep a string around for the lowercasing that you only allocate and deallocate once. then in parse_kw, you just clear and assign the string, avoiding the allocation which is the slow part
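
something like this, as a sketch. KeywordParser and its scratch field are hypothetical names, and the keywords are placeholders:

```rust
/// Holds a scratch buffer that is reused across calls.
struct KeywordParser {
    scratch: String,
}

impl KeywordParser {
    fn new() -> Self {
        Self { scratch: String::new() }
    }

    /// Looks up `word` as a keyword, ignoring ASCII case.
    fn parse_kw(&mut self, word: &str) -> Option<&'static str> {
        // clear() keeps the existing capacity, so push_str only reallocates
        // when `word` is longer than the buffer's current capacity.
        self.scratch.clear();
        self.scratch.push_str(word);
        self.scratch.make_ascii_lowercase();
        match self.scratch.as_str() {
            "begin" => Some("begin"),
            "end" => Some("end"),
            "while" => Some("while"),
            _ => None,
        }
    }
}
```

after the first few calls the buffer should be big enough that no further allocation happens.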

#

while it would be a bit weird, I think you could make it work using a &mut str that you modify in place with make_ascii_lowercase, with absolutely no copy!

#

this would work because tokens usually don't keep references to the original string, but integers that signify start+end or start+size, so no borrow from the original string
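
a sketch of the in-place idea, assuming tokens are plain start/end byte offsets and words are separated by whitespace (Span, word_spans and is_keyword are invented for this example, not taken from the original code):

```rust
/// Byte offsets into the source text; holds no borrow of the source.
#[derive(Debug, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Splits `source` on whitespace and records where each word sits.
fn word_spans(source: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in source.char_indices() {
        if c.is_whitespace() {
            if let Some(s) = start.take() {
                spans.push(Span { start: s, end: i });
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        spans.push(Span { start: s, end: source.len() });
    }
    spans
}

/// Lowercases the word at `span` inside `source` itself, then checks it
/// against a hypothetical keyword set. No copy, no allocation.
fn is_keyword(source: &mut str, span: &Span) -> bool {
    let word = &mut source[span.start..span.end];
    word.make_ascii_lowercase();
    matches!(&*word, "begin" | "end" | "while")
}
```

the catch is the one from earlier: the source text is now lowercased, identifiers included, so diagnostics would show the modified text unless the original is kept somewhere.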
