#Is there a less ugly way to make a case insensitive string map?

visual mango
#

you could make the parse function a function within your Token enum

#

I'm confused why you wouldn't just use to_lowercase() on the str

proud skiff
#

I would use some bit manipulation to force it to be lowercase.
This only works for ASCII letters, and Rust has make_ascii_lowercase to do that. But if you only expect to compare each string once or twice, there's eq_ignore_ascii_case, which doesn't require modifying the string. This is what unicase::Ascii does.
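
A rough sketch of the three ASCII-only options mentioned above. The helper names (lower_ascii_byte, lowered, is_keyword) are made up for illustration; only make_ascii_lowercase and eq_ignore_ascii_case are standard library methods.

```rust
/// The "bit manipulation" trick: ASCII upper and lower case letters differ
/// only in bit 0x20. The range check matters, because setting that bit on
/// other bytes changes them (for example b'[' would become b'{').
fn lower_ascii_byte(b: u8) -> u8 {
    if b.is_ascii_uppercase() { b | 0x20 } else { b }
}

/// Lowercase a copy of the word. make_ascii_lowercase works in place,
/// so it needs a mutable (here: owned) string.
fn lowered(word: &str) -> String {
    let mut s = word.to_owned();
    s.make_ascii_lowercase();
    s
}

/// Compare without copying or modifying either string.
fn is_keyword(word: &str, keyword: &str) -> bool {
    word.eq_ignore_ascii_case(keyword)
}
```

If each word is only compared a handful of times, the third form is probably the least ugly, since nothing is copied or mutated.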

If you need non-ascii case comparisons/transformations, then you need an ugly way.

visual mango
#

Why did you choose to use a static mapping?

#

It's probably best to use match

#

And make that a method on your enum
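
A minimal sketch of that suggestion. The variants (If, Else, While) and the method name parse_kw are assumptions, since the original enum isn't shown in this thread.

```rust
// Hypothetical keyword set; the real Token enum presumably has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Token {
    If,
    Else,
    While,
}

impl Token {
    /// Returns the keyword token for `word`, ignoring ASCII case,
    /// or None if `word` is not one of the keywords.
    fn parse_kw(word: &str) -> Option<Token> {
        // to_ascii_lowercase allocates a new String on every call.
        // That is fine for a first working version.
        match word.to_ascii_lowercase().as_str() {
            "if" => Some(Token::If),
            "else" => Some(Token::Else),
            "while" => Some(Token::While),
            _ => None,
        }
    }
}
```

Called as Token::parse_kw("WHILE"), this keeps the keyword table and the enum in one place, at the cost of one small allocation per lookup.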

proud skiff
#

All the uppercase versions of the letters you have there are ascii, so you don't need to worry about non-ascii stuff if that's what you're doing.

visual mango
#

I believe under the hood it becomes a hashmap

proud skiff
#

Then you should use Ascii instead of UniCase

visual mango
#

I honestly would advise you not to worry about optimizing copying early in a project.

#

Just get a working version going and then refine it

proud skiff
#

Also phf is better than a match when there's lots of items, but how many is a lot is gonna vary by use case.

visual mango
#

I think it depends on the case, but there are a lot of behind-the-scenes optimizations out there. For example, the compiler can turn doubling a value into a single bit shift.

#

Looks like a compiler, what's your plan with this project?

#

Let me know how things go

#

Haven't tried my hand at a compiler yet

#

Probably

#

Yeah it's got a lot of those features

#

What are your thoughts on the build tools like cargo?

#

Yeah, cpp is not very friendly

#

What about js

#

Let me know how your project goes

plucky frost
#

to add to what's been said, the copying usually isn't what's slow, but the allocation

#

which can be avoided in a myriad of ways, including:

#
  • using a crate that provides a stackstring or a smallstring
#
  • having a set of constant strings that you're matching against, which makes it very easy to know what the max size of the string would be.
    in that case, if it's bigger than the max size, it's not an ident
    otherwise, copy/translate each byte to a stack array with the max size
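
The max-size idea could look roughly like this. KEYWORDS, MAX_KW_LEN and find_keyword are hypothetical names, and the keyword list is a placeholder for whatever the real set is.

```rust
// Hypothetical keyword table; MAX_KW_LEN must be at least the longest entry.
static KEYWORDS: [&str; 3] = ["if", "else", "while"];
const MAX_KW_LEN: usize = 5;

/// Looks up `word` among the keywords, ignoring ASCII case,
/// without any heap allocation.
fn find_keyword(word: &str) -> Option<&'static str> {
    let bytes = word.as_bytes();
    // Anything longer than the longest keyword cannot be a keyword.
    if bytes.len() > MAX_KW_LEN {
        return None;
    }
    // Copy and lowercase each byte into a fixed-size stack buffer.
    let mut buf = [0u8; MAX_KW_LEN];
    for (dst, src) in buf.iter_mut().zip(bytes) {
        *dst = src.to_ascii_lowercase();
    }
    let lowered = &buf[..bytes.len()];
    // A linear scan is enough for a short table.
    KEYWORDS.iter().copied().find(|kw| kw.as_bytes() == lowered)
}
```

Non-ASCII bytes are copied unchanged, so they simply fail to match any of the ASCII keywords.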
#

changing the input text wouldn't be a good idea in c++ either, because you most likely want to do some kind of diagnostics that show the original line, not the modified one

#

so a copy would be required there too

#

though I do question the usefulness of idents that can be any case

plucky frost
#

I just realized that the max size point applies to everything

#
  • you can keep a string around for the lowercasing that you only allocate and deallocate once. then in parse_kw, you just clear and assign the string, avoiding the allocation which is the slow part
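
A sketch of the reusable-buffer idea, assuming a hypothetical KeywordParser struct that owns the scratch string. parse_kw is the name used above, but its real signature isn't shown in the thread, so this one is a guess.

```rust
/// Hypothetical holder for the scratch buffer.
struct KeywordParser {
    scratch: String,
}

impl KeywordParser {
    fn new() -> Self {
        KeywordParser { scratch: String::new() }
    }

    /// Returns the canonical lowercase keyword, or None for anything else.
    fn parse_kw(&mut self, word: &str) -> Option<&'static str> {
        // clear() keeps the existing capacity, so push_str only has to
        // reallocate when `word` is longer than the buffer has ever been.
        self.scratch.clear();
        self.scratch.push_str(word);
        self.scratch.make_ascii_lowercase();
        match self.scratch.as_str() {
            "if" => Some("if"),
            "else" => Some("else"),
            "while" => Some("while"),
            _ => None,
        }
    }
}
```

After the first few calls the buffer has usually grown as large as it needs to be, so later lookups only copy bytes.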
#
  • while it would be a bit weird, I think you could make it work with a &mut str that you modify in place using make_ascii_lowercase, with absolutely no copy!
#

this would work because tokens usually don't keep references to the original string, but integers that signify start+end or start+size, so no borrow from the original string
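
Roughly what the in-place version might look like. Kind, Token and classify are invented for the sketch, and it assumes start..end falls on character boundaries of an ASCII word.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Keyword,
    Ident,
}

/// The token stores only byte offsets, so it holds no borrow of the source.
#[derive(Debug, PartialEq)]
struct Token {
    kind: Kind,
    start: usize,
    end: usize,
}

/// Lowercases src[start..end] in place and classifies it.
/// Caveat: this overwrites the original casing in the source buffer,
/// which diagnostics that quote the source would then show.
fn classify(src: &mut str, start: usize, end: usize) -> Token {
    let word = &mut src[start..end];
    word.make_ascii_lowercase();
    let kind = match &*word {
        "if" | "else" | "while" => Kind::Keyword,
        _ => Kind::Ident,
    };
    // The mutable borrow of `src` ends here; only integers escape.
    Token { kind, start, end }
}
```

Because the returned Token carries only start and end, the caller can keep calling classify on the same buffer for the next word.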