# Is there a less ugly way to make a case-insensitive string map?
I would use some bit manipulation to force it to be lowercase
This only works for ASCII letters, and Rust has make_ascii_lowercase to do that. But if you only expect to compare each string once or twice, then there's eq_ignore_ascii_case, which doesn't require modifying the string. This is what unicase::Ascii does.
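A minimal sketch of the two standard-library helpers mentioned above. The wrapper names same_word and normalize are made up for illustration; only eq_ignore_ascii_case and make_ascii_lowercase come from std.

```rust
/// Compare two words ignoring ASCII case, without allocating or mutating.
/// (`same_word` is a hypothetical helper name.)
fn same_word(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}

/// Lowercase the ASCII letters of an owned string in place and hand it back.
/// (`normalize` is a hypothetical helper name.)
fn normalize(mut word: String) -> String {
    word.make_ascii_lowercase();
    word
}

fn main() {
    // One-off comparison: nothing is copied or modified.
    assert!(same_word("SELECT", "select"));

    // Repeated lookups: normalise once, then compare with plain `==`.
    assert_eq!(normalize(String::from("SeLeCt")), "select");

    // Non-ASCII letters are left untouched by the ASCII helpers.
    assert!(!same_word("É", "é"));
}
```

The comparison form is probably the better fit when each string is only checked once or twice, since it never touches the input.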
If you need non-ascii case comparisons/transformations, then you need an ugly way.
Why did you choose to use a static mapping?
It's probably best to use match
And make that a method on your enum
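Roughly what that could look like, as a sketch only. The Keyword enum, its variants, and the method name parse are placeholders, since the original enum isn't shown here. This version lowercases with to_ascii_lowercase, which allocates a new String on every call.

```rust
/// Placeholder keyword set; the real enum will have different variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    If,
    Else,
    While,
}

impl Keyword {
    /// Look up a keyword, ignoring ASCII case.
    /// Allocates a temporary lowercase copy of `word`.
    fn parse(word: &str) -> Option<Keyword> {
        match word.to_ascii_lowercase().as_str() {
            "if" => Some(Keyword::If),
            "else" => Some(Keyword::Else),
            "while" => Some(Keyword::While),
            _ => None,
        }
    }
}

fn main() {
    assert_eq!(Keyword::parse("WHILE"), Some(Keyword::While));
    assert_eq!(Keyword::parse("Else"), Some(Keyword::Else));
    assert_eq!(Keyword::parse("foo"), None);
}
```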
All the uppercase versions of the letters you have there are ascii, so you don't need to worry about non-ascii stuff if that's what you're doing.
I believe under the hood it becomes a hashmap
Then you should use Ascii instead of UniCase
I honestly would advise you not to worry about optimizing copying early in a project.
Just get a working version going and then refine it
Also phf is better than a match when there's lots of items, but how many is a lot is gonna vary by use case.
I think it depends on the case, but there are a lot of behind-the-scenes optimizations out there, for example doubling a value with just a bit shift.
Looks like a compiler. What's your plan with this project?
Let me know how things go
Haven't tried my hand at a compiler yet
Probably
Yeah, it's got a lot of those features
What are your thoughts on the build tools like cargo?
Yeah, cpp is not very friendly
What about js
Let me know how your project goes
to add to what's been said, the copying usually isn't what's slow, but the allocation
which can be avoided in a myriad of ways, including:
- using a crate that provides a stackstring or a smallstring
- having a set of constant strings that you're matching against, which makes it very easy to know what the max size of the string would be.
in that case, if it's bigger than the max size, it's not an ident
otherwise, copy/translate each byte to a stack array with the max size
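A sketch of that max-size idea, under assumptions: the Keyword enum, MAX_KW_LEN, and the three keywords are invented for the example, and parse_kw is just the name used later in the thread. Nothing here touches the heap.

```rust
/// Placeholder keyword set for the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    If,
    Else,
    While,
}

/// Length in bytes of the longest keyword below ("while").
const MAX_KW_LEN: usize = 5;

fn parse_kw(word: &str) -> Option<Keyword> {
    let bytes = word.as_bytes();

    // Longer than every keyword: it can only be an ident.
    if bytes.len() > MAX_KW_LEN {
        return None;
    }

    // Copy and lowercase each byte into a fixed-size stack array.
    let mut buf = [0u8; MAX_KW_LEN];
    for (dst, src) in buf.iter_mut().zip(bytes) {
        *dst = src.to_ascii_lowercase();
    }

    // Only the first `bytes.len()` bytes of the buffer are meaningful.
    match &buf[..bytes.len()] {
        b"if" => Some(Keyword::If),
        b"else" => Some(Keyword::Else),
        b"while" => Some(Keyword::While),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_kw("WHILE"), Some(Keyword::While));
    assert_eq!(parse_kw("whilst"), None); // too long, rejected before any copy
    assert_eq!(parse_kw("foo"), None);
}
```

Non-ASCII bytes pass through to_ascii_lowercase unchanged, so they simply fail to match any keyword.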
changing the input text wouldn't be a good idea in c++ either, because you most likely want to do some kind of diagnostics that show the original line, not the modified one
so a copy would be required there too
though I do question the usefulness of idents that can be any case
I just realized that the max size point applies to everything
- you can keep a string around for the lowercasing that you only allocate and deallocate once. then in parse_kw, you just clear and assign the string, avoiding the allocation which is the slow part
- while it would be a bit weird, I think you could make it work using a &mut str that you modify in place with make_ascii_lowercase, with absolutely no copy!
this would work because tokens usually don't keep references to the original string, but integers that signify start+end or start+size, so no borrow from the original string
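The last two ideas could be sketched like this. Everything named here (Lexer, scratch, is_kw, Span, lowercase_span, the keyword list) is hypothetical; only clear, push_str, and make_ascii_lowercase are std calls.

```rust
/// Hypothetical lexer state that owns one scratch buffer for lowercasing.
struct Lexer {
    scratch: String,
}

impl Lexer {
    fn new() -> Self {
        Lexer { scratch: String::new() }
    }

    /// True if `word` is a keyword, ignoring ASCII case.
    /// `clear` keeps the buffer's capacity, so after the buffer has grown
    /// to fit the longest word seen, later calls don't allocate.
    fn is_kw(&mut self, word: &str) -> bool {
        self.scratch.clear();
        self.scratch.push_str(word);
        self.scratch.make_ascii_lowercase();
        matches!(self.scratch.as_str(), "if" | "else" | "while")
    }
}

/// A token position stored as byte offsets, so it borrows nothing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Lowercase one span of the source in place, with no copy at all.
/// Panics if the offsets don't fall on UTF-8 character boundaries.
fn lowercase_span(src: &mut str, span: Span) {
    src[span.start..span.end].make_ascii_lowercase();
}

fn main() {
    let mut lexer = Lexer::new();
    assert!(lexer.is_kw("WHILE"));
    assert!(!lexer.is_kw("whilst"));

    let mut source = String::from("IF x ELSE y");
    lowercase_span(&mut source, Span { start: 0, end: 2 });
    assert_eq!(source, "if x ELSE y");
}
```

The in-place version does change what a diagnostic would print for that line, which is the trade-off raised earlier, so the scratch buffer is likely the safer default.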