I'm in the early stages of building a text editor in Odin, and I'm confused by the different types Odin has for representing character strings.
As far as I can tell, "string", "[]u8" and "[]rune" are the main ones, with "cstring" and "[^]u8" being mainly for communicating with C libraries. "[^]u8" seems to be the equivalent of C's "char *", which I like to use, but I suspect it isn't very "idiomatic" Odin. I also don't understand why both "cstring" and "[^]u8" exist, given that the second seems to be a more useful version of the first (?).
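A minimal sketch of how "cstring" and "[^]u8" relate, assuming `strings.clone_to_cstring` from Odin's `core:strings` package to do the allocation (the string "hello" is arbitrary). Neither type stores a length; the difference is that "cstring" marks in the type that the bytes end in a NUL, while a "[^]u8" is just a pointer you can index.

```odin
package main

import "core:fmt"
import "core:strings"

main :: proc() {
	s := "hello"

	// clone_to_cstring allocates a copy of s followed by a NUL terminator
	cs := strings.clone_to_cstring(s)
	defer delete(cs)

	// len on a cstring has to scan for the NUL, like C's strlen
	fmt.println(len(cs)) // 5

	// Reinterpret the same pointer as a multi-pointer: indexing works,
	// but nothing in the type records where the data ends
	p := transmute([^]u8)cs
	fmt.println(p[0] == 'h') // true
	fmt.println(p[5] == 0)   // true: the terminator added by clone_to_cstring

	// Converting back to a string also scans for the NUL to find the length
	back := string(cs)
	fmt.println(back == s) // true
}
```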
Anyway, for regular string manipulation in Odin, what should I generally use?
Here are some of my thoughts, but I may be wrong and I'd like to hear others'. I assume that, where possible, you would generally want to stick with the string type. Iterating over a string gives you runes, and that seems sensible. I could imagine situations where []u8 makes sense, if you need to iterate by byte instead of by rune.

The one that puzzles me most is []rune. I noticed that the length of a string (len(my_str)) seems to be the number of bytes, while the length of a []rune seems to be the number of runes. Likewise, indexing into a string is by byte index, but indexing into a []rune is by rune index (an emoji counts as one element instead of 4 or so). How does that even work as a simple array if runes are of variable size? I would think it's possible if every rune is stored as 4 bytes, whether or not the second to fourth bytes are needed, but I don't know how to check whether that's the case, given that len() returns the number of runes.

string and []u8 also seem nice to convert between, since neither direction needs a reallocation, whereas going from string to []rune does.
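A minimal sketch of the behaviour described above, assuming `utf8.rune_count_in_string` and `utf8.string_to_runes` from Odin's `core:unicode/utf8` package (the test string "hi👋" is arbitrary). It also uses the built-in `size_of(rune)`, which is one way to check how much space each element of a []rune occupies, independently of what len() reports.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	s := "hi👋"

	// len on a string counts bytes: 'h' and 'i' are 1 byte each, the emoji is 4 in UTF-8
	fmt.println(len(s))                       // 6
	fmt.println(utf8.rune_count_in_string(s)) // 3

	// Indexing a string yields a byte (u8), not a rune
	fmt.println(s[2]) // 240 (0xF0), the first byte of the emoji

	// for-in over a string decodes UTF-8: r is a rune, i is its byte offset
	for r, i in s {
		fmt.println(i, r)
	}

	// string_to_runes allocates a new array with one fixed-size element per code point
	runes := utf8.string_to_runes(s)
	defer delete(runes)
	fmt.println(len(runes))                  // 3
	fmt.println(size_of(rune))               // 4: every element takes 4 bytes
	fmt.println(len(runes) * size_of(rune))  // 12 bytes of storage vs 6 for the string

	// string -> []u8 reinterprets the same memory, so no copy is made
	bytes := transmute([]u8)s
	fmt.println(raw_data(bytes) == raw_data(s)) // true
}
```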
I'd appreciate some explanation of all this, including anything I got wrong. And if anyone has rules of thumb or a good internalized understanding of when to use each, I'd love to hear it!