I just finished writing a simplified regex parser from scratch, and would like to know what could be done better.
This version of regex is targeted at what has given me the most issues, so it's limited to simple groups ( ), alternations |, star repeats *, and literal chars.
The main new idea I had is to use splicing on the groups, so a sequence of tokens [..., GroupStart, ..., GroupEnd, ...] gets coalesced into [..., Group(...), ...], and [..., x, Repeat] becomes [..., Repeat(x)].
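To make the splicing idea concrete, here is a minimal sketch of the two rewrites, not the actual code from the post. The variant names are assumptions: in particular the unparsed `*` token is called `Star` here, because a Rust enum cannot have both a bare `Repeat` token and a `Repeat(x)` node under the same name. Alternation is left out to keep it short, and `fold_groups` / `fold_repeats` are hypothetical helper names.

```rust
// One enum for both tokens and parsed nodes, as described above.
#[derive(Debug, PartialEq)]
enum Regex {
    Char(char),
    GroupStart,         // token: `(`
    GroupEnd,           // token: `)`
    Star,               // token: `*` (assumed name for the unparsed repeat)
    Group(Vec<Regex>),  // node: coalesced `( ... )`
    Repeat(Box<Regex>), // node: `x*`
}

/// Rewrite every `[..., x, Star, ...]` into `[..., Repeat(x), ...]`.
fn fold_repeats(tokens: &mut Vec<Regex>) -> Result<(), &'static str> {
    while let Some(i) = tokens.iter().position(|t| *t == Regex::Star) {
        if i == 0 {
            return Err("`*` with nothing to repeat");
        }
        tokens.remove(i); // drop the Star token
        let operand = tokens.remove(i - 1); // take ownership of x
        tokens.insert(i - 1, Regex::Repeat(Box::new(operand)));
    }
    Ok(())
}

/// Rewrite every `[..., GroupStart, ..., GroupEnd, ...]` into
/// `[..., Group(...), ...]`, innermost group first, then fold the
/// repeats that remain at the top level.
fn fold_groups(tokens: &mut Vec<Regex>) -> Result<(), &'static str> {
    // The first GroupEnd always closes an innermost group.
    while let Some(end) = tokens.iter().position(|t| *t == Regex::GroupEnd) {
        let start = tokens[..end]
            .iter()
            .rposition(|t| *t == Regex::GroupStart)
            .ok_or("unmatched `)`")?;
        // Move the group's contents out; GroupStart and GroupEnd are now adjacent.
        let mut inner: Vec<Regex> = tokens.drain(start + 1..end).collect();
        fold_repeats(&mut inner)?;
        tokens.remove(start + 1); // drop GroupEnd
        tokens[start] = Regex::Group(inner); // overwrite GroupStart
    }
    if tokens.contains(&Regex::GroupStart) {
        return Err("unmatched `(`");
    }
    fold_repeats(tokens)
}
```

With this sketch, the tokens for `a(b)*` should end up as `[Char('a'), Repeat(Group([Char('b')]))]`: the group is spliced first, so the trailing `Star` sees a single `Group` node as its operand.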
Additionally, I plan to push most of the complexity, like char classes and escapes, into the lexing stage, since the parsing stage can treat them as plain literals.
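As a rough illustration of that plan, the sketch below resolves backslash escapes in the lexer so that an escaped metacharacter reaches the parser as an ordinary literal. The `Token` enum and `lex` function are hypothetical names (the same variants could just as well live in the shared `Regex` enum), and char classes are not handled here.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Char(char), // any literal, including escaped metacharacters
    GroupStart, // `(`
    GroupEnd,   // `)`
    Alt,        // `|`
    Star,       // `*`
}

/// Turn the pattern text into tokens. `\x` always becomes the literal `x`,
/// so the parser never has to know that escapes exist.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut out = Vec::new();
    let mut chars = src.chars();
    while let Some(c) = chars.next() {
        let tok = match c {
            '\\' => match chars.next() {
                Some(escaped) => Token::Char(escaped),
                None => return Err("trailing backslash".to_string()),
            },
            '(' => Token::GroupStart,
            ')' => Token::GroupEnd,
            '|' => Token::Alt,
            '*' => Token::Star,
            other => Token::Char(other),
        };
        out.push(tok);
    }
    Ok(out)
}
```

For example, `lex(r"a\*(b)*")` should yield `Char('a')`, `Char('*')`, `GroupStart`, `Char('b')`, `GroupEnd`, `Star`: the escaped star is already a literal by the time parsing starts.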
I really don't like how the tokens and output share the Regex enum instead of being a separate Token enum, but I don't see a clean way to do the splicing without it.
Theoretically, Regex: Clone shouldn't be needed. If I could split the Vec<Regex> into 3 owned Vecs, only the middle one would need to be moved into the new group, and the two remaining ones could then be joined back around it. I just don't see a good way to do that.
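One possible way to get the three owned Vecs is `Vec::split_off`, which moves the tail of a vector into a new `Vec` without cloning elements. The sketch below is only an illustration under assumptions: `split3` and `group_range` are hypothetical helpers, the enum is cut down to the variants needed, and `start` / `end` are assumed to be the indices of a matching `GroupStart` / `GroupEnd` pair with `start < end < tokens.len()`.

```rust
#[derive(Debug, PartialEq)] // deliberately no Clone
enum Regex {
    Char(char),
    GroupStart,
    GroupEnd,
    Group(Vec<Regex>),
}

/// Split `v` into three owned Vecs: `[..start]`, `[start..end]`, `[end..]`.
/// Panics if `start > end` or `end > v.len()`, like `split_off` itself.
fn split3<T>(mut v: Vec<T>, start: usize, end: usize) -> (Vec<T>, Vec<T>, Vec<T>) {
    let tail = v.split_off(end); // v is now [..end]
    let middle = v.split_off(start); // v is now [..start]
    (v, middle, tail)
}

/// Replace `tokens[start..=end]` (GroupStart ... GroupEnd) with one Group node.
fn group_range(tokens: Vec<Regex>, start: usize, end: usize) -> Vec<Regex> {
    let (mut head, mut middle, mut tail) = split3(tokens, start, end + 1);
    middle.pop(); // drop the trailing GroupEnd
    middle.remove(0); // drop the leading GroupStart
    head.push(Regex::Group(middle)); // the middle Vec is moved, not cloned
    head.append(&mut tail); // join the remaining two pieces back together
    head
}
```

If the splice should happen in place instead, `tokens.drain(start..=end).collect::<Vec<_>>()` followed by `tokens.insert(start, ...)` moves the elements in the same way, also without a `Clone` bound.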
I also still don't see how this kind of parsing could be done with any of the existing parsing libraries, despite trying almost all of them for way too long, so if anyone wants to write an example, it would be very welcome.