#Language parsing library choice


shell iron

Hello, I'm planning to build a language server for a language that's not particularly small, and since its compiler is quite slow by itself, I'm considering writing my own parser. I'd like to know which library would be the best fit for this task so I don't have to switch to another one later. I found that the most commonly used ones are tree-sitter, chumsky, and nom. My main concern is the library's ease of use (e.g. how difficult it is to implement incremental parsing), with performance in mind.

acoustic tusk

I have used a couple:

Tree-Sitter is nice for prototyping. You can quickly get a grammar written and check its correctness via CLI tools, as well as get "free" syntax highlighting.
However, its API is "stringly typed", which is not super comfortable in Rust.

I have used chumsky a couple of times. It's also really nice because combinators are really easy to use. The performance is also quite decent.
My main problem with chumsky is that the type errors are abominable and that I can't create parsers in a static context (at compile time). Realistically this isn't a huge problem.

I have only used nom briefly, so I'm not sure why I would choose it over chumsky for parsing programming languages. It might be my choice for data formats though.

One of my favorites is lalrpop. It's quite pleasant and easy to use IMO. Having a custom rusty DSL makes the grammar very readable. Additionally, the generated parser is statically checked for type safety and grammar errors.
The main downside here is poor IDE support. Highlighting works fine, but it is generally not aware of the Rust types you are using.

Pest has good IDE integration, at least in VS Code. It's been a while since I used it. Writing the grammar was quite nice, even if the syntax was unusual to me.
My main gripe with pest was that converting your parse result to an AST has to be done somewhat manually. You end up writing your tree structure three times: once for parsing, once for defining your structs, and once to convert between them (tree-sitter is similar in this regard).
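
To make the "three times" point concrete, here is a minimal sketch in plain Rust. It does not use pest or tree-sitter: `Node`, `Expr`, `leaf`, and `to_ast` are hypothetical names, and `Node` only imitates the kind of untyped, string-labelled tree such tools hand back. The grammar (the first copy of the structure) is assumed to live elsewhere.

```rust
/// Untyped parse-tree node, imitating what a generic parser library returns.
/// Hypothetical type, not the actual API of pest or tree-sitter.
#[derive(Debug, Clone)]
struct Node {
    rule: &'static str, // rule name as a string: a typo only surfaces at runtime
    text: String,
    children: Vec<Node>,
}

/// Typed AST: the second place the tree shape is written down.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Manual conversion: the third place the tree shape is written down.
fn to_ast(node: &Node) -> Result<Expr, String> {
    match node.rule {
        "number" => node
            .text
            .parse::<i64>()
            .map(Expr::Num)
            .map_err(|e| format!("bad number {:?}: {}", node.text, e)),
        "add" => match node.children.as_slice() {
            [lhs, rhs] => Ok(Expr::Add(Box::new(to_ast(lhs)?), Box::new(to_ast(rhs)?))),
            other => Err(format!("add expects 2 children, got {}", other.len())),
        },
        other => Err(format!("unknown rule {:?}", other)),
    }
}

/// Helper to build a childless node for the example.
fn leaf(rule: &'static str, text: &str) -> Node {
    Node { rule, text: text.to_string(), children: Vec::new() }
}
```

Calling `to_ast` on an `"add"` node with two `"number"` leaves yields a typed `Expr::Add`, while a misspelled rule name such as `"nubmer"` compiles fine and only fails when the conversion runs.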

shell iron

hey, thank you for the detailed response. i’ve played around with the libraries and decided to stick with your favorite one haha. it is unfortunate that there’s no ide support but that's fine. though i'm currently facing a problem: i’m working on a Kotlin language parser, where statements are separated with either a semicolon or a newline, except for the last one. when I try to write a grammar for this, lalrpop throws tons of errors at my face, due to ambiguity i suspect:

pub Source: Vec<Box<Stmt>> = {
    <stmts:StmtWithTerminator*> <last_stmt:Stmt> => {
        let mut result = stmts;
        result.push(last_stmt);
        result
    },
};

StmtWithTerminator: Box<Stmt> = {
    <stmt:Stmt> <term:Terminator> => stmt,
};

Stmt: Box<Stmt> = {
    /* ... */

    <e:Expr> => Box::new(Stmt::ExprStmt(Box::new(e))),
};

Terminator: () = {
    ";" => (),
    "\n" => (),
};
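
A guess at the cause, not verified against LALRPOP's actual output: `StmtWithTerminator*` may be empty, so before reading the first statement the parser already has to decide whether that statement is the last one, and one token of lookahead cannot tell. A common way around this kind of conflict is to read one `Stmt` first and then repeat `Terminator Stmt`. The hand-written Rust sketch below illustrates that shape on toy tokens. It is not LALRPOP code, and `Tok`, `parse_source`, and `parse_stmt` are hypothetical names.

```rust
use std::iter::Peekable;

/// Toy tokens for this sketch; a real lexer would produce something richer.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Expr(String), // stands in for a whole statement
    Semi,         // ";"
    Newline,      // "\n"
}

/// Parses `Stmt (Terminator Stmt)*`: the first statement is read on its own,
/// and every later statement must be introduced by a terminator.
fn parse_source(tokens: &[Tok]) -> Result<Vec<String>, String> {
    let mut iter = tokens.iter().peekable();
    let mut stmts = vec![parse_stmt(&mut iter)?];
    // A terminator is the only decision point: the parser never has to guess
    // in advance whether the statement it just read was the last one.
    while let Some(Tok::Semi | Tok::Newline) = iter.peek() {
        iter.next(); // consume the terminator
        stmts.push(parse_stmt(&mut iter)?);
    }
    match iter.next() {
        None => Ok(stmts),
        Some(tok) => Err(format!("expected terminator, found {:?}", tok)),
    }
}

/// Reads exactly one statement token.
fn parse_stmt<'a, I>(iter: &mut Peekable<I>) -> Result<String, String>
where
    I: Iterator<Item = &'a Tok>,
{
    match iter.next() {
        Some(Tok::Expr(s)) => Ok(s.clone()),
        Some(tok) => Err(format!("expected statement, found {:?}", tok)),
        None => Err("expected statement, found end of input".to_string()),
    }
}
```

Like the grammar above, this sketch rejects a trailing terminator and an empty input. Whether Kotlin source should accept either is a separate decision.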