#Data Structure for parsing

174 messages

worthy citrus
#

I'm making an interpreter. After the tokenizer tokenizes the code, it stores the tokens in an array. Now I want another 'data structure' that uses the grammar to produce stuff. I think this figure shows what I want to achieve. I also want to implement a form of backtracking, which I can probably do by making a temp version of that data structure and copying stuff? But how do I make this type of data structure?

spice chasmBOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.

worthy citrus
#

if i'm looking at parsing wrong, please do tell me, i'm just trying to apply what i learned in theory about recursive descent parsers...

karmic flax
#

Honestly, backtracking seems a bit much, maybe just have a lookahead
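A minimal sketch of lookahead, assuming the tokens already sit in one std::vector: the parser keeps an index into that vector, `peek(k)` looks k tokens ahead without consuming anything, and backtracking is just saving the index with `mark()` and restoring it with `reset()`, so no temporary copy of the data structure is needed. `TokenCursor`, `TokType`, and `looksLikeAssignment` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type; a real lexer would also store line/column.
enum class TokType { Ident, Number, Equals, Semicolon, End };

struct Token {
    TokType type;
    std::string text;
};

// Reads the lexer's token vector in place: lookahead and backtracking only move an index.
class TokenCursor {
public:
    explicit TokenCursor(std::vector<Token> tokens) : toks_(std::move(tokens)) {
        // Make sure the stream ends with an End token so peek() always has something to return.
        if (toks_.empty() || toks_.back().type != TokType::End)
            toks_.push_back({TokType::End, ""});
    }

    // Lookahead: look k tokens past the current one without consuming anything.
    const Token& peek(std::size_t k = 0) const {
        std::size_t i = pos_ + k;
        return i < toks_.size() ? toks_[i] : toks_.back();
    }

    // Consumes the current token; the cursor never moves past End.
    const Token& next() {
        const Token& t = peek();
        if (t.type != TokType::End) ++pos_;
        return t;
    }

    // Backtracking: save the position before a speculative parse, restore it if that parse fails.
    std::size_t mark() const { return pos_; }
    void reset(std::size_t saved) { pos_ = saved; }

private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

// One token of lookahead decides between an assignment and some other statement.
bool looksLikeAssignment(const TokenCursor& c) {
    return c.peek().type == TokType::Ident && c.peek(1).type == TokType::Equals;
}
```

For most small languages a fixed lookahead like `peek(1)` is enough; `mark()`/`reset()` only matter when two grammar rules share a long common prefix.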

#

The data structure you want would probably be AST

#

Convert the tokens into a tree

#

Then for the interpreter you can just walk thru the AST
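A small sketch of what walking the AST can mean, assuming a polymorphic node hierarchy owned through std::unique_ptr: each node type overrides a virtual `eval()`, and an operator node evaluates both children before applying its operator. `AstNode`, `NumberNode`, `BinOpNode`, and `exampleTree` are hypothetical names; a real interpreter would add statements, variables, and an environment.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Base class every AST node derives from.
struct AstNode {
    virtual ~AstNode() = default;
    virtual double eval() const = 0;  // tree-walking interpretation
};

struct NumberNode : AstNode {
    double value;
    explicit NumberNode(double v) : value(v) {}
    double eval() const override { return value; }
};

struct BinOpNode : AstNode {
    char op;
    std::unique_ptr<AstNode> lhs, rhs;

    BinOpNode(char o, std::unique_ptr<AstNode> l, std::unique_ptr<AstNode> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}

    // Evaluate the children first, then apply this node's operator.
    double eval() const override {
        double a = lhs->eval();
        double b = rhs->eval();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
        }
        throw std::runtime_error("unknown operator");
    }
};

// Builds the tree for "1 + 2 * 3" by hand: the '*' node is the right child of '+'.
std::unique_ptr<AstNode> exampleTree() {
    auto mul = std::make_unique<BinOpNode>('*', std::make_unique<NumberNode>(2),
                                           std::make_unique<NumberNode>(3));
    return std::make_unique<BinOpNode>('+', std::make_unique<NumberNode>(1), std::move(mul));
}
```

With this sketch, `exampleTree()->eval()` returns 7, because the multiplication sits deeper in the tree and is evaluated first.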

#

I have a simple C++-like interpreter on my GitHub if you want to check some stuff; it's probably not best practice, but it works

worthy citrus
worthy citrus
karmic flax
worthy citrus
karmic flax
#

You can incorporate it

worthy citrus
karmic flax
#

Not that hard

#

Type system is pretty hard tho

worthy citrus
karmic flax
#

I mean if you want code to run

worthy citrus
#

can i ask a question

#

an abstract one

karmic flax
karmic flax
worthy citrus
karmic flax
#

Probably easier

karmic flax
worthy citrus
#

you check grammar and then build an ast?

#

or do you check grammar after u build an ast

karmic flax
#

I see what is the statement I'm parsing

#

If, for example, something that's not a semicolon or = comes after a variable name, I return null

#

(I actually don't return null, I set an error code)

worthy citrus
#

so you're parsing as you're lexing?

#

understandable

karmic flax
worthy citrus
#

you tokenize and you store it?

karmic flax
#

Yeah

worthy citrus
#

and then check after?

karmic flax
#

Wdym

worthy citrus
#

check if there's ; or = after a variable

karmic flax
#

I do a semantic check during parsing

worthy citrus
#

through grammar?

karmic flax
#

For example

if(tokenizer->GetToken().type != SEMICOLON){
    errorCode = MISSING_SEMICOLON;
    return;
}
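The same check, pulled into a reusable helper, might look like the sketch below. It assumes the parser walks a vector of token types and records the first failure in an error code instead of returning null; the enum values, `Parser`, and `expect` are hypothetical, loosely modeled on the MISSING_SEMICOLON snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token kinds and error codes.
enum TokenType { IDENT, NUMBER, EQUALS, SEMICOLON, END_OF_INPUT };
enum ErrorCode { PARSE_OK, MISSING_SEMICOLON, UNEXPECTED_TOKEN };

class Parser {
public:
    explicit Parser(std::vector<TokenType> tokens) : toks_(std::move(tokens)) {}

    // statement := IDENT ('=' NUMBER)? ';'
    // Stops at the first problem and leaves it in errorCode.
    void parseStatement() {
        if (!expect(IDENT, UNEXPECTED_TOKEN)) return;
        if (current() == EQUALS) {
            ++pos_;
            if (!expect(NUMBER, UNEXPECTED_TOKEN)) return;
        }
        expect(SEMICOLON, MISSING_SEMICOLON);
    }

    ErrorCode errorCode = PARSE_OK;

private:
    TokenType current() const { return pos_ < toks_.size() ? toks_[pos_] : END_OF_INPUT; }

    // Consumes the current token if it has the wanted type; otherwise records onFail.
    bool expect(TokenType wanted, ErrorCode onFail) {
        if (current() != wanted) {
            errorCode = onFail;
            return false;
        }
        ++pos_;
        return true;
    }

    std::vector<TokenType> toks_;
    std::size_t pos_ = 0;
};
```

The grammar lives in the shape of `parseStatement` itself: each `expect` call is one "the next token must be X" rule.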
karmic flax
#

Ik what the grammar is in my head, so I just make sure the tokens conform to it

worthy citrus
karmic flax
#

Parsing

worthy citrus
#

after doing semantic analysis?

karmic flax
#

Lexing -> converts a string into tokens
Parsing -> converts tokens into an AST

worthy citrus
#

like you analyse if there's an = or ; after a variable, then if it passes you make an ast?

karmic flax
worthy citrus
karmic flax
#

You can do it before

worthy citrus
#

cause parsing is such a vague thing it seems

karmic flax
#

Its not

#

Converts tokens into a tree

#

Simple as that

worthy citrus
#

some people say that you check grammar after making a tree, by recursively iterating through it, so it makes me more confused

karmic flax
#

For example

"1 + 2 * 3"
becomes
binOp:
    lhs: 1
    op: +
    rhs: binOp:
        lhs: 2
        op: *
        rhs: 3

ir:
res = 2 * 3
res += 1
worthy citrus
#

ahh

karmic flax
#

You'll need an expression evaluator

#

So you can deal with precedence
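Precedence is commonly handled inside the parser itself, with one recursive-descent function per precedence level, so `*` and `/` bind tighter than `+` and `-` and end up deeper in the tree. The sketch below assumes non-negative integer literals, the four arithmetic operators, and parentheses; it reads characters directly instead of lexer tokens to stay self-contained, and `Expr`, `ExprParser`, and `show` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical expression node: op == 0 marks a number leaf, otherwise lhs/rhs are set.
struct Expr {
    char op = 0;
    long value = 0;
    std::unique_ptr<Expr> lhs, rhs;
};

// One function per precedence level (lowest first):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class ExprParser {
public:
    explicit ExprParser(std::string src) : s_(std::move(src)) {}

    std::unique_ptr<Expr> parse() {
        auto e = expr();
        skipSpaces();
        if (i_ != s_.size()) throw std::runtime_error("unexpected trailing input");
        return e;
    }

private:
    std::unique_ptr<Expr> expr() {
        auto lhs = term();
        while (peekIs('+') || peekIs('-')) {
            char op = s_[i_++];
            auto rhs = term();  // looping (not recursing) makes '+' and '-' left-associative
            lhs = binary(op, std::move(lhs), std::move(rhs));
        }
        return lhs;
    }

    std::unique_ptr<Expr> term() {
        auto lhs = factor();
        while (peekIs('*') || peekIs('/')) {
            char op = s_[i_++];
            auto rhs = factor();
            lhs = binary(op, std::move(lhs), std::move(rhs));
        }
        return lhs;
    }

    std::unique_ptr<Expr> factor() {
        if (peekIs('(')) {
            ++i_;
            auto inner = expr();
            if (!peekIs(')')) throw std::runtime_error("expected ')'");
            ++i_;
            return inner;
        }
        skipSpaces();
        if (i_ >= s_.size() || !std::isdigit(static_cast<unsigned char>(s_[i_])))
            throw std::runtime_error("expected a number");
        auto leaf = std::make_unique<Expr>();
        while (i_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i_])))
            leaf->value = leaf->value * 10 + (s_[i_++] - '0');
        return leaf;
    }

    static std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
        auto node = std::make_unique<Expr>();
        node->op = op;
        node->lhs = std::move(l);
        node->rhs = std::move(r);
        return node;
    }

    void skipSpaces() {
        while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_;
    }

    // Skips whitespace, then reports whether the next character is c (without consuming it).
    bool peekIs(char c) {
        skipSpaces();
        return i_ < s_.size() && s_[i_] == c;
    }

    std::string s_;
    std::size_t i_ = 0;
};

// Prints the tree in prefix form, e.g. "(+ 1 (* 2 3))".
std::string show(const Expr& e) {
    if (e.op == 0) return std::to_string(e.value);
    return std::string("(") + e.op + " " + show(*e.lhs) + " " + show(*e.rhs) + ")";
}
```

Here `show(*ExprParser("1 + 2 * 3").parse())` yields `(+ 1 (* 2 3))`, the same shape as the binOp tree for "1 + 2 * 3", while `(1 + 2) * 3` yields `(* (+ 1 2) 3)`.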

worthy citrus
#

okay okay

#

thank you

karmic flax
#

I recommend just parsing math expressions for starters

#

Since theyre crucial to any language

#

And easy to test

worthy citrus
karmic flax
#

Could you show the code? Is it on GitHub or something?

#

I was also stuck at parsing for a long time

#

Thought it was the hardest shit, but then IR generation came

worthy citrus
karmic flax
#

Lol

#

Dw abt it

worthy citrus
karmic flax
#

Work on making the tokenizer by urself

#

Big dislike of the unions and structures like that

#

We have polymorphism for a reason

worthy citrus
karmic flax
#

Yeah

#

Make it by urself then

#

Dont follow someone else's practices

worthy citrus
#

i was impressed by his c skills

karmic flax
#

And this is C code

#

Not C++

worthy citrus
#

and his didnt work... so i had to change like 40%

karmic flax
#

Idk why ur using c++ tbh

worthy citrus
#

i like classes

karmic flax
#

Then use them properly

#

You mix c and c++ style

worthy citrus
#

and strings in C are weird; in graphics programming, when i'm working with shaders, strings in cpp are much better

karmic flax
#

We have constructors for a reason

#

No need for Lexer *createLexer()

#

All your AST structs can derive from an AST_Node base structure

#

And just store a smart pointer (or a normal one) in the parser class

worthy citrus
karmic flax
#

Also, your token structure doesn't keep track of line number and character offset; tracking those would make errors much easier to spot
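A sketch of a lexer that records where each token starts, assuming 1-based line and column numbers. It also follows other suggestions from this thread: a constructor instead of a `createLexer()` function, number reading done by a member function rather than a free `parse_num`, whitespace skipped rather than stored, and the tokens returned in a std::vector. `Lexer`, `Token`, and `TokenKind` are hypothetical names, and the character classes are deliberately minimal.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class TokenKind { Number, Ident, Symbol };

// Each token remembers where it started, so errors can say "line 3, column 7".
struct Token {
    TokenKind kind;
    std::string text;
    int line;    // 1-based
    int column;  // 1-based
};

class Lexer {
public:
    // A constructor replaces a C-style createLexer() factory.
    explicit Lexer(std::string source) : src_(std::move(source)) {}

    std::vector<Token> tokenize() {
        std::vector<Token> out;
        while (pos_ < src_.size()) {
            char c = src_[pos_];
            if (std::isspace(static_cast<unsigned char>(c))) {  // whitespace is skipped, never stored
                advance();
                continue;
            }
            int line = line_, col = col_;  // position of the token's first character
            if (isDigit(c)) {
                out.push_back({TokenKind::Number, readWhile(isDigit), line, col});
            } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
                out.push_back({TokenKind::Ident, readWhile(isIdentChar), line, col});
            } else {  // any other character becomes a one-character symbol token
                out.push_back({TokenKind::Symbol, std::string(1, c), line, col});
                advance();
            }
        }
        return out;
    }

private:
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool isIdentChar(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_'; }

    // Consumes characters while pred holds and returns them as one lexeme.
    std::string readWhile(bool (*pred)(char)) {
        std::string text;
        while (pos_ < src_.size() && pred(src_[pos_])) {
            text += src_[pos_];
            advance();
        }
        return text;
    }

    // Moves one character forward and keeps line/column in sync.
    void advance() {
        if (src_[pos_] == '\n') { ++line_; col_ = 1; } else { ++col_; }
        ++pos_;
    }

    std::string src_;
    std::size_t pos_ = 0;
    int line_ = 1;
    int col_ = 1;
};
```

For the input `"x = 1\ny = 22;"` this sketch produces seven tokens, with `22` reported at line 2, column 5.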

karmic flax
worthy citrus
#

pointers everywhere, i finally experienced dangling pointers and errors and shit

karmic flax
#

Language dev is hard, dont make it harder for no reason

worthy citrus
worthy citrus
karmic flax
#

Use C++ properly with smart pointers

worthy citrus
#

can you make sense of what i'm tryna do in the parser.cpp

karmic flax
#

If you think of C++ as C with classes you might want to relearn cpp

worthy citrus
#

i've unironically learnt so much in the last 3 weeks tryna make a shitty language...

karmic flax
#

Well, TOKEN_UNION name doesnt make sense

#

Token array too

#

Since it takes an AST node, not tokens

#

And you're doing a lot of typedefing

#

Which is literally useless in C++

#

For structs

worthy citrus
#

idk why

karmic flax
#

Since

struct name{};

name object;
worthy citrus
worthy citrus
karmic flax
#

Yeah not that

#

I'm talking about using typedef at all

karmic flax
worthy citrus
#

no no by typedefing here i meant what you are saying

karmic flax
#

No need to do struct T var for creating variables
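In C, the typedef is what lets code drop the `struct` keyword, which is presumably what a `typedef struct LEXER_STRUCT { ... } LEXER_T;` pattern is for; in C++ the struct name is already a type name. A tiny illustration with hypothetical names:

```cpp
#include <cassert>
#include <string>

// C habit: the typedef lets C code write `LexerC l;` instead of `struct LexerC l;`.
// C++ accepts it, but the typedef adds nothing there.
typedef struct LexerC {
    int pos;
} LexerC;

// C++: the struct name is already a type name, so no typedef is needed.
struct Lexer {
    std::string source;
    int pos = 0;
};

Lexer makeLexer(const std::string& src) {
    Lexer lexer{src};  // no `struct` keyword; pos keeps its default of 0
    return lexer;
}
```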

worthy citrus
#

LEXER_STRUCT{}LEXER_T;

karmic flax
#

You have issues with understanding c++

worthy citrus
#

it's not just c with classes 😦

karmic flax
#

No lol

#

Fundamental differences

worthy citrus
#

okayy i'll change those habits

karmic flax
#

Do yourself a favor and learn cpp before continuing the project

worthy citrus
#

i mean i have been reading like articles from many programmers who get mad at cpp so i think that also has influenced me

karmic flax
#

After you learn it somewhat, you will probably opt for rewrite

worthy citrus
karmic flax
#

Yeah but like, you're not learning c++ now

#

You're forcing C standards into C++

worthy citrus
#

i get that...

#

can u understand what i'm trying to do tho

karmic flax
#

Like TOKEN_T* parse_num(LEXER_L* lexer): that can be a method, no reason to have it as a free function

karmic flax
#

But hard to read since you mix c and cpp

worthy citrus
worthy citrus
karmic flax
#

Grammar rules can be done during parsing

#
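// needs <stdexcept>; a bare `throw;` with no active exception would call std::terminate
Statement *ParseDecl(Tokenizer &tokenizer){
    auto typeTok = tokenizer.NextToken();
    if(typeTok.type != TOKTYPE_TYPE) throw std::runtime_error("expected a type");
    auto varName = tokenizer.NextToken();
    if(varName.type != TOKTYPE_IDENT) throw std::runtime_error("expected a variable name");
    ...

    return new VarDeclStatement(typeTok, varName,...);
}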
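A self-contained version of that idea, assuming a toy token set and a `Tokenizer` stand-in that just replays stored tokens. It throws a concrete exception type, because a bare `throw;` with no exception currently being handled calls std::terminate instead of reporting a parse error, and it returns a std::unique_ptr rather than a raw `new`. The grammar `TYPE IDENT ('=' NUMBER)? ';'` and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class TokType { Type, Ident, Equals, Number, Semicolon, End };

struct Token {
    TokType type;
    std::string text;
};

// Stand-in for the real tokenizer: replays a stored token list.
class Tokenizer {
public:
    explicit Tokenizer(std::vector<Token> toks) : toks_(std::move(toks)) {}

    Token NextToken() { return pos_ < toks_.size() ? toks_[pos_++] : Token{TokType::End, ""}; }

    const Token& Peek() const {
        static const Token end{TokType::End, ""};
        return pos_ < toks_.size() ? toks_[pos_] : end;
    }

private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

struct Statement {
    virtual ~Statement() = default;
};

struct VarDeclStatement : Statement {
    std::string type, name, init;  // init stays empty when there is no "= NUMBER"
};

// decl := TYPE IDENT ('=' NUMBER)? ';'
// Each grammar rule is checked while the node is being built; a mismatch throws.
std::unique_ptr<Statement> ParseDecl(Tokenizer& tokenizer) {
    auto expect = [&](TokType want, const char* what) {
        Token t = tokenizer.NextToken();
        if (t.type != want) throw std::runtime_error(std::string("expected ") + what);
        return t;
    };

    auto decl = std::make_unique<VarDeclStatement>();
    decl->type = expect(TokType::Type, "a type").text;
    decl->name = expect(TokType::Ident, "a variable name").text;
    if (tokenizer.Peek().type == TokType::Equals) {
        tokenizer.NextToken();  // consume '='
        decl->init = expect(TokType::Number, "a number").text;
    }
    expect(TokType::Semicolon, "';'");
    return decl;
}
```

The caller catches `std::runtime_error` once, at the top of the parse, instead of checking an error code after every call.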
worthy citrus
#

so i make an array where the stream of tokens is stored, then i make another array that basically uses the grammar rules to generate that stream of tokens... (i named them asts because i knew i was gonna make them into an ast, but didn't know when)
So I check the grammar like that, 'cause in class we only glanced over it, since we were learning automatic parsers

#

i'm tryna do something like that, if u think this is very complicated/stupid please do tell me

karmic flax
#

In lexer, just have a std::vector of tokens

worthy citrus
#

isn't std::vector just an array whose size we can change dynamically?

karmic flax
#

Yes

#

But learn c++ properly

worthy citrus
karmic flax
#

It creates an instance of struct

karmic flax
#

You don't add whitespace to your token list

worthy citrus
worthy citrus
karmic flax
#

Yes

#

It needs to be added to tree

#

By the called function

worthy citrus
# karmic flax It needs to be added to tree

how did u implement the tree? does it just have pointers to the left, right, and root nodes? or did u implement it in an array where the node positions are implied, like left node = 2i, right node = 2i+1?
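The `2i` / `2i+1` array layout suits complete binary trees such as binary heaps, but AST nodes have varying numbers of children (a call with any number of arguments, an `if` with or without `else`), so ASTs are usually built from nodes that own pointers to their children. The sketch below assumes one generic node type with a child vector; the parse function that builds a child returns it, and the calling function attaches it to its parent. `Node`, `add`, and `countNodes` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical n-ary AST node: each node owns any number of children.
struct Node {
    std::string kind;  // e.g. "Block", "Call", "Ident"
    std::string text;  // lexeme for leaves, empty otherwise
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string k, std::string t = "") : kind(std::move(k)), text(std::move(t)) {}

    // The caller attaches whatever a parse function returned; returns a non-owning pointer to it.
    Node* add(std::unique_ptr<Node> child) {
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Counts nodes with a recursive pre-order walk (this node, then each subtree).
int countNodes(const Node& n) {
    int total = 1;
    for (const auto& c : n.children) total += countNodes(*c);
    return total;
}
```

Ownership flows downward through the `unique_ptr`s, so destroying the root frees the whole tree and there are no dangling child pointers to manage by hand.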

worthy citrus
#

i was looking at ur other compiler project

#

lol

karmic flax
#

Lol

spice chasmBOT
#

@worthy citrus Has your question been resolved? If so, type !solved :)

worthy citrus
#

okayy i think my brain is lagging... i'll think more about this later, but for now
i guess i gotta change my whole data structure? or make a new data structure for the tree?
Like now i'm thinking either I just do basic syntax analysis using the shit i've created (like checking if there's a ; and if there's a closing parenthesis after an open one) and incorporate it into the whole parser by making a tree and doing the actual grammar analysis in that?

or stick with this shit and do grammatical analysis using this sludge and make a tree later.... i need fresh air lmao

karmic flax
#

A rewrite is probably for the best