I'm making an interpreter. After the tokenizer tokenizes the code, it stores the tokens in an array. Now I want another 'data structure' that uses the grammar to produce stuff. I think this figure shows what I want to achieve. I also want to implement a form of backtracking, which I can probably do by making a temp version of that data structure and copying stuff? But how do I make this type of data structure?
#Data Structure for parsing
if i am looking at parsing wrong, please do tell me about it, i am just trying to apply what i learned in theory about recursive descent parser..
Honestly, backtracking seems a bit much, maybe just have a lookahead
The data structure you want would probably be AST
Convert the tokens into a tree
Then for interpreter you can just walk thru the AST
I have a simple C++ like interpreter on my github if you want to check some stuff, its not best code practices probably but it works
thats what I did now, i have a peek function that will look at what's ahead and decide what production rule
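Rough sketch of that peek idea, assuming the tokens already sit in a std::vector. `TokenKind`, `Token`, `TokenStream` and `ClassifyStatement` are names made up for this example, not taken from either project. It also shows that backtracking, if it is ever needed, can be a saved index instead of a copy of the whole structure.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds, just enough for the example.
enum class TokenKind { Type, Ident, Number, Assign, Semicolon, End };

struct Token {
    TokenKind kind;
    std::string text;
};

// Cursor over an already tokenized vector. Assumes the vector is non-empty
// and that its last element has kind End.
class TokenStream {
public:
    explicit TokenStream(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}

    // Look at the token `offset` places ahead without consuming anything.
    const Token &Peek(std::size_t offset = 0) const {
        std::size_t i = pos_ + offset;
        return i < tokens_.size() ? tokens_[i] : tokens_.back();
    }

    // Consume and return the current token; stays on End once it is reached.
    const Token &Next() {
        const Token &tok = Peek();
        if (pos_ + 1 < tokens_.size()) ++pos_;
        return tok;
    }

    // Backtracking is just remembering and restoring an index.
    std::size_t Mark() const { return pos_; }
    void Reset(std::size_t mark) { pos_ = mark; }

private:
    std::vector<Token> tokens_;
    std::size_t pos_ = 0;
};

// Pick a production rule from the lookahead alone, without consuming tokens.
std::string ClassifyStatement(const TokenStream &ts) {
    if (ts.Peek().kind == TokenKind::Type) return "declaration";
    if (ts.Peek().kind == TokenKind::Ident && ts.Peek(1).kind == TokenKind::Assign)
        return "assignment";
    return "expression";
}
```

With one or two tokens of lookahead most statement forms can be told apart up front, so `Mark`/`Reset` usually ends up unused.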
after parsing?
okayyy i will
Parsing is the conversion of tokens into AST
i thought parsing was just semantic analysis
You can incorporate it
how hard is ir generation and llvm stuff after making an ast?
don't worry i'm not gonna implement that 🤣 I am dying tryna parse and build an ast lmao
I mean if you want code to run
You'll need to know variable sizes and so on
Sure
i'll only do ints 💀
Idk how dynamic typing system works tho
Probably easier
Sure
so like... you get tokens for the parser right? How do you build an ast?
you check grammar and then build an ast?
or do you check grammar after u build an ast
I see what is the statement I'm parsing
If for example, after variable name comes something thats not a semicolon or =, i return null
(i actually dont return null but set an error code)
No
you tokenize and you store it?
Yeah
and then check after?
Wdym
check if there's ; or = after a variable
I do a semantic check during parsing
through grammar?
For example
if(tokenizer->GetToken().type != SEMICOLON){
    errorCode = MISSING_SEMICOLON;
    return;
}
I dont have something like a grammar file, just hardcoding checks with ifs
Ik what grammar is in my head so i just make sure the tokens conform to it
ye ye me too... so like i sound like an idiot right now but when is the ast generated?
Parsing
after doing semantical analysis?
Lexing -> converts a string into tokens
parsing -> converts tokens into AST
like you analyse if there's an = or ; after a variable, then if it passes you make an ast?
Well, i do it during parsing
i get that but i don't get it fully
You can do it before
cause parsing is such a vague thing it seems
some people say that you check grammar after making a tree, by recursively iterating through it, so it makes me more confused
For example
"1 + 2 * 3"
becomes
binOp:
    lhs: 1
    op: +
    rhs: binOp:
        lhs: 2
        op: *
        rhs: 3
ir:
res = 2 * 3
res += 1
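To make the tree-to-IR step concrete, here is a small sketch that walks a tree shaped like the one for "1 + 2 * 3" and prints one line per operation. The `Expr` struct, the `Num`/`Bin` helpers and the `t0`, `t1` temporary names are all assumptions for illustration; real IR (LLVM's, for instance) carries a lot more information.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node: either a number or a binary operation.
struct Expr {
    bool isNumber = false;
    int value = 0;                   // used when isNumber is true
    char op = 0;                     // used when isNumber is false
    std::unique_ptr<Expr> lhs, rhs;  // used when isNumber is false
};

std::unique_ptr<Expr> Num(int v) {
    auto e = std::make_unique<Expr>();
    e->isNumber = true;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> Bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Post-order walk: emit code for both operands first, then for the operator.
// Returns the name (or literal) that holds the value of `e`.
std::string Emit(const Expr &e, std::vector<std::string> &lines) {
    if (e.isNumber) return std::to_string(e.value);
    std::string a = Emit(*e.lhs, lines);
    std::string b = Emit(*e.rhs, lines);
    // Each operation pushes exactly one line, so the line count gives a fresh name.
    std::string temp = "t" + std::to_string(lines.size());
    lines.push_back(temp + " = " + a + " " + e.op + " " + b);
    return temp;
}
```

Calling `Emit` on `Bin('+', Num(1), Bin('*', Num(2), Num(3)))` produces `t0 = 2 * 3` followed by `t1 = 1 + t0`: the multiplication comes out first because it sits deeper in the tree.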
ahh
I recommend just parsing math expressions for starters
Since theyre crucial to any language
And easy to test
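A minimal recursive descent sketch for math expressions, to show that the grammar check and the tree building happen in the same pass: each function matches one grammar rule, throws if the input does not fit, and returns the node it built. Assumptions: integers only, and the lexer is folded into `Peek` to keep it short (with a token array, `Peek` would read from the array instead). All names here are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Base class for all expression nodes.
struct Expr {
    virtual ~Expr() = default;
    virtual int Eval() const = 0;
};

struct NumberExpr : Expr {
    int value;
    explicit NumberExpr(int v) : value(v) {}
    int Eval() const override { return value; }
};

struct BinaryExpr : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;
    BinaryExpr(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    int Eval() const override {
        int a = lhs->Eval(), b = rhs->Eval();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  // '/'
                if (b == 0) throw std::runtime_error("division by zero");
                return a / b;
        }
    }
};

// Grammar:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class ExprParser {
public:
    explicit ExprParser(std::string src) : src_(std::move(src)) {}

    std::unique_ptr<Expr> Parse() {
        auto e = ParseExpr();
        if (Peek() != '\0') throw std::runtime_error("unexpected trailing input");
        return e;
    }

private:
    // Skip whitespace, then return the next character ('\0' at end of input).
    char Peek() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }
    std::unique_ptr<Expr> ParseExpr() {
        auto lhs = ParseTerm();
        while (Peek() == '+' || Peek() == '-') {
            char op = src_[pos_++];
            auto rhs = ParseTerm();
            lhs = std::make_unique<BinaryExpr>(op, std::move(lhs), std::move(rhs));
        }
        return lhs;
    }
    std::unique_ptr<Expr> ParseTerm() {
        auto lhs = ParseFactor();
        while (Peek() == '*' || Peek() == '/') {
            char op = src_[pos_++];
            auto rhs = ParseFactor();
            lhs = std::make_unique<BinaryExpr>(op, std::move(lhs), std::move(rhs));
        }
        return lhs;
    }
    std::unique_ptr<Expr> ParseFactor() {
        char c = Peek();
        if (c == '(') {
            ++pos_;
            auto inner = ParseExpr();
            if (Peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        if (!std::isdigit(static_cast<unsigned char>(c))) throw std::runtime_error("expected a number");
        int value = 0;  // no overflow check in this sketch
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            value = value * 10 + (src_[pos_++] - '0');
        return std::make_unique<NumberExpr>(value);
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

`ExprParser("1 + 2 * 3").Parse()->Eval()` gives 7, because `ParseTerm` grabs `2 * 3` as one subtree before `ParseExpr` applies the `+`. Precedence falls out of which rule calls which.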
yea thats what i'm tryna doooo... i'm just so confused ahaha
Could you show the code so i could see if its on github or something
I was also stuck at parsing for a long time
Thought its the hardest shit but then IR generation came
its so bad you'll die... i can show wait
the tokenizer, i followed a codethrough from a youtuber... then i thought "okay parser is the hardest one, i'll learn something" and tried to do it myself...
If you cant make a tokenizer, parser is much harder
Work on writing the tokenizer urself first
Big dislike of the unions and structures like that
We have polymorphism for a reason
tokenizer is easy though...
i was impressed by his c skills
and his didnt work... so i had to change like 40%
Idk why ur using c++ tbh
i like classes
and string in c is weird, in graphics programming when i'm working with shaders, string in cpp is much better
We have constructors for a reason
No need for Lexer *createLexer()
All your AST structs can derive from an AST_Node base structure
And just store a smart pointer or normal one in the parser class
i wanna go barebones in memory with c so i can learn stuff yk
Also your token structure doesnt keep track of line number and character offset, that'll make errors much easier to spot
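Sketch of a token that remembers where it came from, plus a tiny tokenizer that fills those fields in. Everything here (`TokKind`, the `line`/`column` fields, `Tokenize`) is a hypothetical layout, not the structure from the code being reviewed. Whitespace is skipped and never stored.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokKind { Number, Ident, Symbol, End };

struct Token {
    TokKind kind;
    std::string text;
    int line;    // 1-based line where the token starts
    int column;  // 1-based column where the token starts
};

std::vector<Token> Tokenize(const std::string &src) {
    std::vector<Token> out;
    int line = 1, column = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\n') { ++line; column = 1; ++i; continue; }
        if (std::isspace(c)) { ++column; ++i; continue; }

        std::size_t start = i;
        TokKind kind = TokKind::Symbol;
        if (std::isdigit(c)) {
            kind = TokKind::Number;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
        } else if (std::isalpha(c) || c == '_') {
            kind = TokKind::Ident;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
        } else {
            ++i;  // any other character becomes a one-character symbol
        }
        out.push_back({kind, src.substr(start, i - start), line, column});
        column += static_cast<int>(i - start);
    }
    out.push_back({TokKind::End, "", line, column});  // sentinel so the parser never runs off the end
    return out;
}
```

With the position stored on every token, a parser error can say something like "line 2, column 7: missing semicolon" instead of just failing.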
Then use C
You'll learn bad manners for C++
pointers everywhere, i finally experienced dangling pointers and errors and shit
Language dev is hard, dont make it harder for no reason
i like classes too though
that is true
Use C++ properly with smart pointers
can you make sense of what i'm tryna do in the parser.cpp
If you think of C++ as C with classes you might want to relearn cpp
i've unironically learnt so much in the last 3 weeks tryna make a shitty language...
Well, TOKEN_UNION name doesnt make sense
Token array too
Since it takes an AST node, not tokens
And you're doing alot of typedefing
Which is literally useless in C++
For structs
i was working in neovim in my laptop... and if i didn't typedef, the gcc compiler went crazy
idk why
Since
struct name{};
name object;
Cuz you set it to be C not C++
it's a cpp file though
This is valid in C++
no no by typedefing here i meant what you are saying
No need to do struct T var for creating variables
LEXER_STRUCT{}LEXER_T;
Yeah the part after could be invalid in C++
Man, https://learncpp.com honestly
You have issues with understanding c++
its not just c with classes 😦
okayy i'll change those habits
Do yourself a favor and learn cpp before continuing the project
i mean i have been reading like articles from many programmers who get mad at cpp so i think that also has influenced me
After you learn it somewhat, you will probably opt for rewrite
i can't learn without going dirty with it
Like TOKEN_T* parse_num(LEXER_L* lexer) that can be a method, no reason to have it as a function
Yeah
But hard to read since you mix c and cpp
yeah doing this project i realised why classes are important and stuff cause struct pointing functions are just methods
ahaha im sorry
Grammar rules can be done during parsing
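Statement *ParseDecl(Tokenizer &tokenizer){
    auto typeTok = tokenizer.NextToken();
    // a bare `throw;` with no active exception calls std::terminate,
    // so throw a real exception instead (needs <stdexcept>)
    if(typeTok.type != TOKTYPE_TYPE) throw std::runtime_error("expected a type");
    auto varName = tokenizer.NextToken();
    if(varName.type != TOKTYPE_IDENT) throw std::runtime_error("expected a variable name");
    ...
    return new VarDeclStatement(typeTok, varName,...);
}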
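A fuller sketch of the same idea, filled in so it compiles on its own. The token layout, the `Parser` class, the `Expect` helper and the optional `= NUMBER` initializer are assumptions added for illustration; the snippet above does not say what goes in its elided part. It returns a `std::unique_ptr` so nobody has to remember to `delete` the node.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class TokType { Type, Ident, Assign, Number, Semicolon, End };

struct Token {
    TokType type;
    std::string text;
};

// Every statement node derives from one base, so the parser can hand back
// any kind of statement through the same pointer type.
struct Statement {
    virtual ~Statement() = default;
};

struct VarDeclStatement : Statement {
    std::string typeName;
    std::string varName;
    bool hasInit = false;
    int initValue = 0;
};

class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}

    // decl := TYPE IDENT ('=' NUMBER)? ';'
    std::unique_ptr<Statement> ParseDecl() {
        auto node = std::make_unique<VarDeclStatement>();
        node->typeName = Expect(TokType::Type, "expected a type name").text;
        node->varName = Expect(TokType::Ident, "expected a variable name").text;
        if (Peek().type == TokType::Assign) {
            Next();
            node->initValue = std::stoi(Expect(TokType::Number, "expected a number").text);
            node->hasInit = true;
        }
        Expect(TokType::Semicolon, "missing semicolon");
        return node;
    }

private:
    const Token &Peek() const { return tokens_[pos_]; }
    const Token &Next() {
        const Token &tok = tokens_[pos_];
        if (pos_ + 1 < tokens_.size()) ++pos_;  // never step past the End token
        return tok;
    }
    // The grammar check: consume the token if it has the wanted type, else fail.
    const Token &Expect(TokType type, const char *message) {
        if (Peek().type != type) throw std::runtime_error(message);
        return Next();
    }

    std::vector<Token> tokens_;  // assumed non-empty and terminated by End
    std::size_t pos_ = 0;
};
```

The node is only returned if every `Expect` passed, so checking the grammar and building the AST node are one step, not two separate phases.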
so i make an array where the stream of token is stored, then i make another array that basically uses the grammar rules to generate that stream of token... (i named them asts because i knew i was gonna make them into an ast, but dont know when)
So I check the grammar like that, cause in class we glanced over it to learn automatic parsers
i'm tryna do something like that, if u think this is very complicated/stupid please do tell me
In lexer, just have a std::vector of tokens
isn't std vector just arrays but that we can dynamically change the size of?
hhmm so basically we checked to see if the type of token after the variable token is "static type" or "ident" and if it's not, we call a function? In this example what does VarDeclStatement do? Does it create a struct/class?
Huh
It creates an instance of struct
Whitespace is skipped
You dont add whitespace to your token list
yeah it's not there
now does that struct act as a node of the tree?
how did u implement a tree? does it just have pointers to the left right and root nodes? or did u implement in an array where it's implied where the nodes are through left node = 2i, right node = 2i+1, type of way?
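One common answer, sketched under assumptions: each node owns its children through pointers. The implicit 2i / 2i+1 array layout only pays off for (nearly) complete binary trees such as heaps, and AST nodes have anywhere from zero to many children. `Node`, `Add`, `ToString` and `BuildExample` are made-up names for this example.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One node type with any number of owned children.
struct Node {
    std::string label;
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string l) : label(std::move(l)) {}

    // Append a child and return a raw, non-owning pointer to it for further building.
    Node *Add(std::string childLabel) {
        children.push_back(std::make_unique<Node>(std::move(childLabel)));
        return children.back().get();
    }
};

// Render the tree in prefix form by visiting the node first, then its children.
std::string ToString(const Node &n) {
    if (n.children.empty()) return n.label;
    std::string out = "(" + n.label;
    for (const auto &child : n.children) out += " " + ToString(*child);
    return out + ")";
}

// Hand-built tree for 1 + 2 * 3; a parser would produce the same shape.
std::unique_ptr<Node> BuildExample() {
    auto root = std::make_unique<Node>("+");
    root->Add("1");
    Node *mul = root->Add("*");
    mul->Add("2");
    mul->Add("3");
    return root;
}
```

`ToString(*BuildExample())` returns `(+ 1 (* 2 3))`. No parent or root pointer is needed for walking: recursion already remembers the way back up.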
Lol
okayy i think my brain is lagging... i'll think more about this later but now
i guess i gotta change my whole datastructure? or make a new datastructure for the tree ?
Like now i'm thinking either I just do basic syntax analysis using the shit i've created (like if there's a ; and if there's a closed parenthesis after an open one) and incorporate it into the whole parser by making a tree and do an actual grammar analysis in that?
or stick with this shit and do grammatical analysis using this sludge and make a tree later.... i need fresh air lmao
Rewrite probably for the best