Could anyone who has already worked with Markov chains help me improve this code or tell me whether it is well written? I've started studying this topic and wrote this algorithm. I'd appreciate any help, thanks.
Quick Explanation:
I'm splitting a file into ngrams. As you might know, an ngram is a contiguous substring of n characters taken from a text.
Example:
- A bi-gram for "I ate pizza" would be
["I ", " a", "at", "te", "e ", " p", "pi", "iz", "zz", "za"]
- A tri-gram for "I ate pizza" would be
["I a", " at", "ate", "te ", "e p", " pi", "piz", "izz", "zza"]
And so on...
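A minimal sketch of that sliding-window split, assuming character-level ngrams as in the examples above. The names `NgramSketch` and `ngrams` are hypothetical and are not the API from the post:

```java
import java.util.ArrayList;
import java.util.List;

class NgramSketch {
    // Returns every contiguous substring of length n, in order of appearance.
    // A text shorter than n yields an empty list.
    static List<String> ngrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            result.add(text.substring(i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("I ate pizza", 2)); // bi-grams
        System.out.println(ngrams("I ate pizza", 3)); // tri-grams
    }
}
```

A text of length L produces L - n + 1 ngrams, which matches the 10 bi-grams and 9 tri-grams listed for the 11-character example.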
I'm trying to keep it abstract so that the user of my API can choose what they want: calling getNgramList() returns a list of ngrams, and calling mapNgrams() returns a hashmap from each ngram to the ngrams that most commonly follow it.
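One way such a follower map could look, as a rough sketch rather than the code from the post. It assumes that the "next ngram" is the one starting one character later (the next element of the sliding-window list) and that counts are kept per follower. The class `NgramMapSketch` and the signatures are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NgramMapSketch {
    // Sliding-window split into character ngrams of length n.
    static List<String> ngrams(String text, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            result.add(text.substring(i, i + n));
        }
        return result;
    }

    // ASSUMPTION: the successor of an ngram is the next element of the list,
    // so consecutive ngrams overlap by n - 1 characters.
    // Maps each ngram to a count of how often each successor follows it.
    static Map<String, Map<String, Integer>> mapNgrams(String text, int n) {
        List<String> grams = ngrams(text, n);
        Map<String, Map<String, Integer>> followers = new HashMap<>();
        for (int i = 0; i + 1 < grams.size(); i++) {
            followers
                .computeIfAbsent(grams.get(i), k -> new HashMap<>())
                .merge(grams.get(i + 1), 1, Integer::sum);
        }
        return followers;
    }

    public static void main(String[] args) {
        System.out.println(mapNgrams("banana", 2));
    }
}
```

Keeping counts instead of a plain list of followers makes it possible to sample the next ngram in proportion to how often it occurred, which is what a Markov chain text generator usually needs.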
I'm attaching the output of a test that I ran on Shrek's script.
I'll also send the code as images because I think Discord's formatting gets confusing with longer code:
, so not the worst decision.