#Markov Chain Algorithm - Code Review

44 messages · Page 1 of 1 (latest)

desert thorn
#

Could anyone that have already worked with Markov Chains help me improve this code // tell me if it is well written? I started to study this topic and made this algorithm. If someone could help I'd appreciate, thanks.

Quick Explanation:
I'm splitting a file into ngrams. As you might know,

a ngram is a substring of a part of a text.

Example:

  1. An Bi-gram for
"I ate pizza"

would be

["I ", " a", "at", "te", "e ", " p", "pi", "iz", "zz", "za"]
  1. An Tri-gram for
"I ate pizza"

would be

["I a", " at", "ate", "te ", "e p", " pi", "piz", "izz", "zza"]

And so on...

I'm trying to make it abstract so the user of my API would chose what he wants, and then receive a list of ngrams if he calls getNgramList() or receive a hashmap of ngrams and their respective common next ngrams if they call the method mapNgrams()

I'm attaching the output if a test that I've done using Shrek's script.
I'll also send the code as images because I think discord formatting is confusing with bigger codes:

open geyserBOT
#

This post has been reserved for your question.

Hey @desert thorn! Please use /close or the Close Post button above when you're finished. Please remember to follow the help guidelines. This post will be automatically closed after 300 minutes of inactivity.

TIP: Narrow down your issue to simple and precise questions to maximize the chance that others will reply in here.

lavish owl
#

i fail to see why that amount of code is needed in the first place. building an array of strings from a string like you described is rather trivial

#

i'd just solve it with a sliding window over 0 until n-s, where n is the length of the string and s is the size of the window

marsh iron
#

I mean if the code works, that is what matters the most. My only comment would in the mapNgrams function.

for (int index = 0; index <ngramsInput.size() - n; index++) // Index here is personal prefernce, I'm honestly not sure if i++ / ++i are different in Java, but i++ is what I'm used to.
  String key  = ngamsInput.get(index);
  ngram.ToNgrams.computeIfAbsent(key, () -> new ArrayList<String>()); // (__) isn't used in Java. Empty Suppliers should just be ()
  int nextNgramIndex = index + n;

  List<String> ngramList = ngramToNgrams.get(key);
  if(!ngramList.contains(value)) { // Better to invert your if and remove the continue, since that means the loop is easier to read.
     ngramList.add(value);
  }
}
return ngramToNgrams;
lavish owl
marsh iron
#

I mean ArrayList is going to have similar performance with more functions shrugging , so not the worst decision.

#

And I don't believe you can do Map<String, String[]> You can do that

desert thorn
# lavish owl i fail to see why that amount of code is needed in the first place. building an ...

As I'm working with files like csv's, txt's etc -> There is some thrash characters that I need to remove. Like "!", "?", "-" that sometimes are not necessary. That may add more complexity to my code. The real real loop
(

for (int i = 0; i < content.length() - n; ++i) {
    result
            .add(
                    content.substring(i, i + n)
            );
}

)
Isn't that long. I'm perfoming the same operation that you did. In my case, I'm doing this for my project too, and I know that I will deal with files.

But I understand your point of view and your code too

desert thorn
marsh iron
#
for (int i = 0; i < content.length() - n; ++i) {
    result.add(
                    content.substring(i, i + n)
            );
}

We'd normally also format like this btw. Easier to read when your Object and function are on the same line, but putting the logic on the next line is good.

desert thorn
marsh iron
#

What IDE?

desert thorn
#

JetBrains

marsh iron
#

OHH it's not supplier, it's a function

lavish owl
#

it is, takes in a key to compute the value for

desert thorn
#

Yeah, it's a mapping function

marsh iron
#

Yeah I'd put in Key there, even though it isn't used.

desert thorn
lavish owl
#

it just puts the key in "__"

#

best to just call it "key" and not use it

marsh iron
#

It helps show that the key isn't directly related to the value outside of the map. Which is nice in some cases.

desert thorn
lavish owl
desert thorn
#

but key var is already defined, I'd need to choose a dif name

marsh iron
#

Is that what the IDE tells you

lavish owl
#

no

marsh iron
#

I'm asking him

lavish owl
#

but i know it didnt when he removed the arg

desert thorn
#

There is a key variable that I've declared just to make the code more readable. I'd need to remove that

marsh iron
#

Ah, I see their example in the java doc. Try this.
ngramToNgrams.computeIfAbsent(key, k -> new ArrayList<>());

desert thorn
#

If i wanted the lambda to contain this

#

okay

marsh iron
#

Perfect

#

So yeah, your initial code wasn't bad (it did what it was supposed to do), but a lot of development/feedback is about improving the readability of the code. Your code wasn't super readable (and honestly still isn't, but algorithm implementation is usually a bit harder to read), but this is better.

desert thorn
#

Okay, I'll improve it

#

Thank you for the help @ Crain @ 0x150

open geyserBOT
# desert thorn Thank you for the help @ Crain @ 0x150

If you are finished with your post, please close it.
If you are not, please ignore this message.
Note that you will not be able to send further messages here after this post have been closed but you will be able to create new posts.

desert thorn
#

The main thing abt this is that I'm going to use this for a random sentence generator and use the HashMap in my favor