Markov Chain Algorithm - Code Review | Java Community | Help. Code. Learn. | Page 1

desert thorn Jun 17, 2023, 3:04 PM

#

Could anyone that have already worked with Markov Chains help me improve this code // tell me if it is well written? I started to study this topic and made this algorithm. If someone could help I'd appreciate, thanks.

Quick Explanation:
I'm splitting a file into ngrams. As you might know,

a ngram is a substring of a part of a text.

Example:

An Bi-gram for

"I ate pizza"

would be

["I ", " a", "at", "te", "e ", " p", "pi", "iz", "zz", "za"]

An Tri-gram for

"I ate pizza"

would be

["I a", " at", "ate", "te ", "e p", " pi", "piz", "izz", "zza"]

And so on...

I'm trying to make it abstract so the user of my API would chose what he wants, and then receive a list of ngrams if he calls getNgramList() or receive a hashmap of ngrams and their respective common next ngrams if they call the method mapNgrams()

I'm attaching the output if a test that I've done using Shrek's script.
I'll also send the code as images because I think discord formatting is confusing with bigger codes:

📎 output.txt

open geyserBOT Jun 17, 2023, 3:04 PM

#

⌛ This post has been reserved for your question.

Hey @desert thorn! Please use /close or the Close Post button above when you're finished. Please remember to follow the help guidelines. This post will be automatically closed after 300 minutes of inactivity.

TIP: Narrow down your issue to simple and precise questions to maximize the chance that others will reply in here.

lavish owl Jun 17, 2023, 3:10 PM

#

i fail to see why that amount of code is needed in the first place. building an array of strings from a string like you described is rather trivial

#

i'd just solve it with a sliding window over 0 until n-s, where n is the length of the string and s is the size of the window

#

marsh iron Jun 17, 2023, 3:15 PM

#

I mean if the code works, that is what matters the most. My only comment would in the mapNgrams function.

for (int index = 0; index <ngramsInput.size() - n; index++) // Index here is personal prefernce, I'm honestly not sure if i++ / ++i are different in Java, but i++ is what I'm used to.
  String key  = ngamsInput.get(index);
  ngram.ToNgrams.computeIfAbsent(key, () -> new ArrayList<String>()); // (__) isn't used in Java. Empty Suppliers should just be ()
  int nextNgramIndex = index + n;

  List<String> ngramList = ngramToNgrams.get(key);
  if(!ngramList.contains(value)) { // Better to invert your if and remove the continue, since that means the loop is easier to read.
     ngramList.add(value);
  }
}
return ngramToNgrams;

lavish owl Jun 17, 2023, 3:16 PM

#

lavish owl i'd just solve it with a sliding window over 0 until n-s, where n is the length ...

looking at your code a bit more, you did exactly that. but why read the file in that way, and why use a list if an array is entirely within reason..?

marsh iron Jun 17, 2023, 3:17 PM

#

I mean ArrayList is going to have similar performance with more functions shrugging , so not the worst decision.

#

~~And I don't believe you can do Map<String, String[]>~~ You can do that

desert thorn Jun 17, 2023, 3:20 PM

#

lavish owl i fail to see why that amount of code is needed in the first place. building an ...

As I'm working with files like csv's, txt's etc -> There is some thrash characters that I need to remove. Like "!", "?", "-" that sometimes are not necessary. That may add more complexity to my code. The real real loop
(

for (int i = 0; i < content.length() - n; ++i) {
    result
            .add(
                    content.substring(i, i + n)
            );
}

)
Isn't that long. I'm perfoming the same operation that you did. In my case, I'm doing this for my project too, and I know that I will deal with files.

But I understand your point of view and your code too

desert thorn Jun 17, 2023, 3:21 PM

#

lavish owl looking at your code a bit more, you did exactly that. but why read the file in ...

yeah, you got me in the list. I'm just practicing DS and thought it was a good idea to use a list

marsh iron Jun 17, 2023, 3:21 PM

#

for (int i = 0; i < content.length() - n; ++i) {
    result.add(
                    content.substring(i, i + n)
            );
}

We'd normally also format like this btw. Easier to read when your Object and function are on the same line, but putting the logic on the next line is good.

desert thorn Jun 17, 2023, 3:21 PM

#

marsh iron I mean if the code works, that is what matters the most. My only comment would i...

I tried to remove the (__) but the IDE complained

marsh iron Jun 17, 2023, 3:22 PM

#

What IDE?

desert thorn Jun 17, 2023, 3:22 PM

#

JetBrains

marsh iron Jun 17, 2023, 3:22 PM

#

OHH it's not supplier, it's a function

lavish owl Jun 17, 2023, 3:22 PM

#

it is, takes in a key to compute the value for

desert thorn Jun 17, 2023, 3:22 PM

#

Yeah, it's a mapping function

marsh iron Jun 17, 2023, 3:23 PM

#

Yeah I'd put in Key there, even though it isn't used.

desert thorn Jun 17, 2023, 3:23 PM

#

marsh iron I mean if the code works, that is what matters the most. My only comment would i...

if(!ngramList.contains(value)) { // Better to invert your if and remove the continue, since that means the loop is easier to read.
     ngramList.add(value);
  }

Got it

lavish owl Jun 17, 2023, 3:23 PM

#

desert thorn JetBrains

to make sure you know that __ doesn't do anything special: __ doesn't do anything special

#

it just puts the key in "__"

#

best to just call it "key" and not use it

marsh iron Jun 17, 2023, 3:24 PM

#

It helps show that the key isn't directly related to the value outside of the map. Which is nice in some cases.

desert thorn Jun 17, 2023, 3:24 PM

#

lavish owl best to just call it "key" and not use it

Wdym?

lavish owl Jun 17, 2023, 3:24 PM

#

desert thorn Wdym?

(key) -> new ArrayList<>()

desert thorn Jun 17, 2023, 3:25 PM

#

but key var is already defined, I'd need to choose a dif name

marsh iron Jun 17, 2023, 3:25 PM

#

Is that what the IDE tells you

lavish owl Jun 17, 2023, 3:25 PM

#

no

marsh iron Jun 17, 2023, 3:25 PM

#

I'm asking him

lavish owl Jun 17, 2023, 3:25 PM

#

but i know it didnt when he removed the arg

desert thorn Jun 17, 2023, 3:25 PM

#

#

There is a key variable that I've declared just to make the code more readable. I'd need to remove that

marsh iron Jun 17, 2023, 3:27 PM

#

Ah, I see their example in the java doc. Try this.
ngramToNgrams.computeIfAbsent(key, k -> new ArrayList<>());

desert thorn Jun 17, 2023, 3:27 PM

#

If i wanted the lambda to contain this

#

okay

desert thorn Jun 17, 2023, 3:27 PM

#

marsh iron Ah, I see their example in the java doc. Try this. `ngramToNgrams.computeIfAbsen...

Worked 😉

marsh iron Jun 17, 2023, 3:28 PM

#

Perfect

#

So yeah, your initial code wasn't bad (it did what it was supposed to do), but a lot of development/feedback is about improving the readability of the code. Your code wasn't super readable (and honestly still isn't, but algorithm implementation is usually a bit harder to read), but this is better.

desert thorn Jun 17, 2023, 3:30 PM

#

Okay, I'll improve it

#

Thank you for the help @ Crain @ 0x150

open geyserBOT Jun 17, 2023, 3:30 PM

#

desert thorn Thank you for the help @ Crain @ 0x150

If you are finished with your post, please close it.
If you are not, please ignore this message.
Note that you will not be able to send further messages here after this post have been closed but you will be able to create new posts.

desert thorn Jun 17, 2023, 3:33 PM

#

The main thing abt this is that I'm going to use this for a random sentence generator and use the HashMap in my favor

#Markov Chain Algorithm - Code Review