#Program not counting word occurrence correctly, just incrementing, why?


daring bladeBOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question run !howto ask.

river bay
#

Program not counting word occurrence correctly, just incrementing, why?

twin drum
#

!f

charred kilnBOT
#

@river bay's code (missing deletion permissions), requested by @twin drum:
Hi, I am trying to count the occurrence of words in a string. It should take in a string, count the word, and say how many times each word is input. My string is delimited by the whitespace and then added to a vector.

my approach was to start the vector at 0 outside of a for loop. then, i would loop through the size of the vector. then, if vector[0] equals vector[i], increment a counter. then at the end of the loop vector[0] would increment to vector[1]. I thought this would work, but it just increments every time from 1 to the end (1 to 10 for example). can someone look at my code and help me understand why it is just incrementing, and not adding the numbers correctly? i'll post it all below

```cpp
int count = 0;
int inc_value = 0;

for (string word; ps >> word;)
    vs.push_back(word);
vs[inc_value];

while (inc_value < vs.size()) {
    for (int i = 0; i < vs.size(); ++i) {
        if (vs[inc_value] == vs[i]) {
            count++;
        }

        else {
            count = 0;
        }
        cout << vs[inc_value] << "    " << count << '\n';
        inc_value++;
    }
}
```

the output is:

hello 1
this 2
is 3
a 4
test 5
twin drum
#

that's the output, ok, but for what input? what is ps

#

you're incrementing inc_value in the inner/second for loop, so by the time that for loop completes the first time, your program exits, the while loop may as well not exist

river bay
#

the input is just "hello this is a test". ps is something else entirely for checking whitespace and taking in files etc. basically the ps is just user input, sorry for that confusion

twin drum
#

so yes, count is always incremented

river bay
#

ahhh okay, that was what i was afraid of was happening. my plan was for the whole for loop to go over inc_value[0], then once the for loop is done with that it would go over inc_value[1], but that is definitely not happening. how would you recommend I go about changing that? or should i approach it entirely different ?

twin drum
#

do you have restrictions? like is this for school where you're not allowed to use various things?

#

ranges is C++20 tho

river bay
#

yea right now im on c++11 i think, and I can only use basic stuff unfortunately yeah

twin drum
#

then move your inc_value++ outside the for loop, but still inside the while loop

river bay
#

i was gonna put it in a map and do it that way, but i have to stick with code and act like i've never coded before

river bay
#

outside the for loop, but inside the while loop.... got it. ill give it a go and see what's going on. thank you so far

twin drum
#

although, technically this probably isn't 100% what you need

#

hello world hello would end up giving a count of 4

#

because you'll "count the occurrences of hello" twice

#

and there are two occurrences, so two times two is four (last I checked, in non-complex Euclidean geometry at least)

#

so really, in fact, without std::count yes a map is probably the appropriate tool here

#

without a map, you'll need to keep track of not only the count of a word, but if you have seen that word already, you need to keep track of that, so you don't recount the word a second time when that word occurs later

#

or really maybe just a std::set

#

but you could use a std::vector to store the words you have already "seen"

#

it's just a bit more cumbersome to answer the question "have I already seen this word?" by that method

#

it appears you're at least allowed to use std::vector so... 🤷

river bay
#

alright well i took the cout and the inc_value outside of the for loop but still in the while loop, and it's giving me this output

#

hello 1
world 0
hello 1

twin drum
#

well, your logic is also still a bit off in several ways

#

when you see "world" when you're handling "hello" you will reset the count of "hello" to zero because hello != world

#

but I assume that's not what you want
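A minimal sketch of the two fixes discussed so far, assuming the words are already in a vector: `inc_value` advances once per pass of the while loop, and `count` is zeroed at the start of each pass instead of on every mismatch. The function name `counts_per_position` is hypothetical, not from the thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: for each position in `words`, count how many
// entries of `words` equal the word at that position.
std::vector<int> counts_per_position(const std::vector<std::string>& words) {
    std::vector<int> result;
    std::size_t inc_value = 0;
    while (inc_value < words.size()) {
        int count = 0;  // reset once per outer pass, never on a mismatch
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (words[inc_value] == words[i]) {
                ++count;
            }
        }
        result.push_back(count);
        ++inc_value;  // advance only after the inner loop has finished
    }
    return result;
}
```

For "hello world hello" this yields 2, 1, 2. Each count is right, but "hello" is still reported twice, which is the remaining problem.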

river bay
#

yeah definitely not haha. i didn't think of it until i saw the hello world hello function that way. i'm thinking your idea of a vector would work well. but wouldn't the problem of counting the occurrence still be around? there would be multiple of the same word in a vector and they would be out of order and all over in the vector

twin drum
#

no

#

you search the vector for the current word

#

if the word is in the "seen" vector, you simply do nothing and continue to the next word

#

otherwise you add the current word to the list of "seen" words

#

using a std::set would do this for you basically

#

without std::set you'll need to search your "seen" words yourself
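One way the "search your seen words yourself" idea could look, as a sketch only: a second vector holds the words that were already counted, and `std::find` answers "have I seen this word?". The name `count_with_seen_vector` is made up for illustration.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns each distinct word once, in order of
// first appearance, paired with its number of occurrences.
std::vector<std::pair<std::string, int>> count_with_seen_vector(
    const std::vector<std::string>& words) {
    std::vector<std::string> seen;
    std::vector<std::pair<std::string, int>> result;
    for (const std::string& word : words) {
        // Linear search through the words that were already counted.
        if (std::find(seen.begin(), seen.end(), word) != seen.end()) {
            continue;  // already counted, do nothing
        }
        seen.push_back(word);
        int count = 0;
        for (const std::string& other : words) {
            if (other == word) {
                ++count;
            }
        }
        result.push_back(std::make_pair(word, count));
    }
    return result;
}
```

For "hello world hello" this gives hello 2 and world 1, with no second line for "hello".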

river bay
#

ah okay. well in that case i'll bend the rules and use set haha.

twin drum
#

I mean, you actually still need to consult the set

#

to see if the word is yet "seen" or not

#

it's just a bit simpler to do that with a std::set than by hand with a vector
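A small sketch of what consulting the set can look like. `std::set::insert` returns a pair whose `.second` is true only when the element was newly inserted, so one call both checks and records the word. The helper name `first_time_seen` is hypothetical.

```cpp
#include <set>
#include <string>

// Hypothetical helper: returns true the first time `word` is offered,
// and false on every later call with the same word.
bool first_time_seen(std::set<std::string>& seen, const std::string& word) {
    // insert() leaves the set unchanged if the word is already present.
    return seen.insert(word).second;
}
```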

river bay
#

okay. i like that idea. so generally, i can use set to see if the current word has already been seen. and if it has i increment a counter? and if it hasnt been seen I increment the counter to 1?

twin drum
#

also, on a different note, are you licensed? if not that's piracy and bancat yamikek

river bay
#

HUHH licensed?

river bay
#

oh lmao i am still within the 30 day trial period flooshed

twin drum
#

conceptually, from a high level perspective, you need to answer the question "have I counted this word already"

#

std::set or std::map could both be reasonable tools for doing this

#

probably better to use std::unordered_set or std::unordered_map tho
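For comparison, a sketch of the map-based approach, assuming the assignment allowed it: the word is the key and the running count is the value. `count_words` is a hypothetical name.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each distinct word to its number of occurrences.
std::unordered_map<std::string, int> count_words(
    const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& word : words) {
        // operator[] creates a missing entry with value 0 before the increment.
        ++counts[word];
    }
    return counts;
}
```

With a map there is no separate "seen" bookkeeping: a word that has a key has been seen.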

#

also, on a more related but also different note, do you know how to use a debugger?

daring bladeBOT
#

@river bay Has your question been resolved? If so, run !solved :)

twin drum
#

ok

#

I like Sublime Text, I think ST is one of the best editors available

#

but I would likely recommend you use Visual Studio (the IDE) instead

#

I rather doubt ST has a debugger? but idk, it's been years since I used ST

twin drum
#

Visual Studio does have a debugger, and you shouldn't need much study to use it

#

set a breakpoint, click the green arrow, and you're basically done

river bay
#

oh interesting. it seems like a debugger would come in handy lol. after i finish this assignment i'll have to look into it.

#

i ended up getting a set working instead of a vector. and when i do cout, it doesn't output duplicates. so that would definitely help with outputting a certain amount. so now i am just stumped on actually counting the occurrence

#

im kinda thinking like i was when i had the vector. maybe i can sort the set, then compare the current with the next? if it is the same, then count++. if not, i start the counter back to 1. then i can sort it by most frequent?

twin drum
#

slightly better example

#

note that Visual Studio (the IDE) is not the same as VSCode

#

sadly, MS appears intent on confusing people into believing they are the same, but they are not

#

VS is a full IDE, it comes with a compiler, a linker, the standard library and a debugger

#

VSCode is mostly just a good "plain text editor" similar to ST in that way

#

though VSCode does actually have a debugger, but then you'll need to learn VSCode, and you will probably have a much easier time if you just use actual VS instead, actual VS is free

#

make sure this checkbox ☝️ is checked when you install VS, and that should be all you need to get started

#

create a blank console app C++ project ☝️ and you're off to the races

river bay
#

oh wow okay awesome, i'll definitely be sure to look into this when I get this done. thanks for all the pointers with vs

twin drum
#

well, I suppose, an important question...

#

are you on Windows?

river bay
#

yeah i am lol

twin drum
#

ok, then all is well, VS only really works on Windows, but if you are on Windows, then VS is widely considered one of the best IDEs on any OS

#

I prefer the "plain text editor" approach too, like Sublime Text (or in my case vim) but realistically life will probably be much simpler, faster, easier for you if you just use VS the IDE for now

river bay
#

okay, sounds great! thank you :)

#

haha i do have a question though... it might be kind of a silly one

twin drum
#

only one way to find out 😉

river bay
#

so i'm noticing that the set outputs the words in alphabetical order. this seemed fine when my idea was to compare the current word to the next word, but in the set they are not automatically alphabetized! i looked online to see if there was a non convoluted way to sort it before outputting it, but i can't find one. do you have any ideas? i'll give you my new code i wrote for clarity

twin drum
#

yes that's a std::set

#

and hence the existence of std::unordered_set

river bay
#

```cpp
set<string> words;
for (string word; ps >> word;)
    words.insert(word);
int i = 0;

for (string str : words) {
    if (str[i] == str[i + 1]) {
        count++;
    }
    else {
        count = 1;
    }
    cout << str << "    " << count << '\n';
    i++;
}
```
twin drum
#

I see

#

you would still use the original code you had

#

the suggestion is not to replace vector with set

#

the suggestion is to also add some code to track the answer to the question "have I counted this word yet?"

#

a set is a natural and reasonably performant way to do that

#

if you don't need order (which you don't, you only need "have I seen this?"), then std::unordered_set is likely to be somewhat more performant than std::set because of its average-case constant time complexity

#

and in theory, de-facto O(1) time lookups
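A short sketch of why a set on its own cannot hold the counts: inserting a word that is already present does nothing, so only one copy of each word survives, and `std::set` hands the words back in sorted order. The name `distinct_sorted` is hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the distinct words in the order a
// std::set would iterate them (sorted, duplicates dropped).
std::vector<std::string> distinct_sorted(const std::vector<std::string>& words) {
    std::set<std::string> unique(words.begin(), words.end());
    return std::vector<std::string>(unique.begin(), unique.end());
}
```

For "world hello world" the result is just hello, world. The information that "world" appeared twice is gone, which is why the set is meant to sit next to the vector rather than replace it.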

river bay
#

ah man, i thought i was getting somewhere with this one lol. so i go back to the vector thing and add a set. i loop through the vector, and add each word to the set as i go along. then, if the word already exists in set, i just increment a counter? is that seeming somewhat correct?

twin drum
#

have a try and show us what you come up with

river bay
#

okay ill give it a go lol

twin drum
#

and O(log n) is still technically worse than O(1)

#

quite good in its own right, but still quite a lot worse than constant time

river bay
#

ah, i am sad to admit that what i thought would work did not in fact

#

i'll show you what i did and explain my thoughts

#

```cpp
set<string> words;
    vector<string> vs;
    for (word; ps>>word;)
        vs.push_back(word);    // read words
vs[inc_value];

    while (inc_value < vs.size()) {

        for (int i=0; i<vs.size(); ++i) {

            //adding each word to the set
            words.insert(vs[i]);

            //if the vector is already in the set
            if (words.find(vs[inc_value]) != words.end()) {

                //increment counter
                count++;
            }

            else {

                //otherwise reset it
                count = 1;
            }
        //outputting
        cout << vs[inc_value] << "    " << count << '\n';
        inc_value++;
        }
    }
```
#

so im having it loop through the size of the words typed just like before

#

and each loop the word gets added to the set

#

then, if the set finds the current iteration of the string, it will increment the counter, otherwise reset it with 1

#

then, it outputs those results

#

what happens is this still:

#

hello 1
world 2
hello 3

#

im not sure why it keeps incrementing. i guess it could be because inc_value is set to 0, and the for loop starts at zero. so it makes me wonder if i need to sort the vector

#

but maybe that wouldn't fix it, im not sure anymore haha im at a loss

twin drum
#

well, if the word is already in the set, you should just skip the inner for loop entirely

#

it means you have "already counted" this word before

#

the inner for loop counts occurrences

#

if you have already counted occurrences of X, you don't want to count X a second time, or your count will be double the actual/real count
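Put together, the advice above might look something like this sketch, assuming the words are already in the vector `vs`. The set is checked before the inner loop, and the inner loop only counts matches. `count_each_word_once` and `counted` are hypothetical names, and `std::unordered_set` is used only because ordering is not needed here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: counts each distinct word exactly once.
std::vector<std::pair<std::string, int>> count_each_word_once(
    const std::vector<std::string>& vs) {
    std::unordered_set<std::string> counted;
    std::vector<std::pair<std::string, int>> result;
    for (std::size_t inc_value = 0; inc_value < vs.size(); ++inc_value) {
        // insert().second is false if the word was already counted,
        // in which case the inner loop is skipped entirely.
        if (!counted.insert(vs[inc_value]).second) {
            continue;
        }
        int count = 0;  // fresh count for each new word
        for (std::size_t i = 0; i < vs.size(); ++i) {
            if (vs[i] == vs[inc_value]) {
                ++count;
            }
        }
        result.push_back(std::make_pair(vs[inc_value], count));
    }
    return result;
}
```

Printing each pair in `result` with `cout` would then give one line per distinct word.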

charred kilnBOT
#

@river bay's code (missing deletion permissions), requested by @twin drum:

```cpp
set<string> words;
vector<string> vs;
for (word; ps >> word;)
    vs.push_back(word);  // read words
vs[inc_value];

while (inc_value < vs.size()) {
    for (int i = 0; i < vs.size(); ++i) {
        // adding each word to the set
        words.insert(vs[i]);

        // if the vector is already in the set
        if (words.find(vs[inc_value]) != words.end()) {
            // increment counter
            count++;
        }

        else {
            // otherwise reset it
            count = 1;
        }
        // outputting
        cout << vs[inc_value] << "    " << count << '\n';
        inc_value++;
    }
}
```
river bay
#

ah, so looking if a word is already in a set should not even be in the for loop? interesting okay. then what would be in the for loop if not that?

twin drum
#

checking for and counting matches

#

which you are doing, and that's fine

#

your count reset is still (probably) wrong though

#

;compile

#include <iostream>
#include <vector>
#include <string>

using namespace std;

int main() {
  vector<string> words{"apple", "banana", "apple"};

  int inc_val = 0;

  while (inc_val < 3) {
    for (int i = 0; i < 3; i++) {
      cout << words[inc_val] << " " << words[i] << endl;
    }

    inc_val++;
  }
}
modest reefBOT
#
Program Output
apple apple
apple banana
apple apple
banana apple
banana banana
banana apple
apple apple
apple banana
apple apple
twin drum
#

consider for a moment what your loops are actually doing when you put them together like this ☝️

river bay
#

ohhh wow okay, yeah i see that my while loop and for loop are causing me issues

twin drum
#

I think maybe you are causing you issues 😛

#

if you don't want to use a debugger, then cout all the things so you can see what's really going on, step by step

#

delete the extraneous couts once you get it working

river bay
#

okay, i'll go ahead and do that. but first im gonna take a little break, lol im starting to get a little annoyed with the issues im causing myself and im getting sleepy but i'll come back to it after a little bit. thank you for the help so far. it has definitely been helpful so far figuring out what's wrong and why it is wrong

twin drum
#

rest is good, I dare suggest required. good luck.

daring bladeBOT
#

This question thread is being automatically closed. If your question is not answered feel free to bump the post or re-ask. Take a look at !howto ask for tips on improving your question.
