#EASY: problem with strtok parsing a line


worldly patioBOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question run !howto ask.

wraith salmon
#

from what I can see on the docs, there's only one delimiter and that's what strtok searches for.

#

so you'd need to either split up the strtok calls, or just write out an alternate version of strtok yourself.

brazen ravine
#

you can use multiple delimiters in strtok

wraith salmon
#

what are you putting into stdin?

#

sorry, it only showed me examples of it using one

brazen ravine
#

it's a file

wraith salmon
#

oh yeah it only gets one token per line

brazen ravine
#

yea

wraith salmon
#

because you put NULL into the input str at the bottom of the second while loop

#

shouldn't you put something like strlen(token) + 1 + line into the first argument of the second strtok()?

brazen ravine
#

i'll check

#

didn't work

#

NULL gets the next token i thought

wraith salmon
#

wdym lol NULL = no string

#

it doesn't internally store your string iirc

#

oh nvm it does

#

wtf

brazen ravine
#

yea

#

every source online uses NULL

#

i think i'm placing it in the wrong spot

wraith salmon
#

no it's in the right place

#

it could be that it mallocs the token, so the token you're modifying isn't actually the original string

brazen ravine
#

wat

wraith salmon
#

when you're lowercasing the token string

#

it isn't modifying the line string

#

also i would really just split that into a separate function

brazen ravine
#

ok

wraith salmon
#

if it helps at all, this would print out every word-like thing:
```c
#include <stdio.h>
#include <ctype.h>
int main() {
    char str[] = "the fox is very fast";

    int start = -1;
    int end = 0;
    for (int i = 0; str[i]; i++) {
        if (start < 0 && isalnum(str[i]) || str[i] == '_')
            start = i;
        else if (isalnum(str[i]) || str[i] == '_')
            end = i;
        else {
            printf("%.*s\n", end - start + 1, str + start);
            start = -1;
        }
    }
    if (start > -1) printf("%s\n", str + start);
}
```

#

in a line

#

if you want to use something like this, i think it would do the same thing?

brazen ravine
#

would this work with an array that i'm using for my keywords i need to search the file for

wraith salmon
#

umm do you want me to try to adapt it to your program?

brazen ravine
#

nah

wraith salmon
#

but yeah it very much would

#

do you get how it works? it just keeps up a slice and then it checks if there's an unfinished word on the second to last line

brazen ravine
#

i'll try to see if i can use some of this

wraith salmon
#

i'm pretty sure this is faster than using strtok or whatever too, and completely threadsafe, since it doesn't keep any hidden state between calls the way strtok does

brazen ravine
#

an else if strcmp with the keyword array would work right

wraith salmon
#

put it inside the else ye

#

and at the end

brazen ravine
#

when it prints to stdout some of the words are printed twice & some of them moved onto new lines

wraith salmon
#

can you post the code?

brazen ravine
#
```c
while (fgets(line, sizeof(line), stdin))
{
   int start = -1;
   int end = 0;
   for (int i = 0; line[i]; i++)
   {
      if (start < 0 && isalnum(line[i]) || line[i] == '_')
         start = i;
      else if (isalnum(line[i]) || line[i] == '_')
         end = i;
      else
      {
         printf("%.*s\n", end - start + 1, line + start);
         start = -1;
      }
   }
   if (start > -1)
   {
      printf("%s\n", line + start);
   }
}
```

real torrent
#

The issue is with the printf("%s\n", line); statement at the end of the while loop. strtok replaces each delimiter it finds with '\0', so after tokenizing, line as a C string ends right after the first word (the one that was tokenized and possibly modified). It does not hold the whole line with all the modified words. To fix this, use a separate buffer, such as output, and append each modified token to it before printing. At the end of each iteration of the while loop, reset output to an empty string so that the next line is printed correctly.

craggy pivot
#

try this

```c
while (fgets(line, sizeof(line), stdin))
{
   for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
   {
      if (strstr(line, keywords[i]) != NULL)
      {
         /* a keyword appears somewhere: lowercase the whole line */
         for (j = 0; j < strlen(line); j++)
         {
            line[j] = tolower((unsigned char)line[j]);
         }
         break;
      }
   }
   printf("%s", line);
}
```

wraith salmon
craggy pivot
#

Here is an updated version of the code that will replace multiple keywords at once:

```c
while (fgets(line, sizeof(line), stdin))
{
   for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
   {
      char *keyword = keywords[i];
      int keyword_len = strlen(keyword);
      char *found = strstr(line, keyword);
      while (found != NULL)   /* visit every occurrence of this keyword */
      {
         for (j = 0; j < keyword_len; j++)
         {
            found[j] = tolower((unsigned char)found[j]);
         }
         found = strstr(found + keyword_len, keyword);   /* continue after this match */
      }
   }
   printf("%s", line);
}
```

#

In this version, we use the strstr() function to check whether each keyword is present in the line. If it is, we convert that occurrence to lowercase with the tolower() function. The while loop finds every occurrence of the keyword in the line and lowercases each one.

wraith salmon
#

also, i don't know if that code is very performant

#

since you're looping through every keyword every line

#

you could just remove the while fgets though

brazen ravine
wraith salmon
#

yeah that's also another downside

wraith salmon
#

ok lemme fix it

craggy pivot
#

!format

worldly patioBOT
#

In this version, I have added a check before converting the keyword to lowercase, to verify that the match is a standalone word and not part of another word. A match counts as standalone if it is at the start of the line or preceded by a non-letter, and is followed by a non-letter (which includes the terminating '\0' at the end of the line). Only when both conditions hold do we convert the keyword to lowercase.
This way we avoid partial-word matches while still doing the search with strstr.


```c
while (fgets(line, sizeof(line), stdin)) {
  for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
    char* keyword = keywords[i];
    int keyword_len = strlen(keyword);
    char* found = line;
    while ((found = strstr(found, keyword)) != NULL) {
      int start = found - line;
      int end = start + keyword_len;
      /* only a standalone word: no letter right before or right after the match */
      if ((start == 0 || !isalpha((unsigned char)line[start - 1])) &&
          !isalpha((unsigned char)line[end])) {
        for (j = 0; j < keyword_len; j++) {
          found[j] = tolower((unsigned char)found[j]);
        }
      }
      found += keyword_len;
    }
  }
  printf("%s", line);
}
```

whiteh4cker
wraith salmon
#

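```c
#include <stdio.h>
#include <ctype.h>

int main(void)
{
   char line[1024]; /* buffer size is an assumption; the original declaration wasn't posted */
   char* cur;
   while ((cur = fgets(line, sizeof(line), stdin)))
   {
      int start = -1; /* index where the current word began, -1 = not inside a word */
      for (int i = 0; cur[i]; i++)
      {
         /* word chars: letter, digit or '_' (the cast keeps isalnum defined for negative chars) */
         if (isalnum((unsigned char)cur[i]) || cur[i] == '_')
         {
            if (start < 0)
               start = i; /* first character of a new word */
         }
         else if (start >= 0)
         {
            /* only print when a word is open, so runs of spaces or punctuation
               never print with start == -1 and no stale end index is reused */
            printf("%.*s\n", i - start, cur + start);
            start = -1;
         }
      }
      if (start > -1) /* the line ended mid-word (e.g. last line without '\n') */
      {
         printf("%s\n", cur + start);
      }
   }
   return 0;
}
```
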
#

@brazen ravine

brazen ravine
#

thanks for your help @wraith salmon @craggy pivot @real torrent

craggy pivot
#

you are welcome

wraith salmon
#

gl bro

brazen ravine
#

!solved

worldly patioBOT
#

Thank you and let us know if you have any more questions!