A clone of `wc` unix command | Together C & C++ | Page 1

fresh patrol Jan 17, 2026, 7:44 PM

#

Hello, relatively recently I started coding in C and decided to make a clone of the wc command, which I called wcc (short for wc clone).
As of now I haven't fully finished it, it can't handle stdin right now and lacks --help and --version arguments, but I plan to finish it soon and maybe make a new thread or re-post it here.

So, it functions like wc:

-l or --lines for line count
-w or --words for word count
-m or --chars for character count
-c or --bytes for byte count

I would like general feedback/hints on what to pay more attention to.

#

Couldn't post full source code, so here is main.c

📎 main.c

#

Also works only on Linux, haven't tested on Windows and I assume it won't work there.

fresh patrol Jan 17, 2026, 7:49 PM

#

fresh patrol Also works only on Linux, haven't tested on Windows and I assume it won't work t...

Or rather not only on Linux but on any Unix, like FreeBSD, but I haven't tested.

fresh patrol Jan 17, 2026, 7:50 PM

#

fresh patrol Couldn't post full source code, so here is `main.c`

Also I was compiling with gcc:
$ gcc --std=c23 -Wpedantic -Wall -Wextra -Werror -Wvla -fanalyzer -fsanitize=address,undefined,leak -fsanitize-trap=undefined

fresh patrol Jan 17, 2026, 7:51 PM

#

fresh patrol Also I was compiling with gcc: `$ gcc --std=c23 -Wpedantic -Wall -Wextra -Werror...

$ gcc --version
> gcc (GCC) 15.2.1 20260103 ...

fresh patrol Jan 17, 2026, 9:57 PM

#

fresh patrol Hello, relatively recently I started coding in C and decided to make a clone of ...

Now it handles stdin and has both --help and --version arguments

#

📎 main.c

fresh patrol Jan 17, 2026, 11:19 PM

#

fresh patrol

Fixed the bug, where if you invoked wcc -c <any file> it would try to print out word count, instead of a byte count, which would lead to printing 0

#

📎 main.c

fresh patrol Jan 17, 2026, 11:20 PM

#

fresh patrol

For anyone interested, check out print_result(), the if (opt_flags->use_bytes) part

fresh patrol Jan 18, 2026, 3:02 PM

#

fresh patrol

Fixed the bug where, if you tried piping something (e.g. echo "Hello, world!" | wcc -lwmc) or reading from stdin, it wouldn't count every option; in the piping example it would only count lines. The reason is pretty simple: pipes and terminals don't usually support file positioning requests, so you can't rewind(stdin). I fixed this by writing everything from stdin to a temporary file and then reading from it instead.

#

📎 main.c

fresh patrol Jan 18, 2026, 3:19 PM

#

fresh patrol

Refactored print_result() to use just sprintf() with an offset, instead of the monstrosity that I wrote before

#

📎 main.c

fresh patrol Jan 18, 2026, 7:34 PM

#

fresh patrol

Refactored the process_file() function so that, instead of doing 4 passes when using the -lwmc options, it does only one, making it much more efficient and removing the need for a temporary file when handling stdin.
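
For readers following along, here is a minimal sketch of what a single pass can look like. It is not the code from main.c: the names `struct counts`, `counts_update` and `count_stream` are made up for illustration, and it swaps in a simpler technique than the thread's fgetwc() approach by reading raw bytes and assuming the input is UTF-8 (a character is counted for every byte that is not a continuation byte).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names; not taken from wcc's main.c. */
struct counts {
    size_t lines, words, chars, bytes;
    bool in_word; /* carried across chunks so a word split by a read counts once */
};

/* Fold one chunk of input into the running totals. */
void counts_update(struct counts *c, const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char b = buf[i];
        c->bytes++;
        /* Assumption: UTF-8 input, where continuation bytes look like 10xxxxxx. */
        if ((b & 0xC0) != 0x80)
            c->chars++;
        if (b == '\n')
            c->lines++;
        if (isspace(b)) {
            c->in_word = false;
        } else if (!c->in_word) {
            c->in_word = true;
            c->words++;
        }
    }
}

/* Read fp once from its current position to the end.
 * Returns false if an I/O error occurred. */
bool count_stream(FILE *fp, struct counts *out)
{
    unsigned char buf[4096];
    size_t n;

    *out = (struct counts){0};
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        counts_update(out, buf, n);
    return !ferror(fp);
}
```

Because nothing is ever re-read, `count_stream(stdin, &c)` works on a pipe just as well as on a regular file, which is why the temporary file is no longer needed.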

#

📎 main.c

fresh patrol Jan 18, 2026, 7:38 PM

#

fresh patrol

Refactored print_result() again, because I'm an idiot and forgot about printf()'s existence.

#

📎 main.c

harsh oar Jan 19, 2026, 10:35 AM

#

Why not use something like GitHub for easier version control ?

fresh patrol Jan 19, 2026, 8:29 PM

#

harsh oar Why not use something like GitHub for easier version control ?

I use git, but I didn't put the project on GitHub, because I didn't see any need in doing so

fresh patrol Jan 20, 2026, 7:05 PM

#

fresh patrol

Decided to document my code, somewhat

#

📎 main.c

#

And this probably will be the final version for now, will be glad to receive feedback for the final iteration of this code.

solemn laurel Jan 23, 2026, 10:27 PM

#

you should check ferror() to find out if your fgetwc() didn't like the input (invalid utf8 for instance). also the character count may be... a bit odd or off with smileys etc (in particular on windows where it's utf-16 inside)

fresh patrol Jan 24, 2026, 3:31 AM

#

fresh patrol

Refactored some error handling code and added ferror() check in process_file() in case of invalid wide characters, I/O errors
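
A rough sketch of the kind of check being discussed, since WEOF on its own does not say whether the stream ended or failed. The helper name `count_wide_chars` is hypothetical and this is not the code in process_file(); it only shows the fgetwc() plus ferror() pattern.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper, not from wcc: counts wide characters until WEOF.
 * Returns 0 on a clean end of file, -1 on failure. On failure errno says
 * why (EILSEQ for a byte sequence that is invalid in the current locale). */
int count_wide_chars(FILE *fp, unsigned long long *out)
{
    unsigned long long n = 0;

    errno = 0;
    while (fgetwc(fp) != WEOF)
        n++;
    *out = n;

    /* fgetwc() returns WEOF both at end of file and on error,
     * so the stream's error indicator has to be checked separately. */
    if (ferror(fp))
        return -1;
    return 0;
}
```

The caller is expected to have called setlocale() first, because what counts as a valid sequence depends on the active LC_CTYPE locale.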

#

📎 main.c

fresh patrol Jan 24, 2026, 3:32 AM

#

solemn laurel you should check ferror() to find out if your fgetwc() didn't like the input (in...

Thanks! Anything else you can add or propose? Also, what can you say about the code in general?

solemn laurel Jan 24, 2026, 3:43 AM

#

on phone now so I can’t see the latest (you should really put it on github or at least a gist)

fresh patrol Jan 24, 2026, 4:29 AM

#

solemn laurel on phone now so I can’t see the latest (you should really put it on github or at...

Hey, sorry for late reply, but I put it on GitHub: https://github.com/HiddenWhistle/wcc

solemn laurel Jan 24, 2026, 5:27 AM

#

i don’t get the tempbuf[32]

fresh patrol Jan 24, 2026, 5:28 AM

#

If I don't give wcrtomb() a buffer to write to, it always returns 1 byte, so (fr->byte_count == fr->char_count) will be true

#

I probably should look more into it, maybe I just missed something but that is a workaround for now

fresh patrol Jan 24, 2026, 5:58 AM

#

Changed the comment inside to (hopefully) better clarify why

solemn laurel Jan 24, 2026, 2:00 PM

#

cool better than hardcoded 32. did you test what happens with non utf8 binaries?

fresh patrol Jan 24, 2026, 2:16 PM

#

Not yet, but I probably should

fresh patrol Jan 25, 2026, 2:10 AM

#

So I decided to test my program on VM with just ASCII locale, and it doesn't work :(

#

Can't test with UTF-16LE/BE, but since I don't support Windows it doesn't really matter I guess?

fresh patrol Jan 25, 2026, 2:13 AM

#

fresh patrol So I decided to test my program on VM with just ASCII locale, and it doesn't wor...

Worked with ISO-8859-1 though

solemn laurel Jan 25, 2026, 2:19 AM

#

(almost) nobody uses iso-latin-1 in 2026

fresh patrol Jan 25, 2026, 2:20 AM

#

I wonder why ISO-8859-1 works fine, but pure ASCII doesn't? Afaik ISO-8859-1 is not a Unicode encoding, so shouldn't it give me an error too?

solemn laurel Jan 25, 2026, 2:20 AM

#

and you should define "doesn't work" what actually happens

fresh patrol Jan 25, 2026, 2:20 AM

#

ASCII doesn't support wide operations while ISO does

#

Since my program uses wide operations, it just zeroes everything out and prints an encoding error to stderr

solemn laurel Jan 25, 2026, 2:21 AM

#

if it works for iso-8859-1 it works for ascii because ascii is a subset. unless the input isn't ascii obviously (which is what I was hinting about non valid utf-8 test needed)

#

iso-latin-1 (the simpler name for 8859-1) does not use wide characters

fresh patrol Jan 25, 2026, 2:22 AM

#

But it supports them, so I can count normally

#

And I can't do the same with ASCII

#

Let me get some screenshots

solemn laurel Jan 25, 2026, 2:22 AM

#

no, your byte and char counts will always be exactly equal using iso-latin-1

#

also if you print your MB_CUR_MAX it will be 1 for both locales
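
A quick way to try this, as a sketch: `max_bytes_per_char` is a hypothetical helper that switches the LC_CTYPE locale and reports `MB_CUR_MAX`. Which locale names exist depends on the system, so the example returns 0 when a locale is not installed rather than assuming it is.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: switch LC_CTYPE to `name` and report MB_CUR_MAX,
 * the maximum number of bytes per character in that locale.
 * Returns 0 if the locale is not available on this system. */
size_t max_bytes_per_char(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;
    return MB_CUR_MAX;
}
```

Passing `""` picks up whatever the environment (LANG, LC_ALL, ...) selects, so `printf("%zu\n", max_bytes_per_char(""))` shows the value the program actually runs with. A result of 1 means a single-byte encoding, where byte and character counts cannot differ.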

fresh patrol Jan 25, 2026, 2:27 AM

#

fresh patrol Jan 25, 2026, 2:27 AM

#

solemn laurel no, you will never have your byte vs char count to not be exactly equal using is...

That's not the problem, the problem is that it just doesn't even work

#

using pure ASCII locale

#

Also something weird with my word counting in the second screenshot compared to wc, so I'll check it out too

solemn laurel Jan 25, 2026, 2:59 AM

#

so to repeat if you use ascii and have non ascii you get an error (made you check ferror) like if you had invalid utf8

and to repeat again too, you can see bytes count char count on the one “working” are exactly == which shows there are no wide chars either in iso-latin-1

fresh patrol Jan 25, 2026, 3:00 AM

#

yes, I understand that char and byte count on latin-1 would be equal

solemn laurel Jan 25, 2026, 3:00 AM

#

well ur not printing both… but…

fresh patrol Jan 25, 2026, 3:01 AM

#

Let me check real quick then

#

But that should be the case, yes

#

Oh yeah, I forgot that I made a check

#

So it won't even attempt to print out chars

#

Because the count would be equal to bytes

fresh patrol Jan 25, 2026, 3:04 AM

#

solemn laurel so to repeat if you use ascii and have non ascii you get an error (made you chec...

Also no, the text doesn't contain any specific utf-8 stuff like emojis or symbols/etc., so it should print out technically

#

Ohhhh wait, I'm dumb

#

It contains some utf-8 stuff, so that's why it doesn't output

solemn laurel Jan 25, 2026, 3:05 AM

#

fresh patrol ASCII doesn't support wide operations while ISO does

well your test should include both a valid utf8 and invalid ones (like binary files) etc

fresh patrol Jan 25, 2026, 3:06 AM

#

solemn laurel Jan 25, 2026, 3:06 AM

#

and iso latin 1 defines all 256 possible byte values so it accepts everything in a “garbage in garbage out” way

#

the first 3 bytes there are called the BOM

fresh patrol Jan 25, 2026, 3:07 AM

#

solemn laurel well your test should include both a valid utf8 and invalid ones (like binary fi...

True... My testing is all over the place :p

#

So in the end, I didn't catch that the text I was using had some UTF-8 symbols inside that ASCII can't read

#

therefore an error

solemn laurel Jan 25, 2026, 3:14 AM

#

how about you try a utf8 locale and see that you get more bytes than characters thanks to the BOM (3 bytes, 1 character)
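
For reference, the UTF-8 BOM is the single character U+FEFF encoded as the three bytes EF BB BF. A minimal sketch of spotting it at the start of a buffer follows; `has_utf8_bom` is a hypothetical helper and not part of wcc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf starts with the UTF-8 encoding of
 * the byte order mark U+FEFF, i.e. the bytes EF BB BF. */
bool has_utf8_bom(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}
```

Whether a word counter should skip the BOM or count it like any other character is a design choice; the sketch only detects it.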

fresh patrol Jan 25, 2026, 3:18 AM

#

fresh patrol

Ohhhh so that thing in the beginning is BOM
