Hey, I'm trying to read a text file and get the length of the text (number of characters) using ftell. I know a newline is represented by 2 characters on Windows and adjusted the length accordingly. My problem is with "encoded" text. For example, for a file containing the text "abcd", ftell returns 4. For a file containing the encoded text ")ְ", ftell also returns 4, but those are 3 characters. I don't know how to accommodate this or why it is happening. Each character is represented by 8 bits, and the encryption is 1:1 in the number of bytes. I tried opening the file with both "r" and "rb" and got the same results. I would appreciate any help and insight on this, thanks!
#Text length inconsistency reading from file.
because the number of bytes isn't the number of characters
not every character maps 1:1 to a byte
what you're likely using here is UTF-8, so some characters take multiple bytes to encode
also, it's encoding, not encrypting; they are different
;compile
std::cout << sizeof(")ְ");
In order to properly count UTF-8 characters you would need to read through the entirety of the file.
Program Output
5
veeloxfire | 41ms | c++ | x86-64 gcc 14.2 | godbolt.org
see, it's 4 bytes (+ 1 for the null terminator)
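The point above about reading through the whole file to count UTF-8 characters can be sketched in C. This is a minimal sketch, assuming the bytes are already in memory and are valid UTF-8; `utf8_count` is a hypothetical helper name, not something from the thread. It counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Counts UTF-8 code points in a byte buffer.
   Assumption: the input is valid UTF-8. Every code point has exactly one
   leading byte; continuation bytes look like 10xxxxxx and are skipped. */
size_t utf8_count(const unsigned char *buf, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the four bytes of "abcd" this returns 4. For the three bytes 0x29 0xD6 0xB0 (a ")" followed by one two-byte character) it returns 2, even though a byte-based length such as the one from ftell would be 3.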
How can I approach this problem? Let me give one more example. For the text "A!5GFs/", ftell returns 7. Those are 7 characters represented by 7 bytes. After encrypting them with the attached encryption (which works byte for byte, so the number of bytes stays the same), the result is "A!2§C׃/". ftell returns 9 for that text. How is that possible?
your implementation must be incorrect then, because if you aren't adding extra bytes it's impossible to end up with extra bytes
send your implementation
This is the encryption algorithm:
```c
int enc(const unsigned char *data_in, unsigned int size_in, unsigned char *data_out, unsigned int size_out) {
    if ((data_in == NULL) || (data_out == NULL))
        return ERR_NULL_PTR;
    int j, i, bit1, bit2;
    /* Bit strings ('0'/'1' characters) holding the pieces of each byte pair. */
    char a1[6], a2[4], b1[4], b2[6];
    a1[5] = '\0';
    a2[3] = '\0';
    b1[3] = '\0';
    b2[5] = '\0';
    int odd = size_in % 2;
    printf("Size going in is %u", size_in);
    char new_a[9], new_b[9];
    for (i = 0; i < (int)size_in; i = i + 2) {
        /* Odd length: copy the last byte unchanged. */
        if ((odd == 1) && (i == size_in - 1)) {
            data_out[i] = data_in[i];
            break;
        }
        for (j = 0; j < 8; j++) {
            if ((j >= 0) && (j < 5))
                a1[j] = (char)!!((data_in[i] << j) & 0x80) + 48;
            if ((j >= 5) && (j < 8))
                a2[j - 5] = (char)!!((data_in[i] << j) & 0x80) + 48;
            if ((j >= 0) && (j < 3))
                b1[j] = (char)!!((data_in[i + 1] << j) & 0x80) + 48;
            if ((j >= 3) && (j < 8))
                b2[j - 3] = (char)!!((data_in[i + 1] << j) & 0x80) + 48;
        }
        strcpy(new_a, a1);
        strcat(new_a, b1);
        strcpy(new_b, a2);
        strcat(new_b, b2);
        bit1 = strtol(new_a, NULL, 2);
        bit2 = strtol(new_b, NULL, 2);
        data_out[i] = (unsigned char)bit1;
        data_out[i+1] = (unsigned char)bit2;
    }
    return OK;
}
```
```
strcpy(new_a, a1);
strcat(new_a, b1);
strcpy(new_b, a2);
strcat(new_b, b2);
bit1 = strtol(new_a, NULL, 2);
bit2 = strtol(new_b, NULL, 2);
```
any reason you're using these?
I was able to verify the encryption with the text:
IT - This is my sentence. i like to moove it moove it and if you can't move it i don't know!
OK this sitting is unaccatple if i like to moove it! can you moove-it? not a single
MOoVE. or re-moOvE ok? am i (moove) or mooveing?
Done!it
which was correctly encrypted to:
J4!
"k q`k3#
y sekװc®ce)ְi k‰ke#iאk¯kצa k4#
kןsֵ# q€c.a€k&#kץ#c.#פ#
kצa k4# #kמ#פ#kֿqב
OI`sˆk3#k4s‰kַ# q`s®c#cask…# aְi k‰ke#iאk¯kצa k4! caiְ{/q k¯kצak49אkֿq€a sikַk…
MKןRֵ)ְkע#ak¯KצA kכ9אc-# !k¯kצa©#q@k¯kצc©kַ8ךCkֵ#)p
seems like an incredibly easy way to accidentally add more characters on to the end of your strings
I used those to form the new bytes, for example combining the 5 MSBs of the first byte with the 3 MSBs of the second byte to get the encrypted first byte
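The same byte-pair mixing can likely be done with shifts and masks instead of building bit strings. The following is a minimal sketch under that assumption: `enc_bitwise` is a hypothetical name, the return codes are simplified to 0 and -1, and the layout (5 MSBs of the first byte plus 3 MSBs of the second, then 3 LSBs of the first plus 5 LSBs of the second) is inferred from the posted code.

```c
#include <stddef.h>

/* Hypothetical bitwise variant of the pairwise mixing described above.
   Returns 0 on success, -1 if either pointer is NULL. */
int enc_bitwise(const unsigned char *in, size_t n, unsigned char *out) {
    if (in == NULL || out == NULL) {
        return -1;
    }
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        unsigned char a = in[i];
        unsigned char b = in[i + 1];
        /* 5 MSBs of a, followed by 3 MSBs of b. */
        out[i] = (unsigned char)((a & 0xF8) | (b >> 5));
        /* 3 LSBs of a, followed by 5 LSBs of b. */
        out[i + 1] = (unsigned char)(((a & 0x07) << 5) | (b & 0x1F));
    }
    if (i < n) {
        out[i] = in[i]; /* odd length: last byte is copied unchanged */
    }
    return 0;
}
```

For the 7-byte input "A!5GFs/" this yields the 7 bytes 41 21 32 A7 43 D3 2F (hex). Two of those, A7 and D3, are outside ASCII. If text containing them were saved by an editor as UTF-8, each would take two bytes, which would give 9 bytes in total; that would match the ftell result reported earlier, though this is only a guess about how the test file was produced.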
ah wait I misread nevermind
I can see how you would get 1 extra character
for (i = 0; i < (int)size_in; i = i + 2) {
    // ...
    data_out[i] = (unsigned char)bit1;
    data_out[i+1] = (unsigned char)bit2;
if size is an odd number then i + 1 will write an extra byte past the end
which might overwrite a null terminator I suppose
and then you accidentally get n new characters
I think this segment of my code should cover this corner case:
for (i = 0; i < (int)size_in; i = i + 2) {
    if ((odd == 1) && (i == size_in - 1)) {
        data_out[i] = data_in[i];
        break;
    }
does it not?
If I'm currently on the last byte and size is odd, the last byte is copied unchanged and the break jumps out of the loop
ah, I didn't see that
welp, yeah, I mean, looks fine
are you expecting the output to be null-terminated?
and is it zeroed to start?
I don't think I understand the question, can you ask again please? 😅
The output is the encrypted text, the memory was allocated with malloc according to the length of the text (obtained with ftell)
but how is it output?
also ftell isn't always that reliable about the size of files, but that's probably fine here
If the function enc returns "OK", data_out will be written to a text file
Honestly I assumed ftell was the problem because it returned a value higher than expected
the strings that you sent are that length
Let me attach the code:
Memory allocation:
int allocate_buffer(void **buf, unsigned int buf_size) {
    // TODO
    if (buf == NULL)
        return ERR_NULL_PTR;
    *buf = (unsigned char*)malloc((buf_size) * sizeof(unsigned char));
    if (*buf == NULL) {
        return ERR_MEMORY;
    }
    return OK;
}
reading from input
int load_data_from_file(const char *input_file_path, unsigned char **buf,
                        unsigned int *buf_size) {
    // TODO
    if ((buf == NULL) || (input_file_path == NULL))
        return ERR_NULL_PTR;
    char c;
    FILE* fp = fopen(input_file_path, "r");
    if (fp == NULL)
        return ERR_FILE;
    for (int i = 0; i < (int)*buf_size; i++) {
        fscanf(fp, "%c", &c);
        (*buf)[i] = (unsigned char)c;
    }
    fclose(fp);
    return OK;
}
writing to output:
int write_data_to_file(const char *output_file_path, const unsigned char *buf,
                       unsigned int buf_size) {
    // TODO
    if ((buf == NULL) || (output_file_path == NULL))
        return ERR_NULL_PTR;
    FILE* fp = fopen(output_file_path, "w");
    if (fp == NULL)
        return ERR_FILE;
    for (int i = 0; i < (int)buf_size; i++) {
        fprintf(fp, "%c", (char)buf[i]);
    }
    fclose(fp);
    return OK;
}
so I'm more suspicious of how data is getting into the file
Hopefully I understood your question?
Okay what
no no no, that's not how you write data to files
look into fwrite and fread
you're not just writing strings, you're writing arbitrary data
and passing that through the character APIs might add extra characters for locales and things
how you print a character is not always the same as the character itself
I tried to use fread and fwrite but didn't get the right results. Should I use a loop with those?
no you just need 1 function
I tried to make sure the buffer is unsigned char so the actual bytes values stay the same
it's not that, it's that a char value may be different when printed
I believe
regardless, this is a terribly inefficient way to output
fwrite(buf, 1, buf_size, fp) something like that
and fread for reading in
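The fwrite and fread suggestion above can be sketched as follows. This is a minimal sketch, not the asker's required interface: `write_bytes` and `read_bytes` are hypothetical names with simplified return values. It opens the files in binary mode ("wb" and "rb"), on the assumption that the data is arbitrary bytes; in text mode on Windows, a 0x0A byte is typically written as 0x0D 0x0A and a 0x1A byte can be treated as end of file when reading.

```c
#include <stdio.h>

/* Writes exactly len bytes to path in binary mode.
   Returns 0 on success, -1 on any failure. */
int write_bytes(const char *path, const unsigned char *buf, size_t len) {
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        return -1;
    }
    size_t written = fwrite(buf, 1, len, fp);
    int close_failed = (fclose(fp) != 0);
    return (written == len && !close_failed) ? 0 : -1;
}

/* Reads up to cap bytes from path in binary mode.
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long read_bytes(const char *path, unsigned char *buf, size_t cap) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }
    size_t got = fread(buf, 1, cap, fp);
    fclose(fp);
    return (long)got;
}
```

A single call handles the whole buffer, so no loop is needed, and checking the return value of fwrite or fread shows how many bytes were actually transferred.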
Though for this text the output is right for a fact. So I assume the problem is with multi bytes characters?
idk but what you're doing is definitely wrong
I'll try to implement my functions with fread and fwrite, will update soon whether I get the expected results or not. Thank you!
It could also possibly be ftell ... I think it's unlikely. But you should probably switch to an alternative, since you don't actually want raw r and w; you want rb and wb, because you're not working with characters
and then ftell would be unreliable
The alternative here is probably fstat (or _fstat on Windows) to get the actual size of the file
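For reference, the fstat approach might look like the sketch below. It assumes a POSIX system (it relies on `fileno` and `<sys/stat.h>`, which are not part of standard C), and `file_size_posix` is a hypothetical helper name. The Windows `_fstat` variant is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Returns the size in bytes of an already-open file, or -1 on failure.
   POSIX-only: fileno() and fstat() are not standard C. */
long long file_size_posix(FILE *fp) {
    struct stat st;
    if (fp == NULL || fstat(fileno(fp), &st) != 0) {
        return -1;
    }
    return (long long)st.st_size;
}
```

The result is the size of the file on disk in bytes, independent of whether the stream was opened in text or binary mode.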
I changed my functions of reading and writing to the following:
int write_data_to_file(const char *output_file_path, const unsigned char *buf,
                       unsigned int buf_size) {
    // TODO
    if ((buf == NULL) || (output_file_path == NULL))
        return ERR_NULL_PTR;
    FILE* fp = fopen(output_file_path, "w");
    if (fp == NULL)
        return ERR_FILE;
    fwrite(buf, sizeof(unsigned char), buf_size, fp);
    fclose(fp);
    return OK;
}
int load_data_from_file(const char *input_file_path, unsigned char **buf,
                        unsigned int *buf_size) {
    // TODO
    if ((buf == NULL) || (input_file_path == NULL))
        return ERR_NULL_PTR;
    FILE* fp = fopen(input_file_path, "r");
    if (fp == NULL)
        return ERR_FILE;
    fread(*buf, sizeof(unsigned char), *buf_size, fp);
    fclose(fp);
    return OK;
}
The results are sadly the same, but this is definitely more efficient
I'll try that as well
there actually is no portable standard C way to determine the size of a file
which is a problem
but fstat is good enough
the problem is that with r or w the ftell result has no guaranteed meaning other than as a parameter to fseek (it's not necessarily the actual byte offset)
but with rb and wb you aren't guaranteed to be able to seek to the end
It seems that for fstat I need to include the following:
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
Which are not allowed in the task I'm doing
I don't mind using "r" and "w" instead of "rb" and "wb", it's just that I saw the same results for both of them
the problem is that you can't rely on r or w to determine the size of a file, because the ftell result isn't guaranteed to be a size
but you also can't rely on rb or wb, because fseek to the end isn't guaranteed to work, so ftell may not tell you the size
generally the only portable way with these is to allocate as you read
so you allocate a block, then try to read, then if you use up your space you reallocate it to be bigger, etc.
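The allocate-as-you-read approach described above can be sketched with only standard C. This is a minimal sketch: `read_all` is a hypothetical name, the starting capacity of 4096 and the doubling strategy are arbitrary choices, and the file is opened in binary mode on the assumption that the data is raw bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads an entire file into a heap buffer that grows as needed.
   On success returns the buffer (caller must free it) and stores the
   byte count in *out_len. Returns NULL on any failure. */
unsigned char *read_all(const char *path, size_t *out_len) {
    if (path == NULL || out_len == NULL) {
        return NULL;
    }
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }
    size_t cap = 4096; /* arbitrary starting capacity */
    size_t len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }
    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap) {
            break; /* short read: end of file or read error */
        }
        /* Buffer is full: double it and keep reading. */
        unsigned char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            fclose(fp);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    int failed = ferror(fp);
    fclose(fp);
    if (failed) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

With this approach the byte count comes from what fread actually returned, so neither ftell nor fstat is needed to size the buffer.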
I see. I'll try to wrestle with it a bit longer and hopefully I'll figure it out. Thank you for all your help! I appreciate it 🙏