#Text length inconsistency reading from file.


gaunt forge
#

Hey, I'm trying to read a text file and get the length of the text (the number of characters) using ftell. I know a newline is represented by 2 characters on Windows, and I adjusted the length accordingly. My problem is with "encoded" text. For example, for a file containing the text "abcd", ftell returns 4. For a file containing the encoded text ")ְ", ftell also returns 4, but those are 3 characters. I don't know how to account for this or why it happens. Each character is represented by 8 bits, and the encryption maps bytes 1:1, so the number of bytes doesn't change. I tried opening the file with both "r" and "rb" and got the same results. I would appreciate any help and insight on the matter, thanks!

fierce thunder
#

not every character is 1:1

#

what you're likely looking at here is UTF-8, where some characters take multiple bytes to encode

#

also, it's encoding, not encrypting; they are different things

#

;compile

std::cout << sizeof(")ְ");
heady lava
#

In order to properly count UTF-8 characters you would need to read through the entirety of the file.
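
A minimal sketch of what that scan could look like, assuming the bytes really are valid UTF-8 (which may not be the case for the file in question). `utf8_count` is a hypothetical helper name. It counts code points, not what a text editor shows as one glyph, so combining marks are still counted separately.

```c
#include <stddef.h>

/* Count UTF-8 code points in a byte buffer by skipping continuation
 * bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8;
 * malformed input is not detected. */
size_t utf8_count(const unsigned char *buf, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)   /* lead byte or plain ASCII */
            count++;
    }
    return count;
}
```

For plain ASCII the result equals the byte count; for multi-byte text it is smaller than what ftell or fread report.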

wild yokeBOT
#
Program Output
5
fierce thunder
#

see, it's 4 chars (+ 1 for the null terminator)
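
The same point as a small C sketch. The array names are made up for illustration, and the UTF-8 bytes are written as explicit escapes so the result doesn't depend on how the source file is saved.

```c
#include <stdio.h>
#include <string.h>

/* "\xC3\xA9" is the two-byte UTF-8 encoding of U+00E9 (one character). */
const char ascii_text[] = "abcd";
const char utf8_text[]  = "\xC3\xA9";

/* sizeof counts every byte of the array including the '\0';
 * strlen counts the bytes before the '\0'. Neither counts characters. */
void print_sizes(void) {
    printf("ascii: sizeof=%zu strlen=%zu\n",
           sizeof ascii_text, strlen(ascii_text));
    printf("utf8:  sizeof=%zu strlen=%zu\n",
           sizeof utf8_text, strlen(utf8_text));
}
```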

fierce thunder
#

send your implementation

gaunt forge
# fierce thunder send your implementation

This is the encryption algorithm:

```c
int enc(const unsigned char *data_in, unsigned int size_in, unsigned char *data_out, unsigned int size_out) {
    if ((data_in == NULL) || (data_out == NULL))
        return ERR_NULL_PTR;
    int j, i, bit1, bit2;
    char a1[6], a2[4], b1[4], b2[6];
    a1[5] = '\0';
    a2[3] = '\0';
    b1[3] = '\0';
    b2[5] = '\0';
    int odd = size_in % 2;
    printf("Size going in is %d", size_in);
    char new_a[9], new_b[9];
    for (i = 0; i < (int)size_in; i = i + 2) {
        if ((odd == 1) && (i == size_in - 1)) {
            data_out[i] = data_in[i];
            break;
        }
        for (j = 0; j < 8; j++) {
            if ((j >= 0) && (j < 5))
                a1[j] = (char)!!((data_in[i] << j) & 0x80) + 48;
            if ((j >= 5) && (j < 8))
                a2[j - 5] = (char)!!((data_in[i] << j) & 0x80) + 48;
            if ((j >= 0) && (j < 3))
                b1[j] = (char)!!((data_in[i + 1] << j) & 0x80) + 48;
            if ((j >= 3) && (j < 8))
                b2[j - 3] = (char)!!((data_in[i + 1] << j) & 0x80) + 48;
        }
        strcpy(new_a, a1);
        strcat(new_a, b1);
        strcpy(new_b, a2);
        strcat(new_b, b2);
        bit1 = strtol(new_a, NULL, 2);
        bit2 = strtol(new_b, NULL, 2);
        data_out[i] = (unsigned char)bit1;
        data_out[i+1] = (unsigned char)bit2;
    }
    return OK;
}
```
fierce thunder
#
```c
strcpy(new_a, a1);
strcat(new_a, b1);
strcpy(new_b, a2);
strcat(new_b, b2);
bit1 = strtol(new_a, NULL, 2);
bit2 = strtol(new_b, NULL, 2);
```
any reason you're using these?
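
For comparison, a sketch of how the same mixing could be done with shifts and masks instead of building "0"/"1" strings. This is my reading of the posted loop, not the asker's code, and `mix_pairs` is a hypothetical name. It assumes the output buffer holds at least `n` bytes.

```c
#include <stddef.h>

/* For each pair of input bytes (A, B):
 *   out[i]     = top 5 bits of A followed by top 3 bits of B
 *   out[i + 1] = low 3 bits of A followed by low 5 bits of B
 * A trailing unpaired byte (odd n) is copied unchanged. */
void mix_pairs(const unsigned char *in, unsigned char *out, size_t n) {
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        unsigned char a = in[i], b = in[i + 1];
        out[i]     = (unsigned char)((a & 0xF8) | (b >> 5));
        out[i + 1] = (unsigned char)(((a & 0x07) << 5) | (b & 0x1F));
    }
    if (i < n)
        out[i] = in[i];
}
```

With this reading, the pair "IT" becomes "J4", which matches the start of the sample output in this thread, and the pair " -" becomes '!' followed by byte 0x0D (a carriage return). So the output can contain any byte value, not just printable text.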
gaunt forge
#

I was able to verify the encryption with the text:

IT - This is my sentence. i like to moove it moove it and if you can't move it i don't know!
OK this sitting is unaccatple if i like to moove it! can you moove-it? not a single
MOoVE. or re-moOvE ok? am i (moove) or mooveing?
Done!it

which was correctly encrypted to:

J4!
"k    q`k3#
y sekװc®ce)ְi k‰ke#iאk¯kצa k4#
kןsֵ#    q€c.a€k&#kץ#c.#פ#
kצa k4#    #kמ#פ#kֿqב
OI`sˆk3#k4s‰kַ#    q`s®c#cask…#    aְi k‰ke#iאk¯kצa k4! caiְ{/q k¯kצa­k49אkֿq€a sikַk…
MKןRֵ)ְkע#a­k¯KצA kכ9אc-#    !k¯kצa©#q@k¯kצc©kַ8ךCkֵ#)pŠ
fierce thunder
#

ah wait I misread nevermind

#

I can see how you would get 1 extra character

#
for (i = 0; i < (int)size_in; i = i + 2) {
  // ...
  data_out[i] = (unsigned char)bit1;
  data_out[i+1] = (unsigned char)bit2;

if size_in is odd, then on the last iteration data_out[i + 1] writes one extra byte at the end

#

which might overwrite a null terminator I suppose

#

and then you accidentally get n new characters

gaunt forge
#

I think this segment of my code should cover this corner case:

    for (i = 0; i < (int)size_in; i = i + 2) {
        if ((odd == 1) && (i == size_in - 1)) {
            data_out[i] = data_in[i];
            break;
        }

does it not?

#

If I'm currently on the last byte and size is odd, the last byte is copied unchanged and the break jumps out of the loop

fierce thunder
#

ah I didn't see that

#

welp yeah I mean looks fine

#

are you expecting the output to be null terminated?

#

and is it zeroed to start?

gaunt forge
#

I don't think I understand the question, can you ask again please? 😅

fierce thunder
#

also ftell isn't always that reliable about the size of files, but that's probably fine here

fierce thunder
#

how though

#

how is the buffer written

gaunt forge
# fierce thunder how is the buffer written

Let me attach the code:
Memory allocation:

int allocate_buffer(void **buf, unsigned int buf_size) {
  // TODO
    if (buf == NULL)
        return ERR_NULL_PTR;
    *buf = (unsigned char*)malloc((buf_size) * sizeof(unsigned char));
    if (*buf == NULL) {
        return ERR_MEMORY;
    }
    return OK;
}

reading from input

int load_data_from_file(const char *input_file_path, unsigned char **buf,
                        unsigned int *buf_size) {
  // TODO
    if ((buf == NULL) || (input_file_path == NULL))
        return ERR_NULL_PTR;
    char c;
    FILE* fp = fopen(input_file_path, "r");
    if (fp == NULL)
        return ERR_FILE;
    for (int i = 0; i < (int)*buf_size; i++) {
        fscanf(fp, "%c", &c);
        (*buf)[i] = (unsigned char)c;
    }
    fclose(fp);
    return OK;
}

writing to output:

int write_data_to_file(const char *output_file_path, const unsigned char *buf,
                       unsigned int buf_size) {
  // TODO
    if ((buf == NULL) || (output_file_path == NULL))
        return ERR_NULL_PTR;
    FILE* fp = fopen(output_file_path, "w");
    if (fp == NULL)
        return ERR_FILE;
    for (int i = 0; i < (int)buf_size; i++) {
        fprintf(fp, "%c", (char)buf[i]);
    }
    fclose(fp);
    return OK;
}
fierce thunder
#

so I'm more suspicious of how data is getting into the file

gaunt forge
#

Hopefully I understood your question?

fierce thunder
#

Okay what

#

no no no that's not how you write data to files

#

look into fwrite and fread

#

you're not just writing strings you're writing arbitrary data

#

and trying to pass that through the character APIs in text mode might alter the bytes, e.g. newline translation on Windows or locale-dependent handling

#

how you print a character is not always the same as the character itself

gaunt forge
#

I tried to use fread and fwrite but didn't get the right results. Should I use a loop with those?

fierce thunder
#

no you just need 1 function

fierce thunder
#

I believe

#

regardless this is a terribly inefficient way to output

#

fwrite(buf, 1, buf_size, fp) something like that

#

and fread for reading in
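
Roughly like this, as a sketch rather than a drop-in replacement: `write_bytes` and `read_bytes` are hypothetical names, and the error handling is simplified compared to the ERR_* codes in the posted functions. Note the "wb" and "rb" modes, which keep the C library from translating newline bytes on platforms where text mode does that.

```c
#include <stdio.h>

/* Write len raw bytes to path. Returns 0 on success, -1 on failure. */
int write_bytes(const char *path, const unsigned char *buf, size_t len) {
    FILE *fp = fopen(path, "wb");          /* binary: bytes go out as-is */
    if (fp == NULL)
        return -1;
    size_t written = fwrite(buf, 1, len, fp);
    int close_failed = (fclose(fp) != 0);  /* fclose flushes, so check it */
    return (written == len && !close_failed) ? 0 : -1;
}

/* Read up to cap bytes from path into buf.
 * Returns the number of bytes actually read, or (size_t)-1 if the
 * file could not be opened. */
size_t read_bytes(const char *path, unsigned char *buf, size_t cap) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return (size_t)-1;
    size_t got = fread(buf, 1, cap, fp);   /* may be less than cap at EOF */
    fclose(fp);
    return got;
}
```

The return value of fread is the actual byte count, so it can be used as the length instead of relying on a size computed beforehand.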

fierce thunder
#

idk but what you're doing is definitely wrong

fierce thunder
#

It could also possibly be ftell ... I think that's unlikely. But you should probably switch to an alternative anyway: you don't want plain r and w, you want rb and wb, since you're working with raw bytes rather than characters

#

and then seeking to the end to call ftell isn't guaranteed to work

#

The alternative here is probably fstat (or _fstat on Windows) to get the actual size of the file
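
A minimal POSIX-only sketch of that idea. `file_size_fstat` is a hypothetical name, and on Windows the rough equivalents would be _fileno and _fstat. It reports the size as the OS sees it, so a stream with unflushed writes should be flushed first.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose fileno() under strict -std=c11 */
#endif
#include <stdio.h>
#include <sys/stat.h>

/* Return the size in bytes of the file behind fp, or -1 on error.
 * Not standard C: relies on the POSIX calls fileno() and fstat(). */
long long file_size_fstat(FILE *fp) {
    struct stat st;
    if (fp == NULL || fstat(fileno(fp), &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```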

gaunt forge
# fierce thunder It could also possibly be `ftell` ... I think that's unlikely. But you should probab...

I changed my functions of reading and writing to the following:

int write_data_to_file(const char *output_file_path, const unsigned char *buf,
                       unsigned int buf_size) {
  // TODO
    if ((buf == NULL) || (output_file_path == NULL))
        return ERR_NULL_PTR;
    FILE* fp = fopen(output_file_path, "w");
    if (fp == NULL)
        return ERR_FILE;
    fwrite(buf, sizeof(unsigned char), buf_size, fp);
    fclose(fp);
    return OK;
}

int load_data_from_file(const char *input_file_path, unsigned char **buf,
                        unsigned int *buf_size) {
  // TODO
    if ((buf == NULL) || (input_file_path == NULL))
        return ERR_NULL_PTR;
    FILE* fp = fopen(input_file_path, "r");
    if (fp == NULL)
        return ERR_FILE;
    fread(*buf, sizeof(unsigned char), *buf_size, fp);
    fclose(fp);
    return OK;
}

The results are sadly the same, but this is definitely more efficient

fierce thunder
#

there actually is no standard C way to determine the size of a file

#

which is a problem

#

but fstat is good enough

#

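the problem is that with r or w the value ftell returns has no meaning other than as a parameter to fseek (it's not necessarily the actual byte offset)
but with rb and wb you aren't guaranteed to be able to seek to the end (the C standard says a binary stream need not meaningfully support SEEK_END)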
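
For reference, this is the usual fseek/ftell idiom that caveat applies to, as a sketch with a hypothetical `file_size_ftell` name. It tends to work on mainstream desktop platforms, but as said, the standard doesn't promise it, and the result is limited to what fits in a long.

```c
#include <stdio.h>

/* Return the file size in bytes, or -1 on failure.
 * Opens in binary mode so ftell reports a byte offset; seeking to
 * SEEK_END on a binary stream is not guaranteed by the C standard. */
long file_size_ftell(const char *path) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);              /* ftell itself returns -1L on error */
    fclose(fp);
    return size;
}
```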

gaunt forge
#

It seems that for fstat I need to include the following:

#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

Which are not allowed in the task I'm doing

fierce thunder
#

generally the only way with these is to allocate as you read

#

so you allocate a block, then try to read, then if you use up your space you reallocate it to be bigger, etc
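
Something along these lines, using only stdio.h and stdlib.h. This is a sketch under my own assumptions: `read_all` is a hypothetical name, the 4096-byte starting size is arbitrary, and the capacity is doubled each time the buffer fills up.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a heap buffer that grows on demand.
 * On success returns the buffer (caller must free it) and stores the
 * byte count in *out_len. Returns NULL on any failure. */
unsigned char *read_all(const char *path, size_t *out_len) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)
            break;                     /* short read: EOF or error */
        unsigned char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            fclose(fp);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }

    int failed = ferror(fp);           /* tell EOF apart from a read error */
    fclose(fp);
    if (failed) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

The length comes from what fread actually returned, so no separate file-size query is needed.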

gaunt forge