# Help with Harvard PSET 4


river crane
#

I'll attach some context in a comment because of the word limit!

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Usage: ./recover FILE\n");
        return 1;
    }

    // Open the memory card
    FILE *card = fopen(argv[1], "r");

    // Create a buffer for a block of data
    uint8_t buffer[512];
    int jpeg_count = 0;
    FILE *jpeg_file = NULL;

    // While there's still data left to read from the memory card
    while (fread(buffer, 1, 512, card) == 512)
    {
        // To create JPEG from the data do this:
        // Look for the start of a JPEG
        for (int i = 0; i < 512; i++)
        {
            if (buffer[i] == 0xff && buffer[i + 1] == 0xd8 && buffer[i + 2] == 0xff && (buffer[i + 3] & 0xf0) == 0xe0)
            {
                // Open a new JPEG file
                char jpeg_filename[100];
                sprintf(jpeg_filename, "%03d.jpg", jpeg_count++);
                jpeg_file = fopen(jpeg_filename, "w");

                // Write bytes of data from that JPEG until a new JPEG is found (identified from the start of a JPEG)
                do
                {
                    fwrite(buffer, 1, 512, jpeg_file);
                }
                while (fread(buffer, 1, 512, card) == 512 && !(buffer[i] == 0xff && buffer[i + 1] == 0xd8 && buffer[i + 2] == 0xff && (buffer[i + 3] & 0xf0) == 0xe0));
                // Above line of code checks again to ensure that we have data to read. Also checks that there is no JPEG signature so that it continues writing until true

                fclose(jpeg_file);
            }
        }
    }

    // This line should close the final image (image 49) since there will not be another JPEG signature.
    if (jpeg_count > 0)
    {
        fclose(jpeg_file);
    }

    fclose(card);
}
```
#

Hi there! My program is supposed to recover deleted JPEGs from a file, and for the most part, it works. However, there are 49 JPEGs to recover, and my program only recovers 24 of them.
I think the problem is in this part of my code:

```c
do
{
    fwrite(buffer, 1, 512, jpeg_file);
}
while (fread(buffer, 1, 512, card) == 512 && !(buffer[i] == 0xff && buffer[i + 1] == 0xd8 && buffer[i + 2] == 0xff && (buffer[i + 3] & 0xf0) == 0xe0));
// Above line of code checks again to ensure that we have data to read. Also checks that there is no JPEG signature so that it continues writing until true
```

When I remove `fread(buffer, 1, 512, card)` from that condition, the program writes all 49 JPEGs, but they all display as blank. I'm a little confused about what to do. I've been trying to think of solutions for 3 days now, so I decided to get some help! Thank you for reading this!

For context on the code: I read the file with the deleted JPEGs 512 bytes at a time, as long as there is still a full block to read. For each block, I look for the JPEG signature. When I find it, I create a new JPEG file and write into it until I find the next JPEG signature, at which point I close the current JPEG file and open a new one (as long as there is still data to read).

fossil lintel
#

Algorithm machine broken

river crane
#

ahh, what do you mean?

fossil lintel
#

Detecting the jpegs / jpeg headers works correctly, right?

river crane
#

Yes, all the files load fine, but it abruptly stops at 24 and I can't figure out why. I attached the output.

#

There are supposed to be 49 JPEGs

fossil lintel
#

No idea if you're on Windows or Linux, but if I remember correctly it's good practice to use "rb" and "wb" for binary files on Windows. Not sure if it makes any difference these days, though.
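A minimal sketch of the binary-mode idea, assuming a hypothetical helper `binary_round_trip_ok` and a scratch file name chosen by the caller: it writes bytes with `"wb"`, reads them back with `"rb"`, and checks that they came back unchanged. With the Microsoft C runtime, text mode (`"r"`/`"w"`) translates `\n` into `\r\n` on output and treats 0x1a (Ctrl+Z) as end-of-file on input, which can corrupt or truncate binary data like JPEGs; on Linux and macOS the `b` is accepted but has no effect.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BYTES 512

// Hypothetical helper: writes `len` bytes to `path` with "wb", reads them back
// with "rb", deletes the file, and returns 1 only if exactly the same bytes
// came back. Handles at most MAX_BYTES bytes.
int binary_round_trip_ok(const char *path, const unsigned char *data, size_t len)
{
    if (len > MAX_BYTES)
    {
        return 0;
    }

    FILE *out = fopen(path, "wb");  // binary: no newline translation
    if (out == NULL)
    {
        return 0;
    }
    size_t written = fwrite(data, 1, len, out);
    fclose(out);

    FILE *in = fopen(path, "rb");   // binary: 0x1a and "\r\n" come back as-is
    if (in == NULL)
    {
        remove(path);
        return 0;
    }
    unsigned char back[MAX_BYTES + 1];  // one spare byte to detect extra data
    size_t got = fread(back, 1, sizeof back, in);
    fclose(in);
    remove(path);

    return written == len && got == len && memcmp(back, data, len) == 0;
}
```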

river crane
#

oh sorry, windows

fossil lintel
#

I just dug up some old implementation when I did that assignment for fun a few years ago

river crane
#

I'll change those, good point

#

For FUN!?

#

I aspire to this level

fossil lintel
#

Could it be that the individual images are null-padded by any chance?

river crane
#

I didn't see anything about that in the assignment instructions, but could you describe what that means?

fossil lintel
#

Let me just grab the assignment again

river crane
#

Ah okay, ty

fossil lintel
#

Ah yeah

#

I was wondering why my solution removes them

river crane
#

OHH, so null padding is just a bunch of 0s filling the space that doesn't hold meaningful data, got it

fossil lintel
#

I have a feeling you're mashing together two images into one file

river crane
#

let me open the hex editor, ty

#

opening the card.raw, right?

fossil lintel
#

One of the output images

river crane
#

Ah okay

#

at the end of the JPEG I see some 0s

fossil lintel
#

Yeah, that's the null padding (but it's fine if we have that in there)

#

Can you spot the JPEG magic bytes at the top of the file?

river crane
#

Good to know

#

Is magic bytes the JPEG signature?

fossil lintel
#

Yeah sorry, the signature is also called magic / magic bytes

#

Basically the 4 values your code is using in the comparisons already

#

And now the real question: do those bytes occur once or twice in your file?

river crane
#

That's the start

fossil lintel
#

Yeah so at the start we see the JPEG signature

Do we see it somewhere in the middle again?

fossil lintel
#

0xff, 0xd8, 0xff, 0xe0

#

or whatever it is for jpeg
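For reference, a small sketch of the signature test the code already uses, pulled into a hypothetical helper `is_jpeg_header`. Per the assignment, the fourth byte can be anything from 0xe0 to 0xef, which is why `buffer[3] & 0xf0` is compared with `0xe0`: the mask keeps only the high four bits.

```c
#include <stdbool.h>
#include <stdint.h>

// Returns true if `block` starts with 0xff 0xd8 0xff followed by a byte in
// 0xe0..0xef. Masking with 0xf0 clears the low four bits, so every value
// from 0xe0 through 0xef becomes 0xe0.
bool is_jpeg_header(const uint8_t *block)
{
    return block[0] == 0xff &&
           block[1] == 0xd8 &&
           block[2] == 0xff &&
           (block[3] & 0xf0) == 0xe0;
}
```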

#

Yeah those are the first 4 bytes of the file

#

My suspicion is that you kind of mash two of them together

river crane
#

So searching for just the first three bytes (0xff, 0xd8, 0xff), since I know the fourth byte can vary, I get only 1 result. With the fourth byte too, JFIF (0xff, 0xd8, 0xff, 0xe0), still only one result. But I'm only looking at one JPEG right now, and this JPEG loaded without issues.

fossil lintel
#

Ah I know the issue

#

😈

river crane
#

I'm thinking the issue must happen at JPG 25, right?

#

TELL ME

fossil lintel
#

Just looked at your code, you have a for-loop from i = 0 to i = 511

#

Then for each i you check whether the magic starts at offset i, and if it does, you save out the entire buffer

river crane
#

Right right

fossil lintel
#

Let's say the image starts at offset 100

river crane
#

You think I'm checking for too many values in one byte or...?

fossil lintel
#

In that case the buffer holds 100 bytes of the previous image followed by the start of the next image, and all 512 bytes get written to the new file

#

I think you don't even need that loop

river crane
#

HUH

fossil lintel
#

Based on the assignment

#

We can be sure that each JPEG starts at the beginning of a 512-byte block, so the magic bytes can only appear at the start of the buffer

river crane
#

Ohhh, I didn't consider that. I figured it could be anywhere from byte 1 to 512...

#

Let me remove the for-loop and test it again

fossil lintel
#

Yeah, whenever a block starts a new JPEG, the magic bytes are in the first four bytes of our buffer
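A sketch of what checking only block starts looks like, assuming the card image has already been loaded into memory (`count_jpeg_starts` and `BLOCK_SIZE` are hypothetical names). Each 512-byte block is tested once at offset 0, so there is no inner loop over offsets, and signature bytes that happen to show up mid-block are ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 512

// Counts JPEG starts in an in-memory card image that is `nblocks` blocks long.
// Only the first four bytes of each block are examined, because every JPEG
// begins at the start of a 512-byte block.
size_t count_jpeg_starts(const uint8_t *data, size_t nblocks)
{
    size_t count = 0;
    for (size_t b = 0; b < nblocks; b++)
    {
        const uint8_t *block = data + b * BLOCK_SIZE;
        if (block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
            (block[3] & 0xf0) == 0xe0)
        {
            count++;
        }
    }
    return count;
}
```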

river crane
#

I see, ty ty

#

I'm just adjusting some things, have to initialize my int i now

#

Actually...

#

If the magic bytes can only occur at the start of the buffer, could I change this:

```c
if (buffer[i] == 0xff && buffer[i + 1] == 0xd8 && buffer[i + 2] == 0xff && (buffer[i + 3] & 0xf0) == 0xe0)
```

to just use indexes 0, 1, 2, 3?

```c
if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0)
```

#

Not byte, I mean buffer, lol

river crane
#

OK, you were right: the for-loop wasn't needed, and everything loaded the same. Still only up to 24 though. I'm thinking the issue lies somewhere in this concoction I made:

```c
do
{
    fwrite(buffer, 1, 512, jpeg_file);
}
while (fread(buffer, 1, 512, card) == 512 && !(buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0));

fclose(jpeg_file);
```

fossil lintel
#

So what does the code look like now?

river crane
#

Let me attach it

#
```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Usage: ./recover FILE\n");
        return 1;
    }

    // Open the memory card
    FILE *card = fopen(argv[1], "rb");

    // Create a buffer for a block of data
    uint8_t buffer[512];
    int jpeg_count = 0;
    FILE *jpeg_file = NULL;

    // While there's still data left to read from the memory card
    while (fread(buffer, 1, 512, card) == 512)
    {
        if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0)
        {
            // Open a new JPEG file
            char jpeg_filename[100];
            sprintf(jpeg_filename, "%03d.jpg", jpeg_count++);
            jpeg_file = fopen(jpeg_filename, "wb");

            // Write bytes of data from that JPEG until a new JPEG is found (identified from the start of a JPEG)
            do
            {
                fwrite(buffer, 1, 512, jpeg_file);
            }
            while (fread(buffer, 1, 512, card) == 512 && !(buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0));

            fclose(jpeg_file);
        }
    }
    if (jpeg_count > 0)
    {
        fclose(jpeg_file);
    }

    fclose(card);
}
```
#

so funny trying to get it to paste correctly

fossil lintel
#

Now it gets a bit tricky

#

The idea is that we:

  1. Open the JPEG output file we're currently writing to
  2. Read bytes and write them into the current output file
  3. If we encounter our magic bytes, we close the output file and open a new one
  4. We go back to 2. if there's stuff left to read

river crane
#

Right yes

#

I'm still here, just thinking it through in my head and trying some things out, so you know

fossil lintel
#

All good, that (ideal) approach is a bit different than what you currently have

#

Essentially we keep the current image file open at all times and just dump our buffer into that file until we see the magic bytes
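A minimal sketch of that approach, assuming the CS50 setup (512-byte blocks, output files named `###.jpg` in the current directory). `recover_jpegs` and `starts_jpeg` are hypothetical names. There is exactly one `fread` per block, the open file is only swapped when a new signature shows up, and the final image is closed once, after the loop.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

// True if the block begins with 0xff 0xd8 0xff and a fourth byte 0xe0..0xef
static int starts_jpeg(const uint8_t *block)
{
    return block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
           (block[3] & 0xf0) == 0xe0;
}

// Copies each JPEG on `card` into 000.jpg, 001.jpg, ... in the current
// directory. Returns the number of files created, or -1 if one can't be opened.
int recover_jpegs(FILE *card)
{
    uint8_t buffer[BLOCK_SIZE];
    FILE *jpeg_file = NULL;  // the image currently being written, if any
    int jpeg_count = 0;

    // The only fread: every block is read, and examined, exactly once
    while (fread(buffer, 1, BLOCK_SIZE, card) == BLOCK_SIZE)
    {
        if (starts_jpeg(buffer))
        {
            // A new image starts here, so finish the previous one first
            if (jpeg_file != NULL)
            {
                fclose(jpeg_file);
            }

            char filename[16];
            snprintf(filename, sizeof filename, "%03d.jpg", jpeg_count);
            jpeg_file = fopen(filename, "wb");
            if (jpeg_file == NULL)
            {
                return -1;
            }
            jpeg_count++;
        }

        // Blocks before the first signature are skipped; every later block,
        // null padding included, belongs to the image that is open
        if (jpeg_file != NULL)
        {
            fwrite(buffer, 1, BLOCK_SIZE, jpeg_file);
        }
    }

    // The last image has no signature after it, so close it here, once
    if (jpeg_file != NULL)
    {
        fclose(jpeg_file);
    }
    return jpeg_count;
}
```

In `main`, opening the card with `fopen(argv[1], "rb")`, checking for `NULL`, calling `recover_jpegs(card)` and then `fclose(card)` would wire it up.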

river crane
#

Yes!

#

So if the issue is that I began to mash two JPGs together, and that's why it couldn't find any more JPEGs after 24, why would that happen? I would think that my loop here:

```c
do
{
    fwrite(buffer, 1, 512, jpeg_file);
}
while (fread(buffer, 1, 512, card) == 512 && !(buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0));
```

...would ensure that a JPEG header isn't missed. I'm not checking for null because the card is null-padded... I'm not sure how a header could get lost

fossil lintel
#

You don't need to check for null padding

river crane
#

Right of course

fossil lintel
#

The issue is that you read twice

#

I think (?)

#

```c
while (fread(buffer, 1, 512, card) == 512 && !(buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0));
```

#

this loop breaks when you read the buffer and it has the start of a new image

#

Then we fclose our file and do the outer loop again

In the outer loop we read 512 bytes again, so the block we just read, the one with the new image's header, gets ignored (?)
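To make the double read concrete, here is a hypothetical in-memory simulation (a `next_block` cursor stands in for `fread`, and no files are written). It follows the same control flow as the posted program: the inner `do`-`while` stops right after reading the next header block, and the outer `while` then reads past it, so the image after each recovered one is skipped. With back-to-back images, that loses roughly every other JPEG.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 512

// True if the block begins with 0xff 0xd8 0xff and a fourth byte 0xe0..0xef
static int starts_jpeg(const uint8_t *block)
{
    return block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
           (block[3] & 0xf0) == 0xe0;
}

// Stand-in for fread: points *out at block number *pos, advances *pos,
// and returns 0 once every block has been consumed
static int next_block(const uint8_t *data, size_t nblocks, size_t *pos,
                      const uint8_t **out)
{
    if (*pos >= nblocks)
    {
        return 0;
    }
    *out = data + *pos * BLOCK_SIZE;
    (*pos)++;
    return 1;
}

// Mirrors the posted loops: the inner do-while exits holding the next
// header, then the outer while reads again and never examines that header
size_t count_with_double_read(const uint8_t *data, size_t nblocks)
{
    size_t pos = 0;
    size_t count = 0;
    const uint8_t *buffer;

    while (next_block(data, nblocks, &pos, &buffer))
    {
        if (starts_jpeg(buffer))
        {
            count++;  // a new ###.jpg would be opened here
            do
            {
                // fwrite(buffer, ...) would go here
            }
            while (next_block(data, nblocks, &pos, &buffer) && !starts_jpeg(buffer));
        }
    }
    return count;
}

// Single-read version: every block is examined exactly once
size_t count_with_single_read(const uint8_t *data, size_t nblocks)
{
    size_t pos = 0;
    size_t count = 0;
    const uint8_t *buffer;

    while (next_block(data, nblocks, &pos, &buffer))
    {
        if (starts_jpeg(buffer))
        {
            count++;
        }
    }
    return count;
}
```

With 4 back-to-back images of 3 blocks each, `count_with_single_read` finds 4 and `count_with_double_read` finds 2.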

river crane
#

Ahhh, so you're thinking that I'm skipping bytes with the double fread? When I remove the fread from the while loop I get weirder outputs: I get all 49 images, but they're all blank

fossil lintel
#

I think you're double reading, yes.

river crane
#

I do have a double fclose going on (effectively a double free). That last fclose in my program is supposed to close the final JPEG, since there won't be another set of magic bytes after it. Not sure if it's unnecessary and I overthought it
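One common way to avoid the double `fclose` (calling `fclose` twice on the same `FILE *` is undefined behavior) is to set the pointer to `NULL` whenever a file is closed, and only close it if it is non-`NULL`. A sketch with a hypothetical helper `close_if_open`:

```c
#include <stdio.h>

// Hypothetical helper: closes *file if it is open and sets it to NULL, so a
// later call (for example the "close the final JPEG" step after the loop)
// does nothing instead of calling fclose on an already-closed stream.
void close_if_open(FILE **file)
{
    if (*file != NULL)
    {
        fclose(*file);
        *file = NULL;
    }
}
```

Calling `close_if_open(&jpeg_file)` both inside the loop and after it keeps the "close the final JPEG" step without closing anything twice.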

river crane
#

Hmmm okay, let me try some things without the double read

#

Going through these blank images with the hex editor, they're REALLY short, like hardly even one full page.
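That would fit the version with `fread` removed from the condition, assuming the condition became just the signature test: `buffer` never changes inside the `do`-`while` and still holds the header that was just written, so the loop exits after one `fwrite`. A hypothetical in-memory simulation (`simulate_without_inner_fread` is an illustrative name, and no files are written):

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 512

// True if the block begins with 0xff 0xd8 0xff and a fourth byte 0xe0..0xef
static int starts_jpeg(const uint8_t *block)
{
    return block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
           (block[3] & 0xf0) == 0xe0;
}

// Simulates the variant whose do-while condition is only the signature test.
// Records blocks written per file in blocks_per_file (up to max_files
// entries) and returns how many files would have been created.
size_t simulate_without_inner_fread(const uint8_t *data, size_t nblocks,
                                    size_t *blocks_per_file, size_t max_files)
{
    size_t files = 0;

    // Outer loop: the program's only remaining fread, one block per pass
    for (size_t b = 0; b < nblocks; b++)
    {
        const uint8_t *buffer = data + b * BLOCK_SIZE;
        if (starts_jpeg(buffer))
        {
            size_t written = 0;
            do
            {
                written++;  // stands in for fwrite(buffer, 1, 512, jpeg_file)
            }
            while (!starts_jpeg(buffer));  // false at once: buffer is a header

            if (files < max_files)
            {
                blocks_per_file[files] = written;
            }
            files++;
        }
    }
    return files;
}
```

Under that assumption, each output file would be a single 512-byte block holding only the start of an image, which would explain why every file looks blank.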

river crane
#

Trying out new things, but for some reason my JPEGs are empty now lol. Respond whenever you want, no rush :)