Decompressing text | Together C & C++ | Page 1

leaden hill Dec 21, 2024, 1:34 AM

#

I need help decompressing this text.

📎 compressed_1.txt

fast anchorBOT Dec 21, 2024, 1:34 AM

#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.

leaden hill Dec 21, 2024, 1:34 AM

#

here is my code:

#

📎 decompression_debug.cpp

#

I don't think it's doing it properly, though. I'm not sure what the issue is.

small light Dec 21, 2024, 1:42 AM

#

first you need to tell us how the compression algorithm is supposed to work

leaden hill Dec 21, 2024, 1:49 AM

#

small light first you need to tell us how the compression algorithm is supposed to work

I can show you the instructions my professor gave

#

Is that fine?

strong sequoia Dec 21, 2024, 2:20 AM

#

Your code appears to search for a digit (0-9) and if found repeat the previous character by that amount. However the compressed file looks like this:

jeffwh@m93p:~/git/tests/decompression$ cat compressed.txt 
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope
 8xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
 8xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 8xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 4<soapenv:Header>
 8<ns1:RequestHeader
 12soapenv:actor="http://schemas.xmlsoap.org/soap/actor/next"
 12soapenv:mustUnderstand="0"
 12xmlns:ns1="https://www.thehartford.claimscenter.com/apis/policy/search/v230456">
 16<ns1:networkCode>111212313</ns1:networkCode>
 16<ns1:applicationName>AgentManagerApi</ns1:applicationName>
 16<ns1:applicationProtection>~255~255~255~255~80</ns1:applicationProtection>
 8</ns1:RequestHeader>
 4</soapenv:Header>
 4<soapenv:Body>
 16<PolicySearchCriteria xmlns="urn:com.thehartford.claimscenter:policysearch.types">
 24<LOBSearchType_Ext>personal</LOBSearchType_Ext>
 24<LossDate>2014-08-08</LossDate>
 24<LossState>CT</LossState>
 24<LossType>AUTO</LossType>
 24<PolicyNumber>55PHT76</PolicyNumber>
 24<SearchType>P</SearchType>
 16</PolicySearchCriteria>
 4</soapenv:Body>
</soapenv:Envelope>

As you can see, there are digits besides those that represent character repeat counts. Second, this file appears to have multiple lines with two-digit values rather than a single digit. In most cases these "repeat" counts are preceded by a space character and tend to appear only at the start of a line, which suggests to me that we are only attempting to compress leading whitespace on each line and not anywhere else in the file.

Another interesting observation is that every line appears to end with '\r\n\177'. Is this expected?

#

As far as I know, XML should contain only printable characters, so the \177 (0x7f) characters would not belong there. So they must either be part of the compression scheme or you have a corrupted file.

#

Perhaps the compression is 0x7f followed by the compressed character followed by an integer indicating the number of times to repeat the compressed character?

leaden hill Dec 21, 2024, 5:53 AM

#

strong sequoia Perhaps the compression is 0x7f followed by the compressed character followed by...

my friend said smth similar

#

based on the code i have above

#

I have part of it, but I'm not using the 0x7f character in the comparison.

small light Dec 21, 2024, 5:53 AM

#

so what is the assignment?

#

it must be explaining the compression scheme

#

otherwise the whole exercise (for you and for us) is pointless

leaden hill Dec 21, 2024, 5:58 AM

#

@small light

#

📎 P1-RunlengthDecode-2_2.docx

small light Dec 21, 2024, 5:59 AM

#

please extract the relevant explanation of the compression scheme and post it as text here

#

I don't want to download a whole document

leaden hill Dec 21, 2024, 6:00 AM

#

1 Goals of this Assignment

To identify a compiler and an IDE that will meet your needs for this course.
To write a program in C++.
To use file I/O in C++ (open, read a char, test for eof).
To use char input that does NOT skip whitespace.
To use an acceptable coding style: Please see the style sheet.
To understand and use a type cast.

#

Most people now use rar or zip to compress files. These applications use a combination of compression mechanisms, in sequence. This assignment and the next explore the simplest compression scheme we have. Later this term we will implement a more complex compression scheme.

#

2.1 Run-Length Compression
Run-length Compression is effective on text or binary files that have the same byte repeated over and over. Think of a file containing a table of numbers: it has lots of consecutive space characters, and may have a repeated filler character, such as a ‘.’ . You will implement this simple kind of compression in Program 2. However, the algorithm to decompress the file is easier, so I am asking you to do the decompression first.

#

Runs. In this scheme, any “run” of the same character (4 or more identical consecutive bytes) is replaced by a triplet of bytes, consisting of

An escape character. We will use 0x7f, which is sometimes called “esc”. It is a non-printing ASCII character.
The letter that has been repeated.
A 1-byte count of the number of repetitions.
In addition, any esc character, or run of them, that occurs in the input must also be replaced by a triplet: esc esc count .
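The triplet format can be made concrete with a short C++ sketch of the compression side as described above. It is not the professor's reference code: the names `compress`, `emit_triplet` and `kEsc` are hypothetical, and splitting runs longer than 255 into several triplets is an assumption, since the text only says the count is a single byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const char kEsc = '\x7f';  // escape byte from the assignment (0x7f)

// Appends one esc/letter/count triplet; the count must fit in one byte.
void emit_triplet(std::string& out, char letter, std::size_t count) {
    assert(count >= 1 && count <= 255);
    out += kEsc;
    out += letter;
    out += static_cast<char>(static_cast<unsigned char>(count));
}

// Runs of 4+ identical bytes, and any esc byte or run of them, become triplets.
// Assumption: runs longer than 255 are split into several triplets.
std::string compress(const std::string& input) {
    std::string out;
    std::size_t i = 0;
    while (i < input.size()) {
        const char c = input[i];
        std::size_t run = 1;
        while (i + run < input.size() && input[i + run] == c && run < 255) {
            ++run;
        }
        if (run >= 4 || c == kEsc) {
            emit_triplet(out, c, run);
        } else {
            out.append(run, c);  // short runs are copied unchanged
        }
        i += run;
    }
    return out;
}
```

For example, eight spaces followed by `x` become the four bytes `0x7f`, `' '`, `0x08`, `'x'`, while `"aaa"` stays as it is because a run of three is too short to be worth a triplet.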

3 To do: Expand a Compressed File
I will give you a compressed file; your job is to restore it to the uncompressed representation. The algorithm is a loop that reads chars one at a time until eof is reached:

Read an input character named my_character (do not skip whitespace) from compressed.txt and quit if end of file happens.
If my_character is NOT an escape character, output it to the console and to a file: console_output.ext. Continue reading the next character in the loop.
If my_character IS an escape character, read two more chars: the first will be a letter, the second is the count.
Cast the count from type char to type unsigned short int and use it to output that many copies of the letter.
Continue reading the next character in the loop.
Please note: if you handle the end of file wrong, the last character of the output will be wrong. You must check for eof immediately after reading a character, not before.
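A minimal C++ sketch of these steps, as a hedged illustration rather than the required solution. The names `decompress` and `decompress_file` and the path parameters are hypothetical. The input is opened in binary mode on the assumption that every byte, including count bytes such as 10, 13 or 26, must reach the decoder unchanged (on Windows, text mode can translate `\r\n` and may treat 0x1A as end of file).

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

const char kEsc = '\x7f';  // escape byte from the assignment (0x7f)

// Expands esc/letter/count triplets; every other byte is copied unchanged.
void decompress(std::istream& in, std::ostream& out) {
    char my_character;
    // get() does not skip whitespace, and the loop condition checks for eof
    // immediately after each read, as the instructions require.
    while (in.get(my_character)) {
        if (my_character != kEsc) {
            out.put(my_character);  // ordinary byte: output it as-is
            continue;
        }
        char letter;
        char count;
        if (!in.get(letter) || !in.get(count)) {
            break;  // file ended in the middle of a triplet
        }
        // Cast the count byte; going through unsigned char first avoids
        // sign extension for counts of 128..255 where char is signed.
        unsigned short repeats =
            static_cast<unsigned short>(static_cast<unsigned char>(count));
        for (unsigned short i = 0; i < repeats; ++i) {
            out.put(letter);
        }
    }
}

// Hypothetical wrapper: writes the result to the console and to out_path.
bool decompress_file(const std::string& in_path, const std::string& out_path) {
    std::ifstream in(in_path, std::ios::binary);  // binary: keep every byte as-is
    if (!in) {
        std::cerr << "cannot open " << in_path << '\n';
        return false;
    }
    std::ostringstream expanded;
    decompress(in, expanded);
    std::cout << expanded.str();                    // to the console
    std::ofstream out(out_path, std::ios::binary);  // and to the output file
    out << expanded.str();
    return static_cast<bool>(out);
}
```

Called from `main` as `decompress_file("compressed.txt", "console_output.ext")`, this would use the file names the assignment gives. The `while (in.get(my_character))` condition is what puts the eof check right after each read, so the last character is not duplicated.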

small light Dec 21, 2024, 6:05 AM

#

So this is basically already giving you the exact steps and lines of code you need to write, even with variable names and types to use.

#

just translate them exactly into your function

#

with "escape character" the assignment means the '\177' character with value 0x7f mentioned earlier (and previously in your assignment)

#

your current function does something seemingly completely unrelated to these instructions

leaden hill Dec 21, 2024, 6:27 AM

#

small light your current function does something seemingly completely unrelated to these ins...

well apparently my code is good but i just need to put it through an if statement that checks for that char

#

can i send a video of me running the code?

small light Dec 21, 2024, 6:31 AM

#

no, that is not the only problem

#

after reading 0x7f you are supposed to read two characters and cast the second to unsigned short int

#

you are instead trying to interpret the character as a digit, which is not what the instructions say you are supposed to do

#

also, why do the instructions say to read the input from a file, but you have it in string literals instead?

#

are you sure that you copied and escaped all of the non-printable characters in the file correctly?

leaden hill Dec 21, 2024, 7:28 AM

#

@small light sorry, I'm not entirely sure what you mean by that. Do you mean that I'm casting to unsigned short int and then doing the comparison, when I'm supposed to compare first and then cast?

Also, as for the input file, it sounds like you're saying I'm using a string literal for the file path instead of assigning it to a variable?

small light Dec 21, 2024, 7:30 AM

#

leaden hill @small light sorry, not entirely sure what you mean by that. is what y...

no, you are supposed to cast the second character after 0x7f to unsigned short int and the resulting value is supposed to be the number of times that you repeat the first character after 0x7f in the output
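A small sketch of the difference, assuming 8-bit `char`; `repeat_count` and `as_digit` are hypothetical names. Reading the count byte as a digit gives 8 for the character `'8'`, while casting it as the instructions say gives the byte's value, 56.

```cpp
#include <cassert>

// What the instructions ask for: the byte's numeric value is the repeat count.
unsigned short repeat_count(char count_byte) {
    // Going through unsigned char keeps bytes 128..255 in that range; casting a
    // negative (signed) char straight to unsigned short would wrap to a huge value.
    return static_cast<unsigned short>(static_cast<unsigned char>(count_byte));
}

// The digit interpretation discussed in the thread, shown only for contrast.
int as_digit(char count_byte) {
    return count_byte - '0';
}
```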

#

how did you manage to turn the file contents into string literals?

#

I expect that you didn't do that correctly.

leaden hill Dec 21, 2024, 7:34 AM

#

small light I expect that you didn't do that correctly.

yea

#

let me try and fix this

small light Dec 21, 2024, 7:36 AM

#

I don't know how you are trying to do that, but given your apparent knowledge level I doubt that you will be able to do it correctly, because it is far from trivial. Just read from the file as you are asked to?

strong sequoia Dec 21, 2024, 8:00 AM

#

Also, what did you make of this instruction?

In addition, any esc character, or run of them, that occurs in the input must also be replaced by a triplet: esc esc count .

small light Dec 21, 2024, 8:01 AM

#

strong sequoia Also, what did you make of this instruction? ***In addition, any esc character,...

that's for compressing, not decompressing

strong sequoia Dec 21, 2024, 8:01 AM

#

Seems that way, yes.

small light Dec 21, 2024, 8:01 AM

#

it is just a special case of the general algorithm

strong sequoia Dec 21, 2024, 8:04 AM

#

I am not yet convinced that the compressed file is correct because on some lines, it looks like there were two length digits rather than always one. If they are not part of the length, they must be part of the decompressed XML, eh?

small light Dec 21, 2024, 8:05 AM

#

strong sequoia I am not yet convinced that the compressed file is correct because on some lines...

given that the instructions talk about reading a file but the shown code has the input in a string literal, I suspect that there was some "converting" going on, probably by copy-paste from editor to LLM to code

fast anchorBOT Dec 21, 2024, 8:05 AM

#

small light given that the instructions talk about reading a file but the shown code has the...

Did you mean: LLVM

leaden hill Dec 21, 2024, 8:21 AM

#

I suspect that there was some "converting" going on, probably by copy-paste from editor to LLM to code

#

yes ur right @small light

small light Dec 21, 2024, 8:21 AM

#

see, that's the issue

#

the copy-pasting will not correctly preserve non-printable characters and the LLM is not going to get whatever you are asking it to do with the input right

#

just an overall terrible idea

leaden hill Dec 21, 2024, 8:22 AM

#

I’m screwed without it

small light Dec 21, 2024, 8:25 AM

#

given that the assignment says "To use file I/O in C++ (open, read a char, test for eof).", you should already have learned how to read characters from a file

#

just reuse that to read everything into a string and then go back to improving your decompress function
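A minimal sketch of that idea, reading one char at a time as the assignment's goals describe. `read_all` is a hypothetical name, and binary mode is assumed so that no bytes are translated on the way in.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns every byte of the file, or an empty string if it cannot be opened.
std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    std::string contents;
    char c;
    while (in.get(c)) {  // get() does not skip whitespace; stops after a failed read
        contents += c;
    }
    return contents;
}
```

The resulting string holds the raw bytes, so the decompress function can then walk it looking for 0x7f instead of working on hand-copied string literals.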

strong sequoia Dec 21, 2024, 7:55 PM

#

leaden hill I need help decompressing this text.

May I see the actual unmodified compressed file you were given? Instead of pasting it here, press the "+" button and select "Upload a file".

leaden hill Dec 21, 2024, 7:57 PM

#

strong sequoia May I see the actual unmodified compressed file you were given? Instead of past...

that was it

#

i didn't change anything, took it straight from the assignment

strong sequoia Dec 21, 2024, 7:57 PM

#

did you upload it or copy / paste?

leaden hill Dec 21, 2024, 7:57 PM

#

📎 compressed_2.txt

#

hold up

#

strong sequoia Dec 21, 2024, 7:58 PM

#

Thanks, but do you understand the question?

leaden hill Dec 21, 2024, 7:59 PM

#

strong sequoia Thanks, but do you understand the question?

nope

#

i guess not

strong sequoia Dec 21, 2024, 8:00 PM

#

Press the + button to the left of where you type your response and select Upload a file. Is that what you did?

leaden hill Dec 21, 2024, 8:00 PM

#

strong sequoia Press the + button to the left of where you type your response and select Upload...

yes, that's exactly what i did

strong sequoia Dec 21, 2024, 8:03 PM

#

Ok, well I must confess I am stumped. Decompressing that file according to the instructions does not yield valid xmlsoap. But that would be an assumption on my part that it should. So if it were me, I would have to just explicitly follow the instructions and turn in that result.

#

The reason I am stumped is that the file seems to have some lines with two-digit compression counts rather than one digit.

#

Like this line for example:

 12soapenv:actor="http://schemas.xmlsoap.org/soap/actor/next"

#

But if you allow for multiple digit counts, you are going to have problems with this line:

 16<ns1:applicationProtection>~255~255~255~255~80</ns1:applicationProtection>

#

So, I guess I am stumped. I am sorry I am not more help.

small light Dec 22, 2024, 3:59 AM

#

strong sequoia The reason I am stumped is that file seems to have some lines with two digit com...

the instructions do not say to decompress by interpreting characters as digits at all

#

according to the instructions the second character after the escape character is itself supposed to be interpreted as the number of repeats (as in the character's value, not a digit that the character represents)

#

an editor may not correctly interpret the low-value ASCII control characters that should be used for these repeat counts, and may do something unexpected with them

#

looking at the file in an editor is just pointless and if it has ever been copy-pasted through an editor or even just opened and saved through a text editor, there is a good chance that it will be broken
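Instead of an editor, a tiny program can show which bytes are really in the file. This is a hypothetical helper (`describe_bytes` is not from the thread) that prints printable ASCII as-is and every other byte as `<hh>` in hex, so escape bytes and small count values become visible.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Describes each byte: printable ASCII as itself, everything else as <hh>.
std::string describe_bytes(std::istream& in) {
    std::ostringstream out;
    char c;
    while (in.get(c)) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b >= 0x20 && b < 0x7f) {
            out << b;  // printable ASCII, written as a character
        } else {
            char hex[8];
            std::snprintf(hex, sizeof hex, "<%02x>", static_cast<unsigned>(b));
            out << hex;  // control bytes such as 0x7f, 0x08, \r, \n
        }
    }
    return out.str();
}
```

For example, `std::ifstream f("compressed.txt", std::ios::binary); std::cout << describe_bytes(f);` would print the whole file this way, and a triplet for eight spaces would show up as `<7f> <08>`.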

leaden hill Dec 23, 2024, 10:24 PM

#

small light looking at the file in an editor is just pointless and if it has ever been copy-...

What if I store the dynamically allocated array in the int variable?
