#Decompressing text
here is my code:
I don't think it's doing it properly tho, I'm not sure what the issue is
first you need to tell us how the compression algorithm is supposed to work
I can show you the instructions my professor gave
Is that fine
Your code appears to search for a digit (0-9) and if found repeat the previous character by that amount. However the compressed file looks like this:
jeffwh@m93p:~/git/tests/decompression$ cat compressed.txt
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope
8xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
8xmlns:xsd="http://www.w3.org/2001/XMLSchema"
8xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
4<soapenv:Header>
8<ns1:RequestHeader
12soapenv:actor="http://schemas.xmlsoap.org/soap/actor/next"
12soapenv:mustUnderstand="0"
12xmlns:ns1="https://www.thehartford.claimscenter.com/apis/policy/search/v230456">
16<ns1:networkCode>111212313</ns1:networkCode>
16<ns1:applicationName>AgentManagerApi</ns1:applicationName>
16<ns1:applicationProtection>~255~255~255~255~80</ns1:applicationProtection>
8</ns1:RequestHeader>
4</soapenv:Header>
4<soapenv:Body>
16<PolicySearchCriteria xmlns="urn:com.thehartford.claimscenter:policysearch.types">
24<LOBSearchType_Ext>personal</LOBSearchType_Ext>
24<LossDate>2014-08-08</LossDate>
24<LossState>CT</LossState>
24<LossType>AUTO</LossType>
24<PolicyNumber>55PHT76</PolicyNumber>
24<SearchType>P</SearchType>
16</PolicySearchCriteria>
4</soapenv:Body>
</soapenv:Envelope>
As you can see, there are digits besides those that represent character repeat counts. Second, this file appears to have multiple lines with two-digit values rather than a single digit. In most cases these "repeat" counts are preceded by a space character and tend to appear only at the start of the line, which suggests to me that only the leading whitespace on each line was compressed and nothing else in the file.
Another interesting observation is that every line appears to end with '\r\n\177'. Is this expected?
As far as I know, XML must contain only printable characters, so the \177 (0x7f) characters would be illegal. So either they are part of the compression scheme or you have a corrupted file.
Perhaps the compression is 0x7f followed by the compressed character followed by an integer indicating the number of times to repeat the compressed character?
my friend said smth similar
based on the code i have above
I have part of it, but I'm not using the 0x7f character in the comparison
so what is the assignment?
it must be explaining the compression scheme
otherwise the whole exercise (for you and for us) is pointless
please extract the relevant explanation of the compression scheme and post it as text here
I don't want to download a whole document
1 Goals of this Assignment
- To identify a compiler and an IDE that will meet your needs for this course.
- To write a program in C++.
- To use file I/O in C++ (open, read a char, test for eof).
- To use char input that does NOT skip whitespace.
- To use an acceptable coding style: Please see the style sheet.
- To understand and use a type cast.
Most people now use rar or zip to compress files. These applications use a combination of compression mechanisms, in sequence. This assignment and the next explore the simplest compression scheme we have. Later this term we will implement a more complex compression scheme.
2.1 Run-Length Compression
Run-length Compression is effective on text or binary files that have the same byte repeated over and over. Think of a file containing a table of numbers: it has lots of consecutive space characters, and may have a repeated filler character, such as a ‘.’ . You will implement this simple kind of compression in Program 2. However, the algorithm to decompress the file is easier, so I am asking you to do the decompression first.
Runs. In this scheme, any “run” of the same character (4 or more identical consecutive bytes) is replaced by a triplet of bytes, consisting of
- An escape character. We will use 0x7f, which is sometimes called “esc”. It is a non-printing ASCII character.
- The letter that has been repeated.
- A 1-byte count of the number of repetitions.
In addition, any esc character, or run of them, that occurs in the input must also be replaced by a triplet: esc esc count .
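To make the triplet format concrete, here is a minimal sketch of the compression side as described above. The names `kEsc` and `compressRuns` are made up for illustration, and splitting runs longer than 255 is an assumption (the text only says the count is one byte); this is not the professor's reference solution.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration; not taken from the assignment.
const char kEsc = 0x7f;

// Runs of 4 or more identical bytes, and any run of escape bytes,
// become the triplet (esc, byte, count). Shorter runs are copied as-is.
// Assumption: a run longer than 255 is split so each count fits in one byte.
std::string compressRuns(const std::string& input) {
    std::string out;
    std::size_t i = 0;
    while (i < input.size()) {
        const char c = input[i];
        std::size_t run = 1;
        while (i + run < input.size() && input[i + run] == c && run < 255) {
            ++run;
        }
        if (run >= 4 || c == kEsc) {
            out += kEsc;
            out += c;
            // Store the count as a raw byte value, not as a digit character.
            out += static_cast<char>(static_cast<unsigned char>(run));
        } else {
            out.append(run, c);
        }
        i += run;
    }
    return out;
}
```

With this sketch, five consecutive `a` characters turn into the three bytes 0x7f, `a`, 0x05, while `abc` passes through unchanged.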
3 To do: Expand a Compressed File
I will give you a compressed file; your job is to restore it to the uncompressed representation. The algorithm is a loop that reads chars one at a time until eof is reached:
- Read an input character named my_character (do not skip whitespace) from compressed.txt and quit if end of file happens.
- If my_character is NOT an escape character, output it to the console and to a file: console_output.ext. Continue reading the next character in the loop.
- If my_character IS an escape character, read two more chars: the first will be a letter, the second is the count.
- Cast the count from type char to type unsigned short int and use it to output that many copies of the letter.
- Continue reading the next character in the loop.
Please note: if you handle the end of file wrong, the last character of the output will be wrong. You must check for eof immediately after reading a character, not before.
So this is basically already giving you the exact steps and lines of code you need to write, even with variable names and types to use.
just translate them exactly into your function
with "escape character" the assignment means the '\177' character with value 0x7f mentioned earlier (and previously in your assignment)
your current function does something seemingly completely unrelated to these instructions
well apparently my code is good but i just need to put it through an if statement that checks for that char
can i send a video of me running the code?
no, that is not the only problem
after reading 0x7f you are supposed to read two characters and cast the second to unsigned short int
you are instead trying to interpret the character as a digit, which is not what the instructions say you are supposed to do
also, why do the instructions say to read the input from a file, but you have it in string literals instead?
are you sure that you copied and escaped all of the non-printable characters in the file correctly?
@small light sorry, I'm not entirely sure what you mean by that. Do you mean that I'm casting to unsigned short int and then doing the comparison, when I'm supposed to compare first and then cast?
Also, as for the input file, it seems like you're saying I'm using a string literal because I assign the file path to a variable?
no, you are supposed to cast the second character after 0x7f to unsigned short int and the resulting value is supposed to be the number of times that you repeat the first character after 0x7f in the output
how did you manage to turn the file contents into string literals?
I expect that you didn't do that correctly.
I don't know how you are trying to do that, but given your apparent knowledge level I doubt that you will be able to do it correctly, because it is far from trivial. Just read from the file as you are asked to.
Also, what did you make of this instruction?
In addition, any esc character, or run of them, that occurs in the input must also be replaced by a triplet: esc esc count .
that's for compressing, not decompressing
Seems that way, yes.
it is just a special case of the general algorithm
I am not yet convinced that the compressed file is correct because on some lines, it looks like there were two length digits rather than always one. If they are not part of the length, they must be part of the decompressed XML, eh?
given that the instructions talk about reading a file but the shown code has the input in a string literal, I suspect that there was some "converting" going on, probably by copy-paste from editor to LLM to code
Did you mean: LLVM
yes ur right @small light
see, that's the issue
the copy-pasting will not correctly preserve non-printable characters and the LLM is not going to get whatever you are asking it to do with the input right
just an overall terrible idea
I’m screwed without it
given that the assignment says "To use file I/O in C++ (open, read a char, test for eof)." you should probably have already learned how to read characters from a file
just reuse that to read everything into a string and then go back to improving your decompress function
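One possible way to read everything into a string, as a sketch only: the helper name `readAllBytes` is made up here, and binary mode is an assumption chosen so that no byte (such as `\r` or 0x7f) gets translated on the way in.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read every byte of the file at `path` into a string.
std::string readAllBytes(const std::string& path) {
    // Binary mode avoids newline translation on platforms that do it.
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    // istreambuf_iterator reads raw characters and does not skip whitespace.
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

Calling `readAllBytes("compressed.txt")` would then give the decompress function the unmodified file contents, with no copy-paste step in between.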
May I see the actual unmodified compressed file you were given? Instead of pasting it here, press the "+" button and select "Upload a file".
that was it
i didnt change anything, took it straight from the assignment
did you upload it or copy / paste?
Thanks, but do you understand the question?
Press the + button to the left of where you type your response and select Upload a file. Is that what you did?
yes thats exactly what i did
Ok, well I must confess I am stumped. Decompressing that file according to the instructions does not yield valid xmlsoap. But that would be an assumption on my part that it should. So if it were me, I would have to just explicitly follow the instructions and turn in that result.
The reason I am stumped is that the file seems to have some lines with two-digit compression counts rather than one-digit counts.
Like this line for example:
12soapenv:actor="http://schemas.xmlsoap.org/soap/actor/next"
But if you allow for multi-digit counts, you are going to have problems with this line:
16<ns1:applicationProtection>~255~255~255~255~80</ns1:applicationProtection>
So, I guess I am stumped. I am sorry I am not more help.
the instructions do not say to decompress by interpreting characters as digits at all
according to the instructions the second character after the escape character is itself supposed to be interpreted as the number of repeats (as in the character's value, not a digit that the character represents)
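The difference between the two readings can be shown with a small sketch. Both function names are invented for illustration, and the numbers in the note below assume an ASCII character set.

```cpp
// Interpret the count byte the way the instructions describe:
// the character's own numeric value is the repeat count.
unsigned short int countFromByte(char count_char) {
    return static_cast<unsigned short int>(
        static_cast<unsigned char>(count_char));
}

// What a digit-based approach computes instead (shown only for contrast).
int countFromDigit(char count_char) {
    return count_char - '0';
}
```

For a count byte with value 8 (the control character `'\x08'`), `countFromByte` gives 8. For the printable character `'8'`, `countFromByte` gives 56, while `countFromDigit` gives 8, so the two approaches disagree on the same input.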
an editor may not correctly interpret the low-valued ASCII control characters that should be used for these repeat counts, and may do something unexpected with them
looking at the file in an editor is just pointless and if it has ever been copy-pasted through an editor or even just opened and saved through a text editor, there is a good chance that it will be broken
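Instead of an editor, the raw bytes can be printed as hex so that control characters show up as numbers. This is a rough sketch; `toHex` is a made-up helper name, and it expects the file contents to already be in a string.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format bytes as space-separated two-digit hex.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Convert through unsigned char so bytes above 0x7f print as two digits.
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        if (i != 0) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Printing the first few dozen bytes after each 0x7f this way would show whether the count is stored as a raw byte value or as digit characters, which settles the question without trusting how a terminal or editor renders the file.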
What if I store the dynamically allocated array in the int variable?