Problem Set 4: Huffman Coding
Out: 13 March
Due: Wednesday (beginning of class), 22 March
Collaboration Policy - Read Carefully
For this assignment, you may work on your own or with any one other person of your choice except for anyone you worked with on PS3. If you work with a partner, you should turn in one assignment with both of your names on it. If you would prefer to be assigned a partner, send email to evans@cs.virginia.edu before 5pm on Monday, 13 March (include any constraints or preferences you have on your assigned partner). If a suitable match requests a partner, you will receive a partner assignment.
You may consult any outside resources including books, papers, web sites and people you wish, except you may not copy code from other Huffman encoding implementations. There are many implementations of Huffman encoding and decoding available on the web, and it would certainly defeat the purpose of this assignment if you copied one of them instead of thinking on your own. You are also encouraged to discuss these problems with students in the class. You must acknowledge any people and outside resources you work with on your assignment. If you discuss the assignment with people other than your partner, you may not take any written materials out of your discussion. It is fine to bounce ideas off other people, but the answers you turn in must be your own.
You are strongly encouraged to take advantage of the staffed lab hours posted on the CS216 web site.
Purpose
1.
    {
      char *s = (char *) malloc (sizeof (*s));
      s[0] = 'a';
      printf ("%c", *(s + 1));
    }

2.
    {
      char *s = (char *) malloc (sizeof (*s) * 6);
      char *t;
      strcpy (s, "cs216");
      t = s;
      free (t);
      printf ("%s", s);
    }

3.
    char *select (int v, char **s)
    {
      if (v) return *s;
      return 1[s];
    }

    int main (int argc, char **argv)
    {
      char **s = (char **) malloc (sizeof (*s) * 2);
      char *t1 = (char *) malloc (sizeof (*t1) * 6);
      char *t2 = (char *) malloc (sizeof (*t2) * 6);
      char *p;
      s[0] = t1;
      s[1] = t2;
      s[0][0] = 'b';
      p = select (0, s);
      p[0] = 'a';
      printf ("%c", **s);
    }
For the rest of this assignment, you will understand and complete an implementation of Huffman encoding and decoding. Huffman encoding was developed in a term paper David Huffman wrote instead of taking the final exam in an information theory course, and is now used in many applications including MPEG and MP3.
The provided code and directions assume you are using Visual Studio, as installed in the ITC labs. You are free to use any C compiler you want, but if you run into problems with some other compiler the course staff may not be able to help you.
Click on the file PS4.sln. Visual Studio should open, and you will see a view showing the huffman.c source file. The right side of the window shows all the source files in the project. There are three files:
Try running it from the Windows shell:
    K:\cs216\ps4\Debug>huffman
    Usage: huffman [-d] [-b] <input> [<output>]

The -d option is used for decoding. The -b option is used to select bit-encoding (instead of printing 0 and 1 as characters in the output file). We will use the character encoding until question 9.
    typedef struct _htree {
      struct _htree *left;
      struct _htree *right;
      struct _htree *parent;
      int count;
      char letter;
    } *htree;

Because C is designed to be compilable by a one-pass compiler and all types must be declared before they are used, declaring a recursive datatype in C is a bit awkward. Here, we use struct to create a structure datatype consisting of the five fields shown. The typedef defines the htree datatype as a pointer to the struct _htree structure. The left, right, and parent fields of struct _htree are themselves htree objects (but we have to write struct _htree * instead, since htree is not yet defined at that point). Each tree node maintains a count (an integer that represents the weight of that node, C(n) in the book) and a letter (which is meaningful only for leaves).
a. A tree where the letter A is encoded using one bit.
b. A tree where each letter is encoded using exactly three bits.
c. A tree where a letter requires more than 6 bits to encode.
It uses htree_encodeChar to obtain a string representation of the Huffman encoding for each character in the input string, and writes it to the output file. Note that we are using a string like "01001" to encode a character that is five levels deep in the tree. Since we are using strings, it will be easy to read the output file, but not very useful as a compressor! Each character's encoding expands to 8 * (tree depth) bits, since we are using a full byte to represent each bit in the encoding. (In Questions 9 and 10 you will modify the encoding to only use one bit per encoding bit, instead of a full byte.)
    > huffman test.txt test.hcode
    > huffman -d test.hcode

should output the original contents of test.txt.
Complete the implementation of htree_decodeChars. We have provided some code that you may find useful in htree.c, but you can change the implementation however you want. (If you are stuck on this question, you may find it useful to examine the provided htree_decodeBits routine.)
For these questions you will implement an encoder that uses one bit to represent each encoding bit instead. We have provided an almost complete implementation of the htree_decodeBits routine that will decode a bit-Huffman-encoded file.
C provides bitwise operators for manipulating bytes at the bit level. For example, the & operator performs a bitwise and. The ith bit of a & b is 1 if and only if the ith bit of a and the ith bit of b are both 1.
However, C does not provide a direct bit datatype or a way to write a single bit to a file. Instead, we use the unsigned char datatype, which is a byte (8 bits), and must write to the file one byte at a time. This causes some complication at the end of the file if the encoded bits do not end on a byte boundary. The solution assumed by our htree_decodeBits is that the very last byte of the file holds a number giving how many bits of the next-to-last byte are valid. For example, if the file ends with a 3-bit partial code, we will output a full byte whose first 3 bits are the code and whose remaining 5 bits are undetermined, followed by a final byte with value 3 indicating that only the first 3 bits of the next-to-last byte are valid.
You can check if your implementation is correct by trying the test binary-encoded files alphabet.bh and declaration.bh included in ps4.zip:
    > huffman -bd alphabet.bh
    abcdefghijklmnopqrstuvwxyz
    > huffman -bd declaration.bh
    (displays the Declaration of Independence)

Note that our implementation now is an effective compressor. The original declaration.txt is 8586 bytes, but the Huffman-encoded file is 5123 bytes.
CS216: Program and Data Representation, University of Virginia
David Evans, evans@cs.virginia.edu