University of Virginia Computer Science CS216: Program and Data Representation, Spring 2006
16 February 2013
Problem Set 7: Eighty-Sixing Compilers
Out: 10 April; Due: Monday (beginning of class), 17 April
Collaboration Policy - Read Carefully
For this assignment, you may work on your own or with any one other person of your choice. If you work with a partner, you should turn in one assignment with both of your names on it. You may (and probably will need to) consult any outside resources including books, papers, web sites and people you wish except for answers or comments on CS216 labs from previous semesters. You are also encouraged to discuss these problems with students in the class. You must acknowledge any people and outside resources you work with on your assignment. If you discuss the assignment with people other than your partner, you may not take any written materials out of your discussion. It is fine to bounce ideas off other people, but the answers you turn in must be your own.
As always, you are strongly encouraged to take advantage of the staffed lab hours posted on the CS216 web site.
Purpose
(Note: Since assembly instructions are variable length, a sequence of 4 instructions may actually be "shorter" than a sequence of 3 instructions in terms of the number of bytes needed to encode it. For these questions, we interpret "shorter" to mean fewer assembly instructions. Note that a shorter sequence, according to this interpretation, is not necessarily smaller or faster.)
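To make the note concrete, here is a small C table of hand-assembled 32-bit encodings. The byte values are one valid encoding per instruction, worked out from Intel's opcode tables; an assembler may pick a different, equally valid encoding. The struct, table and function names are hypothetical. The table shows three instructions taking fewer bytes than a single one.

```c
#include <assert.h>
#include <stddef.h>

/* One hand-assembled 32-bit encoding per instruction. An assembler may
   choose a different, equally valid encoding (e.g. 33 DB for xor ebx, ebx). */
struct encoded_insn {
    const char *text;       /* Intel-syntax assembly */
    unsigned char bytes[8]; /* machine code bytes */
    size_t length;          /* how many entries of bytes[] are used */
};

static const struct encoded_insn examples[] = {
    { "inc eax",                 { 0x40 },                    1 },
    { "xor ebx, ebx",            { 0x31, 0xDB },              2 },
    { "and edx, 0",              { 0x83, 0xE2, 0x00 },        3 },
    { "mov BYTE PTR [eax], 200", { 0xC6, 0x00, 0xC8 },        3 },
    /* [var] is a hypothetical absolute address, written here as 0 */
    { "imul eax, [var]",         { 0x0F, 0xAF, 0x05, 0x00, 0x00, 0x00, 0x00 }, 7 },
};

#define NUM_EXAMPLES (sizeof examples / sizeof examples[0])

/* Total encoded size of examples[first] .. examples[first + count - 1]. */
static size_t encoded_size(size_t first, size_t count) {
    size_t total = 0;
    size_t i;
    assert(first + count <= NUM_EXAMPLES);
    for (i = first; i < first + count; i++)
        total += examples[i].length;
    return total;
}
```

With this table, encoded_size(0, 3) is 6 bytes for three instructions, while encoded_size(4, 1) is 7 bytes for the single imul.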
    mov BYTE PTR [eax], 200
    add BYTE PTR [eax], 16
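Whether a sequence like mov BYTE PTR [eax], 200 followed by add BYTE PTR [eax], 16 can be shortened depends partly on what counts as "behavior": MOV leaves the flags alone, while ADD sets them. The C sketch below is a hypothetical model, not part of the assignment. It computes the 8-bit result and three of the flags ADD writes, so you can check your reasoning about truncation to a byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of a hypothetical model of "add BYTE PTR [mem], imm8": the new
   byte and three of the flags ADD sets (OF, AF and PF are left out). */
struct add8_result {
    uint8_t value; /* byte written back to memory */
    bool cf;       /* carry out of bit 7: unsigned overflow */
    bool zf;       /* result is zero */
    bool sf;       /* copy of bit 7 of the result */
};

/* old is the byte currently in memory (e.g. 200 after the mov). */
static struct add8_result add_byte(uint8_t old, uint8_t imm) {
    unsigned wide = (unsigned)old + imm; /* exact sum, at most 510 */
    struct add8_result r;
    r.value = (uint8_t)wide;             /* keep only the low 8 bits */
    r.cf = wide > 0xFF;
    r.zf = r.value == 0;
    r.sf = (r.value & 0x80) != 0;
    return r;
}
```

For example, add_byte(200, 16) yields 216 with CF clear and SF set (216 has its top bit set), while add_byte(200, 100) wraps to 44 with CF set.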
    xor ebx, ebx
    sub ecx, ecx
    neg ecx
    and edx, 0
    mov eax, ebx
    add eax, ebx
    shl eax, cl
3. Find a shorter instruction sequence with the same behavior as:
    label1:
        inc eax
        cmp eax, ebx
        jl label2
        jmp label1
    label2:
        cmp ebx, eax
        jle label1
        imul eax, [var]
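One way to reason about the Question 3 sequence is to transliterate it into C with goto, keeping each jump as written. This is a hypothetical model: run_sequence and its parameters stand in for eax, ebx and the memory operand [var], and the flags are not modeled. Simplifying the control flow of this C version mirrors the simplification you are asked to make in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical goto transliteration of the Question 3 sequence. eax and
   ebx model the registers, var models the memory operand [var]; flags are
   not modeled. jl/jle are signed comparisons, so int32_t is used.
   inc and imul keep only the low 32 bits, so they are computed on
   uint32_t; converting back to int32_t is implementation-defined in C
   but wraps on the usual two's-complement compilers. */
static int32_t run_sequence(int32_t eax, int32_t ebx, int32_t var) {
label1:
    eax = (int32_t)((uint32_t)eax + 1u);            /* inc eax           */
    if (eax < ebx) goto label2;                     /* cmp eax, ebx; jl  */
    goto label1;                                    /* jmp label1        */
label2:
    if (ebx <= eax) goto label1;                    /* cmp ebx, eax; jle */
    eax = (int32_t)((uint32_t)eax * (uint32_t)var); /* imul eax, [var]   */
    return eax;
}
```

For instance, run_sequence(0, 10, 3) increments eax to 1, reaches the imul, and returns 3. Avoid test inputs where eax + 1 >= ebx: the jmp loop then runs until eax wraps past INT32_MAX, and it never exits when ebx is INT32_MIN.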
a. Describe other changes that would need to be made to the calling convention.
b. Discuss the advantages and disadvantages of such a change. Would it improve the running time of typical programs?
For each C language feature, you should:
Note: Printing out these disassembled files is probably not your most useful option. You will most likely find that it is significantly easier to view the files on the screen using the Visual Studio editor or your favorite text editor. In this way you can navigate through the file, searching for particular labels or C statements. Besides, you may want to make a slight modification to your C code and recompile often anyway.
6. Determine how the if, while, and for statements are implemented. Do the results tell you anything about whether it is more efficient to implement C code in particular ways?
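For Question 6, it helps to compile several versions of the same computation and compare their listings. The functions below are hypothetical test cases: each sums the even numbers below n using a for loop, a while loop, or explicit gotos that mimic one plausible lowering (test at the top, jump back at the bottom). They are written in C89 style so that an older C compiler accepts them. The compiler's actual output may differ from the goto version; optimized builds often move the loop test to the bottom.

```c
#include <assert.h>

/* Hypothetical test cases for Question 6: the same computation (sum of
   the even numbers in 0 .. n-1) written three ways. C89 style, so loop
   variables are declared at the top of each function. */
int sum_even_for(int n) {
    int i, sum = 0;
    for (i = 0; i < n; i++) {
        if (i % 2 == 0)
            sum += i;
    }
    return sum;
}

int sum_even_while(int n) {
    int i = 0, sum = 0;
    while (i < n) {
        if (i % 2 == 0)
            sum += i;
        i++;
    }
    return sum;
}

/* One plausible lowering of the loops above, written with gotos:
   test at the top, conditional skip for the if, jump back at the bottom. */
int sum_even_goto(int n) {
    int i = 0, sum = 0;
top:
    if (!(i < n)) goto done;   /* loop test fails: leave the loop */
    if (i % 2 != 0) goto next; /* if condition false: skip the body */
    sum += i;
next:
    i++;
    goto top;                  /* back edge */
done:
    return sum;
}
```

All three should return 20 for n = 10. If the listings for sum_even_for and sum_even_while turn out identical, that says something about whether the choice between them affects efficiency.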
7. Explain how the compiler implements parameter passing and returns results. Be sure to examine what is happening both in the caller and in the callee. You should be able to identify how the C compiler follows the C calling convention described in the x86 Guide, and also fill in many details that the calling convention leaves unspecified. You should at least answer these two questions: (1) how are arrays on the stack passed and returned? (2) how well does the compiler do in determining which registers must be saved and restored? In addition, answer at least one additional question of your choice about an aspect of the calling convention that is unclear.
8. Compare code generated using debug mode (the default) to optimized code. To do this you will need to change your project configuration from debug to release by selecting "Win32 Release" from the "Set Active Configuration" drop-down list on the Build menu. Then select Project -> Settings -> C++ and select an optimization level. Find some example programs where the assembly code generated with optimizations is substantially better than the code generated without optimizations. Speculate on what the optimizer is doing. Are there things it does surprisingly well? Are there things it should do better?
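C does not pass or return arrays by value (an array argument decays to a pointer), so one way to study how an array-sized aggregate crosses a call for Question 7 is to wrap the array in a struct. The second function is a candidate for Question 8: its loop has a trip count fixed at compile time, so comparing the debug and release listings shows whether the optimizer unrolls it, folds it to a constant, or leaves it alone. The struct and both functions are hypothetical examples.

```c
#include <assert.h>

/* Hypothetical example for Question 7: an array wrapped in a struct,
   which C does pass and return by value. */
struct triple {
    int v[3];
};

struct triple scale_triple(struct triple t, int k) {
    int i;
    for (i = 0; i < 3; i++)
        t.v[i] *= k; /* changes the callee's copy, not the caller's */
    return t;
}

/* Hypothetical example for Question 8: the trip count is a compile-time
   constant, so compare debug and release listings to see whether the
   optimizer unrolls the loop, folds it to a constant, or keeps it. */
int sum_first_hundred(void) {
    int i, sum = 0;
    for (i = 1; i <= 100; i++)
        sum += i;
    return sum;
}
```

scale_triple returns a scaled copy and leaves the caller's struct unchanged; sum_first_hundred returns 5050, and it is worth checking whether that constant appears directly in the release listing.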
9. For this question, you can pick any language feature or compiler question you want, and answer it. This is open-ended: the more interesting things you can find out the better. For those of you who know C++ (or are interested in learning it), feel free to consider a C++ language feature for this.
10. (For x86 gurus and wannabe-gurus only, not required to reach green-star level) Devise a sequence of instructions that has two valid different interpretations depending on where the execution starts. One execution should start at the beginning of the first instruction; the other execution should start somewhere inside the first instruction (a number of bytes offset from the start location that is less than the length of the first instruction). An ideal solution would be something like this:
    Your code:        start: ABCDEFGHIJKLMNOPQRSTUVWXYZ   (each letter is one byte)
    First sequence:   AB CD EFGH IJ KL MNO PQ RST UV WX YZ
    Second sequence:  -BC DEF G HI JK LM NOP QR STU VW XY Z ...
Even more ideally, the YZ instruction (the last instruction in the first sequence) would jump to location start + 1 (instruction BC in the second sequence).
To do this you will need to examine memory more directly (and use your understanding from question 5) to know exactly which bytes represent each instruction. Note that many assembly mnemonics correspond to several different opcodes depending on the operands you use. You will also find the Opcode Map (Appendix A of Intel's Instruction Set Reference manuals, starting at page 415) useful.
Include a description of how you found your sequence. Long (non-repeating) sequences are more impressive than short sequences, but the most impressive answers will have two sequences that perform meaningful and different behaviors. You also get bonus points if you can explain why such an instruction sequence might be "useful".
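As a very small illustration of Question 10 (far shorter than a good answer), consider the five bytes B8 40 40 40 40 in 32-bit code. Starting at the first byte they decode as mov eax, 0x40404040; starting one byte later they decode as four inc eax instructions. The C sketch below checks this with a hypothetical length decoder that knows only a handful of opcodes; it is not a real disassembler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length decoder for a handful of 32-bit x86 opcodes; it is
   not a real disassembler and returns 0 for any opcode it does not know. */
static size_t insn_length(const unsigned char *code, size_t pos) {
    unsigned char op = code[pos];
    if (op == 0x90) return 1;               /* nop                        */
    if (op >= 0x40 && op <= 0x47) return 1; /* inc r32 (32-bit mode only) */
    if (op == 0xEB) return 2;               /* jmp rel8                   */
    if (op >= 0xB8 && op <= 0xBF) return 5; /* mov r32, imm32             */
    return 0;                               /* outside the subset         */
}

/* Decodes code[start .. n-1] linearly and returns the number of
   instructions, or 0 on an unknown opcode or a truncated instruction. */
static size_t count_insns(const unsigned char *code, size_t n, size_t start) {
    size_t pos = start, count = 0;
    assert(start <= n);
    while (pos < n) {
        size_t len = insn_length(code, pos);
        if (len == 0 || pos + len > n)
            return 0;
        pos += len;
        count++;
    }
    return count;
}

/* From offset 0: B8 40 40 40 40 -> mov eax, 0x40404040 (one instruction).
   From offset 1: 40 40 40 40    -> inc eax, four times.                 */
static const unsigned char overlap[] = { 0xB8, 0x40, 0x40, 0x40, 0x40 };
```

count_insns(overlap, sizeof overlap, 0) returns 1 and count_insns(overlap, sizeof overlap, 1) returns 4. Bytes 0x40 to 0x47 are REX prefixes in 64-bit mode, so this particular overlap only works in 32-bit code.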
David Evans, evans@cs.virginia.edu