CS655: Programming Languages, Spring 2001
Problem Set 4: Decaffeinating Java
Out: 10 April 2001
Due: Thursday, 19 April 2001 (in class)
Purpose: Formal Semantics and Proof-Carrying Code
This problem set will require you to apply several of the formal semantics techniques we have seen in class to the problem of improving the performance of Java programs.

Collaboration Policy

You may choose anyone you want to work with on this assignment. A similar assignment was used in last year's CS655; it would be dishonorable to use it to assist your work.

If you choose to work alone, you should turn in your own solution. If you choose to work with someone else, you should turn in a single solution that represents your combined work.

Warning

This problem set is believed to be long and difficult. You should start thinking about these problems early.

Optional Reminder

This assignment is optional. You should do as much or as little of it as you think would be worthwhile. For most of you, I believe it will be worthwhile to do all of this problem set, but you should use your best judgement to allocate your time between this assignment, your project, and your other tasks. Skipping this problem set won't count against your final grade in this course, as long as you manage to convince me through your class contribution, project, and final that you have a good understanding of this material (but it is probably hard to get a good understanding of these formal semantics tools without actually attempting to use them).

Background
One of the (many) reasons Java programs run slowly is the overhead associated with all the run-time checking. A smart compiler can eliminate much of the unnecessary run-time checking, but this is only useful if it can also construct a proof that convinces an untrusting JavaVM (one that doesn't see the source code) that it is safe to execute the program without the run-time checks. The goal of this problem set is to apply proof-carrying code techniques to remove run-time checking from a Java program.

Consider the following Java class:
    public class Scrunch {
        public static String Scrunch (String[] a) {
            Object[] ar = new Object[100];
            // Copy a into ar (only safe if a.length <= 100)
            for (int i = 0; i < a.length; i++) {
                ar[i] = a[i];
            }
            // Concatenate the elements of ar, casting each back to String
            String s = "";
            for (int i = 0; i < a.length; i++) {
                s = s.concat((String) ar[i]);
            }
            return s;
        }
    }

For simplicity, we use a code example that would not appear in a (reasonable) Java program. In a real Java program, this would make more sense if ar were a Vector (which, because of the lack of parameterized types in Java, must be a Vector of Objects) instead of an Object [].

A Java compiler (Sun's JDK) produces the following byte codes for the Scrunch method (the actual byte codes are shown because it builds character to read them, but we will deal with the code at a higher level for our proof). You can see this for yourself by running javap -c -verbose class:
    Method java.lang.String Scrunch(java.lang.String[])
     0 bipush 100           Push the constant 100 on the stack
     2 anewarray class <Class java.lang.Object>
                            Pop the top of the stack, and construct a new array of
                            element type java.lang.Object of that size
     5 astore_1             Store the top of the stack (the array we just created)
                            in local 1 (corresponds to ar)
     6 iconst_0             Push the constant 0
     7 istore_2             Store it in local 2 (corresponds to i)
     8 goto 20              Jump to instruction numbered 20
    11 aload_1              Push local 1 (ar)
    12 iload_2              Push local 2 (i)
    13 aload_0              Push parameter (a)
    14 iload_2              Push local 2 (i)
                            The top of stack is now: [i, a, i, ar, ...]
    15 aaload               Pop i (top), a (next) from stack; push a[i]
                            aaload performs run-time bounds checking
    16 aastore              Pop a[i] (top), i (next), ar (next) from stack;
                            store a[i] in ar[i]
    17 iinc 2 1             Increment the integer in local 2 by one (i++)
    20 iload_2              Push the integer in local 2 (i)
    21 aload_0              Push the object in local 0 (the parameter, a)
    22 arraylength          Replace stack top with length of array
    23 if_icmplt 11         Pop x (top), y (next) from stack; if y < x jump to 11
                            (beginning of loop)
    26 ldc <String "">      Push the constant String ""
    28 astore_3             Store top of stack in local 3 (s)
    29 iconst_0             Push constant 0
    30 istore 4             Store it in local 4 (i in second loop)
    32 goto 50
    35 aload_3              Push local 3 (s)
    36 aload_1              Push local 1 (ar)
    37 iload 4              Push local 4 (i)
    39 aaload               Pop i (top), ar (next) from stack; push ar[i]
    40 checkcast <Class java.lang.String>
                            If the runtime type of the top of the stack (ar[i]) is not
                            a subtype of java.lang.String, issue a run-time type error;
                            otherwise, continue knowing its type is a subtype of
                            java.lang.String
    43 invokevirtual <Method java.lang.String concat(java.lang.String)>
                            Pop the argument (top, the cast ar[i]) and the receiver
                            (next, s), and invoke concat on s with that argument.
                            Note that if s is an instance of a subtype of
                            java.lang.String that overrides the concat method, the
                            method in the subtype is called.
    46 astore_3             Store the result (which is put on top of the stack) in
                            local 3 (s)
    47 iinc 4 1             Increment local 4 (i) by 1
    50 iload 4              Push local 4 (i)
    52 aload_0              Push local 0 (a)
    53 arraylength          Replace top of stack with its length (a.length)
    54 if_icmplt 35         If i < a.length goto 35 (continue loop)
    57 aload_3              Push s on stack
    58 areturn              Return to caller; result is on top of stack.
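To see the run-time checks at work, here is a small Java harness. It is only a sketch: the class name ScrunchDemo and the lowercase method scrunch are our own renaming of the Scrunch method above. With at most 100 strings it behaves normally; with 101 strings the store into ar[100] fails the JVM's run-time bounds check and throws ArrayIndexOutOfBoundsException, which is exactly the kind of check a certifying compiler must justify removing.

```java
import java.util.Arrays;

// Hypothetical harness class; not part of the problem set.
final class ScrunchDemo {
    // Same body as Scrunch.Scrunch above, renamed for this sketch.
    static String scrunch(String[] a) {
        Object[] ar = new Object[100];
        for (int i = 0; i < a.length; i++) {
            ar[i] = a[i];                   // aaload on a, aastore on ar: both bounds-checked
        }
        String s = "";
        for (int i = 0; i < a.length; i++) {
            s = s.concat((String) ar[i]);   // aaload on ar, then checkcast to String
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(scrunch(new String[] {"de", "caf", "fein", "ate"}));  // decaffeinate

        String[] tooLong = new String[101];
        Arrays.fill(tooLong, "x");
        try {
            scrunch(tooLong);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The store to ar[100] fails the run-time bounds check.
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```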
Eliminating Run-Time Checks
The generated byte codes have several unnecessary run-time checks:

- 15 aaload - the run-time bounds checking on the array load, a[i].
- 16 aastore - the run-time bounds checking on the array store to ar[i] is unnecessary if we know a.length <= 100; a smart compiler will check this once before the loop, and remove the checking inside the loop.
- 39 aaload - the run-time bounds checking on the array load, ar[i].

We will attempt to replace these with:

- 15 safe_aaload - an array load with no run-time bounds checking
- 16 safe_aastore - an array store with no run-time bounds checking (after inserting one check before the loop)
- 39 safe_aaload - an array load with no run-time bounds checking

These are not real JavaVM instructions, but one could imagine a future version of Java with a more sophisticated byte code verifier supporting them. We call JVML extended with the safe_aastore and safe_aaload instructions JVML+safe.
Generating Verification Conditions
1. (.10) VCGen
Complete the VCGen function for JVML+safe below. You may assume all the normal JVML instructions are checked by the bytecode verifier, so the only instructions that generate predicates are the new safe_ instructions.

To write the VCGen, we use a shorthand that uses arguments to represent the stack slots. You may assume the byte code verifier ensures the top stack slots match the argument types. You may assume no object is NULL. A complete verifier would also need to check these things.
    VCGen (PC) =
        if Inst[PC] = safe_aaload <i: int> <a: array[T]>
            i < a.length /\ i >= 0 /\ VCGen (PC + 1)
        else if Inst[PC] = safe_aastore <val: S> <i: int> <a: array[T]> where S <= T (S is a subtype of T)
            ...
        else % all non-safe instructions
            VCGen (PC + 1)

(Don't worry about falling off the end or multiple-word instructions.)

Optimized Code
Tortilla Systems' certifying optimizing compiler generates the following code for Scrunch:
    Method java.lang.String Scrunch(java.lang.String[])
     0 bipush 100
     2 anewarray class <Class java.lang.Object>
     5 astore_1
     6 iconst_0
     7 istore_2
     8 check aload_0.length <= 100
     9 invariant aload_0.length <= 100 /\ iload_2 >= 0 /\ aload_1.length = 100
    10 iload_2
    11 aload_0
    12 arraylength
    13 if_icmpge 22
    14 aload_1
    15 iload_2
    16 aload_0
    17 iload_2
    18 safe_aaload
    19 safe_aastore
    20 iinc 2 1
    21 goto 9
    22 ldc <String "">
    23 astore_3
    24 iconst_0
    25 istore 4
    26 invariant ???
    27 iload 4
    28 aload_0
    29 arraylength
    30 if_icmpge 40
    31 aload_3
    32 aload_1
    33 iload 4
    34 safe_aaload
    35 checkcast <Class java.lang.String>
    36 invokevirtual <Method java.lang.String concat(java.lang.String)>
    37 astore_3
    38 iinc 4 1
    39 goto 26
    40 aload_3
    41 areturn

We assume Java objects cannot be NULL. Without this assumption, we would need to include tests for null in the check and invariant clauses.

This code differs from the code produced by Sun's JDK in four ways:
- Loops have been rearranged to make analysis easier.
- The check instruction in line 8 has been added. A check instruction performs a run-time check that its predicate holds. Immediately after a check instruction, the predicate can safely be assumed to be true, since execution would terminate if it were false.
- An invariant has been introduced for each loop. This follows the PCC requirement that all back-edges point to instructions with associated invariants. The invariant for instruction 26 is not shown (you will produce it in question 3).
- The slow aaload and aastore instructions in the original program have been replaced with the faster instructions safe_aaload and safe_aastore.
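A source-level Java sketch can make the check and the first loop's invariant concrete. The class FirstLoopSketch and its helpers are hypothetical: check mimics the check instruction at 8 by stopping execution when its predicate is false, and the invariant at instruction 9 is evaluated on every iteration purely for illustration. Under proof-carrying code the invariant is proved once by the verifier rather than tested at run time, and Java source cannot express safe_aaload or safe_aastore, so the JVM still bounds-checks the loop body in this sketch.

```java
// Hypothetical source-level rendering of instructions 0-21 of the optimized code.
final class FirstLoopSketch {
    // Stand-in for the check instruction: execution does not continue past a false predicate.
    static void check(boolean predicate, String description) {
        if (!predicate) {
            throw new IllegalStateException("check failed: " + description);
        }
    }

    // The invariant attached to instruction 9.
    static boolean invariant(String[] a, int i, Object[] ar) {
        return a.length <= 100 && i >= 0 && ar.length == 100;
    }

    static Object[] firstLoop(String[] a) {
        Object[] ar = new Object[100];                 // 0-5
        int i = 0;                                     // 6-7
        check(a.length <= 100, "a.length <= 100");     // 8
        while (true) {
            // 9: tested here only for illustration; PCC proves it instead.
            if (!invariant(a, i, ar)) {
                throw new AssertionError("invariant violated at i = " + i);
            }
            if (i >= a.length) {                       // 10-13: if_icmpge leaves the loop
                break;
            }
            ar[i] = a[i];                              // 14-19: safe_aaload / safe_aastore in JVML+safe
            i++;                                       // 20; 21 jumps back to 9
        }
        return ar;
    }

    public static void main(String[] args) {
        Object[] ar = firstLoop(new String[] {"a", "b", "c"});
        System.out.println(ar.length + " " + ar[2]);   // 100 c
        try {
            firstLoop(new String[101]);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());        // check failed: a.length <= 100
        }
    }
}
```

Because the check sits before the loop, an oversized argument is rejected once, before any store, instead of being caught by a bounds check on the 101st iteration.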
First Loop
This section walks through most of the safety proof required for the first loop. In question 3, you will need to derive the invariant and proof for the second loop yourself.

For simplicity, we can view the loop between instructions 9 and 21 as:
    while i < a.length do
        ar[i] := a[i]     % safe array loads and stores
        i := i + 1
    end

To prove the safe_aaload in instruction 18 is safe, we need to show:
    VCGen (18 safe_aaload <i> <a>) == i >= 0 /\ i < a.length /\ VCGen (19)

(Hint: you should check that your answer to question 1 would produce this predicate.)

We assume (for now) VCGen (19) is true, and show i >= 0 /\ i < a.length.
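One way to see that this predicate falls out of the VCGen definition mechanically is a one-step unfolding written in Java. This is a hypothetical sketch: the Instr class, the string representation of predicates, and leaving VCGen (PC + 1) as a symbolic conjunct are our own choices, and the safe_aastore clause deliberately stays unimplemented because it is question 1. The conjuncts come out in the order of the VCGen skeleton (i < a.length first), which is equivalent to the predicate above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical one-step unfolding of the VCGen skeleton from question 1.
final class VCGenSketch {
    // An instruction with its stack operands, top of stack first (the <...> shorthand).
    static final class Instr {
        final String op;
        final List<String> operands;

        Instr(String op, String... operands) {
            this.op = op;
            this.operands = List.of(operands);
        }
    }

    // Unfolds VCGen (pc) once, leaving VCGen (pc + 1) as a symbolic conjunct.
    static String vcgen(Map<Integer, Instr> code, int pc) {
        Instr inst = code.get(pc);
        String rest = "VCGen (" + (pc + 1) + ")";
        switch (inst.op) {
            case "safe_aaload": {
                String i = inst.operands.get(0);   // top of stack: the index
                String a = inst.operands.get(1);   // next: the array reference
                return i + " < " + a + ".length /\\ " + i + " >= 0 /\\ " + rest;
            }
            case "safe_aastore":
                // Intentionally left open: this clause is question 1.
                throw new UnsupportedOperationException("safe_aastore clause is question 1");
            default:
                // All non-safe instructions are checked by the bytecode verifier.
                return rest;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Instr> code = new HashMap<>();
        code.put(18, new Instr("safe_aaload", "i", "a"));
        code.put(20, new Instr("iinc"));
        System.out.println(vcgen(code, 18));   // i < a.length /\ i >= 0 /\ VCGen (19)
        System.out.println(vcgen(code, 20));   // VCGen (21)
    }
}
```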
Since the invariant was provided by the untrustworthy code supplier, we cannot assume it is correct. Instead, we must prove the invariant holds. Then, we use the invariant to prove VCGen(18).
The axiomatic semantics rule for while, for partial correctness (since this is a safety proof, we don't care about showing termination), is:
    P => Inv,   Inv { Pred } => Inv,   Inv /\ Pred { Statement } Inv,   (Inv /\ ~Pred) => Q
    ___________________________________________________________________________________
                          P { while Pred do Statement end } Q

Inv is given by instruction 9:

    9 invariant aload_0.length <= 100 /\ iload_2 >= 0 /\ aload_1.length = 100

We can rewrite this as:

    a.length <= 100 /\ i >= 0 /\ ar.length = 100

P can be any predicate that we can prove from the code before the loop. The check clause gives a.length <= 100, instructions 0-5 give ar.length = 100, and instructions 6-7 give i = 0. This is argued informally, but could be shown using axiomatic semantics rules for assignment along with a specification of anewarray. This gives:
    P == a.length <= 100 /\ ar.length = 100 /\ i = 0

Q is what we need to be true after the loop. Since we won't know this until doing the second loop, we can start with the weakest possible post-condition, Q = true. In questions 3 and 4, you will find a stronger post-condition is needed, and change Q.
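Before the proof, the premises can be sanity-checked by brute force. The Java sketch below (the class and method names are our own) models only the parts of the state that the first loop's predicates mention: a.length, ar.length and i. It enumerates a small bounded range and counts violations of P => Inv, of Inv /\ Pred { Statement } Inv (the body leaves both lengths unchanged and increments i), of Inv /\ ~Pred => Q with Q = true, and of the VCGen (18) obligation. A count of zero over a bounded range is evidence, not a proof: it says nothing about states outside the range.

```java
// Hypothetical bounded sanity check (not a proof) of the while-rule premises for the first loop.
final class FirstLoopPremises {
    static boolean implies(boolean p, boolean q) {
        return !p || q;
    }

    // Inv (instruction 9): a.length <= 100 /\ i >= 0 /\ ar.length = 100
    static boolean inv(int aLen, int arLen, int i) {
        return aLen <= 100 && i >= 0 && arLen == 100;
    }

    // P: a.length <= 100 /\ ar.length = 100 /\ i = 0
    static boolean pre(int aLen, int arLen, int i) {
        return aLen <= 100 && arLen == 100 && i == 0;
    }

    // Q: the weakest post-condition, true (to be strengthened later)
    static boolean post(int aLen, int arLen, int i) {
        return true;
    }

    // Pred: i < a.length
    static boolean pred(int aLen, int i) {
        return i < aLen;
    }

    // Counts premise violations over a small bounded range of states.
    static int countCounterexamples() {
        int bad = 0;
        for (int aLen = 0; aLen <= 150; aLen++) {
            for (int arLen = 98; arLen <= 102; arLen++) {
                for (int i = -5; i <= 150; i++) {
                    boolean invHolds = inv(aLen, arLen, i);
                    boolean predHolds = pred(aLen, i);
                    if (!implies(pre(aLen, arLen, i), invHolds)) bad++;                  // P => Inv
                    if (!implies(invHolds && predHolds, inv(aLen, arLen, i + 1))) bad++;  // body: i := i + 1
                    if (!implies(invHolds && !predHolds, post(aLen, arLen, i))) bad++;   // Inv /\ ~Pred => Q
                    if (!implies(invHolds && predHolds, i >= 0 && i < aLen)) bad++;       // VCGen (18)
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("counterexamples: " + countCounterexamples());   // counterexamples: 0
    }
}
```

When Q is strengthened later, only post needs to change, and the same enumeration reports whether Inv /\ ~Pred still implies it on the sampled states.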
We prove each antecedent clause in turn:

- P => Inv

    a.length <= 100 /\ ar.length = 100 /\ i = 0 => a.length <= 100 /\ i >= 0 /\ ar.length = 100

  This is true since i = 0 => i >= 0 and all the other clauses match exactly.

- Inv { Pred } => Inv

  Trivially true, since Pred = i < a.length is side-effect free.

- Inv /\ Pred { Statement } Inv

  We need to show:

    (a.length <= 100 /\ i_0 >= 0 /\ ar.length = 100) /\ (i_0 < a.length) /\ i_0 = i
        { ar[i] := a[i]; i := i + 1 }
    a.length <= 100 /\ i >= 0 /\ ar.length = 100

  We push the second assignment back using the axiomatic semantics assignment rule:

    (a.length <= 100 /\ i_0 >= 0 /\ ar.length = 100) /\ (i_0 < a.length) /\ i_0 = i
        { ar[i] := a[i] }
    a.length <= 100 /\ (i_0 + 1) >= 0 /\ ar.length = 100

  The first assignment does not change the length of either array or the value of i, so we need to show:

    a.length <= 100 /\ i_0 >= 0 /\ ar.length = 100 /\ i_0 < a.length /\ i_0 = i
        ==> a.length <= 100 /\ (i_0 + 1) >= 0 /\ ar.length = 100

  This holds, since if i_0 >= 0 we know i_0 + 1 must also be >= 0.

- Inv /\ ~Pred => Q

  Since Q is true, this always holds.

Now, we can use the invariant to prove VCGen (18). At instruction 18, we can assume the invariant is true (because of the above proof, and nothing has been modified since the beginning of the loop), and i < a.length because of the loop predicate. We need to show this is enough to satisfy VCGen (18), assuming VCGen (19):

    a.length <= 100 /\ i >= 0 /\ ar.length = 100 /\ (i < a.length) ==> i >= 0 /\ i < a.length /\ VCGen (19)

This is trivially true.

2. (.10) Instruction 19
a. (.05) Predicate.
Show the verification predicate your VCGen generates for instruction 19: safe_aastore <a[i]> <i> <ar>.

b. (.05) Proof.

Show the proof that VCGen (19) is satisfied (assuming VCGen (20) is true). You may use everything that was used in the proof of VCGen (18) above.

Second Loop
For simplicity, we can view the loop between instructions 26 and 39 as:

    while i < a.length do
        s := s.concat ((String) ar[i])     % safe load
        i := i + 1
    end

You should construct your arguments at the same level of detail as the proof for the first loop above.

3. (.40) Safe Load
a. (.05) Verification Predicates.
Show the verification predicate for instruction 34, safe_aaload <i> <ar>.

b. (.15) Invariant.

Write out a loop invariant (missing from instruction 26) that will be sufficient to prove your verification predicate for instruction 34.

c. (.20) Proof.

Use the axiomatic semantics rule for while to prove VCGen (34) is true (assuming VCGen (35)). Your proof should follow the structure of the proof of VCGen (18): you should prove the invariant holds first, and then use the invariant to prove VCGen (34).

4. (.20) Safe Cast
The next-generation Tortilla Systems virtual machine adds an instruction safe_cast <type>. Unlike checkcast, which does (expensive) run-time checking to ensure the run-time type satisfies the cast constraint, safe_cast implies the type constraint can be verified statically. Our goal is to replace the checkcast in instruction 35 with:

    35 safe_cast <Class java.lang.String>

a. (.05) VCGen

Show the clause added to VCGen to handle safe_cast.

b. (.05)

Show VCGen (35), the verification predicate generated for the safe_cast version of instruction 35.

c. (.10)

Prove VCGen (35) holds for the second loop. You will need to strengthen the invariant, and assume a stronger pre-condition on entry to the loop.

In order to complete the proof, you would need to prove that the pre-condition you used for the second loop is true. This would involve strengthening the invariant for the first loop. (It is somewhat tedious to do this, so it is not recommended that you do so.)
5. (.20) Subtyping
Being oxygen-deprived at the top of the Eiffel Tower, Mertrude Bryer suggests adding the following typing judgments to Java:
               S <= T              (<= means "is a subtype of")
    ____________________________  [monotonic-arrays]
        array[S] <= array[T]

    P_1 <= Q_1, ..., P_n <= Q_n,  S <= T
    ____________________________________________________________________  [monotonic-procedures]
    proc (P_1, ..., P_n) returns (S) <= proc (Q_1, ..., Q_n) returns (T)

Show that an attacker could exploit these rules by passing an argument to Scrunch that leads to a type safety violation. This means it passes the Java type checker, but contains a type error that is not detected at run time.
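For comparison, the [monotonic-arrays] rule is not hypothetical: standard Java already accepts a String[] wherever an Object[] is expected. The short Java sketch below (the class name is our own) shows how today's JVM keeps that rule sound, by having aastore check the run-time type of every stored value and throw ArrayStoreException on a mismatch. This type check is performed by aastore in addition to the bounds check discussed earlier.

```java
// Hypothetical demonstration class: covariant arrays in standard Java.
final class CovariantArrays {
    // Returns true when the JVM rejects an Integer stored through an Object[] alias of a String[].
    static boolean storeIsRejected() {
        String[] strings = new String[1];
        Object[] objects = strings;                 // accepted statically: String[] <= Object[]
        try {
            objects[0] = Integer.valueOf(655);      // aastore checks the value's run-time type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayStoreException thrown: " + storeIsRejected());   // true
    }
}
```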
University of Virginia, Department of Computer Science, CS 655: Programming Languages
David Evans, evans@virginia.edu