cs205: Problem Set 2

Problem Set 2 - Comments
Procedural Abstraction and Using Abstract Datatypes

For the next question, consider these two specifications for the sort procedure:

A. From the Java 2 Platform API documentation (java.util.Arrays):

public static void sort(int[ ] a)

Sorts the specified array of ints into ascending numerical order. The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993). This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance.
Parameters:
a - the array to be sorted.

B. From the textbook (p.46):

public static void sort(int[ ] a)

MODIFIES: a
EFFECTS: Rearranges the elements of a into ascending order.
e.g., if a = [3, 1, 6, 1], a_post = [1, 1, 3, 6]

1. (10) Describe three advantages of specification B over specification A.

There are many advantages:

Specification B is shorter than Specification A.
Specification B is declarative, specification A is operational.
Specification B states clearly what may be modified by the method.
Specification B does not overconstrain the implementation. Specification A requires a particular sorting algorithm; if a better algorithm is discovered for a particular platform, it cannot be used if Specification A is used.

2. (5) Describe one scenario where specification A is more useful.

If the client needs to know the performance of the sort method, the operational information in specification A may be helpful. This would be the case, for example, if the client is using sort to sort a large array, and needs to know if it will perform adequately (on all Java platforms), or if it is necessary to implement a sort routine that has known performance properties.

3. (10) Write a complete, declarative specification of histogram. Your specification should be enough for a client to safely use the implementation provided above.

public static int [] histogram (int [] a)
   // REQUIRES: a is non-null and the values in a are non-negative.
   // EFFECTS:  Returns an array, result, where result[x] is the
   //    number of times x appears in a.  The result array has
   //    maxval(a) + 1 elements.  For example, 
   //      histogram ({1, 1, 2, 5}) = { 0, 2, 1, 0, 0, 1 }

Note that the REQUIRES is needed. The implementation provided would produce a run-time exception if its argument includes a negative value.

4. (10) Write an alternative specification for the histogram procedure that places less burden on the client by using exceptions. What are the advantages and disadvantages of this specification compared to your answer to question 3?

public static int [] histogram (int [] a) throws NegativeValue
   // EFFECTS:  If a contains any negative values, throws NegativeValue.
   //    If a is null, throws a NullPointerException.
   //    Otherwise, returns an array, result, where result[x] is the
   //    number of times x appears in a.  The result array has
   //    maxval(a) + 1 elements.  For example, 
   //      histogram ({1, 1, 2, 5}) = { 0, 2, 1, 0, 0, 1 }

(Note we need to create the NegativeValue exception class.)

The advantage of this specification is the behavior is predictable when an array containing a negative value is passed in. The disadvantage is it requires a more complex implementation.

5. (10) Write an alternative specification for the histogram procedure that is total. A client should be able to pass any array of integers into this procedure and obtain a meaningful return value. What are the advantages and disadvantages of this specification compared to your answers to questions 3 and 4?

A better solution requires changing the return value of histogram to represent the histogram in a more complex way. One possibility is to keep the int [] array return value, but start the indexes from the minimum value in a:

   public static int [] histogram (int [] a) 
   // EFFECTS:  If a is null, returns { }.  Otherwise, 
   //    returns an array, result, where result[minValue(a) + x] is the
   //    number of times x appears in a and minValue(a) is the lowest value
   //    in a.  The result array has maxValue(a) - minValue(a) + 1
   //    elements.  For example, 
   //      histogram ({1, 1, 2, 5}) = { 2, 1, 0, 0, 1 }
   //      histogram ({-2, 0, 1, -2}) = { 2, 0, 1, 1 }

This is a bit awkward for a client, however, since it needs to know minValue(a) to use the return value usefully.

Another option is to return a value table:

   public static java.util.HashMap histogram (int [] a)
   // EFFECTS: Returns a HashMap where the value associated with x in 
   //    the result is the number of times x appears in a.  That is,
   //      if result.containsKey (x)
   //         appearances of x = result.get (x)
   //      else
   //         appearances of x = 0.

This may be more useful for the client, but requires changing the return type to a data type that can represent a mapping (such as the java.util.HashMap used here.

6. (17 points for code, 8 points for tests) Impement the task schedule as described above. Your program should take a file name as input, and output a valid schedule and completion time for completing the first task in the file. You implementation should be total: no matter what input it is given, it should behave in a sensible way. You should use procedural abstraction to avoid code duplication and enhance the clarity of your implementation.

You should implement your program by creating a new TaskScheduler class (in the ps2 package) with a main method.

There are lots of possible implementations. Here is a solution that follows the structure we talked about in class ( http://www.cs.virginia.edu/cs205/ps/ps2/mine/TaskScheduler.java). Note that it avoids the problem of circular dependencies in a fairly kludgey way (using the depth and maxdepth parameters of scheduleTask to recognize when too many nested calls to scheduleTask have been made). There are many more elegant ways to handle this problem, but the provided DirectedGraph datatype is fairly inadequate (for example, without a remove operation, or a way of iterating through all nodes, we cannot implement a standard topological sort).


package ps2;

import java.util.Scanner;
import java.util.Vector;
import java.io.*;

public class TaskScheduler {
   static private void userError(String s) {
      System.out.println("Error: " + s);
      throw new RuntimeException ("Unable to continue");
   }

   static private Task lookupTask(String s, Set<Task> tasks)
   // EFFECTS: If s names a task in tasks, returns the task with the
   // matching name. Otherwise, returns null.
   {
      for (Task t : tasks) {
         if (t.getName().equals(s)) {
            return t;
         }
      }
      return null;
   }

   static private void scheduleTask(Task t, Vector<Task> schedule,
         DirectedGraph<Task> tgraph, int depth, int maxdepth)
   // REQUIRES: t is a task in tgraph, and schedule is a valid task schedule.
   //    A valid schedule is a schedule in which every
   //    task in schedule is preceded by the tasks on which it
   //    depends in tgraph.
   //    The value of depth is the number of nested calls to scheduleTask.
   // MODIFIES: schedule
   // EFFECTS: If depth is greater than maxdepth, issues an error message
   //    and exits.  Otherwise, schedule_post is a valid task schedule that
   //    includes task t, and all of the tasks in schedule_pre.
   {
      if (depth > maxdepth) {
         userError ("The input graph contains a cycle!");
      }
      
      if (schedule.contains(t)) {
         // Already scheduled, nothing to do.
      } else {
         Set<Task> depends;
         try {
            depends = tgraph.getAdjacent(t);
         } catch (NoNodeException e) {
            throw new RuntimeException("Requires violated, no node: " + e);
         }

         // Schedule dependent tasks
         for (Task dt : depends) {
            scheduleTask(dt, schedule, tgraph, depth + 1, maxdepth);
         }
         // All dependent tasks done, ready to do t
         schedule.add(t);
      }
   }

   public static void main(String[] args) {
      String maintask = null;
      Set<Task> tasks = new Set<Task>();
      DirectedGraph<Task> tgraph = new DirectedGraph<Task>();

      if (args.length < 1) {
         userError("Usage: the first parameter must be the name of a file.");
      }

      // Java compiler (unnecessarily) complains about possible
      // non-initialization without this.
      Scanner s = null;

      try {
         s = new Scanner(new File(args[0]));
      } catch (FileNotFoundException e) {
         userError("Unable to open file: " + e);
      }

      // Read task list file. Each line in the file describes a task.
      // entry ::= <name> <time> <depends>
      // depends ::= { <name>* }
      // The first entry is the main task (that must be completed).

      while (s.hasNextLine()) {
         // Read one line
         String line = s.nextLine();
         if (line.startsWith("#")) {
            // comment line, no tokens
         } else {
            Scanner linescanner = new Scanner(line);
            String name = linescanner.next();
            if (maintask == null) {
               maintask = name;
            }
            int time = linescanner.nextInt();
            String lbrace = linescanner.next();
            if (!lbrace.equals("{")) {
               userError ("Task line does not have dependency list (expected {):" 
                                      + line);
            }
            Vector<String> depends = new Vector<String>();
            boolean gotbrace = false;
            while (linescanner.hasNext()) {
               String dname = linescanner.next();
               if (dname.equals("}")) {
                  gotbrace = true;
                  break;
               }
               depends.add(dname);
            }
            if (!gotbrace) {
               userError ("Task line dependency list is missing closing brace:" 
                                      + line);
            }

            if (lookupTask(name, tasks) != null) {
               userError ("Duplicate task in input file: " + name);
            }

            Task t = new Task(name, time, depends
                  .toArray(new String[depends.size()]));

            tasks.add(t);
            try {
               tgraph.addNode(t);
            } catch (DuplicateException de) {
               throw new RuntimeException ("Unexpected exception: " + de);
            }
         }
      }

      if (maintask == null) {
         userError ("No main task: the input file must contain at least one task.");
      }
      
      for (Task t : tasks) {
         String[] dependents = t.getDependencies();
         for (String d : dependents) {
            Task dt = lookupTask(d, tasks);
            
            if (dt == null) {
               userError ("Dependent task for " + t.getName() + " not listed: " 
                                      + dt.getName());
            }
            try {
               tgraph.addEdge(t, dt);
            } catch (DuplicateException de) {
               userError ("Duplicate dependency for " + t.getName() + " on " 
                                      + dt.getName());
            } catch (NoNodeException nne) {
               System.err.println ("BUG: No node error: " + nne); 
            }   
         }   
      }

      Vector<Task> taskOrder = new Vector<Task>();
      scheduleTask(lookupTask(maintask, tasks), taskOrder, tgraph, 
                                0, tasks.size() + 1);
      int timerequired = 0;
      for (Task t : taskOrder) {
         timerequired += t.getTime();
      }

      System.out.println("Schedule: " + taskOrder);
      System.out.println("Completion time: " + timerequired);
   }
}

7. (10) Write a specification for your program. Your specification should be total: it should describe what the program does on all possible inputs.

It is quite difficult to write a clear and complete specification for the main method. The specification needs to describe the input file format and constraints on the contents of valid files. We write a total spec (no requires clause) that describes the behavior on all possible inputs (even though for some inputs the only sensible behavior is to report an error and exit). An alternative would be to use a REQUIRES clause to impose preconditions on the input, including the input file. This makes the implementation easier, but means the behavior on some inputs is unpredictable.

EFFECTS: If the command line arguments do not contain at least one element, and the first element is not the name of a readable file that satisfies the file format described below, the output is an error message. Otherwise, outputs a valid schedule for completing the first task listed in the input file. A valid schedule is a schedule in which: (1) only tasks that must be completed to complete to first task are included, and (2) every task is preceeded by the tasks on which it depends.
The input file is described by this (somewhat informal) grammar:
    Line ::= CommentLine | TaskLine
    CommentLine ::= # any characters up to and including the end of line
    TaskLine ::= Name Number { DependsList } end of line
    DependsList ::= 
    DependsList ::= Name DependsList
    Name ::= a sequence of non-space printable characters
    Number ::= a sequence of characters that can be parsed as an integer
Every task name that appears in a DependsList must also appear as the Name of a TaskLine. The same Name may not appear as the Name on more than on TaskLine; the same Name may not appear more than once in a DependsList. The file must describe an acyclic graph. That is, if a given task A depends on task B, there must be no path in the graph from B to A.

8. (16) Describe a testing strategy for an implementation of the task schedule program. Your answer should include a list of black-box test cases, and any additional glass-box test cases.

The black-box test cases should explore paths through the specification; the glass-box test cases should explore paths through the implementation.

There are many tests needed to try all paths through the specification. In particular, we should try:

Valid and invalid command lines — 0, 1 and more arguments (note that our specification implies that it is okay to have more than one argument, but doesn't do anything with the additional arguments)
An unreadable file
An empty file
Readable files that do not satisfy the file format. Lots of possibilities including any input that does not satisfy the provided grammar. Obvious possibilities include files with incomplete task lines (no Number, no {, no }) at the end and in the middle of the file.
Readable files that satisfy the file format but not the constraints. Obvious possibilities include having task names in depends lists but not in task lines, having cyclic dependencies, duplicate tasks, and duplicate tasks in the depends lists.
Valid inputs including tasks with deep dependency chains, inputs that include unnecessary tasks (not needed to complete the first task), inputs where there are no dependent tasks for the first task.

For glass box testing, we consider any paths through the code that do not appear to be tested by our black box test cases. The main issue this turns up is from the code for scheduleTask — we should be sure to include some test cases where the true brach of the schedule.contains(t) predicate would be taken. For example,

    main 10 { a b }
    a 10 { c } 
    b 10 { c }
    c 5 { }

Note that the implementation contains infinitely many possible paths, so we cannot possible test them all. But, the black box and glass box tests described about should at least cover all lines in the program.

9. (10) How confident are you that your program will always work as intended? (Where "as intended" means as you specified it in question 7, except if there are inputs that are not covered by your specification it must behave as the course staff intended.) Express your answer as a bet of between 0 and 20 points. If the customer (grader) agrees that your program always works as intended, you get the points. If not, you lose twice your bet. You should assume that your program will run with implementations of the provided datatypes that satisfy their specifications, but not necessarily the same implementations as were provided.

I don't know of any inputs for which my code fails, but I'd be very surprised if it is completely correct. But, I'll a gambler, so I'll bet 20 points. If you find an input for which my code behaves badly, you get the 20 points.

Here are the tests I ran (for people who bet more than 0 points for question 9):

No command line arguments
engine.tsl — the example
missingbrace.tsl — file contains a task line with a missing close brace
duplicates.tsl — file contains duplicate tasks with the same name
dupdepends.tsl — file lists the same task twice in a dependencies list
circular.tsl — file contains circular dependencies
unused.tsl — file contains unnecessary tasks (not needed to complete first task)