Problem Set 3 - Comments
Implementing Data Abstractions
1. (8.375 average out of 10) What abstraction function and rep invariant are needed
to make the implementation of degree above satisfy its specification?
This is the implementation of degree:
public int degree () {
// EFFECTS: Returns the degree of this, i.e., the largest exponent
// with a non-zero coefficient. Returns 0 if this is the zero Poly.
return terms.lastElement ().power;
}
We need these properties to ensure the code will execute without any
possible run-time exceptions:
- terms != null (otherwise terms.lastElement () could throw a
NullPointerException)
- terms.size () > 0 (otherwise lastElement () could throw a
NoSuchElementException)
- terms does not contain null (otherwise .power could throw a
NullPointerException)
This is not enough to know the implementation meets its
specification. We also need to know that the last element in the vector
corresponds to the term with the largest exponent with a non-zero
coefficient. A sufficient rep invariant would state that no element in
the terms vector has a power higher than that of the last
element and a non-zero coefficient:
terms[terms.size - 1].coeff != 0
/\ for all 0 <= i < terms.size - 1:
     (terms[i].power < terms[terms.size - 1].power \/ terms[i].coeff == 0)
A more reasonable rep invariant would require that the terms
are sorted by their power value, and that all the terms have non-zero
coefficients (except for representing the zero poly):
(terms.size == 1 && terms[0].power == 0 && terms[0].coeff == 0)
\/ for all 0 <= i < terms.size:
     terms[i].power >= 0
     terms[i].coeff != 0
     for all 0 <= j < i, terms[j].power <= terms[i].power
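As a rough illustration, the sorted rep invariant above could be checked by a repOk-style method. This is only a sketch: the TermRecord constructor, the PolyRepCheck class, and passing terms as a parameter are assumptions made to keep the example self-contained, not part of the problem set code.

```java
import java.util.Vector;

// Hypothetical stand-in for the term record used by the Poly rep.
class TermRecord {
    int coeff;
    int power;

    TermRecord(int coeff, int power) {
        this.coeff = coeff;
        this.power = power;
    }
}

class PolyRepCheck {
    // Returns true iff terms satisfies the sorted rep invariant sketched above.
    static boolean repOk(Vector<TermRecord> terms) {
        if (terms == null || terms.isEmpty()) return false;
        for (TermRecord t : terms) {
            if (t == null) return false;
        }
        // The zero Poly is represented by the single term 0x^0.
        if (terms.size() == 1 && terms.get(0).power == 0 && terms.get(0).coeff == 0) {
            return true;
        }
        for (int i = 0; i < terms.size(); i++) {
            TermRecord t = terms.get(i);
            if (t.power < 0 || t.coeff == 0) return false;
            // Comparing neighbors is enough to establish non-decreasing order.
            if (i > 0 && terms.get(i - 1).power > t.power) return false;
        }
        return true;
    }

    // degree as in the handout; only meaningful when repOk(terms) holds.
    static int degree(Vector<TermRecord> terms) {
        return terms.lastElement().power;
    }
}
```

When repOk holds, lastElement () cannot throw, and the last term carries the largest power.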
In an alternate implementation with the same rep (that is, the
abstraction function and rep invariant may be different), suppose this
is the implementation of coeff:
public int coeff (int d) {
// EFFECTS: Returns the coefficient of the term of this whose
// exponent is d.
int res = 0;
for (TermRecord r : terms) {
if (r.power == d) { res += r.coeff; }
}
return res;
}
2. (8.625 / 10) What rep invariant would make the
implementation of coeff above correctly satisfy
its specification?
Note that the loop sums the values of the coefficients for every record
with its power matching d. This indicates that the author of
this code is not assuming a given power can only exist in the terms
once. Hence, the rep invariant could be:
- terms != null (otherwise iterating over terms could throw a
NullPointerException)
- terms does not contain null (otherwise r.power could throw a
NullPointerException)
Nothing else is needed to be convinced the implementation is correct.
3. (9.125 / 10) Explain how a stronger rep invariant would
make it possible to implement coeff more efficiently.
A stronger rep invariant would make it possible to implement
coeff more efficiently. If the implementer of coeff
could rely on the terms not containing duplicate powers, then we could
implement coeff to return right away after finding the first
matching power, instead of having to look through all the terms.
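A minimal sketch of that early-return version follows. It assumes the stronger rep invariant (no two elements of terms share a power); the TermRecord constructor, the CoeffSketch class, and passing terms as a parameter are hypothetical conveniences for a self-contained example.

```java
import java.util.Vector;

// Hypothetical stand-in for the term record used by the Poly rep.
class TermRecord {
    int coeff;
    int power;

    TermRecord(int coeff, int power) {
        this.coeff = coeff;
        this.power = power;
    }
}

class CoeffSketch {
    // Assumes the stronger rep invariant: no two elements of terms share a power.
    static int coeff(Vector<TermRecord> terms, int d) {
        for (TermRecord r : terms) {
            if (r.power == d) {
                return r.coeff; // at most one match exists, so stop here
            }
        }
        return 0; // no term with exponent d
    }
}
```

The worst case still visits every term (when d is absent), but a match lets the loop stop early.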
If we made the rep invariant even stronger:
terms[i].power = i for 0 <= i < terms.size
(that is, the vector contains a term record for every power in order) we could
implement coeff with just:
public int coeff (int d) {
// Powers outside the stored range have coefficient 0.
if (d < 0 || d >= terms.size ()) return 0;
else return terms.elementAt (d).coeff;
}
Note that this rep invariant does not work well for sparse polys (e.g.,
representing 3x^2034 now requires a terms vector with 2035 elements).
4. For each representation choice (a, b, and c), provide an abstraction
function and rep invariant.
a. (7.6 / 10)
Vector<String> nodes;
boolean [][] edges;
The most obvious interpretation of this representation is that the
boolean value at edges[i][j] determines if there is an edge from
nodes.elementAt(i) to nodes.elementAt(j). This means the nodes vector
must not contain duplicates (otherwise there could be different edge
values for the same node pairs). It also means the size of the edges
array must be at least as big as the nodes vector in both directions.
It would be reasonable to require it to be equal in size instead, but
that would be overly restrictive since it would require copying the
edges matrix into a new, bigger matrix every time a node is added.
Abstraction function:
Nodes = { nodes[i] | 0 <= i < nodes.size () }
Edges = { { nodes[a], nodes[b] } |
forall 0 <= a, b < nodes.size()
where edges[a][b] = true }
Rep Invariant:
nodes != null; edges != null
no duplicates in nodes
edges.length >= nodes.size ()
forall 0 <= i < edges.length: edges[i].length = edges.length
The last two clauses state that edges is a square matrix,
at least as big as nodes (but possibly bigger).
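One way to make this rep invariant concrete is a repOk-style check. The sketch below is illustrative only: the class name GraphARep and the initial capacity of 4 are assumptions, and it also rejects null rows, which a Java boolean[][] permits.

```java
import java.util.HashSet;
import java.util.Vector;

class GraphARep {
    Vector<String> nodes = new Vector<String>();
    boolean[][] edges = new boolean[4][4]; // arbitrary initial capacity

    // Returns true iff the rep invariant for representation A holds.
    boolean repOk() {
        if (nodes == null || edges == null) return false;
        // No duplicates in nodes: a set built from nodes has the same size.
        if (new HashSet<String>(nodes).size() != nodes.size()) return false;
        // edges has at least one row (and column) per node.
        if (edges.length < nodes.size()) return false;
        // edges is square: every row is as long as the number of rows.
        for (boolean[] row : edges) {
            if (row == null || row.length != edges.length) return false;
        }
        return true;
    }
}
```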
b. (4.0 / 5)
Set<String> nodes;
Set<Edge> edges;
where Edge is a record type containing two String values:
class Edge {
String a, b;
Edge (String p_a, String p_b) { a = p_a; b = p_b; }
}
This one maps very naturally onto the abstract notation. The only
important invariant is that all the names of nodes in the Edge
objects in edges match names of nodes in nodes:
Abstraction Function:
Nodes = nodes
Edges = edges
Rep Invariant:
nodes != null
edges != null
for all e in edges, e.a and e.b are in nodes
c. (3.5 / 5)
Set<NodeRecord> rep;
where NodeRecord is a record type that records a String and an
associated set of Strings:
class NodeRecord {
String key;
Set<String> values;
}
Here, the abstraction function is more complicated since we need to
extract the nodes and edges from the NodeRecord objects:
Abstraction Function:
Nodes = { el.key | el is an element in rep }
Edges = { { el.key, value } | el is an element
in rep and value is an element in el.values }
or, more precisely
Edges = {}
for (NodeRecord r: rep) {
for (String val: r.values) {
Edges = Edges U { { r.key, val } }
}
}
Rep Invariant:
rep != null
elements of rep are not null
for every element e of rep, every element of e.values equals
f.key for some element f of rep
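The abstraction function for representation C can be sketched as two methods that compute the abstract Nodes and Edges from a rep. The NodeRecord constructor, the GraphCAbstraction class, and rendering each edge as the string "key->value" are all assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical version of NodeRecord with a constructor added.
class NodeRecord {
    String key;
    Set<String> values;

    NodeRecord(String key, Set<String> values) {
        this.key = key;
        this.values = values;
    }
}

class GraphCAbstraction {
    // Nodes = { el.key | el is an element in rep }
    static Set<String> nodes(Set<NodeRecord> rep) {
        Set<String> result = new TreeSet<String>();
        for (NodeRecord r : rep) {
            result.add(r.key);
        }
        return result;
    }

    // Edges = one entry per (r.key, val) pair, written here as "key->val".
    static Set<String> edges(Set<NodeRecord> rep) {
        Set<String> result = new TreeSet<String>();
        for (NodeRecord r : rep) {
            for (String val : r.values) {
                result.add(r.key + "->" + val);
            }
        }
        return result;
    }
}
```

A toString method for this representation would need essentially the same nested loop, which is why it ends up more complex than for the other two reps.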
5. (7.375 / 10) Which representation choice would make implementing
addNode most difficult? Explain why.
Representation A. To add a node, we need to not only add it to the
nodes vector, but also may need to expand the size of the edges matrix
to preserve the rep invariant. For both representations B and C, adding
a node can be done by just adding a new element to a set.
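To show the extra work representation A requires, here is a hedged sketch of addNode that grows the matrix when it is full. The class name, the doubling policy, the initial capacity of 2, and returning false for a duplicate (instead of throwing DuplicateException) are assumptions, not the attached implementation.

```java
import java.util.Vector;

class GraphAAddNode {
    Vector<String> nodes = new Vector<String>();
    boolean[][] edges = new boolean[2][2]; // arbitrary initial capacity

    // Adds name as a node; returns false if it is already present.
    boolean addNode(String name) {
        if (nodes.contains(name)) return false;
        if (nodes.size() == edges.length) {
            grow(); // no spare row/column left, so the matrix must be copied
        }
        nodes.add(name);
        return true;
    }

    // Replaces edges with a larger square matrix holding the same edge values.
    private void grow() {
        int newSize = Math.max(1, edges.length * 2);
        boolean[][] bigger = new boolean[newSize][newSize];
        for (int i = 0; i < edges.length; i++) {
            System.arraycopy(edges[i], 0, bigger[i], 0, edges[i].length);
        }
        edges = bigger;
    }
}
```

Doubling keeps the copying cost low on average, but any single addNode call may still copy the whole matrix.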
6. (4.375 / 5) Which representation choice would enable the most efficient
getAdjacent implementation? Explain why.
The only reasonable answer to this question is "it depends". It depends
on two things:
- What is meant by efficient?
- What types of graphs are we dealing with?
If efficient means the asymptotic running time of getAdjacent, then we
need to consider how the work scales with the size of the graph. (If you
have not yet taken CS150 or CS216, this explanation probably won't make
much sense. Don't worry about this, but it is included since CS150
graduates should be thinking this way.)
For representation A, we need to find the node in the nodes vector.
With the given rep invariant, which does not impose any ordering on the
strings in the vector, this requires up to one comparison with every
string in the vector, so it is Θ(n) where n is the number of nodes.
Then, we need to look through one row of the edges matrix to find the
edges. For each true value, we find the corresponding element in nodes
(this is constant time), and add the corresponding string to the result
(also constant time, given a good Set implementation). Only the first n
entries of the row correspond to nodes, so this operation is also Θ(n).
Performing two Θ(n) operations in sequence is also Θ(n), so the total
running time for representation A scales linearly with the number of
nodes in the graph.
For representation B, we need to go through the elements of the edges
set to find all edges that start with the parameter node. This requires
e iterations where e is the number of edges in the graph, so it is Θ(e).
Note that the number of edges in a graph can scale as the square of the
number of nodes (if every node is connected to all nodes), so in the
worst case this is worse than the Θ(n) for representation A.
For representation C, we need to find the node record element of the
rep with a key matching the parameter, and then return a copy of the set
of values associated with that node. (Note that it must be a copy,
otherwise the rep is exposed.) This requires Θ(n) iterations to look
through the set elements, and then Θ(n) work to copy the values (the
maximum number of edges for a given node is n). So, it is equivalent to
representation A, Θ(n). Thus, for asymptotic running time, the most
efficient representations are A and C.
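The two linear passes for representation A can be seen in a short sketch of getAdjacent. It is written as a static method over the two rep fields to stay self-contained, and it throws IllegalArgumentException as a stand-in for NoNodeException; both choices are assumptions.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;

class GraphAAdjacent {
    static Set<String> getAdjacent(Vector<String> nodes, boolean[][] edges, String s) {
        // First pass: a linear search of the unordered vector, up to n comparisons.
        int row = nodes.indexOf(s);
        if (row < 0) {
            throw new IllegalArgumentException("no such node: " + s);
        }
        // Second pass: scan the first n entries of one row of the matrix.
        Set<String> result = new HashSet<String>();
        for (int col = 0; col < nodes.size(); col++) {
            if (edges[row][col]) {
                result.add(nodes.elementAt(col)); // constant-time lookup by index
            }
        }
        return result;
    }
}
```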
If efficient means the actual running time on a particular Java
implementation, then it is hard to know without knowing details of the
underlying Vector and Set implementations.
If efficient means minimal memory usage, then they are all
equivalent (all need to create the Set to return), unless we allow the
rep to be exposed. If rep exposure is allowed (that is, we modify the
spec for getAdjacent to require that the caller may not modify
the result or use it after the graph is modified), then C can be
implemented most efficiently by just returning the corresponding
values Set.
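The difference between copying and exposing the rep can be illustrated with a single record's values set. The class and method names below are hypothetical; the sketch only shows why the exposed version needs the weaker spec.

```java
import java.util.HashSet;
import java.util.Set;

class AdjacentRecordSketch {
    // Stands in for the values set of one NodeRecord in representation C.
    private final Set<String> values = new HashSet<String>();

    void addValue(String v) {
        values.add(v);
    }

    int size() {
        return values.size();
    }

    // Safe: allocates a new set, so callers cannot touch the rep.
    Set<String> getAdjacentCopy() {
        return new HashSet<String>(values);
    }

    // No allocation, but the caller now holds a reference into the rep.
    Set<String> getAdjacentExposed() {
        return values;
    }
}
```

Adding an element to the result of getAdjacentCopy leaves the record unchanged, while the same call on the result of getAdjacentExposed silently changes the graph.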
The other part of the efficiency question depends on what types of
graphs we are representing. If the graphs are very dense (the number of
edges is scaling as the square of the number of nodes), then the first
representation may be best since it represents the edges with a fixed
size matrix. If the graphs are sparse (there is a large number of
nodes, but most nodes are just connected to a few other nodes), then we
are better off with either B or C.
7. (16.75 / 20) Implement the StringGraph datatype specified above.
You may use any of the datatypes from PS2 you want except the
DirectedGraph datatype (note that the specification suggests
using the NoNodeException, DuplicateException, and
Set provided datatypes). Your implementation may use any
representation you want (including the ones described above, but not
limited to those choices). Your implementation should clearly document
its abstraction function and rep invariant.
My implementations using the three different reps from question 4 are
attached and available in
http://www.cs.virginia.edu/cs205/ps/ps3/ps3-mine.zip.
Note that I made StringGraph into an interface, so I can have the three
different implementations StringGraphA, StringGraphB, and StringGraphC
implement that interface. This makes them all subtypes of StringGraph,
so they can be used interchangeably (as they are in the test code).
Implementation B is the shortest, but C is probably the simplest except
for the complex toString method (which is related to the complex
abstraction function needed).
8. (8.875 / 10) Describe a testing strategy for your StringGraph
datatype. Include all the code you developed for testing in your
answer.
We can develop most of the testing strategy independent of the
implementation (that is, black box testing). We should try all
operations on an empty graph, a graph with some nodes and no edges, and
a graph with many nodes and edges. We should try inputs that cover all
paths through the method specifications:
- addNode — add a duplicate node, add a non-duplicate
node
- addEdge — add an edge where s is not a node, add an
edge where t is not a node, add an edge where both s and t are not
nodes, add a duplicate edge, add a valid edge
- getAdjacent — s is not a node, s is a node. We
should also try calls that should produce an empty set and a set with
several elements.
- toString — test on the empty and full graphs
See the provided code for my tests:
TestGraph.java
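For readers without the attached file, the black box cases listed above might look roughly like the following. This is not TestGraph.java: TinyGraph is a hypothetical minimal graph included only so the checks can run, and unchecked exceptions stand in for NoNodeException and DuplicateException.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal directed graph, only so the checks below can run.
class TinyGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<String, Set<String>>();

    void addNode(String s) {
        if (adjacency.containsKey(s)) throw new IllegalStateException("duplicate node " + s);
        adjacency.put(s, new HashSet<String>());
    }

    void addEdge(String s, String t) {
        if (!adjacency.containsKey(s) || !adjacency.containsKey(t)) {
            throw new IllegalArgumentException("no such node");
        }
        if (!adjacency.get(s).add(t)) throw new IllegalStateException("duplicate edge");
    }

    Set<String> getAdjacent(String s) {
        if (!adjacency.containsKey(s)) throw new IllegalArgumentException("no such node " + s);
        return new HashSet<String>(adjacency.get(s)); // copy, so the rep is not exposed
    }
}

class TinyGraphChecks {
    // Returns true iff running action throws a RuntimeException.
    static boolean fails(Runnable action) {
        try {
            action.run();
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    // Returns true iff every black box check passes.
    static boolean runAll() {
        final TinyGraph g = new TinyGraph();
        boolean ok = fails(() -> g.getAdjacent("a")); // empty graph, s is not a node
        g.addNode("a");                                // non-duplicate nodes
        g.addNode("b");
        ok &= fails(() -> g.addNode("a"));             // duplicate node
        ok &= g.getAdjacent("a").isEmpty();            // nodes but no edges
        ok &= fails(() -> g.addEdge("x", "a"));        // s is not a node
        ok &= fails(() -> g.addEdge("a", "y"));        // t is not a node
        ok &= fails(() -> g.addEdge("x", "y"));        // neither is a node
        g.addEdge("a", "b");                           // valid edge
        ok &= fails(() -> g.addEdge("a", "b"));        // duplicate edge
        g.addNode("c");
        g.addEdge("a", "c");
        Set<String> adj = g.getAdjacent("a");          // result with several elements
        ok &= adj.size() == 2 && adj.contains("b") && adj.contains("c");
        return ok;
    }
}
```

Because the checks only use the public operations, the same cases could be run against any of the three representations.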
9. (8 / 10) Consider adding a removeNode method to the
StringGraph datatype that removes a node from a graph. Write a
declarative specification for the removeNode method. Consider
carefully what should happen with the edges of the graph when a node is
removed, and make sure your specification is total.
public void removeNode(String s) throws NoNodeException
// MODIFIES: this
// EFFECTS: If s is not a node in this, throw NoNodeException.
// Otherwise, remove s from the nodes of this, and remove
// all edges from the edges of this where either endpoint
// of the edge matches s.
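One possible implementation of this specification, sketched over a map-based rep that resembles representation C. The class name, the helper methods, and the use of IllegalArgumentException in place of NoNodeException are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RemoveNodeSketch {
    // Hypothetical rep: each node maps to the set of nodes it has edges to.
    private final Map<String, Set<String>> adjacency = new HashMap<String, Set<String>>();

    void addNode(String s) {
        adjacency.putIfAbsent(s, new HashSet<String>());
    }

    // Assumes both s and t have already been added as nodes.
    void addEdge(String s, String t) {
        adjacency.get(s).add(t);
    }

    boolean hasNode(String s) {
        return adjacency.containsKey(s);
    }

    boolean hasEdge(String s, String t) {
        return adjacency.containsKey(s) && adjacency.get(s).contains(t);
    }

    void removeNode(String s) {
        // Removing the entry drops s and every edge that starts at s.
        if (adjacency.remove(s) == null) {
            throw new IllegalArgumentException("no such node: " + s);
        }
        // Every remaining node must also drop its edge ending at s, if any.
        for (Set<String> targets : adjacency.values()) {
            targets.remove(s);
        }
    }
}
```

The second loop is what keeps the rep invariant intact: without it, some values set would name a node that no longer exists.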
10. (9.5 / 10) For this question you have a choice, either do choice 1 or
choice 2:
- Choice 1: Implement the removeNode method you specified in
question 9.
- Choice 2: Implement the generic DirectedGraph datatype (as
specified in Problem Set 2) that generalizes the node type to be any
object type instead of String.
Unsurprisingly (and disappointingly), no one chose choice 2 even though
it is much easier (but requires figuring out a few new things on your
own, or from the examples already provided). To implement a generic
DirectedGraph datatype, all you would need to do is add <T> to the class
declaration, and replace String with T at appropriate
places in the implementation.
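A rough sketch of what that change looks like is below. It is not the PS2 DirectedGraph: the class name, the map-based rep, and the boolean return values (instead of the PS2 exceptions) are assumptions chosen to keep the example short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// The type parameter T replaces String everywhere a node is mentioned.
class GenericGraphSketch<T> {
    private final Map<T, Set<T>> adjacency = new HashMap<T, Set<T>>();

    // Returns false if node is already in the graph.
    boolean addNode(T node) {
        if (adjacency.containsKey(node)) return false;
        adjacency.put(node, new HashSet<T>());
        return true;
    }

    // Returns false if either endpoint is missing or the edge already exists.
    boolean addEdge(T from, T to) {
        if (!adjacency.containsKey(from) || !adjacency.containsKey(to)) return false;
        return adjacency.get(from).add(to);
    }

    // Returns a copy of the nodes adjacent to node (empty if node is unknown).
    Set<T> getAdjacent(T node) {
        Set<T> targets = adjacency.get(node);
        return targets == null ? new HashSet<T>() : new HashSet<T>(targets);
    }
}
```

The same class then serves as a graph of strings (GenericGraphSketch<String>) or of any other object type, such as GenericGraphSketch<Integer>.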