CS200: Exam 2 Comments


Exam 2 Comments

Sticker Interpretation

If you have a sticker on your returned exam, it can be interpreted as a lower bound (Ω) on your grade in the course (as long as you do a satisfactory job completing PS8) as described below. There is no upper bound (O) on anyone's grade for the course yet, except as constrained by the registrar (O(A+)).

Enormous Shiny Gold Smiley Face: Congratulations! You've done enough to convince me you have learned to think like a computer scientist and you deserve an "A" in the course. You are free to do the final if you want, but you are not required to. Taking the final cannot hurt your grade. As long as you do a satisfactory job on PS8 you will get no worse than an "A" in the course.
Big Shiny Multi-Colored Star: You've done well in the course, but haven't quite convinced me yet that you can think like a computer scientist. You have earned at least a "B+" in the course, but should take advantage of PS8 and the Final to convince me you deserve an "A".

If you did not receive a sticker, you still have an opportunity to get an "A" in the course, but will need to do an exceptional job on the Final. The Final will have a mix of basic concept understanding questions (which you need to answer well to convince me you deserve a B or better in the course) and deeper thinking questions (which will give you a last chance to convince me you have learned to think like a computer scientist).

Objects and Environments

1. For Problem Set 6, Question 4, two common answers were given:

Answer A:

(define (make-professor name)
  (let ((super (make-lecturer name)))
    (lambda (message)
      (if (eq? message 'profess)
          (lambda (self message)
            (ask self 'say (list "It is intuitively obvious that"))
            (ask self 'lecture message))
          (get-method super message)))))

Answer B:

(define (make-professor name)
  (let ((super (make-lecturer name)))
    (lambda (message)
      (if (eq? message 'profess)
          (lambda (self message)
            (ask self 'say (list "It is intuitively obvious that"))
            (ask self 'say stuff)
            (ask self 'say (list "you should be taking notes")))
          (get-method super message)))))

Explain which answer is better and why.

Average: 9.6 out of 10
Answer A is better for these reasons:

Since it uses the lecture method of its superclass, if the lecture behavior changes, the profess behavior changes also.
It is shorter and avoids code duplication by using inheritance.
It is correct — answer B uses stuff which is not defined. (This was an unintended mistake on my part, but is a valid answer.)
Answer B is better for these reasons:

Since it does not use the lecture method of its superclass, if the lecture behavior changes, the profess behavior does not change. (Whether this is good or bad depends on whether we want profess to mean say "It is intuitively obvious that" and then lecture, or something else that happens to be the same as the current method for lecturing.)
It may be slightly more efficient. (It avoids the overhead of the extra method call to lecture.)
Overall Answer A is better for most uses, but you could make an argument for Answer B (except for the unintended mistake using stuff).
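The difference between the two answers appears as soon as something changes how lecture behaves. Below is a minimal Python sketch of PS6-style message-passing objects, with Python closures standing in for the Scheme lambdas. The names make_speaker, make_lecturer, and with_terse_lecture are hypothetical stand-ins; the real PS6 classes are not reproduced here. With the default lecture, both professors say the same thing. A subclass that changes lecture changes Answer A's profess but leaves Answer B's untouched:

```python
def ask(obj, message, *args):
    """Look up the method for message and apply it with obj itself as self."""
    method = obj(message)
    if method is None:
        raise AttributeError(f"no method for {message!r}")
    return method(obj, *args)


def make_speaker(log):
    # Hypothetical base class: "say" appends one line to log.
    def dispatch(message):
        if message == "say":
            return lambda self, words: log.append(" ".join(words))
        return None
    return dispatch


def make_lecturer(log):
    # Hypothetical stand-in for the PS6 lecturer.
    super_obj = make_speaker(log)
    def lecture(self, stuff):
        ask(self, "say", stuff)
        ask(self, "say", ["you should be taking notes"])
    def dispatch(message):
        return lecture if message == "lecture" else super_obj(message)
    return dispatch


def make_professor_a(log):
    super_obj = make_lecturer(log)
    def profess(self, stuff):
        ask(self, "say", ["It is intuitively obvious that"])
        ask(self, "lecture", stuff)        # reuses whatever "lecture" self has
    def dispatch(message):
        return profess if message == "profess" else super_obj(message)
    return dispatch


def make_professor_b(log):
    super_obj = make_lecturer(log)
    def profess(self, stuff):
        ask(self, "say", ["It is intuitively obvious that"])
        ask(self, "say", stuff)            # duplicates the body of lecture
        ask(self, "say", ["you should be taking notes"])
    def dispatch(message):
        return profess if message == "profess" else super_obj(message)
    return dispatch


def with_terse_lecture(make_prof, log):
    # Hypothetical subclass that changes only how "lecture" behaves.
    super_obj = make_prof(log)
    def lecture(self, stuff):
        ask(self, "say", stuff)
    def dispatch(message):
        return lecture if message == "lecture" else super_obj(message)
    return dispatch


same_a, same_b = [], []
ask(make_professor_a(same_a), "profess", ["P", "is", "in", "NP"])
ask(make_professor_b(same_b), "profess", ["P", "is", "in", "NP"])
# same_a == same_b: identical output with the default lecture

terse_a, terse_b = [], []
ask(with_terse_lecture(make_professor_a, terse_a), "profess", ["P", "is", "in", "NP"])
ask(with_terse_lecture(make_professor_b, terse_b), "profess", ["P", "is", "in", "NP"])
# terse_a == ["It is intuitively obvious that", "P is in NP"]
# terse_b still ends with "you should be taking notes"
```

Whether that difference is a feature or a bug is exactly the judgment call discussed above.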

2. Consider the following definitions taken from the PS6 code (without modification except for removing some methods that are not needed for this question):

(definitions removed, see original exam for details)

Provide the simplest possible expression that could have been used in the definition of mystery to produce the given environment diagram shown on the next page (removed from comments, see original exam).

Average: 6.7 / 10
(define mystery ((make-person "Tom") 'location))
Many students had a hard time with this one and answered just (make-person "Tom"). The first thing you should notice from the diagram is that mystery points to a procedure with parameter self and body location. So, mystery is a procedure (lambda (self) location). The result of an application of make-person is a procedure (lambda (message) (case message (...))), so mystery must be something else. The second thing to notice is that the enclosing environment is a frame that binds message to 'location. Hence, we must have applied a procedure that has a parameter message to the input 'location. That would be the result of a make-person application. Looking up to the next frame, we see the let variables for make-person, and above that, the frame in which name is bound to "Tom". This corresponds to applying the make-person procedure to "Tom", since its parameter is name.
Note that it is not quite correct to have
(define mystery (ask (make-person "Tom") 'location))
since the application of ask would have created an additional frame which is not shown in the diagram.

3. How many frames would be created when (make-mobile-object "bike") is evaluated? (Explain why each frame is generated. Recall that let is syntactic sugar for ((lambda (...) )).)

Average: 8.1 / 10
A frame is created whenever a compound procedure is applied, so evaluating (make-mobile-object "bike") creates five frames:

Evaluating (make-mobile-object "bike") creates a frame with name bound to "bike". (Frame 1)
Evaluating the body of make-mobile-object:

The let is syntactic sugar for an application, so a new frame is created with a place for super. (Frame 2)
Evaluating (make-physical-object name) (to get the value for super):

Creates a new frame containing name for the application. (Frame 3)
Creates a new frame containing super and location for the let application. (Frame 4)
Evaluating (make-object name) creates a new frame containing name for the make-object application. (Frame 5)

Computability

4. Is the repeats-configuration problem defined below decidable or undecidable? For credit, your answer must include a convincing argument to support your answer.

Input: A description of a Turing Machine and input tape.
Output: If executing the Turing Machine on the given input would ever repeat a machine configuration (the state of the finite state machine and the contents of the tape), output true. Otherwise, output false. That is, the output should be true if and only if executing the Turing Machine on the given input would encounter the same exact machine configuration more than once.

Average: 7.7 / 10
The repeats-configuration problem is undecidable.
Before explaining why, we'll discuss the common attempts to show this and why they are unconvincing:

Since the tape is infinite, you cannot check if two machine configurations are identical in a finite time, so there is no algorithm to solve repeats-configuration. Although this sounds reasonable, there are two flaws in this argument. (1) The tape is infinite, but only a finite number of squares contain symbols, as demarcated by the end-of-tape symbols. Hence, checking whether two machine configurations are equivalent only involves examining a finite number of symbols on the tape. (2) Even if the tape could contain infinitely many symbols, that does not prove repeats-configuration is undecidable. One cannot assume it is necessary to simulate execution of the input TM and examine the entire machine configuration to know that a configuration never repeats. Simulating the input TM, keeping track of all machine configurations, and checking if one ever repeats seems like the most straightforward way to implement repeats-configuration. That does not mean there could not be a different way (one that doesn't involve simulating the input TM).
If we simulate the input TM and a machine configuration repeats, that means the input TM never halts. Thus, an algorithm that solves repeats-configuration could be used to solve halts? like this:
(define (halts? P I) (not (repeats-configuration? P I))). This is a good attempt, but there is one major hole in the argument: the definition of halts? is not correct. It is true that if a machine configuration repeats in evaluating P on I, we know it will never halt. Turing machines are deterministic, so once a configuration repeats, the machine must do exactly what it did last time, repeating all the steps that lead back to the repeated configuration forever. But it is not true that if (repeats-configuration? P I) is false, (P I) must halt. For example, consider a simple program that in state 1 always writes a "*" on the tape, moves Right, and returns to state 1. This program never halts, but it never repeats a machine configuration either, since it keeps writing more "*"s on the tape forever.
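To see the counterexample concretely, here is a Python sketch of a bounded simulator. It records full machine configurations: the FSM state, the head position, and the non-blank tape contents. The transition-table format, the blank symbol "_", and the step bound are assumptions of this sketch. It is not a decider for repeats-configuration: a result of "unknown" only means the machine neither halted nor repeated within the bound.

```python
BLANK = "_"


def run(transitions, start, tape_input, max_steps):
    """Simulate a one-tape TM for at most max_steps steps.

    transitions maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (Left) or +1 (Right); a missing entry means the TM halts.
    Returns "halt", "repeat" (a full configuration recurred), or "unknown".
    """
    tape = {i: s for i, s in enumerate(tape_input) if s != BLANK}
    state, head = start, 0
    seen = set()
    for _ in range(max_steps):
        # A configuration is the FSM state, head position, and non-blank tape.
        config = (state, head, tuple(sorted(tape.items())))
        if config in seen:
            return "repeat"
        seen.add(config)
        key = (state, tape.get(head, BLANK))
        if key not in transitions:
            return "halt"
        state, write, move = transitions[key]
        if write == BLANK:
            tape.pop(head, None)
        else:
            tape[head] = write
        head += move
    return "unknown"


# In state 1, always write "*", move Right, and stay in state 1.
writer = {(1, BLANK): (1, "*", +1)}
# Bounce between two squares without writing anything.
bouncer = {(1, BLANK): (2, BLANK, +1), (2, BLANK): (1, BLANK, -1)}

print(run(writer, 1, "", 1_000))   # unknown: never halts, never repeats
print(run(bouncer, 1, "", 1_000))  # repeat: so it can never halt
```

The writer machine is the program described above. Raising max_steps never turns its "unknown" into "repeat", which is why (not (repeats-configuration? P I)) cannot stand in for halts?.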
A correct argument needs to show how an algorithm that solves repeats-configuration would allow us to define an algorithm that solves the halting problem (halts?). Here's how:

For simplicity of description, assume we have two infinite tapes, T1 and T2. Note that this does not increase the power of the TM, since we could represent two tapes with one tape that is folded over on itself.
Modify the input program P so that every transition rule also writes a "*" on T2 and moves the T2 tape head to the right. This way, the configuration of T2 never repeats during the execution of P.
Replace all transitions to the Halt state with transitions to a new state that rewrites T1 (the original input tape) to contain the original input I, erases all the "*"s on T2, returns both tape heads to their starting squares, and then transitions to the original Start state. After this is done, the machine configuration is exactly the original starting configuration.
So, the modified program P' will repeat a machine configuration if the original program P halts. But it will never repeat a machine configuration if P does not halt, even if P itself would repeat a configuration, since the "*"s on T2 prevent P' from repeating. Hence, we can now correctly define halts?: (define (halts? P I) (repeats-configuration? (modify-program P I) I)) where modify-program performs the transformations described above. Since halts? is undecidable, repeats-configuration must be undecidable too.

5. Is the repeats-fsm-state problem defined below decidable or undecidable? For credit, your answer must include a convincing argument to support your answer.

Input: A description of a Turing Machine and input tape.
Output: If executing the Turing Machine on the given input would ever repeat a finite state machine state, output true. Otherwise, output false. That is, the output should be true if and only if executing the Turing Machine on the given input would enter the same finite state machine state more than once.

Average: 8.9 / 10
The repeats-fsm-state problem is decidable.
The important difference here is that we only care about whether the FSM state repeats, even if the contents of the tape are different. There are a finite number of FSM states, so this is decidable. Here is an algorithm that decides it:

Simulate the input TM, P for up to s + 1 steps, where s is the number of states in P. Each time P would enter a new state, record the state it enters.
If P halts before repeating a state, output false. Otherwise, output true. We know we can output true after s + 1 steps, since if P has not halted after s + 1 steps, it must have repeated a state — there are only s states to use.
This is an application of the pigeonhole principle. See this Car Talk Puzzler (http://www.cartalk.com/content/puzzler/transcripts/200404/index.html) and its answer for a similar problem.
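Here is a minimal Python sketch of this decider. It assumes a Turing Machine is given as a transition table mapping (state, symbol) to (new state, symbol to write, head move), with a missing entry meaning the machine halts. It also assumes the start state counts as the first state entered. The table format and the name repeats_fsm_state are illustrative assumptions:

```python
BLANK = "_"


def repeats_fsm_state(transitions, start, tape_input):
    """Decide whether running the TM ever enters the same FSM state twice.

    transitions maps (state, symbol) -> (new_state, symbol_to_write, move);
    a missing entry means the TM halts. The start state counts as entered.
    """
    states = {start} | {q for q, _ in transitions} | {t[0] for t in transitions.values()}
    s = len(states)
    tape = dict(enumerate(tape_input))
    state, head = start, 0
    visited = {start}
    # Pigeonhole: with only s states, a repeat must occur within s steps,
    # so simulating s + 1 steps (as in the answer above) is always enough.
    for _ in range(s + 1):
        key = (state, tape.get(head, BLANK))
        if key not in transitions:
            return False          # halted before repeating any state
        state, write, move = transitions[key]
        tape[head] = write
        head += move
        if state in visited:
            return True
        visited.add(state)
    return True                   # not reached, by the pigeonhole argument


writer = {(1, BLANK): (1, "*", +1)}                               # loops in state 1
chain = {(1, BLANK): (2, BLANK, +1), (2, BLANK): (3, BLANK, +1)}  # 1 -> 2 -> 3, then halts
print(repeats_fsm_state(writer, 1, ""))  # True
print(repeats_fsm_state(chain, 1, ""))   # False
```

The one-state machine that keeps writing "*"s repeats an FSM state on its first step, even though it never repeats a full machine configuration. That is exactly the difference between questions 4 and 5.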

Building Web Communities

6. Bob Metcow is upset that members of his community are not interacting as usefully as he wants, since there are not enough acquaintance links in his community. He decides to implement a program that will add acquaintance links between all pairs of members in his community, and adds this throw-mixer.php program to his website:

    $result = executeQuery ("SELECT id FROM users");
    for ($i = 0; $i < mysql_num_rows ($result); $i++) {
        for ($j = 0; $j < mysql_num_rows ($result); $j++) {
           if ($i != $j) {
               $exists = executeQuery ("SELECT fromid FROM acquaints 
                                        WHERE fromid='$i' AND toid='$j'");
               if (mysql_num_rows ($exists) == 0) {
                 executeQuery ("INSERT INTO acquaints (fromid, toid)
                                VALUES ('$i', '$j')");
               }
           }
        }
    }

How much work is throw-mixer.php? (Assume executeQuery is defined as in PS7; you may assume it is Θ(r) where r is the number of rows in the table resulting from the query. You should clearly document all other assumptions you make and define the meaning of all variables you use in your result.)

Average: 8.9 / 10
Θ(n²) where n is the number of community members (entries in the users table).
The first query produces a table of size n (and takes Θ(n) time to execute). Then, the outer loop executes once for each entry in the result table: n iterations. Within the outer loop, the inner loop ($j) executes n times. So, the body inside the two loops executes n² times. The amount of work for each iteration is constant: it involves at most two queries, each of which involves Θ(1) work. The SELECT inside the loop returns at most one entry (assuming the acquaints table has no duplicate links), so it is Θ(1), and the INSERT returns no rows. The total work is Θ(n) + n² · Θ(1) = Θ(n²).
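A small Python model makes the count concrete. It mirrors the two nested loops and charges each query max(1, r) units, where r is the number of result rows. Two assumptions belong to this sketch only: the max(1, ·) floor, so that a query returning no rows still costs something, and an acquaints table that starts empty. The name throw_mixer_work is hypothetical:

```python
def throw_mixer_work(n):
    """Units of work for throw-mixer.php on n users, starting with no acquaintance links.

    Assumes executeQuery costs max(1, r) units, where r is the number of rows it returns.
    """
    work = n                                  # SELECT id FROM users: n rows
    links = set()                             # the acquaints table
    for i in range(n):                        # outer loop: n iterations
        for j in range(n):                    # inner loop: n iterations each
            if i != j:
                rows = 1 if (i, j) in links else 0
                work += max(1, rows)          # SELECT fromid ... returns at most 1 row
                if rows == 0:
                    links.add((i, j))
                    work += 1                 # INSERT returns no rows
    return work


for n in (10, 20, 40):
    print(n, throw_mixer_work(n))             # 190, 780, 3160
```

Under these assumptions throw_mixer_work(n) works out to n + 2n(n - 1), so doubling n roughly quadruples the work, as Θ(n²) predicts.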
Note: Bob Metcow is not related to Bob Metcalfe. Metcalfe invented Ethernet at Xerox PARC; Ethernet is the networking protocol nearly all machines use for local networks today, including those in the ITC labs. His Harvard PhD thesis was originally rejected for not being "theoretical enough", and he went on to found 3Com. Metcalfe's Law states that the value of a communication system is Θ(n²) where n is the number of users on the system. (Note that this is more conservative than Reed's law, which claims that the value of a community scales as Θ(2ⁿ), which may explain why http://www.thefacebook.com/ is more useful than your PS7 site.)

7. Alyssa P. Hacker is upset to discover that the PS7 community code cannot support communities with more than 1000 members (see PS7, question #5 comments). Because of her tremendous computer science understanding, she has many thousands of friends. So, she reimplements the Dijkstra's algorithm used in find-links-process to use a priority queue. This reduces the work required for findmin to Θ(log n) and the overall work of find-links-process to Θ(n log n + m), where n is the number of community members and m is the total number of links.
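The priority-queue idea can be sketched in Python with the standard heapq module. This is not the PS7 find-links-process code, which is not shown here. The adjacency-dictionary graph format, the member names, and the function name shortest_distances are assumptions for illustration. This binary-heap version with lazy deletion does O((n + m) log n) work in the worst case. Reaching the Θ(n log n + m) bound stated above requires a heap with constant amortized-time decrease-key, such as a Fibonacci heap.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm using a binary heap as the priority queue.

    graph maps each member to a list of (neighbor, link_weight) pairs.
    Returns a dict mapping each reachable member to its distance from source.
    """
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)            # findmin and remove: O(log n)
        if u in done:
            continue                          # stale entry; u was already finalized
        done.add(u)
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))  # lazy decrease-key
    return dist


links = {
    "alyssa": [("ben", 1), ("cy", 4)],
    "ben": [("cy", 1), ("dee", 5)],
    "cy": [("dee", 1)],
}
print(shortest_distances(links, "alyssa"))
# {'alyssa': 0, 'ben': 1, 'cy': 2, 'dee': 3}
```

Pushing a new entry instead of decreasing a key in place keeps the code short, at the cost of leaving stale entries in the heap that are skipped when popped.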

She conducts experiments with communities containing 200 and 400 members and obtains these measurements:

Community Size	Maximum Query Response Time
200	1.5 seconds
400	3.5 seconds

Note: The original exam used a table that was inconsistent with the text (100 and 200 members instead of 200 and 400).

Estimate how big a community Alyssa's new implementation can support. (As in PS7, assume that the only constraint on community size is that the maximum query response time cannot exceed 60 seconds. State clearly any additional assumptions you need to make.)

Average: 8.6 / 10
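About 5000 members (slightly fewer, since 5000 members gives about 62 seconds). This assumes the maximum query response time is proportional to n log n, that is, the m term does not dominate (for example, because the number of links grows no faster than n log n). The base of the logarithm does not matter, since it only changes the constant w, which we calibrate from the 400-member measurement. The predicted 1.55 seconds for 200 members is close to the measured 1.5 seconds, which supports this model. The easiest way to answer this is just some trial-and-error expression evaluations: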
> (define (nlogn n) (* n (log n)))
> (define w (/ 3.5 (nlogn 400)))
> (* w (nlogn 200))
1.547544063485211
> (* w (nlogn 1000))
10.08816095227816
> (* w (nlogn 10000))
134.50881269704217
> (* w (nlogn 5000))
62.19300793565135
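The trial-and-error search above can be automated. Here is a minimal Python sketch that keeps the same assumption: response time is proportional to n log n, calibrated from the 400-member measurement. It binary-searches for the largest community whose predicted time stays within 60 seconds. The name max_supported is a hypothetical helper name:

```python
import math


def nlogn(n):
    return n * math.log(n)


# Calibrate from the 400-member measurement (3.5 seconds), assuming
# maximum response time is proportional to n log n.
w = 3.5 / nlogn(400)


def max_supported(limit=60.0):
    """Largest community size n with w * n log n <= limit (binary search)."""
    lo, hi = 2, 2
    while w * nlogn(hi) <= limit:             # grow hi until it exceeds the limit
        hi *= 2
    while hi - lo > 1:                        # invariant: lo fits, hi does not
        mid = (lo + hi) // 2
        if w * nlogn(mid) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


print(round(w * nlogn(200), 2))  # predicted time for 200 members: 1.55
print(max_supported())
```

Under these assumptions the search lands a little under 5000 members, consistent with the estimate above.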

8. Because of their expertise in building dynamic web sites, Alyssa and her friends rapidly acquire new friends. Assume each member of Alyssa's community invites one new member into the community every year. Since those members also learn to build dynamic web sites, after one year each new community member also invites a new member into the community, and continues to do so every year. Thus, the size of Alyssa's community doubles every year. If Alyssa's community starts with 1000 members in 2004, for how many years can her community grow before the maximum query response time exceeds 60 seconds? Alyssa's web site runs on ITC's web server, which improves consistently with Moore's Law (that is, assume its processing power and memory double every 18 months). (Note: you may find it helpful to write a short program to solve this problem. You may use any language you want. Include all the code you wrote with your answer on an attached sheet.)

Average: 8.1 / 10

About 6 years. The maximum query response time is about 48 seconds after 5 years and about 65 seconds after 6 years, so the 60-second limit is crossed between years 5 and 6. Here are the definitions from my interactions buffer (some of the code is taken from Lecture 19; nlogn and w are defined in the answer to question 7). Since processing power doubles every 18 months, it grows by a factor of about 2^(2/3) ≈ 1.587 per year, approximated as 1.586 below:

> (define (computing-power years)
    (if (= years 0) 1 (* 1.586 (computing-power (- years 1)))))
> (define (time-to-solve n years)
    (* w (/ (nlogn n) (computing-power years))))
> (define (communitysize init years)
    (if (= years 0) init
        (communitysize (* 2 init) (- years 1))))
> (define nmembers 1000)
> (time-to-solve (communitysize nmembers 4) 4)
35.74975753242224
> (time-to-solve (communitysize nmembers 5) 5)
48.30966931405207
> (time-to-solve (communitysize nmembers 6) 6)
64.9907656681721
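The same model can be written in Python to find both the last whole year that stays within the limit and, by bisection, roughly when the limit is crossed. This sketch reuses the question 7 calibration and the 1.586 yearly growth factor from the transcript. The function names are hypothetical:

```python
import math


def nlogn(n):
    return n * math.log(n)


W = 3.5 / nlogn(400)          # seconds per unit of n log n, from question 7
GROWTH = 1.586                # yearly speedup; 2 ** (2 / 3) is about 1.587


def response_time(years, initial=1000):
    """Predicted maximum query response time after `years` years."""
    members = initial * 2 ** years            # community doubles every year
    return W * nlogn(members) / GROWTH ** years


def last_good_year(limit=60.0):
    """Last whole year whose predicted response time is within limit."""
    year = 0
    while response_time(year + 1) <= limit:
        year += 1
    return year


def crossing_year(limit=60.0, lo=0.0, hi=20.0):
    """Fractional year at which the limit is first exceeded (bisection)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if response_time(mid) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


print(round(response_time(5), 1), round(response_time(6), 1))  # 48.3 65.0
print(last_good_year())                                         # 5
print(crossing_year())
```

Bisection is valid here because the predicted time grows steadily with the year: the community doubles each year while the server speeds up by less than a factor of two.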

Lambda Calculus

In Lecture 33, we saw the following definitions:

T ≡ λ x . (λ y . x)
F ≡ λ xy . y
if ≡ λ pca . pca

cons ≡ λ xy . (λ z . zxy)
car ≡ λ p . p T
cdr ≡ λ p . p F

null ≡ λ p . T
null? ≡ λ x . (x λ y . λ z . F)

zero? ≡ null?
pred ≡ cdr
succ ≡ λ x . cons F x

0 ≡ null

Suppose we change the definition of succ to be:

succ ≡ λ x . cons T x

9. Show pred (succ 0) reduces to 0 with the new definition. Make sure the reduction steps you perform are clear in your answer.

Average: 8.4
   pred (succ 0)
→ cdr (succ 0)    substituting definition of pred
→ (λ p . p F) (succ 0)    substituting definition of cdr
→_β (succ 0) F    reducing outer lambda (succ 0) binds to p
→ ((λ x . cons T x) 0) F    substituting definition of succ
→_β (cons T 0) F    reducing outer lambda 0 binds with x
→ ((λ xy . (λ z . zxy)) T 0) F    substituting definition of cons
→_β ((λ y . (λ z . z T y)) 0) F    reducing outer lambda T binds to x
→_β ((λ z . z T 0)) F    reducing outer lambda 0 binds to y
→_β F T 0    reducing outer lambda, F binds to z
→_β 0    reducing twice, using the meaning of F (it selects its second argument)
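The encodings can also be checked mechanically by writing them as curried Python lambdas, one per definition above. The names is_null, is_zero, succ_new, and zero are Python spellings of null?, zero?, the modified succ, and 0. This is only a sanity check of the reduction, not a substitute for it. Python evaluates arguments eagerly, which is harmless for these terms:

```python
# Church-style encodings from Lecture 33, written as curried Python lambdas.
T = lambda x: lambda y: x                      # T = λ x . (λ y . x)
F = lambda x: lambda y: y                      # F = λ xy . y
cons = lambda x: lambda y: lambda z: z(x)(y)   # cons = λ xy . (λ z . zxy)
car = lambda p: p(T)
cdr = lambda p: p(F)

null = lambda p: T                             # null = λ p . T
is_null = lambda x: x(lambda y: lambda z: F)   # null? = λ x . (x λ y . λ z . F)

is_zero = is_null                              # zero? = null?
pred = cdr                                     # pred = cdr
zero = null                                    # 0 = null
succ_new = lambda x: cons(T)(x)                # modified succ = λ x . cons T x

one = succ_new(zero)
print(pred(one) is zero)        # True: pred (succ 0) gives back 0 itself
print(is_zero(zero) is T)       # True
print(is_zero(one) is F)        # True: succ 0 is still not zero
```

With the new definition, car of a successor is T instead of F. Neither zero? nor pred ever looks at car, though, so numbers still behave correctly.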

10. A popular Computer Science textbook defines Computer Science as:

Computer Science is the study of algorithms including:

Their formal and mathematical properties
Their hardware realizations
Their linguistic realizations
Their applications

Is this a good definition? Explain why or why not. (Hint: you may safely assume that the person grading your answer to this question believes every question on this exam is about Computer Science.)

Average: 8.7
This definition is from G. Michael Schneider and Judith L. Gersting's Invitation to Computer Science. Since the definition uses algorithms, we need a definition of algorithms also. The one they use is similar to the one we used in class:
algorithm — a well-ordered collection of unambiguous and effectively computable operations that, when executed, produces a result and halts in a finite amount of time.
If we interpret this strictly, it would mean that questions 4 and 8 do not belong on this exam. Question 4 asks whether or not an algorithm exists to solve a problem (but does not require any consideration of an actual algorithm to answer it), and question 8 asks a question about a process that could go on forever. In the definition's defense, however, it would be reasonable to include the study of whether or not algorithms exist for a given problem in the study of algorithms.
The other limitation of the definition of algorithm is that it assumes a single entity is carrying out the execution. Computer Science also studies protocols, in which multiple independent entities follow their own rules. For example, the SSL protocol we covered in Lecture 36 is not an algorithm, since it depends on multiple entities following their own steps to achieve a result.
Another way to consider this question is to look at the specific issues in the list. This is problematic since the definition uses "including", so the list need not be complete. Thus, it is a bad definition if understanding the "including ..." part changes the meaning. A good definition would allow you to determine whether or not something meets the definition. An including list only allows you to know that something meets the definition; it does not allow you to say something does not meet it. On the other hand, if this is a Computer Science course and their definition is good, anything important enough to be listed in their including list should be covered by this course. In fact, there are questions on this exam about all four topics:

Their formal and mathematical properties: Questions 4 and 5 ask whether or not an algorithm exists to solve some problem; Question 6 asks about a property of an algorithm.
Their hardware realizations: Question 8 asks about a property of an algorithm implemented on ITC's web server. (But we did not cover much about hardware realizations in this class.) Questions 2 and 3 ask about the machine state while an algorithm is interpreted.
Their linguistic realizations: Questions 1, 2, 3 and 9 ask about linguistic descriptions of algorithms.
Their applications: Questions 7 and 8 ask about using an algorithm to build a community.
So, I think it is a pretty good definition, but not a great one. A better definition would use procedure instead of algorithm (both to use a simpler word, and to make it clear that we include procedures that do not always terminate within the scope of computer science), and would not have included the including part (at least as part of the definition).