Most machines, from the simplest hammers to the most sophisticated appliances, have a control (whatever directs them), data (whatever they operate on), and a modified environment (whatever they change), and these three are distinct from one another.
A general-purpose computing machine is a special machine where the control, data, and modified environment are all the same. This allows one to have the computer modify its own instructions, expanding the capability of the finite set of core operations to perform arbitrarily complex work. General-purpose computing machines have become so pervasive that they are now commonly called simply “computers,” a term formerly reserved for a human occupation.
The first known general-purpose computer to be designed was Charles Babbage’s Analytical Engine, designed and partially constructed in the 1830s. Although Babbage designed it, it was Ada King-Noel, Countess of Lovelace, who first realized it was general-purpose and published the first algorithm for it, thus making Babbage the first computer engineer and Lovelace the first computer scientist. Babbage’s machine was never completed and no subsequent work built on Babbage and Lovelace’s pioneering efforts.
In 1933 Kurt Gödel and Jacques Herbrand defined a class of general recursive functions; in 1936 Alonzo Church expanded on this to define the lambda calculus; and later in 1936 Alan Turing defined an abstract notion of a universal computing machine, later known as the Turing Machine. In his 1939 Ph.D. dissertation Turing (with his Ph.D. advisor Alonzo Church) proposed that these three are all equivalent, and that no more computationally-powerful machine can exist. Known as the Church-Turing thesis, the postulate that no more powerful machine can exist cannot be proven but is generally acknowledged as true. Of the three models, Turing’s had the same advantage Babbage’s had had 100 years earlier: it described a machine that could plausibly be constructed. It was not realistic enough to actually be constructed, however, until decades later, when people started creating Turing machines as historical novelties.
Although many people were involved in the development of functional universal computing machines1, the clearest progenitor to how most computers are designed is John von Neumann, as described in the 1945 publication First Draft of a Report on the EDVAC. This model is the subject of our next section.
The von Neumann architecture consists of the following parts:
The memory unit, which acts like an array or list of numbers. Given an address (an index into the array) it gives back the number stored there; given an address and a number, it stores the number at that address.
The control unit, which has several parts: a program counter (PC) that holds the address of the next instruction, an instruction register (IR) that holds the instruction currently being worked on, and the logic that decides what to do based on the IR.
The arithmetic logic unit (usually abbreviated as ALU), consisting of the circuitry that actually performs maths and logic operations such as addition and comparison.
Ability to communicate with external devices, such as disks, keyboards, screens, etc.
Write the basic von Neumann architecture simulator in the programming language of your choice. You should include:
- A list or array of 256 `0`s as a representation of `memory`, in such a way you can easily change this to have different specific values inside it.
- A list or array of 4 `0`s as a representation of the program `registers`.
- Two variables, `pc` and `ir`, to store the contents of the program counter and instruction register, respectively.
- A `newPC` function that returns the value the `pc` should have next based on the current `pc` and `ir`; you can just return `-1` for now.
- A `work` function that uses the contents of the `ir` to decide what to do at each step; you can have it do nothing at all for now.
- A `cycle` function that does the following:
  1. puts `newPC()` into `pc`
  2. puts `memory[pc]` into `ir`
  3. runs `work()`
- A `run` function that repeatedly
  1. prints the `pc`
  2. runs `cycle`
  3. prints the `ir` and registers
  stopping once the `pc` is negative or larger than `memory` can access.

If you run the `run` function you should see something like
pc = 0
ir = 0; r0=0, r1=0, r2=0, r3=0
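For instance, a minimal sketch of this exercise in Python (assuming Python as the language of choice; the names follow the description above):

```python
memory = [0] * 256        # 256 bytes of memory; change entries here to load a program
registers = [0] * 4       # four program registers, r0 through r3
pc = 0                    # program counter
ir = 0                    # instruction register

def newPC():
    return -1             # placeholder: later this will depend on pc and ir

def work():
    pass                  # placeholder: later this will decode ir and act on it

def cycle():
    global pc, ir
    pc = newPC()          # decide which instruction is next
    ir = memory[pc]       # fetch it
    work()                # act on it

def run():
    while 0 <= pc < len(memory):
        print(f"pc = {pc}")
        cycle()
        print(f"ir = {ir}; " + ", ".join(f"r{i}={v}" for i, v in enumerate(registers)))

run()
```

(Note that in Python `memory[-1]` silently reads the last element rather than failing; since that element is 0 and the loop stops immediately afterward, the output still matches the example above.)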
We’ll come back to this basic simulator and flesh it out in subsequent sections.
The core idea of a general-purpose computer is that the contents of memory describe the actions to take. But memory just stores bytes, and there is no intrinsic right set of things to do on these bytes. Various computational theorists have proven that even very limited sets of operations can be “complete” in the sense that they can be used to emulate everything else, but the general model is to have a range of actions that let programmers easily make the computer do what they wish.
The set of instructions a computer provides, together with how those instructions are encoded, defines the computer’s Instruction Set Architecture or ISA.
The most common three categories of actions are moves, maths, and jumps.
One of the pillars of the imperative programming paradigm2 is the assignment operator `=`. All it does is take a value which exists in one place and put it into another place, but as you have already seen, that operation allows a great deal of flexibility.
When building a machine, making circuitry that is as general as programming’s `=` is not trivially possible. Recall that our `work()` function must decide what to do based only on information it can see, generally meaning the number stored in `ir`. `register[0] = register[1]` is going to have to have a different number for it than is `register[2] = register[7]`; and both will be different from `register[3] = memory[131]` or `register[3] = memory[register[0]]`.

That said, once we enumerate all of the moves we think are important, we can make them all happen in `work()` with a long but simple `if`-statement, or equivalently with a large mux in hardware.
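As a sketch of what that `if`-statement might look like (the instruction numbers here are made up, and eight registers are assumed only to mirror the `register[7]` example above):

```python
memory = [0] * 256
registers = [0] * 8       # eight registers, to match the register[7] example above
ir = 1                    # pretend we just fetched instruction number 1

def work():
    if ir == 1:
        registers[0] = registers[1]
    elif ir == 2:
        registers[2] = registers[7]
    elif ir == 3:
        registers[3] = memory[131]
    elif ir == 4:
        registers[3] = memory[registers[0]]
    # ...one branch for every move the ISA chooses to support

work()
```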
After `=`, the next building block of most imperative programming languages is binary operators. These are big networks of gates: `+` we’ve already looked at, and `-` is very similar; `*` is basically one `+` for each bit in the input numbers; and `/` and `%`, while quite a bit more complicated still, can still be handled with a lot of gates.

There are other operators we’ll need too, like `>=` and its friends, and boolean operators as well, but in the end they’re all just a bunch of gates.
To make the machine use them, though, we have an additional complexity. If there were a lot of ways to combine a source and destination for moves, there are even more ways to combine two sources and a destination for maths. Even using just 8 registers, there are 8³ = 512 possible combinations of `register[?] = register[?] + register[?]`, and more when we get to operands in memory or literals.
That said, the `work()` for maths is not dramatically different from the `work()` for moves; there is a lot more circuitry in `if (ir == 12345): register[2] = register[3] * register[4]` than in `if (ir == 12345): register[2] = register[3]`, but the logic needed to make it work is still basically a big mux.
While moves and maths have direct correlation to components of common programming languages, jumps are somewhat less direct.
The concept of a jump is a program instruction that specifies the new `pc` as something other than the next instruction. Jumps enable loops and conditionals, and also, by combining those, all the other kinds of control constructs: `while` loops are loops with a conditional redirection, function calls and returns are (mostly just) jumps to and from the function code, etc.
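For instance (as an illustration in the numbered-step style used later on this page, not any particular ISA’s instructions), a loop that repeats a body of work can be written with a backward jump:

1. do the body of the loop
2. if we are not done yet, jump back to step 1

and an if/else with forward jumps:

1. if the condition is false, jump to step 4
2. do the “then” part
3. jump to step 5
4. do the “else” part
5. continue with the code after the if/else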
There is no `work()` in most jumps; instead, they add a mux to the `newPC()` function.
There is no one “right” way to encode instructions into binary, but it is common to do this by having particular sets of bits with particular meanings. For example, we can specify one of 4 registers using 2 bits; one of 8 operations using 3 bits; etc. Thus, a simple way to create an instruction encoding is to figure out all the information any action will need, figure out how many bits we need to encode that, and then reserve a set of bits in the instruction for that information.
Suppose we want the following operations:

- move a literal value into a register
- jump to an address if a register is non-zero
- move a value from memory into a register, and from a register into memory
- `a = a op b`, where `op` is one of `+`, `-`, `*`, and `<`, and `a` and `b` are registers

That’s eight operations, so we need 3 bits to encode which one is which. We also need 4 bits to encode up to 2 registers and 8 more to encode a literal value, for a total of 15 bits. 15 bits fits into 2 bytes, so we can make our encoding be the following:
Look up the high-order 3 bits of the first byte in the following table:

bits | meaning |
---|---|
000 | move a literal into a register |
001 | jump to address if register non-zero |
010 | move from memory to register |
011 | move from register to memory |
100 | add one register to another |
101 | subtract one register from another |
110 | multiply one register by another |
111 | set one register to 1 if it is less than another, 0 otherwise |
The low-order 4 bits of the first byte give two registers; if the action changes a register, that is given by the higher-order two bits; if a register is not changed, it is given in the lower-order two bits.
The second byte is the literal value (for `000`) or address (for `001`).
This is of course not the only encoding we could make, but it does contain all the needed information.
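For instance, a small helper (my own, not part of the exercise) that builds a two-byte instruction under this encoding, following the high-3-bits/low-4-bits layout described above. (The worked example below chooses to fill the leftover eighth bit with a 1; this helper leaves it as 0.)

```python
def encode(opcode, reg_changed=0, reg_read=0, second_byte=0):
    """opcode is 3 bits; the two register numbers each take 2 of the low 4 bits."""
    first_byte = (opcode << 5) | (reg_changed << 2) | reg_read
    return first_byte, second_byte

for name, inst in [
    ("r0 = 3",               encode(0b000, reg_changed=0, second_byte=3)),
    ("r0 = r0 * r0",         encode(0b110, reg_changed=0, reg_read=0)),
    ("jump to 4 if r2 != 0", encode(0b001, reg_read=2, second_byte=4)),
]:
    print(name, "->", " ".join(f"{b:#04x}" for b in inst))
```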
Using this encoding, to encode
x = 3
while x < 50
x *= x
we’d first translate that into steps our example ISA has access to:

1. `r0 = 3`
2. `r1 = 50`
3. `r0 = r0 × r0`
4. `r2 = r0`; since we don’t have this operation, we’ll do it as `r2 = 0` followed by `r2 += r0`
5. `r2 = 1 if r2 < r1 else 0`
6. `if (r2 ≠ 0) jump to step 3`
and then encode those instructions into binary (using `_` for unspecified bits):
000 00__ 00000011
000 01__ 00110010
110 0000
000 1000 00000000
100 1000
111 1001
001 10__ ????????
Finally we have to decide where in memory we are putting this so we know what value to put into the `????????` jump address. Let’s put it at address 0, so the index of the multiply instruction is 4:
001___10 00000100
Now we assume all `_` are 0, prepend a 1 bit to each instruction, and turn that into hex so we can put it into a simulator, giving the following contents of memory:
0x80,0x03, 0x84,0x32, 0xE0, 0x88,0x00, 0xC8, 0xF9, 0x98,0x04
Create an encoding for the following code that computes `r1 = r2 / r3` by implementing the following:
r1 = 0
while r1 * r2 < r3
r1 += 1
We can change the loop to a jump by doing

1. `r1 = -1`
2. `r1 += 1`
3. if `r1 * r2 < r3`, go to step 2 again

and reduce that to the few operations we have available to us.
Encode the above sequence of instructions into bytes. Remember to use the two’s complement encoding of −1.
Modify your simulator from the last exercise to implement the example instruction set encoding listed above. In particular,
- In `cycle()` we’ll still have `ir` become `memory[pc]` (the first byte only) for simplicity.
- `newPC` will generally return `pc + 2` since instructions are 2 bytes long; but if the top three bits of `ir` are `001` (a jump) and the register indicated by the low-order two bits of `ir` is not 0, it will return the second byte of the instruction (i.e., `memory[pc+1]`) instead.
- `work()` will need to consider the high-order three bits of `ir` and act as per the table in the example above.

Verify that your simulator runs the two example programs above. The code from the example should put 81 (0x51) into `r0`; the code from the exercise should put `r2 / r3` into `r1`.
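A possible sketch of the modified `newPC` and `work`, continuing the Python sketch from the first exercise above (the field-splitting is mine; the two memory-move opcodes are left as an exercise, since the description above does not pin down where their address comes from):

```python
def newPC():
    opcode = ir >> 5                          # high-order 3 bits of the instruction
    if opcode == 0b001 and registers[ir & 0b11] != 0:
        return memory[pc + 1]                 # taken jump: target is the second byte
    return pc + 2                             # otherwise step over this 2-byte instruction

def work():
    opcode = ir >> 5
    ra = (ir >> 2) & 0b11                     # register that gets changed
    rb = ir & 0b11                            # register that is only read
    if opcode == 0b000:                       # move a literal into a register
        registers[ra] = memory[pc + 1]
    elif opcode == 0b100:                     # add
        registers[ra] += registers[rb]
    elif opcode == 0b101:                     # subtract
        registers[ra] -= registers[rb]
    elif opcode == 0b110:                     # multiply
        registers[ra] *= registers[rb]
    elif opcode == 0b111:                     # set to 1 if less than, else 0
        registers[ra] = 1 if registers[ra] < registers[rb] else 0
    # opcode 0b001 (jump) needs no work(); 0b010 and 0b011 (memory moves) are omitted here
```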
Not all instructions need the same amount of information to be represented. Should instructions that need less information take up fewer bytes than instructions that need a lot of information?
For such a simple question, there does not appear to be a simple answer. Fixed-length instructions make some of what the computer does simpler, theoretically providing cost and power benefits; variable-length instructions make more efficient use of memory, also theoretically providing cost and power benefits. Intel and AMD are known for variable-length instruction sets, while IBM and Motorola have more fixed-length instruction sets.
Modify your simulator so it increases the `pc` by 2 only if the instruction needs 2 bytes (i.e., opcodes `000` and `001`); otherwise only increase the `pc` by 1.
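In the Python sketch above this is a small change to `newPC`:

```python
def newPC():
    opcode = ir >> 5
    if opcode == 0b001 and registers[ir & 0b11] != 0:
        return memory[pc + 1]                      # taken jump
    # only the literal-move (000) and jump (001) instructions carry a second byte
    return pc + 2 if opcode in (0b000, 0b001) else pc + 1
```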
This should let us shorten our example program
0x00,0x03, 0x04,0x32, 0xC0,0x00, 0x68,0x00, 0xE9,0x00, 0x22,0x04
to just
0x00,0x03, 0x04,0x32, 0xC0, 0x68, 0xE9, 0x22,0x04
removing 3 of the original 12 bytes (a 25% space saving). How much space does this save in the division program?
How should the operands of operations be represented? Many different representations are useful, leading to a diversity of addressing modes:
Immediate addressing refers to putting the value directly in the instruction. This roughly correlates to using literals in source code.
Immediate addressing is never used for destinations of operations. Theoretically it would be possible to store a result into the program memory itself, but this is not done by any mainstream ISA.
Because access to memory is generally through a low-capacity interface, most ISAs allow at most one operand to be in memory, and some limit memory addressing to move operations only.
There are multiple sub-types of memory addressing. The three basic building-blocks are an immediate constant, the contents of a register, and the contents of a register multiplied by a small power of two.
These three blocks can be usefully combined in various ways. For example, one common addressing mode in Intel’s x86 architecture is “Relative Immediate1 + Register1 + Register2 × 2^Immediate2”, where Immediate2 is limited to numbers no larger than 4. While that may seem strangely specific and idiosyncratic, it is useful for several common code patterns, such as `x[i].y`.
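As a rough illustration of why that shape is useful (the record layout here is made up): if `x` is an array of 8-byte records and the field `y` sits 4 bytes into each record, then the address of `x[i].y` is exactly an immediate plus a register plus a register times a power of two.

```python
base_of_x   = 0x1000   # would sit in Register1
i           = 3        # would sit in Register2
record_size = 8        # the power-of-two scale, 2**Immediate2 with Immediate2 = 3
offset_of_y = 4        # Immediate1

address = offset_of_y + base_of_x + i * record_size
print(hex(address))    # 0x101c
```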
Picking the right set of addressing modes in an ISA is challenging. Fixed-length encodings in particular often face problems here, as something as simple as a plain immediate value requires that instructions contain more bits than does any legal constant. To avoid the code bloat of making every instruction large enough to fit a full-length immediate value, mainstream ISAs that use fixed-length encoding generally can represent only a small subset of integer literals as immediate values and require all others to be constructed by instruction sequences, sometimes adding customized instructions to make that construction relatively efficient. Variable-length ISAs do not suffer from that limitation, but they generally have quite complicated rules on how to determine the meaning of each instruction anyway, so there is little gain in clarity.
In addition to the program registers and memory, processors generally store additional information. We’ve already seen a few examples: the PC and IR are registers but not program registers, because they cannot be directly addressed by programs. Modern processors have hundreds of similar pieces of hidden support state, a few of which are important enough to deserve additional discussion.
Many processors have a set of registers set automatically by some operations, called condition codes. They can be used to determine how certain other operations behave.
In the x86 series of ISAs, for example, every mathematical operation not only computes its resulting value but also sets several condition code bits that describe the result of the computation. These are used to determine whether conditional jumps (and other conditional operations) are executed or not. Thus instead of writing
r0 = 1 if r0 < r3 else 0
if (r0 ≠ 0) jump to step 4
in x86 you’d write the equivalent of
r0 -= r3
if (last result < 0) jump to step 4
The ARM family of ISAs also uses similar condition codes, but unlike x86, where only a few instructions are conditional, in ARM 7 almost every instruction is conditional.
Other ISAs, such as PowerPC and MIPS, do not make as extensive use of condition codes, instead using explicit condition operands in conditional operations. Even these still have some condition codes, however, to track things like integer overflow and other unusual conditions.
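One way to picture condition codes in a simulator like the one above (the flag names are mine; real ISAs pack these into dedicated bits of a status register):

```python
registers = [0, 0, 0, 0]
pc = 0
flags = {"zero": False, "negative": False}     # a tiny stand-in for condition codes

def do_subtract(ra, rb):
    registers[ra] -= registers[rb]
    flags["zero"] = registers[ra] == 0         # every maths instruction also records
    flags["negative"] = registers[ra] < 0      # facts about its own result

def jump_if_negative(target):
    global pc
    if flags["negative"]:                      # the conditional jump consults the codes,
        pc = target                            # not a program register
```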
There was a period in which enough code was written directly in machine language that hardware developers chose to start adding higher-level abstractions directly into their ISAs. While all of these still exist in older ISAs like x86, most are no longer much used. One, however, is very pervasive: the stack operations.
The concept of a stack, as implemented by the x86 ISA, is easier to explain in code than words:
stack_memory = big contiguous chunk of memory
stack_pointer = length_of(stack_memory)
how to push(value):
1. stack_pointer -= 1
2. stack_memory[stack_pointer] = value
how to pop():
1. result = stack_memory[stack_pointer]
2. stack_pointer += 1
3. return result
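The same idea as runnable Python (the size of the stack is arbitrary); the examples below can be run directly against these definitions:

```python
stack_memory = [0] * 16                  # a small contiguous chunk of memory
stack_pointer = len(stack_memory)        # starts just past the end; pushes move it down

def push(value):
    global stack_pointer
    stack_pointer -= 1
    stack_memory[stack_pointer] = value

def pop():
    global stack_pointer
    result = stack_memory[stack_pointer]
    stack_pointer += 1
    return result
```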
Consider the following code:
push(3)
push(4)
push(5)
push(6)
pop()
x = pop()
pop()
At the end of the code `x` contains the value 5 and the stack still contains one value (the 3).
Consider the following code:
push(3)
push(4)
x = pop()
push(5)
y = pop()
z = pop()
What are in `x`, `y`, and `z` at the end of the code?3
In x86-64, the `stack_pointer` in the above pseudocode is a program register, `rsp`. The ISA instruction `pop rax` does the equivalent of `rax = pop()` in the above pseudocode: that is, it reads a value from memory, stores that value in `rax`, and also increases the value in `rsp`. `push` and `pop`, as well as related instructions like `call` and `ret`, are extremely prevalent in code in the wild today.
For various reasons, some historical and some performance-based, some instructions assume particular registers are part of their operation. This is particularly common in x86 and x86-64.
Intel’s ISA manual, volume 2, defines the 64-bit version of the `div` instruction as dividing `rdx:rax` by the source operand and storing the quotient in `rax` and the remainder in `rdx`. In other words, this is a 128-bit integer divided by a 64-bit integer with both result and remainder returned, using three source registers and two destination registers; but only one of the source registers is able to be specified.
This results in assembly code for `answer = numerator / denominator` looking something like

1. move `numerator` into `rax`
2. set `rdx` to 0 (or to the sign bits of the numerator, for signed division)
3. `idiv` by `denominator`
4. move `rax` into `answer`

It is also fairly common to have the computer processor logically or physically divided into distinct subprocessors, each with a specialized set of instructions and registers. x86, for example, uses separate registers for floating-point maths and for integer maths, and has more registers still for various vector operations and so on.
In the 1990s it was common for computer architecture textbooks to describe two kinds of ISAs: the RISC (reduced instruction set) and the CISC (complex instruction set). These do not describe any major production ISAs, but are still common terms in computer architecture discussions. A summary follows:
RISC | CISC | Today we see |
---|---|---|
Fixed-length encoding | Variable-length encoding | Both |
Few, simple instructions | Many, complicated instructions | Mostly CISC |
Few addressing modes | Many addressing modes | More CISC |
Uniform operand notation | Idiosyncratic operand notation | Both |
Many program registers | Few program registers | More RISC |
Clean new design | Backwards-compatible design | Both |
All program registers are interchangeable | Some operations only work with specific registers | More CISC |
1 instruction = 1 cycle | Some instructions take several cycles | CISC |
Immediate values cannot represent all numbers | Immediate values can represent all numbers | Both |
All maths operands are registers | One maths operand may be in memory | Both |
Some instructions may not follow others | No limitations on instruction ordering | More CISC |
Conditions are in registers | There are special “condition codes” | More CISC |
Function arguments usually passed in registers | Function arguments usually passed on the stack | More RISC |
There are also architectures with significant design decisions beyond those in this table, such as the compiler-controlled instruction scheduling of the Itanium architecture, but they are beyond the scope of this course.
Executing most instructions involves several steps, some of which depend on others. For example, the `push` instruction involves the following: reading the current stack pointer, subtracting from it, writing the pushed value into memory at the resulting address, storing the decreased stack pointer back into its register, and updating the `pc`.

While each instruction has its own dependency graph of steps, they can all be fit into a common sequence: fetching the instruction, decoding what it asks for, executing it, accessing memory if needed, and writing results back.
Naming the steps.
Every computer architecture text I have seen names the steps in the above sequence, but the exact set of steps and names, as well as the set of work included in each step, varies. The names “Fetch”, “Decode”, and “Execute” are almost always included, but which one includes accessing program registers varies. Some texts also name steps “Memory”, “Writeback”, and/or “PC Update”.
Official ISA documentation also tends to use these names in some form, though often with additional steps as well.
This decomposition of work into steps is not merely academic; processor throughput of instructions can be dramatically increased by pipelining these steps. “Pipeline” is the name used in hardware design for the manufacturing notion of an assembly line: when the part of the chip that fetches instructions finishes fetching the first instruction it passes it off to the next part of the chip and proceeds to fetch the second instruction. Thus, at any given time the processor might have many instructions in different stages of completion.
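A toy illustration of the assembly-line idea (the stage and instruction names here are made up): at each clock tick, each stage of the pipeline holds a different instruction.

```python
stages = ["Fetch", "Decode", "Execute"]
instructions = ["i1", "i2", "i3", "i4"]

for tick in range(len(instructions) + len(stages) - 1):
    row = []
    for depth, stage in enumerate(stages):
        k = tick - depth                      # which instruction this stage holds now
        held = instructions[k] if 0 <= k < len(instructions) else "--"
        row.append(f"{stage}:{held}")
    print(f"tick {tick}:  " + "  ".join(row))
```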
There are many additional issues that arise when pipelining; a few of the more common techniques used to resolve them include stalling the pipeline, forwarding results between stages, and predicting the outcomes of jumps.
There is indirect evidence that many of them were anonymous women: women in part because most people with the profession “computer” were women and because there was a shortage of male labor during the second world war, when computers were being pioneered; anonymous because the culture of the day did not generally acknowledge female input or provide scope for their leadership.↩︎
It is likely that if you have learned to program but have not learned about programming paradigms, you learned a variant of the imperative paradigm. There are other approaches to programming that do not depend on variable assignment, the two most famous being the functional paradigm (where recursion is the key to expressivity) and the logical paradigm (where power is obtained from clever application of horn clause reduction rules in formal logic). Computer hardware that uses these paradigms directly is unknown to the author of this text.↩︎
Answer: x = 4, y = 5, z = 3↩︎