HW4 - Binary Squaring
Update 10/6: Clarified the wording of function calls and calling conventions.
This is the second of two assignments in which you will write binary code for the simple machine you created a simulator for in lab. In this assignment, we will be building upon our multiplication code from hw3 and extending our ISA to provide better support for function calls.
Adding a Stack
Our initial machine did not directly support a stack. While we could have managed a stack in our ISA using icodes 1, 2, and 3, we would like to provide the functionality at the hardware level. To do that, we decided to introduce a new register: the stack pointer rsp. This register will contain the memory address of the top of our stack.
Aside: In our machine’s version of the stack, we will only store 8-bit values. However, in practice we will want to store data of different sizes in our stack. We will see more of this as we move to x86-64.
The (Updated) Instructions
Remember that when our reserved bit (the 7th bit) is 0, we used the following icodes:
icode: Behavior
- 0: rA = rB
- 1: rA += rB
- 2: rA &= rB
- 3: rA = read from memory at address rB
- 4: write rA to memory at address rB
- 5: do different things for different values of b:
  - b = 0: rA = ~rA
  - b = 1: rA = -rA
  - b = 2: rA = !rA
  - b = 3: rA = pc
- 6: do different things for different values of b:
  - b = 0: rA = read from memory at pc + 1
  - b = 1: rA += read from memory at pc + 1
  - b = 2: rA &= read from memory at pc + 1
  - b = 3: rA = read from memory at the address stored at pc + 1
  In all 4 cases, increase pc by 2, not 1, at the end of this instruction.
- 7: Compare rA (as an 8-bit 2’s-complement number) to 0;
  - if rA <= 0, set pc = rB
  - otherwise, increment pc like normal.
Now, we will add one more icode when the reserved bit is set to 1:
- 0: do different things for different values of b:
  - b = 0: Decrement rsp and push the contents of rA to the stack
  - b = 1: Pop the top value from the stack into rA and increment rsp
  - b = 2: Push pc+2 onto the stack, set pc = M[pc+1]
  - b = 3: pc = pop the top value from the stack
  If b is 0 or 1, update the pc as normal; b = 2 and b = 3 set the pc themselves.
The first 2 actions (b = 0, 1) allow us to interact with the stack. When we push a value onto the stack, we decrement rsp to make space, then write the value of register A to memory at the address stored in rsp. When a value is popped off the stack, we read the value from memory at position rsp into register A and then increment rsp. Note, this does still leave the old value in memory, and that’s okay. Also note that our stack grows downward in memory.
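The push and pop behavior described above can be sketched in a few lines of Python. This is a stand-alone illustration, not the code from sim_base.py: the helper names push and pop are made up for this example, and the wrap-around of rsp with & 0xFF is an assumption about how an 8-bit register would behave.

```python
# Hypothetical stand-alone model of the stack (not the lab's sim_base.py).
M = [0] * 256      # 256 bytes of memory
R = [0] * 4        # four 8-bit registers
rsp = 0xFF         # stack pointer starts late in memory

def push(a):
    """b = 0: decrement rsp, then store register a at M[rsp]."""
    global rsp
    rsp = (rsp - 1) & 0xFF   # assumption: rsp wraps like an 8-bit value
    M[rsp] = R[a] & 0xFF

def pop(a):
    """b = 1: load M[rsp] into register a, then increment rsp."""
    global rsp
    R[a] = M[rsp]
    rsp = (rsp + 1) & 0xFF

R[1] = 0x2A
push(1)            # rsp goes 0xFF -> 0xFE and M[0xFE] becomes 0x2A
R[1] = 0           # clobber the register
pop(1)             # R[1] is 0x2A again and rsp is back at 0xFF
print(hex(rsp), hex(R[1]), hex(M[0xFE]))  # prints 0xff 0x2a 0x2a
```

Note that M[0xFE] still holds 0x2A after the pop: the old value stays in memory, and only rsp changes.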
The next two actions (b = 2, 3) provide additional support for function calls.
- The first (b=2) is the function call. It adds 2 to the current pc and writes that value to the stack (also updating rsp). Consider: why might we write pc+2 to the stack instead of pc? It then performs an unconditional jump to the address given as the immediate value following the instruction (i.e., the next byte of our program). This is essentially calling the function.
- The second (b=3) is the function return. It reads the top value from the stack (also updating rsp) and sets that value as the new pc. This is an unconditional jump to the address stored at the top of the stack. Use this at the end of the function to return back to the code that called it.
Aside: One thing to note here is that we are assuming that the top value of the stack, when executing instruction 83, is the return address that was pushed when the current function was called. If the called function pushes any values onto the stack, it must pop them before returning, or else we may return to an unexpected position in memory. We will see more of how the stack works, including the handling of return addresses, parameters, and return values, as we discuss x86-64 assembly in greater detail.
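To make the call and return actions concrete, here is a small Python sketch of how the pc and rsp might change. The function names call and ret, and the addresses 0x04 and 0x10, are chosen only for illustration; this is not the lab simulator's code.

```python
# Hypothetical stand-alone model of b = 2 (call) and b = 3 (return).
M = [0] * 256
rsp = 0xFF

def call(pc):
    """b = 2: push pc + 2, then jump to the immediate byte M[pc + 1]."""
    global rsp
    rsp = (rsp - 1) & 0xFF
    M[rsp] = (pc + 2) & 0xFF        # address of the instruction after the call
    return M[(pc + 1) & 0xFF]       # new pc

def ret():
    """b = 3: pop the return address and jump to it."""
    global rsp
    target = M[rsp]
    rsp = (rsp + 1) & 0xFF
    return target                   # new pc

# A call instruction (82) at address 0x04 whose immediate byte is 0x10.
M[0x04], M[0x05] = 0x82, 0x10
pc = call(0x04)
after_call = pc                     # 0x10: start of the function
pc = ret()                          # 0x06: the byte right after the call
print(hex(after_call), hex(pc))     # prints 0x10 0x6
```

If the function had pushed a value without popping it, ret() would read that value instead of 0x06, which is the situation the aside warns about.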
Function Call Syntax
We need more than just the address of our function code and the address to return to when the function is complete. We also need parameters. Consider the following function call (in python-like syntax):
z = mult(x,y)
- We need to know:
  - The location of our function mult, i.e., the address in memory where that code starts
  - The location of where x and y are stored
  - A place to put the result, z
Our ISA will define the following calling conventions for functions. Calling conventions determine where the parameters and return values are located and also include any requirements on the registers that the function call is not using.
- Functions may have up to 2 arguments. They will be placed in registers 2 and 3, with the first parameter in register 2 and the second in register 3.
- The return value of the function is stored in register 0.
- Registers 0, 2, and 3 are considered volatile; that means that if the code calling the function wants to keep those values, it should save them to memory or push them onto the stack before calling the function (using instruction 82).
- Register 1 is considered nonvolatile; that means that if the function wants to use register 1, it must save the value to memory (i.e., push onto the stack) and restore it (i.e., pop from the stack) before returning to the caller (using instruction 83).
Updating our example above with these conventions, our function call would look something like the following (in python syntax):
r0 = mult(r2, r3)
Or more specifically:
r2 = x
r3 = y
jump to (address of mult() code)
z = r0
Therefore, when we want to make a function call, we will place the parameters in registers 2 and 3 before making the call with instruction 82. We cannot assume that the values in registers 2 and 3 will be the same when the function has completed and returns. When the function completes, it will jump back (using instruction 83) to the next instruction in our code to continue execution. Immediately after the function returns, we can safely assume that the function’s return value has been stored in register 0 (overwriting the value previously stored in that register before the function call).
What about r1? What does nonvolatile mean? This means that if we were using r1 before we jumped to our mult() code, then it should still be the same afterwards. So, for example, if we wanted to pre-load our destination address 0xA0:
r1 = 0xA0
r2 = x
r3 = y
jump to (address of mult() code)
z = r0
// r1 should still have A0
It appears that the function’s code is not allowed to use register 1. But it can! It must simply stash the value away somewhere for safe-keeping, then bring it back out before the function returns. We also now have a place to store values like this: the stack. So our function can do something like:
push r1
// all the code to multiply, r0 = r2 * r3,
// using r1 as needed
pop r1 // put r1's value back
return // use reserved=1, icode=0, b=3 to jump back
So remember: register 1 will have its same value after the function returns, but registers 0, 2, and 3 may have changed.
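The convention can be modeled in Python to see what the caller may and may not rely on. This is only a sketch: the repeated-addition loop is one possible way to multiply and is not necessarily the HW3 algorithm, and the Python list named stack stands in for the hardware stack.

```python
# Hypothetical model of the calling convention; not binary code.
R = [0, 0, 0, 0]   # r0..r3
stack = []         # stands in for the hardware stack

def mult():
    """Callee: r0 = r2 * r3 (mod 256), leaving r1 as the caller had it."""
    stack.append(R[1])               # push r1: nonvolatile, so save it first
    R[0] = 0
    R[1] = R[3]                      # now r1 is free to use as a loop counter
    while R[1] > 0:
        R[0] = (R[0] + R[2]) & 0xFF  # repeated addition, ignoring overflow
        R[1] -= 1
    R[1] = stack.pop()               # pop r1: restore it before returning

# Caller
R[1] = 0xA0        # destination address the caller wants to keep
R[2], R[3] = 6, 7  # parameters go in r2 and r3
mult()
z = R[0]           # return value comes back in r0
print(z, hex(R[1]))  # prints 42 0xa0
```

The caller reads the result from r0 and still finds 0xA0 in r1, but it should not assume anything about r2 and r3 after the call.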
Your tasks
This homework is divided into two separate tasks:
- Update the simulator from Lab 4 to provide the stack and new instructions
- Use the new instructions to write a binary program that squares the input value
Updating Our Simulators
We must first update our simulators from Lab 4 to support the stack and this new instruction.
- Add the rsp to your code. In our simulators, we will initialize the stack pointer to a location late in memory (0xFF) and create a stack that grows downward in memory. There are trade-offs for this choice (if our stack gets too big, it might overwrite our program!).
  - If you are using Java, update the following section of SimBase.java:
        // memory and registers
        public byte[] M;
        public byte[] R;
        public byte rsp = (byte) 0xFF; // new!
  - If you are using Python, update the following section of sim_base.py:
        # initialize memory and registers
        R = [0 for i in range(4)]
        M = [0 for i in range(256)]
        rsp = 0xFF # new!
- Add the new instruction defined above (when the reserved bit is 1 and the icode is 0) to your execute() method. Note: it is up to the binary that our simulators execute to properly read the function parameters from memory and write return values to register 0. Your simulator should correctly update rsp and provide the jumps.
- If the reserved bit is 1 and the icode is not 0, then set the next pc to the current pc instead of advancing it and do nothing else.
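Choosing the right branch in execute() starts with splitting the instruction byte into its fields. The sketch below assumes the field layout from lab (bit 7 reserved, bits 4 to 6 icode, bits 2 and 3 a, bits 0 and 1 b), which this writeup only partly restates; decode and encode are hypothetical helper names, not part of SimBase.java or sim_base.py.

```python
def decode(byte):
    """Split an instruction byte into (reserved, icode, a, b).

    Assumed layout: bit 7 reserved, bits 4-6 icode, bits 2-3 a, bits 0-1 b.
    """
    reserved = (byte >> 7) & 1
    icode = (byte >> 4) & 7
    a = (byte >> 2) & 3
    b = byte & 3
    return reserved, icode, a, b

def encode(reserved, icode, a, b):
    """Inverse of decode: pack the four fields into one byte."""
    return (reserved << 7) | (icode << 4) | (a << 2) | b

print(decode(0x82))                # (1, 0, 0, 2): the call instruction
print(decode(0x83))                # (1, 0, 0, 3): the return instruction
print(f"{encode(1, 0, 1, 0):02X}") # 84: push r1 under the assumed layout
print(f"{encode(1, 0, 1, 1):02X}") # 85: pop r1 under the assumed layout
```

The same encode helper can be used when turning (icode, a, b) triples into hex while writing square.binary.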
Square
Next, you will write a binary program square.binary using this new icode and associated operations.
- Modify your multiplication code from HW3 to be used as a function. You will include the updated version of this code at the end of your program below.
  - Instead of reading immediate values at bytes 01 and 03, it should start with the two values already stored in registers 2 and 3 (respectively).
  - You should save the contents of register 1, since it should be unchanged when your function returns. Just before returning, restore the saved value to register 1.
  - Instead of storing the result at memory location A0, return the result to the caller. That is, store the result in register 0 and use the return instruction (83).
- Write a new binary program that computes the square of a number (i.e., \(x^2 = x \times x\)) and stores the product at address 0xA0. Your new code should:
  - Load the value in memory at address 0x01 into a register
  - Call your original multiply code to compute \(x \times x\) (hint: you will need the address where your mult() code starts)
  - Store the result (the squared \(x\)) at address 0xA0
  - Halt once it is done.
- Include your updated multiplication code (from step 1 above) at the end of your new instructions. Note: you will need to know the starting address of your multiplication code to call it.
You should ignore overflow, so since 0x79^2 = 0x3931, the answer should be 31. This is likely to happen automatically without your explicit planning for it. You may assume that the input value will not be negative, but it may be 0.
Thus, if square.binary begins __ 0A then when it is finished it should have 64 in address 0xA0. We should be able to change the second byte of your program to compute other squares too.
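The expected results can be checked with ordinary arithmetic before running your binary. The helper name square_byte below is made up for this sketch; it simply keeps the low 8 bits of the product, which is what ignoring overflow means on our 8-bit machine.

```python
def square_byte(x):
    """Square x and keep only the low 8 bits, as an 8-bit register would."""
    return (x * x) & 0xFF

print(f"{0x79 * 0x79:04X}")         # 3931: the full product
print(f"{square_byte(0x79):02X}")   # 31: what should end up at 0xA0
print(f"{square_byte(0x0A):02X}")   # 64: the example where the program begins __ 0A
print(f"{square_byte(0x00):02X}")   # 00: an input of 0 is allowed
```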
Testing Your Square Binary
To test your code, do one of
python3 sim_base.py square.binary
or
java SimBase square.binary
or go to our updated online simulator and click the file upload button at the top of the page to load your square.binary into the simulator’s memory.
Hints, tips, and suggestions
How to write binary
We suggest following these steps, carefully, saving the result of each in a file so you can go back and fix them if they were wrong:
- Write pseudocode that does the desired task
- Convert any for loops to while loops with explicit counters
- Change any if or while guards to the form something <= 0
  - a <= b becomes a-b <= 0
  - a < b becomes a+1 <= b becomes a+1-b <= 0
  - a >= b becomes 0 >= b-a becomes b-a <= 0
  - a > b becomes 0 > b-a becomes b+1-a <= 0
  - a == b becomes a-b == 0 becomes !(a-b) == 1 becomes !!(a-b) <= 0
  - a != b becomes a-b != 0 becomes !(a-b) == 0 becomes !(a-b) <= 0
- Add more variables to split multi-operation lines into a series of single-operation lines
- Add more operations to convert ones not in the instruction set into ones in the instruction set
- Change each loop into a pair of instructions, opening with “spot1 = pc” and closing with “if …, goto spot1”
- Count the number of variables needed
- Pick a memory address for each variable. Make these big enough your code is unlikely to get that big; for example, you might pick 0x80 through 0x80 + number of variables
- Convert each statement that uses variables into
  - register ← load variable’s memory
  - original statement
  - store variable’s memory ← register
- Translate each instruction into numeric (icode, a, b) triples, possibly followed by a M[pc+1] immediate value
- Turn (icode, a, b) into hex
- Write all the hex into square.binary
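The guard rewrites in the steps above can be sanity-checked by brute force. The sketch below compares each original guard with its something <= 0 form over a small range of ordinary Python integers, so it does not model 8-bit overflow; lnot is a hypothetical stand-in for the machine's ! operation (1 if the value is 0, otherwise 0).

```python
def lnot(x):
    """Model of the ISA's logical not: 1 if x is 0, otherwise 0."""
    return 1 if x == 0 else 0

values = range(-8, 9)
for a in values:
    for b in values:
        assert (a <= b) == (a - b <= 0)
        assert (a < b) == (a + 1 - b <= 0)
        assert (a >= b) == (b - a <= 0)
        assert (a > b) == (b + 1 - a <= 0)
        assert (a == b) == (lnot(lnot(a - b)) <= 0)
        assert (a != b) == (lnot(a - b) <= 0)

checked = len(values) ** 2
print("all guard rewrites agree on", checked, "pairs")
```

On the real machine the subtraction happens in 8 bits, so very large differences can wrap around; the check above only covers values small enough that this does not occur.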
Debugging binary is hard. That’s part of why we don’t generally write code in binary. If you get stuck, you should probably try pulling out just the part you are stuck on and testing it separately until it works, then put it back in the main solution.
Submit
Submit both your square.binary and SimBase.java or sim_base.py files via Gradescope. Submissions will open on Thursday.