Your task
- Submit your solutions to Problem Set-2 as a scanned PDF document on Gradescope. Problem Set-2 may be found in Collab under the resources folder. Alternatively, you may get a physical copy of the assignment from me during lecture or office hours (Tu/Th 12:30pm at Rice 312).
- Combine your solution from the previous HW and the previous lab into a new file called pipehw2.hcl to create a five-stage pipelined processor with forwarding and branch prediction, as described in the textbook, that implements:
  - nop
  - halt
  - irmovq
  - rrmovq
  - OPq
  - cmovXX
  - rmmovq
  - mrmovq

  We will provide an example lab solution.
- Add the jXX instruction (and make it predict all jumps as taken).
- Test your combined simulator with make test-pipehw2
- Submit your solution to kytos
Hints/Approach
General Approach
You may approach this however you wish, but I suggest the following flow:
- Combine your pipehw1.hcl and pipelab2.hcl and test the combination. All of the tests that either source file passed previously should still pass with the combined file.
- Add jXX with speculative execution and branch misprediction recovery. Predict that all branches are taken. Test.
Implementing jXX
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| jXX | F | D | E (next PC available) | M | W | | | |
| wrong1 | | F | D | (bubble needed) | | | | |
| wrong2 | | | F | (bubble needed) | | | | |
| right1 | | | | F | D | E | M | W |
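The timing in the table can be reproduced with a small model. The Python sketch below is only an illustration, not HCL and not part of the required solution. The names timeline and mispredicted_jump_diagram are hypothetical, and the sketch assumes that jXX is fetched in cycle 1 and that the misprediction is detected at the end of its execute stage in cycle 3.

```python
STAGES = ["F", "D", "E", "M", "W"]


def timeline(fetch_cycle, squash_cycle=None):
    """Map cycle number -> stage for one instruction.

    squash_cycle is the first cycle in which the instruction is
    replaced by a bubble (None means it is never squashed).
    """
    row = {}
    for offset, stage in enumerate(STAGES):
        cycle = fetch_cycle + offset
        if squash_cycle is not None and cycle >= squash_cycle:
            row[cycle] = "bubble"
            break
        row[cycle] = stage
    return row


def mispredicted_jump_diagram():
    """Rows of the table for a jXX that was predicted taken but is not taken."""
    resolve_cycle = 3  # assumption: jXX is in execute during cycle 3
    return {
        "jXX": timeline(1),
        "wrong1": timeline(2, squash_cycle=resolve_cycle + 1),
        "wrong2": timeline(3, squash_cycle=resolve_cycle + 1),
        "right1": timeline(resolve_cycle + 1),
    }


if __name__ == "__main__":
    for name, row in mispredicted_jump_diagram().items():
        print(name, row)
```

In this model, two fetch slots (cycles 2 and 3) are spent on instructions that are later squashed, which is where a two-cycle misprediction penalty comes from.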
- Replace pc in the fF (or xF or pP or whatever else you called it) register bank with predPC, which will store a predicted PC value instead of the actual PC value.

  To speculatively use this prediction, we can set pc to the predicted PC (pc = F_predPC).
- Your processor should predict that all jXXs are taken (the new PC is valC).
- We will detect that predictions are wrong near the end of the jXX’s execute stage (when we check the condition codes). We will fetch the correct instruction during the fetch stage in the next cycle (when jXX is in the memory stage).
- When we react to a misprediction, we need to:
  - Squash the mispredicted instructions (which are about to enter the decode and execute stages). This can be done by setting the bubble_X signals in the cycle before the corrected instruction is fetched. (Setting the bubble_X signal will make the X_* pipeline registers output their default values in the next cycle instead of using their input values.)
  - Fetch the corrected instruction next cycle (e.g. with a MUX in front of the pc signal).
- You can fetch the corrected instruction with a MUX in front of the pc signal:

      pc = [
          mispredicted : oldValP ;
          ...
          1: F_predPC ;
      ];

  You may need to pass the conditionsMet signal or something equivalent through a pipeline register to be able to tell when a misprediction happened at the appropriate time.
- You will need access to the valP from the jXX instruction. To do so, you will probably need to pass it through pipeline registers.
- Make sure you correctly handle interactions between jXX and halt. Consider code like:

          jne foo
          halt
      foo:
          rmmovq %rax, (%rax)
          rmmovq %rax, (%rax)
          rmmovq %rax, (%rax)

  When the halt is executed, F_predPC may contain the address of an rmmovq instead of halt, so simply setting stall_F may not be enough to fetch a halt next cycle.

  Some solutions to this problem may involve using a technique other than setting stall_F to prevent the PC from changing, like adding a case to the pc = [...] MUX.
- If, instead of squashing the mispredicted instructions when they are about to enter the decode and execute stages (as suggested above), you squash them when they are about to enter the execute and memory stages, you will have to worry about preventing the condition codes from being changed by one of the mispredicted instructions.
Testing your code
- You can run the command make test-pipehw2 to run your processor on almost all the files in y86/, comparing its output to references supplied in testdata/pipe-reference. The list of tested files is in testdata/pipehw2.txt. For the files pop-forward2.yo, pop-forward3.yo, pop-forward4.yo, and load-store.yo, you should have the same values, but you may take fewer cycles.
- For each input file in y86/, there is a trace from our reference implementation in testdata/pipe-traces.
- Your code should have the same semantics as tools/yis: set the same registers and memory. You can use this to see if your processor does the correct thing on any input files, including files you come up with yourself.
- We will check the number of cycles your processor takes. As a general rule, your pipelined processor will need
  - 1 cycle per instruction executed
  - 4 extra cycles because we have a five-stage pipeline; even halt takes 5 cycles now.
  - +1 more cycle for each load-use hazard (i.e., read from memory in one cycle, use with ALU next cycle)
  - +2 more cycles for each conditional jump the code should not take (the misprediction penalty)
  - +3 more cycles for each ret executed
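The cycle-count rule above is simple enough to write as a formula. The Python helper below is a sketch for checking your own expectations against a trace. The function name expected_cycles and its parameter names are hypothetical and not part of the provided tooling.

```python
def expected_cycles(instructions, load_use_hazards=0,
                    mispredicted_jumps=0, rets=0):
    """Estimate the cycle count from the general rule in this section.

    instructions       -- number of instructions actually executed
    load_use_hazards   -- +1 cycle each
    mispredicted_jumps -- conditional jumps that should not be taken, +2 each
    rets               -- +3 cycles each
    """
    pipeline_fill = 4  # extra cycles for a five-stage pipeline
    return (instructions + pipeline_fill + load_use_hazards
            + 2 * mispredicted_jumps + 3 * rets)


if __name__ == "__main__":
    print(expected_cycles(1))                        # a lone halt
    print(expected_cycles(9, mispredicted_jumps=1))  # 9 instructions, 1 misprediction
```

For example, a program consisting only of halt gives 1 + 4 = 5 cycles, and a program that executes 9 instructions with one conditional jump that should not be taken gives 9 + 4 + 2 = 15 cycles.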
Specific Test Cases
jXX
y86/j-cc.yo
irmovq $1, %rsi
irmovq $2, %rdi
irmovq $4, %rbp
irmovq $-32, %rax
irmovq $64, %rdx
subq %rdx,%rax
je target
nop
halt
target:
addq %rsi,%rdx
nop
nop
nop
halt
takes 15 cycles and leaves
| RAX: ffffffffffffffa0 RCX: 0 RDX: 40 |
| RBX: 0 RSP: 0 RBP: 4 |
| RSI: 1 RDI: 2 R8: 0 |
A full trace is available in testdata/pipe-traces/j-cc.txt
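The RAX value and the outcome of the je can be checked by hand. The Python sketch below models only the 64-bit subtraction and two of the condition codes. The helper name subq is hypothetical, and the model ignores the overflow flag, which je does not use.

```python
MASK64 = (1 << 64) - 1


def subq(src, dst):
    """Model of Y86-64 `subq src, dst`: dst = dst - src, truncated to 64 bits.

    Returns (result, zero_flag, sign_flag).
    """
    result = (dst - src) & MASK64
    zero_flag = result == 0
    sign_flag = (result >> 63) == 1
    return result, zero_flag, sign_flag


if __name__ == "__main__":
    # subq %rdx, %rax with %rdx = 64 and %rax = -32
    rax, zf, sf = subq(64, -32)
    print(f"{rax:x}", zf, sf)  # prints: ffffffffffffffa0 False True
```

Since the result is nonzero, the zero flag is clear and je target should not be taken. The processor still predicts it as taken, so this test exercises the misprediction path and then executes the nop and halt that follow the jump.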
y86/jxx.yo
irmovq $3, %rax
irmovq $-1, %rbx
a:
jmp b
c:
jge a
halt
b:
addq %rbx, %rax
jmp c
takes 25 cycles and leaves
| RAX: ffffffffffffffff RCX: 0 RDX: 0 |
| RBX: ffffffffffffffff RSP: 0 RBP: 0 |
A full trace is available in testdata/pipe-traces/jxx.txt (distributed with hclrs.tar)
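To see where the 25 cycles come from, it helps to count the instructions that jxx.yo actually executes. The Python sketch below walks through the loop by hand. The function name run_jxx is hypothetical, and the model assumes that no addition overflows, so jge is taken exactly when the addq result is non-negative.

```python
def run_jxx():
    """Trace y86/jxx.yo; return (final rax, instructions executed,
    conditional jumps that should not be taken)."""
    rax, rbx = 3, -1
    executed = 2           # the two irmovq instructions
    not_taken = 0
    while True:
        executed += 1      # a: jmp b
        rax += rbx         # b: addq %rbx, %rax
        executed += 1
        executed += 1      # jmp c
        executed += 1      # c: jge a
        if rax >= 0:
            continue       # jump taken, matches the prediction
        not_taken += 1     # predicted taken but falls through
        break
    executed += 1          # halt
    return rax, executed, not_taken


if __name__ == "__main__":
    rax, executed, not_taken = run_jxx()
    print(rax, executed, not_taken)  # prints: -1 19 1
```

The loop body runs four times (RAX goes 2, 1, 0, -1), so 19 instructions execute and only the final jge is mispredicted. With 4 extra cycles for the pipeline and 2 for the misprediction, that gives 19 + 4 + 2 = 25 cycles, and -1 is ffffffffffffffff as a 64-bit value.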