Changelog:
- 6 October 2021: replace
wire:4 icode
withwire icode:4
Your Task
-
Download pipelab1_base.hcl (which will be posted on Collab in the resources tab) to get a copy of the sequential simulator with only
nop
,halt
,irmovq
, andrrmovq
implemented. - Rename this to
pipelab1.hcl
, and add pipelining to this simulator with two pipeline stages that uses forwarding to handle data hazards:- Stage 1: Fetch and Decode
- Stage 2: Writeback
(There is no execute or memory phase for these instructions.)
-
Test your simulator with
make test-pipelab1
. - Submit to kytos.
Suggested Approach
We suggest the following steps:
- Add a pipeline register bank. Identify where the two pipeline stages will split your processor, then identify all wires which cross that point and put them in a pipeline register
- Replace wires with pipeline register inputs and outputs wires.
- Look for hazards and solve them with forwarding (if possible) or stalling
Adding the pipeline register bank
Following the textbook’s tradition, we suggest calling the input side of the register d
for decode and the output side W
for writeback:
register dW {
# todo: fill in the details here
}
Look through pipelab1_base.hcl
; each value used as an input in writeback that is not also computed in that stage will need to be stored in a pipeline register. For example, the reg_inputE
mux uses icode
, reg_outputA
, and valC
as inputs, so we’ll need all three of those in our new register, as well as registers for the other signals used as inputs in writeback.
Always pick the default values in the pipeline register to be the values you’d expect for nop
; in NOP
in the icode
, REG_NONE
in any register spots, etc.
Replacing wires with register inputs/outputs
Recall that if we name our register bank dW
then whatever signal we put into d_thing
will come out of W_thing
on the next cycle.
Go through each signal and, if it crosses the register bank, replace every use before the register bank with d_...
and every use after the register bank with W_...
.
For example, consider icode
:
- Remove the
wire icode:4
declaration since we have it indW
. - In fetch and decode, replace all occurrences of
icode
withd_icode
- In writeback replace all occurrences of
icode
withW_icode
Do the same thing with valC
.
Handling built-in signals
The signals reg_outputA
, reg_dstE
, and Stat
have to be treated specially because they interact with fixed functionality (like the register file).
We will need make an explicit connection between the fixed functionality and the appropriate pipeline register since we can’t, for example,
tell the register file to send its output directly to a pipeline register bank.
Thus, reg_outputA
(an output set by fixed functionality during decode) will need to be saved into d_...
during decode and used as W_...
afterwards, as in
# in decode:
d_rvalA = reg_outputA;
# in execute and later phases, use W_rvalA instead of reg_outputA
Similarly, reg_dstE
(an input used by fixed functionality during writeback) will be originally computed into d_dstE
during Decode and then reg_dstE = W_dstE
placed in writeback to pass the value from the pipeline register to HCLRS’s
built-in register file.
Stat
is an output like reg_dstE
and will need the same treatment (set d_Stat
before the pipeline register based on the icode, and Stat = W_Stat
afterward). Stat
needs to be set in the writeback stage and not the fetch/decode stage, so a halt
doesn’t halt the processor early.
(You could also compute Stat
from the icode
in the writeback stage instead of computing it during fetch+decode, but computing the new Stat
value early may make implementing the full processor easier and matches our textbook’s design.)
Testing without hazard handling
At this point, the rrmovq.yo
we used for irrr.hcl
irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax
should take 6 (not 5) cycles to set three registers:
| RAX: 22 RCX: 22 RDX: 162e |
Once you handle halt
according to the instructions below, it should leave the PC at
address 0x18, like the single-cycle processor.
Handling data hazards
Consider
irmovq $1, %rax
rrmovq %rax, %rbx
In a pipeline diagram (given that we have no execute or memory phases), these will look like
Instr | cycle 1 | cycle 2 | cycle 3 |
---|---|---|---|
irmovq |
FD | W | |
rrmovq |
FD | W |
Note that the immediate value won’t be written to the register file until the after of cycle 2, but it will be attempted to be read by the next instruction at the during of cycle 2. This is an example of a data dependency that exercises a hazard in our hardware design so far. (We will talk about/talked about this on lecture on Tuesday 5 October.)
We can bypass this hazard in two ways. We can either stall, or we can forward data.
To stall, we would delay the rrmovq
instruction until after the irmovq
’s instruction
finishes, causing it to take an extra cycle.
To forward, we would take the value from the writeback stage of the irmovq
instruction
to the decode stage of the rrmovq
instruction directly and use it in place of the value
we would have otherwise read from the register file.
Since forwarding avoids taking extra cycles, forwarding is always preferred to stalling if both are possible, so we’ll forward.
Forward
We want to grab the value that is being prepped for writing to the register file before it actually gets written if it is the register we are trying to read. Thus, d_rvalA
(the value of register A being sent from the decode stage to writeback stage) will probably be set by a MUX:
- if
reg_dstE
is the same as the decode phase’sreg_srcA
and is notREG_NONE
, it should come fromreg_inputE
(the value that is about to be written); - otherwise, it should be
reg_outputA
, as usual
If correctly implemented, y86/irrr7.yo
irmovq $1, %rax
rrmovq %rax, %rbx
should take 4 cycles to put a 1 in both %rax
and %rbx
, while y86/rrmovq.yo
irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax
should still take 6 cycles and result in
| RAX: 22 RCX: 22 RDX: 162e |
like it did before.
I mention the number of cycles because the other solution (stalling) would increase them.
Handling halt
In the pipelined processors you will be implementing in upcoming homeworks,
halt
and invalid instructions need to be handled carefully since the instructions after
a halt
(or an invalid instruction) are not supposed to run and cause changes to the processor’s state.
We recommend keeping the register that feeds the fetch stage the same when you encounter a halt or invalid instruction to avoid starting to executing instructions that aren’t part of the program. However, in the two-stage pipeline in this lab, instructions do not change any state (memory, program registers, condition codes) until the last stage, so this is not yet necessary.
(To keep the PC register from changing, you could change the f_pc
assignment, or use the special stall_F
signal.)
Debugging and Miscellaneous Hints
Pay attention to your pipeline registers
Reading the values of the pipeline registers is a good way to tell what instruction is being run in each stage.
If you declare a register bank like:
register dW {
icode : 4 = NOP;
otherValue : 64 = 0;
}
Then part of the HCLRS output (when not using -q
) will have a line like following for every cycle:
| register dW(N) { icode=7 otherValue=00000000000000A4 }
indicating that during that cycle, the W_icode
was 0x7
(the number for JXX, as you can lookup in the
HCLRS manual or your textbook) and W_otherValue
was 0xA4
. With our usual pipeline register
naming convention, this means that a JXX insturction is currently in the writeback stage, and the otherValue
for that instruction was 0xA4
.
Use debug mode
You can run it in debug mode: ./hclrs -d pipelab1.hcl somefile.yo
. This will show you the value of all wires during each step.
You can also redirect the output from debug mode to a file, like:
./hclrs -d pipelab1.hcl somefile.yo >output.txt 2>&1
Then open output.txt
in a text editor. (>output.txt
sends normal output to the file output.txt
instead of the
normal output file. 2>&1
means to redirect error output where normal output is going.) This redirection support is a
shell feature that works with any command, not just hclrs
.
Use reference outputs
You can look at the outputs in hclrs/testdata/pipelab1-reference
to see what our implementation of pipelab1.hcl
did for each .yo
file included with HCLRS.
Initialize what you declare
The simulator has to provide input to every wire and register in order to run. It does not know what those inputs should be unless you tell it. Thus, if you say
wire baz:4;
register qB { xyxxy:32 = 0; }
then you must also say
baz = something;
q_xyxxy = something_else;
or else you will get an error.
Test your code often
We’ve been telling you this for years now, test your code often! At least check that it compiles
with ./hclrs --check pipelab1.hcl
. Particularly when working with a language you don’t know well, frequent feedback is useful.