This page is for a prior offering of CS 3330. It is not up-to-date.
In this lab we’ll add some basic pipelining to a subset of the Y86-64 instruction set. In particular, we’ll deal with only four instructions: nop, halt, irmovq, and rrmovq. We’ll add just one pipeline register, between decode and writeback (there is no execute or memory phase for these instructions).
Download pipelab1_base.hcl to get a copy of the sequential simulator with only those instructions implemented.
To add pipelining,
We’ll explore this idea by adding a pipeline register between decode and writeback. Following the textbook’s tradition, we’ll call the input side of the register d for decode and the output side W for writeback:
register dW {
# todo: fill in the details here
}
Look through pipelab1_base.hcl; each value used as an input in writeback that is not also computed in that stage will need to be stored in a pipeline register. For example, the reg_inputE mux uses icode, reg_outputA, and valC as inputs, so we’ll need all three of those in our new register, as well as entries for the other signals used as inputs in writeback.
Always pick the default values in the pipeline register to be the values you’d expect for nop: NOP in the icode, REG_NONE in any register spots, etc.
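As a rough illustration of those defaults, the register bank might be declared along the following lines. This is a sketch, not the reference solution: the field names rvalA, dstE, and Stat follow the names used later on this page, while the bit widths (4-bit icode and register ids, 64-bit values, 3-bit Stat) are assumptions about the simulator's conventions that you should check against pipelab1_base.hcl.

```hcl
# Sketch only: field widths are assumptions, not taken from this page.
register dW {
    icode : 4  = NOP;        # an empty pipeline slot behaves like a nop
    valC  : 64 = 0;          # immediate value (unused by nop)
    rvalA : 64 = 0;          # value read from the register file in decode
    dstE  : 4  = REG_NONE;   # a nop writes no register
    Stat  : 3  = STAT_AOK;   # a nop does not stop the processor
}
```

With defaults like these, the first cycle (before any real instruction reaches writeback) writes nothing and reports a normal status.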
Recall that if we name our register bank dW then whatever signal we put into d_thing will come out of W_thing on the next cycle.
Go through each signal and, if it crosses the register bank, replace every use before the register bank with d_... and every use after the register bank with W_... .
For example, consider icode: remove the wire icode:4 declaration, since we have it in dW; replace icode before the register bank with d_icode; and replace icode after the register bank with W_icode.
Do the same thing with valC.
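To make the renaming concrete, here is a hedged sketch of what the two sides of the register bank could look like for icode and valC. The bit ranges taken from i10bytes are assumptions about how the base file extracts these fields, the default mux case of 0 is an arbitrary placeholder, and W_rvalA assumes a register field named rvalA as used later on this page.

```hcl
# Before the register bank (fetch/decode): assign the d_ side.
# The i10bytes bit ranges are assumptions; copy the expressions
# that pipelab1_base.hcl actually uses.
d_icode = i10bytes[4..8];
d_valC  = i10bytes[16..80];

# After the register bank (writeback): read only the W_ side.
reg_inputE = [
    W_icode == RRMOVQ : W_rvalA;   # value saved from reg_outputA
    W_icode == IRMOVQ : W_valC;    # immediate saved during fetch
    1                 : 0;         # placeholder default
];
```

The point of the sketch is that nothing in writeback refers to a plain icode or valC any more; every input comes either from a W_ signal or from something computed within writeback itself.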
The signals reg_outputA, reg_dstE, and Stat have to be treated specially because they interact with fixed functionality. Thus, reg_outputA (an output created during decode) will need to be saved into d_... during decode and used as W_... afterwards, as in
# in decode:
d_rvalA = reg_outputA;
# in later phases, use W_rvalA instead of reg_outputA
Similarly, reg_dstE will need to be computed first as d_dstE during decode, with reg_dstE = W_dstE placed in writeback to get that value back out. Stat is an output like reg_dstE and will need the same treatment (set d_Stat before the pipeline register and Stat = W_Stat afterward).
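The treatment of reg_dstE and Stat described above could be sketched as follows. The mux conditions, the wire name rB, and the 0xb cutoff for invalid opcodes are assumptions about what the base file contains; only the pattern of computing d_... early and copying W_... into the fixed-function signal comes from this page.

```hcl
# Decode: compute the values, but store them in the pipeline register
# instead of driving the fixed-function signals directly.
d_dstE = [
    d_icode in { IRMOVQ, RRMOVQ } : rB;   # rB is an assumed wire name
    1                             : REG_NONE;
];
d_Stat = [
    d_icode == HALT : STAT_HLT;
    d_icode > 0xb   : STAT_INS;           # assumed test for invalid opcodes
    1               : STAT_AOK;
];

# Writeback: drive the fixed-function signals from the register outputs.
reg_dstE = W_dstE;
Stat     = W_Stat;
```

Delaying Stat this way means a halt is only reported once the halt instruction itself reaches writeback, so earlier instructions still complete.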
At this point, the rrmovq.yo we used for irrr.hcl
irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax
should take 6 (not 5) cycles to set three registers:
| RAX: 22 RCX: 22 RDX: 162e |
Once you handle halt according to the instructions below, it should leave the PC at address 0x18, like the single-cycle processor.
Consider
irmovq $1, %rax
rrmovq %rax, %rbx
In a pipeline diagram (given that we have no execute or memory phases), these will look like
Instr | cycle 1 | cycle 2 | cycle 3 |
---|---|---|---|
irmovq | FD | W | |
rrmovq | | FD | W |
Note that the immediate value won’t be written to the register file until the end of cycle 2, but the next instruction will try to read it during cycle 2. This is an example of a data dependency that exercises a hazard in our hardware design so far.
We can resolve this hazard in two ways. We can either stall, or we can forward data. Forwarding is always preferred to stalling if both are possible, so we’ll forward.
We want to grab the value that is being prepped for writing to the register file before it actually gets written if it is the register we are trying to read. Thus, d_rvalA will be reg_outputA unless reg_dstE is both (1) not REG_NONE and (2) the same as the decode phase’s reg_srcA; in that case, we’ll forward reg_inputE into d_rvalA instead.
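The forwarding rule just described maps fairly directly onto an HCL mux. A minimal sketch, assuming the register field is named rvalA as elsewhere on this page:

```hcl
# Forward the value about to be written back when the instruction in
# writeback targets the same register that decode is trying to read.
d_rvalA = [
    (reg_dstE != REG_NONE) && (reg_dstE == reg_srcA) : reg_inputE;
    1                                                : reg_outputA;
];
```

The REG_NONE check matters because an instruction that reads no register has reg_srcA set to REG_NONE; without the check, it would match a writeback-stage instruction that writes nothing and forward a meaningless value.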
If correctly implemented, y86/irrr7.yo
irmovq $1, %rax
rrmovq %rax, %rbx
should take 4 cycles to put a 1 in both %rax and %rbx, while y86/rrmovq.yo
irmovq $5678, %rax
irmovq $34, %rcx
rrmovq %rax, %rdx
rrmovq %rcx, %rax
should still take 6 cycles and result in
| RAX: 22 RCX: 22 RDX: 162e |
like it did before.
I mention the number of cycles because the other solution (stalling) would increase them.
You can think of halt and invalid instructions as a special kind of control hazard, since the instructions after a halt (or an invalid instruction) are not supposed to run.
We recommend stalling the register that feeds the fetch stage when you encounter a halt or invalid instruction, to avoid starting to execute instructions that aren’t part of the program. However, in the two-stage pipeline in this lab, instructions do not change any state (memory, program registers, condition codes) until the last stage, so this is not strictly necessary.
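One possible way to express that stall is sketched below. It assumes the PC lives in a register bank whose output side is named F (so that its stall signal is stall_F) and that d_Stat is computed as described earlier; both names are assumptions to check against your own file.

```hcl
# Hold the PC register in place once a halt or invalid instruction has
# been fetched, so no later instructions enter the pipeline.
# stall_F assumes a PC register bank with output prefix F.
stall_F = d_Stat in { STAT_HLT, STAT_INS };
```

Stalling keeps the register's current value instead of loading its input, so the PC stops advancing at the offending instruction.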
If your hcl compiles, you can run it in debug mode: mysimulator.exe -i -d somefile.yo
The simulator has to provide input to every wire and register in order to run. It does not know what those inputs should be unless you tell it. Thus, if you say
wire baz:4;
register qB { xyxxy:32 = 0; }
then you must also say
baz = something;
q_xyxxy = something_else;
or else you will get an error.
We’ve been telling you this for years now: test your code often! At least check that it compiles with ./hclrs --check pipelab1.hcl. Particularly when working with a language you don’t know well, frequent feedback is useful.
You can run make test-pipelab1 to test your code on the list of testcases in testdata/pipelab1-tests.txt, comparing its output to reference outputs we have included in testdata/pipelab1-reference.
See here for an explanation of the output format.
Submit pipelab1.hcl on the submission page.
If you didn’t have time to finish everything, still submit the file (it’s OK if it is incomplete; we are looking for effort more than correctness).
If you want to understand pipelines more, I’d encourage you to add another pipeline register between Fetch and Decode. Don’t submit that three-stage-pipeline file, though.