This page is for a prior offering of CS 3330. It is not up-to-date.
In this lab you’ll pipeline `rmmovq` and `mrmovq` (and `halt`). If you work with others in lab (which is encouraged), start from a clean file so that you don’t accidentally share solutions to seqhw with your partner(s). We’ll post a reference sequential implementation of these instructions before lab begins, which you may use if you wish.
We encourage you to work from a subset of SEQ that has just these instructions: seq_memory.hcl
These instructions use all five pipeline stages.
Implement forwarding to resolve hazards.
We can use the existing `xF` register, since predicting the PC is not needed for these instructions.
Add a pipeline register bank at the beginning of each later stage:

- At the beginning of decode, add `register fD {}`.
- At the beginning of execute, add `register dE {}`.
- At the beginning of memory, add `register eM {}`.
- At the beginning of writeback, add `register mW {}`.
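Here is how the naming works. A bank declared as `register fD {...}` takes inputs written by Fetch (prefixed `f_`) and presents them to Decode as outputs (prefixed `D_`), but only after the clock edge at the end of the cycle. The Python below is a minimal behavioral sketch of that timing, not HCL. The `PipelineRegister` class, its `clock` method, and the string-valued fields are hypothetical stand-ins.

```python
class PipelineRegister:
    """Hypothetical behavioral model of one pipeline register bank.

    `inputs` stands in for the lowercase-prefixed signals (f_icode, ...),
    `outputs` for the uppercase-prefixed ones (D_icode, ...).
    """

    def __init__(self, **defaults):
        self.inputs = dict(defaults)   # written by the earlier stage during a cycle
        self.outputs = dict(defaults)  # read by the later stage during a cycle

    def clock(self):
        """End of cycle: outputs take on the values the inputs were set to."""
        self.outputs = dict(self.inputs)


# Example: the fD bank between Fetch and Decode.
fD = PipelineRegister(icode="NOP", rA="REG_NONE", rB="REG_NONE")
fD.inputs.update(icode="MRMOVQ", rA="%rax", rB="REG_NONE")  # Fetch sets f_icode, f_rA, f_rB
before = fD.outputs["icode"]  # what Decode sees as D_icode in this cycle
fD.clock()
after = fD.outputs["icode"]   # what Decode sees as D_icode in the next cycle
print(before, after)  # NOP MRMOVQ
```

The key point is that a value written into a bank during one cycle is not visible to the next stage until the following cycle.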
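For each stage, work through three steps:

1. Find the values the stage uses from earlier stages. Add them to its incoming pipeline register bank, and replace the stage’s uses of them with that bank’s outputs.
2. Turn the values the stage produces into inputs to the next pipeline register bank.
3. Check whether each of those outputs really belongs in the pipeline register, or whether it instead feeds fixed functionality such as the register file.

For example, consider SEQ’s decode stage as it relates to these two instructions: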
```
reg_srcA = [
    icode in {RMMOVQ} : rA;
    1 : REG_NONE;
];
reg_srcB = [
    icode in {RMMOVQ, MRMOVQ} : rB;
    1 : REG_NONE;
];
reg_dstM = [
    icode in {MRMOVQ} : rA;
    1 : REG_NONE;
];
```
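The same decode logic, written as a plain Python function, makes it easy to see which registers each instruction reads and writes. This is an illustration only. The function name `decode_regs` and the string encodings of icodes and registers are assumptions; in the lab this logic stays in HCL.

```python
REG_NONE = "REG_NONE"


def decode_regs(icode: str, rA: str, rB: str) -> tuple[str, str, str]:
    """Python mirror of SEQ's reg_srcA / reg_srcB / reg_dstM muxes."""
    src_a = rA if icode in {"RMMOVQ"} else REG_NONE            # rmmovq stores rA
    src_b = rB if icode in {"RMMOVQ", "MRMOVQ"} else REG_NONE  # both use rB as the base
    dst_m = rA if icode in {"MRMOVQ"} else REG_NONE            # mrmovq loads into rA
    return src_a, src_b, dst_m


# mrmovq 158(%rax), %rdx is encoded with rA = %rdx and rB = %rax
print(decode_regs("MRMOVQ", "%rdx", "%rax"))
```

For example, `mrmovq 158(%rax), %rdx` reads `%rax` (through `reg_srcB`) and writes `%rdx` (through `reg_dstM`).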
Decode uses `icode`, `rA`, and `rB` from the previous stage. That means we’ll need to add those to the incoming register bank (`fD`) and replace decode’s uses of them with the outputs of that bank (e.g., `icode` becomes `D_icode`).
Decode produces three outputs (`reg_srcA`, `reg_srcB`, and `reg_dstM`). We turn them into inputs to the next pipeline register bank, `dE`, so `reg_dstM` becomes `d_dstM`, and so on.
All three of these outputs also feed fixed functionality (the register file). Do we really want to put them in the pipeline register?
`reg_srcA` is sent to the register file, which produces `reg_outputA`. We want this register read to happen in Decode, so we should put `d_valA`, not `d_srcA`, into the `dE` register bank:
```
reg_srcA = [ ... ];
d_valA = reg_outputA;
```
`reg_srcB` works the same way as `reg_srcA`: we want `reg_outputB` (as `d_valB`), not `reg_srcB`, in the `dE` register bank.
`reg_dstM` is half of how we write to the register file. The other half is `reg_inputM`, which SEQ sets from `mem_output`. We don’t want to write to the register file in Decode, so we’ll compute `d_dstM` in Decode and wait until Writeback to set `reg_dstM`:
```
d_dstM = [ ... ];
# don't update built-in reg_dstM until writeback...
```
Work through the same three steps for each of the other stages.
By this point you should have a more-or-less functioning simulator, but you still need to add data forwarding. Instead of `d_valA = reg_outputA`, we want a big mux that checks for all possible pending writes:

- First, we do not want to forward if the register is `REG_NONE`.
- Where could data come from? Since `mrmovq` is the only one of these instructions that writes a register, we might need to forward from:
  - an `mrmovq` in the Memory stage at the time of Decode. Here we grab the value we are about to put into the `mW` register bank (`m_valM`).
  - an `mrmovq` in the Writeback stage at the time of Decode. Here we grab the value currently stored in the `mW` register bank (`W_valM`).
- Otherwise, we use the usual value read from the register file.

```
d_valA = [
    reg_srcA == REG_NONE : 0;            # nothing to forward
    reg_srcA == m_dstM   : m_valM;       # forward post-memory
    reg_srcA == W_dstM   : W_valM;       # forward pre-writeback
    1                    : reg_outputA;  # returned by register file based on reg_srcA
];
```
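An HCL mux picks the first case whose condition is true, so the order of the cases matters. The Python function below mirrors the `d_valA` mux under that first-match rule. `forward_val` and its parameter names are hypothetical, chosen to match the signal names above.

```python
REG_NONE = "REG_NONE"


def forward_val(src, m_dstM, m_valM, W_dstM, W_valM, reg_output):
    """Python mirror of the d_valA mux: the first matching case wins."""
    if src == REG_NONE:
        return 0           # no register to read, so nothing to forward
    if src == m_dstM:
        return m_valM      # mrmovq in Memory: value about to be latched into mW
    if src == W_dstM:
        return W_valM      # mrmovq in Writeback: value currently in mW
    return reg_output      # no pending write: use the register file's value


print(forward_val("%rax", "%rax", 0x10F, REG_NONE, 0, 0))  # 271 (0x10f)
```

Checking `REG_NONE` first prevents a false match when, say, an `rmmovq` in Memory has `m_dstM == REG_NONE`. The Memory case comes before the Writeback case, so if both stages hold writes to the same register, the newer value (the one in Memory) wins.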
We’ll do the same thing for `d_valB`…
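Even with forwarding, we have a data hazard. The second `mrmovq` needs `%rcx` in Decode (cycle 3), but the first `mrmovq` does not load that value from memory until its Memory stage (cycle 4):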
| problem | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| `mrmovq (%rax), %rcx` | F | D | E | M (available) | W | |
| `mrmovq (%rcx), %rdx` | | F | D (needed) | E | M | W |
This means forwarding alone is not enough: we need to stall. We’ll use the `stall_` and `bubble_` signals on our register banks (e.g., `stall_F`, `bubble_E`) to hold the second `mrmovq` in Decode for one extra cycle:

| solution | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| `mrmovq (%rax), %rcx` | F | D | E | M (available) | W | | |
| `mrmovq (%rcx), %rdx` | | F | D (stall) | D (needed) | E | M | W |
Detecting the dependency that exercises this hazard.
The load-use hazard is exercised when an `mrmovq` is in Execute and its destination register (its `dstM`, e.g., `E_dstM`) is the same as one of the source registers (`reg_srcA` or `reg_srcB`) in Decode. We could further restrict this to sources that will actually be used in Execute, but we won’t for this lab or the homework that follows. If any Decode source matches the `dstM` of an `mrmovq` in Execute, we’ll call it a load-use hazard.
I suggest creating a `wire loadUse:1;` and setting it to true when this hazard is exercised.
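Here is a sketch of the `loadUse` condition as a Python predicate. It assumes the `dE` bank carries fields named `icode` and `dstM`, so Execute sees `E_icode` and `E_dstM`. Those names are assumptions about your HCL, not requirements.

```python
REG_NONE = "REG_NONE"


def load_use(E_icode, E_dstM, reg_srcA, reg_srcB):
    """True when an mrmovq in Execute will write a register that Decode is reading."""
    return (
        E_icode == "MRMOVQ"
        and E_dstM != REG_NONE              # defensive: a real register is being loaded
        and E_dstM in (reg_srcA, reg_srcB)  # any Decode source counts, used or not
    )


print(load_use("MRMOVQ", "%rcx", REG_NONE, "%rcx"))  # True: mrmovq (%rcx), ... right after a load into %rcx
```

As the text says, this intentionally does not check whether the matching source is actually needed in Execute.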
Reacting to the hazard when it is exercised.
We stall the Decode stage by stalling the pipeline register banks that feed Fetch and Decode (`xF` and `fD`) and bubbling the one right after Decode (`dE`):

```
stall_F = loadUse;
stall_D = loadUse;
bubble_E = loadUse;
```
This pattern produces the solution schedule above by ensuring that every stage ends up with the right work:

| time | F | D | E | M | W |
|---|---|---|---|---|---|
| before | … | `mrmovq (%rcx), %rdx` | `mrmovq (%rax), %rcx` | … | … |
| after | … | `mrmovq (%rcx), %rdx` | `nop` | `mrmovq (%rax), %rcx` | … |
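The before/after table can be reproduced with a small Python model of one clock edge, where each stage is labeled with the instruction it holds. This is a hypothetical sketch: `clock_edge` and the placeholder strings are invented for illustration. When `load_use` is true, the model does the following:

- F and D keep their contents (`stall_F`, `stall_D`).
- E receives a `nop` (`bubble_E`).
- M and W advance as usual.

```python
def clock_edge(stages, next_fetch, load_use):
    """One clock edge of a 5-stage pipeline (hypothetical behavioral sketch).

    `stages` maps "F", "D", "E", "M", "W" to the instruction each stage holds;
    `next_fetch` is what Fetch would move on to if it were not stalled.
    """
    if load_use:
        return {
            "F": stages["F"],  # stall_F: same PC, so the same instruction is fetched again
            "D": stages["D"],  # stall_D: the dependent instruction stays in Decode
            "E": "nop",        # bubble_E: Execute gets a bubble
            "M": stages["E"],  # the mrmovq moves on to Memory as usual
            "W": stages["M"],
        }
    return {"F": next_fetch, "D": stages["F"], "E": stages["D"],
            "M": stages["E"], "W": stages["M"]}


before = {"F": "(next)", "D": "mrmovq (%rcx), %rdx", "E": "mrmovq (%rax), %rcx",
          "M": "(older)", "W": "(oldest)"}
after = clock_edge(before, next_fetch="(after next)", load_use=True)
for stage in "FDEMW":
    print(stage, after[stage])
```

In the next cycle `load_use` is false, so the normal branch moves the waiting `mrmovq (%rcx), %rdx` into Execute.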
The following assembly (`y86/rrmrb.yo`):

```
mrmovq 1, %rax
rmmovq %rax, 160(%rax)
mrmovq 158(%rax), %rdx
```

should stall once, take 9 cycles, and result in:

```
+----------------------- halted in state: ------------------------------+
| RAX:              10f   RCX:                0   RDX:          10f0000 |
| used memory:   _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f        |
|  0x0000000_:   50 0f 01 00 00 00 00 00 00 00 40 00 a0 00 00 00        |
|  0x0000001_:   00 00 00 00 50 20 9e 00 00 00 00 00 00 00              |
|  0x000001a_:                                                0f        |
|  0x000001b_:   01 00 00 00 00 00 00                                   |
+--------------------- (end of halted state) ---------------------------+
```
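The expected register values follow from two facts:

- Y86-64 `mrmovq` and `rmmovq` move 8-byte little-endian values.
- `mrmovq 1, %rax` reads part of the program's own machine code.

The Python sketch below works through that arithmetic using the bytes shown in the memory dump. The helper names `load_quad` and `store_quad` are made up for this example.

```python
def load_quad(mem, addr):
    """Read an 8-byte little-endian value; bytes never written read as 0."""
    return int.from_bytes(bytes(mem.get(addr + i, 0) for i in range(8)), "little")


def store_quad(mem, addr, value):
    """Write an 8-byte little-endian value."""
    for i, b in enumerate(value.to_bytes(8, "little")):
        mem[addr + i] = b


# Initial memory: the program's machine code (from the dump), starting at address 0.
code = bytes.fromhex(
    "500f0100000000000000"  # mrmovq 1, %rax
    "4000a000000000000000"  # rmmovq %rax, 160(%rax)
    "50209e00000000000000"  # mrmovq 158(%rax), %rdx
)
mem = dict(enumerate(code))

rax = load_quad(mem, 1)          # bytes 0x01..0x08 are 0f 01 00 ... -> 0x10f
store_quad(mem, rax + 160, rax)  # 0x10f + 0xa0 = 0x1af
rdx = load_quad(mem, rax + 158)  # 0x10f + 0x9e = 0x1ad -> 00 00 0f 01 00 00 00 00
print(hex(rax), hex(rdx))        # 0x10f 0x10f0000
```

The store explains the `0f` at `0x1af` and the `01` at `0x1b0` in the dump. The final load straddles that store two bytes early, which shifts `0x10f` left by 16 bits.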
The following assembly (`y86/mrmr.yo`):

```
mrmovq 0x100, %rax
mrmovq (%rax), %rcx
mrmovq (%rcx), %rdx
.pos 0x100
.quad 0x200
.pos 0x200
.quad 0xfe
```

should stall twice, take 10 cycles, and result in:

```
+----------------------- halted in state: ------------------------------+
| RAX:              200   RCX:               fe   RDX:          2000000 |
```
The following assembly (`y86/mrmreasy.yo`):

```
mrmovq 0x100, %rax
mrmovq 0x200, %rcx
mrmovq 0xfe, %rdx
.pos 0x100
.quad 0x200
.pos 0x200
.quad 0xfe
```

should not stall at all, take 8 cycles, and result in:

```
+----------------------- halted in state: ------------------------------+
| RAX:              200   RCX:               fe   RDX:          2000000 |
```
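To sanity-check the stall and cycle counts for these three programs, here is a rough Python model of the pipeline's timing. It assumes what the lab describes: forwarding from Memory and Writeback, and a one-cycle stall on load-use. It also assumes the run ends in the cycle when `halt` is in Writeback, which matches the counts above. `Instr` and `count_cycles` are hypothetical names, and data values are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    icode: str                  # "MRMOVQ", "RMMOVQ", or "HALT"
    srcs: tuple[str, ...] = ()  # registers read in Decode
    dstM: str = "REG_NONE"      # register loaded from memory (mrmovq only)


def count_cycles(program, max_cycles=1000):
    """Return (cycles, stalls) for a program ending in HALT (timing sketch only)."""
    pc = 0
    D = E = M = W = None  # instruction in each stage; None = empty or bubble
    stalls = 0
    for cycle in range(1, max_cycles + 1):
        if W is not None and W.icode == "HALT":
            return cycle, stalls
        F = program[pc] if pc < len(program) else None
        load_use = (E is not None and E.icode == "MRMOVQ"
                    and D is not None and E.dstM in D.srcs)
        if load_use:
            W, M, E = M, E, None  # bubble_E; stall_F/stall_D keep pc and D
            stalls += 1
        else:
            W, M, E, D = M, E, D, F
            pc += 1
    raise RuntimeError("HALT never reached Writeback")


rrmrb = [Instr("MRMOVQ", (), "%rax"), Instr("RMMOVQ", ("%rax", "%rax")),
         Instr("MRMOVQ", ("%rax",), "%rdx"), Instr("HALT")]
mrmr = [Instr("MRMOVQ", (), "%rax"), Instr("MRMOVQ", ("%rax",), "%rcx"),
        Instr("MRMOVQ", ("%rcx",), "%rdx"), Instr("HALT")]
mrmreasy = [Instr("MRMOVQ", (), "%rax"), Instr("MRMOVQ", (), "%rcx"),
            Instr("MRMOVQ", (), "%rdx"), Instr("HALT")]

for name, prog in [("rrmrb", rrmrb), ("mrmr", mrmr), ("mrmreasy", mrmreasy)]:
    print(name, count_cycles(prog))  # (cycles, stalls)
```

This only checks the hazard logic's timing. It is no substitute for running your HCL through the provided tests.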
You can run `make test-pipelab2` to run all the tests listed in `testdata/pipelab2-tests.txt`, comparing the outputs to the supplied references in `testdata/pipe-references`.
You can see traces for all test cases in `testdata/pipe-traces`.
Submit `pipelab2.hcl` on the submission page.