Changelog:
- 2 Nov 2023: correct URL to point to tool for this semester; adjust due date slightly later
- 14 Nov 2023: more text re: how branch prediction works
1 Your Task
Fill in a pipeline diagram for a small function’s full execution.
Place you answer in the online tool https://kytos02.cs.virginia.edu/cs3130-fall2023/pld.php.
2 Guidelines
Consider the following code:
long count_odd(long *arr, long len) {
long ans = 0;
for(long i=0; i<len; i+=1)
+= arr[i]&1;
ans return ans;
}
which has been compiled to (numbers inserted for ease of reference)
count_odd:
%eax, %eax
0. xorl %rsi, %rsi
1. testq jle .LBB0_2
2. .LBB0_1:
movq (%rdi), %rcx
3. $1, %ecx
4. andl %rcx, %rax
5. addq $8, %rdi
6. addq %rsi
7. decq jne .LBB0_1
8. .LBB0_2:
9. retq
Assume that this is run on a five-stage pipeline, where
every instruction must go through all five stages in order
each stage may handle at most one instruction per cycle
decode needs to stall until values (program register and condition code) are available, but only needs values by the end of its cycle.
e.g., if a memory read is followed by an operation on the read value, the decode stage of the operation can be co-scheduled with the memory stage of the read.
read F D E M W use F D D E M W e.g., if a computation is followed by an operation on the computed value, the decode stage of the operation can be co-scheduled with the execute stage of the computation.
compute F D E M W use F D E M W
computation not available until the end of execute includes
- results of arithmetic, logic, and address computation
- condition code creation from test, compare, arithmetic, and logic
- condition code comparison for conditional moves and jumps
e.g., if a conditional jump is poorly speculated (predicted), the correct instruction will be fetched after the jump completes its execute stage.
jump F D E M W guess1 F D guess2 F correct F D E M W (where guess1 and guess2 represent two incorrect predicted instructions that are fetched and then squashed (cancelled) before they can complete).
conditional jumps are speculated using the
backwards-taken forwards-not-taken
branch prediction heuristic (meaning that jumps to a later instruction are predicted to be not taken and jumps to an earlier instruction are predicted to be taken)When a conditional jump is predicted to be taken, the processor will fetch the target of the jump in the cycle after the jump is fetched. When a conditional jump is predicted to be not taken, the processor will fetch the following instruction in memory in the cycle after the jump is fetched.
When a prediction is correct, the processor will continue executing instructions without delay:
jump F D E M W guess F D E M W When a prediction is incorrect, the processor will need to cancel the predicted instructions and fetch the correct instruction during the jump’s memory stage, as in the example timeline under the previous bullet
You’re asked to expand this assuming
- you start with the first instruction after
count_odd
is called - the first instruction of
count_odd
does not need to stall arr
pointing to the array{1, 2}
len
stores the value2
- the
retq
is the last instruction you need to handle