Pipelining/Problem12

Consider the following assembly program:

0 ADD r3,r31,r2
1 LW r6,0(r3)
2 ANDI r7,r5,#3
3 ADD r1,r6,r0
4 SRL r7,r0,#8
5 OR r2,r4,r7
6 SUB r5,r3,r4
7 ADD r15,r1,r10
8 LW r6,0(r5)
9 SUB r2,r1,r6
10 ANDI r3,r7,#15

Assume the use of a four-stage pipeline: fetch (IF), decode/issue (DI), execute (EX) and write back (WB). Assume that all pipeline stages take one clock cycle except the execute stage. For simple integer arithmetic and logical instructions, the execute stage takes one cycle, but a load from memory needs five cycles in the execute stage. Suppose we have a simple scalar pipeline that allows some form of out-of-order execution, resulting in the following table for the first seven instructions:

Instruction         IF  DI  EX      WB
0 ADD r3,r31,r2      0   1   2       3
1 LW r6,0(r3)        1   2   4 (a)   9
2 ANDI r7,r5,#3      2   3   5 (b)   6
3 ADD r1,r6,r0       3   4  10      11
4 SRL r7,r0,#8       4   5   6       7
5 OR r2,r4,r7        5   6   8      10
6 SUB r5,r3,r4       6   7   9      12 (c)

A number in the table indicates the clock cycle in which an instruction starts a given pipeline stage; the letters a, b and c attached to three entries refer to questions a) to c). Many implementation details can be deduced from this execution table.

a) Explain why the first LW instruction (instruction 1) cannot start the execute stage until clock cycle 4.
b) Explain why the first ANDI instruction (instruction 2) cannot start the execute stage until clock cycle 5.
c) Explain why the first SUB instruction (instruction 6) cannot start the write-back stage until clock cycle 12.
d) Complete the table for the remaining instructions.
e) Suppose instruction 2 were changed to ANDI r6,r5,#3. What implications would that have for the design of the pipeline? What would the table look like?

Adapted from "Exercises for EITF20 Computer Architecture HTI 2010", Anders Ardo, Lund University

Solution

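a) Instruction 1 reads r3, which is produced by instruction 0. Instruction 0 is in the WB stage in clock cycle 3, and since there is no forwarding from the WB stage, the new value of r3 can only be read in the following cycle. Instruction 1 can therefore start the execute stage in clock cycle 4 at the earliest.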
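The rule behind a) can be written down as a small Python sketch. It assumes, as the answer does, that there is no forwarding, so a value is readable only in the cycle after its producer's WB stage; earliest_ex is an illustrative name, not something defined by the exercise.

```python
def earliest_ex(di_cycle, producer_wb_cycles):
    """Earliest cycle an instruction can start EX when WB results are not forwarded.

    The instruction must first finish DI, and each source register written by
    an older instruction becomes readable only in the cycle after that
    producer's WB cycle.
    """
    ready = di_cycle + 1
    for wb_cycle in producer_wb_cycles:
        ready = max(ready, wb_cycle + 1)
    return ready


# Instruction 1 (LW r6,0(r3)): DI in cycle 2; r3 is written back by instruction 0 in cycle 3.
print(earliest_ex(2, [3]))  # 4
# Instruction 3 (ADD r1,r6,r0): DI in cycle 4; r6 is written back by instruction 1 in cycle 9.
print(earliest_ex(4, [9]))  # 10
```

This only gives the earliest possible cycle; an instruction can be delayed further when an older instruction has already claimed the execute stage in that cycle.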

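b) At most one instruction can start the execute stage in each clock cycle, and older instructions have priority. Instruction 2 finishes DI in cycle 3 and could start executing in cycle 4, but that cycle is taken by instruction 1, which has been waiting for r3. Instruction 2 therefore starts the execute stage one cycle later, in cycle 5.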

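c) Only one instruction can be in the write-back stage in each clock cycle. Instruction 6 finishes executing in cycle 9 and could write back in cycle 10, but the WB stage is occupied in cycle 10 by instruction 5 and in cycle 11 by instruction 3. Both are older instructions (earlier in program order) and have priority, so instruction 6 writes back in cycle 12.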
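The rules in b) and c) are the same arbitration applied to two resources: at most one instruction per clock cycle, with older instructions served first. The Python sketch below applies it first to the EX stage and then to the WB stage for instructions 0 to 6. The helper assign_slots is a hypothetical name, and the ready cycles are filled in by hand using the operand-availability rule from a) and the WB cycles in the table.

```python
def assign_slots(ready_cycles):
    """Give each request the first free cycle at or after its ready cycle.

    ready_cycles is listed in program order, so older instructions claim
    their cycles first.
    """
    taken = set()
    slots = []
    for ready in ready_cycles:
        cycle = ready
        while cycle in taken:  # resource already used by an older instruction
            cycle += 1
        taken.add(cycle)
        slots.append(cycle)
    return slots


# Earliest EX cycle of instructions 0-6: DI + 1, delayed until source registers
# are readable (filled in by hand with the rule from a)).
ex_ready = [2, 4, 4, 10, 6, 8, 8]
# EX latency: 5 cycles for the load (instruction 1), 1 cycle otherwise.
latency = [1, 5, 1, 1, 1, 1, 1]

ex = assign_slots(ex_ready)
wb = assign_slots([start + lat for start, lat in zip(ex, latency)])
print(ex)  # [2, 4, 5, 10, 6, 8, 9]
print(wb)  # [3, 9, 6, 11, 7, 10, 12]
```

Handling the instructions in program order and giving each the first free cycle is equivalent to letting the oldest ready instruction win in every cycle, because a younger instruction can never take a cycle that an older, ready instruction wants.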

d) Here is the continuation of the table:

Instruction         IF  DI  EX  WB  Comment
0 ADD r3,r31,r2      0   1   2   3
1 LW r6,0(r3)        1   2   4   9
2 ANDI r7,r5,#3      2   3   5   6
3 ADD r1,r6,r0       3   4  10  11
4 SRL r7,r0,#8       4   5   6   7
5 OR r2,r4,r7        5   6   8  10
6 SUB r5,r3,r4       6   7   9  12
7 ADD r15,r1,r10     7   8  12  13  Wait for register r1
8 LW r6,0(r5)        8   9  13  18  Wait for register r5
9 SUB r2,r1,r6       9  10  19  20  Wait for register r6
10 ANDI r3,r7,#15   10  11  14  15  Can execute out of order
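The whole table can be reproduced with a small scheduler. This is a sketch, not the pipeline's actual control logic: it assumes the rules deduced in a) to c) (IF and DI in order, one EX start and one WB per cycle with older instructions first, no forwarding). It does not model WAW or WAR checks; no such hazard arises in this schedule. Instr, schedule and PROGRAM are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    dest: str
    srcs: tuple
    latency: int  # EX cycles: 5 for LW, 1 for simple integer/logical ops


def schedule(program):
    """Return (text, IF, DI, EX, WB) start cycles for each instruction.

    Assumed rules: IF and DI proceed in order, one instruction per cycle;
    at most one instruction starts EX and at most one enters WB per cycle,
    older instructions first; no forwarding, so a source register can be read
    only in the cycle after its producer's WB.
    """
    ex_taken, wb_taken = set(), set()
    written_at = {}  # register -> WB cycle of its latest older writer
    rows = []
    for i, ins in enumerate(program):
        if_cycle, di_cycle = i, i + 1
        ready = di_cycle + 1
        for reg in ins.srcs:
            if reg in written_at:
                ready = max(ready, written_at[reg] + 1)
        ex_cycle = ready
        while ex_cycle in ex_taken:  # EX start already claimed by an older instruction
            ex_cycle += 1
        ex_taken.add(ex_cycle)
        wb_cycle = ex_cycle + ins.latency
        while wb_cycle in wb_taken:  # WB already claimed by an older instruction
            wb_cycle += 1
        wb_taken.add(wb_cycle)
        written_at[ins.dest] = wb_cycle
        rows.append((ins.text, if_cycle, di_cycle, ex_cycle, wb_cycle))
    return rows


PROGRAM = [
    Instr("ADD r3,r31,r2", "r3", ("r31", "r2"), 1),
    Instr("LW r6,0(r3)", "r6", ("r3",), 5),
    Instr("ANDI r7,r5,#3", "r7", ("r5",), 1),
    Instr("ADD r1,r6,r0", "r1", ("r6", "r0"), 1),
    Instr("SRL r7,r0,#8", "r7", ("r0",), 1),
    Instr("OR r2,r4,r7", "r2", ("r4", "r7"), 1),
    Instr("SUB r5,r3,r4", "r5", ("r3", "r4"), 1),
    Instr("ADD r15,r1,r10", "r15", ("r1", "r10"), 1),
    Instr("LW r6,0(r5)", "r6", ("r5",), 5),
    Instr("SUB r2,r1,r6", "r2", ("r1", "r6"), 1),
    Instr("ANDI r3,r7,#15", "r3", ("r7",), 1),
]

# Prints one row per instruction with its IF, DI, EX and WB cycles.
for text, *cycles in schedule(PROGRAM):
    print(f"{text:<18}" + "".join(f"{c:>4}" for c in cycles))
```

Because every dependence points to an older instruction, a single pass in program order is enough: each producer's WB cycle is known before any consumer is scheduled.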

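e) The main implication is that instruction 2 may no longer complete out of order with respect to instruction 1. Both instructions write r6, so there is a name dependence (an output dependence) between them, and if instruction 2 wrote back before instruction 1 there would be a WAW hazard: r6 would end up holding the loaded value instead of the ANDI result. The issue logic must therefore compare destination registers as well as source registers, and instruction 2 is stalled in the DI stage until instruction 1 has finished executing. Instruction 2 then executes in cycle 9 and writes back in cycle 10, right after instruction 1. Instruction 3 now reads r6 from instruction 2 instead of from instruction 1, and the remaining instructions compete for the EX and WB stages as before. The table must be modified:

Instruction         IF  DI  EX  WB  Comment
0 ADD r3,r31,r2      0   1   2   3
1 LW r6,0(r3)        1   2   4   9
2 ANDI r6,r5,#3      2   3   9  10  Stalled in DI (WAW on r6)
3 ADD r1,r6,r0       3   4  11  12  Wait for register r6 from instruction 2
4 SRL r7,r0,#8       4   5   6   7
5 OR r2,r4,r7        5   6   8  11  WB busy in cycles 9 and 10
6 SUB r5,r3,r4       6   7  10  13  EX busy in cycles 8 and 9, WB busy in 11 and 12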
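A schedule for e) can be checked mechanically against the rules from a) to c) plus the new requirement that writes to the same register complete in program order. The Python sketch below uses a hypothetical violations helper and hand-entered rows. With instruction 2 at EX 9 and WB 10 it finds no conflicts, whereas placing instruction 2 at EX 10 and WB 11 would collide with instruction 6 in EX and with instruction 5 in WB, and would let instruction 3 read r6 too early.

```python
def violations(rows):
    """Return a list of rule violations in a schedule.

    rows: (dest, srcs, latency, DI, EX, WB) per instruction, in program order.
    Assumed rules: one EX start and one WB per cycle, EX after DI, WB only
    after the EX latency has elapsed, no forwarding (a source is readable the
    cycle after its producer's WB), and same-register writes in program order.
    """
    problems = []
    ex_seen, wb_seen = {}, {}
    for i, (dest, srcs, lat, di, ex, wb) in enumerate(rows):
        if ex in ex_seen:
            problems.append(f"EX conflict in cycle {ex}: {ex_seen[ex]} and {i}")
        ex_seen[ex] = i
        if wb in wb_seen:
            problems.append(f"WB conflict in cycle {wb}: {wb_seen[wb]} and {i}")
        wb_seen[wb] = i
        if ex < di + 1 or wb < ex + lat:
            problems.append(f"instruction {i} leaves a stage too early")
        for reg in srcs:
            producers = [j for j in range(i) if rows[j][0] == reg]
            if producers and ex <= rows[producers[-1]][5]:
                problems.append(f"RAW: instruction {i} reads {reg} too early")
        writers = [j for j in range(i) if rows[j][0] == dest]
        if writers and wb <= rows[writers[-1]][5]:
            problems.append(f"WAW: instruction {i} writes {dest} too early")
    return problems


# (dest, sources, EX latency, DI, EX, WB) for instructions 0-6 with ANDI r6,r5,#3.
TABLE_E = [
    ("r3", ("r31", "r2"), 1, 1, 2, 3),
    ("r6", ("r3",), 5, 2, 4, 9),
    ("r6", ("r5",), 1, 3, 9, 10),
    ("r1", ("r6", "r0"), 1, 4, 11, 12),
    ("r7", ("r0",), 1, 5, 6, 7),
    ("r2", ("r4", "r7"), 1, 6, 8, 11),
    ("r5", ("r3", "r4"), 1, 7, 10, 13),
]
print(violations(TABLE_E))  # []

# Alternative placement of instruction 2: EX 10, WB 11.
alt = list(TABLE_E)
alt[2] = ("r6", ("r5",), 1, 3, 10, 11)
for problem in violations(alt):
    print(problem)
```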