hpca ยป

hpca/pipelining-problem-set

Problem 1

Consider the following assembly language program:

1:  MOV R3, R7
2:  LD R8, (R3)
3:  ADD R3, R3, 4
4:  LOAD R9, (R3)
5:  BNE R8, R9, L3

This program includes WAW (write after write), RAW (read after write), and WAR (write after read) dependencies.  Show these.  If the dependency is between instructions 1 and 2, write it as L1-L2.

Instructor notes: Adapted from Computer Architecture, a Quantitative Approach, by Hennessy and Patterson, 5th ed.

Problem 2

We have a single stage, non-pipelined machine and a pipelined machine with 5 pipeline stages.  The cycle time of the former is 5ns and the latter is 1ns.

a. Assuming no stalls, what is the speedup of the pipelined machine over the single stage machine?

b. Given the pipeline stalls 1 cycle for 40% of the instructions, what is the speedup now?

Instructor notes: Adapted from http://www.csee.umbc.edu/~olano/611f12/HW2%20Sol.pdf

Problem 3

Use the following code fragment:

1:  Loop: LD R1, 0(R2)
2:  DADDI R1, R1, #1
3:  SD 0(R2), R1
4:  DADDI R2, R2, #4
5:  DSUB R4, R3, R2
6:  BNEZ R4, Loop

Is there a true (RAW, read after write) data dependency between the lines listed below?

line 1 -> 2
line 2 -> 3
line 3 -> 4
line 4 -> 5
line 5 -> 6

Instructor notes:

Adapted from: Computer Architecture a Quantitative Approach; Hennessy and Patterson, 5th Edition

Problem 4

Use the code fragment from Problem 3.

Show the timing of this instruction for a 5 stage pipeline along with the number of cycles required to execute one iteration of the loop with no forwarding.  What is the number of cycles required?

Instructor notes: Adapted from Computer Architecture a Quantitative Approach; Hennessy and Patterson, 5th Edition

Problem 5

Use the code fragment from Problem 3.

Show the timing of this instruction sequence for a 5-stage pipeline along with the number of cycles required to execute one iteration of the loop with forwarding.  Assume registers can be written and red in the same cycle, during writeback.  (The number of cycles for the execution of one iteration of the loop ends after the EX stage of the BNEZ instruction).  What is the number of cycles required?

Instructor notes: Adapted from Computer Architecture a Quantitative Approach; Hennessy and Patterson, 5th Edition

Problem 6

Individual stages of a processor have the following latencies:

IF   instruction fetch    210 ps
ID   instruction decode    90 ps
EX   execute              110 ps
MEM  memory access        240 ps
WB   register write-back   50 ps

If the processor is pipelined, each pipeline latch adds a latency of 20ps to the stage that precedes it - this is a so-called "setup latency," where the signals need to be stable at the input of the latch for some amount of time before they can be latched correctly at the end of the cycle.

In this approach, no pipeline is used, and in each cycle one instruction is executed from start (IF) to finish (WB).

What is the clock cycle time if we implement this processor using a single-cycle approach (in ps)?

Problem 7

Individual stages of a processor have latencies as in problem 6.  Pipelining adds setup latency as described in problem 6.

What is the clock cycle time if we implement this processor using a 5-stage pipeline (in ps)?

Problem 8

Individual stages of a processor have latencies as in problem 6.  Pipelining adds setup latency as described in problem 6.

What is the speedup of the piplined processor from problem 7 over the single-cycle processor from problem 6 if the single-cycle processor has a CPI of 1 and the pipelined processor achieves a CPI of 1.2?

Problem 9

Individual stages of a processor have latencies as in problem 6.  Pipelining adds setup latency as described in problem 6.

If this processor must be implemented with a 3-stage pipeline, some of the existing five stages must be combined (assume that the existing 5 stages cannot be split).  Which of the existing five stages should be placed into which stage of the 3-stage pipeline to minimize the resulting clock cycle time?

Problem 10

Individual stages of a processor have latencies as in problem 6.  Pipelining adds setup latency as described in problem 6.

If this processor is to be implemented with a 6-stage pipeline, but the design effort and time-to-market are such that there is only enough time to split one of the five existing stages into two new stages, which would you chose to split?