Individual stages of the processor have the following latencies:
If the processor is pipelined, each pipeline latch adds 20ps of latency to the stage that precedes it. This is the so-called “setup latency”: the signals must be stable at the input of the latch for some amount of time before they can be latched correctly at the end of the cycle.
If this processor must be implemented with a 3-stage pipeline, some of the existing five stages must be combined (assume that the existing 5 stages cannot be split). Which of the existing five stages should be placed into which stage of the 3-stage pipeline to minimize the resulting clock cycle time?
If the IF stage is left by itself, its latency would be 230ps (with the latch). If we then also leave the ID stage by itself, its latency is 110ps, and the third stage would contain the original EX, MEM, and WB stages, with a total latency of 400ps (there is no latch at the end of the last stage). The clock cycle time is the maximum of the three stage latencies, which is 400ps.
If we instead combine ID and EX, the latency of that stage is 220ps, and the last stage (MEM and WB) would be 290ps. The clock cycle time would then be 290ps, so the (IF, ID+EX, MEM+WB) 3-stage pipeline is better than (IF, ID, EX+MEM+WB). If we combine ID, EX, and MEM, the latency of that stage is 460ps, which is clearly longer than the 290ps of the (IF, ID+EX, MEM+WB) pipeline.
If we combine IF and ID, we have a 320ps first stage, which is already worse than 290ps, and merging more of the original stages into the new first stage only makes it longer.
Overall, the best 3-stage pipeline is (IF, ID+EX, MEM+WB), with a 290ps clock cycle time.
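The case analysis above can be cross-checked by brute force. The following is a minimal sketch, not part of the original solution. The per-stage latencies (IF 210ps, ID 90ps, EX 110ps, MEM 240ps, WB 50ps) are an assumption: they are back-derived from the totals quoted in the solution, since the latency table itself is not reproduced here. The names `STAGES`, `LATCH_PS`, `cycle_time`, and `best_pipeline` are illustrative only.

```python
from itertools import combinations

# ASSUMPTION: latencies in ps, inferred from the solution's arithmetic
# (e.g. IF alone = 230ps with a 20ps latch implies IF = 210ps).
STAGES = [("IF", 210), ("ID", 90), ("EX", 110), ("MEM", 240), ("WB", 50)]
LATCH_PS = 20  # setup latency added to every stage that is followed by a latch


def cycle_time(groups):
    """Clock cycle time = latency of the slowest pipeline stage."""
    times = []
    for i, group in enumerate(groups):
        t = sum(latency for _, latency in group)
        if i < len(groups) - 1:  # the last stage has no latch after it
            t += LATCH_PS
        times.append(t)
    return max(times)


def best_pipeline(stages, depth):
    """Try every way to cut the ordered stages into `depth` contiguous groups."""
    n = len(stages)
    best = None
    for cuts in combinations(range(1, n), depth - 1):
        bounds = (0, *cuts, n)
        groups = [stages[a:b] for a, b in zip(bounds, bounds[1:])]
        t = cycle_time(groups)
        if best is None or t < best[0]:
            best = (t, groups)
    return best


t, groups = best_pipeline(STAGES, 3)
names = ["+".join(name for name, _ in g) for g in groups]
print(t, names)  # 290 ['IF', 'ID+EX', 'MEM+WB']
```

With only five stages there are just six possible 3-stage groupings, so exhaustive enumeration is cheap and confirms the 290ps result under the assumed latencies.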