VirtualMemory/Problem 3

Consider the following three hypothetical, but not atypical, processors, which we run with the SPEC gcc benchmark.

1. A simple MIPS two-issue static pipe running at a clock rate of 4 GHz and achieving a pipeline CPI of 0.8. This processor has a cache system that yields 0.005 misses per instruction.

2. A deeply pipelined version of a two-issue MIPS processor with slightly smaller caches and a 5 GHz clock rate. The pipeline CPI of the processor is 1.0, and the smaller caches yield 0.0055 misses per instruction on average.

3. A speculative MIPS superscalar with a 64-entry window. It achieves one-half of the ideal issue rate measured for this window size (9 instruction issues per cycle). This processor has the smallest caches, which lead to 0.01 misses per instruction, but it hides 25% of the miss penalty on every miss through dynamic scheduling. This processor has a 2.5 GHz clock.

Assume that the main memory time (which sets the miss penalty) is 50 ns. Determine the relative performance of these three processors.

Solution

First, we use the miss penalty and miss rate information to compute the contribution to CPI from cache misses for each configuration. We do this with the formula:

``````
Cache CPI = Misses per instruction * Miss Penalty
``````

We need to compute the miss penalties for each system:

``````
Miss Penalty = Memory Access Time / Clock Cycle Time
``````

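The clock cycle times for the processors are 250 ps, 200 ps, and 400 ps, respectively. Processor 3 exposes only 75% of the memory time as miss penalty, which accounts for the 0.75 factor in its calculation. Hence, the miss penalties are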

``````
1 : 50 ns / 250 ps = 200 cycles
2 : 50 ns / 200 ps = 250 cycles
3 : (0.75 * 50 ns) / 400 ps = 93.75 ≈ 94 cycles
``````
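The miss-penalty arithmetic above can be checked with a short script. This is only a sketch: the function name `miss_penalty_cycles` and the `exposed_fraction` parameter are illustrative names, not part of the original problem, and times are kept in picoseconds to avoid unit slips.

```python
MEMORY_TIME_PS = 50_000  # 50 ns main memory time, expressed in picoseconds


def miss_penalty_cycles(clock_ghz, exposed_fraction=1.0):
    """Return the miss penalty in clock cycles.

    exposed_fraction is the share of the memory time that is not hidden
    (an illustrative parameter; 0.75 for processor 3, 1.0 otherwise).
    """
    cycle_ps = 1000.0 / clock_ghz  # a 1 GHz clock has a 1000 ps cycle
    return exposed_fraction * MEMORY_TIME_PS / cycle_ps


penalties = {
    1: miss_penalty_cycles(4.0),
    2: miss_penalty_cycles(5.0),
    3: miss_penalty_cycles(2.5, exposed_fraction=0.75),
}

for proc, cycles in penalties.items():
    print(f"Processor {proc}: {cycles:.2f} cycles")
```

Processor 3 comes out at 93.75 cycles before rounding, which the solution rounds to 94.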

Applying the Cache CPI formula to each processor:

``````
CPI1 = 0.005 * 200 = 1.0
CPI2 = 0.0055 * 250 = 1.375 ≈ 1.4
CPI3 = 0.01 * 94 = 0.94
``````

We know the pipeline CPI contribution for every processor except processor 3; its pipeline CPI is given by:

Pipeline CPI = 1 / Issue rate = 1 / (9 * 0.5) = 1 / 4.5 = 0.22

Now we find the CPI for each processor by adding the pipeline and cache CPI contributions:

``````
1 : 0.8 + 1.0 = 1.8
2 : 1.0 + 1.4 = 2.4
3 : 0.22 + 0.94 = 1.16
``````
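As a cross-check, the sum of pipeline and cache CPI can be sketched in a few lines. The helper name `total_cpi` is an illustrative choice. The miss penalty for processor 3 is entered unrounded, as 0.75 × 50 ns / 400 ps = 93.75 cycles.

```python
# Per processor: (pipeline CPI, misses per instruction, miss penalty in cycles)
processors = {
    1: (0.8, 0.005, 200),
    2: (1.0, 0.0055, 250),
    3: (1 / (9 * 0.5), 0.01, 93.75),  # pipeline CPI = 1 / (half of 9 issues per cycle)
}


def total_cpi(pipeline_cpi, misses_per_instr, penalty_cycles):
    """Total CPI = pipeline CPI + cache CPI (misses per instruction * miss penalty)."""
    return pipeline_cpi + misses_per_instr * penalty_cycles


cpi = {proc: total_cpi(*params) for proc, params in processors.items()}

for proc, value in cpi.items():
    print(f"Processor {proc}: CPI = {value:.3f}")
```

Carrying unrounded intermediate values gives 2.375 for processor 2 rather than the rounded 2.4 used in the text. Processors 1 and 3 still come to 1.8 and about 1.16.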

Since all three processors implement the same architecture, we can compare instruction execution rates in millions of instructions per second (MIPS) to determine relative performance. The rate is the clock rate divided by the CPI (CR / CPI):

``````
1 : 4000 MHz / 1.8 = 2222 MIPS
2 : 5000 MHz / 2.4 = 2083 MIPS
3 : 2500 MHz / 1.16 = 2155 MIPS
``````

In this example, the simple two-issue static superscalar (processor 1) looks best, followed by processor 3 and then processor 2. In practice, performance depends on both the CPI and clock rate assumptions.
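The problem asks for relative performance, so it may help to normalize the execution rates. The sketch below uses the rounded total CPI values from the solution (1.8, 2.4, 1.16). Normalizing to the slowest processor is one possible convention and is an assumption of this sketch, not something the original solution specifies.

```python
# Per processor: (clock rate in MHz, total CPI as rounded in the solution)
processors = {
    1: (4000, 1.8),
    2: (5000, 2.4),
    3: (2500, 1.16),
}

# Execution rate in MIPS = clock rate (MHz) / CPI
mips = {proc: mhz / cpi for proc, (mhz, cpi) in processors.items()}

# Normalize to the slowest processor (an assumed convention for this sketch)
slowest = min(mips.values())
relative = {proc: rate / slowest for proc, rate in mips.items()}

for proc in sorted(mips, key=mips.get, reverse=True):
    print(f"Processor {proc}: {mips[proc]:.0f} MIPS, {relative[proc]:.3f}x the slowest")
```

With these inputs, processor 1 is roughly 7% faster than processor 2 and roughly 3% faster than processor 3. These margins are small enough that modest changes to the CPI or clock rate assumptions could reorder the results.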