hpca ยป

This problem assumes that we are using a processor with a 15-stage pipeline and that the actual direction (taken or not taken) for a branch is determined during the 10th stage. The processor is executing a program in which 10% of executed instructions are conditional branches, and the only stalls occur when the direction of a conditional branch is mispredicted. The accuracies of different predictors for this program are as follows:

```
Always taken: 45%
Always not taken: 55%
2-bit 90%
Global 95%
Tournament 98%
```

What is the branch misprediction penalty (in terms of cycles)? In other words, how many cycles are added to the execution time if we have a branch misprediction instead of a correct prediction.

Assume the processor and predictors described in Problem 1.

For an 'always taken' branch predictor, what is the CPI achieved by the processor?

Assume the processor and predictors described in Problem 1.

For an 'always not taken' branch predictor, what is the CPI achieved by the processor?

Assume the processor and predictors described in Problem 1.

For a 2-bit branch predictor, what is the CPI achieved by the processor?

Assume the processor and predictors described in Problem 1.

For a global branch predictor, what is the CPI achieved by the processor?

Assume the processor and predictors described in Problem 1.

For a tournament branch predictor, what is the CPI achieved by the processor?

Assume the processor and predictors described in Problem 1.

Some branch instructions are much more predictable than others. If we know that 80% of all executed conditional branch instructions are easy-to-predict loop-back branches that are always predicted correctly by a dynamic predictor, what is the accuracy of the *dynamic* predictor on the remaining 20% of conditional branch instructions if a 2-bit predictor is used?

Assume the processor and predictors described in Problem 1. Assume branch instructions are as predictable as described in Problem 7.

What is the accuracy of the dynamic predictor on the remaining 20% of conditional branch instructions if a *global* predictor is used?

Assume the processor and predictors described in Problem 1. Assume branch instructions are as predictable as described in Problem 7.

What is the accuracy of the dynamic predictor on the remaining 20% of conditional branch instructions if a *tournament* predictor is used?

Assume the processor and predictors described in Problem 1.

For each of these branch predictors, what speedup would be achieved if we apply a program transformation that replaces a conditional branch instruction with an ADD instruction, but can only be applied to half of the branch instructions executed in this program? Note: assume that correctly and incorrectly predicted instructions have the same chance of being replaced during this program transformation.

What is the speedup for the *always taken* predictor?

Assume the processor and predictors described in Problem 1. Assume the program transformation described in Problem 10.

What is the speedup for the *global* predictor?

Assume the processor and predictors described in Problem 1. Assume the program transformation described in Problem 10.

What is the speedup for the *always not taken* predictor?

Assume the processor and predictors described in Problem 1. Assume the program transformation described in Problem 10.

What is the speedup for the *tournament* predictor?

What is the speedup for the *2-bit* predictor?

Assume the processor and predictors described in Problem 1.

For each of these branch predictors, what speedup would be achieved if we apply a program transformation that replaces a conditional branch instruction with two ADD instructions, but can only be applied to half of the branch instructions executed in this program?

What is the speedup for the *always taken* predictor?

Instructor notes: There are two separate issues here.

1) Can we use the CPI alone to compute the speedup if the program changes?

2) How do we compute the speedup in this problem?

Assume the processor and predictors described in Problem 1. Assume the program transformation described in Problem 15.

What is the speedup for the *global* predictor?

Instructor notes: IF clock cycle time stays the same AND IF the instructions that get executed are exactly the same instructions, THEN we can use CPI to compute the speedup. In this case the instructions are not exactly the same, so CPI alone gives an incorrect speedup.

Assume the processor and predictors described in Problem 1. Assume the program transformation described in Problem 15.

What is the speedup for the *always not taken* predictor?

Assume the processor and predictors described in Problem 1. Assume the program transformation described in Problem 15.

What is the speedup for the *tournament* predictor?

What is the speedup for the *2-bit* predictor?