Consider the following MIPS code. Note that R0 (register 0) in the MIPS ISA always has a value of zero. Also note that MIPS instructions always start at word-aligned addresses, so the two least significant bits of the instruction’s address are always zero. As a result, the two least significant bits are never used to index branch predictor tables. Instead, these bits are ignored and branch predictor table is indexed using the least significant bits of the instruction address that remain after the two least significant bits are removed.

```
0x110 li R2,0 ; v=0
0x114 li R3,16 ; Loop bound for LoopI
0x118 li R4,0 ; i=0
LoopI:
0x11C beq R4,R3,EndLoopI ; Exit LoopI if i==16
0x120 li R5,0 ; j=0
LoopJ:
0x124 beq R5,R3,EndLoopJ ; Exit LoopJ if j==16
0x128 add R6,R5,R4 ; j+i
0x12C andi R6,R6,3 ; (j+i)%4
0x130 bne R6,R0,EndIf ; Skip if (j+i)%4!=0
0x134 add R2,R2,R5 ; v+=j
EndIf:
0x138 addi R5,R5,1 ; j++
0x13C beq R0,R0,LoopJ ; Go back to LoopJ
EndLoopJ:
0x140 addi R4,R4,1 ; i++
0x144 beq R0,R0,LoopI ; Go back to LoopI
EndLoopI:
```

For this code, determine the exact number of branch mispredictions that occur during its execution (show your work), if we use A one-bit predictor with 8 entries. All predictor entries start out with zero values.

**Solution:**

The five branches have addresses of 0x11C, 0x124, 0x130, 0x13C, and 0x144. These addresses map to predictor entries numbered (in binary) 111, 001, 100, 111, and 001, o branches 0x11C and 0x13C use the same entry (111), and also branches 0x124 and 0x144 both use entry 001.

The branch at 0x11C is not taken 16 times and is then taken once. The branch at 0x13C, which maps to the same predictor entry, is taken 16 times per iteration of the outer loop. Therefore, the branch pattern observed by predictor entry 111 is 16 repetitions of {NT (from 0x11C), 16 times T (from 0x13C)} followed by one {T (from 0x11C)}. The predictor starts out predicting NT, so its first prediction is correct. Then it predicts NT again, which is a misprediction for the first (of 16) T outcomes produced by 0x13C. The remaining T outcomes from 0x13C will be predicted correctly. In the next iteration of the outer loop, the outcome for 0x11C will be mispredicted (prediction is T, outcome is NT), then the first outcome for 0x13C will be mispredicted (now prediction is NT, but outcome is T), and the remaining 15 T outcomes for 0x13C are predicted correctly. This pattern (two mispredictions, 15 correct predictions) repeats for all the remaining iterations of the outer loop. The last outcome for 0x11C (the one T it does to exit the loop) is predicted correctly because the previous outcome seen by the predictor was a T from 0x13C. Overall, we have a total of 31 (1+15*2) mispredictions from branches 0x11C and 0x13C.

The branch at 0x124 is NT 16 times, then T once, for each iteration of the outer loop. The branch at 0x144, which uses the same predictor entry, is taken once at the very end of each iteration of the outer loop. Therefore, the pattern observed by predictor entry 001 is 16 repetitions of {16 times NT (from 0x124), then 2 times T (once from 0x124, one from 0x144)}. The predictor starts out predicting NT, so its first 16 predictions are correct. Then the first T outcome is mispredicted (prediction is still NT), the predictor is updated, and the second T outcome is correctly predicted. In the remaining 15 iterations of the outer loop, the first NT outcome is mispredicted, the next 15 are predicted correctly, then the first T outcome is mispredicted, then the second T outcome is predicted correctly. Overall, we have a total of 31 (1+15*2) mispredictions from branches 0x124 and 0x144.

The branch at 0x130 has a more complex pattern of outcomes. In the first iteration of the outer loop, the value of “i” is 0, so this branch is NT when “j” is 0, 4, 8, and 12, and is T otherwise. Therefore, we have a pattern of {NT, T, T, T} repeated 4 times during this first iteration of the outer loop. This results in correct prediction of the first outcome (NT), misprediction for the first T, correct prediction of the next two T outcomes, then three times we have 2 mispredictions (NT predicted as T, then T predicted as NT) and 2 correct predictions. The total number of mispredictions during the first iteration of the outer loop is 7 (1+3*2) and the predictor is left predicting T. In the next iteration of the outer loop, “i” is 1, so the outcome is NT when “j” is 3, 7, 11, and 15. Therefore, we have a pattern of {T, T, T, NT} repeated 4 times. The first T is predicted correctly, and so are the next two T outcomes. Then NT is mispredicted, the next T is also mispredicted, then two correct T predictions, then another NT misprediction, another T misprediction, etc. The total number of mispredictions in this second iteration of the outer loop is again 7 (every of the four NT outcomes and three of the T outcomes) and the predictor starts out the next iteration predicting NT. In that next iteration, the pattern is {T,T,NT,T} repeated 4 times, resulting in a misprediction for the first T outcome, and then misprediction of each NT and the T that follows it, for a total of 9 (1+4*2) mispredictions, and leaving the predictor to predict T for the next iteration of the outer loop. In the fourth iteration of the outer loop, the pattern of outcomes is {T,NT,T,T} repeated 4 times, resulting in 8 mispredictions (each NT and the T that follows it), and leaving the predictor to predict T for the first outcome in the fifth iteration. That fifth iteration repeats the outcome pattern of the first iteration ({NT,T,T,T} repeated 4 times) and results in 8 mispredictions (same as the first, except that now the initial NT is also mispredicted). After that, iterations 6, 7, and 8 again result in 7, 9, and 8 mispredictions, respectively. Iterations 9, 10, 11, and 12 repeat the pattern of outcomes and mispredictions from iterations 5, 6, 7, and 8, respectively, and the same patterns repeat again for the last four iterations. Overall, we have a total of 127 mispredictions (7+7+9+8+3*(8+7+9+8)) from branch 0x130.

In summary, the number of mispredictions when this code is executed with the 8-entry 1-bit predictor is 189 (31+31+127).