Consider the following MIPS code. Note that R0 (register 0) in the MIPS ISA always has a value of zero. Also note that MIPS instructions always start at word-aligned addresses, so the two least significant bits of the instruction’s address are always zero. As a result, the two least significant bits are never used to index branch predictor tables. Instead, these bits are ignored and branch predictor table is indexed using the least significant bits of the instruction address that remain after the two least significant bits are removed.

```
0x110 li R2,0 ; v=0
0x114 li R3,16 ; Loop bound for LoopI
0x118 li R4,0 ; i=0
LoopI:
0x11C beq R4,R3,EndLoopI ; Exit LoopI if i==16
0x120 li R5,0 ; j=0
LoopJ:
0x124 beq R5,R3,EndLoopJ ; Exit LoopJ if j==16
0x128 add R6,R5,R4 ; j+i
0x12C andi R6,R6,3 ; (j+i)%4
0x130 bne R6,R0,EndIf ; Skip if (j+i)%4!=0
0x134 add R2,R2,R5 ; v+=j
EndIf:
0x138 addi R5,R5,1 ; j++
0x13C beq R0,R0,LoopJ ; Go back to LoopJ
EndLoopJ:
0x140 addi R4,R4,1 ; i++
0x144 beq R0,R0,LoopI ; Go back to LoopI
EndLoopI:
```

For this code, determine the exact number of branch mispredictions that occur during its execution (show your work), if we use an always-taken predictor.

**Solution:**

This code can be written in C as follows:

```
v=0;
for(i=0;i!=16;i++)
for(j=0;j!=16;j++)
if((j+i)%4)==0)
v+=j;
```

The branch at 0x11C is executed 17 times, 16 times it is not taken (to stay in the outer loop) and once it is taken (to exit the outer loop). Correspondingly, the branch at 0x144 is taken 16 times (at the end of each iteration of the outer loop).

The branch at 0x124 is executed 17 times in each of the 16 iterations of the outer loop. It is NT 16 times, then T once, then NT 16 times, then T once, etc. Overall, this branch is not taken 256 times and taken 16 times. Correspondingly, the branch at 0x13C is taken 256 times (at the end of each iteration of the inner loop).
The branch at 0x130 is NT when (j+i) is evenly divisible by, which happens 4 times per inner loop. In total, this branch is NT 64 times (4*16) and T 192 times (12*16).

In summary, the number of mispredictions when this code is executed with the always-taken predictor is 336 (16+256+64).