Problem Set Solutions/Branches/Problem 1

Consider the following MIPS code. Note that R0 (register 0) in the MIPS ISA always has a value of zero. Also note that MIPS instructions always start at word-aligned addresses, so the two least significant bits of the instruction’s address are always zero. As a result, the two least significant bits are never used to index branch predictor tables. Instead, these bits are ignored and branch predictor table is indexed using the least significant bits of the instruction address that remain after the two least significant bits are removed.

0x110       li   R2,0           ; v=0 
0x114       li   R3,16      ; Loop bound for LoopI 
0x118       li   R4,0           ; i=0  
    LoopI: 
0x11C       beq  R4,R3,EndLoopI ; Exit LoopI if i==16  
0x120       li   R5,0           ; j=0
    LoopJ: 
0x124       beq  R5,R3,EndLoopJ ; Exit LoopJ if j==16
0x128       add  R6,R5,R4       ; j+i  
0x12C       andi R6,R6,3        ; (j+i)%4 
0x130       bne  R6,R0,EndIf    ; Skip if (j+i)%4!=0
0x134       add R2,R2,R5        ; v+=j 
    EndIf:  
0x138       addi R5,R5,1        ; j++  
0x13C       beq  R0,R0,LoopJ    ; Go back to LoopJ 
EndLoopJ:
0x140       addi R4,R4,1        ; i++ 
0x144       beq     R0,R0,LoopI ; Go back to LoopI  
EndLoopI:

For this code, determine the exact number of branch mispredictions that occur during its execution (show your work), if we use an always-taken predictor.

Solution:

This code can be written in C as follows:

v=0;
for(i=0;i!=16;i++)
  for(j=0;j!=16;j++)
    if((j+i)%4)==0)
      v+=j;

The branch at 0x11C is executed 17 times, 16 times it is not taken (to stay in the outer loop) and once it is taken (to exit the outer loop). Correspondingly, the branch at 0x144 is taken 16 times (at the end of each iteration of the outer loop).

The branch at 0x124 is executed 17 times in each of the 16 iterations of the outer loop. It is NT 16 times, then T once, then NT 16 times, then T once, etc. Overall, this branch is not taken 256 times and taken 16 times. Correspondingly, the branch at 0x13C is taken 256 times (at the end of each iteration of the inner loop). The branch at 0x130 is NT when (j+i) is evenly divisible by, which happens 4 times per inner loop. In total, this branch is NT 64 times (416) and T 192 times (1216).

In summary, the number of mispredictions when this code is executed with the always-taken predictor is 336 (16+256+64).