hpca »

hpca/branches-problem-set

Problem 1

0x110    li  R2, 0               ; v = 0
0x114    li  R3, 16              ; Loop bound for LoopI
0x118    li  R4, 0               ; i = 0
     LoopI:
0x11c    beq R4, R3, EndLoopI    ; Exit LoopI if i==16
0x120    li  R5, 0               ; j = 0
     LoopJ:
0x124    beq R5, R3, EndLoopJ    ; Exit LoopJ if j==16
0x128    add R6, R5, R4          ; j + i
0x12c    andi R6, R6, 3          ; (j+i) % 4
0x130    bne R6, R0, EndIf       ; Skip if (j+i)%4 != 0
0x134    add R2, R2, R5          ; v += j
     EndIf:
0x138    addi R5, R5, 1          ; j++
0x13c    beq R0, R0, LoopJ       ; Go back to LoopJ
     EndLoopJ:
0x140    addi R4, R4, 1          ; i++
0x144    beq R0, R0, LoopI       ; Go back to LoopI
     EndLoopI:

For this code, determine the exact number of branch mispredictions that occur during its execution (show your work), if we use an always-taken predictor.

Instructor notes: For the given MIPS code, note that R0 (register 0) in the MIPS ISA always has a value of zero. Also note that MIPS instructions always start at word-aligned addresses, so the two least significant bits of the instruction’s address are always zero. As a result, the two least significant bits are never used to index branch predictor tables. Instead, these bits are ignored and branch predictor table is indexed using the least significant bits of the instruction address that remain after the two least significant bits are removed.

Problem 2

For the code from Problem 1, determine the exact number of branch mispredictions that occur if we use a one-bit predictor with 8 entries.  All predictor entries start out with zero values.

Mispredictions from branches 0x124 and 0x144?

Mispredictions from branch 0x130?

Mispredictions from branches 0x11c and 0x13c?

Problem 3

For the code from Problem 1, determine the exact number of branch mispredictions that occur if we use a local history predictor with 8 entries.  Each entry has a 2-bit local history and four 2-bit counters.  All history bits and 2-bit counters start out with zero values.  For each 2-bit counter, use the 0-1-2-3/3-2-1-0 counting method from the lectures, not the 0-1-3/3-2-0 method from the textbook.

Problem 4

Consider an unpipelined processor.  Assume that it has a 1-ns clock cycle and that it uses 4 cycles for ALU operations and 5 cycles for branches and 4 cycles for memory operations.  Assume that the relative frequencies of these operations are 50%, 35%, and 15% respectively.

Suppose that due to clock skew and set up, pipelining the processor adds 0.15ns of overhead to the clock.  Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?