Problem Set Solutions/Advanced Caches/Problem15

Consider the following MIPS code, which sums up into register $2 all elements of an array whose size is in $4 and whose address is in register $5:

add $2,$0,$0    ; Initialize $2 to zero
loop:   beqz $4,end ; Finish if $4 is zero
    lw $3,0($5) ; Load array element (note: it’s a 4-byte word)
    add $2,$2,$3    ; Add to sum
    addi $5,$5,4    ; Point to next element (each element is 4 bytes)
addi $4,$4,-1  ; Decrement element count
    b loop      ; Go to next iteration
end:

Assume that we have a simple processor that executes instructions one at a time and every non-memory instruction executes in one cycle. However, the memory is slow so the processor includes a small 256-byte direct-mapped cache with 16-byte blocks. If a memory instruction is a cache hit, it can execute in a single cycle just like any other instruction. However, if the memory instruction is a cache miss, it takes 51 cycle to execute – 50 cycles to bring the block from memory to the cache and one to actually execute the instruction.

If we run this code on an array that has 512 elements (i.e. $4 is 512 at the start) and the cache starts off as empty:

How many cycles does it take to execute this code? Explain your answer.

Solution:

We execute 1 initial instruction (init $2), then the 6 instructions in the loop execute 512 times, and then beqz executes one more time to exit the loop. Overall, we execute 3074 (1+6512+1) instructions Assuming that the array starts at the beginning of a cache block, we will have 128 misses (array size if 2048 bytes, so we end up reading 128 16-byte blocks and have a miss the first time we access each block) The penalty for each miss is 50 cycles, so we need a total of 9474 (3074+50128) cycles. If the array is not block-aligned, there will be an additional miss (50 more cycles).