Consider the following MIPS code, which sums up into register \$2 all elements of an array whose size is in \$4 and whose address is in register \$5:

``````add \$2,\$0,\$0    ; Initialize \$2 to zero
loop:   beqz \$4,end ; Finish if \$4 is zero
lw \$3,0(\$5) ; Load array element (note: it’s a 4-byte word)
addi \$5,\$5,4    ; Point to next element (each element is 4 bytes)
addi \$4,\$4,-1  ; Decrement element count
b loop      ; Go to next iteration
end:
``````

Assume that we have a simple processor that executes instructions one at a time and every non-memory instruction executes in one cycle. However, the memory is slow so the processor includes a small 256-byte direct-mapped cache with 16-byte blocks. If a memory instruction is a cache hit, it can execute in a single cycle just like any other instruction. However, if the memory instruction is a cache miss, it takes 51 cycle to execute – 50 cycles to bring the block from memory to the cache and one to actually execute the instruction.

If we run this code on an array that has 512 elements (i.e. \$4 is 512 at the start) and the cache starts off as empty:

How many cycles does it take to execute this code? Explain your answer.

Solution:

We execute 1 initial instruction (init \$2), then the 6 instructions in the loop execute 512 times, and then beqz executes one more time to exit the loop. Overall, we execute 3074 (1+6512+1) instructions Assuming that the array starts at the beginning of a cache block, we will have 128 misses (array size if 2048 bytes, so we end up reading 128 16-byte blocks and have a miss the first time we access each block) The penalty for each miss is 50 cycles, so we need a total of 9474 (3074+50128) cycles. If the array is not block-aligned, there will be an additional miss (50 more cycles).