Consider the following MIPS code, which sums up into register $2 all elements of an array whose size is in $4 and whose address is in register $5:
add $2,$0,$0 ; Initialize $2 to zero loop: beqz $4,end ; Finish if $4 is zero lw $3,0($5) ; Load array element (note: it’s a 4-byte word) add $2,$2,$3 ; Add to sum addi $5,$5,4 ; Point to next element (each element is 4 bytes) addi $4,$4,-1 ; Decrement element count b loop ; Go to next iteration end:
Assume that we have a simple processor that executes instructions one at a time and every non-memory instruction executes in one cycle. However, the memory is slow so the processor includes a small 256-byte direct-mapped cache with 16-byte blocks. If a memory instruction is a cache hit, it can execute in a single cycle just like any other instruction. However, if the memory instruction is a cache miss, it takes 51 cycle to execute – 50 cycles to bring the block from memory to the cache and one to actually execute the instruction.
If we run this code on an array that has 512 elements (i.e. $4 is 512 at the start) and the cache starts off as empty:
How many cycles does it take to execute this code? Explain your answer.
We execute 1 initial instruction (init $2), then the 6 instructions in the loop execute 512 times, and then beqz executes one more time to exit the loop. Overall, we execute 3074 (1+6512+1) instructions Assuming that the array starts at the beginning of a cache block, we will have 128 misses (array size if 2048 bytes, so we end up reading 128 16-byte blocks and have a miss the first time we access each block) The penalty for each miss is 50 cycles, so we need a total of 9474 (3074+50128) cycles. If the array is not block-aligned, there will be an additional miss (50 more cycles).