Consider the following MIPS code, which sums up into register $2 all elements of an array whose size is in $4 and whose address is in register $5:
add $2,$0,$0 ; Initialize $2 to zero loop: beqz $4,end ; Finish if $4 is zero lw $3,0($5) ; Load array element (note: it’s a 4-byte word) add $2,$2,$3 ; Add to sum addi $5,$5,4 ; Point to next element (each element is 4 bytes) addi $4,$4,-1 ; Decrement element count b loop ; Go to next iteration end:
Assume that we have a simple processor that executes instructions one at a time and every non-memory instruction executes in one cycle. However, the memory is slow so the processor includes a small 256-byte direct-mapped cache with 16-byte blocks. If a memory instruction is a cache hit, it can execute in a single cycle just like any other instruction. However, if the memory instruction is a cache miss, it takes 51 cycle to execute – 50 cycles to bring the block from memory to the cache and one to actually execute the instruction.
If we run this code on an array that has 512 elements (i.e. $4 is 512 at the start) and the cache starts off as empty:
If you can make only one of these changes to the cache, which would you choose to help with this code?
Change total size to 1024 bytes (still direct mapped with 16-byte blocks) Change to fully associative (still 256 bytes in total with 16-byte block) Change block size to 32 (still direct mapped with 256-byte total size)
Options 1 and 2 do not help – cache capacity and associativity do not matter here, because all misses are compulsory. Option 3 eliminates half of the misses, so that’s what we must choose