hpca/Problem Set Solutions/Synchronization/Problem01

Loop-invariant code motion is an optimization where the compiler identifies a statement that is repeated in a loop but can be moved outside the loop. For example, consider the following code:

  … code that reads the “pi” variable but does not change it…

In this code, the compiler finds that the initialization of “pi” can be moved outside (before) the loop, which reduces execution time because it would be executed once instead of 1000 times. Consider the following code (initially, f is 0 and g is 1):

Thread 0:           Thread 1:

for(i=0;i<1000;i++){        for(j=0;j<1000;j++){
  while(f==0);            while(g==0); 
  printf(“%d\n”,data);    data=…compute some value…;
  f=0;                g=0;
  g=1;                f=1;
}                   }

What does this code do (what is printed by Thread 0)?


This code is intended to use Thread 1 as a producer thread that compute a value and puts it in “data”, and Thread 0 as a consumer thread that takes values from “data” and prints them out. If the compiler and the hardware both provide sequential consistency, this code actually works and Thread 0 prints out 1,000 integers computed by Thread 1.

The way the code works is as follows: Thread 0 spins until f becomes non-zero, which will not happen until Thread 1 is done placing a value in the “data” variable and chaned g to 0. Thread 1 then spins until g becomes non-zero, which will only happen after Thread 0 is done printing the data item and changing f back to 0, then Thread 0 again waits until Thread 1 produces another data item, etc.

This is for you information but is not required for getting the points in this problem: Note that this code is broken under weak or release consistency – accesses to “f” and “g” are intended to be synchronization, but they are not marked as such. Consequently, “g=0” and “f=1” in Thread 1 can be reordered. If this happens, Thread 1 can set “f” to 1 before it sets “g” to 0. After Thread 1 sets “f” to 1, Thread 0 can print out the data value, set “f” to 0, and set “g” to 1 before Thread 1 gets to set “g” to zero. After this, Thread 0 will spin waiting for “f” to become 1 (as it should), but Thread 0 will wait for “g” to become 1 (the value of 1 set by thread 0 was overwritten by the belated “g=0” from Thread 1). As a result, under weak or release consistency this program may deadlock after printing a few values. Note that, even under weak or release consistency, this program may correctly print out all 1,000 numbers – it can but does not have to deadlock.