# hpca/Problem Set Solutions/Mulitprocessing/Problem15

Loop-invariant code motion is an optimization where the compiler identifies a statement that is repeated in a loop but can be moved outside the loop. For example, consider the following code:

``````for(i=0;i<1000;i++){
pi=3.14159265;
… code that reads the “pi” variable but does not change it…
}
``````

In this code, the compiler finds that the initialization of “pi” can be moved outside (before) the loop, which reduces execution time because it would be executed once instead of 1000 times. Consider the following code (initially, f is 0 and g is 1):

``````Thread 0:           Thread 1:

for(i=0;i<1000;i++){        for(j=0;j<1000;j++){
while(f==0);            while(g==0);
printf(“%d\n”,data);    data=…compute some value…;
f=0;                g=0;
g=1;                f=1;
}                   }
``````

When the compiler compiles the code for Thread 0 and then for Thread 1, what is the result if it applies loop-invariant code motion for the for-loop? What is the effect on the parallel execution of these two threads, i.e. what is the output of the program (printed by Thread 0)?

Solution:

When compiling the code for Thread 0, the compiler certainly finds that “g=1” is loop-invariant code: the value of g is never used in the loop and in each iteration “g” is set to the same value. As a result, it moves “g=1” outside the loop. Similarly, in Thread 1 the compiler takes “f=1” outside the loop. We get:

``````g=1;                f=1;
for(i=0;i<1000;i++){    for(j=0;j<1000;j++){
while(f==0);        while(g==0);
printf(“%d\n”,data);    data=…compute some value…;
f=0;                g=0;
}                   }
``````

Note that the “printf” line in Thread 0 is not loop invariant code – if we take any printf from inside to outside the loop, even if the compiler thinks the same value is being printed, the output of the program is changed (only one thing printed instead of printing 1,000 things).

Interestingly, the “data=…” line in Thread 1 may or may not be considered loop-invariant code, depending on how the value is computed. If the value computation updates some value or calls another function, then the compiler cannot move the value computation outside the loop. If the line is something like “data=2” then the compiler will find that this is loop-invariant code. If the line is something like “data=j” then this is not strictly loop-invariant code but the compiler may figure out that only the last value written is relevant, replace this with “j=1000”, and move that outside the loop.

Finally, “while(f==0);” and “f=0;” in Thread 0 may or may not be considered to be loop-invariant code, depending on how smart the compiler is. Note that f is both read and written inside the loop, and the compiler (usually) does not know its value before the loop. Also, for loop-invariant code motion the compiler has to ensure that the resulting code has the exact same behavior (in sequential execution) as the original code, and it is hard to produce equivalent code if accesses to “f” are moved outside the loop. This code must do “while(f==0);” then do “f=0” and print a data element, then do another “while(f==0);” and then enter a loop that prints “data” 999 more times. This is unlikely for a compiler to figure out, so this code is likely to not be considered loop-invariant. A similar discussion can be made for accesses to “g” in Thread 1.

Assuming that only “f=1” and “g=1” are moved (so we have the code shown above), we can get several different parallel executions (which are all wrong):

Option 1: Thread 0 executes “g=1” (note that “g” is already 1 initially). Then either Thread 1 does “f=1” and Thread 0 exists its while loop immediately or Thread 0 waits in the “while(f==0)’” loop until Thread 1 sets “f” to 1. Either way, after “f=1” in Thread 1 we have two options:

Option 1.1: Thread 0 can print a garbage “data” value (note that data is not initialized yet)

Option 1.2: Thread 1 may do its “while(g==0);” (note that t exits immediately because g is initially 1) and produce a data value, and then Thread 0 prints this value. Either way, this is the last thing Thread 1 prints because it then does “f=0;” and starts waiting for “f” to change (which will never happen). Similarly, Thread 1, after producing one value (either in time for it to be printed or after garbage has been printed) sets “g” to 0 and then waits forever in its “while(g==0);” loop.

Option 2: Thread 1 completes the first iteration (and sets “g” to zero) before Thread 0 manages to do “g=1”. In this case, either Thread 1 waits until Thread 0 sets “g” to 1, or Thread 0 sets “g” to 1 just before Thread 0 enters its “while” loop. Either way, Thread 0 will produce another data value and then get stuck, but we have two options again:

Option 2.1: Thread 0 prints a value before Thread 0 produces the second value. In this case, only the first value produced by Thread 1 is printed by Thread 0.

Option 2.2: Thread 0 prints a value after Thread 0 produces the second value, in which case only the second value produced by Thread 1 is printed by Thread 1. Either way, Thread 0 gets stuck after the first thing it prints so nothing else gets printed.

In summary, the visible behavior of the program is that both threads never complete execution and that exactly one value is printed, and this value can be 1) the first value produced by Thread 1, 2) the second value produced by Thread 2, or 3) any value (whatever happens to be in “data” before Thread 1 produces the first value).