[llvm-commits] [llvm] r73636 - /llvm/trunk/lib/VMCore/Pass.cpp

Duncan Sands baldrick at free.fr
Thu Jun 18 01:26:50 PDT 2009


Hi Owen,

> +  // Use double-checked locking to safely initialize the registrar when
> +  // we're running in multithreaded mode.
>    if (!PassRegistrarObj)
> -    PassRegistrarObj = new PassRegistrar();
> +    if (llvm_is_multithreaded()) {
> +      llvm_acquire_global_lock();
> +      if (!PassRegistrarObj) {
> +        PassRegistrar* tmp = new PassRegistrar();
> +        sys::MemoryFence();
> +        PassRegistrarObj = tmp;
> +      }
> +      llvm_release_global_lock();
> +    } else
> +      PassRegistrarObj = new PassRegistrar();

The usual style is to have {} around the else branch too.

More seriously, I think you need an additional fence in
the case when PassRegistrarObj is seen to be non-null at
the start, to prevent subsequent loads through PassRegistrarObj
from being reordered and observing uninitialized values.

To explain what I mean, consider the following.  Processors
1 and 2 both have the value of the PassRegistrarObj pointer in
cache.  They also both have the memory corresponding to the
malloced object in cache, but in a different cache line.  Initially
the value of the pointer is null.  Processor 1 mallocs some
memory (in the second cache line, see above), initializes
it (updating the values in its cache), and assigns the address
to PassRegistrarObj (in the first cache line).  Now
processor 2 comes along.  If its cache line holding
PassRegistrarObj has been updated then it will see the new
non-null value.  Now it reads values from the malloced memory,
but if that other cache line has not yet been synced with
processor 1's then it will see uninitialized values.

Of course you are going to say that sys::MemoryFence() takes
care of this.  In fact it doesn't - both processors have to
execute a memory barrier, not just one.  Indeed this is a
general rule: if there is only one memory barrier then you
are probably doing something wrong!  The memory barrier on
processor 1 means that it publishes information about which
cache lines it wrote to the rest of the system.  But other
processors can (and do) ignore that if they have not themselves
been told not to reorder reads (via a memory barrier).
When processor 2 executes its memory barrier (missing in your code)
then it will take a look at the published cache info and actually
do something.  Without a memory barrier it can read the cache
lines in whatever order it likes.

That said, most processors try to be helpful: when dereferencing
a pointer they apparently do an implicit memory barrier to get
the cache line of the pointee appropriately, so you tend not to
see this.  However it can happen on Alpha - this caused all kinds
of pain in the Linux kernel, it seems.

Ciao,

Duncan.

PS: Take a look at
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
and notice the two memory barriers on page 12.
