[LLVMdev] Lowering Atomic Load to Acquire and Load

Fri Aug 2 16:15:24 PDT 2013

On Thu, Aug 1, 2013 at 12:36 PM, Sam Cristall <cristall at eleveneng.com> wrote:
> I'm working on an experimental backend for an MCU that has heavy
> multithreading capabilities but lacks proper acquire/release semantics.
> This is okay, as the programmer can customize __cxa_guard_acquire and
> __cxa_guard_release to lower/raise the appropriate semaphores.  The issue
> I'm having is that I can't figure out how to lower an atomic load into an
> acquire/load pair early enough that the call to __cxa_guard_acquire is
> considered for optimization (most importantly, inlining).  First, is this
> even the proper way to do this, or am I going about it the wrong way?
> Second, is there a "best time" to run a pass that catches these loads?

Clang normally generates code like this for a guarded initialization:

entry:
  %0 = load atomic i8* bitcast (i64* @_ZGVZ3barvE1x to i8*) acquire, align 8
  %guard.uninitialized = icmp eq i8 %0, 0
  br i1 %guard.uninitialized, label %init.check, label %init.end

init.check:                                       ; preds = %entry
  %1 = tail call i32 @__cxa_guard_acquire(i64* @_ZGVZ3barvE1x) #1
  %tobool = icmp eq i32 %1, 0
  br i1 %tobool, label %init.end, label %init

init:                                             ; preds = %init.check
  %call = tail call i32 @_Z3foov() #1
  store i32 %call, i32* @_ZZ3barvE1x, align 4, !tbaa !0
  tail call void @__cxa_guard_release(i64* @_ZGVZ3barvE1x) #1
  br label %init.end
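
For reference, IR of this shape comes from a function-local static with a
dynamic initializer. The following is a minimal sketch reconstructed from the
mangled names (_Z3foov is foo(), _ZZ3barvE1x is the static x inside bar(),
_ZGVZ3barvE1x is its guard variable). The body of foo(), the return type of
bar(), and the foo_calls counter are assumptions added for illustration only.

```cpp
#include <cassert>

// Hypothetical counter, used only to observe how often the initializer runs.
static int foo_calls = 0;

// Assumed body; the IR only shows that foo() returns a 32-bit integer.
int foo() {
    ++foo_calls;
    return 42;
}

int bar() {
    // Function-local static with a dynamic initializer: the compiler emits
    // the guard check, __cxa_guard_acquire, the store, and
    // __cxa_guard_release around this single line.
    static int x = foo();
    return x;
}
```

Only the first call to bar() reaches the init block; every later call reads a
nonzero guard byte in entry and branches straight to init.end.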

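Given this, there is no reason to inline the call to
__cxa_guard_acquire: the atomic load in the entry block is already the
inline fast path, and the call is reached only when that load reads a
guard byte of zero. Inlining it would bloat code size for no performance
benefit.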
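
To make the shape of that control flow concrete, here is a hand-written C++
sketch of the same pattern. It is not how the C++ runtime implements
__cxa_guard_acquire; guard_acquire_slow, guard_release_slow, the mutex, and
the init_calls counter are hypothetical stand-ins, assuming a simple
mutex-based slow path.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins for the guard variable and the runtime hooks.
static std::atomic<std::uint8_t> guard{0};
static std::mutex guard_mutex;
static int x;
static int init_calls = 0;  // illustration only

static int foo() {
    ++init_calls;
    return 42;
}

// Slow path: returns true if the caller must run the initializer.
// On a true return the mutex is still held until guard_release_slow().
static bool guard_acquire_slow() {
    guard_mutex.lock();
    if (guard.load(std::memory_order_relaxed) != 0) {
        guard_mutex.unlock();  // another thread finished the initialization
        return false;
    }
    return true;
}

// Publishes the initialized value and releases the lock.
static void guard_release_slow() {
    guard.store(1, std::memory_order_release);
    guard_mutex.unlock();
}

int bar() {
    // Fast path: one acquire load and a branch, matching the entry block.
    if (guard.load(std::memory_order_acquire) == 0) {
        // Slow path: taken only while the guard is still zero.
        if (guard_acquire_slow()) {
            x = foo();
            guard_release_slow();
        }
    }
    return x;
}
```

Once the guard is set, every call to bar() costs a load and a compare, so the
body of the slow-path function has no effect on steady-state performance.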

What does the IR you are working with look like?

-Eli