[PATCH] Implement low-level ARM ldrex/strex intrinsics

Eli Friedman eli.friedman at gmail.com
Sat Jul 13 11:59:56 PDT 2013

On Sat, Jul 13, 2013 at 5:10 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> Hi Eli,
>> The obvious objection is that the definition of these intrinsics is
>> basically "this intrinsic has undefined behavior".
> I think that's rather pessimistic, though I can see how my warning
> could have given that impression.
> In practice I believe CPUs implement the exclusive monitors at the
> level of the cache line, and "unrelated" loads and stores are not
> allowed to interfere by the specification. On a CPU like this, the
> only way LLVM could interfere would be with a spill to a stack slot on
> the same cache line as the atomic variable being accessed.

That's much less scary than you made it sound originally.  That isn't too bad.

> I think that leaves a large segment of uses which are guaranteed to
> work: put your ldrexed variables somewhere other than the stack and
> either make sure they're alone on their cache line or don't store to
> any others nearby between the ldrex and strex.
> Navigating that space might be tricky, but given that the alternative
> in use now is inline assembly, we can probably assume some degree of
> competence for the users of these intrinsics.

There's still the issue that code motion could potentially mess up the
required performance characteristics... but that's unlikely given that
most uses of strex are going to be in a loop.

> Also, these already exist in other compilers (and ours!) which make no
> more guarantees than we do as far as I'm aware. That doesn't
> necessarily mean we should follow, but it might suggest the issues are
> surmountable.

Okay... please make the following changes to the documentation: (1)
clearly state that users should use the atomic intrinsics if possible,
and (2) clarify the specific risks in terms of the optimizer and how
to mitigate them (avoid the stack, use memory barriers, noinline,

+  // GCC does an implicit conversion to the pointer or integer ValType.  This
+  // can fail in some cases (1i -> int**), check for this error case now.

Stale comment?

+    if (RealResTy->isPointerTy())
+      return Builder.CreateIntToPtr(Val, RealResTy);
+    else
+      return Builder.CreateTruncOrBitCast(Val, RealResTy);

This is implicitly assuming the return type is a pointer, an integer,
or a 32-bit float.  This is probably mostly safe, but have you
considered the case of a half*?

+    Diag(DRE->getLocStart(), diag::err_atomic_builtin_pointer_size)
+      << PointerArg->getType() << PointerArg->getSourceRange();

This diagnostic is misleading.


More information about the cfe-commits mailing list