[PATCH] Implement low-level ARM ldrex/strex intrinsics

Tim Northover t.p.northover at gmail.com
Sat Jul 13 12:35:43 PDT 2013


>> That's much less scary than you made it sound originally.  That isn't too bad.
>
> I'm not sure about that. I think the monitor can be by set, so you can
> cause a self-deadlock quite easily by storing in a ldrex/strex loop.

A lot of it's implementation-defined (in the CPU implementer's sense
of the word), but the address the monitor tags is apparently (ARM ARM
A3.4.3) Memory_address[31:a] where 3 <= a <= 11. I don't see any
reference to allowing the decision to be made by set.
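
To make the Memory_address[31:a] tagging concrete, here is a minimal C
sketch of which addresses the monitor could not tell apart. The helper
names (monitor_tag, same_granule) are hypothetical and exist only for
illustration; the sketch assumes 32-bit addresses and takes the
implementation-defined `a` as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the part of a 32-bit address that the exclusive
 * monitor records, i.e. Memory_address[31:a]. The value of `a` is
 * implementation defined; this sketch assumes 3 <= a <= 11. */
uint32_t monitor_tag(uint32_t addr, unsigned a)
{
    return addr >> a;
}

/* True if both addresses fall in the same 2^a-byte granule, so the
 * monitor would treat them as the same tagged location. */
bool same_granule(uint32_t x, uint32_t y, unsigned a)
{
    return monitor_tag(x, a) == monitor_tag(y, a);
}
```

With a == 3 the granule is 8 bytes, so 0x1000 and 0x1004 share a tag;
with a == 11 it is 2048 bytes, so 0x1000 and 0x17FF share a tag but
0x1800 does not.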

> Shouldn't you use something like this for the intrinsic (borrowing
> from some NaCl code I'm playing with):
> def int_nacl_atomic_load : Intrinsic<[llvm_anyint_ty],
>     [LLVMPointerType<LLVMMatchType<0>>, llvm_i32_ty],
>     [IntrReadWriteArgMem]>;
>
> I think all your intrinsics need IntrReadWriteArgMem, to act more like barriers.

I really want just about the most conservative interpretation
possible. I thought that was what you get with no specifier (though
admittedly only from the vagueish documentation). In particular, just
reading and writing argument memory is too weak, in my opinion. The
intrinsics shouldn't be reordered with other ldrex/strex intrinsics
even if LLVM can prove that there's no overlap in the addresses.

These instructions notionally write to a separate monitor which can
probably be thought of as a random, fixed memory location as far as
LLVM is concerned.

> I may also be missing something obvious, but how does the Clang part
> handle alignment? It'll fail when the pointer isn't naturally aligned.
> I don't know if you have sufficient information to at least catch some
> of this issue.

That's probably a good point; Clang should be able to diagnose at
least some obvious cases. I'll look into it.
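
For reference, the natural-alignment condition being discussed can be
written as the small C sketch below. It is a runtime check with a
hypothetical name (is_naturally_aligned), not the diagnostic Clang would
emit; it assumes the access size is a power of two (1, 2, 4 or 8 bytes
for the exclusive accesses).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: is `p` naturally aligned for an access of `size`
 * bytes? Assumes `size` is a non-zero power of two, so that size - 1 is
 * a mask of the low address bits that must all be zero. */
bool is_naturally_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p & (uintptr_t)(size - 1)) == 0;
}
```

A compile-time diagnostic could only apply the same test where the
pointer's alignment is statically known, which is why only the obvious
cases are likely to be caught.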

> Also, ldrex/strex can be reg-imm on Thumb, I don't think you allow
> this? I'm not sure you'd want to, and I guess SelectionDAG should be
> able to fold the immediate into the address.

I'm not sure what you mean. The address can accept an offset on 32-bit
Thumb2, which I do account for (and make use of, and test) with the
new ComplexPattern. Is there some other way they can make use of an
immediate?
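
As a rough illustration of the offset form mentioned above, the C sketch
below tests whether a byte offset could be folded into a word-sized
Thumb2 LDREX/STREX. It assumes the encoding's 8-bit immediate scaled by
4 (my reading of the architecture manual), and the function name is
hypothetical; it is not the ComplexPattern from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: can `offset` (in bytes) be encoded directly in
 * a word-sized Thumb2 LDREX/STREX? Assumes an unsigned 8-bit immediate
 * scaled by 4, i.e. a multiple of 4 in the range 0..1020. */
bool thumb2_ldrex_offset_ok(int32_t offset)
{
    return offset >= 0 && offset <= 1020 && (offset % 4) == 0;
}
```

Offsets outside that range, or not a multiple of 4, would need the
address computed into a register first.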

Cheers.

Tim.


