[llvm-dev] RFC: Element-atomic memory intrinsics
Vedant Kumar via llvm-dev
llvm-dev at lists.llvm.org
Mon May 8 11:22:43 PDT 2017
> On May 8, 2017, at 10:49 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> Hi Daniel,
>
> [+CC Mehdi, Vedant for the auto upgrade issue]
>
> On Mon, May 8, 2017 at 7:54 AM, Daniel Neilson via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> **Method**
>>
>> Clearly we are going to have to teach LLVM about unordered memory
>> intrinsics. There are, as I can see it, a couple of ways to accomplish this.
>> I’d like your opinions on which are preferable, or if you can think of any
>> other options. In no particular order…
>>
>> Option 1)
>> Introduce a new unordered/element-atomic version of each of the memory
>> intrinsics.
>> Ex: @llvm.memcpy_element_atomic — work was already started to introduce
>> this one in D27133, but could be backed out and restarted.
I'm curious about this -- do you know why the decision was made in D27133 to go with a new intrinsic, instead of extending the existing one? Would the same rationale apply here?
>> Intrinsic prototype: @llvm.memcpy_element_atomic.<overload desc>(<ty>*
>> dest, <ty>* src, <ty> len, i32 align, i2 isunordered, i16 element_size)
>> Semantics:
>> * Will do a memcpy of len bytes from src to dest.
>> * len must = k * lcm( #bytes in dest type, #bytes in src type), for
>> some non-negative integer k [note: lcm = least-common multiple]
>> * load/store size given by the constant power-of-2 parameter
>> “element_size”; expected to be the lcm(sizeof(dest_ty), sizeof(src_ty))
>
> I'm not sure if sizeof(dest_ty) and sizeof(src_ty) adds anything here.
>
> LLVM is moving towards "typeless pointers" (i.e. pointers will not
> have pointee types, instead they will just be a "generic pointer" in
> some address space), so working in the types of dest and src into the
> specification seems awkward.
>
> Also, does the non-overlap restriction of src and dest (as in the
> regular llvm.memcpy) apply here as well?
>
>> * isunordered param: bit 0 == 1 => stores to dest must be marked
>> ‘unordered’; bit 1 == 1 => loads from src must be marked ‘unordered'
>
> What if the bits are zero -- will the stores / loads (depending on
> which bit) be "ordered" in that case, or something stronger?
>
>> Option 2)
>> Expand the current/existing memory intrinsics to identify the unordered
>> constraint, if one exists, in much the same way that volatility is expressed
>> — i.e. add an ‘isunordered’ parameter(s) to the intrinsic.
>> This option has the same semantics as option 1; the only difference is,
>> literally, that we expand the existing memcpy/memset/memmove intrinsics to
>> have an ‘isunordered’ parameter and an ‘element_size’ parameter, so the
>> prototype becomes something like:
>> @llvm.memcpy.<overload desc>(<ty>* dest, <ty>* src, <ty> len, i32 align,
>> i1 isvolatile, i2 isunordered, i16 element_size)
>>
>> Pros:
>> * Minimal extra work to handle the new version in existing passes — only
>> need to change passes that create calls to memory intrinsics, expand memory
>> intrinsics, or that need to care about unordered (which none should that are
>> reasoning about memory intrinsic semantics).
>> * New code that’s introduced by others to exploit/handle memory
>> intrinsics should just handle unordered for free — unordered being a part of
>> the memory intrinsic means it’s more likely that the person will realize
>> that they have to think about it (i.e. it raises the profile of unordered
>> memory intrinsics).
>
> I like the second point, but (unfortunately) I suspect in practice
> you'll see new code do:
>
> if (MCI->isOrdered())
> return false; // be conservative
>
>> Cons:
>> * Breaks backward compatibility of the IR — is there a mechanism for
>> migrating old IR into a new IR version when loading the IR into LLVM?
>
> I think the migration here will be fairly straightforward -- you can
> just auto-upgrade old calls to memcpy to pass in 0 for the isordered
> argument. But I've CC'd Mehdi and Vedant to help shed some light on
> this.
LLVM has one test which for backwards-compatibility with old versions of the memcpy intrinsic, which provides limited coverage (standardCIntrinsic.3.2.ll).
Whichever option you choose, it would be helpful to add uses of the new intrinsic to test/Bitcode/compatibility.ll. If you choose Option 2, it would also help to copy these uses into the 4.0 bitcode compatibility test (with "is_unordered" dropped), and to re-generate the bitcode for the test. I can help out with this if you'd like.
vedant
>
> -- Sanjoy
More information about the llvm-dev
mailing list