[llvm-dev] RFC: Element-atomic memory intrinsics

Mon May 8 12:08:25 PDT 2017

Hi Sanjoy,
 Responses inlined…

> On May 8, 2017, at 12:49 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> Hi Daniel,
> 
> [+CC Mehdi, Vedant for the auto upgrade issue]
> 
> On Mon, May 8, 2017 at 7:54 AM, Daniel Neilson via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> **Method**
>> 
>> Clearly we are going to have to teach LLVM about unordered memory
>> intrinsics. There are, as I can see it, a couple of ways to accomplish this.
>> I’d like your opinions on which are preferable, or if you can think of any
>> other options. In no particular order…
>> 
>> Option 1)
>> Introduce a new unordered/element-atomic version of each of the memory
>> intrinsics.
>>  Ex: @llvm.memcpy_element_atomic — work was already started to introduce
>> this one in D27133, but could be backed out and restarted.
>>  Intrinsic prototype: @llvm.memcpy_element_atomic.<overload desc>(<ty>*
>> dest, <ty>* src, <ty> len, i32 align, i2 isunordered, i16 element_size)
>>    Semantics:
>>       * Will do a memcpy of len bytes from src to dest.
>>       * len must = k * lcm( #bytes in dest type, #bytes in src type), for
>> some non-negative integer k [note: lcm = least-common multiple]
>>       * load/store size given by the constant power-of-2 parameter
>> “element_size”; expected to be the lcm(sizeof(dest_ty), sizeof(src_ty))
> 
> I'm not sure if sizeof(dest_ty) and sizeof(src_ty) adds anything here.
> 
> LLVM is moving towards "typeless pointers" (i.e. pointers will not
> have pointee types, instead they will just be a "generic pointer" in
> some address space), so working in the types of dest and src into the
> specification seems awkward.
> 

 Poor choice of wording on my part. By sizeof(<thing>) I mean the element-size of the load/store that appeared in the original loop from which the memory intrinsic was materialized — arguably, these will be the same in any real cases so it probably doesn’t make any sense to even mention it...

> Also, does the non-overlap restriction of src and dest (as in the
> regular llvm.memcpy) apply here as well?
> 

 Yes, I would think so.

>>       * isunordered param: bit 0 == 1 => stores to dest must be marked
>> ‘unordered’; bit 1 == 1 => loads from src must be marked ‘unordered'
> 
> What if the bits are zero -- will the stores / loads (depending on
> which bit) be "ordered" in that case, or something stronger?
> 

 This is partly why I prefer option 2. An ‘isunordered’ value of 0 is nonsense for the standalone atomic-unordered memory intrinsic. It would imply that neither the source nor dest needs to be loaded/stored via unordered-atomic ops, and so the memory intrinsic is identical to the ordinary non-atomic one.

>> Option 2)
>>  Expand the current/existing memory intrinsics to identify the unordered
>> constraint, if one exists, in much the same way that volatility is expressed
>> — i.e. add an ‘isunordered’ parameter(s) to the intrinsic.
>>  This option has the same semantics as option 1; the only difference is,
>> literally, that we expand the existing memcpy/memset/memmove intrinsics to
>> have an ‘isunordered’ parameter and an ‘element_size’ parameter, so the
>> prototype becomes something like:
>>   @llvm.memcpy.<overload desc>(<ty>* dest, <ty>* src, <ty> len, i32 align,
>> i1 isvolatile, i2 isunordered, i16 element_size)
>> 
>> Pros:
>>   * Minimal extra work to handle the new version in existing passes — only
>> need to change passes that create calls to memory intrinsics, expand memory
>> intrinsics, or that need to care about unordered (which none should that are
>> reasoning about memory intrinsic semantics).
>>   * New code that’s introduced by others to exploit/handle memory
>> intrinsics should just handle unordered for free — unordered being a part of
>> the memory intrinsic means it’s more likely that the person will realize
>> that they have to think about it (i.e. it raises the profile of unordered
>> memory intrinsics).
> 
> I like the second point, but (unfortunately) I suspect in practice
> you'll see new code do:
> 
>  if (MCI->isOrdered())
>    return false;  // be conservative
> 

Yes, that would be an unfortunate reality, but one can hope. :-)

>> Cons:
>>   * Breaks backward compatibility of the IR — is there a mechanism for
>> migrating old IR into a new IR version when loading the IR into LLVM?
> 
> I think the migration here will be fairly straightforward -- you can
> just auto-upgrade old calls to memcpy to pass in 0 for the isordered
> argument.  But I've CC'd Mehdi and Vedant to help shed some light on
> this.
> 
> — Sanjoy

Thanks!

-Daniel

---
Daniel Neilson, Ph.D.
Azul Systems