[llvm-dev] RFC: Element-atomic memory intrinsics

Mon May 8 10:49:30 PDT 2017

Hi Daniel,

[+CC Mehdi, Vedant for the auto upgrade issue]

On Mon, May 8, 2017 at 7:54 AM, Daniel Neilson via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> **Method**
>
>  Clearly we are going to have to teach LLVM about unordered memory
> intrinsics. There are, as I can see it, a couple of ways to accomplish this.
> I’d like your opinions on which are preferable, or if you can think of any
> other options. In no particular order…
>
> Option 1)
>  Introduce a new unordered/element-atomic version of each of the memory
> intrinsics.
>   Ex: @llvm.memcpy_element_atomic — work was already started to introduce
> this one in D27133, but could be backed out and restarted.
>   Intrinsic prototype: @llvm.memcpy_element_atomic.<overload desc>(<ty>*
> dest, <ty>* src, <ty> len, i32 align, i2 isunordered, i16 element_size)
>     Semantics:
>        * Will do a memcpy of len bytes from src to dest.
>        * len must = k * lcm( #bytes in dest type, #bytes in src type), for
> some non-negative integer k [note: lcm = least-common multiple]
>        * load/store size given by the constant power-of-2 parameter
> “element_size”; expected to be the lcm(sizeof(dest_ty), sizeof(src_ty))

I'm not sure sizeof(dest_ty) and sizeof(src_ty) add anything here.

LLVM is moving towards "typeless pointers" (i.e. pointers will not
have pointee types, instead they will just be a "generic pointer" in
some address space), so working the types of dest and src into the
specification seems awkward.

Also, does the non-overlap restriction of src and dest (as in the
regular llvm.memcpy) apply here as well?
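
To make the semantics above concrete, here is a rough sketch of an element-wise copy driven only by an element size, with no reference to pointee types. Everything in it is an assumption for illustration: elementAtomicCopy is a hypothetical helper, not an LLVM API; it relies on the GCC/Clang __atomic builtins; and it uses relaxed ordering because C++ has no equivalent of LLVM's weaker 'unordered'. It also assumes src and dest do not overlap and are aligned to the element size, which is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not an LLVM API. Copies `len` bytes from `src` to
// `dest` one element at a time; each element is moved with a single atomic
// load and a single atomic store, so no element is ever torn.
// Assumptions: no overlap, both pointers aligned to sizeof(Elem).
template <typename Elem>
bool elementAtomicCopy(void *dest, const void *src, std::size_t len) {
  // len must be a whole number of elements (len = k * element_size).
  if (len % sizeof(Elem) != 0)
    return false;

  auto *d = static_cast<Elem *>(dest);
  auto *s = static_cast<const Elem *>(src);
  for (std::size_t i = 0, n = len / sizeof(Elem); i != n; ++i) {
    // Relaxed is stronger than LLVM 'unordered'; it is only a stand-in here.
    Elem v = __atomic_load_n(&s[i], __ATOMIC_RELAXED);
    __atomic_store_n(&d[i], v, __ATOMIC_RELAXED);
  }
  return true;
}
```

Note that only the element size enters the length check, which is why the lcm of the two pointee sizes looks redundant to me.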

>        * isunordered param: bit 0 == 1 => stores to dest must be marked
> ‘unordered’; bit 1 == 1 => loads from src must be marked ‘unordered'

What if the bits are zero -- will the stores / loads (depending on
which bit) be "ordered" in that case, or something stronger?
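
A small sketch of how I read the i2 encoding, mostly to show where the gap is. The names (AccessKind, CopyOrdering, decodeIsUnordered) are made up for illustration and are not part of LLVM; the zero case is deliberately labelled Unspecified because the RFC text does not say what it means.

```cpp
#include <cassert>

// Hypothetical names for illustration only; none of this exists in LLVM.
enum class AccessKind {
  Unordered,   // the RFC says the access must be marked 'unordered'
  Unspecified  // bit is 0: the RFC does not say what ordering applies
};

struct CopyOrdering {
  AccessKind stores; // controlled by bit 0 of isunordered
  AccessKind loads;  // controlled by bit 1 of isunordered
};

inline CopyOrdering decodeIsUnordered(unsigned flags) {
  CopyOrdering result;
  result.stores =
      (flags & 0x1) ? AccessKind::Unordered : AccessKind::Unspecified;
  result.loads =
      (flags & 0x2) ? AccessKind::Unordered : AccessKind::Unspecified;
  return result;
}
```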

> Option 2)
>   Expand the current/existing memory intrinsics to identify the unordered
> constraint, if one exists, in much the same way that volatility is expressed
> — i.e. add an ‘isunordered’ parameter(s) to the intrinsic.
>   This option has the same semantics as option 1; the only difference is,
> literally, that we expand the existing memcpy/memset/memmove intrinsics to
> have an ‘isunordered’ parameter and an ‘element_size’ parameter, so the
> prototype becomes something like:
>    @llvm.memcpy.<overload desc>(<ty>* dest, <ty>* src, <ty> len, i32 align,
> i1 isvolatile, i2 isunordered, i16 element_size)
>
>  Pros:
>    * Minimal extra work to handle the new version in existing passes — only
> need to change passes that create calls to memory intrinsics, expand memory
> intrinsics, or that need to care about unordered (which none should that are
> reasoning about memory intrinsic semantics).
>    * New code that’s introduced by others to exploit/handle memory
> intrinsics should just handle unordered for free — unordered being a part of
> the memory intrinsic means it’s more likely that the person will realize
> that they have to think about it (i.e. it raises the profile of unordered
> memory intrinsics).

I like the second point, but (unfortunately) I suspect in practice
you'll see new code do:

  if (MCI->isOrdered())
    return false;  // be conservative

>  Cons:
>    * Breaks backward compatibility of the IR — is there a mechanism for
> migrating old IR into a new IR version when loading the IR into LLVM?

I think the migration here will be fairly straightforward -- you can
just auto-upgrade old calls to memcpy to pass in 0 for the isunordered
argument.  But I've CC'd Mehdi and Vedant to help shed some light on
this.
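
Roughly what I have in mind, as a toy model rather than real auto-upgrade code: MemcpyCall and upgradeMemcpyCall are hypothetical and only mimic an argument list. The old form has five arguments (dest, src, len, align, isvolatile) and the Option 2 form has seven. The value used for element_size on upgraded calls is purely my placeholder assumption; the RFC does not specify one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a call's argument list; this is not LLVM's CallInst.
struct MemcpyCall {
  std::vector<std::uint64_t> args;
};

constexpr std::size_t OldArgCount = 5; // dest, src, len, align, isvolatile
constexpr std::size_t NewArgCount = 7; // ... plus isunordered, element_size

// ASSUMPTION: placeholder element_size for upgraded (non-atomic) calls.
// The RFC does not say what value an upgraded call should carry.
constexpr std::uint64_t AssumedElementSize = 1;

// Returns true if the call was in the old form and has been rewritten.
bool upgradeMemcpyCall(MemcpyCall &call) {
  if (call.args.size() != OldArgCount)
    return false; // already in the new form, or not something we recognize

  call.args.push_back(0);                  // isunordered = 0: no constraint
  call.args.push_back(AssumedElementSize); // element_size placeholder
  return true;
}
```

Running the upgrade twice is harmless in this model, since the second pass sees seven arguments and leaves the call alone.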

-- Sanjoy