[llvm-dev] RFC: Element-atomic memory intrinsics
Sanjoy Das via llvm-dev
llvm-dev at lists.llvm.org
Mon May 8 10:49:30 PDT 2017
Hi Daniel,
[+CC Mehdi, Vedant for the auto upgrade issue]
On Mon, May 8, 2017 at 7:54 AM, Daniel Neilson via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> **Method**
>
> Clearly we are going to have to teach LLVM about unordered memory
> intrinsics. There are, as I can see it, a couple of ways to accomplish this.
> I’d like your opinions on which are preferable, or if you can think of any
> other options. In no particular order…
>
> Option 1)
> Introduce a new unordered/element-atomic version of each of the memory
> intrinsics.
> Ex: @llvm.memcpy_element_atomic — work was already started to introduce
> this one in D27133, but could be backed out and restarted.
> Intrinsic prototype: @llvm.memcpy_element_atomic.<overload desc>(<ty>*
> dest, <ty>* src, <ty> len, i32 align, i2 isunordered, i16 element_size)
> Semantics:
> * Will do a memcpy of len bytes from src to dest.
> * len must = k * lcm( #bytes in dest type, #bytes in src type), for
> some non-negative integer k [note: lcm = least-common multiple]
> * load/store size given by the constant power-of-2 parameter
> “element_size”; expected to be the lcm(sizeof(dest_ty), sizeof(src_ty))
I'm not sure if sizeof(dest_ty) and sizeof(src_ty) adds anything here.
LLVM is moving towards "typeless pointers" (i.e. pointers will not
have pointee types, instead they will just be a "generic pointer" in
some address space), so working in the types of dest and src into the
specification seems awkward.
Also, does the non-overlap restriction of src and dest (as in the
regular llvm.memcpy) apply here as well?
> * isunordered param: bit 0 == 1 => stores to dest must be marked
> ‘unordered’; bit 1 == 1 => loads from src must be marked ‘unordered'
What if the bits are zero -- will the stores / loads (depending on
which bit) be "ordered" in that case, or something stronger?
> Option 2)
> Expand the current/existing memory intrinsics to identify the unordered
> constraint, if one exists, in much the same way that volatility is expressed
> — i.e. add an ‘isunordered’ parameter(s) to the intrinsic.
> This option has the same semantics as option 1; the only difference is,
> literally, that we expand the existing memcpy/memset/memmove intrinsics to
> have an ‘isunordered’ parameter and an ‘element_size’ parameter, so the
> prototype becomes something like:
> @llvm.memcpy.<overload desc>(<ty>* dest, <ty>* src, <ty> len, i32 align,
> i1 isvolatile, i2 isunordered, i16 element_size)
>
> Pros:
> * Minimal extra work to handle the new version in existing passes — only
> need to change passes that create calls to memory intrinsics, expand memory
> intrinsics, or that need to care about unordered (which none should that are
> reasoning about memory intrinsic semantics).
> * New code that’s introduced by others to exploit/handle memory
> intrinsics should just handle unordered for free — unordered being a part of
> the memory intrinsic means it’s more likely that the person will realize
> that they have to think about it (i.e. it raises the profile of unordered
> memory intrinsics).
I like the second point, but (unfortunately) I suspect in practice
you'll see new code do:
if (MCI->isOrdered())
return false; // be conservative
> Cons:
> * Breaks backward compatibility of the IR — is there a mechanism for
> migrating old IR into a new IR version when loading the IR into LLVM?
I think the migration here will be fairly straightforward -- you can
just auto-upgrade old calls to memcpy to pass in 0 for the isordered
argument. But I've CC'd Mehdi and Vedant to help shed some light on
this.
-- Sanjoy
More information about the llvm-dev
mailing list