[PATCH] D79279: Add overloaded versions of builtin mem* functions

Wed Aug 5 11:42:50 PDT 2020

rsmith added a comment.

Two observations that are new to me in this review:

1. We already treat all builtins as being overloaded on address space.
2. The revised patch treats `__builtin_mem*_overloaded` as being overloaded *only* on address space, volatility, and atomicity. (We've tuned the design to a point where the other qualifiers don't matter any more.)

So, we're adding three features here: overloading on (a) address space, (b) volatility, and (c) atomicity. (a) is already available in the non-`_overloaded` form, and we seem to be approaching agreement that (b) should be available in the non-`_overloaded` form too. So that only leaves (c), which is really not `_overloaded` but `_atomic`.

Based on those observations I'm now thinking that we might prefer a somewhat different approach (but one that should require only minimal changes to the patch in front of us). Specifically:

1. Stop treating lib builtins (eg, plain `memcpy`) as overloaded on address space. That's a (pre-existing) conformance bug, at least for the Embedded C TR.
2. Keep `__builtin_` forms of lib builtins overloaded on address space. (No change.)
3. Also overload `__builtin_` forms of lib builtins on volatility where it makes sense, instead of adding new builtin names `__builtin_mem*_overloaded`.
4. Add a new name for the builtin for the atomic forms of `memcpy` and `memset` (`__builtin_memcpy_unordered_atomic` maybe?).
5. Convert the "trivial types" check from an error to a warning and apply it to all the mem* overloads. (Though I think we might actually already have such a check, so this might only require extending it to cover the atomic builtin.)

What do you think?

================
Comment at: clang/docs/LanguageExtensions.rst:2434
+* ``volatile``
+* ``restrict``
+* ``__unaligned``
----------------
Does `restrict` really make sense here? It seems like the only difference it could possibly make would be to treat `memcpy` as `memmove` if either operand is marked restrict, but
(a) if the caller wants that, they can just use `memcpy` directly, and
(b) it's not correct to propagate restrict-ness from the caller to the callee anyway, because restrict-ness is really a property of the declaration of the identifier in its scope, not a property of its type:
```
void f(void *restrict p) {
  __builtin_memmove(p, p + 1, 4);
}
```
(c) a restrict qualifier on the pointee type is irrelevant to memcpy and a restrict qualifier on the pointer type isn't part of QUAL.

================
Comment at: clang/docs/LanguageExtensions.rst:2435
+* ``restrict``
+* ``__unaligned``
+* non-default address spaces
----------------
I don't think ``__unaligned`` matters any more. We always take the actual alignment inferred from the pointer arguments, just like we do for non-overloaded `memcpy`.

================
Comment at: clang/docs/LanguageExtensions.rst:2455-2456
+
+The overloaded builtins expect both destination and source to be trivially
+copyable types.
+
----------------

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2639-2640
   case Builtin::BI__builtin_mempcpy: {
+    QualType DestTy = getPtrArgType(CGM, E, 0);
+    QualType SrcTy = getPtrArgType(CGM, E, 1);
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
----------------
jfb wrote:
> rsmith wrote:
> > jfb wrote:
> > > rsmith wrote:
> > > > Looking through implicit conversions in `getPtrArgType` here will change the code we generate for cases like:
> > > > 
> > > > ```
> > > > void f(volatile void *p, volatile void *q) {
> > > >   memcpy(p, q, 4);
> > > > }
> > > > ```
> > > > 
> > > > ... (in C, where we permit such implicit conversions) to use a volatile memcpy intrinsic. Is that an intentional change?
> > > I'm confused... what's the difference that this makes for the pre-existing builtins? My intent was to get the `QualType` unconditionally, but I can conditionalize it if needed... However this ought to make no difference:
> > > ```
> > > static QualType getPtrArgType(CodeGenModule &CGM, const CallExpr *E,
> > >                               unsigned ArgNo) {
> > >   QualType ArgTy = E->getArg(ArgNo)->IgnoreImpCasts()->getType();
> > >   if (ArgTy->isArrayType())
> > >     return CGM.getContext().getAsArrayType(ArgTy)->getElementType();
> > >   if (ArgTy->isObjCObjectPointerType())
> > >     return ArgTy->castAs<clang::ObjCObjectPointerType>()->getPointeeType();
> > >   return ArgTy->castAs<clang::PointerType>()->getPointeeType();
> > > }
> > > ```
> > > and indeed I can't see the example you provided change in IR from one to the other. The issue I'm working around is that getting it unconditionally would make ObjC code sad when `id` is passed in as I outlined above.
> > The example I gave should produce a non-volatile memcpy, and used to do so (we passed `false` as the fourth parameter to `CreateMemCpy`). With this patch, `getPtrArgType`will strip off the implicit conversion from `volatile void*` to `void*` in the argument type, so `isVolatile` below will be `true`, so I think it will now create a volatile memcpy for the same testcase. If that's not what's happening, then I'd like to understand why not :)
> > 
> > I'm not saying that's necessarily a bad change, but it is a change, and it's one we should make intentionally if we make it at all.
> Oh yes, sorry I thought you were talking about something that `getPtrArgType` did implicitly! Indeed the C code behaves differently in that it doesn't just strip `volatile` anymore.
> 
> I'm not super thrilled by the default C behavior, and I think this new behavior removes a gotcha, and is in fact what I was going for in the first iteration of the patch. Now that I've separated the builtin I agree that it's a bit odd... but it seems like the right thing to do anyways? But it no longer matches the C library function to do so.
> 
> FWIW, this currently "works as you'd expect":
> ```
> void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) {
>     __builtin_memcpy(dst, src, sz);
> }
> ```
> https://godbolt.org/z/dcWxcK
> 
> and I think that's completely accidental because the C library function doesn't (and, as John pointed out earlier, the builtin is meant to act like the C function).
> FWIW, this currently "works as you'd expect":
>
> ```
> void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) {
>     __builtin_memcpy(dst, src, sz);
> }
> ```

The same is true even if you remove the `__builtin_` (and add a suitable include), and that seems like a bug to me. It looks like we have special logic that treats *all* builtins taking pointers as being overloaded on address space, which seems wrong at least for lib builtins. C TR 18037:2008 is quite explicit about this. Section 5.1.4 says:

"""
The standard C library (ISO/IEC 9899:1999 clause 7 - Libraries) is unchanged; the library's
functions and objects continue to be declared only with regard to the generic address space. One
consequence is that pointers into named address spaces cannot be passed as arguments to library
functions except in the special case that the named address spaces are subsets of the generic
address space.
"""

We could retain that special rule for `__builtin_`-spelled variants of lib builtins. If we do, then maybe we shouldn't be adding `__builtin_memcpy_overloaded` at all and should only extend the behavior of `__builtin_memcpy` to also propagate volatility (and add a new builtin for the atomic case).

Regarding volatile, consider:

```
void maybe_volatile_memcpy(volatile void *dst, const volatile void *src, int sz, _Bool is_volatile) {
  if (is_volatile) {
#ifdef __clang__
    __builtin_memcpy_overloaded(dst, src, sz);
#elif __GNUC__
    // ...
#else
    // volatile char copy loop
#endif
  }
  memcpy(dst, src, sz);
}
```

With this patch, the above code will always perform a volatile `memcpy`. I think that's a surprise. A call to `memcpy` should follow the C semantics, even if we choose to change the semantics of `__builtin_memcpy`.

================
Comment at: clang/lib/Sema/SemaChecking.cpp:5732
+
+    if (!DstValTy.isTriviallyCopyableType(Context) && !DstValTy->isVoidType())
+      return ExprError(Diag(TheCall->getBeginLoc(),
----------------
You need to call `Sema::isCompleteType` first before asking this question, in order to trigger class instantiation when necessary in C++. (Likewise for the checks in the previous function.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79279/new/

https://reviews.llvm.org/D79279