[PATCH] D79279: Add overloaded versions of builtin mem* functions

Wed Aug 26 16:25:47 PDT 2020

jfb added a comment.

In D79279#2235085 <https://reviews.llvm.org/D79279#2235085>, @rsmith wrote:

> Thanks, I'm happy with this approach.
>
> If I understand correctly, the primary (perhaps sole) purpose of `__builtin_memcpy_sized` is to provide a primitive from which an atomic operation can be produced. That being the case, I wonder if the name is emphasizing the wrong thing, and a name that contains `atomic` would be more obvious. As a concrete alternative, `__atomic_unordered_memcpy` is not much longer and seems to have the right kinds of implications. WDYT?

Kinda, that's the motivating from Hans' paper which I'm following. One other use case (and the reason I assume Azul folks want it too) is when there's a GC that looks at objects. With this it knows it won't see torn objects when the GC is concurrent. It's similar, but generally `atomic` also implies an ordering, and here it's explicitly unordered (i.e. Bring Your Own Fences).

So I don't have a strong opinion on the name, but `atomic` seem slightly wrong.

Follow-ups I suggest in comments:

- Make "must be trivially copyable" a warning throughout (not just here, but for atomics too).
- Implement that diagnostic for `mem*` functions.

================
Comment at: clang/docs/LanguageExtensions.rst:2403-2406
+Constant evaluation support is only provided when the source and destination are
+pointers to arrays with the same trivially copyable element type, and the given
+size is an exact multiple of the element size that is no greater than the number
+of elements accessible through the source and destination operands.
----------------
rsmith wrote:
> Is this true in general, or only for `[w]mem{cpy,move}`? I thought for the other cases, we required an array of `char` / `wchar_t`?
This is just moving documentation that was below. Phab is confused with the diff.

================
Comment at: clang/docs/LanguageExtensions.rst:2409
+Support for constant expression evaluation for the above builtins can be detected
+with ``__has_feature(cxx_constexpr_string_builtins)``.
+
----------------
rsmith wrote:
> I think this only applies to the above list minus the five functions you added to it. Given this and the previous comment, I'm not sure that merging the documentation on string builtins and memory builtins is working out well -- they seem to have more differences than similarities.
> 
> (`memset` is an outlier here that should be called out -- we don't seem to provide any constant evaluation support for it whatsoever.)
[w]memset are indeed the odd ones, update says so.

================
Comment at: clang/lib/AST/ExprConstant.cpp:8870
+        return false;
+      if (N.urem(ElSz.getLimitedValue()) != 0) {
+        Info.FFDiag(E, diag::note_constexpr_mem_sized_bad_size)
----------------
rsmith wrote:
> `getLimitedValue()` here seems unnecessary; `urem` can take an `APInt`.
Their bitwidth doesn't always match, and that asserts out.

================
Comment at: clang/lib/AST/ExprConstant.cpp:8872
+        Info.FFDiag(E, diag::note_constexpr_mem_sized_bad_size)
+            << (int)N.getLimitedValue() << (int)ElSz.getLimitedValue();
+        return false;
----------------
rsmith wrote:
> Consider using `toString` instead of truncating each `APSInt` to `uint64_t` then to `int`. The size might reliably fit into `uint64_t`, but I don't think we can assume that `int` is large enough.
OK I updated 2 other places as well.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:635
+    return ArgTy->castAs<clang::ObjCObjectPointerType>()->getPointeeType();
+  return ArgTy->castAs<clang::PointerType>()->getPointeeType();
+}
----------------
rsmith wrote:
> I'm not sure this `castAs` is necessarily correct. If the operand is C++11 `nullptr`, we could perform a null-to-pointer implicit conversion, and `ArgTy` could be `NullPtrTy` after stripping that back off here.
> 
> It seems like maybe what we want to do is strip off implicit conversions until we hit a non-pointer type, and take the pointee type we found immediately before that?
Ah good catch! The new functions I'm adding just disallow nullptr, but the older ones allow it. I've modified the code accordingly and added a test in CodeGen for nullptr.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2639-2640
   case Builtin::BI__builtin_mempcpy: {
+    QualType DestTy = getPtrArgType(CGM, E, 0);
+    QualType SrcTy = getPtrArgType(CGM, E, 1);
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
----------------
rsmith wrote:
> jfb wrote:
> > rsmith wrote:
> > > jfb wrote:
> > > > rsmith wrote:
> > > > > Looking through implicit conversions in `getPtrArgType` here will change the code we generate for cases like:
> > > > > 
> > > > > ```
> > > > > void f(volatile void *p, volatile void *q) {
> > > > >   memcpy(p, q, 4);
> > > > > }
> > > > > ```
> > > > > 
> > > > > ... (in C, where we permit such implicit conversions) to use a volatile memcpy intrinsic. Is that an intentional change?
> > > > I'm confused... what's the difference that this makes for the pre-existing builtins? My intent was to get the `QualType` unconditionally, but I can conditionalize it if needed... However this ought to make no difference:
> > > > ```
> > > > static QualType getPtrArgType(CodeGenModule &CGM, const CallExpr *E,
> > > >                               unsigned ArgNo) {
> > > >   QualType ArgTy = E->getArg(ArgNo)->IgnoreImpCasts()->getType();
> > > >   if (ArgTy->isArrayType())
> > > >     return CGM.getContext().getAsArrayType(ArgTy)->getElementType();
> > > >   if (ArgTy->isObjCObjectPointerType())
> > > >     return ArgTy->castAs<clang::ObjCObjectPointerType>()->getPointeeType();
> > > >   return ArgTy->castAs<clang::PointerType>()->getPointeeType();
> > > > }
> > > > ```
> > > > and indeed I can't see the example you provided change in IR from one to the other. The issue I'm working around is that getting it unconditionally would make ObjC code sad when `id` is passed in as I outlined above.
> > > The example I gave should produce a non-volatile memcpy, and used to do so (we passed `false` as the fourth parameter to `CreateMemCpy`). With this patch, `getPtrArgType`will strip off the implicit conversion from `volatile void*` to `void*` in the argument type, so `isVolatile` below will be `true`, so I think it will now create a volatile memcpy for the same testcase. If that's not what's happening, then I'd like to understand why not :)
> > > 
> > > I'm not saying that's necessarily a bad change, but it is a change, and it's one we should make intentionally if we make it at all.
> > Oh yes, sorry I thought you were talking about something that `getPtrArgType` did implicitly! Indeed the C code behaves differently in that it doesn't just strip `volatile` anymore.
> > 
> > I'm not super thrilled by the default C behavior, and I think this new behavior removes a gotcha, and is in fact what I was going for in the first iteration of the patch. Now that I've separated the builtin I agree that it's a bit odd... but it seems like the right thing to do anyways? But it no longer matches the C library function to do so.
> > 
> > FWIW, this currently "works as you'd expect":
> > ```
> > void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) {
> >     __builtin_memcpy(dst, src, sz);
> > }
> > ```
> > https://godbolt.org/z/dcWxcK
> > 
> > and I think that's completely accidental because the C library function doesn't (and, as John pointed out earlier, the builtin is meant to act like the C function).
> > FWIW, this currently "works as you'd expect":
> >
> > ```
> > void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) {
> >     __builtin_memcpy(dst, src, sz);
> > }
> > ```
> 
> The same is true even if you remove the `__builtin_` (and add a suitable include), and that seems like a bug to me. It looks like we have special logic that treats *all* builtins taking pointers as being overloaded on address space, which seems wrong at least for lib builtins. C TR 18037:2008 is quite explicit about this. Section 5.1.4 says:
> 
> """
> The standard C library (ISO/IEC 9899:1999 clause 7 - Libraries) is unchanged; the library's
> functions and objects continue to be declared only with regard to the generic address space. One
> consequence is that pointers into named address spaces cannot be passed as arguments to library
> functions except in the special case that the named address spaces are subsets of the generic
> address space.
> """
> 
> We could retain that special rule for `__builtin_`-spelled variants of lib builtins. If we do, then maybe we shouldn't be adding `__builtin_memcpy_overloaded` at all and should only extend the behavior of `__builtin_memcpy` to also propagate volatility (and add a new builtin for the atomic case).
> 
> Regarding volatile, consider:
> 
> ```
> void maybe_volatile_memcpy(volatile void *dst, const volatile void *src, int sz, _Bool is_volatile) {
>   if (is_volatile) {
> #ifdef __clang__
>     __builtin_memcpy_overloaded(dst, src, sz);
> #elif __GNUC__
>     // ...
> #else
>     // volatile char copy loop
> #endif
>   }
>   memcpy(dst, src, sz);
> }
> ```
> 
> With this patch, the above code will always perform a volatile `memcpy`. I think that's a surprise. A call to `memcpy` should follow the C semantics, even if we choose to change the semantics of `__builtin_memcpy`.
Yes, I believe that this is a pre-existing inconsistency with the non-`__builtin_` variants.

================
Comment at: clang/lib/Sema/SemaChecking.cpp:5609
+    return ExprError(Diag(TheCall->getBeginLoc(),
+                          PDiag(diag::err_atomic_op_needs_trivial_copy))
+                     << DstValTy << DstOp->getSourceRange());
----------------
rsmith wrote:
> Generally, I'm a little uncomfortable about producing an error if a type is complete but allowing the construct if the type is incomplete -- that seems like a situation where a warning would be more appropriate to me. It's surprising and largely unprecedented that providing more information about a type would change the program from valid to invalid.
> 
> Do we really need the protection of an error here rather than an enabled-by-default warning? Moreover, don't we already have a warning for `memcpy` of a non-trivially-copyable object somewhere? If not, then I think we should add such a thing that also applies to the real `memcpy`, rather than only warning on the builtin.
That rationale makes sense, but it's pre-existing behavior for atomic. I can change all of them in a follow-up if that's OK?

We don't have such a check for other builtins. I can do a second follow-up to then adopt these warnings for them too?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79279/new/

https://reviews.llvm.org/D79279