[cfe-dev] Can indirect class parameters be noalias?
Richard Smith via cfe-dev
cfe-dev at lists.llvm.org
Wed Jul 29 14:56:32 PDT 2020
On Wed, 29 Jul 2020 at 14:42, Richard Smith <richard at metafoo.co.uk> wrote:
> On Wed, 29 Jul 2020 at 12:52, John McCall <rjmccall at apple.com> wrote:
>
>> Clang IRGen currently doesn’t mark indirect parameters as noalias.
>> Considerations:
>>
>> - A lot of targets don’t pass struct arguments indirectly outside of
>>   C++, but some do, notably AArch64.
>> - In a pure C world, we would always be able to mark such parameters
>>   noalias, because arguments are r-values and there’s no way to have a
>>   pointer to an r-value.
>> - ObjC __weak references can have pointers to them from the ObjC
>>   runtime. You can’t pass a weak reference directly as an argument because
>>   __weak is a qualifier and qualifiers are ignored in calls, but you
>>   can put one in a struct and pass that, and that struct has to be passed
>>   indirectly. Arguably such a parameter cannot be noalias because of
>>   the pointer from the runtime, but then again, ObjC code isn’t allowed to
>>   directly access the weak reference (it has to call the runtime), which
>>   means that no accesses that LLVM can actually see violate the noalias
>>   restriction.
>> - C++ parameters of non-trivially-copyable class type cannot be marked
>>   noalias: it is absolutely permitted to escape a pointer to this
>>   within a constructor and to replace that pointer whenever the object is
>>   moved. This is both well-defined and sometimes useful.
>> - It’s actually possible to escape a pointer to *any* C++ object within
>>   its constructor, and that pointer remains valid for the duration of the
>>   object’s lifetime. And you can do this with NRVO, too, so you don’t even
>>   need to have a type with non-trivial constructors, as long as the object
>>   isn’t copied. Note that this even messes up the C case, which is really
>>   unfortunate: arguably we need to pessimize C code because of the
>>   possibility it might interoperate with C++.
>> - But I think there’s an escape hatch here. C++ has a rule which is
>>   intended to give implementations extra leeway with passing and returning
>>   trivial types, e.g. to pass them in registers. This rule is C++
>>   [class.temporary]p3, which says that implementations can create an extra
>>   temporary object to pass an object of type X as long as “each copy
>>   constructor, move constructor, and destructor of X is either trivial or
>>   deleted, and X has at least one non-deleted copy or move constructor”. This
>>   object is created by (trivially) copy/move-initializing from the
>>   argument/return object. Arguably we can consider any type that satisfies
>>   this condition to be *formally* copied into a new object as part of
>>   passing or returning it. We don’t need to *actually* do the copy, I
>>   think, we just need to consider a copy to have been done in order to
>>   formally disrupt any existing pointers to the object. (Although arguably
>>   you aren’t allowed to copy an object into a new object at the original
>>   object’s current address; it would be an unfortunate consequence of this
>>   wording if we had to either forgo optimization or do an unnecessary copy
>>   here.)
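
To make the [class.temporary]p3 condition quoted above a bit more
concrete, here is a rough C++17 sketch using standard type traits. The
name may_use_extra_temporary and the test types are made up for
illustration. The traits only approximate the wording: they ask whether
construction from a const lvalue or an rvalue is well-formed and trivial,
which is not the same as inspecting every declared constructor, so this
should not be read as what Clang actually computes.

```cpp
#include <type_traits>

// Hypothetical helper, not a Clang or standard library API.
// Approximates: "each copy constructor, move constructor, and destructor
// of X is either trivial or deleted, and X has at least one non-deleted
// copy or move constructor".
template <class T>
inline constexpr bool may_use_extra_temporary =
    std::is_trivially_destructible_v<T> &&
    // Copy construction is either unavailable (e.g. deleted) or trivial.
    (!std::is_copy_constructible_v<T> ||
     std::is_trivially_copy_constructible_v<T>) &&
    // Move construction is either unavailable or trivial.
    (!std::is_move_constructible_v<T> ||
     std::is_trivially_move_constructible_v<T>) &&
    // At least one of the two is usable.
    (std::is_copy_constructible_v<T> || std::is_move_constructible_v<T>);

struct Plain { int x; };

// A user-provided constructor that escapes `this`, but the implicit
// copy/move constructors are still trivial.
struct Escaping {
  explicit Escaping(Escaping **where) { *where = this; }
  int x = 0;
};

// A user-provided (non-trivial) copy constructor.
struct CustomCopy {
  CustomCopy() = default;
  CustomCopy(const CustomCopy &) {}
};

// Deleted copy, trivial move.
struct MoveOnly {
  MoveOnly() = default;
  MoveOnly(const MoveOnly &) = delete;
  MoveOnly(MoveOnly &&) = default;
};

static_assert(may_use_extra_temporary<Plain>);
static_assert(may_use_extra_temporary<Escaping>);
static_assert(!may_use_extra_temporary<CustomCopy>);
static_assert(may_use_extra_temporary<MoveOnly>);
```

Note that Escaping passes the check even though its constructor leaks
`this`, which is the situation the last two bullets are worried about.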
>>
>> Thoughts?
>>
> From a high level: I think the C++ language semantics *should* permit us
> to assume that objects passed by value to functions, and objects returned
> by value from functions (in which category I include *this in a
> constructor), are noalias.
>
... specifically in the case where they're trivially copyable and the
implementation was permitted to make a copy. In the case of non-trivial
copy operations, I think we probably should be forced to assume that the
address of the object may have escaped.
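
A minimal sketch of the non-trivial case, assuming a type that records
its own address on every construction (Tracked, g_current, observe and
demo are hypothetical names, not anything from Clang). Because the copy
and move constructors are user-provided, no extra temporary may be
introduced, so the write through the escaped pointer has to be visible
through the parameter:

```cpp
#include <cassert>
#include <utility>

struct Tracked;
// Hypothetical registry: address of the most recently constructed Tracked.
Tracked *g_current = nullptr;

struct Tracked {
  int value;
  explicit Tracked(int v) : value(v) { g_current = this; }
  // User-provided copy/move constructors make the type
  // non-trivially-copyable and re-point the escaped pointer.
  Tracked(const Tracked &other) : value(other.value) { g_current = this; }
  Tracked(Tracked &&other) noexcept : value(other.value) { g_current = this; }
};

int observe(Tracked t) {
  // The last Tracked constructor to run initialized the parameter itself,
  // so g_current points at t here.
  g_current->value = 42;  // write through the escaped pointer
  return t.value;         // must observe that write
}

int demo() {
  Tracked x(7);
  int from_move = observe(std::move(x));     // parameter built by move ctor
  int from_prvalue = observe(Tracked(7));    // parameter built directly
  assert(from_move == 42);
  assert(from_prvalue == 42);
  return from_move + from_prvalue;
}
```

If the parameter were marked noalias, an optimizer would be entitled to
assume the store through g_current cannot change t.value and return 7.
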
>
> I think concretely, the escape hatch doesn't stop things from going wrong,
> because -- as you note -- even though we *could* have made a copy, it's
> observable whether or not we *did* make a copy. For example:
>
> #include <stdio.h>
>
> struct A {
>   A(A **where) : data{"hello world"} { *where = this; }
>   char data[65536];
> };
> A *p;
>
> [[gnu::noinline]]
> void f(A a) {
>   for (int i = 0; i != sizeof(A::data) - 2; ++i)
>     p->data[i+1] = a.data[i];
>   puts(a.data);
> }
>
> // elsewhere, perhaps compiled by a smarter compiler that doesn't make a
> // copy here
> int main() { f({&p}); }
>
> I think it's valid for this program to print "hello world" or for it to
> print "hhhhhhhhhhhhh...", but it's not valid to (e.g.) turn the copy loop
> into a memcpy with undefined behavior.
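
The two outcomes can be reproduced without relying on any particular
calling convention by passing the two pointers explicitly. This is only a
sketch of the loop's behavior: Buf, copy_loop, with_copy and without_copy
are illustrative names, and the buffer is shrunk to 16 bytes.

```cpp
#include <cstddef>
#include <string>

struct Buf {
  char data[16] = "hello world";
};

// Same loop as in f(), with the escaped pointer p and the parameter a
// passed explicitly. Returns what puts(a.data) would print.
std::string copy_loop(Buf *p, Buf &a) {
  for (std::size_t i = 0; i != sizeof(Buf::data) - 2; ++i)
    p->data[i + 1] = a.data[i];
  return std::string(a.data);
}

// Models a caller that copied the argument: p points at the original.
std::string with_copy() {
  Buf original;
  Buf a = original;
  return copy_loop(&original, a);
}

// Models a caller that did not copy: p points at the parameter itself,
// so each store feeds the next load and 'h' is smeared forward.
std::string without_copy() {
  Buf a;
  return copy_loop(&a, a);
}
```

Here with_copy() yields "hello world" and without_copy() yields fifteen
'h' characters. A memcpy over the overlapping ranges in the second case
would have undefined behavior, which is why the loop cannot simply be
rewritten into one.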
>
> As it happens, we do actually make a redundant copy here when performing
> the call to `f`, which seems wasteful. And so do GCC and ICC, which means
> the 'noalias' would actually be correct here considering only the behavior
> of those compilers. So in principle we could address this in the ABI by
> saying that the copy is mandatory. But I don't think we should -- I think
> the above code should have undefined behavior because it accesses a
> function parameter through an access path not derived from the name of the
> function parameter.
>
> We do have some wording in the standard that tries to give aliasing
> guarantees in some of these cases, but does so in a way that's not really
> useful. Specifically, [class.cdtor]p2: "During the construction of an
> object, if the value of the object or any of its subobjects is accessed
> through a glvalue that is not obtained, directly or indirectly, from the
> constructor’s this pointer, the value of the object or subobject thus
> obtained is unspecified." (I mean, thanks for trying, but that's not all
> the cases, and "the value is unspecified" is not enough permission.)
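
For reference, the pattern that wording covers looks roughly like the
following, modeled loosely on the example in the standard (Counter, touch
and g_counter are hypothetical names). Only the read through the object's
own name during construction is affected; the access through `this` is
fine, and so is any access after construction completes.

```cpp
struct Counter;
void touch(Counter *self);  // hypothetical helper called from the constructor

struct Counter {
  int c;
  Counter() : c(0) { touch(this); }
};

// Dynamically initialized before main() runs.
const Counter g_counter;

void touch(Counter *self) {
  // Not obtained from the constructor's this pointer: while g_counter is
  // still under construction, the value read here is unspecified.
  int via_name = g_counter.c;
  (void)via_name;

  // Obtained from this: a normal, well-defined write.
  self->c = 1;
}

// After construction has finished, reading through the name is fine.
int value_after_construction() { return g_counter.c; }
```

An unspecified value still has to be *some* value, so this gives an
optimizer much less freedom than a noalias-style guarantee would.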
>
> Maybe we could mark such cases as 'noalias', behind a known-non-conforming
> flag. The question would then be whether we enable it by default or not.
>