[cfe-dev] Optimizing returning a struct instance larger than the quadword

John McCall via cfe-dev cfe-dev at lists.llvm.org
Wed Feb 21 20:49:20 PST 2018



> On Jan 23, 2018, at 6:10 AM, Denis Sukhonin via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> Hi list,
> 
> I am doing some research on my own, trying to understand the best way to
> report an error from a function.
> My environment:
> * exceptions are disabled with -fno-exceptions,
> * x86_64 Clang 5.0.1,
> * C++17, and
> * macOS 10.12.
> 
> Suppose there is a function that can fail: `FailingFn’. A simple
> and quite common solution is to give it the signature `bool
> FailingFn(Args…, Error &err)’. This way the bool is returned in a
> register and the err object is allocated on the stack, so I can avoid
> accessing memory when the returned bool is true (meaning success).
> 
> However, since C++17 gives us guaranteed copy elision and structured
> bindings, I want to simplify the signature to `std::tuple<bool, Error>
> FailingFn(Args…)’, or even `RetValue FailingFn(Args…)’. Then I can
> handle errors this way:
> 
> if (auto [ok, err] = FailingFn(args…); !ok)
>   // Handle error or, perhaps, just return it.
>   return err;
> 
> This looks more expressive. With a few changes, we could make the `err’
> object a variant that also carries the result of a successful evaluation.
> For the sake of simplicity, though, let’s assume it only carries error
> information, e.g., a `std::string’, which is obviously larger than a
> 64-bit register.
> I am hoping gcc recognizes the copy-elision case, constructs the
> error object in the caller, and passes it as a reference.
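For comparison, here is a minimal sketch of the two styles discussed in the question, assuming a hypothetical `Error` type that simply wraps a `std::string`. `FailingFnOut`, `FailingFnTuple`, and `Check` are illustrative names, not code from the original message.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical error type: it only carries a message.
struct Error {
    std::string message;
};

// Style 1: the bool is the return value; the error goes through an out-parameter.
bool FailingFnOut(int input, Error &err) {
    if (input < 0) {
        err.message = "negative input";
        return false;
    }
    return true;
}

// Style 2: the bool and the error are returned together as one object.
std::tuple<bool, Error> FailingFnTuple(int input) {
    if (input < 0) {
        return {false, Error{"negative input"}};
    }
    return {true, Error{}};
}

// Caller of style 2, using an if-statement initializer and structured bindings.
Error Check(int input) {
    if (auto [ok, err] = FailingFnTuple(input); !ok) {
        return err;  // propagate the error to the caller
    }
    return Error{};
}
```

Both forms compile as C++17; they differ only in how the result travels back to the caller.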

Returning the components of a tuple separately is a good idea in the abstract;
I know for a fact that Swift does it with its native tuple types.  In C++, it would
have to be done by special-casing std::pair / std::tuple in the ABI, which would
technically be an ABI break for those types, but in principle it could be done,
perhaps as an opt-in operation.
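As a rough illustration of what returning the components separately would mean, the sketch below lowers a hypothetical `std::pair<bool, std::string> Fn(bool)` by hand: the `bool` is an ordinary return value (which x86-64 returns in a register), and the string is constructed directly into uninitialized storage provided by the caller. `ReturnSplit` and `UseSplit` are hypothetical names; this is source-level C++ written to show the idea, not something Clang emits or the current C++ ABI permits for `std::pair`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Hypothetical split lowering of `std::pair<bool, std::string> Fn(bool)`.
// The bool comes back as the return value; the string is constructed
// in place into caller-provided, uninitialized storage.
bool ReturnSplit(bool toFail, void *valueStorage) {
    ::new (valueStorage) std::string(toFail ? "failure details" : "DEADBEEF");
    return !toFail;
}

// Caller side: it owns the storage, can test the returned bool directly,
// and must destroy the string that the callee constructed.
std::size_t UseSplit(bool toFail) {
    alignas(std::string) unsigned char storage[sizeof(std::string)];
    bool const ok = ReturnSplit(toFail, storage);
    std::string *value = std::launder(reinterpret_cast<std::string *>(storage));
    std::size_t const length = ok ? value->size() : 0;
    std::destroy_at(value);
    return length;
}
```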

Unfortunately, C++'s rules around object lifetime would limit this optimization
so much as to make it completely ineffective.  The return of 'err' in your example
is not a legal opportunity for copy elision under the standard, and so a temporary
must be introduced.  On the flip side, the standard requires temporary materialization
to be delayed so as to minimize copies, which means that it would not be legal
to break apart an existing tuple temporary in order to return its components separately.
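The point about `return err;` can be observed with a type that counts its copy and move constructions. In the sketch below (all names are hypothetical), returning a prvalue runs no copy or move constructor thanks to C++17's guaranteed elision, while returning the name introduced by a structured binding runs exactly one, because that name refers to a member of a hidden object rather than to a local variable that could be constructed directly in the return slot.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical error type that counts its copy and move constructions.
struct CountedError {
    std::string message;
    static inline int copies = 0;
    static inline int moves = 0;

    explicit CountedError(std::string m) : message(std::move(m)) {}
    CountedError(CountedError const &other) : message(other.message) { ++copies; }
    CountedError(CountedError &&other) noexcept
        : message(std::move(other.message)) { ++moves; }
};

// Hypothetical result type, analogous to Pair<std::string> in the question.
struct Outcome {
    bool ok;
    CountedError err;
};

// The error is a prvalue in the braced initializer, so the member is
// initialized in place (guaranteed elision in C++17).
Outcome MayFail(bool toFail) {
    return {!toFail, CountedError{toFail ? "failed" : "none"}};
}

// `err` names a member of the hidden object created by the structured
// binding, so the return value has to be constructed from it.
CountedError Forward(bool toFail) {
    if (auto [ok, err] = MayFail(toFail); !ok) {
        return err;
    }
    return CountedError{"no error"};
}

// Returning a prvalue directly: no copy or move constructor runs.
CountedError Direct() {
    return CountedError{"failed"};
}

// Number of copy or move constructions performed while evaluating f().
template <typename F>
int CountConstructions(F f) {
    CountedError::copies = 0;
    CountedError::moves = 0;
    CountedError result = f();
    (void)result;
    return CountedError::copies + CountedError::moves;
}
```

The count deliberately lumps copies and moves together, since the rules for implicitly moving from a returned name have changed across standard revisions.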

Furthermore, it is likely that this optimization would allow the user to observe
an inconsistent ordering between pointers to similar components of different tuples,
which I believe would also violate the standard.

Also, this:

> I am hoping gcc recognizes the copy-elision case, constructs the
> error object in the caller, and passes it as a reference.

This is not how objects are returned in C++; instead, the caller passes the callee
a pointer to uninitialized memory, and the callee constructs the return value into
that memory.
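For `Pair<std::string>`, this convention corresponds roughly to the hand-lowered sketch below. `FuncStringLowered` and `UseStringLowered` are hypothetical names, and the explicit `void *` parameter stands in for the hidden return-slot pointer, which on x86-64 System V the caller passes in `rdi` and which never appears in source code.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>

template <typename T>
struct Pair {
    bool ok;
    T    value;
};

// Hypothetical hand-lowered form of `Pair<std::string> FuncString(bool)`:
// the caller supplies uninitialized memory and the callee constructs the
// whole return object there.
void FuncStringLowered(void *returnSlot, bool toFail) {
    ::new (returnSlot) Pair<std::string>{!toFail, "DEADBEEF"};
}

// Caller side: reserve the slot, make the call, then read `ok` back out of
// memory before destroying the returned object.
std::string UseStringLowered(bool flag) {
    alignas(Pair<std::string>) unsigned char slot[sizeof(Pair<std::string>)];
    FuncStringLowered(slot, flag);
    auto *result = std::launder(reinterpret_cast<Pair<std::string> *>(slot));
    std::string out = result->ok ? std::move(result->value)
                                 : std::string{"DEADFA11"};
    std::destroy_at(result);
    return out;
}
```

Reading `result->ok` back out of the slot is roughly the source-level counterpart of the `cmp byte ptr [rsp], 0` observed in the question.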

Anyway, if you wanted to pursue this as a non-conforming optimization, I think that
would be an interesting project, but I think you will find that Clang is not currently
well engineered for this kind of high-level value-propagation optimization.

John.

> Also, having
> the returned tuple broken into two independent variables would permit
> the compiler to use registers for them, at least for the first one,
> which fits in a register (I do not care about the second one unless it
> carries a payload).
> 
> Below is a sample. Assume we have a structure and some functions
> that may fail:
> 
> #include <cstdint>
> #include <string>
> 
> template <typename T>
> struct Pair {
>     bool ok;
>     T    value;
> };
> 
> Pair<std::int64_t> FuncInt(bool const toFail) {
>     return {!toFail, 42};
> }
> 
> Pair<std::string> FuncString(bool const toFail) {
>     return {!toFail, "DEADBEEF"};
> }
> 
> auto UseInt(bool const flag) {
>     if (auto const [ok, value] = FuncInt(flag); ok) {
>         return value;
>     }
>     // Not -1L: on macOS std::int64_t is `long long`, so a `long`
>     // literal would make the deduced return types disagree.
>     return std::int64_t{-1};
> }
> 
> auto UseString(bool const flag) {
>     if (auto const [ok, value] = FuncString(flag); ok) {
>         return value;
>     }
>     return std::string{"DEADFA11"};
> }
> 
> The optimization works with `FuncInt’: `ok’ and `value’ come back in
> `rax’ and `rdx’. The check compiles into:
>   call FuncInt(bool)
>   and al, 1
> 
> But in the case of `FuncString’, this does not happen. Here is what I get:
>   call FuncString[abi:cxx11](bool)
>   cmp byte ptr [rsp], 0
> 
> which obviously compares against memory. My intention is to avoid this
> redundant read from memory and use a register instead, as in
> `FuncInt’.
> 
> Is there a way to tell clang that instances of Pair should (or at
> least can) be broken into two separate variables and returned in the
> most efficient way?
> 
> I believe this optimization won't work perfectly with the current ABI,
> though the compiler should not have to limit itself to the spec when the
> call is to a function that is not externally visible, or when `-flto’ is used.
> 
> Here is a sample with assembly: https://godbolt.org/g/EKUqaF
> 
> In other words, I want `Pair<std::string> FuncString(bool const)’ to
> behave like `bool FuncString(bool const, std::string &value)’ while
> still using the expressiveness of C++17, including structured bindings.
> 
> As I understand it, this problem is very similar to "scalar replacement
> of aggregates," but playing around with the optimizer's options didn't
> give me any positive outcome or insight.
> 
> I don't have any specific requirements regarding the target OS and
> architecture besides it being x86. If it is possible to do this in a
> generic way, I’d be happy to know; if it works only in a very specific
> environment, I'd still be glad to know. Perhaps I can try to implement this
> optimization with your help if it looks interesting.
> 
> --
> Best regards,
> Denis Sukhonin
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
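The register/memory difference between `FuncInt` and `FuncString` can be approximated at compile time. Under the Itanium C++ ABI, a class with a non-trivial copy constructor, move constructor, or destructor is returned indirectly through caller-provided memory, and on x86-64 System V an aggregate larger than 16 bytes is returned in memory as well. `std::string` fails both tests, while `Pair<std::int64_t>` passes both. The trait `LikelyReturnedInRegisters` below is a hypothetical, simplified approximation of those two rules (it ignores, for example, how floating-point members are classified) and is not Clang's actual logic.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

template <typename T>
struct Pair {
    bool ok;
    T    value;
};

// Hypothetical, simplified approximation of "can this type be returned in
// registers on x86-64 System V under the Itanium C++ ABI?".
template <typename T>
constexpr bool LikelyReturnedInRegisters =
    std::is_trivially_copy_constructible_v<T> &&
    std::is_trivially_move_constructible_v<T> &&
    std::is_trivially_destructible_v<T> &&
    sizeof(T) <= 16;

// FuncInt's result qualifies; FuncString's does not.
static_assert(LikelyReturnedInRegisters<Pair<std::int64_t>>);
static_assert(!LikelyReturnedInRegisters<Pair<std::string>>);
```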



