[cfe-dev] About NRVO (named return value optimization)

Arnaud Allard de Grandmaison arnaud.adegm at gmail.com
Wed Jun 25 04:55:55 PDT 2014


This specific testcase shows 2 problems:
 - unnamed variables are not handled the same way as named variables
 - if a named variable is used instead, the object size is the second problem
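
To make the first point concrete, here is a minimal C++ sketch of the two
shapes being compared. It assumes a stand-in X and f with hypothetical call
counters (ctor_calls, f_calls) that are not in the original test case, which
only declares X::X() and f. Both shapes do the same work; they differ only in
whether the objects are unnamed temporaries or scoped named locals.

```cpp
#include <cassert>

// Hypothetical instrumentation, not part of the original test case.
static int ctor_calls = 0;
static int f_calls = 0;

// Stand-in for the X in the test case; the constructor only counts calls.
class X {
public:
  X() { ++ctor_calls; }
};

void f(const X &) { ++f_calls; }

// Original shape: each argument is an unnamed temporary
// (ref.tmp and ref.tmp1 in the generated IR).
void test_unnamed() {
  f(X());
  f(X());
}

// Named shape: each object is a local variable whose scope ends
// before the next one begins.
void test_named() {
  { X x; f(x); }
  { X y; f(y); }
}

// Runs both shapes and checks that they make the same number of calls.
void compare_shapes() {
  ctor_calls = f_calls = 0;
  test_unnamed();
  assert(ctor_calls == 2 && f_calls == 2);
  test_named();
  assert(ctor_calls == 4 && f_calls == 4);
}
```

Comparing the IR clang emits at -O1 for test_unnamed and test_named should
show whether the two are treated differently. Note that X is far below the
32-byte limit in the code quoted below, which is the second problem.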

Cheers,
--
Arnaud


On Wed, Jun 25, 2014 at 12:58 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "David Blaikie" <dblaikie at gmail.com>
> > To: "Jiangning Liu" <liujiangning1 at gmail.com>
> > Cc: "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>
> > Sent: Wednesday, June 25, 2014 1:14:16 AM
> > Subject: Re: [cfe-dev] About NRVO (named return value optimization)
> >
> > This isn't related to NRVO - as the name suggests, NRVO is about named
> > return values. The example you gave has no return values and no named
> > values.
> >
> > The optimization necessary here is stack reuse, which classically
> > LLVM/Clang haven't done a great job on. I'm not sure of the precise
> > details of the current state, but there have been some efforts to make
> > it better.
>
> There is now an enabled-by-default stack coloring optimization (and has
> been for a while now); but Jiangning's comment that our lack of
> interprocedural alias analysis might be defeating it here is plausible.
>
> >
> > One part of that is the lifetime intrinsics (
> > http://llvm.org/docs/LangRef.html#memory-use-markers ) which would
> > allow the backend to know that the stack memory used by the first
> > temporary is dead before the first use of the stack memory for the
> > second temporary, and thus reuse the stack. I don't know what the
> > current state of the lifetime markers is (I guess we don't turn them
> > on by default? not sure whether they're broken/inefficient/slow/not
> > valuable enough yet) and whether they're a viable way forward, but
> > someone thought so at some point.
>
> As I recall, clang does generate lifetime markers by default (at least in
> some circumstances), they now work well, and this does seem like a good use
> case for them. I recommend investigating why this is not happening here.
> One thing to look at is in CodeGen/CGDecl.cpp:
>
> /// Should we use the LLVM lifetime intrinsics for the given local variable?
> static bool shouldUseLifetimeMarkers(CodeGenFunction &CGF, const VarDecl &D,
>                                      unsigned Size) {
>   // For now, only in optimized builds.
>   if (CGF.CGM.getCodeGenOpts().OptimizationLevel == 0)
>     return false;
>
>   // Limit the size of marked objects to 32 bytes. We don't want to increase
>   // compile time by marking tiny objects.
>   unsigned SizeThreshold = 32;
>
>   return Size > SizeThreshold;
> }
>
> Maybe the problem is that sizeof(X) <= 32? If so, further testing of this
> limit's impact on compile time might be worthy of investigation.
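
The size check quoted above can be tried on X in isolation. This is a
simplified stand-in, not the clang function: wouldUseLifetimeMarkers,
kSizeThreshold and the 64-byte Big struct are hypothetical names, and the
CodeGenFunction/VarDecl arguments are replaced by a plain optimization level
and a size in bytes.

```cpp
#include <cassert>
#include <cstddef>

// Same limit as in the quoted code: objects must be larger than 32 bytes.
constexpr std::size_t kSizeThreshold = 32;

// Simplified stand-in for the quoted check (assumption: only the
// optimization level and the object size matter).
constexpr bool wouldUseLifetimeMarkers(unsigned optLevel,
                                       std::size_t sizeInBytes) {
  return optLevel != 0 && sizeInBytes > kSizeThreshold;
}

// X as in the test case: declared constructor, no data members.
class X {
public:
  X();
};

// Hypothetical larger type, for contrast only.
struct Big {
  char bytes[64];
};

// X is too small to be marked at -O1; Big would be marked at -O1,
// and nothing is marked at -O0.
static_assert(!wouldUseLifetimeMarkers(1, sizeof(X)), "X is below the limit");
static_assert(wouldUseLifetimeMarkers(1, sizeof(Big)), "Big is above the limit");
static_assert(!wouldUseLifetimeMarkers(0, sizeof(Big)), "never at -O0");

// Runtime check of the boundary: exactly 32 bytes is still not marked.
void checkBoundary() {
  assert(!wouldUseLifetimeMarkers(1, 32));
  assert(wouldUseLifetimeMarkers(1, 33));
}
```

Under these assumptions, the temporaries in test10 would get no lifetime
markers, so the backend would have no lifetime information to reuse the stack
slot with.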
>
>  -Hal
>
> >
> > - David
> >
> > On Tue, Jun 24, 2014 at 10:57 PM, Jiangning Liu
> > <liujiangning1 at gmail.com> wrote:
> > > Hi,
> > >
> > > For the following small test case,
> > >
> > > // RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -O1 -o - %s | FileCheck %s
> > >
> > > // Test code generation for the named return value optimization.
> > > class X {
> > > public:
> > >   X();
> > > };
> > >
> > > void f(const X& x);
> > > void test10(bool b) {
> > >   f(X());
> > >   f(X());
> > > }
> > >
> > > we are generating the following LLVM IR with "
> > >
> > > %class.X = type { i8 }
> > >
> > > ; Function Attrs: nounwind
> > > define void @_Z6test10b(i1 zeroext %b) #0 {
> > > entry:
> > >   %ref.tmp = alloca %class.X, align 1
> > >   %ref.tmp1 = alloca %class.X, align 1
> > >   call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
> > >   call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
> > >   call void @_ZN1XC1Ev(%class.X* %ref.tmp1) #2
> > >   call void @_Z1fRK1X(%class.X* nonnull %ref.tmp1) #2
> > >   ret void
> > > }
> > >
> > > declare void @_Z1fRK1X(%class.X* nonnull) #1
> > > declare void @_ZN1XC1Ev(%class.X*) #1
> > >
> > > So my question is: should NRVO be able to know that ref.tmp and ref.tmp1
> > > can be merged into a single one? That is, I'm expecting the following
> > > LLVM IR code to be generated,
> > >
> > > define void @_Z6test10b(i1 zeroext %b) #0 {
> > > entry:
> > >   %ref.tmp = alloca %class.X, align 1
> > >   call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
> > >   call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
> > >   call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
> > >   call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
> > >   ret void
> > > }
> > >
> > > If we leave both ref.tmp and ref.tmp1 to LLVM IR, it seems to be hard for
> > > the middle-end to combine them unless we demangle the function name
> > > _ZN1XC1Ev to know it is a C++ constructor and do more alias analysis.
> > >
> > > Any idea?
> > >
> > > Thanks,
> > > -Jiangning
> > >
> > > _______________________________________________
> > > cfe-dev mailing list
> > > cfe-dev at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> > >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory