[cfe-dev] About NRVO (named return value optimization)

Hal Finkel hfinkel at anl.gov
Wed Jun 25 03:58:18 PDT 2014


----- Original Message -----
> From: "David Blaikie" <dblaikie at gmail.com>
> To: "Jiangning Liu" <liujiangning1 at gmail.com>
> Cc: "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>
> Sent: Wednesday, June 25, 2014 1:14:16 AM
> Subject: Re: [cfe-dev] About NRVO (named return value optimization)
> 
> This isn't related to NRVO - as the name suggests, NRVO is about named
> return values. The example you gave has no return values and no named
> values.
> 
> The optimization necessary here is stack reuse, which classically
> LLVM/Clang haven't done a great job on. I'm not sure of the precise
> details of the current state, but there have been some efforts to make
> it better.

A stack coloring optimization is now enabled by default (and has been for a while), but Jiangning's comment that our lack of interprocedural alias analysis might be defeating it here is plausible.
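
To make the stack coloring idea concrete: two stack slots can share storage when their live ranges never overlap. The following is only a minimal sketch of that disjointness test, not LLVM's actual pass. `LiveRange` and `canShareSlot` are hypothetical names, and the instruction indices are made up to mirror the test10 example (the first temporary is live across the first constructor/f pair, the second across the second pair).

```cpp
#include <cstddef>

// Hypothetical model: a half-open interval [begin, end) of instruction
// indices during which a stack slot holds a live value.
struct LiveRange {
  std::size_t begin;
  std::size_t end;
};

// Two slots may share storage only if their live ranges are disjoint.
constexpr bool canShareSlot(LiveRange a, LiveRange b) {
  return a.end <= b.begin || b.end <= a.begin;
}

// Illustrative numbers only: ref.tmp covers the first ctor/f pair,
// ref.tmp1 covers the second pair.
constexpr LiveRange refTmp{0, 2};
constexpr LiveRange refTmp1{2, 4};
static_assert(canShareSlot(refTmp, refTmp1), "disjoint ranges can share a slot");
```

Without lifetime information the optimizer has to assume both slots stay live until the function returns, in which case the ranges overlap and no sharing is possible.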

> 
> One part of that is the lifetime intrinsics (
> http://llvm.org/docs/LangRef.html#memory-use-markers ) which would
> allow the backend to know that the stack memory used by the first
> temporary is dead before the first use of the stack memory for the
> second temporary, and thus reuse the stack. I don't know what the
> current state of the lifetime markers is (I guess we don't turn them
> on by default? not sure whether they're broken/inefficient/slow/not
> valuable enough yet) and whether they're a viable way forward, but
> someone thought so at some point.

As I recall, Clang does generate lifetime markers by default (at least in some circumstances) and they now work well; this does seem like a good use case for them. I recommend investigating why this is not happening here. One thing to look at is in CodeGen/CGDecl.cpp:

/// Should we use the LLVM lifetime intrinsics for the given local variable?
static bool shouldUseLifetimeMarkers(CodeGenFunction &CGF, const VarDecl &D,
                                     unsigned Size) {
  // For now, only in optimized builds.
  if (CGF.CGM.getCodeGenOpts().OptimizationLevel == 0)
    return false;

  // Limit the size of marked objects to 32 bytes. We don't want to increase
  // compile time by marking tiny objects.
  unsigned SizeThreshold = 32;

  return Size > SizeThreshold;
}

Maybe the problem is that sizeof(X) < 32? Here X is a single byte (%class.X = type { i8 }), and the check above only marks objects larger than 32 bytes. If so, it might be worth testing further how this limit affects compile time.
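
As a quick sanity check of that hypothesis, the size comparison can be reproduced in isolation. This is a sketch only: `exceedsMarkerThreshold` is a hypothetical stand-in for the size test in the function quoted above (the optimization-level check is omitted), and `X` is given an empty inline constructor, which is an assumption made so the snippet compiles on its own.

```cpp
#include <cstddef>

// Same shape as the class in the test case; the empty constructor body
// is an assumption so that this snippet is self-contained.
class X {
public:
  X() {}
};

// Hypothetical stand-in for the size comparison in
// shouldUseLifetimeMarkers: only objects strictly larger than
// 32 bytes would be marked.
constexpr bool exceedsMarkerThreshold(std::size_t size) {
  return size > 32;
}

// An empty class is far below the threshold (typically one byte,
// consistent with "%class.X = type { i8 }" in the IR above).
static_assert(!exceedsMarkerThreshold(sizeof(X)),
              "X is too small to receive lifetime markers");
```

Under this check the two temporaries in test10 would get no lifetime markers, which would be consistent with the two separate allocas in the reported IR.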

 -Hal

> 
> - David
> 
> On Tue, Jun 24, 2014 at 10:57 PM, Jiangning Liu
> <liujiangning1 at gmail.com> wrote:
> > Hi,
> >
> > For the following small test case,
> >
> > // RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -O1 -o - %s | FileCheck %s
> >
> > // Test code generation for the named return value optimization.
> > class X {
> > public:
> >   X();
> > };
> >
> > void f(const X& x);
> > void test10(bool b) {
> >   f(X());
> >   f(X());
> > }
> >
> > we are generating the following LLVM IR with "
> >
> > %class.X = type { i8 }
> >
> > ; Function Attrs: nounwind
> > define void @_Z6test10b(i1 zeroext %b) #0 {
> > entry:
> >   %ref.tmp = alloca %class.X, align 1
> >   %ref.tmp1 = alloca %class.X, align 1
> >   call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
> >   call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
> >   call void @_ZN1XC1Ev(%class.X* %ref.tmp1) #2
> >   call void @_Z1fRK1X(%class.X* nonnull %ref.tmp1) #2
> >   ret void
> > }
> >
> > declare void @_Z1fRK1X(%class.X* nonnull) #1
> > declare void @_ZN1XC1Ev(%class.X*) #1
> >
> > So my question is: should NRVO be able to know that ref.tmp and
> > ref.tmp1 can be merged into a single one? That is, I'm expecting the
> > following LLVM IR code to be generated,
> >
> > define void @_Z6test10b(i1 zeroext %b) #0 {
> > entry:
> >   %ref.tmp = alloca %class.X, align 1
> >   call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
> >   call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
> >   call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
> >   call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
> >   ret void
> > }
> >
> > If we leave both ref.tmp and ref.tmp1 to LLVM IR, it seems to be hard
> > for the middle-end to combine them unless we demangle the function
> > name _ZN1XC1Ev to know it is a C++ constructor and do more alias
> > analysis.
> >
> > Any idea?
> >
> > Thanks,
> > -Jiangning
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


