[PATCH] Initial instrumentation based PGO implementation

Justin Bogner mail at justinbogner.com
Tue Dec 17 00:58:32 PST 2013


New instrumentation patch. I won't bother re-uploading the other two
(unchanged) patches for now.

Eric Christopher <echristo at gmail.com> writes:
> What's the compile time impact of these patches when the option is and
> isn't being used? It's looking a bit heavyweight and very invasive.

Very unscientific measurements on SPEC2006 indicate that it's pretty
reasonable. The difference between r197442 (arbitrarily) with and
without this patch is in the noise (less than 1% difference), when we
aren't using instr-gen or instr-use, and with instr-use it's still
probably noise (less than 4%).

The instr-generate compile time is uglier, in that there were some
compile times that were on the order of 5-20% worse. Since this is only if
you actually use the feature I figure it's safe to look into whether we
can improve this in more detail after check in.

> +  uint64_t LoopCount = Cnt.getCount();
> +  uint64_t ExitCount =
> +    Cnt.getParentCount() + Cnt.getContinueCounter().getCount() +
> +    Cnt.getAdjustedCount() - Cnt.getCount();
> +  CondBr->setMetadata(llvm::LLVMContext::MD_prof,
> +                      PGO.createBranchWeights(LoopCount, ExitCount));
> +
>    EmitBranch(CondBlock);
>
> You appear to cut and paste this a few times.

Yeah, that's a bit sloppy. I've pushed the exit count calculation
intoRegionCounter.getLoopExitCount instead of repeating it all over the
place. This also gives a nice place to explain the calculation.

>    // If the body of the case is just a 'break', and if there was no
> fallthrough,
>    // try to not emit an empty block.
> -  if ((CGM.getCodeGenOpts().OptimizationLevel > 0) &&
> +  if (!CGM.getCodeGenOpts().ProfileInstrGenerate &&
> +      CGM.getCodeGenOpts().OptimizationLevel > 0 &&
>
> You should probably update comments in cases like this (there are more of them).

I've updated this comment. It seemed kind of out of date anyway, since
we elide the body even if there is fallthrough, and it didn't mention
the -O0 thing, so it should be better now.

I couldn't find any other cases where I've changed logic and broke a
comment. There are places where I do things like adding a beginRegion
inside an if block that's commented, but the comments are still accurate
as far as I can tell. Let me know if you saw something I've missed.

> +    // The default on our switch instruction will come to this block,
> so we need
> +    // to update its count.
> +    (*SwitchWeights)[0] += ThisCount;
>
> SwitchWeights is just a std::vector or SmallVector right? This seems a
> little odd.

SwitchWeights is a std::vector *, which makes for a bit of an awkward
dereference here. I've updated the comment to better explain why we're
interested in the first element, anyway.

> +      EmitBranchOnBoolExpr(CondBOp->getRHS(), TrueBlock, FalseBlock,
> +                           TrueCount - FalseCount);
>
> This sort of thing needs some documentation/comments in the code. It's
> not clear what's going on from looking at the patch.

You're right. There were a few places in EmitBranchOnBool where things
got a bit hairy. I've added some comments and named variables to try to
make this a bit clearer.

> +    if (TrueCount != 0) {
>
> if (TrueCount) ?

Sure. I had some similar "== 0" checks in CodeGenPGO::createBranchWeights,
so I've updated those as well.

> +  struct MapRegionCounters :
> +    public ConstStmtVisitor<MapRegionCounters> {
> +    unsigned NextCounter;
> +    llvm::DenseMap<const Stmt*, unsigned> *CounterMap;
> +    MapRegionCounters(llvm::DenseMap<const Stmt*, unsigned> *CounterMap) :
> +      NextCounter(0), CounterMap(CounterMap) {
> +    }
> +    void VisitChildren(const Stmt *S) {
> +      for (Stmt::const_child_range I = S->children(); I; ++I)
> +        if (*I)
> +         this->Visit(*I);
> +    }
>
> Document this whole anonymous namespace please.

Good point! This seems like a good place to explain what all these
counters are attached to, and why loops have three counters, so I've
explained what the counters are for in doc comments on the Visit*
methods.

> +// RUN: %clangxx %s -o %t1 -fprofile-instr-generate
> +// RUN: mkdir -p %tdir && cd %tdir && %t1
>
> This is going to fail on any platform that doesn't have native codegen
> enabled by default.

Hmm. Is there a REQUIRES or something I can use to conditionally enable
them then? I'm guessing if we don't have native codegen I can't do
something tricky like generate IR and run it with lli or something
either.

> There's probably more review, just getting something started here.

Keep it coming! Thanks for taking a look.

-- Justin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-CodeGen-Initial-instrumentation-based-PGO-implementa.patch
Type: text/x-patch
Size: 94437 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131217/4215f30b/attachment.bin>


More information about the cfe-commits mailing list