[PATCH] D22778: Add Loop Sink pass to reverse the LICM based of basic block frequency.

Chandler Carruth via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 3 20:25:12 PDT 2016


chandlerc added a comment.

First batch of comments. While I'll probably have a some more minor comments later, there are a couple of particularly interesting ones that I wanted to go ahead and send out.



> LoopUtils.h:472-474
> +/// If SafetyInfo is nullptr, we are checking for sinking instructions from
> +/// preheader to loop body (no speculation).
> +/// If SafetyInfo is not nullptr, we are checking for hoisting/sinking

Here and elsewhere in comments, I would just say "is null" rather than "is nullptr".

> LoopSink.cpp:48
> +
> +static bool SinkLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI,
> +                     DominatorTree *DT, BlockFrequencyInfo *BFI,

I would suggest sinking the LegacyLoopSinkPass to the bottom of the file to avoid needing this forward declaration.

> LoopSink.cpp:101
> +///     AdjustedFreq(BBs) = 99 / SinkFrequencyPercentThreshold%
> +static BlockFrequency AdjustedSumFreq(SmallPtrSet<BasicBlock *, 2> &BBs,
> +                                      BlockFrequencyInfo *BFI) {

Naming convention: adjustedSumFreq

> LoopSink.cpp:129
> +static SmallPtrSet<BasicBlock *, 2>
> +FindBBsToSinkInto(const Loop *L, const SmallPtrSet<BasicBlock *, 2> &UseBBs,
> +                  DominatorTree *DT, BlockFrequencyInfo *BFI) {

naming: findBBsToSinkInto

> LoopSink.cpp:180-181
> +
> +/// SinkLoop - Sink instructions from loop's preheader to the loop body if the
> +/// sum frequency of inserted copy is smaller than preheader's frequency.
> +bool SinkLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,

This doxygen is still in the old form. Also, this should be 'sinkLoop'.

Use `AARseluts` here rather than the old name?

Can any of these parameters be null? If not, pass references? I would generally partition the arguments into those that are required and pass references for them and then pass the optional ones as pointers. Then you can document that they are optional and the types will reinforce that fact.

> LoopSink.cpp:10-14
> +// This pass traverses all instructions in loop preheader and sink it to the
> +// loop body where frequency is lower than the loop's preheader.
> +// This pass is a reverse-transformation of LICM. It differs from the Sink
> +// pass that it only processes instructions in loop's preheader, and has more
> +// accurate alias/profile info to guide sinking decisions.

Some grammar issues here:

- "all instructions" -> "all of the instructions"
- "in loop preheader" -> "in the loop preheader"
- "sink it" -> "sink them"
- "the Sink pass that it only" -> "the Sink pass in that it only"
- "in loop's preheader" -> "in the loop's preheader".

I also think the wording could be improved to be more clear when reading it. For example "This pass does the inverse transformation of what LICM does" reads more clearly to me. Lastly, I would lead with that high-level description, then go into specifics. I would separate the comparison with the other Sink pass into a second paragraph. So something along the lines of:

  This pass does the inverse transformation of what LICM does. It
  traverses all of the instructions ...
  
  It differs from the Sink pass ...

Does that make sense?

> LoopSink.cpp:41
> +
> +STATISTIC(NumLoopSunk, "Number of instructions sunk into loop");
> +

Do you want to count separately the number of instruction clones created as part of this? Not sure if that has been an interesting metric while working on this patch or not.

> LoopSink.cpp:135-143
> +  // Sort loop's basic blocks by frequency
> +  SmallVector<BasicBlock *, 10> SortedLoopBBs;
> +  for (BasicBlock *B : L->blocks())
> +    if (BFI->getBlockFreq(B) <= BFI->getBlockFreq(L->getLoopPreheader()))
> +      SortedLoopBBs.push_back(B);
> +  std::stable_sort(SortedLoopBBs.begin(), SortedLoopBBs.end(),
> +                   [&](BasicBlock *A, BasicBlock *B) {

This shouldn't be done each time we try to sink an instruction. This should be  pre-computed once for the loop and re-used for each instruction we try to sink.

> LoopSink.cpp:148-170
> +  // For every iteration:
> +  //   * Pick the ColdestBB from SortedLoopBBs
> +  //   * Find the set BBsDominatedByColdestBB that satisfy:
> +  //     - BBsDominatedByColdestBB is a subset of BBsToSinkInto
> +  //     - Every BB in BBsDominatedByColdestBB is dominated by ColdestBB
> +  //   * If Freq(ColdestBB) < Freq(BBsDominatedByColdestBB), remove
> +  //     BBsDominatedByColdestBB from BBsToSinkInto, add ColdestBB to

This seems really expensive. By my reading this is O(N * L * M) where L is the  number of basic blocks in a loop, N is the number of instructions we try to sink into that loop, and M is the number of basic blocks within the loop that use the instructions. If there is for example one hot basic block in the loop and a large number of cold basic blocks and all of the uses are in those cold basic blocks, it seems like this could become quite large.

Have you looked at other algorithms? Is there a particular reason to go with this one? (I've not thought about the problem very closely yet...)

> LoopSink.cpp:189-191
> +  if (!any_of(L->blocks(), [&](const BasicBlock *BB) {
> +        return BFI->getBlockFreq(BB) <= PreheaderFreq;
> +      }))

I think a comment along the lines of "If there are no basic blocks with lower frequency than the preheader then we can avoid the detailed analysis as we will never find profitable sinking opportunities."

I would also find this easier to read without the negation as:

  if (all_of(...
        return BFI->getBlockFreq(BB) > PreheaderFreq;

> LoopSink.cpp:210
> +
> +    // All blocks that have uses of I and are in the sub loop of L.
> +    SmallPtrSet<BasicBlock *, 2> BBs;

This comment is a little confusing. It seems to be describing a think (like a variable, for example BBs) but is also right above a loop that populates that variable. Generally, once comments can be read as implementation comments about the code, I try to make them describe behavior of the code as that reads a bit better IMO. So "Compute the set of blocks which contain a use of I and ..." would read a bit better for me.

Also "are in the sub loop of L" doesn't parse very well although I understand what you mean. I think it would be more clear to say "... blocks in the loop L which ..." rather than going into the issue of subloops.

> LoopSink.cpp:214
> +      Instruction *UI = cast<Instruction>(U.getUser());
> +      // If the use is phi node, we can not sink I to this BB.
> +      if (dyn_cast<PHINode>(UI) ||

If this is the case we can't sink I at all though, right? I think that is what the code already does, maybe just update the comment?

> LoopSink.cpp:215-216
> +      // If the use is phi node, we can not sink I to this BB.
> +      if (dyn_cast<PHINode>(UI) ||
> +          !L->contains(LI->getLoopFor(UI->getParent()))) {
> +        BBs.clear();

I would use two ifs here since one needs its own comment (and it is a nice comment!)

> LoopSink.cpp:216
> +      if (dyn_cast<PHINode>(UI) ||
> +          !L->contains(LI->getLoopFor(UI->getParent()))) {
> +        BBs.clear();

Why not use `L->contains(UI)`?

> LoopSink.cpp:222
> +    }
> +
> +    // Find the set of BBs that we should insert a copy of I.

Check for an empty `BBs` here to handle the case of a use that can't be handled?

This combined with the below BBsToSinkInto makes me think this should be extracted to a helper that tries to sink one instruction so that we can use early exit from that function.

> LoopSink.cpp:234
> +
> +    for (++BI; BI != BBsToSinkInto.end(); ++BI) {
> +      BasicBlock *N = *BI;

So, this has an important problem: it introduces a non-determinism into the compiler.

The initial problem is that SmallPtrSet does not provide stable iteration order, and so there is no predicting which basic block gets the original instruction and which one gets the clone.

However, merely using something like SetVector helps but isn't fully satisfying here because the insertion order is *also* something we would very much like to not depend on: the use list order.

I would suggest essentially numbering the basic blocks in the loop and use a vector of the BBs sorted by their number here. You can just create a map out of the blocks range with something like:

  int i = 0;
  for (auto *BB : L->blocks())
    LoopBlockNumber[BB] = ++i;

(Just pseudo code, but you get the idea.)

That will punt the ordering requirement to LoopInfo which is I think the right place for it.

> LoopSink.cpp:250
> +      replaceDominatedUsesWith(I, IC, *DT, N);
> +      DEBUG(dbgs() << "Sinking " << I << " To: " << N->getName() << '\n');
> +      NumLoopSunk++;

I'd indicate the difference between this and the previous debug message. Maybe "Sinking a clone of " instead of just "Sinking".

https://reviews.llvm.org/D22778





More information about the llvm-commits mailing list