[PATCH] D11605: [LoopUnswitch] Add block frequency analysis to recognize hot/cold regions

Chen Li meloli87 at gmail.com
Mon Aug 3 11:47:12 PDT 2015


chenli added a comment.

In http://reviews.llvm.org/D11605#216478, @davidxl wrote:

> The main problem with the patch is that the cold loop detection can have many false positives. With profile feedback,  cold loops can be reliably detected but not with the method described in this patch.


Do you mean by using other profiles together with BFI or using BFI differently?

> Without profile feedback, the cold loop filtering should probably not be turned on by default unless it is optimized for size.


Yes, you are correct. Initially I was thinking this could take cold loop's unswitch budget to hot loop and thus make hot loop unswitch more aggressively. But it turned out separate loops will have the same initial unswitch budget and not effect each other. So it will only optimize code size.

> When the filtering is not on, we should not pay the compile time penalty of computing  BFI unconditionally either. Cong has recently made it possible to compute BFI without relying on running the wrapper pass.


Do you mean this patch http://reviews.llvm.org/D11196 ?


================
Comment at: lib/Transforms/Scalar/LoopUnswitch.cpp:82
@@ +81,3 @@
+// The default frequency 20 was chosen based on LoopVectorize's choice of cold frequency.
+static cl::opt<unsigned>
+ColdFrequency("loop-unswitch-cold-block-frequency", cl::init(20), cl::Hidden,
----------------
reames wrote:
> Have you explored this parameter at all?
> 
> My guess would be that disabling loop unswitching should only be done for *really* cold loops.  I have no evidence, but a cool loop in a really hot function seems like something we should unswitch.  
I was thinking about that too. Maybe adding function hotness to its loops is a good idea, ie. lower this threshold for hot functions and increase it for cold functions.

================
Comment at: lib/Transforms/Scalar/LoopUnswitch.cpp:496
@@ +495,3 @@
+  // is less than ColdFrequency% of the function entry baseline frequency.
+  BlockFrequency LoopEntryFreq = BFI->getBlockFreq(loopPreheader);
+  if (LoopUnswitchWithBlockFrequency && LoopEntryFreq < ColdEntryFreq)
----------------
davidxl wrote:
> reames wrote:
> > chenli wrote:
> > > I chose to use loop preheader's frequency for coldness comparison. This will ignore the size of the loop because it does not count backedges or iteration numbers. My reasoning is that if the loop was only entered once in the pass 1000 executions, we should be confident that it will never be entered again. Therefore, no matter how many  times the loop iterates, it shouldn't affect our decision. But I could see the other way around. I am open to change it if anyone prefers the other way.
> > I feel pretty strongly that making decisions based on the number of times the loop executes rather than the number of times the header executes is likely to produce negative results.  
> > 
> > Consider:
> > while(true) {
> >   if (loop invariant) {
> >   }
> >   if (loop invariant2) {
> >   }
> >   if (loop invariant3) {
> >   }
> >   do work();
> > }
> > 
> > Your change would completely disable unswitching in this loop which seems non-ideal.  
> The hotness of the loop should be measured by the  frequency of the hottest BB in the loop relative to the function entry.
> 
> With PGO, you will need to check the execution count of the hottest BB in the loop (relative to some threshold set by the program summary).
> 
I think the hottest BB should be the header, assuming its inner loops will be processed by their own unswitch passes. 

================
Comment at: lib/Transforms/Scalar/LoopUnswitch.cpp:498
@@ +497,3 @@
+  if (LoopUnswitchWithBlockFrequency && LoopEntryFreq < ColdEntryFreq)
+    return false;
+
----------------
reames wrote:
> Given there's already handling for OptForSize within UnswitchIfProfitable, I'd like to see the handling combined.  A cleanup change to move the OptForSize check here - I think that's NFC right? - and then using that in this patch would help.
I will have a look at this.

================
Comment at: test/Transforms/LoopUnswitch/cold-loop.ll:21
@@ +20,3 @@
+continue:
+  br i1 %cond3, label %do_something, label %do_something_else  ; non-trivial condition  
+
----------------
davidxl wrote:
> reames wrote:
> > This is perhaps a bad example.  In this particular case, we could split the loop into two subloops without code growth.  Maybe tweak your example so it's more obvious why we want to avoid unswitching this?
> Unswitching duplicates the loop, so size overhead is the loop setup code (in this case probably just a branch associated with backedge).
Yes, this is not a good case. I will add some side-effect code after the first condition (so the first one is still trivial and not affected by coldness check, but unswitching the second one will grow code size).


http://reviews.llvm.org/D11605







More information about the llvm-commits mailing list