[PATCH] D20017: Aggressive choosing best loop top

Thu May 12 08:27:51 PDT 2016

cycheng planned changes to this revision.
cycheng added a comment.

davidxl:

Yes, -force-precise-rotation-cost=true solve this issue, and the result looks better then this patch. I hope we can enable it by default.

Test on Power8 with ref data size (largest size), iteration = 1, cpu frequency governor is "performance" (maximum speed), bind on physical-cpu 0:

- llvm: r269273 (2016 May 12)
- -force-precise-rotation-cost=true v.s. false, > 1.0 means true is good.
- Result:

  400.perlbench 	1.00x
  401.bzip2     	1.00x
  403.gcc       	1.07x
  429.mcf       	1.04x
  445.gobmk     	1.02x
  456.hmmer     	1.00x
  458.sjeng     	1.01x
  462.libquantum	1.00x
  464.h264ref   	1.04x
  471.omnetpp   	1.06x
  473.astar     	0.97x
  483.xalancbmk 	0.99x



- llvm r267873 (2016 Apr 28) + this patch
- aggressive-best-top=true v.s. false, > 1.0 means true is good.
- Result:

  400.perlbench 	1.01x
  401.bzip2     	1.01x
  403.gcc       	1.07x
  429.mcf       	1.00x
  445.gobmk     	1.02x
  456.hmmer     	0.98x
  458.sjeng     	0.99x
  462.libquantum	1.03x
  464.h264ref   	1.06x
  471.omnetpp   	1.03x
  473.astar     	1.00x
  483.xalancbmk 	0.99x

So I thought I should use existing rotation strategy.

By the way, please look at this pattern:

            entry               
              |                 
  ------> loop.header (body)    
  |97%    /       \             
  |      /50%      \50%         
  --- latch <--- if.then        
         |
         |3%
     loop.end

Current strategy generates this order: entry loop.header if.then latch loop.end
But I thought this pattern is also eligible to optimize, right?

The cost of loop rotation calculated by current rotation strategy:

  BB#1 ('header') to the top: 273
  BB#4 ('if.then') to the top: 297
  BB#2 ('latch') to the top: 297


================
Comment at: lib/CodeGen/AtomicExpandPass.cpp:932-934
@@ -929,3 +931,5 @@
 
-  Builder.CreateCondBr(Success, ExitBB, LoopBB);
+  // Set LoopBB -> ExitBB and LoopBB -> LoopBB with 50% probability, so block
+  // placement won't apply aggressive-best-top on LoopBB
+  Builder.CreateCondBr(Success, ExitBB, LoopBB, BrWeight);
 
----------------
chandlerc wrote:
> This comment doesn't really help here. What is "aggressive-best-top" and what does it mean?
> 
> I think what you're actually doing is trying to give probabilities to override the default loop probabilities because the spin loop for an atomic isn't expected to loop often if at all. I'm actually surprised you would have a 50% probability of looping here, I would have expected a relatively low probability of spinning to be more appropriate.
> 
> I think you should:
> 
> 1) split this out into an independent change with its own description and test case
> 2) improve this comment to talk generically about the fact that the loop isn't expected to be taken unless there is contention.
> 
> Then the rest of this change can be predicated on that. I'm assuming that you can test this independently because its actually annotating the IR.
Exactly! And yes, that loop isn't expected to be taken, I should use lower value.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:114-117
@@ -113,1 +113,6 @@
 
+static cl::opt<bool>
+    AggressiveBestTop("aggressive-best-top",
+    cl::desc("Find best top from all latches even with conditional exit."),
+    cl::init(true), cl::Hidden);
+
----------------
chandlerc wrote:
> Any particular reason for a flag? Or for calling this aggressive? It seems pretty straight forward to try to select the best loop top among latches.
For testing purpose :P

================
Comment at: test/CodeGen/X86/code_placement_loop_rotation2.ll:8
@@ -7,1 +7,3 @@
 ; CHECK-LABEL: foo:
+; CHECK: jmp
+; CHECK: callq h
----------------
davidxl wrote:
> By the way, this example shows that the new expected layout is not as optimal as the one when -force-precise-rotation-cost is on (see below CHECK-PROFILE). 
> 
> The total branch cost of the optimal layout [e, f, c, d, h, b, g ] is:
> 
> C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c)  + C(g->h)
> 
> while the total cost of aggressive loop top layout [h, b, g, f, c, d, e] is
> 
> C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c)  + C(g->h) + C(e->f)
> 
> Edge e->f is in the inner loop and it is a hot edge , the additional cost of C(e->f) can be high.
Thanks for pointing out this : D


http://reviews.llvm.org/D20017