[PATCH] D20017: Aggressive choosing best loop top
Chuang-Yu Cheng via llvm-commits
llvm-commits at lists.llvm.org
Thu May 12 08:27:51 PDT 2016
cycheng planned changes to this revision.
cycheng added a comment.
davidxl:
Yes, -force-precise-rotation-cost=true solve this issue, and the result looks better then this patch. I hope we can enable it by default.
Test on Power8 with ref data size (largest size), iteration = 1, cpu frequency governor is "performance" (maximum speed), bind on physical-cpu 0:
- llvm: r269273 (2016 May 12)
- -force-precise-rotation-cost=true v.s. false, > 1.0 means true is good.
- Result:
400.perlbench 1.00x
401.bzip2 1.00x
403.gcc 1.07x
429.mcf 1.04x
445.gobmk 1.02x
456.hmmer 1.00x
458.sjeng 1.01x
462.libquantum 1.00x
464.h264ref 1.04x
471.omnetpp 1.06x
473.astar 0.97x
483.xalancbmk 0.99x
- llvm r267873 (2016 Apr 28) + this patch
- aggressive-best-top=true v.s. false, > 1.0 means true is good.
- Result:
400.perlbench 1.01x
401.bzip2 1.01x
403.gcc 1.07x
429.mcf 1.00x
445.gobmk 1.02x
456.hmmer 0.98x
458.sjeng 0.99x
462.libquantum 1.03x
464.h264ref 1.06x
471.omnetpp 1.03x
473.astar 1.00x
483.xalancbmk 0.99x
So I thought I should use existing rotation strategy.
By the way, please look at this pattern:
entry
|
------> loop.header (body)
|97% / \
| /50% \50%
--- latch <--- if.then
|
|3%
loop.end
Current strategy generates this order: entry loop.header if.then latch loop.end
But I thought this pattern is also eligible to optimize, right?
The cost of loop rotation calculated by current rotation strategy:
BB#1 ('header') to the top: 273
BB#4 ('if.then') to the top: 297
BB#2 ('latch') to the top: 297
================
Comment at: lib/CodeGen/AtomicExpandPass.cpp:932-934
@@ -929,3 +931,5 @@
- Builder.CreateCondBr(Success, ExitBB, LoopBB);
+ // Set LoopBB -> ExitBB and LoopBB -> LoopBB with 50% probability, so block
+ // placement won't apply aggressive-best-top on LoopBB
+ Builder.CreateCondBr(Success, ExitBB, LoopBB, BrWeight);
----------------
chandlerc wrote:
> This comment doesn't really help here. What is "aggressive-best-top" and what does it mean?
>
> I think what you're actually doing is trying to give probabilities to override the default loop probabilities because the spin loop for an atomic isn't expected to loop often if at all. I'm actually surprised you would have a 50% probability of looping here, I would have expected a relatively low probability of spinning to be more appropriate.
>
> I think you should:
>
> 1) split this out into an independent change with its own description and test case
> 2) improve this comment to talk generically about the fact that the loop isn't expected to be taken unless there is contention.
>
> Then the rest of this change can be predicated on that. I'm assuming that you can test this independently because its actually annotating the IR.
Exactly! And yes, that loop isn't expected to be taken, I should use lower value.
================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:114-117
@@ -113,1 +113,6 @@
+static cl::opt<bool>
+ AggressiveBestTop("aggressive-best-top",
+ cl::desc("Find best top from all latches even with conditional exit."),
+ cl::init(true), cl::Hidden);
+
----------------
chandlerc wrote:
> Any particular reason for a flag? Or for calling this aggressive? It seems pretty straight forward to try to select the best loop top among latches.
For testing purpose :P
================
Comment at: test/CodeGen/X86/code_placement_loop_rotation2.ll:8
@@ -7,1 +7,3 @@
; CHECK-LABEL: foo:
+; CHECK: jmp
+; CHECK: callq h
----------------
davidxl wrote:
> By the way, this example shows that the new expected layout is not as optimal as the one when -force-precise-rotation-cost is on (see below CHECK-PROFILE).
>
> The total branch cost of the optimal layout [e, f, c, d, h, b, g ] is:
>
> C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c) + C(g->h)
>
> while the total cost of aggressive loop top layout [h, b, g, f, c, d, e] is
>
> C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c) + C(g->h) + C(e->f)
>
> Edge e->f is in the inner loop and it is a hot edge , the additional cost of C(e->f) can be high.
Thanks for pointing out this : D
http://reviews.llvm.org/D20017
More information about the llvm-commits
mailing list