[PATCH] D38212: [PowerPC] Add profitablilty check for conversion to mtctr loops
Hal Finkel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 3 20:13:02 PDT 2017
hfinkel added inline comments.
================
Comment at: lib/Target/PowerPC/PPCCTRLoops.cpp:495
+ Metrics.analyzeBasicBlock(BB, *TTI, EphValues);
+ if (Metrics.NumInsts < (6 * SchedModel.getIssueWidth()))
+ return false;
----------------
Please add a comment here that 6 is an approximate latency for the mtctr instruction.
================
Comment at: test/CodeGen/PowerPC/ctr-minmaxnum.ll:63
; QPX-LABEL: test1v:
-; QPX: mtctr
; QPX-NOT: bl fminf
----------------
hfinkel wrote:
> nemanjai wrote:
> > lei wrote:
> > > hfinkel wrote:
> > > > Please update the trip counts on these tests so we don't lose coverage.
> > > This test case verifies that for short loops with a trip count of 2, we don't generate `mtctr` for for pwr7+ but we do generate `mtctr` for bluegene. On power, these short loops are usually unrolled and never make it this far so naturally we don't see `mtctr`. However now that we put in a check to not use `mtctr` for short loops, we should not see this instruction anymore even for short loops which were not unrolled previously. As far as I can tell this should be the desired behaviour now. Unless I am missing something, I don't think we are losing any coverage here.
> > >
> > > I have added test case below to ensure we still generate the mtctr loops for trip counts >= 4.
> > Well, there are still two open questions here:
> > - Is this something we want on the A2 cores or would we **always** want an `mtctr`?
> > - can we add a "trip count >= 4" test case for the A2 core as well?
> I'm not entirely sure I understand the premise for doing exactly this in any case. I understand that short loops are generally unrolled (for all cores). However:
>
> * I imagine this is not really optimal for the A2. mtctr has a 6-cycle latency, so the loop would need to be really small in order for that to cause a hazard.
> * Even on the POWER cores, is this really a function of a small trip count along, or also having a small loop? You can have a large loop with a small trip count, and will want to do this transformation in order to reduce register pressure within the loop.
>
> In short, I suspect we'll want to restrict this to only some cores, and we'll want to use CodeMetrics to limit this to only do something on small loops. See, for example, ApproximateLoopSize in LoopUnrollPass.cpp.
Please increase the trip counts of these loops so they still test what they're supposed to test (i.e. whether we can form a ctr-based loop with these various functions calls in the loop based on our knowledge of how these will be lowered (i.e., some will be lowered using instructions, not actual calls, depending on the enabled target features, enabling us to nevertheless use the ctr-based loop)).
https://reviews.llvm.org/D38212
More information about the llvm-commits
mailing list