[llvm] r182023 - PPC32 cannot form counter loops around i64 FP conversions

Thu May 16 15:50:08 PDT 2013

----- Original Message -----
> Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > > There are essentially two problems:
> > >
> > >  1. Generating an expression for the count.
> > >  2. Using the counter-based loop instructions.
> > >
> > > As we've discovered, (1) should really be done at the IR level.
> > > Doing
> > > this at the MI level essentially means reimplementing large parts
> > > of
> > > SE at the MI level, and the resulting expressions often contain
> > > min/maxs (which turn into selects), etc. and an entire custom
> > > code
> > > generator for these expressions is also difficult to maintain and
> > > produces suboptimal code.
> > >
> > > As for (2), this should really be done at the MI level. That's
> > > where
> > > we can really detect interfering uses of the counter register
> > > (and
> > > avoid problems such as this).
> > >
> > > So to solve problems with generating good (and correct) code for
> > > the
> > > counts, I tried moving everything from the MI level into the IR
> > > level. Maybe it is possible to do the count-expression generation
> > > at
> > > the IR level and the actual loop conversion at the MI level. I'll
> > > try to make it work this way.
> >
> > I should add: The IR-level pass can use SE to identify a countable
> > backedge, and then insert a count. The problem now becomes, at the
> > MI level, to make sure that we identify that same backedge for
> > branch conversion (and to make sure that nothing in the meantime
> > invalidates the count).
> 
> It seems to me that attempting to introduce this sort of "tight
> coupling" between an IR pass and a later MI pass will probably
> lead to problems as well.
> 
> I'd instead suggest to have two self-contained passes that are
> only loosely coupled.  First, an IR pass recognizes likely CTR
> loops and rewrites them on the IR level into counting-down loops;
> that is a loop that uses regular IR to describe a counter being
> set to an initial value, counting it down, and testing it against
> zero as condition of the loop back-edge branch.  (This
> transformation as such can never lead to wrong code generation
> no matter what happens later.  In fact, I'd assume that there
> are already loop optimizers that perform exactly this type of
> transformation ...)
> 
> Later on, an MI pass detects loops that look on the MI level like
> counting-down loops

Unfortunately, deciding what "looks like" a counting-down loop gets difficult at the MI level, and this is what motivated me to attempt this at the IR level. In simple cases it is fine, but, as I learned from the process of porting the current Hexagon hardware loops pass, it is possible to fool the MI-level pass (or you need to make the MI-level pass create silly-looking code). Also, SE is much more powerful for recognising these kinds of things.

The problem that the current Hexagon pass has, for example, is that for non-constant-trip-count loops, it assumes that there is a loop guard. This means that there is some comparison that skips the loop if the count is zero or, importantly, negative. However, do-while loops have no guard, and so nothing prevents negative counts from being calculated (which causes miscompiles). So a loop do {--i} while (i > 0), which executes only the first iteration when i starts out non-positive, could be calculated to have a large (negative) count if the 'naive' formula is used. Alternatively, we could always generate the isel (or the equivalent branch code on earlier processors) to clamp the count, but that's silly when we have a guard. And checking for the loop guard also gets hard quickly. Because the loop guards are generated fairly early by the IR-level loop simplification pass, the guard conditions get hoisted, combined with other things, etc. (not that SE does not sometimes generate redundant guards, but at least that is common high-level code that can be improved).
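
To make the do-while case concrete, here is a minimal C++ sketch. It is only an illustration under stated assumptions: the function names (actualTrips, naiveTrips, clampedTrips) are made up for this example and are not taken from the Hexagon or PPC passes, and the 'naive' formula is assumed to be "the start value is the trip count".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Reference behaviour: run `do { --i; } while (i > 0);` and count how
// many times the body executes.
std::uint32_t actualTrips(std::int32_t i) {
    // --i would overflow for the minimum value, so this sketch excludes it.
    assert(i > std::numeric_limits<std::int32_t>::min());
    std::uint32_t trips = 0;
    do {
        --i;
        ++trips;
    } while (i > 0);
    return trips;
}

// Assumed 'naive' formula: the start value is taken as the trip count.
// This is only right if a loop guard has already ruled out start <= 0.
std::uint32_t naiveTrips(std::int32_t start) {
    // A negative start becomes a huge count once loaded into an
    // unsigned 32-bit counter.
    return static_cast<std::uint32_t>(start);
}

// Guard-free formula: clamp with max. This max is the kind of
// expression that turns into a select (isel) in generated code.
std::uint32_t clampedTrips(std::int32_t start) {
    return static_cast<std::uint32_t>(std::max<std::int32_t>(start, 1));
}
```

For a start value of 5 all three functions agree. For a start value of -3 the loop body runs once and clampedTrips returns 1, while naiveTrips returns 4294967293, which is the miscompile described above.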

I think that a robust solution has the IR level generate count expressions, with each count expression explicitly tied to the relevant backedge (using some intrinsic). Both the count and this backedge are transformed into pseudo-instructions, and a cleanup pass either turns these into the real mtctr/bdnz instructions (and DCEs any now-dead induction variables and compares), or DCEs the count and turns the compare into a regular instruction.
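
The either/or decision made by such a cleanup pass could be sketched as a toy model in C++. Everything here is hypothetical: the Op, Inst, and Lowering types and the functions are invented for illustration and do not correspond to any existing LLVM class, and the rule "a call in the loop body may clobber CTR" is a simplifying assumption standing in for whatever interference check the real pass would need.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction kinds for a toy loop body.
enum class Op { SetLoopCount, DecBranchNonZero, Call, Other };

struct Inst {
    Op op;
};

// The two outcomes described above.
enum class Lowering {
    HardwareLoop, // pseudos become mtctr/bdnz; dead IVs and compares go away
    RegularLoop   // count is removed; an ordinary compare-and-branch remains
};

// Simplifying assumption for this sketch: any call inside the loop
// may use or clobber the counter register.
bool mayClobberCTR(const std::vector<Inst>& body) {
    for (const Inst& inst : body) {
        if (inst.op == Op::Call) {
            return true;
        }
    }
    return false;
}

// Decide how to lower the count/backedge pseudo pair for one loop.
Lowering chooseLowering(const std::vector<Inst>& body) {
    // An empty body is not a meaningful loop in this toy model.
    assert(!body.empty());
    return mayClobberCTR(body) ? Lowering::RegularLoop
                               : Lowering::HardwareLoop;
}
```

The point of the sketch is that both outcomes are safe: if the check is too conservative, the loop is merely emitted with normal instructions instead of mtctr/bdnz.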

Thoughts?

Thanks again,
Hal

>, and -if possible- allocates the counter into
> the CTR register and then uses bdnz.  At this point there is no
> need to attempt to rewrite the loop if it doesn't already look
> like a counting-down loop.  (Again, this transformation is easy
> to verify and can never lead to wrong code generation.  The worst
> that could happen is that the MI no longer recognizes the loop
> or isn't able to handle it; but then it will just get emitted
> as a counting-down loop using "normal" instructions instead.)
> 
> Does this sound reasonable?
> 
> Bye,
> Ulrich