[llvm] r182023 - PPC32 cannot form counter loops around i64 FP conversions

Thu May 16 16:11:57 PDT 2013

Hal Finkel <hfinkel at anl.gov> wrote on 17.05.2013 00:50:08:

> > It seems to me that attempting to introduce this sort of "tight
> > coupling" between an IR pass and a later MI pass will probably
> > lead to problems as well.
> >
> > I'd instead suggest to have two self-contained passes that are
> > only loosely coupled.  First, an IR pass recognizes likely CTR
> > loops and rewrites them on the IR level into counting-down loops;
> > that is a loop that uses regular IR to describe a counter being
> > set to an initial value, counting it down, and testing it against
> > zero as condition of the loop back-edge branch.  (This
> > transformation as such can never lead to wrong code generation
> > no matter what happens later.  In fact, I'd assume that there
> > are already loop optimizers that perform exactly this type of
> > transformation ...)
> >
> > Later on, an MI pass detects loops that look on the MI level like
> > counting-down loops
>
> Unfortunately, this "looks like" gets difficult, and this is what
> motivated me to attempt this on the IR level. In simple cases it is
> fine, but, as I learned from the process of porting the current
> Hexagon hardware loops pass, it is possible to fool the MI level
> pass (or you need to make the MI level pass create silly-looking
> code). Also, SE (ScalarEvolution) is much more powerful for recognising
> these kinds of things.

Well, I guess my thought was that the MI pass would only even attempt
to handle the "simple cases" and ignore everything else.  Basically,
the pass would handle only code that already does "decrement vreg
by one, branch if vreg nonzero".  No need to compute counts or
anything on this level.
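To make the "simple cases only" idea concrete, here is a rough C sketch of the kind of check such an MI pass might perform on a loop latch. The instruction struct, the opcodes and match_simple_ctr_latch() are all hypothetical simplifications, not LLVM's MachineInstr interface. The point is only that the matcher looks for one fixed shape and never computes a count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for machine instructions;
   this is NOT LLVM's MachineInstr API. */
enum opcode { OP_ADDI, OP_BNZ, OP_OTHER };

struct minst {
  enum opcode op;
  int dst;    /* defined virtual register, or -1 */
  int src;    /* used virtual register, or -1 */
  long imm;   /* immediate operand (OP_ADDI only) */
  int target; /* branch target block (OP_BNZ only), or -1 */
};

/* Matches only the simple latch shape
     vNew = addi vOld, -1
     bnz  vNew, header
   and returns vNew. Every other shape returns -1 and is left alone;
   no trip count is computed here. */
int match_simple_ctr_latch(const struct minst *latch, size_t n, int header) {
  assert(latch != NULL || n == 0);
  if (n < 2)
    return -1;
  const struct minst *dec = &latch[n - 2];
  const struct minst *br = &latch[n - 1];
  if (dec->op != OP_ADDI || dec->imm != -1)
    return -1; /* not a decrement by exactly one */
  if (br->op != OP_BNZ || br->src != dec->dst || br->target != header)
    return -1; /* branch does not test the decremented value */
  return dec->dst;
}
```

A real pass would also need to check that vOld is the header PHI fed by vNew, and that nothing else in the loop uses the counter.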

However, the IR pass would recognize complex cases and rewrite them
- completely on the standard IR level - so that the rewritten loop
would just happen to be one of the simple cases that the MI pass
recognizes.
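As an illustration of the kind of rewrite meant here, the following C sketch shows a loop with a non-unit step next to the same loop after a hypothetical IR-level transformation: the trip count is computed up front (the sort of expression SE could provide), and the back-edge becomes a plain decrement-and-test-against-zero. Both functions are made up for this example and ignore overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Original loop: sums i for i = lo, lo+step, ... while i < hi. */
int64_t sum_original(int64_t lo, int64_t hi, int64_t step) {
  assert(step > 0);
  int64_t s = 0;
  for (int64_t i = lo; i < hi; i += step)
    s += i;
  return s;
}

/* The same loop after a hypothetical IR-level rewrite into a
   counting-down loop. Overflow of hi - lo or i + step is ignored. */
int64_t sum_counted(int64_t lo, int64_t hi, int64_t step) {
  assert(step > 0);
  int64_t s = 0;
  if (lo >= hi)
    return s; /* loop guard: zero iterations */
  /* Trip count = ceil((hi - lo) / step), computed once before the loop. */
  uint64_t count = (uint64_t)((hi - lo + step - 1) / step);
  int64_t i = lo;
  do {
    s += i;
    i += step;
  } while (--count != 0); /* decrement by one, branch if nonzero */
  return s;
}
```

The induction variable i is still there for the body's use; only the exit test has moved onto the new counter, which is exactly the shape the simple MI matcher would look for.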

> The problem that the current Hexagon pass has, for example, is that
> for non-constant-trip-count loops, it assumes that there is a loop
> guard. This means that there is some comparison that skips the loop
> if the count is zero or, importantly, negative. However, do-while
> loops have no guard, and so nothing prevents negative counts from being
> calculated (which causes miscompiles). So a loop do {--i} while (i >
> 0), which normally executes only the first iteration, could be
> calculated to have a large (negative) count if the 'naive' formula
> is used. Alternatively, we could always generate the isel (or the
> branch code on earlier processors), but that's silly when we have a
> guard. And checking for the loop guard also gets hard quickly.
> Because the loop guards are generated fairly early by the IR-level
> loop simplification pass, the guard conditions get hoisted, combined
> with other things, etc. (not that SE does not sometimes generate
> redundant guards, but at least that is common high-level code that
> can be improved).

Can you not do all that on the IR level, and if the IR pass detects a
count, simply rewrite the IR to a simple counting loop?

> I think that a robust solution has the IR-level generate count
> expressions, and these count expressions are somehow explicitly tied
> to the relevant backedge (using some intrinsic). Both the count and
> this backedge are transformed into pseudo instructions, and a
> cleanup pass either turns these things into the real mtctr/bdnz
> instructions (and DCEs any now-dead induction variables and
> compares), or DCEs the count and turns the compare into a regular
> instruction.
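A toy C model of the cleanup step Hal describes, under loudly stated assumptions: the opcodes and cleanup_ctr_pseudos() are hypothetical, the preheader and loop body are flattened into one array, and the only reason modeled for giving up on a CTR loop is a call inside the loop. (The subject of r182023 suggests i64 FP conversions on PPC32 are one such case, presumably because they become library calls, and CTR is not preserved across calls.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes; NOT LLVM's PPC instruction enums. */
enum op {
  I_SET_COUNT_PSEUDO,   /* count expression attached at the IR level */
  I_LOOP_BRANCH_PSEUDO, /* back-edge branch tied to that count */
  I_IV_CMP,             /* original induction-variable compare */
  I_CALL,               /* e.g. a libcall that may clobber CTR */
  I_MTCTR, I_BDNZ, I_BRANCH_ON_CMP, I_NOP, I_OTHER
};

/* Assumption: calls are the only thing modeled as clobbering CTR. */
static bool clobbers_ctr(enum op o) { return o == I_CALL; }

/* Late cleanup: runs once the final instructions are known and either
   materializes the counter loop or falls back to a regular branch.
   Returns true if a CTR loop was formed. */
bool cleanup_ctr_pseudos(enum op *insts, size_t n) {
  assert(insts != NULL || n == 0);
  bool safe = true;
  for (size_t k = 0; k < n; ++k)
    if (clobbers_ctr(insts[k]))
      safe = false;
  for (size_t k = 0; k < n; ++k) {
    switch (insts[k]) {
    case I_SET_COUNT_PSEUDO: /* mtctr, or DCE the unused count */
      insts[k] = safe ? I_MTCTR : I_NOP;
      break;
    case I_LOOP_BRANCH_PSEUDO: /* bdnz, or branch on the IV compare */
      insts[k] = safe ? I_BDNZ : I_BRANCH_ON_CMP;
      break;
    case I_IV_CMP: /* the compare is dead once bdnz tests CTR */
      if (safe)
        insts[k] = I_NOP;
      break;
    default:
      break;
    }
  }
  return safe;
}
```

The attraction of this shape is that the decision is deferred until after isel, when calls introduced by lowering are visible; the cost is the coupling between the IR-level count and the pseudos that Ulrich objects to.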

I don't really like having the "count annotation" on the side,
since it's just duplicate information and may get out of date
with subsequent transformations.  If, after computing the count,
you don't use it for a side-band annotation but instead just
rewrite the IR using that count information, this problem
wouldn't be there ...
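A deliberately minimal C model of the staleness concern. struct loop, unroll_by_2() and the two ctr_from_* accessors are hypothetical; the only point is that a later pass which updates the loop's IR but knows nothing about a side-band count leaves the annotation wrong, while a count that lives in the rewritten IR is updated along with everything else.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a loop; all names are hypothetical. */
struct loop {
  uint32_t ir_trip_count;   /* back-edge trips implied by the loop's own IR */
  uint32_t side_annotation; /* count recorded on the side, earlier */
};

/* A later transformation that knows nothing about the annotation:
   unrolling by 2 halves the number of back-edge trips in the IR. */
void unroll_by_2(struct loop *l) {
  assert(l->ir_trip_count % 2 == 0);
  l->ir_trip_count /= 2;
  /* side_annotation is left untouched and is now stale */
}

/* Count a back end would load into CTR under each design. */
uint32_t ctr_from_annotation(const struct loop *l) { return l->side_annotation; }
uint32_t ctr_from_rewritten_ir(const struct loop *l) { return l->ir_trip_count; }
```

Starting from {8, 8}, one unroll_by_2() leaves the rewritten-IR count at 4 while the annotation still says 8.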

Bye,
Ulrich