[llvm] r182023 - PPC32 cannot form counter loops around i64 FP conversions

Fri May 17 03:33:12 PDT 2013

Hal Finkel <hfinkel at anl.gov> wrote on 17.05.2013 01:43:25:

> > Can you not do all that on the IR level, and if it detects a count,
> > simply rewrite the IR to a simple counting loop?
>
> The other problem is that, if you do this rewriting then you can end
> up with redundant induction variables if the transformation does not
> happen.

Yes, of course, that's what will happen if you make the wrong decision.
Note that I'm not saying you *shouldn't* try to detect in advance
whether or not CTR will be available and the transformation is
possible -- of course the IR pass should have a heuristic that
attempts to detect this, probably much along the lines of your
existing code.

My only point is that I'd prefer if the overall algorithm were
structured so that the "failure mode" if that heuristic gets it
wrong in some (hopefully rare) case, is simply generation of
slightly less optimal code rather than completely broken code.

> Maybe it would be better to do it the other way? What if we always
> generate the counter-based loops, and then undo the transformation
> if we detect, at the MI level, other clobbers of the counter
> register. We might end up generating suboptimal code in these cases
> (because we could end up with effectively two induction variables
> that are not exactly the same and so won't be CSE'd), but for loops
> with function calls, indirect jumps, etc., maybe the performance hit
> won't matter much -- but I don't really believe that :(

That seems more along the lines of what GCC does; the doloop pass
will insert a loop counter as a pseudo register if its heuristic
determines this will likely be useful, and then the register
allocator tries to allocate the pseudo to CTR, but if it fails,
we just get regular code to decrement and test the counter (in a GPR).
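
To illustrate the point about the fallback being harmless, here is a
minimal source-level sketch of the two loop shapes. It is not what GCC
or LLVM actually emit; the function names sum_indexed and sum_counted
are made up for this example. The second form carries a separate
down-counter: if that counter ends up in CTR, the decrement and branch
can be a single bdnz, and if it ends up in a GPR the loop is still
correct, just with an explicit decrement and compare.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Original shape: the induction variable i drives both the exit test
// and the addressing.
std::int64_t sum_indexed(const std::int32_t *a, std::size_t n) {
  assert(a != nullptr || n == 0);
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

// Counted shape (hypothetical illustration): a dedicated down-counter
// controls the exit. Whether `count` lives in CTR or in a GPR does not
// change the result, only the cost of the decrement-and-test.
std::int64_t sum_counted(const std::int32_t *a, std::size_t n) {
  assert(a != nullptr || n == 0);
  std::int64_t sum = 0;
  if (n == 0)
    return sum;                 // guard: the do-while body runs at least once
  std::size_t count = n;        // trip count, computed before the loop
  const std::int32_t *p = a;
  do {
    sum += *p++;
  } while (--count != 0);       // decrement and branch if nonzero
  return sum;
}
```

Both functions compute the same value for any input, which is the
property that makes a wrong heuristic decision cost only performance.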

> In the end, this is why I decided to do this on the IR level. It is
> fragile to detect IR sequences that might become function calls, but
> as LLVM is currently constructed, and because the selection-DAG is
> basic-block local, there is a finite set of such things and we can
> audit the code generator for them. I don't *like* it, but it seems
> to be the only way that we can really generate the best code
> (without extraneous guard expressions, and without redundant GPR
> induction variables, etc.) in practice. We could certainly pair this
> with some kind of 'undo' mechanism just in case. Maybe the best
> solution is to leave things this way, but make a verification pass
> that checks for clobbers inside the loops after the fact. This would
> make it ICE instead of miscompile, but that is preferable (and we
> could turn it off in non-asserts builds).

I certainly agree that an ICE is better than a miscompile :-)
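
As a rough sketch of what such a check could look like, here is a toy
model. Everything in it is an assumption made for illustration: the Op
enum, clobbersCTR, firstCTRClobber and verifyCounterLoop are
hypothetical names, not LLVM's MachineInstr API, and the set of
operations treated as clobbers is only a guess at what the real pass
would need to cover.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction kinds for a loop body.
enum class Op { Add, Load, Store, Call, IndirectBranch, MoveToCTR, DecBranchCTR };

// True if the instruction may overwrite the counter register.
// Assumption: calls, indirect branches and explicit moves to CTR do;
// the loop's own decrement-and-branch is the expected user.
bool clobbersCTR(Op op) {
  return op == Op::Call || op == Op::IndirectBranch || op == Op::MoveToCTR;
}

// Returns the index of the first CTR clobber in the body, or -1 if none.
int firstCTRClobber(const std::vector<Op> &body) {
  for (std::size_t i = 0; i < body.size(); ++i)
    if (clobbersCTR(body[i]))
      return static_cast<int>(i);
  return -1;
}

// In an asserts build a clobber aborts the compile instead of letting
// broken code through; with NDEBUG defined the check compiles away.
void verifyCounterLoop(const std::vector<Op> &body) {
  assert(firstCTRClobber(body) == -1 && "CTR clobbered inside counter loop");
  (void)body;
}
```

A body such as {Load, Add, DecBranchCTR} passes, while one containing a
Call in the middle trips the assertion.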

Bye,
Ulrich