[PATCH] D49151: [SimplifyIndVar] Avoid generating truncate instructions with non-hoisted Load operand.

Wed Jul 25 16:45:05 PDT 2018

az added a comment.

In https://reviews.llvm.org/D49151#1174443, @efriedma wrote:

> Your example doesn't really help make the case for this patch.  The load in that test is actually loop-invariant; we just don't figure that out until after indvars transforms the induction variable.  Probably LICM could be fixed to handle this case earlier.  Then ultimately, the multiply goes away; the extra operation you're trying to get rid of is actually the sign-extension of a PHI node created by LSR.  LSR and/or SCEV could probably be fixed so this produces an i64 PHI instead.  Either of those fixes would be more straightforward and more obviously profitable.

My example is a greatly simplified example of the actual benchmark but you are absolutely right on the suggestion that we should solve this problem in LICM. If we can do that, SimplifyIndVar would work with clean hoisted loads and LICM would be improved in general. That was my original approach too and a good portion of the work for this performance issue was spent on trying to improve LICM/AA. What I found out is that LICM uses a simple/fast but conservative Type-Based Alias Analysis (AliasSet). I would have to make major changes to that alias analysis to solve my problem and it is unlikely to have it accepted given the strong push back against adding too much complexity into AliasSet. Then, I also thought about making LICM use the more accurate but expansive MemDep alias analysis in a similar way that GVN based PRE uses it (Memdep gives good Alias Analysis info for my real benchmark). But given that the LICM pass is called numerous times, this would add substantial compile time. Given that I did not have an agreeable fix in LICM/AA, I went to the next best place to put a fix which is SimplifyIndVar. It is the next best place because 1) it is there where the truncate instruction and inefficient code is first generated, 2) it is an early pass and I prefer that we clean code early on instead of letting inefficient IR go through other passes, and 3) It makes the IndvarSimp widening optimization more solid because let's consider we have two similar C examples with the only difference being one with a Load inside the loop and the other with a Load outside the loop (LICM could not hoist because of real legality issue or because of limitation of licm such as my case). The widening generates quite different IRs for both cases even though the Load being outside or inside is irrelevant to widening itself. In other words, we want it to generate similar type of code for closely similar input IR.

https://reviews.llvm.org/D49151