[PATCH] D49151: [SimplifyIndVar] Avoid generating truncate instructions with non-hoisted Load operand.

Tue Jul 24 17:08:40 PDT 2018

efriedma added subscribers: mkazantsev, qcolombet.
efriedma added a comment.

Your example doesn't really help make the case for this patch.  The load in that test is actually loop-invariant; we just don't figure that out until after indvars transforms the induction variable.  Probably LICM could be fixed to handle this case earlier.  Then ultimately, the multiply goes away; the extra operation you're trying to get rid of is actually the sign-extension of a PHI node created by LSR.  LSR and/or SCEV could probably be fixed so this produces an i64 PHI instead.  Either of those fixes would be more straightforward and more obviously profitable.

================
Comment at: llvm/lib/Transforms/Scalar/IndVarSimplify.cpp:1408
+bool WidenIV::widenVariantLoadUse(NarrowIVDefUse DU) {
+  const SCEV* ExtendOperExpr = getExtendExpr(DU);
+  if (!ExtendOperExpr)
----------------
az wrote:
> efriedma wrote:
> > Not sure the getExtendExpr helper is actually buying anything, given you have to check the extension kind anyway.
> I took the code that does the non-scalar evolution legality check (opcode = {add..}, call to hasNoSignWrap(), etc.) and put it, as is, in a function called getExtendExpr that both existing code and patch code calls it. It may have a little bit of redundancy but I can rewrite it if needed by un-putting it in a function and replicate what I need in terms of legality check.
> 
Okay, it's probably fine as-is.

================
Comment at: llvm/lib/Transforms/Scalar/IndVarSimplify.cpp:1460
+  auto *WideBO = BinaryOperator::Create(NarrowBO->getOpcode(), LHS, RHS,
+                                        NarrowBO->getName());
+  IRBuilder<> Builder(NarrowUse);
----------------
az wrote:
> efriedma wrote:
> > It probably isn't a good idea to create a new multiply without erasing the old one.
> > 
> > 
> Actually, this is cloning the use instruction and not removing it yet. The old instruction is removed by the defining instruction when we return up the call chain if it has no other use. This is my understanding of how things worked with widening the use. However, I am adding some code in the new revision to immediately remove the new instruction when not useful. I used to leave the instruction unused and hoping that it will be removed by dead code elimination.  
We need to make sure to avoid the situation where after this transform runs, there's an i32 multiply and an i64 multiply.  It's possible IV rewriting could lead to this sort of situation anyway in certain edge cases, I guess (haven't looked too closely), but multiplies are relatively expensive, and i64 multiplies are more expensive than i32 multiplies.

Actually, more generally, we probably need to weigh the extra cost of the multiply; on multiple targets, an i64 multiply is substantially slower than an i32 multiply.

https://reviews.llvm.org/D49151