[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org
Sun Aug 16 19:43:54 PDT 2015

> I don't understand why you want to factor out the information,
> exactly. It seems like what you need is a function like:
>   unsigned getMinLeadingZeros(const SCEV *);
> then, if you want to get the non-extended expression, you can just
> apply an appropriate truncation. I assume, however, that I'm missing
> something.

The problem is not about how to codegen a specific SCEV expression
efficiently, but about the overall solution LSR converges to (which
includes considering how it will codegen *other* SCEVs).  In the
latter sense there is a difference between `X` and `(trunc (zext X))`
even though both of them produce the same value (and possibly are
equally efficient).

The example in [1] is roughly like this at the high level:

You have an i32 induction variable `V` == `{VStart,+,1}`.
`{VStart,+,1}` does not unsigned-overflow given the loop bounds.  Two
interesting uses in the loop are something like (in this mail by zext
I'll always mean zext from i32 to i64) `GEP @Global, zext(V)` and
`V ult limit`.

Because SCEV can prove that `{VStart,+,1}` does not unsigned-overflow,
`GEP @Global, zext(V)` can be computed as `GEP (@Global + zext
VStart), {i64 0,+,1}`.  Similarly, `V` == `trunc (zext V)` ==
`trunc {zext VStart,+,1}` == `trunc {zext VStart,+,1}` ==
`trunc({i64 0,+,1}) + trunc (zext VStart)` ==
`trunc({i64 0,+,1}) + VStart`.  This is the solution LSR ends up
choosing.  However, the cheaper solution here is

`GEP @Global, zext(V)` == `GEP @Global, zext({VStart,+,1})`
`V ult limit` == `{VStart,+,1} ult limit`

(i.e. pretty much the direct naive lowering).

This naive lowering is better because we can use the increment in
`{VStart,+,1}` to zero extend the value implicitly on x86 (you can
find the generated machine code at [1]).

The reason why LSR does not pick this cheaper solution is because it
"forgets" that `{zext VStart,+,1}` == `zext({VStart,+,1})`.  I'm
trying to fix this by adding a rule that replaces `{zext X,+,zext Y}`
with `zext({X,+,Y})` when possible on architectures that have "free"
zero extensions, so that if there is a solution involving the narrower
induction variable, it will be found.

Now I could use the trick you mentioned in the place in LSR where it
computes the cost of a solution (i.e. when looking for "common"
registers see if one register is the zero extension of another).  I
have not tried it, but I suspect it will be fairly tricky -- a given
solution will now have more than one "cost" depending on how you
interpret specific registers (as truncations of some extended value or
as a native wide value) and you'd have to search over the possible
interpretations of a specific solution.

> Adding simplification barriers into SCEV expressions seems like a
> dangerous idea, especially since LSR is not the only user of SCEV.

So I don't intend for SCEV to produce SCEVOpaqueExpr nodes itself.
There would instead be a 'const SCEV *getOpaqueExpr(const SCEV *)'
function that clients of SCEV can use to create an opaque expression.
Then most of the simplification routines SCEV will be changed to have
the moral equivalent of 'if (isa<SCEVOpaqueExpr>(E)) continue;'.

-- Sanjoy

[1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html

More information about the llvm-dev mailing list