# [llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org
Sun Aug 16 19:43:54 PDT 2015

```> I don't understand why you want to factor out the information,
> exactly. It seems like what you need is a function like:
>
>
> then, if you want to get the non-extended expression, you can just
> apply an appropriate truncation. I assume, however, that I'm missing
> something.

The problem is not about how to codegen a specific SCEV expression
efficiently, but about the overall solution LSR converges to (which
includes considering how it will codegen *other* SCEVs).  In the
latter sense there is a difference between `X` and `(trunc (zext X))`
even though both of them produce the same value (and possibly are
equally efficient).

The example in  is roughly like this at the high level:

You have an i32 induction variable `V` == `{VStart,+,1}`.
`{VStart,+,1}` does not unsigned-overflow given the loop bounds.  Two
interesting uses in the loop are something like (in this mail by zext
I'll always mean zext from i32 to i64) `GEP @Global, zext(V)` and
`V ult limit`.

Because SCEV can prove that `{VStart,+,1}` does not unsigned-overflow,
`GEP @Global, zext(V)` can be computed as `GEP (@Global + zext
VStart), {i64 0,+,1}`.  Similarly, `V` == `trunc (zext V)` ==
`trunc {zext VStart,+,1}` == `trunc {zext VStart,+,1}` ==
`trunc({i64 0,+,1}) + trunc (zext VStart)` ==
`trunc({i64 0,+,1}) + VStart`.  This is the solution LSR ends up
choosing.  However, the cheaper solution here is

`GEP @Global, zext(V)` == `GEP @Global, zext({VStart,+,1})`
and
`V ult limit` == `{VStart,+,1} ult limit`

(i.e. pretty much the direct naive lowering).

This naive lowering is better because we can use the increment in
`{VStart,+,1}` to zero extend the value implicitly on x86 (you can
find the generated machine code at ).

The reason why LSR does not pick this cheaper solution is because it
"forgets" that `{zext VStart,+,1}` == `zext({VStart,+,1})`.  I'm
trying to fix this by adding a rule that replaces `{zext X,+,zext Y}`
with `zext({X,+,Y})` when possible on architectures that have "free"
zero extensions, so that if there is a solution involving the narrower
induction variable, it will be found.

Now I could use the trick you mentioned in the place in LSR where it
computes the cost of a solution (i.e. when looking for "common"
registers see if one register is the zero extension of another).  I
have not tried it, but I suspect it will be fairly tricky -- a given
solution will now have more than one "cost" depending on how you
interpret specific registers (as truncations of some extended value or
as a native wide value) and you'd have to search over the possible
interpretations of a specific solution.

> Adding simplification barriers into SCEV expressions seems like a
> dangerous idea, especially since LSR is not the only user of SCEV.

So I don't intend for SCEV to produce SCEVOpaqueExpr nodes itself.
There would instead be a 'const SCEV *getOpaqueExpr(const SCEV *)'
function that clients of SCEV can use to create an opaque expression.
Then most of the simplification routines SCEV will be changed to have
the moral equivalent of 'if (isa<SCEVOpaqueExpr>(E)) continue;'.

-- Sanjoy

: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html
```