[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

Sun Aug 16 17:42:16 PDT 2015

This is related to an issue in loop strength reduction [1] that I've
been trying to fix on and off for a while.  [1] has a more detailed
description of the issue and an example, but briefly put, I want LSR
to consider formulae that have "Zext T" as base and/or scale
registers, and to appropriately rate such formulae.

My first attempt[2] at fixing this was buggy and had to be reverted.
While it is certainly possible to "fix" the bug in [2], the fixed
version is ugly and fragile, and I'd prefer having something cleaner.

The key issue here is that currently there is no way to represent
"Zext T" as a *distinct computation* within SCEV -- a SCEV expression
of the form "Zext T" (any SCEV expression, actually) is always
maximally simplified on construction.  This means I cannot "factor
out" a Zext from a base register (say) and force SCEV to maintain it
as a "SCEVZeroExtend T"; SCEV is free to re-simplify "SCEVZeroExtend
T" to something equivalent on construction.  This is not just an issue
in theory -- factoring out a 'SCEVZeroExtend' from an expression 'E'
requires us to prove that 'E' == 'SCEVZeroExtend T'.  SCEV can use
that exact same fact to "simplify" 'SCEVZeroExtend T' back to 'E' on
construction.  There is an analogous issue w.r.t. sign extension
expressions.

There are two possibilities I see here -- the problem can be solved within
LoopStrengthReduce or within ScalarEvolution.

# Within LoopStrengthReduce

Solving it within LoopStrengthReduce is tricky: due to the
representation issue in SCEV mentioned above, the fact that a scale or
base register is actually a narrower than needed and has to be zero /
sign extended before use needs to be represented separately from the
SCEV*.  Every place in LoopStrengthReduce that analyzes and
manipulates registers needs to be updated to understand this bit of
side information ([2] was buggy because not every place in LSR had
been updated this way in the change).

The cleanest approach I could come up with is to introduce a `class
Register` that wraps a SCEV and some associated metadata; and change
LSR to reason about registers using this `Register` class (and not raw
SCEV pointers).  All of the "this i64 register is actually a zero
extended i32 register" can then be made to live within this `Register`
class, making the structure of the code more obvious.

# Within ScalarEvolution

Another way to fix the problem is to introduce a new kind of SCEV node
-- a `SCEVOpaqueExpr`.  `SCEVOpaqueExpr` wraps another SCEV
expression, and prevents optimizations from optimizing across the
`SCEVOpaqueExpr` boundary.  This will allow us to keep representing
registers as SCEV pointers as they are today while solving the zext
representation problem.  After factoring a SCEV expression 'E' into
'SCEVZeroExtend T', 'SCEVZeroExtend T' can be "frozen" by representing
it as 'SCEVZeroExtend (SCEVOpaqueExpr T)'.  Since 'SCEVZeroExtend
(SCEVOpaqueExpr T)' computes the same value as 'SCEVZeroExtend T' ==
'E', there is no need to update the parts of LSR that analyze / modify
registers to preserve correctness.

Note: this type of "opaque" node also can be used to force the SCEV
expander to generate a particular sequence of expressions; something
that is achieved by the "expand(getSCEVUnknown(expand(Op)))" idiom in
LSR currently.

So far I'm leaning towards the second approach, but given that it would
be a substantial change (though potentially useful elsewhere), I'd like to
hear what others think before diving in.

-- Sanjoy

[1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html
[2]: http://reviews.llvm.org/rL243939